We're so early
Random thoughts and reflections building with AI for the last 4 months
The last four months have felt like 40 years. On Friday last week, things started to take off for Lazyweb, which is awesome, but it also got me thinking about how early AI actually is.
I've been trying to keep my head above water with all the new products, announcements, and features shipping. And for the first time in a decade, I can build the ideas I wanted to build but didn't have the skills to attempt.
Still, basic workflows are harder than they should be.
My typical day building Lazyweb oscillates between "this is crazy" and "wtf, why is it so hard to do this?"
The models are far ahead. The foundational building blocks are behind.
Here is what stood out:
Working with chat windows is good for tasks where the output is text, and horrendous when the output is anything else: video, images, etc.
AI dev tools are everywhere, and trust infrastructure is thin. Is it better to use Exa, Firecrawl, or Parallel for my specific search use case? Good luck. You need to test every option across a messy set of nondeterministic use cases or scour Reddit for ads disguised as reviews.
Software that tries to mimic how humans behave, whether computer use or browser use, is slower than humans and less reliable than a 12-year-old doing the same thing. Watching agents navigate my workspace is fun, but right now it feels like novelty, not productivity.
You need to pick one skill/plugin/MCP for a particular JTBD; otherwise the agent spirals. For instance, you can't give the agent three or four frontend skills and expect it to pick the right one. Giving it four different search tools confuses it instead of helping it do sharper research by comparing where those tools diverge.
Every new agent starts from scratch, and keeping track of its tools, skills, plugins, knowledge, and context is a pain. Try a bunch of AI assistants like Lindy or Poke and you have to start from scratch every time.
There isnât a good AI-native video editor yet. A lot of people are trying, but the ones I tried are not good enough.
On that same note, sound in AI-generated videos is still bad, despite ElevenLabs and Suno figuring out voice AI.
Beyond the first-party products subsidized by OpenAI and Anthropic, AI is too expensive to do meaningful work at scale. You can still experiment and automate some workflows here and there.
For non-coding tasks, Cowork and Codex are unreliable because they don't do a good job inferring "what is right" when there are no tests to verify the output, and it takes more time to set up evals and guardrails than to do the actual task. For one-off tasks, that's a bad trade.
AI is most valuable at the points of integration: products that connect to all your MCPs, data, repos, etc. That could be Linear, Notion, Claude Code/Cowork, Codex, ChatGPT, etc. Funny enough, every SaaS productivity product is trying to become that point of integration, but setting these up is annoying. There should be a source of truth for granting access to all your tools, data, and MCPs instead of going through 20 different 2FA flows every time.
Agents can't sign up for agentic tools, which is funny. They can't buy them either. There are some solutions out there to solve this, but they're not being adopted at the scale needed to make them useful.
For long, repeating tasks, agents need to keep track of their prior work. Otherwise, they get stuck doing the same work. Once they do the same thing for two or three weeks and can't keep all the prior work in context, they start acting funny.
SEO ruined the internet and is holding back AI agents. When ChatGPT got access to its first tool, internet search, that was a leap. Suddenly, this powerful app could understand what was happening in the world. But decades of businesses have polluted the internet by trying to game search recommendation algorithms and win customers. Garbage in, garbage out. We need a new way for agents to traverse the internet, optimized for usefulness, novelty, and correctness, not PageRank.
I haven't found a good way to manage API keys when I'm spinning up agents and projects left and right. This Infisical product looks promising, though.
AI agents can't act on alerts or new events. They run on a schedule. I tried setting up alerts for Grand Prix ticket price drops, but the agent can't react in real time. I have to ask it to check my email every X hours.
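The scheduled check is just polling, not a real event-driven trigger. A minimal sketch of what that looks like (the `fetch_price` source, the drop threshold, and the interval are all hypothetical stand-ins, not anything a real agent platform exposes):

```python
import time


def price_dropped(previous: float, current: float, threshold_pct: float = 10.0) -> bool:
    """Return True if the price fell by at least threshold_pct percent."""
    if previous <= 0:
        return False
    return (previous - current) / previous * 100 >= threshold_pct


def poll(fetch_price, interval_hours: float = 4.0, threshold_pct: float = 10.0):
    """Check on a fixed schedule -- the agent never reacts in real time,
    it just wakes up, compares, and goes back to sleep."""
    previous = fetch_price()
    while True:
        time.sleep(interval_hours * 3600)
        current = fetch_price()
        if price_dropped(previous, current, threshold_pct):
            print(f"Price drop: {previous} -> {current}")
        previous = current
```

The gap is obvious: anything that happens between checks is invisible until the next wake-up.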
Skill invocation is flaky. You need to babysit agents and be explicit about which skills to use and when. Codex is generally better than Claude at this, but there is still room for improvement on both.
When Claude Cowork does long, intensive research, like keeping track of AI launches, quality degrades a few hours in. The workaround I've found for long-running, open-ended research is to chunk it and schedule a fresh session every 10, 20, or 60 minutes.
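The chunking itself is trivial; the point is that each batch runs in a brand-new session so context never piles up. A sketch, where `run_fresh_session` is a hypothetical stand-in for whatever agent CLI or API spawns a clean session:

```python
def chunk(items, size):
    """Split a long research task into small batches, one per fresh session."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_in_fresh_sessions(topics, batch_size, run_fresh_session):
    """run_fresh_session(batch) stands in for spawning a new agent session."""
    results = []
    for batch in chunk(topics, batch_size):
        # Each call starts with empty context, so quality doesn't degrade
        # the way it does hours into a single long-running session.
        results.append(run_fresh_session(batch))
    return results
```

Scheduling each call 10-60 minutes apart (cron, a task queue, whatever you have) is the other half of the workaround.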
AI doesn't hold itself accountable when work goes haywire.
AI is still not funny.
Every new model update makes you realize how dumb the previous version was.
I don't say this to be cynical.
I say this because itâs early, and the problem space is wide open.
If youâre solving one of these problems, or know someone who is, please let me know (:
Until next time,
Ali Abouelatta

