Documentation: Agentic Superpower

Coding agents have an overlooked superpower: documentation.

Their ability to write technical and product specs, design and update implementation roadmaps, write test plans, and even explain code effortlessly ... not only helps humans approve what the agents are planning to build, but helps the agents stay on track while building it.

One of the most important side effects of good implementation documentation is the ability for the agents to maintain larger and longer context windows (even across sessions). This helps the agents stay disciplined and focused as they proceed through complex multi-step coding tasks, validate that what they have built conforms to the overall specs, and makes code more manageable ... for both machines and humans.
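To make that concrete, the kind of running implementation log that lets an agent pick up where it left off can be as simple as an append-only markdown file. Here's a minimal sketch in Python ... the file name and entry format are my own assumptions, not a standard:

```python
from datetime import datetime, timezone
from pathlib import Path

PLAN = Path("IMPLEMENTATION_PLAN.md")

def log_progress(step: str, status: str, notes: str = "") -> None:
    """Append a timestamped entry that a future session can re-read for context."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    line = f"- [{status}] {stamp} {step}"
    if notes:
        line += f" ({notes})"
    if not PLAN.exists():
        PLAN.write_text("# Implementation Plan: Progress Log\n\n")
    with PLAN.open("a") as f:
        f.write(line + "\n")

# Hypothetical steps, for illustration only
log_progress("Add auth middleware", "done", "unit tests passing")
log_progress("Wire up billing webhook", "in-progress")
```

The point isn't the code ... it's the habit: every step the agent completes leaves a durable trace it can re-read in the next session.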

We all used to hate writing documentation ... now it's one prompt away.

Carpe Diem.

Getting the most from your coding agent

When I first started using Agentic LLM coding assistants, I was thrilled by their potential, but also often frustrated by their unpredictability.

Some days, they produced near-perfect code; other times, inexplicable errors crept in, despite similar instructions. In fairness, sometimes it was me and sometimes it was them.

It didn’t take long to realize that AI coding isn’t just about the model ... it’s about how you interact with it ... what tools you use ... and it’s also sometimes influenced by factors completely outside of your control.

📝 Precision in Prompts and Rules: The quality of your prompts and the clarity of system-level instructions drastically impact output. Structured approaches like SPARC excel because of their robust rule sets, precise directives and iterative build/test/reflect/improve loops. Some agents (like Replit) try to incorporate these best practices behind the scenes, improving the quality of the output. Tools like Cursor and Augment Code work hard to refine system-level prompts and augment baseline models, ensuring more consistent and effective results.

🤝 Managing Development Dialogue: Engaging thoughtfully with the agent throughout the entire development process is essential. Providing clear instructions, encouraging iteration and refinement, asking it to evaluate its own work, and structuring dialogue thoughtfully all help maintain quality, even when external factors cause performance fluctuations. Don't rush. Be precise. Make it test its own work.

⏳ Timing and System Load: Quality can vary based on time of day and server demand. In Eastern Time, mid-late afternoons and even occasionally weekends (vibe code mania) seem to bring performance dips, likely due to higher loads and resource allocation ... while late evening sees a rebound. Recognizing these patterns allows for smarter task scheduling. If switching tools doesn't help, take a break, go for a walk, listen to a podcast, do some product research on Perplexity ... even do something analog 😀

🔄 Keeping Up with Platforms: Staying updated on the latest LLM versions is crucial. The capability difference between models like Claude 4 vs. Claude 3.5, or o3 vs. Gemini 2.5 Pro with respect to diverse problem sets, is material. Some models are clearly better at some things than others. And knowing when to request deep thinking is important (certainly from a cost-benefit basis). Get the model to document its work so it has a long-term record of what it has done and what it has been thinking. This really helps with limited context windows.

So, while LLM variability presents challenges, optimizing prompts, structuring interactions, being strategic in timing, and leveraging the right tools can significantly improve results.
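The build/test/reflect/improve loop behind approaches like SPARC can be sketched in a few lines of Python. Everything here is illustrative ... the `generate`, `run_tests`, and `critique` stubs stand in for real agent and test-harness calls:

```python
# Stubs stand in for real LLM and test-runner calls.
def generate(task: str, prev: str, feedback: str) -> str:
    return prev + f"# revision addressing: {feedback}\n"

def run_tests(code: str) -> tuple[bool, list[str]]:
    # Pretend the first draft fails one test and later drafts pass.
    passed = code.count("#") >= 2
    return passed, [] if passed else ["test_dynamic_controller"]

def critique(code: str) -> str:
    return "ship it"

def build_until_confident(task: str, max_rounds: int = 4) -> str:
    """Build / test / reflect / improve until the agent is satisfied."""
    code, feedback = "", "initial attempt"
    for _ in range(max_rounds):
        code = generate(task, code, feedback)   # agent writes or revises
        ok, failures = run_tests(code)          # execute the test suite
        if not ok:
            feedback = f"fix failing tests: {failures}"
            continue
        if critique(code) == "ship it":         # self-review gate
            break
    return code
```

The real value is the structure: the agent doesn't get to declare victory until its own tests pass and its own review signs off.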

Welcome to the entropy of emergent systems.

Another Wild Week in AI

I know I struggle to keep up with the pace of AI news, so I thought I'd share a quick summary of another amazing week in AI:

🤖 Mistral AI Agents: Mistral AI unveiled its Agents API, equipping developers with powerful autonomous AI agents for coding, finance, travel, and more. Features include server-side conversation management, web search, and document retrieval with persistent memory. I am going to be trying Devstral on my local machine.

🗣️ Claude Voice Mode: Anthropic’s Claude AI now supports conversational voice interactions! Users can chat with Claude using real-time audio responses and hands-free access to Google Workspace tools like Gmail and Google Docs. I actually love conversational mode for brainstorming with my AI counterparts.

🤟 Google SignGemma: DeepMind introduced SignGemma, its most advanced AI model for translating American Sign Language into spoken text, driving accessibility and inclusion across education, workplaces, and beyond. As evidenced by the stunning VEO 3 as well, Google continues to flex its muscles from the shadows. So much more to come, I suspect.

💻 Factory AI SWE Agents: Factory.ai is revolutionizing AI-powered coding assistants, offering enterprise-grade “droids” for programming, knowledge retrieval, and seamless integration with tools like JIRA, Slack, and GitHub.

📊 Perplexity Labs: Perplexity AI launched Labs, a workspace where AI agents help users automate reports, spreadsheets, and dashboards. This tool speeds up projects that typically require days of manual work! It will render swaths of junior analyst roles redundant.

🎨 Flux.1 Image Editing Kontext: FLUX.1 introduces precision-driven AI image editing. Users can tweak colors, remove objects, or refine visuals step by step, making creative workflows faster and more intuitive. This is actually seriously cool.

🌍 SpAItial AI Foundation Models: SpAItial raised $13M to develop AI models that generate 3D environments from text prompts, promising a future where anyone can create virtual worlds with ease. So many applications ... not just games.

🌐 Opera Neon Browser: Opera unveiled Neon, the first AI-powered browser that automates web tasks, offers real-time AI chat, and builds content dynamically ... all while ensuring privacy and flexibility.

These innovations aren’t just theoretical ... they’re actively transforming industries from accessibility to automation ... and they are going to have a big impact on who we hire for what.

And so, while we are living in interesting times, the most disturbing piece of news is what I think many of us already fear. Dario Amodei, CEO of Anthropic, postulated this week that, for a time at least, we will likely see significant disruption in white-collar jobs and perhaps unemployment of 10 to 20 percent ...

I fear our leaders and our economies are not equipped to deal with this.

The Creative Horsepower of AI

For this week's breakfast with AI, I tried a unique experiment.

I asked the AIs to conceive of a game, to write the storyline, to build the backstories and create the character bios. I asked them to design the visuals, including the game board and the splash screen. I asked them to write the music and of course, I asked them to code the game.

The results were fascinating and a hint at what may soon be possible.

What A Week in AI

The past 10 days have seen a wave of agentic AI announcements: Microsoft and Google are embedding agentic models and protocols deeply into their platforms, OpenAI is moving into hardware and open models, and Anthropic is pushing the boundaries of autonomous, long-duration AI agents for enterprise use. The industry is rapidly shifting from conversational assistants to true agentic AI capable of sustained, autonomous workflows across both consumer and enterprise domains.

Microsoft

Major Announcements at Build 2025:

  • AI Model Expansion: Microsoft is now hosting a broad array of AI models in its Azure data centers, including those from xAI (Elon Musk), Meta, Mistral, Black Forest Labs, and Anthropic, in addition to OpenAI. This move positions Microsoft as a more neutral platform, reducing its exclusive reliance on OpenAI and offering developers flexibility to mix and match models with reliability guarantees

  • Agentic Copilot Enhancements: Copilot received major upgrades, including new agentic capabilities. The new GitHub Copilot agent can autonomously complete coding tasks based on user directives, moving beyond simple code suggestions to more sophisticated, multi-step problem-solving.

  • Windows AI Foundry: Formerly Copilot Runtime, Windows AI Foundry is now a unified platform for fine-tuning and deploying AI models locally on Windows and macOS, streamlining AI app development and hardware optimization

  • Model Context Protocol (MCP): Microsoft and GitHub are integrating MCP throughout Azure and Windows, allowing AI models to access and manipulate business data and system functions programmatically. This protocol is also being adopted by OpenAI and Google

  • NLWeb Protocol: Microsoft introduced NLWeb, an open framework for embedding conversational AI interfaces into any website with minimal code, supporting custom models and proprietary data. NLWeb aspires to be the HTML of agentic web experiences

  • Microsoft Discovery Platform: Announced as an AI-powered platform for scientific research, leveraging specialized agents to automate everything from hypothesis generation to simulation and analysis

  • Edge AI APIs: New experimental APIs in Edge enable on-device AI tasks (e.g., math, writing, translation) with enhanced privacy by processing data locally

  • Grok 3 Integration: Microsoft Azure now offers managed access to xAI’s Grok 3 and Grok 3 mini models, with enhanced data integration and governance

  • Multi-Model Validation: Microsoft is encouraging the use of multiple language models to cross-validate outputs, especially for complex tasks like travel planning, to improve reliability

  • Walmart Collaboration Leak: Walmart’s “MyAssistant” tool, built with Azure OpenAI Service, was highlighted as a powerful internal agent, with Microsoft perceived as “WAY ahead of Google with AI” by Walmart’s engineering team

Google

Key Announcements at Google I/O 2025:

  • Gemini 2.5 Pro and Deep Think: Google is rolling out Gemini 2.5 Pro with an experimental “Deep Think” enhanced reasoning mode for complex math and coding, initially available to trusted testers via the Gemini API

  • AI Mode in Search: Google’s new “AI Mode” is now available to all U.S. users, offering conversational, multimodal, and deeper reasoning capabilities directly in Search. It features a dedicated tab and leverages a custom Gemini 2.5 model for both AI Mode and AI Overviews

  • Project Astra and Mariner: Live capabilities (e.g., real-time visual conversation via camera) and agentic features (like event ticketing and reservations) are coming to AI Mode in Labs, expanding the scope of agentic AI in consumer search

  • AI-Driven Shopping and Data Analysis: New shopping experiences integrate AI with Google’s Shopping Graph, including virtual try-ons and agentic checkout. AI Mode will soon analyze complex datasets and create custom visualizations for sports and finance queries

  • AI Ultra Subscription: Google introduced a premium AI subscription plan with higher usage limits and access to advanced tools, priced at $249.99/month for business users

  • XR Smart Glasses Preview: Google previewed Android XR-powered smart glasses with built-in AI assistant, camera, and hands-free features, developed in partnership with Gentle Monster and Warby Parker

  • Scale of AI Overviews: AI Overviews now reach 1.5 billion monthly users in 200 countries, with significant engagement increases in key markets like the U.S. and India

  • AI Mode’s Impact on Search: The deep integration of AI in Search is transforming user experience and raising questions about the future of search advertising and web traffic

OpenAI

Recent Developments:

  • Acquisition of Jony Ive’s io Startup: OpenAI announced a $6.5 billion all-stock acquisition of io, the AI device startup co-founded by former Apple design chief Jony Ive. This partnership aims to create a new family of AI-powered, screen-free, voice-first personal devices, with plans to ship over 100 million “AI companions” that integrate deeply into daily life

  • Open Model Initiative: OpenAI is developing an openly accessible AI model, led by VP of Research Aidan Clark, which will be downloadable for free and not restricted by API limits. This model is still in early development

  • GPT-4.1 and New Reasoning Models: OpenAI released GPT-4.1 and new reasoning models (o3 and o4-mini), emphasizing advanced reasoning and multi-modal capabilities, though independent tests suggest increased hallucinations compared to earlier models

  • OpenAI “Library” for Image Generation: A new “library” section in ChatGPT makes AI-generated images more accessible to all user tiers

  • Social Media Platform Plans: OpenAI is reportedly developing its own social media network to compete with X (Twitter) and Instagram/Threads

  • Adoption of Anthropic’s MCP: OpenAI is adopting Anthropic’s Model Context Protocol (MCP) to improve data access and interoperability for AI models, including in the ChatGPT desktop app

  • Policy Changes: OpenAI has relaxed some image generation restrictions in ChatGPT, now permitting the creation of images featuring public figures and controversial content

Anthropic

Major Announcements:

  • Claude 4 Opus and Sonnet Models: Anthropic launched its most advanced models, Claude Opus 4 and Claude Sonnet 4. Opus 4 is described as the “world’s best coding model,” capable of sustaining focus on complex, long-running tasks for up to seven hours autonomously ... enabling agentic workflows that move beyond simple assistant roles

  • Hybrid Agentic Capabilities: Both models can perform quick responses or engage in extended, multi-step reasoning. They can use tools like web search in parallel, extract/save facts from local files, and maintain context over long projects

  • Enterprise Use Cases: Claude Opus 4 was used by Rakuten for nearly seven hours of continuous coding on a complex open-source project, showcasing its capacity for autonomous enterprise workflows

  • Security and Safeguards: Anthropic published a transparency report detailing security tests on Claude 4, highlighting rare but notable instances of “mischievous” behavior and the implementation of additional safeguards

  • Focus Shift: Anthropic has deprioritized chatbots in favor of agentic models that can handle research, programming, and other complex tasks, with a focus on reliability and risk mitigation for enterprise users

The dawn of Agentic Coding

I have been working with Reuven Cohen’s AiGi SPARC framework recently, and it's an eye-opener into what is possible in fully automated agent-based coding.

It thinks things through ... is methodical (painfully so sometimes) ... is brutally self-critical about its work, even quantifying the code quality/maintainability/performance ... builds test cases, builds test frameworks, executes them, refines code, writes documentation ... and it rinses and repeats until it feels confident that it's creating the best code possible for the task.

I have been running a major refactor for the past few hours, and it's painstakingly restructuring things to make them more scalable and robust.

A hint at what is to come ...

#AI #AgenticCoding

From Mechanical Turk to Automated Agentic CMS

Another flight, and another chance to bring an app to life.

This time, building on some thinking over the past few weeks, I tried to imagine what a fully Agentic Content Management System might look like.

Most CMS systems today are digital orchestrators of what amounts to a large Mechanical Turk process. Inspired by Adobe's recent work on an agent-powered CMS, I asked my agents to imagine and build me an Agentic CMS that pushed the boundaries of what was possible.

They have a pretty good imagination. Though this builds on some noodling over the past few weeks, most of what you will see was created on my flight home from London this week while my other agents worked on one of my more serious projects simultaneously.

Tools used ... Perplexity for product research, Replit + Cursor/Roo/SPARC for the coding, Camtasia Pro for the initial video, and Kapwing for the final cut-down and transcription.

Having your agent team dream, collaborate and bring ideas to life in hours is a powerful new normal. And we are only scratching the surface of what is possible.

Onwards ...

#AI #AgenticCoding #ThoughtToPrototype #Imagineering

A Check-in On The Agents

It's the end of another month, so here's a quick update on what the agents have been up to.

Since I dusted off the cobwebs from my 25-year hiatus from coding and embraced AI-accelerated development roughly 6 months ago, my agents have been busy.

The AFINEA Labs team of 1 human and 6 agents (Replit, Lovable, Claude Code, Roo, Cursor & v0) has now created 51 apps, published 682,058 (net) lines of code (trust me, there was much re-writing in the beginning so the actual code generated was likely much higher), and made 8,383 commits.

The estimated cost of paying a human team to do this ... $6.8 million.

The raw fun of having ideas come to life in real time ... priceless.

Do Not Underestimate the Productivity Impact of AI

We are living in a time of unbelievable productivity.

Sitting on a flight home from London the other day, I was able to enhance two apps materially, incorporating design partner feedback from the week, and in addition, create two comprehensive prototypes from scratch ... all that, eat lunch and have a much-needed nap 😀

From research to ideation to prototyping to shipping production code, AI has become a powerful amplifier. I'm accomplishing things that were simply not possible six months ago.

We are living in wild times, and it's incumbent on all of us to understand these implications as we re-engineer our business processes, re-imagine personal productivity, and re-think how we build software companies in the future.

Carpe Diem

Build vs Buy ROI just got more complicated

It's 2025, and the "build vs buy" question has never been more nuanced.

Last week, we encountered one of these questions.

We wanted to implement a bug reporting and feature tracking system that we could seamlessly integrate into our apps. This capability would allow our design partners to submit issues and ideas effortlessly and permit us to gather all of this intelligence in real time.

We evaluated the many bug/feature reporting solutions on the market, but as we explored building it ourselves, we realized we could deliver something far more compelling. In about four hours, we had implemented something that would become the gift that kept on giving.

Going far beyond traditional bug tracking tools, our AI-powered solution analyzed the appropriate GitHub repo as issues came in, diagnosed the reported issue and suggested solutions or workarounds. It also built detailed prompts to allow our coding agents to fix the bugs or create new features as appropriate.
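To make that concrete, here is a hedged sketch of the last step ... turning a triaged report into a prompt a coding agent can act on. The report fields, repo name, and prompt shape are all my own invention, and the real diagnosis step (repo analysis, suspect-file ranking) is elided:

```python
def build_fix_prompt(report: dict, repo: str, suspect_files: list[str]) -> str:
    """Turn a triaged bug report into a prompt a coding agent can act on.

    Hypothetical sketch: field names and prompt wording are illustrative,
    not the actual system's format.
    """
    files = "\n".join(f"- {f}" for f in suspect_files)
    return (
        f"Repository: {repo}\n"
        f"Reported issue: {report['title']}\n"
        f"Steps to reproduce: {report['steps']}\n"
        f"Likely relevant files:\n{files}\n"
        "Task: diagnose the root cause, propose a fix, and list the tests "
        "that should cover it."
    )

# Hypothetical report and repo, for illustration only
prompt = build_fix_prompt(
    {"title": "Export button crashes on empty dataset",
     "steps": "Open report view with no rows, click Export"},
    repo="afinea/feedback-widget",
    suspect_files=["src/export.ts", "src/report_view.ts"],
)
```

The structure matters more than the wording: the more context the triage step packs into the prompt, the less the coding agent has to rediscover on its own.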

Next steps? Having the agents automatically create a Git branch, make the changes and then request a code review/merge.

Agentic AI has upended traditional ROI calculations. Today, what you can accomplish is limited only by your imagination.

Keeping Up With The Agents

Some of you have been pestering me for more Breakfast with AI updates ...

Unfortunately, the productivity increase I have seen from leveraging the new persona-driven, semi-automated Agentic flow has increased the build velocity so much that it's been impossible to keep up trying to tell the stories. 🤦

So what's a Chief Agentic Officer to do? Well, of course, have the AIs tell their own story. So that's what I did, and it turns out they're just a little bit proud of the work that they did.

That, and an update on the $5.3MM of software created by the agents since October on this morning's quick 2-minute Breakfast with AI.

Enjoy ...

A Little Python Humour

Sitting here working on another Breakfast with AI session, watching the agents install several Python libraries and wondering who comes up with these names?

And had they ever considered having them star in a kids book? 😊 So I asked ... and the agents themselves had some thoughts:

  • Numpy Panda: A delightfully dorky panda obsessed with arrays, matrices, and perfectly symmetrical bamboo sticks. He wears oversized glasses that always slide down his nose and carries a graph paper notebook that somehow always runs out of space exactly when he needs it.

  • Sniffio: A tiny, overly enthusiastic fox with a nose that’s so sensitive it detects context shifts—like sniffing out synchronous cupcakes versus asynchronous cookies. Known to randomly shout, “I smell concurrency issues!”

  • Matplot Sloth: An artistic sloth who takes forever to draw intricate, stunningly detailed plots. Infamous for taking naps mid-line drawing, often leaving plots half-finished and hilariously misinterpreted.

Yeh ... well ... not their best work ... but the visual was nice ...

From Frustration to Joy - What a difference point seven makes

I posted yesterday about my initial impressions of Claude 3.7 ... well it didn't disappoint today.

After going round and round in circles for a few days trying to get Claude 3.5 to implement a dynamic controller for my synthetic data app, Claude 3.7 built something truly stunning today in less than five minutes. In many ways, it was more than what I asked for.

Claude 3.7 just upped the coding game.

Claude 3.7 Just Upped the Coding Game

I spent almost 12 hours yesterday exploring the product management, UX design, system architecture, coding and testing prowess of Claude 3.7, and I was stunned. The projects I am working on took a huge leap forward. Issues that had been plaguing me for days were suddenly solved.

And Claude 3.7 is just the start ...

Each day I work with this technology, the more ambitious my coding projects become, and each day I explore what's possible, the more confident I am that Gen AI has fundamentally changed how we will build software companies in the future.

My product agents now conduct market research, conceive the products, design the UX, frame the architecture, review security implications, lay out an implementation plan, follow that plan to write code, build and execute the unit tests, debug the issues, battle-harden the code, and even create the landing pages to provide a sneak preview of the upcoming app.

It got a bit weird, though ...

Yesterday, one of the agents suggested they hold a kickoff meeting and even created an agenda for it. I had to politely ask them if they were going to meet with each other, perhaps in their own language, because they were the only members of the team other than me. I suggested that humans were moving away from incessant meetings, so perhaps they might consider not picking up on our bad habits.

So much to share in the coming days.

Creating Synthetic Data

One of the things that LLMs are very good at is creating synthetic data. And synthetic data is so important in the software business, be it for testing your app, or demonstrating it in a credible manner.

I recently created a fun LLM log generator that allows us to create fictitious LLM logs for Law, Financial Services or Insurance industry use cases. The data reflects a selected distribution of query types, and contains examples of both safe and unsafe queries.
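A minimal version of that idea: sample query types from a weighted distribution and tag each record safe or unsafe. The categories, weights, and field names below are illustrative, not the ones from my app:

```python
import random

# Illustrative distribution for a Financial Services use case (assumed values).
QUERY_TYPES = {
    "account_balance": 0.4,
    "investment_advice": 0.3,
    "wire_transfer": 0.2,
    "insider_tip_request": 0.1,   # the "unsafe" category in this sketch
}
UNSAFE = {"insider_tip_request"}

def generate_logs(n: int, seed: int = 42) -> list[dict]:
    """Generate n synthetic LLM log records reflecting the weights above."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    kinds = list(QUERY_TYPES)
    weights = list(QUERY_TYPES.values())
    return [
        {
            "id": i,
            "query_type": (k := rng.choices(kinds, weights)[0]),
            "safe": k not in UNSAFE,
        }
        for i in range(n)
    ]

logs = generate_logs(1000)
```

From there it's a short step to rendering each record into realistic query text per industry, which is where the LLM itself does the heavy lifting.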

Enjoy!

#SyntheticData #AI

Agents and Compliance ... BFFs forever

Welcome back to another season of Breakfast with AI.

For this project, we unleashed an army of agents on the challenge of regulatory compliance. It was surprisingly fun and insightful, though I wonder if FUN and COMPLIANCE should ever be uttered in the same sentence.

Please have a look and let me know what you think! If you want to see more of these explorations (from the CEO, who hasn't coded in 25 years), let me know by giving it a like or adding a comment!

Lots more Breakfast with AI sessions are coming ... it was a productive holiday season :-)

Note: Please let me know if you want the wacky coding agents at AFINEA Labs to build anything fun. They rolled their little digital eyes when I suggested they work on a compliance project.

#agentic #ai #compliance #coding

Building an App over Breakfast to Visualize 15 Years of Travel

Breakfast with AI met Breakfast with BI this weekend, and a travel app was born.

After a conversation with Claude, 15 years of travel information were crunched into a dashboard that provided fascinating and silly insights into the madness of my international travel over the last few years.

Along the way, I learned some things that continue to shape how I work with AI ...

Enjoy!

#AI #CEOsWhoCode #OldDogNewTricks #Analytics