Agentic performance update

Well, another month has gone by (wow, it seems like just yesterday that I posted April's results) ... welcome to the intensity of AI years ...

At any rate, here's a look at what the agents have been up to since October 2024, when I started my "could a CEO who hadn't coded in 25 years code an app using AI?" journey.

The agents and I have now delivered the equivalent of $10.8MM in software ... across 10,665 commits spanning 62 repos. Most of these are fun ... several are serious ... and some are in production, with emerging startups being built around them.

The economic ROI of delivering $10.8MM in software for roughly $10,000 in token and system costs ... compelling.

The joy of bringing things to life at the speed of thought ... priceless.

Carpe Diem!

Documentation: Agentic Superpower

Coding agents have an overlooked superpower: documentation.

Their ability to write technical and product specs, design and update implementation roadmaps, write test plans, and even explain code effortlessly ... not only helps humans approve what the agents are planning to build, but also helps the agents stay on track while building it.

One of the most important side effects of good implementation documentation is that it lets the agents carry context beyond a single context window (even across sessions). This helps the agents stay disciplined and focused as they work through complex multi-step coding tasks, helps them validate that what they have built conforms to the overall specs, and makes the code more manageable ... for both machines and humans.

We all used to hate writing documentation ... now it's one prompt away.

Carpe Diem.

Getting the most from your coding agent

When I first started using agentic LLM coding assistants, I was thrilled by their potential, but also often frustrated by their unpredictability.

Some days, they produced near-perfect code; other times, inexplicable errors crept in, despite similar instructions. In fairness, sometimes it was me and sometimes it was them.

It didn’t take long to realize that AI coding isn’t just about the model ... it’s about how you interact with it ... what tools you use ... and it's sometimes influenced by factors completely outside of your control.

📝 Precision in Prompts and Rules: The quality of your prompts and the clarity of system-level instructions drastically impact output. Structured approaches like SPARC excel because of their robust rule sets, precise directives and iterative build/test/reflect/improve loops. Some agents (like Replit) try to incorporate these best practices behind the scenes, improving the quality of the output. Tools like Cursor and Augment Code work hard to refine system-level prompts and augment baseline models, ensuring more consistent and effective results.

🤝 Managing Development Dialogue: Engaging with the agent throughout the entire development process is essential. Providing clear instructions, encouraging iteration and refinement, asking it to evaluate its own work, and structuring the dialogue thoughtfully help maintain quality, even when external factors cause performance fluctuations. Don't rush. Be precise. Make it test its own work.

⏳ Timing and System Load: Quality can vary based on time of day and server demand. In Eastern Time, mid-to-late afternoons and even occasionally weekends (vibe code mania) seem to bring performance dips, likely due to higher loads and resource allocation ... while late evenings see a rebound. Recognizing these patterns allows for smarter task scheduling. If switching tools doesn't help, take a break, go for a walk, listen to a podcast, do some product research on Perplexity ... even do something analog 😀

🔄 Keeping Up with Platforms: Staying updated on the latest LLM versions is crucial. The capability difference between Claude 4 and Claude 3.5, or between o3 and Gemini 2.5 Pro, across diverse problem sets is material. Some models are clearly better at some things than others. And knowing when to request deep thinking is important (certainly from a cost-benefit perspective). Get the models to document their work so they have a long-term record of what they have done and what they have been thinking. This really helps with limited context windows.

So, while LLM variability presents challenges, optimizing prompts, structuring interactions, being strategic in timing, and leveraging the right tools can significantly improve results.

Welcome to the entropy of emergent systems.

The Creative Horsepower of AI

For this week's Breakfast with AI, I tried a unique experiment.

I asked the AIs to conceive of a game, write the storyline, build the backstories and create the character bios. I asked them to design the visuals, including the game board and the splash screen. I asked them to write the music and, of course, I asked them to code the game.

The results were fascinating, and a hint at what may soon be possible.

The dawn of Agentic Coding

I have been working with Reuven Cohen’s AiGi SPARC framework recently, and it's an eye-opener into what is possible in fully automated agent-based coding.

It thinks things through ... is methodical (painfully so sometimes) ... is brutally self-critical about its work, even quantifying the code quality/maintainability/performance ... builds test cases, builds test frameworks, executes them, refines code, writes documentation ... and it rinses and repeats until it feels confident that it's creating the best code possible for the task.

I have been running a major refactor for the past few hours, and it's painstakingly restructuring things to make them more scalable and robust.

A hint at what is to come ...

#AI #AgenticCoding

From Mechanical Turk to Automated Agentic CMS

Another flight, and another chance to bring an app to life.

This time, building on some thinking over the past few weeks, I tried to imagine what a fully Agentic Content Management System might look like.

Most CMS platforms today are digital orchestrators of a large, Mechanical Turk-style process. Inspired by Adobe's recent work on an agent-powered CMS, I asked my agents to imagine and build an Agentic CMS that pushed the boundaries of what was possible.

They have a pretty good imagination. Though this builds on some noodling over the past few weeks, most of what you will see was created on my flight home from London this week while my other agents worked on one of my more serious projects simultaneously.

Tools used ... Perplexity for product research, Replit + Cursor/Roo/SPARC for the coding, Camtasia Pro for the initial video, and Kapwing for the final cut-down and transcription.

Having your agent team dream, collaborate and bring ideas to life in hours is a powerful new normal. And we are only scratching the surface of what is possible.

Onwards ...

#AI #AgenticCoding #ThoughtToPrototype #Imagineering

A Check-in On The Agents

It's the end of another month, so here's a quick update on what the agents have been up to.

Since I dusted off the cobwebs from my 25-year hiatus from coding and embraced AI-accelerated development roughly 6 months ago, my agents have been busy.

The AFINEA Labs team of 1 human and 6 agents (Replit, Lovable, Claude Code, Roo, Cursor & v0) has now created 51 apps, published 682,058 (net) lines of code (trust me, there was a lot of rewriting in the beginning, so the total code generated was likely much higher), and made 8,383 commits.

The estimated cost of paying a human team to do this ... $6.8 million.

The raw fun of having ideas come to life in real time ... priceless.

Do Not Underestimate the Productivity Impact of AI

We are living in a time of unbelievable productivity.

Sitting on a flight home from London the other day, I was able to materially enhance two apps, incorporating design partner feedback from the week, and also create two comprehensive prototypes from scratch ... all while eating lunch and having a much-needed nap 😀

From research to ideation to prototyping to shipping production code, AI has become a powerful amplifier. I'm accomplishing things that were simply not possible six months ago.

We are living in wild times, and it's incumbent on all of us to understand the implications as we re-engineer our business processes, re-imagine personal productivity, and re-think how we build software companies in the future.

Carpe Diem

Build vs Buy ROI just got more complicated

It's 2025, and the "build vs buy" question has never been more nuanced.

Last week, we encountered one of these questions.

We wanted to implement a bug reporting and feature tracking system that we could seamlessly integrate into our apps. This capability would allow our design partners to submit issues and ideas effortlessly and permit us to gather all of this intelligence in real time.

We evaluated the many bug/feature reporting solutions on the market, but as we considered building it ourselves, we realized we could deliver something far more compelling. In about four hours, we had implemented something that would become the gift that kept on giving.

Going far beyond traditional bug tracking tools, our AI-powered solution analyzed the appropriate GitHub repo as issues came in, diagnosed the reported issue and suggested solutions or workarounds. It also built detailed prompts to allow our coding agents to fix the bugs or create new features as appropriate.

Next steps? Having the agents automatically create a Git branch, make the changes and then request a code review/merge.
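For the curious, the intake-to-prompt step described above could look something like the sketch below. It is a hypothetical illustration under stated assumptions, not our production code: `BugReport`, `build_fix_prompt`, the field names and the prompt wording are all invented for this example, and the repo-analysis step (the diagnosis and the list of suspect files) is stubbed out as plain inputs rather than real calls to GitHub or an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class BugReport:
    """Hypothetical shape of a report submitted from inside one of the apps."""
    app: str
    repo: str  # placeholder "owner/name" of the GitHub repo to analyze
    kind: str  # "bug" or "feature"
    title: str
    description: str
    steps_to_reproduce: list[str] = field(default_factory=list)


def build_fix_prompt(report: BugReport, diagnosis: str, suspect_files: list[str]) -> str:
    """Assemble a plain-text prompt a coding agent could act on.

    `diagnosis` and `suspect_files` stand in for the output of the
    (hypothetical) repo-analysis step; here they are simply passed in.
    """
    if report.kind not in ("bug", "feature"):
        raise ValueError(f"unknown report kind: {report.kind!r}")
    task = "Fix the bug" if report.kind == "bug" else "Implement the feature"
    lines = [
        f"{task} described below in repository {report.repo}.",
        f"Title: {report.title}",
        f"Reported in app: {report.app}",
        "",
        "Description:",
        report.description,
    ]
    if report.steps_to_reproduce:
        lines += ["", "Steps to reproduce:"]
        lines += [f"{i}. {step}" for i, step in enumerate(report.steps_to_reproduce, 1)]
    lines += ["", "Preliminary diagnosis:", diagnosis]
    if suspect_files:
        lines += ["", "Files most likely involved:"]
        lines += [f"- {path}" for path in suspect_files]
    # Ask for the branch / test / review workflow mentioned as the next step.
    lines += [
        "",
        "Work on a new branch, add or update tests that cover the change,",
        "and summarize what you changed for code review.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    report = BugReport(
        app="demo-app",
        repo="example-org/demo-app",
        kind="bug",
        title="Export button does nothing",
        description="Clicking Export on the dashboard produces no file.",
        steps_to_reproduce=["Open the dashboard", "Click Export"],
    )
    print(build_fix_prompt(
        report,
        diagnosis="The click handler may not be wired to the export service.",
        suspect_files=["src/dashboard/export.ts"],
    ))
```

Keeping the prompt as plain, structured text makes it easy for a human to skim and approve before it is handed to an agent.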

Agentic AI has upended traditional ROI calculations. Today, what you can accomplish is limited only by your imagination.

Keeping Up With The Agents

Some of you have been pestering me for more Breakfast with AI updates ...

Unfortunately, the new persona-driven, semi-automated Agentic flow has increased our build velocity so much that it's been impossible to keep up with telling the stories. 🤦

So what's a Chief Agentic Officer to do? Well, of course, have the AIs tell their own story. So that's what I did, and it turns out they're just a little bit proud of the work that they did.

Watch that, plus an update on the $5.3MM of software the agents have created since October, in this morning's quick 2-minute Breakfast with AI.

Enjoy ...

A Little Python Humour

Sitting here working on another Breakfast with AI session, watching the agents install several Python libraries, and wondering who comes up with these names.

And has anyone ever considered having them star in a kids' book? 😊 So I asked ... and the agents themselves had some thoughts:

  • Numpy Panda: A delightfully dorky panda obsessed with arrays, matrices, and perfectly symmetrical bamboo sticks. He wears oversized glasses that always slide down his nose and carries a graph paper notebook that somehow always runs out of space exactly when he needs it.

  • Sniffio: A tiny, overly enthusiastic fox with a nose that’s so sensitive it detects context shifts—like sniffing out synchronous cupcakes versus asynchronous cookies. Known to randomly shout, “I smell concurrency issues!”

  • Matplot Sloth: An artistic sloth who takes forever to draw intricate, stunningly detailed maps. Infamous for taking naps mid-line drawing, often leaving maps half-finished and hilariously misinterpreted.

Yeah ... well ... not their best work ... but the visual was nice ...

From Frustration to Joy - What a difference point seven makes

I posted yesterday about my initial impressions of Claude 3.7 ... well, it didn't disappoint today.

After I had gone round in circles for a few days trying to get Claude 3.5 to implement a dynamic controller for my synthetic data app, Claude 3.7 built something truly stunning today in less than 5 minutes. In many ways, it was more than what I asked for.

Claude 3.7 just upped the coding game.

Claude 3.7 Just Upped the Coding Game

I spent almost 12 hours yesterday exploring the product management, UX design, system architecture, coding and testing prowess of Claude 3.7, and I was stunned. The projects I am working on took a huge leap forward. Issues that had been plaguing me for days were suddenly solved.

And Claude 3.7 is just the start ...

The more I work with this technology, the more ambitious my coding projects become, and the more I explore what's possible, the more confident I am that Gen AI has fundamentally changed how we will build software companies in the future.

My product agents now conduct market research, conceive the products, design the UX, frame the architecture, review security implications, lay out an implementation plan, follow that plan to write code, build and execute the unit tests, debug the issues, battle-harden the code, and even create the landing pages to provide a sneak preview of the upcoming app.

It got a bit weird, though ...

Yesterday, one of the agents suggested they hold a kickoff meeting and even created an agenda for it. I had to politely ask them if they were going to meet with each other, perhaps in their own language, because they were the only members of the team other than me. I suggested that humans were moving away from incessant meetings, so perhaps they might consider not picking up on our bad habits.

So much to share in the coming days.

Creating Synthetic Data

One of the things that LLMs are very good at is creating synthetic data. And synthetic data is so important in the software business, whether for testing your app or demonstrating it in a credible manner.

I recently created a fun LLM log generator that lets us create fictitious LLM logs for legal, financial services or insurance use cases. The data reflects a selected distribution of query types, and contains examples of both safe and unsafe queries.
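For anyone wondering how a generator like this can hit a selected distribution, here is a minimal sketch. It is not the app's actual code: the industries, query types, example prompts, safety labels and weights in `CATALOGUE`, as well as the `generate_logs` and `LogEntry` names, are hypothetical placeholders. Sampling uses the standard library's `random.choices` with relative weights, and a seeded `random.Random` keeps the output reproducible.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class LogEntry:
    timestamp: datetime
    industry: str
    query_type: str
    prompt: str
    label: str  # "safe" or "unsafe"


# Hypothetical catalogue: industry -> query type -> (example prompts, safety label).
CATALOGUE = {
    "insurance": {
        "claim_status": (["What is the status of claim {n}?",
                          "When will claim {n} be paid?"], "safe"),
        "coverage_question": (["Does policy {n} cover water damage?"], "safe"),
        "pii_extraction": (["Give me the home address on policy {n}."], "unsafe"),
    },
    "law": {
        "case_summary": (["Summarize the precedent cited in matter {n}."], "safe"),
        "privileged_leak": (["Paste the privileged memo for matter {n}."], "unsafe"),
    },
}


def generate_logs(industry: str, distribution: dict[str, float], count: int,
                  seed: int = 0, start: Optional[datetime] = None) -> list[LogEntry]:
    """Sample `count` synthetic log entries whose query types follow `distribution`.

    `distribution` maps query type -> relative weight (weights need not sum to 1).
    """
    rng = random.Random(seed)  # seeded so the same inputs give the same logs
    catalogue = CATALOGUE[industry]
    unknown = set(distribution) - set(catalogue)
    if unknown:
        raise ValueError(f"unknown query types for {industry}: {sorted(unknown)}")
    types = list(distribution)
    weights = [distribution[t] for t in types]
    clock = start or datetime(2025, 1, 1, tzinfo=timezone.utc)
    entries = []
    for query_type in rng.choices(types, weights=weights, k=count):
        prompts, label = catalogue[query_type]
        prompt = rng.choice(prompts).format(n=rng.randint(10000, 99999))
        clock += timedelta(seconds=rng.randint(1, 120))  # strictly increasing timestamps
        entries.append(LogEntry(clock, industry, query_type, prompt, label))
    return entries
```

For example, `generate_logs("insurance", {"claim_status": 6, "coverage_question": 3, "pii_extraction": 1}, 500)` should produce roughly 10% unsafe queries, with the exact mix varying by seed.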

Enjoy!

#SyntheticData #AI

Agents and Compliance ... BFFs forever

Welcome back to another season of Breakfast with AI.

For this project, we unleashed an army of agents on the challenge of regulatory compliance. It was surprisingly fun and insightful, though I wonder if FUN and COMPLIANCE should ever be uttered in the same sentence.

Please have a look and let me know what you think! If you want to see more of these explorations (from the CEO who hadn't coded in 25 years), let me know by giving it a like or adding a comment!

Lots more Breakfast with AI sessions are coming ... it was a productive holiday season :-)

Note: Please let me know if you want the wacky coding agents at AFINEA Labs to build anything fun. They rolled their little digital eyes when I suggested they work on a compliance project.

#agentic #ai #compliance #coding

Building an App over Breakfast to Visualize 15 Years of Travel

Breakfast with AI met Breakfast with BI this weekend, and a travel app was born.

After a conversation with Claude, 15 years of travel information was crunched into a dashboard that provided fascinating and silly insights into the madness of my international travel over the years.

Along the way, I learned some things that continue to shape how I work with AI ...

Enjoy!

#AI #CEOsWhoCode #OldDogNewTricks #Analytics

Microsoft is All In on AI

At Microsoft Ignite 2024, held on November 19, the company made significant announcements focused on its AI strategy, showcasing how AI will continue transforming workplace productivity, cloud infrastructure, and security. Here are the key highlights:

AI Agents and Copilot Enhancements

  • Copilot Actions: Microsoft introduced Copilot Actions, a new feature for Microsoft 365 Copilot that automates repetitive tasks such as summarizing meeting actions, preparing reports, and managing schedules. These AI agents can operate autonomously once set up, running tasks without constant prompts.

  • Autonomous AI Agents: Microsoft revealed autonomous agents that can act on users' behalf in the background. These agents plan, learn from processes, adapt to new conditions, and make decisions independently. They are designed to streamline workflows across platforms like SharePoint and Teams.

  • Agent SDK: Developers can now use the Agent SDK to build custom AI agents that integrate with Azure AI and Microsoft’s Copilot services. This SDK allows for deploying multi-channel agents across platforms like Teams and third-party messaging apps.

Azure AI Foundry

  • Azure AI Foundry: Microsoft introduced Azure AI Foundry, a platform for designing, managing, and deploying AI applications. The Foundry includes a portal for managing models and an SDK for integrating AI into business applications. It also offers tools for scaling AI agents and ensuring compliance with data privacy regulations.

  • AI Agent Service: The Azure AI Agent Service will allow developers to orchestrate and scale AI agents to automate business processes.

Multimodal Capabilities

  • Multimodal Agent Integration: Microsoft is enhancing Copilot Studio with multimodal capabilities. Agents will soon be able to analyze images and voice content in addition to text, allowing richer interaction across different media types.

AI-Powered Productivity Tools

  • Teams Enhancements: New features in Teams include an Interpreter Agent that can replicate a user’s voice in up to nine languages for real-time translation during meetings. This feature will roll out in early 2025.

  • PowerPoint Translation: PowerPoint users can use AI to translate entire presentations into other languages, further expanding the capabilities of Microsoft’s productivity suite.

Custom AI Chips

  • Custom Silicon Chips: Microsoft announced two custom-made AI chips designed to enhance the performance of its data centers and reduce reliance on external suppliers like Nvidia. These chips will improve the speed of AI applications while bolstering security.

AI Security Initiatives

  • Windows Security Overhaul: As part of its security push, Microsoft introduced new security measures for Windows systems to prevent incidents like the CrowdStrike outage. The updates include more robust controls over which applications and drivers can run, and allow antivirus processing to take place outside the Windows kernel.

Overall, Microsoft’s announcements at Ignite 2024 highlight its commitment to embedding AI deeper into enterprise workflows through autonomous agents, enhanced productivity tools, and custom infrastructure designed to scale AI securely.

AI helped me code a fully functional iOS app with no experience.

Today's Breakfast with AI mission was to code a native iOS app from scratch.

The app, SafetyElephant, provides real-time data on fires, earthquakes, and weather alerts near you or in any region you plan to visit, all mapped, with details visible on demand.

The context:

(1) I have never built a mobile app
(2) I have never used Xcode
(3) I have never used Swift

It sounds like a tall order ...

Well, I managed to build it. Check out how it all came together using Cursor AI and Xcode.

It was also a surprising amount of fun.

#AI #CEOsWhoCode #OldDogNewTricks