By week ten, the team lead was no longer reviewing code line by line. She was reviewing intent — reading AGENTS.md files, checking that the scaffolding was right, and validating that the output matched the specification. Her engineers had stopped thinking of themselves as code writers. They had become environment designers.
That shift — in role, in mindset, and in daily workflow — happened across a 20-person team in ten weeks. Not through a mandated tooling rollout, but through a structured methodology that changed what engineers were optimising for.
This is the story of what that transition looked like, what made it work, and what anyone attempting the same needs to understand before they start.
The starting point
The team in question builds and maintains an enterprise operations platform for a UK-based SaaS company. At the start of the engagement, the team was experienced and well-structured — backend, frontend, mobile, QA, and DevOps disciplines all represented. What they lacked was a systematic approach to AI tooling. Individual engineers were using AI assistants in ad-hoc ways. There was no shared standard, no measurement, no governance.
The goal was not to introduce AI tools. It was to transform how the entire organisation uses them — consistently, measurably, and in a way that compounds over time.
The approach: Harness Engineering
The methodology applied here is what I call Harness Engineering — a discipline distinct from simply installing an AI coding assistant and hoping for the best.
The core shift is this: engineers stop thinking of themselves as code writers and start thinking of themselves as environment designers. The question changes from “how do I write this?” to “how do I design a system where an AI agent can write this reliably, and I can validate that it did so correctly?”
In practice, this means building three things:
Intent specification — Structured documents that tell AI agents how to operate in each codebase. Every application gets an AGENTS.md file (conventions, patterns, what the agent should and should not do), a skills.md file (reusable capabilities the agent can draw on), and a guardrails.md file (explicit constraints and failure modes to avoid). This is not documentation for humans — it is context engineering for agents.
Feedback loop design — CI/CD pipelines, automated tests, and verification gates that let agents self-correct before a human reviews their output. The faster and more reliable the feedback loop, the more autonomously an agent can operate without producing cascading errors.
Review as quality gate — Engineers do not rubber-stamp AI output. They evaluate it. A healthy acceptance rate reflects active judgment, not passive approval. Suggestions that do not meet the standard are rejected. This discipline is what keeps quality high as AI contribution scales up.
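To make the first of those three concrete: an AGENTS.md file is short, prescriptive, and written for the agent rather than for onboarding humans. The sketch below is purely illustrative — the service name, commands, and paths are invented for this example, not taken from the team's actual files:

```markdown
# AGENTS.md — payments-service (illustrative example)

## Conventions
- TypeScript strict mode; no `any` without a justifying comment.
- All database access goes through the repository layer in `src/repos/`.

## Do
- Run `npm test` and `npm run lint` before proposing any change.
- Add or update tests alongside every behaviour change.

## Do not
- Modify generated files under `src/gen/`.
- Introduce new third-party dependencies without flagging them for review.
```

The point is density of instruction, not completeness: each line removes a class of decisions the agent would otherwise guess at.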
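The second element, feedback loop design, can be sketched as a simple control loop: generate, verify against automated checks, and feed failures back before any human sees the output. This is a minimal illustration of the pattern, not the team's tooling — the `generate` and `verify` callables stand in for whatever agent API and CI harness an organisation actually uses:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    passed: bool
    feedback: str  # e.g. compiler errors, failing test names, lint output


def run_with_feedback(
    generate: Callable[[str], str],        # agent call: prompt -> proposed patch
    verify: Callable[[str], CheckResult],  # run tests/lint/build on the patch
    task: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Let an agent self-correct against automated checks before human review."""
    prompt = task
    for _ in range(max_attempts):
        patch = generate(prompt)
        result = verify(patch)
        if result.passed:
            return patch  # only verified output reaches a human reviewer
        # Feed the concrete failure signal back into the next attempt.
        prompt = f"{task}\n\nPrevious attempt failed checks:\n{result.feedback}"
    return None  # escalate to a human after repeated failures
```

The faster and cheaper `verify` is, the more attempts the loop can afford — which is why CI speed is a prerequisite for agent autonomy, not an optimisation.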
What was deployed
The full team was moved onto Cursor as the primary IDE from day one. No pilot group, no phased rollout by seniority — every engineer, every discipline, at the same time. This decision mattered: partial adoption creates a two-tier team and slows the cultural shift.
For reasoning-heavy tasks — architecture reviews, complex problem-solving, documentation — Claude and Gemini Pro were available to selected engineers. Senior engineers began a pilot of Claude Code for agentic, autonomous task completion: multi-step workflows where the agent operates with significant independence, rather than offering line-level completions.
Regular show-and-tell sessions were established from the first week. Engineers shared what was working, what prompting approaches produced better results, where the tools fell short. This peer-to-peer knowledge transfer accelerated adoption more than any formal training would have.
What changed
Within the first month, AI-assisted workflows were visible in the data. By the end of the second month, the numbers had compounded significantly.
The headline outcome: the majority of committed code across the team was AI-generated — written by an agent, reviewed and accepted by an engineer, and merged through the same CI/CD pipeline and QA process as any other code. This was not an experiment. It was the standard way the team worked.
Three things stood out beyond the raw contribution rate:
QA output increased. This is counterintuitive to those who assume AI-generated code is lower quality. The opposite happened. AI was used to generate test cases alongside production code — in Robot Framework and Python — which meant more automated test coverage, not less. The engineering culture treated AI as a way to do more of the right things, not a shortcut around them.
The acceptance rate reflected active review. Roughly one in three AI suggestions was rejected. That is healthy. It means engineers were evaluating output, not accepting it uncritically. The acceptance rate is a proxy for engineering judgment being applied — and the data showed it was.
Budget discipline held. AI tooling was treated as a line item to be managed, not a cost to be minimised. Spend stayed under the approved ceiling throughout the engagement. Responsible adoption and ambitious adoption are not in tension.
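The acceptance-rate point lends itself to a simple health check: a rate near 100% suggests rubber-stamping, while a very low rate suggests the tooling or context needs work. The thresholds below are illustrative assumptions, not figures from this engagement:

```python
def acceptance_health(accepted: int, rejected: int,
                      low: float = 0.5, high: float = 0.9) -> str:
    """Classify an AI-suggestion acceptance rate as a proxy for active review.

    The `low`/`high` thresholds are illustrative defaults, not measured values.
    """
    total = accepted + rejected
    if total == 0:
        return "no data"
    rate = accepted / total
    if rate > high:
        return "suspiciously high: check for rubber-stamping"
    if rate < low:
        return "low: tooling or agent context may need work"
    return "healthy: active review is happening"
```

A team rejecting roughly one in three suggestions, as here, lands squarely in the healthy band.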
What the research says
At around the same time as this engagement, Laura Tacho — CTO at DX and author of the largest independent study of AI and developer productivity to date — presented findings from 121,000 developers across 450+ companies at The Pragmatic Summit in San Francisco.
Her research cuts through the noise on AI and productivity. The finding that stands out most:
Most organisations plateau at around 10% productivity gain from AI tooling. The reason is not the tools — it is how they are adopted. Companies that use AI only for individual tasks, without embedding it into their CI pipelines, documentation standards, and team workflows, hit a ceiling quickly.
The organisations that break through the plateau share three characteristics: they set clear goals and measure results rigorously; they treat Developer Experience as a strategic priority; and they have fast CI pipelines, clear documentation, and well-defined service boundaries before they introduce AI at scale.
This team had all three. That is not coincidence — it is a prerequisite.
Tacho’s research also found that AI produces dramatically different outcomes depending on the organisation. Some companies saw production incidents double after AI adoption. Others saw a 50% reduction. The difference is not the tool. It is the environment around it. AI amplifies what already exists — both the good and the bad.
The Harness Engineering parallel
In February 2026, OpenAI published a detailed account of how a small team of engineers built a production system exceeding one million lines of code in five months — with zero manually written code. Every line of application logic, tests, CI configuration, and documentation was generated by AI agents.
The approach they described maps precisely to what was applied in this engagement: engineers as architects of agent environments, repositories as the single source of truth, automated feedback loops as the mechanism for quality control.
The terminology is different. The principles are identical.
This is not a coincidence. It reflects where serious software engineering is heading. The teams that will define engineering productivity over the next five years are not the ones that adopted AI tools earliest — they are the ones that built the right environments for those tools to operate in.
What comes next
The engagement established the foundation. The roadmap from here is about deepening and formalising what already works:
Every new application built, maintained, and scaled entirely using AI agents — not as an experiment, but as the default.
Agent governance files (AGENTS.md, skills.md, guardrails.md) deployed across every application in the codebase, not just new ones. This is the work of encoding institutional knowledge in a form that agents can act on.
Senior engineers moving from AI-assisted development to fully agentic workflows — where Claude Code or equivalent operates autonomously on defined tasks, with engineers reviewing outputs rather than directing every step.
All technical documentation produced in AI-native format from the point of creation — structured, versioned, agent-legible.
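Rolling the governance files out across every application is also easy to enforce mechanically. A sketch of a CI check is below — it assumes a repository layout where each top-level directory is one application, which may not match any given codebase:

```python
from pathlib import Path

# The three governance files the methodology calls for.
REQUIRED = ("AGENTS.md", "skills.md", "guardrails.md")


def missing_governance_files(repo_root: str) -> dict:
    """Return {app_dir: [missing files]} for every app under repo_root.

    Assumes each top-level directory is one application (an illustrative
    layout, not a universal one).
    """
    gaps = {}
    for app in sorted(Path(repo_root).iterdir()):
        if not app.is_dir():
            continue
        missing = [f for f in REQUIRED if not (app / f).exists()]
        if missing:
            gaps[app.name] = missing
    return gaps
```

Run as a CI step, a non-empty result fails the build — turning "every application has governance files" from a policy into a gate.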
What this engagement demonstrates
AI-native engineering at team scale is achievable. It does not require a greenfield codebase, a specialised team, or a multi-year transformation programme. It requires the right foundations, the right governance, and the discipline to measure outcomes honestly.
The hard part is not the tools. It is the cultural shift — from engineers who write code to engineers who design systems that generate code. That shift does not happen automatically when you install Cursor. It happens when the organisation commits to it, measures it, and builds the environment that makes it possible.
If you are leading an engineering team and considering a similar transformation, or if you want to understand whether your current environment is ready for AI-native development, let’s talk.
The Harness Engineering section of the Handbook covers the methodology in more detail.
