TL;DR

A May 2026 Google whitepaper by Addy Osmani, Shubham Saboo and Sokratis Kartakis argues that AI-assisted software development is moving from code writing to intent setting. The paper says model choice is only a small part of agent performance, while verification, tools, context and engineering controls determine whether AI code is safe for production.

A new Google whitepaper, The New SDLC With Vibe Coding, argues that the main shift in software engineering is not a new programming language or framework. It is a move from writing code directly to expressing intent and verifying machine-generated output, a change that could reshape how teams budget, test and govern AI-assisted development.

The paper, written by Addy Osmani, Shubham Saboo and Sokratis Kartakis, reports that 85% of professional developers regularly use AI coding agents, 51% use them daily, and about 41% of new code is AI-generated. The authors present those figures as evidence that AI coding tools have moved into routine engineering work.

The central claim is that the model itself accounts for only a small share of agent behavior. The paper describes an agent as a combination of a model and a surrounding harness: prompts, tools, context rules, sandboxes, hooks, sub-agents, observability and CI checks.

The source material attributes several performance gains to changes outside the model, including a Terminal Bench 2.0 result in which an agent reportedly moved from outside the top 30 to the top five after harness changes using the same model. A separate LangChain experiment is cited as improving an agent score by 13.7 points through prompt, tool and middleware changes.

AI Dispatch · Field Notes

Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified

Vibe Coding

Casual prompts · “does it seem to work?” · disposable code · high risk

Structured AI-Assisted

Detailed prompts + constraints · manual testing · features in real codebases

Agentic Engineering

Formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.

The idea worth building your strategy around

Agent = Model + Harness

Model: ~10%

Harness (prompts · tools · context · hooks · sandboxes · observability): ~90%. It is your surface area, not the provider’s.

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.

“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding

Low CapEx · High OpEx

Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering

High CapEx · Low OpEx

Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing — cheap models for the easy work.

85% of devs use AI coding agents (51% daily)

41% of all new code is AI-generated

~90% of agent behavior is the harness, not the model

+19% longer on some tasks (METR): verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

thorstenmeyerai.com

Verification Becomes The Cost Center

The paper matters because it reframes the business risk of AI coding. If generated code is already widespread, the harder problem is no longer whether AI can produce software, but whether teams can prove that output is correct, maintainable and secure enough to ship.

Google’s authors draw a line between casual “vibe coding” and what they call agentic engineering. In their framing, quick prompts and surface-level checks may be acceptable for prototypes or disposable scripts, while production systems need formal specifications, automated tests, evals, CI gates and human review of architecture.

The cost argument is also direct. The paper casts casual AI coding as low upfront cost but high long-term operating cost, because repeated fix loops, unclear code ownership, maintenance burden and security remediation can add up. Agentic engineering, by contrast, requires more early investment in specs, evals and context design, but may reduce rework if those systems raise first-pass success.
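The trade-off can be made concrete with a little arithmetic. The sketch below is illustrative only: the `CostProfile` class, the `break_even_features` helper and every number in it are assumptions made for this example, not figures from the paper. It models each approach as a one-off setup cost plus a flat cost per feature and finds the feature count at which the higher-setup approach stops being the more expensive one.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CostProfile:
    """Hypothetical cost model: one-off setup plus a flat cost per feature."""
    upfront: float      # one-time investment (specs, evals, context design)
    per_feature: float  # recurring cost per feature (tokens, fix loops, rework)

    def total(self, features: int) -> float:
        return self.upfront + self.per_feature * features


def break_even_features(low_setup: CostProfile, high_setup: CostProfile) -> Optional[int]:
    """Smallest feature count at which high_setup costs no more than low_setup.

    Returns None if high_setup never catches up.
    """
    extra_upfront = high_setup.upfront - low_setup.upfront
    saving_per_feature = low_setup.per_feature - high_setup.per_feature
    if extra_upfront <= 0:
        return 0
    if saving_per_feature <= 0:
        return None
    return math.ceil(extra_upfront / saving_per_feature)


# Made-up numbers, in arbitrary cost units.
casual = CostProfile(upfront=0, per_feature=500)
engineered = CostProfile(upfront=6000, per_feature=100)
print(break_even_features(casual, engineered))
```

With these invented inputs the extra 6,000 units of setup are recovered at 400 units per feature. Real costs are unlikely to be linear, so the useful output is the shape of the comparison rather than the specific number.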

From Vibes To Engineering Controls

The term “vibe coding” was popularized by Andrej Karpathy in 2025 and has since been used loosely for many forms of AI-assisted coding. The Google paper narrows that meaning by placing workflows on a spectrum rather than treating all AI coding as the same practice.

At one end is casual prompting: a developer asks for code, runs it, pastes back errors and accepts output when it appears to work. At the other end is a managed workflow in which agents operate inside written requirements, test suites, evaluation rubrics, restricted tools and review gates.

The paper’s more practical contribution is its claim that tests and evals serve different roles. Tests check deterministic behavior, such as whether a known input produces the expected output. Evals are meant to judge less predictable work, such as whether an agent chose suitable tools or met a quality bar.
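The difference between the two is easier to see side by side. The following is a minimal sketch, not the paper's method: `slugify`, the rubric entries, the tool names and the 0.8 threshold are all hypothetical, chosen only to show an exact-match test next to a scored, threshold-gated eval.

```python
from statistics import mean
from typing import Callable


def slugify(title: str) -> str:
    """Deterministic function: the same input always gives the same output."""
    return "-".join(title.lower().split())


def test_slugify() -> None:
    # A test: one known input, one exact expected output.
    assert slugify("Hello  World") == "hello-world"


# An eval: score an agent run against a rubric instead of an exact answer.
Rubric = dict[str, Callable[[dict], bool]]

RUBRIC: Rubric = {
    "used_allowed_tools_only": lambda run: set(run["tools_used"]) <= {"read_file", "run_tests"},
    "ran_tests_before_finishing": lambda run: "run_tests" in run["tools_used"],
    "kept_diff_small": lambda run: run["lines_changed"] <= 200,
}


def eval_score(run: dict, rubric: Rubric = RUBRIC) -> float:
    """Fraction of rubric checks that a single agent run satisfies."""
    return mean(1.0 if check(run) else 0.0 for check in rubric.values())


def passes_eval(runs: list[dict], threshold: float = 0.8) -> bool:
    """Gate on the average score across several sampled runs."""
    return mean(eval_score(run) for run in runs) >= threshold
```

The test either passes or fails on a single fixed case. The eval averages over several runs because agent behavior can vary from one run to the next, which is why it needs a threshold rather than an exact match.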

“generation is solved; verification, judgment, and direction are the new craft”
— Osmani, Saboo and Kartakis, in Google’s whitepaper

Benchmarks Leave Open Questions

Several points remain unsettled. The paper’s reported adoption figures and benchmark results are attributed to the authors and cited sources, but the source material does not provide enough detail to independently verify how broadly those results apply across companies, languages, codebases or regulated environments.

It is also unclear how durable the “10% model, 90% harness” split is. The figure is presented as a rough framing, not a universal measurement. Different teams may see different results depending on codebase maturity, test quality, tooling access, security requirements and the skill of engineers designing agent workflows.

The source material also notes a commercial angle: while the concepts are described as tool-agnostic, the Google paper points readers toward Google products such as Gemini, Jules and the Agent Development Kit. Readers should separate the engineering model from vendor-specific recommendations.

Teams Rework AI Guardrails

The next stage is likely to be practical rather than theoretical. Engineering leaders will need to decide where AI agents can act, what tools they may use, which tasks require human approval, and which tests or evals must pass before generated code reaches production.
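Those decisions can be written down as an explicit policy rather than left to convention. The sketch below assumes a hypothetical `AgentPolicy` object with made-up tool and check names. It is one possible shape for such a policy, not a description of any specific product or of the paper's tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_HUMAN = "needs_human"
    DENY = "deny"


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical guardrail policy for a coding agent."""
    allowed_tools: frozenset[str]      # tools the agent may call on its own
    approval_required: frozenset[str]  # tools that need human sign-off first
    required_checks: frozenset[str]    # checks that must pass before shipping

    def decide_tool(self, tool: str) -> Decision:
        if tool in self.approval_required:
            return Decision.NEEDS_HUMAN
        if tool in self.allowed_tools:
            return Decision.ALLOW
        return Decision.DENY  # anything not listed is denied by default

    def may_ship(self, passed_checks: set[str]) -> bool:
        # Every required check must be among the checks that passed.
        return self.required_checks <= passed_checks


policy = AgentPolicy(
    allowed_tools=frozenset({"read_file", "edit_file", "run_tests"}),
    approval_required=frozenset({"deploy", "db_migrate"}),
    required_checks=frozenset({"unit_tests", "evals", "security_scan"}),
)
```

Denying unlisted tools by default is a design choice in this sketch. A team could equally start permissive in a sandbox and tighten the lists as agents move closer to production.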

For readers managing software teams, the paper points to a near-term checklist: improve specifications, build stronger test coverage, add evals for agent behavior, route simple work to cheaper models when appropriate, and track failures by harness design rather than blaming the model by default.
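Two items on that checklist, routing and failure tracking, lend themselves to a short sketch. Everything here is assumed for illustration: the `Task` fields, the routing rule, the five-file cutoff and the model labels are invented. The three harness cause labels mirror the examples quoted in the infographic above (a missing tool, a vague rule, a noisy context).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    description: str
    files_touched: int
    needs_design_decision: bool


def route_model(task: Task) -> str:
    """Send simple, narrow work to a cheaper model (hypothetical rule and labels)."""
    if task.needs_design_decision or task.files_touched > 5:
        return "large-model"
    return "small-model"


# Failure causes that point at the harness rather than the model.
HARNESS_CAUSES = {"missing_tool", "vague_rule", "noisy_context"}


def summarize_failures(causes: list[str]) -> dict[str, int]:
    """Count failures by whether the recorded cause lies in the harness."""
    counts = Counter(
        "harness" if cause in HARNESS_CAUSES else "model_or_other"
        for cause in causes
    )
    return dict(counts)
```

A tally like this only helps if each failure is triaged to a cause first. The point of keeping it is to check the default assumption that the model is to blame.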

Key Questions

What is the actual news development?

Google published a May 2026 whitepaper arguing that AI-assisted software development is moving from manual coding toward intent-driven workflows backed by tests, evals and agent controls.

Is this breaking news or analysis?

This is an analysis piece based on a recent Google whitepaper and a source analysis from Thorsten Meyer AI, not a breaking incident or product launch.

What is confirmed?

The confirmed development is the publication and content of the Google whitepaper as described in the supplied source material. Adoption percentages, benchmark outcomes and cost comparisons are claims attributed to the paper and cited experiments.

Why does this matter to developers?

It suggests that model selection alone may not decide whether AI coding works well. Teams may need stronger specs, tests, evals, security checks and observability to make generated code dependable.

What remains uncertain?

It is still unclear how broadly the paper’s ratios and benchmark lessons apply across different organizations, legacy systems and high-risk software environments.

Source: Thorsten Meyer AI

The Model Is Only 10%: The Real Lesson of the New SDLC

Author

The Dark Psychology Team
