Accelerating Delivery: How AI Agents Can Own the Initial Code Draft

Stephen Walker

May 05, 2026

Why This Matters in the Era of Agentic Everything

The gap between a Jira ticket and the first pull request (PR) is often a graveyard of productivity. Context switching, boilerplate setup, and requirement analysis can eat up hours of a senior engineer's day before they write a single line of business logic.

Recently, industry leaders have showcased "Harness Engineering" [1][2] — the practice of building autonomous agent workflows that handle end-to-end coding tasks. While their results are inspiring, the barrier to entry can feel insurmountable.

The challenge for most teams is finding a starting point that balances long-term vision with immediate ROI. We recently helped a large-scale, high-traffic consumer services platform bridge this gap. Instead of an all-or-nothing architectural overhaul, we took a pragmatic first step: automating the journey from Jira ticket to PR.

 

The Vision: "Let's work on FEAT-123"

Our goal was simple. We wanted to enable a developer to provide a single prompt — "Let's work on FEAT-123" — and have a coordinated team of AI agents handle the planning, coding, and verification. This culminates in a PR ready for human review.

To make this a reality, we focused on three strategic pillars: safety, determinism, and quality.

 

Safety Through Pragmatic Guardrails

One of the primary concerns for any engineering leader is "agent drift" — an AI making unauthorized or hallucinated changes. While an isolated digital sandbox is an ideal long-term goal, we proved that teams can achieve significant safety and progress using existing infrastructure:

  • Branch Permissions: Restrict agent access to specific feature branches.
  • Scoped Access: Use scripts and hooks to define exactly what an agent can and cannot touch.
  • Human-in-the-Loop: Ensure that while the agent proposes the change, a human must always review, approve, and merge the code.

By defining the agent's scope as a specialized contributor rather than a system administrator, we mitigated risk while maximizing output.
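A scoped-access check of this kind can be a small deterministic script that runs before any agent-authored change is accepted. The sketch below is illustrative only; the allowlist prefixes and blocked paths are hypothetical placeholders, not our client's actual configuration.

```python
from pathlib import PurePosixPath

# Hypothetical scope definition: which paths an agent may touch on its
# feature branch, and which are off-limits regardless of the task.
AGENT_ALLOWED_PREFIXES = ("src/", "tests/", "docs/")
AGENT_BLOCKED_PREFIXES = (".github/workflows", "Dockerfile", "deploy/")

def is_change_allowed(path: str) -> bool:
    """Return True if an agent-authored change to `path` is within scope."""
    p = PurePosixPath(path).as_posix()
    if any(p.startswith(blocked) for blocked in AGENT_BLOCKED_PREFIXES):
        return False
    return any(p.startswith(prefix) for prefix in AGENT_ALLOWED_PREFIXES)

def check_changeset(paths: list[str]) -> list[str]:
    """Return out-of-scope paths; an empty list means the changeset passes."""
    return [p for p in paths if not is_change_allowed(p)]
```

Wired into a pre-commit hook or CI step, a check like this rejects any agent commit that strays outside its declared scope before a human ever sees it.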

 

Determinism in a Probabilistic World

Large Language Models (LLMs) are inherently probabilistic. They are designed to predict the next token, not follow a rigid execution path. To build a reliable workflow, we surround the AI with deterministic structures that ensure repeatable results.

The Hierarchy of Reliability

We maximize reliability by following a strict hierarchy for agent behavior, moving from hard-coded logic to flexible intelligence:

  1. Hooks and Scripts: The most deterministic way to encode behavior. We use these for sensitive or repetitive tasks.
  2. Specialized Agents: Task-specific models with narrow, well-defined scopes.
  3. Project Rules and Memory: Agent context files in the repository that define team norms and architectural patterns.
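In practice, the hierarchy can be expressed as a dispatch rule: prefer a deterministic hook when one exists, then a specialized agent, and only fall back to a general agent grounded in the repository's rules files. This is a simplified sketch with hypothetical task names, not a specific framework's API.

```python
# Tier 1: deterministic hooks/scripts, preferred whenever they exist.
HOOKS = {"format_code": lambda task: "ran formatter script"}

# Tier 2: task-specific agents with narrow, well-defined scopes.
SPECIALISTS = {"write_tests": lambda task: "test-writer agent handled it"}

def resolve_handler(task_type: str):
    """Resolve a task against the reliability hierarchy, most
    deterministic tier first."""
    if task_type in HOOKS:
        return ("hook", HOOKS[task_type])
    if task_type in SPECIALISTS:
        return ("specialist", SPECIALISTS[task_type])
    # Tier 3: a general agent whose context includes the repo's
    # project rules and memory files.
    return ("general", lambda task: f"general agent with project rules: {task}")
```

The payoff is that the probabilistic fallback is only reached when no deterministic or narrowly scoped option applies.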

Managing Context Rot

As the agent processes large text blocks and growing logs of activity, AI performance often degrades due to "context rot." We use agent orchestration to solve this. Instead of one large conversation, an orchestrator spawns specialized agents for focused tasks. This keeps the context window clean, ensuring the agent stays focused on the specific goal of the work item.
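Conceptually, the orchestrator hands each sub-agent only the context slices its task needs, never the full conversation log. The sketch below assumes a simple key-value shared memory; the roles and keys are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class SubAgentTask:
    role: str           # e.g. "planner", "coder", "reviewer"
    goal: str           # the focused objective for this sub-agent
    context: list[str]  # only the context slices this task needs

@dataclass
class Orchestrator:
    shared_memory: dict = field(default_factory=dict)

    def spawn(self, role: str, goal: str, relevant_keys: list[str]) -> SubAgentTask:
        # Each sub-agent starts with a clean window: only the context
        # explicitly selected for it, keeping the prompt small and focused.
        context = [self.shared_memory[k]
                   for k in relevant_keys if k in self.shared_memory]
        return SubAgentTask(role=role, goal=goal, context=context)
```

Because each spawn starts fresh, a long-running work item never accumulates the stale history that causes context rot in a single monolithic conversation.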

Operational Guardrails and Efficiency

We apply the same engineering rigor to agent workflows that we apply to our CI/CD pipelines. By implementing specific operational stop-losses, we ensure the system remains efficient and autonomous without unnecessary resource consumption:

  • Intent-Aware Execution: The workflow is configured to recognize the scope of a change before execution. For example, if an agent is only updating documentation, the harness bypasses the execution of the test suite and local builds. This prioritizes speed and ensures tokens are used only where they add functional value.
  • Configurable Review Cycles: We define a maximum number of iteration rounds between the planning or coding agents and their reviewers. If the agents cannot reach alignment after a set number of cycles, the workflow pauses for human resolution. This prevents the system from spinning on a complex logic conflict that requires human intuition.
  • Build and Integrity Ceilings: Coding agents are required to produce changes that pass the build, including all unit tests, linters, and typechecks. We configure a maximum number of repair attempts for these tasks; if the agent reaches this ceiling, it triggers an escape hatch for a developer to step in.
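Two of these stop-losses, intent-aware execution and the repair ceiling, can be sketched as simple control logic. The file suffixes and the ceiling of three attempts below are hypothetical values chosen for illustration.

```python
# Hypothetical stop-loss configuration.
DOC_SUFFIXES = (".md", ".rst", ".txt")
MAX_REPAIR_ATTEMPTS = 3  # illustrative ceiling

def needs_build(changed_files: list[str]) -> bool:
    """Intent-aware execution: docs-only changesets bypass the
    test suite and local builds."""
    return not all(f.endswith(DOC_SUFFIXES) for f in changed_files)

def run_with_ceiling(attempt_fix, passes) -> str:
    """Retry a repair step up to the ceiling, then open the
    escape hatch for a human."""
    for _ in range(MAX_REPAIR_ATTEMPTS):
        attempt_fix()
        if passes():
            return "green"
    return "escalate-to-human"
```

The key design choice is that the ceiling is configuration, not model behavior: the agent cannot talk its way past it.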

 

Quality Through Architectural Rigor

Speed is secondary to quality. To avoid the generic outputs often associated with basic LLM implementations, we built a multi-stage Agent Strike Team that uses internal feedback loops to mimic a high-performing engineering squad:

  • The Iterative Planning Cycle: A planning agent creates a phased execution plan. This plan undergoes an adversarial review by agents specialized in architectural soundness and requirements completeness. If the reviewers identify gaps, the plan returns to the planner for refinement. This iterative loop continues until the plan is fully validated.
  • Execution and Holistic Review: Coding agents implement each phase of the validated plan, writing unit tests and ensuring the code passes local linters as they go. Once all phases are complete, a separate Review Agent performs a holistic evaluation of the entire changeset, checking for style consistency and adherence to software best practices, and ensuring the requirements are fully captured in the tests. If the review agent finds issues, the coding agents must address them before the PR is opened.
  • The Self-Improvement Loop: After the work is complete, the agents perform a retrospective to identify new concepts or "gotchas" — such as a specific quirk in a legacy service integration. These insights are saved back to the repository's agent context files.
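The iterative planning cycle above can be sketched as a loop: the plan circulates through adversarial reviewers until none of them report gaps, or until a cycle cap pauses the workflow for a human. The reviewer functions and the cap of five rounds are illustrative assumptions.

```python
MAX_CYCLES = 5  # illustrative; mirrors the configurable review ceiling

def refine_until_approved(draft_plan, reviewers, revise):
    """Run the plan/adversarial-review loop.

    `reviewers` are callables returning a list of identified gaps;
    `revise` takes (plan, gaps) and returns an updated plan.
    Returns (plan, status).
    """
    plan = draft_plan
    for _ in range(MAX_CYCLES):
        gaps = [gap for review in reviewers for gap in review(plan)]
        if not gaps:                  # every reviewer is satisfied
            return plan, "approved"
        plan = revise(plan, gaps)     # planner addresses the feedback
    return plan, "paused-for-human"   # alignment not reached in time
```

Separating the reviewer roles (architectural soundness, requirements completeness) into distinct callables is what makes the review adversarial rather than a single model grading its own work.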

We moved from individual developers using AI tools to teams benefiting from a shared intelligence asset. By centralizing these improvements in the repository itself or a shared knowledge base, every agent team spawned by any developer immediately inherits the collective context of the entire organization. The system does not just work; it gets smarter for the whole team with every work item it completes.

 

The Results: A Shift in the Engineering Paradigm

The impact of these changes was transformative. By automating the journey from a Jira ticket to an initial PR, we saw a fourfold increase in PRs opened. However, in many organizations, an increase in volume is often met with a secondary concern: reviewer overwhelm. Engineering leaders rightly fear that AI-generated code will simply flood the pipeline with low-quality output that requires a senior engineer to deconstruct and fix. In our experience, this burnout happens when a developer cannot trust the majority of what the agent produces.

We mitigated this by ensuring the agent never started from a blank slate. By making architecture decision records (ADRs) and architectural patterns available to agent teams and utilizing an adversarial review stage to enforce those standards, the code that reaches the human is already in a high-quality, compliant state.

In edge cases where a manual fix is required, the self-improvement loop ensures those insights are codified back into the system's memory. This prevents the repetitive, frustrating review cycles that typically plague AI implementations. Ultimately, the developer's role evolved from manual code writing to high-level requirements gathering and intent validation, freeing senior engineers to focus on what matters most: architecture, intent, and quality.
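The self-improvement loop can be as simple as appending each new insight to a shared agent context file in the repository, with a duplicate check so known lessons are not repeated. The `AGENTS.md` filename below is a hypothetical choice, not a mandated convention.

```python
from pathlib import Path

# Hypothetical shared context file that every agent team reads on startup.
CONTEXT_FILE = Path("AGENTS.md")

def record_insight(insight: str, context_file: Path = CONTEXT_FILE) -> None:
    """Append a retrospective 'gotcha' to the shared context file,
    skipping insights that are already recorded."""
    existing = context_file.read_text() if context_file.exists() else ""
    if insight in existing:
        return
    with context_file.open("a") as f:
        f.write(f"- {insight}\n")
```

Because the file lives in the repository, the insight ships with the codebase: the next agent team, spawned by any developer, inherits it automatically.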

 

Take the Next Step

Harness engineering does not have to be a multi-year R&D project. At 8th Light, we have already navigated the complexities of orchestration, safety guardrails, and reliability hierarchies so that our clients can focus on shipping value through a more efficient, modern delivery cycle.

If this forward-thinking approach to software delivery interests you — whether for building your next product or operationalizing these agentic workflows within an existing stack — let's get in touch to discuss how we can make it a reality for your team.

References

  1. R. Lopopolo, "Harness engineering: leveraging Codex in an agent-first world," OpenAI, Feb. 11, 2026. Available: https://openai.com/index/harness-engineering/
  2. A. Gray, "Minions: Stripe's one-shot, end-to-end coding agents," Stripe Dot Dev Blog, Feb. 9, 2026. Available: https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-coding-agents

Stephen Walker

Senior Crafter

Stephen Walker is an experienced software professional and a current Master’s candidate in Artificial Intelligence. He has a proven track record of delivering high-quality software solutions across various industries, from healthcare to consumer media. Stephen specializes in bridging the gap between technical execution and business goals, having played a key role in initiatives to modernize legacy architectures and build complex mobile and web platforms from the ground up. Beyond his technical contributions, he has a history of mentoring stakeholders and leading teams through significant digital transitions to improve system performance and maintainability. His journey in technology began in middle school, where he first discovered coding on a TI-83+ calculator and hasn’t looked back since.