Scaling Payment Infrastructure 20x for an Ag Retail ERP Platform
AgTech ERP Provider
Industry: Agriculture
Service: Software Product Design and Development
Summary
- 20x throughput increase: From 960 payments per day to 19,200, with the potential to scale to 19,200 per company per day.
- Eliminated double-charging: Implemented comprehensive changes to prevent duplicate charges under any failure scenario.
- Processor integration: Delivered a new payment processor without service disruption or reliability regression.
Measurable Outcomes
A leading provider of integrated business solutions for the agribusiness industry faced a critical challenge: their scheduled payment system, a core feature of their all-in-one ERP platform for ag retailers, was approaching capacity limits while carrying unacceptable double-charging risk.
We redesigned their payment architecture to deliver:
- 20x throughput increase: From 960 payments per day to 19,200+ payments per day, with an architecture designed to scale to 19,200 per company per day.
- Eliminated double-charging risk: Implemented comprehensive idempotency and single-threaded queue processing to prevent duplicate charges under any failure scenario.
- Successful processor integration: Delivered a new payment processor without service disruption or reliability regression.
- Future-proof architecture: Designed per-company queue isolation strategy that scales well beyond current growth projections.
The Challenge: Growth Threatening Revenue-Critical Infrastructure
A leading provider of integrated business solutions for the agribusiness industry had built their all-in-one ERP platform for ag retailers on reliable, automated payments, connecting growers, distributors, and suppliers in time-sensitive transactions. As platform adoption accelerated among ag retailers, scheduled payments became one of the most popular features. But the infrastructure wasn’t built to scale.
The Perfect Storm - Four Factors
Imminent Capacity Ceiling
The scheduled payment system could process only 960 payments per day (one per minute over a 16-hour processing window). With growing user adoption, they were rapidly approaching this limit, and agricultural businesses do not accept “your payment will be delayed.”
Double-Charging Risk
Without proper concurrency controls, the system was vulnerable to processing the same payment multiple times. Network timeouts, processor failures, and automatic retries created dangerous race conditions.
Complex Processor Integration
The client needed to integrate a new payment processor to expand capabilities and reduce vendor dependency, but payment processors behave unpredictably in production. Timeouts, partial failures, and duplicate responses create ambiguous states that can corrupt data or double-charge customers.
Scheduled Payment Fragility
The existing system struggled with edge cases: canceled items that still triggered charges, payments stuck in infinite retry loops, and batch processing that occasionally missed critical runs.
For ag retailers managing complex supply chains where timing drives revenue—harvest windows, delivery schedules, seasonal purchasing—payment infrastructure failures in their ERP system have cascading business impacts that ripple through their entire operation.
Our Approach: Iterative Architecture Evolution
We took ownership of the payments and scheduled-payments platform, implementing a multi-phase approach that progressively increased capacity while eliminating reliability risks.
Phase 1: Establish Safety with Single-Threaded Processing
- Implemented comprehensive idempotency: Every payment operation became truly atomic, with state management that could survive any failure scenario—timeouts, duplicates, or processor errors.
- Leveraged AWS SQS controls: Limited processing to one payment at a time, eliminating race conditions.
- Added intelligent guards: Prevented processing of canceled or already-completed items.
This solved the double-charging problem but created a new constraint: throughput was limited to 960 payments per day. At the time of introducing scheduled payments, this was, surprisingly, sufficient. But we knew it would not last.
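The guard-plus-idempotency idea from Phase 1 can be illustrated with a minimal Python sketch. This is not the client's implementation: the `Payment` record, the `Status` values, the in-memory store, and the `FakeProcessor` are all hypothetical, and the sketch assumes the processor deduplicates on an idempotency key and that only one payment is processed at a time.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Status(Enum):
    SCHEDULED = "scheduled"
    COMPLETED = "completed"
    CANCELED = "canceled"


@dataclass
class Payment:
    payment_id: str
    amount_cents: int
    status: Status = Status.SCHEDULED


class FakeProcessor:
    """Hypothetical processor that deduplicates on an idempotency key."""

    def __init__(self) -> None:
        self.charges: Dict[str, int] = {}
        self.calls = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> None:
        self.calls += 1
        # A repeated key is treated as the same charge, not a new one.
        self.charges.setdefault(idempotency_key, amount_cents)


def process_payment(
    store: Dict[str, Payment],
    payment_id: str,
    charge: Callable[[str, int], None],
) -> str:
    """Charge a payment at most once, assuming single-threaded processing."""
    payment = store[payment_id]
    # Guard: never charge canceled or already-completed items.
    if payment.status in (Status.COMPLETED, Status.CANCELED):
        return "skipped"
    # The payment ID doubles as the idempotency key, so a retry after a
    # timeout repeats the same logical charge instead of creating a new one.
    charge(payment.payment_id, payment.amount_cents)
    # Only mark the payment completed once the processor call has returned.
    payment.status = Status.COMPLETED
    return "charged"
```

If the processor call raises (for example on a timeout), the payment stays in its scheduled state and a later retry reuses the same key, which is what makes the retry safe under these assumptions.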
Phase 2: Optimize for Throughput Without Sacrificing Safety
As user adoption grew, we needed more capacity. We optimized the AWS SQS configuration to allow payments to be picked up as soon as possible while maintaining single-threaded safety:
- Reduced average invocation time to ~3 seconds through processing optimizations.
- Changed the throughput equation: 20 payments per minute → 1,200 per hour → 19,200 per day over the 16-hour processing window.
- Maintained zero double-charging risk: Increased speed without compromising reliability.
This 20x throughput improvement bought headroom for growth while keeping operational complexity manageable.
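The throughput figures follow directly from the per-payment processing time and the 16-hour window. As a quick check of the arithmetic, here is a small Python sketch; the function name is illustrative, and it assumes strictly sequential processing with the ~3 second average treated as exactly 3 seconds.

```python
PROCESSING_WINDOW_HOURS = 16


def daily_capacity(seconds_per_payment: int,
                   window_hours: int = PROCESSING_WINDOW_HOURS) -> int:
    """Payments that fit in the window when processed one at a time."""
    window_seconds = window_hours * 60 * 60
    return window_seconds // seconds_per_payment


before = daily_capacity(60)  # one payment per minute
after = daily_capacity(3)    # roughly 20 payments per minute

print(before, after, after // before)  # prints: 960 19200 20
```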
Phase 3: Design for Scale with Per-Company Queue Isolation
Even 19,200 payments per day would not be enough indefinitely. We identified a key architectural insight: the safety constraints we were managing were actually per-company, not system-wide.
This led to our next architecture evolution:
- Leverage AWS SQS messageGroupId: Assign each company its own logical queue within the SQS infrastructure.
- Enable true parallelization: Each company gets single-threaded safety for their payments, but multiple companies process simultaneously.
- Scale with user growth: The architecture now supports 19,200 payments per day per company, well beyond projected needs for the foreseeable future.
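As a rough illustration of per-company isolation, the Python sketch below builds the arguments for an SQS `send_message` call so that every payment for a company lands in the same message group. It assumes a FIFO queue, which the case study does not state explicitly. The queue URL, message body shape, group naming scheme, and the use of the payment ID as `MessageDeduplicationId` are all hypothetical choices made for the example.

```python
import json
from typing import Dict

# Hypothetical queue URL; FIFO queue names end in ".fifo".
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/scheduled-payments.fifo"


def build_send_message_kwargs(queue_url: str, company_id: str,
                              payment_id: str) -> Dict[str, str]:
    """Build keyword arguments for an SQS send_message call (FIFO assumed)."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(
            {"companyId": company_id, "paymentId": payment_id}
        ),
        # Messages sharing a group ID are delivered in order, one group at a
        # time, so each company keeps single-threaded processing.
        "MessageGroupId": f"company-{company_id}",
        # Assumption: the payment ID is used to deduplicate repeated sends.
        "MessageDeduplicationId": payment_id,
    }


kwargs = build_send_message_kwargs(QUEUE_URL, "acme", "pay-001")
```

With boto3, these arguments would be passed as `boto3.client("sqs").send_message(**kwargs)`. Different companies get different group IDs, so their payments can be processed in parallel while each company's own payments stay sequential.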
Foundation Work: Event-Driven Resilience
Throughout all phases, we strengthened the underlying platform:
Intelligent Scheduled Payment Execution
- Rebuilt selection and execution flow with hold-until timing logic and batch sizing controls.
- Created clear audit trails for every payment decision.
- Improved failure classification to distinguish transient issues from permanent failures.
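To make the selection step concrete, here is a minimal Python sketch of picking a batch of due payments. The case study does not define the hold-until logic, so the sketch assumes `hold_until` means "do not attempt before this time" (for example, after a transient failure). The `ScheduledPayment` fields and `select_batch` name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ScheduledPayment:
    payment_id: str
    due_at: datetime
    hold_until: Optional[datetime] = None  # assumed meaning: earliest retry time
    canceled: bool = False


def select_batch(payments: List[ScheduledPayment], now: datetime,
                 batch_size: int) -> List[ScheduledPayment]:
    """Return up to batch_size payments that are eligible to run at `now`."""
    eligible = [
        p for p in payments
        if not p.canceled
        and p.due_at <= now
        and (p.hold_until is None or p.hold_until <= now)
    ]
    # Oldest due date first, so late payments are not starved by newer ones.
    eligible.sort(key=lambda p: p.due_at)
    return eligible[:batch_size]
```

Capping the batch keeps each run bounded, and payments that do not fit remain eligible for the next run.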
Using AWS Lambda, SQS, and EventBridge, we built an architecture that processes work predictably, handles retries intelligently, and scales gracefully without manual intervention.
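One way to connect failure classification to retries is Lambda's partial batch response for SQS, sketched below in Python. This is an illustration, not the client's handler. It assumes `ReportBatchItemFailures` is enabled on the event source mapping. The `TransientError` and `PermanentError` classes and the `make_handler` factory are hypothetical names. Because ordering matters when the queue is FIFO, the sketch stops at the first transient failure and reports that message and every later one as failed.

```python
import json
from typing import Any, Callable, Dict, List


class TransientError(Exception):
    """Hypothetical: worth retrying (timeout, temporary processor outage)."""


class PermanentError(Exception):
    """Hypothetical: retrying cannot help (invalid account, canceled item)."""


def make_handler(
    process: Callable[[Dict[str, Any]], None],
    on_permanent_failure: Callable[[Dict[str, Any], Exception], None] = lambda body, exc: None,
) -> Callable[[Dict[str, Any], Any], Dict[str, List[Dict[str, str]]]]:
    """Build a Lambda handler that retries only transient failures."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
        records = event.get("Records", [])
        failures: List[Dict[str, str]] = []
        for index, record in enumerate(records):
            body = json.loads(record["body"])
            try:
                process(body)
            except PermanentError as exc:
                # Record the outcome and let the message be deleted: no retry.
                on_permanent_failure(body, exc)
            except TransientError:
                # Report this message and all later ones so they are redelivered
                # in order rather than processed out of sequence.
                failures = [{"itemIdentifier": r["messageId"]} for r in records[index:]]
                break
        return {"batchItemFailures": failures}

    return handler
```

Messages not listed in `batchItemFailures` are treated as successfully handled, so permanently failed payments leave the queue instead of looping through retries.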
Safe Processor Integration
Delivered the new payment processor through careful contract adaptation, comprehensive edge-case handling, and staged rollout, all while maintaining zero-downtime operations.
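The case study does not describe how the staged rollout was controlled. One common approach, shown here purely as an assumption, is to route a fixed percentage of companies to the new processor using a stable hash of the company ID. The function names and the "legacy"/"new" labels in this Python sketch are hypothetical.

```python
import hashlib


def rollout_bucket(company_id: str) -> int:
    """Map a company ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(company_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def choose_processor(company_id: str, rollout_percent: int) -> str:
    """Route a company to the new processor once its bucket is enabled."""
    return "new" if rollout_bucket(company_id) < rollout_percent else "legacy"
```

Because the bucket depends only on the company ID, a company moved to the new processor at a low percentage stays on it as the percentage rises, and lowering the percentage to zero sends everyone back to the legacy path.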
Results and Impact: From Constraint to Competitive Advantage
Immediate Business Value
Revenue Protection
The system now operates with confidence under retries and timeouts, eliminating double-charging scenarios and “unknown state” outcomes that previously required manual investigation and customer service intervention.
Proven Scale
What started as a 960 payment-per-day system now reliably processes 19,200+ payments daily, with an architecture designed to scale to 19,200 per company per day. Scheduled payments, one of the platform’s most popular features, are no longer a scaling bottleneck.
Operational Confidence
Clear batch and concurrency controls, combined with improved monitoring and failure intelligence, give the engineering team full visibility into payment operations and the confidence to scale without surprises.
Seamless Capability Expansion
The new payment processor was integrated without destabilizing the wider payment pipeline, expanding the company’s strategic options while maintaining the reliability customers depend on.
Strategic Impact
The architecture evolution demonstrates a critical principle: you can increase system capacity without increasing operational risk. By thinking through the problem iteratively—first establish safety, then optimize throughput, then design for scale—we built a payment infrastructure for their ERP platform that grows with their retail customer base.
For ag retailers relying on a comprehensive ERP platform where payment timing directly impacts their relationships with growers and suppliers, this infrastructure transformation changed the conversation from “Will payments work?” to “How fast can we grow our retail customer base?”
Looking Forward
The per-company queue isolation strategy positions the platform to scale far beyond current projections. The team does not anticipate any single company exceeding 19,200 payments per day, and if one does, the architecture supports further horizontal scaling without fundamental redesign.
Is Your ERP Platform’s Payment Infrastructure Ready to Scale?
If you are facing similar challenges, such as capacity constraints, reliability concerns, or complex processor integrations in mission-critical payment systems for your platform or ERP system, we would welcome the opportunity to discuss them.
We specialize in hardening revenue-critical infrastructure for growing platform and ERP providers. Our team brings deep expertise in payment systems, event-driven architectures, and building resilient distributed systems that grow with your customer base.
Schedule a conversation to explore how we can help you build payment systems that scale with confidence while protecting revenue.