Worked Example — High-Scale Recommendation API
This example walks through a complete design-and-simulation session for a high-throughput read API — a movie recommendation service designed to sustain 10,000 requests per second. It illustrates how PinPole surfaces bottlenecks iteratively and guides the architecture toward a production-ready state.
This example is drawn from a real PinPole simulation session.
Target load: 10,000 RPS
Goal: Zero throttling, sub-200ms p99 latency
Starting architecture
CloudFront → API Gateway → Lambda → DynamoDB
Iteration 1 — Baseline simulation
Run the simulation at 10,000 RPS with a Constant traffic pattern.
Results:
| Node | Status |
|---|---|
| CloudFront | ✅ Healthy |
| API Gateway | ✅ Healthy |
| Lambda | ❌ Throttling — concurrency limit reached, ~32% error rate |
| DynamoDB | ✅ Healthy |
Lambda is the first and only bottleneck. The architecture cannot handle the target load in its current state.
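A back-of-envelope check using Little's Law shows why Lambda throttles first: in-flight requests ≈ arrival rate × average execution time. The figures below (120 ms average duration, a 1,000-concurrency regional limit) are illustrative assumptions, not values reported by the session:

```python
# Back-of-envelope Lambda concurrency check (illustrative numbers).
RPS = 10_000                 # target arrival rate
AVG_DURATION_S = 0.120       # assumed average Lambda execution time
CONCURRENCY_LIMIT = 1_000    # assumed regional concurrency limit

# Little's Law: concurrent in-flight requests = arrival rate * time in system
required_concurrency = RPS * AVG_DURATION_S   # 1200.0

# Requests arriving beyond the concurrency limit are throttled
throttled_fraction = max(0.0, 1 - CONCURRENCY_LIMIT / required_concurrency)  # ~17%
```

With these assumed numbers, the required concurrency (1,200) exceeds the limit (1,000), so a sizeable fraction of requests fails — the same failure mode the simulation reports, even if the exact error rate differs.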
AI Recommendation: Add SQS before Lambda to buffer incoming requests and prevent throttling.
Iteration 2 — Add SQS buffering and increase Lambda concurrency
Accept the SQS recommendation. Wire: CloudFront → API Gateway → SQS → Lambda → DynamoDB.
Also set Lambda provisioned concurrency to 1,000 in the Node Configuration panel. This keeps 1,000 Lambda instances warm at all times, eliminating cold-start latency up to that level of concurrency.
Stop and restart the simulation after changing concurrency settings — resuming a paused run will not reflect the new warm pool size.
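The queue's effect can be sketched with a toy drain model. The burst and drain rates below are illustrative, not PinPole output — the point is that a queue converts dropped requests into backlog that drains after the burst:

```python
# Toy model of SQS buffering: a burst above Lambda's drain rate is either
# queued (buffered) or throttled (unbuffered). Rates below are illustrative.
def burst_outcome(burst_rps, drain_rps, burst_s, settle_s, buffered):
    """Return (requests dropped, backlog left) after a burst plus settle time."""
    backlog, dropped = 0, 0
    for t in range(burst_s + settle_s):
        arrivals = burst_rps if t < burst_s else 0
        if buffered:
            backlog += arrivals                       # queue absorbs the excess
        else:
            dropped += max(0, arrivals - drain_rps)   # over-limit calls throttled
            backlog += min(arrivals, drain_rps)
        backlog -= min(backlog, drain_rps)            # Lambda drains each second
    return dropped, backlog

# A 3-second burst at 15,000 RPS against a 10,000 RPS drain rate:
with_queue = burst_outcome(15_000, 10_000, 3, 3, buffered=True)      # (0, 0)
without_queue = burst_outcome(15_000, 10_000, 3, 3, buffered=False)  # (15000, 0)
```

The trade-off: nothing is dropped, but buffered requests wait in the queue, so this pattern suits workloads that tolerate a little extra latency during spikes.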
Run at 10,000 RPS.
Results:
| Observation | Detail |
|---|---|
| Lambda cold-start error rate | Drops to ~5% (residual errors from the fraction of traffic exceeding the provisioned warm pool) |
| DynamoDB | ✅ Healthy |
| API Gateway | ⚠️ Some throttling at peak |
AI Recommendation: API Gateway is a single point of failure — consider adding a load balancer. Also: increase Lambda memory.
Iteration 3 — Add ALB and multi-region topology
Accept the load balancer recommendation. Add an Application Load Balancer (ALB) before API Gateway. Extend the architecture to two regions (e.g. Mumbai and US East) to distribute load:
CloudFront → ALB
→ Region 1: API Gateway → SQS → Lambda → DynamoDB
→ Region 2: API Gateway → SQS → Lambda → DynamoDB
Also increase Lambda memory to 1,024 MB. Lambda allocates CPU proportionally to memory, so more memory speeds up execution and shortens duration — which reduces latency and, for CPU-bound functions, can reduce cost as well.
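The cost claim follows from Lambda's billing model: charges scale with duration × allocated memory (GB-seconds). The durations below are hypothetical and assume a CPU-bound function whose runtime roughly halves when memory doubles:

```python
# Lambda billing sketch: cost scales with GB-seconds per invocation.
# Durations are hypothetical, assuming a CPU-bound function that runs
# roughly twice as fast with twice the memory.
def gb_seconds(memory_mb, duration_ms):
    return (memory_mb / 1024) * (duration_ms / 1000)

before = gb_seconds(512, 240)    # 0.5 GB * 0.240 s = 0.12 GB-s
after = gb_seconds(1024, 120)    # 1.0 GB * 0.120 s = 0.12 GB-s
# Same billed GB-seconds, but half the latency per request.
```

When the speed-up is less than proportional, cost rises slightly while latency still improves — which is why memory tuning is usually worth testing rather than assuming.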
AI Recommendation: Introduce SNS fan-out between the API Gateway and the SQS queues to enable proper multi-region message distribution.
Iteration 4 — Add SNS fan-out and enable DynamoDB DAX
Accept the SNS recommendation. The topology becomes:
CloudFront → ALB
→ API Gateway → SNS Topic
→ SQS (Region 1) → Lambda (Region 1) → DynamoDB (Region 1)
→ SQS (Region 2) → Lambda (Region 2) → DynamoDB (Region 2)
Enable DAX on both DynamoDB nodes. DAX adds an in-memory cache in front of DynamoDB, dramatically improving read throughput for repeated lookups of the same data — common in recommendation workloads where popular items are read thousands of times per second.
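The effect can be illustrated with a minimal read-through cache — the pattern DAX applies in front of DynamoDB. Here `fetch` stands in for a `GetItem` call, and the hot-key access pattern is an assumption of the sketch:

```python
# Minimal read-through cache sketch illustrating why DAX helps skewed
# read workloads. `fetch` stands in for a DynamoDB GetItem call.
class ReadThroughCache:
    def __init__(self, fetch):
        self.fetch = fetch        # called only on a cache miss
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1        # served from memory, no table read
        else:
            self.misses += 1
            self.store[key] = self.fetch(key)
        return self.store[key]

cache = ReadThroughCache(lambda movie_id: {"id": movie_id})
for _ in range(1_000):
    cache.get("popular-movie")    # hot key: only the first read hits the table

# cache.misses == 1, cache.hits == 999
```

For a key read a thousand times, only the first read reaches the table — the skew that hurts DynamoDB partitions is exactly the skew that makes a cache effective.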
Switch DynamoDB from provisioned to on-demand capacity mode so capacity scales with actual traffic, reducing the risk of hot-partition throttling under variable load.
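The hot-partition arithmetic behind this recommendation can be sketched as follows. The partition count, provisioned throughput, and per-key read rate are illustrative, and each read is assumed to cost one read capacity unit:

```python
# Illustrative hot-partition check under provisioned mode. DynamoDB divides
# a table's provisioned throughput across partitions, so a skewed key can
# exceed one partition's share even when the table-level total looks ample.
# All numbers are assumptions; one read = one RCU for simplicity.
TABLE_RCU = 12_000
PARTITIONS = 10
per_partition_rcu = TABLE_RCU / PARTITIONS        # 1,200 RCU per partition

hot_key_rps = 4_000   # assumed reads/sec landing on one popular item
throttled_rps = max(0, hot_key_rps - per_partition_rcu)   # 2,800 reads/sec
```

The table as a whole has three times the needed capacity, yet the single hot partition still throttles — which is why the fix here pairs on-demand mode with the DAX cache that absorbs the hot key in the first place.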
Iteration 5 — Final simulation at 10,000 RPS
Run at 10,000 RPS with a Spike traffic pattern.
Results:
| Metric | Result |
|---|---|
| All nodes | ✅ Healthy |
| Lambda | Warm instances handling burst; cold start errors near zero |
| DynamoDB | Scaling automatically; no hot partition signals |
| Estimated cost | ~$560–$580/day at this load level |
AI Recommendations output: Cost optimisation opportunities — API Gateway is the most expensive component. Consider reserved capacity savings plans for predictable load portions.
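A quick calculation shows why per-request pricing dominates at this load — the daily request volume follows directly from the target RPS:

```python
# Daily request volume at the target load. Any component billed per request
# (API Gateway among them) scales linearly with this number.
RPS = 10_000
SECONDS_PER_DAY = 86_400
requests_per_day = RPS * SECONDS_PER_DAY    # 864,000,000 requests/day
```

At per-million-request pricing, hundreds of millions of daily requests make the request-billed tier the natural first target for cost optimisation, as the recommendation suggests.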
Final architecture summary
| Component | Configuration |
|---|---|
| CloudFront | Balanced cache mode |
| ALB | Multi-region routing |
| API Gateway | Two regional instances |
| SNS | Standard topic, fan-out to two regional SQS queues |
| SQS | Standard queues, visibility timeout 60s |
| Lambda | 1,024 MB, provisioned concurrency 1,000 (per region) |
| DynamoDB | DAX enabled, on-demand capacity mode |
Key lessons from this session
- Lambda throttling is almost always the first bottleneck in serverless API architectures. Address it with SQS buffering and provisioned concurrency before investigating other nodes.
- Always stop and restart after changing concurrency settings. Resuming a paused simulation will not pick up new concurrency values.
- API Gateway becomes the final bottleneck at very high RPS once Lambda and DynamoDB are healthy. At that point, the fix is architectural (move to async / add CloudFront) rather than a configuration change.
- DAX and on-demand mode together make DynamoDB effectively invisible as a bottleneck for read-heavy workloads. Enable both during design and tune later.
- The Cloud Terminal was used throughout this session to query Lambda and DynamoDB service limits without leaving PinPole — saving multiple trips to AWS documentation.