Worked Example — High-Scale Recommendation API

This example walks through a complete design-and-simulation session for a high-throughput read API: a movie recommendation service that must sustain 10,000 requests per second. It illustrates how PinPole surfaces bottlenecks iteratively and guides the architecture toward a production-ready state.

note

This example is drawn from a real PinPole simulation session.

Target load: 10,000 RPS
Goal: Zero throttling, sub-200ms p99 latency

Starting architecture

CloudFront → API Gateway → Lambda → DynamoDB

Iteration 1 — Baseline simulation

Run the simulation at 10,000 RPS with a Constant traffic pattern.

Results:

| Node        | Status                                                    |
| ----------- | --------------------------------------------------------- |
| CloudFront  | ✅ Healthy                                                 |
| API Gateway | ✅ Healthy                                                 |
| Lambda      | ❌ Throttling — concurrency limit reached, ~32% error rate |
| DynamoDB    | ✅ Healthy                                                 |

Lambda is the first and only bottleneck. The architecture cannot handle the target load in its current state.

AI Recommendation: Add SQS before Lambda to buffer incoming requests and prevent throttling.


Iteration 2 — Add SQS buffering and increase Lambda concurrency

Accept the SQS recommendation. Wire: CloudFront → API Gateway → SQS → Lambda → DynamoDB.

Also set Lambda provisioned concurrency to 1,000 in the Node Configuration panel. This keeps 1,000 Lambda instances warm at all times, eliminating cold-start latency up to the expected concurrency.
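As a sanity check on the 1,000 figure, Little's Law relates in-flight requests to arrival rate and average service time. The sketch below is not part of PinPole; the 100 ms average Lambda duration is an assumed value for illustration — substitute your measured duration:

```python
import math

def required_concurrency(rps: float, avg_duration_s: float) -> int:
    """Little's Law: concurrent in-flight requests = arrival rate x service time."""
    return math.ceil(rps * avg_duration_s)

# Assumed 100 ms average handler duration — tune to your measured value.
warm_instances = required_concurrency(10_000, 0.100)
print(warm_instances)  # 1000
```

At 10,000 RPS and 100 ms per invocation, roughly 1,000 invocations are in flight at any instant, which is why the warm pool is sized to 1,000.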

Stop and restart required

Stop and restart the simulation after changing concurrency settings — resuming a paused run will not reflect the new warm pool size.
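To see why the queue prevents throttling rather than merely delaying the problem, here is a toy discrete-time sketch (not PinPole's simulation model) of a burst being absorbed by an SQS-style buffer and drained at Lambda's fixed rate:

```python
from collections import deque

def simulate(arrivals, drain_per_tick):
    """Toy model: the queue absorbs arrivals that exceed the downstream
    drain rate, so excess requests wait instead of being rejected."""
    queue = deque()
    max_depth = 0
    for tick, n in enumerate(arrivals):
        queue.extend([tick] * n)                       # enqueue this tick's arrivals
        for _ in range(min(drain_per_tick, len(queue))):
            queue.popleft()                            # Lambda drains at a fixed rate
        max_depth = max(max_depth, len(queue))
    return max_depth, len(queue)

# Hypothetical burst: 15k/tick for 3 ticks against a 10k/tick Lambda drain rate.
peak_depth, backlog = simulate([15_000, 15_000, 15_000, 5_000, 5_000], 10_000)
```

The backlog grows during the burst and drains afterwards; nothing is dropped. The trade-off is added queueing latency while the backlog clears.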

Run at 10,000 RPS.

Results:

| Observation                  | Detail                                                          |
| ---------------------------- | --------------------------------------------------------------- |
| Lambda cold start error rate | Drops to ~5% (residual fraction exceeding provisioned capacity) |
| DynamoDB                     | ✅ Healthy                                                       |
| API Gateway                  | Showing some throttling at peak                                 |

AI Recommendation: API Gateway is a single point of failure — consider adding a load balancer. Also: increase Lambda memory.


Iteration 3 — Add ALB and multi-region topology

Accept the load balancer recommendation. Add an Application Load Balancer (ALB) before API Gateway. Extend the architecture to two regions (e.g. Mumbai and US East) to distribute load:

CloudFront → ALB
→ Region 1: API Gateway → SQS → Lambda → DynamoDB
→ Region 2: API Gateway → SQS → Lambda → DynamoDB

Also increase Lambda memory to 1,024 MB. Lambda allocates CPU proportionally to memory — increasing memory increases processing speed and reduces execution duration, which reduces both latency and cost.
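The cost claim can be checked against Lambda's GB-second billing model (about $0.0000166667 per GB-second on x86 at the time of writing). The sketch below assumes a CPU-bound handler whose duration halves when memory doubles — an idealisation; your handler's scaling will differ:

```python
GB_SECOND_PRICE = 0.0000166667  # x86 Lambda rate per GB-second (check current pricing)

def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Compute cost of one invocation: memory (GB) x duration (s) x rate."""
    return (memory_mb / 1024) * (duration_ms / 1000) * GB_SECOND_PRICE

# Assumed: duration halves when memory doubles (perfectly CPU-bound handler).
cost_512mb  = invocation_cost(512, 200)   # 512 MB, 200 ms
cost_1024mb = invocation_cost(1024, 100)  # 1,024 MB, 100 ms
```

Under that assumption the per-invocation cost is identical, but latency is halved — which is why raising memory on a CPU-bound function can reduce latency at little or no extra cost. (Real billing rounds duration to the nearest millisecond, and I/O-bound handlers will not speed up proportionally.)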

AI Recommendation: Introduce SNS fan-out between the API Gateway and the SQS queues to enable proper multi-region message distribution.


Iteration 4 — Add SNS fan-out and enable DynamoDB DAX

Accept the SNS recommendation. The topology becomes:

CloudFront → ALB
→ API Gateway → SNS Topic
→ SQS (Region 1) → Lambda (Region 1) → DynamoDB (Region 1)
→ SQS (Region 2) → Lambda (Region 2) → DynamoDB (Region 2)
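Conceptually, fan-out means every message published to the topic is copied to each subscribed queue. A minimal in-memory stand-in for that behaviour (not the actual boto3 wiring) looks like:

```python
class Topic:
    """Minimal stand-in for an SNS topic: every published message
    is delivered to all subscribed queues (fan-out)."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue: list) -> None:
        self.subscribers.append(queue)

    def publish(self, message: dict) -> None:
        for queue in self.subscribers:
            queue.append(message)  # each queue gets its own copy

topic = Topic()
region1_queue, region2_queue = [], []
topic.subscribe(region1_queue)
topic.subscribe(region2_queue)

# Hypothetical message shape — the real payload depends on your API.
topic.publish({"user_id": 42, "action": "recommend"})
```

Each regional SQS queue now holds an independent copy of the message, so each region's Lambda consumer processes it against its own DynamoDB table.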

Enable DAX on both DynamoDB nodes. DAX adds an in-memory cache in front of DynamoDB, dramatically improving read throughput for repeated lookups of the same data — common in recommendation workloads where popular items are read thousands of times per second.
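The behaviour DAX provides is a read-through cache: repeated reads are served from memory, and the table is hit only on a miss or an expired entry. A simplified sketch of that pattern, assuming a hot item read repeatedly within the TTL:

```python
import time

class ReadThroughCache:
    """DAX-style read-through cache sketch: serve repeated reads from memory,
    fall back to the backing table only on a miss or expired entry."""
    def __init__(self, fetch, ttl_s: float = 300.0):
        self.fetch = fetch          # called once per miss (one GetItem)
        self.ttl_s = ttl_s
        self.store = {}
        self.hits = self.misses = 0

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and time.monotonic() - entry[1] < self.ttl_s:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self.fetch(key)     # backing-table read
        self.store[key] = (value, time.monotonic())
        return value

# Hypothetical hot item: one popular movie read 1,000 times.
table = {"movie:1": {"title": "Heat"}}
cache = ReadThroughCache(table.__getitem__)
for _ in range(1_000):
    cache.get("movie:1")
```

One table read serves a thousand requests, which is why DAX effectively removes DynamoDB as a read bottleneck for popularity-skewed workloads like recommendations.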

Switch DynamoDB from provisioned to on-demand capacity mode to avoid hot-partition throttling under variable load.


Iteration 5 — Final simulation at 10,000 RPS

Run at 10,000 RPS with a Spike traffic pattern.

Results:

| Node           | Status                                                |
| -------------- | ----------------------------------------------------- |
| All nodes      | ✅ Healthy                                             |
| Lambda         | Warm instances handling burst; cold start errors near zero |
| DynamoDB       | Scaling automatically; no hot partition signals       |
| Estimated cost | ~$560–$580/day at this load level                     |

AI Recommendations output: Cost optimisation opportunities — API Gateway is the most expensive component. Consider reserved capacity savings plans for predictable load portions.
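For context, the daily estimate converts to a per-request figure. This is pure arithmetic on the simulated numbers above, using the midpoint of the $560–$580 range:

```python
rps = 10_000
daily_cost = 570.0                      # midpoint of the simulated range
requests_per_day = rps * 86_400         # seconds in a day
cost_per_million = daily_cost / requests_per_day * 1_000_000
```

That works out to roughly $0.66 per million requests at sustained full load — a useful baseline when evaluating the reserved-capacity recommendation.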


Final architecture summary

| Component   | Configuration                                          |
| ----------- | ------------------------------------------------------ |
| CloudFront  | Balanced cache mode                                    |
| ALB         | Multi-region routing                                   |
| API Gateway | Two regional instances                                 |
| SNS         | Standard topic, fan-out to two regional SQS queues     |
| SQS         | Standard queues, visibility timeout 60s                |
| Lambda      | 1,024 MB, provisioned concurrency 1,000 (per region)   |
| DynamoDB    | DAX enabled, on-demand capacity mode                   |

Key lessons from this session

  • Lambda throttling is almost always the first bottleneck in serverless API architectures. Address it with SQS buffering and provisioned concurrency before investigating other nodes.
  • Always stop and restart after changing concurrency settings. Resuming a paused simulation will not pick up new concurrency values.
  • API Gateway becomes the final bottleneck at very high RPS once Lambda and DynamoDB are healthy. At that point, the fix is architectural (move to async / add CloudFront) rather than configurational.
  • DAX and on-demand mode together make DynamoDB effectively invisible as a bottleneck for read-heavy workloads. Enable both during design and tune later.
  • The Cloud Terminal was used throughout this session to query Lambda and DynamoDB service limits without leaving PinPole — saving multiple trips to AWS documentation.