Worked Example — High-Scale Recommendation API

This example walks through a complete design-and-simulation session for a high-throughput read API: a movie recommendation service that must sustain 10,000 requests per second. It illustrates how PinPole surfaces bottlenecks iteratively and guides the architecture toward a production-ready state.

note

This example is drawn from a real PinPole simulation session.

Target load: 10,000 RPS
Goal: Zero throttling, sub-200ms p99 latency

Starting architecture

CloudFront → API Gateway → Lambda → DynamoDB

Iteration 1 — Baseline simulation

Run the simulation at 10,000 RPS with a Constant traffic pattern.

Results:

| Node        | Status                                                    |
| ----------- | --------------------------------------------------------- |
| CloudFront  | ✅ Healthy                                                 |
| API Gateway | ✅ Healthy                                                 |
| Lambda      | ❌ Throttling — concurrency limit reached, ~32% error rate |
| DynamoDB    | ✅ Healthy                                                 |

Lambda is the first and only bottleneck. The architecture cannot handle the target load in its current state.

AI Recommendation: Add SQS before Lambda to buffer incoming requests and prevent throttling.


Iteration 2 — Add SQS buffering and increase Lambda concurrency

Accept the SQS recommendation. Wire: CloudFront → API Gateway → SQS → Lambda → DynamoDB.

Also set Lambda provisioned concurrency to 1,000 in the Node Configuration panel. This keeps 1,000 Lambda instances warm at all times, eliminating cold-start latency up to the expected concurrency.
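As a sanity check on the 1,000 figure, Little's Law relates in-flight requests to arrival rate and average service time. The sketch below is not part of PinPole; the 100 ms average Lambda duration is an assumed value for illustration — substitute your measured duration:

```python
import math

def required_concurrency(rps: float, avg_duration_s: float) -> int:
    """Little's Law: concurrent in-flight requests = arrival rate x service time."""
    return math.ceil(rps * avg_duration_s)

# Assumed 100 ms average handler duration — tune to your measured value.
warm_instances = required_concurrency(10_000, 0.100)
print(warm_instances)  # 1000
```

At 10,000 RPS and 100 ms per invocation, roughly 1,000 invocations are in flight at any instant, which is why the warm pool is sized to 1,000.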

Stop and restart required

Stop and restart the simulation after changing concurrency settings — resuming a paused run will not reflect the new warm pool size.
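To see why the queue prevents throttling rather than merely delaying the problem, here is a toy discrete-time sketch (not PinPole's simulation model) of a burst being absorbed by an SQS-style buffer and drained at Lambda's fixed rate:

```python
from collections import deque

def simulate(arrivals, drain_per_tick):
    """Toy model: the queue absorbs arrivals that exceed the downstream
    drain rate, so excess requests wait instead of being rejected."""
    queue = deque()
    max_depth = 0
    for tick, n in enumerate(arrivals):
        queue.extend([tick] * n)                       # enqueue this tick's arrivals
        for _ in range(min(drain_per_tick, len(queue))):
            queue.popleft()                            # Lambda drains at a fixed rate
        max_depth = max(max_depth, len(queue))
    return max_depth, len(queue)

# Hypothetical burst: 15k/tick for 3 ticks against a 10k/tick Lambda drain rate.
peak_depth, backlog = simulate([15_000, 15_000, 15_000, 5_000, 5_000], 10_000)
```

The backlog grows during the burst and drains afterwards; nothing is dropped. The trade-off is added queueing latency while the backlog clears.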

Run at 10,000 RPS.

Results:

| Observation                  | Detail                                                          |
| ---------------------------- | --------------------------------------------------------------- |
| Lambda cold start error rate | Drops to ~5% (residual fraction exceeding provisioned capacity) |
| DynamoDB                     | ✅ Healthy                                                       |
| API Gateway                  | Showing some throttling at peak                                 |

AI Recommendation: API Gateway is a single point of failure — consider adding a load balancer. Also: increase Lambda memory.


Iteration 3 — Add ALB and multi-region topology

Accept the load balancer recommendation. Add an Application Load Balancer (ALB) before API Gateway. Extend the architecture to two regions (e.g. Mumbai and US East) to distribute load:

CloudFront → ALB
→ Region 1: API Gateway → SQS → Lambda → DynamoDB
→ Region 2: API Gateway → SQS → Lambda → DynamoDB

Also increase Lambda memory to 1,024 MB. Lambda allocates CPU proportionally to memory — increasing memory increases processing speed and reduces execution duration, which reduces both latency and cost.
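The cost claim can be checked against Lambda's GB-second billing model (about $0.0000166667 per GB-second on x86 at the time of writing). The sketch below assumes a CPU-bound handler whose duration halves when memory doubles — an idealisation; your handler's scaling will differ:

```python
GB_SECOND_PRICE = 0.0000166667  # x86 Lambda rate per GB-second (check current pricing)

def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Compute cost of one invocation: memory (GB) x duration (s) x rate."""
    return (memory_mb / 1024) * (duration_ms / 1000) * GB_SECOND_PRICE

# Assumed: duration halves when memory doubles (perfectly CPU-bound handler).
cost_512mb  = invocation_cost(512, 200)   # 512 MB, 200 ms
cost_1024mb = invocation_cost(1024, 100)  # 1,024 MB, 100 ms
```

Under that assumption the per-invocation cost is identical, but latency is halved — which is why raising memory on a CPU-bound function can reduce latency at little or no extra cost. (Real billing rounds duration to the nearest millisecond, and I/O-bound handlers will not speed up proportionally.)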

AI Recommendation: Introduce SNS fan-out between the API Gateway and the SQS queues to enable proper multi-region message distribution.


Iteration 4 — Add SNS fan-out and enable DynamoDB DAX

Accept the SNS recommendation. The topology becomes:

CloudFront → ALB
→ API Gateway → SNS Topic
→ SQS (Region 1) → Lambda (Region 1) → DynamoDB (Region 1)
→ SQS (Region 2) → Lambda (Region 2) → DynamoDB (Region 2)
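Conceptually, fan-out means every message published to the topic is copied to each subscribed queue. A minimal in-memory stand-in for that behaviour (not the actual boto3 wiring) looks like:

```python
class Topic:
    """Minimal stand-in for an SNS topic: every published message
    is delivered to all subscribed queues (fan-out)."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue: list) -> None:
        self.subscribers.append(queue)

    def publish(self, message: dict) -> None:
        for queue in self.subscribers:
            queue.append(message)  # each queue gets its own copy

topic = Topic()
region1_queue, region2_queue = [], []
topic.subscribe(region1_queue)
topic.subscribe(region2_queue)

# Hypothetical message shape — the real payload depends on your API.
topic.publish({"user_id": 42, "action": "recommend"})
```

Each regional SQS queue now holds an independent copy of the message, so each region's Lambda consumer processes it against its own DynamoDB table.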

Enable DAX on both DynamoDB nodes. DAX adds an in-memory cache in front of DynamoDB, dramatically improving read throughput for repeated lookups of the same data — common in recommendation workloads where popular items are read thousands of times per second.
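The behaviour DAX provides is a read-through cache: repeated reads are served from memory, and the table is hit only on a miss or an expired entry. A simplified sketch of that pattern, assuming a hot item read repeatedly within the TTL:

```python
import time

class ReadThroughCache:
    """DAX-style read-through cache sketch: serve repeated reads from memory,
    fall back to the backing table only on a miss or expired entry."""
    def __init__(self, fetch, ttl_s: float = 300.0):
        self.fetch = fetch          # called once per miss (one GetItem)
        self.ttl_s = ttl_s
        self.store = {}
        self.hits = self.misses = 0

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and time.monotonic() - entry[1] < self.ttl_s:
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self.fetch(key)     # backing-table read
        self.store[key] = (value, time.monotonic())
        return value

# Hypothetical hot item: one popular movie read 1,000 times.
table = {"movie:1": {"title": "Heat"}}
cache = ReadThroughCache(table.__getitem__)
for _ in range(1_000):
    cache.get("movie:1")
```

One table read serves a thousand requests, which is why DAX effectively removes DynamoDB as a read bottleneck for popularity-skewed workloads like recommendations.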

Switch DynamoDB from provisioned to on-demand capacity mode to avoid hot-partition throttling under variable load.


Iteration 5 — Final simulation at 10,000 RPS

Run at 10,000 RPS with a Spike traffic pattern.

Results:

| Node           | Status                                                |
| -------------- | ----------------------------------------------------- |
| All nodes      | ✅ Healthy                                             |
| Lambda         | Warm instances handling burst; cold start errors near zero |
| DynamoDB       | Scaling automatically; no hot partition signals       |
| Estimated cost | ~$560–$580/day at this load level                     |

AI Recommendations output: Cost optimisation opportunities — API Gateway is the most expensive component. Consider reserved capacity savings plans for predictable load portions.
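For context, the daily estimate converts to a per-request figure. This is pure arithmetic on the simulated numbers above, using the midpoint of the $560–$580 range:

```python
rps = 10_000
daily_cost = 570.0                      # midpoint of the simulated range
requests_per_day = rps * 86_400         # seconds in a day
cost_per_million = daily_cost / requests_per_day * 1_000_000
```

That works out to roughly $0.66 per million requests at sustained full load — a useful baseline when evaluating the reserved-capacity recommendation.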


Final architecture summary

| Component   | Configuration                                          |
| ----------- | ------------------------------------------------------ |
| CloudFront  | Balanced cache mode                                    |
| ALB         | Multi-region routing                                   |
| API Gateway | Two regional instances                                 |
| SNS         | Standard topic, fan-out to two regional SQS queues     |
| SQS         | Standard queues, visibility timeout 60s                |
| Lambda      | 1,024 MB, provisioned concurrency 1,000 (per region)   |
| DynamoDB    | DAX enabled, on-demand capacity mode                   |

Key lessons from this session

  • Lambda throttling is almost always the first bottleneck in serverless API architectures. Address it with SQS buffering and provisioned concurrency before investigating other nodes.
  • Always stop and restart after changing concurrency settings. Resuming a paused simulation will not pick up new concurrency values.
  • API Gateway becomes the final bottleneck at very high RPS once Lambda and DynamoDB are healthy. At that point, the fix is architectural (move to async / add CloudFront) rather than configurational.
  • DAX and on-demand mode together make DynamoDB effectively invisible as a bottleneck for read-heavy workloads. Enable both during design and tune later.
  • The Cloud Terminal was used throughout this session to query Lambda and DynamoDB service limits without leaving PinPole — saving multiple trips to AWS documentation.