Capacity Planning for Startups: How Much Infrastructure Do You Actually Need?
Practical capacity planning guide for startups - framework, architecture-specific strategies, cost modeling, and planning for launches and seasonal events.
Capacity planning is the process of determining how much infrastructure you need to handle your current and future traffic. Get it wrong in one direction and you over-provision, spending $20,000/month on infrastructure that handles 20% utilization. Get it wrong in the other direction and your service falls over during the product launch you spent months preparing.
Most startups get capacity planning wrong in predictable ways. This guide covers the common failure modes, a practical five-step framework, architecture-specific strategies, and how to plan for the events that stress-test your capacity assumptions most.
Why Startups Get Capacity Planning Wrong
Over-provisioning failure: “We need to be ready for scale” leads to provisioning enterprise-grade infrastructure from day one. A startup with 1,000 daily users running on 20 EC2 instances and a multi-AZ database cluster is wasting $15,000/month. Cash spent on unused infrastructure is cash not spent on engineers, marketing, or product development.
Under-provisioning failure: “We’ll scale when we need to” leads to outages during the exact moments that matter most - product launches, press mentions, marketing campaigns. A 4-hour outage when TechCrunch links to you is not just a technical failure; it is a marketing catastrophe.
Estimate-without-data failure: Estimating future load based on gut feel rather than measured data produces wide confidence intervals. “I think we’ll get 10x more traffic” could mean 2x or 50x depending on how the campaign performs.
Staging-doesn’t-represent-production failure: Capacity planning based on staging performance data systematically underestimates production resource requirements. Staging has smaller datasets, warmer caches (fewer cold-start effects), and different traffic patterns.
The Five-Step Capacity Planning Framework
Step 1: Measure Your Current Baseline
Before projecting future needs, precisely understand your current resource utilization and performance characteristics.
Collect the following metrics from your current production environment:
- CPU utilization per instance at average and peak load
- Memory utilization per instance at average and peak
- Database connections in use at peak
- Database CPU and IOPS at peak query load
- Network throughput (in and out) at peak
- Request rate at peak (requests per second)
- p50, p95, p99 latency at peak load
Most APM tools (Datadog, New Relic, CloudWatch) provide this data with appropriate time windows. Pull metrics for a “worst recent day” - the highest traffic day in the past 30 days.
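As a quick illustrative sketch (the sample values are hypothetical), the latency percentiles in the checklist can be computed from exported request timings with numpy:

```python
# Hypothetical example: summarizing request latencies (in ms) exported from
# your APM tool into the baseline percentiles listed above.
import numpy as np

latencies_ms = [42, 38, 51, 47, 120, 44, 39, 210, 46, 55, 48, 41, 95, 43, 50]

# np.percentile interpolates linearly between samples by default
p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
print(f"p50={p50:.0f}ms  p95={p95:.0f}ms  p99={p99:.0f}ms")
```

In practice you would pull thousands of samples from the worst recent day rather than a hand-typed list, but the summary step is the same.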
Step 2: Understand Your Traffic Growth Rate
If you have historical data, calculate your week-over-week or month-over-month traffic growth rate.
# Calculate growth rate from traffic data
import numpy as np

# Weekly unique visitors for last 8 weeks
weekly_traffic = [12000, 12500, 13800, 14200, 15100, 16400, 17800, 19200]

# Calculate week-over-week growth rates
growth_rates = [(weekly_traffic[i] / weekly_traffic[i - 1]) - 1
                for i in range(1, len(weekly_traffic))]
avg_weekly_growth = np.mean(growth_rates)
print(f"Average weekly growth rate: {avg_weekly_growth:.1%}")

# Project forward 12 weeks
current = weekly_traffic[-1]
projections = [current * ((1 + avg_weekly_growth) ** w) for w in range(1, 13)]
print(f"12-week projection: {projections[-1]:,.0f} weekly users")
If you do not have growth data (you are pre-launch or early stage), use industry benchmarks: B2B SaaS companies average 5-15% monthly growth in their early stage; B2C consumer apps see more variance, from 0% to 50%+ monthly growth.
Step 3: Model Resource Requirements
Once you know your growth trajectory, project resource requirements. The key relationship is between traffic volume and resource consumption.
Example calculation:
Current state:
- Peak traffic: 500 req/s
- CPU per instance: 60% at 500 req/s
- 4 x c5.xlarge instances (4 vCPU each)
- Total compute: 16 vCPU at 60% utilization
Target state (3x traffic growth over 6 months):
- Projected peak: 1,500 req/s
- Required vCPU: 16 vCPU * (1500/500) = 48 vCPU
- With 25% safety buffer: 60 vCPU
- Instance count: 15 x c5.xlarge, or 8 x c5.2xlarge
Note: This assumes linear scaling, which is only valid for stateless services.
Database and cache scaling require separate calculation.
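The calculation above can be expressed as a small helper. The same linear-scaling caveat applies: this is only valid for stateless compute, where keeping utilization constant means scaling capacity with traffic.

```python
import math

def required_instances(current_vcpu_total, current_peak_rps, target_peak_rps,
                       vcpu_per_instance, safety_buffer=0.25):
    """Linear scaling model for stateless compute (holds utilization constant)."""
    required_vcpu = current_vcpu_total * (target_peak_rps / current_peak_rps)
    buffered_vcpu = required_vcpu * (1 + safety_buffer)
    return math.ceil(buffered_vcpu / vcpu_per_instance)

# The worked example: 16 vCPU handling 500 req/s, projected to 1,500 req/s
print(required_instances(16, 500, 1500, 4))  # 60 vCPU / 4 vCPU = 15 x c5.xlarge
print(required_instances(16, 500, 1500, 8))  # 60 vCPU / 8 vCPU = 8 x c5.2xlarge
```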
Step 4: Run Load Tests to Validate
Mathematical models are a starting point, not a conclusion. Load tests reveal the non-linearities that pure calculation misses:
- Database query performance often degrades non-linearly as data volume grows
- Cache hit rates change as user base grows and diversity of accessed data increases
- Shared resources (databases, message queues) create contention at scale that single-instance testing does not reveal
Run load tests at 100%, 150%, and 200% of your projected peak. If performance degrades beyond your SLOs at 150%, your infrastructure plan needs revision before the load arrives.
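A sketch of the validation check, with hypothetical load-test results and an assumed 300 ms p95 SLO:

```python
# Hypothetical load-test summary: measured p95 latency (ms) at each load
# level, checked against an assumed 300 ms p95 SLO.
SLO_P95_MS = 300

results = {   # % of projected peak -> measured p95 (ms); illustrative numbers
    100: 180,
    150: 320,
    200: 1450,
}

failing = [level for level, p95 in results.items() if p95 > SLO_P95_MS]
if failing:
    print(f"SLO violated at {failing}% of projected peak - revise the plan")
```

Here the plan fails at 150% of projected peak, which per the guidance above means revising the infrastructure plan before the load arrives.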
Step 5: Build Cost Models
Capacity planning is also budget planning. Build a cost model that shows infrastructure costs at different traffic levels.
Traffic tier analysis for application stack:
Current (500 req/s):
Compute: 4 x c5.xlarge ($0.17/hr each) = $0.68/hr = $500/mo
Database: db.r5.large ($0.24/hr) = $175/mo
Cache: cache.r6g.large ($0.15/hr) = $110/mo
Total: ~$785/mo
3x growth (1,500 req/s):
Compute: 12 x c5.xlarge = $1,500/mo
Database: db.r5.2xlarge or 1 primary + 1 read replica = $525/mo
Cache: cache.r6g.xlarge = $200/mo
Total: ~$2,225/mo (cost grows sub-linearly due to economies of scale)
10x growth (5,000 req/s):
Compute: 40 x c5.xlarge or 10 x c5.4xlarge = $5,000/mo
Database: db.r5.4xlarge + 2 read replicas = $1,400/mo
Cache: cache.r6g.2xlarge = $400/mo
Total: ~$6,800/mo
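The compute line in the tiers above is simple instance-hour arithmetic, sketched here with an assumed 730 hours/month and the illustrative on-demand rate used above:

```python
# Sketch of the compute line in the cost tiers: on-demand instance-hours.
HOURS_PER_MONTH = 730

def monthly_compute_cost(instance_count, hourly_rate):
    return instance_count * hourly_rate * HOURS_PER_MONTH

for rps, count in [(500, 4), (1500, 12), (5000, 40)]:
    cost = monthly_compute_cost(count, 0.17)  # c5.xlarge at $0.17/hr
    print(f"{rps} req/s: {count} x c5.xlarge = ${cost:,.0f}/mo")
```

Extending the model with database and cache lines per tier gives finance a single spreadsheet-style view of cost versus traffic.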
Understanding the cost trajectory helps engineering and finance plan infrastructure budgets as the business grows.
Architecture-Specific Capacity Strategies
Monolithic Applications
Monoliths scale primarily by adding instances behind a load balancer. The database is almost always the bottleneck before compute is.
Compute scaling: Add instances horizontally. Ensure the load balancer health check and deregistration delays are tuned for fast failover.
Database scaling path:
- Add indexes (no infrastructure cost)
- Add read replica + read/write split (1.5-2x read capacity)
- Upgrade instance size (2-4x, but expensive)
- Add connection pooler (PgBouncer reduces connection overhead)
- Implement application-layer caching for expensive queries (Redis)
Capacity limits: A well-optimized monolith on a single PostgreSQL primary can handle 1,000-5,000 req/s before you need to consider more significant database scaling. Most startups never reach this limit.
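The read/write split from the scaling path above is usually handled per-query or per-transaction by your ORM; as a minimal illustration of the routing decision (names and prefixes here are illustrative, not a production-ready parser):

```python
# Minimal sketch of an application-level read/write split: route read-only
# statements to a replica and everything else to the primary.
READ_PREFIXES = ("select", "show", "explain")

def route(sql: str) -> str:
    """Return which database role should execute this statement."""
    statement = sql.lstrip().lower()
    return "replica" if statement.startswith(READ_PREFIXES) else "primary"

print(route("SELECT * FROM orders WHERE id = 1"))   # replica
print(route("UPDATE orders SET status = 'paid'"))   # primary
```

One caveat worth designing for: reads that must observe a just-committed write (read-your-writes) should still go to the primary, since replicas lag.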
Microservices Architecture
Microservices offer fine-grained scaling - each service can be scaled independently based on its own load. But this also means capacity planning is more complex: each service needs its own analysis.
Service-level capacity planning: For each service, determine the resources it needs at your projected peak traffic. Services that are on the critical path (user-facing, synchronous) need more headroom than background services.
Dependency bottleneck analysis: In microservices, the slowest service determines the performance of any workflow that depends on it. Map your critical user journeys and identify which services are on the critical path. Capacity-plan those services with extra headroom.
Database per service: Microservices architectures often use separate databases per service. This is better for independence but means you manage more databases. Right-size each database based on the service’s specific load, not a one-size-fits-all configuration.
Serverless Architecture
Serverless (Lambda, Cloud Functions, Cloud Run) has a different capacity model: you do not provision servers; the platform scales automatically. But there are limits and costs that require planning.
Concurrency limits: AWS Lambda has a per-region account concurrency limit (1,000 by default, raisable into the tens of thousands via a quota increase). A spike that triggers thousands of simultaneous Lambda invocations can exhaust your concurrency limit, causing throttling.
Cold starts: Lambda functions that scale from zero have cold starts (250ms-2s depending on runtime). Plan for this in your latency budget, and consider provisioned concurrency for latency-sensitive functions.
Cost modeling: Serverless costs are based on invocations and execution time. Model costs at your projected invocation volume:
Lambda cost at 1M invocations/day:
Request cost: 1,000,000 * $0.0000002 = $0.20/day
Compute cost: 1,000,000 invocations * 500ms * 512MB / 1024 * $0.0000166667 = $4.17/day
Total: ~$130/month at 1M daily invocations
At 10M invocations/day: ~$1,300/month
At 100M invocations/day: ~$13,000/month (compare to EC2 alternatives at this scale)
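The arithmetic above generalizes to a small calculator, using Lambda's published prices of $0.20 per million requests and $0.0000166667 per GB-second:

```python
# Lambda cost model: requests + GB-seconds of compute.
def lambda_daily_cost(invocations, duration_ms, memory_mb):
    request_cost = invocations * 0.20 / 1_000_000
    gb_seconds = invocations * (duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * 0.0000166667
    return request_cost + compute_cost

# The worked example: 1M invocations/day at 500ms and 512MB
daily = lambda_daily_cost(1_000_000, 500, 512)
print(f"${daily:.2f}/day  ~${daily * 30:,.0f}/month")
```

Plugging in your own duration and memory settings shows how quickly compute cost, not request cost, dominates the bill.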
At very high invocation volumes, EC2/container alternatives become more cost-effective. The crossover point depends heavily on function duration, memory, and how steady the load is; for a steady workload with a configuration like the one above, it typically falls in the hundreds of millions of monthly invocations.
Cost Per Transaction: The Right Unit of Measurement
Infrastructure cost in absolute dollars is less meaningful than cost per transaction. Tracking cost/transaction over time shows whether you are becoming more efficient as you scale.
Cost per transaction calculation:
Month 1:
Infrastructure cost: $800/month
Monthly active users: 1,000
Transactions per MAU: 50
Total transactions: 50,000
Cost per transaction: $800 / 50,000 = $0.016
Month 6:
Infrastructure cost: $2,500/month (3x growth in infrastructure)
Monthly active users: 5,000 (5x growth in users)
Transactions per MAU: 55 (slight increase as product matures)
Total transactions: 275,000 (5.5x growth)
Cost per transaction: $2,500 / 275,000 = $0.009 (44% reduction)
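The same calculation as a reusable function, reproducing the two months above:

```python
# Cost per transaction: infrastructure spend divided by transaction volume.
def cost_per_transaction(infra_cost_monthly, mau, transactions_per_mau):
    return infra_cost_monthly / (mau * transactions_per_mau)

month_1 = cost_per_transaction(800, 1_000, 50)     # $0.016
month_6 = cost_per_transaction(2_500, 5_000, 55)   # ~$0.009
print(f"Month 1: ${month_1:.3f}  Month 6: ${month_6:.3f}  "
      f"change: {month_6 / month_1 - 1:.0%}")
```

Tracking this number monthly, alongside absolute spend, is what surfaces the warning sign described above.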
Declining cost per transaction indicates infrastructure efficiency gains from economies of scale, better optimization, and architectural improvements. Increasing cost per transaction is a warning sign: growth is becoming more expensive, not cheaper.
Planning for High-Impact Events
Predictable high-traffic events require explicit capacity planning because auto-scaling cannot respond fast enough to instantaneous traffic spikes.
Product Launches
A product launch generates a burst of traffic that peaks in the first few hours, then stabilizes. Characteristics:
- Traffic spike: 5-20x normal
- Duration of spike: 1-4 hours
- Traffic source: direct (press links, Product Hunt), not organic
Pre-launch actions:
- Run a load test at 20x normal traffic 1 week before launch
- Pre-scale compute to 3x normal capacity before the launch announcement
- Warm up caches by running a data prefetch job the night before
- Verify auto-scaling can handle the expected peak before it arrives
- Move your database to a larger instance class 24 hours before launch (resizing takes a few minutes of downtime for single-AZ, non-Aurora databases; Multi-AZ deployments fail over in about a minute)
Marketing Campaigns
Email campaigns and paid advertising generate predictable traffic spikes tied to campaign send times.
Planning for email campaigns:
- Pull historical open and click-through rates
- Calculate expected simultaneous visitors assuming 20-30% of opens happen in the first hour
- Scale infrastructure before the campaign send, not in response to traffic
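The email calculation above can be sketched as follows; every rate here is an assumption to be replaced with your own historical open and click data:

```python
# Illustrative estimate of the first-hour click surge from an email send.
# All rates are assumptions - substitute your measured historicals.
def first_hour_visitors(list_size, open_rate, click_rate, first_hour_share=0.25):
    opens = list_size * open_rate
    clicks = opens * click_rate
    return clicks * first_hour_share       # clicks landing in the first hour

visitors = first_hour_visitors(200_000, 0.30, 0.10)  # 200k list, 30% open, 10% CTR
print(f"~{visitors:,.0f} visitors in the first hour "
      f"(~{visitors / 3600:.1f} extra landings/sec on average)")
```

Note that email clicks cluster near the send time, so the instantaneous peak can be several times the first-hour average; size your pre-scaled capacity against the peak, not the average.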
Planning for paid campaigns:
- Traffic is more sustained than email but has daytime/nighttime patterns
- Scale based on campaign daily budget and expected CTR
- Monitor in real-time for unexpected viral spread
Black Friday / Seasonal Events
E-commerce and consumer SaaS companies often see 5-30x traffic increases during peak shopping periods.
Capacity planning timeline:
- 8 weeks before: Run load tests at expected peak
- 4 weeks before: Identify and fix bottlenecks found in load tests
- 2 weeks before: Pre-scale infrastructure, verify auto-scaling policies
- 1 week before: Run final validation load test at projected peak
- Day of: Have an engineer on standby monitoring dashboards
Viral Traffic (Unplanned)
Viral traffic is the hardest to plan for because it is unpredictable. The goal is not to predict it but to ensure your auto-scaling is configured to handle 10x normal traffic automatically.
Viral readiness checklist:
- Auto-scaling responds within 3 minutes to traffic increases
- Maximum auto-scaling target is at least 10x current normal
- CDN handles all static asset traffic (reduces origin load dramatically)
- Database connection pooler handles burst connections without exhausting database connections
- On-call engineer can pre-scale manually within 5 minutes of notification
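The connection-pooler item in the checklist comes down to simple arithmetic worth running before a spike does it for you; the numbers below are assumed for illustration:

```python
# Sketch: will burst scaling exhaust database connections? Assumed numbers.
instances_at_peak = 40        # maximum auto-scaled app instance count
pool_size_per_instance = 20   # app-side connection pool per instance
db_max_connections = 500      # e.g. PostgreSQL max_connections

demand = instances_at_peak * pool_size_per_instance
print(f"Potential connections at peak: {demand} vs database limit {db_max_connections}")
if demand > db_max_connections:
    print("Add a pooler (e.g. PgBouncer) or shrink per-instance pools")
```

If demand exceeds the database limit, a pooler like PgBouncer multiplexes many application connections over a small number of server connections, which is exactly the burst protection the checklist calls for.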
Load testing is the foundation of capacity planning. You cannot plan capacity accurately without data on how your system performs at different load levels. Our capacity planning assessment combines load testing with infrastructure cost modeling to give you a clear picture of what you need and when.
Know Your Scaling Ceiling
Book a free 30-minute capacity scope call with our load testing engineers. We review your architecture, traffic expectations, and upcoming scaling events — and scope the load test that will give you the data you need.
Talk to an Expert