Caching (Why Some Requests Never Reach Your Servers at All)
A first-principles explanation of caching, why it works, where it lives, and why it quietly powers fast systems.
A Strange but Familiar Experience
You open a website.
The first load is slow.
You refresh.
Suddenly, it’s instant.
Nothing changed. No new server was added. No code was deployed.
So why did it become fast?
Because the second time,
the system didn’t do the work at all.
It remembered.
The Core Idea (Without Jargon)
Caching is simple:
If the answer is already known, don’t recompute it.
Instead of:
- recalculating
- hitting databases
- calling downstream services
The system responds immediately.
Caching is not about speed alone.
It’s about avoiding unnecessary work.
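A minimal sketch of that idea in Python. The function names and the dict-based cache are illustrative, not a specific library:

```python
call_count = 0

def slow_square(n):
    """Stand-in for expensive work: a DB query, an API call, a big computation."""
    global call_count
    call_count += 1
    return n * n

cache = {}

def cached_square(n):
    if n in cache:           # answer already known: skip the work entirely
        return cache[n]
    result = slow_square(n)  # cache miss: do the real work, once
    cache[n] = result
    return result

cached_square(7)  # miss: computes and stores 49
cached_square(7)  # hit: remembered, no recomputation
```

The second call never touches `slow_square` — that avoided work, not raw speed, is the point.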
A Simple Story: Asking for Directions
You ask someone for directions.
They explain it carefully.
Five minutes later, you ask again.
They don’t rethink the route. They just repeat the answer.
That repetition is caching.
Where Caching Lives (Not Just One Place)
Caching isn’t a single component.
It’s a behavior that appears at multiple layers.
- Browser cache: images, scripts, pages, API responses
- CDN cache: content served close to users
- Reverse proxy cache: frequently requested responses
- Application cache: computed results kept in memory
- Database cache: query results, indexes, buffers
Most requests are answered
before they reach your core logic.
And that’s intentional.
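A server opts its responses into those upper layers with standard HTTP caching headers. A sketch of what it might send (the values here are illustrative):

```python
# Headers a server might attach so browsers and CDNs can answer
# repeat requests themselves, without contacting the origin.
response_headers = {
    # Any cache (browser, CDN, proxy) may reuse this response for 5 minutes.
    "Cache-Control": "public, max-age=300",
    # Fingerprint of the body; lets clients revalidate cheaply
    # ("has this changed?") instead of re-downloading.
    "ETag": '"abc123"',
}
```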
What a Cached Flow Looks Like
```mermaid
flowchart LR
User --> Cache
Cache -->|Miss| Server
Server --> Cache
Cache -->|Hit| User
```
- Cache miss → real work happens
- Cache hit → instant response
Good systems aim for more hits than misses.
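The same flow, written as a cache-aside read path in Python. `backing_store` and the counters are illustrative stand-ins for the real server and its metrics:

```python
backing_store = {"user:1": "Alice"}  # the "Server" in the diagram
cache = {}
stats = {"hits": 0, "misses": 0}

def get(key):
    if key in cache:
        stats["hits"] += 1             # Cache -->|Hit| User
        return cache[key]
    stats["misses"] += 1               # Cache -->|Miss| Server
    value = backing_store.get(key)     # real work happens here
    cache[key] = value                 # Server --> Cache
    return value

get("user:1")  # miss: falls through to the store
get("user:1")  # hit: served from the cache
```

After these two calls, `stats` shows one miss and one hit; a healthy hit ratio means most calls never reach `backing_store` at all.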
Why Caching Changes Everything
Without caching:
- servers work for every request
- databases get hammered
- latency stacks up
With caching:
- systems feel fast
- load drops dramatically
- failures hurt less
Caching doesn’t just improve performance.
It buys breathing room.
⚠️ Common Trap
Trap: Treating caching as a free performance win.
Caching introduces:
- stale data
- consistency problems
- invalidation complexity
This leads to the classic saying:
“There are only two hard things in computer science:
cache invalidation and naming things.”
Caching shifts complexity — it doesn’t remove it.
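One common way to bound (not eliminate) that staleness is a time-to-live. A minimal sketch, assuming a dict-based cache and an artificially short TTL:

```python
import time

cache = {}
TTL_SECONDS = 0.05  # short for demonstration; real TTLs are minutes or hours

def put(key, value):
    # Store the value alongside its expiry deadline.
    cache[key] = (value, time.monotonic() + TTL_SECONDS)

def get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() >= expires_at:  # stale: evict and treat as a miss
        del cache[key]
        return None
    return value

put("price", 100)
get("price")       # fresh: returns 100
time.sleep(0.06)
get("price")       # expired: returns None, forcing a refresh
```

A TTL trades one problem for another: data can still be wrong for up to `TTL_SECONDS` after it changes. That window is the complexity caching shifted, not removed.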
A Real Failure You’ve Seen
Many large outages weren’t caused by traffic spikes.
They were caused by:
- bad cache keys
- missing invalidation
- stale data being served globally
Users didn’t see errors. They saw wrong information.
Caching failures are subtle — and dangerous.
How This Connects to What We’ve Learned
Reverse Proxy
Proxies often cache responses to protect servers.
https://vivekmolkar.com/posts/reverse-proxy/
Load Balancing
Caching reduces pressure before load balancing even matters.
https://vivekmolkar.com/posts/load-balancing/
Scalability vs Performance
Caching improves performance, but can hide scalability limits.
https://vivekmolkar.com/posts/scalability-vs-performance/
Caching works because the system chooses not to work.
🧪 Mini Exercise
Pick an API or page you know.
- What parts of the response are safe to cache?
- What parts must always be fresh?
- What happens if cached data is wrong?
If you can’t answer these, caching will eventually hurt you.
What’s Coming Next
Caching introduces a dangerous question:
What happens when cached data becomes wrong?
Next: Cache Invalidation
Why making things fast is easy — keeping them correct is hard.
