Caching (Why Some Requests Never Reach Your Servers at All)
A first-principles explanation of caching, why it works, where it lives, and why it quietly powers fast systems.
A Strange but Familiar Experience
You open a website.
The first load is slow.
You refresh.
Suddenly, it’s instant.
Nothing changed. No new server was added. No code was deployed.
So why did it become fast?
Because the second time,
the system didn’t do the work at all.
It remembered.
The Core Idea (Without Jargon)
Caching is simple:
If the answer is already known, don’t recompute it.
Instead of:
- recalculating
- hitting databases
- calling downstream services
The system responds immediately.
Caching is not about speed alone.
It’s about avoiding unnecessary work.
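A minimal sketch of that idea in Python. The function names and the dict-based cache are illustrative, not a specific library:

```python
call_count = 0

def slow_square(n):
    """Stand-in for expensive work: a DB query, an API call, a big computation."""
    global call_count
    call_count += 1
    return n * n

cache = {}

def cached_square(n):
    if n in cache:           # answer already known: skip the work entirely
        return cache[n]
    result = slow_square(n)  # cache miss: do the real work, once
    cache[n] = result
    return result

cached_square(7)  # miss: computes and stores 49
cached_square(7)  # hit: remembered, no recomputation
```

The second call never touches `slow_square` — that avoided work, not raw speed, is the point.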
A Simple Story: Asking for Directions
You ask someone for directions.
They explain it carefully.
Five minutes later, you ask again.
They don’t rethink the route. They just repeat the answer.
That repetition is caching.
Where Caching Lives (Not Just One Place)
Caching isn’t a single component.
It’s a behavior that appears at multiple layers.
- Browser cache: images, scripts, pages, API responses
- CDN cache: content served close to users
- Reverse proxy cache: frequently requested responses
- Application cache: computed results kept in memory
- Database cache: query results, indexes, buffers
Most requests are answered
before they reach your core logic.
And that’s intentional.
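A server opts its responses into those upper layers with standard HTTP caching headers. A sketch of what it might send (the values here are illustrative):

```python
# Headers a server might attach so browsers and CDNs can answer
# repeat requests themselves, without contacting the origin.
response_headers = {
    # Any cache (browser, CDN, proxy) may reuse this response for 5 minutes.
    "Cache-Control": "public, max-age=300",
    # Fingerprint of the body; lets clients revalidate cheaply
    # ("has this changed?") instead of re-downloading.
    "ETag": '"abc123"',
}
```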
What a Cached Flow Looks Like
```mermaid
flowchart LR
User --> Cache
Cache -->|Miss| Server
Server --> Cache
Cache -->|Hit| User
```
- Cache miss → real work happens
- Cache hit → instant response
Good systems aim for more hits than misses.
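The same flow, written as a cache-aside read path in Python. `backing_store` and the counters are illustrative stand-ins for the real server and its metrics:

```python
backing_store = {"user:1": "Alice"}  # the "Server" in the diagram
cache = {}
stats = {"hits": 0, "misses": 0}

def get(key):
    if key in cache:
        stats["hits"] += 1             # Cache -->|Hit| User
        return cache[key]
    stats["misses"] += 1               # Cache -->|Miss| Server
    value = backing_store.get(key)     # real work happens here
    cache[key] = value                 # Server --> Cache
    return value

get("user:1")  # miss: falls through to the store
get("user:1")  # hit: served from the cache
```

After these two calls, `stats` shows one miss and one hit; a healthy hit ratio means most calls never reach `backing_store` at all.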
Why Caching Changes Everything
Without caching:
- servers work for every request
- databases get hammered
- latency stacks up
With caching:
- systems feel fast
- load drops dramatically
- failures hurt less
Caching doesn’t just improve performance.
It buys breathing room.
⚠️ Common Trap
Trap: Treating caching as a free performance win.
Caching introduces:
- stale data
- consistency problems
- invalidation complexity
This leads to the classic saying:
“There are only two hard things in computer science:
cache invalidation and naming things.”
Caching shifts complexity — it doesn’t remove it.
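One common way to bound (not eliminate) that staleness is a time-to-live. A minimal sketch, assuming a dict-based cache and an artificially short TTL:

```python
import time

cache = {}
TTL_SECONDS = 0.05  # short for demonstration; real TTLs are minutes or hours

def put(key, value):
    # Store the value alongside its expiry deadline.
    cache[key] = (value, time.monotonic() + TTL_SECONDS)

def get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() >= expires_at:  # stale: evict and treat as a miss
        del cache[key]
        return None
    return value

put("price", 100)
get("price")       # fresh: returns 100
time.sleep(0.06)
get("price")       # expired: returns None, forcing a refresh
```

A TTL trades one problem for another: data can still be wrong for up to `TTL_SECONDS` after it changes. That window is the complexity caching shifted, not removed.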
A Real Failure You’ve Seen
Many large outages weren’t caused by traffic spikes.
They were caused by:
- bad cache keys
- missing invalidation
- stale data being served globally
Users didn’t see errors. They saw wrong information.
Caching failures are subtle — and dangerous.
How This Connects to What We’ve Learned
Reverse Proxy
Proxies often cache responses to protect servers.
https://vivekmolkar.com/posts/reverse-proxy/
Load Balancing
Caching reduces pressure before load balancing even matters.
https://vivekmolkar.com/posts/load-balancing/
Scalability vs Performance
Caching improves performance, but can hide scalability limits.
https://vivekmolkar.com/posts/scalability-vs-performance/
Caching works because the system chooses not to work.
🧪 Mini Exercise
Pick an API or page you know.
- What parts of the response are safe to cache?
- What parts must always be fresh?
- What happens if cached data is wrong?
If you can’t answer these, caching will eventually hurt you.
What’s Coming Next
Caching introduces a dangerous question:
What happens when cached data becomes wrong?
Next: Cache Invalidation
Why making things fast is easy — keeping them correct is hard.
