Why We Do Indexing: Not Just for Speed, but for Cost and Hardware Reality

Author: Shuwen

As software engineers, we are often asked a familiar question: why do we use database indexing?

The textbook answer is straightforward:

  • To make queries faster
  • To reduce time complexity from O(n) to O(log n)
  • By using data structures such as B-trees or B+-trees
  • At the cost of extra storage and maintenance during inserts, updates, and deletes
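The complexity claim in the textbook answer can be sketched in a few lines of Python: a linear scan versus a binary search over a sorted list, which is the same idea a B-tree applies at each node. All names and sizes here are illustrative.

```python
import bisect

# A toy "table" of one million sorted IDs, standing in for an indexed column.
ids = list(range(0, 2_000_000, 2))

def linear_scan(ids, target):
    """O(n): examine every row until a match, like a full table scan."""
    for i, v in enumerate(ids):
        if v == target:
            return i
    return -1

def index_lookup(ids, target):
    """O(log n): binary search, the core idea behind a B-tree traversal."""
    i = bisect.bisect_left(ids, target)
    if i < len(ids) and ids[i] == target:
        return i
    return -1

# Both find the same row; one examines a million entries, the other ~20.
assert linear_scan(ids, 1_999_998) == index_lookup(ids, 1_999_998) == 999_999
```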

All of this is true. But it is not the full story.

There is a deeper, more practical reason behind indexing — one that matters greatly in real systems, cloud databases, and production costs.

The Missing Layer: Disk I/O Is the Real Bottleneck

Databases ultimately store data on disk.

Even in modern systems with caching and memory layers:

  • Memory access ≈ nanoseconds
  • Disk access ≈ milliseconds

That is a gap of five to six orders of magnitude: disk is roughly a hundred thousand to a million times slower.

When a query has no usable index, the database must:

  • Scan rows
  • Fetch pages from disk
  • Perform repeated I/O operations

Each of these steps translates into physical disk reads.

Indexing Reduces Disk Access — Not Just CPU Time

Indexes allow the database engine to:

  • Jump directly to relevant data pages
  • Read far fewer disk blocks
  • Avoid full table scans

This means:

  • Fewer disk reads
  • Less I/O pressure
  • Better cache utilization
  • Lower latency

In other words, indexing is fundamentally an I/O optimization, not just a CPU optimization.

Disk Has a Cost — and a Lifespan

Every disk read:

  • Consumes I/O bandwidth
  • Adds hardware wear (mechanical wear on HDDs; on SSDs, wear comes mainly from writes and read-disturb handling)
  • Competes with other workloads

On physical hardware, excessive I/O:

  • Shortens disk lifespan
  • Increases operational risk

In cloud environments, it directly affects billing.

Cloud Reality: I/O Is Money

In managed databases such as Amazon Aurora, pricing includes:

  • Storage
  • Compute
  • I/O operations

In its standard (pay-per-request) configuration, Aurora bills for:

  • The number of read and write I/O requests against the storage layer

So when a query:

  • Scans millions of rows
  • Performs unnecessary disk reads
  • Repeats this thousands or millions of times

You are not just slowing down your system — you are burning money.

A Concrete Example

Imagine a legacy system with:

  • Poor or missing indexes
  • Heavy read traffic
  • Analytical or reporting queries

If that system generates:

100 billion I/O requests per month

Even a small per-request cost becomes enormous at that scale.

Now introduce:

  • Proper indexing
  • Query rewrites
  • Better access patterns

If indexing reduces disk reads by 80–90%:

  • Query latency drops
  • CPU usage stabilizes
  • Cloud cost drops dramatically

This is not theoretical. This happens all the time in real production systems.

The Trade-Off: Index Maintenance Is Worth It

Yes, indexing has costs:

  • Extra storage
  • Slower writes
  • Index maintenance during INSERT / UPDATE / DELETE

But in most real-world systems:

  • Reads vastly outnumber writes
  • Cloud databases charge for I/O
  • Disk access dominates cost and latency

So the trade-off strongly favors indexing.
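The trade-off can be sketched with assumed page-I/O costs; every constant here is invented for illustration, but the shape of the result holds for any read-heavy workload:

```python
SCAN_PAGES = 100_000    # assumed pages an unindexed lookup reads (full scan)
SEEK_PAGES = 4          # assumed pages an indexed lookup reads
WRITE_EXTRA = 2         # assumed extra page writes per row to maintain the index

def monthly_page_ios(reads, writes, indexed):
    """Total page I/Os for a month's workload, with or without an index."""
    read_ios = reads * (SEEK_PAGES if indexed else SCAN_PAGES)
    write_ios = writes * (1 + (WRITE_EXTRA if indexed else 0))
    return read_ios + write_ios

# A read-heavy month: 10M lookups, 1M writes.
with_index = monthly_page_ios(10_000_000, 1_000_000, indexed=True)
without_index = monthly_page_ios(10_000_000, 1_000_000, indexed=False)

# The extra write I/O is dwarfed by the read savings.
assert with_index < without_index
```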

A Better Way to Answer “Why Indexing?”

Instead of saying, “Indexing makes queries faster,” a stronger answer is:

Indexing reduces disk I/O, which improves performance, extends hardware life, and significantly lowers cloud costs.

Final Takeaway

Indexing exists because:

  • Disk access is slow
  • Disk access is expensive
  • Disk access is limited
  • Disk access is billed in the cloud

Speed is just the visible benefit. Cost and hardware reality are the hidden reasons.

© 2025 Shuwen