Why We Do Indexing: Not Just for Speed, but for Cost and Hardware Reality
Author: Shuwen
Table of Contents
- Why We Do Indexing: Not Just for Speed, but for Cost and Hardware Reality
- The Missing Layer: Disk I/O Is the Real Bottleneck
- Indexing Reduces Disk Access — Not Just CPU Time
- Disk Has a Cost — and a Lifespan
- Cloud Reality: I/O Is Money
- A Concrete Example
- The Trade-Off: Index Maintenance Is Worth It
- A Better Way to Answer “Why Indexing?”
- Final Takeaway
Why We Do Indexing: Not Just for Speed, but for Cost and Hardware Reality
As software engineers, we are often asked a familiar question: why do we use database indexing?
The textbook answer is straightforward:
- To make queries faster
- To reduce time complexity from O(n) to O(log n)
- By using data structures such as B-trees or B+-trees
- At the cost of extra storage and maintenance during inserts, updates, and deletes
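To see the textbook answer in action, here is a minimal sketch using Python's built-in sqlite3 module (the users table and email column are invented for this demo):

```python
import sqlite3

# In-memory database; table and column names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(100_000)],
)

query = "SELECT * FROM users WHERE email = 'user42@example.com'"

# No index on email: the planner has to scan the whole table.
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())
# e.g. [(2, 0, 216, 'SCAN users')]

# With a B-tree index on email, the same query becomes an O(log n) search.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())
# e.g. [(3, 0, 63, 'SEARCH users USING INDEX idx_users_email (email=?)')]
```

The exact plan text varies by SQLite version, but the change from SCAN to SEARCH is the textbook speed-up in one line.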
All of this is true. But it is not the full story.
There is a deeper, more practical reason behind indexing — one that matters greatly in real systems, cloud databases, and production costs.
The Missing Layer: Disk I/O Is the Real Bottleneck
Databases ultimately store data on disk.
Even in modern systems with caching and memory layers:
- Memory access ≈ nanoseconds
- Disk access ≈ microseconds (SSD) to milliseconds (HDD)
That puts disk anywhere from thousands to millions of times slower than memory.
When a query has no usable index, the database must:
- Scan rows
- Fetch pages from disk
- Perform repeated I/O operations
Each of these steps translates into physical disk reads.
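To make those repeated reads concrete, here is a back-of-envelope sketch (table size, row size, and page size are all assumed numbers, not measurements):

```python
# Rough cost of an unindexed lookup, measured in disk page reads.
ROWS = 10_000_000        # rows in the table (assumed)
PAGE_SIZE = 8 * 1024     # 8 KB pages, a common default (e.g. PostgreSQL)
ROW_SIZE = 100           # assumed average row size in bytes

rows_per_page = PAGE_SIZE // ROW_SIZE      # ~81 rows fit in one page
pages = -(-ROWS // rows_per_page)          # ceiling division: ~123,457 pages

# A full table scan touches every page; each cold page is one disk read.
print(f"Full scan: ~{pages:,} page reads")
```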
Indexing Reduces Disk Access — Not Just CPU Time
Indexes allow the database engine to:
- Jump directly to relevant data pages
- Read far fewer disk blocks
- Avoid full table scans
This means:
- Fewer disk reads
- Less I/O pressure
- Better cache utilization
- Lower latency
In other words, indexing is fundamentally an I/O optimization, not just a CPU optimization.
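Continuing the back-of-envelope numbers from above: a B+-tree lookup reads roughly one page per tree level, so the lookup that cost ~123,000 page reads as a scan costs only a handful through an index (the fanout below is an assumption; real fanout depends on key and page size):

```python
import math

ROWS = 10_000_000
FANOUT = 200   # assumed keys per B+-tree page

height = math.ceil(math.log(ROWS, FANOUT))    # one page read per level
print(f"Index lookup: ~{height} page reads")  # ~4, versus ~123,457 for the scan
```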
Disk Has a Cost — and a Lifespan
Every disk operation:
- Consumes I/O bandwidth
- Adds wear (SSD endurance is rated in total bytes written, and is consumed mainly by writes)
- Competes with other workloads
On physical hardware, excessive I/O:
- Shortens disk lifespan
- Increases operational risk
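To put a rough number on lifespan: since SSD endurance is rated in terabytes written (TBW), a write-heavy I/O pattern eats into it directly. A sketch with assumed figures:

```python
# Illustrative SSD endurance math; both numbers are assumptions.
TBW_RATING_TB = 600       # drive rated for 600 TB written over its life
WRITES_GB_PER_DAY = 500   # assumed sustained write volume from the workload

days = TBW_RATING_TB * 1000 / WRITES_GB_PER_DAY
print(f"Rated endurance exhausted in ~{days / 365:.1f} years")  # ~3.3 years
```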
And in cloud environments, excessive I/O directly affects billing.
Cloud Reality: I/O Is Money
In managed databases such as Amazon Aurora, pricing includes:
- Storage
- Compute
- I/O operations
Crucially, in its standard (non-I/O-Optimized) configuration, Aurora charges based on the number of disk read and write requests your database issues.
So when a query:
- Scans millions of rows
- Performs unnecessary disk reads
- Repeats this thousands or millions of times
You are not just slowing down your system — you are burning money.
A Concrete Example
Imagine a legacy system with:
- Poor or missing indexes
- Heavy read traffic
- Analytical or reporting queries
If that system generates:
100 billion I/O requests per month
Even a small per-request cost becomes enormous at that scale.
Now introduce:
- Proper indexing
- Query rewrites
- Better access patterns
If indexing reduces disk reads by 80–90%:
- Query latency drops
- CPU usage stabilizes
- Cloud cost drops dramatically
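Putting rough numbers on that (the per-request price below is an assumption based on Aurora Standard's published rate of about $0.20 per million I/O requests; check current pricing for your region and configuration):

```python
# Illustrative Aurora-style I/O billing math.
MONTHLY_IOS = 100_000_000_000   # 100 billion I/O requests per month
PRICE_PER_MILLION = 0.20        # USD, assumed; varies by region and configuration

baseline = MONTHLY_IOS / 1_000_000 * PRICE_PER_MILLION
print(f"Baseline I/O bill: ${baseline:,.0f}/month")   # $20,000/month

for reduction in (0.80, 0.90):
    print(f"{reduction:.0%} fewer I/Os: ${baseline * (1 - reduction):,.0f}/month")
# 80% fewer I/Os: $4,000/month
# 90% fewer I/Os: $2,000/month
```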
This is not theoretical. This happens all the time in real production systems.
The Trade-Off: Index Maintenance Is Worth It
Yes, indexing has costs:
- Extra storage
- Slower writes
- Index maintenance during INSERT / UPDATE / DELETE
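The write-side cost is easy to observe for yourself; here is a minimal sketch with sqlite3 (table and index names invented), timing the same bulk insert with and without secondary indexes to maintain:

```python
import sqlite3
import time

def bulk_insert(with_indexes: bool) -> float:
    """Insert 200k rows, optionally maintaining two secondary indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INT, ts INT)")
    if with_indexes:
        conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
        conn.execute("CREATE INDEX idx_events_ts ON events (ts)")
    rows = [(i % 1000, i) for i in range(200_000)]
    start = time.perf_counter()
    conn.executemany("INSERT INTO events (user_id, ts) VALUES (?, ?)", rows)
    conn.commit()
    return time.perf_counter() - start

print(f"no indexes:  {bulk_insert(False):.3f}s")
print(f"two indexes: {bulk_insert(True):.3f}s")  # slower: every insert updates two B-trees
```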
But in most real-world systems:
- Reads vastly outnumber writes
- Cloud databases charge for I/O
- Disk access dominates cost and latency
So the trade-off strongly favors indexing.
A Better Way to Answer “Why Indexing?”
Instead of saying, “Indexing makes queries faster,” a stronger answer is:
Indexing reduces disk I/O, which improves performance, extends hardware life, and significantly lowers cloud costs.
Final Takeaway
Indexing exists because:
- Disk access is slow
- Disk access is expensive
- Disk access is limited
- Disk access is billed in the cloud
Speed is just the visible benefit. Cost and hardware reality are the hidden reasons.
