Throwing hardware at a slow database is the most expensive solution
Database performance issues are usually fixable in software (queries, indexes, schema) far more cheaply than in hardware (a bigger instance, more memory). Below are the patterns that disproportionately move the needle.

## EXPLAIN, then EXPLAIN ANALYZE
Every slow query gets EXPLAIN'd before any other action.

What to look for:
- Sequential scan on a large table — usually a missing index. Sometimes intentional (small table, index would be slower).
- Nested loop on big tables — usually a poorly chosen join order, sometimes a missing index on the join key.
- Sort + Limit with many rows scanned — the sort is spilling to disk, or the query is reading more rows than it needs.
- Hash join with hashtable spilling to disk — work_mem too small, or the join is fundamentally large.
- Bitmap heap scan — index in use, but the access pattern is read-many-rows-from-disk. Sometimes fine, sometimes a sign of low selectivity.
EXPLAIN is the plan; EXPLAIN ANALYZE is the actual execution including time per node. Use ANALYZE in non-prod (it actually runs the query).
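A minimal sketch of the workflow, using a hypothetical `orders` table:

```sql
-- Plan only: cheap and safe to run anywhere
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Actual execution with per-node timing and buffer stats;
-- wrap data-modifying statements in a transaction you roll back
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
  UPDATE orders SET status = 'shipped' WHERE customer_id = 42;
ROLLBACK;
```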
## Indexes — the right ones, not all of them
- B-tree — the default. Use for equality and range queries on a column.
- Composite — multi-column index. Order matters: the index is usable only when the query constrains a leftmost prefix of its columns.
- Partial — index only rows matching a predicate (e.g. `WHERE deleted_at IS NULL`). Smaller, faster.
- Expression — index on a function of a column (e.g. `LOWER(email)`). Required for case-insensitive search to use an index.
- GIN / GIST — for full-text, JSONB, geospatial, array containment.
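Creation sketches for each of the above; table and column names are hypothetical:

```sql
-- Composite: serves WHERE org_id = ? AND created_at > ? (leftmost-prefix rule)
CREATE INDEX idx_orders_org_created ON orders (org_id, created_at);

-- Partial: indexes only live rows, so it stays small and hot
CREATE INDEX idx_orders_live ON orders (org_id) WHERE deleted_at IS NULL;

-- Expression: lets WHERE LOWER(email) = ? use an index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- GIN: containment queries (@>) on a JSONB column
CREATE INDEX idx_events_payload ON events USING GIN (payload);
```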
## The cost of indexes
Indexes speed reads and slow writes. Every INSERT, UPDATE, and DELETE must also update every index on the affected table. A team that adds an index per query without thinking is building a write bottleneck.
The discipline:
- Indexes are added with intent, against a specific slow query
- Periodically audit unused indexes via pg_stat_user_indexes and drop indexes that aren't being read (query sketch after this list)
- Prefer composite indexes that serve multiple queries over many single-column indexes
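A sketch of the audit query against pg_stat_user_indexes. One caveat: an index with zero scans may still enforce a unique or primary-key constraint, so check before dropping.

```sql
-- Indexes never scanned since statistics were last reset, largest first
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```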
## Query patterns that disproportionately help
- LIMIT with ORDER BY uses indexes when ORDER BY matches the index
- EXISTS over IN for "does any row match" — stops at first match
- Avoid SELECT * — fetch only columns needed
- Avoid OFFSET on large datasets — use cursor-based pagination instead (sketched after this list)
- Use prepared statements — query plan reuse, parameterisation
- Batch INSERTs — much faster than per-row
- COPY for bulk loads — orders of magnitude faster than INSERTs
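Sketches of the pagination and bulk-load items, with hypothetical table names:

```sql
-- OFFSET pagination: the server still reads and discards all skipped rows
SELECT id, title FROM posts ORDER BY id LIMIT 20 OFFSET 100000;

-- Keyset (cursor) pagination: seeks straight to the page via the index on id
SELECT id, title FROM posts WHERE id > 100020 ORDER BY id LIMIT 20;

-- Batched INSERT: one round trip instead of three
INSERT INTO events (user_id, kind)
VALUES (1, 'click'), (2, 'view'), (3, 'click');

-- COPY for bulk loads (server-side path shown; use \copy in psql
-- to stream a file from the client instead)
COPY events (user_id, kind) FROM '/tmp/events.csv' WITH (FORMAT csv);
```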
## N+1 queries: the pattern that kills web apps
The application loads a list of items, then for each item makes a separate query to fetch related data. 1 + N queries instead of 1 query with a JOIN.
The fix is at the application level (eager loading, JOIN explicitly, batch the children query). The signal: query log full of identical-looking queries with different IDs.
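In SQL terms, with hypothetical orders/order_items tables:

```sql
-- N+1: one query for the list...
SELECT id FROM orders WHERE user_id = 42;
-- ...then one per order, repeated N times with different IDs
SELECT * FROM order_items WHERE order_id = 101;

-- Fix 1: a single JOIN
SELECT o.id, i.*
FROM orders o
JOIN order_items i ON i.order_id = o.id
WHERE o.user_id = 42;

-- Fix 2: batch the child query
SELECT * FROM order_items WHERE order_id = ANY (ARRAY[101, 102, 103]);
```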
## Cache hit ratios
For Postgres:
- Buffer cache hit ratio — should be >99 % on healthy OLTP. Lower means working set doesn't fit in shared_buffers.
- Index cache hit ratio — same target.
If hit ratios are low, either give the database more memory or reduce the working set (better queries, partitioning, archiving).
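One common way to compute both ratios, from the pg_statio_user_tables view (counters are cumulative since the last stats reset):

```sql
SELECT sum(heap_blks_hit)::float
         / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS table_hit_ratio,
       sum(idx_blks_hit)::float
         / NULLIF(sum(idx_blks_hit) + sum(idx_blks_read), 0)   AS index_hit_ratio
FROM pg_statio_user_tables;
```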
## Locking and contention
- Lock waits in the slow query log = contention. Identify the blocking query and the blocked queries (lookup sketched after this list).
- Long-running transactions hold locks. Avoid by keeping transactions short.
- Update-heavy hotspots (a counter that everyone increments) → consider a different design (eventual aggregation, sharded counters).
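A sketch for pairing blocked sessions with their blockers, using pg_blocking_pids (available since Postgres 9.6):

```sql
SELECT blocked.pid     AS blocked_pid,
       blocked.query   AS blocked_query,
       blocking.pid    AS blocking_pid,
       blocking.query  AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid));
```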
## Vacuum and bloat
For Postgres:
- Vacuum reclaims space from dead tuples left behind by UPDATE and DELETE
- Autovacuum is on by default. Tune it per table for heavy-write workloads (example after this list).
- Bloat — table size larger than necessary due to unreclaimed space. Reclaim with pg_repack (rebuilds online) or VACUUM FULL (takes an exclusive lock). Plan for it on big tables.
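A per-table tuning sketch; the table name and values are illustrative, not a recommendation:

```sql
-- Vacuum the hot table after ~2 % of rows change instead of the default 20 %
ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.02,
  autovacuum_vacuum_threshold    = 1000
);
```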
## Connection costs
Each database connection has memory overhead. 200 idle connections are not free.
- Connection pooling (PgBouncer for Postgres) reduces actual connections to the database
- Application connection pools sized to the realistic concurrent active queries, not the theoretical maximum
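A quick way to see how many of those connections are actually doing work, via pg_stat_activity:

```sql
-- Typically most connections are 'idle'; each one still holds server memory
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;
```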
## Specific hot spots to check
- Auto-increment ID hot spots in distributed setups
- Triggers that fire on every row change in bulk operations
- Foreign keys without indexes on the referencing (child) column (fix sketched after this list)
- TEXT columns filtered with `LIKE '%…%'`-style predicates in WHERE without a trigram (pg_trgm) GIN index
- Frequent boolean updates that constantly invalidate caches
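Sketches for the foreign-key and text-search items above, with hypothetical names:

```sql
-- Postgres does not automatically index the referencing side of a foreign
-- key; without this, deletes on the parent seq-scan the child table
CREATE INDEX idx_order_items_order_id ON order_items (order_id);

-- Trigram index so WHERE title ILIKE '%needle%' can use an index
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_posts_title_trgm ON posts USING GIN (title gin_trgm_ops);
```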
## Hardware as the last lever
After software-level tuning has been done, hardware moves the needle:
- More RAM — keeps working set in cache
- Faster disks (NVMe) — reduces I/O wait
- More CPU — helps parallel query execution (available since Postgres 9.6 and expanded in later releases)
In our experience, ~80 % of "the database is slow" issues resolve at the software layer. The other 20 % are genuinely hardware-bound or architectural.
## One pattern we'd warn about
Adding indexes blindly because "more indexes = faster queries". Each index is a write tax. Fewer well-chosen indexes beat many speculative ones.
## One pattern that always pays off
A weekly slow-query review. Top 10 slowest queries, top 10 most-frequent, top 10 by total time. Most teams find low-hanging optimisation in 30 minutes.
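All three lists fall out of pg_stat_statements directly. A sketch (column names as of Postgres 13; older versions use total_time / mean_time):

```sql
-- Top 10 by total time; swap the ORDER BY for mean_exec_time or calls
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 1)  AS mean_ms,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```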
What's the most surprising performance fix you've shipped? And — for the Postgres folks — has pg_stat_statements + pg_hint_plan replaced your need for an APM tool?