
Query Optimization

Learn to write efficient SQL by identifying bottlenecks, reading EXPLAIN plans, designing indexes, and rewriting slow queries. Practice with real-world optimization challenges.

easy

Missing Index on Filter Column

Index · WHERE optimization · B-tree

Missing index usage — full table scan on 500K rows

Slow Query

SELECT * FROM orders
WHERE customer_id = 12345
ORDER BY created_at DESC;

Before

2.3s (Seq Scan, 500K rows)

After

4ms (Index Scan, 38 rows)
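One plausible fix, using the table and column names from the card (the index name is illustrative): a composite B-tree index whose leading column serves the equality filter and whose second column serves the sort.

```sql
-- Composite index: customer_id serves the WHERE filter,
-- created_at DESC lets the ORDER BY read rows pre-sorted.
CREATE INDEX idx_orders_customer_created
  ON orders (customer_id, created_at DESC);
```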

easy

The SELECT * Anti-Pattern

SELECT optimization · I/O reduction · Covering Index

Unnecessary columns fetched — table has 40 columns including large TEXT/BLOB fields

Slow Query

SELECT *
FROM products
WHERE category_id = 5
ORDER BY price DESC
LIMIT 20;

Before

180ms (reads 40 cols, 6.2MB)

After

12ms (reads 3 cols, 0.4MB)

medium

Correlated Subquery N+1 Problem

JOIN vs Subquery · N+1 pattern · GROUP BY

N+1 query pattern — correlated subquery executes once per row (100K customers = 100K subqueries)

Slow Query

SELECT c.name,
  (SELECT COUNT(*)
   FROM orders o
   WHERE o.customer_id = c.id) AS order_count
FROM customers c
WHERE c.status = 'active';

Before

12.8s (100K subqueries)

After

340ms (single Hash Join + Aggregate)
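A single-pass rewrite consistent with the card's numbers: one LEFT JOIN plus GROUP BY replaces 100K correlated subquery executions, and COUNT(o.id) still returns 0 for active customers with no orders.

```sql
-- One Hash Join + Aggregate instead of a subquery per row.
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.status = 'active'
GROUP BY c.id, c.name;
```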

medium

Function on Indexed Column

WHERE optimization · Index · Sargable predicates

Function wrapping indexed column — prevents index usage, forces full table scan

Slow Query

SELECT *
FROM transactions
WHERE EXTRACT(YEAR FROM created_at) = 2024
  AND EXTRACT(MONTH FROM created_at) = 3;

Before

3.1s (Seq Scan on 2M rows)

After

25ms (Index Range Scan)
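The sargable rewrite: express "March 2024" as a half-open range on the raw column, so a plain B-tree index on created_at can be range-scanned.

```sql
-- Half-open range: no function wraps created_at, so the
-- planner can use an index range scan.
SELECT *
FROM transactions
WHERE created_at >= DATE '2024-03-01'
  AND created_at <  DATE '2024-04-01';
```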

medium

N+1 Query from Application Code

N+1 pattern · JOIN vs Subquery · Round-trip reduction

N+1 query pattern — 501 database round-trips instead of 1

Slow Query

-- Query 1: fetch orders
SELECT id, total FROM orders
WHERE customer_id = 42 LIMIT 500;

-- Then for EACH order (x500 round-trips):
-- SELECT * FROM order_items
--   WHERE order_id = ?;

Before

4.2s (501 queries, network overhead)

After

45ms (1 query, Hash Join)
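A one-round-trip sketch that preserves the original 500-order cap by limiting in a derived table before joining in the items:

```sql
-- One query instead of 501: cap the orders first, then join.
SELECT o.id, o.total, oi.*
FROM (
  SELECT id, total FROM orders
  WHERE customer_id = 42
  LIMIT 500
) o
JOIN order_items oi ON oi.order_id = o.id;
```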

hard

DISTINCT on Large JOIN

JOIN vs Subquery · EXISTS optimization · Deduplication

DISTINCT after JOIN builds massive result set then deduplicates — O(n*m) then sort

Slow Query

SELECT DISTINCT c.id, c.name, c.email
FROM customers c
JOIN orders o ON o.customer_id = c.id;

Before

2.8s (2.3M intermediate rows, then sort)

After

180ms (semi-join, early exit)
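The semi-join rewrite the "After" line describes: EXISTS asks only "does at least one order exist?", so the planner can stop at the first match and never builds the 2.3M-row intermediate result.

```sql
-- Semi-join: early exit per customer, nothing to deduplicate.
SELECT c.id, c.name, c.email
FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
```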

medium

OR Preventing Index Usage

WHERE optimization · UNION ALL · Index

OR between differently-indexed columns forces sequential scan or expensive bitmap merge

Slow Query

SELECT id, name, email
FROM users
WHERE email = 'test@example.com'
   OR phone = '+1234567890';

Before

890ms (Seq Scan on 5M rows)

After

2ms (two Index Scans)
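One common rewrite, assuming separate indexes exist on email and phone: split the OR into two index-backed branches. UNION (rather than UNION ALL) removes the rare row that matches both conditions.

```sql
-- Two Index Scans instead of one Seq Scan / bitmap merge.
SELECT id, name, email FROM users
WHERE email = 'test@example.com'
UNION
SELECT id, name, email FROM users
WHERE phone = '+1234567890';
```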

hard

Pagination with OFFSET

Pagination · Cursor-based · OFFSET anti-pattern

OFFSET scans and discards 20,000 rows — gets slower on every page

Slow Query

SELECT id, title, created_at
FROM posts
ORDER BY created_at DESC
LIMIT 20 OFFSET 20000;

Before

8.2s on page 1000 (scans 20K rows)

After

3ms on any page (reads 20 rows)
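A keyset (cursor) pagination sketch: instead of an OFFSET, the application remembers the last created_at it returned (the literal below is a placeholder) and seeks directly past it. If created_at can tie, paginate on the (created_at, id) pair instead.

```sql
-- Seek to the cursor position; reads only the 20 rows returned.
SELECT id, title, created_at
FROM posts
WHERE created_at < '2024-03-01 12:34:56'  -- cursor from the previous page
ORDER BY created_at DESC
LIMIT 20;
```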

hard

Covering Index for Aggregation

Index · Covering Index · Index Only Scan

Reads entire wide table (20 columns, 2M rows) when only 2 columns are needed for the aggregation

Slow Query

SELECT status,
  COUNT(*) AS cnt,
  SUM(total_amount) AS revenue
FROM orders
GROUP BY status;

Before

1.4s (Seq Scan, reads 2M wide rows)

After

120ms (Index Only Scan, compact)

hard

Composite Index Column Order

Index · Composite Index · Sort elimination

Two separate indexes force bitmap merge + sort instead of a single efficient scan

Slow Query

-- Two separate indexes exist:
-- idx_tickets_status (status)
-- idx_tickets_created (created_at)

SELECT id, title, status, created_at
FROM tickets
WHERE status = 'open'
ORDER BY created_at DESC
LIMIT 50;

Before

650ms (Bitmap Scan + Sort)

After

8ms (Index Scan Backward, pre-sorted)
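The fix implied by the "After" line (index name illustrative): replace the two single-column indexes with one composite index, equality column first, sort column second, so rows come out of the index already ordered.

```sql
-- Serves WHERE status = '…' and ORDER BY created_at DESC
-- in a single pre-sorted scan; no bitmap merge, no sort.
CREATE INDEX idx_tickets_status_created
  ON tickets (status, created_at DESC);
```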

easy

Implicit Type Casting Kills Index

WHERE optimization · Type casting · Index

phone is VARCHAR but compared to an integer — PostgreSQL casts every row's phone to numeric, preventing index usage

Slow Query

SELECT * FROM users
WHERE phone = 5551234567;

Before

1.9s (Seq Scan, implicit cast per row)

After

0.3ms (Index Scan, direct match)
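The fix is to compare text to text: quoting the literal removes the per-row cast and restores index eligibility.

```sql
-- String literal matches the VARCHAR column type directly.
SELECT * FROM users
WHERE phone = '5551234567';
```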

easy

LIKE with Leading Wildcard

LIKE optimization · Index · Functional Index

Leading wildcard % prevents B-tree index usage — forces a full table scan

Slow Query

SELECT id, name, email
FROM customers
WHERE email LIKE '%@gmail.com';

Before

2.1s (Seq Scan 3M rows)

After

15ms (Index Scan)

easy

COUNT(*) on Entire Table

Aggregation · Table statistics · Approximate counts

PostgreSQL must do a full table scan for exact COUNT(*) — MVCC means no stored row count

Slow Query

SELECT COUNT(*)
FROM logs;
-- logs table has 50M rows

Before

28s (Seq Scan 50M rows)

After

0.1ms (catalog lookup)
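When an approximate count is acceptable, the planner's own statistics (kept current by autovacuum/ANALYZE) answer instantly, matching the card's "catalog lookup":

```sql
-- Estimated row count from pg_class; no table scan.
SELECT reltuples::bigint AS approx_count
FROM pg_class
WHERE relname = 'logs';
```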

medium

NOT IN with NULL Trap

EXISTS optimization · NULL handling · Anti-join

NOT IN returns no rows if the subquery contains any NULL — also prevents index usage and builds a full hash

Slow Query

SELECT id, name
FROM products
WHERE id NOT IN (
  SELECT product_id FROM order_items
);

Before

4.2s (Hash Anti Join, NULL risk)

After

180ms (Nested Loop Anti Join, index-backed)
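The NULL-safe rewrite the card points to: NOT EXISTS ignores NULLs in order_items.product_id and plans as an index-backed anti-join.

```sql
-- Anti-join with correct NULL semantics; can use an index
-- on order_items(product_id).
SELECT p.id, p.name
FROM products p
WHERE NOT EXISTS (
  SELECT 1 FROM order_items oi WHERE oi.product_id = p.id
);
```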

easy

Unnecessary DISTINCT Hiding a JOIN Bug

EXISTS optimization · Deduplication · JOIN vs Subquery

DISTINCT masks a many-to-many explosion — 100K customers × 5 orders × 3 items = 1.5M rows sorted and deduplicated

Slow Query

SELECT DISTINCT c.id, c.name, c.email
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id;

Before

5.1s (1.5M rows → sort → unique)

After

120ms (semi-join, early exit)

medium

Partial Index for Hot Path

Index · Partial Index · Selective indexing

Full index on status includes 95% completed rows that are never queried on this path — bloated index, more I/O

Slow Query

-- 95% of orders are 'completed', only 2% are 'pending'
-- Full index exists: idx_orders_status (status)

SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at;

Before

320ms (scans 2M-entry index)

After

5ms (scans 40K-entry partial index)
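A partial index covering only the hot path (index name illustrative): it stores just the 2% of pending rows, and indexing created_at also serves the ORDER BY.

```sql
-- Tiny index containing only rows this query can match.
CREATE INDEX idx_orders_pending
  ON orders (created_at)
  WHERE status = 'pending';
```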

hard

Window Function Recomputing on Every Row

Window Functions · N+1 pattern · Partition optimization

Correlated subquery recalculates the average for each row — same average recomputed 500 times per category

Slow Query

SELECT *,
  (SELECT AVG(price) FROM products p2
   WHERE p2.category_id = p.category_id) AS avg_category_price
FROM products p;

Before

3.8s (correlated subquery per row)

After

45ms (single WindowAgg pass)
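The window-function rewrite matching the "After" plan: one WindowAgg pass computes each category's average once and attaches it to every row of that partition.

```sql
-- Single pass; the per-category average is computed once.
SELECT *,
  AVG(price) OVER (PARTITION BY category_id) AS avg_category_price
FROM products;
```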

hard

Expensive CTE Evaluated Multiple Times

CTE optimization · FILTER clause · Single-pass aggregation

In PostgreSQL <12, CTEs are optimization fences — the aggregation runs 3 times. In 12+, a CTE referenced more than once is still materialized by default

Slow Query

WITH order_stats AS (
  SELECT customer_id,
    COUNT(*) AS cnt,
    SUM(total_amount) AS revenue
  FROM orders
  GROUP BY customer_id
)
SELECT 'high_value' AS segment, COUNT(*) FROM order_stats WHERE revenue > 10000
UNION ALL
SELECT 'medium_value', COUNT(*) FROM order_stats WHERE revenue BETWEEN 1000 AND 10000
UNION ALL
SELECT 'low_value', COUNT(*) FROM order_stats WHERE revenue < 1000;

Before

2.4s (3x aggregation passes)

After

420ms (single pass + conditional counts)
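A single-pass sketch using FILTER: the aggregation over orders runs once, and the three segments become conditional counts. Note the result shape changes from three rows to one row of three columns, which a consumer may need to account for.

```sql
-- One aggregation pass + conditional counts, no CTE reuse.
SELECT
  COUNT(*) FILTER (WHERE revenue > 10000)                AS high_value,
  COUNT(*) FILTER (WHERE revenue BETWEEN 1000 AND 10000) AS medium_value,
  COUNT(*) FILTER (WHERE revenue < 1000)                 AS low_value
FROM (
  SELECT customer_id, SUM(total_amount) AS revenue
  FROM orders
  GROUP BY customer_id
) order_stats;
```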

medium

UPDATE Without Index on WHERE

UPDATE optimization · Batch processing · Lock contention

No index on (status, created_at) — UPDATE does a sequential scan locking rows as it goes, blocking other queries

Slow Query

UPDATE orders
SET status = 'archived'
WHERE created_at < '2022-01-01'
  AND status = 'completed';

Before

45s (Seq Scan + 800K row locks)

After

200ms per batch (Index Scan + 5K locks)

medium

Excessive JOIN Columns in GROUP BY

GROUP BY · Primary Key optimization · Hash memory

Grouping by 7 columns creates a huge hash key — slow hashing and high memory usage

Slow Query

SELECT c.id, c.name, c.email, c.phone,
  c.address, c.city, c.state,
  COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name, c.email, c.phone,
  c.address, c.city, c.state;

Before

1.8s (HashAggregate with 7-col key)

After

650ms (HashAggregate with 1-col key)

easy

Sorting Without Index Support

Sort elimination · Index · LIMIT optimization

No index on created_at — sorts entire 500K-row table in memory just to return 10 rows

Slow Query

SELECT id, name, created_at
FROM products
ORDER BY created_at DESC
LIMIT 10;

Before

420ms (Seq Scan + Sort 500K rows)

After

0.5ms (Index Scan, 10 rows)

hard

Materialized View for Dashboard Queries

Materialized View · Caching · Dashboard optimization

Aggregates 5M rows on every request — 50 identical expensive queries per second

Slow Query

-- Dashboard query runs every page load (50 req/sec)
SELECT DATE_TRUNC('day', created_at) AS day,
  COUNT(*) AS orders,
  SUM(total_amount) AS revenue,
  AVG(total_amount) AS avg_order
FROM orders
WHERE created_at > NOW() - INTERVAL '90 days'
GROUP BY 1
ORDER BY 1;

Before

3.2s per request × 50 rps

After

2ms per request (pre-computed)

easy

HAVING vs WHERE Placement

WHERE optimization · HAVING vs WHERE · Filter pushdown

HAVING filters after aggregation — groups ALL categories first, then discards most results

Slow Query

SELECT category_id, COUNT(*) AS cnt
FROM products
GROUP BY category_id
HAVING category_id IN (1, 2, 3);

Before

180ms (groups all 200 categories)

After

12ms (groups only 3 categories)
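The pushdown rewrite: the predicate doesn't depend on the aggregate, so it belongs in WHERE, pruning rows before grouping.

```sql
-- Filter first, then aggregate only the three categories asked for.
SELECT category_id, COUNT(*) AS cnt
FROM products
WHERE category_id IN (1, 2, 3)
GROUP BY category_id;
```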

medium

Redundant Subquery in FROM

Subquery simplification · Dead column elimination · HAVING

Subquery computes MAX(created_at) for every customer but it's never used — wasted CPU and memory

Slow Query

SELECT sub.customer_id, sub.total_orders
FROM (
  SELECT customer_id, COUNT(*) AS total_orders,
    MAX(created_at) AS last_order
  FROM orders
  GROUP BY customer_id
) sub
WHERE sub.total_orders > 5;

Before

680ms (computes unused MAX per group)

After

410ms (aggregate only what's needed)

hard

Multi-Column IN vs EXISTS

EXISTS optimization · Multi-column lookup · Semi-join

Multi-column IN builds a full hash set of all alert pairs — the planner can't drive the composite check efficiently through an index

Slow Query

SELECT * FROM inventory
WHERE (warehouse_id, product_id) IN (
  SELECT warehouse_id, product_id
  FROM stock_alerts
  WHERE alert_type = 'low_stock'
);

Before

1.6s (Hash Semi Join, full materialization)

After

85ms (Nested Loop Semi Join, index-backed)

medium

String Concatenation in WHERE

Sargable predicates · WHERE optimization · Expression index

Concatenation expression on every row prevents any index usage — full table scan

Slow Query

SELECT * FROM customers
WHERE first_name || ' ' || last_name = 'John Smith';

Before

1.2s (Seq Scan, concat per row)

After

0.4ms (Index Scan on composite index)

easy

LEFT JOIN When INNER JOIN Suffices

JOIN optimization · LEFT vs INNER · Query semantics

LEFT JOIN preserves NULLs from customers, but WHERE c.status = 'active' filters NULLs out anyway — contradictory logic wastes planner effort

Slow Query

SELECT o.id, o.total_amount, c.name
FROM orders o
LEFT JOIN customers c ON c.id = o.customer_id
WHERE c.status = 'active';

Before

520ms (Left Join + Filter NULLs)

After

380ms (Hash Join, smaller result set)

hard

DELETE Without Batching

DELETE optimization · Batch processing · Table partitioning

Single massive DELETE holds locks for minutes, bloats the WAL, and leaves 30M dead tuples for VACUUM to clean up

Slow Query

DELETE FROM audit_logs
WHERE created_at < '2023-01-01';
-- Deletes 30M rows in a single transaction

Before

8min (30M rows, WAL bloat, lock contention)

After

0.5s per 10K batch or instant DROP

easy

UNION vs UNION ALL

UNION ALL · Deduplication · Sort elimination

UNION deduplicates by sorting/hashing the entire result set — expensive when duplicates are impossible or acceptable

Slow Query

SELECT name FROM customers_us
UNION
SELECT name FROM customers_eu;

Before

1.2s (Sort + Unique on 2M rows)

After

180ms (Append, no sort)

medium

CASE Inside Aggregate vs FILTER

FILTER clause · Conditional aggregation · PostgreSQL-specific

Three CASE expressions evaluated per row — verbose and harder for the planner to optimize

Slow Query

SELECT
  COUNT(CASE WHEN status = 'active' THEN 1 END) AS active,
  COUNT(CASE WHEN status = 'inactive' THEN 1 END) AS inactive,
  COUNT(CASE WHEN status = 'suspended' THEN 1 END) AS suspended
FROM users;

Before

340ms (3 CASE evals per row)

After

290ms (FILTER, same speed but cleaner + optimizable)
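The FILTER form, semantically identical to the CASE version but stating the intent directly:

```sql
-- Conditional counts via the SQL-standard FILTER clause.
SELECT
  COUNT(*) FILTER (WHERE status = 'active')    AS active,
  COUNT(*) FILTER (WHERE status = 'inactive')  AS inactive,
  COUNT(*) FILTER (WHERE status = 'suspended') AS suspended
FROM users;
```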

hard

Index-Only Scan Blocked by Visibility

VACUUM · Visibility map · Index Only Scan · Autovacuum

An Index Only Scan is planned, but a stale visibility map (too many pages with dead tuples) forces a heap fetch for most rows, erasing the benefit

Slow Query

-- Table has heavy UPDATE/DELETE activity
-- Index exists on (status, total_amount)

VACUUM; -- hasn't run in weeks

SELECT status, SUM(total_amount)
FROM orders
GROUP BY status;

Before

2.1s (Index Scan + heap fetch per row)

After

380ms (Index Only Scan, no heap)

easy

Cartesian Product from Missing JOIN Condition

Cartesian product · JOIN optimization · Query correctness

No join condition between customers and products — creates a Cartesian product (1K customers × 10K products = 10M rows)

Slow Query

SELECT c.name, p.name AS product
FROM customers c, products p
WHERE c.city = 'New York';

Before

45s (10M row cross product)

After

35ms (proper joins, ~2K rows)

medium

Expression Index for Computed Filter

Sargable predicates · Expression index · Date optimization

Wrapping created_at in DATE() prevents index usage — evaluates function on every row

Slow Query

SELECT * FROM events
WHERE DATE(created_at) = '2024-03-15';

Before

1.8s (Seq Scan, DATE() per row)

After

3ms (Index Range Scan)

hard

Aggregation on Joined Table Before JOIN

Pre-aggregation · JOIN optimization · Intermediate result reduction

Three-way JOIN creates a massive intermediate result before aggregating — 100K customers × 5 orders × 3 items = 1.5M rows grouped

Slow Query

SELECT c.name,
  COUNT(oi.id) AS total_items,
  SUM(oi.quantity * oi.price) AS total_spent
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
GROUP BY c.id, c.name;

Before

4.5s (1.5M row intermediate)

After

800ms (100K row intermediate)

hard

LATERAL JOIN for Top-N Per Group

LATERAL JOIN · Top-N per group · Correlated subquery

Correlated subquery in WHERE with LIMIT — hard for planner to optimize, often falls back to nested loop without index

Slow Query

-- Get latest 3 orders per customer
SELECT c.id, c.name, o.*
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.id IN (
  SELECT o2.id FROM orders o2
  WHERE o2.customer_id = c.id
  ORDER BY o2.created_at DESC
  LIMIT 3
);

Before

8.5s (nested correlated subquery)

After

350ms (LATERAL + index scan per customer)
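The LATERAL form the "After" line describes: the limited, ordered lookup runs once per customer and pairs naturally with an index on orders(customer_id, created_at DESC).

```sql
-- Top-3 orders per customer via LATERAL; each inner query is an
-- index-ordered scan that stops after 3 rows.
SELECT c.id, c.name, o.*
FROM customers c
CROSS JOIN LATERAL (
  SELECT * FROM orders o2
  WHERE o2.customer_id = c.id
  ORDER BY o2.created_at DESC
  LIMIT 3
) o;
```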

medium

GIN Index for JSONB Queries

GIN Index · JSONB · NoSQL in SQL

No index on JSONB fields — sequential scan parses JSON on every row

Slow Query

SELECT * FROM events
WHERE payload->>'event_type' = 'purchase'
  AND (payload->>'amount')::numeric > 100;

Before

3.5s (Seq Scan + JSON parse per row)

After

25ms (GIN Index Scan)

medium

Excessive Indexes Slowing Writes

Index maintenance · Write optimization · Index bloat

Every INSERT/UPDATE must maintain 12 indexes — write amplification makes inserts 10x slower than necessary

Slow Query

-- Table has 12 indexes including:
-- idx_orders_1 (customer_id)
-- idx_orders_2 (created_at)
-- idx_orders_3 (status)
-- idx_orders_4 (customer_id, created_at)  -- supersedes idx_1
-- idx_orders_5 (status, created_at)
-- ... 7 more rarely-used indexes

INSERT INTO orders (customer_id, total_amount, status)
VALUES (42, 99.99, 'pending');

Before

8ms per INSERT (12 index updates)

After

2ms per INSERT (5 index updates)

hard

Row-Level Security Overhead

Row-Level Security · Hidden filters · Composite Index

RLS adds a hidden filter evaluated per row — without index on customer_id, it triggers a Seq Scan even if status is indexed

Slow Query

-- RLS policy on orders:
-- USING (customer_id = current_setting('app.customer_id')::int)

SELECT SUM(total_amount)
FROM orders
WHERE status = 'completed';

Before

1.9s (Seq Scan, RLS filter per row)

After

8ms (Index Only Scan covers both filters)

hard

Recursive CTE for Hierarchy vs Closure Table

Recursive CTE · Closure table · Hierarchy optimization

Recursive CTE walks the tree one level at a time — each iteration does an Index Scan. Deep trees (20+ levels) mean 20+ passes

Slow Query

-- Get all descendants of category id=5
WITH RECURSIVE tree AS (
  SELECT id, name, parent_id, 0 AS depth
  FROM categories WHERE id = 5
  UNION ALL
  SELECT c.id, c.name, c.parent_id, t.depth + 1
  FROM categories c
  JOIN tree t ON c.parent_id = t.id
)
SELECT * FROM tree;

Before

450ms (20 recursive iterations)

After

2ms (single index scan on closure table)

medium

Connection Overhead from Short-Lived Queries

Connection pooling · PgBouncer · Latency reduction

Connection setup (TCP handshake + SSL + auth) takes 10x longer than the actual query

Slow Query

-- Application opens new connection per request:
-- connect() → query → disconnect() (x1000/sec)

SELECT balance FROM accounts WHERE id = 42;
-- 2ms query, but 15ms connect + 5ms disconnect = 22ms total

Before

22ms per query (15ms connect overhead)

After

2ms per query (pooled connection)

hard

Bloom Index for Multi-Column Equality

Bloom Index · Multi-column filter · Index strategy

No single B-tree index can serve arbitrary column combinations — creating all permutations is impractical

Slow Query

-- Users search by any combination of columns:
-- WHERE color = 'red' AND size = 'L'
-- WHERE material = 'cotton' AND brand = 'Nike'
-- 6 columns, would need 64 composite indexes

SELECT * FROM products
WHERE color = 'red' AND size = 'L' AND material = 'cotton';

Before

1.4s (Seq Scan, no usable index)

After

45ms (Bloom Index Scan)

medium

EXPLAIN ANALYZE on Write Query in Production

EXPLAIN safety · Production debugging · Transaction rollback

EXPLAIN ANALYZE actually EXECUTES the query — this UPDATE modifies 50K rows in production while profiling

Slow Query

-- Debugging a slow UPDATE in production:
EXPLAIN ANALYZE
UPDATE orders SET status = 'processed'
WHERE id BETWEEN 1 AND 50000;

Before

Modifies 50K rows while profiling!

After

Zero data modification, same plan info
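The standard safe pattern: wrap the profiled write in an explicit transaction and roll it back — you get the same plan and timing output with no committed changes.

```sql
-- Profile the UPDATE, then undo it.
BEGIN;
EXPLAIN ANALYZE
UPDATE orders SET status = 'processed'
WHERE id BETWEEN 1 AND 50000;
ROLLBACK;
```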

hard

Nested Loop on Large Tables

Join strategy · work_mem · Hash Join · Planner hints

Nested Loop does an index lookup per order row — fine for 100 rows, but 200K qualifying rows means 200K index lookups

Slow Query

SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.total_amount > 500;

-- Planner chooses Nested Loop:
-- Seq Scan on orders (filter: total > 500)
--   → Index Scan on customers (per row)

Before

4.2s (200K Nested Loop index lookups)

After

380ms (single Hash Join pass)

easy

Unnecessary ORDER BY in Subquery

Sort elimination · Subquery simplification · Query rewrite

ORDER BY in the subquery is pointless — the outer query re-sorts by a different column anyway

Slow Query

SELECT category_id, avg_price
FROM (
  SELECT category_id, AVG(price) AS avg_price
  FROM products
  GROUP BY category_id
  ORDER BY avg_price DESC
) sub
WHERE avg_price > 100
ORDER BY category_id;

Before

210ms (double sort: inner + outer)

After

130ms (single sort)

hard

SELECT FOR UPDATE Lock Scope

Locking · SKIP LOCKED · Concurrent processing · Queue pattern

FOR UPDATE locks every row in the result set — blocks other workers from processing any pending order

Slow Query

-- Processing pending orders one at a time
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at
FOR UPDATE;
-- Locks ALL pending orders while processing one

Before

Sequential processing (1 worker)

After

Parallel processing (N workers, no contention)
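The queue-worker pattern behind the "After" line: each worker claims one unlocked row and skips rows locked by other workers instead of waiting on them.

```sql
-- Each worker grabs the oldest unclaimed pending order;
-- SKIP LOCKED means no worker ever blocks on another.
SELECT * FROM orders
WHERE status = 'pending'
ORDER BY created_at
FOR UPDATE SKIP LOCKED
LIMIT 1;
```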

hard

GiST Index for Range Queries

GiST Index · Range types · Overlap queries

Two-sided range comparison can't be served by a single B-tree index — at best uses one side, then filters

Slow Query

-- Find all events overlapping a time range
SELECT * FROM events
WHERE start_time <= '2024-03-15 18:00'
  AND end_time >= '2024-03-15 09:00';

Before

1.8s (Seq Scan, two-sided filter)

After

12ms (GiST Index Scan, native overlap)

medium

Unnecessary COALESCE Preventing Index

Sargable predicates · COALESCE · Expression index

COALESCE wraps indexed columns in an expression — no index can match, full table scan required

Slow Query

SELECT * FROM products
WHERE COALESCE(discount_price, price) < 50;

Before

680ms (Seq Scan, COALESCE per row)

After

35ms (Bitmap OR of two Index Scans)

easy

Unbounded SELECT Without LIMIT

LIMIT · API safety · Memory management

No LIMIT clause — returns entire result set, consuming memory on both database and application side

Slow Query

-- API endpoint: GET /api/orders?status=pending
SELECT * FROM orders
WHERE status = 'pending';
-- Returns 150K rows to the application

Before

3.2s (150K rows transferred, 45MB)

After

8ms (50 rows, 0.015MB)

hard

Table Partitioning for Time-Series Data

Table partitioning · Partition pruning · Time-series

Single monolithic table — index scans still touch a huge B-tree, VACUUM takes hours, index bloat compounds

Slow Query

-- 500M rows in a single table, growing daily
SELECT COUNT(*), SUM(amount)
FROM transactions
WHERE created_at BETWEEN '2024-03-01' AND '2024-03-31';

Before

28s (Index Scan on 500M-row B-tree)

After

1.8s (Seq Scan on 15M-row partition)

hard

Parallel Query Underutilization

Parallel query · Multi-core · Configuration tuning

Parallel query disabled or under-configured — large aggregation runs on a single CPU core while 15 cores idle

Slow Query

-- Server has 16 cores but query uses only 1
SET max_parallel_workers_per_gather = 0;

SELECT COUNT(*), AVG(total_amount)
FROM orders
WHERE created_at > '2024-01-01';

Before

4.8s (single-core Seq Scan)

After

1.2s (4 parallel workers)