Introduction to SQL Indexing

Database performance can make or break an application. Even the most elegantly designed system will frustrate users if queries take too long to execute. Among the most powerful tools in a database administrator's arsenal for performance optimization is proper indexing.

This guide covers SQL indexing from basic concepts to advanced strategies, with practical examples and ways to measure the effect of each index on your queries.

What You'll Learn

This guide covers both fundamental and advanced indexing concepts, with practical examples you can apply immediately. You'll learn to identify when indexes are appropriate, which type to use, and how to measure their impact on performance.

Indexing Fundamentals

At its core, an index is a data structure that improves the speed of data retrieval operations on a database table. Think of it like the index at the back of a book—instead of reading through every page to find information, you can quickly look up specific topics.

How Indexes Work

Without an index, the database engine must perform a full table scan, examining every row to find the matching data. The cost of a scan grows in proportion to the size of the table.

An index creates a separate structure that contains:

  • The indexed column(s) values
  • Pointers to the actual rows in the table

This structure is typically stored in a B-tree format that allows for rapid lookups, like this simplified illustration:

[Figure: Simplified B-tree index structure showing how data is organized for quick retrieval]
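The idea of "indexed values plus pointers to rows" can be sketched in a few lines of Python. This is only a conceptual model: a sorted list with binary search stands in for a real B-tree, and the rows, names, and helper functions are hypothetical.

```python
import bisect

# Hypothetical table: rows stored in insertion order, not sorted by last_name.
rows = [
    {"id": 1, "last_name": "Nguyen"},
    {"id": 2, "last_name": "Smith"},
    {"id": 3, "last_name": "Adams"},
    {"id": 4, "last_name": "Smith"},
    {"id": 5, "last_name": "Lopez"},
]

def full_scan(rows, last_name):
    """Examine every row, as a full table scan would."""
    return [pos for pos, row in enumerate(rows) if row["last_name"] == last_name]

# The "index": sorted (value, row position) pairs, i.e. values plus pointers.
index = sorted((row["last_name"], pos) for pos, row in enumerate(rows))

def index_lookup(index, last_name):
    """Binary-search the sorted index, then collect the matching pointers."""
    i = bisect.bisect_left(index, (last_name, -1))
    positions = []
    while i < len(index) and index[i][0] == last_name:
        positions.append(index[i][1])
        i += 1
    return positions

print(full_scan(rows, "Smith"))
print(index_lookup(index, "Smith"))
```

Both functions return the same row positions, but the lookup touches only a logarithmic number of index entries plus the matches, while the scan touches every row.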

Common Index Types

Different databases support various index types, but most offer these common options:

Index Type | Description | Best Use Cases
B-tree (Balanced tree) | Standard index type that sorts values in a tree structure | General-purpose indexing for equality and range queries
Clustered | Determines physical order of data in table | Primary keys, frequently accessed range queries
Non-clustered | Separate structure with pointers to data | Secondary access paths, covering indexes
Unique | Enforces uniqueness of indexed columns | Primary keys, unique constraints
Composite | Index on multiple columns | Queries filtering on multiple columns together
Full-text | Specialized for text searching | Content searching, document indexing

When to Create Indexes

While indexes can dramatically improve query performance, they aren't free: they consume storage space and add overhead to every write operation. Here's when you should consider creating an index:

Good Index Candidates

  • Columns frequently used in WHERE clauses
  • Columns used in JOIN conditions
  • Columns used in ORDER BY or GROUP BY operations
  • Columns with high cardinality (many unique values)

Poor Index Candidates

  • Small tables (full table scans may be faster)
  • Columns with low cardinality (few unique values)
  • Columns rarely used in queries
  • Write-heavy tables (frequent INSERT/UPDATE/DELETE operations) where index maintenance would outweigh the read benefit
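Cardinality, mentioned in both lists above, can be estimated before creating an index. The following is a minimal sketch using Python's built-in sqlite3 module; the customers table, its columns, and the selectivity helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers (email, country) VALUES (?, ?)",
    [(f"user{i}@example.com", "US" if i % 2 else "CA") for i in range(1000)],
)

def selectivity(conn, table, column):
    """Return distinct values / total rows (1.0 means every value is unique)."""
    # Identifiers are interpolated directly, so only pass trusted names.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total if total else 0.0

print("email:", selectivity(conn, "customers", "email"))
print("country:", selectivity(conn, "customers", "country"))
```

A ratio near 1.0 (email here) suggests a good index candidate, while a ratio near 0 (country here, with only two distinct values) suggests an index on that column alone is unlikely to help much.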

Performance Trade-off

Remember that each index speeds up reads but slows down writes. For write-heavy applications, be particularly selective about which indexes to create.

Creating Effective Indexes

Let's explore how to create different types of indexes with practical SQL examples.

Basic Index Creation

Here's the syntax for creating a simple index:

-- Basic non-clustered index
CREATE INDEX idx_last_name
ON employees (last_name);

-- Unique index
CREATE UNIQUE INDEX idx_employee_id
ON employees (employee_id);

-- Specific index type (syntax varies by database system)
-- PostgreSQL: GIN index over a tsvector expression for full-text search
CREATE INDEX idx_description
ON products USING GIN (to_tsvector('english', description));

Composite (Multi-Column) Indexes

When queries frequently filter on multiple columns together, a composite index can be more efficient than separate indexes:

-- Composite index for filtering by state and city
CREATE INDEX idx_location
ON customers (state, city);

-- This index helps queries like:
SELECT * FROM customers 
WHERE state = 'California' AND city = 'San Francisco';

Column Order Matters!

In a composite index, the order of columns is critical. The index above helps queries that filter on state alone or on state and city together, but generally not queries that filter only on city, because city values are sorted only within each state.
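The effect of column order can be checked directly in a query plan. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for whichever database you run; the table layout and the query_plan helper are assumptions, and other systems word their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, state TEXT, city TEXT)"
)
conn.execute("CREATE INDEX idx_location ON customers (state, city)")

def query_plan(conn, sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text

state_only = query_plan(conn, "SELECT * FROM customers WHERE state = 'California'")
state_city = query_plan(
    conn,
    "SELECT * FROM customers WHERE state = 'California' AND city = 'San Francisco'",
)
city_only = query_plan(conn, "SELECT * FROM customers WHERE city = 'San Francisco'")

for label, plan in [("state", state_only), ("state+city", state_city), ("city", city_only)]:
    print(f"{label}: {plan}")
```

The first two plans mention idx_location, while the city-only query falls back to scanning the table.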

Covering Indexes

A covering index includes all columns referenced in a query, allowing the database to retrieve data directly from the index without accessing the table:

-- Covering index for a specific query pattern
CREATE INDEX idx_product_lookup
ON products (category_id, price, product_name);

-- This query can be resolved entirely from the index
SELECT product_name, price 
FROM products 
WHERE category_id = 5 
ORDER BY price;

Covering indexes can dramatically improve performance by eliminating table lookups.
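Whether a query is actually covered can be confirmed from the plan. This is a small sketch with SQLite via Python's sqlite3 module, assuming a hypothetical products table that also has a description column outside the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER PRIMARY KEY, category_id INTEGER, "
    "price REAL, product_name TEXT, description TEXT)"
)
conn.execute(
    "CREATE INDEX idx_product_lookup ON products (category_id, price, product_name)"
)

def query_plan(conn, sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Every referenced column is in the index, so the table is never touched.
covered = query_plan(
    conn,
    "SELECT product_name, price FROM products WHERE category_id = 5 ORDER BY price",
)
# description is not in the index, so each match needs a table lookup.
not_covered = query_plan(
    conn,
    "SELECT product_name, price, description FROM products "
    "WHERE category_id = 5 ORDER BY price",
)

print(covered)
print(not_covered)
```

SQLite labels the first plan with COVERING INDEX. Adding a single column that is missing from the index is enough to lose that property.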

Advanced Indexing Strategies

Filtered/Partial Indexes

In some database systems, you can create indexes on a subset of rows, which saves space and reduces maintenance overhead:

-- SQL Server filtered index
CREATE INDEX idx_active_orders
ON orders (order_date, customer_id)
WHERE status = 'active';

-- PostgreSQL partial index
CREATE INDEX idx_high_value_orders
ON orders (order_date)
WHERE total_amount > 1000;
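A partial index is only considered when the query's filter matches the index condition. The sketch below shows this with SQLite, which also supports partial indexes, through Python's sqlite3 module; the orders schema is a hypothetical one modeled on the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_date TEXT, status TEXT, total_amount REAL)"
)
# Only rows with status = 'active' get an entry in this index.
conn.execute(
    "CREATE INDEX idx_active_orders ON orders (order_date, customer_id) "
    "WHERE status = 'active'"
)

def query_plan(conn, sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

active_plan = query_plan(
    conn,
    "SELECT order_id, order_date FROM orders "
    "WHERE status = 'active' AND order_date >= '2023-01-01'",
)
shipped_plan = query_plan(
    conn,
    "SELECT order_id, order_date FROM orders "
    "WHERE status = 'shipped' AND order_date >= '2023-01-01'",
)

print(active_plan)
print(shipped_plan)
```

The query for active orders uses idx_active_orders, while the query for shipped orders cannot, because those rows were never indexed.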

Included Columns (SQL Server)

In SQL Server (and in PostgreSQL 11 and later, which supports the same INCLUDE clause), you can add non-key columns to an index to make it covering without widening the index key:

-- Index with included columns
CREATE INDEX idx_order_lookup
ON orders (order_date)
INCLUDE (customer_id, total_amount);

-- This query can use the index exclusively
SELECT order_date, customer_id, total_amount 
FROM orders 
WHERE order_date > '2023-01-01';

Ensuring Indexes Are Used

Creating an index doesn't guarantee the query optimizer will use it. Here are some factors that might prevent index usage:

  • Functions applied to indexed columns
  • Implicit data type conversions
  • Complex expressions in WHERE clauses
  • Small tables, or queries that return a large fraction of the rows, where a table scan is cheaper

For example, this query cannot use an index on last_name effectively:

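-- Index won't be used efficiently: the function hides the indexed column
SELECT * FROM employees
WHERE UPPER(last_name) = 'SMITH';

-- Better version that can use the index
-- (equivalent only if names are stored in a consistent case
-- or the column uses a case-insensitive collation)
SELECT * FROM employees
WHERE last_name = 'Smith';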
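The difference between the two queries is visible in their plans. Here is a minimal sketch using SQLite through Python's sqlite3 module; the employees schema and the query_plan helper are assumptions, and other optimizers may word the result differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
    "first_name TEXT, last_name TEXT)"
)
conn.execute("CREATE INDEX idx_last_name ON employees (last_name)")

def query_plan(conn, sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# The function call hides the column from the index, forcing a scan.
wrapped = query_plan(conn, "SELECT * FROM employees WHERE UPPER(last_name) = 'SMITH'")
# Comparing the bare column lets the planner search the index.
bare = query_plan(conn, "SELECT * FROM employees WHERE last_name = 'Smith'")

print(wrapped)
print(bare)
```

Only the second plan mentions idx_last_name.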

Measuring Index Performance

To determine if your indexes are effective, you need to analyze query execution:

Reading Execution Plans

Most database systems provide tools to visualize query execution plans:

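  • SQL Server: SET SHOWPLAN_ALL ON, or the graphical Execution Plan in Management Studio
  • MySQL: EXPLAIN
  • PostgreSQL: EXPLAIN, or EXPLAIN ANALYZE to run the query and report actual timings
  • Oracle: EXPLAIN PLAN FOR, then DBMS_XPLAN.DISPLAY to view the result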

Here's an example of using EXPLAIN in MySQL:

EXPLAIN SELECT * 
FROM orders 
WHERE customer_id = 12345;

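Look for these key indicators in execution plans. Operator names vary by system: "Index Seek" and "Table Scan" are SQL Server terms, while MySQL reports the access method in the type column, where ALL means a full table scan.

  • "Index Seek" operations (usually good); an "Index Scan" reads the entire index, which is often cheaper than a table scan but is not a targeted lookup
  • "Table Scan" or "Full Table Scan" operations (potentially problematic for large tables)
  • The number of rows examined vs. returned

[Figure: Sample execution plan showing index usage]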

Benchmarking Index Impact

To quantify the impact of an index, you can:

  1. Measure query execution time before adding the index
  2. Create the index
  3. Measure execution time again

-- Before index - measure time (SET STATISTICS TIME is SQL Server syntax)
SET STATISTICS TIME ON;
SELECT * FROM large_table WHERE non_indexed_column = 'value';
SET STATISTICS TIME OFF;

-- Create index
CREATE INDEX idx_column ON large_table (non_indexed_column);

-- After index - measure time
SET STATISTICS TIME ON;
SELECT * FROM large_table WHERE non_indexed_column = 'value';
SET STATISTICS TIME OFF;
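The same before-and-after measurement can be scripted. The sketch below uses SQLite through Python's sqlite3 module with synthetic data; the table contents, the 'value42' filter, and the time_query helper are made up for illustration, and absolute timings will differ on every machine.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER PRIMARY KEY, non_indexed_column TEXT)")
conn.executemany(
    "INSERT INTO large_table (non_indexed_column) VALUES (?)",
    ((f"value{i % 5000}",) for i in range(200_000)),
)

QUERY = "SELECT * FROM large_table WHERE non_indexed_column = 'value42'"

def time_query(conn, sql, repeats=5):
    """Run the query several times; return (best elapsed seconds, row count)."""
    best = float("inf")
    row_count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        row_count = len(conn.execute(sql).fetchall())
        best = min(best, time.perf_counter() - start)
    return best, row_count

before_time, before_rows = time_query(conn, QUERY)
conn.execute("CREATE INDEX idx_column ON large_table (non_indexed_column)")
after_time, after_rows = time_query(conn, QUERY)

print(f"before: {before_time * 1000:.2f} ms ({before_rows} rows)")
print(f"after:  {after_time * 1000:.2f} ms ({after_rows} rows)")
```

Taking the best of several runs reduces noise from caching and other activity. Checking that both runs return the same number of rows guards against comparing two different queries by accident.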

Real-world Testing

Always test index changes in a development environment before applying them to production. The impact can vary significantly depending on data distribution and server configuration.

Index Maintenance

Indexes require ongoing maintenance to remain effective:

Addressing Fragmentation

As data changes, indexes can become fragmented, reducing their efficiency. Regularly check fragmentation levels:

-- SQL Server fragmentation check
SELECT 
    object_name(ips.object_id) AS TableName,
    i.name AS IndexName,
    ips.index_type_desc,
    ips.avg_fragmentation_in_percent
FROM 
    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) ips
JOIN 
    sys.indexes i ON ips.object_id = i.object_id AND ips.index_id = i.index_id
WHERE 
    ips.avg_fragmentation_in_percent > 30
ORDER BY 
    ips.avg_fragmentation_in_percent DESC;

Address fragmentation with reorganization or rebuilding:

-- For moderate fragmentation (5-30%)
ALTER INDEX idx_name ON table_name REORGANIZE;

-- For high fragmentation (>30%)
ALTER INDEX idx_name ON table_name REBUILD;
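When many indexes need attention, the thresholds above can be turned into a small helper that picks the statement for each one. This is a sketch only: the maintenance_statement function and the sample index names are hypothetical, and the 5% and 30% cut-offs are simply the ones used in this section.

```python
def maintenance_statement(index_name, table_name, fragmentation_percent):
    """Return the ALTER INDEX statement for a fragmentation level, or None."""
    if fragmentation_percent > 30:
        action = "REBUILD"        # high fragmentation
    elif fragmentation_percent >= 5:
        action = "REORGANIZE"     # moderate fragmentation
    else:
        return None               # low fragmentation: leave the index alone
    return f"ALTER INDEX {index_name} ON {table_name} {action};"

# Hypothetical output of the fragmentation check: (table, index, percent).
report = [
    ("orders", "idx_order_lookup", 42.5),
    ("customers", "idx_location", 12.0),
    ("products", "idx_product_lookup", 1.3),
]

for table, index, percent in report:
    statement = maintenance_statement(index, table, percent)
    print(statement or f"-- {index}: no action needed")
```

The generated statements use the SQL Server syntax shown above and should be reviewed before they are run.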

Identifying Unused Indexes

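Unused indexes waste space and slow down write operations. Periodically identify and remove them. Note that SQL Server resets its usage counters when the instance restarts, so judge an index only after it has seen a representative workload: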

-- SQL Server unused index query
SELECT 
    o.name AS TableName,
    i.name AS IndexName,
    i.type_desc AS IndexType,
    us.user_seeks,
    us.user_scans,
    us.user_lookups,
    us.user_updates
FROM 
    sys.indexes i
JOIN 
    sys.objects o ON i.object_id = o.object_id
LEFT JOIN 
    sys.dm_db_index_usage_stats us ON i.object_id = us.object_id AND i.index_id = us.index_id
        AND us.database_id = DB_ID() -- usage stats are server-wide; restrict to the current database
WHERE 
    o.type = 'U' -- User tables only
    AND i.type_desc != 'HEAP'
    AND (us.user_seeks = 0 OR us.user_seeks IS NULL)
    AND (us.user_scans = 0 OR us.user_scans IS NULL)
    AND (us.user_lookups = 0 OR us.user_lookups IS NULL)
ORDER BY 
    o.name, i.name;
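The filter in the query above treats a missing usage row (NULL) the same as zero reads. The same rule can be applied to exported results, as in this sketch; the sample rows and the unused_indexes helper are hypothetical.

```python
def unused_indexes(usage_rows):
    """Return (table, index) pairs with no seeks, scans, or lookups recorded."""
    read_columns = ("user_seeks", "user_scans", "user_lookups")
    return [
        (row["TableName"], row["IndexName"])
        for row in usage_rows
        # None mirrors SQL NULL: the index has no usage row at all.
        if all((row.get(column) or 0) == 0 for column in read_columns)
    ]

# Hypothetical rows shaped like the query's result set.
usage_rows = [
    {"TableName": "orders", "IndexName": "idx_order_lookup",
     "user_seeks": 1520, "user_scans": 3, "user_lookups": 0, "user_updates": 900},
    {"TableName": "orders", "IndexName": "idx_old_status",
     "user_seeks": 0, "user_scans": 0, "user_lookups": 0, "user_updates": 900},
    {"TableName": "products", "IndexName": "idx_legacy_sku",
     "user_seeks": None, "user_scans": None, "user_lookups": None, "user_updates": None},
]

print(unused_indexes(usage_rows))
```

An index that is never read but has a high user_updates count, like idx_old_status here, is paying write overhead for no benefit.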

Real-world Case Studies

Case Study: Order Processing System

A retail company's order processing system was experiencing slowdowns during peak hours. The main orders table had grown to over 10 million rows. Here's how indexing solved the problem:

Problem Query

SELECT o.order_id, o.order_date, o.customer_id, c.customer_name, o.total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.status = 'processing'
AND o.order_date >= DATEADD(day, -7, GETDATE())
ORDER BY o.order_date DESC;

The query was taking over 15 seconds to execute. After analyzing the execution plan, we identified these issues:

  1. No index on the status column, forcing a full table scan
  2. Inefficient sorting operation for ORDER BY
  3. Suboptimal join performance

Solution implemented:

-- Created a filtered index covering only orders in 'processing' status
CREATE INDEX idx_processing_orders
ON orders (order_date DESC)
WHERE status = 'processing';

-- Added customer_name as an included column so the join can be answered from the index
CREATE INDEX idx_customer_lookup
ON customers (customer_id)
INCLUDE (customer_name);

After implementing these indexes, the query execution time dropped from over 15 seconds to under 200ms, roughly a 75x improvement.

Case Study: Reporting Database

A financial reporting system was experiencing timeout issues when generating monthly reports. The main transactions table contained 50+ million rows with data going back several years.

The solution involved a combination of strategies:

  1. Partitioning the table by date ranges
  2. Creating appropriate indexes on each partition
  3. Implementing a covering index for commonly run reports

-- Covering index for monthly summaries
CREATE INDEX idx_transaction_reporting
ON transactions (transaction_date, account_id, category_id)
INCLUDE (amount, description);

The result was a 95% reduction in report generation time, bringing the system back within acceptable performance parameters.

Common Indexing Mistakes

Avoid these common pitfalls in your indexing strategy:

Over-indexing

Creating too many indexes can be as problematic as having too few:

  • Every index increases storage requirements
  • Each write operation must update all affected indexes
  • The query optimizer has more work to determine the optimal plan

Aim for a minimal set of indexes that cover your most important queries.

Incorrect Column Order in Composite Indexes

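In a composite index, a reliable starting order for the columns is:

  1. Equality columns first (WHERE column = value)
  2. Then columns used in ORDER BY, when the index should return rows already sorted
  3. Finally, inequality columns (WHERE column > value)

Columns placed after an inequality column cannot narrow the index seek or supply the sort order, so when a query has both a range filter and an ORDER BY on different columns, compare both arrangements in the execution plan.

-- Good composite index order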
CREATE INDEX idx_product_search
ON products (category_id, subcategory_id, price);

-- For queries like:
SELECT * FROM products
WHERE category_id = 5 AND subcategory_id = 10
ORDER BY price;
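Whether an index also supplies the sort order shows up in the plan as the presence or absence of a separate sort step. The sketch below checks this with SQLite through Python's sqlite3 module; the products schema is hypothetical, and the last query (a range filter on subcategory_id) is an added variation rather than part of the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product_id INTEGER PRIMARY KEY, category_id INTEGER, "
    "subcategory_id INTEGER, price REAL, product_name TEXT)"
)

def query_plan(conn, sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

equality_query = (
    "SELECT * FROM products WHERE category_id = 5 AND subcategory_id = 10 "
    "ORDER BY price"
)
range_query = (
    "SELECT * FROM products WHERE category_id = 5 AND subcategory_id > 10 "
    "ORDER BY price"
)

without_index = query_plan(conn, equality_query)
conn.execute(
    "CREATE INDEX idx_product_search ON products (category_id, subcategory_id, price)"
)
with_index = query_plan(conn, equality_query)
range_plan = query_plan(conn, range_query)

print("no index:      ", without_index)
print("equality + sort:", with_index)
print("range + sort:   ", range_plan)
```

With two equality filters, the index delivers rows already ordered by price, so SQLite's temporary B-tree sort step disappears. Once subcategory_id becomes a range condition, price is no longer in order across the matching entries and the sort step returns.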

Not Analyzing Index Usage

Indexes should evolve with your application. Regularly review and adjust your indexing strategy based on:

  • Changing query patterns
  • Growing data volumes
  • New features or requirements

Conclusion

Effective SQL indexing is both an art and a science. It requires understanding your data, your query patterns, and the specific characteristics of your database system. By implementing the strategies outlined in this guide, you can dramatically improve database performance while maintaining a balanced approach that considers both read and write operations.

Remember these key principles:

  • Index the right columns based on query patterns
  • Understand the trade-offs between different index types
  • Measure performance before and after adding indexes
  • Maintain indexes regularly
  • Monitor and evolve your indexing strategy over time

With a thoughtful indexing strategy in place, your database will be prepared to handle growth and deliver consistent performance even as your application scales.