Creating Nested Row Indexes in Power Query: SQL's ROW_NUMBER Equivalent

If you've worked with SQL, you're likely familiar with the powerful ROW_NUMBER() function, especially when used with PARTITION BY to create grouped sequential numbering. When transitioning to Power Query in Power BI, you might find yourself missing this functionality—but fear not! This article will guide you through implementing an equivalent solution in Power Query's M language.

Understanding the Challenge

In SQL, creating row numbers within groups is straightforward:

SELECT 
    Department,
    Employee,
    ROW_NUMBER() OVER(PARTITION BY Department ORDER BY Salary DESC) as RankInDepartment
FROM Employees

This elegant SQL statement assigns sequential numbers to each employee within their department. But how do we achieve this in Power Query, where window functions aren't available in the same form? And what about when we need multiple levels of nesting?

Basic Row Indexing in Power Query

Before diving into nested indexing, let's review the basics. Power Query offers the Table.AddIndexColumn function to add a simple sequential index:

// Add a simple index starting from 0
Table.AddIndexColumn(YourTable, "Index", 0)

This works well for flat data, but what about when we need indexing within groups? That's where the magic begins.

Creating Group-Based Indexing in Power Query

To create an index that restarts for each group (similar to SQL's ROW_NUMBER() OVER(PARTITION BY)), we'll use a combination of Table.Group and Table.AddIndexColumn.

Step 1: Group Your Data

First, we'll group the data by our partition column(s):

let
    Source = YourTable,
    GroupedRows = Table.Group(
        Source,
        {"Department"},  // Column(s) to group by
        {
            {"AllData", each _, type table}  // Keep all rows as a nested table
        }
    )
in
    GroupedRows

Step 2: Add Index to Each Group

Now, we'll add an index to each nested table:

let
    Source = YourTable,
    GroupedRows = Table.Group(
        Source,
        {"Department"},  // Column(s) to group by
        {
            {"AllData", each Table.AddIndexColumn(_, "RowIndex", 1, 1), type table}
        }
    )
in
    GroupedRows

In this code, we're adding an index column named "RowIndex" to each group, starting from 1 and incrementing by 1.

Step 3: Expand the Grouped Data

Finally, we'll expand the nested tables to get our original data with the new indexed column:

let
    Source = YourTable,
    GroupedRows = Table.Group(
        Source,
        {"Department"},  // Column(s) to group by
        {
            {"AllData", each Table.AddIndexColumn(_, "RowIndex", 1, 1), type table}
        }
    ),
    ExpandedTable = Table.ExpandTableColumn(
        GroupedRows, 
        "AllData", 
        Table.ColumnNames(GroupedRows[AllData]{0})
    )
in
    ExpandedTable

Implementing Nested Indexing

Now, let's tackle the main challenge: creating nested indexes at multiple levels. Imagine we have a dataset with Regions, Departments, and Employees, and we want to number employees within departments and departments within regions.

The Full Nested Indexing Solution

Here's a comprehensive solution for multi-level indexing:

let
    Source = YourTable,
    
    // Step 1: Group by the top level (Region)
    RegionGroups = Table.Group(
        Source,
        {"Region"},
        {
            {"RegionData", each _, type table}
        }
    ),
    
    // Step 2: Add Region Index
    RegionsWithIndex = Table.AddIndexColumn(RegionGroups, "RegionIndex", 1, 1),
    
    // Step 3: For each Region, group by Department
    DepartmentGroups = Table.TransformColumns(
        RegionsWithIndex, 
        {
            {"RegionData", each 
                let
                    // Group the region data by Department
                    DeptGroups = Table.Group(
                        _,
                        {"Department"},
                        {
                            {"DeptData", each _, type table}
                        }
                    ),
                    // Add Department Index (resets for each Region)
                    DeptWithIndex = Table.AddIndexColumn(DeptGroups, "DeptIndex", 1, 1)
                in
                    DeptWithIndex
            }
        }
    ),
    
    // Step 4: For each Department, add Employee Index
    EmployeeGroups = Table.TransformColumns(
        DepartmentGroups,
        {
            {"RegionData", each
                Table.TransformColumns(
                    _,
                    {
                        {"DeptData", each 
                            // Add Employee Index (resets for each Department)
                            Table.AddIndexColumn(_, "EmployeeIndex", 1, 1)
                        }
                    }
                )
            }
        }
    ),
    
    // Step 5: Expand all nested tables to get flat data with indexes
    ExpandRegions = Table.ExpandTableColumn(
        EmployeeGroups,
        "RegionData",
        {"Department", "DeptIndex", "DeptData"}
    ),
    ExpandDepartments = Table.ExpandTableColumn(
        ExpandRegions,
        "DeptData",
        {"Employee", "Salary", "EmployeeIndex"}  // Add all columns from your original table
    )
in
    ExpandDepartments

Understanding How It Works

This solution follows a pattern:

  1. Group data by the outermost level (Region)
  2. Add an index to these groups
  3. Transform each group by grouping it by the next level (Department)
  4. Add an index to these sub-groups
  5. Continue this pattern for all nesting levels
  6. Finally, expand all nested tables to get a flat table with all indices

The key insight is that we're creating a hierarchy of grouped tables, adding indexes at each level, and then flattening everything at the end.

Optimizing Your Nested Indexing Solution

Performance Considerations

While the solution above works well, it can become resource-intensive with large datasets. Here are some optimization tips:

  • Only include necessary columns at each grouping stage to reduce memory usage
  • Consider using Table.Buffer for intermediate tables if you notice performance issues
  • For very large datasets, consider pre-aggregating or filtering data before applying complex indexing

Adding Order By Functionality

In SQL's ROW_NUMBER(), you can specify an ORDER BY clause. We can implement similar functionality in our Power Query solution by sorting before grouping:

// Sort employees by salary within each department before indexing
DeptData = Table.Sort(_, {{"Salary", Order.Descending}}),
// Then add index
IndexedEmployees = Table.AddIndexColumn(DeptData, "EmployeeRank", 1, 1)

Common Pitfalls and Solutions

Losing Column Metadata

When working with multiple Table.Group and Table.ExpandTableColumn operations, you might lose column types and formatting. To preserve them:

// Store column types before transformations
ColumnTypes = Table.ToRecords(
    Table.TransformColumns(
        Table.SelectRows(Source, each false),
        List.Transform(
            Table.ColumnNames(Source), 
            each {_, each _ , type any}
        )
    )
),

// After your final expansion, restore column types
TypedResult = Table.TransformColumnTypes(
    ExpandedTable,
    List.Transform(
        ColumnTypes,
        each {[Column], Value.Type([Value])}
    )
)

Handling Missing Values in Grouping Columns

Null or missing values in grouping columns can cause unexpected results. A good practice is to replace nulls before grouping:

// Replace nulls in grouping columns
CleanedSource = Table.ReplaceValue(
    Source, 
    null, 
    "(Blank)", 
    Replacer.ReplaceValue, 
    {"Region", "Department"}
)

Practical Example: Sales Performance Analysis

Let's apply our nested indexing technique to a real-world scenario: analyzing sales performance across regions, product categories, and products.

let
    Source = #"Sales Data",
    
    // Clean data and prepare for grouping
    CleanedData = Table.ReplaceValue(Source, null, "(Blank)", Replacer.ReplaceValue, {"Region", "Category", "Product"}),
    
    // Sort by Sales Amount to rank by performance
    SortedData = Table.Sort(CleanedData, {{"SalesAmount", Order.Descending}}),
    
    // Group by Region
    RegionGroups = Table.Group(
        SortedData,
        {"Region"},
        {
            {"RegionData", each _, type table},
            {"TotalRegionSales", each List.Sum([SalesAmount]), type number}
        }
    ),
    
    // Add Region Rank (based on TotalRegionSales)
    RegionsWithRank = Table.Sort(RegionGroups, {{"TotalRegionSales", Order.Descending}}),
    RankedRegions = Table.AddIndexColumn(RegionsWithRank, "RegionRank", 1, 1),
    
    // For each Region, group by Category
    CategoryGroups = Table.TransformColumns(
        RankedRegions, 
        {
            {"RegionData", each 
                let
                    // Group by Category within Region
                    CatGroups = Table.Group(
                        _,
                        {"Category"},
                        {
                            {"CategoryData", each _, type table},
                            {"TotalCategorySales", each List.Sum([SalesAmount]), type number}
                        }
                    ),
                    // Sort and Rank Categories within Region
                    SortedCats = Table.Sort(CatGroups, {{"TotalCategorySales", Order.Descending}}),
                    RankedCats = Table.AddIndexColumn(SortedCats, "CategoryRankInRegion", 1, 1)
                in
                    RankedCats
            }
        }
    ),
    
    // For each Category, add Product ranks
    ProductGroups = Table.TransformColumns(
        CategoryGroups,
        {
            {"RegionData", each
                Table.TransformColumns(
                    _,
                    {
                        {"CategoryData", each 
                            let
                                // Create Product rankings within each Category
                                ProductData = Table.Sort(_, {{"SalesAmount", Order.Descending}}),
                                RankedProducts = Table.AddIndexColumn(ProductData, "ProductRankInCategory", 1, 1)
                            in
                                RankedProducts
                        }
                    }
                )
            }
        }
    ),
    
    // Expand all nested tables to get flat data with rankings
    ExpandRegions = Table.ExpandTableColumn(
        ProductGroups,
        "RegionData",
        {"Category", "CategoryRankInRegion", "TotalCategorySales", "CategoryData"}
    ),
    FinalResult = Table.ExpandTableColumn(
        ExpandRegions,
        "CategoryData",
        {"Product", "SalesAmount", "Date", "ProductRankInCategory"}  // Include all your original columns
    )
in
    FinalResult

This example not only adds ranking indexes but also calculates aggregates at each level, providing rich analytical capabilities.

Advanced Techniques

Creating Cumulative Indexes

Sometimes you might need cumulative indexing rather than restarting for each group. Here's how to implement that:

let
    Source = YourTable,
    
    // Add a helper column to track the row count per group
    WithHelper = Table.Group(
        Source,
        {"Department"},  
        {
            {"RowCount", each Table.RowCount(_), type number},
            {"Data", each _, type table}
        }
    ),
    
    // Sort by Department if needed
    Sorted = Table.Sort(WithHelper, {{"Department", Order.Ascending}}),
    
    // Add a cumulative sum of RowCount
    WithCumulative = List.Generate(
        () => [i = 0, row = Sorted{0}, acc = 0, result = []],
        each [i] < Table.RowCount(Sorted),
        each [
            i = [i] + 1, 
            row = Sorted{[i]}, 
            acc = [acc] + [row][RowCount],
            result = [row] & [PrevCumulativeCount = [acc] - [row][RowCount]]
        ],
        each [result]
    ),
    
    // Convert list back to table
    CumulativeTable = Table.FromRecords(WithCumulative),
    
    // Now add indices to each group's data with the appropriate offset
    WithIndices = Table.TransformColumns(
        CumulativeTable,
        {{"Data", (GroupData, RowInfo) => 
            let
                StartIndex = RowInfo[PrevCumulativeCount] + 1,
                WithIndex = Table.AddIndexColumn(
                    GroupData, 
                    "CumulativeIndex", 
                    StartIndex, 
                    1
                )
            in
                WithIndex
        }}
    ),
    
    // Expand the result
    ExpandedResult = Table.ExpandTableColumn(
        WithIndices,
        "Data",
        Table.ColumnNames(WithIndices[Data]{0})
    )
in
    ExpandedResult

Percentile Ranking

Instead of sequential numbers, you might want percentile rankings within groups. Here's how to implement this:

let
    Source = YourTable,
    
    // Group by Department
    Grouped = Table.Group(
        Source,
        {"Department"},
        {
            {"DeptData", each _, type table}
        }
    ),
    
    // For each Department, calculate percentile ranks
    WithPercentiles = Table.TransformColumns(
        Grouped,
        {{"DeptData", each 
            let
                // Sort by Salary
                Sorted = Table.Sort(_, {{"Salary", Order.Descending}}),
                // Add row index
                WithIndex = Table.AddIndexColumn(Sorted, "Index", 0, 1),
                // Get total count
                RowCount = Table.RowCount(WithIndex),
                // Calculate percentile
                WithPercentile = Table.AddColumn(
                    WithIndex, 
                    "PercentileRank", 
                    each [Index] / (RowCount - 1) * 100,
                    type number
                ),
                // Format as percentage
                Formatted = Table.TransformColumns(
                    WithPercentile, 
                    {{"PercentileRank", each Number.Round(_, 2), type number}}
                )
            in
                Formatted
        }}
    ),
    
    // Expand the results
    Result = Table.ExpandTableColumn(
        WithPercentiles,
        "DeptData",
        // Include all your columns plus the newly added ones
        {"Employee", "Salary", "Index", "PercentileRank"}
    )
in
    Result

Comparing with SQL

To see how our Power Query solution compares with SQL, here's a side-by-side comparison:

Task SQL Power Query
Simple Row Number
SELECT *, 
ROW_NUMBER() OVER(ORDER BY ID) as RowNum
FROM MyTable
Table.AddIndexColumn(
    Source, 
    "RowNum", 
    1, 
    1
)
Partitioned Row Number
SELECT *, 
ROW_NUMBER() OVER(
    PARTITION BY Department 
    ORDER BY Salary DESC
) as RankInDept
FROM Employees
// Group, add index, expand
let
    Grouped = Table.Group(
        Source,
        {"Department"},
        {{"Data", each 
            Table.AddIndexColumn(
                Table.Sort(_, {{"Salary", Order.Descending}}),
                "RankInDept", 
                1, 
                1
            ), 
        type table}}
    ),
    Result = Table.ExpandTableColumn(
        Grouped, 
        "Data", 
        Table.ColumnNames(Source) & {"RankInDept"}
    )
in
    Result

While the Power Query solution requires more code, it offers the flexibility to handle complex hierarchical indexing that would be more cumbersome even in SQL.

Conclusion

Creating nested row indexes in Power Query allows you to implement SQL-like ROW_NUMBER() functionality with the added benefit of handling multiple hierarchical levels. By mastering the technique of grouping, adding indices at each level, and then expanding the results, you can unlock powerful analytical capabilities in Power BI.

This approach may seem complex at first, but once you understand the pattern, you can adapt it to a wide range of business scenarios requiring hierarchical numbering, ranking, or sequential identifiers. The flexibility of Power Query's M language allows you to implement sophisticated data transformations that rival or even exceed what's possible in traditional SQL environments.

As with any complex transformation, remember to test your solution with small datasets first, and gradually scale up, monitoring performance along the way. Happy querying!