KitDocumentation

DataframePivot

Pivoting a dataframe / table results in a new dataframe displaying various numerical summarizations for a specified column grouped by categories / values in one or more other columns. A pivoted table along a single column would result in a table without columns, just rows. And, a pivoted table along two or more columns would have rows and columns. The summarized metrics can be either sum, mean, median, min, max, count, standard deviation, variance, first, and last.

Options

rows: Specifies columns to use as rows in the pivot table
columns: Specifies columns to use as columns in the pivot table
cells: Specifies cells to aggregate
average: Calculates the average of the values in the pivot table
sum: Calculates the sum of the values in the pivot table
mean: Calculates the mean of the values in the pivot table
median: Calculates the median of the values in the pivot table
min: Calculates the minimum of the values in the pivot table
max: Calculates the maximum of the values in the pivot table
count: Counts occurrences of the values in the pivot table
std: Calculates the standard deviation of the values in the pivot table
var: Calculates the variance of the values in the pivot table
first: Gets the first value in each group of the pivot table
last: Gets the last value in each group of the pivot table

Examples

Example 1 - Sum Values by Row Categories

A pivot table can be used to aggregate values based on categorical groupings. In this example, we group all rows by the Location column and calculate the total sum of TransactionAmount for each location. This is useful for understanding total volume by location without adding column-level grouping.
#> DataframePivot --rows Location --cells TransactionAmount --sum
AFLEFT 
bankTransactionsDfPivot = bankTransactionsDf.pivot_table(index='Location', values='TransactionAmount', aggfunc='sum')
bankTransactionsDfPivot = bankTransactionsDfPivot.reset_index(drop=False) AFRIGHT

Example 2 - Get Max Value by Row and Column Grouping

Pivot tables can also be used with both row and column categories. In this case, we group by Location as the rows and TransactionType as the columns, and compute the maximum AccountBalance within each group. This helps to identify the largest account balances for each type of transaction in every location.
#> DataframePivot --rows Location --columns TransactionType --cells AccountBalance --max
AFLEFT 
bankTransactionsDfPivot = bankTransactionsDf.pivot_table(index='Location', columns='TransactionType', values='AccountBalance', aggfunc='max')
bankTransactionsDfPivot = bankTransactionsDfPivot.reset_index(drop=False) AFRIGHT

Example 3 - Count Occurrences by Multi-Column Groups

We can create pivot tables using multiple columns for both rows and columns. Here, we group by Location (row), and both Channel and TransactionType (columns), counting how many AccountID entries exist in each group. This gives insight into the number of transactions by channel and type at each location.
#> DataframePivot --rows Location --columns Channel TransactionType --cells AccountID --count
AFLEFT 
bankTransactionsDfPivot = bankTransactionsDf.pivot_table(index='Location', columns= [ 'Channel', 'TransactionType' ] , values='AccountID', aggfunc='count')
bankTransactionsDfPivot = bankTransactionsDfPivot.reset_index(drop=False) AFRIGHT

Example 4 - Multi-Level Pivot with Multiple Rows and Columns

Pivot tables can be constructed using several row and column dimensions at once. In this example, we group by CustomerAge and TransactionType (rows), and Channel, LoginAttempts, and CustomerOccupation (columns), then compute the total TransactionAmount in each group. This structure enables fine-grained analysis across behavioral and demographic dimensions.
#> DataframePivot --rows CustomerAge TransactionType --columns Channel LoginAttempts CustomerOccupation --cells TransactionAmount --sum
AFLEFT 
bankTransactionsDfPivot = bankTransactionsDf.pivot_table(index= [ 'CustomerAge', 'TransactionType' ] , columns= [ 'Channel', 'LoginAttempts', 'CustomerOccupation' ] , values='TransactionAmount', aggfunc='sum')
bankTransactionsDfPivot = bankTransactionsDfPivot.reset_index(drop=False) AFRIGHT