Kit Documentation

ColumnRemove

The ColumnRemove kit writes code that removes one or more columns from a table (dataframe). Columns can be identified by name or by zero-based column index. ColumnRemove can also remove all non-numeric columns, or remove every column from a given starting column through the end of the dataframe.
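None of the examples below covers removing columns from a starting column through the end. The following minimal pandas sketch shows one way to express that operation. It is not necessarily the code the kit generates, and the sample data and the chosen starting column (TransactionType) are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; column names mirror those used in the examples.
bankTransactionsDf = pd.DataFrame({
    "TransactionID": [1, 2],
    "CustomerAge": [34, 58],
    "TransactionType": ["Debit", "Credit"],
    "TransactionAmount": [120.5, 80.0],
})

# Assumed starting column: drop it and every column to its right.
start = bankTransactionsDf.columns.get_loc("TransactionType")
bankTransactionsDf.drop(columns=bankTransactionsDf.columns[start:], inplace=True)

print(list(bankTransactionsDf.columns))  # ['TransactionID', 'CustomerAge']
```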

Options

columns: One or more column names and/or zero-based column indices to remove
nonNumeric: When set, removes every column that does not have a numeric data type
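To show how the two options relate, here is a sketch of a hypothetical helper, column_remove, that mimics the pandas code the kit emits in the examples below. The function name, its parameters, and the rule that Python ints are treated as positions while strings are treated as names are all assumptions for illustration, not the kit's actual interface.

```python
import pandas as pd


def column_remove(df, columns=(), non_numeric=False):
    """Hypothetical helper mirroring the ColumnRemove options.

    columns: column names (str) and/or zero-based positions (int) to drop.
    non_numeric: if True, keep only columns with a numeric dtype.
    """
    # Resolve positions to labels up front, so dropping one column
    # cannot shift the position of another.
    labels = [df.columns[c] if isinstance(c, int) else c for c in columns]
    result = df.drop(columns=labels)
    if non_numeric:
        result = result.select_dtypes(include=["number"])
    return result


df = pd.DataFrame({"a": [1], "b": ["x"], "c": [2.5], "d": [True]})
print(list(column_remove(df, ["a", 2]).columns))          # ['b', 'd']
print(list(column_remove(df, non_numeric=True).columns))  # ['a', 'c']
```

Unlike the generated code, this sketch returns a new dataframe instead of modifying df in place.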

Examples

Example 1 - Remove a Single Column

Remove a single column from the dataframe by specifying its name. This is useful when a column is no longer needed for the analysis, or when you want to reduce dimensionality.
#> ColumnRemove CustomerAge
AFLEFT bankTransactionsDf.drop(columns='CustomerAge', inplace=True) AFRIGHT
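The generated line can be tried against a small stand-in for bankTransactionsDf. The data here is hypothetical. Note that pandas raises a KeyError if the named column is not present; passing errors='ignore' to drop turns that case into a no-op.

```python
import pandas as pd

# Hypothetical stand-in for bankTransactionsDf.
bankTransactionsDf = pd.DataFrame({
    "TransactionID": [101, 102, 103],
    "CustomerAge": [34, 58, 41],
    "TransactionAmount": [120.50, 80.00, 15.25],
})

bankTransactionsDf.drop(columns='CustomerAge', inplace=True)
print(list(bankTransactionsDf.columns))  # ['TransactionID', 'TransactionAmount']

# CustomerAge is already gone; without errors='ignore' this would raise KeyError.
bankTransactionsDf.drop(columns='CustomerAge', inplace=True, errors='ignore')
```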

Example 2 - Remove Multiple Columns by Name

Remove more than one column at once by specifying the column names you want to drop. In this case, both TransactionID and TransactionDuration are removed from the dataframe.
#> ColumnRemove TransactionID TransactionDuration
AFLEFT bankTransactionsDf.drop(columns=['TransactionID', 'TransactionDuration'], inplace=True) AFRIGHT
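If you prefer not to modify a dataframe in place, the same removal can be written by reassigning the result of drop. The sketch below uses hypothetical sample data and is an equivalent alternative to the generated code, not what the kit emits.

```python
import pandas as pd

# Hypothetical stand-in for bankTransactionsDf.
bankTransactionsDf = pd.DataFrame({
    "TransactionID": [101, 102],
    "TransactionDuration": [35, 120],
    "TransactionAmount": [120.50, 80.00],
})

# Same effect as the inplace=True call, but drop returns a new frame
# that is bound back to the same name (this style also works in method chains).
bankTransactionsDf = bankTransactionsDf.drop(columns=['TransactionID', 'TransactionDuration'])
print(list(bankTransactionsDf.columns))  # ['TransactionAmount']
```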

Example 3 - Remove Columns by Name and Index

You can mix column names and zero-based column indices in a single command. In this example, TransactionType is removed by name, and the columns at positions 2 and 3 are removed by index.
#> ColumnRemove TransactionType 2 3
AFLEFT bankTransactionsDf.drop(columns=['TransactionType', bankTransactionsDf.columns[2], bankTransactionsDf.columns[3]], inplace=True) AFRIGHT
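In the generated code, the positions are converted to column names before anything is dropped, so indices 2 and 3 refer to the original column order. Had TransactionType been dropped first, everything to its right would have shifted left by one. The column layout below is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for bankTransactionsDf.
bankTransactionsDf = pd.DataFrame({
    "TransactionID": [1],
    "TransactionType": ["Debit"],
    "AccountID": ["A1"],
    "Channel": ["ATM"],
    "TransactionAmount": [9.5],
})

# Build the full list first: columns[2] and columns[3] are looked up
# in the original order, before any column is removed.
to_drop = ['TransactionType', bankTransactionsDf.columns[2], bankTransactionsDf.columns[3]]
print(to_drop)  # ['TransactionType', 'AccountID', 'Channel']

bankTransactionsDf.drop(columns=to_drop, inplace=True)
print(list(bankTransactionsDf.columns))  # ['TransactionID', 'TransactionAmount']
```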

Example 4 - Remove All Non-Numeric Columns

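If your analysis or model requires only numeric data, you can remove all columns that are not numeric. This operation filters the dataframe to retain only columns with numeric data types, such as integers or floats. Unlike the previous examples, the generated code reassigns the dataframe, because select_dtypes returns a new dataframe.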
#> ColumnRemove --nonNumeric
AFLEFT bankTransactionsDf = bankTransactionsDf.select_dtypes(include=['number']) AFRIGHT
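The sketch below, using hypothetical data, shows which dtypes survive. In pandas, boolean columns are not considered numeric, so they are removed along with text columns. Numbers stored as strings have object dtype and are also removed; convert them first (for example with pd.to_numeric) if they should be kept.

```python
import pandas as pd

# Hypothetical stand-in for bankTransactionsDf.
bankTransactionsDf = pd.DataFrame({
    "TransactionAmount": [120.50, 80.00],    # float64 -> kept
    "CustomerAge": [34, 58],                 # int64   -> kept
    "TransactionType": ["Debit", "Credit"],  # object  -> removed
    "IsFlagged": [False, True],              # bool    -> removed
})

bankTransactionsDf = bankTransactionsDf.select_dtypes(include=['number'])
print(list(bankTransactionsDf.columns))  # ['TransactionAmount', 'CustomerAge']
```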