ColumnRemove
The ColumnRemove kit generates code that removes one or more columns from a table or dataframe. Columns can be identified by name or by index position. ColumnRemove can also remove all non-numeric columns, or remove every column from a starting column through the end of the dataframe.
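The "starting column through the end" mode is not demonstrated in the examples on this page, so the kit syntax for it is not shown here. As a rough pandas equivalent only (not necessarily the code the kit generates), the sketch below drops every column from a chosen starting column onward. The sample data is hypothetical; the column names are borrowed from the examples on this page.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
bankTransactionsDf = pd.DataFrame({
    "TransactionID": ["TX1", "TX2"],
    "TransactionDuration": [81, 141],
    "TransactionType": ["Debit", "Credit"],
    "CustomerAge": [70, 68],
})

# Find the position of the starting column, then take every label
# from that position through the last column.
start = bankTransactionsDf.columns.get_loc("TransactionType")
toRemove = bankTransactionsDf.columns[start:]

bankTransactionsDf.drop(columns=toRemove, inplace=True)
print(list(bankTransactionsDf.columns))  # ['TransactionID', 'TransactionDuration']
```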
Options
columns: Specifies columns to remove
nonNumeric: Specifies whether to remove non-numeric columns
Examples
Example 1 - Remove a Single Column
Remove a single column from the dataframe by specifying its column name. This is helpful when a column is no longer needed in the analysis or is being removed to reduce dimensionality.
#> ColumnRemove CustomerAge
bankTransactionsDf.drop(columns='CustomerAge', inplace=True)
Example 2 - Remove Multiple Columns by Name
Remove more than one column at once by specifying the column names you want to drop. In this case, both TransactionID and TransactionDuration are removed from the dataframe.
#> ColumnRemove TransactionID TransactionDuration
bankTransactionsDf.drop(columns=['TransactionID', 'TransactionDuration'], inplace=True)
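One thing to keep in mind with generated code of this shape: by default, pandas raises a KeyError if any listed column is missing, and the dataframe is left unchanged. The sketch below is one possible way to guard against that in plain pandas. It is not something the kit is documented to do, and the dataframe and the AccountBalance name are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "TransactionID": ["TX1"],
    "TransactionDuration": [81],
    "CustomerAge": [70],
})

# "AccountBalance" is deliberately absent from the dataframe.
requested = ["TransactionID", "TransactionDuration", "AccountBalance"]

dropFailed = False
try:
    df.drop(columns=requested, inplace=True)
except KeyError:
    # pandas rejects the whole call; no columns were removed.
    dropFailed = True

# Keep only the requested names that actually exist, then drop those.
present = [name for name in requested if name in df.columns]
df.drop(columns=present, inplace=True)
print(list(df.columns))  # ['CustomerAge']
```

Passing errors='ignore' to drop is an alternative when silently skipping missing names is acceptable.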
Example 3 - Remove Columns by Name and Index
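You can mix column names and index positions in the same command. In this example, TransactionType is removed by name along with the columns at index positions 2 and 3. The generated code resolves each position through bankTransactionsDf.columns, which is zero-based, so positions 2 and 3 refer to the third and fourth columns.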
#> ColumnRemove TransactionType 2 3
bankTransactionsDf.drop(columns=['TransactionType', bankTransactionsDf.columns[2], bankTransactionsDf.columns[3]], inplace=True)
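To make the position lookup concrete, the sketch below runs the same drop call on a small hypothetical dataframe (the Channel column and all values are made up for illustration). Every position is resolved to a label before anything is removed, so the positions refer to the original column order.

```python
import pandas as pd

# Hypothetical sample data; comments show each column's zero-based position.
bankTransactionsDf = pd.DataFrame({
    "TransactionID": ["TX1", "TX2"],         # position 0
    "TransactionType": ["Debit", "Credit"],  # position 1
    "TransactionDuration": [81, 141],        # position 2
    "CustomerAge": [70, 68],                 # position 3
    "Channel": ["ATM", "Online"],            # position 4
})

# Positions 2 and 3 are looked up against the original column order.
toRemove = [
    "TransactionType",
    bankTransactionsDf.columns[2],
    bankTransactionsDf.columns[3],
]
print(toRemove)  # ['TransactionType', 'TransactionDuration', 'CustomerAge']

bankTransactionsDf.drop(columns=toRemove, inplace=True)
print(list(bankTransactionsDf.columns))  # ['TransactionID', 'Channel']
```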
Example 4 - Remove All Non-Numeric Columns
If your analysis or model requires only numeric data, you can remove all columns that are not numeric. This operation filters the dataframe to retain only columns with numeric data types, such as integers or floats.
#> ColumnRemove --nonNumeric
bankTransactionsDf = bankTransactionsDf.select_dtypes(include=['number'])
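As a small illustration of what select_dtypes keeps, the sketch below applies the same call to a hypothetical dataframe (AccountBalance and all values are made up). Note that the selection is based on each column's dtype, not on its contents, so a column of digits stored as strings is removed along with other text columns.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
bankTransactionsDf = pd.DataFrame({
    "TransactionID": ["TX1", "TX2"],        # text
    "TransactionDuration": [81, 141],       # integer
    "AccountBalance": [5112.21, 13758.91],  # float
    "CustomerAge": ["70", "68"],            # digits stored as text
})

numericDf = bankTransactionsDf.select_dtypes(include=["number"])
print(list(numericDf.columns))  # ['TransactionDuration', 'AccountBalance']
```

If a text column such as CustomerAge should be kept, converting it first (for example with pd.to_numeric) is one option.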