RowCategorize
Creates a new column in a dataframe that assigns a category to each row. This is useful when you have numerical data, or categorical data split into more categories than you need, and you want to give each row a coarser label. For example, if a column contains values from 1 to 100, you may want to group them into the ranges 1-10, 11-20, 21-30, and so on.
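In pandas, this kind of grouping is called binning. The sketch below is a minimal example. It assumes a hypothetical dataframe df with a score column holding the values 1 to 100, and it uses pd.cut with a breakpoint every 10 to produce the ranges from the example. The names df, score, and scoreCategorized are illustrative only.

```python
import pandas as pd

# Hypothetical data: a column whose values run from 1 to 100.
df = pd.DataFrame({"score": range(1, 101)})

# Breakpoints 0, 10, 20, ..., 100 give ten bins.
bins = list(range(0, 101, 10))
# One label per bin: "1-10", "11-20", ..., "91-100".
labels = [f"{low + 1}-{low + 10}" for low in range(0, 100, 10)]

# pd.cut intervals are closed on the right by default: (0, 10], (10, 20], ...
df["scoreCategorized"] = pd.cut(df["score"], bins=bins, labels=labels)
print(df["scoreCategorized"].value_counts(sort=False))
```

Because the intervals are right-closed, whole numbers map to the ranges 1-10, 11-20, and so on. A value of 10 falls in "1-10", not "11-20".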
Options
columns: The column whose rows are categorized. Currently, RowCategorize supports only one column.
categories: A comma-separated list of category names and breakpoints that determines which category each row is assigned to.
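The sketch below illustrates how a categories list can map to the bins and labels passed to pd.cut. parse_categories is a hypothetical helper written for this page, not RowCategorize's own parser. It assumes only the two forms used in the examples:

- entries of the form "name < upper"
- a bare first name followed by entries of the form "lower < name"

Open ends are filled with -sys.float_info.max and sys.float_info.max, as in the generated code.

```python
import sys


def parse_categories(spec):
    """Hypothetical helper: convert a --categories string into (bins, labels)
    for pd.cut. Handles only the two forms shown on this page."""
    bins = [-sys.float_info.max]  # open lower end, as in the generated code
    labels = []
    for entry in (part.strip() for part in spec.split(",")):
        if "<" not in entry:
            # A bare name has no breakpoint of its own (e.g. "newCold").
            labels.append(entry)
            continue
        left, right = (side.strip() for side in entry.split("<", 1))
        try:
            # "name < upper": this category ends at the breakpoint.
            bins.append(float(right))
            labels.append(left)
        except ValueError:
            # "lower < name": the previous category ends at the breakpoint.
            bins.append(float(left))
            labels.append(right)
    # If the last category has no upper breakpoint, leave it open-ended.
    if len(bins) == len(labels):
        bins.append(sys.float_info.max)
    return bins, labels


print(parse_categories("cold < 10, cool < 20, warm < 30, hot < 40"))
print(parse_categories("newCold, 10 < newCool, 20 < newWarm, 30 < newHot"))
```

For both example strings, the returned bins and labels have the same values as the ones in the generated pd.cut calls.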
Examples
Example 1 - Categorize Rows Based on Maximum Thresholds
Use RowCategorize to assign category labels to rows based on the values in a specified column. In this example, each row is labeled cold, cool, warm, or hot based on its MaxTemp value, and the labels are stored in a new MaxTempCategorized column. Because no category covers values above 40, MaxTempCategorized is left empty (NaN) for those rows.
#> RowCategorize MaxTemp --categories cold < 10, cool < 20, warm < 30, hot < 40
weatherDf['MaxTempCategorized'] = pd.cut(x=weatherDf['MaxTemp'], bins=[-sys.float_info.max,10,20,30,40], labels=['cold','cool','warm','hot'], include_lowest=True)
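To try the generated line outside the tool, the sketch below builds a small hypothetical weatherDf with made-up MaxTemp values. It also adds the imports the line relies on: pandas as pd and the standard sys module. Because pd.cut intervals are closed on the right by default, a MaxTemp of exactly 10 is labeled cold and exactly 40 is labeled hot. A value of 42.3 falls outside every bin and is left as NaN.

```python
import sys

import pandas as pd

# Hypothetical sample data standing in for the real weatherDf.
weatherDf = pd.DataFrame({"MaxTemp": [5.0, 10.0, 15.5, 29.9, 40.0, 42.3]})

# Same call as the generated code, split across lines for readability.
weatherDf["MaxTempCategorized"] = pd.cut(
    x=weatherDf["MaxTemp"],
    bins=[-sys.float_info.max, 10, 20, 30, 40],
    labels=["cold", "cool", "warm", "hot"],
    include_lowest=True,
)
print(weatherDf)
```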
Example 2 - Categorize Rows Based on Minimum Thresholds
Rather than using upper-bound conditions, you can define categories by their lower thresholds. In this example, each row is labeled newCold, newCool, newWarm, or newHot depending on which successive threshold MaxTemp exceeds. The first listed category, newCold, has no condition, so it covers every value at or below the first threshold (10). Because newHot has no upper bound, every finite MaxTemp value receives a category.
#> RowCategorize MaxTemp --categories newCold, 10 < newCool, 20 < newWarm, 30 < newHot
weatherDf['MaxTempCategorized'] = pd.cut(x=weatherDf['MaxTemp'], bins=[-sys.float_info.max,10,20,30,sys.float_info.max], labels=['newCold','newCool','newWarm','newHot'], include_lowest=True)
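The sketch below applies the generated code to another small set of hypothetical MaxTemp values. Because the last bin ends at sys.float_info.max, no finite value is left empty. With right-closed intervals, 10 is labeled newCold and 30 is labeled newWarm. This matches the strict less-than conditions in the command.

```python
import sys

import pandas as pd

# Hypothetical sample data standing in for the real weatherDf.
weatherDf = pd.DataFrame({"MaxTemp": [-3.0, 10.0, 10.5, 25.0, 30.0, 45.0]})

# Same call as the generated code; the outer bins are effectively unbounded.
weatherDf["MaxTempCategorized"] = pd.cut(
    x=weatherDf["MaxTemp"],
    bins=[-sys.float_info.max, 10, 20, 30, sys.float_info.max],
    labels=["newCold", "newCool", "newWarm", "newHot"],
    include_lowest=True,
)
print(weatherDf)
```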