RowRemove
Removes rows from the dataframe in one of the following ways:
- Remove rows at specified indexes
- Remove rows that start from an index and go to a second index
- Remove rows that meet a specified criterion
- Remove rows that have missing data
- Remove rows that do not have missing data
Options
columns: Specifies the columns to check for missing values when removing rows; use with either --missing or --notMissing
where: Specifies a condition for removing rows
index: Specifies the index for removing rows
indexStart: Specifies the start index for removing rows
indexStop: Specifies the stop index for removing rows
missing: Specifies whether to remove rows where columns are missing
notMissing: Specifies whether to remove rows where columns are not missing
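The examples in this document do not show the generated code for the index or notMissing options. The sketch below is only an assumption about what equivalent pandas statements could look like, not the kit's actual output. The dataframe `df` and its values are hypothetical, and the notMissing line assumes that a row counts as "not missing" only when none of its cells are missing.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "Pizzeria Name": ["A", "B", "C", "D", "E"],
    "Rating": [4.5, None, 3.9, 4.1, None],
})

# Assumed equivalent of removing the rows at positions 1 and 3.
by_index = df.drop(df.index[[1, 3]]).reset_index(drop=True)

# Assumed equivalent of notMissing: drop rows with no missing cells,
# which keeps only the rows that have at least one missing value.
only_incomplete = df[df.isna().any(axis=1)].reset_index(drop=True)

print(by_index["Pizzeria Name"].tolist())
print(only_incomplete["Pizzeria Name"].tolist())
```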
Examples
Example 1 - Remove Rows with Empty Cells
Data is often not clean and there might be rows within a dataframe that are missing values from one or more columns. To simply remove all rows with missing cells, use the RowRemove kit with the missing option.
#> RowRemove --missing
pizzeriasDf = pizzeriasDf.dropna().reset_index(drop=True)
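To see the effect of the generated statement, here is a minimal sketch on a small, made-up dataframe. The column names follow the ones used in this document, but the rows themselves are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the real pizzeriasDf is not shown in this document.
pizzeriasDf = pd.DataFrame({
    "Pizzeria Name": ["Luigi's", "Antonio's", "Slice House", "Napoli"],
    "Rating": [None, 4.5, 3.9, 4.8],
    "Established Year": [2005, 1998, None, 2011],
})

# Same statement as the generated code: drop every row that has at least
# one missing cell, then renumber the remaining rows starting from 0.
pizzeriasDf = pizzeriasDf.dropna().reset_index(drop=True)

print(pizzeriasDf)
```

Only the two complete rows remain, and reset_index(drop=True) renumbers them 0 and 1 instead of keeping their old labels 1 and 3.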
Example 2 - Remove Rows with Missing Cells in Specified Columns
Rather than removing rows with any missing cells, sometimes we only want to clean up a dataframe that is missing values in specific columns. In this example, we only remove rows that have missing values in the Rating or Established Year columns.
#> RowRemove --missing --columns Rating Established Year
pizzeriasDf = pizzeriasDf.dropna(subset = ['Rating', 'Established Year']).reset_index(drop=True)
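The difference from Example 1 is easiest to see with a column that is not listed in the subset. In this sketch the City column and all row values are hypothetical, added only to show that missing values outside the listed columns are ignored.

```python
import pandas as pd

# Hypothetical sample data; "City" is an illustrative extra column.
pizzeriasDf = pd.DataFrame({
    "Pizzeria Name": ["Luigi's", "Antonio's", "Slice House", "Napoli"],
    "City": [None, "Boston", "Austin", None],
    "Rating": [4.2, None, 3.9, 4.8],
    "Established Year": [2005, 1998, None, 2011],
})

# Only Rating and Established Year are checked for missing values.
pizzeriasDf = pizzeriasDf.dropna(
    subset=['Rating', 'Established Year']
).reset_index(drop=True)

print(pizzeriasDf)
```

The rows for Luigi's and Napoli survive even though their City is missing, because City is not one of the checked columns.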
Example 3 - Remove All Rows Starting At an Index
Sometimes you want to remove many rows rather than a single one. Using the indexStart option without the indexStop option removes the row at the specified index and all rows after it. In this example, the row at index 3 and every row after it, through the end of the dataframe, are removed.
#> RowRemove --indexStart 3
pizzeriasDf = pizzeriasDf.drop(pizzeriasDf.index[3:])
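The generated statement works in two steps: it slices the index by position to collect the labels to remove, then drops those labels. The sketch below splits the steps apart on a hypothetical six-row dataframe. It assumes the index labels are unique, as they are with the default index.

```python
import pandas as pd

# Hypothetical sample data with the default 0..5 index.
pizzeriasDf = pd.DataFrame({"Pizzeria Name": ["A", "B", "C", "D", "E", "F"]})

# Step 1: labels of the rows at positions 3, 4 and 5.
rows_to_drop = pizzeriasDf.index[3:]

# Step 2: drop those labels; the first three rows are left untouched.
pizzeriasDf = pizzeriasDf.drop(rows_to_drop)

print(pizzeriasDf)
```

Because only trailing rows are removed, the remaining rows keep their original labels 0, 1 and 2, so no renumbering is needed here.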
Example 4 - Remove All Rows Up To an Index
The previous example removed rows at and after an index. This time, we remove all rows from the first row up to, and including, the specified row index. This happens when you use the indexStop option without the indexStart option. In this example, we remove rows from the start of the dataframe up to and including the row at index 50.
#> RowRemove --indexStop 50
pizzeriasDf = pizzeriasDf.drop(pizzeriasDf.index[:50+1]).reset_index(drop=True)
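The +1 in the generated slice is what makes the stop index inclusive, since Python slices exclude their end position. A minimal sketch on a hypothetical 60-row dataframe, where the original_position column is an illustrative name that records each row's starting position:

```python
import pandas as pd

# Hypothetical data: 60 rows, each remembering its original position.
pizzeriasDf = pd.DataFrame({"original_position": range(60)})

# index[:50+1] covers positions 0 through 50, which is 51 rows.
removed = pizzeriasDf.index[:50+1]
pizzeriasDf = pizzeriasDf.drop(removed).reset_index(drop=True)

print(len(removed), len(pizzeriasDf))
print(pizzeriasDf["original_position"].iloc[0])
```

Nine rows remain, and the first of them is the row that was originally at position 51.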
Example 5 - Remove All Rows Starting at an Index to a Second Index
Combining the previous two examples, this time we use both the indexStart and indexStop options. This removes only the rows from the indexStart index through the indexStop index, inclusive of both. Therefore, this example removes the rows from index 5 through index 10, including both.
#> RowRemove --indexStart 5 --indexStop 10
pizzeriasDf = pizzeriasDf.drop(pizzeriasDf.index[5:10+1]).reset_index(drop=True)
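Examples 3 through 5 follow one pattern: an inclusive positional range where either end may be left open. The helper below, remove_row_range, is a hypothetical function written for illustration and is not part of the kit. It is a sketch of that pattern, and unlike the generated code in Example 3 it always renumbers the index.

```python
from typing import Optional

import pandas as pd


def remove_row_range(df: pd.DataFrame,
                     start: Optional[int] = None,
                     stop: Optional[int] = None) -> pd.DataFrame:
    """Hypothetical helper: drop rows from position start through stop, inclusive."""
    # A missing start means "from the first row"; a missing stop means
    # "through the last row". Adding 1 makes the stop position inclusive.
    end = None if stop is None else stop + 1
    return df.drop(df.index[start:end]).reset_index(drop=True)


# Hypothetical data: 12 rows, each remembering its original position.
df = pd.DataFrame({"original_position": range(12)})

both = remove_row_range(df, start=5, stop=10)
print(both["original_position"].tolist())  # [0, 1, 2, 3, 4, 11]
```

Calling remove_row_range(df, start=3) mirrors Example 3, and remove_row_range(df, stop=2) mirrors Example 4 with a smaller stop index.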
Example 6 - Remove Rows That Match a Criterion
Rather than specifying certain row indexes or rows with missing data, sometimes you want to remove rows that meet a specified criterion. In this example, we remove all rows where the Pizzeria Name contains the word Antonio.
#> RowRemove --where Pizzeria Name contains Antonio
pizzeriasDf = pizzeriasDf[~(pizzeriasDf['Pizzeria Name'].astype('str').str.contains('Antonio').fillna(False))].reset_index(drop=True)
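The generated one-liner is easier to follow when the mask is built separately. This sketch uses the same calls on a hypothetical four-row dataframe that includes one missing name. The variable matches is an illustrative name, not something the kit produces.

```python
import pandas as pd

# Hypothetical sample data, including one row with a missing name.
pizzeriasDf = pd.DataFrame({
    "Pizzeria Name": ["Antonio's Pizza", "Luigi's", None, "Casa Antonio"],
})

# Boolean mask: True where the name contains 'Antonio'.
# fillna(False) treats any undetermined result as "no match".
matches = (
    pizzeriasDf['Pizzeria Name'].astype('str').str.contains('Antonio').fillna(False)
)

# ~ inverts the mask so that only non-matching rows are kept.
pizzeriasDf = pizzeriasDf[~matches].reset_index(drop=True)

print(pizzeriasDf)
```

The row with the missing name is kept, because it does not match. Note that str.contains treats its pattern as a regular expression by default, so a search term with characters such as . or ( may need regex=False to be matched literally.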