Histogram

Plots a histogram of the provided data using the matplotlib library.

Options

x: Specifies the data column, or columns, to use on the x-axis.

y: Specifies the data column to use on the y-axis.

bins: The number of bins (also called channels or buckets) used by the histogram.

3d: A flag that projects two-dimensional groups onto a three-dimensional plot.

samePlot: A flag that forces multiple plots to be rendered on the same plot.

sameWindow: A flag that forces multiple plots to be rendered in the same window.

year: Filters the data to a specific year, for time-related analysis within that year.

month: Filters the data to a specific month within a given year, which is useful for viewing seasonal or monthly trends.

day: Filters the data to specific days, which is useful for viewing daily trends or activity.
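The option descriptions above do not spell out how year, month, and day select rows. As a rough sketch only, assuming they filter on a datetime column, the equivalent pandas logic might look like the following. The column name Date, the sample values, and the helper filter_by_date are hypothetical and are not part of the tool.

```python
import pandas as pd

# Hypothetical sample data; the real dataset and column names may differ.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-15", "2020-02-10", "2021-01-15", "2021-01-20"]),
    "High": [80.0, 81.5, 130.2, 133.0],
})


def filter_by_date(frame, year=None, month=None, day=None):
    """Keep rows whose Date matches every component that was given."""
    mask = pd.Series(True, index=frame.index)
    if year is not None:
        mask &= frame["Date"].dt.year == year
    if month is not None:
        mask &= frame["Date"].dt.month == month
    if day is not None:
        mask &= frame["Date"].dt.day == day
    return frame[mask]


# Rows from January 2021 only.
jan_2021 = filter_by_date(df, year=2021, month=1)
print(jan_2021)
```

Components that are left as None are ignored, so the same helper covers filtering by year alone, by year and month, or by all three.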

Examples

Example 1 - Plot Histogram of Single Column

A histogram is useful to see how values are distributed across a column. In this example, we plot the histogram of the High column to observe the distribution of daily high prices in the Apple stock dataset.

#> Histogram --x High
AFLEFT  plt.hist(appleStockDf['High'], bins=10, label='High')  # label gives plt.legend() an entry to show

plt.title('High Histogram', fontsize=14, fontweight='bold')
plt.xlabel('High', fontsize=12, fontweight='bold', color='gray')
plt.ylabel('Counts', fontsize=12, fontweight='bold', color='gray')
plt.legend()
plt.grid(True, linestyle='--', linewidth=0.5)
plt.tick_params(axis='both', which='major', labelsize=10)  AFRIGHT

Example 2 - Plot Histogram of Computed Ratio Column

You can plot histograms of computed columns the same way as standard columns. Here we first compute the HighDivLow ratio by dividing the High column by the Low column, then plot a histogram to analyze how this ratio varies across the dataset.

#> Histogram --x HighDivLow
AFLEFT  plt.hist(appleStockDf['HighDivLow'], bins=10, label='HighDivLow')  # label gives plt.legend() an entry to show

plt.title('HighDivLow Histogram', fontsize=14, fontweight='bold')
plt.xlabel('HighDivLow', fontsize=12, fontweight='bold', color='gray')
plt.ylabel('Counts', fontsize=12, fontweight='bold', color='gray')
plt.legend()
plt.grid(True, linestyle='--', linewidth=0.5)
plt.tick_params(axis='both', which='major', labelsize=10)  AFRIGHT
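The example above assumes the HighDivLow column already exists. How the tool computes it is not shown here, but in plain pandas the ratio can be sketched as an element-wise division. The sample rows below are hypothetical stand-ins for the Apple stock data.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Apple stock data.
appleStockDf = pd.DataFrame({
    "High": [110.0, 120.0, 126.0],
    "Low": [100.0, 100.0, 120.0],
})

# Element-wise division of the two columns creates the computed column.
appleStockDf["HighDivLow"] = appleStockDf["High"] / appleStockDf["Low"]
print(appleStockDf)
```

Once the column is part of the DataFrame, it can be passed to plt.hist like any other numeric column.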

Example 3 - Plot Histogram of Dates

Although histograms are typically used for numeric data, you can also visualize how dates are distributed in a dataset. This example plots a histogram of the Date column to show how the Apple stock data is spread across time. Unsurprisingly, the data is spread roughly evenly across the date range, because the stock was traded every weekday.

#> Histogram --x Date
AFLEFT  appleStockDf['Date'] = pd.to_datetime(appleStockDf['Date'])

plt.hist(appleStockDf['Date'], bins=10, label='Date')  # label gives plt.legend() an entry to show

plt.title('Date Histogram', fontsize=14, fontweight='bold')
plt.xlabel('Date', fontsize=12, fontweight='bold', color='gray')
plt.ylabel('Counts', fontsize=12, fontweight='bold', color='gray')
plt.legend()
plt.grid(True, linestyle='--', linewidth=0.5)
plt.tick_params(axis='both', which='major', labelsize=10)  AFRIGHT

Example 4 - Plot Histogram with Custom Bin Count

You can set the number of bins to control the granularity of a histogram. Here, we use 50 bins to examine the distribution of the Volume column in the Apple stock data, which reveals more detail than the 10 bins used in the earlier examples.

#> Histogram --x Volume --bins 50
AFLEFT  plt.hist(appleStockDf['Volume'], bins=50, label='Volume')  # label gives plt.legend() an entry to show

plt.title('Volume Histogram', fontsize=14, fontweight='bold')
plt.xlabel('Volume', fontsize=12, fontweight='bold', color='gray')
plt.ylabel('Counts', fontsize=12, fontweight='bold', color='gray')
plt.legend()
plt.grid(True, linestyle='--', linewidth=0.5)
plt.tick_params(axis='both', which='major', labelsize=10)  AFRIGHT
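To make the effect of the bin count concrete, here is a minimal sketch of equal-width binning, which is how plt.hist divides the data range when bins is an integer. The helper equal_width_counts and the sample values are illustrative only and are not part of the tool.

```python
def equal_width_counts(values, bins):
    """Count values into `bins` equal-width intervals spanning min..max.

    Assumes the values are not all identical, so the bin width is nonzero.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The last bin includes its right edge, so clamp the maximum into it.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


volumes = [1, 2, 2, 3, 5, 8, 8, 9, 10, 10]  # hypothetical values
print(equal_width_counts(volumes, 2))  # [5, 5]
print(equal_width_counts(volumes, 5))  # [3, 1, 1, 2, 3]
```

With two bins the data looks flat, while five bins expose the gap in the middle of the range. A higher bin count trades smoothness for detail in the same way.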

Example 5 - Compare Distributions of Multiple Columns

When comparing the distributions of several related columns, you can overlay multiple histograms on the same plot. In this case, we compare Attack, Sp.Atk, Defense, and Sp.Def from the Pokemon dataset, using the same bin count to observe their relative distributions.

#> Histogram --x Attack Sp.Atk Defense Sp.Def --bins 20
AFLEFT  plt.hist(pokemonDf['Attack'], bins=20, label='Attack', alpha=0.4)

plt.hist(pokemonDf['Sp.Atk'], bins=20, label='Sp.Atk', alpha=0.4)

plt.hist(pokemonDf['Defense'], bins=20, label='Defense', alpha=0.4)

plt.hist(pokemonDf['Sp.Def'], bins=20, label='Sp.Def', alpha=0.4)

plt.title('Attack, Sp.Atk, Defense, and Sp.Def Histogram', fontsize=14, fontweight='bold')
plt.xlabel('Attack, Sp.Atk, Defense, and Sp.Def', fontsize=12, fontweight='bold', color='gray')
plt.ylabel('Counts', fontsize=12, fontweight='bold', color='gray')
plt.legend()
plt.grid(True, linestyle='--', linewidth=0.5)
plt.tick_params(axis='both', which='major', labelsize=10)  AFRIGHT
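One caveat when overlaying histograms: passing the same integer bin count to each plt.hist call computes the bin edges from each column's own range, so the bars of different columns may not line up. A possible refinement, sketched here with hypothetical sample data rather than the real Pokemon dataset, is to compute one set of edges from all the columns with np.histogram_bin_edges and reuse it.

```python
import numpy as np

# Hypothetical stand-ins for two Pokemon stat columns.
attack = np.array([49, 62, 82, 100, 52, 64, 84, 130])
defense = np.array([49, 63, 83, 123, 43, 58, 78, 111])

# One set of 20 equal-width bins covering the combined range of both columns.
shared_edges = np.histogram_bin_edges(np.concatenate([attack, defense]), bins=20)

# Each column is counted against the same edges, so the bars align.
attack_counts, _ = np.histogram(attack, bins=shared_edges)
defense_counts, _ = np.histogram(defense, bins=shared_edges)
print(shared_edges[0], shared_edges[-1])
```

The same array can be passed as bins=shared_edges to each plt.hist call in place of the integer 20.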