5.6. Exploratory Data Analysis with Matplotlib and Seaborn: Categorical Data Visualization

Exploratory data analysis (EDA) is a crucial step in the life cycle of Machine Learning and Deep Learning projects. It allows data scientists to better understand the patterns, relationships, and anomalies present in data. Data visualization is a powerful tool in EDA, and libraries like Matplotlib and Seaborn are essential for creating graphical representations that make data interpretation easier.

Categorical Data Visualization

Categorical data consists of variables whose values are labels (for example, product type or country) rather than numeric measurements. Visualizing this data is essential for understanding how observations are distributed across categories and how the categories relate to one another. Matplotlib and Seaborn offer several options for visualizing categorical data effectively.
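In pandas, a column of labels is often stored with the `category` dtype. The minimal sketch below uses made-up T-shirt sizes (the series name `size` and its labels are illustrative). It shows that an ordered categorical keeps its list of allowed labels, and their order, separately from the integer codes stored for each row.

```python
import pandas as pd

# Illustrative data: T-shirt sizes recorded as plain strings
sizes = pd.Series(['M', 'S', 'L', 'M', 'M', 'S'], name='size')

# Convert to an ordered categorical, since these labels have a natural order
size_type = pd.CategoricalDtype(categories=['S', 'M', 'L'], ordered=True)
sizes_cat = sizes.astype(size_type)

print(sizes_cat.cat.categories.tolist())   # allowed labels, in order: ['S', 'M', 'L']
print(sizes_cat.cat.codes.tolist())        # integer code per row: [1, 0, 2, 1, 1, 0]
print(sizes_cat.value_counts(sort=False))  # frequency of each label, in category order
```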

Bar Charts

The bar chart is one of the most common visualizations for categorical data. It displays the frequency or proportion of each category, making it easier to compare them. In Matplotlib, you can create a bar chart using the bar() function, while in Seaborn, the countplot() function is a handy way to create bar charts that show the count of observations in each category.


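import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Example aggregated data: one value per category (used with Matplotlib)
categories = ['Category A', 'Category B', 'Category C']
values = [10, 20, 30]

# Example raw data: one row per observation (used with Seaborn).
# The column names 'category' and 'value' are illustrative.
df = pd.DataFrame({
    'category': ['Category A'] * 3 + ['Category B'] * 5 + ['Category C'] * 7,
    'value': [12, 15, 11,
              22, 25, 19, 28, 21,
              31, 35, 29, 38, 33, 30, 36],
})

# Bar chart with Matplotlib: bar heights are the values given
plt.bar(categories, values)
plt.title('Bar Chart with Matplotlib')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()

# Bar chart with Seaborn: countplot() counts the rows in each category
sns.countplot(x='category', data=df)
plt.title('Bar Chart with Seaborn')
plt.xlabel('Categories')
plt.ylabel('Count')
plt.show()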
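The bar chart code above plots raw values and counts, but the paragraph also mentions proportions. One way to plot proportions is to let pandas compute them with `value_counts(normalize=True)` and pass the result to `plt.bar()`. The sketch below does this with illustrative data (the `category` column and its labels are assumptions).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative raw data: one row per observation
df = pd.DataFrame({'category': ['Category A'] * 3 + ['Category B'] * 5 + ['Category C'] * 7})

# Share of observations in each category (sums to 1.0), sorted by label
proportions = df['category'].value_counts(normalize=True).sort_index()

plt.bar(proportions.index.tolist(), proportions.to_numpy())
plt.title('Proportion of Observations per Category')
plt.xlabel('Categories')
plt.ylabel('Proportion')
plt.show()
```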

Boxplots

Boxplots are excellent for visualizing the distribution of numerical data grouped by categories. They show the median, quartiles, and outliers, giving a quick sense of how variable the data is. In Matplotlib, the boxplot() function takes one sequence of values per box. Seaborn's boxplot() can instead work directly on a DataFrame, grouping a numeric column by a categorical one.


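# Boxplot with Matplotlib: pass one array of values per category
category_data_A = df.loc[df['category'] == 'Category A', 'value'].to_numpy()
category_data_B = df.loc[df['category'] == 'Category B', 'value'].to_numpy()
category_data_C = df.loc[df['category'] == 'Category C', 'value'].to_numpy()
plt.boxplot([category_data_A, category_data_B, category_data_C])
plt.title('Boxplot with Matplotlib')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.xticks([1, 2, 3], categories)  # boxes are drawn at x = 1, 2, 3
plt.show()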

# Boxplot with Seaborn
sns.boxplot(x='category', y='value', data=df)
plt.title('Boxplot with Seaborn')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()
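To see what the boxes and whiskers summarize, the sketch below computes the quartiles, the median, and the points flagged as outliers by the common 1.5 * IQR rule. This rule matches the default whisker setting (`whis=1.5`) of Matplotlib's `boxplot()`. The data and the helper name `box_stats` are hypothetical, and the plotting libraries may differ in small details, such as exactly where whisker ends are drawn.

```python
import pandas as pd

# Illustrative data: numeric values grouped by a categorical column
df = pd.DataFrame({
    'category': ['Category A'] * 5 + ['Category B'] * 5,
    'value': [1, 2, 3, 4, 100, 10, 12, 14, 16, 18],
})

def box_stats(values: pd.Series, whis: float = 1.5) -> dict:
    """Hypothetical helper: quartiles, median and outliers (1.5 * IQR rule)."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1                                # interquartile range
    low, high = q1 - whis * iqr, q3 + whis * iqr
    outliers = values[(values < low) | (values > high)].tolist()
    return {'q1': q1, 'median': median, 'q3': q3, 'outliers': outliers}

# One summary per category, mirroring one box per category
stats = {name: box_stats(group['value']) for name, group in df.groupby('category')}
for name, s in stats.items():
    print(name, s)
```

Here the value 100 in Category A lies far above the upper fence (q3 + 1.5 * IQR = 7), so it is reported as an outlier and would appear as a separate point in the box plot.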

Violin Plots

Violin plots combine features of boxplots and kernel density plots. They provide a richer view of the data distribution by showing the probability density at different values. Seaborn has a dedicated violinplot() function for creating these plots.


# Violin plot with Seaborn
sns.violinplot(x='category', y='value', data=df)
plt.title('Violin Plot with Seaborn')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()
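Seaborn is not the only option here. Matplotlib also provides `plt.violinplot()`, which, like `plt.boxplot()`, takes one array of values per violin. The minimal sketch below uses illustrative data and draws a median line with `showmedians=True`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data
df = pd.DataFrame({
    'category': ['Category A'] * 6 + ['Category B'] * 6,
    'value': [1, 2, 2, 3, 3, 4, 5, 7, 8, 8, 9, 12],
})
labels = ['Category A', 'Category B']

# One array of values per violin, in the same order as labels
datasets = [df.loc[df['category'] == c, 'value'].to_numpy(dtype=float) for c in labels]
parts = plt.violinplot(datasets, showmedians=True)

plt.xticks([1, 2], labels)  # violins are drawn at x = 1, 2 by default
plt.title('Violin Plot with Matplotlib')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()
```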

Swarm Plots

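Swarm plots are categorical scatter plots that adjust the positions of points so they do not overlap, making it easier to see the distribution and the number of observations in each category. In Seaborn, you can create a swarm plot with the swarmplot() function. Because every point is drawn individually, swarm plots work best with small to moderate amounts of data.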


# Swarm plot with Seaborn
sns.swarmplot(x='category', y='value', data=df)
plt.title('Swarm Plot with Seaborn')
plt.xlabel('Categories')
plt.ylabel('Values')
plt.show()
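Swarm plots are easiest to appreciate next to a strip plot, which spreads points with random jitter instead and can therefore still let them overlap. The sketch below draws both side by side with `sns.stripplot()` and `sns.swarmplot()`. The data are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative data with many repeated values
df = pd.DataFrame({
    'category': ['Category A'] * 8 + ['Category B'] * 8,
    'value': [3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 7, 8, 8, 9, 9, 9],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)

# Strip plot: points are offset by random jitter and may still overlap
sns.stripplot(x='category', y='value', data=df, jitter=True, ax=axes[0])
axes[0].set_title('Strip Plot (jitter)')

# Swarm plot: points are placed side by side so none overlap
sns.swarmplot(x='category', y='value', data=df, ax=axes[1])
axes[1].set_title('Swarm Plot')

plt.tight_layout()
plt.show()
```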

Count Plots

Count plots are a form of bar chart that shows the count of observations in each category. In Seaborn, the countplot() function is used to create these plots quickly and intuitively.


# Count plot with Seaborn
sns.countplot(x='category', data=df)
plt.title('Count Plot with Seaborn')
plt.xlabel('Categories')
plt.ylabel('Count')
plt.show()
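By default, `countplot()` infers the order of the bars from the data. You can control this with `order=`. A common choice is to sort categories from most to least frequent using `value_counts()`, as in the sketch below with illustrative data.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative data with unequal category sizes
df = pd.DataFrame({'category': ['Category A'] * 3 + ['Category B'] * 7 + ['Category C'] * 5})

# Most frequent category first: ['Category B', 'Category C', 'Category A']
order = df['category'].value_counts().index.tolist()

fig, ax = plt.subplots()
sns.countplot(x='category', data=df, order=order, ax=ax)
ax.set_title('Count Plot Sorted by Frequency')
ax.set_xlabel('Categories')
ax.set_ylabel('Count')
plt.show()
```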

Graphics Customization and Styling

Both Matplotlib and Seaborn allow extensive customizations to the graphs. You can adjust colors, line styles, markers, and many other aspects to improve the presentation and readability of charts. Seaborn also offers theme styles that can be applied globally for a consistent, professional look.


# Customizing plots with Matplotlib
plt.bar(categories, values, color='skyblue')
plt.xlabel('Categories')
plt.ylabel('Values')
# Set the title with a custom font size and color
plt.title('Custom Plot with Matplotlib', fontsize=14, color='darkred')
plt.show()

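# Applying a theme style in Seaborn (affects all subsequent plots)
sns.set_theme(style='whitegrid')
# Color each bar by category via hue; legend=False requires Seaborn 0.13 or later
sns.countplot(x='category', data=df, hue='category', palette='pastel', legend=False)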
plt.title('Seaborn Theme Styled Chart')
plt.xlabel('Categories')
plt.ylabel('Count')
plt.show()
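The paragraph above mentions more options than colors alone. The sketch below combines a few further `Axes.bar()` styling arguments (`edgecolor`, `linewidth`, `hatch`) with two Seaborn helpers. `color_palette()` returns a list of RGB colors, and the `axes_style()` context manager applies a style only to the figures created inside the `with` block. The palette, style, and hatch values are arbitrary choices.

```python
import matplotlib.pyplot as plt
import seaborn as sns

categories = ['Category A', 'Category B', 'Category C']
values = [10, 20, 30]

# Three colors taken from Seaborn's 'pastel' palette, as RGB tuples
colors = sns.color_palette('pastel', n_colors=3)

# Apply the 'darkgrid' style only to the axes created inside this block
with sns.axes_style('darkgrid'):
    fig, ax = plt.subplots()
    bars = ax.bar(categories, values, color=colors, edgecolor='black',
                  linewidth=1.5, hatch='//')
    ax.set_title('Custom Bars', fontsize=14, color='darkred')
    ax.set_xlabel('Categories')
    ax.set_ylabel('Values')
plt.show()
```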

Conclusion

Visualizing categorical data is an essential step in exploratory data analysis. Matplotlib and Seaborn are two powerful libraries that offer a wide range of options for creating informative and attractive plots. By using these tools, you can gain valuable insights into your data and effectively communicate your findings.

In summary, the ability to visualize and interpret categorical data is an important part of working with Python for Machine Learning and Deep Learning. Continued practice with these libraries and experimentation with different chart types will improve your EDA skills and help ensure that your analyses are grounded in a solid understanding of the underlying data.
