- Visualization is important because it lets one see trends and patterns in the data
- Statistical analysis is the process of understanding how the variables in a dataset relate to each other
Python seaborn Functions
Visualizing Statistical Relationships
- Process of understanding relationships between variables of a dataset
Plotting with Categorical data
- The main variable is further divided into discrete groups
Visualizing the distribution of a dataset
- Understanding a dataset in a univariate or bivariate context
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("F:/Advanced Python/Module - 3/Dataset/iris.csv")
 | Sepal Length | Sepal Width | Petal Length | Petal Width | Class |
0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 4.9 | 3 | 1.4 | 0.2 | Iris-setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5 | 3.6 | 1.4 | 0.2 | Iris-setosa |
… | … | … | … | … | … |
145 | 6.7 | 3 | 5.2 | 2.3 | Iris-virginica |
146 | 6.3 | 2.5 | 5 | 1.9 | Iris-virginica |
147 | 6.5 | 3 | 5.2 | 2 | Iris-virginica |
148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
149 | 5.9 | 3 | 5.1 | 1.8 | Iris-virginica |
Distribution of Numerical Variable
- Histograms show the distribution of a single numerical variable
- A KDE plot shows an estimated smooth distribution of a single numerical variable (or of two numerical variables)
- A jointplot comprises three plots. Out of the three, one plot displays a bivariate graph which shows how the dependent variable (Y) varies with the independent variable (X)
- Another plot is placed horizontally at the top of the bivariate graph and it shows the distribution of the independent variable (X)
- The third plot is placed on the right margin of the bivariate graph with the orientation set to vertical and it shows the distribution of the dependent variable (Y)
- A distplot plots a univariate distribution of observations
- It combines the matplotlib hist() function with the seaborn kdeplot() and rugplot() functions
- Parameter a: a Series, 1d-array or list (the most essential parameter)
- Many other optional parameters are available
sns.distplot(data.loc[(data['Class']=='Iris-virginica'),'Sepal Length'])

- Kernel Density Estimate is used for visualizing the probability density of a continuous variable.
- It depicts the probability density at different values of a continuous variable

sns.kdeplot(data.loc[(data['Class']=='Iris-virginica'),'Sepal Length'], color='orange', shade=True, label='Iris-virginica')
plt.xlabel('Sepal Length')
plt.ylabel('Probability Density')
data.loc[(data['Class']=='Iris-virginica'),'Sepal Length']: extracts the Sepal Length column for the rows of class Iris-virginica
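The selection expression can be read in two steps: a boolean mask picks the rows, and .loc then keeps one column from those rows. Here is a small sketch of that, using a four-row miniature of the iris table (two rows per class taken from the excerpt above). The names mask and virginica_sepal are introduced only for illustration.

```python
import pandas as pd

# Four rows from the table excerpt, enough to show the selection.
data = pd.DataFrame({
    "Sepal Length": [5.1, 4.9, 6.7, 6.3],
    "Class": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
})

# Step 1: a boolean Series, True where the row belongs to Iris-virginica.
mask = data["Class"] == "Iris-virginica"

# Step 2: .loc[rows, column] keeps the masked rows and a single column.
virginica_sepal = data.loc[mask, "Sepal Length"]
print(virginica_sepal.tolist())  # [6.7, 6.3]
```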
sns.jointplot(x=data["Sepal Length"],y=data["Petal Length"])


#For Better Understanding
import numpy as np
sales = pd.DataFrame({'Days':['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'],
sns.jointplot(x=sales['Week1'], y = sales['Week2'])
- A pairplot plots the pairwise relationships in a dataset
- The pairplot function creates a grid of Axes such that each variable in the data is shared on the y-axis across a single row and on the x-axis across a single column
sns.pairplot(data)  # draw a pair plot for all numerical columns
sns.pairplot(data, vars=['Sepal Length', 'Sepal Width'])  # draw a pair plot only for the listed columns

Plotting categorical plots
data1 = pd.read_csv("F:/Advanced Python/Module - 3/Dataset/tips.csv")
 | total_bill | tip | sex | smoker | day | time | size |
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
… | … | … | … | … | … | … | … |
239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
240 | 27.18 | 2 | Female | Yes | Sat | Dinner | 2 |
241 | 22.67 | 2 | Male | Yes | Sat | Dinner | 2 |
242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |
243 | 18.78 | 3 | Female | No | Thur | Dinner | 2 |
Categorical Scatterplots
- stripplot() ➡ catplot() with kind="strip" (the default)
- swarmplot() ➡ catplot() with kind="swarm"
Categorical distribution plots
- boxplot() ➡ catplot() with kind="box"
- violinplot() ➡ catplot() with kind="violin"
- boxenplot() ➡ catplot() with kind="boxen"
Categorical estimate plots
- pointplot() ➡ catplot() with kind="point"
- barplot() ➡ catplot() with kind="bar"
- countplot() ➡ catplot() with kind="count"
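The kind values listed above are arguments to seaborn's figure-level catplot() function, which dispatches to the matching axes-level function. This minimal sketch uses the ten rows visible in the tips excerpt above instead of the full tips.csv. The names g_strip and g_box are chosen for illustration.

```python
import pandas as pd
import seaborn as sns

# The ten rows shown in the tips table excerpt (only the columns needed here).
data1 = pd.DataFrame({
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "sex": ["Female", "Male", "Male", "Male", "Female",
            "Male", "Female", "Male", "Male", "Female"],
})

# No kind given: catplot falls back to kind="strip".
g_strip = sns.catplot(x="sex", y="tip", data=data1)

# Same data, drawn as a box plot by changing only the kind argument.
g_box = sns.catplot(x="sex", y="tip", data=data1, kind="box")
```

catplot() returns a FacetGrid rather than a single Axes, which is what makes it convenient for splitting a plot into several panels later.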
- stripplot() draws a plot between one categorical and one numerical variable
- The points are plotted in strips, one strip per category
For each day, the bill amount is marked on the y-axis

The points are colored by a third variable with hue=data1['sex']
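The original call for this plot is not shown, so the following is only a sketch of what a strip plot of bill amount per day, colored by sex, might look like. It uses the ten rows visible in the tips excerpt above rather than the full file.

```python
import pandas as pd
import seaborn as sns

# The ten rows shown in the tips table excerpt (only the columns needed here).
data1 = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59,
                   29.03, 27.18, 22.67, 17.82, 18.78],
    "sex": ["Female", "Male", "Male", "Male", "Female",
            "Male", "Female", "Male", "Male", "Female"],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun", "Sat", "Sat", "Sat", "Sat", "Thur"],
})

# One strip of points per day; hue colors each point by the 'sex' column.
ax = sns.stripplot(x=data1["day"], y=data1["total_bill"], hue=data1["sex"])
```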
- swarmplot() reduces the heavy overlap of points that stripplot() can produce
- A swarm plot is also known as a bee swarm plot
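Here is a hedged sketch of the same bill-per-day plot drawn with swarmplot(), again using only the ten rows visible in the tips excerpt above. With so few points the difference from a strip plot is small; on the full dataset the points spread sideways so that they do not overlap.

```python
import pandas as pd
import seaborn as sns

# The ten rows shown in the tips table excerpt (only the columns needed here).
data1 = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59,
                   29.03, 27.18, 22.67, 17.82, 18.78],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun", "Sat", "Sat", "Sat", "Sat", "Thur"],
})

# Points are placed along the categorical axis so that they avoid overlapping.
ax = sns.swarmplot(x=data1["day"], y=data1["total_bill"])
```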

sns.boxplot() works in the same way as boxplot() in matplotlib
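The original box plot call is not shown. One possible version, assuming bill amount per day is the quantity of interest and using the ten rows visible in the tips excerpt above:

```python
import pandas as pd
import seaborn as sns

# The ten rows shown in the tips table excerpt (only the columns needed here).
data1 = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59,
                   29.03, 27.18, 22.67, 17.82, 18.78],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun", "Sat", "Sat", "Sat", "Sat", "Thur"],
})

# One box per day: the box spans the quartiles, the line inside marks the median.
ax = sns.boxplot(x=data1["day"], y=data1["total_bill"])
```

Unlike matplotlib's boxplot(), the seaborn function accepts the grouping column directly, so the data does not have to be split into one array per category first.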


- Violin plots are used when there is a need to observe the distribution of numeric data
- Particularly useful for comparing distributions across multiple groups
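As an illustration of comparing distributions between groups, this sketch draws one violin per sex for the bill amount. The choice of columns is an assumption, since the original call is not shown, and only the ten rows visible in the tips excerpt above are used.

```python
import pandas as pd
import seaborn as sns

# The ten rows shown in the tips table excerpt (only the columns needed here).
data1 = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59,
                   29.03, 27.18, 22.67, 17.82, 18.78],
    "sex": ["Female", "Male", "Male", "Male", "Female",
            "Male", "Female", "Male", "Male", "Female"],
})

# Each violin is a mirrored density estimate of total_bill for one group.
ax = sns.violinplot(x=data1["sex"], y=data1["total_bill"])
```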

countplot() shows value counts for a single categorical variable


To count the instances by sex and smoker, i.e. to display how many smokers and non-smokers there are for each sex
Male 157
Female 87
Name: sex, dtype: int64
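The counts behind such a plot can be checked numerically. This sketch assumes the plot was drawn with sex on the x-axis and smoker as hue, which is not confirmed by the source. It uses only the ten rows visible in the tips excerpt above, so its counts differ from the 157 and 87 reported for the full file.

```python
import pandas as pd
import seaborn as sns

# The ten rows shown in the tips table excerpt (only the columns needed here).
data1 = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Male", "Female",
            "Male", "Female", "Male", "Male", "Female"],
    "smoker": ["No", "No", "No", "No", "No", "No", "Yes", "Yes", "No", "No"],
})

# Counts for a single categorical variable.
counts = data1["sex"].value_counts()

# Counts for each (sex, smoker) combination, as a small table.
table = pd.crosstab(data1["sex"], data1["smoker"])
print(table)

# The same information as bars: one bar per sex, split by smoker status.
ax = sns.countplot(x="sex", hue="smoker", data=data1)
```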

barplot() draws a bar plot. A bar plot represents an estimate of central tendency for a numeric variable with the height of each rectangle, and gives some indication of the uncertainty around that estimate using error bars
sns.barplot(x=data1['sex'], y=data1['tip'])

sns.barplot(x='sex', y='tip', data=data1, hue='smoker')
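By default the central tendency that barplot() estimates is the mean, so the bar heights can be cross-checked against a groupby. This sketch does that with the ten rows visible in the tips excerpt above. The names means and heights are introduced for illustration.

```python
import pandas as pd
import seaborn as sns

# The ten rows shown in the tips table excerpt (only the columns needed here).
data1 = pd.DataFrame({
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00],
    "sex": ["Female", "Male", "Male", "Male", "Female",
            "Male", "Female", "Male", "Male", "Female"],
})

# Mean tip per group, computed directly with pandas.
means = data1.groupby("sex")["tip"].mean()
print(means.round(3))

# barplot uses the mean as its default estimator, so each bar's height
# should match the corresponding group mean.
ax = sns.barplot(x="sex", y="tip", data=data1)
heights = [patch.get_height() for patch in ax.patches]
```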
