Visualization
- Visualization is important because it lets one see trends and patterns in the data
- Understanding how the variables in a dataset relate to each other is termed statistical analysis
Python seaborn Functions
Visualizing Statistical Relationships
- Process of understanding relationships between variables of a dataset
Plotting with Categorical data
- The main variable is further divided into discrete groups
Visualizing the distribution of a dataset
- Understanding how the values in a dataset are distributed, either for one variable (univariate) or for two variables together (bivariate)
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("F:/Advanced Python/Module - 3/Dataset/iris.csv")
data
| Sepal Length | Sepal Width | Petal Length | Petal Width | Class |
0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
1 | 4.9 | 3 | 1.4 | 0.2 | Iris-setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
4 | 5 | 3.6 | 1.4 | 0.2 | Iris-setosa |
… | … | … | … | … | … |
145 | 6.7 | 3 | 5.2 | 2.3 | Iris-virginica |
146 | 6.3 | 2.5 | 5 | 1.9 | Iris-virginica |
147 | 6.5 | 3 | 5.2 | 2 | Iris-virginica |
148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
149 | 5.9 | 3 | 5.1 | 1.8 | Iris-virginica |
Distribution of Numerical Variable
distplot
- Histograms show the distribution of a single numerical variable
kdeplot
- Shows an estimated smooth distribution of a single numerical variable (or two numerical variables)
jointplot
- A jointplot comprises three plots. Out of the three, one plot displays a bivariate graph which shows how the dependent variable (Y) varies with the independent variable (X)
- Another plot is placed horizontally at the top of the bivariate graph and it shows the distribution of the independent variable (X)
- The third plot is placed on the right margin of the bivariate graph with the orientation set to vertical and it shows the distribution of the dependent variable (Y)
pairplot
distplot()
- A distplot plots a univariate distribution of observations
- It combines the matplotlib hist() function with the seaborn kdeplot() and rugplot() functions
Parameter:
- a: Series, 1d-array or list (most essential parameter)
- Many more optional parameters are available, for example bins, hist, kde and rug
sns.distplot(data.loc[(data['Class']=='Iris-virginica'),'Sepal Length'])
<matplotlib.axes._subplots.AxesSubplot at 0x1eef901dc70>
kdeplot
- Kernel Density Estimate is used for visualizing the probability density of a continuous variable.
- It depicts the probability density at different values in a continuous variable
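To make the idea of a kernel density estimate concrete, here is a minimal pure-Python sketch. It is not how seaborn implements kdeplot(): the helper name gaussian_kde and the hand-picked bandwidth of 0.3 are assumptions for illustration, and seaborn chooses a bandwidth automatically. The data are the five Iris-virginica sepal lengths visible in the table.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # One Gaussian bump per observation, centred on that observation
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

sepal_length = [6.7, 6.3, 6.5, 6.2, 5.9]
density = gaussian_kde(sepal_length, bandwidth=0.3)  # bandwidth picked by hand

# A Riemann sum over a wide grid approximates the area under the curve,
# which should be close to 1 for a probability density
step = 0.01
grid = [4.0 + i * step for i in range(500)]
area = sum(density(x) * step for x in grid)
print(round(area, 3))
```

The curve is highest where observations cluster and falls away where there are none, which is what the shaded kdeplot shows.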
sns.kdeplot(data.loc[(data['Class']=='Iris-virginica'),'Sepal Length'], color='orange', shade=True, label='Iris-virginica')
plt.xlabel('Sepal Length')
plt.ylabel('Probability Density')
Text(0, 0.5, 'Probability Density')
data.loc[(data['Class']=='Iris-virginica'),'Sepal Length'] extracts the Sepal Length column for the class Iris-virginica
jointplot
sns.jointplot(x=data["Sepal Length"],y=data["Petal Length"])
<seaborn.axisgrid.JointGrid at 0x1eefdd248b0>
#For Better Understanding
import numpy as np
sales = pd.DataFrame({'Days':['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'],
'Week1':[12,16,8,10,14,8,18],
'Week2':[10,8,14,9,8,20,22]}
)
sns.jointplot(x=sales['Week1'], y = sales['Week2'])
<seaborn.axisgrid.JointGrid at 0x1eefe0b6f40>
pairplot
- A pairplot plots the pairwise relationships in a dataset
- The pairplot function creates a grid of Axes such that each variable in data will be shared in the y axis across a single row and in the x-axis across a single column
sns.pairplot(data) #drawing pair plot for all numerical columns
sns.pairplot(data,vars=['Sepal Length','Sepal Width']) #drawing pair plot only for column in the list mentioned
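The grid layout described above can be sketched without any plotting. This plain-Python illustration only shows which pair of variables lands in which cell. It is not what pairplot() runs internally.

```python
variables = ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"]

# grid[row][col] holds the (x variable, y variable) pair drawn in that cell:
# every cell in a row shares its y variable, every cell in a column its x variable
grid = [[(x_var, y_var) for x_var in variables] for y_var in variables]

for row in grid:
    print(row)
```

The diagonal cells pair a variable with itself, which is why pairplot() draws a univariate distribution there instead of a scatter plot.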
Plotting categorical plots
data1 = pd.read_csv("F:/Advanced Python/Module - 3/Dataset/tips.csv")
data1
| total_bill | tip | sex | smoker | day | time | size |
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
… | … | … | … | … | … | … | … |
239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
240 | 27.18 | 2 | Female | Yes | Sat | Dinner | 2 |
241 | 22.67 | 2 | Male | Yes | Sat | Dinner | 2 |
242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |
243 | 18.78 | 3 | Female | No | Thur | Dinner | 2 |
Categorical Scatterplots
- stripplot() ➡ catplot() with kind="strip" (the default)
- swarmplot() ➡ catplot() with kind="swarm"
Categorical distribution plots
- boxplot() ➡ catplot() with kind="box"
- violinplot() ➡ catplot() with kind="violin"
- boxenplot() ➡ catplot() with kind="boxen"
Categorical estimate plots
- pointplot() ➡ catplot() with kind="point"
- barplot() ➡ catplot() with kind="bar"
- countplot() ➡ catplot() with kind="count"
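The kind values listed above belong to seaborn's figure-level catplot() function, which dispatches to the matching axes-level function. A small sketch, assuming a DataFrame built from the ten rows of tips.csv visible in the table (the variable names tips, g_strip, g_box and g_count are made up for the example):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import pandas as pd
import seaborn as sns

# Only the ten rows of tips.csv shown in the table above
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78],
    "sex": ["Female", "Male", "Male", "Male", "Female",
            "Male", "Female", "Male", "Male", "Female"],
    "day": ["Sun"] * 5 + ["Sat"] * 4 + ["Thur"],
})

# kind="strip" is the default, so it can be omitted
g_strip = sns.catplot(data=tips, x="day", y="total_bill")
g_box = sns.catplot(data=tips, x="day", y="total_bill", kind="box")
# A count plot takes a single categorical variable, so only x is given
g_count = sns.catplot(data=tips, x="sex", kind="count")
```

Each call returns a FacetGrid, whereas the axes-level functions used in the rest of this section return a matplotlib Axes.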
stripplot()
- Plot between one categorical and one numerical variable
- Plot the points in strips that denote each category
sns.stripplot(x=data1['day'],y=data1['total_bill'])
For each day, the bill amounts are plotted along the y-axis
sns.stripplot(x=data1['day'],y=data1['total_bill'],hue=data1['sex'])
The points are now coloured by a third variable, passed as hue=data1['sex']
swarmplot()
- Avoids the overlapping of points seen in stripplot() by adjusting them along the categorical axis
- A swarm plot is also known as a beeswarm plot
sns.swarmplot(x=data1['day'],y=data1['total_bill'])
sns.swarmplot(x=data1['day'],y=data1['total_bill'],hue=data1['sex'])
boxplot()
Works in the same way as boxplot() in matplotlib
sns.boxplot(x=data1['day'],y=data1['tip'])
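The numbers behind a single box can be worked out by hand. The sketch below assumes the usual Tukey convention (box at the quartiles, whiskers at the most extreme points within 1.5 × IQR), which is the default in both libraries as far as I know, and uses only the five Sunday tips visible in the table.

```python
import statistics

# Tips for the five Sunday rows visible in the table above
sunday_tips = [1.01, 1.66, 3.50, 3.31, 3.61]

# method="inclusive" interpolates linearly between the sorted data points
q1, median, q3 = statistics.quantiles(sunday_tips, n=4, method="inclusive")
iqr = q3 - q1

# Whiskers reach the most extreme observations within 1.5 * IQR of the box
lower_whisker = min(v for v in sunday_tips if v >= q1 - 1.5 * iqr)
upper_whisker = max(v for v in sunday_tips if v <= q3 + 1.5 * iqr)

# Anything beyond the whiskers would be drawn as an individual outlier point
outliers = [v for v in sunday_tips if v < lower_whisker or v > upper_whisker]

print(q1, median, q3, lower_whisker, upper_whisker, outliers)
```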
violinplot()
- Violin plots are used to observe the distribution of numeric data
- They are particularly useful for comparing distributions between multiple groups
sns.violinplot(x=data1['day'],y=data1['tip'])
countplot()
Shows the value counts for a single categorical variable
sns.countplot(x=data1['sex'])
sns.countplot(x=data1['sex'])
sns.despine()
To count the instances by both sex and smoker, i.e. to display how many smokers and non-smokers there are for each sex, pass smoker as hue
data1.sex.value_counts()
Male 157
Female 87
Name: sex, dtype: int64
sns.countplot(x=data1['sex'],hue=data1['smoker'])
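What countplot() tallies when a hue is supplied can be reproduced with a plain counter. This sketch uses only the ten rows of tips.csv visible in the table, so the numbers differ from the full-dataset counts printed above.

```python
from collections import Counter

# (sex, smoker) pairs for the ten rows of tips.csv visible in the table above
rows = [
    ("Female", "No"), ("Male", "No"), ("Male", "No"), ("Male", "No"), ("Female", "No"),
    ("Male", "No"), ("Female", "Yes"), ("Male", "Yes"), ("Male", "No"), ("Female", "No"),
]

# One bar per (sex, smoker) combination; the bar height is the count
counts = Counter(rows)
for (sex, smoker), n in sorted(counts.items()):
    print(sex, smoker, n)
```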
barplot()
Draws a bar plot. The height of each bar represents an estimate of central tendency (the mean, by default) for a numeric variable, and the error bars give some indication of the uncertainty around that estimate
sns.barplot(x=data1['sex'], y=data1['tip'])
sns.barplot(x='sex', y='tip',data = data1,hue=data1['smoker'])
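The estimate and error bar that a bar plot shows can be sketched by hand. To my knowledge seaborn's defaults are the mean and a bootstrapped 95% confidence interval. The helper bootstrap_ci below is a hypothetical stand-in written for this example, not seaborn's own code, and it uses only the tips from the ten rows visible in the table.

```python
import random
import statistics

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    # Resample with replacement many times and record the mean of each resample
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    tail = (100 - level) / 200  # fraction cut from each tail
    lo = means[int(tail * n_boot)]
    hi = means[int((1 - tail) * n_boot) - 1]
    return lo, hi

# Tips grouped by sex, taken from the ten rows visible in the table above
tips_by_sex = {
    "Female": [1.01, 3.61, 2.00, 3.00],
    "Male": [1.66, 3.50, 3.31, 5.92, 2.00, 1.75],
}

for sex, tips in tips_by_sex.items():
    mean = statistics.fmean(tips)  # bar height
    lo, hi = bootstrap_ci(tips)    # ends of the error bar
    print(sex, round(mean, 2), round(lo, 2), round(hi, 2))
```

With so few observations the intervals are wide, which is exactly the uncertainty the error bars are meant to convey.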