1. Home
  2. Docs
  3. Advanced Python
  4. Data Visualization
  5. Data Visualization with Seaborn

Data Visualization with Seaborn

Visualization

  • Visualization is important, as it allows one to see trends and patterns in the data
  • Process of understanding how the variables in the dataset relate each other and their relationships are termed as statistical analysis

Python seaborn Functions

Visualizing Statistical Relationships

  • Process of understanding relationships between variables of a dataset

Plotting with Categorical data

  • Main variables is further divided into discrete groups

Visualizing the distribution of a dataset

  • Understanding the datasets with context of being univariate or bivariate
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("F:/Advanced Python/Module - 3/Dataset/iris.csv")
data
 Sepal LengthSepal WidthPetal LengthPetal WidthClass
05.13.51.40.2Iris-setosa
14.931.40.2Iris-setosa
24.73.21.30.2Iris-setosa
34.63.11.50.2Iris-setosa
453.61.40.2Iris-setosa
1456.735.22.3Iris-virginica
1466.32.551.9Iris-virginica
1476.535.22Iris-virginica
1486.23.45.42.3Iris-virginica
1495.935.11.8Iris-virginica

Distribution of Numerical Variable

distplot

  • Histograms show the distribution of a single numerical variable

kdeplot

  • Shows an estimated smooth distribution of a single numerical variable (or two numerical variables)

jointplot

  • A jointplot comprises three plots. Out of the three, one plot displays a bivariate graph which shows how the dependent variable (Y) varies with the independent variable (X)
  • Another plot is placed horizontally at the top of the bivariate graph and it shows the distribution of the independent variable (X)
  • The third plot is placed on the right margin of the bivariate graph with the orientation set to vertical and it shows the distribution of the dependent variable (Y)

pairplot

distplot()

  • A distplot plots a univariate distribution of observations
  • It combines matplotlib hist function with the seaborn kdeplot() and rugplot() function
Parameter:
  • a: Series, 1d-array or list (most essential parameter)
  • Many more parameters are there
sns.distplot(data.loc[(data['Class']=='Iris-virginica'),'Sepal Length'])
<matplotlib.axes._subplots.AxesSubplot at 0x1eef901dc70>

kdeplot

  • Kernel Density Estimate is used for visualizing the probability density of a continuous variable.
  • It depicts the probability density at different values in a continuous variable
sns.kdeplot(data.loc[(data['Class']=='Iris-virginica'),'Sepal Length'],color = 'orange',shade = True, Label = 'Iris-virginica')
plt.xlabel('Sepal Length')
plt.ylabel('Probability Density')
Text(0, 0.5, 'Probability Density')

data.loc[(data[‘Class’]==’Iris-virginica’),’Sepal Length’] – Extracts the column Sepal Length for the class Iris-virginica

jointplot

sns.jointplot(x=data["Sepal Length"],y=data["Petal Length"])
<seaborn.axisgrid.JointGrid at 0x1eefdd248b0>
#For Better Understanding 
import numpy as np
sales = pd.DataFrame({'Days':['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'],
                     'Week1':[12,16,8,10,14,8,18],
                     'Week2':[10,8,14,9,8,20,22]}
                    )
sns.jointplot(x=sales['Week1'], y = sales['Week2'])
<seaborn.axisgrid.JointGrid at 0x1eefe0b6f40>

pairplot

  • A pairplot plot a pairwise relationships in a dataset
  • The pairplot function creates a grid of Axes such that each variable in data will be shared in the y axis across a single row and in the x-axis across a single column
sns.pairplot(data) #drawing pair plot for all numerical columns
sns.pairplot(data,vars=['Sepal Length','Sepal Width']) #drawing pair plot only for column in the list mentioned

Plotting categorical plots

data1 = pd.read_csv("F:/Advanced Python/Module - 3/Dataset/tips.csv")
data1
 total_billtipsexsmokerdaytimesize
016.991.01FemaleNoSunDinner2
110.341.66MaleNoSunDinner3
221.013.5MaleNoSunDinner3
323.683.31MaleNoSunDinner2
424.593.61FemaleNoSunDinner4
23929.035.92MaleNoSatDinner3
24027.182FemaleYesSatDinner2
24122.672MaleYesSatDinner2
24217.821.75MaleNoSatDinner2
24318.783FemaleNoThurDinner2

Categorical Scatterplots

  • stripplot() ➡ (with kind = “strip”; the default)
  • swarmplot() ➡ (with kind=“swarm”)

Categorical distribution plots

  • boxplot() ➡ (with kind=“box”)
  • violinplot() ➡ (with kind = “violin”)
  • boxenplot() ➡ (with kind = “boxen”)

Categorical estimate plots

  • pointplot() ➡ (with kind = “point”)
  • barplot() ➡ (with kind = “bar”)
  • countplot() ➡ (with kind = “count”)

stripplot()

  • Plot between one categorical and one numerical variable
  • Plot the points in strips that denote each category
sns.stripplot(x=data1['day'],y=data1['total_bill'])

For each day, bill amount is marked in y axis

sns.stripplot(x=data1['day'],y=data1['total_bill'],hue=data1['sex'])

Based on the third variable hue=data1[‘sex’]

swarmplot()

  • Reduce too much overlapping caused by stripplot()
  • swarmplot is otherwise termed to be bee swarm plot
sns.swarmplot(x=data1['day'],y=data1['total_bill'])
sns.swarmplot(x=data1['day'],y=data1['total_bill'],hue=data1['sex'])

boxplot()

Works in the same way as boxplot() in matplotlib

sns.boxplot(x=data1['day'],y=data1['tip'])

violinplot()

  • Violin plots are used when there is a need to observe the distribution of numeric data
  • Particularly useful when to make a comparison of distribution between multiple groupds
sns.violinplot(x=data1['day'],y=data1['tip'])

countplot()

Show value counts for a single categorical variable

sns.countplot(x=data1['sex'])
sns.countplot(x=data1['sex'])
sns.despine()
Use of despine() removes this top and right

To count the instances based on sex and smoker i.e display for each sex how many smoker and non smoker are there

data1.sex.value_counts()
Male 157 
Female 87 
Name: sex, dtype: int64
sns.countplot(x=data1['sex'],hue=data1['smoker'])

barplot()

Used to draw a barplot. A barplot represents an estimate of central tendency for a numeric variable with the height of each rectangle and provides some indication of the uncertainty around that estimate using error bars

sns.barplot(x=data1['sex'], y=data1['tip'])
sns.barplot(x='sex', y='tip',data = data1,hue=data1['smoker'])

Views: 1

How can we help?

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments