
Data Standardization

Standardization (Z-Score Normalization)

A variable is transformed so that it follows a standard normal distribution.

Aim: Each feature is rescaled so that its mean and standard deviation become 0 and 1, respectively.

Formula

\[x_{new} = \frac{x_{old} - \mu}{\sigma}\]

Where,

\[\mu\] represents the Mean

\[\sigma\] represents the Standard Deviation
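
As a quick worked example with hypothetical values, suppose a data point has the value 75 in a feature whose mean is 50 and standard deviation is 10. Then

\[x_{new} = \frac{75 - 50}{10} = 2.5\]

i.e. the point lies 2.5 standard deviations above the feature's mean.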

Why Data Standardization?

  • Features should be on the same scale so that no single feature biases the outcome
  • Ex: Variable X has a range of 0 – 1000 and variable Y has a range of 0 – 10.
    • Variable X will outweigh variable Y because of its larger range (see the sketch after this list)
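
A minimal sketch of this point, assuming two made-up features x (range 0 – 1000) and y (range 0 – 10); after standardization both columns end up with mean 0 and standard deviation 1, so neither dominates purely because of its range:

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data: x spans 0-1000, y spans 0-10
df = pd.DataFrame({"x": [0, 250, 500, 750, 1000],
                   "y": [0, 2.5, 5.0, 7.5, 10.0]})

scaled = StandardScaler().fit_transform(df)
print(scaled.mean(axis=0))  # approximately [0, 0]
print(scaled.std(axis=0))   # [1, 1]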

sklearn

  • One of the most widely used Python libraries for machine learning

sklearn.preprocessing

  • This package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for downstream processing

StandardScaler

  • Standardize features by removing the mean and scaling to unit variance
  • This operation is performed feature-wise in an independent way (see the sketch below)
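
A short sketch (with made-up numbers) of this feature-wise behaviour: each column is standardized using only its own mean and standard deviation, which matches the formula above applied column by column:

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical array with two features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

scaled = StandardScaler().fit_transform(X)

# Manual per-column standardization gives the same result
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(scaled, manual))  # True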

fit_transform

  • Fit to data, then transform it
  • The fit step computes the mean and variance of each feature present in the data
  • The transform step then standardizes every feature using its own mean and variance
  • Once the mean and standard deviation of a feature F have been computed, the data points of feature F are transformed immediately (see the sketch after the code below)
import pandas as pd
from sklearn import preprocessing

# Read the dataset from a CSV file
df_zs1 = pd.read_csv("F:/SRIHER/2021-2022/Quarter - 3/Advacned Python/Module - 1/Dataset/D9.csv")

# Create a StandardScaler object
sca = preprocessing.StandardScaler()

# Invoke fit_transform() on the StandardScaler object;
# the result is a NumPy array of standardized feature values
df_zs1_after_standardization = sca.fit_transform(df_zs1)

print("After Data Standardization")
print(df_zs1_after_standardization)
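
As a follow-up sketch (using made-up numbers, not the D9.csv data), fit and transform can also be called separately: fit learns each feature's mean and variance, and transform then reuses those stored statistics, even on new rows:

from sklearn.preprocessing import StandardScaler
import pandas as pd

# Hypothetical training data and new data with a single feature "f"
train = pd.DataFrame({"f": [10.0, 20.0, 30.0, 40.0]})
new = pd.DataFrame({"f": [25.0, 35.0]})

sc = StandardScaler()
sc.fit(train)                  # learns the mean and variance of "f"
print(sc.mean_, sc.var_)       # [25.] [125.]

# New rows are scaled with the statistics learned from the training data
print(sc.transform(new))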
