Feature Selection
What is Feature Selection
Characterize the problem with the smallest possible set of features.
Feature Selection Methods
Adding Features
New Features derived from existing features
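As an illustration of deriving a new feature from existing ones, the sketch below computes a body mass index from height and weight. The field names `height_cm`, `weight_kg`, and `bmi` are hypothetical and chosen only for this example; the source does not prescribe any particular derived feature.

```python
def add_bmi(record):
    """Return a copy of the record with a derived 'bmi' feature.

    Assumes the (hypothetical) keys 'height_cm' and 'weight_kg' are present.
    """
    height_m = record["height_cm"] / 100
    enriched = dict(record)  # leave the original record untouched
    enriched["bmi"] = record["weight_kg"] / height_m ** 2
    return enriched


person = {"height_cm": 180.0, "weight_kg": 81.0}
enriched = add_bmi(person)
print(enriched)
```

The derived value carries information that neither original feature expresses on its own, which is usually the reason for adding it.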
Removing Features
- Features that are highly correlated with one another
- Features with a lot of missing values
- Irrelevant features: ID, row number, etc.
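The three removal rules above can be sketched in plain Python. This is a minimal illustration, not a definitive procedure: the function name `select_features`, the thresholds `max_missing=0.5` and `max_corr=0.95`, and the sample columns are all assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sx * sy)


def select_features(columns, id_columns=(), max_missing=0.5, max_corr=0.95):
    """Return the names of the columns to keep.

    columns: dict mapping feature name -> list of values (None = missing).
    Thresholds are illustrative assumptions, not recommended defaults.
    """
    kept = []
    for name, values in columns.items():
        if name in id_columns:
            continue  # irrelevant identifier
        missing = sum(v is None for v in values) / len(values)
        if missing > max_missing:
            continue  # too many missing values
        redundant = False
        for other in kept:
            # Compare only rows where both features are present.
            pairs = [(a, b) for a, b in zip(values, columns[other])
                     if a is not None and b is not None]
            if len(pairs) < 2:
                continue
            xs, ys = zip(*pairs)
            if abs(pearson(xs, ys)) > max_corr:
                redundant = True  # nearly duplicates a kept feature
                break
        if not redundant:
            kept.append(name)
    return kept


data = {
    "row_id":    [1, 2, 3, 4, 5],
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # rescaled height_cm
    "income":    [None, None, None, 40.0, None],  # mostly missing
    "weight_kg": [70.0, 52.0, 88.0, 61.0, 75.0],
}
kept = select_features(data, id_columns={"row_id"})
print(kept)
```

When two features are highly correlated, this sketch keeps whichever appears first; in practice, domain knowledge should decide which one to retain.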
Combining Features
Recoding Features
Examples
- Discretization: re-format a continuous feature as a discrete one
- Customer’s age → {teenager, young, adult, senior}
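A minimal sketch of the age discretization follows. The category names come from the example above, but the cut points in `AGE_EDGES` are assumptions, since the boundaries are not specified.

```python
from bisect import bisect_right

# Hypothetical cut points in years; the boundaries are assumed for illustration.
AGE_EDGES = [20, 35, 65]
AGE_LABELS = ["teenager", "young", "adult", "senior"]


def discretize_age(age):
    """Map a numeric age to one of the discrete labels."""
    # bisect_right counts how many edges are <= age, which is the bin index.
    return AGE_LABELS[bisect_right(AGE_EDGES, age)]


ages = [16, 20, 34, 50, 71]
labels = [discretize_age(a) for a in ages]
print(labels)
```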
Breaking Up Feature
Feature Selection Summary
- Goal: Select the smallest set of features that best captures the data for the application
- Domain Knowledge is important
- Also known as “Feature Engineering”
Feature Transformation
Feature transformation maps the set of values of a feature to a new set of values, so that the data is represented in a form that is more suitable or easier to process for downstream analysis.
1) Scaling
- Changing the range of values for a feature to another specified range
- Done to avoid allowing features with large values to dominate the analysis results
Scaling to a range
Scaling to a range maps all values of a feature into a specific range, such as between 0 and 1.
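One common way to do this is min-max scaling, sketched below. The function name `min_max_scale` and the sample incomes are illustrative assumptions.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values so the smallest becomes new_min and the largest new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map everything to new_min.
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) / (hi - lo) * span for v in values]


incomes = [20000, 35000, 50000, 80000]
scaled = min_max_scale(incomes)
print(scaled)
```

Because the mapping depends on the observed minimum and maximum, a single extreme value can compress all other values into a narrow part of the range.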
Zero-Mean Normalization / Standardization
Transform the feature so that its values have zero mean and unit standard deviation, by subtracting the mean from each value and dividing by the standard deviation.
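A minimal sketch of standardization using Python's `statistics` module. It uses the population standard deviation (`pstdev`); using the sample standard deviation instead is an equally valid choice, and the sample data is made up for the example.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant feature cannot be rescaled to unit variance.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


heights = [150.0, 160.0, 170.0, 180.0, 190.0]
z = standardize(heights)
print(z)
```

After the transformation, features measured in different units (for example, centimeters and dollars) become directly comparable.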
2) Filtering
A low-pass filter removes components above a certain frequency, allowing the rest to pass through unaltered. Typical uses:
Remove grainy appearance in images
Filter noise from audio signal
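A centered moving average is one of the simplest low-pass filters and can serve as a rough sketch of the idea. Note that it only attenuates high-frequency components rather than removing them entirely, so it approximates the ideal filter described above. The function name, window size, and sample signal are assumptions.

```python
def moving_average(signal, window=3):
    """Smooth a sequence with a centered moving average (a simple low-pass filter)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        # Shrink the window at the edges instead of padding the signal.
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


# A rapidly alternating (high-frequency) signal.
noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
smooth = moving_average(noisy, window=3)
print(smooth)
```

A wider window suppresses more noise but also blurs genuine rapid changes in the data.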
3) Aggregation
Combines values for a feature in order to summarize the data or to reduce variability
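As a small illustration, the sketch below aggregates daily values into weekly totals and weekly means. The helper name `aggregate` and the sales figures are made up for the example.

```python
from statistics import fmean


def aggregate(values, group_size, how=fmean):
    """Combine consecutive groups of values into one summary value each."""
    return [how(values[i:i + group_size])
            for i in range(0, len(values), group_size)]


daily_sales = [12, 15, 9, 14, 20, 30, 28,   # week 1
               11, 13, 10, 16, 22, 31, 27]  # week 2

weekly_totals = aggregate(daily_sales, 7, how=sum)  # summarize
weekly_means = aggregate(daily_sales, 7)            # smooth out daily swings
print(weekly_totals, weekly_means)
```

The daily values range from 9 to 31, while the two weekly means are nearly identical, which shows how aggregation reduces variability.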
Feature Transformation Summary
- What: Map feature values to new set of values
- Why: Have data in format suitable for analysis
- Caveat: Take care not to filter out important characteristics of data