Preprocessing
Feature Engineering
Theory
Feature engineering is the process of creating new features or transforming existing ones to improve model performance. In practice it often improves results more than switching to a different algorithm does. Good features encode what matters about the problem in a form the model can use directly, which makes the underlying patterns easier to learn.
Mathematical Formulation
Common Techniques:
• Scaling: Bring numerical features onto similar ranges
• Encoding: Convert categorical values to numerical ones
• Polynomial: Create powers and interaction terms of existing features
• Binning: Group continuous values into discrete intervals
• Domain-specific: Use expert knowledge to derive new features
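Two of the techniques above, scaling and encoding, can be sketched in a few lines. This is a minimal illustration rather than a definitive recipe: the `toy` DataFrame, its column names `height_cm` and `color`, and all values are made up for the example, and standardization is shown as one common choice of scaling.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only
toy = pd.DataFrame({
    'height_cm': [150.0, 160.0, 170.0, 180.0],
    'color': ['red', 'green', 'red', 'blue'],
})

# Scaling (standardization): z = (x - mean) / std, using the population std (ddof=0)
x = toy['height_cm'].to_numpy()
z = (x - x.mean()) / x.std()

# Encoding: one indicator column per category (one-hot encoding)
encoded = pd.get_dummies(toy['color'], prefix='color')

print(z)
print(encoded.columns.tolist())
```

After standardization the column has mean 0 and standard deviation 1. StandardScaler applies the same formula to each column, also with the population standard deviation.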
Code Example
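import pandas as pd
from sklearn.preprocessing import StandardScaler, PolynomialFeatures

# Small illustrative dataset (made-up values) so the example runs end to end
df = pd.DataFrame({
    'date': ['2024-01-05', '2024-01-06', '2024-01-07', '2024-01-08'],
    'age': [22, 35, 58, 41],
    'feature1': [1.0, 2.0, 3.0, 4.0],
    'feature2': [2.0, 5.0, 4.0, 8.0],
})
X = df[['feature1', 'feature2']]

# Numerical scaling: zero mean and unit variance per column
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Polynomial features: squares and pairwise products of the columns
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Date features
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day_of_week'] = df['date'].dt.dayofweek  # Monday=0, Sunday=6
df['is_weekend'] = df['day_of_week'].isin([5, 6]).astype(int)

# Binning: right-closed intervals (0, 30], (30, 50], (50, 100]; include_lowest=True also keeps age 0
df['age_group'] = pd.cut(df['age'],
                         bins=[0, 30, 50, 100],
                         labels=['young', 'middle', 'senior'],
                         include_lowest=True)

# Interaction features
df['interaction'] = df['feature1'] * df['feature2']
# The small epsilon avoids division by zero, but the ratio becomes very large when feature2 is 0
df['ratio'] = df['feature1'] / (df['feature2'] + 1e-10)

print("Feature engineering complete!")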