Preprocessing

Feature Engineering

Theory

Feature engineering is the process of creating new features or transforming existing ones to improve model performance. In practice it often matters as much as, or more than, the choice of algorithm. Good features encode the structure of the problem so that patterns become easier for the model to learn.

Visualization

[Figure: Feature Engineering visualization]

Mathematical Formulation

Common Techniques:
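• Scaling: Bring features to comparable ranges, e.g. standardization z = (x - μ) / σ
• Encoding: Convert categorical values to numerical ones (e.g. one-hot or ordinal encoding)
• Polynomial: Add powers and interaction terms, e.g. x1^2, x1*x2, x2^2 for degree 2
• Binning: Group continuous values into discrete intervals
• Domain-specific: Use expert knowledge to derive features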
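
The encoding technique in the list above can be sketched with pandas. This is a minimal illustration, not a definitive recipe: the `color` column and the `df_cat` frame are hypothetical and not part of the original example. `pd.get_dummies` creates one indicator column per category.

```python
import pandas as pd

# Hypothetical toy data; the 'color' column is an assumption for illustration.
df_cat = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot encoding: each category becomes its own 0/1 indicator column.
encoded = pd.get_dummies(df_cat, columns=["color"], dtype=int)

print(encoded)
```

One-hot encoding suits categories with no natural order. For ordered categories (e.g. small < medium < large), an ordinal mapping to integers may be a better fit.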

Code Example

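import pandas as pd
from sklearn.preprocessing import StandardScaler, PolynomialFeatures

# Small illustrative dataset (hypothetical values) so the example runs as-is
df = pd.DataFrame({
    'date': ['2024-01-05', '2024-01-06', '2024-02-11', '2024-03-20'],
    'age': [25, 42, 67, 30],
    'feature1': [1.0, 2.0, 3.0, 4.0],
    'feature2': [10.0, 0.0, 5.0, 2.0],
})
X = df[['feature1', 'feature2']]

# Numerical scaling: zero mean and unit variance per column
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Polynomial features: squares and pairwise products of the columns
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Date features
df['date'] = pd.to_datetime(df['date'])
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['day_of_week'] = df['date'].dt.dayofweek  # Monday=0, Sunday=6
df['is_weekend'] = df['day_of_week'].isin([5, 6]).astype(int)

# Binning: with include_lowest=True the intervals are [0, 30], (30, 50], (50, 100]
df['age_group'] = pd.cut(df['age'],
                         bins=[0, 30, 50, 100],
                         labels=['young', 'middle', 'senior'],
                         include_lowest=True)

# Interaction features
df['interaction'] = df['feature1'] * df['feature2']
# The small constant avoids division by zero; rows where feature2 is 0 get very large ratios
df['ratio'] = df['feature1'] / (df['feature2'] + 1e-10)

print("Feature engineering complete!")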