
Classification

Logistic Regression

Theory

Despite its name, Logistic Regression is a classification algorithm. It computes a linear combination of the input features and passes the result through the logistic (sigmoid) function, which maps any real number to a value between 0 and 1. That value is interpreted as the probability that the input belongs to the positive class of a binary outcome.

Visualization

[Figure: Logistic Regression visualization]

Mathematical Formulation

Sigmoid Function:
σ(z) = 1 / (1 + e⁻ᶻ), where z = θᵀx is the linear combination of the features x with the parameters θ
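As a minimal sketch, the sigmoid can be written in a few lines of NumPy. The function name `sigmoid` and the sample scores are illustrative choices, not part of any library.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in the open interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative scores: strongly negative, neutral, strongly positive
scores = np.array([-4.0, 0.0, 4.0])
probs = sigmoid(scores)
print(probs)  # the middle value is exactly 0.5, since sigmoid(0) = 1 / (1 + 1)
```

For very large negative inputs, `np.exp(-z)` can overflow and emit a warning; `scipy.special.expit` is a numerically stable alternative if SciPy is available.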

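Cost Function (Log Loss):
J(θ) = -(1/m) Σᵢ [yᵢ·log(ŷᵢ) + (1-yᵢ)·log(1-ŷᵢ)]
where m is the number of training examples, yᵢ ∈ {0, 1} is the true label of example i, and ŷᵢ = σ(zᵢ) is its predicted probability.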
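The log loss formula can be checked numerically with a short sketch. The helper name `binary_log_loss`, the clipping constant `eps`, and the sample labels and probabilities are assumptions made for illustration.

```python
import numpy as np

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so log(0) never occurs
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))

y = np.array([1, 0, 1, 0])
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])  # high probability on the true class
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])  # high probability on the wrong class

loss_good = binary_log_loss(y, mostly_right)
loss_bad = binary_log_loss(y, mostly_wrong)
print(f"Loss (mostly right): {loss_good:.4f}")
print(f"Loss (mostly wrong): {loss_bad:.4f}")
```

Confident wrong predictions are penalized far more heavily than confident correct ones are rewarded, which is why the second loss is much larger than the first.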

Decision: Predict 1 if σ(z) ≥ 0.5 (equivalently, if z ≥ 0), else 0
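The decision rule amounts to thresholding the predicted probabilities. The sketch below assumes a hypothetical helper `predict_labels` and made-up probabilities; the `threshold` parameter is exposed only to show that 0.5 is a convention that can be changed.

```python
import numpy as np

def predict_labels(probs, threshold=0.5):
    """Return 1 where the probability meets the threshold, else 0."""
    return (np.asarray(probs) >= threshold).astype(int)

probs = np.array([0.12, 0.50, 0.73, 0.49])

labels = predict_labels(probs)
print(labels)  # [0 1 1 0]

# A stricter threshold trades fewer positive predictions for higher confidence
strict_labels = predict_labels(probs, threshold=0.7)
print(strict_labels)  # [0 0 1 0]
```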

Code Example

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

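# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10,
                           random_state=42)

# Hold out 20% for testing; fix the seed so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)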

# Train model
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)

print(f"Training Accuracy: {train_score:.3f}")
print(f"Test Accuracy: {test_score:.3f}")

# Predict class probabilities for the first five test samples
# (columns: P(class 0), P(class 1); each row sums to 1)
probas = model.predict_proba(X_test[:5])
print("\nPrediction probabilities:")
print(probas)
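To tie the fitted model back to the formulas, the sketch below recomputes the positive-class probabilities by hand from the learned coefficients and compares them with `predict_proba`. It assumes the default binary `LogisticRegression` setup and rebuilds the dataset so the block runs on its own; the names `manual_probs` and `manual_labels` and the `random_state=42` split seed are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression().fit(X_train, y_train)

# Linear score z = x·θ + intercept for each test sample
z = X_test @ model.coef_.ravel() + model.intercept_[0]

# Apply the sigmoid by hand to get P(class 1)
manual_probs = 1.0 / (1.0 + np.exp(-z))
sk_probs = model.predict_proba(X_test)[:, 1]
probs_match = np.allclose(manual_probs, sk_probs)
print(f"Manual probabilities match predict_proba: {probs_match}")

# Apply the 0.5 decision threshold by hand
manual_labels = (manual_probs >= 0.5).astype(int)
labels_match = np.array_equal(manual_labels, model.predict(X_test))
print(f"Manual labels match predict: {labels_match}")

# Log loss on the held-out set, using scikit-learn's implementation
test_loss = log_loss(y_test, sk_probs)
print(f"Test log loss: {test_loss:.3f}")
```

Accuracy only counts whether each thresholded label is correct, while log loss also reflects how confident the probabilities were, so reporting both gives a fuller picture of the model.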