If you’re working in machine learning with Python, Scikit-learn (sklearn) is one of the most powerful and beginner-friendly libraries you’ll ever use. It provides tools for data preprocessing, model training, model selection, and evaluation.
This Scikit-learn cheatsheet is your one-stop guide, covering everything from the basics to advanced techniques. Bookmark it and you’ll never get stuck again!
Table of Contents
1. Getting Started with Scikit-learn
2. Data Preprocessing
3. Machine Learning Models
4. Model Evaluation
5. Model Selection & Hyperparameter Tuning
6. Advanced Topics
7. Model Deployment
1. Getting Started with Scikit-learn
Installation
pip install scikit-learn
Importing
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import datasets
Loading Datasets
# Built-in datasets
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
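Since pandas is imported above, you can also load built-in datasets directly as DataFrames; a minimal sketch using the same iris dataset (as_frame=True needs scikit-learn >= 0.23):
iris_frame = datasets.load_iris(as_frame=True)
df = iris_frame.frame                              # features and target in a single DataFrame
X_df, y_df = iris_frame.data, iris_frame.target    # DataFrame of features, Series of targets
print(df.head())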
2. Data Preprocessing
Handling Missing Data
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X = imputer.fit_transform(X)
Feature Scaling
from sklearn.preprocessing import StandardScaler, MinMaxScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
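Note that calling fit_transform on the full dataset leaks test-set statistics into training; the usual pattern is to fit the scaler on the training split only and reuse it on the test split, as in this minimal sketch:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)         # reuse the same statistics on test data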
Encoding Categorical Data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# Label Encoding
le = LabelEncoder()
y_encoded = le.fit_transform(y)
# One-Hot Encoding (expects a 2D array of categorical columns; sparse_output needs scikit-learn >= 1.2, older versions use sparse=False)
ohe = OneHotEncoder(sparse_output=False)
X_encoded = ohe.fit_transform(X)
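On real tabular data you usually want to encode only the categorical columns and scale the numeric ones; here’s a minimal sketch with a hypothetical two-column DataFrame (the column names are made up for illustration):
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
# Hypothetical mixed-type data
df = pd.DataFrame({'age': [25, 32, 47, 51], 'city': ['Delhi', 'Paris', 'Delhi', 'Tokyo']})
ct = ColumnTransformer([
    ('num', StandardScaler(), ['age']),                     # scale numeric columns
    ('cat', OneHotEncoder(sparse_output=False), ['city'])   # one-hot encode categorical columns
])
X_mixed = ct.fit_transform(df)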
3. Machine Learning Models
Classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
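Most classifiers also expose class probabilities via predict_proba, which are often more useful than hard labels (e.g. for custom thresholds or ranking); a quick sketch with the random forest fitted above:
y_proba = clf.predict_proba(X_test)   # one probability per class for each test sample
print(y_proba[:5])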
Regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
Clustering
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
labels = kmeans.labels_
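KMeans does not choose the number of clusters for you; one common heuristic (just a sketch, silhouette scores are another option) is to compare inertia across several values of k:
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(k, km.inertia_)   # within-cluster sum of squares; always decreases as k grows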
Dimensionality Reduction
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
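A quick sanity check after PCA is how much of the original variance the kept components explain:
print(pca.explained_variance_ratio_)         # variance share of each component
print(pca.explained_variance_ratio_.sum())   # total variance retained by the 2 components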
4. Model Evaluation
Classification Metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
Regression Metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
print(mean_squared_error(y_test, y_pred))
print(r2_score(y_test, y_pred))
Cross-Validation
from sklearn.model_selection import cross_val_score
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
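By default cross_val_score uses the estimator’s default scorer (accuracy for classifiers); you can pass any built-in metric name via scoring, for example:
scores_f1 = cross_val_score(clf, X, y, cv=5, scoring='f1_macro')
print(scores_f1.mean())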
5. Model Selection & Hyperparameter Tuning
Grid Search
from sklearn.model_selection import GridSearchCV
param_grid = {'n_estimators': [100, 200], 'max_depth': [5, 10]}
grid = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
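After fitting, the search object keeps the best model refit on the whole training set, so you can use it directly; a short usage sketch:
best_model = grid.best_estimator_          # best estimator, already refit on X_train
print(grid.best_score_)                    # mean cross-validated score of the best params
print(best_model.score(X_test, y_test))    # accuracy on the held-out test set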
Random Search
from sklearn.model_selection import RandomizedSearchCV
# param_grid above has only 4 combinations, so n_iter is effectively capped at 4; random search pays off with larger spaces
random_search = RandomizedSearchCV(RandomForestClassifier(), param_grid, n_iter=10, cv=5)
random_search.fit(X_train, y_train)
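Random search is most useful when you sample from distributions instead of short lists; a minimal sketch using scipy.stats (scipy is already a scikit-learn dependency):
from scipy.stats import randint
param_dist = {'n_estimators': randint(100, 500), 'max_depth': randint(3, 15)}
random_search = RandomizedSearchCV(RandomForestClassifier(), param_dist, n_iter=10, cv=5, random_state=42)
random_search.fit(X_train, y_train)
print(random_search.best_params_)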
6. Advanced Topics
Pipelines
from sklearn.pipeline import Pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression())
])
pipeline.fit(X_train, y_train)
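Pipelines drop straight into GridSearchCV; hyperparameters are addressed as <step_name>__<param>, e.g. clf__C for the logistic regression step above:
param_grid_pipe = {'clf__C': [0.1, 1, 10]}
grid_pipe = GridSearchCV(pipeline, param_grid_pipe, cv=5)
grid_pipe.fit(X_train, y_train)
print(grid_pipe.best_params_)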
Feature Selection
from sklearn.feature_selection import SelectKBest, chi2
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)
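To see which columns survived, get_support() returns a boolean mask over the original features; a quick sketch using the iris feature names loaded earlier:
mask = selector.get_support()                # True for the k selected features
print(np.array(iris.feature_names)[mask])    # names of the selected features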
Handling Imbalanced Data
from sklearn.utils.class_weight import compute_class_weight
weights = compute_class_weight('balanced', classes=np.unique(y), y=y)
print(weights)
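Most classifiers can apply this weighting for you via the class_weight parameter (or take a dict built from the weights computed above); a minimal sketch:
clf_balanced = LogisticRegression(class_weight='balanced', max_iter=1000)
clf_balanced.fit(X_train, y_train)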
Ensemble Learning (Stacking)
from sklearn.ensemble import StackingClassifier
estimators = [
    ('rf', RandomForestClassifier(n_estimators=10)),
    ('svc', SVC(probability=True))
]
stack = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
7. Model Deployment
Save and Load Models
import joblib
joblib.dump(clf, 'model.pkl')
loaded_model = joblib.load('model.pkl')
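The reloaded estimator behaves exactly like the original; a quick sanity check (assuming the X_test/y_test split from earlier):
print(loaded_model.predict(X_test[:5]))     # predictions from the reloaded model
print(loaded_model.score(X_test, y_test))   # should match the original clf's test accuracy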
Conclusion
This Scikit-learn Cheatsheet 2025 has walked you through everything from basic data preprocessing to advanced model tuning and deployment. Whether you’re a beginner just getting started or an advanced ML engineer, this guide is your quick reference for all things Scikit-learn.
Bookmark this blog and the next time you’re stuck, you’ll know exactly where to look!
Related Reads
- NumPy Cheatsheet 2025: From Basics to Advanced in One Guide
- Pandas Cheatsheet: The Ultimate Guide for Data Analysis in Python
- 10 Best AI Engineering Books to Read in 2025
- Mastering GPT-5 Prompting: The Complete Guide to Smarter AI Outputs
- Mastering Context Engineering: 6 Proven Strategies to Make AI Agents Smarter and More Reliable