VOOZH about

URL: https://www.geeksforgeeks.org/machine-learning/implementation-of-stacking-ml/

⇱ Implementation of Stacking - ML - GeeksforGeeks


  • Courses
  • Tutorials
  • Interview Prep

Implementation of Stacking - ML

Last Updated : 10 Sep, 2025

Stacking is a technique in machine learning where we combine the predictions of multiple models to create a new model that can make better predictions than any individual model.

  • In stacking, we first train several base models (also called first-layer models) on the training data.
  • Then, a meta-model (also called final estimator) is trained using the predictions of the base models as input.
  • The core idea is that if one model is sometimes right and another model is right in other cases, combining them intelligently can improve overall accuracy.

Step 1: Importing the required Libraries

We will import pandas, matplotlib and scikit learn for data handling, visualization and modeling.

Step 2: Loading the Dataset 

We will load the dataset into a pandas DataFrame and separate features from the target variable.

  • pd.read_csv(): Reads the dataset from a CSV file.
  • drop(): Removes the target column from features.
  • df['target']: Selects the target column for prediction.

You can Download the dataset from this link Heart Dataset.

Output:

👁 Image

Step 3: Splitting the Data into Training and Testing Sets

We will split the dataset into training and testing sets so we can train models and evaluate their performance.

  • train_test_split(): Splits data into train and test sets.
  • test_size = 0.2: Specifies that 20% of the data should be used for testing, leaving 80% for training.
  • random_state = 42: Ensures reproducibility by setting a fixed seed for random number generation.

Step 4: Standardizing the Data

We will standardize numerical features so they have a mean of 0 and standard deviation of 1. This helps some models perform better.

  • StandardScaler(): Standardizes features.
  • fit_transform(): Learns scaling parameters from training data and applies them.
  • transform(): Applies learned scaling to test data.
  • var_transform: Specifies the list of feature columns that need to be standardized.
  • X_train[var_transform]: Applies the fit_transform method to standardize the selected columns in the training data.
  • X_test[var_transform]: Applies the transform method to standardize the corresponding columns in the test data using the scaling parameters from the training data.

Output:

👁 Image

Step 5: Building First Layer Estimators

We will create base models that will form the first layer of our stacking model. For this example we’ll use K-Nearest Neighbors classifier and Naive Bayes classifier.

  • KNeighborsClassifier(): A model based on nearest neighbors.
  • GaussianNB(): A Naive Bayes classifier assuming Gaussian distribution.

Step 6: Training and Evaluating KNeighborsClassifier

We will Train the KNN model and check its accuracy on the test set.

  • fit(): Trains the model.
  • predict(): Makes predictions on test data.
  • accuracy_score(): Calculates accuracy

Output:

Accuracy Score of KNeighbors Classifier: 86.88524590163934

Step 7: Training and Evaluating Naive Bayes Classifier

Similarly, we will train the Naive Bayes model and check its accuracy.

Output:

Accuracy of Naive Bayes Classifier: 86.88524590163934

Step 8: Implementing the Stacking Classifier

Now, we will combine the base models using a Stacking Classifier. The meta-model will be a logistic regression model which will take the predictions of KNN and Naive Bayes as input.

  • StackingClassifier(): Combines base models and a meta-model.
  • classifiers: List of base learners.
  • meta_classifier: Model that learns from base learners’ predictions.
  • use_probas=True: Passes probability outputs to the meta-model instead of class labels.

Step 9: Training Stacking Classifier  

Next we will rain the stacking classifier and evaluate its accuracy.

Output:

Accuracy Score of Stacked Model: 88.52459016393442

Both individual models (KNN and Naive Bayes) achieved an accuracy of approximately 86.88%, while the stacked model achieved an accuracy of around 88.52%. This shows that combining the predictions of multiple models using stacking can slightly improve overall performance compared to using a single model.

Related Articles

Comment