VOOZH about

URL: https://www.analyticsvidhya.com/blog/2015/08/optimal-weights-ensemble-learner-neural-network/

⇱ Find Optimal Weights of Ensemble Learner with Neural Network


India's Most Futuristic AI Conference Is Back – Bigger, Sharper, Bolder

  • d
  • :
  • h
  • :
  • m
  • :
  • s

Reading list

Finding Optimal Weights of Ensemble Learner using Neural Network

Tavish Srivastava Last Updated : 26 Jun, 2020
4 min read

Introduction

Encountering ensemble learning algorithm in winning solutions of data science competitions has become a norm now. The ability to train multiple learners on a set of hypothesis, not only adds robustness to the model, but also enables it to deliver highly accurate predictions. In case you missed, I would recommend reading Basics of Ensemble Learning Explained in Simple English before you go forward.

While building ensemble models, one of most common challenge that people face is to find optimal weights. Few of them fight hard to solve this challenge and the not so brave ones, convince themselves to apply simple bagging, which assumes equal weight for all models and takes the average of all predicted values.

It often works well because it removes variance error from individual models. However, as we know, assigning equal weights is not the best approach to obtain the best combination of models. What could be an alternate way? In this article, I’ve solved the problem of finding out the optimal weights in ensemble learners using neural network in R Programming.

πŸ‘ Finding Optimal Weights of Ensemble Learner using Neural Network

Let’s consider a simple problem

Imagine you have built 3 models on a given (hypothetical) data set. Each of these models predicts probability of output for an event.

In the following figure : Model 1, Model 2 and Model 3 are the three predictive models. Each of them have their own uniqueness and can work as a team to perform even better than individually best model.

πŸ‘ Screen Shot 2015-08-22 at 6.09.10 pm
We initially assume a 33.33% weight for each of the model and build an Ensemble model. Here, the challenge is to optimize these weights w1, w2 and w3 in such a fashion as to build a highly powerful ensemble model.

What could be a traditional approach to this problem?

Assume p1 , p2 and p3 are three outputs from the three models respectively. We need to optimize w1, w2 and w3 to optimize an objective function. Let’s try to write down the constraints and objective function mathematically :

Constraint :

w1 + w2 + w3 = 1
p = w1 * p1 + w2 * p2 + w3 * p3

Objective function (Maximize Likelihood function) :

Maximize (Product over all observations [ (p) ^ (y) * (1-p) ^ (1-y) ] )
where
y is the response flag for the observation
p is the predicted probability through ensemble model
p1, p2 and p3 : predicted probabilities from individual model
w1, w2 and w3 : weights given to each model

This is a classic case of simplex optimization. However, with a huge number of models to bag, and diving into mathematical formulas every time can be stressful and time consuming at times. Hence, we need a smarter method here.

We’ll now learn to find these weights without getting into such mathematical formulation using neural network implementation.

How to find optimal weights using Neural Network?

I understand, neural network implementation can be quite overwhelming at times. Hence, to solve the current case in hand, we’ll not deep dive into the complex concepts of deep neural network.

Basically, neural network operates on finding weights for interlink between ( input variable to hidden node) and ( hidden node to output node). Our objective here is to find the right combination of weights from input nodes directly to the output node.

Here is an easy and diligent way to accomplish this task. We restrict the number of hidden nodes to one. This will automatically adjust the weight from hidden node to output node as 1. This implies that the hidden node and the output node is no longer different.

Here is a simplistic schema to represent what we just discussed :

πŸ‘ Screen Shot 2015-08-22 at 6.40.37 pm

So we found a way! This will automatically accomplish what we were trying to do using simplex equations. Let’s put this formulation down to a R code

# x is a matrix of multiple predictions coming from different learners
#y is a vector of all output flags
#x_test is a matrix of multiple predictions on an unseen sample
x <- as.matrix(prediction_table)
y <- as.numeric(output_flags)
nn <- dbn.dnn.train(x,y,hidden = c(1),
activationfun = "sigm",learningrate = 0.2,momentum = 0.8)
nn_predict <- n.predict(nn,x)
nn_predict_test <- n.predict(nn,x_test)

End Notes

I have implemented this logic on multiple Kaggle problems and found that the results are quite encouraging. Here is one such example from a recent competition:

  • Simple logistic model at score of 100
  • Best machine learning model at score of 107
  • Simple bagging techniques of 110, and finally
  • Optimized weights using neural network of 113.

In this article, I’ve explained the method to find optimal weight in an ensemble model using a traditional approach and neural network implementation (recommended).

Did you find this article useful? Have you tried anything else to find optimal weights? I’ll be happy to hear from you in the comments section below.

If you like what you just read & want to continue your analytics learning, subscribe to our emailsfollow us on twitter or like our facebook page.

Tavish Srivastava, co-founder and Chief Strategy Officer of Analytics Vidhya, is an IIT Madras graduate and a passionate data-science professional with 8+ years of diverse experience in markets including the US, India and Singapore, domains including Digital Acquisitions, Customer Servicing and Customer Management, and industry including Retail Banking, Credit Cards and Insurance. He is fascinated by the idea of artificial intelligence inspired by human intelligence and enjoys every discussion, theory or even movie related to this idea.

Login to continue reading and enjoy expert-curated content.

Free Courses

Ensemble Learning and Ensemble Learning Techniques

Learn ensemble learning, its techniques, and how it works in this course!

Bagging and Boosting ML Algorithms

Explore Bagging and Boosting to understand advanced ML algorithms.

Naive Bayes from Scratch

Master NaΓ―ve Bayes for ML: Build classifiers, analyze data, and apply Bayes.

Dimensionality Reduction for Machine Learning

Master key dimensionality reduction techniques for ML success!

Responses From Readers

Samrat Gupta

Hi Tavish, I found this article very interesting. However I need some more understanding on the matrices x,y and x_test used in your code. For example I have 3 models and their corresponding predicted values in the following format: Actual Value | Predicted Value by Model1 | Predicted Value by Model2 | Predicted Value by Model3 A | A1 | A2 | A3 B | B1 | B2 | B3 C | C1 | C2 | C3 - - - In such a case what will be the contents of x,y and test_x?

123 1
Tavish Srivastava

Hi Samrat, In this case x will be A1,A2,A3 and y will be column A. To build test you need an unseen population. Hope this clarifies. Tavish

123 456

Hi Tavish, Excellent Write-up. i have a Quick question. How to set activation function for Numerical Predicted values( it might be Time Series or Regression Output). i had tried "linear" and "tanh" also. but i 'm getting only Categorical Output. like 1 or 0. Please help me this out

123 2
Tavish Srivastava

Hi Karthi, If you are using neural net package, then using the code mentioned in the article will give you probability prediction.

123 456

Thanks for the explanation Tavish. I'm not familiar with the R package. " nn <- dbn.dnn.train(x,y,hidden = c(1), activationfun = "sigm",learningrate = 0.2,momentum = 0.8) " According to the code above, the activiationfun is "sigm" for both hidden node and output node. And the weight W from hidden node to output node is fixed to 1 , which means W will not be updated during the backpropagation. Am I right ? or the network setting is not like this?

123 456

Thanks lot Tavish.

Flagship Programs

GenAI Pinnacle Program| GenAI Pinnacle Plus Program| AI/ML BlackBelt Program| Agentic AI Pioneer Program

Free Courses

Generative AI| DeepSeek| OpenAI Agent SDK| LLM Applications using Prompt Engineering| DeepSeek from Scratch| Stability.AI| SSM & MAMBA| RAG Systems using LlamaIndex| Building LLMs for Code| Python| Microsoft Excel| Machine Learning| Deep Learning| Mastering Multimodal RAG| Introduction to Transformer Model| Bagging & Boosting| Loan Prediction| Time Series Forecasting| Tableau| Business Analytics| Vibe Coding in Windsurf| Model Deployment using FastAPI| Building Data Analyst AI Agent| Getting started with OpenAI o3-mini| Introduction to Transformers and Attention Mechanisms

Popular Categories

AI Agents| Generative AI| Prompt Engineering| Generative AI Application| News| Technical Guides| AI Tools| Interview Preparation| Research Papers| Success Stories| Quiz| Use Cases| Listicles

Generative AI Tools and Techniques

GANs| VAEs| Transformers| StyleGAN| Pix2Pix| Autoencoders| GPT| BERT| Word2Vec| LSTM| Attention Mechanisms| Diffusion Models| LLMs| SLMs| Encoder Decoder Models| Prompt Engineering| LangChain| LlamaIndex| RAG| Fine-tuning| LangChain AI Agent| Multimodal Models| RNNs| DCGAN| ProGAN| Text-to-Image Models| DDPM| Document Question Answering| Imagen| T5 (Text-to-Text Transfer Transformer)| Seq2seq Models| WaveNet| Attention Is All You Need (Transformer Architecture) | WindSurf| Cursor

Popular GenAI Models

Llama 4| Llama 3.1| GPT 4.5| GPT 4.1| GPT 4o| o3-mini| Sora| DeepSeek R1| DeepSeek V3| Janus Pro| Veo 2| Gemini 2.5 Pro| Gemini 2.0| Gemma 3| Claude Sonnet 3.7| Claude 3.5 Sonnet| Phi 4| Phi 3.5| Mistral Small 3.1| Mistral NeMo| Mistral-7b| Bedrock| Vertex AI| Qwen QwQ 32B| Qwen 2| Qwen 2.5 VL| Qwen Chat| Grok 3

AI Development Frameworks

n8n| LangChain| Agent SDK| A2A by Google| SmolAgents| LangGraph| CrewAI| Agno| LangFlow| AutoGen| LlamaIndex| Swarm| AutoGPT

Data Science Tools and Techniques

Python| R| SQL| Jupyter Notebooks| TensorFlow| Scikit-learn| PyTorch| Tableau| Apache Spark| Matplotlib| Seaborn| Pandas| Hadoop| Docker| Git| Keras| Apache Kafka| AWS| NLP| Random Forest| Computer Vision| Data Visualization| Data Exploration| Big Data| Common Machine Learning Algorithms| Machine Learning| Google Data Science Agent
πŸ‘ Av Logo White

Continue your learning for FREE

Forgot your password?
πŸ‘ Av Logo White

Enter OTP sent to

Edit

Wrong OTP.

Enter the OTP

Resend OTP

Resend OTP in 45s

πŸ‘ Popup Banner
πŸ‘ AI Popup Banner