Supervised vs Unsupervised Learning
Discussing the main differences between supervised, unsupervised and semi-supervised learning in Machine Learning
Introduction
In the field of Machine Learning there are two fundamental types of learning: supervised and unsupervised. Which one is suitable depends on the problem we want to solve, the questions we need to answer and the data we have access to.
The overall learning procedure therefore relies on the answers to these questions. Since those answers vary from problem to problem, we first need to clarify which learning type suits the nature of the problem we are trying to solve, before choosing a specific learning algorithm.
Supervised Learning
In supervised learning, the dataset of interest contains the explanatory variables (also known as the inputs or features) as well as the target responses (also known as the outputs or labels). Supervised algorithms attempt to learn a function that approximates the relationship between the feature values and the labels, so that it generalises well to new, unseen data.
In other words, supervised learning algorithms associate the input features of the training examples with the corresponding output labels so that they can make sufficiently accurate predictions for inputs they have not seen before. This learning method is also called learning from exemplars.
Problems that require supervised learning methods can be further grouped into classification and regression problems. In classification, the output variable (label) is a category, for example spam vs ham emails. In regression, the output variable is a real value, for example a distance or a price.
Some examples of supervised learning algorithms include Linear Regression, Random Forest, Decision Trees and Support Vector Machines.
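To make the distinction between regression and classification more concrete, here is a minimal sketch that uses only the Python standard library. The data (floor areas, prices, counts of suspicious words) and the helper names `fit_line` and `nearest_neighbour_label` are made up for illustration; in practice you would most likely reach for a library implementation of one of the algorithms listed above.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (simple linear regression)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


def nearest_neighbour_label(train_x, train_y, x):
    """Classify x with the label of the closest training example (1-nearest neighbour)."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]


# Regression: the label is a real value (hypothetical floor area -> price).
areas = [50, 60, 80, 100]
prices = [150, 180, 240, 300]
slope, intercept = fit_line(areas, prices)
predicted_price = slope * 70 + intercept  # prediction for an unseen area

# Classification: the label is a category (hypothetical word count -> spam/ham).
suspicious_word_counts = [0, 1, 2, 8, 9, 10]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]
predicted_label = nearest_neighbour_label(suspicious_word_counts, labels, 7)

print(predicted_price, predicted_label)
```

In both cases the model only ever sees pairs of features and labels during training, and is then asked about an input (an area of 70, a count of 7) that was not part of the training data.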
Unsupervised Learning
Unsupervised learning, on the other hand, is suitable for problems that require the algorithm to identify and extract similarities between the inputs so that similar inputs can be grouped together. In contrast to supervised learning, unsupervised learning methods are used when the output variables (i.e. the labels) are not provided.
Two fundamental types of unsupervised learning methods are clustering and density estimation. The former (which is probably the most commonly used) covers problems where we need to group the data into specific categories (known as clusters), while the latter involves summarising the distribution of the data.
Some examples of unsupervised learning algorithms include K-Means Clustering, Hierarchical Clustering and Principal Component Analysis (the last of which is a dimensionality reduction technique rather than a clustering method).
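The sketch below illustrates clustering and density estimation on unlabelled data, again using only the Python standard library. It is a simplified illustration under a few assumptions: the data is one-dimensional and invented, the initial centroids are chosen by hand, and the density estimate assumes the data is roughly Gaussian. The function names `k_means_1d` and `gaussian_density` are hypothetical helpers, not part of any library.

```python
from math import exp, pi, sqrt
from statistics import mean, pstdev


def k_means_1d(points, initial_centroids, n_iter=10):
    """K-means (Lloyd's algorithm) on 1-D data, starting from the given centroids."""
    centroids = list(initial_centroids)
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


def gaussian_density(sample):
    """Parametric density estimate: fit a Gaussian and return its density function."""
    mu, sigma = mean(sample), pstdev(sample)
    return lambda x: exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


# Clustering: no labels are given, only the inputs themselves.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(points, initial_centroids=[0.0, 5.0])
print(centroids, clusters)

# Density estimation: summarise how the (hypothetical) sample is distributed.
density = gaussian_density([4.0, 5.0, 6.0])
print(density(5.0) > density(7.0))  # values near the mean are more likely
```

Note that neither function is ever told which group a point "should" belong to; the structure is inferred from the inputs alone.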
Semi-Supervised Learning
There is also another type of learning, called semi-supervised learning, that comes in handy when we do not have target labels for all the examples in the training dataset. Such problems require a mixture of supervised and unsupervised learning techniques.
Very common problems that require such methods are Image Classification and Object Detection. Datasets containing images often have labels for only a subset of the examples, while the remaining ones come with no label at all.
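One simple way to combine labelled and unlabelled examples is self-training: fit a model on the labelled subset, use it to assign pseudo-labels to the unlabelled examples, then refit on everything. The sketch below shows a single round of this idea with a nearest-centroid classifier on a hypothetical one-dimensional feature. It is a toy illustration, not how image models are trained in practice, and the names `class_centroids`, `predict` and `self_train` are made up for this example.

```python
from statistics import mean


def class_centroids(xs, ys):
    """Mean feature value per class (a nearest-centroid classifier in 1-D)."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label) for label in set(ys)}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def self_train(labelled_x, labelled_y, unlabelled_x):
    """One round of self-training: pseudo-label the unlabelled data, then refit."""
    centroids = class_centroids(labelled_x, labelled_y)
    pseudo_labels = [predict(centroids, x) for x in unlabelled_x]
    all_x = list(labelled_x) + list(unlabelled_x)
    all_y = list(labelled_y) + pseudo_labels
    return class_centroids(all_x, all_y), pseudo_labels


# Hypothetical data: only two examples come with a label.
labelled_x, labelled_y = [1.0, 9.0], ["cat", "dog"]
unlabelled_x = [2.0, 3.0, 8.0, 7.0]

centroids, pseudo_labels = self_train(labelled_x, labelled_y, unlabelled_x)
print(pseudo_labels)
print(centroids)
print(predict(centroids, 4.5))
```

More careful variants typically keep only the pseudo-labels the model is confident about and repeat the process for several rounds, since wrong pseudo-labels can reinforce themselves.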
Final Thoughts
In today’s article we discussed the main differences between the two fundamental Machine Learning methods, namely supervised and unsupervised learning.
To summarise, supervised learning methods are useful when the available dataset contains both the features and the correct label for each example. Such methods are useful when we want to make some sort of prediction over the data of interest, such as classifying whether an email is spam or not. On the other hand, unsupervised learning methods come in handy when we don’t have access to the output labels and we need to categorise (or cluster) the data into groups.
It is also important to mention that these are not the only learning methods in the context of Machine Learning. Other types include Reinforcement Learning and Evolutionary Learning, both of which are beyond the scope of this article.