Ever wonder how to scale the process of training machine learning models without having to collect a large new dataset each time? Transfer learning is a machine learning technique for solving a task quickly by leveraging knowledge gained from solving a related task. Pre-trained models can be repurposed in a variety of ways, depending on how closely the tasks are related, so that only a small number of labeled examples from the new task are needed. Transfer learning can be a powerful tool for data scientists and engineers alike, enabling those without the means to train a model from scratch to benefit from the features learned by deep models.
Supervised learning is the problem of learning a function that maps inputs (observations) to outputs (labels) based on example pairs. Transfer learning is a variant of supervised learning that we can use when faced with a task that has only a limited number of these labeled examples. Or, if data scarcity is not an issue, we can use transfer learning to avoid spending the considerable resources required to train a data-hungry model.
Training data can be scarce when labeled examples are difficult or expensive to collect or annotate, yet the task may still require a large (and therefore data-hungry) machine learning model to solve it. In such cases, there is often not enough data to train a model from scratch to an acceptable level of accuracy (or another performance criterion).
To overcome data scarcity or to avoid training a model from scratch, we can leverage knowledge gained from training a model on a related task (the source task), for which there are many labeled examples, to solve the original task at hand (the target task). This is the main conceit of transfer learning, and it is often successful when the source and target tasks require similar information to solve.
Diagram of the general transfer learning approach. We are interested in leveraging knowledge contained in a model trained on one task to inform a model used to solve another.
For example, machine learning models trained on images learn similar features (edges, corners, gradients, simple shapes, etc.) from different image datasets, suggesting that these features can be reused to solve other image recognition tasks.
Transfer learning can be further broken down depending on the similarities and differences between the source and target tasks:
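A basic approach to transfer learning with neural networks is laid out below, assuming that the source and target tasks share common feature and label spaces:
1. Pre-train a neural network on the source task, for which many labeled examples are available.
2. Keep the early layers of the pre-trained network, which hold the representation learned from the source task.
3. Fine-tune the later layers of the network on the small number of labeled examples from the target task.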
Using this approach, the neural network at the end of step #1 contains a great deal of information related to solving the source task. Step #2 “saves” the representation learned by the network in its early layers, and during step #3, it is used as a starting point for learning to solve the target task. In this way, we require only a small number of examples to fine-tune the parameters of the later layers of the network, rather than the large number of examples needed to pre-train the entire model. This basic approach can be easily adapted to suit the different kinds of transfer learning outlined in the previous section.
If the source and target tasks have similar feature spaces, we can expect that the retrained network from step #3 will be able to leverage the representation learned from the source task to solve the target task.
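The three steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production recipe: the `TinyNet` class, the `make_data` and `transfer_demo` helpers, and the two toy tasks (a source task that labels points whose coordinates share a sign, and a target task that labels the opposite pattern) are all hypothetical and chosen only to keep the example self-contained. In practice you would more likely use a deep learning framework and a published pre-trained model.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyNet:
    """Hypothetical toy network: one tanh hidden layer (the "early" layer)
    followed by a sigmoid output unit (the "later" layer)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def predict(self, x):
        h = self.hidden(x)
        return sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def train(self, data, epochs, lr, freeze_hidden=False):
        # Per-example gradient descent on the binary cross-entropy loss.
        for _ in range(epochs):
            for x, y in data:
                h = self.hidden(x)
                p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
                dz = p - y  # gradient of the loss w.r.t. the output pre-activation
                if not freeze_hidden:
                    # Update the early layer first, using the current output weights.
                    for j, row in enumerate(self.W1):
                        dpre = dz * self.w2[j] * (1.0 - h[j] ** 2)
                        for i in range(len(row)):
                            row[i] -= lr * dpre * x[i]
                        self.b1[j] -= lr * dpre
                for j in range(len(self.w2)):
                    self.w2[j] -= lr * dz * h[j]
                self.b2 -= lr * dz


def mean_loss(net, data):
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = min(max(net.predict(x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(data)


def make_data(n, rng, flip=False):
    # Toy tasks (assumption): label 1 when both coordinates share a sign;
    # flip=True gives the related task with the opposite labeling.
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        same_sign = x[0] * x[1] > 0
        data.append((x, 1.0 if same_sign != flip else 0.0))
    return data


def transfer_demo(seed=0):
    rng = random.Random(seed)
    source = make_data(400, rng)             # plentiful labeled examples
    target = make_data(20, rng, flip=True)   # scarce labeled examples

    net = TinyNet(2, 8, rng)
    net.train(source, epochs=50, lr=0.1)     # step 1: pre-train on the source task
    saved_W1 = [row[:] for row in net.W1]    # step 2: keep the early-layer representation
    saved_b1 = net.b1[:]
    loss_before = mean_loss(net, target)
    # Step 3: fine-tune only the later layer on the target task.
    net.train(target, epochs=200, lr=0.1, freeze_hidden=True)
    return {
        "loss_before": loss_before,
        "loss_after": mean_loss(net, target),
        "hidden_unchanged": net.W1 == saved_W1 and net.b1 == saved_b1,
    }


if __name__ == "__main__":
    print(transfer_demo())
```

Freezing the hidden layer is one design choice among several. Another common variant continues to update the early layers as well, usually with a smaller learning rate, when more target examples are available.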
A compelling example of this approach involves using the learned features of a convolutional neural network (CNN) trained to classify the ImageNet or Open Images image datasets. Here, the convolutional features learned by the CNN are treated as a general image representation and repurposed to solve image classification, scene recognition, fine-grained recognition, attribute detection, and image retrieval tasks on a diverse collection of datasets, often matching or outperforming state-of-the-art approaches trained from scratch. Many of these datasets had far fewer labeled examples than were available for ILSVRC13 (the 2013 ImageNet Large Scale Visual Recognition Challenge) training, empirically demonstrating the statistical efficiency that transfer learning offers.
As we’ve discussed, transfer learning is useful in cases where (1) we don’t have the means to curate a large enough dataset to train a model from scratch, or (2) we wish to avoid expending the computational resources or time required to train a model from scratch. In the right situations, transfer learning can open doors for more engineers to experiment with novel deep learning applications. For example, one application (see case study #1 in this article) fine-tunes a pre-trained ImageNet model to achieve high performance with few training examples in a cats vs. dogs image classification task.
Even without deep knowledge of dataset curation, model building, and optimization, existing models can be fine-tuned on a small number of labeled examples in the target task domain until a satisfactory level of performance is reached. Transfer learning enables individuals and small organizations to benefit from the representational capacity of deep models without the time or budget that training them from scratch typically requires.
Feature image via Pixabay.