Autoencoders are neural networks designed for unsupervised tasks like dimensionality reduction, anomaly detection and feature extraction. They work by compressing data into a smaller form through an encoder and then reconstructing it using a decoder. The goal is to minimize the difference between the original input and its reconstruction. In this article, we'll implement a simple autoencoder in PyTorch using the MNIST dataset of handwritten digits.
Let's walk through the steps involved in the implementation.
We will be using PyTorch, including the torch.nn module for building neural networks and torch.optim for optimization. For loading and preprocessing the MNIST dataset, we will use datasets and transforms from the torchvision package. We also use Matplotlib for visualizing training progress and displaying images.
Now we will load the MNIST dataset, which contains 70,000 grayscale images of handwritten digits (0-9), each 28x28 pixels; 60,000 of them form the standard training split and 10,000 the test split. We will convert the images to tensors and create a data loader to fetch data in batches for training.
In this step we are going to define our autoencoder. It consists of two components:
Encoder: Compresses the 784-pixel image into a smaller latent representation through fully connected layers with ReLU activations, reducing the dimensionality step by step.
28*28 = 784 ==> 128 ==> 64 ==> 36 ==> 18 ==> 9
Decoder: Reconstructs the original image by expanding the latent vector back to the original size, ending with a Sigmoid activation to output pixel values between 0 and 1.
9 ==> 18 ==> 36 ==> 64 ==> 128 ==> 784 ==> 28*28 = 784
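The two layer chains above can be sketched as a small `nn.Module`. The class name `Autoencoder` is an assumption, as is the choice to leave the final encoder layer without a ReLU so the 9-dimensional latent code is unconstrained.

```python
import torch
from torch import nn


class Autoencoder(nn.Module):
    """Fully connected autoencoder: 784 -> 128 -> 64 -> 36 -> 18 -> 9 and back."""

    def __init__(self):
        super().__init__()
        # Encoder: compress the flattened 28*28 image down to 9 values
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 36), nn.ReLU(),
            nn.Linear(36, 18), nn.ReLU(),
            nn.Linear(18, 9),  # assumption: no activation on the latent code
        )
        # Decoder: expand the 9 values back to 784 pixels in [0, 1]
        self.decoder = nn.Sequential(
            nn.Linear(9, 18), nn.ReLU(),
            nn.Linear(18, 36), nn.ReLU(),
            nn.Linear(36, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 28 * 28), nn.Sigmoid(),
        )

    def forward(self, x):
        # x is expected to have shape (batch, 784)
        latent = self.encoder(x)
        return self.decoder(latent)
```

The model expects flattened inputs, so a batch of shape `(N, 1, 28, 28)` should be reshaped to `(N, 784)` before the forward pass.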
After defining the autoencoder, we create an instance of the model. We use Mean Squared Error (MSE) as the loss function since it measures how close the reconstructed images are to the original inputs. For optimization, we use the Adam optimizer with a learning rate of 0.001 and weight decay, which helps to prevent overfitting.
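A minimal sketch of this setup follows. The text does not give the weight decay value, so `WEIGHT_DECAY = 1e-8` is a placeholder assumption. A tiny stand-in model is used so the snippet runs on its own; in practice the autoencoder defined in the previous step would take its place.

```python
import torch
from torch import nn

# Stand-in model so this snippet is self-contained (assumption);
# replace it with the full autoencoder from the previous step.
model = nn.Sequential(
    nn.Linear(28 * 28, 9), nn.ReLU(),
    nn.Linear(9, 28 * 28), nn.Sigmoid(),
)

# Mean Squared Error between reconstruction and original input
loss_function = nn.MSELoss()

# Placeholder value: the article's weight decay is not stated
WEIGHT_DECAY = 1e-8

# Adam with the learning rate given in the text (0.001)
optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-3, weight_decay=WEIGHT_DECAY
)
```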
In this step the model is trained for 20 epochs. Each iteration updates the model's weights through backpropagation and an optimizer step. The loss value is recorded at every iteration, and after training a loss plot is generated to assess the model's performance over time.
Note: This snippet takes 15 to 20 minutes to execute, depending on the processor type. Set the number of epochs to 1 for quick results. Use a GPU/TPU runtime for faster computations.
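The training procedure described above could be sketched as follows. The function names `train_autoencoder` and `plot_losses` are hypothetical, the default weight decay is a placeholder because the text does not state it, and the loop runs on the CPU for simplicity.

```python
import torch
from torch import nn
import matplotlib.pyplot as plt


def train_autoencoder(model, loader, epochs=20, lr=1e-3, weight_decay=1e-8):
    """Train with MSE loss and Adam; return the loss of every iteration."""
    loss_function = nn.MSELoss()
    # weight_decay default is a placeholder assumption
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    losses = []
    model.train()
    for epoch in range(epochs):
        for images, _ in loader:  # labels are not needed
            # Flatten (N, 1, 28, 28) images to (N, 784) vectors
            images = images.flatten(start_dim=1)
            reconstructed = model(images)
            loss = loss_function(reconstructed, images)

            optimizer.zero_grad()  # clear gradients from the last step
            loss.backward()        # backpropagate the reconstruction error
            optimizer.step()       # update the weights

            losses.append(loss.item())
        print(f"Epoch {epoch + 1}/{epochs}, last batch loss: {loss.item():.6f}")
    return losses


def plot_losses(losses):
    """Plot the per-iteration loss curve and return the figure."""
    fig, ax = plt.subplots()
    ax.plot(losses)
    ax.set_xlabel("Iteration")
    ax.set_ylabel("MSE loss")
    return fig
```

With a model and a data loader in hand, `losses = train_autoencoder(model, loader, epochs=1)` gives a quick run, and `plot_losses(losses)` followed by `plt.show()` displays the curve.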
Output:
The loss curve in the image shows how the model's error decreases over training iterations. Initially the loss is high but quickly drops showing that the model is learning.
After training, it's important to see how well the autoencoder reconstructs the images. We take a batch of images and pass them through the trained model and display the original and reconstructed images side by side.
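A sketch of this comparison is shown below. The helper names `reconstruct` and `plot_reconstructions` and the default of 10 images are assumptions for illustration; the model is expected to take flattened 784-value inputs.

```python
import torch
from torch import nn
import matplotlib.pyplot as plt


def reconstruct(model, images):
    """Run a batch of (N, 1, 28, 28) images through the model without gradients."""
    model.eval()
    with torch.no_grad():
        flat = images.flatten(start_dim=1)      # (N, 784)
        return model(flat).reshape(-1, 28, 28)  # back to image shape


def plot_reconstructions(originals, reconstructions, n=10):
    """Show originals in the top row and reconstructions in the bottom row."""
    n = min(n, originals.size(0))
    # squeeze=False keeps axes two-dimensional even when n == 1
    fig, axes = plt.subplots(2, n, figsize=(n, 2), squeeze=False)
    for i in range(n):
        axes[0, i].imshow(originals[i].reshape(28, 28).numpy(), cmap="gray")
        axes[0, i].axis("off")
        axes[1, i].imshow(reconstructions[i].reshape(28, 28).numpy(), cmap="gray")
        axes[1, i].axis("off")
    return fig
```

For example, take one batch with `images, _ = next(iter(loader))`, call `plot_reconstructions(images, reconstruct(model, images))`, and then `plt.show()`.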
Output:
The top row shows the original MNIST digits and the bottom row shows their reconstructions. Some reconstructed images may look a little blurry, which is expected because the model compresses the data. This can be improved by using more advanced architectures or training longer.
You can download source code from here.