The Power of (Local Binary) Patterns

Edge- and Texture-Based Clustering

Aaron Dougherty

Nov 2, 2020

Image by H Heyerlein on Unsplash

Hands-on Tutorials

Unsupervised Bayesian Inference (reducing dimensions and unearthing features)

(Left) Photo by Bradley Brister on [Unsplash](http://unsplash.com) | (Right) Photo by Waldemar Brandt on Unsplash

Can you tell the difference between these images? Easy, right? And I’m sure those of you familiar with classical machine learning methods could whip up a classification algorithm to automate this process in a heartbeat.

But what about when we don’t have any labels? Hopefully, for some of you, some fancy unsupervised clustering algorithms come to mind. But how do we go about achieving this with hundreds of 2GB+ images on a 13-inch MacBook Pro (as I am currently trying to do) when working in Computational Pathology, or some other form of complex medical image analysis?

Thankfully, there are a whole host of exciting and useful techniques to help us achieve this!

In this series, Unsupervised Bayesian Inference (reducing dimensions and unearthing features), we will be exploring a range of Unsupervised Bayesian statistical models, which act as the Machine Learning equivalent of a triple threat:

· Dimensionality reduction

· Feature extraction

· Clustering

· Uncertainty quantification

Okay, so there are four, but quadruple threat doesn’t quite have the same ring to it…

But we can probably combine the first two.

See, learning how to reduce dimensionality already!

So without further ado, let us jump into our first article. The incredible power of Local Binary Patterns (or LBP, to save us some time) lies in its ability to pick up tiny differences in texture and topography, identifying key features with which we can then tell images apart and group those of the same type, with no painstaking labelling required.

In this article, we will cover the key concepts behind LBP; the power this surprisingly simple algorithm holds; and the many great benefits we can acquire from its implementation.

The goal of LBP is to encode geometric features of an image by detecting edges, corners, raised or flat areas and hard lines, allowing us to generate a feature vector representation of an image or group of images.

By encoding the features shared by a group of images (or found in a single image) of a given, unknown class, we can compare them with the features of another image. Through this comparison, we can determine how similar an unseen image is to our target representation, and estimate how likely it is that the image presented is of the same variety or type as the target image.

Although fairly simple, and certainly nothing new (LBP has been around since the 1990s [1], and its widely used rotation-invariant extension was published in 2002 [2]), this unsupervised method is capable of differentiating between surprisingly similar images with notable accuracy, using only minimal data. As an added bonus, this is achieved without any need to train a model (in the traditional sense, anyway): simply building a comparative representation is enough to use this technique. Fortunately, for those of us who are less familiar with vectorised representations, these representations can be displayed in a far more user-friendly form: histograms.

To explain why this is the case, let us delve further into how LBP works.

LBP can be split into 4 key steps:

· Simplification

· Binarisation

· PDF (probability density function) calculation

· Comparison (of the above functions)

Simplification

Before we begin creating our LBP, we first need to simplify our image. This is our data preprocessing step. In essence, this is our first step in dimensionality reduction, which allows our algorithm to focus purely on the local differences in luminance, rather than worrying about any other potential features.

Therefore, we first convert our image into a single-channel (typically greyscale) representation. This creates our ‘window’, from which we can create an LBP feature vector representing the image.
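As a minimal sketch, assuming an 8-bit RGB NumPy array as input, the conversion is just a weighted sum over the colour channels. The weights below are the common ITU-R BT.601 luma weights (my choice, not necessarily the author’s); Scikit-Image’s skimage.color.rgb2gray does the same job with slightly different weights. The helper name to_single_channel is hypothetical.

```python
import numpy as np

# ITU-R BT.601 luma weights for R, G and B (an assumption; skimage.color.rgb2gray
# uses slightly different weights, and for LBP the exact choice rarely matters much).
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_single_channel(image: np.ndarray) -> np.ndarray:
    """Collapse an (H, W, 3) RGB image to an (H, W) float greyscale array."""
    arr = np.asarray(image)
    img = arr.astype(np.float64)
    if np.issubdtype(arr.dtype, np.integer):
        img /= 255.0  # assumption: integer input is 8-bit, so rescale to [0, 1]
    if img.ndim == 2:
        return img  # already single channel
    # Weighted sum over the last (channel) axis; any alpha channel is ignored.
    return img[..., :3] @ LUMA_WEIGHTS


rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[..., 0] = 255  # a pure red test image
grey = to_single_channel(rgb)
print(grey.shape, grey[0, 0])
```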

It is important to do this with each of our target images (the images representing our types or groups) as well as with our input images. The great thing about this is that we can work with a single representative image of a given type or group, allowing us to classify inputs from a small dataset in an unsupervised manner.

Binarisation

Next, we calculate the relative local luminance changes. This allows us to create a local, low dimensional, binary representation of each pixel based on luminance.

For each pixel in our window, we take k surrounding pixels from its local ‘neighbourhood’ and compare each one in turn with the central pixel, moving either clockwise or anticlockwise. The direction and starting point are irrelevant, so long as we use the same ones for every pixel. For each comparison, we output a binary value: 0 if the central pixel’s intensity (scalar value) is greater than that of the comparison pixel, and 1 if it is less (ties conventionally count as 1). Together, these comparisons form a k-bit binary number, which we convert to base 10 to give a new intensity for that pixel. We repeat this for every pixel, so that each new intensity encodes how the pixel compares with its neighbours (with values ranging from 0 to 2^k − 1). This leaves us with our reduced-dimension LBP representation of the original image.
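To make the binarisation step concrete, here is a minimal NumPy sketch of the simplest case: k = 8 neighbours on the 3×3 square around each pixel (radius 1, no interpolation). In the notation of Ojala et al. [2], each code is the sum over p of s(g_p − g_c)·2^p, where g_c is the centre intensity, g_p the p-th neighbour, and s(x) = 1 if x ≥ 0, else 0. The function name basic_lbp and the clockwise-from-top-left ordering are my own choices for illustration, and border pixels are simply skipped.

```python
import numpy as np

# (row, col) offsets of the 8 neighbours, walked clockwise from the top-left.
# Any fixed starting point and direction works, as long as it is used everywhere.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def basic_lbp(image: np.ndarray) -> np.ndarray:
    """8-neighbour, radius-1 LBP codes of a 2D greyscale image.

    A neighbour at least as bright as the centre contributes a 1 bit, so every
    code lies in 0..2**8 - 1. The output is 2 pixels smaller in each dimension
    because border pixels (which lack a full neighbourhood) are skipped.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    centre = img[1:h - 1, 1:w - 1]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(OFFSETS):
        # Shifted view: the neighbour at offset (dr, dc) for every centre pixel.
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes += (neighbour >= centre).astype(np.int64) << bit
    return codes


patch = np.array([[5, 9, 1],
                  [4, 4, 6],
                  [7, 2, 3]])
print(basic_lbp(patch))  # → [[203]]
```

Here the neighbours 5, 9, 6, 7 and the tied 4 are at least as bright as the centre (4), setting bits 0, 1, 3, 6 and 7: 1 + 2 + 8 + 64 + 128 = 203.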

In practice, we first define our LBP parameters: the cell size, the radius and the number of comparison points (k). The cell size refers to an arbitrary M×N pixel size, which we may use to further split our window. The above computation can then be applied to each cell independently, rather than to the whole window. This allows for faster, more efficient parallel processing of the image, and also opens up the possibility of using overlapping cells to pick up on local patterns which might otherwise be split too harshly at cell boundaries. As standard (and somewhat arbitrarily), we set the cell size to 16×16 pixels.
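The cell split itself is simple to sketch. The hypothetical helper iter_cells below walks an image in M×N cells; passing a step smaller than the cell size produces overlapping cells. Dropping partial cells at the right and bottom edges is an assumption made to keep the example short.

```python
from typing import Iterator, Optional, Tuple

import numpy as np


def iter_cells(image: np.ndarray,
               cell: Tuple[int, int] = (16, 16),
               step: Optional[Tuple[int, int]] = None) -> Iterator[np.ndarray]:
    """Yield M x N cells of a 2D array, left to right and top to bottom.

    step defaults to the cell size (non-overlapping cells); a smaller step
    produces overlapping cells. Partial cells at the edges are dropped.
    """
    m, n = cell
    step_r, step_c = step if step is not None else cell
    h, w = image.shape[:2]
    for r in range(0, h - m + 1, step_r):
        for c in range(0, w - n + 1, step_c):
            yield image[r:r + m, c:c + n]


window = np.zeros((32, 48))
print(len(list(iter_cells(window))))               # → 6 non-overlapping 16x16 cells
print(len(list(iter_cells(window, step=(8, 8)))))  # → 15 half-overlapping cells
```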

The radius defines the size of the neighbourhood from which we sample comparison pixels for each central pixel when generating our LBP representation (i.e. how many pixels away from the central pixel each comparison pixel lies). This defines exactly what we mean by ‘local’.

Finally, k refers to the number of points within our neighbourhood to sample. Typically, this is 8, generating an 8-bit value for each pixel, so our final pixel intensities range between 0 and 255 (2^8 − 1).
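For a radius R and k sampled points, the neighbours are usually spread evenly around a circle centred on the pixel. The sketch below (hypothetical helper circular_offsets) computes those offsets using one common convention, with the first point directly to the right of the centre and the rest going anticlockwise. Positions that fall between pixels are normally filled in by bilinear interpolation, which is left out here.

```python
import numpy as np


def circular_offsets(k: int = 8, radius: float = 1.0) -> np.ndarray:
    """(row, col) offsets of k points evenly spaced on a circle of the given radius.

    Point p sits at (-radius * sin(2*pi*p/k), radius * cos(2*pi*p/k)): p = 0 is
    to the right of the centre, and because image rows grow downwards, the
    negative row offsets make the points run anticlockwise on screen.
    """
    angles = 2 * np.pi * np.arange(k) / k
    return np.column_stack((-radius * np.sin(angles), radius * np.cos(angles)))


offsets = circular_offsets(8, 1.0)
print(np.round(offsets, 3))
print(2 ** 8)  # number of possible 8-bit codes (0..255)
```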

Fortunately, Scikit-Image has a simple, convenient implementation we can borrow, skimage.feature.local_binary_pattern, which abstracts away most of these technicalities.
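A minimal sketch of such a call (not the original snippet from this article) might look like the following, using a random 8-bit image as a stand-in for real data. The P and R arguments of skimage.feature.local_binary_pattern correspond to our k and radius, and the function returns an LBP image with the same shape as its input. Note that it does not split the image into cells for us; that part is up to us.

```python
import numpy as np
from skimage.feature import local_binary_pattern

P, R = 8, 1  # number of sampled neighbours (our k) and radius

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in greyscale image

# method="default" returns the raw k-bit codes (0..2**P - 1);
# method="uniform" returns rotation-invariant uniform labels (0..P + 1).
codes_default = local_binary_pattern(image, P, R, method="default")
codes_uniform = local_binary_pattern(image, P, R, method="uniform")

print(codes_default.shape, codes_default.min(), codes_default.max())
print(np.unique(codes_uniform))
```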

Once we have generated our k-bit LBP representations for each cell in our window, we are then ready to combine them to form our feature vectors.

Side note for training:

The binary patterns generated here can actually be of two types: uniform and non-uniform. These refer to the pattern created by the binary number prior to the base 10 conversion.

When we say ‘uniform’, we mean that the binary number contains at most two value changes (0–1 or 1–0) when read circularly, i.e. with the last bit wrapping around to the first (e.g. 11001111, 11111110, 00011000 and 00111110 are all uniform examples for a byte of data). Perhaps unsurprisingly, everything else is classed as non-uniform.
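A quick way to check the definition: the helpers below (hypothetical names transitions and is_uniform) count the bit changes in a k-bit code, reading it circularly so the last bit wraps round to the first. Running them over all 256 byte values shows that 58 are uniform, matching the k(k − 1) + 2 count for k = 8.

```python
def transitions(code: int, k: int = 8) -> int:
    """Number of 0->1 or 1->0 changes in a k-bit pattern, read circularly."""
    bits = [(code >> i) & 1 for i in range(k)]
    return sum(bits[i] != bits[(i + 1) % k] for i in range(k))


def is_uniform(code: int, k: int = 8) -> bool:
    """A pattern is uniform if it has at most two circular bit changes."""
    return transitions(code, k) <= 2


for pattern in ["11001111", "11111110", "00011000", "00111110", "01010101"]:
    code = int(pattern, 2)
    print(pattern, transitions(code), is_uniform(code))

print(sum(is_uniform(c) for c in range(256)))  # → 58
```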

But why on earth do we care about the number of changes in a binary number, especially when we only use it to calculate its base 10 equivalent anyway?

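I’m glad you asked. The beauty of uniform patterns is that each one can be summarised by a single label, the number of 1s it contains, and this label does not change when the pattern is rotated circularly. Rotating the image shifts the bits around each pixel’s circular neighbourhood (exactly so for rotations in steps of 360/k degrees, approximately otherwise), so labelling uniform patterns this way makes the encoded information rotation invariant. This means that we will end up with essentially the same PDF (the feature vector representation of our LBP) regardless of the orientation of the image when we calculated our LBP.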

In contrast, non-uniform patterns have no such compact, stable label, so encoding each one separately would risk representing the same texture with distinct feature patterns depending on its orientation, potentially damaging our model’s integrity. In Ojala et al.’s rotation-invariant scheme [2], all non-uniform patterns are therefore grouped together into a single bin.
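To see the rotation invariance in action, here is a small sketch of the ‘rotation-invariant uniform’ labelling described by Ojala et al. [2]. Uniform codes are labelled by their number of 1 bits, and every non-uniform code shares the label k + 1, giving k + 2 labels in total. As far as I understand, this is what Scikit-Image’s method="uniform" option computes, but treat that as an assumption. The hypothetical rotate helper shifts the bits circularly, standing in for rotating the image by one neighbour step (360/k degrees).

```python
def riu2_label(code: int, k: int = 8) -> int:
    """Rotation-invariant uniform label of a k-bit LBP code.

    Uniform codes (at most two circular bit changes) map to their number of
    1 bits (0..k); every non-uniform code shares the single label k + 1.
    """
    bits = [(code >> i) & 1 for i in range(k)]
    changes = sum(bits[i] != bits[(i + 1) % k] for i in range(k))
    return sum(bits) if changes <= 2 else k + 1


def rotate(code: int, shift: int, k: int = 8) -> int:
    """Circularly rotate a k-bit code, mimicking a rotated neighbourhood."""
    shift %= k
    mask = (1 << k) - 1
    return ((code << shift) | (code >> (k - shift))) & mask


pattern = 0b00111110
print([rotate(pattern, s) for s in range(8)])              # raw codes change with rotation
print({riu2_label(rotate(pattern, s)) for s in range(8)})  # → {5}
```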

PDF Calculation

Now, back to the task at hand!

So how do we convert our LBP image representations into something a little more useful (and, to be honest, a little more understandable)?

We convert them to feature vectors, of course! Essentially, we create a histogram. This is the part where we deviate from Bayesian statistics ever so slightly and borrow an old trick from our frequentist cousins. That’s right, you guessed it: we’re going to count the features. We record these counts as a histogram for each cell, then concatenate the cell histograms to create a window-level feature vector. Scikit-Image computes the LBP codes for us; the counting itself takes only a few lines of NumPy.
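As a rough sketch, assuming the LBP codes are small integers (for example the 0 to k + 1 labels from the uniform method, so n_bins = k + 2), the per-cell counting and concatenation could look like this. The helper name lbp_feature_vector and the choice to normalise each cell’s histogram to sum to 1 are my own; normalising is what lets us treat each cell histogram as a discrete PDF.

```python
import numpy as np


def lbp_feature_vector(codes: np.ndarray, n_bins: int, cell: int = 16) -> np.ndarray:
    """Concatenate normalised per-cell histograms of an integer LBP code image."""
    h, w = codes.shape
    hists = []
    for r in range(0, h - cell + 1, cell):
        for c in range(0, w - cell + 1, cell):
            block = codes[r:r + cell, c:c + cell]
            # One unit-width bin per code value: [0, 1), [1, 2), ..., [n_bins - 1, n_bins].
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins))
            hists.append(hist / hist.sum())  # counts -> relative frequencies
    return np.concatenate(hists)


rng = np.random.default_rng(0)
codes = rng.integers(0, 10, size=(32, 48))  # stand-in for uniform LBP labels with k = 8
features = lbp_feature_vector(codes, n_bins=10)
print(features.shape)  # → (60,): 6 cells x 10 bins
```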

Image by Author

Now that we have our low-dimensional, local feature representations, we can use them as easily trainable inputs to powerful classifiers such as Support Vector Machines or Extreme Learning Machines (stay tuned for articles on these), or we can explore these feature vectors to discover important geometric features which characterise our images.

Alternatively, we now have the ability to conduct labelless classification (or clustering), with the added benefit of uncertainty quantification.

Comparison

Last but not least, the part you’ve all been waiting for: it’s time for Kullback–Leibler Divergence, D(p,q). This measures by how much distribution q (our sample distribution) differs from distribution p (our target distribution), and it is the very essence of our Bayesian statistical modelling. If there’s one thing you need to know for this series, and for any future learning about Bayesian statistical modelling, it’s KL Divergence (from now on referred to as KLD).

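This powerful measure, borrowed from information theory, allows us to compare two probability density or mass functions (essentially, any pair of distributions over the same outcomes). For discrete distributions such as our histograms, D(p,q) = Σ p(i) log(p(i)/q(i)), and the result is a single non-negative number which is zero only when the two distributions are identical. The smaller the divergence, the more plausible it is that our data points from distribution ‘q’ come from the same underlying distribution as ‘p’.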

However, it is important to note that KLD is not symmetric: it measures how q diverges from p, not the reverse, so in general D(p,q) ≠ D(q,p).
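Written out for our histograms, the definition translates almost directly into NumPy. In this minimal sketch, the small eps added to every bin is a smoothing assumption of mine to avoid dividing by zero when a bin is empty. I use the natural log, so the result is in nats (a base-2 log would give bits). The toy distributions are made up purely to show the asymmetry.

```python
import numpy as np


def kl_divergence(p, q, eps: float = 1e-10) -> float:
    """D(p, q) = sum_i p_i * log(p_i / q_i) for two discrete distributions.

    Both inputs are smoothed with eps and renormalised to sum to 1, so empty
    bins do not cause division by zero (a smoothing choice, not part of the
    definition).
    """
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


p = [0.7, 0.2, 0.1]  # target distribution
q = [0.4, 0.4, 0.2]  # sample distribution
print(kl_divergence(p, p))                       # → 0.0 (identical distributions)
print(kl_divergence(p, q), kl_divergence(q, p))  # two different values
```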

I will be following up with a more detailed article on this key concept, but for now it is only important that you understand the gist of what this measure is trying to achieve. For those of you crying out for more solid detail now, the Scikit-Image gallery’s Local Binary Pattern texture classification example includes a basic implementation.
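Putting the pieces together, labelless classification comes down to comparing an input’s histogram with each target histogram and picking the one with the smallest divergence. The sketch below is not the Scikit-Image example itself, just an illustration of the same idea. It uses a hypothetical rank_references function and toy, made-up histograms, and follows the convention above, with the target as p and the sample as q.

```python
import numpy as np


def kl_divergence(p, q, eps: float = 1e-10) -> float:
    """D(p, q) = sum_i p_i * log(p_i / q_i), with eps smoothing for empty bins."""
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


def rank_references(sample_hist, reference_hists):
    """Score a sample histogram against each target histogram.

    Returns (name, divergence) pairs sorted from most to least similar, using
    D(target, sample) so the target distribution plays the role of p.
    """
    scores = {name: kl_divergence(ref, sample_hist)
              for name, ref in reference_hists.items()}
    return sorted(scores.items(), key=lambda item: item[1])


references = {"smooth": [0.6, 0.3, 0.1], "rough": [0.1, 0.3, 0.6]}
sample = [0.5, 0.35, 0.15]
ranking = rank_references(sample, references)
print(ranking[0][0])  # → smooth
```

Because the function returns every divergence rather than a single label, the gap between the best and second-best scores gives a rough sense of how confident the match is.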

So…

As you can see, LBP allows us to generate low-dimensional representations of our images which emphasise local topographical characteristics. These can be used to classify labelless images by comparing their key visual features, to estimate how likely each image is to have been sampled from the same population.

This provides us with a number of important benefits:

· We can create low dimensional representations of images, which can be used as input vectors to more complex models, which will now be more easily analysed (lower complexity of the input means a reduced need for model complexity and reduced computational requirements)

· We now have a computationally easy method of feature extraction for high dimensional images (potentially reducing storage memory requirements)

· We have an accurate classifier which can be created and used in seconds (computationally efficient with minimal memory requirements)

· We have a probabilistic classifier which provides graded, divergence-based scores rather than strictly imposed classes, giving us a method for uncertainty quantification

· We can classify unseen images based on examples of the targets we want to identify, without the long and tedious task of labelling

Finally, please do not panic if any of these terms seem complicated or confusing. Although this article is designed to let machine learning and statistics novices get the ‘gist’ of what is going on in LBP, I have purposely included some more complex detail for those wanting to dip their toes into probabilistic modelling and dimensionality reduction techniques.

I will be releasing a series on Bayesian statistics and more simplistic machine learning methods soon, where we will be looking to gain a deeper insight into the more foundational elements mentioned in this series. If you have any questions, feel free to reach out.

Cheers for reading and catch you in the next article.

References

All credit for the LBP method and the details of its use goes to the original papers detailed below. All credit for the LBP code implementations goes to Scikit-Image.

[1] Ojala, T. and Pietikäinen, M., 1999. Unsupervised texture segmentation using feature distributions. Pattern Recognition, 32(3), pp.477–486.

[2] Ojala, T., Pietikäinen, M. and Mäenpää, T., 2002. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), pp.971–987.

URL: https://towardsdatascience.com/the-power-of-local-binary-patterns-3134178af1c7/