URL: https://towardsdatascience.com/geometric-deep-learning-for-spherical-data-55612742d05f/


Geometric Deep Learning for Spherical Data

Spherical CNNs


By encoding an understanding of the translational symmetry of the physical world, convolutional neural networks (CNNs) have revolutionised computer vision. In this blog post we investigate how the principles underlying the success of CNNs may be transferred to the range of problems for which the data exhibits complex geometry, such as the sphere.

This blog post was co-authored by Oliver Cobb and Augustine Mavor-Parker from Kagenova.

An example of spherical data. [Photo by NASA on Unsplash]

The ability of CNNs to extract semantic meaning from conventional (planar) images and videos has improved rapidly over the past decade. Given enough data, human level performance can often be reached. However, the analysis of data with spatial structure is far from a solved problem. There are a variety of problems for which the data exhibits spatial, but non-planar, structure. Examples include 360° imagery in virtual reality, the cosmic microwave background (CMB) radiation from the Big Bang, 3D scans in medical imaging, and meshed surfaces in computer graphics, to name just a few.

For each of these problems we would like to leverage our knowledge of the structure of the data and in particular the symmetry transformations they respect. As discussed in a previous blog post, encoding an understanding of symmetry into machine learning models can be a powerful way to restrict the space of models under consideration, allowing models to be learnt more effectively.

For images on the plane, translational symmetry can be encoded easily and efficiently by applying a stack of convolutional filters, translated across the image. Because the same convolutional filters are being applied across all locations the resultant operation is translationally equivariant, i.e. it respects translational symmetry. This means that regardless of where a feature is located in an image, it stimulates activation neurons in the corresponding location in an identical manner.
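The equivariance property described above can be checked numerically. The following is a minimal sketch, assuming wrap-around (periodic) image boundaries so that translations are exact; practical CNNs typically pad instead, in which case the property only holds away from the border. The helper name `circular_correlate` and the array sizes are illustrative choices of ours.

```python
import numpy as np

def circular_correlate(image, kernel):
    """Slide `kernel` over `image`, wrapping around at the borders."""
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            # Rolling by (-i, -j) places pixel (x + i, y + j) at position (x, y).
            out += kernel[i, j] * np.roll(image, shift=(-i, -j), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))
shift = (2, 5)  # translate 2 pixels down and 5 pixels right

# Filter then translate, versus translate then filter.
filter_then_shift = np.roll(circular_correlate(image, kernel), shift, axis=(0, 1))
shift_then_filter = circular_correlate(np.roll(image, shift, axis=(0, 1)), kernel)

equivariant = np.allclose(filter_then_shift, shift_then_filter)
```

Both orders of operation give the same feature map because the same filter weights are applied at every location.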

Unfortunately, for problems with non-planar structure there typically do not exist such simple procedures for encoding an understanding of the symmetries at play. However, researchers in the emerging field of geometric deep learning are formulating new approaches to these problems that leverage the geometric form of the data and respect its symmetries. One set of problems for which significant progress has recently been made is those for which the data is defined on the surface of the sphere.

Symmetries of spherical data

Many fields involve data that live inherently on the sphere.

Spherical data can arise when observations are made at each point on a spherical surface, such as a topographic map of the Earth. However, it also arises when observations are made over directions, such as for the cosmic microwave background (CMB) in cosmology or for 360° imagery in virtual reality and computer vision (see images below). At Kagenova we’re working to unlock the remarkable success of deep learning for these problems and others involving data with complex geometry, such as the sphere.

Examples of spherical data. [Original figure created by authors.]

For planar images, CNNs stipulate that the rules defining how a particular feature is transformed should not depend on where the feature happens to be located in the plane. For data defined on the sphere, we would instead like to stipulate that the rules should not depend on how and where a feature happens to be oriented on the sphere. Transforming the feature and then rotating its transformed form should be equivalent to rotating the feature and transforming its rotated form. Operations respecting this property are said to be rotationally equivariant (see diagram below).

Illustration of rotational equivariance. Given spherical data (top left), applying a transformation (𝒜) to obtain a feature map (top right) and then rotating (ℛᵨ) the feature maps (bottom right) is equivalent to first rotating the data (bottom left) and then applying the transformation (bottom right). [Original figure created by authors.]

In physics, stipulating that the physical laws governing the behaviour of a system should not depend on how the system is oriented gives rise to the law of conservation of angular momentum. It is therefore unsurprising that some of the same machinery used for studying angular momenta in quantum physics is useful for defining rotationally equivariant layers in deep learning (as we will see later).

Limitations of standard (planar) CNNs

Before diving into ideas of spherical deep learning it is perhaps natural to wonder why the effectiveness of planar CNNs cannot be leveraged directly. Could we not project our spherical data onto some planar representation and simply apply a CNN in the usual way? We are, after all, familiar with planar projections (maps) of our spherical world.

The problem with projections is that there does not exist a projection from the sphere to the plane that preserves both shapes and areas. In other words, distortions are unavoidable.

This is why Greenland commonly appears to be a similar size to Africa on maps of the Earth, whereas it is actually less than one tenth of the size (see map below).

Projections of the sphere to the plane introduce distortions that are unavoidable, irrespective of the projection method used. For this reason Greenland commonly appears to be a similar size to Africa on maps of the Earth, whereas it is actually less than one tenth of the size. [Image sourced from Wikimedia Commons.]

These distortions mean that when applying a conventional CNN to a planar projection of a spherical image, features appear differently depending on where they are located. The translational equivariance of planar CNNs does not encode rotational equivariance when applied to planar projections of spherical images. Encoding rotational equivariance requires a notion of convolution that is designed specially for the geometry of the sphere.
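To get a feel for the size of these distortions, here is a small sketch that computes the area on the sphere covered by each pixel of an equirectangular (latitude-longitude) image. The 180 × 360 resolution is a hypothetical choice of ours; the only ingredient is the standard area element sin θ dθ dϕ on the unit sphere.

```python
import numpy as np

n_theta, n_phi = 180, 360  # hypothetical image with one pixel per degree
theta_edges = np.linspace(0.0, np.pi, n_theta + 1)  # polar angle at row boundaries
d_phi = 2.0 * np.pi / n_phi

# Exact solid angle of one pixel in each row: integral of sin(theta) dtheta dphi.
row_pixel_area = (np.cos(theta_edges[:-1]) - np.cos(theta_edges[1:])) * d_phi

# Summing over all pixels should recover the area of the unit sphere, 4*pi.
total_area = row_pixel_area.sum() * n_phi

# How much more of the sphere an equatorial pixel covers than a polar one.
area_ratio = row_pixel_area[n_theta // 2] / row_pixel_area[0]
```

At this resolution a pixel at the equator covers over a hundred times more of the sphere than a pixel in the row next to the pole, so a fixed-size planar filter sees the same feature at very different scales depending on its latitude.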

Convolutional complications

The simple convolutional procedure implemented by planar CNNs unfortunately cannot be applied in the spherical setting.

To see why this is the case, first consider the form of planar data. Planar data are represented as a 2D array of pixel values. For data defined on the plane we can uniformly space pixel locations, both horizontally and vertically. This uniform sampling of the plane means that each pixel has associated neighbours and all pixels have neighbours at the same relative locations (north, north-east, east, etc). This means that any filter defined at the same sample locations can, through a translation, be centered on any pixel in the input such that the samples exactly align.

Unfortunately there is no way to sample the sphere such that all pixels have neighbours at the same relative positions. Locations on the sphere are typically described using spherical coordinates, with θ measuring the polar angle and ϕ the azimuthal angle. Uniformly spacing samples with respect to θ and ϕ results in the sampling of the sphere shown on the left in the diagram below. If we use these sample locations to define a filter, and then rotate the filter, we will find that the sample locations do not align (see diagram below). This is true regardless of how we choose to sample the sphere.

Suppose we define a filter using the same sample positions as our spherical data. It is then not possible to evaluate how well the filter matches the data under various rotations, because the samples do not align. This is true for all samplings of the sphere. [Original figure created by authors.]
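The misalignment can be made concrete with a short sketch. We assume one particular sampling, uniform in θ and ϕ on a small 8 × 16 grid of our choosing, and measure how far rotated samples land from the nearest original sample. It illustrates the problem for this grid only and is not a proof for all samplings.

```python
import numpy as np

def equiangular_grid(n_theta, n_phi):
    """Unit vectors at samples uniformly spaced in theta and phi."""
    theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta
    phi = np.arange(n_phi) * 2.0 * np.pi / n_phi
    t, p = np.meshgrid(theta, phi, indexing="ij")
    xyz = np.stack([np.sin(t) * np.cos(p), np.sin(t) * np.sin(p), np.cos(t)], axis=-1)
    return xyz.reshape(-1, 3)

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def worst_misalignment(points, rotation):
    """Largest distance from a rotated sample to its nearest original sample."""
    rotated = points @ rotation.T
    distances = np.linalg.norm(rotated[:, None, :] - points[None, :, :], axis=-1)
    return distances.min(axis=1).max()

points = equiangular_grid(n_theta=8, n_phi=16)
step = 2.0 * np.pi / 16  # one grid step in phi

z_error = worst_misalignment(points, rot_z(step))  # rotation about the polar axis
x_error = worst_misalignment(points, rot_x(step))  # rotation about a different axis
```

A rotation about the polar axis by one grid step maps the grid onto itself, so `z_error` is zero up to rounding. The same angle about the x-axis leaves `x_error` clearly non-zero: the rotated filter samples fall between the data samples.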

It is well known that it is not possible to discretize a sphere in a manner that is invariant to rotations. Consequently, it is not possible to construct a purely discrete spherical convolution that is strictly rotationally equivariant.

To construct a spherical notion of convolution that captures the desired property of rotational equivariance we must consider a continuous representation. Thankfully there exists such a representation for which a natural notion of convolution can be performed.

Consider the representation of continuous signals on the sphere. These are functions f: 𝕊² → ℝ associating a value with each point (θ, ϕ) on the sphere 𝕊², not just at select sample locations. Just as continuous signals on the circle (i.e. periodic functions) can be decomposed as a weighted sum of sine and cosine functions, those on the sphere can be similarly decomposed as a weighted sum of harmonic basis signals (see diagram below). In both cases the weights (coefficients) may then be used to represent the signal, giving rise to the Fourier series representation for signals on the circle and spherical harmonic representation for those on the sphere.

Spherical harmonic functions. [Image sourced from Wikimedia Commons.]

Although this representation is infinite, real world signals can be very accurately approximated by suitably truncating the vector of coefficients. From the diagram above we can see that low degree spherical harmonics are only capable of capturing low frequency variations whereas high degree spherical harmonics may capture higher frequency variations. The point at which we truncate determines the resolution of our data representation.
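As a small illustration of representing a signal by its harmonic coefficients, the sketch below uses only the four real spherical harmonics of degree 0 and 1, written out by hand (up to sign convention). The quadrature rule (Gauss-Legendre in cos θ, uniform in ϕ), the helper names and the test signal are our own choices for the example.

```python
import numpy as np

def real_harmonics(xyz):
    """Real spherical harmonics of degree 0 and 1, evaluated at unit vectors."""
    x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    c0 = 0.5 / np.sqrt(np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    return np.stack([c0 * np.ones_like(x), c1 * y, c1 * z, c1 * x], axis=-1)

def sphere_quadrature(n):
    """Sample points and weights that integrate low-degree polynomials exactly."""
    cos_theta, w_theta = np.polynomial.legendre.leggauss(n)
    phi = np.arange(2 * n) * np.pi / n
    ct, p = np.meshgrid(cos_theta, phi, indexing="ij")
    st = np.sqrt(1.0 - ct**2)
    xyz = np.stack([st * np.cos(p), st * np.sin(p), ct], axis=-1)
    weights = np.broadcast_to(w_theta[:, None] * (np.pi / n), ct.shape)
    return xyz, weights

xyz, weights = sphere_quadrature(8)
basis = real_harmonics(xyz)  # shape (8, 16, 4)

# The basis functions are orthonormal: their Gram matrix is the identity.
gram = np.einsum("ij,ija,ijb->ab", weights, basis, basis)

# A smooth test signal and its four harmonic coefficients.
signal = 2.0 + 3.0 * xyz[..., 2] - xyz[..., 0]
coefficients = np.einsum("ij,ij,ija->a", weights, signal, basis)

# Keeping degrees 0 and 1 recovers the signal; keeping only degree 0 keeps its mean.
full_reconstruction = basis @ coefficients
degree0_reconstruction = basis[..., :1] @ coefficients[:1]
```

This test signal happens to be exactly representable with degrees 0 and 1, so four numbers describe it completely. Truncating further to degree 0 retains only its average value over the sphere, which is the coarsest possible resolution.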

Spherical convolution

Recall that we would like to perform a transformation of spherical data that satisfies the property of rotational equivariance.

There is a very natural notion of spherical convolution that in the continuous setting is analogous to that performed in the planar case.

This is to take the spherical signal f: 𝕊² → ℝ, define a spherical filter g: 𝕊² → ℝ, and compute the convolved signal f * g defined by

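$$(f * g)(\rho) = \int_{\mathbb{S}^2} f(\omega)\,(\mathcal{R}_\rho g)(\omega)\,\mathrm{d}\Omega(\omega), \qquad \rho \in \mathrm{SO}(3).$$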

Here we have used the rotation operator ℛᵨ, defined as (ℛᵨ g)(ω) = g(ρ⁻¹ω). In other words it has the effect of applying the corresponding inverse rotation to the domain of the function, analogous to how we think of translating functions defined on the 1D line or 2D plane.

The interpretation of the above equation is that the convolved signal f * g captures how well the signal f is matched by filter g under any given rotation ρ (see diagram below). This is analogous to the planar case where we consider how well a filter matches the input under various translations.

Visualization of spherical convolution of data (left) with filter (centre) to produce a feature map (right). [Original figure created by authors.]

The main difference here is that the space upon which the convolved signal is defined, that of rotations (a 3D space), is different to the space upon which the signal and filter being convolved are defined (the 2D sphere). In the illustrative example shown above the filter is invariant to azimuthal rotations so the output remains on the sphere.

This lifting of the input signal from the sphere to the space of 3D rotations is not particularly problematic, however. An analogous notion of convolution can subsequently be performed between signals and filters defined on the space of rotations. Therefore, given a spherical input, in order to learn features hierarchically we may perform one spherical convolution, resulting in an activation map on the 3D rotation group, and then as many rotation group convolutions as we desire.

To see why the above notions of convolution are rotationally equivariant note that rotating the input by ρ is equivalent to instead applying an additional rotation of ρ⁻¹ to the filter inside the integral. In turn this then has the effect of rotating the domain on which the convolved signal is defined by ρ⁻¹. In other words, rotating the input before performing the convolution is equivalent to performing the convolution and then rotating the output.
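The argument can be verified numerically for the spherical convolution, taken here as the integral over the sphere of f(ω) (ℛᵨ g)(ω) with (ℛᵨ g)(ω) = g(ρ⁻¹ω). The following is a minimal sketch that evaluates this integral by brute-force quadrature, which is not how it would be computed in practice. The signal `f`, the filter `g`, the rotations and all helper names are arbitrary choices of ours, with `f` and `g` kept to low-degree polynomials so that the quadrature is exact.

```python
import numpy as np

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rotate(func, rho):
    """Return R_rho func, i.e. the function omega -> func(rho^{-1} omega)."""
    # Points are stored as rows, so omega @ rho applies rho^T = rho^{-1}.
    return lambda omega: func(omega @ rho)

def sphere_quadrature(n):
    """Gauss-Legendre nodes in cos(theta), uniform nodes in phi."""
    cos_theta, w_theta = np.polynomial.legendre.leggauss(n)
    phi = np.arange(2 * n) * np.pi / n
    ct, p = np.meshgrid(cos_theta, phi, indexing="ij")
    st = np.sqrt(1.0 - ct**2)
    xyz = np.stack([st * np.cos(p), st * np.sin(p), ct], axis=-1)
    weights = np.broadcast_to(w_theta[:, None] * (np.pi / n), ct.shape)
    return xyz, weights

XYZ, WEIGHTS = sphere_quadrature(8)

def convolve(f, g, rho):
    """(f * g)(rho): integral over the sphere of f times the rotated filter."""
    return np.sum(WEIGHTS * f(XYZ) * rotate(g, rho)(XYZ))

def f(omega):  # illustrative signal
    return 1.0 + 2.0 * omega[..., 2] + omega[..., 0] * omega[..., 1]

def g(omega):  # illustrative filter
    return omega[..., 2] ** 2 + 0.5 * omega[..., 0]

rho = rot_z(0.3) @ rot_y(1.1) @ rot_z(-0.7)  # where the output is evaluated
rho_in = rot_z(1.9) @ rot_y(0.4)             # rotation applied to the input

value_at_identity = convolve(f, g, np.eye(3))
rotate_then_convolve = convolve(rotate(f, rho_in), g, rho)
convolve_then_rotate = convolve(f, g, rho_in.T @ rho)  # (f * g)(rho_in^{-1} rho)
```

The last two values agree to numerical precision: rotating the input and then convolving gives the same result as convolving and then rotating the output, which here means evaluating f * g at ρ_in⁻¹ρ.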

The convolution of two spherical signals seemingly requires the computation of a two dimensional integral for each value in a three dimensional space. Fortunately, however, the relationship between the harmonic representation of f * g and those of f and g is simple. The spherical convolution can be computed in harmonic space by performing matrix multiplications between the harmonic coefficients of f and those of g. This is particularly convenient given deep learning practitioners are well accustomed to leveraging GPUs to perform matrix multiplications efficiently.
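The spherical computation requires harmonic transforms on the sphere and rotation group that we do not reproduce here. The same principle can, however, be seen in the simpler setting of the circle, where the harmonic representation is the Fourier series and the matrix multiplications reduce to elementwise products. The sketch below is this circle analogue only, with random signals of our choosing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
f = rng.normal(size=n)  # signal sampled on the circle
g = rng.normal(size=n)  # filter sampled on the circle

# Direct evaluation: match f against g under every translation s,
# c[s] = sum_x f[x] * g[x - s], costing O(n^2) operations.
direct = np.array([np.sum(f * np.roll(g, s)) for s in range(n)])

# Harmonic-space evaluation: multiply Fourier coefficients, then transform back.
harmonic = np.fft.ifft(np.fft.fft(f) * np.conj(np.fft.fft(g))).real

same_result = np.allclose(direct, harmonic)
```

The two routes give the same answer, but the harmonic route replaces an integral per output value with a simple product of coefficients.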

Spherical convolutions are not enough

Equipped with a rotationally equivariant linear operation that may be efficiently implemented it might appear that we have all we need to apply this operation repeatedly and learn features hierarchically.

However there’s an important component that we have so far neglected to mention – the introduction of non-linearity.

In planar networks non-linearity is introduced by pointwise activation functions, i.e. by applying a chosen non-linear function separately to the values at each sample location. Because of the uniformity of the planar sampling scheme this is indeed a translationally equivariant operation. However, we have transitioned to working with harmonic representations without associated sample locations or values. Although it is possible to obtain a sample-based representation, our inability to uniformly sample the sphere (as discussed above) means that applying a non-linear function identically to each sample is not a strictly rotationally equivariant operation.
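A one-dimensional analogue on the circle gives some intuition for this loss of equivariance. In the sketch below, a translation by a whole number of samples stands in for a transformation that maps the sampling onto itself, and a half-sample translation (applied through the Fourier series) stands in for a rotation that does not. The signal, the helper `fourier_shift` and the shift sizes are illustrative assumptions of ours.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def fourier_shift(samples, shift):
    """Translate a periodic sampled signal by `shift` samples via its Fourier series."""
    n = samples.size
    k = np.fft.fftfreq(n, d=1.0 / n)  # integer frequencies
    phase = np.exp(-2j * np.pi * k * shift / n)
    return np.fft.ifft(np.fft.fft(samples) * phase).real

n = 33
m = np.arange(n)
signal = np.sin(2.0 * np.pi * m / n) + 0.5 * np.cos(2.0 * np.pi * 3 * m / n)

def equivariance_error(shift):
    """Largest gap between 'shift then ReLU' and 'ReLU then shift'."""
    shift_then_relu = relu(fourier_shift(signal, shift))
    relu_then_shift = fourier_shift(relu(signal), shift)
    return np.max(np.abs(shift_then_relu - relu_then_shift))

aligned_error = equivariance_error(3.0)     # samples map onto samples
misaligned_error = equivariance_error(0.5)  # samples fall between samples
```

For the whole-sample shift the pointwise non-linearity commutes with the translation up to rounding. For the half-sample shift it does not, because the samples of the shifted signal are no longer a rearrangement of the original samples.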

Nevertheless, introducing non-linearity in this way is possible and, as shown by Cohen et al. (2018) and Esteves et al. (2018), often fairly effective. However, repeatedly transitioning between harmonic and sample-based representations in order to perform convolutional and non-linear operations is cumbersome. Moreover, it is natural to wonder to what extent the lost equivariance is impeding performance.

In our next post we will see how ideas from quantum physics may be leveraged to introduce non-linearity directly in harmonic space without compromising on the degree to which we respect rotational symmetry.

References

[1] Cohen, Geiger, Koehler, Welling, Spherical CNNs, ICLR (2018), arXiv:1801.10130.

[2] Esteves, Allen-Blanchette, Makadia, Daniilidis, Learning SO(3) Equivariant Representations with Spherical CNNs, ECCV (2018), arXiv:1711.06721.


Written By

Jason McEwen
