Deep learning geodemographics with autoencoders and geographic convolution (geoconvolution)

At the AGILE 2019 conference, I presented a paper, written in collaboration with Pengyuan Liu, on approaches to creating geodemographic classifications using deep neural networks.

We discussed two approaches, both based on deep autoencoders, which allow automating dimensionality reduction before clustering.
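To make the idea of "dimensionality reduction before clustering" concrete, here is a minimal sketch rather than the networks used in the paper. It assumes a single tanh hidden layer instead of a deep autoencoder, trains it with plain gradient descent in NumPy, and then runs a basic k-means on the encoded values. The function names (`train_autoencoder`, `kmeans`), the layer size, the learning rate and the synthetic input data are all illustrative assumptions.

```python
import numpy as np

def train_autoencoder(X, n_hidden=2, epochs=2000, lr=0.05, seed=0):
    """Train a tiny one-hidden-layer autoencoder (hypothetical stand-in for a deep one)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encoder: compress to n_hidden values
        X_hat = H @ W2 + b2               # linear decoder: reconstruct the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        g_out = 2 * err / (n * d)
        gW2 = H.T @ g_out
        gb2 = g_out.sum(axis=0)
        gH = (g_out @ W2.T) * (1 - H ** 2)
        gW1 = X.T @ gH
        gb1 = gH.sum(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    encode = lambda A: np.tanh(A @ W1 + b1)
    return encode, losses

def kmeans(Z, k, n_iter=50, seed=0):
    """Basic k-means (Lloyd's algorithm) on the encoded values."""
    rng = np.random.default_rng(seed)
    centres = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = Z[labels == j].mean(axis=0)
    return labels

# Synthetic stand-in for standardised census variables (NOT real census data):
# 200 areas, 8 variables driven by 2 hidden factors plus noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(200, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)

encode, losses = train_autoencoder(X, n_hidden=2)
codes = encode(X)                 # reduced representation, one row per area
labels = kmeans(codes, k=4)       # cluster the areas in the reduced space
```

The point of the design is that the clustering step only ever sees `codes`, so the choice of which variables to combine is learned by the network rather than made by hand.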

The second approach also introduces the idea of geographic convolution in neural networks (geoconvolution), which aims to mirror in the geographical domain the approach of graphical convolution, beyond its application to raster datasets in earth observation. Convolutional neural networks have revolutionised image recognition and demonstrated that it is possible to identify shapes and patterns that go beyond the single pixel by applying smoothing functions to images. We postulate that a similar approach, namely geoconvolution, can be used when analysing geographic patterns in data representing area objects. Geoconvolution aims to account for higher-scale patterns in the creation of the classification by looking at geographically local average values, whereas common approaches such as k-means are essentially non-spatial.
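As a rough illustration of "geographically local average values", the sketch below appends to each area's variables the mean of the same variables over its nearest neighbours. This is only one possible reading of geoconvolution: the function name `geoconvolve`, the use of the k nearest centroids as the neighbourhood, and the unweighted mean are assumptions made for the example, not the definition used in the paper.

```python
import numpy as np

def geoconvolve(X, coords, k=2):
    """Append the mean of each area's k nearest neighbours' variables.

    X      : (n_areas, n_vars) array of area attributes
    coords : (n_areas, 2) array of area centroids (assumed projected coordinates)
    """
    X = np.asarray(X, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Squared distances between all pairs of centroids.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)             # an area is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k nearest areas
    local_mean = X[nearest].mean(axis=1)     # (n_areas, n_vars) neighbourhood averages
    return np.hstack([X, local_mean])

# Four toy areas along a line, with one variable each.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [7.0, 0.0]])
X = np.array([[1.0], [2.0], [4.0], [8.0]])
features = geoconvolve(X, coords, k=2)
# Each row is now [own value, mean of the two nearest neighbours' values].
```

Feeding `features` rather than `X` into an autoencoder or a clustering algorithm is what lets an otherwise non-spatial method see the neighbourhood of each area.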

To test our approaches, we created a geodemographic classification based on the 2011 United Kingdom Census for the county of Leicestershire and compared it to the official 2011 Output Area Classification (see image below and also DataShine 2011 OAC). Our results show that the two deep neural networks are successful in creating classifications which are statistically similar to the official classification and demonstrate high cluster homogeneity. However, validating geodemographic classifications is a complex issue, and more research will be necessary to fully validate our approaches.

oac_dl_maps.png
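This post does not name the measures used for the comparison. As one hedged example of how a new classification can be compared to a reference one, the sketch below computes a common entropy-based homogeneity score, 1 - H(reference | candidate) / H(reference), from the cross-tabulation of the two sets of labels. The function name `homogeneity` and the toy labels are illustrative assumptions.

```python
import numpy as np

def homogeneity(reference, candidate):
    """Entropy-based homogeneity of candidate clusters with respect to reference classes.

    Returns 1.0 when every candidate cluster contains areas from a single
    reference class, and 0.0 when the clusters carry no information about it.
    """
    ref_ids, ref = np.unique(reference, return_inverse=True)
    cand_ids, cand = np.unique(candidate, return_inverse=True)
    # Cross-tabulation: rows are reference classes, columns are candidate clusters.
    table = np.zeros((len(ref_ids), len(cand_ids)))
    np.add.at(table, (ref, cand), 1)
    n = table.sum()
    p_ref = table.sum(axis=1) / n
    h_ref = -np.sum(p_ref * np.log(p_ref))
    if h_ref == 0:
        return 1.0                            # a single reference class is trivially homogeneous
    cluster_sizes = np.broadcast_to(table.sum(axis=0), table.shape)
    nz = table > 0                            # skip empty cells to avoid log(0)
    h_cond = -np.sum(table[nz] / n * np.log(table[nz] / cluster_sizes[nz]))
    return float(1.0 - h_cond / h_ref)

official = [0, 0, 1, 1]                       # toy reference labels
same_up_to_renaming = homogeneity(official, [5, 5, 7, 7])
one_big_cluster = homogeneity(official, [0, 0, 0, 0])
```

Note that a score like this rewards splitting clusters further (giving every area its own cluster also scores 1.0), so it is usually read alongside other measures rather than on its own.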

The paper illustrates how an unsupervised deep neural network can be devised to recreate a geodemographic classification in a largely automated fashion. However, while the approaches presented in the paper make the geodemographic procedure more automated, the number of clusters and their interpretation are still largely at the discretion of the practitioner creating the classification, as are many of the parameters required to define the deep autoencoders.

The second contribution of the paper is introducing the concept of geoconvolution. Crucially, the number of possible approaches to implementing the general idea of geoconvolution is vast, and further work is needed to explore this new research avenue fully.