Spatial Data Science – What is it? (Part 2)
Classifying our world
Blog mini-series by Phil Donovan, GBS Spatial Data Scientist
Spatial data science is being applied all over the world to enhance organisations both large and small. But how can it help you? In this blog post I want to walk through one application of spatial data science to help you understand what it is, what it can do, and how it can help you.
The Esri team at Redlands have been working hard to integrate data science into the Esri ecosystem. GIScience is a natural fit for data science, and the two disciplines overlap considerably. One area of overlap is image processing and classification. Image classification, or remote sensing as it is known in GIScience, has been an integral part of GIS for a long time because of the prevalence of satellite imagery and the need to process large amounts of it. However, the modern data science toolkit provides a suite of tools which are perfectly set up for augmenting GIScience's present capabilities and taking analysis further.
In this particular example, Esri used satellite imagery to detect all of the pools in a jurisdiction in the USA. This was a particular problem for the local government, as the previous solution of ‘feet-on-the-ground’ surveying was extremely expensive and time-consuming.
Figure 4 shows the output from the algorithm, with every property in a neighbourhood that has a pool identified. The team used Esri ArcPro and Python’s PyTorch library to train the model. ArcPro was used to more easily generate a training dataset; that is, a dataset with known true classifications which the model can learn from or ‘train’ on. Figure 5 shows the model being trained in ArcPro, which also includes functionality for exporting the data into a machine learning ‘ready’ folder and format.
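To make the idea of a training dataset a little more concrete, here is a minimal sketch in plain Python. It is not the Esri team's code: the `LabelledChip` class, the chip identifiers and the `split_training_data` helper are all hypothetical, and holding back a validation set is a common practice that I am assuming here rather than something described in the project.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledChip:
    """One small image tile with a known answer (hypothetical structure)."""
    chip_id: str    # made-up identifier for an image tile
    has_pool: bool  # the known 'true' classification the model learns from


def split_training_data(chips, validation_fraction=0.2, seed=42):
    """Shuffle labelled chips and split them into training and validation sets."""
    shuffled = list(chips)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Twenty made-up chips; every fourth one is labelled as containing a pool.
chips = [LabelledChip(f"chip_{i:03d}", has_pool=(i % 4 == 0)) for i in range(20)]
train, validation = split_training_data(chips)
print(len(train), "training chips,", len(validation), "validation chips")
```

The model only ever learns from the training chips; the validation chips are kept aside so you can check how well it classifies imagery it has not seen before.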
The results of the model were for the most part extremely successful, except for a few oddities such as hills, rivers and some motorways being classified as ‘pools’. To ensure that pools were restricted to residential areas, the Esri team used a residential layer from the council zones to restrict the classification further. This demonstrates the importance of knowing about, and understanding the capabilities of, spatial data. The image below shows some of the predictions in the Normalised Difference Vegetation Index (NDVI) band of the satellite imagery.
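The residential restriction described above is, at its heart, a spatial filter: keep a detection only if it falls inside a residential zone. The sketch below shows one simple way to do that in plain Python using a ray-casting point-in-polygon test. It is an illustration under my own assumptions, not the Esri team's workflow (which would more likely use the tools built into ArcPro); the function names, the square zone and the detection coordinates are all made up.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    The polygon is a list of (x, y) vertices. A horizontal ray is cast to
    the right of the point; an odd number of edge crossings means 'inside'.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's y value can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def keep_residential(detections, residential_zones):
    """Keep only the detections that fall inside at least one residential zone."""
    return [
        (x, y)
        for (x, y) in detections
        if any(point_in_polygon(x, y, zone) for zone in residential_zones)
    ]


# Hypothetical data: one square residential zone and three detected 'pools'.
residential_zones = [[(0, 0), (10, 0), (10, 10), (0, 10)]]
detections = [(2, 3), (15, 5), (9, 9)]
print(keep_residential(detections, residential_zones))
```

Here the detection at (15, 5) sits outside the zone and is dropped, which is the same idea as discarding a 'pool' found on a motorway or a river.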
The result of this analysis was that an extra 600 pools were found and added to the database. This data could then be easily shared with council staff on the ground for follow-up using ArcGIS Online or Portals. Furthermore, the analysis is easily repeatable and can be regularly updated at an extremely low cost. Importantly, this is only one example of the power of spatial data science to solve problems and improve efficiencies in organisations. I will be following up with some more.