How Artificial Intelligence is Improving Open Source GIS
More and more companies are starting to use geospatial data for their machine learning applications to draw insights from the patterns of life. To better understand how they do this, we’ll discuss what exactly is meant by Geospatial Artificial Intelligence (GeoAI). We’ll cover the tasks that form part of (geospatial) machine learning and deep learning workflows and the prerequisites for performing them, and give an overview of the current tools and initiatives in the open source GIS community to integrate machine learning and deep learning into existing workflows.
Artificial Intelligence is the science and engineering of making machines intelligent, so that they can achieve a task the way humans do. While true AI does not exist (yet), AI subfields are improving rapidly and already changing the way companies understand how people interact with their environment and how they make predictions based on the patterns they discover in their data, such as predicting traffic patterns or housing prices, or simply classifying large quantities of imagery data.
This article will focus mostly on machine learning tools that are available for the open source GIS community, to inform the reader interested in this subject how the technology is evolving and what is needed to get started. The overview presented here is by no means complete, but presents innovative projects and tools that have been discussed during the most recent FOSS4G event, or are available freely as part of the QGIS plugin repository (or other similar and related open source GIS technology).
Machine Learning and Deep Learning: definition and workflows
To better understand how GeoAI tools work, it’s necessary to explain what machine learning and deep learning are and describe the workflows for both. While both terms are often used interchangeably, they are not one and the same thing: machine learning is a subfield of AI, defined as the field of study that gives computers the ability to learn without being explicitly programmed. Deep learning is a subfield of machine learning that uses deep, or many-layered, artificial neural networks. These networks are software that roughly emulates the way neurons in the brain operate.
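To make “many-layered” concrete, here is a minimal sketch of a forward pass through a small stack of fully connected layers in plain Python. The layer sizes, the weights and the choice of ReLU as activation are illustrative assumptions and are not taken from any tool discussed in this article; real deep learning frameworks add training, batching and GPU support on top of this basic idea.

```python
def relu(values):
    """Rectified linear unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs plus a bias per neuron."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the inputs through every layer; ReLU is applied between layers only."""
    activation = inputs
    for index, (weights, biases) in enumerate(layers):
        activation = dense(activation, weights, biases)
        if index < len(layers) - 1:
            activation = relu(activation)
    return activation


# Hypothetical two-layer network: 2 inputs -> 2 hidden neurons -> 1 output.
example_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.5]),
]
example_output = forward([2.0, 3.0], example_layers)
print(example_output)
```

A network becomes “deep” simply by adding more entries to the list of layers; the forward pass itself stays the same.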
The distinction between machine learning and deep learning is important when reviewing the tools presented later in this article.
Next, we’ll discuss the workflows of both to understand how both subfields work in practice. As stated above, the overall purpose of machine learning is to discover patterns in a dataset and use those to make predictions to answer business questions, detect and analyze trends, and solve problems. As explained in this article, machine learning workflows are different for each company, but the three tech giants (Google, Amazon and Microsoft) describe a typical ML workflow in three steps: data processing, modeling and deployment.
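The three steps can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how any of the vendors mentioned above implement their workflows: the record fields (`distance_km`, `price`), the sample values and the function names are all hypothetical, and the model is a simple least-squares line.

```python
def process(records):
    """Step 1, data processing: drop incomplete records, keep (feature, target) pairs."""
    return [
        (r["distance_km"], r["price"])
        for r in records
        if r["distance_km"] is not None and r["price"] is not None
    ]


def fit_line(pairs):
    """Step 2, modeling: ordinary least squares for price = intercept + slope * distance."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def deploy(model):
    """Step 3, deployment: wrap the fitted parameters in a callable that serves predictions."""
    intercept, slope = model
    return lambda distance_km: intercept + slope * distance_km


# Hypothetical house prices (in thousands) by distance to a public facility.
raw_records = [
    {"distance_km": 1.0, "price": 450.0},
    {"distance_km": 2.0, "price": 400.0},
    {"distance_km": None, "price": 380.0},  # incomplete, removed in step 1
    {"distance_km": 3.0, "price": 350.0},
    {"distance_km": 4.0, "price": 300.0},
]

predict = deploy(fit_line(process(raw_records)))
print(predict(5.0))  # 250.0 for this toy data
```

In a real project each step is far larger (data cleaning pipelines, model selection, a serving infrastructure), but the hand-off between the three stages looks the same.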
Additionally, this source discusses a generic deep learning workflow, which is more fine-grained than the machine learning workflow but includes roughly the same steps. As we’ll see later in this article, some tools for geospatial machine learning focus on separate steps of this workflow (such as creating training data), whereas others deploy existing, pre-trained models: the only thing you need to do as a user is choose a model, which is then run on an input dataset. This distinction is important for understanding how new tools for geospatial AI are evolving, and how rapidly this is happening.
Additional requirements for performing geospatial machine learning and deep learning
To understand why certain tools for geospatial machine learning and deep learning were created and what their function is in the overall workflows discussed above, we’ll now focus on prerequisites for performing geospatial ML/DL: programming skills, data and IT infrastructure.
One reason ML is so popular these days is the advance in hardware, especially graphics processing units (GPUs), which offer massive computational speedups over the more conventional central processing unit (CPU). In addition to quickly rendering graphics, GPUs are equally adept at training complex machine learning algorithms.
This training happens in the cloud, which explains why more companies run their machine learning projects there; as a result, cloud processing costs are decreasing. Currently, a number of geospatial companies provide APIs in combination with plugins, so that users can access data and/or ML functionality from the GIS application of their choice.
Python is often mentioned as the go-to programming language for machine learning because of the many ML frameworks and libraries available for it, which let ML products be developed faster. It’s by no means the only language for ML: other languages that offer ML functionality are R, Scala, Julia and Java. As we’ll see below, a number of geospatial AI tools offer no-code functionality for those without programming skills. Domain knowledge about ML/DL is still a requirement, though, to be able to use such tools.
Finally, there’s a data requirement. ML and DL both depend on (large quantities of) data: first to train a model, and then to deploy the pre-trained model. There are currently tools available for creating geospatial training data, while one recent FOSS4G project focuses on the interaction between humans and algorithms when labeling training data.
Machine Learning Applications for Spatial Data Analysis
After covering the “why” and the “how” of both machine learning and deep learning, it’s time to focus on the available machine learning tools for spatial data analysis. While there are many advancements in machine learning that result in new tools and applications, it’s easy to forget that there are many familiar spatial data analysis tools that use machine learning without using that label. We can distinguish three categories of tools: regression, classification and clustering of spatial data. As we’ll see later in this article, multiple geospatial machine learning tools use a range of geospatial imagery, as well as lidar, for classification purposes.
Regression tools: regression analysis on spatial data is used for interpolation, for example Empirical Bayesian Kriging (EBK) for interpolating univariate data. One use case is interpolating house prices with support from input data such as “disaster risks” or “distance from the public facility”. Other familiar spatial data algorithms include Ordinary Least Squares (OLS) regression and Geographically Weighted Regression (GWR). In addition to point interpolation, there is areal interpolation, which redistributes data from a set of larger polygons to a set of smaller polygons according to their surroundings.
Classification tools: for example, using remote sensing imagery to classify land cover.
Clustering tools: these include Hot Spot analysis to show where high and low values are concentrated, as well as density-based clustering for grouping points according to their density. Not only vector data can be clustered: the same goes for raster images, using image segmentation. Space-Time Pattern Mining clusters spatial and temporal data at the same time, which is illustrated as a 3D cube. It is used to analyze emerging hot and cold spots, that is, areas where values are increasing, decreasing or constant over time.
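To illustrate the hot spot idea from the clustering tools above, the sketch below computes the Getis-Ord Gi* statistic, one commonly used measure for this kind of analysis. The article does not name a specific statistic, so choosing Gi* is an assumption, as are the binary distance-band weights, the 5 x 5 grid and the sample values. Each score is a z-score: large positive values suggest a cluster of high values, large negative values a cluster of low values.

```python
import math


def getis_ord_gi_star(points, values, radius):
    """Gi* z-score per point, using binary weights within `radius` (the point itself included)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for xi, yi in points:
        weights = [
            1.0 if math.hypot(xi - xj, yi - yj) <= radius else 0.0
            for xj, yj in points
        ]
        sum_w = sum(weights)
        sum_w2 = sum(w * w for w in weights)
        local_sum = sum(w * v for w, v in zip(weights, values))
        denominator = std * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # A zero denominator means no variation or every point is a neighbour.
        scores.append((local_sum - mean * sum_w) / denominator if denominator > 0 else 0.0)
    return scores


# Hypothetical 5 x 5 grid with a patch of high values in one corner.
grid_points = [(x, y) for x in range(5) for y in range(5)]
grid_values = [10.0 if x <= 1 and y <= 1 else 1.0 for x, y in grid_points]
gi_scores = getis_ord_gi_star(grid_points, grid_values, radius=1.0)

hot_spots = [p for p, z in zip(grid_points, gi_scores) if z > 1.96]
print(hot_spots)
```

The threshold of 1.96 corresponds to the usual 95% confidence level for a z-score; production tools typically also correct for multiple testing, which this sketch leaves out.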
New GeoAI tools and applications
To complete this overview of geospatial machine and deep learning tools, we’ll give an overview of specific QGIS plugins and FOSS4G projects that include any form of AI/ML/DL.
QGIS plugins for deep learning and machine learning
The Mapflow QGIS plugin extracts real-world objects from satellite imagery. This classification tool offers multiple AI mapping models: in addition to land cover, it offers a building classification tool and AI models for construction sites, roads, agriculture fields and forest vegetation. This plugin lets you extract real-world objects for free for an area of up to 90 sq km. What is interesting is that it runs pre-trained models and returns QGIS layers with ready-to-use objects. In the same vein, the buildSEG QGIS plugin is another classification tool that offers (only) building extraction from raster images.
The Produce Training Data for Deep Learning QGIS plugin produces deep learning training datasets for remote sensing experts lacking programming skills to perform this task. As such, it fills a gap in the larger deep learning workflow.
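A typical step in producing such training datasets is cutting an image and its label mask into small, aligned tiles that a network can consume. The sketch below shows that idea only; it is not the plugin’s implementation, and the function name, the nested-list raster representation and the tile size are assumptions made for illustration.

```python
def make_training_tiles(image, mask, tile_size):
    """Cut an image and its label mask into aligned, non-overlapping square tiles.

    Both rasters are lists of rows with identical dimensions. Partial tiles
    at the right and bottom edges are skipped.
    """
    rows, cols = len(image), len(image[0])
    tiles = []
    for r in range(0, rows - tile_size + 1, tile_size):
        for c in range(0, cols - tile_size + 1, tile_size):
            image_tile = [row[c:c + tile_size] for row in image[r:r + tile_size]]
            mask_tile = [row[c:c + tile_size] for row in mask[r:r + tile_size]]
            tiles.append((image_tile, mask_tile))
    return tiles


# Hypothetical 4 x 4 single-band image and a binary mask (1 = object of interest).
sample_image = [[r * 4 + c for c in range(4)] for r in range(4)]
sample_mask = [[1 if value >= 10 else 0 for value in row] for row in sample_image]

training_tiles = make_training_tiles(sample_image, sample_mask, tile_size=2)
print(len(training_tiles))  # 4 tiles of 2 x 2 pixels
```

Keeping the image tile and the mask tile in one pair guarantees that labels stay aligned with the pixels they describe, which is the main thing that goes wrong when this step is done by hand.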
Similarly, the EnMap-Box QGIS plugin processes imaging spectroscopy data, particularly data from the upcoming EnMap sensor. The plugin lets users apply machine learning algorithms such as Random Forests, Support Vector Machines and more to image classification and regression.
Finally, the Cluster Points QGIS plugin clusters points based on their mutual distance or on supplemental information from attributes. This tool works with vector data only, not with raster data.
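Distance-based clustering of points can be sketched with k-means, a widely used algorithm for this task. Using k-means here is an assumption for illustration and not a description of the plugin’s code; the function name, the naive initialisation from the first `k` points and the sample coordinates are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on 2D points; the first k points serve as initial centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical point layer with two visually obvious groups.
sample_points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
cluster_labels, cluster_centroids = kmeans(sample_points, k=2)
print(cluster_labels)
```

Clustering on attributes instead of coordinates works the same way: replace the coordinate tuples with tuples of (scaled) attribute values.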
FOSS4G projects and applications
The recent FOSS4G 2022 event, held in Firenze, Italy, hosted a number of project presentations focusing on AI, ML and DL. The following overview gives an idea of how AI/ML/DL data, tools and infrastructure are being improved continuously, leading to new use cases.
OTBTF is a remote module of the Orfeo ToolBox that enables deep learning with remote sensing images. Created in 2018, it aims to provide a generic framework for various kinds of raster-oriented deep learning-based applications. Since then, it has been used for a wide range of applications, such as land cover mapping at country scale, super-resolution, optical image cloud removal and more.
At the event, the initial results of a no-code geoAI activity were presented, including a high-level architecture and a stack expected to be composed of various open source software tools and open standards for digital infrastructures, data engineering and AI development.
Another presentation covered open source point cloud semantic segmentation using AI/ML. New AI/ML-powered models are outperforming prior methods of assigning semantic labels to points within a point cloud, because they are able to learn novel features and adapt to the intrinsic variability of the data. The open source ecosystem powering this trend ranges from benchmark datasets like US3D and DALES to ML frameworks such as PyTorch and TensorFlow and key libraries such as PDAL, Open3D and PyG.
KartAI is an open living lab for AI in Norway that falls under ML classification tools. This innovation project in the public sector is aimed at developing AI methods that detect buildings not found in the cadastre or the building map dataset. The project has made efforts to release the training datasets publicly, together with the results for different models and approaches, so that others have easier access to high-resolution, high-accuracy data for training models and applying them where data is scarcer.
Human-in-the-loop (HITL) ML with realtime model predictions is a solution that combines GroundWork, a geospatial annotation platform, with Raster Vision, an open source deep learning platform, to provide a human-in-the-loop active workflow. The solution focuses on acquiring and labeling geospatial data for training machine learning models, which is a time-consuming and expensive process. It speeds up the labeling-training-labeling cycle and makes the connection between the AI and human GIS data labelers easy and seamless.
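One common way to decide what a human should label next in such a cycle is uncertainty sampling: the model scores the unlabeled samples, and the ones it is least sure about are sent to the labeler first. The sketch below shows that selection step only. It is an assumption about how a human-in-the-loop cycle can be organised, not a description of GroundWork or Raster Vision, and the function name and probabilities are hypothetical.

```python
def select_for_labeling(probabilities, batch_size):
    """Return the indices of the samples whose predicted probability is closest to 0.5.

    `probabilities` holds one binary-class probability per unlabeled sample,
    for example the predicted chance that an image tile contains a building.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda index: abs(probabilities[index] - 0.5),
    )
    return ranked[:batch_size]


# Hypothetical model predictions for five unlabeled image tiles.
tile_probabilities = [0.95, 0.52, 0.10, 0.45, 0.70]
next_batch = select_for_labeling(tile_probabilities, batch_size=2)
print(next_batch)  # [1, 3]: the two tiles the model is least certain about
```

After the human labels the selected tiles, the model is retrained and the selection is repeated, which is the labeling-training-labeling cycle described above.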
Finally, the event featured the presentation of a new tool for the generation of 3D geometries using AI for volumetry prediction in Buenos Aires, generating 3D graphics of city parcels.
Conclusion
This article discussed how geospatial machine learning and deep learning help companies draw insights from the patterns of life and make predictions. We described the steps in machine learning and deep learning workflows to better understand the tasks that current tools for geospatial ML/DL perform. To perform geospatial ML/DL, geospatial analysts need data, hardware infrastructure, domain knowledge and programming skills.
However, the creation of no-code tools and even an entire no-code infrastructure helps those without programming skills perform geospatial ML/DL tasks. Looking at the tools that are available, many familiar spatial data analysis tools use machine learning without using that name. These are known as spatial clustering, classification and regression tools. Looking at available QGIS plugins, we saw that many of these target image classification, which can be explained by the supply of big datasets of satellite imagery as a result of more and cheaper sensors on the market, in combination with cheaper cloud infrastructure to host that data.
From a workflow perspective, many of the tools described above address only a single activity of a larger ML/DL workflow, such as creating training data. It’s likely that more tools in the vein of the Mapflow QGIS plugin will be released that allow users to upload data, run a pre-trained AI model and get ready-to-use GIS data in return. Such tools are the logical outcome of satellite imagery companies building services on top of their data and offering them through an API. This doesn’t mean that geospatial ML/DL will be the exclusive domain of such companies, far from it: on the other side of the spectrum, no-code tools, sometimes offered as online collaboration tools, let multiple users work on training data, using remote sensing imagery that is split into smaller areas. Such tools meet the demand for spatial AI tools that will benefit those who want to draw insights into how people interact with their environment.
References
- https://2022.foss4g.org/
- https://plugins.qgis.org/plugins/
- Architects of Intelligence (Packt Publishing)
- Why is machine learning happening now?
- Why Use Python for AI and Machine Learning?
- Introducing Machine Learning for Spatial Data Analysis
- Geospatial Data: Where Machine Learning Meets Life
- An Introduction To Artificial Intelligence
- The Difference Between Artificial Intelligence, Machine Learning, and Deep Learning