How to use Jupyter Notebooks for your Geospatial Workflows
Combined with the Python scripting and programming language, Jupyter Notebooks have become an indispensable tool for GIS analysts to extend their desktop GIS environment. This article describes how to use Jupyter Notebooks for Python, which popular Python packages are available for doing geospatial analysis and how to get started with Jupyter Notebooks inside a desktop GIS application.
Challenges of GIS software for doing geospatial data analysis
Today, geospatial data analysis is embedded within the larger discipline of data science that surpasses the traditional way that spatial analysis would be done. With more data, tools and approaches to interact with data, today’s GIS analyst requires a larger toolset to perform location intelligence and use data to solve a (spatial) problem.
While GIS technology offers many great tools for doing geospatial data analysis, it also has various shortcomings in today’s context of data science workflows. GIS software is still very much centered around making maps from data, which makes it easy to forget how that data is managed and analyzed before it is presented on a map. And while GIS software is continuously improved with new tools to manage, analyze and present data, new tools, methods and practices are being developed every day, far more than can be included in a single GIS application.
Related to this, GIS software (like any data analysis software) comes with its own tools, practices, interfaces and learning curve. One application may be good at one thing, but as a user you want the flexibility to combine the best practices of multiple applications. A single application instead confronts the user with a closed environment, where you are at the mercy of the application developers and their decisions on what to include in the software.
Application developers decide which data you can and cannot import. The data you need for your GIS analysis has to be brought into the application from outside. This has always been a challenge for GIS applications, where data format support was an issue until industry standards were developed.
However, a different problem occurred when big datasets became the norm, meaning that GIS analysts needed to process large spatial datasets locally, a task that desktop GIS was never designed for. While customized solutions were introduced over time, these did not prove to be the best approach as database technology transitioned from server-based to cloud-native solutions and market demands changed continuously.
Today, datasets are continuously being updated and are stored, pre-processed and accessed remotely in a cloud or server environment, as it is simply impossible to manage them locally. Over time, GIS application developers understood that, instead of trying to integrate every possible new third-party solution into their own application, it was better to opt for an approach that had already proved its worth in the 1980s: scripting languages, in combination with a new interactive web tool.
Geospatial Analysis + Python + Jupyter Notebooks
In the 1980s, scripting languages proved to be a handy way to automate GIS workflows. Instead of doing manual “button-pushing” operations in a GIS environment, a GIS analyst would write a script in a code editor that performed these actions when run. This not only saved a lot of time and effort, it also reduced the possibility of human error. Analytical GIS workflows lend themselves very well to this type of workflow automation, as they apply spatial algorithms to a geographical dataset on disk. For cartographic map production workflows, such an approach would not work, as human interaction with a mapping interface is required.
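As a minimal sketch of this kind of automation, the script below applies the same computation to every dataset in a folder instead of repeating manual steps per file. The folder contents, the file names and the helper functions are hypothetical, and the shoelace formula merely stands in for whatever spatial algorithm a real workflow would run (it assumes planar coordinates, not degrees).

```python
import json
import tempfile
from pathlib import Path


def ring_area(ring):
    """Planar area of a closed ring of (x, y) pairs via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def batch_polygon_areas(folder):
    """Return {file name: exterior-ring area} for every polygon file in folder."""
    areas = {}
    for path in sorted(Path(folder).glob("*.geojson")):
        geometry = json.loads(path.read_text())
        exterior = geometry["coordinates"][0]  # the first ring is the exterior
        areas[path.name] = ring_area(exterior)
    return areas


def write_demo_data(folder):
    """Write two made-up square polygons so the sketch runs on its own."""
    squares = {"parcel_a.geojson": 10, "parcel_b.geojson": 20}
    for name, side in squares.items():
        ring = [[0, 0], [side, 0], [side, side], [0, side], [0, 0]]
        geometry = {"type": "Polygon", "coordinates": [ring]}
        (Path(folder) / name).write_text(json.dumps(geometry))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        write_demo_data(folder)
        for name, area in batch_polygon_areas(folder).items():
            print(f"{name}: {area:.1f}")
```

Adding a third file to the folder requires no change to the script, which is where the time savings over manual processing come from.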
After the introduction of the Python language for automating GIS workflows around 2004, it quickly became popular among GIS users. The Python language itself became hugely popular among the data science community, resulting in a very large ecosystem of freely available libraries for everyone to use. The GIS community was quick to realize its potential and adopted Python as the language to tap into this ecosystem and extend existing desktop GIS tools. However, what was lacking at the time was an easy-to-use tool that could replace a code editor as a working environment, or the primitive Python single-line editor inside a GIS application.
Today, the Jupyter Notebook is the go-to tool for GIS analysts who want to use Python for anything from data management and prototyping to spatial analysis, big data analytics and data visualization. Jupyter Notebooks are browser-based documents that combine code, annotations and explanations as well as links to online media. You can write and run code there as in a code editor, but divided into individual cells instead of entire scripts. Jupyter Notebooks grew out of scientific computational notebooks and became the de facto standard quickly after their introduction, in large part because they provide remote access to data that might otherwise be impractical to download.
What makes Jupyter Notebooks more versatile than local programming scripts is that the browser-based Notebooks can run their code on either a local or a remote backend. This means that you can create a Notebook on your local computer that is run somewhere else, be it a supercomputer with a huge capacity or the cloud, so you’re no longer dependent on your local computer’s resources for big data processing. This also takes away the need to download data and process it locally.
Another benefit of Jupyter Notebooks is that you can easily integrate existing Notebooks into your own workflows or collaborate with others: a shared Notebook also includes the results (such as the output of a function), which is different from single scripts run in an IDE, where results are only printed or returned in a console.
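Shared Notebooks can carry their results because a Notebook file (.ipynb) is a JSON document that stores each cell's source together with its saved outputs. The sketch below builds a minimal Notebook by hand using only the standard library. The field names are assumed to follow version 4 of the Notebook format, while the cell contents, the helper functions and the file name demo.ipynb are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def build_notebook():
    """Return a minimal Notebook dict with one markdown and one code cell."""
    markdown_cell = {
        "cell_type": "markdown",
        "metadata": {},
        "source": ["# Buffer analysis notes"],
    }
    code_cell = {
        "cell_type": "code",
        "execution_count": 1,
        "metadata": {},
        "source": ["print(2 + 3)"],
        # The result of running the cell is saved next to its source.
        "outputs": [
            {"output_type": "stream", "name": "stdout", "text": ["5\n"]}
        ],
    }
    return {
        "cells": [markdown_cell, code_cell],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def saved_outputs(notebook):
    """Collect (source, stored stream output) for every code cell."""
    results = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        text = "".join(
            "".join(output["text"])
            for output in cell["outputs"]
            if output["output_type"] == "stream"
        )
        results.append(("".join(cell["source"]), text))
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = Path(folder) / "demo.ipynb"
        path.write_text(json.dumps(build_notebook(), indent=1))
        reloaded = json.loads(path.read_text())
        for source, output in saved_outputs(reloaded):
            print(f"{source!r} -> {output!r}")
```

Whoever opens the file sees the stored output without re-running the cell, which is what distinguishes a shared Notebook from a shared script.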
Python packages for geospatial analysis
Currently, there are many geospatial Python packages available that offer everything from geospatial data management to mapping capabilities inside a Jupyter Notebook. A LinkedIn post by Matt Forrest of CARTO from August last year listed the most popular Python geospatial libraries, based on total PyPI downloads:
- Shapely (89M): for manipulation and analysis of planar features;
- geopy (83M): a Python client for several popular geocoding web services;
- pyproj (48M): performs cartographic transformations and geodetic computations;
- Fiona (25M): for reading and writing vector data;
- GeoPandas (18M): spatial data processing, based on pandas data objects;
- Descartes (10.8M): enables the use of geometric objects as matplotlib paths and patches;
- Folium (9.8M): visualizes data on an interactive Leaflet map;
- Rasterio (9.3M): GDAL and NumPy-based library for raster data;
- GDAL (2.8M): supports reading and writing capabilities for both vectors and rasters;
- pysal (1.3M): for open source, cross-platform geospatial data science;
- OSMnx (932K): download geospatial data from OpenStreetMap and model, project, visualize, and analyze real-world street networks and any other geospatial geometries;
- ipyleaflet (922K): for creating interactive maps in the Jupyter notebook;
- Cartopy (834K): designed for geospatial data processing in order to produce maps and other geospatial data analyses;
- CARTOframes (583K): enables integration of CARTO maps, analysis and data services;
- keplergl (410K): a web-based application for visual exploration of large-scale geolocation data sets;
- GeoPlot (132K): a high-level Python geospatial plotting library.
If you’re not already using these tools as a GIS analyst or geospatial developer, it’s good to know that your skills determine your value to a company or client, and that these tools can give you an edge over the competition and help you keep delivering innovative results. To start using such packages, you can download the Anaconda distribution of Python, which includes the Jupyter Notebook application as well. Anaconda includes conda, which lets you create a separate virtual environment for each project, so you don’t have to worry about dependency conflicts between projects. This also separates the Notebooks (coding environments) from the package management environment (conda). Packages can be installed directly with conda, and every new environment also comes with the pip package installer by default.
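Once an environment is set up, a quick check from a Notebook cell can confirm which of the packages listed above are actually installed in it. The sketch below uses only the standard library. The distribution names are assumed to match the PyPI project names, and installed_versions is a hypothetical helper, not part of conda or Jupyter.

```python
from importlib import metadata

# Assumed to match the PyPI project names of some packages from the list above.
GEOSPATIAL_PACKAGES = [
    "shapely",
    "geopy",
    "pyproj",
    "fiona",
    "geopandas",
    "folium",
    "rasterio",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(GEOSPATIAL_PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running the same cell in two different environments makes it easy to see why a Notebook works in one and fails on import in the other.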
Jupyter Notebook functionality inside desktop GIS
To start using the Jupyter Notebook application inside a desktop GIS, note that in ArcGIS Pro the application is called ArcGIS Notebook and is part of the default installation. QGIS users need to install the IPython QGIS Console plugin, which gives you access to the IPython Console inside of QGIS. The IPython Console allows you to execute commands and interact with data inside IPython interpreters.
This plugin requires that you have the qtconsole Python package and Jupyter Notebook installed. After installing the IPython QGIS Console plugin from the QGIS Plugin dropdown menu, you need to run two separate commands from the OSGeo4W shell as an administrator, which is explained in detail here. After this, you’ll find the IPython QGIS Console plugin listed under the installed plugins. Selecting it will open the QGIS IPython console, which gives you access to the canvas, iface and app (QGIS application) objects and all qgis and PyQt core and gui modules directly from the shell.
Since their introduction, Jupyter Notebooks have become an indispensable tool for GIS analysts to extend their desktop GIS environment, thanks to the Python programming and scripting language. We explained how to use Jupyter Notebooks with the Python language, covered the most popular geospatial Python packages, and showed how to start using the IPython console inside QGIS and the ArcGIS Notebook application inside ArcGIS Pro.
Python & Geospatial Resources
- Python: https://www.python.org/
- Shapely Python package: https://pypi.org/project/Shapely/
- GeoPy Python package: https://pypi.org/project/geopy/
- Pyproj Python package: https://pypi.org/project/pyproj/
- Fiona Python package: https://pypi.org/project/Fiona/
- GeoPandas Python package: https://pypi.org/project/geopandas/
- Descartes Python package: https://pypi.org/project/descartes/
- Folium: https://pypi.org/project/folium/
- Rasterio Python package: https://pypi.org/project/rasterio/
- GDAL Python package: https://pypi.org/project/GDAL/
- Pysal Python package: https://pypi.org/project/pysal/
- OSMnx Python package: https://pypi.org/project/osmnx/
- Ipyleaflet Python package: https://pypi.org/project/ipyleaflet/
- Cartopy Python package: https://pypi.org/project/Cartopy/
- Cartoframes Python package: https://pypi.org/project/cartoframes/
- Keplergl Python package: https://pypi.org/project/keplergl/
- GeoPlot Python Package: https://pypi.org/project/geoplot/
- Qtconsole Python Package: https://pypi.org/project/qtconsole/
- QGIS IPython Console: https://pypi.org/project/qtconsole/
- Jupyter Notebooks: https://jupyter.org/
- Anaconda Distribution: https://www.anaconda.com/products/distribution