Analysis at Scale Tutorial Notebooks

In this Notebook, we walk through a simple methodology for extracting coastlines from 8-band multispectral imagery. This tutorial demonstrates how to link together several concepts from remote sensing, image science, and GIS to produce a complete geospatial analysis. The steps in the workflow include: (1) calculating a Normalized Difference Water Index; (2) thresholding the water index into a binary image; (3) cleaning up small features; and (4) converting the land/water boundaries into vector polylines representing coastlines. This tutorial is the first in a series that will demonstrate how to progress from prototyping an algorithm in GBDX Notebooks to deploying and running that algorithm at production scale over large geographic regions.
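
As a rough illustration of steps (1) and (2), here is a minimal sketch using NumPy on a toy array. The band pair (green and near-infrared, following the common McFeeters formulation of the index), the threshold of 0.0, and the function names `ndwi` and `water_mask` are assumptions made for this example; the Notebook itself may use different bands from the 8-band imagery and a different threshold. Cleaning up small features and vectorizing the boundary are not shown.

```python
import numpy as np


def ndwi(green, nir):
    """Normalized Difference Water Index: (green - nir) / (green + nir)."""
    green = np.asarray(green, dtype="float64")
    nir = np.asarray(nir, dtype="float64")
    total = green + nir
    # Leave pixels at 0 where both bands are zero (e.g. nodata) to avoid dividing by zero.
    out = np.zeros_like(total)
    np.divide(green - nir, total, out=out, where=total != 0)
    return out


def water_mask(index, threshold=0.0):
    """Binary image: True where the index exceeds the (assumed) threshold."""
    return index > threshold


# Toy 3x3 reflectance values, invented for illustration: water on the left, land on the right.
green = np.array([[0.30, 0.28, 0.10],
                  [0.31, 0.12, 0.09],
                  [0.29, 0.11, 0.08]])
nir = np.array([[0.05, 0.06, 0.40],
                [0.04, 0.38, 0.42],
                [0.05, 0.41, 0.45]])

mask = water_mask(ndwi(green, nir))
```

Water reflects more strongly in the green band than in the near-infrared, so water pixels come out positive and land pixels negative; the coastline is the boundary between the two regions of `mask`.
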
In this Notebook, we extend the simple coastline extraction methodology that we built in the previous tutorial. Our goal is to run the same methodology over a much larger geographic area. Specifically, we show two different approaches to running the algorithm over an entire image, rather than over just one small part of that image as we did last time.
In the previous Notebook, we demonstrated two strategies for scaling our coastline extraction algorithm to run over a full image strip. The result was a Python function that we can point at any coastal image to efficiently delineate the coastlines, at either downsampled or full resolution. That function ran within our Notebook, which is convenient for hands-on development and testing but not necessarily ideal for production-level analysis. For example, what if we wanted to run the analysis over multiple strips at the same time? Or set up a recurring job that runs on every new strip that comes in over a given area? Producing coastline features from the Notebook in these cases would be relatively cumbersome. At this point in the development process, it makes sense to consider moving our algorithm from the Notebook environment to the GBDX Platform by deploying it as a GBDX Task. A GBDX Task is simply a stored version of an algorithm (or tool, or methodology, or really any operation) that we can execute on the GBDX Platform. The code isn't run locally; instead, it runs as part of a Workflow in the cloud. This means we can execute the same Task against multiple images at the same time: each is kicked off as a separate, parallel Workflow, without being constrained by the computational limits of a single machine (or GBDX Notebook kernel). In this Notebook, we walk through how to deploy our coastline extraction algorithm as a GBDX Task, using some helpful tools built right into the GBDX Notebooks interface. We also run a quick test of our new Task and review the results to make sure it works.
In the previous Notebook, we walked through how to deploy our coastline extraction algorithm as a GBDX Task, enabling us to run it on the GBDX Platform instead of inside our Notebook. In this final Notebook, we use the GBDX Task we created to run coastline extraction over multiple images, in parallel, using GBDX Workflows. Using this approach, we can extract a highly detailed coastline for the entire island of Kauai in less than 10 minutes.
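
The fan-out pattern described here (one independent job per image, all running at once) can be sketched locally with Python's standard library. This is only a stand-in for the idea and does not use the GBDX API: `extract_coastline` is a hypothetical placeholder for submitting one job and waiting for it to finish, and the image identifiers are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_coastline(image_id):
    """Hypothetical placeholder: submit one coastline job for image_id and wait for it."""
    # A real version would launch a remote job here; this stub only echoes its input.
    return {"image_id": image_id, "status": "complete"}


def run_all(image_ids, max_workers=4):
    """Run one job per image concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(extract_coastline, image_ids))


image_ids = ["image-a", "image-b", "image-c"]  # invented identifiers
results = run_all(image_ids)
```

Threads suit this pattern because each worker would spend most of its time waiting on a remote job rather than computing locally, so total wall-clock time is close to that of the slowest single job.
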