martes, 3 de abril de 2018

Sentinel-2 imagery with Big Spatial Data software (Hadoop + HBase + GeoWave + GeoServer)


This tutorial shows results of publishing Sentinel-2 imagery with opensource Big Spatial Data software.

It continues this previous article focused on vector data. You can read it to learn about first steps with this BigData technology and to setup your own platform. Of course, you can read the fantastic GeoWave documentation.

Now we are going to use raster data, such as Sentinel-2 imagery (Level-2A) from Sentinel 2A and 2B satellites. The idea is to download and ingest Sentinel-2 data into GeoWave in order to offer new products processed from these images and finally publish them as OGC services.

Landsat, Sentinel-1 and Sentinel-2 data are available for anyone to use via many repositories; Open Access Hub, Sentinel-hub, Amazon S3, and so on. In most of them, only Sentinel-2 Level-1C (Top Of Atmosphere reflectance) imagery is freely accessible, onward processing from Level-1C to Level-2A (Bottom Of Atmosphere reflectance) is performed by the user using the Sentinel-2 Toolbox or similar. As you can know, this process is a CPU-RAM intensive task, and sometimes it takes many time too.

I know an interesting Sentinel-2 repository in France, Theia Land Data Center, that provides, among other things, Sentinel-2 Level-2A products transformed with MAJA processor. My idea is to download and directly consume these data in a GeoWave platform.

For that purpose I have developed an extension for GeoWave to directly ingest Theia data with similar tools that current Landast8 extension provides.


Sentinel-2 imagery


The Sentinel-2 mission (ESA) is a land monitoring constellation of two satellites that provide high resolution optical imagery and provide continuity for the current SPOT and Landsat missions.




The mission provides a global coverage of the Earth's land surface every 10 days with one satellite and 5 days with 2 satellites, making the data of great use in on-going studies. The satellites are equipped with the state-of-the-art MSI (Multispectral Imager) instrument, that offers high-resolution optical imagery.

Sentinel-2 delivers high-resolution optical images for land monitoring, emergency response and security services. The satellites carry a multispectral imager with a swath of 290 km. The imager provides a versatile set of 13 spectral bands spanning from the visible and near infrared to the shortwave infrared, featuring four spectral bands at 10 meter, six bands at 20 meter and three bands at 60 meter spatial resolution.

Learn more about Sentinel-2 data products.

Theia Land Data Center


The Theia Land Data Center is a French national inter-agency organization designed to foster the use of images issued from the space observation of land surfaces. Theia is offering scientific communities and public policy actors a broad range of images at different scales, methods and services.




The Sentinel-2 Level-2A data distributed by Theia (CNES) are based on Copernicus Sentinel-2 Level-1C data, but processed with MAJA processor.

Theia does not offer data of the whole world, but most of western Europe is covered:



It is adequate for my test, I am using it in my BigData platform to avoid the CPU-RAM intensive task of sen2cor tool by ESA, and also I am testing other fun ways.

Theia extension for GeoWave


We can download the Theia imagery with the great theia_download tool by Olivier Hagolle. Later, we would load the images with the available raster commands of GeoWave (Raster demo here).

But I have developed a new module for GeoWave to directly ingest the Theia Sentinel-2 Level-2A data. This extension works in a very similar way as current Landast8 extension does.

The module complements GeoWave commandline tools with direct access to Theia public imagery. To use, ensure the module is on the classpath for your geowave commandline tools and then you should have geowave theia options available to you. "analyze" and "download" are completely separate from storage within GeoWave. The "ingest" routines wrap download with the additional step of ingesting into GeoWave. "ingestraster" and "ingestvector" are fairly self-explanatory and ingest just wraps both in a single command so for all of the scenes and bands you have ingested into your grid coverage (raster) layer, you will have the vector layers of scenes and bands with associated metadata.

The following is the commandline usage help listing the set of available commands and options:

Usage: geowave theia [options] 
Commands: 
 analyze 
  Print out basic aggregate statistics for available Theia imagery. 
 download 
  Download Theia imagery to a local directory. 
 ingest 
  Ingest routine for locally downloading Theia imagery and ingesting
  ... 
 ingestraster 
  Ingest routine for locally downloading Theia imagery and ingesting
  ... 
 ingestvector 
  Ingest routine for searching Theia scenes that match certain
  filter...

Read full description of options here.

The Theia REST service only returns metadata of 500 records by request, if you do not obtain correct results, maybe you have to limit the response defining a more restrictive criteria. Although the CQL expression can contain many attributes, the module accepts a set of main filter fields to pre-filter server-side the data to consider.

The available parameters to filter are:
  • --collection: Product collection to fetch ('SENTINEL2', 'Snow'). 
  • --platform: Satellite ('SENTINEL2A', 'SENTINEL2B'). 
  • --location: Product location, 100 km Grid Square ID of the Military Grid Reference System (EX: 'T30TWM'). 
  • --startDate & --endDate: Date window to consider. 
  • --orbitNumber: Orbit number, 
  • --relativeOrbitNumber: Relative orbit number, 
  • --cql: CQL expression to filter the ingested imagery. The feature type for the expression has the following attributes: shape (Geometry), location (String), productIdentifier (String), productType (String), collection (String), platform (String), processingLevel (String), startDate (Date), quicklook (String), thumbnail (String), bands (String), resolution (int), cloudCover (int), snowCover (int), waterCover (int), orbitNumber (int), relativeOrbitNumber (int). 

Usage, config and ingest Theia data


First of all, remember, I have my GeoWave platform running with HBASE and Hadoop (Read this for first steps), but now also it contains my Theia extension.

I defined my own environment variable with a base command in order to execute GeoWave processes as comfortable as possible:

> set GEOWAVE=
   java -cp "%GEOWAVE_HOME%/geowave-deploy-0.9.7-SNAPSHOT-tools.jar"
   mil.nga.giat.geowave.core.cli.GeoWaveMain

Then, I can easily run commands typing %geowave% [...]. To check the GeoWave version:

> %geowave% --version

Now, let's me practice with a small spatial coverage, Navarra (North of Spain), the most wonderful place in the world :-)




It covers 4 product tiles (T30TWM, T30TWN, T30TXM and T30TXN), full Theia product description here. I am always going to set a basic spatial criteria to only work with my tiles of interest:

--cql "BBOX(shape, -1.8274, 42.3253, -1.6256, 42.4735)"

Typing the "analyze" command:

> %geowave% theia analyze 
  --cql "BBOX(shape,-1.8274,42.3253,-1.6256,42.4735)"


The module prints information of scene/products and their bands: 20 scenes for this request (Several Dates), 10 bands for each one of them:




A more precise criteria using a Date range:

> %geowave% theia analyze
  --startDate "2018-01-28"
  --endDate "2018-01-30"
  --cql "BBOX(shape,-1.8274,42.3253,-1.6256,42.4735)"


Ok, our four tiles for a Date:




Next step, download data. The Sentinel-2 Level-2A data distributed by Theia are based on Copernicus Sentinel-2 Level-1C data, which are subject to this pdf). To download you need an user that you can registrer here.

When you get your credentials (user and password) you will include them in the "download" command:

> %geowave% theia download
  --startDate "2018-01-28" 
  --endDate "2018-01-30" 
  --cql "BBOX(shape,-1.8274,42.3253,-1.6256,42.4735)" 
  --userident ? 
  --password ? 



You could have noticed the size of files to download, if "CloudCover" value is low, zip files are around 2gb. The process automatically unzips the content of each compressed file in order to be ingested in a later step.

Now, we can load data into GeoWave. We need to create two GeoWave stores for this ingest. "theia-raster-store" will store the raster images, and "theia-vector-store" will store the vector metadata associated with those images:

> %geowave% config addstore theia-raster-store 
  --gwNamespace geowave.theia-raster 
  -t hbase 
  --zookeeper localhost:2222 

> %geowave% config cpstore theia-raster-store theia-vector-store 
  --gwNamespace geowave.theia-vector


And some indexes:

> %geowave% config addindex -t spatial theia-spindex 
  --partitionStrategy ROUND_ROBIN 

> %geowave% config addindex -t spatial_temporal theia-dyindex 
  --partitionStrategy ROUND_ROBIN 
  --period DAY

Typing the "ingest" command to only load bands 'B4' and 'B8', each of them will go to an independent coverage layer:

> %geowave% theia ingest
  --startDate "2018-01-28"
  --endDate "2018-01-30"
  --cql
   "BBOX(shape,-1.8274,42.3253,-1.6256,42.4735)
    AND 
   (band='B4' OR band='B8')"
  --userident ?
  --password ?
  --retainimages
  --vectorstore theia-vector-store
  --vectorindex theia-spindex,theia-dyindex
  --pyramid
  --coverage theia_navarra_mosaic-${band}
  theia-raster-store
  theia-spindex 



Or we can ingest a multiband raster layer with R+G+B channels, as GeoWave raster demo documentation shows:

> %geowave% theia ingest 
  --startDate "2018-01-28" 
  --endDate "2018-01-30" 
  --cql 
   "BBOX(shape,-1.8274,42.3253,-1.6256,42.4735) 
    AND 
   (band='B2' OR band='B3' OR band='B4')" 
  --userident ? 
  --password ? 
  --retainimages 
  --vectorstore theia-vector-store 
  --vectorindex theia-spindex,theia-dyindex 
  --pyramid 
  --coverage theia_navarra_mosaic-rgb 
  theia-raster-store 
  theia-spindex


Use the "gs" command of GeoWave to register Data Stores and Layers in a GeoServer instance:

> %geowave% gs addlayer theia-raster-store --add ALL 
> %geowave% gs addlayer theia-vector-store --add ALL

Here our grid of envelopes:




We have got two new vector layers, "theia-scene" layer with all metadata per product or scene, and "theia-band" layer that adds information for each band. Depending on you ingested the imagery, you have got a raster layer per band, or a multiband raster layer (e.g. a mix of visible channels: Red, Green and Blue bands).

A GetMap request for the Band #8 ("geowave:theia_navarra_mosaic-B8" layer):




Great! we are working with these layers in GeoServer as usual. With bands #4 and #8 we can compose the NDVI index ( (B08-B04)/(B04+B08) ), SAVI index ( (1.5*(B08-B04))/(B04+B08+0.5) ), or use them in any WPS raster process, or for whatever you need.


Time series data


A very important attribute of the imagery is the acquisition date. The scenes are packed per day, and it is very useful provide a mechanism to fetch them by a specific Date. It is supposed we are going to load images for several days, namely, we are creating a mosaic of coverages. In GeoServer terminology, we are speaking about time-series data.

We can easily configure the time-dimension settings of ingested vector layers, but there is a problem here, how can we configure it for raster layers? As far I know, the GeoServer ImageMosaic plugin does not allow to directly associate a vector layer as source of attributes (time, elevation, etc) of another raster layer. This association is not possible in Geowave platform either, but the solution would go through extend the GeoWave GridCoverageReader class with the StructuredGridCoverage2DReader interface to link vector and raster layers.


Next?


F̶o̶r̶ ̶t̶h̶e̶ ̶t̶i̶m̶e̶ ̶b̶e̶i̶n̶g̶ ̶I̶ ̶p̶u̶b̶l̶i̶s̶h̶ ̶t̶h̶e̶ ̶f̶u̶l̶l̶ ̶T̶h̶e̶i̶a̶ ̶m̶o̶d̶u̶l̶e̶ ̶i̶n̶ ̶m̶y̶ ̶g̶i̶t̶h̶u̶b̶ ̶f̶o̶r̶k̶,̶ ̶a̶n̶d̶ ̶I̶ ̶o̶f̶f̶e̶r̶ ̶a̶ ̶p̶u̶l̶l̶ ̶r̶e̶q̶u̶e̶s̶t̶ ̶w̶i̶t̶h̶ ̶i̶t̶ ̶t̶o̶ ̶G̶e̶o̶W̶a̶v̶e̶ ̶p̶r̶o̶j̶e̶c̶t̶.̶ ̶I̶f̶ ̶I̶ ̶d̶i̶d̶ ̶n̶o̶t̶ ̶d̶e̶v̶e̶l̶o̶p̶ ̶a̶ ̶m̶e̶s̶s̶ ̶a̶n̶d̶ ̶I̶ ̶a̶m̶ ̶l̶u̶c̶k̶y̶ ̶:̶-̶)̶,̶ ̶t̶h̶e̶ ̶c̶o̶r̶e̶ ̶m̶a̶i̶n̶t̶a̶i̶n̶e̶r̶s̶ ̶c̶o̶u̶l̶d̶ ̶m̶e̶r̶g̶e̶ ̶i̶t̶. UPDATE: The code was merged, thanks GeoWave team!

I think there are very interesting areas to explore, add raster time series support to GeoWave (It is useful for other data sources, like Landsat8 data), develop a new module to ingest Sentinel-2 from Amazon (L2A data is available over EU Central region - Requester Pays buckets)... UPDATE: The code was merged.