Tuesday, February 9, 2021

How to process the Sentinel-2 STAC in a serverless Lambda on AWS?




This post describes how you can keep your own data, derived from the Sentinel-2 imagery archive, up to date using a serverless architecture on AWS.

Let me keep the guide simple; our derived data will be very simple too: a single scalar value for each processed tile, the cloud coverage of the urban areas of a region. In other words, using Sentinel-2 data and a serverless platform in the cloud, we are going to answer this question: how sunny are some cities over time?

:-( Yes, I know, it is not an innovative result, and we can get it from many other sources on the internet, but I think the interesting thing here is the path we have to walk to get this value. I am sure you have in mind your own data that could be digested in a similar way.

Interested? Please click the link to read the full article on LinkedIn:

Tuesday, April 3, 2018

Sentinel-2 imagery with Big Spatial Data software (Hadoop + HBase + GeoWave + GeoServer)


This tutorial shows the results of publishing Sentinel-2 imagery with open-source Big Spatial Data software.

It continues a previous article focused on vector data. You can read it to learn the first steps with this Big Data technology and to set up your own platform. Of course, you can also read the fantastic GeoWave documentation.

Now we are going to use raster data: Sentinel-2 Level-2A imagery from the Sentinel-2A and Sentinel-2B satellites. The idea is to download Sentinel-2 data and ingest it into GeoWave in order to offer new products processed from these images, and finally publish them as OGC services.

Landsat, Sentinel-1 and Sentinel-2 data are available for anyone to use via many repositories: Open Access Hub, Sentinel-hub, Amazon S3, and so on. In most of them, only Sentinel-2 Level-1C (Top-Of-Atmosphere reflectance) imagery is freely accessible; the processing from Level-1C to Level-2A (Bottom-Of-Atmosphere reflectance) is performed by the user with the Sentinel-2 Toolbox or similar. As you may know, this is a CPU- and RAM-intensive task, and it can take a long time too.

I know an interesting Sentinel-2 repository in France, the Theia Land Data Center, which provides, among other things, Sentinel-2 Level-2A products generated with the MAJA processor. My idea is to download these data and consume them directly in a GeoWave platform.

For that purpose I have developed an extension for GeoWave to directly ingest Theia data, with tools similar to those the current Landsat8 extension provides.


Sentinel-2 imagery


The Sentinel-2 mission (ESA) is a land monitoring constellation of two satellites that provide high-resolution optical imagery and ensure continuity with the current SPOT and Landsat missions.




The mission provides global coverage of the Earth's land surface every 10 days with one satellite and every 5 days with two satellites, making the data of great use in ongoing studies. The satellites are equipped with the state-of-the-art MSI (MultiSpectral Instrument), which offers high-resolution optical imagery.

Sentinel-2 delivers high-resolution optical images for land monitoring, emergency response and security services. The satellites carry a multispectral imager with a swath of 290 km. The imager provides a versatile set of 13 spectral bands spanning from the visible and near infrared to the shortwave infrared, featuring four spectral bands at 10 meter, six bands at 20 meter and three bands at 60 meter spatial resolution.

Learn more about Sentinel-2 data products.

Theia Land Data Center


The Theia Land Data Center is a French national inter-agency organization designed to foster the use of images from the space observation of land surfaces. Theia offers scientific communities and public policy actors a broad range of images at different scales, together with methods and services.




The Sentinel-2 Level-2A data distributed by Theia (CNES) are based on Copernicus Sentinel-2 Level-1C data, but processed with the MAJA processor.

Theia does not offer data for the whole world, but most of Western Europe is covered:



That is adequate for my test; I am using it in my Big Data platform to avoid the CPU- and RAM-intensive task of running ESA's sen2cor tool, and I am also testing other fun approaches.

Theia extension for GeoWave


We can download the Theia imagery with the great theia_download tool by Olivier Hagolle. Then we could load the images with the available raster commands of GeoWave (raster demo here).

But I have developed a new module for GeoWave to directly ingest the Theia Sentinel-2 Level-2A data. This extension works in a very similar way to the current Landsat8 extension.

The module complements the GeoWave command-line tools with direct access to Theia public imagery. To use it, ensure the module is on the classpath of your geowave command-line tools, and the geowave theia options become available. "analyze" and "download" are completely separate from storage within GeoWave. The "ingest" routines wrap download with the additional step of ingesting into GeoWave: "ingestraster" and "ingestvector" are fairly self-explanatory, and "ingest" wraps both in a single command, so for all of the scenes and bands ingested into your grid coverage (raster) layer, you also get vector layers of scenes and bands with their associated metadata.

The following is the commandline usage help listing the set of available commands and options:

Usage: geowave theia [options] 
Commands: 
 analyze 
  Print out basic aggregate statistics for available Theia imagery. 
 download 
  Download Theia imagery to a local directory. 
 ingest 
  Ingest routine for locally downloading Theia imagery and ingesting
  ... 
 ingestraster 
  Ingest routine for locally downloading Theia imagery and ingesting
  ... 
 ingestvector 
  Ingest routine for searching Theia scenes that match certain
  filter...

Read the full description of the options here.

The Theia REST service only returns metadata for 500 records per request; if you do not obtain the expected results, you may have to narrow the response by defining more restrictive criteria. Although the CQL expression can contain many attributes, the module accepts a set of main filter fields to pre-filter the data server-side (see the example after this list).

The available parameters to filter are:
  • --collection: Product collection to fetch ('SENTINEL2', 'Snow'). 
  • --platform: Satellite ('SENTINEL2A', 'SENTINEL2B'). 
  • --location: Product location, the 100 km Grid Square ID of the Military Grid Reference System (e.g. 'T30TWM'). 
  • --startDate & --endDate: Date window to consider. 
  • --orbitNumber: Orbit number. 
  • --relativeOrbitNumber: Relative orbit number. 
  • --cql: CQL expression to filter the ingested imagery. The feature type for the expression has the following attributes: shape (Geometry), location (String), productIdentifier (String), productType (String), collection (String), platform (String), processingLevel (String), startDate (Date), quicklook (String), thumbnail (String), bands (String), resolution (int), cloudCover (int), snowCover (int), waterCover (int), orbitNumber (int), relativeOrbitNumber (int). 
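As a hint, combining several of these pre-filters keeps the response well under the 500-record limit; for example (the tile and date window here are chosen arbitrarily):

> %geowave% theia analyze
  --platform "SENTINEL2A"
  --location "T30TWM"
  --startDate "2018-01-01"
  --endDate "2018-01-31"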

Usage: configuring and ingesting Theia data


First of all, remember that I have my GeoWave platform running with HBase and Hadoop (read this for the first steps), but now it also contains my Theia extension.

I defined my own environment variable with a base command in order to execute GeoWave processes as comfortably as possible:

> set GEOWAVE=
   java -cp "%GEOWAVE_HOME%/geowave-deploy-0.9.7-SNAPSHOT-tools.jar"
   mil.nga.giat.geowave.core.cli.GeoWaveMain

Then, I can easily run commands typing %geowave% [...]. To check the GeoWave version:

> %geowave% --version

Now, let me practice with a small spatial coverage: Navarra (in the north of Spain), the most wonderful place in the world :-)




It covers four product tiles (T30TWM, T30TWN, T30TXM and T30TXN); the full Theia product description is here. I will always set a basic spatial criterion to work only with my tiles of interest:

--cql "BBOX(shape, -1.8274, 42.3253, -1.6256, 42.4735)"

Typing the "analyze" command:

> %geowave% theia analyze 
  --cql "BBOX(shape,-1.8274,42.3253,-1.6256,42.4735)"


The module prints information about scenes/products and their bands: 20 scenes for this request (several dates), with 10 bands for each of them:




A more precise criterion using a date range:

> %geowave% theia analyze
  --startDate "2018-01-28"
  --endDate "2018-01-30"
  --cql "BBOX(shape,-1.8274,42.3253,-1.6256,42.4735)"


OK, our four tiles for a single date:




Next step: download data. The Sentinel-2 Level-2A data distributed by Theia are based on Copernicus Sentinel-2 Level-1C data, which are subject to a license (this PDF). To download, you need a user account that you can register here.

When you get your credentials (user and password) you will include them in the "download" command:

> %geowave% theia download
  --startDate "2018-01-28" 
  --endDate "2018-01-30" 
  --cql "BBOX(shape,-1.8274,42.3253,-1.6256,42.4735)" 
  --userident ? 
  --password ? 



You may have noticed the size of the files to download: if the "CloudCover" value is low, the zip files are around 2 GB. The process automatically unzips the content of each compressed file so that it can be ingested in a later step.

Now, we can load data into GeoWave. We need to create two GeoWave stores for this ingest. "theia-raster-store" will store the raster images, and "theia-vector-store" will store the vector metadata associated with those images:

> %geowave% config addstore theia-raster-store 
  --gwNamespace geowave.theia-raster 
  -t hbase 
  --zookeeper localhost:2222 

> %geowave% config cpstore theia-raster-store theia-vector-store 
  --gwNamespace geowave.theia-vector


And some indexes:

> %geowave% config addindex -t spatial theia-spindex 
  --partitionStrategy ROUND_ROBIN 

> %geowave% config addindex -t spatial_temporal theia-dyindex 
  --partitionStrategy ROUND_ROBIN 
  --period DAY

Typing the "ingest" command to only load bands 'B4' and 'B8', each of them will go to an independent coverage layer:

> %geowave% theia ingest
  --startDate "2018-01-28"
  --endDate "2018-01-30"
  --cql
   "BBOX(shape,-1.8274,42.3253,-1.6256,42.4735)
    AND 
   (band='B4' OR band='B8')"
  --userident ?
  --password ?
  --retainimages
  --vectorstore theia-vector-store
  --vectorindex theia-spindex,theia-dyindex
  --pyramid
  --coverage theia_navarra_mosaic-${band}
  theia-raster-store
  theia-spindex 



Or we can ingest a multiband raster layer with R+G+B channels, as the GeoWave raster demo documentation shows:

> %geowave% theia ingest 
  --startDate "2018-01-28" 
  --endDate "2018-01-30" 
  --cql 
   "BBOX(shape,-1.8274,42.3253,-1.6256,42.4735) 
    AND 
   (band='B2' OR band='B3' OR band='B4')" 
  --userident ? 
  --password ? 
  --retainimages 
  --vectorstore theia-vector-store 
  --vectorindex theia-spindex,theia-dyindex 
  --pyramid 
  --coverage theia_navarra_mosaic-rgb 
  theia-raster-store 
  theia-spindex


Use the "gs" command of GeoWave to register Data Stores and Layers in a GeoServer instance:

> %geowave% gs addlayer theia-raster-store --add ALL 
> %geowave% gs addlayer theia-vector-store --add ALL

Here is our grid of envelopes:




We have got two new vector layers: the "theia-scene" layer, with all the metadata per product or scene, and the "theia-band" layer, which adds information for each band. Depending on how you ingested the imagery, you get a raster layer per band or a multiband raster layer (e.g. a mix of visible channels: the Red, Green and Blue bands).

A GetMap request for Band #8 (the "geowave:theia_navarra_mosaic-B8" layer):




Great! We are working with these layers in GeoServer as usual. With bands #4 and #8 we can compute the NDVI index ( (B08-B04)/(B04+B08) ), the SAVI index ( (1.5*(B08-B04))/(B04+B08+0.5) ), use them in any WPS raster process, or whatever you need.
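For example, here is a minimal Java sketch of that per-pixel arithmetic, assuming both bands have already been read into float arrays of equal length (an illustration of the formulas, not part of the GeoWave tooling):

public final class VegetationIndices {

    // NDVI = (B08 - B04) / (B04 + B08)
    public static float[] ndvi(float[] b4, float[] b8) {
        float[] out = new float[b4.length];
        for (int i = 0; i < b4.length; i++) {
            float sum = b4[i] + b8[i];
            out[i] = (sum == 0f) ? 0f : (b8[i] - b4[i]) / sum; // guard empty pixels
        }
        return out;
    }

    // SAVI = 1.5 * (B08 - B04) / (B04 + B08 + 0.5)
    public static float[] savi(float[] b4, float[] b8) {
        float[] out = new float[b4.length];
        for (int i = 0; i < b4.length; i++) {
            out[i] = (1.5f * (b8[i] - b4[i])) / (b4[i] + b8[i] + 0.5f);
        }
        return out;
    }
}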


Time series data


A very important attribute of the imagery is the acquisition date. The scenes are packed per day, and it is very useful to provide a mechanism to fetch them by a specific date. Presumably we are going to load images for several days; in other words, we are creating a mosaic of coverages. In GeoServer terminology, we are speaking about time-series data.

We can easily configure the time-dimension settings of the ingested vector layers, but there is a problem here: how can we configure it for raster layers? As far as I know, the GeoServer ImageMosaic plugin does not allow directly associating a vector layer as the source of attributes (time, elevation, etc.) of another raster layer. This association is not possible in the GeoWave platform either, but the solution would involve extending the GeoWave GridCoverageReader class with the StructuredGridCoverage2DReader interface to link vector and raster layers.


Next?


F̶o̶r̶ ̶t̶h̶e̶ ̶t̶i̶m̶e̶ ̶b̶e̶i̶n̶g̶ ̶I̶ ̶p̶u̶b̶l̶i̶s̶h̶ ̶t̶h̶e̶ ̶f̶u̶l̶l̶ ̶T̶h̶e̶i̶a̶ ̶m̶o̶d̶u̶l̶e̶ ̶i̶n̶ ̶m̶y̶ ̶g̶i̶t̶h̶u̶b̶ ̶f̶o̶r̶k̶,̶ ̶a̶n̶d̶ ̶I̶ ̶o̶f̶f̶e̶r̶ ̶a̶ ̶p̶u̶l̶l̶ ̶r̶e̶q̶u̶e̶s̶t̶ ̶w̶i̶t̶h̶ ̶i̶t̶ ̶t̶o̶ ̶G̶e̶o̶W̶a̶v̶e̶ ̶p̶r̶o̶j̶e̶c̶t̶.̶ ̶I̶f̶ ̶I̶ ̶d̶i̶d̶ ̶n̶o̶t̶ ̶d̶e̶v̶e̶l̶o̶p̶ ̶a̶ ̶m̶e̶s̶s̶ ̶a̶n̶d̶ ̶I̶ ̶a̶m̶ ̶l̶u̶c̶k̶y̶ ̶:̶-̶)̶,̶ ̶t̶h̶e̶ ̶c̶o̶r̶e̶ ̶m̶a̶i̶n̶t̶a̶i̶n̶e̶r̶s̶ ̶c̶o̶u̶l̶d̶ ̶m̶e̶r̶g̶e̶ ̶i̶t̶. UPDATE: The code was merged, thanks GeoWave team!

I think there are very interesting areas to explore: adding raster time-series support to GeoWave (useful for other data sources, like Landsat8 data), developing a new module to ingest Sentinel-2 from Amazon (L2A data is available in the EU Central region - Requester Pays buckets)... UPDATE: The code was merged.

Monday, January 8, 2018

Fast Cesium terrain rendering with the new quantized-mesh output of the opensource CesiumTerrainBuilder tool

Cesium is an open-source JavaScript library for creating 3D globes and 2D maps in a web browser without a plugin. It uses WebGL for hardware-accelerated graphics, and is cross-platform, cross-browser, and tuned for dynamic-data visualization.
Cesium is developed by an open-source community, with support from the Cesium Consortium, founded by Analytical Graphics, Inc. (AGI) and Bentley Systems.
Terrain support
Cesium supports streaming and visualizing global high-resolution terrain and water effects for oceans, lakes, and rivers. Mountain peaks, valleys, and other terrain features really show the benefit of a 3D globe compared to a 2D map.
Cesium supports several methods for requesting terrain using terrain providers. Most terrain providers use a REST interface over HTTP to request terrain tiles as height maps. Terrain providers differ based on how requests are formatted and how terrain data is organized. You can read details in this tutorial.
How can you add your own terrain data to Cesium? 
First, you need to check if you really need it. You have the option to stream high-resolution terrain data directly from the servers at AGI. It's free to use on public sites under the terms of use. If you want to host the terrain data on your own servers, AGI provides a commercial product - the STK Terrain Server.
But if you want to do it yourself, keep reading :-).
Cesium supports two terrain formats:
  1. heightmap
  2. quantized-mesh
Read this great article here for more details. About the formats, it says (a sketch of the tile layout follows the quote):
  • The quantized-mesh format follows the same tile structure as heightmap tiles, but each tile is better optimised for large-scale terrain rendering. Instead of creating a dense, uniform triangle mesh in the browser, an irregular triangle mesh is pre-rendered for each tile. It's a better representation of the landscape, having less detail in flat areas while increasing the density in steep terrain. The mesh terrain is also more memory efficient and renders faster.
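To give an idea of what a tile contains, a quantized-mesh-1.0 tile starts with a fixed little-endian binary header, followed by the zig-zag/delta-encoded vertex data and the triangle indices. Here is a minimal Java sketch of reading that header, with the field layout taken from the public specification:

public final class QuantizedMeshHeader {
    public double centerX, centerY, centerZ;                // tile center, earth-centered coordinates
    public float minimumHeight, maximumHeight;              // height range of the tile, in meters
    public double sphereX, sphereY, sphereZ, sphereRadius;  // bounding sphere of the tile
    public double occlusionX, occlusionY, occlusionZ;       // horizon occlusion point

    public static QuantizedMeshHeader read(byte[] tile) {
        java.nio.ByteBuffer b = java.nio.ByteBuffer.wrap(tile)
                .order(java.nio.ByteOrder.LITTLE_ENDIAN);   // quantized-mesh is little-endian
        QuantizedMeshHeader h = new QuantizedMeshHeader();
        h.centerX = b.getDouble(); h.centerY = b.getDouble(); h.centerZ = b.getDouble();
        h.minimumHeight = b.getFloat(); h.maximumHeight = b.getFloat();
        h.sphereX = b.getDouble(); h.sphereY = b.getDouble();
        h.sphereZ = b.getDouble(); h.sphereRadius = b.getDouble();
        h.occlusionX = b.getDouble(); h.occlusionY = b.getDouble(); h.occlusionZ = b.getDouble();
        // The encoded vertex positions, heights and triangle indices follow the header.
        return h;
    }
}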
You can generate heightmap tiles with Cesium Terrain Builder, a great opensource command-line utility by Homme Zwaagstra at the GeoData Institute, University of Southampton.
Here you can see how dense the tiles are when using the heightmap format:
At present, this tool can't create tiles in quantized-mesh format, so I have added quantized-mesh support to the "ctb-tile" tool to create tiles in this lightweight format :-)
Briefly, the ctb-tile tool provides two new features:
  • New quantized-mesh format output.
  • New metadata file output.
The format parameter ("-f" or "--output-format") provides the new "Mesh" option in order to output quantized-mesh tiles. It reads from a Digital Elevation Model (DEM) as usual, and applies the Chunked LOD strategy by Thatcher Ulrich to convert the regular grid of each tile into an equivalent, simplified irregular mesh.
The process uses an input geometric error to safely merge each pair of triangles into its parent if there is no visible loss of quality. Also, you can scale this geometric error using the new "--mesh-qfactor" parameter (1.0 by default) to apply your own error value. Larger values mean lower quality.
How does ctb-tile calculate the geometric error of each tile? It uses an algorithm similar to the one Cesium uses to choose which level of tiles to render. The error for level zero of a geodetic quadtree is around 77,067.34 meters:
This errors table is applied to decide whether a triangle has to be written to the output mesh or split into its children. To avoid collapsing all the tiles of the lower levels, the process adds more triangles than the geometric error indicates; otherwise we would get a squared globe.
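A minimal sketch of how such an errors table can be derived, assuming (as in Cesium's geodetic tiling scheme) that the error halves at each deeper level:

public final class GeometricError {
    // Maximum geometric error at level zero of a geodetic quadtree, in meters.
    private static final double LEVEL_ZERO_ERROR = 77067.34;

    // Each quadtree level halves the allowed geometric error.
    public static double errorAtLevel(int level) {
        return LEVEL_ZERO_ERROR / Math.pow(2.0, level);
    }
}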
And finally, the new "-l" (or "--layer") option allows you to create the "layer.json" file with all the metadata that describes the tileset. It fills the "available" JSON attribute too, which seems mandatory to successfully load the terrain in Cesium.
Do you like this? You can freely test my GitHub fork! I proposed a pull request with this to the CTB repository too; if the owners agree, I would be happy for them to merge it.

Wednesday, October 18, 2017

Testing Big Spatial Data software (Hadoop + HBase + GeoWave + GeoServer) without dying in the attempt :-)


The purpose of this article is to show you the results of testing the integration of a Big Data platform with other geospatial tools. It is worth highlighting that the integration of the components used, all of them open source, allows us to publish web services compliant with OGC standards (WMS, WFS, WPS).



This article describes the installation steps, settings and development done to get a web mapping client application showing NO2 measures from around 4,000 European stations during four months (observations were registered hourly), around 5 million records. Yes, I know, this doesn't seem like "Big Data" storage, but it seems big enough to check performance when applications read it using spatial and/or temporal filters.

The article doesn't aim to teach deep knowledge of the software used; all of these projects already publish good documentation from both the user and developer points of view. It simply offers my experiences and a simple guide collecting resources about the software components used. For example, the comments about GeoWave and its integration with GeoServer are copied from the product guide on its website.


Data schema

The test data was downloaded from the European Environment Agency (EEA). You can search here for information or map viewers on this or other topics, or better, you could use your own data. GDELT is another interesting project that offers massive data.

The schema of the test data is simple: the input is a group of CSV files (text files with comma-separated attributes) with points in geographical coordinates (latitude/longitude) that georeference the sensor, the measurement date, and the NO2 concentration in the air. There are other secondary attributes, but they aren't important for our test.
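For illustration only, the relevant columns look something like this (the station identifiers and values below are made up, and the real EEA files carry more attributes):

station_id,longitude,latitude,datetime_begin,no2_value
ES0118A,-1.6453,42.8169,2017-06-01 12:00:00,23.5
ES0120A,-1.6733,42.8042,2017-06-01 12:00:00,18.2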



Software architecture

The test consists of chaining a set of tools, each of them offering data and functionality to the next software component in the application architecture. The workflow starts with Hadoop and its HDFS; HBase maps it as a database; the great GeoWave works as a connector between HBase and the popular GeoServer, which implements several OGC standards; and finally, a web client application fetches the data to show maps as usual (for example, using Leaflet and the Heatmap.js library).



Apache Hadoop

Apache Hadoop is, when we search a bit on Google, ... a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

HDFS is a distributed file system that provides high-performance access to data across Hadoop clusters. Because HDFS typically is deployed on low-cost commodity hardware, server failures are common. The file system is designed to be highly fault-tolerant, however, by facilitating the rapid transfer of data between compute nodes and enabling Hadoop systems to continue running if a node fails. That decreases the risk of catastrophic failure, even in the event that numerous nodes fail.

Our test will use Hadoop and its HDFS as the repository for the data we are going to save and finally publish to the end-user application. You can read the project resources here, or dive into the Internet to learn about it in depth.

I have used Windows for my tests. The official Apache Hadoop releases do not include Windows binaries, but you can easily build them with this great guide (it uses Maven) and configure the necessary files, at least to run a single-node cluster (see the sample below). Of course, a production environment will require us to configure a distributed multi-node cluster, use a ready-to-use distribution (Hortonworks...) or jump to the Cloud (Amazon S3, Azure...).
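For reference, a minimal core-site.xml for a single-node setup usually looks like the following (the port is an assumption; adjust it to your environment):

<configuration>
  <property>
    <!-- URI of the default file system: a local single-node HDFS -->
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>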

Going ahead with this guide: once Hadoop has been built with Maven, the configuration files edited and the environment variables defined, we can check that everything is OK by executing in the console...

> hadoop version 


Then we start the namenode and datanode daemons, and the YARN resource manager.

> call ".\hadoop-2.8.1\etc\hadoop\hadoop-env.cmd" 
> call ".\hadoop-2.8.1\sbin\start-dfs.cmd" 
> call ".\hadoop-2.8.1\sbin\start-yarn.cmd"

We can see the Hadoop admin application on the configured HTTP port, 50070 in my case:



Apache HBase

Apache HBase is, searching on Google again, ... a NoSQL database that runs on top of Hadoop as a distributed and scalable big data store. This means that HBase can leverage the distributed processing paradigm of the Hadoop Distributed File System (HDFS) and benefit from Hadoop’s MapReduce programming model. It is meant to host large tables with billions of rows with potentially millions of columns and run across a cluster of commodity hardware.

You can read this to get started and install HBase. Again, we check the product version by executing...

> hbase version


Start HBase:

> call ".\hbase-1.3.1\conf\hbase-env.cmd" 
> call ".\hbase-1.3.1\bin\start-hbase.cmd"

See the HBase admin application, on port 16010 in my case:


OK, at this point we have the Big Data environment working; it is time to prepare the tools that add geospatial capabilities, GeoWave and GeoServer. We go ahead...


LocationTech GeoWave

GeoWave is a software library that connects the scalability of distributed computing frameworks and key-value stores (Hadoop + HBase in my case) with modern geospatial software to store, retrieve and analyze massive geospatial datasets. Wow! This is a great tool :-)



From a developer's point of view, this library implements a vector data provider for the GeoTools toolkit in order to read features (geometry and attributes) from a distributed environment. When we add the corresponding plugin to GeoServer, the user will see new data store types to configure the supported distributed datasets.



Nowadays, GeoWave supports three distributed data store types: Apache Accumulo, Google Bigtable and HBase; we will use the last of them.

Let's leave GeoServer for later. According to the GeoWave user and developer guides, we have to define the primary and secondary indexes that the layers will use; then we can load the information into our Big Data storage.

Following the developer guide, we build the GeoWave toolkit with Maven in order to save geographical data on HBase:

> mvn package -P geowave-tools-singlejar

and the plugin to include in GeoServer:

> mvn package -P geotools-container-singlejar

I have defined my own environment variable with a base command in order to execute GeoWave processes as comfortably as possible:

> set GEOWAVE= java -cp "%GEOWAVE_HOME%/geowave-deploy-0.9.6-SNAPSHOT-tools.jar" 
  mil.nga.giat.geowave.core.cli.GeoWaveMain

Now, we can easily run commands typing %geowave% [...]. We check the GeoWave version:

> %geowave% --version


OK, we are going to register the necessary spatial and temporal indexes for our layer. The client application will filter the data using a spatial filter (a BBOX-crossing filter) and a temporal filter to fetch only the NO2 measures of a specific date.

We go ahead and register both indexes:

> %geowave% config addindex -t spatial eea-spindex
  --partitionStrategy ROUND_ROBIN

> %geowave% config addindex -t spatial_temporal eea-hrindex 
  --partitionStrategy ROUND_ROBIN --period HOUR

And add a "store", in GeoWave terminology, for our new layer:

> %geowave% config addstore eea-store 
  --gwNamespace geowave.eea -t hbase --zookeeper localhost:2222

Note: in the last command, 2222 is the port number where I published my ZooKeeper.

Now we can load data. Our input is CSV files, so I will use the "-f geotools-vector" option to indicate that GeoTools should inspect which vector provider it has to use to read the data. There are other supported formats, and of course we can develop a new provider to read our own specific data types.

To load a CSV file:

> %geowave% ingest localtogw -f geotools-vector
  ./mydatapath/eea/NO2-measures.csv eea-store eea-spindex,eea-hrindex

OK, data loaded; so far no problems, right? But the GeoTools CSVDataStore has some limitations when reading files. The current code doesn't support date/time attributes (nor boolean attributes); it manages all of them as strings. This is unacceptable for our requirements, since the measurement date has to be a precise attribute for the index, so I fixed it in the original Java code. Also, in order to calculate the appropriate value type of each attribute, the reader reads all the rows in the file; it is the safest approach, but it can be very slow when reading big files with thousands and thousands of rows. If the file has a consistent schema, we can read a small set of rows to infer the types, so I changed that too (see the sketch below). We have to rebuild GeoTools and GeoWave. You can download the changes from my own GeoTools fork.
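This is not the actual patch, just a minimal sketch of the sampling idea; the row limit, the supported types and the date pattern are assumptions of mine:

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;

public final class CsvTypeInference {

    // Infer the type of a CSV column by parsing only the first sampleSize rows,
    // falling back from the most restrictive candidate (Integer) to String.
    static Class<?> inferColumnType(List<String> columnValues, int sampleSize) {
        Class<?> type = Integer.class;
        int limit = Math.min(sampleSize, columnValues.size());
        for (int i = 0; i < limit; i++) {
            String value = columnValues.get(i);
            if (type == Integer.class && !isInteger(value)) type = Double.class;
            if (type == Double.class && !isDouble(value)) type = Date.class;
            if (type == Date.class && !isDate(value)) type = String.class;
            if (type == String.class) break; // String accepts anything, stop early
        }
        return type;
    }

    static boolean isInteger(String s) {
        try { Integer.parseInt(s); return true; } catch (NumberFormatException e) { return false; }
    }

    static boolean isDouble(String s) {
        try { Double.parseDouble(s); return true; } catch (NumberFormatException e) { return false; }
    }

    static boolean isDate(String s) {
        try { new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").parse(s); return true; }
        catch (ParseException e) { return false; }
    }
}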

After this break, let me return to the main path of the guide: we have loaded the features into our layer with the "ingest" command. We have also included the plugin in a deployed GeoServer instance (see the developer guide; it is easy, just copy the "geowave-deploy-xxx-geoserver.jar" component to the "..\WEB-INF\lib" folder and restart).


GeoServer

GeoServer is an open source server for sharing geospatial data. Designed for interoperability, it publishes data from any major spatial data source using open standards. GeoServer is an Open Geospatial Consortium (OGC) compliant implementation of a number of open standards such as Web Feature Service (WFS), Web Map Service (WMS), and Web Coverage Service (WCS).

Additional formats and publication options are available including Web Map Tile Service (WMTS) and extensions for Catalogue Service (CSW) and Web Processing Service (WPS).



We use GeoServer to read the layers loaded with GeoWave; the plugin we just added to our GeoServer allows us to connect to these data. We can use them as any other type of layer. Wow! :-)

To configure the access to distributed data stores, we can use two options:

  • Using the GeoServer admin panel as usual:

  • Using the "gs" command of GeoWave to register Data Stores and Layers in a started GeoServer instance.

Since we are testing things, we are going to use the second option. The first step is to indicate to GeoWave which GeoServer instance we want to configure.

> %geowave% config geoserver 
  -ws geowave -u admin -p geoserver http://localhost:8080/geoserver

Similar to what we would do in the GeoServer admin panel, we execute two commands to add the desired data store and layer, respectively.

> %geowave% gs addds -ds geowave_eea -ws geowave eea-store
> %geowave% gs addfl -ds geowave_eea -ws geowave NO2-measures



As you can see, the spatial reference system of the layer is EPSG:4326; the rest of the settings are similar to other layer types. If we preview the map with the OpenLayers client of GeoServer...



The performance is quite decent (you can see the video linked to the image), taking into account that I am running on a not-very-powerful PC, with Hadoop working in single mode, and drawing all the NO2 measures of all available dates (~5 million records). The spatial index works well: the lower the zoom, the faster the response. Also, if we execute a WFS request with a temporal criterion, we can check that the temporal index works well too; GeoServer doesn't scan all the records of the layer.

The GeoWave user guide teaches us about a special style named "subsamplepoints" (it uses a WPS process named "geowave:Subsample" that the GeoWave plugin implements). When drawing a map, this style performs spatial subsampling to speed up the rendering process. I have verified a great performance gain; I recommend it for drawing point-type layers.

I also tested loading a polygon-type layer from a Shapefile: no problems, WMS GetMap and WFS GetFeature requests run fine. Just one note: the GeoWave loading tool automatically transforms the geometries from the original spatial reference system (EPSG:25830 in my case) to EPSG:4326 geographical coordinates.



At this point we have verified that everything fits. We could stop here, since the exploitation of this data can already be done with standard web mapping libraries (Leaflet, OpenLayers, AGOL, ...) or any desktop GIS application (QGIS, gvSIG, ...).

Would you like to continue?


Web Mapping client with Leaflet

I continued by developing a web mapping client application with Leaflet. This viewer can draw the map in two styles: a heatmap, or a thematic color ramp. It renders all the observations or measures of a single date, and can even animate through all the available dates.


We can also verify performance with this viewer, since it mixes spatial and temporal filters in a single query. We go ahead.

The easiest option, and perhaps the optimal one, would have been the client application performing WMS GetMap requests, but I am going to send requests to GeoServer that fetch the geometries, so we can draw them in the client as we desire. We could use WFS GetFeature requests with the current map bounds (which generate a spatial BBOX filter) and a PropertyIsEqualTo filter for a specific date. But we shouldn't forget that we are managing Big Data stores that can produce GML or JSON responses of huge size, with thousands and thousands of records.

In order to avoid this problem, I developed a pair of WPS processes, called "geowave:PackageFeatureLayer" and "geowave:PackageFeatureCollection", that return the response as a compressed binary stream. You could use other packaging logic, for example returning a special image whose pixels encode the geometries and feature attributes. The goal is to minimize the size of the information and accelerate its digestion in the client application.



The WPS parameters are: first, the layer name in the current GeoServer catalog (a "SimpleFeatureCollection" for the "geowave:PackageFeatureCollection" process), a BBOX, and an optional CQL filter (in my case I am sending something similar to "datetime_begin = 2017-06-01 12:00:00").
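Just for orientation, a GeoServer WPS process of this kind has roughly the following shape. The process and parameter names echo the ones above, but the body and the trivial binary encoding are simplified assumptions of mine (the real code is in the repository linked below), and a custom PPIO may be needed to stream the raw bytes as a WPS response:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

import org.geoserver.wps.gs.GeoServerProcess;
import org.geotools.data.simple.SimpleFeatureCollection;
import org.geotools.data.simple.SimpleFeatureIterator;
import org.geotools.process.factory.DescribeParameter;
import org.geotools.process.factory.DescribeProcess;
import org.geotools.process.factory.DescribeResult;
import org.opengis.feature.simple.SimpleFeature;

import com.vividsolutions.jts.geom.Coordinate;
import com.vividsolutions.jts.geom.Geometry;

@DescribeProcess(title = "PackageFeatureCollection",
                 description = "Returns point features as a compressed binary stream")
public class PackageFeatureCollectionProcess implements GeoServerProcess {

    @DescribeResult(name = "result", description = "Deflate-compressed coordinate stream")
    public byte[] execute(
            @DescribeParameter(name = "features") SimpleFeatureCollection features)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new DeflaterOutputStream(bytes));
             SimpleFeatureIterator it = features.features()) {
            while (it.hasNext()) {
                SimpleFeature feature = it.next();
                Coordinate c = ((Geometry) feature.getDefaultGeometry()).getCoordinate();
                out.writeDouble(c.x); // longitude
                out.writeDouble(c.y); // latitude
            }
        }
        return bytes.toByteArray();
    }
}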

I am not going to explain the code in detail; it is outside the scope of this guide. If you like, you can study it via the GitHub link at the end of the article.

The client application runs a WebWorker that executes a WPS request against our GeoServer instance. The request executes the "geowave:PackageFeatureLayer" process to minimize the response size. The WebWorker then decompresses the binary stream, parses it to create JavaScript objects with points and attributes, and finally returns them to the main browser thread for drawing. The client application renders these objects using the Heatmap.js library, or by drawing on an HTML5 Canvas to create a thematic color ramp. For this second style, the application creates some on-the-fly textures of the colored icons to use when drawing the points. This trick allows the map to render thousands and thousands of points pretty fast.

If our client application needs to draw millions of points, we can dive into WebGL and the great WebGL Heatmap library, or fantastic demos such as "How I built a wind map with WebGL".

The source code of the WPS module and the web mapping client application is here.