Introduction

Recently, the concept of data cubes has emerged as a relevant paradigm for organising Earth observation data for performing analysis. In an intuitive sense, data cubes are temporal sequences of images from the same geographical area. This sequence is organised consistently to allow algorithms to explore the data in both the spatial and temporal directions[Appel2019]. Data cubes are thus an efficient way to explore satellite image archives spanning years or even decades.

Data cubes rely on the fact that Earth observation satellites revisit the same place at regular intervals. Thus measures can be calibrated so that observations of the same place in different times are comparable. These calibrated observations can be organised in regular intervals, so that each measure from sensor is mapped into a three dimensional multivariate array in space-time. Let \(S = \{s_{1}, s_{2},\dotsc, s_{n}\}\) be a set of remote sensing images which shows the same region at consecutive times \(T = \{t_{1}, t_{2}, \dotsc, t_{n}\}\). Each location of a pixel in an image (latitude, longitude, time) maps to a position in a 3D array. Each array position will be associated to a set of attributes values \(A = \{a_{1}, a_{2}, \dotsc, a_{m}\}\) which are the sensor measurements at each location in space-time (see figure below}). For optical sensors, these observations are proportional to Earth’s reflexion of the incoming solar radiation at different wavelengths of the electromagnetic spectrum.

A Normalized Difference Vegetation Index (NDVI) time series

A Normalized Difference Vegetation Index (NDVI) time series

In what follows, we use the term “coverage” in the same sense as the definition of a “data cube,” proposed from Appel and Pebesma (Appel and Pebesma 2019): A regular, dense Earth observation (EO) data cube is a four-dimensional array with dimensions x (longitude or easting), y (latitude or northing), and time, with the following properties:

  1. Spatial dimensions refer to a single spatial reference system (SRS)};
  2. Cells of a data cube have a constant spatial size (with regard to the cube’s SRS)};
  3. The spatial reference is defined by a simple offset and the cell size per axis, i.e., the cube axes are aligned with the SRS axes;}
  4. The temporal reference is a known set of non-overlapping temporal intervals; each temporal interval is defined by a simple start date/time and the temporal duration of cells;}
  5. All cells have the same set of attributes; each attribute can be a scalar or a vector;}
  6. For every combination of dimensions, a cell has a single set of attribute values.}

This definition of EO data cubes reflects their construction. Data cubes are built of 2D images collected on a specific date constrained by satellite orbits. These images are then grouped together in arbitrary intervals. In principle, a data cube can have an image on time 1, another image at time 5, another at time 8, and so on. Since interpolation in space is different from interpolation in time, a data set composed by these images would make a valid data cube. Space needs to be compact (dense) in a data cube, but time does not need to be. Thus, a sequence of 2D + time images need not be dense. 2D images in a data cube can correspond to different instants of acquisition.

Web Services for Earth Observation Data Cubes

Web services interfaces isolate users from concrete implementations, thus increasing portability There are many ways to implement Earth observation data cubes. One of the earlier approaches is to break large images in chunks and store them in an object-relational DBMS, as done in RASDAMAN (Baumann et al. 1998). Google Earth Engine(Gorelick et al. 2017) uses a massive parallel design; image data is partitioned in hundreds or thousands of CPUs, processed separately and later aggregated using map-reduce techniques. The Open Data Cube software, after making the required geometric and radiometric transformations in the images, stores them in files, which are indexed in a PostgreSQL database (Lewis et al. 2017). Similar file-based approaches are used in the R package “gdalcubes” (Appel and Pebesma 2019). Each of these designs is associated to a different API, thus making compatibility and interoperability between them hard to achieve.

Given the different designs of Earth observation data cubes, and their inherent limitations for achieving interoperability, one natural approach is to propose Web services that would present a unified interface for accessing them. Web services interfaces isolate users from concrete implementations, thus increasing portability(Ferris and Farrell 2003). In the geospatial domain, web service standards developed by the Open Geospatial Consortium have achieved considerable impact (Percivall 2010). Since services such as WMS (Web Map Service) and WCS (Web Coverage Service) are used worldwide with success, it is natural to ask: how to design web services for big Earth observation (EO) data? The current work addresses this issue and proposes WTSS (Web Time Series Service), a new services for big EO data.

Satellite Image Time Series

Given a series of remote sensing snapshots, we can reorganise them into a set of time series. A satellite image time series is obtained by taking measurements in the same pixel location \((x,y)\) in consecutive times \(t_1,...,t_m\), as shown in the figure below.

Events associated to a time series

Events associated to a time series

Recent results in the literature show that analysis of satellite image times series enables extracting long-term trends of land change \citep(Pasquarella et al. 2016). Such information which would be harder to obtain by processing 2D images separately. These analysis are supported by algorithms such as TWDTW (Maus et al. 2016), TIMESTAT (Jonsson and Eklundh 2004) and BFAST (Verbesselt et al. 2010). These algorithms process individual time-series and combine the results for selected periods to generate classified maps.

For example, classification methods such as TWDTW (Maus et al. 2019) can then break an image time series into a set of intervals. As an example, the figure below shows four events extracted from a remote sensing time series, expressed in terms of the its intervals. From 2000 to 2001 the area was a forest that was deforested in 2002. From 2003 to 2005 the area it was used for pasture and from 2005 to 2008, it was transformed into a cropland. This kind of classification is done by algorithms such that split a time series into a set of events. Combining snapshots with time series, scientists can explore the full depth of big remote sensing data archives.

Web Time Series Service (WTSS)

Motivated by the need to retrieve satellite image time series from large 3D arrays, we have designed and implemented the Web Time Series Service (WTSS). A WTSS server takes as input a 3D array. Each array has a spatial and a temporal dimension and can be multidimensional in terms of its attributes. The WTSS service is independent of the actual data architecture used for 3D array store. It can work with solutions such as flat files, MapReduce distributed datasets, array databases or object-relational databases.

The WTSS API has three commands: (a) list_coverages that returns a list of coverages available in the server; (b) describe_coverage that returns the metadata for a given coverage; (c) time_series that returns a time series for a spatio-temporal location.

Connecting to a WTSS server

The first step towards using the service is connecting to a server that supports the WTSS protocol. Currently, Brazil’s National Institute for Space Research (INPE) runs such a service. In the package, the connection is enabled by using the URL of the service. So, the first procedure is to find the URL of a WTSS service.

# Connect to the WTSS server at INPE Brazil
wtss_inpe <-  "https://brazildatacube.dpi.inpe.br/wtss/"

Listing coverages available at the WTSS server

This operation allows clients to retrieve the capabilities provided by any server that implements WTSS. It returns a list of coverage names available in a server instance.

# Connect to the WTSS server at INPE Brazil
Rwtss::list_coverages(wtss_inpe)
## [1] "MOD13Q1-6" "MYD13Q1-6"

Describing a coverage from the WTSS server

This operation returns the metadata for a given coverage identified by its name. It includes its range in the spatial and temporal dimensions.

# Connect to the WTSS server at INPE Brazil
desc <- Rwtss::describe_coverage(wtss_inpe, name = "MOD13Q1-6")
## ---------------------------------------------------------------------
## WTSS server URL = https://brazildatacube.dpi.inpe.br/wtss
## Cube (coverage) = MOD13Q1-6
## Timeline - 508 time steps
## start_date: 2000-02-18 end_date: 2022-03-22
## ---------------------------------------------------------------------

The coverage description is also saved as a tibble in the wtss object, to be used whenever required.

Obtaining a time series

This operation requests the time series of values of a coverage attribute at a given location. Its parameters are: (a) wtss.obj: either a WTSS object (created by the operation wtss::WTSS as shown above) or a valid WTSS server URL; (b) name: Cube (coverage) name; (c) attributes: vector of band names (optional). If omitted, all bands are retrieved; (d) longitude: longitude in WGS84 coordinate system; (e)latitude: Latitude in WGS84 coordinate system; (f)start_date (optional): Start date in the format yyyy-mm-dd or yyyy-mm depending on the coverage. If omitted, the first date on the timeline is used; (g) end_date(optional): End date in the format yyyy-mm-dd or yyyy-mm depending on the coverage. If omitted, the last date of the timeline is used; (h) To access the BDC time series it is necessary to provide a token, provide the token through the token parameter. To create a new BDC token, please see this tutorial.

# Request a time series from the "MOD13Q1" coverage
ndvi_ts   <- Rwtss::time_series(
  URL = wtss_inpe, 
  name = "MOD13Q1",
  attributes = c("NDVI","EVI"), 
  longitude = -45.00, 
  latitude  = -12.00,
  start_date = "2000-02-18", 
  end_date = "2016-12-18",
  token = "YOUR-BDC-TOKEN"
)
ndvi_ts

The result of the operation is a tibble, which is a generalization of a data.frame, the usual way in R to organise data in tables. Tibbles are part of the tidyverse, a collection of R packages designed to work together in data manipulation (Wickham and Grolemund 2017). The tibble contains data and metadata. The first six columns contain the metadata: satellite, sensor, spatial and temporal information, and the coverage from where the data has been extracted. The spatial location is given in longitude and latitude coordinates for the “WGS84” ellipsoid. The time_series column contains the time series data for each spatiotemporal location. This data is also organized as a tibble, with a column with the dates and the other columns with the values for each spectral band.

# Showing the contents of a time series
ndvi_ts$time_series[[1]]

References

Appel, Marius, and Edzer Pebesma. 2019. “On-Demand Processing of Data Cubes from Satellite Image Collections with the Gdalcubes Library.” Data 4 (3).
Baumann, P., A. Dehmel, P. Furtado, R. Ritsch, and N. Widmann. 1998. The multidimensional database system RasDaMan.” ACM SIGMOD Record 27 (2): 575–77.
Ferris, Christopher, and Joel Farrell. 2003. “What Are Web Services?” Communications of the ACM 46 (6): 31.
Gorelick, Noel, Matt Hancher, Mike Dixon, Simon Ilyushchenko, David Thau, and Rebecca Moore. 2017. “Google Earth Engine: Planetary-Scale Geospatial Analysis for Everyone.” Remote Sensing of Environment 202: 18–27.
Jonsson, Per, and Lars Eklundh. 2004. TIMESAT - a Program for Analyzing Time-Series of Satellite Sensor Data.” Computers & Geosciences 30 (8): 833–45.
Lewis, Adam, Simon Oliver, Leo Lymburner, and others. 2017. “The Australian Geoscience Data Cube: Foundations and Lessons Learned.” Remote Sensing of Environment 202: 276–92.
Maus, Victor, Gilberto Camara, Ricardo Cartaxo, Alber Sanchez, Fernando M Ramos, and Gilberto R de Queiroz. 2016. “A Time-Weighted Dynamic Time Warping Method for Land-Use and Land-Cover Mapping.” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 9 (8): 3729–39.
Maus, Victor, Gilberto Câmara, Marius Appel, and Edzer Pebesma. 2019. “dtwSat: Time-Weighted Dynamic Time Warping for Satellite Image Time Series Analysis in R.” Journal of Statistical Software 88 (5): 1–31.
Pasquarella, Valerie J., Christopher E. Holden, Les Kaufman, and Curtis E. Woodcock. 2016. “From Imagery to Ecology: Leveraging Time Series of All Available Landsat Observations to Map and Monitor Ecosystem State and Dynamics.” Remote Sensing in Ecology and Conservation 2 (3): 152–70. https://doi.org/10.1002/rse2.24.
Percivall, George. 2010. “Progress in OGC Web Services Interoperability Development.” In Standard-Based Data and Information Systems for Earth Observation, 37–61. Springer.
Verbesselt, Jan, Rob Hyndman, Glenn Newnham, and Darius Culvenor. 2010. “Detecting Trend and Seasonal Changes in Satellite Image Time Series.” Remote Sensing of Environment 114 (1): 106–15.
Wickham, Hadley, and Garrett Grolemund. 2017. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.