doc

Contents

  1. About
  2. Using
    1. Web interface
    2. API
  3. Installing
    1. Using pre-compiled binary
    2. From Source
    3. From Eclipse
    4. Into Existing Web Application
  4. Configuring
    1. tss.properties
    2. HTML Style (CSS)
  5. Connecting to data services
    1. Catalog
    2. NcML
    3. IOSP
  6. Extending
  7. Appendix
    1. Overview data
    2. Output formats
      1. bin
      2. fbin
    3. Using NcML
      1. Scalar
      2. Structure
      3. Sequence
    4. THREDDS Catalogs
      1. Basic Configuration
      2. Advanced Configuration
        1. Method 1
        2. Method 2

1. About

Time Series Data System (TSDS) is a project that provides a modular, Java Servlet-based OPeNDAP server (a Time Series Server, TSS) for serving time series data. The server is built around the NetCDF-Java implementation [1] of the Unidata Common Data Model [2] and the NetCDF Markup Language (NcML).

2. Using

2.1. Web interface

To see a list of all possible parameters available from a TSDS server, enter http://host/servletpath/ and follow the links to form a data request.

Examples:

2.2. API

The base-line API builds on OPeNDAP-compliant URL requests of the form:

http://host/servletpath/dataset.suffix?parameters&constraint&filter

where

  • host: name of the computer hosting the TSDS servlet;
  • servletpath: path to the servlet;
  • dataset: name of a dataset containing time series parameters;
  • suffix: type or format of the output;
  • parameters: list of parameters (alphanumeric and underscore characters only for parameter names) to return with optional hyperslab (index subset) definitions (default all);
  • constraint: constraints on the values of the parameters; and
  • filter: filters applied to the parameter values after the constraints have been applied.

filter options include

  • replace(a,b) replace any occurrence of the value a with b,
  • replace_missing(a) replace missing values with the value a,
  • exclude_missing() exclude any time sample that has a missing value,
  • format_time(format) format ASCII time output, see Java's SimpleDateFormat [3] (time variable must be explicitly requested to use),
  • stride(n) return every nth time sample, and
  • thin(n) apply a stride to return about n time samples.
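As a rough illustration of the difference between stride(n) and thin(n), the following Python sketch applies both to an in-memory list. The function bodies are not taken from the TSDS source. In particular, the rule that thin uses to pick its stride (series length divided by n, rounded down, never less than 1) is an assumption; the server may round differently.

```python
def stride(samples, n):
    """Return every nth sample, starting with the first."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return samples[::n]


def thin(samples, n):
    """Return about n samples by deriving a stride from the series length.

    Assumption: stride = len(samples) // n, with a minimum of 1.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    step = max(1, len(samples) // n)
    return samples[::step]


series = list(range(100))
print(stride(series, 25))     # [0, 25, 50, 75]
print(len(thin(series, 10)))  # 10
```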

constraint options include

  • >, <, >=, <=, and =.

The suffix options include

  • csv: comma separated values with single line header,
  • dat: space delimited ASCII format with no header,
  • bin: A flat binary table,
  • nc: Network Common Data Form (NetCDF) file (to be implemented),
  • cdf: Common Data Format (CDF) file (to be implemented),
  • h5: Hierarchical Data Format (HDF) version 5 (to be implemented),
  • json: JavaScript Object Notation (JSON),
  • xml: An XML representation of the data (to be implemented; structure to be determined),
  • info: information about the dataset and parameters,
  • html: HTML view of dataset information and a form for requesting data,
  • dds: Dataset Descriptor Structure (ASCII),
  • das: Dataset Attribute Structure (ASCII),
  • dods: dataset as defined by the Data Access Protocol (DAP), and
  • asc: dataset represented as ASCII.

Examples:
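A request URL can be assembled from the pieces described above. The following Python sketch is illustrative only: the helper name build_request_url is hypothetical, the parameter list is assumed to be comma separated (the usual OPeNDAP projection syntax), and no percent-encoding is applied to the constraint characters.

```python
def build_request_url(host, servletpath, dataset, suffix,
                      parameters=(), constraints=(), filters=()):
    """Assemble http://host/servletpath/dataset.suffix?parameters&constraint&filter."""
    url = "http://{}/{}/{}.{}".format(host, servletpath.strip("/"), dataset, suffix)
    # Assumption: parameters are comma separated; constraints and filters
    # follow, each separated by "&".
    query = [",".join(parameters)] if parameters else []
    query.extend(constraints)
    query.extend(filters)
    return url + ("?" + "&".join(query) if query else "")


print(build_request_url("host", "servletpath", "Scalar", "csv",
                        parameters=["time", "Scalar"],
                        constraints=["time>10", "time<20"],
                        filters=["stride(2)"]))
# http://host/servletpath/Scalar.csv?time,Scalar&time>10&time<20&stride(2)
```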

Other filters (under development):

  • binave(T) Computes bin averages with window width T (in the time units of the dataset as specified in its NcML file; in the future this will be an ISO 8601 duration). binave(T,To,Tf) is the same as time>To&time<Tf&binave(T); in this case To defines the start of the first bin. The output is time (bin center), bin average, average of the time stamps in the bin, bin minimum, bin maximum, and number of measurements in the bin. Empty bins have the appropriate bin time and a count of 0; the remaining values are NaN. binave works on single scalars only. To obtain averages of a vector, use a projection clause (e.g., B.X) and make one request per vector component. If the time variable is a formatted time, the units are milliseconds. (Formatted times have string units such as "yyyy-MM-dd HH:mm" in the Dataset Attribute Structure, e.g., [4]. Unformatted times have string units such as "days since 1970-01-01" in the Dataset Attribute Structure, e.g., [5].)

Examples:
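The output columns of binave can be illustrated with a small Python sketch that operates on in-memory lists. It is not the server's implementation. It assumes that bins are laid out back to back starting at To, and that samples exactly at To or Tf are excluded, matching the time>To&time<Tf form given above.

```python
import math


def binave(times, values, width, t_start, t_stop):
    """Return one row per bin:
    (bin center, mean value, mean time stamp, min, max, count).
    """
    n_bins = int(math.ceil((t_stop - t_start) / width))
    bins = [[] for _ in range(n_bins)]
    for t, v in zip(times, values):
        if t_start < t < t_stop:  # same as time>To&time<Tf
            # Clamp guards against floating point round-off at the last edge.
            index = min(int((t - t_start) // width), n_bins - 1)
            bins[index].append((t, v))
    nan = float("nan")
    rows = []
    for i, samples in enumerate(bins):
        center = t_start + (i + 0.5) * width
        if not samples:
            # Empty bin: bin time, NaN statistics, and a count of 0.
            rows.append((center, nan, nan, nan, nan, 0))
            continue
        ts = [t for t, _ in samples]
        vs = [v for _, v in samples]
        rows.append((center, sum(vs) / len(vs), sum(ts) / len(ts),
                     min(vs), max(vs), len(vs)))
    return rows


for row in binave([1, 2, 7], [10.0, 20.0, 30.0], width=5, t_start=0, t_stop=15):
    print(row)
```

With this input the first bin (center 2.5) holds two samples, the second (center 7.5) holds one, and the third (center 12.5) is empty.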

3. Installing

3.1. Using pre-compiled binary

Simply download a tsds war file from [6] and deploy it to your Servlet container.

For Tomcat, simply copy the war file into the webapps directory. You may need to restart your server.

In the future TSDS will be packaged so that it can be installed on a server that does not have a Servlet container already installed by following the approach described at [7] and [8].

3.2. From Source

Check out the TSDS project from the SourceForge subversion repository, then run ant to build the war file that you can then deploy to your Servlet container.

svn co https://tsds.svn.sourceforge.net/svnroot/tsds/trunk tsds

ant war

Copy the generated war file to your Tomcat webapps directory and browse to http://localhost:8080/tsds.

3.3. From Eclipse

This project is saved with Eclipse project metadata to simplify development and testing within Eclipse. It is a "Dynamic Web" project so the Java EE edition of Eclipse is recommended. First create a project named TSDS and check out the source code into the project. To compile, right-click on build.xml and select "Run-As"->"Ant Build". A war file will be created in a location indicated in the console tab. Copy this file to your Tomcat webapps directory and browse to http://localhost:8080/tsds.

If you would like to run in an instance of Tomcat running within Eclipse, right-click on the top-level directory and select "Refresh". Then right-click and select "Run As"->"Run on Server" and enter the path to your Tomcat installation directory. (Note that if you get a page in Eclipse with a 404 error for http://localhost:8080/TSDS, select the Server tab and try a combination of removing the project from the server, re-adding it, restarting the server, cleaning the work directory, etc. This is an Eclipse-specific problem that happens with other projects.)

3.4. Into Existing Web Application

You can configure your web application to include TSDS by deploying a tsds jar file with your app and providing a Servlet mapping to the TimeSeriesServer Servlet.

More to come.

4. Configuring

4.1. tss.properties

The features of the TSS Servlet are configured via a single Java properties file named tss.properties. By default, it is located within the subversion project in WebContent/tss.properties. It will appear at the top level of the web app when deployed. You can change the location of this file by setting the "config" parameter for the "TimeSeriesServer" Servlet in the web.xml file under the WEB-INF directory [9]. This can be a relative or absolute path. It is often a good idea to manage your tss.properties file external to the web app so upgrades won't overwrite your configuration.

To make a dataset available for serving, it must be described by a NcML file. The TSS maps the dataset portion of a request URL to the NcML file. The name of the dataset (which may include '/'s to indicate a directory path) is appended to the value of the dataset.dir property (followed by the .ncml suffix). By default, the tss.properties file defines dataset.dir to the datasets directory under WebContent in subversion (or the top level of the web app when deployed). This can be a relative or absolute path. It is usually a good idea to manage your NcML files external to the web app so upgrades won't overwrite your configuration.
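The mapping from a dataset name to an NcML file can be sketched in a few lines of Python. The helper name ncml_path is hypothetical and the code only mirrors the rule stated above (dataset.dir, then the dataset name, then the .ncml suffix); it is not taken from the TSS source.

```python
import posixpath


def ncml_path(dataset_dir, dataset):
    """Append the dataset name and the .ncml suffix to the dataset.dir value."""
    # Strip a leading "/" so the dataset name is always treated as relative
    # to dataset_dir.
    return posixpath.join(dataset_dir, dataset.lstrip("/") + ".ncml")


print(ncml_path("datasets", "test/Scalar"))  # datasets/test/Scalar.ncml
print(ncml_path("/data/ncml", "Scalar"))     # /data/ncml/Scalar.ncml
```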

The default NcML location can be overridden by using a THREDDS catalog to map the dataset name to a NcML URL. Set the catalog.url property to point to the top level catalog.

The output (writer) and filter options that are available to a deployment of the TSS are also defined in the tss.properties file. Output properties start with writer followed by the name which is used to match the output suffix from the request. The class property identifies the Java class that implements the Writer interface for this output option. Likewise, the filter properties map the name of the filter to the implementing class. If the description property is set, this option will appear on the TSS help page. Specialized properties can be added to support specific Writer and Filter features.

4.2. HTML Style (CSS)

Some of the TSS responses (e.g. html, help, info) are returned as HTML. The tss.css style sheet is used for some formatting. Of particular note is the html response, which is designed to be an order form. The goal of this CSS is to reuse the same HTML code for the form while configuring the style via the CSS. Thus data providers can integrate the server into their system with the same look and feel as the rest of their web pages.

More to come.

5. Connecting to data services

Serving data through the TSDS API from a local or remote data service requires two key pieces of information and possibly some additional code.

  1. A catalog listing containing all information required to form a data request. At the very least, this is a list of parameter IDs and start dates for each data server. Ideally, additional information is given, including stop date, units, and a link to documentation.
  2. A NcML file that is used by TSDS to form a data request.
  3. An IOSP (Input/Output Service Provider) - Usually Java code that maps the response from a service to the internal TSDS data structure.

5.1. Catalog

1. A catalog listing containing all information required to form a data request.

At the very least, this is a list of parameter IDs (alphanumeric and underscore characters only for parameter names) and start dates for each data server. Ideally, additional information is given, including stop date, units, and a link to documentation. For details on configuring TSDS to use existing catalogs, see #THREDDS_Catalogs.

Example data requests:

  • CDAWeb (parameter ID = AC_H1_MFI Magnitude): [10]
  • SPIDR (parameter ID = index_ssn): [11]
  • SuperMAG (parameter ID = BOU): [12]
  • VSEO (parameter ID = DE::DE-1::HAPI::HAPI::D1HE): [13]

Example catalogs | code to generate:

5.2. NcML

2. NcML is used by TSDS to describe each dataset.

  • In some cases the NcML is stored as an XML file on disk (for examples, see [14]).
  • In other cases the NcML is generated based on a request to TSDSFE and a template NcML file.

For more details on NcML, see #Using_NcML. Note parameters may only include alphanumeric and underscore characters.

TSDS interprets common NcML attributes such as missing_value, _FillValue, and units. In addition to numerical time units (e.g. "seconds since 1970-01-01"), TSDS will also interpret "formatted" time units that use the Java SimpleDateFormat (e.g. "yyyy-MM-dd").
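To show what a numerical time unit string implies for a client, the following Python sketch converts a numeric time value to a calendar date. It is a simplified illustration, not TSDS code: it handles only the four unit names listed in the table and only date-only epochs such as 1970-01-01, and it does not cover the "formatted" time units.

```python
from datetime import datetime, timedelta

SECONDS_PER_UNIT = {"seconds": 1.0, "minutes": 60.0, "hours": 3600.0, "days": 86400.0}


def to_datetime(value, units):
    """Convert a numeric time given units like 'days since 1970-01-01'."""
    unit, since, epoch = units.split(None, 2)
    if since != "since":
        raise ValueError("expected '<unit> since <date>', got %r" % units)
    start = datetime.strptime(epoch, "%Y-%m-%d")
    return start + timedelta(seconds=value * SECONDS_PER_UNIT[unit])


print(to_datetime(1.5, "days since 1970-01-01"))      # 1970-01-02 12:00:00
print(to_datetime(3600, "seconds since 1970-01-01"))  # 1970-01-01 01:00:00
```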

Some output options (that use the FormattedAsciiWriter) also make use of the following variable attributes:

  • precision: the number of decimal places (using Java's %f formatting)
  • sigfig: the number of significant figures (using Java's %g formatting)
  • format: a raw Java format expression
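The effect of these three attributes can be approximated with Python's printf-style formatting, which is similar to Java's Formatter but not identical (for example, the handling of trailing zeros by %g can differ). The function below is a hypothetical sketch; the precedence it applies when more than one attribute is present (format, then precision, then sigfig) is an assumption.

```python
def format_value(value, precision=None, sigfig=None, fmt=None):
    """Format a number according to the format, precision, or sigfig attribute.

    Attribute values are accepted as int or str, since NcML attributes
    are typically stored as strings.
    """
    if fmt is not None:
        return fmt % value                        # raw format expression
    if precision is not None:
        return "%.*f" % (int(precision), value)   # decimal places
    if sigfig is not None:
        return "%.*g" % (int(sigfig), value)      # significant figures
    return repr(value)


print(format_value(3.14159, precision=2))   # 3.14
print(format_value(1234.5678, sigfig="3"))  # 1.23e+03
print(format_value(2.5, fmt="%8.3f"))       # '   2.500' (padded to width 8)
```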

5.3. IOSP

3. An IOSP (Input/Output Service Provider) - Usually Java code that maps the response from a service to the internal TSDS data structure (CDM).

IOSPs exist for:

  • Columnar remote or local data files.
  • Data piped from the command line.
  • Data in a text file that is pre-processed by a regular expression.
  • Data from web services: CDAWeb, SSCWeb, SPIDR, LISIRD, ViRBO, and SuperMAG.
  • An IOSP exists for Autoplot. If a URL can be plotted by Autoplot by entering the URL in its address bar, the Autoplot IOSP may be used with that URL. For examples, see [15].

6. Extending

Extended capability for Writers and Filters can be "plugged in" by adding an entry to the tss.properties file mapping a suffix to the class that implements the appropriate interface. Likewise, dataset descriptors (and even the data themselves) and supporting IOSPs can be packaged in a zip file and easily added to the server.

The NetCDF-Java API defines an IOServiceProvider (IOSP) interface. Implementations of this interface can be identified by an NcML file as the class to use to read from the native data source. Ideally, they are generic (e.g. by file format) and reusable. Or, they can be a specialized hack to access a single data source. The IOSP has two primary jobs:

  1. "Open" the data source and define it in terms of a NetcdfFile Object. In other words, define the data source in terms of the NetCDF (Unidata CDM) data model.
  2. "Read" the data for a given Section (i.e. subset) of a given Variable and return it in a ucar.ma2.Array.

The TSDS code comes packaged with a specialized NetCDF-Java jar (that has been extended to expose the NcML to the IOSP) and an abstract class (lasp.tss.iosp.AbstractIOSP) to make IOSP development easier. The AbstractIOSP handles step one above. It uses the dataset structure as defined in the NcML to create the NetcdfFile object that the NetCDF API then uses internally. Typically, NcML is used to augment a self-describing data format. We have taken the approach that the NcML is the description of the dataset since we often do not have easy access to metadata. However, if you override the open method in your IOSP, you can manage the creation of the NetcdfFile object yourself (e.g. from your metadata) and leave the NcML largely empty. All that remains is to extend AbstractIOSP and implement the logic that reads a subset of a Variable:

  • Array readData(Variable variable, Section section)
  • variable: ucar.nc2.Variable, which is part of the NetcdfFile object. This is often used only to get the name of the Variable for mapping to the source.
  • section: ucar.ma2.Section, which defines index ranges and strides for multiple dimensions.

Recall that the role of readData is to map a request for data to the data source. In many cases, directly mapping each read to the source does make sense. However, the TSDS typically reads then writes one time sample at a time (to enable streaming of arbitrarily long time series). If a separate request to the source for each time sample is not efficient, some kind of caching is recommended. Most IOSPs packaged with the TSDS simply read the entire dataset from the source when first requested. The subsetting requested by the Section in the readData call is then performed by the IOSP on the cached data. This naive caching approach will not scale to larger datasets. The cache doesn't persist between requests, so there is another opportunity for big performance improvements.
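The naive caching approach described above can be sketched independently of the NetCDF-Java API. The Python class below is a language-neutral illustration, not a port of AbstractIOSP: the class name CachingReader, the load_all callable, and the start/stop/stride arguments (standing in for a one-dimensional Section) are all hypothetical.

```python
class CachingReader:
    """Read the whole source once, then serve index subsets from memory."""

    def __init__(self, load_all):
        # load_all: callable returning {variable name: list of values}
        self._load_all = load_all
        self._cache = None
        self.source_reads = 0  # counts how often the source was contacted

    def read_data(self, variable, start, stop, stride=1):
        """Return values[start:stop:stride] for the named variable."""
        if self._cache is None:
            self._cache = self._load_all()
            self.source_reads += 1
        return self._cache[variable][start:stop:stride]


reader = CachingReader(lambda: {"time": [0, 1, 2, 3], "Scalar": [10, 11, 12, 13]})
for i in range(4):
    reader.read_data("Scalar", i, i + 1)  # one time sample per call
print(reader.source_reads)  # 1
```

Because the cache lives on the reader instance, a new instance per request would reload the source each time, which corresponds to the lack of persistence between requests noted above.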

7. Appendix

7.1. Overview data

Some data sets available through the TSDS server running http://tsds.net/ have been cached in a special format that allows a user to quickly view the entire range of available data on a single plot. An interface for viewing these parameters is given at http://tsds.net/overview.

7.2. Output formats

7.2.1. bin

This simple binary format is typically used for developing lightweight, dependency-free interfaces to TSDS. Both the MATLAB (suffix=m) and IDL (suffix=pro) interfaces request data in bin format.

If the request does not include the variable time, the bin file contains D·T 64-bit little endian floating point values, where D is the number of values per time tag and T is the number of time tags.

  • http://tsds.net/tsdsdev/bin/Scalar.bin?Scalar - This D=1 parameter is on a uniform time grid and the NcML file provides the information required to compute time tags: “hours since 1989-01-01 00:00:0.0”. The bin file has ordering S(1), S(2), ..., S(T). The time tags can also be explicitly requested via http://tsds.net/tsds/Scalar.bin?time, which returns T 64-bit little endian floating point time stamps.
  • http://tsds.net/tsdsdev/bin/Vector.bin?Vector - This D=3 parameter is on a uniform time grid and the NcML file provides the information required to create y-labels "Vx, Vy, and Vz" and time tags: “hours since 1989-01-01 00:00:0.0”. The bin file has ordering Vx(1),Vy(1),Vz(1),...,Vx(T),Vy(T),Vz(T). The time tags can also be explicitly requested via http://tsds.net/tsds/Vector.bin?time, which returns T 64-bit little endian floating point time stamps.
  • http://tsds.net/tsds/bin/Spectrum.bin?Spectrum - This D=10 parameter is on a uniform time grid and the NcML file provides the information required to create y-labels "f1, f2, ..., f10" and time tags: “hours since 1989-01-01 00:00:0.0”. The bin file has ordering A(1,f1),A(1,f2),...,A(1,f10),...,A(T,f1),A(T,f2),...,A(T,f10). The frequency values can also be explicitly requested via http://tsds.net/tsdsdev/bin/Spectrum.bin?frequency, which returns 10 64-bit little endian floating point frequency values. The time tags can be explicitly requested via http://tsds.net/tsds/Spectrum.bin?time, which returns T 64-bit little endian floating point time stamps.

If the request includes the variable time, the bin file contains (D+1)·T 64-bit little endian floating point values, where D is the number of values per time tag and T is the number of time tags.
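A client can decode a bin response with a few lines of code. The Python sketch below unpacks a byte string of little endian 64-bit floats into rows; the function name read_bin is hypothetical, and the caller must already know the number of columns (D without time, D+1 with time). The assumption that each time tag precedes its values within a row is not stated explicitly for the bin format.

```python
import struct


def read_bin(payload, columns):
    """Unpack little endian 64-bit floats into rows of `columns` values each."""
    count, remainder = divmod(len(payload), 8)
    if remainder or count % columns:
        raise ValueError("payload size does not match %d columns" % columns)
    flat = struct.unpack("<%dd" % count, payload)
    return [flat[i:i + columns] for i in range(0, count, columns)]


# Two time samples of a D=3 vector, built locally instead of fetched from a server.
payload = struct.pack("<6d", 0.0, 1.0, 2.0, 3.0, 4.0, 5.0)
print(read_bin(payload, 3))  # [(0.0, 1.0, 2.0), (3.0, 4.0, 5.0)]
```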

7.2.2. fbin

There is also an experimental fbin ("fast bin") output format that is identical to the bin format. It is "fast" because most of the processing and inspection operations of TSDS are bypassed; in general it only makes sense to request fbin files when the source data is in bin format and no filters are applied.

By default, the time tags are not returned unless the time variable is explicitly requested. For example, this request

results in a file with S(1), S(2), ..., S(N).

In contrast, this request

results in a file with T(1),S(1), T(2),S(2), ..., T(N),S(N).

7.3. Using NcML

The NetCDF Markup Language (NcML) was designed at Unidata to provide an XML description of a NetCDF dataset. It also serves as an access interface to a virtual dataset that it describes. NcML can add attributes (enhance metadata), hide variables in the underlying dataset, and aggregate multiple data granules (e.g. files). The NetCDF-Java API can open and access data via a NcML file just like it would a NetCDF file.

NetCDF-Java and NcML are built around the Unidata Common Data Model (CDM), which merges the NetCDF, OPeNDAP, and HDF5 data models. Data stored in these CDM compatible formats can also be accessed via an NcML file, which provides the opportunity to "enhance" the dataset and still map the result into the CDM. NetCDF-Java also provides a complementary extension mechanism via the NetCDF-Java IOServiceProvider (IOSP) interface. An IOSP maps from a dataset's native format into the CDM. An IOSP can be implemented for any dataset described by an NcML file. A primary design goal of the TSS is to enable the serving of a dataset in its native form by adapting it with XML markup (NcML for TSS1) and custom IOSPs. (Note that this is not a goal of NcML, which is why we intend to evolve this markup language aspect of the server.)

The TSDS includes a number of IOSPs that can generally be reused, so often only an NcML file needs to be created to serve a new dataset. For those with more specialized needs, we also provide an abstract IOSP class that gives the developer a head start over the NetCDF abstract implementation. We expect the collection of reusable IOSPs to grow to further ease the adaptation of datasets. Of particular interest are IOSPs that communicate with an existing data API. For example, we have configured datasets that get data directly from NGDC's SPIDR web API. We could do likewise for other existing capabilities to leverage adaptations that have already been done.

A NcML file created for the TSS is a valid NcML file that could be used by any NcML client. However, not all NcML files will be usable by the TSS. The main issue is conventions and how data are modeled. Most NcML in the wild uses CF conventions, which are well suited for geo-located gridded data (e.g. climate models). This doesn't help us much for the datasets we care about in heliophysics: time series of scalars, vectors, and spectra.

We have chosen to model time series using the CDM Sequence variable type. It provides the appropriate semantics for a growing record of time samples. This also means that we use OPeNDAP's Sequence type when serving their standard DAP data (for a "dods" request). Unfortunately, support for the Sequence type is not very mature on either end, as it has been largely unexercised.

Another limitation of NcML is that it is not designed to redefine the structure of the data. A spectral time series may be represented as a 2D array in a NetCDF file while we want to model it as a Sequence of a Sequence, so NcML alone is not sufficient to adapt all datasets. To address this we have created some crude conventions so we can support vector and spectral datasets more easily (based on Groups), relying on our IOSPs more than native NetCDF support. We also made a few modifications to the NetCDF-Java code to enable our IOSPs to access the NcML content. Our work on TSS2 involves an improved data model (TSDM) and a markup language (TSML) that are better suited for mapping native data into a common data model, independent of the NetCDF implementation.

The best way to understand how TSS uses NcML is to look at some examples. In the TSDS source code, some NcML examples can be found under "WebContent" in the "datasets" directory.

For more information, see the annotated NcML Schema.

7.3.1. Scalar

The simplest dataset is a scalar time series (test/Scalar.ncml) that can be thought of as a simple table with a row for each time sample and a column for each variable.

Each NcML file has a top level "netcdf" XML element where access information is expressed in XML attributes. In this example, we use a custom IOSP (ValueGeneratorIOSP) that generates data values instead of reading them from some source. The IOSP knows to use the "start" and "increment" attributes as parameters for generating the data values. Note that here the "location", typically used to identify a data file, is set to "/dev/null", even though it is superfluous in this case. That is because the NcML implementation insists on opening that location for you and will fail if there is not something real there to open. The IOSP can do just about anything it wants, including managing its own resources and reading specialized parameters from the NcML file, so it provides more flexibility than what can be done with NcML alone.

Inside the "netcdf" element, you can express "global attributes" with "attribute" elements. (Note the difference between an XML attribute and a NcML attribute, which is an XML element.)

A "dimension" element helps to describe the functional semantics of the dataset. Each variable's "shape" is defined in terms of dimensions. The TSS builds on the NetCDF notion of a coordinate variable. A coordinate variable (think independent variable or domain of a function) is one that has a dimension (or shape) of the same name. In this case, we define a "time" dimension (which specifies a length so the ValueGeneratorIOSP will know when to stop) and a "time" variable. "time" is the standard name for the time variable in the TSS.

The bulk of the NcML file is variable definitions. There is an XML element for each variable to be served (unlike typical NcML usage where all underlying variables are exposed unless explicitly excluded) with information about its dimensionality (shape) and type (we treat all numeric data as doubles, for now, because we can). "variable" elements can also contain "attribute" elements for containing metadata. We tend to use CF conventions such as "units", "_FillValue", and "long_name" when available. There are other "standard" attributes that we have introduced to help support reusable Writers. (More on Writers later.) We have chosen to support only String type attributes, for now.
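The coordinate variable rule (a variable whose shape is a dimension of the same name) can be checked mechanically. The Python sketch below parses a small NcML document and lists its coordinate variables. The embedded NcML is a hypothetical stand-in, not the contents of test/Scalar.ncml: the iosp class name, the placement of the "start" and "increment" attributes on the time variable, and the attribute values are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NCML_NS = "{http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2}"

# Hypothetical NcML, loosely following the description of the scalar example.
EXAMPLE_NCML = """
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        location="/dev/null" iosp="example.ValueGeneratorIOSP">
  <attribute name="title" value="Scalar time series"/>
  <dimension name="time" length="10"/>
  <variable name="time" shape="time" type="double">
    <attribute name="units" value="hours since 1989-01-01"/>
    <attribute name="start" value="0"/>
    <attribute name="increment" value="1"/>
  </variable>
  <variable name="Scalar" shape="time" type="double">
    <attribute name="long_name" value="Example scalar"/>
  </variable>
</netcdf>
"""


def coordinate_variables(ncml_text):
    """Return names of variables whose shape is a dimension of the same name."""
    root = ET.fromstring(ncml_text)
    dimensions = {d.get("name") for d in root.findall(NCML_NS + "dimension")}
    return [v.get("name") for v in root.findall(NCML_NS + "variable")
            if v.get("shape") == v.get("name") and v.get("name") in dimensions]


print(coordinate_variables(EXAMPLE_NCML))  # ['time']
```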

7.3.2. Structure

Used for Vector time series. More to come.

7.3.3. Sequence

Used to represent a spectrum. More to come.

7.4. THREDDS Catalogs

More to come. In the meantime, see the Dataset Inventory Catalog Specification.

7.4.1. Basic Configuration

TSDS uses the THREDDS catalog specification [16] to generate a list of datasets with links to download forms (sample form).

If tss.properties has dataset.dir = datasets/test/catalog.thredds, and catalog.thredds has content

<?xml version="1.0" encoding="UTF-8"?>
<catalog name="Test Data"
        xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
        xmlns:xlink="http://www.w3.org/1999/xlink">
 
  <service name="tss" serviceType="OpenDAP" base="" />
  <service name="ncml" serviceType="NCML" base="" />
 
  <dataset name="Scalar" >
    <access serviceName="tss" urlPath="Scalar" />
    <access serviceName="ncml" urlPath="Scalar.ncml" />
    <documentation type="summary">
      Single variable time series
    </documentation>
  </dataset>
 
 </catalog>

the user will see at http://server/TSDS/ a list that contains

Multiple catalogs may be combined to form a composite catalog. For example, if dataset.dir = datasets/catalog.thredds contains

<?xml version="1.0" encoding="UTF-8"?>
 <catalog name="Time Series Data Server THREDDS Catalog"
        xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
        xmlns:xlink="http://www.w3.org/1999/xlink">
 
  <catalogRef xlink:title="test"          xlink:href="test/catalog.thredds"/>
  <catalogRef xlink:title="test repeated" xlink:href="test/catalog.thredds"/>
 
 </catalog>

the user will see at http://server/TSDS/

7.4.2. Advanced Configuration

There are two ways to obtain catalog and NcML files from a remote URI.

7.4.2.1. Method 1

In tss.properties, set dataset.dir to ./datasets/catalog.thredds. In catalog.thredds, include a catalogRef element whose href attribute is a URI that returns a catalog (escape each ampersand in the URI as &amp;):

<?xml version="1.0" encoding="UTF-8"?>
 <catalog
        name="Time Series Data Server THREDDS Catalog"
        xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
        xmlns:xlink="http://www.w3.org/1999/xlink">
 
  <catalogRef xlink:title="Local demos"  ID="local"
                      xlink:href="test/catalog.thredds"/>
  <catalogRef xlink:title="Remote demos" ID="remote"
              xlink:href="http://virbo.org/meta/viewDataFile.jsp?docname=C6D5623A-ADEC-8397-88A7-DD62A37BA490&amp;filetype=data" />
 </catalog>

Using the above configuration, the remote catalog should be shown at http://server/TSDS/:

Note that the remote catalog includes references to remote NcML files:

<?xml version="1.0" encoding="UTF-8"?> 
 <catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" name="Test Data">
   <service name="tss" serviceType="OpenDAP" base="">
    <!-- By default, base will be the path to the TSDS servlet
            that served the catalog.
            If base="http://server2/TSDS/remote2", the user will see links to, e.g., 
            http://server2/TSDS/remote2/Scalar.html -->
   </service>
   <service name="ncml" serviceType="NCML" base="http://virbo.org/meta/viewDataFile.jsp?filetype=data&amp;docname="> </service>
   <dataset name="Scalar">
     <access serviceName="tss" urlPath="Scalar" />
     <access serviceName="ncml"
                   urlPath="597C7956-742D-FEC6-D151-A37A7176E867" />
     <documentation type="summary">
       Single variable time series
     </documentation>
   </dataset>
 </catalog>
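In a THREDDS catalog, the URL for an access element is formed by appending its urlPath to the base of the service it names. The Python sketch below applies that rule to a trimmed copy of the remote catalog shown above; the function name access_urls is hypothetical, and the sketch ignores nested services and relative URL resolution.

```python
import xml.etree.ElementTree as ET

THREDDS_NS = "{http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0}"

CATALOG = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" name="Test Data">
  <service name="tss" serviceType="OpenDAP" base="" />
  <service name="ncml" serviceType="NCML"
           base="http://virbo.org/meta/viewDataFile.jsp?filetype=data&amp;docname=" />
  <dataset name="Scalar">
    <access serviceName="tss" urlPath="Scalar" />
    <access serviceName="ncml" urlPath="597C7956-742D-FEC6-D151-A37A7176E867" />
  </dataset>
</catalog>
"""


def access_urls(catalog_xml):
    """Map each dataset name to {service name: service base + urlPath}."""
    root = ET.fromstring(catalog_xml)
    bases = {s.get("name"): s.get("base", "")
             for s in root.iter(THREDDS_NS + "service")}
    result = {}
    for dataset in root.iter(THREDDS_NS + "dataset"):
        result[dataset.get("name")] = {
            a.get("serviceName"): bases.get(a.get("serviceName"), "") + a.get("urlPath")
            for a in dataset.findall(THREDDS_NS + "access")
        }
    return result


for service, url in sorted(access_urls(CATALOG)["Scalar"].items()):
    print(service, url)
```

The XML parser turns the escaped &amp; in the base attribute back into a plain ampersand, so the resolved NcML URL contains filetype=data&docname= followed by the document ID.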

7.4.2.2. Method 2

In tss.properties, set catalog.url to a URI that returns a catalog. For example,

catalog.url = http://virbo.org/meta/viewDataFile.jsp?docname=C6D5623A-ADEC-8397-88A7-DD62A37BA490&filetype=data
