Telecons


Conference Dial-in Number: (712) 432-0255, Participant Access Code: 681546#, Host Access Code: 681546*


Contents

  1. 2013
    1. January 23-25 Meeting at GMU
    2. Plan
    3. TSDS DD
    4. TSDS HPDE Server
  2. 2012
    1. Feb
    2. Jan
  3. 2011
    1. October 15th
    2. September 30th
    3. September 23rd
    4. August 26th
    5. August 8th
      1. Logistics
      2. Agenda
      3. Notes
        1. Introduction to project
        2. NcML, THREDDS, and SPASE
        3. Connecting to CDAWeb/SSCWeb
        4. Connecting to Autoplot/DataShop
        5. TSS2
        6. Beta Users
        7. IOSP for CDF files
        8. File Name Templates
        9. Huge Requests
        10. Filtering
        11. Caching
    6. July 22nd
    7. July 8th
    8. June 17th
      1. catalog.url
      2. Autoplot IOSP Discussion
      3. Documentation Request
      4. Time Window Discussion
      5. D.L.
      6. J.M.
      7. J.F.
    9. May 27th
    10. May 5th
    11. April 15th
    12. March 30th
    13. March 18th
  4. Archive
    1. 2010
    2. 2009
      1. November 13
      2. November 6
      3. October 30
      4. October 23
      5. October 2
      6. Next telecon
      7. August 7
      8. June 26
      9. June 19th
      10. May 15
      11. May 6
      12. April 17
      13. April 10
      14. April 3
  5. Appendix - Old notes (pre 2011)
    1. Source Code
    2. Installation
    3. Server Configuration
      1. Server Modes
        1. Mode 1
        2. Mode 2
        3. Mode 3
        4. Mode 4
      2. Data Types
        1. Type 1
          1. Benchmarks
        2. Type 2
        3. Type 3
        4. Type 4
        5. Type 5
        6. Type 6
      3. Combinations
        1. Type 1-3
        2. Type 4-6
    4. Data Access
    5. REST API
      1. Type 1
      2. Type 2
      3. Type 3
      4. Script
        1. Jython
        2. Excel/VB
        3. MATLAB
        4. IDL
      5. Basic API
      6. Filter API

1. 2013

Telecon on 2/8 at 11:00 Eastern

1.1. January 23-25 Meeting at GMU

1.2. Plan

  • Bi-weekly telecons
  • Implement the TSDS Dataset Descriptor
  • Create short->expanded DD's for sample datasets
  • Discuss follow-on proposal
  • Create simple CSV output from DD's for use with unit tests.
  • Write short DD expander and put in GitHub
  • Write several implementations of expander
  • Short DD -> DD <--> XML -> expander
  • Short DD -> DD -> expander
  • Aggregation - read Todd's doc and respond.
  • Plan for summer proposal round.

1.3. TSDS DD

Developed http://tsds.net/dd

1.4. TSDS HPDE Server

Defined

  1. Serves .bin; allows convert_time(seconds since 1970); the HTTP header carries the REST ASCII File Descriptor.
  2. Serves .asc (header = REST ASCII File Descriptor).
  3. Serves .csv (header = comma-separated "variable name (unit)" fields; the variable name matches \w, the unit is not restricted; see the sketch below).
  4. Serves .spase (optional), a SPASE description of the .csv.
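
As an illustration of item 3, a minimal Java sketch of parsing such a header into name/unit pairs. The example header is hypothetical, and the sketch assumes units do not themselves contain commas:

 import java.util.LinkedHashMap;
 import java.util.Map;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 public class CsvHeader {
     // Each header field is "name (unit)"; the name matches \w+ and the
     // unit is unrestricted, per the .csv definition above.
     private static final Pattern FIELD = Pattern.compile("(\\w+)\\s*\\((.*)\\)");

     // e.g. parse("time (seconds since 1970-01-01), Bz (nT)")
     public static Map<String, String> parse(String header) {
         Map<String, String> units = new LinkedHashMap<>();
         for (String field : header.split(",")) { // assumes no commas in units
             Matcher m = FIELD.matcher(field.trim());
             if (m.matches()) {
                 units.put(m.group(1), m.group(2)); // variable name -> unit
             }
         }
         return units;
     }
 }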

2. 2012

2.1. Feb

  • Rename tsdsdev to tsds and update documentation links.
  • User should not see http://mag.gmu.edu:8081/TSDS/bin/Structure.html
  • Autoplot IOSP issues. Fixed.
  • Document aggregation issues (the aggregated source must be a file on the server unless the remote file is nc; aggregation only works reasonably with FileCache enabled; Doug worries FileCache may have a resource leak ... does not know what its lifecycle is).

<attribute name="units" value="count" />

  • Add logic - if a .info file is found, use it; if not, show the list of document URLs; failing that, use the Dataset Descriptor.
  • Add output option of NcML.
  • Look into NetCDF cf conventions to see if they have an attribute for documentation.
  • Add option to output CDF. Does Nand's java code write CDF?
  • Add output option of nc. The netCDF Java code can write netCDF 3, but not netCDF 4 without native libraries. NuJan is a pure Java implementation of netCDF 4 writing; Doug will look into using this.
  • Document sigfig, etc. Make sure all demos use up-to-date notation.
    • precision -> %f
    • sigfig -> %g
  • Document that _FillValue and missing_value both work (missing_value should be documented). Use with exclude_missing and replace_missing.
  • Send Jon the spec for the catalog. Do a screencast where he watches me generate NcML.
  • NcML for Reiner's data and Iowa wave data.
  • Filefinder web service - how to implement?
  • How to handle mode changes?

2.2. Jan

Meeting to-do

  • Joey: arrive Sunday, depart Saturday
  • Doug: arrive Sunday, depart late Friday
  • Jeremy: arrive Monday afternoon, depart noon Friday

3. 2011

3.1. October 15th

Discussion of Joey's data file:

  • The first two columns are the start and stop time over which data were collected (particle counts, for example).
  • The next 256 columns are energy channels (or bins). What are the units? Are the bin centers in log space?
    • The energy units are in Electron Velocity (km/s) and are logarithmically spaced.
  • There are 18 more blocks of 256 columns (one per 10-degree pitch angle bin). What are the units?
    • The data units are in science units of Distribution Function (sec^3/km^6).
  • The total number of columns is 2 + 256 + 18 x 256 = 4866 columns.
    • The fill value used in the file is -3.400e+38.

Some more information on the data:

 ELSPADNR is the pitch angle sorted ELS data, which includes data from
 one less than the ELS sectors which are unblocked.  Some ELS sectors
 are blocked by the spacecraft.  A simulation was run using the SPICE
 kernels and the scanner kernels, which determined the position of the
 instrument viewing direction.  It separated the ELS plane into those
 sectors which view the spacecraft, those which do not, and those which
 partly view the spacecraft.  The Regular Selected Sectors include all
 of those which do not view the spacecraft.  This virtual product excludes
 the two border sectors, assuming that these exclude any contamination
 (Narrow Selected Sectors); however, caution still should be exercised
 when viewing this data because it could still contain data contaminated
 by the spacecraft.

 ELS pitch angle bins are 10 deg wide from 0 deg to 180 deg.  They are
 marked every 10 deg at the mid angle.  ELS sectors are 4 deg x 22.5 deg,
 but depending on how this falls into the pitch angle bins, each ELS sector
 influences from 1 to 3 pitch angle bins.  In order to sort, each ELS sector
 is subdivided into 10.  Pitch angles at the edges and corners of each
 subarea are determined.  If the subsector lies totally within a pitch
 angle bin, then the data value is accumulated in that pitch angle bin.  If
 the corners and edges show that a pitch angle boundary is crossed, then
 fractions of that data value are accumulated in each pitch angle bin,
 depending on the fraction of the subsector that exists in each pitch angle
 bin.  After all of the appropriate ELS sectors have been examined, each
 pitch angle bin is averaged by dividing by the total number of samples
 accumulated in that bin.

 The data value accumulated is the distribution function.  This value should
 be independent of any unique sector properties and should reflect the
 electron distribution.  The distribution function for each ELS sector
 has had background removed when it was available.  Background values were
 determined when the ELS energy was above 10,000 eV.  Background values
 were pre-computed at periods of (1) the sample, (2) 1 minute, (3) 5 minutes,
 and (4) 25 minutes.  For statistical reasons, the background count is
 examined at each of the four background levels and the most appropriate
 background level is taken as the number of background counts to remove.
 Appropriateness is determined by a balance of the shortest accumulation
 time which has decent counts.  Some VEx ELS sectors are noisier than
 others, so the background levels at shorter accumulation times are more
 appropriate.  However, for other sectors, the background rate is very low.
 For these sectors, it is more appropriate to use a large accumulation time
 in order to determine an accurate background rate.  If the background
 rate cannot be determined or no background data exists (which is the
 case for the 1 sec sweep), it is assigned a value of zero.  Background
 counts are determined on a sector basis because of the differences in
 background values of the VEx ELS.  The following is the condition for
 selecting an appropriate accumulation time for the background:
      counts above 10,000 eV must be greater than 5: sample accumulation
      counts above 10,000 eV must be greater than 10: 1 minute accumulation
      counts above 10,000 eV must be greater than 20: 5 minute accumulation
      counts above 10,000 eV must be greater than 50: 25 minute accumulation

 Background counts are assumed to be average values which must be removed
 from the instrument count before conversion to the geophysical quantity.
 It is assumed that the background counts are independent of energy.  The
 ELS count rates on VEx are not high enough for the count to significantly 
 influence the conversion to the geophysical unit.  So the average background 
 count is subtracted from the count at each energy.  The remaining value of 
 count is then multiplied by the conversion of 1 count in geophysical units.
 The subtraction of average background values can lead to negative fluxes.
 These are not real, but should be included in an ensemble of fluxes in
 order to determine the average of the geophysical quantity.

 The magnetic field has two resolution products: a 1 sec product and
 a 4 sec product.  Higher resolutions of the magnetic field are sometimes
 available on a case-by-case basis.  The sorting of VEx ELS using the 
 magnetic field is conducted slightly differently for these two resolutions.
 For the 1 sec product, the value reported for the magnetic field is
 obtained at the resolution of the magnetometer, so interpolation is performed
 between reported magnetometer values.  Sorting occurs using these 
 interpolated values.  The 4 sec magnetometer product is the result of a 
 weighting filter, averaging data within a 16 sec window and reporting
 values every 4 sec.  These magnetic field values are taken as an average
 quantity over 4 sec.
 
 (Sketch: the 1 sec B (x,y,z) product is a curve interpolated between
 magnetometer samples; the 4 sec B (x,y,z) product is a stair-step of
 4 sec averaged values, both plotted against time.)



First we concluded that this data structure was beyond what we had planned on handling, but then realized that it could be handled in an unconventional way (two requests needed to make a plot).

Action items:

  1. Joey would send a larger version of his file.
    1. This is now at ftp://virbo.org/tmp/Venus_Express_ELSPADNR_2009125140000.txt.gz

Image:vex_elspadnr_mode_change.png

  2. Doug would use either the larger file or the file that Joey sent previously to create an NcML file.

3.2. September 30th

  • OPeNDAP/Autoplot interface? The patch did not work with Doug's code? Will continue to work on it. For Autoplot demo purposes, we will use ASCII or bin files.
  • Command line call from Tomcat (Joey needs to do). In trunk. See CommandReader.java
  • Remote ncml and security - no longer an issue because we won't allow people to type http://tsds.net/data/Scalar.ncml.

3.3. September 23rd

  • Doug discuss code updates
  • OPeNDAP/Autoplot interface? Doug has patch to netCDF Java libs. Patch is expected to appear soon in new netCDF Java release.
  • Fall AGU side meeting? Decide on this by September. No. Bob will be at a different meeting.
  • Discuss http://tsds.net/doc Doug and Anne will review
  • Discuss [6] Joey will try installing TSDS on his server. How will TSDS read data from a TSDS server? Could use a netCDF granule, but ideally will use OPeNDAP (which requires the repaired netCDF Java libs).
  • CDFGranuleIOSP - Doug will attempt to create a Granule IOSP for a restricted set of CDF file structures (possibly just Type 1, defined below). Eventually we will probably create and use QDataSetIOSP for all CDF reads, but it will be handy to have access to a basic CDF IOSP for reference. This pdf describes all of the ways data can be stored in a CDF file [7]. It is important to note that prior to calling the CDF read routine, you need to know the file structure. Here are some common CDF file structures. To discuss later.
    • Type 1 - Time variable is Epoch. Variable is stored with structure of one time stamp per record. Each record has one row and N columns (N=3 for vector, for example).
    • Type 2 - Time variable is Epoch. Variable is stored with a structure of one record per variable. Each record has T rows and N columns.
    • Type 3 - Time variable is CDF_INT8. Otherwise same as Type 1.
    • Type 4 - Time variable is CDF_INT8. Otherwise same as Type 2.
  • CDAWebServiceIOSP/SSCWebServiceIOSP - Bob will have Wei and Sheng try to create these based on Doug's examples and the web service documentation at CDAWeb/SSCWeb. They will use the ASCII data transmission method given in the examples. Later, we may switch to CDF for data transmission, which is an option for both CDAWeb and SSCWeb. Still working on this.
  • http://localhost:8080/TSDS/hpde_data/datasets/p11tr_hr.asc?time,TDF&TDF=~A Bob was uncomfortable with this (non-numeric stuff) .... to be discussed later.

3.4. August 26th

Reduced-participant telecon to discuss targeted technical issues.

  • Doug/Anne report on Aggregation issue (Bob noted that readAllData was being hit once per line of file when aggregation was used.)
The GranuleIOSP and the AsciiGranuleReader subclass are largely complete. I was able to improve one aggregation test by THREE orders of magnitude (2000 to 2 seconds). The Variable caching sure helped but the FileCache was the key. Without it, the netcdf code was trying to call open on the IOSP for every read.
We were using a 4.1 version of the NetCDF-Java libraries that I had modified a bit. Since then, they have released a stable version 4.2. I adapted the server to a few of their changes while I was developing the GranuleIOSP. Instead of changing the code in the netcdf jar, I added modified versions of two of their classes to our source code. Due to the way they are using a private inner class, I was not able to simply extend those classes. The only difference is the ability to get the "netcdf" element from the NcML so the IOSP could use it. (Their OPeNDAP support for Sequences is still broken.) We will need to make sure that our version shows up in the classpath before theirs when we deploy. Short of writing a custom class loader, I think our best option is to release the class files in WEB-INF/classes which takes precedence over jar files in WEB-INF/lib.
GranuleIOSP will replace AbstractIOSP. You must implement readAllData() and getData(variable), which return a netCDF Array object. For getData(), the developer needs to fill a ucar.ma2 Array object (we may go back to requiring a 1-D array). A sketch of the contract appears below.
Should be available later today for testing.
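
A minimal sketch of the contract described above. GranuleIOSP is the class named in the notes; the exact method signatures here are assumptions, not taken from the TSDS source:

 import ucar.ma2.Array;
 import ucar.ma2.DataType;
 import ucar.nc2.Variable;

 public class MyGranuleReader extends GranuleIOSP {
     private double[] values;

     @Override
     public void readAllData() {
         // Parse the entire granule once and cache the result so later
         // getData() calls do not re-open the file.
         values = parseGranule();
     }

     @Override
     public Array getData(Variable v) {
         // Fill a ucar.ma2 Array object with this variable's values.
         return Array.factory(DataType.DOUBLE, new int[] {values.length}, values);
     }

     private double[] parseGranule() {
         return new double[0]; // granule-specific parsing, not shown
     }
 }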
  • Doug/Anne report on CRRES data file time format issue.
I improved the way the new AsciiGranuleReader deals with formatted times over multiple columns.
A related change uses a "column" attribute (XML) instead of the v# naming convention.
For the multi-column time you can say column="1 2 3 4 5 6". The format in the "units" attribute is the same, but the reader preserves the columns and stitches them together when parsing the time. The other AsciiIOSP was expecting an exact regular expression match.
This fixes the time parsing problem observed in the CRRES files.
  • Timeranges
The new AsciiGranuleReader now supports URLs. This is the first step to a web service reader. This could be used now by hard-coding the time range in the url in the NcML.
A couple of things to clean up before release.
The next step is the dynamic NcML generation. We could define the NcML service in a catalog and write a little code in the server to pass it whatever parameters (e.g., time range) it needs to construct the NcML; see the sketch below.
Doug will include example from reading from SPIDR to clarify how time ranges will be handled.
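
A minimal sketch of the dynamic NcML idea, assuming a hypothetical servlet, template, and IOSP class name:

 import java.io.IOException;
 import javax.servlet.http.HttpServlet;
 import javax.servlet.http.HttpServletRequest;
 import javax.servlet.http.HttpServletResponse;

 public class NcmlGeneratorServlet extends HttpServlet {
     // NcML template with placeholders for the requested time range; the
     // data URL and IOSP class are illustrative.
     private static final String TEMPLATE =
         "<netcdf location=\"http://example.org/ws?start=${start}&amp;stop=${stop}\""
       + " iosp=\"lasp.tss.iosp.WebServiceIOSP\"/>";

     @Override
     protected void doGet(HttpServletRequest req, HttpServletResponse resp)
             throws IOException {
         // Substitute the time range parameters into the NcML and serve it.
         String ncml = TEMPLATE
             .replace("${start}", req.getParameter("start"))
             .replace("${stop}", req.getParameter("stop"));
         resp.setContentType("text/xml");
         resp.getWriter().write(ncml);
     }
 }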
  • Bob report on follow-up with Bobby on 301 redirect for CDF file.
Important feature is headers that indicate expiration. (Would break things, especially when multiple files are returned. (But our understanding was that for most/all things relevant to TSDS, single files are returned). Will move this discussion to the Autoplot/SPDF telecon as this is a feature that Jeremy was interested in. Virtual Variables - in master cdf, virtual variables are defined. When encountered, an IDL function (in code readmycdf.pro available in [8]) is called that creates the variable based on variables in a given file. CDAWeb services examples don't use CDF file readers. The request in the examples are for ASCII data. We will implement an IOSP using ASCII requests. The issues of expiry time is not urgent now.
  • Discuss Bob's suggestion for a pre-processor. New fixes will reduce the need for it, but ASCII files in the wild will still need pre-processing. Victoir is working on a web service project, based on Doug's TSDS source code structure, that allows http://localhost:8080/filefixer?urltofile&fixer and responds with a normalized version of the file that can be read by a standard IOSP.
  • In the latest version, it seems the catalog listed is the one in datasets/catalog.thredds instead of the one returned by catalog.url. Comment out dataset.dir and set catalog.url = http://virbo.org/metadev/viewDataFile.jsp?docname=F8ADA960-F16B-5F72-6B09-BE1FE64E5BB1&filetype=data
  • Discuss things Doug can ask Wei and Sheng to work on.
    • Doug send/put in SVN SPIDR IOSP code (send in response to this email)
    • Doug send/put in SVN Autoplot IOSP code
  • Bob reports on using SSCWeb and CDAWeb web service tools. Following SPIDR IOSP example in [9], which does not require CDF file reader.
  • Bob reports (via Sheng) on embedding TSDS in Jetty.
  • Jeremy report on dods reader bug.
  • Put CDF IOSP effort on hold until other issues are resolved?
  • When a scan is used for one file per day over a year and data for one day is requested, are all files opened?
  • To read a subset of a file, can use LayoutRegular: [10]
  • IOSP Tutorials: [11], [12], [13], [14]
  • Need examples with nc and h5

3.5. August 8th

Face-to-face meeting at GMU. Attending: Weigel, Lindholm, Vandegriff, Lal, Joey, Candey, Wilson, Sheng, Wei, Veibell.

3.5.1. Logistics

  • Bob's cell is 571-230-3233
  • Location: Research I, room 301 [15]
  • Parking is $12/day in the Sandy Creek Parking deck. It is the (unlabeled) building across from Research I on this map [16]
  • General directions to George Mason: http://www.gmu.edu/resources/welcome/Directions-to-GMU.html (GMU's address is 4400 University Drive, Fairfax, VA 22030)
  • To get to Mason from the subway, take the Orange Line to the last stop. You may take a cab ($15) or a bus ($2.00) [17]. The bus takes about 30 minutes and the cab about 15 minutes. It takes about 45 minutes on the subway from Metro Station to Fairfax.
  • Wireless will be available. username=TSDSMeeting, password: NASAYYYY

3.5.2. Agenda

Monday August 8th

  • 09:00-09:15 Agenda discussion (Bob leads)
  • 09:15-10:00 Introduction to project [18], review of proposal, proposed work, and progress thus far (Bob leads)
  • 10:00-11:00 Discussion/introduction to NCML [19], THREDDS Catalogs [20] & [21], and SPASE-enabled syntax (Bob leads)
  • 11:00-12:00 Connecting to CDAWeb/SSCWeb. How best to approach? (Bob leads)
  • 13:00-14:00 Connecting to Autoplot [22] and (VODownloader [23], [24], [25]) (Bob leads)
  • 14:00-15:00 Discussion of TSS2 (Doug) Slides
  • 15:00-17:00 Aggregation and Caching (Bob)

Tuesday (working meeting)

  • 09:00-12:00 Using documentation, build NcML and Catalog for an ASCII data source. Each participant shows their result and makes documentation suggestions at the end. (Doug/Anne lead.) Suggestions include (Examples (2), (3), and (5) could be implemented by downloading the file locally):
    1. Example of a remote service (login needed): http://supermag.jhuapl.edu/archive/line.html
    2. Example of remote ASCII with XML metadata- Data: [26]; Associated metadata: [27]
    3. GOES ASCII [28] (from http://goes.ngdc.noaa.gov/data/avg/1998/)
    4. Data from an OPeNDAP server [29]
    5. An ASCII file that requires a special reader (see wdc files inside): [30] the spec is [31]
  • 13:00-15:00 Review of codebase - each participant installs in Eclipse. (Doug/Anne leads). Fedora 14 with Tomcat had this error:
SEVERE: Error deploying web application archive tsds.war
java.lang.UnsupportedClassVersionError: lasp/tss/TimeSeriesServer : Unsupported major.minor version 51.0 (unable to load class lasp.tss.TimeSeriesServer)
at org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:2822)
      at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:1159)
      at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1647)
      at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1526)

The problem was fixed by installing Sun Java (class file version 51.0 corresponds to Java 7, which was newer than the JVM Tomcat was running under).

  • 15:00-17:00 Creation of an IOSP - each participant attempts to create an IOSP (Doug/Anne leads)

Wednesday

  • 09:00-11:00 Discussion of tasks and timelines for work. Discussion of codebase management. Discussion of documentation management. (Bob leads)
  • 11:00-12:00 Develop summary to send to HPDE/SPASE list. (Bob leads)

3.5.3. Notes

3.5.3.1. Introduction to project

3.5.3.2. NcML, THREDDS, and SPASE

3.5.3.3. Connecting to CDAWeb/SSCWeb

3.5.3.4. Connecting to Autoplot/DataShop

3.5.3.5. TSS2

3.5.3.6. Beta Users

Next summer, hold a working session where people can learn more about the system.

3.5.3.7. IOSP for CDF files

  • Could use the Autoplot IOSP to read CDF files, but we anticipate the need for an IOSP for just CDF files, and it should be easy to implement.
  • Doug will implement a skeleton IOSP, to be finished by someone assigned by Bob

3.5.3.8. File Name Templates

Discussion of using File Finder service to build a NcML file for aggregation.

Bob needs to write up requirements for file finder service and send to Jeremy and Jon to see if what they have can be used to implement the service.

Notes:

finder.pl calls finder.jar, which starts a JVM, so start-up is slow. To use it as a service, we need the jar file. Some work is needed to release the code (re-write of the time string parser, which has a problematic license?). A sketch of the template expansion such a service performs is below.
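
A minimal sketch of the expansion a file-finder service performs, assuming a $Y$m$d-style template like the ones used in the NcML examples on this page:

 import java.time.LocalDate;
 import java.time.format.DateTimeFormatter;
 import java.util.ArrayList;
 import java.util.List;

 public class FileFinder {
     // e.g. expand("ac_k0_swe_$Y$m$d.cdf", start, stop) lists one file name
     // per day, suitable for building an NcML aggregation.
     public static List<String> expand(String template, LocalDate start, LocalDate stop) {
         List<String> files = new ArrayList<>();
         for (LocalDate d = start; !d.isAfter(stop); d = d.plusDays(1)) {
             files.add(template
                 .replace("$Y", d.format(DateTimeFormatter.ofPattern("yyyy")))
                 .replace("$m", d.format(DateTimeFormatter.ofPattern("MM")))
                 .replace("$d", d.format(DateTimeFormatter.ofPattern("dd"))));
         }
         return files;
     }
 }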

3.5.3.9. Huge Requests

  • How to deal with them? With a filter, a client could estimate the response size (ask for the number of points per day for the days in the requested time range).
  • Need to provide data servers an option for placing limits. Should an IOSP be passed an argument that says "stop writing after N bytes"?
  • How to communicate to client that request will take a long time?

3.5.3.10. Filtering

  • Don't emphasize in discussion - make sure emphasis is that we provide access to lots of data
  • When filter is in URL, does it operate on each variable?
  • Should filter be allowed as an attribute in NcML? This would allow new data products to be defined in NcML.

3.5.3.11. Caching

Motivation: Many remote databases/APIs are configured in such a way that requests for small amounts of data take a long time. Caching is used everywhere on the web. (Note that caching is generally different from mirroring.) Caching is used to improve the user experience by

  • Speeding up requests
  • Handling temporary downtime of a service that a higher-order service depends on.

For example,

  • When you re-load a page, your browser does not re-download every image (client-side caching).
  • Common queries to a database are often cached, especially if the query requires significant CPU and the memory required to store the query result is small.
  • Google has a "cache" link in case a server goes offline (server-side caching to handle temporary downtime).
  • When you visit a web page, popular scripts (e.g., JQuery) are often served from a content delivery network (CDN). Given a request for an object, the CDN determines the best server to use to fulfill the request.

Discussion resulted in identification of four types of caching (a sketch of type 2 follows the list):

  1. Cache response from URL
  2. Save granules locally; inspect HTTP headers to determine if it needs to be re-downloaded (Autoplot)
  3. Save granules locally; refresh granule cache weekly. (Jon and Joey)
  4. Save local store of transformed granules in (fast) CDM format on disk. (Bob)
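
A minimal sketch of type 2, assuming the server sends plain HTTP Last-Modified headers (the file layout is illustrative):

 import java.io.IOException;
 import java.io.InputStream;
 import java.net.HttpURLConnection;
 import java.net.URL;
 import java.nio.file.Files;
 import java.nio.file.Path;
 import java.nio.file.StandardCopyOption;
 import java.nio.file.attribute.FileTime;

 public class GranuleCache {
     // Return a local copy of a granule, re-downloading only when the server
     // reports a newer Last-Modified time than our saved copy.
     public static Path fetch(URL url, Path local) throws IOException {
         HttpURLConnection con = (HttpURLConnection) url.openConnection();
         if (Files.exists(local)) {
             con.setIfModifiedSince(Files.getLastModifiedTime(local).toMillis());
         }
         if (con.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
             return local; // cached granule is still current
         }
         try (InputStream in = con.getInputStream()) {
             Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
         }
         if (con.getLastModified() > 0) {
             Files.setLastModifiedTime(local, FileTime.fromMillis(con.getLastModified()));
         }
         return local;
     }
 }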

3.6. July 22nd

  • 3pm Eastern
  • Doug: Update on problem with http://autoplot.org/autoplot.jnlp?http://tsds.net/tsds/test/Scalar.dods Seems to be due to Autoplot using old netCDF libraries (did the dods standard change?). Jeremy is checking.
  • Bob: Modify existing virbo.* code at [33] to work with new server. Pending fix for above
  • Bob: Comment on most recent catalog.url email [34]
  • Doug + Anne: Ponder using [35] as a work-around to [36]. To continue on August 8th.
  • Discuss local saving of OPeNDAP response in Autoplot discussed here [37]
  • To discuss on August 8th: Options for aggregation: Scripting [38]; Using NcML aggregation; other?
  • Doug + Anne + Jeremy: Discuss progress on Autoplot IOSP (a zip file and instructions for installing as an add-on IOSP.) Pending.
  • Doug and Anne: Discuss progress on documentation requested (see June 17th Telecon notes). Will continue to develop in prep for August 8th meeting.
  • Discuss http://ivoa.net/Documents/TAP/
  • Discuss
 <dataset name="Scalar0" urlPath="http://aurora.gmu.edu/tmp/Scalar0.ncml">            
   <access serviceName="tss" urlPath="Scalar0"/>                                      
   <documentation type="summary">                                                     
     Single variable time series                                                      
   </documentation>                                                                   
 </dataset>                                                                            
  • Discuss draft agenda
Monday August 8th

09:00-09:15 Agenda discussion (Bob leads)
09:15-10:00 Review of proposal, proposed work, and progress thus far (Bob leads)
10:00-11:00 Discussion of SPASE-enabled syntax (Bob leads)
11:00-12:00 Connecting to CDAWeb.  How best to approach? (Bob leads)
13:00-14:00 Connecting to Autoplot and VODownloader
14:00-15:00 Discussion of TSS2 (Doug)
15:00-16:00 Cache API (Bob)

Tuesday (working meeting)

09:00-12:00 Using documentation, build NCML and Catalog for an ASCII data source.  Each participant shows result and makes documentation suggestions at end.(Doug/Anne leads)
* Suggestions include SuperMAG, USGS Dst, VMO Repository, TSDS0
13:00-15:00 Review of codebase - each participant installs in Eclipse. (Doug/Anne leads)
15:00-17:00 Creation of an IOSP - each participant attempts to create an IOSP (Doug/Anne leads)

Wednesday 

09:00-11:00 Discussion of tasks and timelines for work.  Discussion of codebase management. Discussion of documentation management. (Bob leads)
11:00-12:00 Develop summary to send to HPDE/SPASE list. (Bob leads)

3.7. July 8th

  • 3pm Eastern
  • Discuss catalog.url - move to feature request? See [39]
  • Discuss putting wildcards in NcML URIs that are passed to IOSPs (for example, goodata.cgi?name=data&start={$YYYY-$MM-$DD}&stop={$YYYY-$MM-$DD}). The advantages may be: for REST calls, the URI is shown in the NcML instead of being hidden in source code, and if the URI changes, no recompile is necessary, only a change to the NcML. See [40]
  • Doug and Anne: Discuss progress on SPIDR IOSP. See [41]
  • Doug and Anne: Discuss progress on "developer-level" instructions (bulleted list with pointers to source code) in documentation for implementing an IOSP [42]
  • Bob: Report on OPeNDAP clients. Here they are: OPeNDAP's and Autoplot's and LISIRD's. Also note that http://autoplot.org/autoplot.jnlp?http://tsds.net/tsds/test/Scalar.dods does not work, but changing dods to dat works.

3.8. June 17th

  • 3pm Eastern
  • Email list issue Resolved. All lists will be public.
  • Add feature request for the ability to output NcML (the default is to allow it, but configuration allows turning it off for security). Done [43]
  • Doug summarize updates on "developer-level" instructions (bulleted list with pointers to source code) in documentation for implementing an IOSP [44]
  • Bob send Joey the spec for the bin exchange format + NcML. Joey may need to extend the binary IOSP to handle HTTP bin. Better to start with just ASCII, for which an http+local-files web service exists. This was discussed at length. We have decided that the best approach is to modify the SPIDR IOSP so that it uses start/stop times in the request to SPIDR (now it pulls down everything and subsets on the server). Joey will use this as a starting point. Joey noted that he uses the file format http://www.idfs.org/.
  • Bob - meet with Jeremy and Doug to discuss Autoplot IOSP. Done. Doug is continuing to work on moving relevant code into SVN.
  • Discussion of time stamp and time window - The question involved how to deal with measurements that were accumulated over a time window where the time stamp may not be at the center of the window (and may vary with time). Aaron mentioned that the end-user usually just wants time stamp + data. Bob mentioned that if the native data did not include it, it could be provided via a filter. Jeremy and Joey were sent a request to provide an example of such data, and we will discuss if/how it would be represented in NcML. Responses are below. Doug chimed in. We will not act on this now, but may refer back to the discussion in the future.
  • Best option for summer meeting date [45]: Monday, August 8th at GMU from 9am - ? for high-level discussions. Tuesday and Wednesday will be work days. Bob will send announcement to HPDE list for anyone that wants to participate.
  • Doug's IDL code [46]. Bob will create similar version for MATLAB. These will form the "lite" version interfaces. A more full-featured version will be created using Autoplot's jumbojar. Telecon discussion: Why not just use existing OPeNDAP clients? Need to test these out. Better if we interface to a jar instead of relying on native (binary) MATLAB or IDL functions.
  • In the future, all communication should be on one of the lists here: [47]. Give a few warnings that the ad-hoc list will no longer be used and that if you want to receive messages, you will need to subscribe.

3.8.1. catalog.url

tss.properties has the variable dataset.dir. Could it also have catalog.url?

This way, I could say

catalog.url = http://virbo.org/catalog.cgi?list=all

and my TSDS server will always be up-to-date without me having to hand-edit the file

/usr/local/tomcat/webapps/tsds/hpde/datasets/catalog.thredds

The convention would be that if catalog.url is given, then dataset.dir is ignored (see the sketch below). Will revisit this on the next telecon.
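
A sketch of that precedence, with hypothetical helper methods (this is not the actual TSDS code):

 import java.io.FileReader;
 import java.io.IOException;
 import java.util.Properties;

 public class CatalogConfig {
     public static void configure(String propertiesFile) throws IOException {
         Properties props = new Properties();
         props.load(new FileReader(propertiesFile));
         String catalogUrl = props.getProperty("catalog.url");
         if (catalogUrl != null) {
             loadCatalogFromUrl(catalogUrl);   // catalog.url takes precedence
         } else {
             loadCatalogFromDir(props.getProperty("dataset.dir"));
         }
     }

     private static void loadCatalogFromUrl(String url) { /* hypothetical */ }
     private static void loadCatalogFromDir(String dir) { /* hypothetical */ }
 }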

3.8.2. Autoplot IOSP Discussion

  • Jeremy sent Doug latest JumboJar. Doug will create a zip file and instructions for installing as an add-on IOSP.

3.8.3. Documentation Request

As a placeholder for full documentation, Doug will provide a few simple "getting started" examples.

Scalar time series on a uniform time grid

  1. No external data file. Time series generated by NcML description.
  2. ASCII file (from http and local) with two columns and a header and comments interspersed (check this).
    1. Subset and display second column and ignore time stamps in first column.
    2. Subset and display second column using time stamps in first column.
  3. bin file
    1. Subset and display bin file with generated time stamps
    2. Subset and display bin file with time found in second bin file
  4. fbin file
    1. Subset and display fbin file with generated time stamps
    2. Subset and display fbin file with time found in second bin file

3.8.4. Time Window Discussion

Summary of responses to:

  • What are your thoughts on the timestamp+data versus time window + data discussion?
  • Would you (Jeremy and Joey) both provide a few sentences that describe other special/unusual data structure types that you think are important and that we may want to consider representing in TSDS?

3.8.5. D.L.

Here's a link to how the CF Conventions deal with data "cells". [48]

They define a "cell_methods" variable attribute that can have values which includes point, sum, mean, maximum, minimum, mid_range, standard_deviation, variance, mode, and median.

The time variable can have a "bounds" attribute as a pointer to a variable that contains the min and max times for each "cell".

3.8.6. J.M.

In some cases, there is no really good way of doing accurate timing if all one has is timestamp+data, since gap information may or may not be available. Often hardware folks are super obsessed with the timing information. CDF does not have enough timing support to be useful for some of these folks (although recent additions are helping).

Example 1: A data file has two timestamps that vary wildly. In most cases, this could be considered a gap, but often one is just guessing when making a plot. Consider Autoplot versus gplot's rendering of the same data. Autoplot is making guesses (RSW: guessing can be turned off, right?).



Example 2: The cadence of the instrument might be such that the gap is a few milliseconds or a few days, so to do really accurate timing, you need more than a timestamp if the data is not regularly spaced or has funny timing. With Mars Express, we were looking at really small time ranges so we could see the space in the flyback steps. I've attached an example of that as well. With the zooming feature, you can drill down into the plot to see the spacing.

3.8.7. J.F.

Example: Mode changes - Autoplot can plot E field RBSP data, which is the combination of several scanning modes. The instrument scans rank 2 simple tables in low, mid, and high frequency bands, and QDataSet represents these together by adding a "JOIN" dimension that links them into one dataset. We talked about this a week or two ago, and there are ways of avoiding this structure if we decide to not support it.

Further discussion: Best to implement this as a layer on top of the existing TSDS using aggregation, etc.

Other notes:

QDataSet has separate representation for the case where all one has is timestamp + data versus timestamp + time window. (For an overview of representations, see [49] slide number 9.)

To represent timestamp + data, DEPEND_0 (time) points to a rank 1 QDataSet (a time stamp). To represent time window + data, DEPEND_0 would point to a rank 2 QDataSet ttag that has a "bins" dimension with min and max for the second index:

ttag[i,0] is the i-th time min, ttag[i,1] is i-th time max.
ttag.property( QDataSet.BINS_1 )= "min,max"

Although QDataSet has a representation for these types of data, Autoplot's renderers don't necessarily. (Support is added as they are requested.)

Regarding Joey's comment about CDF timing: CDF has a fixed time base, rather than the NetCDF "seconds since <Time>", which is better for several reasons. (RSW: The latest proposed scheme seems to address this partially with the J2000 base [50]. Also, they do allow the specification of a time window + tick position as an attribute, in comparison to your approach, in which it is a variable.)

3.9. May 27th

  • Nand mentioned http://code.google.com/p/protobuf/ - It looks like an efficient way to convert structured metadata into a binary format. You create a text file with metadata and compile it to a language-specific binary structure (C++, Java, and Python at present). The use for TSDS would be for encoding TSML and NcML files. I don't know how useful this would be at present because most time is spent on data, not metadata.
  • Doug and Bobby give update on Bobby's question about using TSDS on new server that is about one month away from production. What is not available but needed for CDAWeb? Answer is that existing functionality is available already.
  • Best option for summer meeting date [51]: Monday, August 8th at GMU from 9am - ? for high-level discussions. Tuesday and Wednesday will be work days.
  • Joey is looking into what it will take to create an IOSP for his SDDAS data. Preliminary discussion indicated that the best route would be to set up a server that emits the TSDS fast binary format, as an alternative to using a JNI interface to his C/C++ libraries for IDFS. IDFS is a separate file format standard for space physics data (similar to CDF), and we have quite a bit of it archived here.

3.10. May 5th

  • Discuss summer meeting schedule: http://www.doodle.com/he5sh6atcp5vihmn Looks like the week of August 1st is best. Will wait for feedback from the HPDE email from others who want to join before finalizing the date.
  • Discuss Bob's draft email for the HPDE group (sent out on May 5th). Bob will send it out tomorrow.
  • Further discussion of Doug's sample posted for April 15th
  • Discuss IOSP interface documentation provided by Doug [52]
  • Bob present tree widget

Other discussion I: Abstract Data Model (summary by Doug)

The original concern about the Unidata Common Data Model is that there is no clear definition of the abstract model. They have some UML but it is largely just a representation of pieces of the NetCDF-Java API, which is the only implementation of the model. Effectively, there is no abstract model, only the concrete implementation. To use the CDM, you need to use the NetCDF-Java API which largely means buying in to using NetCDF. About 1.5 years ago (AGU 2009), Jon and I discussed that they need a pure Java interface that could be implemented by other data model implementations, such as his. This is part of my goal with the new data model that is coming together for TSS2 (in addition to a data processing framework that I am working on).

In the meantime, the NetCDF IOSP approach gets us past many of these concerns. With an IOSP, you never have to convert your data to a NetCDF file; you just have to make it look like one by mapping a relatively straightforward API to the data source (which could be another API, like Jon's). As long as you can logically model your data as multi-dimensional arrays (or other CDM constructs) and serve up a Section (indexed subset) of a particular Variable (by reading the data yourself via another API or delegating to another service), then you can play. However, this only exposes your data via the NetCDF-Java API, and you still have to buy in to Unidata's lower level Array data model (ucar.ma2). The TSDS encapsulates all of this, but the IOS

Other discussion II: The filter implementation is somewhat immature. At the summer meeting we will need to discuss this more in the context of recent ideas about functional programming and the ViSAD data model.

Discussion conclusion: REST has become an overloaded term. The issue of how REST-ish TSDS is will come up often, so it may help to know the background.

I've been digging a bit more into Representational State Transfer (REST) and what it means for a web service API to be RESTful. I have referred to OPeNDAP and being RESTful in the past, but have become more inclined to call it REST-ish. I had fallen victim to calling anything that puts the request in the URL and uses HTTP GET as RESTful as a backlash against Simple Object Access Protocol (SOAP) which typically packages everything up as XML in an HTTP POST. I have coined a new term for this: RINS: REST is not SOAP. It's the first step towards REST that you get when you "rinse" the SOAP out of your web service. I expect the blogosphere to be abuzz with this new term, soon.

I ran across a good article by software engineering guru Martin Fowler that may help explain just how RESTful a web service can be. Though a fine standard API to build on, OPeNDAP is surely lacking in this regard. There may be some ideas here to enhance our API.

http://martinfowler.com/articles/richardsonMaturityModel.html

3.11. April 15th

  • Q: URI compliance? A: Using OPeNDAP's, which is pushing the limits.
  • Q: One of the examples has multiple files explicitly specified. Is there a way to not need to specify each file and just specify a pattern of files? A: Multiple Data Set Scan allows for this. There is a discussion of this in the README.
  • Doug noted that aggregation in time is very slow. He will look into this more.
  • Discussion of releases: Releases now are named tsds-YYYY-MM-DD.war. In the future they will be tsds-YYYY-MM-DD-R#.war, where R# is the SVN revision. Post releases to SourceForge that are used on the aurora server; delete old directories and war files on aurora server; symlink tsds to the latest release directory.
  • Discussion of API RFC: Distinction between a filter and a function. The URI format does not really follow the REST convention of name=value pairs separated by ampersands, but OPeNDAP is a standard, so it is better to follow it (OPeNDAP and REST "conventions" are about equal in age).
  • Discussion of connecting to Jon's server/API. We should be able to create a single reusable IOSP that delegates to Jon's API.

Email from Doug:

I've packaged up the latest version of the server (as a war file) and a collection of "datasets" for it to serve. See the latest files on the SourceForge project (https://sourceforge.net/projects/tsds/files/). To try it yourself, download:

tsds-20110414.war: Simply copy this into the webapps directory of your tomcat server (or other Servlet container). (We've been developing with tomcat 6.)

hpde_data.tar.gz: Download this and unpack it (tar zxvf hpde_data.tar.gz) in a convenient location. You should get a "hpde_data" directory.

Within the tsds-20110414 webapp, which tomcat will have unpacked for you, find and edit the tss.properties file and change the "dataset.dir" property to the "datasets" directory within the hpde_data directory that you created above. You may need to restart tomcat for it to see the change. (While you are in tss.properties, note how writers and filters are configured by mapping a name to a Java class that implements the appropriate interface.)

For the less adventurous, I have deployed this at: http://aurora.gmu.edu:8080/tsds-20110414

You should see several variations of two datasets that I found at VSPO. The README in hpde_data.tar.gz says a bit more about them.

See the "ncml" files in hpde_data/datasets/ to see how these datasets are adapted. Some of them serve directly from the remote server while others serve from files stored in the hpde_data/data/ directory. Note, there are also "info" files that contain additional information that the server serves.

Look at the usage instructions on the main page. Try various combinations of the parameters to get a feel for what the API currently supports. Note that there are some combinations that might not work. This code is not production ready. Remember, we are focusing on the API more so than the features. :-)

Here's one to get you started: http://aurora.gmu.edu:8080/tsds-20110414/p11tr_hr_noag.csv?time,C1&time>=1991 200 00&C1<7&exclude_missing()&format_time(yyyy-MM-dd)

3.12. March 30th

  1. Discuss Doug's documentation example.
  2. Discuss point 5 from March 18 telecon.

3.13. March 18th

Next telecon: March 30th at 2 PM Eastern (See email from Vandegriff about call-in).

  1. Overview of Devel#HPDE_DAP
  2. Q: How is this related to SPASE-QL? A: SPASE-QL would make calls to a TSDS server to obtain information.
  3. Q: Haven't the SPASE-QL developers made something like this already (clients for IDL that allow a user to specify only a data set ID and time range, after which the software pulls down data from remote databases)? A: Yes, many developers have implemented ad-hoc versions of this: Aaron mentioned that he had Bernie create such an interface to CDAWeb for IDL, SPASE-QL, Jeremy's code has some of this functionality, Friedel (PaPCO), Weigel (early version of "tsds"; see Devel#Points_of_Clarification). TSDS will do this by building on standards. As noted in the proposal: "By addressing this problem now, we will prevent a future problem in which a data service developer or a scientist who uses an API to access data needs to implement many APIs, possibly one for each data service it uses."
  4. Q: Will TSDS have a response option that is SPASE-QL format? A: TBD.
  5. Q: How will we handle situation where a request is made that will take a very long time to process? A: TBD. Several ideas were suggested. Doug noted that we could put a job id in the http headers. The OPeNDAP protocol does not address this, and it was suggested that perhaps this should be the job of the client and that we should add minimal functionality to TSDS that would allow a client to be "smart". The two things a client needs to know is how much data will be returned and how long it will take.
  6. Plan for next telecon: Doug will provide a documentation example showing how to make a ftp site with multiple ASCII files "TSDS-enabled". Instructions will be given for downloading a stand-alone version (using Jetty), writing XML to describe the structure of the files, and then providing a few sample URIs for testing. (If time is a factor for getting this done before the next telecon, just provide a war file.) Bob noted that in the long run we will want instructions for doing the example starting from download and compile of source.

4. Archive

4.1. 2010

Most discussion took place over email.

4.2. 2009

4.2.1. November 13

  • Discuss why -1-v0 was dropped in Doug's api (will be fixed in next update)
  • Discuss what bin should output from Doug's server (always little endian)
  • Jeremy - IOSP that uses AP. (Doug will report on this next week.)

4.2.2. November 6

  • Discuss why -1-v0 was dropped in Doug's api (will be fixed in next update)
  • Discuss what bin should output from Doug's server
  • Jeremy - IOSP that uses AP. (Doug will report on this next week.)

4.2.3. October 30

  • IOSP for Autoplot proof of concept - Jeremy look into. If TSDS proposal is successful it will be under
  • THREDDS demo from Doug in a week or two.
  • TSDS source code on Sourceforge by end of Nov. Packages need to be renamed, etc.
  • Setting up the NetBeans profiler: Jeremy and Anne offline discussion
  • Jeremy test /dev/null on windows
  • Bob better demo files
  • Jeremy - recreate filter doc.

Benchmark test

time wget "http://timeseries.org/cgi-bin/parseurl3.cgi?SourceAcronym_Subset1-1-v0-to_19890101-tf_20041230-ppd_24-filter_0-ext_bin.bin"
0.00user 0.00system 0:00.12elapsed 3%CPU
(Note result is about 0.15 for elapsed if cache is not used).
time wget "http://aurora.gmu.edu:8080/tsds-20091027/SourceAcronym_Subset1.bin?Variable1[0:149015]"
0.02user 0.46system 0:01.38elapsed 35%CPU
(Note tsds gives a file that is 2x larger, which is being fixed.)

4.2.4. October 23

  • Go over examples. Bob needs to implement Mode 1. Doug needs to make this type of access possible, and then we can decide if it is fast enough or whether it should be done in Perl.
  • Discuss a function that converts parameter name to file (for Mode 2)
  • Bob read up on THREDDS and Doug and Anne help.

4.2.5. October 2

  • Doug and Jeremy send emails to netcdf people about how to put autoplot urls in ncml
  • Doug - Update and fix links for union and master example that uses Types 1-3 and post ncml on wiki. Add vector components example to page too.
  • Doug decide on h5 format for each type.
  • Doug - Move notes from lasp wiki to timeseries.org/Devel. It is okay if it is not in polished form.

4.2.6. Next telecon

Discuss regexps and comma delimiters at

4.2.7. August 7

  • Bob - try to get tsds.org (again)
  • Doug - Make the core project tsds, and then have tsds.core, tsds.io.qdataset, tsds.io.bin, tsds.filters.statistics, tsds.api, etc.?
  • Jeremy and Doug - Look into what will be involved in creating an IOSP for QDataSet.
  • All - make sure that when you send an email with "content" to copy it to either the Telecon or the Devel wiki.
  • Anne - Make sure email with research about Aptana is on timeseries.org/Telecons.
  • Anne, Doug - Look into the OPeNDAP 4 web service for the next major release. Still need to understand the versions they have and the differences between the 3 and 4 releases.
  • Anne - Look into how to label components of a vector in NCML. Put example on Devel page
  • Doug - Create union and master example that uses Types 1-3 and post ncml on wiki. Doug to fix and update

4.2.8. June 26

  • Bob set up mirror of aurora in vm. Just create Ubuntu vm and mirror directories. http://communities.vmware.com/docs/DOC-5751
  • Anne: email Aptana; look into Java monitoring (JRockit) and stress testing from LASP?
  • Anne have Doug send IOSP for DB
  • Discuss how an IOSP will be written for QDataSet. This will be in the proposal. The idea is to be able to specify something like this in the NcML file:

<netcdf location="vap+cdf:http://cdaweb.gsfc.nasa.gov/istp_public/data/ace/swe/$Y/ac_k0_swe_$Y$m$d_v...cdf?Np&timerange=20000101to20070117" iosp="org.virbo.autoplot">

<netcdf location="/dev/null" iospparam="vap+cdf:http://cdaweb.gsfc.nasa.gov/istp_public/data/ace/swe/$Y/ac_k0_swe_$Y$m$d_v...cdf?Np&timerange=20000101to20070117" iosp="org.virbo.autoplot">

  • Something else to consider regarding TSML/NcML: it would be a good idea to develop a TSML schema and use a TSML namespace within the files. I'm finding myself wanting data type attributes (e.g. "Scalar", "Vector", "Spectrum") in the NcML, but NcML already has a "type" attribute for variables. I'm thinking about a tss schema/namespace so I could say tss:type="Spectrum".
  • I see that the current ncml files for the SourceAcronym data have "location" pointing to the HDF files. I've been doing most of my testing with your binary files. Should I concentrate more on the HDF? Also, I think we need to iterate a bit on the NCML+TSML (http://timeseries.org/Telecons#May_15). It's not clear to me how your use of get.cgi to output NCML+TSML fits in. (I probably just need to review my notes and the wiki.) Do you have a use case written up somewhere?

4.2.9. June 19th

  • Proposal discussion: Will be writing a proposal for server in July. Due July 22nd.
  • Doug re-install server for SourceAcronym. Look at email about installing new versions and using symlinks to point to active version.
  • Doug start to document API on timeseries.org/Devel.
  • Bob modify get.cgi to output NCML+TSML for Types 2-6.
  • Ann integrate test code in Doug's source code.
  • Review Ann's emails about Aptana.
  • Jeremy integrate pack200 + proguard into release process. Release both compressed and uncompressed versions.
  • Doug and Jeremy: Start discussing how IOSP will be written for QDATA set. This will be in proposal.

4.2.10. May 15

  • Bob think about running Hudson on remote machine, either DreamHost or AWS.
  • Jeremy write parser for below. Done.
  • Doug re-install server for SourceAcronym. Look at email about installing new versions and using symlinks to point to active version.
  • Doug see if time can be ISO 8601 in NCML files. If so, Bob start outputting it.

NCML+TSML === DataModel; SPASE === MetaDataModel.

Type 1 NCML+TSML example (to be compared to http://timeseries.org/data/SourceAcronym/Subset1/xml/SourceAcronym_Subset1-1-v0-TSML-0.xml):

<?xml version="1.0" encoding="utf-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
   <attribute name="title" value="Variable1 (SourceAcronym Subset1 1-hour)"/>
   <attribute name="Conventions" value="COARDS"/>
   <attribute name="Conventions" value="TSML"/>
   <attribute name="ScienceDataModel" value="URL"/>
   <attribute name="DataType" value="time_series"/><!--or vector or spectrogram-->
   <attribute name="StartDate" value="1989-01-01"/>
   <attribute name="StopDate" value="2005-12-31"/>
   <attribute name="PointsPerDay" value="24"/>
   <aggregation type="union">
      <netcdf>
         <dimension isUnlimited="true" length="149016" name="time"/>
         <variable name="time" shape="time" type="double">
            <attribute name="long_name" type="String" value="time"/>
            <attribute name="units" type="String" value="hours since 1989-01-01 00:00:0.0"/>
            <values increment="1.0" start="0.5"/>
         </variable>
      </netcdf>
      <netcdf location="SourceAcronym_Subset1-1-v0.tsds" iosp="org.timeseries.tsds" iospParam="filter4">
         <dimension isUnlimited="true" length="149016" name="time"/>
         <variable name="Variable1" orgName="ExtendibleArray" shape="time" type="double">
            <attribute name="long_name" type="String" value="Variable1"/>
            <attribute name="cformatstring" type="String" value=".16f"/>
            <attribute name="units" type="String" value="Unit"/>
            <attribute name="_FillValue" type="double" value="NaN"/>
         </variable>
      </netcdf>
   </aggregation>
</netcdf>

4.2.11. May 6

  • Anne - Hudson is not a small, easy task. Give the bug problem another week. Anne send Jeremy the bug report.
  • Bob create examples 1-6 and review next week on the telecon. TSML conclusion - make TSML look like NcML (or an NcML convention), and Jeremy writes a reader for TSML that does not require the netCDF library (then we get spectrogram support). TSML will always use aggregation and Types 1-6.
  • Doug - get tabular output (highest priority)
  • Doug - binary output format
  • Invalids discussion - not complete; Doug is still thinking about it. Will document on the wiki page when done. For now, we will implement only missing_value, which will always be NaN for the time series server. Others will be used for ingest.

4.2.12. April 17

New URL structure is

Plasma Wave Group's Hudson server:

  • Doug - Create TSDS project at sourceforge ... home page should point to http://timeseries.org/tsds
  • Anne - Set up Hudson on aurora eventually. For now, set up Hudson on her system and run some tests.

Discussion of invalids

  • pad_value: values inserted before or after the last available measured value. (Not implemented; not a netCDF convention.)
  • valid_range: (Not implemented; not a netCDF convention. Used in QDataSet because it allows one to define escape codes using out-of-range numbers.)
  • _FillValue: values (called pad_value in CDF) inserted where no measurement existed (to keep data on a uniform time grid).
  • missing_value: normally a value should have been there, but something went wrong and the value was removed (e.g., a data spike).

Discuss API for recipe formation (Anne and Bob)

ASCII output completion

TSML conclusion - make TSML look like NcML, and Jeremy writes a reader for TSML that does not require the netCDF library (then we get spectrogram support). TSML will always use aggregation and Types 1-6.

Filters: Bob and Jeremy will design the filter API today. We'll send this to Doug today for his review. See the FilterAPI_1 link on Devel.

4.2.13. April 10

  • Bob requested that we start to use the wiki Devel page for examples. Instead of sending them via email, send a link via email.
  • Doug noted that IO for bin format is almost done. The O is not complete, and it has not been moved over to aurora.
  • Discussion of the ampersand in the filter URL. Jeremy would prefer a pipe before the ten_...
xhttp://server/MyData.asc?time>2009-01-01&ten_point_mean_on_uniform_time_grid()
  • Jeremy mentioned implementation of pipe streams in das2 server.
  • Anne will start to move the benchmarks to Doug's SVN. Anne will look into Hudson. We should keep in mind that eventually we will want to expose this code via SourceForge, which has its own suite of tools.
hudson: https://hudson.dev.java.net/
deployable under servlet containers
support for cvs and svn
test with junit, unittest++
  functional testers: FIT, Selenium, Watir
publishes reports
3 build levels: successful, unstable, failed
Eclipse plugin
project source code viewable by all
  security enabled limits writing to authenticated users   ???

CI Feature matrix:

http://confluence.public.thoughtworks.org/display/CC/CI+Feature+Matrix

Comments on the matrix: 

http://docs.codehaus.org/display/DAMAGECONTROL/Continuous+Integration+Server+Feature+Matrix  


  • Bob will try Eclipse install next week when he visits Jeremy. Doug, have you moved your Eclipse stuff into the HEAD branch?
  • Filter discussion - Bob wants filters to satisfy two things: (1) Easy to implement, and (2) Fast
  • netCDF with unsigned applet? Jeremy, Doug, do research.
  • Discussed flow chart of server at http://timeseries.org/Devel#Server_Modes

http://opendap.org/api/pguide-html/pguide_27.html

4.2.14. April 3

I have not tested it yet, and there are a few things that are needed:

  1. Integrate this with servlet releases.
  2. Add timeRange discussion at http://autoplot.org/servlet_guide
  3. Create a java program that generates a html form for all of the available system font options (Name, sizes, bold, italic) as a drop-down list that we can cut-and-paste into this html form.
  • Doug has reduced the size of netCDF - verified: down from 13 to 3.
  • Jeremy will release with the reduced set of jars.
  • Next release will implement pack200. Implement for non-generic server first. Figure out generic later.
  • We discussed how NcML is for data transport while TSML prescribes how NcML can be used to get views; these are two different functions.
  • Jeremy: send the current flow chart. Bob: modify it for the time series browse library. What are the inputs, what are the outputs?
  • Doug: Send notes on the time format of the URL - post on the timeseries.org/Devel page and replace the existing version.
  • Possibility of outputting the .bin format - Doug is working on this, but we will still have the get.cgi method so the high-performance server can strip bits off the server.
  • Doug: Create text or dods output based on NcML input. This is pending; Bob needs to re-think. If the input to Autoplot is something that can be put into a QDataSet as a cube that can be output as a table, this would work, but additional code is needed for the output. Create flow chart. Can NcML aggregate on an OPeNDAP URL?
  • Doug/Ann: Look into OPeNDAP spec for functions. Create a trivial

function that operates on a netCDF object. Clarification - each parameter is stored on the disk as a simple array. What I am looking for is the equivalent of the HDF5 "filter" operation: when the array is read from the file, it is passed through a filter. The trivial function adds one to every value that is passed through, independent of whether the array represents a time series, spectrogram, etc. Eventually we will hook in all of the QDataSet filters, but I still want the option of doing something simple without having to understand the netCDF and QDataSet data models or need the das2 jar files.
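
A minimal sketch of such a trivial read-through filter, in Python/NumPy rather than the netCDF-Java stack; the names are illustrative:

 import numpy as np

 def add_one_filter(values):
     # The trivial function: add one to every value passed through,
     # regardless of what the array represents.
     return np.asarray(values, dtype=float) + 1.0

 def read_bin_filtered(path, filt=add_one_filter):
     # Read a flat 64-bit little-endian IEEE-754 .bin file through the filter.
     raw = np.fromfile(path, dtype="<f8")
     return filt(raw)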

  • Doug/Ann: Debate THREDDS versus NcML representation of test URLs.

Conclusion - Just use ASCII file for now.

  • Ann: Set up a unit testing framework for the test URLs. Keep the option for wget, but have it turned off by default. Make wget's output less verbose.

  • Jeremy/Doug: Modify Autoplot to understand the new data structure for spectrograms.
  • Jeremy: Add a control that allows the zoom to be reset and an option for the right-click menu to be disabled.

  • Bob: Set up an applet demo for the new applet that was released.

5. Appendix - Old notes (pre 2011)

Many of the notes here are out-of-date but are kept for reference.

5.1. Source Code

5.2. Installation

5.3. Server Configuration

Use either explicit paths as in http://host:8080/tsds-YYYYMMDD/ or edit the Apache configuration file to contain

 RewriteCond %{REQUEST_FILENAME} !/tsds-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/.*$
 RewriteCond %{REQUEST_FILENAME} !/tsds/.*$
 ProxyPass /tsds ajp://localhost:8009/tsds
 ProxyPassReverse /tsds ajp://localhost:8009/tsds

and then create the version symlink in /usr/local/tomcat/webapps:

 sudo ln -s tsds-20091027 tsds

5.3.1. Server Modes

The time series server has four modes of access. (We have decided to only implement one mode. See documentation.) Modes are numbered in terms of decreasing speed and increasing ease-of-use.

Only Mode 3 is currently implemented.

xhttp:// is used below for a URL that does not exist or is not implemented.

5.3.1.1. Mode 1

In Mode 1 the TSDB server takes inputs of

  • The URL to the flat binary array
  • StartIndex: The index of the first value to access
  • EndIndex: The index of the last value to access

And the output is

  • (EndIndex - StartIndex + 1) 64-bit little-endian IEEE-754 formatted values (subsequently referred to as the "bin" format).

For example, the parameter SourceAcronym_Subset1-1-v0 is stored on disk and exposed through the URL

http://timeseries.org/data/SourceAcronym/Subset1/SourceAcronym_Subset1-1-v0.bin

and can be directly downloaded. The access metadata for this parameter is stored at

xhttp://timeseries.org/data/SourceAcronym/Subset1/xml/SourceAcronym_Subset1-1-v0.tsml.

The metadata used by the server in this access mode is only the number of values in the file and the data type (for now always 64-bit IEEE 754 floating point, little endian). If the end-of-file is reached before (EndIndex - StartIndex + 1) values are returned, the server returns NaNs for the remainder.

To access a subset of this parameter, one can use

  • The HTTP byte-range header field. For example, to access the first two values of SourceAcronym_Subset2-1-v0, one could use
curl --range 0-15 http://timeseries.org/data/SourceAcronym/Subset2/bin/SourceAcronym_Subset2-1-v0.bin | od -t f8

which returns

0000000   1.000000000000000e+00   2.000000000000000e+00
  • The URL xhttp://timeseries.org/get.cgi?param=SourceAcronym_Subset-1-v0&StartIndex=A&EndIndex=B will return the binary data. For example (the quotes keep the shell from interpreting the &),
curl "xhttp://timeseries.org/get.cgi?param=SourceAcronym_Subset-1-v0&StartIndex=0&EndIndex=1" | od -t f8

which returns

0000000   1.000000000000000e+00   2.000000000000000e+00
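
The index-to-byte arithmetic above (8 bytes per value, so bytes 8*StartIndex through 8*(EndIndex+1)-1) can be sketched in Python; this helper is illustrative and not part of the server:

 import struct
 import urllib.request

 def fetch_values(url, start_index, end_index):
     # Map the index range to the equivalent HTTP byte range and decode
     # the 64-bit little-endian IEEE-754 values.
     nvals = end_index - start_index + 1
     rng = "bytes=%d-%d" % (8 * start_index, 8 * (end_index + 1) - 1)
     req = urllib.request.Request(url, headers={"Range": rng})
     raw = urllib.request.urlopen(req).read()
     return struct.unpack("<%dd" % nvals, raw)

 # fetch_values("http://timeseries.org/data/SourceAcronym/Subset2/bin/SourceAcronym_Subset2-1-v0.bin", 0, 1) returns (1.0, 2.0)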

5.3.1.2. Mode 2

The inputs are

  • Param: The key of the parameter
  • StartDate: YYYYMMDD
  • EndDate: YYYYMMDD
  • Filter: mean, max, min, std
  • PPF: Number of points per filter operation.

The output is a bin-formatted stream containing all data points with time stamps in the range StartDate to EndDate. Note that Filter and PPF are meant for use on data types 1-3. If the data to be returned are on a uniform time grid, the response is NaN-padded before and after. (A sketch of the filter operation follows the metadata list below.)

The URL xhttp://timeseries.org/get.cgi?Param=SourceAcronym_Subset-1-v0&StartDate=A&EndDate=B&Filter=C&PPF=D will return binary data.

The metadata contains

  • Number of points
  • First date when data are available
  • Last date when data are available
  • Time information (uniform cadence or a pointer to a file).
  • Available PPF values and Filters
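
A sketch of the Filter/PPF operation, assuming the named reduction is applied over consecutive blocks of PPF points (Python/NumPy; the block semantics are an assumption, not confirmed server behavior):

 import numpy as np

 def apply_filter(values, filter_name="mean", ppf=10):
     # Reduce consecutive blocks of PPF points with the named operation;
     # any partial trailing block is dropped in this sketch.
     ops = {"mean": np.nanmean, "max": np.nanmax,
            "min": np.nanmin, "std": np.nanstd}
     v = np.asarray(values, dtype=float)
     nblocks = len(v) // ppf
     return ops[filter_name](v[: nblocks * ppf].reshape(nblocks, ppf), axis=1)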

5.3.1.3. Mode 3

Mode 3 is an OPeNDAP server built on open standards and open-source software. It is designed to support requests for time series data, i.e., any data that are a function of time. The server runs as a Java servlet and handles HTTP requests of a form that is compliant with the OPeNDAP specification. The server returns a response based on the parameters in the request URL.

A request is made in the form of an HTTP URL and the results are streamed back to the client (e.g., a web browser or other application). A data request URL has the OPeNDAP-compliant form:

 http://host:port/tsds/dataset.suffix?constraint_expression
 
 host:    Name of the computer running the server
 port:    Port on which the servlet container listens
 dataset: Name of a dataset that the server is configured to serve
 suffix:  The type/format of the output
 constraint_expression: A collection of request parameters such as time range and filters to limit the results

The type of response, including data format, is specified by the "suffix." Some of these output request options are specified by the OPeNDAP specification and result in standard output that any OPeNDAP enabled client can use. This server adds some additional output options, primarily to filter the data or deliver it in the desired format.

Some standard OPeNDAP suffixes:

info  Information about the dataset and request options
html  HTML view of dataset information and a form for requesting data
dds   Dataset Descriptor Structure (ASCII)
das   Dataset Attribute Structure (ASCII)
dods  Data object as defined by the Data Access Protocol (DAP)
asc   Data object represented as ASCII

Other output data format options:

csv   comma separated values
bin   raw unformatted binary
dat   tabular output

OPeNDAP specifies, and this server implements support for, constraint expressions of the form:

 var1,var2&time>t1&time<t2&filter()
 
 var1,var2: list of variables to return, all if none specified
 time>t1:   time range constraint where t1 is in native time format or ISO8601 (e.g. yyyy-mm-dd)
 filter():  an algorithm to be applied to the data on the server before being sent to the client

NcML files are used to specify the metadata the server uses. Examples are given at http://timeseries.org/Devel#Data_Types
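
Putting the pieces together, a Mode 3 request is the dataset.suffix joined with a constraint expression. A sketch, with placeholder host, dataset, and times:

 import urllib.request

 # Placeholder host/dataset; the "asc" suffix requests the ASCII representation.
 url = ("http://host:8080/tsds/dataset.asc"
        "?var1,var2&time>2009-01-01&time<2009-02-01")
 # print(urllib.request.urlopen(url).read().decode())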

5.3.1.4. Mode 4

The inputs are

  • A list of parameters
  • StartDate
  • EndDate
  • Filter (one, or the same number as there are parameters).
  • Output format (ASCII text table, h5, etc.)

These inputs are posted to

http://timeseries.org/aggregation.cgi

which returns, e.g.,

aggregation.ncml

or

aggregation.jy

which is an aggregation recipe containing URLs that allow one to form the requested data set using Access Modes 1-3.

This URL is posted to, for example,

http://timeseries.org/get.cgi?out=txt
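
Stepping back, a sketch of what a minimal recipe could contain - an ordered list of Mode 1-3 URLs that, fetched in sequence, reconstruct the requested data set (the helper and URL pattern are assumptions, not a defined format):

 def make_recipe(param, dates):
     # One Mode 2 style URL per date; fetching them in order and
     # concatenating the bin streams forms the aggregated result.
     return ["http://timeseries.org/get.cgi?Param=%s&StartDate=%s&EndDate=%s"
             % (param, d, d) for d in dates]

 # make_recipe("SourceAcronym_Subset-1-v0", ["20090101", "20090102"])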

The questions are:

  1. What does the recipe look like? NcML aggregation (are constraint expressions allowed? Doug - In theory, I can put a TSS (OPeNDAP) URL in the NcML "location". It's not trivial (fundamentally straightforward + devil in the details), so I'll save that for later if we decide to make use of it.)
  2. Who does the aggregation? Autoplot's data reader or do we write more server-side code?
  3. How does time series browse work?

5.3.2. Data Types

There are six basic data types for which this server was designed. All of the data types are timeseries-like in the sense that it is assumed that the longest dimension is time.

Use the following sections to determine the required integer ranges given a requested time range. See http://www.unidata.ucar.edu/software/netcdf/ncml/ for NcML documentation.

5.3.2.1. Type 1

Scalar time series on uniform time grid:

where the .bin file contains N 64-bit little endian floating point values X(1),X(2),...,X(N)

The .ncml file provides the information required to turn a requested time range into a byte range, e.g., time units of "hours since 1989-01-01 00:00:0.0".

5.3.2.1.1. Benchmarks

Benchmarks

5.3.2.2. Type 2

Vector (two component) time series on uniform time grid:

where the .bin file contains 2N 64-bit little-endian floating point values

Vx(1),Vy(1),...,Vx(N),Vy(N)

The number of elements in the vector and vector component labels are stored in the .ncml file.
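
A sketch of decoding the interleaved Type 2 layout, assuming Python/NumPy and a local copy of the .bin file:

 import numpy as np

 # Interleaved pairs Vx(1),Vy(1),...,Vx(N),Vy(N) become two columns.
 v = np.fromfile("Type2-1-v0.bin", dtype="<f8").reshape(-1, 2)
 vx, vy = v[:, 0], v[:, 1]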

5.3.2.3. Type 3

Spectrogram (10 channel) on uniform time grid:

A(1,f1),A(1,f2),A(1,f3),...,A(N,f1),A(N,f2),A(N,f3),...

The channel values are stored in the .ncml file and are assumed to be much fewer than the number of data points.

5.3.2.4. Type 4

Scalar time series on non-uniform time grid:

X(1),X(2),...,X(N)

The associated time value in the .ncml points to http://timeseries.org/data/Type4/bin/Type4-t-v0.bin which has

T(1),T(2),...,T(N)

5.3.2.5. Type 5

Vector (two component) on non-uniform time grid:

Vx(1),Vy(1),...,Vx(N),Vy(N)

The associated time value in the .ncml points to http://timeseries.org/data/Type5/bin/Type5-t-v0.bin, which has

T(1),T(2),...,T(N)

5.3.2.6. Type 6

Spectrogram (10 channel) time series on non-uniform time grid, where the .bin file http://timeseries.org/data/Type6/bin/Type6-1-v0.bin contains

A(1,f1),A(1,f2),A(1,f3),...,A(N,f1),A(N,f2),A(N,f3),...

The associated time value in the .ncml points to http://timeseries.org/data/Type6/bin/Type6-t-v0.bin, which has

T(1),T(2),...,T(N)

5.3.3. Combinations

5.3.3.1. Type 1-3

5.3.3.2. Type 4-6

5.4. Data Access

Given the above low-level description of the database, it would be straightforward to write a client that can access a subset of data, provided the data are on a uniform time grid. In the case of a non-uniform time grid, one would need to make a number of queries to the time parameter in order to determine the required index range of the data. The next layer that we add to this database is an OPeNDAP server. Using this, we can quickly determine the required range: read the .ncml file, determine the offset, and then make a byte-range request. (A sketch for the non-uniform case follows.)
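
For the non-uniform case, a sketch of turning a time range into an index range by searching the time array (Python/NumPy; the helper is illustrative):

 import numpy as np

 def time_range_to_indices(times, t_start, t_stop):
     # times: the T(1..N) array from the associated time .bin file.
     i0 = int(np.searchsorted(times, t_start, side="left"))
     i1 = int(np.searchsorted(times, t_stop, side="right")) - 1
     return i0, i1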

In addition, our server has the option of outputting a short script that reads data from the server. This approach was taken for two reasons: (1) a single file format is not needed, and (2) we can upgrade more easily.

5.5. REST API

5.5.1. Type 1

5.5.2. Type 2

5.5.3. Type 3

5.5.4. Script

Example scripts

5.5.4.1. Jython

5.5.4.2. Excel/VB

5.5.4.3. MATLAB

5.5.4.4. IDL

(Not yet written). See http://autoplot.org/ViRBO_Interface#IDL for rough example.

5.5.5. Basic API

The time series server accepts index ranges. To convert a time range, with time represented in the form YYYY-MM-DDTHH:MM:SS.[0-9], to indices, one makes a call to http://timeseries.org/time2index.jsp

For example, the parameter SourceAcronym_Subset1-1-v0 has structure assigned by this file: http://timeseries.org/data/SourceAcronym/Subset1/bin/SourceAcronym_Subset1-1-v0.ncml so if I said

http://timeseries.org/time2index.jsp?SourceAcronym_Subset1-1-v0

or

http://timeseries.org/time2index.jsp?SourceAcronym_Subset1-1-v0&to=1970

or

http://timeseries.org/time2index.jsp?SourceAcronym_Subset1-1-v0&to=1970-01

or

http://timeseries.org/time2index.jsp?SourceAcronym_Subset1-1-v0&to=1970-01-01

or

http://timeseries.org/time2index.jsp?SourceAcronym_Subset1-1-v0&to=1970-01-01T00

or

http://timeseries.org/time2index.jsp?SourceAcronym_Subset1-1-v0&to=1970-01-01T00:00

etc.

the response is

idxo === 0
idxf === 149015

If I said http://timeseries.org/time2index.jsp?SourceAcronym_Subset1-1-v0&to=1970-01-01T00:00:01 the response is

idxo === 1
idxf === 149015

because the first data point has a time stamp of T00:00:00 and the second of T00:00:01.

Given this information, I could make a second request: http://aurora.gmu.edu:8080/TimeSeriesServer/SourceAcronym/Subset1/bin/SourceAcronym_Subset1-1-v0.ascii?time>=1&time<=149015
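
A sketch of the arithmetic behind the example above, assuming a uniform one-second cadence starting at 1970-01-01 (consistent with idxo = 1 for T00:00:01); the function is illustrative, not the jsp's actual code, and n is taken from the example response:

 from datetime import datetime

 def time2index(t, epoch=datetime(1970, 1, 1), cadence_s=1.0, n=149016):
     # idxo: index of the first point at or after t; idxf: last available index.
     idxo = max(0, int((t - epoch).total_seconds() // cadence_s))
     return idxo, n - 1

 # time2index(datetime(1970, 1, 1, 0, 0, 1)) returns (1, 149015)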

5.5.6. Filter API

A filter takes as input the URI of a bin-formatted file, its tsml metadata, and filter procedure-call arguments, and returns a bin-formatted file, tsml metadata, and the remote procedure call response. For example,

mean xhttp://timeseries.org/data/SourceAcronym/Subset1/SourceAcronym_Subset1-1-v0.bin xhttp://timeseries.org/data/SourceAcronym/Subset1/SourceAcronym_Subset1-1-v0.tsml xhttp://filters.org/ten_point_mean_on_uniform_time_grid.js

where the .js file is a JSON-RPC (or XML-RPC) result. The result would be a bin stream that is 1/10 the size of the input file.
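
A sketch of the ten_point_mean_on_uniform_time_grid operation itself (Python/NumPy; illustrative, not the RPC plumbing): read the bin stream, average consecutive blocks of ten points, and re-emit a stream 1/10 the size:

 import numpy as np
 import urllib.request

 def ten_point_mean(bin_url):
     # Read the 64-bit little-endian stream, average consecutive blocks
     # of ten points, and write the result back out in the same format.
     v = np.frombuffer(urllib.request.urlopen(bin_url).read(), dtype="<f8")
     n = len(v) // 10
     return v[: n * 10].reshape(n, 10).mean(axis=1).astype("<f8").tobytes()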

Jeremy's specification for filter plug-ins that will work with Autoplot and the TimeSeriesServer is here: FilterAPI_1

Regarding http://carocoops.org/twiki_dmcc/bin/view/Main/DODSFilterColumn - it seems pretty thin; it does not seem to support a format string on the output data. I can't believe something like this is not in the main DODS code base.

Data filters that may be relevant to our discussion today (that unfortunately seem to be intertwined with the netCDF data model): http://nco.sourceforge.net/

And a few other notes on averaging and reducing: http://www.cics.uvic.ca/scenarios/other/transient_highres/Appendix_I.pdf http://dust.ess.uci.edu/smn/smn_nco_ams_200701.pdf http://dust.ess.uci.edu/ppr/abs_xtn_ZeW07.pdf

Give me a call after you have read this. Here is my "diagram":

(1) User makes a REST-style request to TSDS's get.cgi (my server code that currently exists) from a web form, Autoplot, or another web service. The URL has parameter, time start, time stop, filter option, and output option.

(2) get.cgi translates the request to an OPeNDAP URL. In most cases, the input URL should map fairly cleanly to an OPeNDAP URL. The request requires extraction of data from a single file on disk, so data access is fast. No filtering is performed and no aggregation is needed (it has been done beforehand).

(3) get.cgi returns data to the user in an OPeNDAP-supported output format.
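
A sketch of step (2), translating a get.cgi-style request into an OPeNDAP-form URL; the parameter names and URL shape follow the description above, but the helper itself is hypothetical:

 def to_opendap(host, param, start, stop, filt=None, out="asc"):
     # Map get.cgi arguments onto dataset.suffix?constraint_expression.
     ce = "time>%s&time<%s" % (start, stop)
     if filt:
         ce += "&%s()" % filt
     return "http://%s/tsds/%s.%s?%s" % (host, param, out, ce)

 # to_opendap("host:8080", "MyData", "2009-01-01", "2009-02-01",
 #            "ten_point_mean_on_uniform_time_grid")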

Here's some info on the THREDDS metadata model: http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/Primer.html http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/InvCatalogSpec.html

It is defined as an XML Schema. Some examples are included. Here's the time coverage, for example: http://www.unidata.ucar.edu/projects/THREDDS/tech/catalog/InvCatalogSpec.html#timeCoverageType

http://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg03303.html
