NWB Workshops and Hackathons

Advanced Data I/O via PyNWB

Key Investigators

Project Description

Gather requirements for, and enhance, advanced data I/O features, e.g., compression, iterative write, streaming, external files, and parallel I/O.

Objective

  1. Create a list of requirements for the various advanced I/O features
  2. Expand existing advanced I/O features as needed to better support the requirements
  3. As appropriate, prioritize and define a plan for how the features could be implemented

Current functionality

  1. Basic compression is currently supported via the H5DataIO class. An example of how to use H5DataIO is part of the PyNWB docs: http://pynwb.readthedocs.io/en/latest/example.html#compressing-datasets
  2. Iterative data write (and streaming) are currently supported via:
    • DataChunkIterator class for defining and iterating over data chunks. A number of additional classes related to iterative data write are defined in the data_utils module, e.g. AbstractDataChunkIterator, DataChunk, ShapeValidator, etc.
    • HDF5IO implements the actual iterative data write (see __chunked_iter_fill__ function)
    • monitoring
    • A start on a tutorial for iterative data write is on the following branch, but it is far from complete: https://github.com/NeurodataWithoutBorders/pynwb/compare/iter_write_tutorial
  3. External files are currently supported through “reuse” of NWBContainers and through passing in of h5py.Dataset objects. Some known needs are:
    • (See progress below.) Instead of passing h5py.Dataset objects as inputs to NWBContainers to then create external links, this behavior should be made explicit by wrapping the datasets using HDF5IO and then configuring things on the container. This is needed to 1) make it explicit to users whether ExternalLinks are being created, 2) enable copying vs. linking of data, and 3) facilitate error checking for mismatching attributes.
    • TODO: Need to add error checking to ensure that attributes on the dataset match what the user is providing.
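
The idea behind the iterative data write in item 2 can be illustrated with a minimal, pure-Python sketch. This is not PyNWB's DataChunkIterator or the HDF5IO write path; the names iter_chunks, write_iteratively, and sensor_stream are hypothetical, and a plain list stands in for a resizable dataset. It only shows the general pattern: a stream of unknown length is grouped into chunks, and each chunk is written at its offset so the full data never has to be held in memory at once.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a data chunk: (row offset, rows of data).
Chunk = Tuple[int, List[List[float]]]


def iter_chunks(rows: Iterable[List[float]], buffer_size: int) -> Iterator[Chunk]:
    """Group a stream of rows into chunks of at most buffer_size rows."""
    offset = 0
    buffer: List[List[float]] = []
    for row in rows:
        buffer.append(row)
        if len(buffer) == buffer_size:
            yield offset, buffer
            offset += len(buffer)
            buffer = []
    if buffer:  # flush the final, possibly shorter, chunk
        yield offset, buffer


def write_iteratively(chunks: Iterable[Chunk], target: List[List[float]]) -> int:
    """Write each chunk at its offset, growing the target as needed.

    The target list plays the role of a resizable dataset.
    Returns the number of chunks written.
    """
    n_chunks = 0
    for offset, data in chunks:
        end = offset + len(data)
        if end > len(target):
            # Grow the "dataset" so the slice assignment below fits.
            target.extend([] for _ in range(end - len(target)))
        target[offset:end] = data
        n_chunks += 1
    return n_chunks


def sensor_stream(n_rows: int) -> Iterator[List[float]]:
    """Hypothetical data source that yields one row at a time."""
    for i in range(n_rows):
        yield [float(i), float(i) * 2.0]


dataset: List[List[float]] = []
written = write_iteratively(iter_chunks(sensor_stream(7), buffer_size=3), dataset)
print(written, len(dataset))
```

In this sketch the buffer size controls the trade-off between memory use and the number of write calls, which is the same kind of knob an iterative write against a chunked HDF5 dataset would need to expose.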

Approach and Plan

  1. Review existing functionality in PyNWB for compression, iterative write, streaming, external files, and parallel I/O
  2. Identify missing features
  3. Prioritize and define a plan for implementing missing features, and identify implementation leads for the different features.

Progress and Next Steps

Illustrations

Background and References