NWB Workshops and Hackathons

Advanced Data I/O via PyNWB

Enhance and gather requirements for advanced data I/O features, e.g.:

Create a list of requirements for the various advanced I/O features
Expand existing advanced I/O features as needed to better support the requirements
As appropriate, prioritize and define a plan for how the features could be implemented

Basic compression is currently supported via the H5DataIO class. An example of how to use H5DataIO is part of the PyNWB docs: http://pynwb.readthedocs.io/en/latest/example.html#compressing-datasets
Iterative data write (and streaming) are currently supported via:
- DataChunkIterator class for defining and iterating over data chunks. A number of additional classes related to iterative data write are defined in the data_utils module, e.g. AbstractDataChunkIterator, DataChunk, ShapeValidator, etc.
- HDF5IO implements the actual iterative data write (see __chunked_iter_fill__ function)
- monitoring
- A start on a tutorial for iterative data write is on the following branch, but it is far from complete: https://github.com/NeurodataWithoutBorders/pynwb/compare/iter_write_tutorial
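The idea behind iterative data write can be sketched without PyNWB. The following is a minimal illustration only, using the standard library: `iter_chunks` and `write_chunks` are hypothetical names, not the DataChunkIterator or HDF5IO API, and a plain Python list stands in for the HDF5 dataset. It shows a stream being consumed chunk by chunk, with each chunk carrying the offset at which it should be written.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a data chunk: (start offset, values).
Chunk = Tuple[int, List[float]]


def iter_chunks(source: Iterable[float], chunk_size: int) -> Iterator[Chunk]:
    """Yield (offset, values) chunks from a stream without loading all of it."""
    buffer: List[float] = []
    offset = 0
    for value in source:
        buffer.append(value)
        if len(buffer) == chunk_size:
            yield offset, buffer
            offset += chunk_size
            buffer = []
    if buffer:  # final, possibly shorter chunk
        yield offset, buffer


def write_chunks(target: List[float], chunks: Iterable[Chunk]) -> int:
    """Write chunks into target one at a time, growing it as needed.

    Returns the number of chunks written.
    """
    written = 0
    for offset, values in chunks:
        end = offset + len(values)
        if end > len(target):
            # Grow the target, similar in spirit to resizing a dataset.
            target.extend([0.0] * (end - len(target)))
        target[offset:end] = values
        written += 1
    return written


# The generator is never materialized as a whole; only one chunk is held at a time.
stream = (float(i) for i in range(10))
dataset: List[float] = []
n_chunks = write_chunks(dataset, iter_chunks(stream, chunk_size=4))
```

Because the writer only sees one chunk at a time, the same pattern applies when the full data size is not known in advance, which is the streaming case mentioned above.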
External files are currently supported through “reuse” of NWBContainers and by passing in h5py.Dataset objects. Some known needs are:
- (See progress below.) Instead of using h5py.Dataset objects as inputs to NWBContainers to then create external links, this behavior should be made explicit by wrapping the datasets using HDF5IO and then configuring things on the container. This is needed to 1) make it explicit to users whether ExternalLinks are being created, 2) enable copying vs. linking of data, and 3) facilitate error checking for mismatching attributes
- TODO: Need to add error checking to ensure that attributes on the dataset match what the user is providing
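The attribute check described in the TODO could take roughly the following shape. This is a sketch under assumptions, not existing PyNWB code: `find_attribute_mismatches` and `check_before_linking` are hypothetical helpers, plain dicts stand in for the attributes stored on a dataset and those supplied by the user, and the attribute names and values are illustrative only.

```python
from typing import Any, Dict, List


def find_attribute_mismatches(existing: Dict[str, Any],
                              requested: Dict[str, Any]) -> List[str]:
    """Return one message per attribute whose stored and requested values differ."""
    problems = []
    for name, wanted in requested.items():
        if name not in existing:
            problems.append(f"{name}: missing on dataset, requested {wanted!r}")
        elif existing[name] != wanted:
            problems.append(
                f"{name}: dataset has {existing[name]!r}, requested {wanted!r}"
            )
    return problems


def check_before_linking(existing: Dict[str, Any],
                         requested: Dict[str, Any]) -> None:
    """Raise ValueError if linking would silently disagree with the user's values."""
    problems = find_attribute_mismatches(existing, requested)
    if problems:
        raise ValueError("; ".join(problems))


# Illustrative values only: attributes stored on the dataset vs. user input.
stored = {"unit": "volts", "resolution": 0.001}
requested = {"unit": "millivolts", "resolution": 0.001}
mismatches = find_attribute_mismatches(stored, requested)
```

Whether a mismatch should raise an error, emit a warning, or trigger a copy instead of a link is a design decision left open here.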

Review existing functionality in PyNWB for compression, iterative write, streaming, external files, and parallel I/O
Identify missing features
Prioritize and define a plan for implementing missing features, and identify implementation leads for the different features.

The following pull request has been merged: https://github.com/NeurodataWithoutBorders/pynwb/pull/400
- Allow use of HDF5IO to configure creation of external links
- Allow customization of default behavior when h5py.Dataset objects are used as input on write
- Expand the list of supported I/O parameters on HDF5IO to allow chunking, compression, etc. options to be set explicitly
- Some minor improvements to DataChunkIterator
Next steps:
Oliver to create Advanced Data I/O tutorial for the hackathon at LBNL
Enhance HDF5IO to create a queue with all DataChunkIterators to allow customization of how the write of DataChunkIterators is handled (see use case described in https://github.com/NeurodataWithoutBorders/pynwb/pull/310 and https://github.com/NeurodataWithoutBorders/pynwb/pull/309)
MPI I/O
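As a rough illustration of the queue of DataChunkIterators proposed in the next steps, the sketch below collects several chunk iterators in a queue and writes one chunk from each in turn. The round-robin policy is an assumption chosen for illustration, one of several possible handling strategies; `write_round_robin` is a hypothetical function, and plain Python iterators and lists stand in for DataChunkIterators and datasets.

```python
from collections import deque
from typing import Deque, Dict, Iterator, List, Tuple


def write_round_robin(
    iterators: Dict[str, Iterator[List[int]]]
) -> Tuple[Dict[str, List[int]], List[str]]:
    """Write one chunk per iterator in turn until all iterators are exhausted.

    Returns the written data per dataset name and the order of the writes.
    """
    queue: Deque[Tuple[str, Iterator[List[int]]]] = deque(iterators.items())
    written: Dict[str, List[int]] = {name: [] for name in iterators}
    order: List[str] = []
    while queue:
        name, chunks = queue.popleft()
        try:
            chunk = next(chunks)
        except StopIteration:
            continue  # exhausted iterators are dropped from the queue
        written[name].extend(chunk)
        order.append(name)
        queue.append((name, chunks))  # re-queue for its next chunk
    return written, order


# Two streams of different lengths, named for illustration only.
streams = {
    "ephys": iter([[1, 2], [3, 4], [5]]),
    "behavior": iter([[10], [20]]),
}
datasets, write_order = write_round_robin(streams)
```

Keeping all iterators in one queue makes the scheduling policy a single replaceable piece: a different policy would only change how items are taken from and returned to the queue.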