The goal of this document is to provide users of NWB:N with additional guidelines on common best practices. These guidelines are meant to facilitate consistent use of the standard, help avoid common problems, and make the most effective use of the NWB:N data standard and its ecosystem of software tools.
Oliver Ruebel, Andrew Tritt, Ryan Ly, Ben Dichter, …
last edited: Aug 19, 2019
To enable NWB:N to accommodate the needs of the diverse neuroscience community, NWB:N provides a great degree of
flexibility. In particular, the number of instances of a particular
neurodata_type and corresponding names are often
not fixed, to enable, e.g., storage of data from arbitrary numbers of devices within the same file. While this
flexibility is essential to enable coverage of a broad range of use-cases, it can also lead to ambiguity. At the same
time, we ultimately want the schema to be as strict as possible, to provide users and tool builders with a
consistent organization of data. As such, we need to strike a fine balance between flexibility to enable support
for varying experiments and use-cases and strictness in the schema to enforce standard organization of data. The
following “best practices” provide advice from developers and experienced users that outline some of the pitfalls to
avoid and common usage patterns to emulate.
An NWBFile object generally contains data from a single experimental session.
NWBFile has two distinct places for ids: session_id and identifier.
The session_id field marks unique experimental sessions. The session_id should have a one-to-one
relationship with a recording session. Sometimes you may find yourself having multiple NWB files that correspond to
the same session. This can happen, for instance, if you separate out processing steps into multiple files or if you want
to compare different processing systems. In this case, the session_id should be the same for each file. Each lab
should use a standard for session_id so that sessions have unique names within the lab and the session ids are
human-readable.
The identifier tag should be a globally unique value for the NWBFile. Two different NWBFiles from the same
session should have different identifier values if they differ in any way. It is recommended that you use a unique id
generator like uuid to ensure uniqueness. It is not important that the identifier field is human-readable.
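As a minimal sketch of the two ids, the example below uses Python's standard uuid module to generate an identifier. The session_id string and the dictionaries are hypothetical placeholders for illustration only; they are not part of any NWB API.

```python
import uuid


def make_identifier() -> str:
    """Return a random, globally unique string suitable for an identifier."""
    return str(uuid.uuid4())


# Hypothetical lab convention for a human-readable session_id.
session_id = "mouse012_2019-08-19_session01"

# Two files derived from the same session share the session_id,
# but each file gets its own identifier.
raw_file_ids = {"session_id": session_id, "identifier": make_identifier()}
processed_file_ids = {"session_id": session_id, "identifier": make_identifier()}
```

The identifier does not need to carry any meaning; readable information belongs in session_id.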
Many of the neurodata_types in NWB inherit from the
TimeSeries neurodata_type. When using
TimeSeries or any of its descendants, make sure to follow the practices below.
Time dimension goes first. In
TimeSeries.data, the first dimension on the disk is always time. Keep in mind that the dimensions are reversed in MatNWB, so in memory in MatNWB the time dimension must be last. In PyNWB the order of the dimensions is the same in memory as on disk, so the time index should be first.
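To illustrate the dimension order, here is a small sketch in plain Python that rearranges a hypothetical channels-by-time recording so that time is the first dimension. The data values and the helper name time_first are made up for illustration.

```python
# Hypothetical recording: 3 channels, 5 samples each (channels x time).
channels_by_time = [
    [0.0, 0.1, 0.2, 0.3, 0.4],  # channel 0
    [1.0, 1.1, 1.2, 1.3, 1.4],  # channel 1
    [2.0, 2.1, 2.2, 2.3, 2.4],  # channel 2
]


def time_first(data):
    """Transpose a (channels x time) nested list into (time x channels)."""
    return [list(sample) for sample in zip(*data)]


# Each row is now one time point holding one value per channel.
time_by_channels = time_first(channels_by_time)
```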
ElectricalSeries is reserved for neural data.
ElectricalSeries holds signals from electrodes positioned in or around the brain that are monitoring neural activity, and only those electrodes should be in the
electrodes table. Use
TimeSeries for other data in units of volts, such as the reading on a force-sensitive resistor.
Times are always in seconds.
timestamps and starting_time should be in seconds with respect to the timestamps_reference_time (which by default is set to the session_start_time).
TimeSeries data should be stored as one continuous stream. Data should be stored in one continuous stream, as it is acquired, not reshaped by trial as is often done for analysis. Data can be trial-aligned on-the-fly using the
trials table. Storing measured data as a continuous stream ensures that other users have access to the inter-trial data, and that they can align the data with whatever window they need. If you only have data in specific segments of time, then only include those timepoints in the data. Use
timestamps, even if there is a constant sampling rate within each segment, and have the
timestamps correctly reflect the gaps in the recording. Use the
TimeSeries.description field to explain how the data was segmented.
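The following sketch shows one way the ideas above could look in plain Python: explicit timestamps that reflect a gap between recording segments, and on-the-fly alignment of the continuous data to a trial window. The function names, the 10 Hz rate, and the trial window are assumptions chosen for illustration; they are not part of PyNWB.

```python
from bisect import bisect_left


def segment_timestamps(segments, rate):
    """Build explicit timestamps (in seconds) for segments separated by gaps.

    segments: list of (start_time_in_seconds, n_samples) pairs.
    rate: sampling rate in Hz, constant within each segment.
    """
    timestamps = []
    for start, n_samples in segments:
        timestamps.extend(start + i / rate for i in range(n_samples))
    return timestamps


def align_to_trial(timestamps, data, start_time, stop_time):
    """Return the samples whose timestamps fall in [start_time, stop_time)."""
    lo = bisect_left(timestamps, start_time)
    hi = bisect_left(timestamps, stop_time)
    return data[lo:hi]


# Two segments recorded at 10 Hz, with no data between 0.4 s and 2.0 s.
timestamps = segment_timestamps([(0.0, 5), (2.0, 5)], rate=10.0)
data = list(range(len(timestamps)))  # placeholder for the continuous signal

# Hypothetical trial window, as it might appear in a trials table.
trial_data = align_to_trial(timestamps, data, start_time=2.0, stop_time=2.25)
```

Because the stored data stays continuous, a different window can be extracted later without rewriting the file.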
If the sampling rate is constant, use rate and starting_time to specify time.
TimeSeries allows you to specify time using either timestamps or rate together with starting_time (which defaults to 0). For
TimeSeries objects that have a constant sampling rate, rate should be used instead of
timestamps. This will ensure that you can use analysis and visualization tools that rely on a constant sampling rate.
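When converting existing data, it can help to check whether a set of timestamps is in fact evenly spaced before deciding between timestamps and rate. The sketch below is one possible check in plain Python; the helper name infer_rate and the tolerance value are assumptions, not part of any NWB tool.

```python
def infer_rate(timestamps, tolerance=1e-9):
    """Return (starting_time, rate) if timestamps are evenly spaced, else None."""
    if len(timestamps) < 2:
        return None
    # Average spacing between consecutive samples.
    step = (timestamps[-1] - timestamps[0]) / (len(timestamps) - 1)
    if step <= 0:
        return None
    # Every timestamp must sit on the regular grid, within the tolerance.
    for i, t in enumerate(timestamps):
        if abs(t - (timestamps[0] + i * step)) > tolerance:
            return None
    return timestamps[0], 1.0 / step


regular = [0.0, 0.5, 1.0, 1.5]    # constant 2 Hz sampling
irregular = [0.0, 0.5, 1.2, 1.5]  # uneven spacing, keep explicit timestamps

regular_result = infer_rate(regular)
irregular_result = infer_rate(irregular)
```

If the check succeeds, the returned pair can be stored as starting_time and rate; otherwise the explicit timestamps should be kept.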
DynamicTable objects allow you to define custom columns,
which offer a high degree of flexibility.
Although boolean values (True/False) are not used in the core schema, they are a supported data type, and we encourage the use of
DynamicTable columns with boolean values. For instance, boolean values would be appropriate for a
custom 'correct' column in the trials table.
Times are always stored in seconds in NWB:N. This rule applies to times in TimeSeries, TimeIntervals, and
across NWB:N in general. E.g., in
TimeIntervals objects such as the trials and epochs tables, start_time and
stop_time should both be in seconds with respect to the timestamps_reference_time
(which by default is set to the session_start_time).
Additional time columns in
TimeIntervals tables (e.g., trials) should have
_time as name suffix. E.g., if
you add more times in the trials table, for instance a subject response time, name it with
_time at the end (e.g.
response_time) and store the time values in seconds from the
timestamps_reference_time, just like start_time and stop_time.
Set the timestamps_reference_time if you need to use a different reference time. Rather than relative times,
it can in practice be useful to use a common global reference time across files (e.g., Posix time). To do so, NWB:N
allows users to set the
timestamps_reference_time which serves as reference for all timestamps in a file. By default,
timestamps_reference_time is set to the
session_start_time to use relative times.
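As a rough illustration of the reference-time convention, the sketch below converts a wall-clock event time into seconds relative to a reference time using Python's datetime module. The dates, the response event, and the helper name seconds_since are hypothetical.

```python
from datetime import datetime, timezone


def seconds_since(reference_time, event_time):
    """Return the event time in seconds relative to the reference time."""
    return (event_time - reference_time).total_seconds()


# Hypothetical session start; used as the reference time by default.
session_start_time = datetime(2019, 8, 19, 14, 0, 0, tzinfo=timezone.utc)
timestamps_reference_time = session_start_time

# Hypothetical wall-clock time of a subject response, 12.5 s into the session.
response_clock_time = datetime(2019, 8, 19, 14, 0, 12, 500000, tzinfo=timezone.utc)

# Relative time, suitable for a response_time column in the trials table.
response_time = seconds_since(timestamps_reference_time, response_clock_time)

# Alternative: a global reference shared across files (POSIX epoch).
posix_reference = datetime(1970, 1, 1, tzinfo=timezone.utc)
response_time_posix = seconds_since(posix_reference, response_clock_time)
```

Whichever reference is chosen, every time value in the file should be expressed against the same one.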
The ‘location’ field should reflect your best estimate of the recorded brain area. The
'location' column of the
electrodes table is meant to store the brain region that the electrode is in. Different
labs have different standards for electrode localization. Some use atlases and coordinate maps to precisely place an
electrode, and use physiological measures to confirm its placement. Others use histology or image processing
algorithms to identify regions after the fact. Fill this column with localization results from your most accurate
method. For instance, if you target electrodes using physiology, and later use histology to confirm placement, we would
recommend that you add a new column to the electrodes table called
'location_target', set those values to the original
intended target, and alter the values of
'location' to match the histology results.
Use established ontologies for naming areas. It is preferable to use established ontologies instead of lab conventions for indicating anatomical region. We recommend the Allen Brain Atlas terms for mice. You may use either the full name or the abbreviation, but do not make up your own terms.
The location column of the electrodes table is required. If you do not know the location of an electrode, use
As a default, name class instances with the same name as the class. Many of the neurodata_types
in NWB:N allow you to set their name to something other than the default name. This allows multiple objects of the same type to be stored
side-by-side and allows data writers to provide human-readable information about the contents of the neurodata_type. If
appropriate, simply use the name of the neurodata_type as the name of that object. For instance, if you are placing an
ElectricalSeries object in
/acquisition that holds voltage traces for a multichannel recording, consider
simply naming that object
"ElectricalSeries". This is the
default_name for that object, and naming it like this will increase
the chances that analysis and visualization tools will operate seamlessly with your data.
There may be cases where you have multiple neurodata instances of the same type in the same Group. In this case the instances must have unique names. If they are both equally important data sources, build upon the class name (e.g.
"ElectricalSeries_2"). If one of the instances is an extra of less importance, name that one something different (e.g.
Names are not for storing meta-data. If you need to place other data of the same neurodata_type, you will need to
choose another name. Keep in mind that meta-data should not be stored solely in the name of objects. It is OK to name an
object something like “ElectricalSeries_large_array”; however, the name alone is not sufficient documentation. In this
case, the source of the signal will be clear from the device associated with the rows of the linked electrodes table region, and you should also include
any important distinguishing information in the
description field of the object. Make an effort to make meta-data as
explicit as possible. Good names help users but do not help applications parse your file.
’/’ is not allowed in names. When creating a custom name, using the forward slash (/) is not allowed, as this
can confuse h5py and lead to the creation of an additional group. Instead of including a forward slash in the name,
please use “Over”, as in DfOverF.
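A simple way to apply this convention when names are generated programmatically is to substitute the slash before creating the object. The helper below is a hypothetical sketch in plain Python, not a function provided by PyNWB or h5py.

```python
def safe_name(name: str) -> str:
    """Replace forward slashes, which are not allowed in names, with 'Over'."""
    return name.replace("/", "Over")


# A ratio-style label becomes a name without a slash.
example_name = safe_name("Df/F")
```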
ProcessingModules with custom names.
ProcessingModules are themselves neurodata_types, and the other rules for neurodata_types also apply here.
Each TimeSeries instance has a unit as an attribute of the data Dataset, which is meant to indicate the unit of measurement of that data. We advise using SI units. Time is always in units of seconds.
In PyNWB, you can cache the specification with io.write(nwbfile, cache_spec=True). Caching the specification is preferable, particularly if you are using a custom extension, because this ensures that anybody who receives the data also receives the specification needed to interpret it.
The output of a simulation should be stored in NWB, but not the settings of the simulation. You may store the results of simulations in NWB files. NWB:N allows you to store data as if it were recorded in vivo, to facilitate comparison between simulated and in vivo results. Core components of the NWB:N schema and HDF5 backend have been engineered to handle data from hundreds of thousands of units and natively support parallel data access via MPI, so much of the NWB:N format should work for large-scale simulations out of the box. The neurodata extension “simulation_output” provides a neurodata_type for storing continuous recordings from multiple cells and multiple compartments per cell. The extension only supports storing the output data of a simulation and does not support parameters for simulation configuration. Storing the simulation configuration is out of scope for NWB:N, since it does not facilitate side-by-side comparison between simulated and in vivo results and is difficult to generalize given the diversity of ways one can parameterize a simulation. That said, if you would benefit from storing such data in your NWB:N file, you might consider creating your own custom extension.