aqnwb 0.1.0
Reading data from an open NWB file via AqNWB consists of the following main steps.
Create the object representing the neurodata_type (e.g., a TimeSeries) for read via the RegisteredType::create factory method, e.g., auto electricalSeries = RegisteredType::create<ElectricalSeries>(electricalSeriesPath, io);
In the following sections we dive deeper into the Software Design, describing the different classes involved in reading data and their responsibilities. We then show a more detailed Example to illustrate how read works in the overall context of data acquisition.
The following figure shows the main classes involved with reading data from a dataset or attribute.
The main components involved in reading data from an NWB file via AqNWB are:
DataBlockGeneric, which represents a generic block of data as std::any along with the shape of the data.
Container classes, which represent a neurodata_type in the NWB format, and are responsible for providing access to the datasets/attributes that they own. To provide access, these classes create ReadDataWrapper objects for the user for lazy read access to the data.
We will discuss these different components in a bit more detail next.
The Container class (e.g., ElectricalSeries or NWBFile) is responsible for exposing read access to its specific datasets and attributes by providing appropriate access functions, which return ReadDataWrapper<AQNWB::Types::StorageObjectType::Dataset> or ReadDataWrapper<AQNWB::Types::StorageObjectType::Attribute> objects for lazily reading from the dataset/attribute.
The ReadDataWrapper stores a shared pointer m_io to the I/O object and the path to the dataset.
The valuesGeneric method then allows us to read all or parts of the dataset into memory as std::any. This function uses the readDataset method of the I/O backend (e.g., HDF5IO.readDataset) to load the data. The I/O backend in turn takes care of allocating the memory for the appropriate data type and loading the data from disk.
We can retrieve data directly with the appropriate type by using the templated values function instead, which uses valuesGeneric and then automatically casts the data to a typed DataBlock<DTYPE> instead of returning an untyped DataBlockGeneric.
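The lazy-read pattern described above can be sketched with the standard library alone. This is a minimal illustration, not AqNWB's implementation: GenericBlock, MockIO and LazyReader are hypothetical stand-ins for DataBlockGeneric, the I/O backend and ReadDataWrapper, and the in-memory map replaces reading from disk.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an untyped data block: values plus shape.
struct GenericBlock {
  std::any data;                   // holds a std::vector<T> for some T
  std::vector<std::size_t> shape;  // extent of each dimension
};

// Hypothetical in-memory backend; a real backend would read from disk.
class MockIO {
public:
  void store(const std::string& path, std::vector<float> values,
             std::vector<std::size_t> shape) {
    m_data[path] = GenericBlock{std::move(values), std::move(shape)};
  }
  bool exists(const std::string& path) const { return m_data.count(path) > 0; }
  GenericBlock readDataset(const std::string& path) {
    ++readCount;  // counts reads to show that wrappers are lazy
    return m_data.at(path);
  }
  int readCount = 0;

private:
  std::map<std::string, GenericBlock> m_data;
};

// Lazy wrapper: remembers where the data lives and reads only on request.
template <typename VTYPE>
class LazyReader {
public:
  LazyReader(std::shared_ptr<MockIO> io, std::string path)
      : m_io(std::move(io)), m_path(std::move(path)) {
    assert(m_io != nullptr);
  }
  bool exists() const { return m_io->exists(m_path); }
  // Untyped read: the backend decides the stored element type.
  GenericBlock valuesGeneric() const { return m_io->readDataset(m_path); }
  // Typed read: throws std::bad_any_cast if VTYPE does not match.
  std::vector<VTYPE> values() const {
    return std::any_cast<std::vector<VTYPE>>(valuesGeneric().data);
  }

private:
  std::shared_ptr<MockIO> m_io;
  std::string m_path;
};
```

Constructing a LazyReader touches no data; only values() or valuesGeneric() triggers a read through the shared I/O object.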
ReadDataWrapper has two template parameters: 1) the OTYPE specifying the type of object being wrapped (AQNWB::Types::StorageObjectType) and 2) the VTYPE defining the value type of the data. For attributes, slicing is disabled at compile time, i.e., attributes are always loaded fully into memory since attributes are intended for small data only.
At first, data values are always represented as a DataBlockGeneric object, which stores the data as std::any along with the shape of the data. For example, ReadDatasetWrapper.valuesGeneric and HDF5IO.readDataset return a DataBlockGeneric. This has the advantage that we can let the backend handle memory allocation and typing for us and load data even if we don't know the type yet.
To cast the data to the appropriate specific type (e.g., float) we can then create a DataBlock with the appropriate data type via the DataBlock.fromGeneric factory method. DataBlock is templated on the specific data type, i.e., we call DataBlock<float>::fromGeneric(myGenericDataBlock). DataBlock then stores the data as an appropriately typed 1-dimensional std::vector along with the shape of the data.
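To make the untyped-to-typed conversion concrete, here is a small sketch using only std::any. GenericBlock, TypedBlock and makeFloatBlock are hypothetical, simplified stand-ins and not the AqNWB classes; the sketch assumes the untyped block stores a std::vector of the element type.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical simplified untyped block (not the AqNWB class itself).
struct GenericBlock {
  std::any data;                             // std::vector<T> for some T
  std::vector<std::size_t> shape;            // extent per dimension
  std::type_index typeIndex = typeid(void);  // element type recorded by the reader
};

// Hypothetical typed block: flattened 1-D values plus the shape.
template <typename DTYPE>
struct TypedBlock {
  std::vector<DTYPE> data;
  std::vector<std::size_t> shape;

  // Throws std::bad_any_cast when DTYPE is not the stored element type.
  static TypedBlock fromGeneric(const GenericBlock& generic) {
    return TypedBlock{std::any_cast<std::vector<DTYPE>>(generic.data),
                      generic.shape};
  }
};

// Builds a 3 x 2 block of floats, as a backend might after a read.
inline GenericBlock makeFloatBlock() {
  GenericBlock block{std::vector<float>{0.5f, 1.5f, 2.5f, 3.5f, 4.5f, 5.5f},
                     {3, 2},
                     typeid(float)};
  assert(block.shape.size() == 2);
  return block;
}
```

Requesting the wrong element type, e.g. TypedBlock<int>::fromGeneric on a float block, fails loudly with std::bad_any_cast in this sketch rather than reinterpreting the bytes.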
To simplify access to multi-dimensional data, we can then represent the data as a boost::multi_array. The DataBlock.as_multi_array convenience method generates a boost::const_multi_array_ref<DTYPE, NDIMS> for us. Here the DTYPE template parameter is the same as for the DataBlock (so that we don't have to specify it again), and the NDIMS template parameter is the number of dimensions (which is the same as shape.size()).
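For intuition about what a multi-dimensional view over the flat vector does, the following sketch computes a 2-D row-major index by hand, without Boost. FloatBlock and at2d are hypothetical names; the sketch assumes row-major (C) ordering, which is the default storage order of boost::multi_array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical typed block: flattened values plus shape.
struct FloatBlock {
  std::vector<float> data;
  std::vector<std::size_t> shape;  // {numTimes, numChannels}
};

// Element (t, c) of a 2-D row-major block lives at t * numChannels + c.
inline float at2d(const FloatBlock& block, std::size_t t, std::size_t c) {
  assert(block.shape.size() == 2);
  assert(t < block.shape[0] && c < block.shape[1]);
  return block.data[t * block.shape[1] + c];
}
```

A multi-array view performs this index arithmetic for us, which is why the number of dimensions has to be fixed when the view is created.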
Note that we need to know the DTYPE at compile time when using DataBlock. And if we want to use DataBlock.as_multi_array, then we also need to know the number of dimensions NDIMS at compile time.
The I/O backend is responsible for implementing the actual readDataset and readAttribute methods used for reading data from disk. The methods are also responsible for allocating appropriate memory with the respective data type. The functions return the data as DataBlockGeneric, which stores the data as untyped std::any. The user can then cast the data to the appropriate type as discussed in DataBlock with typed data.
Objects with an assigned neurodata_type are represented by corresponding classes in AqNWB. To read objects with an assigned type, we therefore need to be able to instantiate the corresponding classes in AqNWB based on the data from a file. The following figure illustrates the main components of this process.
The main components involved in reading typed objects from an NWB file via AqNWB are:
RegisteredType maintains a registry of all classes that inherit from it and the types they represent. We can retrieve the full registry via the static method getFactoryMap and a list of just the full type names that are in the registry via getRegistry. Importantly, RegisteredType provides static create methods that we can use to instantiate any registered subclass just using the io object and path for the object in the file. RegisteredType can read the type information from the corresponding namespace and neurodata_type attributes to determine the full type, then look up the corresponding class in its registry, and then create the type. Using RegisteredType::readField also provides a general mechanism for reading arbitrary fields.
Child classes of RegisteredType (e.g., Container or Data) then implement specific neurodata_types defined in the NWB schema. The subclasses register with RegisteredType, such that we can look them up and determine which class represents which neurodata_type.
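The registry idea can be illustrated with a generic self-registering factory. This is a sketch of the general pattern only, under the assumption that types are keyed by their full name; TypeBase, Series and registerType are hypothetical names and do not reflect how RegisteredType is actually implemented.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical base class; the registry maps full type names to factories.
class TypeBase {
public:
  using Factory =
      std::function<std::shared_ptr<TypeBase>(const std::string& path)>;

  explicit TypeBase(std::string path) : m_path(std::move(path)) {}
  virtual ~TypeBase() = default;
  virtual std::string typeName() const = 0;
  const std::string& path() const { return m_path; }

  // Function-local static avoids initialization-order problems.
  static std::unordered_map<std::string, Factory>& registry() {
    static std::unordered_map<std::string, Factory> instance;
    return instance;
  }
  // Returns false if the name was already registered.
  static bool registerType(const std::string& fullName, Factory factory) {
    assert(factory != nullptr);
    return registry().emplace(fullName, std::move(factory)).second;
  }
  // Returns nullptr when the type name is unknown.
  static std::shared_ptr<TypeBase> create(const std::string& fullName,
                                          const std::string& path) {
    auto it = registry().find(fullName);
    if (it == registry().end()) {
      return nullptr;
    }
    return it->second(path);
  }

private:
  std::string m_path;
};

// Hypothetical subclass representing one type.
class Series : public TypeBase {
public:
  using TypeBase::TypeBase;
  std::string typeName() const override { return "core::Series"; }
};

// Registration happens once, during static initialization.
static const bool seriesRegistered = TypeBase::registerType(
    "core::Series",
    [](const std::string& path) { return std::make_shared<Series>(path); });
```

With such a registry, the full type name read from a file is enough to construct the matching class: TypeBase::create("core::Series", "/acquisition/ts") returns a Series behind a base-class pointer.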
All data reads are implemented lazily, i.e., AqNWB does not load data into memory until we make a request to do so. To access data lazily, datasets and attributes are wrapped via ReadDataWrapper with the appropriate OTYPE object type template parameter set. The Container object that owns the dataset/attribute then provides accessor methods to get access to the dataset/attribute. Here, we access the data dataset of the ElectricalSeries.
In particular for fields that are optional, it is useful to first check that the field actually exists using the exists method of our ReadDataWrapper.
To load the data values, we can then use the valuesGeneric and values methods, which load the data as generic (untyped) or typed data, respectively.
The data is here represented as a DataBlock, which stores the data as a 1-dimensional vector along with the shape of the data. E.g., here we validate the data against the original mock data:
To ease interaction with multi-dimensional data, e.g., the (time x channel) data of our ElectricalSeries, we can use the DataBlock.as_multi_array method to construct a boost::const_multi_array_ref.
Using boost multi-array simplifies access and interaction with the data as a multi-dimensional array. Here we use this again to validate the data we loaded against the original mock, like we did above.
Reading an Attribute from a file works much in the same way as reading a Dataset. The main differences when we read an attribute are:
The NWB schema defines float32 as the dtype for the resolution attribute. As such, the VTYPE (value type) template parameter of ReadAttributeWrapper<OTYPE, VTYPE> is set to float by default, so we do not need to specify it. If for some reason a file should use float64 instead, then we can still set the VTYPE accordingly (double in C++) via electricalSeries->readDataResolution<double>().
Since values() then returns a DataBlock<float>, we can also infer the return type of the values() function at compile time via decltype(readDataResolutionWrapper->values()) resolutionValueFloat = readDataResolutionWrapper->values();
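The effect of a defaulted value-type parameter combined with decltype can be sketched in isolation. TypedBlock, AttributeReader and firstResolution are hypothetical names chosen for this illustration, and the 0.5 value is made up; only the decltype mechanics carry over.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical typed block holding flattened values.
template <typename DTYPE>
struct TypedBlock {
  std::vector<DTYPE> data;
};

// Hypothetical wrapper whose values() return type depends on VTYPE.
template <typename VTYPE = float>
struct AttributeReader {
  TypedBlock<VTYPE> values() const { return TypedBlock<VTYPE>{{VTYPE(0.5)}}; }
};

inline float firstResolution() {
  AttributeReader<> reader;  // VTYPE defaults to float
  // The compiler deduces TypedBlock<float> from the call expression.
  decltype(reader.values()) block = reader.values();
  static_assert(std::is_same_v<decltype(block), TypedBlock<float>>,
                "default VTYPE is float");
  assert(!block.data.empty());
  return block.data[0];
}
```

Overriding the default, e.g. AttributeReader<double>, changes the deduced type to TypedBlock<double> without touching the calling code.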
So far we read data by specifying the VTYPE template parameter of the read wrapper. However, if we do not know (or do not want to specify) the VTYPE, then we can set it to std::any, which is the default for data with variable type, e.g., ElectricalSeries::readData. In this case, we can still read the data via valuesGeneric to load the data first in untyped form. When loading the data, the I/O backend determines the data type and allocates memory appropriately. The actual data type is then stored in the typeIndex variable of our data block. We can then convert our DataBlockGeneric to a DataBlock<DTYPE> with a specific data type via DataBlock<DTYPE>::fromGeneric().
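One way to act on a type that is only known at runtime is a lookup table keyed by std::type_index. The sketch below assumes the untyped block records its element type next to the std::any payload; GenericBlock, sumAs and sumBlock are hypothetical names, and summing is just a placeholder for whatever typed processing is needed.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <vector>

// Hypothetical untyped block with the element type recorded by the reader.
struct GenericBlock {
  std::any data;                             // std::vector<T> for some T
  std::type_index typeIndex = typeid(void);  // runtime element type
};

// Sums the values of a block whose element type T is known.
template <typename T>
double sumAs(const GenericBlock& block) {
  assert(block.typeIndex == std::type_index(typeid(T)));
  double total = 0.0;
  for (T v : std::any_cast<const std::vector<T>&>(block.data)) {
    total += static_cast<double>(v);
  }
  return total;
}

// Picks the handler for the runtime type from a map; handlers.at throws
// std::out_of_range for element types that are not listed.
inline double sumBlock(const GenericBlock& block) {
  using Handler = std::function<double(const GenericBlock&)>;
  static const std::unordered_map<std::type_index, Handler> handlers = {
      {typeid(float), [](const GenericBlock& b) { return sumAs<float>(b); }},
      {typeid(double), [](const GenericBlock& b) { return sumAs<double>(b); }},
      {typeid(int), [](const GenericBlock& b) { return sumAs<int>(b); }},
  };
  return handlers.at(block.typeIndex)(block);
}
```

Compared with a chain of if/else comparisons, the map keeps the supported types in one place; adding a type means adding one entry.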
If the data type is only known at runtime, we can select the matching conversion with switch/case logic or by using a map for lookup.
Next we stop the recording and close the file so we can show how we can read from the file we just created.
Using the findType() function of our I/O object we can conveniently search for objects with a given type.
The search begins at a starting path (e.g., the root "/"). Using STOP_ON_TYPE does not recurse further into defined types, hence, this mode is useful if we only want to search for objects that the object at the starting path manages directly.
The types to search for are specified via typesToSearch.
The returned std::unordered_map uses the full path to the object as key and the full type (i.e., namespace::neurodata_type) as value, which is all we need to read the objects.
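To illustrate how a stop-on-type search differs from one that keeps descending, here is a sketch over a mock hierarchy. It is not the findType() implementation: MockFile, search and isDirectChild are hypothetical, and CONTINUE_ON_TYPE is an assumed name for the contrasting mode, introduced only for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <unordered_set>

// STOP_ON_TYPE follows the text above; CONTINUE_ON_TYPE is an assumed name.
enum class SearchMode { STOP_ON_TYPE, CONTINUE_ON_TYPE };

// Mock file: object path -> full type ("" means an untyped group).
using MockFile = std::map<std::string, std::string>;

// True if path is exactly one level below parent.
inline bool isDirectChild(const std::string& parent, const std::string& path) {
  const std::string prefix = (parent == "/") ? "/" : parent + "/";
  return path.size() > prefix.size() &&
         path.compare(0, prefix.size(), prefix) == 0 &&
         path.find('/', prefix.size()) == std::string::npos;
}

// Collects path -> type for all matching objects below start.
inline void search(const MockFile& file, const std::string& start,
                   const std::unordered_set<std::string>& typesToSearch,
                   SearchMode mode,
                   std::unordered_map<std::string, std::string>& found) {
  assert(!start.empty());
  for (const auto& [path, type] : file) {
    if (!isDirectChild(start, path)) {
      continue;
    }
    const bool typed = !type.empty();
    if (typed && typesToSearch.count(type) > 0) {
      found[path] = type;
    }
    // Untyped groups are always descended into; typed objects only when
    // the mode asks us to continue past them.
    if (!typed || mode == SearchMode::CONTINUE_ON_TYPE) {
      search(file, path, typesToSearch, mode, found);
    }
  }
}
```

In this sketch a typed object nested inside another typed object is reported only in the continuing mode, which mirrors the idea that a stop-on-type search returns just the objects managed directly below the starting path.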
To read a neurodata_type object from an existing file, we can use the RegisteredType::create factory methods to conveniently construct an instance of the corresponding class in AqNWB.
Option 1: When passing only 1) path and 2) io (as in the example above), AqNWB reads the neurodata_type and namespace attributes from the NWB file to automatically determine the class to use to represent the type.
Option 2: When passing 1) the fullname (e.g., core::ElectricalSeries), 2) path and 3) io, AqNWB looks up the class to use in RegisteredType's type registry (see also How to Use the RegisteredType Registry).
Option 3: When using the templated form, e.g., create<AQNWB::NWB::ElectricalSeries>(path, io); the instance is constructed using the common constructor, i.e., this is equivalent to creating the object via ElectricalSeries(path, io).
Options 1 and 2 instantiate the specific type (e.g., ElectricalSeries) but return a generic RegisteredType pointer that we can cast to the specific type if necessary, e.g., via auto readElectricalSeries = std::dynamic_pointer_cast<AQNWB::NWB::ElectricalSeries>(readRegisteredType);. Option 3 creates and returns a pointer to the specific type directly.
Now we can read fields and subsets of data from the fields as before.
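The difference between receiving a generic pointer and a concrete one can be shown with a minimal class hierarchy. BaseType, Series, Table, createGeneric and createTyped are hypothetical stand-ins; channelCount is an invented accessor that only exists to show why the downcast is needed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal hierarchy standing in for a base type and subclasses.
class BaseType {
public:
  explicit BaseType(std::string path) : m_path(std::move(path)) {}
  virtual ~BaseType() = default;
  const std::string& path() const { return m_path; }

private:
  std::string m_path;
};

class Series : public BaseType {
public:
  using BaseType::BaseType;
  int channelCount() const { return 4; }  // illustrative subclass-only accessor
};

class Table : public BaseType {
public:
  using BaseType::BaseType;
};

// Generic creation returns a base pointer (in the spirit of Options 1 and 2).
inline std::shared_ptr<BaseType> createGeneric(const std::string& path) {
  return std::make_shared<Series>(path);
}

// Templated creation returns the concrete type directly (like Option 3).
template <typename T>
std::shared_ptr<T> createTyped(const std::string& path) {
  assert(!path.empty());
  return std::make_shared<T>(path);
}
```

std::dynamic_pointer_cast<Series>(createGeneric(path)) yields a usable Series pointer, while casting the same object to the unrelated Table yields nullptr, so the result of a downcast should be checked before use.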
Even if there is no dedicated DEFINE_FIELD definition available, we can still read any arbitrary sub-field associated with a particular RegisteredType via the generic RegisteredType::readField method. The main difference is that for datasets and attributes we need to specify all the additional information (e.g., the relative path, object type, and data type) ourselves, whereas using DEFINE_FIELD this information has already been specified for us. For example, to read the data from the ElectricalSeries we can call:
Similarly, we can also read any sub-fields that are themselves RegisteredType objects via RegisteredType::readField (e.g., to read custom VectorData columns of a DynamicTable). In contrast to dataset and attribute fields, we here only need to specify the relative path of the field. RegisteredType in turn can read the type information from the neurodata_type and namespace attributes in the file directly.