aqnwb 0.1.0
Reading data

Introduction

Reading data from an open NWB file via AqNWB consists of the following main steps.

  1. Create the I/O object to read the file
  2. Construct the container object for the neurodata_type (e.g., a TimeSeries) for read via the RegisteredType::create factory method, e.g., auto electricalSeries = RegisteredType::create<ElectricalSeries>(electricalSeriesPath, io);
  3. Access a dataset or attribute for read by retrieving a wrapper object that provides lazy read access to the particular dataset or attribute:
    // Get a ReadDatasetWrapper<float> for lazy reading of ElectricalSeries.data
    // Specifying the value type as a template parameter allows us to read
    // typed data
    auto readDataWrapper = electricalSeries->readData<float>();
  4. Request the parts of the data of interest, at which point the data is loaded from disk:
    // Read the full ElectricalSeries.data back
    DataBlock<float> dataValues = readDataWrapper->values();

In the following sections we dive deeper into the Software Design, describing the different classes involved in reading data and their responsibilities. We then show a more detailed Example to illustrate how read works in the overall context of data acquisition.

Software Design

Reading datasets and attributes

The following figure shows the main classes involved with reading data from a dataset or attribute.

The main components involved in reading data from an NWB file via AqNWB are:

  • BaseIO, HDF5IO are responsible for reading data from disk and allocating memory for data on read.
  • DataBlockGeneric represents a generic, n-dimensional block of data loaded from a file, storing the data as a generic std::any along with the shape of the data.
  • ReadDataWrapper is a simple wrapper class that represents a dataset/attribute for read, enabling lazy data read and allowing for transparent use of different I/O backends.
  • Container type classes represent Groups with an assigned neurodata_type in the NWB format, and are responsible for providing access to the datasets/attributes that they own. To provide access, these classes create ReadDataWrapper objects for the user for lazy read access to the data.

We will discuss these different components in a bit more detail next.

Container

The Container class (e.g., ElectricalSeries or NWBFile) is responsible for exposing read access to its specific datasets and attributes by providing appropriate access functions, which return ReadDataWrapper<AQNWB::Types::StorageObjectType::Dataset> or ReadDataWrapper<AQNWB::Types::StorageObjectType::Attribute> objects for lazily reading from the dataset/attribute.

ReadDataWrapper

The ReadDataWrapper stores a shared pointer m_io to the I/O object and the path to the dataset.

The valuesGeneric method then allows us to read all or parts of the dataset into memory as std::any. This function uses the readDataset method of the I/O backend (e.g., HDF5IO.readDataset) to load the data. The I/O backend in turn takes care of allocating the memory for the appropriate data type and loading the data from disk.

We can retrieve data directly with the appropriate type by using the templated values function instead, which uses valuesGeneric and then automatically casts the data to a typed DataBlock<DTYPE> instead of returning an untyped DataBlockGeneric.
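The lazy-read pattern described above can be sketched in a few lines. This is a simplified illustration and not the AqNWB implementation: GenericBlockSketch, MockIOSketch, and LazyReadSketch are hypothetical stand-ins for DataBlockGeneric, the I/O backend, and ReadDataWrapper, and the mock backend returns fixed in-memory values instead of reading a file.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical stand-in for DataBlockGeneric: untyped data plus its shape.
struct GenericBlockSketch {
  std::any data;                   // holds a std::vector<T> chosen by the backend
  std::vector<std::size_t> shape;  // extent of each dimension
  std::type_index typeIndex;       // runtime type of the stored elements
};

// Hypothetical stand-in for an I/O backend that counts its reads.
class MockIOSketch {
public:
  GenericBlockSketch readDataset(const std::string& /*path*/) {
    ++readCount;  // a real backend would access the file here
    return GenericBlockSketch{
        std::vector<float>{1.f, 2.f, 3.f, 4.f}, {2, 2}, typeid(float)};
  }
  int readCount = 0;
};

// Hypothetical stand-in for the read wrapper: stores only the I/O object
// and the path; nothing is loaded until a values method is called.
template<typename VTYPE>
class LazyReadSketch {
public:
  LazyReadSketch(std::shared_ptr<MockIOSketch> io, std::string path)
      : m_io(std::move(io)), m_path(std::move(path)) {}

  // Untyped read: the backend is called here, not in the constructor.
  GenericBlockSketch valuesGeneric() const { return m_io->readDataset(m_path); }

  // Typed read: load generically, then cast to the requested value type.
  std::vector<VTYPE> values() const {
    GenericBlockSketch block = valuesGeneric();
    assert(block.typeIndex == std::type_index(typeid(VTYPE)));
    return std::any_cast<std::vector<VTYPE>>(std::move(block.data));
  }

private:
  std::shared_ptr<MockIOSketch> m_io;
  std::string m_path;
};
```

Constructing a LazyReadSketch<float> leaves readCount at zero; the counter only increases once values() or valuesGeneric() is called.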

Note
ReadDataWrapper takes two template parameters: 1) OTYPE, an AQNWB::Types::StorageObjectType value specifying the type of object being wrapped, and 2) VTYPE, the value type of the data. For attributes, slicing is disabled at compile time, i.e., attributes are always loaded fully into memory, since attributes are intended for small data only.
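One common way to disable a member function at compile time based on a template parameter is std::enable_if. The following is a minimal sketch of that technique under stated assumptions; ObjectKindSketch, WrapperSketch, and CanSliceSketch are hypothetical names, the data lives in memory, and the actual mechanism used by ReadDataWrapper may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical mirror of the object-type tag passed as OTYPE.
enum class ObjectKindSketch { Dataset, Attribute };

template<ObjectKindSketch OTYPE, typename VTYPE>
class WrapperSketch {
public:
  explicit WrapperSketch(std::vector<VTYPE> values)
      : m_values(std::move(values)) {}

  // Full read: available for datasets and attributes alike.
  std::vector<VTYPE> values() const { return m_values; }

  // Sliced read: takes part in overload resolution only for datasets.
  template<ObjectKindSketch K = OTYPE,
           std::enable_if_t<K == ObjectKindSketch::Dataset, int> = 0>
  std::vector<VTYPE> values(std::size_t start, std::size_t count) const {
    assert(start + count <= m_values.size());
    const auto first = m_values.begin() + static_cast<std::ptrdiff_t>(start);
    return std::vector<VTYPE>(first, first + static_cast<std::ptrdiff_t>(count));
  }

private:
  std::vector<VTYPE> m_values;
};

// Detection helper: true if T offers a values(start, count) overload.
template<typename T, typename = void>
struct CanSliceSketch : std::false_type {};

template<typename T>
struct CanSliceSketch<
    T,
    std::void_t<decltype(std::declval<const T&>().values(std::size_t{},
                                                         std::size_t{}))>>
    : std::true_type {};
```

With this sketch, calling values(start, count) on an Attribute wrapper is a compile error rather than a runtime failure, while values() without arguments works for both kinds.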

DataBlockGeneric and DataBlock

At first, data values are always represented as a DataBlockGeneric object, which stores the data as std::any along with the shape of the data. For example, ReadDatasetWrapper.valuesGeneric and HDF5IO.readDataset return a DataBlockGeneric. This has the advantage that we can let the backend handle memory allocation and typing for us and load data even if we don't know the type yet.

DataBlock with typed data

To cast the data to the appropriate specific type (e.g., float) we can then create a DataBlock with the appropriate data type via the DataBlock.fromGeneric factory method. DataBlock is templated on the specific data type, i.e., we call DataBlock<float>.fromGeneric(myGenericDataBlock). DataBlock then stores the data as an appropriately typed 1-dimensional std::vector along with the shape of the data.

Note
The DataBlock.fromGeneric (and DataBlock.as_multi_array) use casting and referencing to transform the data without making additional copies of the data.
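To illustrate the idea of converting an untyped block into a typed one, here is a minimal sketch based on std::any_cast. GenericBlockSketch and TypedBlockSketch are hypothetical stand-ins for DataBlockGeneric and DataBlock; the real fromGeneric may be implemented differently. Moving the vector out of the std::any avoids copying the elements when the caller passes an rvalue.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical stand-in for DataBlockGeneric.
struct GenericBlockSketch {
  std::any data;                   // expected to hold a std::vector<T>
  std::vector<std::size_t> shape;  // empty shape means a scalar
  std::type_index typeIndex;       // runtime element type
};

// Hypothetical stand-in for DataBlock<DTYPE>.
template<typename DTYPE>
struct TypedBlockSketch {
  std::vector<DTYPE> data;         // flat, 1-dimensional storage
  std::vector<std::size_t> shape;  // n-dimensional shape of the data

  // Takes the generic block by value; pass an rvalue to avoid a copy.
  // Throws std::bad_any_cast if the block does not hold std::vector<DTYPE>.
  static TypedBlockSketch fromGeneric(GenericBlockSketch generic) {
    TypedBlockSketch result;
    result.data =
        std::any_cast<std::vector<DTYPE>>(std::move(generic.data));
    result.shape = std::move(generic.shape);
    return result;
  }

  // Number of elements implied by the shape (1 for an empty, scalar shape).
  std::size_t expectedSize() const {
    return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                           std::multiplies<std::size_t>());
  }
};
```

Requesting the wrong element type, e.g., TypedBlockSketch<int>::fromGeneric on a block holding float values, raises std::bad_any_cast in this sketch instead of silently reinterpreting the data.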

Using Boost Multi Array for N-Dimensional Data

To simplify access to multi-dimensional data, we can then represent the data as a boost::multi_array. The DataBlock.as_multi_array convenience method generates a boost::const_multi_array_ref<DTYPE, NDIMS> for us. Here the DTYPE template parameter is the same as for the DataBlock (so that we don't have to specify it again), and the NDIMS template parameter is the number of dimensions (which is the same as shape.size()).

Note
Since we are in a strongly typed language, we here need to know the DTYPE at compile time when using DataBlock. And if we want to use the DataBlock.as_multi_array, then we also need to know the number of dimensions NDIMS at compile time.
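To make the relationship between the flat data vector and the shape concrete, the sketch below implements a small read-only, row-major view by hand. It is an illustration only: ConstViewSketch is a hypothetical class that assumes row-major (C-order) storage, and it is not a replacement for boost::const_multi_array_ref. As with as_multi_array, both DTYPE and NDIMS must be known at compile time.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read-only N-dimensional view over a flat, row-major buffer.
// The view references the vector's storage and does not copy it.
template<typename DTYPE, std::size_t NDIMS>
class ConstViewSketch {
public:
  ConstViewSketch(const std::vector<DTYPE>& data,
                  const std::vector<std::size_t>& shape)
      : m_data(data.data()) {
    assert(shape.size() == NDIMS);
    std::size_t total = 1;
    for (std::size_t d = 0; d < NDIMS; ++d) {
      m_shape[d] = shape[d];
      total *= shape[d];
    }
    assert(total == data.size());
  }

  // Row-major offset: ((i0 * s1 + i1) * s2 + i2) ...
  const DTYPE& at(const std::array<std::size_t, NDIMS>& index) const {
    std::size_t offset = 0;
    for (std::size_t d = 0; d < NDIMS; ++d) {
      assert(index[d] < m_shape[d]);
      offset = offset * m_shape[d] + index[d];
    }
    return m_data[offset];
  }

private:
  const DTYPE* m_data;
  std::array<std::size_t, NDIMS> m_shape {};
};
```

For a (time x channel) block with shape {3, 2}, at({t, c}) resolves to element t * 2 + c of the flat vector.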

I/O

The I/O backend is responsible for implementing the actual readDataset and readAttribute methods used for reading data from disk. The methods are also responsible for allocating appropriate memory with the respective data type. The functions return the data as DataBlockGeneric, which stores the data as untyped std::any. The user can then cast the data to the appropriate type as discussed in DataBlock with typed data.

Reading typed objects

Objects with an assigned neurodata_type are represented by corresponding classes in AqNWB. To read objects with an assigned type, we therefore need to be able to instantiate the corresponding classes in AqNWB based on the data from a file. The following figure illustrates the main components of this process.

The main components involved in reading typed objects from an NWB file via AqNWB are:

  • RegisteredType as the main base class for all classes implementing a type, e.g., Container, Data and all their subtypes. RegisteredType is responsible for managing all type classes and provides the create factory methods for creating instances of subclasses from a file.
  • BaseIO, HDF5IO are responsible for i) reading type attribute and group information, ii) searching the file for typed objects via the findTypes() method, and iii) retrieving the paths of all objects associated with a storage object (e.g., a Group) via getStorageObjects().

RegisteredType

RegisteredType maintains a registry of all classes that inherit from it and the types they represent. We can retrieve the full registry via the static method getFactoryMap and a list of just the full type names that are in the registry via getRegistry. Importantly, RegisteredType provides static create methods that we can use to instantiate any registered subclass using just the io object and the path of the object in the file. RegisteredType reads the type information from the corresponding namespace and neurodata_type attributes to determine the full type, looks up the corresponding class in its registry, and then creates an instance of that class. RegisteredType::readField also provides a general mechanism for reading arbitrary fields.

Child classes of RegisteredType (e.g., Container)

Child classes of RegisteredType (e.g., Container or Data) then implement specific neurodata_types defined in the NWB schema. The subclasses register with RegisteredType, such that we can look them up and determine which class represents which neurodata_type.
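The general registry-plus-factory pattern can be sketched as follows. This is a simplified illustration and not the AqNWB implementation: TypeBaseSketch, SeriesSketch, registerType, and the string-keyed registry are hypothetical, and the sketch takes the full type name as an argument instead of reading it from file attributes.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical base class that owns the type registry.
class TypeBaseSketch {
public:
  explicit TypeBaseSketch(std::string path) : m_path(std::move(path)) {}
  virtual ~TypeBaseSketch() = default;
  virtual std::string fullTypeName() const = 0;
  const std::string& path() const { return m_path; }

  using Factory =
      std::function<std::unique_ptr<TypeBaseSketch>(const std::string&)>;

  // Function-local static avoids static-initialization-order problems.
  static std::unordered_map<std::string, Factory>& registry() {
    static std::unordered_map<std::string, Factory> instance;
    return instance;
  }

  // Associate a full type name with a factory for class T.
  template<typename T>
  static bool registerType(const std::string& fullName) {
    registry()[fullName] = [](const std::string& path) {
      return std::make_unique<T>(path);
    };
    return true;
  }

  // Look up the class for fullName and construct it; nullptr if unknown.
  static std::unique_ptr<TypeBaseSketch> create(const std::string& fullName,
                                                const std::string& path) {
    auto it = registry().find(fullName);
    if (it == registry().end()) {
      return nullptr;
    }
    return it->second(path);
  }

private:
  std::string m_path;
};

// Hypothetical subclass representing one neurodata_type.
class SeriesSketch : public TypeBaseSketch {
public:
  using TypeBaseSketch::TypeBaseSketch;
  std::string fullTypeName() const override { return "core::ElectricalSeries"; }
};

// Registration happens once, during static initialization.
static const bool seriesRegistered =
    TypeBaseSketch::registerType<SeriesSketch>("core::ElectricalSeries");
```

The caller receives a base-class pointer and can downcast it when the concrete type is needed, which mirrors how a generic RegisteredType pointer can be cast to a specific type.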

Note
For more details about the design of the RegisteredType class and the various components involved with creating and managing the type registry, please see the developer docs on Implementing a new Neurodata Type.

Example

Create an NWB file as usual

Setup mock data for write

// setup mock data for writing
SizeType numSamples = 100;
SizeType numChannels = 2;
std::vector<Types::ChannelVector> mockArrays = getMockChannelArrays();
BaseDataType dataType = BaseDataType::F32;
std::vector<std::string> mockChannelNames =
    getMockChannelArrayNames("esdata");
std::vector<std::vector<float>> mockData =
    getMockData2D(numSamples, numChannels);
std::vector<double> mockTimestamps = getMockTimestamps(numSamples, 1);
// To verify that the data was written correctly, we here transpose the
// mockData (which is per channel) to the (time x channel) layout used
// in the ElectricalSeries in the NWB file so we can compare
std::vector<std::vector<float>> mockDataTransposed;
mockDataTransposed.resize(numSamples);
for (SizeType s = 0; s < numSamples; s++) {
  mockDataTransposed[s].resize(numChannels);
  for (SizeType c = 0; c < numChannels; c++) {
    mockDataTransposed[s][c] = mockData[c][s];
  }
}

Create the NWBFile and record data

// setup io object
std::string path = getTestFilePath("ElectricalSeriesReadExample.h5");
std::shared_ptr<BaseIO> io = createIO("HDF5", path);
io->open();
// setup the NWBFile
NWB::NWBFile nwbfile(io);
Status initStatus = nwbfile.initialize(generateUuid());
REQUIRE(initStatus == Status::Success);
// create the RecordingContainer for managing recordings
std::unique_ptr<NWB::RecordingContainers> recordingContainers =
    std::make_unique<NWB::RecordingContainers>();
// create a new ElectricalSeries
Status resultCreate = nwbfile.createElectricalSeries(
    mockArrays, mockChannelNames, dataType, recordingContainers.get());
REQUIRE(resultCreate == Status::Success);
// get the new ElectricalSeries
NWB::ElectricalSeries* electricalSeries =
    static_cast<NWB::ElectricalSeries*>(
        recordingContainers->getContainer(0));
REQUIRE(electricalSeries != nullptr);
// start recording
Status resultStart = io->startRecording();
REQUIRE(resultStart == Status::Success);
// write channel data
for (SizeType ch = 0; ch < numChannels; ++ch) {
  electricalSeries->writeChannel(
      ch, numSamples, mockData[ch].data(), mockTimestamps.data());
}
io->flush();

Reading Datasets and Attributes

Lazy data access

All data read is implemented lazily, i.e., AqNWB does not load data into memory until we make a request to do so. To access data lazily, datasets and attributes are wrapped via ReadDataWrapper with the OTYPE template parameter set to the appropriate object type. The Container object that owns the dataset/attribute then provides accessor methods to get access to the dataset/attribute. Here, we access the data dataset of the ElectricalSeries.

// Get a ReadDatasetWrapper<float> for lazy reading of ElectricalSeries.data
// Specifying the value type as a template parameter allows us to read
// typed data
auto readDataWrapper = electricalSeries->readData<float>();

Check that the object exists

In particular for fields that are optional, it is useful to first check that the field actually exists using the exists method of our ReadDataWrapper.

REQUIRE(readDataWrapper->exists());

Read data into memory

To load the data values, we can then use the valuesGeneric and values methods, which load the data as generic (untyped) or typed data, respectively.

// Read the full ElectricalSeries.data back
DataBlock<float> dataValues = readDataWrapper->values();

The data is represented here as a DataBlock, which stores the data as a 1-dimensional vector along with the shape of the data. For example, here we validate the data against the original mock data:

// Check that the data we read has the expected size and shape
REQUIRE(dataValues.data.size() == (numSamples * numChannels));
REQUIRE(dataValues.shape[0] == numSamples);
REQUIRE(dataValues.shape[1] == numChannels);
REQUIRE(dataValues.typeIndex == typeid(float));
// Iterate through all the time steps
for (SizeType t = 0; t < numSamples; t++) {
  // Get the data for the single time step t from the DataBlock
  std::vector<float> selectedRange(
      dataValues.data.begin()
          + static_cast<std::vector<float>::difference_type>(
              t * numChannels),
      dataValues.data.begin()
          + static_cast<std::vector<float>::difference_type>(
              (t + 1) * numChannels));
  // Check that the values are correct
  REQUIRE_THAT(selectedRange,
               Catch::Matchers::Approx(mockDataTransposed[t]).margin(1));
}

Accessing multi-dimensional data as Boost multi-array

To ease interaction with multi-dimensional data, e.g., the (time x channel) data of our ElectricalSeries, we can use the DataBlock.as_multi_array method to construct a boost::const_multi_array_ref.

// Use the boost multi-array feature to simplify interaction with data
auto boostMultiArray = dataValues.as_multi_array<2>();

Using boost multi-array simplifies access and interaction with the data as a multi-dimensional array. Here we use this again to validate the data we loaded against the original mock, like we did above.

// Iterate through all the time steps again, but now using the boost array
for (SizeType t = 0; t < numSamples; t++) {
  // Access [t, :], i.e., get a 1D array with the data
  // from all channels for time step t.
  auto row_t = boostMultiArray[static_cast<long>(t)];
  // Compare to check that the data is correct.
  std::vector<float> row_t_vector(
      row_t.begin(), row_t.end());  // convert to std::vector for comparison
  REQUIRE_THAT(row_t_vector,
               Catch::Matchers::Approx(mockDataTransposed[t]).margin(1));
}

Reading an attribute

Reading an Attribute from a file works in much the same way as reading a Dataset. The main differences when reading an attribute are:

  1. The ReadDataWrapper is created with the AQNWB::Types::StorageObjectType::Attribute template type instead of AQNWB::Types::StorageObjectType::Dataset
  2. The variants of valuesGeneric or values that accept arguments for slicing are disabled at compile time.
// Get a ReadDataWrapper<StorageObjectType::Attribute, float> to read data
// lazily
auto readDataResolutionWrapper = electricalSeries->readDataResolution();
// Read the data values
DataBlock<float> resolutionValueFloat = readDataResolutionWrapper->values();
REQUIRE(resolutionValueFloat.shape.empty()); // Scalar
REQUIRE(resolutionValueFloat.data.size() == 1);
REQUIRE(int(resolutionValueFloat.data[0]) == -1);
REQUIRE(resolutionValueFloat.typeIndex == typeid(float));
Note
In this case, the NWB schema specifies float32 as the dtype for the resolution attribute. As such, the VTYPE (value type) template parameter of ReadAttributeWrapper<OTYPE, VTYPE> is set to float by default, so we do not need to specify it. If for some reason a file uses float64 instead, we can still set the VTYPE accordingly via electricalSeries->readDataResolution<double>().
If we don't want to specify the DataBlock<float>, then we can also infer the return type of the values() function at compile time via decltype(readDataResolutionWrapper->values()) resolutionValueFloat = readDataResolutionWrapper->values();

Reading data with unknown type

So far we have read data by specifying the VTYPE template parameter of the read wrapper. However, if we do not know, or do not want to specify, the VTYPE, then we can set it to std::any, which is the default for data with variable type, e.g., ElectricalSeries::readData. In this case, we can still read the data via valuesGeneric, which loads the data in untyped form first. When loading the data, the I/O backend determines the data type and allocates memory appropriately. The actual data type is then stored in the typeIndex variable of our data block. We can then convert our DataBlockGeneric to a DataBlock<DTYPE> with a specific data type via DataBlock<DTYPE>::fromGeneric().

// Get a generic ReadDatasetWrapper<std::any> for lazy reading of
// ElectricalSeries.data
auto readDataWrapperGeneric = electricalSeries->readData();
// Instead of using values() to read typed data, we can read data as generic
// data first via valuesGeneric
DataBlockGeneric dataValuesGeneric =
readDataWrapperGeneric->valuesGeneric();
// Note that the I/O backend determines the data type and allocates
// the memory for us. The std::type_index is stored in our data block as
// well
REQUIRE(dataValuesGeneric.typeIndex == typeid(float));
// We can then later convert the data block to a typed data block
DataBlock<float> dataValueFloat =
DataBlock<float>::fromGeneric(dataValuesGeneric);
Note
In most cases, we should not need runtime checking of types in the context of specific data acquisition systems. This is most likely relevant if one wants to consume arbitrary NWB files that may use different data types. One approach to implement behavior for types determined at runtime is to define a mapping of the type information to the corresponding statically typed functionality, e.g., via switch/case logic or by using a map for lookup, such as:
DataBlockGeneric dataValuesGeneric = readDataWrapperGeneric->valuesGeneric();
// Map to associate std::type_index with corresponding type-specific functions
std::unordered_map<std::type_index,
                   std::function<void(const DataBlockGeneric&)>>
    typeMap = {
        {typeid(float), processData<float>},
        {typeid(int), processData<int>},
        // Add more types as needed
    };
// Use the map to process the data with the appropriate type
auto it = typeMap.find(dataValuesGeneric.typeIndex);
if (it != typeMap.end()) {
  it->second(dataValuesGeneric);  // call the correct processData function
} else {
  std::cout << "Unsupported type" << std::endl;
}
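The snippet above leaves processData undefined. A self-contained variant of the same map-based dispatch might look like the following sketch, where GenericBlockSketch is a hypothetical stand-in for DataBlockGeneric and sumValues and dispatchSum are hypothetical example functions chosen purely for illustration.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for DataBlockGeneric.
struct GenericBlockSketch {
  std::any data;                   // holds a std::vector<T>
  std::vector<std::size_t> shape;  // extent of each dimension
  std::type_index typeIndex;       // runtime element type
};

// Hypothetical type-specific handler: sums all values as double.
template<typename T>
double sumValues(const GenericBlockSketch& block) {
  const auto& values = std::any_cast<const std::vector<T>&>(block.data);
  double total = 0.0;
  for (const T& v : values) {
    total += static_cast<double>(v);
  }
  return total;
}

// Dispatch on the runtime type; returns false for unsupported types.
inline bool dispatchSum(const GenericBlockSketch& block, double& result) {
  static const std::unordered_map<
      std::type_index,
      std::function<double(const GenericBlockSketch&)>>
      typeMap = {
          {std::type_index(typeid(float)), sumValues<float>},
          {std::type_index(typeid(int)), sumValues<int>},
          // Add more types as needed
      };
  auto it = typeMap.find(block.typeIndex);
  if (it == typeMap.end()) {
    return false;
  }
  result = it->second(block);
  return true;
}
```

Returning a bool for unsupported types keeps the decision about error handling with the caller; throwing an exception would be an equally reasonable choice.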

Finalize the recording

Next we stop the recording and close the file so we can show how we can read from the file we just created.

// Stop the recording
io->flush();
io->stopRecording();
io->close();

Reading from an existing file

Opening an existing file for reading

// Open a new I/O for reading
std::shared_ptr<BaseIO> readio = createIO("HDF5", path);
readio->open(FileMode::ReadOnly);

Searching for Registered Type Objects (e.g., ElectricalSeries)

Using the findTypes() function of our I/O object, we can conveniently search for objects with a given type.

std::unordered_set<std::string> typesToSearch = {"core::ElectricalSeries"};
std::unordered_map<std::string, std::string> found_electrical_series =
    readio->findTypes(
        "/",  // start search at the root of the file
        typesToSearch,  // search for all ElectricalSeries
        IO::SearchMode::CONTINUE_ON_TYPE  // search also within types
    );
Note
findTypes() supports two main search modes. Using CONTINUE_ON_TYPE mode, we can search recursively through all types (here the whole file, since we started at the root "/"). Using STOP_ON_TYPE does not recurse further into defined types; hence, this mode is useful if we only want to search for objects that the object at the starting path manages directly.
Warning
The current implementation of findTypes() is not aware of inheritance but searches for exact matches of types only. However, we can search for objects of multiple different types at the same time by specifying multiple types in typesToSearch.
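The difference between the two search modes can be illustrated with a toy in-memory tree. This sketch is a simplified model and not the AqNWB implementation: NodeSketch, TreeSketch, SearchModeSketch, and findTypesSketch are hypothetical, the tree replaces the file, and details such as how the starting object itself is treated are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

enum class SearchModeSketch { STOP_ON_TYPE, CONTINUE_ON_TYPE };

// Hypothetical description of one object in the file hierarchy.
struct NodeSketch {
  std::string fullType;               // empty if no type is assigned
  std::vector<std::string> children;  // absolute paths of child objects
};

using TreeSketch = std::map<std::string, NodeSketch>;

// Collect path -> full type for all descendants of start whose type is an
// exact match for one of the requested types (no inheritance awareness).
inline void findTypesSketch(
    const TreeSketch& tree,
    const std::string& start,
    const std::unordered_set<std::string>& types,
    SearchModeSketch mode,
    std::unordered_map<std::string, std::string>& found) {
  auto startIt = tree.find(start);
  if (startIt == tree.end()) {
    return;
  }
  for (const std::string& childPath : startIt->second.children) {
    auto childIt = tree.find(childPath);
    if (childIt == tree.end()) {
      continue;
    }
    const std::string& childType = childIt->second.fullType;
    const bool isTyped = !childType.empty();
    if (isTyped && types.count(childType) > 0) {
      found[childPath] = childType;
    }
    // Untyped objects are always searched; typed objects are only
    // entered in CONTINUE_ON_TYPE mode.
    if (!isTyped || mode == SearchModeSketch::CONTINUE_ON_TYPE) {
      findTypesSketch(tree, childPath, types, mode, found);
    }
  }
}
```

In this model, a typed object nested inside another typed object is reported only in CONTINUE_ON_TYPE mode, and a search for a parent type such as core::TimeSeries would not report ElectricalSeries objects, since only exact type names match.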

The returned std::unordered_map uses the full path to the object as key and the full type (i.e., namespace::neurodata_type) as value, which is all we need to read the objects.

// We should have esdata1 and esdata2
REQUIRE(found_electrical_series.size() == 2);
// Print the path and type of the found objects
for (const auto& pair : found_electrical_series) {
  std::cout << "Path=" << pair.first << " Full type=" << pair.second
            << std::endl;
}

Reading the Registered Type Objects

To read from a neurodata_type object from an existing file, we can use the RegisteredType::create factory methods to conveniently construct an instance of the corresponding class in AqNWB.

// Read the ElectricalSeries from the file.
std::string esdata_path = "/acquisition/esdata0";
auto readElectricalSeries =
    RegisteredType::create<AQNWB::NWB::ElectricalSeries>(esdata_path,
                                                          readio);
Note
RegisteredType::create comes in a few different flavors:
  1. When passing only 1) path and 2) io (as in the example above), AqNWB reads the neurodata_type and namespace attributes from the NWB file to automatically determine the class to use to represent the type.
  2. When passing the 1) fullname (e.g., core::ElectricalSeries), 2) path and 3) io, AqNWB looks up the class to use in RegisteredType's type registry (see also How to Use the RegisteredType Registry).
  3. When passing the class to use as template parameter, e.g., create<AQNWB::NWB::ElectricalSeries>(path, io), the instance is constructed using the common constructor, i.e., this is equivalent to creating the object via ElectricalSeries(path, io).
Options 1 and 2 instantiate the specific type (e.g., ElectricalSeries) but return a generic RegisteredType pointer that we can cast to the specific type if necessary, e.g., via auto readElectricalSeries = std::dynamic_pointer_cast<AQNWB::NWB::ElectricalSeries>(readRegisteredType);. Option 3 creates and returns a pointer to the specific type directly.

Reading data fields

Now we can read fields and subsets of data from the fields as before.

// Now we can read the data in the same way we did during write
auto readElectricalSeriesData = readElectricalSeries->readData<float>();
DataBlock<float> readDataValues = readElectricalSeriesData->values();
auto readBoostMultiArray = readDataValues.as_multi_array<2>();
REQUIRE(readDataValues.data.size() == (numSamples * numChannels));
REQUIRE(readDataValues.shape[0] == numSamples);
REQUIRE(readDataValues.shape[1] == numChannels);
// We can also read just subsets of the data, e.g., the first 9 time steps
// for the first channel
// TODO getting the object again is just for debugging Windows issues
auto readElectricalSeriesData2 = readElectricalSeries->readData<float>();
std::vector<SizeType> start = {0, 0};
std::vector<SizeType> count = {9, 1};
DataBlock<float> dataSlice =
readElectricalSeriesData2->values(start, count);
// Validate that the slice was read correctly
REQUIRE(dataSlice.data.size() == 9);
REQUIRE(dataSlice.shape[0] == 9);
REQUIRE(dataSlice.shape[1] == 1);
// Or read a string attribute, e.g., the unit
std::string esUnitValue =
readElectricalSeries->readDataUnit()->values().data[0];
REQUIRE(esUnitValue == std::string("volts"));
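The start and count arguments used above select a rectangular block of the dataset. As a rough in-memory illustration of that selection, the sketch below copies such a block out of a flat, row-major 2D buffer. slice2DSketch is a hypothetical helper and assumes row-major storage; when reading via values(start, count), the selection is carried out by the I/O backend rather than on data already in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy the block covering rows [start[0], start[0] + count[0]) and columns
// [start[1], start[1] + count[1]) from a flat, row-major 2D buffer.
template<typename T>
std::vector<T> slice2DSketch(const std::vector<T>& flat,
                             const std::vector<std::size_t>& shape,
                             const std::vector<std::size_t>& start,
                             const std::vector<std::size_t>& count) {
  assert(shape.size() == 2 && start.size() == 2 && count.size() == 2);
  assert(flat.size() == shape[0] * shape[1]);
  assert(start[0] + count[0] <= shape[0]);
  assert(start[1] + count[1] <= shape[1]);

  std::vector<T> result;
  result.reserve(count[0] * count[1]);
  for (std::size_t r = start[0]; r < start[0] + count[0]; ++r) {
    for (std::size_t c = start[1]; c < start[1] + count[1]; ++c) {
      result.push_back(flat[r * shape[1] + c]);
    }
  }
  return result;
}
```

With shape {100, 2}, start {0, 0}, and count {9, 1}, the result holds 9 values: the first channel of the first 9 time steps, with a result shape of {9, 1}.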

Reading arbitrary fields

Even if there is no dedicated DEFINE_FIELD definition available, we can still read any arbitrary sub-field associated with a particular RegisteredType via the generic RegisteredType::readField method. The main difference is that for datasets and attributes we need to specify all the additional information (e.g., the relative path, object type, and data type) ourselves, whereas using DEFINE_FIELD this information has already been specified for us. For example, to read the data from the ElectricalSeries we can call:

// Read the data field via the generic readField method
auto readElectricalSeriesData3 =
readElectricalSeries->readField<StorageObjectType::Dataset, float>(
std::string("data"));
// Read the data values as usual
DataBlock<float> readDataValues3 = readElectricalSeriesData3->values();
REQUIRE(readDataValues3.data.size() == (numSamples * numChannels));

Similarly, we can also read any sub-fields that are themselves RegisteredType objects via RegisteredType::readField (e.g., to read custom VectorData columns of a DynamicTable). In contrast to dataset and attribute fields, we here only need to specify the relative path of the field. RegisteredType in turn can read the type information from the neurodata_type and namespace attributes in the file directly.

// read the NWBFile
auto readNWBFile =
    RegisteredType::create<AQNWB::NWB::NWBFile>("/", readio);
// read the ElectricalSeries from the NWBFile object via the readField
// method returning a generic std::shared_ptr<RegisteredType>
auto readRegisteredType = readNWBFile->readField(esdata_path);
// cast the generic pointer to the more specific ElectricalSeries
std::shared_ptr<AQNWB::NWB::ElectricalSeries> readElectricalSeries2 =
std::dynamic_pointer_cast<AQNWB::NWB::ElectricalSeries>(
readRegisteredType);
REQUIRE(readElectricalSeries2 != nullptr);