This page focuses on the software architecture of AqNWB for implementing data read and is mainly aimed at software developers. See Reading data if you want to learn more about how read is used in practice. The read system in AqNWB is built around several key concepts:
- Lazy loading of datasets and attributes through wrapper objects, discussed in Reading datasets and attributes
- Dynamic type creation through a registration system, discussed in Reading neurodata_type objects
- Field access through generated accessor methods. See the Implementing a new Neurodata Type page to learn more about the macros used to generate accessor methods.
Reading datasets and attributes
AqNWB reads datasets and attributes lazily via wrappers. The main components involved in reading data from an NWB file via AqNWB are:
- Container
- Container type classes represent Groups with an assigned neurodata_type (e.g., ElectricalSeries or NWBFile) and expose read access to their specific datasets and attributes via corresponding functions, which return ReadDataWrapper objects for lazily reading from the dataset/attribute.
- ReadDataWrapper
- BaseIO
- BaseIO and HDF5IO are then responsible for reading data from disk and allocating memory for the data on read. Read methods, e.g., readDataset and readAttribute, return the data as DataBlockGeneric. This has the advantage that the backend handles memory allocation and typing for us, so we can load data even if we do not know its type yet.
- DataBlockGeneric
- DataBlock
- boost::multi_array
- boost::multi_array can also be used to simplify access to multi-dimensional data. The DataBlock.as_multi_array convenience method generates a boost::const_multi_array_ref<DTYPE, NDIMS> for us.
- std::variant
- std::variant can also be used when we want to compute on the data in a type-safe manner but do not know the data type beforehand (e.g., when reading NWB data from a third party).
- Note
- The DataBlock.fromGeneric, DataBlock.as_multi_array, and DataBlockGeneric.as_variant methods use casting and referencing to transform the data without making additional copies of the data.
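The interplay of the components listed above can be sketched in a few lines of standalone C++17. This is only an illustration of the pattern (a lazy wrapper that defers the read, a type-erased block returned by the backend, and a typed conversion), not AqNWB's implementation: `LazyReader`, `GenericBlock`, `TypedBlock`, `makeMockDatasetReader`, and `mockReadCount` are hypothetical names, and the "backend" is a lambda that fabricates values instead of reading a file.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <functional>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical type-erased block: values plus shape and a runtime type tag.
struct GenericBlock {
  std::any data;                    // holds a std::vector<T> for some T
  std::vector<std::size_t> shape;   // extent of each dimension
  std::type_index typeIndex = typeid(void);
};

// Hypothetical typed block with the element type fixed at compile time.
template <typename T>
struct TypedBlock {
  std::vector<T> data;
  std::vector<std::size_t> shape;

  // Move the values out of a generic block; throws std::bad_any_cast
  // if the stored vector does not have element type T.
  static TypedBlock<T> fromGeneric(GenericBlock&& generic) {
    TypedBlock<T> result;
    result.data = std::any_cast<std::vector<T>>(std::move(generic.data));
    result.shape = std::move(generic.shape);
    return result;
  }
};

// Hypothetical lazy wrapper: stores how to read and reads only on request.
class LazyReader {
public:
  explicit LazyReader(std::function<GenericBlock()> loader)
      : m_loader(std::move(loader)) {
    assert(m_loader != nullptr);  // a wrapper without a loader is unusable
  }

  // Trigger the read and return the type-erased result.
  GenericBlock valuesGeneric() const { return m_loader(); }

  // Trigger the read and convert to a typed block in one step.
  template <typename T>
  TypedBlock<T> values() const {
    return TypedBlock<T>::fromGeneric(m_loader());
  }

private:
  std::function<GenericBlock()> m_loader;
};

// Counts how often the mock backend is actually asked for data.
int mockReadCount = 0;

// Mock backend (assumption): returns a fixed 3x2 float dataset.
LazyReader makeMockDatasetReader() {
  return LazyReader([] {
    ++mockReadCount;
    GenericBlock block;
    block.data = std::vector<float>{1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f};
    block.shape = {3, 2};
    block.typeIndex = typeid(float);
    return block;
  });
}
```

Creating the reader does not touch the mock backend; only `valuesGeneric()` or `values<float>()` does, which is the property the wrapper design is meant to provide.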
Reading neurodata_type objects
NWB neurodata_types are represented in AqNWB as classes that inherit from RegisteredType (see also Implementing a new Neurodata Type). RegisteredType in turn maintains a registry of all available registered types that it uses to support convenient read of neurodata_type objects from an NWB file. For read, a user then typically uses either:
The main components involved in reading typed objects from an NWB file via AqNWB are:
- RegisteredType as the main base class for all classes implementing a type, e.g., Container, Data and all their subtypes. RegisteredType is responsible for managing all type classes and provides the create factory methods for creating instances of subclasses from a file.
- BaseIO and HDF5IO are responsible for i) reading type attribute and group information, ii) searching the file for typed objects via findTypes() methods, and iii) retrieving the paths of all objects associated with a storage object (e.g., a Group) via getStorageObjects()
Here we focus mainly on the design of RegisteredType itself. If you want to learn more about how to implement a new subclass of RegisteredType then please see Implementing a new Neurodata Type.
How the Type Registry in RegisteredType Works
The type registry in RegisteredType allows for dynamic creation of registered subclasses by name. Here is how it works:
- Registry Storage:
- The registry is stored as static members within the RegisteredType class and is implemented using 1) a std::unordered_set to store subclass names (which can be accessed via getRegistry()) and 2) a std::unordered_map to store factory functions for creating instances of the subclasses (which can be accessed via getFactoryMap()). Each factory function calls the required constructor, which takes the io and path as input.
- Preparing for Registration: REGISTER_SUBCLASS
- The REGISTER_SUBCLASS macro modifies the class to make it ready for registration by:
- Creating the _registered field to trigger the registration when the subclass is loaded
- Defining a static method registerSubclass, which is used to add a subclass name and its corresponding factory function to the registry.
- Adding getTypeName and getNamespace functions that return the neurodata_type name and namespace
- REGISTER_SUBCLASS_WITH_TYPENAME is a special version of the REGISTER_SUBCLASS macro, which allows setting the typename explicitly as a third argument. This is for the special case where the name of the class cannot be the same as the name of the type (see also How to implement a RegisteredType with a custom type name)
- Actual automatic Registration: REGISTER_SUBCLASS_IMPL
- Dynamic Creation:
- The RegisteredType::create method is used to create an instance of a registered subclass by name. This method looks up the subclass name in the registry and calls the corresponding factory function to create an instance.
- Class Name and Namespace Retrieval:
- The getTypeName and getNamespace methods return the string name of the class and namespace, respectively. The REGISTER_SUBCLASS macro implements an automatic override of these methods to ensure the appropriate type and namespace strings are returned. Subclasses should therefore not override these methods manually, so that type identification stays consistent.
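To make the mechanism described above concrete, here is a minimal standalone sketch of a name-based type registry with macro-driven self-registration. It mirrors the structure of the description (a set of names, a map of factories, an in-class macro that prepares the class, and a second macro that triggers registration at load time), but it is not AqNWB's code: `TypeBase`, `addToRegistry`, `SKETCH_REGISTER_SUBCLASS`, `SKETCH_REGISTER_SUBCLASS_IMPL`, and `TimeSeriesSketch` are hypothetical, and the factory takes only a path (no io object or namespace) to keep the example short.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical base class; the registry lives in function-local statics so
// it is initialized before the first registration runs.
class TypeBase {
public:
  using Factory = std::function<std::unique_ptr<TypeBase>(const std::string&)>;

  virtual ~TypeBase() = default;
  virtual std::string getTypeName() const = 0;
  const std::string& getPath() const { return m_path; }

  static std::unordered_set<std::string>& getRegistry() {
    static std::unordered_set<std::string> registry;
    return registry;
  }
  static std::unordered_map<std::string, Factory>& getFactoryMap() {
    static std::unordered_map<std::string, Factory> factories;
    return factories;
  }
  // Store a name and its factory; returns true so that the result can
  // initialize a static flag.
  static bool addToRegistry(const std::string& name, Factory factory) {
    assert(factory != nullptr);
    getRegistry().insert(name);
    getFactoryMap()[name] = std::move(factory);
    return true;
  }
  // Look up the name and build an instance; nullptr if the name is unknown.
  static std::unique_ptr<TypeBase> create(const std::string& name,
                                          const std::string& path) {
    auto it = getFactoryMap().find(name);
    if (it == getFactoryMap().end()) {
      return nullptr;
    }
    return it->second(path);
  }

protected:
  explicit TypeBase(std::string path) : m_path(std::move(path)) {}

private:
  std::string m_path;
};

// In-class part: type name override, registration function, static flag.
#define SKETCH_REGISTER_SUBCLASS(T)                                   \
  std::string getTypeName() const override { return #T; }             \
  static bool registerSubclass() {                                    \
    return TypeBase::addToRegistry(#T, [](const std::string& path) {  \
      return std::unique_ptr<TypeBase>(new T(path));                  \
    });                                                               \
  }                                                                   \
  static bool s_registered;

// Out-of-class part: defining the flag runs the registration at load time.
#define SKETCH_REGISTER_SUBCLASS_IMPL(T) \
  bool T::s_registered = T::registerSubclass();

class TimeSeriesSketch : public TypeBase {
public:
  explicit TimeSeriesSketch(const std::string& path) : TypeBase(path) {}
  SKETCH_REGISTER_SUBCLASS(TimeSeriesSketch)
};
SKETCH_REGISTER_SUBCLASS_IMPL(TimeSeriesSketch)
```

With this in place, `TypeBase::create("TimeSeriesSketch", "/tsdata")` returns an instance without the caller naming the class at compile time. Splitting the macro in two is one way to keep the factory lambda inside a member function body, where the class is already complete.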
How to Use the RegisteredType Registry
The RegisteredType registry allows for dynamic creation and management of registered subclasses. Here is how you can use it:
- Creating Instances Dynamically:
- Use the create method to create an instance of a registered subclass by name.
- This method takes the subclass name, path, and a shared pointer to the IO object as arguments. This illustrates how we can read a specific typed object in an NWB file.
auto instance =
REQUIRE(instance != nullptr);
- Retrieving Registered Subclass Names:
- Use the getRegistry method to retrieve the set of registered subclass names.
std::cout << "Registered subclasses:" << std::endl;
for (const auto& subclassName : registry) {
std::cout << " - " << subclassName << std::endl;
}
- Retrieving the Factory Map:
- Use the getFactoryMap method to retrieve the map of factory functions for creating instances of registered subclasses.
std::cout << "Factory functions for registered subclasses:" << std::endl;
for (const auto& pair : factoryMap) {
std::cout << " - " << pair.first << std::endl;
}
Reading templated RegisteredType classes
To facilitate the reading of data arrays and handle data types in a type-safe manner, AqNWB utilizes templated classes. For instance, the VectorData type in NWB may represent data arrays with varying data types (e.g., int, string, etc.). Accordingly, AqNWB implements the VectorData class, which exposes the data as std::any via the VectorData::readData method for reading.
In some cases, the data type for VectorData may be predetermined in the schema. For example, the location column of the ElectrodeTable requires string data. To simplify reading in such cases where the data type is fixed, AqNWB defines VectorDataTyped<DTYPE>, which inherits from VectorData. This class allows the data type to be specified at compile time via the class template, enabling VectorDataTyped<DTYPE>::readData to expose the data with the type already set at compile time. When used in combination with the DEFINE_REGISTERED_FIELD macro, this approach allows ElectrodeTable::readLocationColumn to return the data to the user with the type already set. The same approach is also applied in the case of Data and its derived class DataTyped.
For further details and alternative approaches for implementing templated RegisteredType classes, see Templated RegisteredType Classes.
auto readVectorData =
std::dynamic_pointer_cast<NWB::VectorData>(readDataUntyped);
auto dataAny = readVectorData->readData();
auto dataBlock = dataAny->valuesGeneric();
auto readVectorDataTyped =
auto dataInt = readVectorDataTyped->readData();
auto dataBlockInt = dataInt->values();
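The untyped/typed pairing described in this section can be illustrated with a small standalone sketch. It is a simplified model rather than AqNWB's classes: `ColumnSketch`, `ColumnTypedSketch`, and `fromUntyped` are hypothetical names, the data lives in memory instead of a file, and the typed read returns a plain `std::vector` instead of a wrapper object.

```cpp
#include <any>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical untyped column: the element type is known only at runtime.
class ColumnSketch {
public:
  explicit ColumnSketch(std::any values) : m_values(std::move(values)) {
    assert(m_values.has_value());  // an empty column is not meaningful here
  }
  virtual ~ColumnSketch() = default;

  // Untyped read: the caller has to any_cast to the right vector type.
  std::any readData() const { return m_values; }

protected:
  std::any m_values;
};

// Hypothetical typed subclass: the element type is fixed by the template.
template <typename DTYPE>
class ColumnTypedSketch : public ColumnSketch {
public:
  using ColumnSketch::ColumnSketch;

  // Typed read: hides the untyped version and returns concrete values.
  // Throws std::bad_any_cast if the stored type is not std::vector<DTYPE>.
  std::vector<DTYPE> readData() const {
    return std::any_cast<std::vector<DTYPE>>(m_values);
  }

  // Build a typed column from an untyped one that holds the same values.
  static ColumnTypedSketch<DTYPE> fromUntyped(const ColumnSketch& column) {
    return ColumnTypedSketch<DTYPE>(column.readData());
  }
};
```

A caller who knows from the schema that a column holds strings can use `ColumnTypedSketch<std::string>` and never handle `std::any` directly, while generic code can keep working with `ColumnSketch`.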
Example: Using the type registry
#include "testUtils.hpp"
TEST_CASE("RegisterType Example", "[base]")
{
SECTION("Example to illustrate how the RegisterType registry is working")
{
std::string dataPath = "/tsdata";
std::vector<SizeType> dataShape = {numSamples};
std::vector<SizeType> positionOffset = {0};
BaseDataType dataType = BaseDataType::F32;
std::vector<float> data = getMockData1D(numSamples);
std::vector<double> timestamps = getMockTimestamps(numSamples, 1);
std::string filename = getTestFilePath("testRegisteredTypeExample.h5");
std::shared_ptr<BaseIO> io = std::make_unique<IO::HDF5::HDF5IO>(filename);
io->open();
ts.
writeData(dataShape, positionOffset, data.data(), timestamps.data());
REQUIRE(writeStatus == Status::Success);
io->flush();
auto instance =
REQUIRE(instance != nullptr);
std::cout << "Registered subclasses:" << std::endl;
for (const auto& subclassName : registry) {
std::cout << " - " << subclassName << std::endl;
}
std::cout << "Factory functions for registered subclasses:" << std::endl;
for (const auto& pair : factoryMap) {
std::cout << " - " << pair.first << std::endl;
}
}
}
Example Implementation Details
For implementation examples and detailed code snippets, see the following sections:
Creating and Writing Data
std::vector<Types::ChannelVector> mockArrays = getMockChannelArrays();
BaseDataType dataType = BaseDataType::F32;
std::vector<std::string> mockChannelNames =
getMockChannelArrayNames("esdata");
std::vector<std::vector<float>> mockData =
getMockData2D(numSamples, numChannels);
std::vector<double> mockTimestamps = getMockTimestamps(numSamples, 1);
std::vector<std::vector<float>> mockDataTransposed;
mockDataTransposed.resize(numSamples);
for (SizeType s = 0; s < numSamples; s++) {
  mockDataTransposed[s].resize(numChannels);
  for (SizeType c = 0; c < numChannels; c++) {
    mockDataTransposed[s][c] = mockData[c][s];
  }
}
std::string path = getTestFilePath("ElectricalSeriesReadExample.h5");
std::shared_ptr<BaseIO> io = createIO("HDF5", path);
io->open();
REQUIRE(initStatus == Status::Success);
std::unique_ptr<NWB::RecordingContainers> recordingContainers =
std::make_unique<NWB::RecordingContainers>();
std::vector<SizeType> containerIndices = {};
nwbfile.createElectrodesTable(mockArrays);
nwbfile.createElectricalSeries(mockArrays,
mockChannelNames,
dataType,
recordingContainers.get(),
containerIndices);
REQUIRE(resultCreate == Status::Success);
recordingContainers->getContainer(0));
REQUIRE(electricalSeries != nullptr);
Status resultStart = io->startRecording();
REQUIRE(resultStart == Status::Success);
for (SizeType ch = 0; ch < numChannels; ++ch) {
ch, numSamples, mockData[ch].data(), mockTimestamps.data());
}
io->flush();
Reading and Processing Data
auto readDataWrapper = electricalSeries->readData<float>();
REQUIRE(readDataWrapper->exists());
DataBlock<float> dataValues = readDataWrapper->values();
REQUIRE(dataValues.data.size() == (numSamples * numChannels));
REQUIRE(dataValues.shape[0] == numSamples);
REQUIRE(dataValues.shape[1] == numChannels);
REQUIRE(dataValues.typeIndex == typeid(float));
for (SizeType t = 0; t < numSamples; t++) {
  std::vector<float> selectedRange(
      dataValues.data.begin()
          + static_cast<std::vector<float>::difference_type>(t * numChannels),
      dataValues.data.begin()
          + static_cast<std::vector<float>::difference_type>(
              (t + 1) * numChannels));
  REQUIRE_THAT(selectedRange,
               Catch::Matchers::Approx(mockDataTransposed[t]).margin(1));
}
auto boostMultiArray = dataValues.as_multi_array<2>();
for (SizeType t = 0; t < numSamples; t++) {
  auto row_t = boostMultiArray[static_cast<long>(t)];
  std::vector<float> row_t_vector(row_t.begin(), row_t.end());
  REQUIRE_THAT(row_t_vector,
               Catch::Matchers::Approx(mockDataTransposed[t]).margin(1));
}
auto resolutionValueFloat = readDataResolutionWrapper->values();
REQUIRE(resolutionValueFloat.shape.empty());
REQUIRE(resolutionValueFloat.data.size() == 1);
REQUIRE(int(resolutionValueFloat.data[0]) == -1);
REQUIRE(resolutionValueFloat.typeIndex == typeid(float));
auto readDataWrapperGeneric = electricalSeries->readData();
DataBlockGeneric dataValuesGeneric =
readDataWrapperGeneric->valuesGeneric();
REQUIRE(dataValuesGeneric.typeIndex == typeid(float));
DataBlock<float> dataValueFloat =
DataBlock<float>::fromGeneric(dataValuesGeneric);
- Note
- In most cases, we should not need runtime checking of types in the context of specific data acquisition systems. This is most likely relevant if one wants to consume arbitrary NWB files that may use different data types. One approach is to use std::variant as described in Working with fields with unknown data type. Using std::variant helps avoid complex code for checking data types at runtime.
If for some reason we cannot easily use the std::variant approach (e.g., in case we need to use data types not natively supported by AqNWB), an alternative approach would be to define a mapping of the type information to the corresponding statically typed functionality, e.g., via switch/case logic or by using a map for lookup, such as:
DataBlockGeneric dataValuesGeneric = readDataWrapperGeneric->valuesGeneric();
std::unordered_map<std::type_index, std::function<void(const DataBlockGeneric&)>> typeMap = {
{typeid(float), processData<float>},
{typeid(int), processData<int>},
};
auto it = typeMap.find(dataValuesGeneric.typeIndex);
if (it != typeMap.end()) {
it->second(dataValuesGeneric);
} else {
std::cout << "Unsupported type" << std::endl;
}
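For comparison with the map-based lookup, the std::variant approach mentioned in the note can be sketched as follows. This is a standalone illustration under stated assumptions, not AqNWB's `as_variant` implementation: `GenericBlockSketch`, `BlockVariant`, `asVariant`, and `sumValues` are hypothetical names, only two element types are supported, and `std::monostate` is used here to mark an unsupported type.

```cpp
#include <any>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <type_traits>
#include <typeindex>
#include <typeinfo>
#include <variant>
#include <vector>

// Hypothetical type-erased block: values plus a runtime type tag.
struct GenericBlockSketch {
  std::any data;              // holds a std::vector<T> for some T
  std::type_index typeIndex;  // typeid(T) of the element type
};

// Closed set of element types this sketch chooses to support.
using BlockVariant = std::variant<std::monostate,
                                  std::vector<float>,
                                  std::vector<std::int32_t>>;

// Map the runtime type tag onto the variant; monostate marks "unsupported".
BlockVariant asVariant(const GenericBlockSketch& block) {
  assert(block.data.has_value());
  if (block.typeIndex == typeid(float)) {
    return std::any_cast<std::vector<float>>(block.data);
  }
  if (block.typeIndex == typeid(std::int32_t)) {
    return std::any_cast<std::vector<std::int32_t>>(block.data);
  }
  return std::monostate{};
}

// Type-safe computation over whichever alternative is active.
std::optional<double> sumValues(const BlockVariant& variant) {
  return std::visit(
      [](const auto& values) -> std::optional<double> {
        using V = std::decay_t<decltype(values)>;
        if constexpr (std::is_same_v<V, std::monostate>) {
          return std::nullopt;  // nothing to compute for unsupported types
        } else {
          double sum = 0.0;
          for (const auto& v : values) {
            sum += static_cast<double>(v);
          }
          return sum;
        }
      },
      variant);
}
```

The type check happens once in `asVariant`; everything downstream is written as a single generic visitor, and the compiler verifies that every alternative in the variant is handled.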