aqnwb 0.1.0
Implementation of data read

This page focuses on the software architecture of AqNWB for implementing data read and is mainly aimed at software developers. See Reading data if you want to learn more about how read is used in practice. The read system in AqNWB is built around several key concepts:

  1. Lazy loading of datasets and attributes through wrapper objects, discussed in Reading datasets and attributes
  2. Dynamic type creation through a registration system, discussed in Reading neurodata_type objects
  3. Field access through generated accessor methods. See the Implementing a new Neurodata Type page to learn more about the macros used to generate accessor methods.

Reading datasets and attributes

AqNWB reads datasets and attributes lazily via wrappers. The main components involved in reading data from an NWB file via AqNWB are:

  1. Container
    • Container type classes represent Groups with an assigned neurodata_type (e.g., ElectricalSeries or NWBFile) and expose read access to their specific datasets and attributes via corresponding functions, which return ReadDataWrapper objects for lazily reading from the dataset/attribute.
  2. ReadDataWrapper
  3. BaseIO
    • BaseIO and HDF5IO are responsible for reading data from disk and allocating memory for the data on read. Read methods, e.g., readDataset and readAttribute, return the data as DataBlockGeneric. This has the advantage that the backend handles memory allocation and typing for us, so data can be loaded even if we do not know its type yet.
  4. DataBlockGeneric
  5. DataBlock
  6. BOOST::multi_array
    • BOOST::multi_array can also be used to simplify access to multi-dimensional data. The DataBlock.as_multi_array convenience method generates a boost::const_multi_array_ref<DTYPE, NDIMS> for us.
  7. std::variant
    • std::variant can also be used when we want to compute on the data in a type-safe manner but do not know the data type beforehand (e.g., when reading NWB data from a third party).
Note
The DataBlock.fromGeneric, DataBlock.as_multi_array, and DataBlockGeneric.as_variant methods use casting and referencing to transform the data without making additional copies of the data.
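
The read path described above (lazy wrapper, generic block, typed block) can be illustrated with a minimal, self-contained sketch. All names below (GenericBlock, TypedBlock, LazyReader, demoLazyRead) are hypothetical stand-ins invented for illustration; they are not AqNWB's classes, and the real ReadDataWrapper, DataBlockGeneric, and DataBlock may differ in detail.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <functional>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical stand-in for a type-erased block returned by an I/O backend.
struct GenericBlock {
  std::any data;                             // holds a std::vector<T>
  std::vector<std::size_t> shape;            // extent of each dimension
  std::type_index typeIndex = typeid(void);  // element type found on disk
};

// Hypothetical stand-in for a typed block.
template <typename T>
struct TypedBlock {
  std::vector<T> data;
  std::vector<std::size_t> shape;

  // Takes over the vector stored in the generic block instead of copying it.
  // Throws std::bad_any_cast if the stored element type is not T.
  static TypedBlock fromGeneric(GenericBlock&& generic) {
    auto* values = std::any_cast<std::vector<T>>(&generic.data);
    if (values == nullptr) {
      throw std::bad_any_cast();
    }
    return TypedBlock{std::move(*values), std::move(generic.shape)};
  }
};

// Lazy wrapper: stores how to read, but performs no I/O until asked.
class LazyReader {
public:
  explicit LazyReader(std::function<GenericBlock()> loader)
      : m_loader(std::move(loader)) {}

  GenericBlock valuesGeneric() const { return m_loader(); }

  template <typename T>
  TypedBlock<T> values() const {
    return TypedBlock<T>::fromGeneric(m_loader());
  }

private:
  std::function<GenericBlock()> m_loader;
};

// Usage: a fake backend that counts how often it is asked to read.
inline int demoLazyRead() {
  int reads = 0;
  LazyReader reader([&reads]() {
    ++reads;
    return GenericBlock{
        std::vector<float>{1.f, 2.f, 3.f, 4.f, 5.f, 6.f}, {2, 3}, typeid(float)};
  });
  assert(reads == 0);  // constructing the wrapper reads nothing
  TypedBlock<float> block = reader.values<float>();
  assert(block.data.size() == 6 && block.shape.size() == 2);
  return reads;  // number of backend reads performed so far
}
```

The design point is that the backend decides the element type and owns allocation, while the caller chooses when (and whether) to convert to a typed view.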

Reading neurodata_type objects

NWB neurodata_types are represented in AqNWB as classes that inherit from RegisteredType (see also Implementing a new Neurodata Type). RegisteredType in turn maintains a registry of all available registered types that it uses to support convenient read of neurodata_type objects from an NWB file. For read, a user then typically uses either:

The main components involved in reading typed objects from an NWB file via AqNWB are:

  • RegisteredType as the main base class for all classes implementing a type, e.g., Container, Data and all their subtypes. RegisteredType is responsible for managing all type classes and provides the create factory methods for creating instances of subclasses from a file.
  • BaseIO, HDF5IO are responsible for i) reading type attribute and group information, ii) searching the file for typed objects via findTypes() methods, and iii) retrieving the paths of all objects associated with a storage object (e.g., a Group) via getStorageObjects().

Here we focus mainly on the design of RegisteredType itself. If you want to learn more about how to implement a new subclass of RegisteredType then please see Implementing a new Neurodata Type.

How the Type Registry in RegisteredType Works

The type registry in RegisteredType allows for dynamic creation of registered subclasses by name. Here is how it works:

  1. Registry Storage:
    • The registry is stored as static members of the RegisteredType class. It consists of 1) a std::unordered_set of subclass names, accessible via getRegistry(), and 2) a std::unordered_map of factory functions for creating instances of the subclasses, accessible via getFactoryMap(). Each factory function wraps the required constructor of the subclass, which takes the io and path as input.
  2. Preparing for Registration: REGISTER_SUBCLASS
    • The REGISTER_SUBCLASS macro modifies the class to make it ready for registration by:
      • Creating the registered_ field to trigger the registration when the subclass is loaded
      • Defining a static method registerSubclass, which is used to add a subclass name and its corresponding factory function to the registry.
      • Adding getTypeName and getNamespace functions for defining the neurodata_type name
    • REGISTER_SUBCLASS_WITH_TYPENAME is a special version of the REGISTER_SUBCLASS macro, which allows setting the typename explicitly as a third argument. This is for the special case where the name of the class cannot be the same as the name of the type (see also How to implement a RegisteredType with a custom type name)
  3. Actual automatic Registration: REGISTER_SUBCLASS_IMPL
    • The REGISTER_SUBCLASS_IMPL macro initializes the static member (registered_), which triggers the registerSubclass method and ensures that the subclass is registered when the program starts.
  4. Dynamic Creation:
    • The RegisteredType::create method is used to create an instance of a registered subclass by name. This method looks up the subclass name in the registry and calls the corresponding factory function to create an instance.
  5. Class Name and Namespace Retrieval:
    • The getTypeName and getNamespace methods return the string name of the class and namespace, respectively. The REGISTER_SUBCLASS macro automatically overrides both methods so that the appropriate type and namespace strings are returned. Subclasses should therefore not override these methods manually, to keep type identification consistent.
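
The registration mechanism described in the steps above can be sketched in a self-contained form. This is a simplified illustration under stated assumptions, not AqNWB's implementation: BaseType, DemoSeries, the SKETCH_* macros, and demoCreateByName are hypothetical names, and the factory here takes only a path (the real factories also take the io object).

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Simplified stand-in for a registered base type (hypothetical).
class BaseType {
public:
  using Factory =
      std::function<std::unique_ptr<BaseType>(const std::string& path)>;

  explicit BaseType(std::string path) : m_path(std::move(path)) {}
  virtual ~BaseType() = default;
  virtual std::string getTypeName() const = 0;
  const std::string& getPath() const { return m_path; }

  // Function-local statics are constructed on first use, so registration
  // during static initialization never sees an unconstructed container.
  static std::unordered_set<std::string>& getRegistry() {
    static std::unordered_set<std::string> registry;
    return registry;
  }
  static std::unordered_map<std::string, Factory>& getFactoryMap() {
    static std::unordered_map<std::string, Factory> factories;
    return factories;
  }
  static bool registerSubclass(const std::string& name, Factory factory) {
    getRegistry().insert(name);
    getFactoryMap()[name] = std::move(factory);
    return true;
  }
  // Look up the factory by name; returns nullptr for unknown names.
  static std::unique_ptr<BaseType> create(const std::string& name,
                                          const std::string& path) {
    auto it = getFactoryMap().find(name);
    if (it == getFactoryMap().end()) {
      return nullptr;
    }
    return it->second(path);
  }

private:
  std::string m_path;
};

// Used inside the class body: declares the trigger and the name accessor.
#define SKETCH_REGISTER_SUBCLASS(T, NAMESPACE) \
  static bool registered_; \
  std::string getTypeName() const override { return NAMESPACE "::" #T; }

// Used at namespace scope: initializing the static runs the registration.
#define SKETCH_REGISTER_SUBCLASS_IMPL(T, NAMESPACE) \
  bool T::registered_ = BaseType::registerSubclass( \
      NAMESPACE "::" #T, \
      [](const std::string& path) -> std::unique_ptr<BaseType> { \
        return std::make_unique<T>(path); \
      });

class DemoSeries : public BaseType {
public:
  explicit DemoSeries(const std::string& path) : BaseType(path) {}
  SKETCH_REGISTER_SUBCLASS(DemoSeries, "demo")
};
SKETCH_REGISTER_SUBCLASS_IMPL(DemoSeries, "demo")

// Usage: create an instance knowing only the registered name.
inline std::string demoCreateByName() {
  auto instance = BaseType::create("demo::DemoSeries", "/tsdata");
  assert(instance != nullptr);
  return instance->getTypeName();
}
```

Splitting the work across two macros mirrors the description above: one prepares the class declaration, the other defines the static member whose initializer performs the registration before main runs.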

How to Use the RegisteredType Registry

The RegisteredType registry allows for dynamic creation and management of registered subclasses. Here is how you can use it:

  1. Creating Instances Dynamically:
    • Use the create method to create an instance of a registered subclass by name.
    • This method takes the subclass name, path, and a shared pointer to the IO object as arguments. This illustrates how we can read a specific typed object in an NWB file.
      // Create an instance of a TimeSeries in a file.
      auto instance =
          AQNWB::NWB::RegisteredType::create("core::TimeSeries", dataPath, io);
      REQUIRE(instance != nullptr);
  2. Retrieving Registered Subclass Names:
    • Use the getRegistry method to retrieve the set of registered subclass names.
      // Retrieve and print registered subclass names
      const auto& registry = AQNWB::NWB::RegisteredType::getRegistry();
      std::cout << "Registered subclasses:" << std::endl;
      for (const auto& subclassName : registry) {
        std::cout << " - " << subclassName << std::endl;
      }
  3. Retrieving the Factory Map:
    • Use the getFactoryMap method to retrieve the map of factory functions for creating instances of registered subclasses.
      // Retrieve and print factory map
      const auto& factoryMap = AQNWB::NWB::RegisteredType::getFactoryMap();
      std::cout << "Factory functions for registered subclasses:" << std::endl;
      for (const auto& pair : factoryMap) {
        std::cout << " - " << pair.first << std::endl;
      }

Reading templated RegisteredType classes

To facilitate the reading of data arrays and handle data types in a type-safe manner, AqNWB utilizes templated classes. For instance, the VectorData type in NWB may represent data arrays with varying data types (e.g., int, string, etc.). Accordingly, AqNWB implements the VectorData class, which exposes the data as std::any via the VectorData::readData method for reading.

In some cases, the data type for VectorData may be predetermined in the schema. For example, the location column of the ElectrodeTable requires string data. To simplify reading in such cases where the data type is fixed, AqNWB defines VectorDataTyped<DTYPE>, which inherits from VectorData. This class allows the data type to be specified at compile time via the class template, enabling VectorDataTyped<DTYPE>::readData to expose the data with the type already set at compile time. When used in combination with the DEFINE_REGISTERED_FIELD macro, this approach allows ElectrodeTable::readLocationColumn to return the data to the user with the type already set. The same approach is also applied in the case of Data and its derived class DataTyped.

For further details and alternative approaches for implementing templated RegisteredType classes, see Templated RegisteredType Classes.

// Read as generic RegisteredType and cast to VectorData
auto readDataUntyped = NWB::RegisteredType::create(dataPath, io);
auto readVectorData =
    std::dynamic_pointer_cast<NWB::VectorData>(readDataUntyped);
// Read data as DataBlock<std::any>
auto dataAny = readVectorData->readData();
auto dataBlock = dataAny->valuesGeneric();
// Create VectorDataTyped<int> from VectorData and read the
// data as typed DataBlock<int>
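// Assumption: VectorDataTyped<DTYPE>::fromVectorData wraps an existing VectorData
auto readVectorDataTyped =
    NWB::VectorDataTyped<int>::fromVectorData(*readVectorData);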
auto dataInt = readVectorDataTyped->readData();
auto dataBlockInt = dataInt->values();
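
The relationship between an untyped reader and its typed subclass can be sketched independently of AqNWB. The following is a minimal illustration under stated assumptions: FakeFile, ColumnReader, TypedColumnReader, fromUntyped, and demoTypedRead are hypothetical names, and an in-memory map stands in for the file.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory stand-in for a file: path -> type-erased column.
using FakeFile = std::map<std::string, std::any>;

// Untyped reader: the element type is only known at runtime.
class ColumnReader {
public:
  ColumnReader(std::string path, std::shared_ptr<const FakeFile> file)
      : m_path(std::move(path)), m_file(std::move(file)) {}
  virtual ~ColumnReader() = default;

  // Returns the column as a std::any holding a std::vector<T>.
  std::any readGeneric() const { return m_file->at(m_path); }

  const std::string& getPath() const { return m_path; }
  std::shared_ptr<const FakeFile> getFile() const { return m_file; }

private:
  std::string m_path;
  std::shared_ptr<const FakeFile> m_file;
};

// Typed reader: the element type is fixed at compile time.
template <typename T>
class TypedColumnReader : public ColumnReader {
public:
  using ColumnReader::ColumnReader;

  // Re-wrap an untyped reader; only the path and file handle are copied.
  static std::shared_ptr<TypedColumnReader<T>> fromUntyped(
      const ColumnReader& untyped) {
    return std::make_shared<TypedColumnReader<T>>(untyped.getPath(),
                                                  untyped.getFile());
  }

  // Throws std::bad_any_cast if the stored element type is not T.
  std::vector<T> readData() const {
    return std::any_cast<std::vector<T>>(readGeneric());
  }
};

// Usage: a string column whose type is fixed by the (hypothetical) schema.
inline std::size_t demoTypedRead() {
  auto file = std::make_shared<FakeFile>();
  (*file)["/electrodes/location"] =
      std::vector<std::string>{"probe0", "probe1"};
  ColumnReader untyped("/electrodes/location", file);
  auto typed = TypedColumnReader<std::string>::fromUntyped(untyped);
  std::vector<std::string> values = typed->readData();
  assert(values[0] == "probe0");
  return values.size();
}
```

Because the typed class only adds a compile-time type on top of the same path and file handle, converting between the two is cheap and a type mismatch surfaces as an exception at read time.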

Example: Using the type registry

#include "io/BaseIO.hpp"
#include "testUtils.hpp"
using namespace AQNWB::NWB;
TEST_CASE("RegisterType Example", "[base]")
{
  SECTION("Example to illustrate how the RegisterType registry is working")
  {
    // [example_RegisterType_setup_file]
    // Mock data
    SizeType numSamples = 10;
    std::string dataPath = "/tsdata";
    std::vector<SizeType> dataShape = {numSamples};
    std::vector<SizeType> positionOffset = {0};
    BaseDataType dataType = BaseDataType::F32;
    std::vector<float> data = getMockData1D(numSamples);
    std::vector<double> timestamps = getMockTimestamps(numSamples, 1);
    std::string filename = getTestFilePath("testRegisteredTypeExample.h5");
    std::shared_ptr<BaseIO> io = std::make_unique<IO::HDF5::HDF5IO>(filename);
    io->open();
    NWB::TimeSeries ts = NWB::TimeSeries(dataPath, io);
    // Dataset configuration; the IO::ArrayDataSetConfig type name is an
    // assumption here
    IO::ArrayDataSetConfig config(
        dataType, SizeArray {numSamples}, SizeArray {numSamples});
    ts.initialize(config, "unit");
    // Write data to file
    Status writeStatus =
        ts.writeData(dataShape, positionOffset, data.data(), timestamps.data());
    REQUIRE(writeStatus == Status::Success);
    io->flush();
    // [example_RegisterType_setup_file]
    // [example_RegisterType_get_type_instance]
    // Create an instance of a TimeSeries in a file.
    auto instance =
        AQNWB::NWB::RegisteredType::create("core::TimeSeries", dataPath, io);
    REQUIRE(instance != nullptr);
    // [example_RegisterType_get_type_instance]
    // [example_RegisterType_get_registered_names]
    // Retrieve and print registered subclass names
    const auto& registry = AQNWB::NWB::RegisteredType::getRegistry();
    std::cout << "Registered subclasses:" << std::endl;
    for (const auto& subclassName : registry) {
      std::cout << " - " << subclassName << std::endl;
    }
    // [example_RegisterType_get_registered_names]
    // [example_RegisterType_get_registered_factories]
    // Retrieve and print factory map
    const auto& factoryMap = AQNWB::NWB::RegisteredType::getFactoryMap();
    std::cout << "Factory functions for registered subclasses:" << std::endl;
    for (const auto& pair : factoryMap) {
      std::cout << " - " << pair.first << std::endl;
    }
    // [example_RegisterType_get_registered_factories]
  }
}

Example Implementation Details

For implementation examples and detailed code snippets, see the following sections:

Creating and Writing Data

// setup mock data for writing
SizeType numSamples = 100;
SizeType numChannels = 2;
std::vector<Types::ChannelVector> mockArrays = getMockChannelArrays();
BaseDataType dataType = BaseDataType::F32;
std::vector<std::string> mockChannelNames =
    getMockChannelArrayNames("esdata");
std::vector<std::vector<float>> mockData =
    getMockData2D(numSamples, numChannels);
std::vector<double> mockTimestamps = getMockTimestamps(numSamples, 1);
// To verify that the data was written correctly, we here transpose the
// mockData (which is per channel) to the (time x channel) layout used
// in the ElectricalSeries in the NWB file so we can compare
std::vector<std::vector<float>> mockDataTransposed;
mockDataTransposed.resize(numSamples);
for (SizeType s = 0; s < numSamples; s++) {
  mockDataTransposed[s].resize(numChannels);
  for (SizeType c = 0; c < numChannels; c++) {
    mockDataTransposed[s][c] = mockData[c][s];
  }
}
// setup io object
std::string path = getTestFilePath("ElectricalSeriesReadExample.h5");
std::shared_ptr<BaseIO> io = createIO("HDF5", path);
io->open();
// setup the NWBFile
NWB::NWBFile nwbfile(io);
Status initStatus = nwbfile.initialize(generateUuid());
REQUIRE(initStatus == Status::Success);
// create the RecordingContainer for managing recordings
std::unique_ptr<NWB::RecordingContainers> recordingContainers =
    std::make_unique<NWB::RecordingContainers>();
std::vector<SizeType> containerIndices = {};
// create a new ElectricalSeries
nwbfile.createElectrodesTable(mockArrays);
Status resultCreate =
    nwbfile.createElectricalSeries(mockArrays,
                                   mockChannelNames,
                                   dataType,
                                   recordingContainers.get(),
                                   containerIndices);
REQUIRE(resultCreate == Status::Success);
// get the new ElectricalSeries
NWB::ElectricalSeries* electricalSeries =
    static_cast<NWB::ElectricalSeries*>(
        recordingContainers->getContainer(0));
REQUIRE(electricalSeries != nullptr);
// start recording
Status resultStart = io->startRecording();
REQUIRE(resultStart == Status::Success);
// write channel data
for (SizeType ch = 0; ch < numChannels; ++ch) {
  electricalSeries->writeChannel(
      ch, numSamples, mockData[ch].data(), mockTimestamps.data());
}
io->flush();

Reading and Processing Data

// Get a ReadDatasetWrapper<float> for lazy reading of ElectricalSeries.data
// Specifying the value type as a template parameter allows us to read
// typed data. However, in the particular case of ElectricalSeries.data, we
// could also have used readData() without <float>, as the template parameter
// is already set to float by default for ElectricalSeries.readData()
auto readDataWrapper = electricalSeries->readData<float>();
REQUIRE(readDataWrapper->exists());
// Read the full ElectricalSeries.data back
DataBlock<float> dataValues = readDataWrapper->values();
// Check that the data we read has the expected size and shape
REQUIRE(dataValues.data.size() == (numSamples * numChannels));
REQUIRE(dataValues.shape[0] == numSamples);
REQUIRE(dataValues.shape[1] == numChannels);
REQUIRE(dataValues.typeIndex == typeid(float));
// Iterate through all the time steps
for (SizeType t = 0; t < numSamples; t++) {
  // Get the data for the single time step t from the DataBlock
  std::vector<float> selectedRange(
      dataValues.data.begin()
          + static_cast<std::vector<float>::difference_type>(t * numChannels),
      dataValues.data.begin()
          + static_cast<std::vector<float>::difference_type>(
              (t + 1) * numChannels));
  // Check that the values are correct
  REQUIRE_THAT(selectedRange,
               Catch::Matchers::Approx(mockDataTransposed[t]).margin(1));
}
// Use the boost multi-array feature to simplify interaction with data
auto boostMultiArray = dataValues.as_multi_array<2>();
// Iterate through all the time steps again, but now using the boost array
for (SizeType t = 0; t < numSamples; t++) {
  // Access [t, :], i.e., get a 1D array with the data
  // from all channels for time step t.
  auto row_t = boostMultiArray[static_cast<long>(t)];
  // Compare to check that the data is correct.
  std::vector<float> row_t_vector(
      row_t.begin(), row_t.end());  // convert to std::vector for comparison
  REQUIRE_THAT(row_t_vector,
               Catch::Matchers::Approx(mockDataTransposed[t]).margin(1));
}
// Get a ReadDataWrapper<ReadObjectType::Attribute, float> to read data
// lazily
auto readDataResolutionWrapper = electricalSeries->readDataResolution();
// Read the data values as a DataBlock<float>
auto resolutionValueFloat = readDataResolutionWrapper->values();
REQUIRE(resolutionValueFloat.shape.empty()); // Scalar
REQUIRE(resolutionValueFloat.data.size() == 1);
REQUIRE(int(resolutionValueFloat.data[0]) == -1);
REQUIRE(resolutionValueFloat.typeIndex == typeid(float));
// Get a generic ReadDatasetWrapper<std::any> for lazy reading of
// ElectricalSeries.data
auto readDataWrapperGeneric = electricalSeries->readData();
// Instead of using values() to read typed data, we can read data as generic
// data first via valuesGeneric
DataBlockGeneric dataValuesGeneric =
    readDataWrapperGeneric->valuesGeneric();
// Note that the I/O backend determines the data type and allocates
// the memory for us. The std::type_index is stored in our data block as
// well
REQUIRE(dataValuesGeneric.typeIndex == typeid(float));
// We can then later convert the data block to a typed data block
DataBlock<float> dataValueFloat =
    DataBlock<float>::fromGeneric(dataValuesGeneric);
Note
In most cases, we should not need runtime checking of types in the context of specific data acquisition systems. Runtime type checking is most likely relevant if one wants to consume arbitrary NWB files that may use different data types. One approach is to use std::variant as described in Working with fields with unknown data type. Using std::variant helps avoid complex code for checking data types at runtime.
If for some reason we cannot easily use the std::variant approach (e.g., in case we need to use data types not natively supported by AqNWB), an alternative approach would be to define a mapping of the type information to the corresponding statically typed functionality, e.g., via switch/case logic or by using a map for lookup, such as:
DataBlockGeneric dataValuesGeneric = readDataWrapperGeneric->valuesGeneric();
// Map to associate std::type_index with corresponding type-specific functions.
// processData<T> is assumed to be a user-defined function template that
// accepts a const DataBlockGeneric&.
std::unordered_map<std::type_index, std::function<void(const DataBlockGeneric&)>> typeMap = {
    {typeid(float), processData<float>},
    {typeid(int), processData<int>},
    // Add more types as needed
};
// Use the map to process the data with the appropriate type
auto it = typeMap.find(dataValuesGeneric.typeIndex);
if (it != typeMap.end()) {
  it->second(dataValuesGeneric);  // call the correct processData function
} else {
  std::cout << "Unsupported type" << std::endl;
}
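
For comparison with the lookup-table approach above, the std::variant route mentioned in the note can be sketched with standard C++ only. This is an illustrative sketch, not AqNWB's API: AnyColumn, sumColumn, and demoSum are hypothetical names, and the set of alternatives (including std::monostate for unsupported data) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical closed set of element types a reader might hand back.
// std::monostate stands for "type not supported".
using AnyColumn = std::variant<std::monostate,
                               std::vector<float>,
                               std::vector<double>,
                               std::vector<std::int32_t>>;

// Sum the values whatever the element type; unsupported data sums to 0.
inline double sumColumn(const AnyColumn& column) {
  return std::visit(
      [](const auto& values) -> double {
        using V = std::decay_t<decltype(values)>;
        if constexpr (std::is_same_v<V, std::monostate>) {
          return 0.0;
        } else {
          return std::accumulate(values.begin(), values.end(), 0.0);
        }
      },
      column);
}

// Usage: the caller never names the element type when computing.
inline double demoSum() {
  AnyColumn column = std::vector<float>{1.5f, 2.5f};
  assert(!std::holds_alternative<std::monostate>(column));
  return sumColumn(column);
}
```

With std::visit the compiler checks that every alternative is handled, whereas the map-based dispatch only reports a missing type at runtime.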