Reading data from an existing NWB file via AqNWB consists of the following main steps:
- Opening an existing file for reading by creating and opening a read I/O object for the file.
- Reading NWB neurodata_types by constructing the corresponding RegisteredType class to represent the neurodata_type, e.g., NWBFile or ElectricalSeries.
- Reading data from RegisteredType objects by creating a ReadDataWrapper object for lazy read access to the particular dataset or attribute field.
- Using ReadDataWrapper::values to request the parts of the data of interest. Only at this point is the data loaded from disk. It is returned as a DataBlock, which contains a 1D vector with the data and the shape of the data.
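A block that pairs a flat 1D vector with a shape can be indexed as n-dimensional data by computing offsets. The sketch below illustrates that idea under the assumption of row-major (C-order) storage, the convention HDF5 uses for multidimensional datasets. SimpleDataBlock and at2d are hypothetical names for illustration only; this is not AqNWB's DataBlock class.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-in for a DataBlock: a flat 1D buffer plus
// the shape of the n-dimensional data it represents.
template <typename T>
struct SimpleDataBlock
{
  std::vector<T> data;             // values flattened in row-major (C) order (assumed)
  std::vector<std::size_t> shape;  // e.g. {numSamples, numChannels}

  // Number of elements implied by the shape (product of all dimensions).
  std::size_t numElements() const
  {
    return std::accumulate(shape.begin(), shape.end(), std::size_t{1},
                           std::multiplies<std::size_t>());
  }

  // Element (row, col) of a 2D block, assuming row-major layout.
  const T& at2d(std::size_t row, std::size_t col) const
  {
    if (shape.size() != 2 || row >= shape[0] || col >= shape[1]) {
      throw std::out_of_range("index outside 2D shape");
    }
    return data[row * shape[1] + col];
  }
};
```

For a block with shape {3, 2}, element (2, 1) sits at flat offset 2 * 2 + 1 = 5.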
Opening an existing file for reading
std::shared_ptr<BaseIO> readio = createIO("HDF5", path);
readio->open(FileMode::ReadOnly);
- References:
- See createIO and HDF5IO
Reading NWB neurodata_types
Reading known RegisteredType objects
When the path and type of objects are fixed in the schema (or we know them based on other conventions), we can read the types directly from the file. E.g., here we first read the NWBFile directly, which we know exists at the root "/" of the file. We then read the ElectrodeTable via the predefined NWBFile::readElectrodeTable method. The advantage of this approach is that we do not need to manually specify paths or object types. Similarly, when we read the location column, we do not need to specify its name or the data type to use.
auto readNWBFile =
auto readElectrodeTable = readNWBFile->readElectrodeTable();
auto locationColumn = readElectrodeTable->readLocationColumn();
auto locationColumnValues = locationColumn->readData()->values();
std::vector<std::string> expectedLocationValues = {
"unknown", "unknown", "unknown", "unknown"};
REQUIRE(locationColumnValues.data == expectedLocationValues);
Searching for RegisteredType objects
When paths are not fixed, we can use the findTypes() function of our I/O object to conveniently search for objects with a given type.
std::unordered_set<std::string> typesToSearch = {"core::ElectricalSeries"};
std::unordered_map<std::string, std::string> found_electrical_series =
readio->findTypes(
"/",
typesToSearch,
);
- Note
- Any RegisteredType object (such as our NWBFile) provides the convenience method findOwnedTypes, which uses findTypes() to search within the given object (so that we don't need to specify the path argument). By default, findOwnedTypes uses the STOP_ON_TYPE mode, i.e., the search does not recurse further into defined types (hence, returning only the data elements that the object owns directly). Alternatively, we can set the search mode to CONTINUE_ON_TYPE to search recursively through all types (here the whole file, since we started at the root "/").
- Warning
- The current implementation of findTypes() is not aware of inheritance; it searches for exact matches of types only. However, we can search for objects of multiple different types at once by specifying all of them in our typesToSearch set.
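The two search modes and the exact-match behavior can be illustrated with a small, self-contained model. This is not AqNWB code: the flat path-to-type map, the Mode enum, and findTypesModel are hypothetical, and the model assumes STOP_ON_TYPE means "do not descend into any typed object below the search root", following the description of the modes in the note above.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins for the two search modes (not AqNWB's enum).
enum class Mode { StopOnType, ContinueOnType };

// True if `path` lies strictly inside `parent`, e.g. "/a/b" inside "/a".
bool isInside(const std::string& path, const std::string& parent)
{
  const std::string prefix = (parent == "/") ? "/" : parent + "/";
  return path.size() > prefix.size()
      && path.compare(0, prefix.size(), prefix) == 0;
}

// Simplified model of a type search. `typed` maps the path of every typed
// object in a file to its full type name, e.g. "core::ElectricalSeries".
std::unordered_map<std::string, std::string> findTypesModel(
    const std::map<std::string, std::string>& typed,
    const std::string& root,
    const std::unordered_set<std::string>& typesToSearch,
    Mode mode)
{
  std::unordered_map<std::string, std::string> found;
  for (const auto& [path, type] : typed) {
    if (!isInside(path, root)) {
      continue;  // outside the search root
    }
    if (mode == Mode::StopOnType) {
      // Skip objects owned by another typed object below the root.
      bool nested = false;
      for (const auto& entry : typed) {
        if (isInside(entry.first, root) && isInside(path, entry.first)) {
          nested = true;
          break;
        }
      }
      if (nested) {
        continue;
      }
    }
    // Exact string match only: a subtype name does NOT match its parent type.
    if (typesToSearch.count(type) > 0) {
      found[path] = type;
    }
  }
  return found;
}
```

With a hypothetical layout containing "/acquisition/esdata0" (ElectricalSeries), "/acquisition/spikes" (SpikeEventSeries), and an ElectricalSeries nested inside a ProcessingModule, a stop-on-type search from "/" returns only esdata0, a continue-on-type search also returns the nested series, and the SpikeEventSeries is never matched. Starting the search at an object's own path mirrors what findOwnedTypes does.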
The found_electrical_series map provides us with the path to each matching object (key) and its corresponding type (value). Using this information, we can read the neurodata_type objects from the file via the RegisteredType::create factory methods, which conveniently construct an instance of the corresponding class in AqNWB.
std::string esdata_path = "/acquisition/esdata0";
auto readElectricalSeries =
readio);
- Note
- findTypes does not guarantee that objects are returned in any particular order. Instead of retrieving the first object via found_electrical_series.begin()->first, we here fix the esdata_path variable to ensure consistent behavior of the tutorial across platforms.
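If you want to pick a match programmatically rather than hard-coding a path, one option is to sort the returned paths first, so that the choice no longer depends on hash-map iteration order. A minimal sketch; sortedPaths is a hypothetical helper, and lexicographic order is just one reasonable convention.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Return the keys (paths) of a path -> type map in lexicographic order, so that
// "the first match" is the same on every platform and standard library.
std::vector<std::string> sortedPaths(
    const std::unordered_map<std::string, std::string>& found)
{
  std::vector<std::string> paths;
  paths.reserve(found.size());
  for (const auto& entry : found) {
    paths.push_back(entry.first);
  }
  std::sort(paths.begin(), paths.end());
  return paths;
}
```

Note that plain lexicographic order places "esdata10" before "esdata2"; if paths carry numeric suffixes, a natural-order comparison may be preferable.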
- Note
- RegisteredType::create comes in a few different flavors:
- When passing 1) the path and 2) the io object, and 3) specifying the type as a template parameter (as in the example above), the instance is constructed using the common constructor, and we get a pointer to the specific type directly. I.e., the above example is equivalent to constructing the object directly via auto readElectricalSeries = ElectricalSeries(path, io).
- When passing only 1) the path and 2) the io object, AqNWB reads the neurodata_type and namespace attributes from the NWB file to determine the type to use (e.g., ElectricalSeries). The generated instance is then returned as a generic RegisteredType pointer that we can cast to the specific type if necessary, e.g., via auto readElectricalSeries = std::dynamic_pointer_cast<AQNWB::NWB::ElectricalSeries>(readRegisteredType);.
- When passing 1) the fullname (e.g., core::ElectricalSeries), 2) the path, and 3) the io object, the behavior is the same as in option 2, but we avoid reading the neurodata_type and namespace attributes from the file to determine the type. This option is useful after calling findTypes, because the search has already determined the type information, so we can use found_electrical_series.begin()->second to set the fullname.
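The flavors above follow a common C++ pattern: a registry that maps a full type name to a constructor and returns a base-class pointer, plus a templated shortcut that constructs the concrete type directly. The sketch below models that pattern with hypothetical Base and Series classes and free create functions. It is not AqNWB's RegisteredType implementation, which additionally takes the io object and, in the path-only flavor, first reads the neurodata_type and namespace attributes to form the full name.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical base class standing in for RegisteredType.
struct Base
{
  explicit Base(std::string p) : path(std::move(p)) {}
  virtual ~Base() = default;  // polymorphic, so dynamic_pointer_cast works
  std::string path;
};

// Hypothetical concrete type standing in for, e.g., ElectricalSeries.
struct Series : Base
{
  using Base::Base;
};

using Factory = std::function<std::shared_ptr<Base>(const std::string&)>;

// Registry mapping a full type name ("namespace::type") to a factory.
std::unordered_map<std::string, Factory>& registry()
{
  static std::unordered_map<std::string, Factory> instance = {
      {"core::Series",
       [](const std::string& p) { return std::make_shared<Series>(p); }}};
  return instance;
}

// Flavor with a full type name: look up the factory and return a base
// pointer, or nullptr if the type is not registered.
std::shared_ptr<Base> create(const std::string& fullName, const std::string& path)
{
  auto it = registry().find(fullName);
  if (it == registry().end()) {
    return nullptr;
  }
  return it->second(path);
}

// Flavor with a template parameter: construct the concrete type directly.
template <typename T>
std::shared_ptr<T> create(const std::string& path)
{
  return std::make_shared<T>(path);
}
```

Usage: create("core::Series", path) yields a std::shared_ptr<Base> that std::dynamic_pointer_cast<Series> can recover, while create<Series>(path) returns the typed pointer without a lookup.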
Reading data from RegisteredType objects
Now we can read fields from our RegisteredType objects, as well as subsets of the data stored in those fields.
Reading predefined data fields
For fields with a predefined, fixed name in the schema, AqNWB provides read methods for convenient access to such common data fields.
auto readElectricalSeriesData = readElectricalSeries->readData();
auto readDataValues = readElectricalSeriesData->values();
auto readBoostMultiArray = readDataValues.as_multi_array<2>();
REQUIRE(readDataValues.data.size() == (numSamples * numChannels));
REQUIRE(readDataValues.shape[0] == numSamples);
REQUIRE(readDataValues.shape[1] == numChannels);
std::vector<SizeType> start = {0, 0};
std::vector<SizeType> count = {9, 1};
auto dataSlice = readElectricalSeriesData->values(start, count);
REQUIRE(dataSlice.data.size() == 9);
REQUIRE(dataSlice.shape[0] == 9);
REQUIRE(dataSlice.shape[1] == 1);
std::string esUnitValue =
readElectricalSeries->readDataUnit()->values().data[0];
REQUIRE(esUnitValue == std::string("volts"));
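The start/count pair above selects a rectangular region: with shape {numSamples, numChannels}, start = {0, 0} and count = {9, 1} select the first 9 samples of the first channel. Assuming row-major (C-order) layout, the sketch below shows which elements such a selection picks from a flattened 2D array. slice2d is a hypothetical helper for illustration; with ReadDataWrapper::values only the selected part is requested from the file.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Extract a rectangular block from a 2D array stored row-major in `data`
// with shape {rows, cols}; `start` and `count` each hold {row, col}.
template <typename T>
std::vector<T> slice2d(const std::vector<T>& data,
                       const std::vector<std::size_t>& shape,
                       const std::vector<std::size_t>& start,
                       const std::vector<std::size_t>& count)
{
  if (shape.size() != 2 || start.size() != 2 || count.size() != 2) {
    throw std::invalid_argument("expected 2D shape, start and count");
  }
  if (start[0] + count[0] > shape[0] || start[1] + count[1] > shape[1]) {
    throw std::out_of_range("selection exceeds data shape");
  }
  std::vector<T> out;
  out.reserve(count[0] * count[1]);
  for (std::size_t r = start[0]; r < start[0] + count[0]; ++r) {
    for (std::size_t c = start[1]; c < start[1] + count[1]; ++c) {
      out.push_back(data[r * shape[1] + c]);
    }
  }
  return out;  // shape {count[0], count[1]}, also row-major
}
```

For a 4 x 3 array holding 0..11, start = {0, 0} and count = {3, 1} pick the first column, i.e. the values 0, 3 and 6.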
- Note
- For attributes, slicing is disabled at compile time since attributes are intended for small data only.
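One way such a compile-time restriction can be expressed is to let the slicing overload participate in overload resolution only for datasets. The following sketch does this with std::enable_if; Field, Kind, and can_slice are hypothetical names, and this is not necessarily how AqNWB implements the check.

```cpp
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical storage kinds, mirroring the dataset/attribute distinction.
enum class Kind { Dataset, Attribute };

template <Kind kind>
class Field
{
public:
  explicit Field(std::vector<double> data) : m_values(std::move(data)) {}

  // Read all values: available for datasets and attributes alike.
  const std::vector<double>& values() const { return m_values; }

  // Read `count` values starting at `start`. This overload is only viable
  // when kind == Kind::Dataset, so calling it on an attribute fails to compile.
  template <Kind K = kind, std::enable_if_t<K == Kind::Dataset, int> = 0>
  std::vector<double> values(std::size_t start, std::size_t count) const
  {
    if (start > m_values.size() || count > m_values.size() - start) {
      throw std::out_of_range("slice exceeds field size");
    }
    auto first = m_values.begin() + static_cast<std::ptrdiff_t>(start);
    return std::vector<double>(first, first + static_cast<std::ptrdiff_t>(count));
  }

private:
  std::vector<double> m_values;
};

// Detects at compile time whether values(start, count) is callable on F.
template <typename F, typename = void>
struct can_slice : std::false_type {};

template <typename F>
struct can_slice<F, std::void_t<decltype(std::declval<const F&>().values(
                        std::size_t{0}, std::size_t{0}))>> : std::true_type {};

static_assert(can_slice<Field<Kind::Dataset>>::value, "datasets can be sliced");
static_assert(!can_slice<Field<Kind::Attribute>>::value, "attributes cannot be sliced");
```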
Reading arbitrary fields
Even if there is no dedicated DEFINE_FIELD definition available, we can still read any arbitrary sub-field associated with a particular RegisteredType via the generic RegisteredType::readField method. For example, to read the data from the ElectricalSeries:
auto readElectricalSeriesData3 =
readElectricalSeries->readField<StorageObjectType::Dataset, float>(
std::string("data"));
auto readDataValues3 = readElectricalSeriesData3->values();
REQUIRE(readDataValues3.data.size() == (numSamples * numChannels));
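- Note
- Using this approach, we need to specify the template parameters to use with the ReadDataWrapper, i.e., the storage object type of the field (here StorageObjectType::Dataset) and the data type of its values (here float).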
- Warning
- In particular for fields that are optional, it is useful to first check that the field actually exists via ReadDataWrapper::exists.
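The check-before-read idea can also be expressed by returning std::optional, so that callers must handle a missing optional field explicitly. This sketch works on a hypothetical in-memory FieldStore rather than a real file; readIfExists is likewise hypothetical. With AqNWB you would call ReadDataWrapper::exists before requesting values.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the fields stored under one object.
using FieldStore = std::map<std::string, std::vector<float>>;

// Check existence first and report absence explicitly via std::optional,
// instead of letting the read itself fail.
std::optional<std::vector<float>> readIfExists(const FieldStore& store,
                                               const std::string& name)
{
  auto it = store.find(name);
  if (it == store.end()) {
    return std::nullopt;  // optional field not present
  }
  return it->second;
}
```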
Similarly, we can also read any sub-fields that are themselves RegisteredType objects:
auto readRegisteredType = readNWBFile->readField(esdata_path);
std::shared_ptr<AQNWB::NWB::ElectricalSeries> readElectricalSeries2 =
std::dynamic_pointer_cast<AQNWB::NWB::ElectricalSeries>(
readRegisteredType);
REQUIRE(readElectricalSeries2 != nullptr);
- Note
- Even though we do not specify the template parameter for RegisteredType::create here, the function still creates the correct type by reading the type information from the NWB file. However, because we do not specify the type, the function returns the object as a RegisteredType pointer, which we can subsequently cast to the appropriate type if necessary.
Working with fields with unknown data type
C++ is a statically typed language, i.e., we need to know the type of every variable at compile time. This can be particularly challenging when reading data from disk, where the data type may not be known beforehand. AqNWB helps us here by allocating memory and determining data types for us when reading data fields. However, when we want to compute on the data, we still need to know the data type, e.g., to use the typed DataBlock<DTYPE> we need to know the DTYPE.
Using std::variant with std::visit (both introduced in C++17) provides an alternative approach that can help us avoid writing complex switch/case statements to check for all possible types when we don't know the data type beforehand. E.g., using std::visit, we can define a set of functions to compute the mean for any 1D std::vector:
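#include <numeric>      // std::accumulate
#include <stdexcept>    // std::runtime_error
#include <string>
#include <type_traits>  // std::decay_t, std::is_same_v
#include <variant>      // std::visit, std::monostate
#include <vector>

// Mean of any container of numeric values (e.g., std::vector<float>).
template<typename T>
inline double compute_mean(const T& data)
{
  if (data.empty()) {
    throw std::runtime_error("Data vector is empty");
  }
  double sum = std::accumulate(data.begin(), data.end(), 0.0);
  return sum / data.size();
}

// Mean of a BaseDataVectorVariant: std::visit calls the lambda with the
// std::vector alternative that the variant currently holds.
inline double compute_mean(const BaseDataType::BaseDataVectorVariant& variant)
{
  return std::visit(
    [](auto&& arg) -> double
    {
      using T = std::decay_t<decltype(arg)>;
      if constexpr (std::is_same_v<T, std::monostate>) {
        // The variant holds no valid data type
        throw std::runtime_error("Invalid data type");
      } else if constexpr (std::is_same_v<T, std::vector<std::string>>) {
        throw std::runtime_error("Cannot compute mean of string data");
      } else {
        // Numeric vector: dispatch to the templated overload above
        return compute_mean(arg);
      }
    },
    variant);
}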
Using DataBlockGeneric::as_variant, we can then cast our data to a BaseDataVectorVariant, which is a std::variant representing a 1D std::vector containing values of any valid BaseDataType. We can then use our compute_mean methods to conveniently compute on the data without having to explicitly specify the type of the data ourselves.
DataBlockGeneric genericDataBlock =
readElectricalSeriesData->valuesGeneric();
BaseDataType::BaseDataVectorVariant variantData =
genericDataBlock.as_variant();
double meanFromVariant = compute_mean(variantData);
double meanFromTypedVector =
compute_mean<std::vector<float>>(readDataValues.data);
REQUIRE(meanFromVariant == Catch::Approx(meanFromTypedVector));
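When several downstream routines all expect the same element type, an alternative to writing one visitor per computation is to convert once with std::visit and then work with a single concrete type. The sketch below converts any numeric alternative to std::vector<double>. VectorVariant is a hypothetical stand-in for BaseDataType::BaseDataVectorVariant, whose actual list of alternatives may differ. Converting to double can lose precision for 64-bit integers larger than 2^53.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical stand-in for BaseDataType::BaseDataVectorVariant.
using VectorVariant = std::variant<std::monostate,
                                   std::vector<std::uint8_t>,
                                   std::vector<std::int32_t>,
                                   std::vector<std::int64_t>,
                                   std::vector<float>,
                                   std::vector<double>,
                                   std::vector<std::string>>;

// Convert any numeric alternative to std::vector<double> so that downstream
// code can be written once against a single concrete type.
std::vector<double> toDoubleVector(const VectorVariant& variant)
{
  return std::visit(
      [](const auto& arg) -> std::vector<double>
      {
        using T = std::decay_t<decltype(arg)>;
        if constexpr (std::is_same_v<T, std::monostate>) {
          throw std::runtime_error("Invalid data type");
        } else if constexpr (std::is_same_v<T, std::vector<std::string>>) {
          throw std::runtime_error("Cannot convert string data to double");
        } else {
          // Element-wise conversion via the iterator-range constructor
          return std::vector<double>(arg.begin(), arg.end());
        }
      },
      variant);
}
```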
Further reading