Chunking
For datasets intended for recording, AqNWB uses chunking to ensure the dataset can be extended as new data arrives during the recording process. With chunking in HDF5, a dataset is divided into fixed-size blocks (called chunks), which are stored separately in the file. This technique is particularly beneficial for large datasets and offers several advantages:
- Extend datasets: Chunked datasets can be easily extended in any dimension. This flexibility is crucial for recording datasets where the size of the dataset is not known in advance.
- Performance Optimization: By carefully choosing the chunk size, you can optimize performance based on your particular read/write access patterns. When only a portion of a chunked dataset is accessed, only the relevant chunks are read or written, reducing the amount of I/O operations.
- Compression: Data within each chunk can be compressed independently, which can significantly reduce data size, especially for datasets with redundancy.
- Warning
- Choosing a chunking configuration that does not align well with the intended read/write pattern can reduce performance, either because the same chunk is repeatedly read, decompressed, and updated, or because extra data is read, since chunks are always read in full.
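To make the cost of a misaligned access pattern concrete, the following minimal sketch counts how many chunks a contiguous one-dimensional access overlaps and how many elements must therefore be read. The helpers chunksTouched and elementsRead are hypothetical names introduced for illustration; they are not part of AqNWB or HDF5, and the sketch ignores compression and the HDF5 chunk cache.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of AqNWB or HDF5): number of chunks that a
// contiguous 1D access covering elements [offset, offset + count) overlaps.
std::size_t chunksTouched(std::size_t offset,
                          std::size_t count,
                          std::size_t chunkSize)
{
  assert(chunkSize > 0);
  if (count == 0) {
    return 0;
  }
  const std::size_t firstChunk = offset / chunkSize;
  const std::size_t lastChunk = (offset + count - 1) / chunkSize;
  return lastChunk - firstChunk + 1;
}

// Hypothetical helper: number of elements that must be read to serve the
// access, assuming every overlapped chunk is read in full.
std::size_t elementsRead(std::size_t offset,
                         std::size_t count,
                         std::size_t chunkSize)
{
  return chunksTouched(offset, count, chunkSize) * chunkSize;
}
```

For example, with a chunk size of 1000 elements, reading 1000 elements starting at offset 0 touches one chunk, whereas the same read starting at offset 500 touches two chunks and so reads 2000 elements to return 1000.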
Single-Writer Multiple-Reader (SWMR) Mode
By default, the HDF5IO I/O backend uses SWMR mode while recording data. With SWMR, one process can write to the HDF5 file while multiple other processes read from the file concurrently, and the readers are guaranteed to see a consistent view of the data.
- Warning
- There are known issues using SWMR mode on Windows due to file locking by the reader processes. One workaround is to set the environment variable HDF5_USE_FILE_LOCKING=FALSE to prevent file access errors when using a writer process with other reader processes.
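The variable is usually set in the shell that launches the processes. As an alternative, the following sketch sets it from within a process using only standard environment calls. The function name disableHdf5FileLocking is hypothetical, and the sketch assumes the variable has to be in place before the process first uses the HDF5 library.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of AqNWB or HDF5): sets
// HDF5_USE_FILE_LOCKING=FALSE for the current process.
// Assumption: call this before the first HDF5 file is opened.
bool disableHdf5FileLocking()
{
#ifdef _WIN32
  // Windows C runtime
  return _putenv_s("HDF5_USE_FILE_LOCKING", "FALSE") == 0;
#else
  // POSIX; the final argument 1 overwrites any existing value
  return setenv("HDF5_USE_FILE_LOCKING", "FALSE", 1) == 0;
#endif
}
```

A change made this way only affects the calling process and any child processes it starts afterwards, so each reader and writer process would need the setting.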
Why does AqNWB use SWMR mode?
Using SWMR has several key advantages for data acquisition applications:
- Concurrent Access: Enables one writer process to update the file while multiple reader processes read from it without blocking each other.
- Data Consistency and Integrity: Ensures that readers see a consistent view of the data, even as it is being written. Readers will only see data that has been completely written and flushed to disk. Hence, SWMR mode maintains the integrity and consistency of the data, so that the HDF5 file remains readable even if errors occur during the data acquisition process.
- Real-Time Data Access: Useful for applications that need to monitor and analyze data in real-time as it is being generated.
- Simplified Workflow for Real-Time Analyses: Simplifies the architecture of applications that require real-time data consumption during acquisition, avoiding the need for intermediate storage solutions and complex inter-process communication or file locking mechanisms.
- Note
- While SWMR mode ensures data integrity, some data loss may still occur if the application crashes. Only data that has been completely written and flushed to disk will be readable. To manually flush data to disk use HDF5IO::flush.
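Because only flushed data survives a crash, how often the writer flushes bounds how much data can be lost. The sketch below shows one possible way to flush every fixed number of written blocks. The FlushPolicy class is hypothetical and not part of AqNWB; it only does the counting and leaves the actual flush to the caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of AqNWB): signals a flush after every
// `interval` written blocks, so fewer than `interval` blocks are ever
// unflushed between two signals.
class FlushPolicy
{
public:
  explicit FlushPolicy(std::size_t interval)
      : m_interval(interval)
  {
    assert(interval > 0);
  }

  // Call once after each block is written.
  // Returns true when the caller should flush now.
  bool blockWritten()
  {
    ++m_pending;
    if (m_pending >= m_interval) {
      m_pending = 0;
      return true;
    }
    return false;
  }

  // Number of blocks written since the last flush signal.
  std::size_t pendingBlocks() const { return m_pending; }

private:
  std::size_t m_interval;
  std::size_t m_pending = 0;
};
```

In a recording loop, the caller would invoke HDF5IO::flush whenever blockWritten() returns true. A smaller interval reduces potential data loss at the cost of more frequent flushes.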
Writing an NWB file with SWMR mode
SWMR mode is enabled when calling HDF5IO::startRecording. Once SWMR mode is enabled, no new data objects (Datasets, Groups, Attributes, etc.) can be created; we can only add data to and set values of existing data objects. Since other processes may be reading from the HDF5 file, it is not possible to temporarily disable SWMR mode to add new objects, i.e., once SWMR mode is enabled, the only way to add new objects to the file is to close the file and reopen it in read/write mode. As such, the typical workflow when using SWMR mode during data acquisition is to:
- Open the HDF5 file
- Create all elements of the NWB file
- Start the recording process
- Stop recording and close the file
This workflow is applicable to a wide range of data acquisition use cases. However, for use cases that require creation of new Groups and Datasets during acquisition, you can disable the use of SWMR mode by setting disableSWMRMode=true when constructing the AQNWB::HDF5::HDF5IO object.
- Warning
- While disabling SWMR mode allows Groups and Datasets to be created during and after recording, this comes at the cost of losing the concurrent access and data integrity features that SWMR mode provides.
Code Examples
This code snippet shows all the includes that are being used by the code examples shown in this section:
#include <filesystem>
#include <future>
#include <iostream>
#include <memory>
#include <numeric>
#include <string>
#include <vector>
#include <catch2/catch_test_macros.hpp>
#include "testUtils.hpp"
namespace fs = std::filesystem;
Workflow with SWMR
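// Create and open the file with SWMR mode enabled (the default)
std::string path = getTestFilePath("testWithSWMRMode.h5");
std::unique_ptr<HDF5::HDF5IO> hdf5io = std::make_unique<HDF5::HDF5IO>(path);
hdf5io->open();
// Create the dataset; all objects must exist before recording starts
std::vector<int> testData(10000);
std::iota(testData.begin(), testData.end(), 1);  // fill with 1, 2, ..., 10000
std::string dataPath = "/data";
SizeType numBlocks = 10;  // number of blocks to write (value assumed for illustration)
SizeType numSamples = testData.size();  // samples written per block
std::unique_ptr<BaseRecordingData> dataset = hdf5io->createArrayDataSet(
    BaseDataType::I32,
    dataPath);
// Start recording; this enables SWMR mode
Status status = hdf5io->startRecording();
REQUIRE(status == Status::Success);
// Once recording has started, no new objects can be created
REQUIRE(hdf5io->canModifyObjects() == false);
// Write the data block by block, flushing after each block
for (SizeType b = 0; b < numBlocks; b++) {
  std::vector<SizeType> dataShape = {numSamples};
  dataset->writeDataBlock(dataShape, BaseDataType::I32, &testData[0]);
  status = hdf5io->flush();
  REQUIRE(status == Status::Success);
}
// Stop recording; with SWMR enabled the file is closed and
// recording cannot be restarted
status = hdf5io->stopRecording();
REQUIRE(hdf5io->isOpen() == false);
REQUIRE(hdf5io->startRecording() == Status::Failure);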
Workflow with SWMR disabled
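// Create and open the file with SWMR mode disabled
std::string path = getTestFilePath("testWithoutSWMRMode.h5");
std::unique_ptr<HDF5::HDF5IO> hdf5io =
    std::make_unique<HDF5::HDF5IO>(path,
                                   true  // disableSWMRMode
    );
hdf5io->open();
// Create the dataset
std::vector<int> testData(10000);
std::iota(testData.begin(), testData.end(), 1);  // fill with 1, 2, ..., 10000
std::string dataPath = "/data";
SizeType numBlocks = 10;  // number of blocks to write (value assumed for illustration)
SizeType numSamples = testData.size();  // samples written per block
std::unique_ptr<BaseRecordingData> dataset = hdf5io->createArrayDataSet(
    BaseDataType::I32,
    dataPath);
// Start recording
Status status = hdf5io->startRecording();
REQUIRE(status == Status::Success);
// With SWMR mode disabled, objects can still be created and modified
REQUIRE(hdf5io->canModifyObjects() == true);
// Write the data block by block, flushing after each block
for (SizeType b = 0; b < numBlocks; b++) {
  std::vector<SizeType> dataShape = {numSamples};
  dataset->writeDataBlock(dataShape, BaseDataType::I32, &testData[0]);
  status = hdf5io->flush();
  REQUIRE(status == Status::Success);
}
// Stop recording; with SWMR mode disabled the file stays open
status = hdf5io->stopRecording();
REQUIRE(hdf5io->isOpen() == true);
// Recording can be restarted because the file is still open
REQUIRE(hdf5io->startRecording() == Status::Success);
hdf5io->stopRecording();
// The file must be closed explicitly
hdf5io->close();
REQUIRE(hdf5io->isOpen() == false);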
Reading with SWMR mode
While the file is being written to in SWMR mode, readers must open the file in SWMR read mode by combining the H5F_ACC_RDONLY and H5F_ACC_SWMR_READ flags in the call to H5Fopen, e.g.:
// Open the file read-only with SWMR read access
hid_t file_id = H5Fopen("example.h5", H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, H5P_DEFAULT);