aqnwb 0.1.0
HDF5 I/O

Chunking

For datasets intended for recording, AqNWB uses chunking to ensure the dataset can be extended as new data arrives during the recording process. With chunking, HDF5 divides a dataset into fixed-size blocks (called chunks), which are stored separately in the file. This technique is particularly beneficial for large datasets and offers several advantages:

  • Extend datasets: Chunked datasets can be easily extended in any dimension. This flexibility is crucial for recording datasets where the size of the dataset is not known in advance.
  • Performance Optimization: By carefully choosing the chunk size, you can optimize performance based on your particular read/write access patterns. When only a portion of a chunked dataset is accessed, only the relevant chunks are read or written, reducing the number of I/O operations.
  • Compression: Data within each chunk can be compressed independently, which can help to significantly reduce data size, especially for datasets with redundancy.
Warning
Choosing a chunking configuration that does not align well with the desired read/write pattern may reduce performance, either because the same chunk is repeatedly read, decompressed, and updated, or because extra data is read, as chunks are always read in full.
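
To make the cost of a misaligned access pattern concrete, the following minimal sketch estimates how many chunks a contiguous read from a 1D chunked dataset touches and how many elements are transferred as a result. The names ChunkReadCost and chunkReadCost are hypothetical and not part of AqNWB or HDF5. The model is deliberately simplified: it assumes every touched chunk is read in full and ignores the chunk cache and compression.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not part of AqNWB or HDF5): estimated cost of
// reading a contiguous range of elements from a 1D chunked dataset.
struct ChunkReadCost {
  std::size_t chunksTouched;  // number of chunks that must be read in full
  std::size_t elementsRead;   // elements transferred, including unneeded ones
};

// Estimate the cost of reading elements [start, start + count) when the
// dataset is stored in chunks of chunkSize elements. Simplified model that
// assumes every touched chunk is read completely.
ChunkReadCost chunkReadCost(std::size_t start,
                            std::size_t count,
                            std::size_t chunkSize)
{
  assert(chunkSize > 0);
  if (count == 0) {
    return {0, 0};
  }
  const std::size_t firstChunk = start / chunkSize;
  const std::size_t lastChunk = (start + count - 1) / chunkSize;
  const std::size_t touched = lastChunk - firstChunk + 1;
  return {touched, touched * chunkSize};
}
```

Under this model, with a chunk size of 1000, reading the 100 elements starting at index 950 straddles a chunk boundary and therefore touches 2 chunks (2000 elements), whereas reading elements 0 to 999 touches exactly 1 chunk.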

Single-Writer Multiple-Reader (SWMR) Mode

By default, the HDF5IO I/O backend uses SWMR mode while recording data. With SWMR, one process can write to the HDF5 file while multiple other processes read from it concurrently, and the readers are guaranteed to see a consistent view of the data.

Warning
There are known issues using SWMR mode on Windows due to file locking by the reader processes. One workaround is to set the environment variable HDF5_USE_FILE_LOCKING=FALSE to prevent file access errors when using a writer process with other reader processes.
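
The environment variable is usually set in the shell or launcher that starts the processes. As an alternative, the sketch below sets it from within a program. The helper names disableHdf5FileLocking and isHdf5FileLockingDisabled are hypothetical and not part of AqNWB. The sketch assumes the variable is set before the first HDF5 call in the process, since the point at which HDF5 reads the variable may differ between library versions.

```cpp
#include <stdlib.h>  // setenv (POSIX) or _putenv_s (Windows)
#include <cstdlib>   // std::getenv
#include <string>

// Hypothetical helper (not part of AqNWB): set HDF5_USE_FILE_LOCKING=FALSE
// for the current process. Assumed to be called before any HDF5 call.
// Returns true if the variable was set successfully.
bool disableHdf5FileLocking()
{
#ifdef _WIN32
  return _putenv_s("HDF5_USE_FILE_LOCKING", "FALSE") == 0;
#else
  return setenv("HDF5_USE_FILE_LOCKING", "FALSE", /*overwrite=*/1) == 0;
#endif
}

// Hypothetical helper: check whether the workaround is active in the
// environment of the current process.
bool isHdf5FileLockingDisabled()
{
  const char* value = std::getenv("HDF5_USE_FILE_LOCKING");
  return value != nullptr && std::string(value) == "FALSE";
}
```

Setting the variable this way only affects the current process and processes it starts afterwards. Other processes that access the file need the variable in their own environment.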

Why does AqNWB use SWMR mode?

Using SWMR has several key advantages for data acquisition applications:

  • Concurrent Access: Enables one writer process to update the file while multiple reader processes read from it without blocking each other.
  • Data Consistency and Integrity: Ensures that readers see a consistent view of the data, even as it is being written. Readers will only see data that has been completely written and flushed to disk. Hence, SWMR mode maintains the integrity and consistency of the data, ensuring that the HDF5 file remains readable even if errors occur during the data acquisition process.
  • Real-Time Data Access: Useful for applications that need to monitor and analyze data in real-time as it is being generated.
  • Simplified Workflow for Real Time Analyses: Simplifies the architecture of applications that require real-time data consumption during acquisition, avoiding the need for intermediate storage solutions and complex inter-process communication or file locking mechanisms.
Note
While SWMR mode ensures data integrity, some data loss may still occur if the application crashes. Only data that has been completely written and flushed to disk will be readable. To manually flush data to disk use HDF5IO::flush.
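
Because only flushed data survives a crash, flushing at a regular interval bounds how much data can be lost. The following sketch shows one possible way to track this. FlushScheduler is a hypothetical helper, not part of AqNWB; it only counts samples and leaves the actual call to HDF5IO::flush to the caller.

```cpp
#include <cstddef>

// Hypothetical helper (not part of AqNWB): tracks how many samples have been
// written since the last flush and signals when a flush is due.
class FlushScheduler
{
public:
  // maxUnflushed: upper bound on samples we accept to lose in a crash
  explicit FlushScheduler(std::size_t maxUnflushed)
      : m_maxUnflushed(maxUnflushed)
  {
  }

  // Record that numSamples were written. Returns true if the number of
  // unflushed samples has reached the limit and the caller should flush.
  bool recordWrite(std::size_t numSamples)
  {
    m_unflushed += numSamples;
    return m_unflushed >= m_maxUnflushed;
  }

  // Reset the counter. Intended to be called after a successful flush.
  void markFlushed() { m_unflushed = 0; }

  // Number of samples written since the last flush
  std::size_t unflushed() const { return m_unflushed; }

private:
  std::size_t m_maxUnflushed;
  std::size_t m_unflushed = 0;
};
```

In a recording loop, the caller would invoke recordWrite after each write and, when it returns true, call HDF5IO::flush followed by markFlushed. Flushing more often reduces the potential data loss but adds I/O overhead, so the limit is a trade-off that depends on the application.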

Writing an NWB file with SWMR mode

SWMR mode is enabled when calling HDF5IO::startRecording. Once SWMR mode is enabled, no new data objects (Datasets, Groups, Attributes etc.) can be created; we can only add data to and set values in existing data objects. Since other processes may read from the HDF5 file, it is not possible to intermittently disable SWMR mode to add new objects, i.e., once SWMR mode is enabled, the only way to add new objects to the file is to close the file and reopen it in read/write mode. As such, the typical workflow when using SWMR mode during data acquisition is to:

  1. Open the HDF5 file
  2. Create all elements of the NWB file
  3. Start the recording process
  4. Stop recording and close the file

This workflow is applicable to a wide range of data acquisition use-cases. However, for use cases that require creation of new Groups and Datasets during acquisition, you can disable the use of SWMR mode by setting disableSWMRMode=true when constructing the AQNWB::HDF5::HDF5IO object.

Warning
While disabling SWMR mode allows Groups and Datasets to be created during and after recording, this comes at the cost of losing the concurrent access and data integrity features that SWMR mode provides.

Code Examples

This code snippet shows all the includes that are being used by the code examples shown in this section:

#include <filesystem>
#include <future>
#include <iostream>
#include <memory>
#include <numeric>
#include <vector>
#include <catch2/catch_test_macros.hpp>
#include "hdf5/HDF5IO.hpp"
#include "nwb/NWBFile.hpp"
#include "testUtils.hpp"
using namespace AQNWB;
namespace fs = std::filesystem;

Workflow with SWMR

// create and open the HDF5 file. SWMR mode is used by default
std::string path = getTestFilePath("testWithSWMRMode.h5");
std::unique_ptr<HDF5::HDF5IO> hdf5io = std::make_unique<HDF5::HDF5IO>(path);
hdf5io->open();
// add a dataset
std::vector<int> testData(10000);
std::iota(testData.begin(), testData.end(), 1); // Initialize testData
std::string dataPath = "/data";
SizeType numBlocks = 10; // number of data blocks to write
SizeType numSamples = testData.size(); // number of samples per block
std::unique_ptr<BaseRecordingData> dataset = hdf5io->createArrayDataSet(
    BaseDataType::I32, // type
    SizeArray {0}, // size. Initial size of the dataset
    SizeArray {1000}, // chunking. Size of a data chunk
    dataPath); // path. Path to the dataset in the HDF5 file
// Start recording. Starting the recording places the HDF5 file in SWMR mode
Status status = hdf5io->startRecording();
REQUIRE(status == Status::Success);
// Once in SWMR mode we can add data to the file but we can no longer create
// new data objects (Groups, Datasets, Attributes etc.).
REQUIRE(hdf5io->canModifyObjects() == false);
// write our testData to the file, one block per iteration
for (SizeType b = 0; b < numBlocks; b++) {
  // write a single 1D block of data and flush to file
  std::vector<SizeType> dataShape = {numSamples};
  dataset->writeDataBlock(dataShape, BaseDataType::I32, testData.data());
  // Optionally we can flush all data to disk
  status = hdf5io->flush();
  REQUIRE(status == Status::Success);
}
// stop recording. In SWMR mode the file is now closed and recording cannot
// be restarted
status = hdf5io->stopRecording();
REQUIRE(hdf5io->isOpen() == false);
REQUIRE(hdf5io->startRecording() == Status::Failure);

Workflow with SWMR disabled

// create and open the HDF5 file. With SWMR mode explicitly disabled
std::string path = getTestFilePath("testWithoutSWMRMode.h5");
std::unique_ptr<HDF5::HDF5IO> hdf5io =
    std::make_unique<HDF5::HDF5IO>(path,
                                   true // Disable SWMR mode
    );
hdf5io->open();
// add a dataset
std::vector<int> testData(10000);
std::iota(testData.begin(), testData.end(), 1); // Initialize testData
std::string dataPath = "/data";
SizeType numBlocks = 10; // number of data blocks to write
SizeType numSamples = testData.size(); // number of samples per block
std::unique_ptr<BaseRecordingData> dataset = hdf5io->createArrayDataSet(
    BaseDataType::I32, // type
    SizeArray {0}, // size. Initial size of the dataset
    SizeArray {1000}, // chunking. Size of a data chunk
    dataPath); // path. Path to the dataset in the HDF5 file
// Start recording. Since SWMR mode is disabled, starting the recording does
// not place the HDF5 file in SWMR mode
Status status = hdf5io->startRecording();
REQUIRE(status == Status::Success);
// With SWMR mode disabled we are still allowed to create new data objects
// (Groups, Datasets, Attributes etc.) during the recording. However, with
// SWMR mode disabled, we lose the data consistency and concurrent read
// features that SWMR mode provides.
REQUIRE(hdf5io->canModifyObjects() == true);
// write our testData to the file, one block per iteration
for (SizeType b = 0; b < numBlocks; b++) {
  // write a single 1D block of data and flush to file
  std::vector<SizeType> dataShape = {numSamples};
  dataset->writeDataBlock(dataShape, BaseDataType::I32, testData.data());
  // Optionally we can flush all data to disk
  status = hdf5io->flush();
  REQUIRE(status == Status::Success);
}
// stop recording.
status = hdf5io->stopRecording();
// Since SWMR mode is disabled, stopping the recording won't close the file
// so that we can restart the recording if we want to
REQUIRE(hdf5io->isOpen() == true);
// Restart the recording
REQUIRE(hdf5io->startRecording() == Status::Success);
// Stop the recording and close the file
hdf5io->stopRecording();
hdf5io->close();
REQUIRE(hdf5io->isOpen() == false);

Reading with SWMR mode

While the file is being written to in SWMR mode, readers must open the file read-only with SWMR read access enabled, i.e., by passing the flags H5F_ACC_RDONLY | H5F_ACC_SWMR_READ to H5Fopen, e.g.:

hid_t file_id = H5Fopen("example.h5", H5F_ACC_RDONLY | H5F_ACC_SWMR_READ, H5P_DEFAULT);
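
A reader that monitors a growing dataset typically remembers how many samples it has already processed and, on each poll, handles only the samples that were added since. The sketch below shows this bookkeeping in isolation. IncrementalReader and SampleRange are hypothetical names, not part of AqNWB or HDF5. The queryLength callback is an assumption that stands in for whatever refreshes the dataset and returns its current length (with the HDF5 C API this could, for example, be H5Drefresh followed by H5Dget_space and H5Sget_simple_extent_dims).

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Half-open range [first, second) of sample indices
using SampleRange = std::pair<std::size_t, std::size_t>;

// Hypothetical helper (not part of AqNWB or HDF5): tracks which samples of a
// growing 1D dataset a reader has already seen.
class IncrementalReader
{
public:
  // queryLength: assumed callback that refreshes the dataset and returns its
  // current number of samples
  explicit IncrementalReader(std::function<std::size_t()> queryLength)
      : m_queryLength(std::move(queryLength))
  {
  }

  // Return the range of samples that became visible since the last call.
  // The range is empty (first == second) if no new samples are available.
  SampleRange poll()
  {
    const std::size_t current = m_queryLength();
    const std::size_t begin = m_seen;
    if (current > m_seen) {
      m_seen = current;
    }
    return {begin, m_seen};
  }

private:
  std::function<std::size_t()> m_queryLength;
  std::size_t m_seen = 0;
};
```

A monitoring loop would call poll at a fixed interval and read only the returned range from the dataset, so that each sample is processed exactly once.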