In [1]:
import json
from dandi.dandiapi import DandiAPIClient
from tqdm.notebook import tqdm

In [2]:
client = DandiAPIClient()
dandisets = list(client.get_dandisets())

# Identify NWB dandisets
Most dandisets hold NWB-formatted data, but DANDI also holds data in other formats.

Let's start by filtering down to only the dandisets that contain at least one NWB file.

We can do this by querying the metadata of each dandiset, which lists the data formats it contains in `raw_metadata["assetsSummary"]["dataStandard"]`.

If no data has been uploaded to that dandiset, the "dataStandard" field is not present.

We handle this with the dictionary `.get` method, which falls back to an empty list so that `any` has nothing to iterate over.

In [3]:
nwb_dandisets = []

for dandiset in tqdm(dandisets):
    raw_metadata = dandiset.get_raw_metadata()

    if any(
        "NWB" in data_standard["name"]
        for data_standard in raw_metadata["assetsSummary"].get("dataStandard", [])
    ):
        nwb_dandisets.append(dandiset)
print(f"There are currently {len(nwb_dandisets)} NWB datasets on DANDI!")

  0%|          | 0/465 [00:00<?, ?it/s]

There are currently 277 NWB datasets on DANDI!
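The same check can be tried offline on hand-written dictionaries. The sketch below is illustrative only: `has_nwb_data` is a hypothetical helper, and the sample records are made up to mirror the `assetsSummary` layout described above, not real DANDI responses.

```python
def has_nwb_data(raw_metadata):
    """Return True if any listed data standard mentions NWB."""
    # Hypothetical helper. `.get` falls back to an empty list, so `any`
    # sees nothing to iterate over and returns False for empty dandisets.
    data_standards = raw_metadata["assetsSummary"].get("dataStandard", [])
    return any("NWB" in standard["name"] for standard in data_standards)


# Made-up records that mimic the layout of dandiset metadata.
sample_with_nwb = {
    "assetsSummary": {"dataStandard": [{"name": "Neurodata Without Borders (NWB)"}]}
}
sample_other_format = {
    "assetsSummary": {"dataStandard": [{"name": "Brain Imaging Data Structure (BIDS)"}]}
}
sample_empty = {"assetsSummary": {}}  # no data uploaded yet

for label, record in [
    ("nwb", sample_with_nwb),
    ("other", sample_other_format),
    ("empty", sample_empty),
]:
    print(label, has_nwb_data(record))
```

With live data, the same helper could be applied to `dandiset.get_raw_metadata()` inside the loop above.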


# Filtering dandisets: species
Let's use the `nwb_dandisets` list from the previous recipe and see which of them used mice in their study.

You can find this information in `raw_metadata["assetsSummary"]["species"]`.

We'll use the same `.get` trick as above to handle dandisets with no uploaded data.

In [4]:
mouse_nwb_dandisets = []

for dandiset in tqdm(nwb_dandisets):
    raw_metadata = dandiset.get_raw_metadata()

    if any(
        "mouse" in species["name"]
        for species in raw_metadata["assetsSummary"].get("species", [])
    ):
        mouse_nwb_dandisets.append(dandiset)
print(f"There are currently {len(mouse_nwb_dandisets)} NWB datasets on DANDI that use mice!")

There are currently 118 NWB datasets on DANDI that use mice!
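Before filtering on a substring such as "mouse", it can help to see which species names actually occur. The following is a minimal sketch under stated assumptions: `count_species` is a hypothetical helper, and the sample records and species labels are invented for illustration rather than taken from DANDI.

```python
from collections import Counter


def count_species(metadata_records):
    """Tally species names across an iterable of raw-metadata dicts."""
    counts = Counter()
    for raw_metadata in metadata_records:
        # Same `.get` fallback as above for dandisets without uploaded data.
        for species in raw_metadata["assetsSummary"].get("species", []):
            counts[species["name"]] += 1
    return counts


# Made-up records; the species labels are assumed examples.
sample_records = [
    {"assetsSummary": {"species": [{"name": "Mus musculus - House mouse"}]}},
    {"assetsSummary": {"species": [{"name": "Rattus norvegicus - Norway rat"}]}},
    {"assetsSummary": {"species": [{"name": "Mus musculus - House mouse"}]}},
    {"assetsSummary": {}},  # no data uploaded yet
]

species_counts = count_species(sample_records)
for name, n in species_counts.most_common():
    print(f"{n:>3}  {name}")
```

With live data, the records could be supplied as `(d.get_raw_metadata() for d in nwb_dandisets)`.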


# Filtering by session: species and sex
Let's say you have identified a dandiset of interest, "000005", and you want to identify all of the sessions on female mice.

You can do this by querying asset-level metadata.

Assets correspond to individual NWB files, and contain metadata extracted from those files.

The metadata of each asset contains a `.wasAttributedTo` attribute, which is a list of `Participant` objects corresponding to the subjects for that session.

For each participant, we first test that the `species` and `sex` attributes exist (are not `None`; some older dandisets may not have included them) and then check the value of each attribute's `name`.

In [5]:
dandiset = client.get_dandiset("000005")
female_mouse_nwb_sessions = []

assets = list(dandiset.get_assets())
for asset in tqdm(assets):
    asset_metadata = asset.get_metadata()
    # Fall back to an empty list if no participants were recorded
    subjects = asset_metadata.wasAttributedTo or []

    if any(
        subject.species and "mouse" in subject.species.name.lower()
        and subject.sex and subject.sex.name == "Female"
        for subject in subjects
    ):
        female_mouse_nwb_sessions.append(asset)
print(f"Dandiset #5 has {len(female_mouse_nwb_sessions)} out of {len(assets)} files that use female mice!")

Dandiset #5 has 69 out of 148 files that use female mice!
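The `None` checks in the session filter can be exercised without a network connection. This sketch uses `types.SimpleNamespace` objects as stand-ins for the participant objects. The helper names `is_female_mouse` and `session_has_female_mouse` and all sample values are hypothetical, chosen only to mirror the attribute access pattern used above.

```python
from types import SimpleNamespace


def is_female_mouse(subject):
    """Check one participant-like object, guarding against missing fields."""
    species = getattr(subject, "species", None)
    sex = getattr(subject, "sex", None)
    if species is None or sex is None:
        return False
    return "mouse" in species.name.lower() and sex.name == "Female"


def session_has_female_mouse(subjects):
    """Return True if any subject is a female mouse; `subjects` may be None."""
    return any(is_female_mouse(subject) for subject in subjects or [])


# Stand-in objects; real metadata comes from `asset.get_metadata()`.
mouse = SimpleNamespace(name="Mus musculus - House mouse")
female = SimpleNamespace(name="Female")
male = SimpleNamespace(name="Male")

sessions = {
    "female_mouse": [SimpleNamespace(species=mouse, sex=female)],
    "male_mouse": [SimpleNamespace(species=mouse, sex=male)],
    "missing_sex": [SimpleNamespace(species=mouse, sex=None)],
    "no_subjects": None,
}

for label, subjects in sessions.items():
    print(label, session_has_female_mouse(subjects))
```

Splitting the per-subject test from the per-session test keeps each `None` guard in one place, which makes it easier to add further criteria later.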


# Going beyond
These examples show only a few types of queries. Because the metadata structures are rich at both the dandiset and asset levels, many more complex queries are possible.

These metadata structures are also expanding over time as DANDI becomes stricter about what counts as essential metadata.

The `.get_raw_metadata` method, available both on the dandiset returned by `client.get_dandiset(...)` and on each asset yielded by its `.get_assets()` method, provides a nice view into the available fields.

Note: for any attribute, first check that it is not `None` before checking its value.
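One way to apply this advice to raw (dictionary) metadata is a small lookup helper that stops at the first missing or `None` field. This is only a sketch: `get_nested` is a hypothetical helper, not part of the DANDI client, and the sample dictionary is invented.

```python
def get_nested(metadata, *keys, default=None):
    """Follow `keys` through nested dicts, returning `default` on any gap."""
    current = metadata
    for key in keys:
        # Stop as soon as a level is not a dict, or the field is absent or None.
        if not isinstance(current, dict) or current.get(key) is None:
            return default
        current = current[key]
    return current


# Made-up metadata with one populated field, one absent field, and one None.
sample = {
    "name": "Example dandiset",
    "assetsSummary": {"species": [{"name": "Mus musculus - House mouse"}]},
    "access": None,
}

print(get_nested(sample, "assetsSummary", "species"))
print(get_nested(sample, "assetsSummary", "dataStandard", default=[]))
print(get_nested(sample, "access", "status", default="unknown"))
```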

In [6]:
print(json.dumps(dandisets[0].get_raw_metadata(), indent=4))

{
    "id": "DANDI:000003/0.230629.1955",
    "doi": "10.48324/dandi.000003/0.230629.1955",
    "url": "https://dandiarchive.org/dandiset/000003/0.230629.1955",
    "name": "Physiological Properties and Behavioral Correlates of Hippocampal Granule Cells and Mossy Cells",
    "about": [
        {
            "name": "hippocampus",
            "schemaKey": "Anatomy",
            "identifier": "UBERON:0002421"
        }
    ],
    "access": [
        {
            "status": "dandi:OpenAccess",
            "schemaKey": "AccessRequirements",
            "contactPoint": {
                "email": "petersen.peter@gmail.com",
                "schemaKey": "ContactPoint"
            }
        }
    ],
    "license": [
        "spdx:CC-BY-4.0"
    ],
    "version": "0.230629.1955",
    "@context": "https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.0/context.json",
    "citation": "Senzai, Yuta; Fernandez-Ruiz, Antonio; Buzs\u00e1ki, Gy\u00f6rgy (2023) Physiological Properties and