
Exploring the CALM Brain Resource with almirah

This tutorial will guide you through exploring the CALM Brain Resource using the almirah Python library. We'll cover how to load the dataset, query its layouts and databases, and generate summaries.

Loading the Dataset

First, we import the Dataset class from the almirah library.

from almirah import Dataset

To see the available datasets, we can use the options method:

Dataset.options()

This should output:

[<Dataset name: 'calm-brain'>]

Next, we load the CALM Brain dataset:

ds = Dataset(name="calm-brain")
ds.components

The components of the dataset are:

[<Layout root: '/path/to/data'>,
 <Layout root: '/path/to/genome'>,
 <Database url: 'request:calm-brain@https://calm-brain.ncbs.res.in/db-request/'>]
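
To check which component is which without counting indices by hand, a quick loop over ds.components is enough; this sketch uses only the attributes shown above:

# Print each component's class name alongside its repr
for component in ds.components:
    print(type(component).__name__, component)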

Querying Layouts

Layouts are dataset components that represent files organized in a directory hierarchy. Let's start by querying a layout:

lay = ds.components[0]
print(lay)
len(lay.files)

This should print the layout root and the number of files:

<Layout root: '/path/to/data'>
42652

Next, we'll explore the tags available for querying the layout:

from almirah import Tag

tags = Tag.options()
len(tags)

This returns the total number of tag name-value pairs:

1589

We can also view the possible tag names:

tags_names_possible = {tag.name for tag in tags}
tags_names_possible

Which outputs:

{'acquisition',
 'datatype',
 'direction',
 'extension',
 'run',
 'sample',
 'session',
 'space',
 'subject',
 'suffix',
 'task'}
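
To see how many values each tag name can take, we can combine the Tag.options(name=...) call (used again just below) with a short loop. This is only a sketch built from the calls shown in this tutorial; exact counts depend on the dataset version:

# Count the possible values for each tag name
for name in sorted(tags_names_possible):
    print(name, len(Tag.options(name=name)))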

Let's look at the options for a specific tag, such as datatype:

Tag.options(name="datatype")

This returns:

[<Tag datatype: 'anat'>,
 <Tag datatype: 'dwi'>,
 <Tag datatype: 'eeg'>,
 <Tag datatype: 'eyetrack'>,
 <Tag datatype: 'fmap'>,
 <Tag datatype: 'func'>,
 <Tag datatype: 'genome'>,
 <Tag datatype: 'nirs'>]

Now, let's query the layout for files of a specific datatype, such as EEG:

files = lay.query(datatype="eeg")
len(files)

This should give us the number of EEG files:

15821

We can inspect one of these files:

file = files[0]
file.rel_path

This shows the relative path of the file:

'sub-D0828/ses-101/eeg/sub-D0828_ses-101_task-auditoryPCP_run-01_events.json'

And the tags associated with the file:

file.tags

Which returns:

{'datatype': 'eeg', 'extension': '.json', 'run': '01', 'session': '101',
'subject': 'D0828', 'suffix': 'events', 'task': 'auditoryPCP'}
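
Because each file exposes its tags as a plain dictionary, the query results can be summarized with standard Python. For example, a rough breakdown of the EEG files by extension, using only file.tags as shown above:

from collections import Counter

# Tally the queried EEG files by their extension tag
extension_counts = Counter(f.tags.get("extension") for f in files)
print(extension_counts.most_common())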

Querying Databases

Next, we query the database associated with the dataset:

db = ds.components[2]
db

This outputs the database information:

<Database url: 'request:calm-brain@https://calm-brain.ncbs.res.in/db-request/'>

We connect to the database using our credentials (substitute your own username and password):

db.connect("username", "password")

Now, let's query a specific table, such as presenting_disorders, and display some of the data:

df = db.query(table="presenting_disorders")
df[["subject", "session", "addiction"]].head()

This displays the first few rows of the queried table as a DataFrame:

  subject session  addiction
0   D0019     101          0
1   D0019     111          0
2   D0020     101          0
3   D0020     111       <NA>
4   D0021     101          0
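
Since db.query returns a pandas DataFrame, the usual pandas tooling applies. For instance, we can look at the distribution of the addiction column and the number of distinct subjects in the table:

# Distribution of the addiction column, including missing values
print(df["addiction"].value_counts(dropna=False))

# Number of distinct subjects in this table
print(df["subject"].nunique())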

Generating Summaries

We can also generate summaries based on the dataset queries. For example, let's find the number of subjects with anatomical data:

anat_subject_tags = ds.query(returns="subject", datatype="anat")
anat_subjects = {subject for t in anat_subject_tags for subject in t}
len(anat_subjects)

This gives us the count of subjects with anatomical data:

699

Similarly, we can find the number of subjects with eye-tracking data:

eyetrack_subject_tags = ds.query(returns="subject", datatype="eyetrack")
eyetrack_subjects = {subject for t in eyetrack_subject_tags for subject in t}
len(eyetrack_subjects)

This gives us the count:

1075
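
The same pattern generalizes to any of the datatype values listed earlier. As a minimal sketch using only the calls shown above (re-running ds.query once per datatype), we can build a quick per-datatype subject count:

# Distinct-subject counts for a few of the datatypes listed earlier
for datatype in ["anat", "eeg", "eyetrack"]:
    rows = ds.query(returns="subject", datatype=datatype)
    subjects = {subject for row in rows for subject in row}
    print(datatype, len(subjects))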

Lastly, let's query the total number of subjects in the database:

df = db.query(table="subjects")
len(df)

This returns the total number of subjects:

2276

And the number of entries in a specific table, such as the modified Kuppuswamy socioeconomic scale:

df = db.query(table="modified_kuppuswamy_socioeconomic_scale")
len(df)

This gives us the count:

1444
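
If you want row counts for several tables at once, the same call can be wrapped in a loop. This sketch reuses only the two table names queried above; any other name would have to exist in the database:

# Row counts for the tables queried in this tutorial
for table in ["subjects", "modified_kuppuswamy_socioeconomic_scale"]:
    print(table, len(db.query(table=table)))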

This concludes the tutorial. You've learned how to load the dataset, query its components, and generate summaries using the almirah library.