This tutorial walks through exploring the CALM Brain Resource with the almirah Python library. We'll cover how to load the dataset, query its layouts and databases, and generate summaries.
First, we import the Dataset class from almirah:
from almirah import Dataset
To see the available datasets, we can use the options method:
Dataset.options()
This should output:
[<Dataset name: 'calm-brain'>]
Next, we load the CALM Brain dataset:
ds = Dataset(name="calm-brain")
ds.components
The components of the dataset are:
[<Layout root: '/path/to/data'>,
<Layout root: '/path/to/genome'>,
<Database url: 'request:calm-brain@https://calm-brain.ncbs.res.in/db-request/'>]
Layouts are parts of the dataset that represent organized data structures. Let's start by querying a layout:
lay = ds.components[0]
print(lay)
len(lay.files)
This should print the layout root and the number of files:
<Layout root: '/path/to/data'>
42652
Next, we'll explore the tags available for querying the layout:
from almirah import Tag
tags = Tag.options()
len(tags)
This returns the total number of tags:
1589
We can also view the possible tag names:
tags_names_possible = {tag.name for tag in tags}
tags_names_possible
Which outputs:
{'acquisition',
'datatype',
'direction',
'extension',
'run',
'sample',
'session',
'space',
'subject',
'suffix',
'task'}
Let's look at the options for a specific tag, such as datatype:
Tag.options(name="datatype")
This returns:
[<Tag datatype: 'anat'>,
<Tag datatype: 'dwi'>,
<Tag datatype: 'eeg'>,
<Tag datatype: 'eyetrack'>,
<Tag datatype: 'fmap'>,
<Tag datatype: 'func'>,
<Tag datatype: 'genome'>,
<Tag datatype: 'nirs'>]
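The listing above suggests that each tag is a name and value pair, which would explain why there are far more tags (1589) than tag names (11). As a minimal sketch of that relationship, the snippet below counts values per name using a stand-in `ToyTag` type. `ToyTag` and its `value` field are assumptions made for illustration; only the `name` attribute appears in this tutorial, and the values are toy data.

```python
from collections import Counter, namedtuple

# Stand-in for almirah's Tag objects. Only `name` is taken from the tutorial;
# the `value` field is an assumption made for this illustration.
ToyTag = namedtuple("ToyTag", ["name", "value"])

toy_tags = [
    ToyTag("datatype", "anat"),
    ToyTag("datatype", "eeg"),
    ToyTag("subject", "D0019"),
    ToyTag("subject", "D0020"),
    ToyTag("subject", "D0021"),
    ToyTag("session", "101"),
]

# Count how many distinct (name, value) tags share each tag name.
values_per_name = Counter(tag.name for tag in toy_tags)
for name, count in sorted(values_per_name.items()):
    print(f"{name}: {count}")
```

Applied to the real tag list, the same counting would show which tag names (most likely subject) account for the bulk of the 1589 tags.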
Now, let's query the layout for files of a specific datatype, such as EEG:
files = lay.query(datatype="eeg")
len(files)
This should give us the number of EEG files:
15821
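Conceptually, a tag query keeps the files whose tags match the requested values. The sketch below illustrates the idea on plain dictionaries. It is not almirah's implementation: `matches` is a hypothetical helper, the file records are made up, and it assumes that several criteria combine with a logical AND, which this tutorial does not confirm.

```python
def matches(file_tags, **criteria):
    """Return True if every requested tag is present with the requested value."""
    return all(file_tags.get(name) == value for name, value in criteria.items())


# Made-up file records, each represented only by its tags.
toy_files = [
    {"subject": "D0019", "session": "101", "datatype": "eeg", "suffix": "eeg"},
    {"subject": "D0019", "session": "101", "datatype": "anat", "suffix": "T1w"},
    {"subject": "D0020", "session": "111", "datatype": "eeg", "suffix": "events"},
]

# One criterion, then two criteria combined (assumed AND semantics).
eeg_files = [f for f in toy_files if matches(f, datatype="eeg")]
eeg_d0019 = [f for f in toy_files if matches(f, datatype="eeg", subject="D0019")]
print(len(eeg_files), len(eeg_d0019))
```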
We can inspect one of these files:
file = files[0]
file.rel_path
This prints the relative path of the file:
'sub-D0828/ses-101/eeg/sub-D0828_ses-101_task-auditoryPCP_run-01_events.json'
And the tags associated with the file:
file.tags
Which returns:
{'datatype': 'eeg', 'extension': '.json', 'run': '01', 'session': '101',
'subject': 'D0828', 'suffix': 'events', 'task': 'auditoryPCP'}
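The tags above mirror the key-value pairs embedded in the file path. As a rough illustration of that correspondence, the sketch below derives the same dictionary from the relative path with the standard library alone. The helper `parse_bids_like_path` and the `KEY_TO_TAG` mapping are hypothetical and assume a BIDS-like naming scheme; almirah may derive tags differently.

```python
from pathlib import PurePosixPath

# Assumed mapping from filename key prefixes to the tag names listed earlier.
KEY_TO_TAG = {
    "sub": "subject",
    "ses": "session",
    "task": "task",
    "run": "run",
    "acq": "acquisition",
    "dir": "direction",
    "space": "space",
    "sample": "sample",
}


def parse_bids_like_path(rel_path):
    """Split a BIDS-like relative path into a tag dictionary."""
    path = PurePosixPath(rel_path)
    # The parent directory name is taken as the datatype.
    tags = {"datatype": path.parent.name}
    # Everything after the first dot is treated as the extension.
    stem, _, ext = path.name.partition(".")
    tags["extension"] = "." + ext
    # The last underscore-separated piece is the suffix; the rest are key-value pairs.
    *pairs, suffix = stem.split("_")
    tags["suffix"] = suffix
    for pair in pairs:
        key, _, value = pair.partition("-")
        tags[KEY_TO_TAG.get(key, key)] = value
    return tags


example_tags = parse_bids_like_path(
    "sub-D0828/ses-101/eeg/sub-D0828_ses-101_task-auditoryPCP_run-01_events.json"
)
print(example_tags)
```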
Next, we turn to the database associated with the dataset:
db = ds.components[2]
db
This outputs the database information:
<Database url: 'request:calm-brain@https://calm-brain.ncbs.res.in/db-request/'>
We connect to the database with our credentials (replace the placeholder strings with your own username and password):
db.connect("username", "password")
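To keep credentials out of notebooks and scripts, one option is to read them from environment variables before calling connect. The sketch below shows one way to do that. The variable names `CALM_BRAIN_USER` and `CALM_BRAIN_PASSWORD` and the helper `load_credentials` are hypothetical choices for this example, not something almirah defines.

```python
import os


def load_credentials(env=os.environ):
    """Read the username and password from an environment-style mapping."""
    try:
        return env["CALM_BRAIN_USER"], env["CALM_BRAIN_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Set {missing.args[0]} before connecting") from None


# Demonstration with an explicit mapping instead of the real environment.
username, password = load_credentials(
    {"CALM_BRAIN_USER": "username", "CALM_BRAIN_PASSWORD": "password"}
)

# With the database component from this tutorial, the call would then be:
# db.connect(username, password)
```

In real use, calling `load_credentials()` without arguments reads from the process environment and fails with a clear message if either variable is missing.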
Now, let's query a specific table, such as presenting_disorders, and display some of the data:
df = db.query(table="presenting_disorders")
df[["subject", "session", "addiction"]].head()
This displays the first few rows of the queried table in a DataFrame format:
| | subject | session | addiction |
|---|---|---|---|
| 0 | D0019 | 101 | 0 |
| 1 | D0019 | 111 | 0 |
| 2 | D0020 | 101 | 0 |
| 3 | D0020 | 111 | <NA> |
| 4 | D0021 | 101 | 0 |
We can also generate summaries based on the dataset queries. For example, let's find the number of subjects with anatomical data:
anat_subject_tags = ds.query(returns="subject", datatype="anat")
anat_subjects = {subject for t in anat_subject_tags for subject in t}
len(anat_subjects)
This gives us the count of subjects with anatomical data:
699
Similarly, we can find the number of subjects with eye-tracking data:
eyetrack_subject_tags = ds.query(returns="subject", datatype="eyetrack")
eyetrack_subjects = {subject for t in eyetrack_subject_tags for subject in t}
len(eyetrack_subjects)
This gives us the count:
1075
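Both summaries use the same pattern: flatten the query results into a set so that each subject is counted once. Once the results are sets, overlaps follow from ordinary set operations. The sketch below uses made-up rows. The row shape (one single-element tuple per matching file) is an assumption inferred from the comprehension above, not something the tutorial states.

```python
# Made-up query results: one tuple per matching file (assumed shape).
anat_rows = [("D0019",), ("D0019",), ("D0020",), ("D0021",)]
eyetrack_rows = [("D0020",), ("D0021",), ("D0021",), ("D0022",)]

# Flatten the rows and drop duplicates, as in the tutorial's comprehension.
anat = {subject for row in anat_rows for subject in row}
eyetrack = {subject for row in eyetrack_rows for subject in row}

both = anat & eyetrack       # subjects with both kinds of data
anat_only = anat - eyetrack  # subjects with anatomical data only
print(sorted(both), sorted(anat_only))
```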
Lastly, let's query the total number of subjects in the database:
df = db.query(table="subjects")
len(df)
This returns the total number of subjects:
2276
And the number of entries in a specific table, such as the modified Kuppuswamy socioeconomic scale:
df = db.query(table="modified_kuppuswamy_socioeconomic_scale")
len(df)
This gives us the count:
1444
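The counts reported in this tutorial can be combined into rough coverage figures. The sketch below divides the per-datatype subject counts by the size of the subjects table. It assumes that every subject found in the layouts also appears in that table, which the tutorial does not verify, so the percentages are only indicative. The Kuppuswamy count is left out because it counts table entries rather than subjects.

```python
# Counts taken from the outputs shown in this tutorial.
n_total = 2276     # rows in the subjects table
n_anat = 699       # subjects with anatomical data
n_eyetrack = 1075  # subjects with eye-tracking data


def coverage(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


anat_pct = coverage(n_anat, n_total)
eyetrack_pct = coverage(n_eyetrack, n_total)
print(f"anat: {anat_pct:.1f}%  eyetrack: {eyetrack_pct:.1f}%")
```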
This concludes the tutorial. You've learned how to load the dataset, query its components, and generate summaries using the almirah library.