# Exploring the CALM Brain Resource with almirah

This tutorial walks through exploring the CALM Brain Resource using the `almirah` Python library. We'll cover how to load the dataset, query layouts and databases, and generate summaries.

## Loading the Dataset

First, we import the `Dataset` class from the library:

```python
from almirah import Dataset
```

To see the available datasets, we can use the `options` method:

```python
Dataset.options()
```

This should output:

    []

Next, we load the CALM Brain dataset and list its components:

```python
ds = Dataset(name="calm-brain")
ds.components
```

The components of the dataset are:

    [, , ]

## Querying Layouts

Layouts are the parts of the dataset that represent organized file structures. Let's start by picking a layout and counting its files:

```python
lay = ds.components[0]
print(lay)
len(lay.files)
```

This prints the layout root and returns the number of files:

    42652

Next, we'll explore the tags available for querying the layout:

```python
from almirah import Tag

tags = Tag.options()
len(tags)
```

This returns the total number of tags:

    1589

We can also view the possible tag names:

```python
# Collect the distinct tag names across all tags
tags_names_possible = {tag.name for tag in tags}
tags_names_possible
```

Which outputs:

    {'acquisition',
     'datatype',
     'direction',
     'extension',
     'run',
     'sample',
     'session',
     'space',
     'subject',
     'suffix',
     'task'}

Let's look at the options for a specific tag, such as `datatype`:

```python
Tag.options(name="datatype")
```

This returns:

    [, , , , , , , ]

Now, let's query the layout for files of a specific datatype, such as EEG:

```python
files = lay.query(datatype="eeg")
len(files)
```

This gives us the number of EEG files:

    15821

We can inspect one of these files:

```python
file = files[0]
file.rel_path
```

This returns the path of the file relative to the layout root:

    'sub-D0828/ses-101/eeg/sub-D0828_ses-101_task-auditoryPCP_run-01_events.json'

And the tags associated with the file:

```python
file.tags
```

Which returns:

    {'datatype': 'eeg',
     'extension': '.json',
     'run': '01',
     'session': '101',
     'subject': 'D0828',
     'suffix': 'events',
     'task': 'auditoryPCP'}

## Querying Databases

Next, we query the databases associated with the dataset. The database is another component of the dataset:

```python
db = ds.components[2]
db
```

This outputs the database information:

We connect to the database using credentials:

```python
# Replace the placeholders with your own credentials
db.connect("username", "password")
```

Now, let's query a specific table, such as `presenting_disorders`, and display some of the data:

```python
df = db.query(table="presenting_disorders")
df[["subject", "session", "addiction"]].head()
```

This displays the first few rows of the queried table as a DataFrame:

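
|   | subject | session | addiction |
|---|---------|---------|-----------|
| 0 | D0019   | 101     | 0         |
| 1 | D0019   | 111     | 0         |
| 2 | D0020   | 101     | 0         |
| 3 | D0020   | 111     | \<NA\>    |
| 4 | D0021   | 101     | 0         |
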
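
The `<NA>` entry in the output above marks a missing value, so it is worth checking for gaps before aggregating a column. The sketch below illustrates that check in plain Python on the five rows shown above. It is not part of `almirah`: the `rows` list is copied by hand, `None` stands in for `<NA>`, storing sessions as strings is an assumption, and `count_missing` is a hypothetical helper.

```python
from collections import defaultdict

# The five rows shown above, copied by hand; None stands in for <NA>.
# Storing session labels as strings is an assumption.
rows = [
    {"subject": "D0019", "session": "101", "addiction": 0},
    {"subject": "D0019", "session": "111", "addiction": 0},
    {"subject": "D0020", "session": "101", "addiction": 0},
    {"subject": "D0020", "session": "111", "addiction": None},
    {"subject": "D0021", "session": "101", "addiction": 0},
]


def count_missing(records, column):
    """Return how many records have no value in the given column."""
    return sum(1 for record in records if record[column] is None)


# Group session labels by subject to see who has repeated visits.
sessions_by_subject = defaultdict(list)
for row in rows:
    sessions_by_subject[row["subject"]].append(row["session"])

missing = count_missing(rows, "addiction")
print(f"{missing} of {len(rows)} rows lack an 'addiction' value")
print(dict(sessions_by_subject))
```

If `df` is a pandas DataFrame, as the output format suggests, `df["addiction"].isna().sum()` should give the same kind of count on the full table.
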
## Generating Summaries

We can also generate summaries based on the dataset queries. For example, let's find the number of subjects with anatomical data:

```python
anat_subject_tags = ds.query(returns="subject", datatype="anat")
# Flatten the results into a set of unique subjects
anat_subjects = {subject for t in anat_subject_tags for subject in t}
len(anat_subjects)
```

This gives us the count of subjects with anatomical data:

    699

Similarly, we can find the number of subjects with eye-tracking data:

```python
eyetrack_subject_tags = ds.query(returns="subject", datatype="eyetrack")
# Flatten the results into a set of unique subjects
eyetrack_subjects = {subject for t in eyetrack_subject_tags for subject in t}
len(eyetrack_subjects)
```

This gives us the count:

    1075

Lastly, let's query the total number of subjects in the database:

```python
df = db.query(table="subjects")
len(df)
```

This returns the total number of subjects:

    2276

And the number of entries in a specific table, such as the modified Kuppuswamy socioeconomic scale:

```python
df = db.query(table="modified_kuppuswamy_socioeconomic_scale")
len(df)
```

This gives us the count:

    1444

This concludes the tutorial. You've learned how to load the dataset, query its components, and generate summaries using the `almirah` library.
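
As a follow-up to the summary queries above, the per-datatype subject sets can be combined with ordinary set operations, for example to find subjects that have both anatomical and eye-tracking data. This is a minimal sketch on made-up stand-in data. It assumes, as the comprehensions above suggest, that each query result is an iterable of tuples of tag values. The subject IDs and the `unique_values` helper are hypothetical.

```python
def unique_values(results):
    """Flatten an iterable of tuples into a set of unique values."""
    return {value for result in results for value in result}


# Made-up stand-ins for ds.query(returns="subject", datatype=...) results.
anat_results = [("D0001",), ("D0002",), ("D0002",), ("D0003",)]
eyetrack_results = [("D0002",), ("D0003",), ("D0004",)]

anat_subjects = unique_values(anat_results)
eyetrack_subjects = unique_values(eyetrack_results)

# Subjects present in both modalities, and in anatomical data only.
both = anat_subjects & eyetrack_subjects
anat_only = anat_subjects - eyetrack_subjects

print(sorted(both))
print(sorted(anat_only))
```

With real query results in place of the stand-in lists, the same two set expressions would report overlap and gaps between any pair of datatypes.
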