|
@@ -1,27 +1,45 @@
|
|
|
# Exploring the CALM Brain Resource with almirah
|
|
|
|
|
|
-## Load the dataset
|
|
|
+This tutorial walks through exploring the CALM
|
|
|
+Brain Resource using the `almirah` Python library. We'll cover how to
|
|
|
+load the dataset, query layouts and databases, and generate
|
|
|
+summaries.
|
|
|
|
|
|
+## Loading the Dataset
|
|
|
+
|
|
|
+First, we'll import the necessary library and load the dataset.
|
|
|
|
|
|
```python
|
|
|
from almirah import Dataset
|
|
|
```
|
|
|
+
|
|
|
+To see the available datasets, we can use the `options` method:
|
|
|
+
|
|
|
```python
|
|
|
Dataset.options()
|
|
|
```
|
|
|
|
|
|
+This should output:
|
|
|
+
|
|
|
[<Dataset name: 'calm-brain'>]
|
|
|
-
|
|
|
+
|
|
|
+Next, we load the CALM Brain dataset:
|
|
|
+
|
|
|
```python
|
|
|
ds = Dataset(name="calm-brain")
|
|
|
ds.components
|
|
|
```
|
|
|
|
|
|
+The components of the dataset are:
|
|
|
+
|
|
|
[<Layout root: '/path/to/data'>,
|
|
|
<Layout root: '/path/to/genome'>,
|
|
|
- <Database url: 'request:calm-brain@https://www.calm-brain.ncbs.res.in/db-request/'>]
|
|
|
+ <Database url: 'request:calm-brain@https://calm-brain.ncbs.res.in/db-request/'>]
|
|
|
+
|
|
|
+## Querying Layouts
|
|
|
|
|
|
-## Quering layouts
|
|
|
+Layouts are the dataset components that hold files organized under a
|
|
|
+root directory. Let's start by querying the first one:
|
|
|
|
|
|
```python
|
|
|
lay = ds.components[0]
|
|
@@ -29,10 +47,13 @@ print(lay)
|
|
|
len(lay.files)
|
|
|
```
|
|
|
|
|
|
- <Layout root: '/path/to/data'>
|
|
|
+This should print the layout root and the number of files:
|
|
|
|
|
|
+ <Layout root: '/path/to/data'>
|
|
|
42652
|
|
|
|
|
|
+Next, we'll explore the tags available for querying the layout:
|
|
|
+
|
|
|
```python
|
|
|
from almirah import Tag
|
|
|
|
|
@@ -40,13 +61,19 @@ tags = Tag.options()
|
|
|
len(tags)
|
|
|
```
|
|
|
|
|
|
+This returns the total number of tags:
|
|
|
+
|
|
|
1589
|
|
|
-
|
|
|
+
|
|
|
+We can also view the possible tag names:
|
|
|
+
|
|
|
```python
|
|
|
tags_names_possible = {tag.name for tag in tags}
|
|
|
tags_names_possible
|
|
|
```
|
|
|
|
|
|
+This outputs:
|
|
|
+
|
|
|
{'acquisition',
|
|
|
'datatype',
|
|
|
'direction',
|
|
@@ -59,10 +86,14 @@ tags_names_possible
|
|
|
'suffix',
|
|
|
'task'}
|
|
|
|
|
|
+Let's look at the options for a specific tag, such as `datatype`:
|
|
|
+
|
|
|
```python
|
|
|
Tag.options(name="datatype")
|
|
|
```
|
|
|
|
|
|
+This returns:
|
|
|
+
|
|
|
[<Tag datatype: 'anat'>,
|
|
|
<Tag datatype: 'dwi'>,
|
|
|
<Tag datatype: 'eeg'>,
|
|
@@ -72,51 +103,78 @@ Tag.options(name="datatype")
|
|
|
<Tag datatype: 'genome'>,
|
|
|
<Tag datatype: 'nirs'>]
|
|
|
|
|
|
+Now, let's query the layout for files of a specific datatype, such as EEG:
|
|
|
+
|
|
|
```python
|
|
|
files = lay.query(datatype="eeg")
|
|
|
len(files)
|
|
|
```
|
|
|
|
|
|
+This should give us the number of EEG files:
|
|
|
+
|
|
|
15821
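
Conceptually, a tag query keeps the files whose tags match every requested key-value pair. The sketch below imitates that idea on plain dictionaries. It is not almirah's implementation: `select` is a hypothetical helper and the records are made-up stand-ins, not CALM Brain data.

```python
# Made-up records standing in for per-file tag dictionaries (assumption).
records = [
    {"subject": "S001", "datatype": "eeg", "suffix": "events"},
    {"subject": "S001", "datatype": "anat", "suffix": "T1w"},
    {"subject": "S002", "datatype": "eeg", "suffix": "eeg"},
]


def select(records, **criteria):
    """Keep records whose tags equal every key-value pair in criteria."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# One criterion narrows by datatype; adding a second narrows further.
eeg_records = select(records, datatype="eeg")
eeg_s001 = select(records, datatype="eeg", subject="S001")
print(len(eeg_records), len(eeg_s001))
```

Passing more keyword arguments narrows the result, which is the same way `lay.query(datatype="eeg")` is used above with a single tag.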
|
|
|
|
|
|
+We can inspect one of these files:
|
|
|
+
|
|
|
```python
|
|
|
file = files[0]
|
|
|
file.rel_path
|
|
|
```
|
|
|
|
|
|
+This prints the relative path of the file:
|
|
|
+
|
|
|
'sub-D0828/ses-101/eeg/sub-D0828_ses-101_task-auditoryPCP_run-01_events.json'
|
|
|
|
|
|
+We can also view the tags associated with the file:
|
|
|
+
|
|
|
```python
|
|
|
file.tags
|
|
|
```
|
|
|
|
|
|
- {'datatype': 'eeg', 'extension': '.json', 'run': '01', 'session': '101', 'subject': 'D0828', 'suffix': 'events', 'task': 'auditoryPCP'}
|
|
|
+This returns:
|
|
|
+
|
|
|
+ {'datatype': 'eeg', 'extension': '.json', 'run': '01', 'session': '101',
|
|
|
+ 'subject': 'D0828', 'suffix': 'events', 'task': 'auditoryPCP'}
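
The tags mirror the key-value pairs in the file name. As a rough illustration of that correspondence, the sketch below parses the relative path shown above into the same dictionary. It is a hand-rolled approximation, not how almirah derives tags; `parse_tags` and `PREFIX_TO_TAG` are hypothetical names, and the prefix mapping is an assumption chosen to match the output above.

```python
from pathlib import PurePosixPath

# Assumed mapping from file-name prefixes to tag names (hypothetical).
PREFIX_TO_TAG = {"sub": "subject", "ses": "session", "task": "task", "run": "run"}


def parse_tags(rel_path):
    """Split a relative path with key-value file naming into a tag dict."""
    path = PurePosixPath(rel_path)
    stem, extension = path.name.split(".", 1)
    # Everything before the last underscore is a key-value pair;
    # the last piece is the suffix.
    *pairs, suffix = stem.split("_")
    tags = {
        "datatype": path.parent.name,  # name of the containing directory
        "extension": "." + extension,
        "suffix": suffix,
    }
    for pair in pairs:
        prefix, value = pair.split("-", 1)
        tags[PREFIX_TO_TAG.get(prefix, prefix)] = value
    return tags


example = "sub-D0828/ses-101/eeg/sub-D0828_ses-101_task-auditoryPCP_run-01_events.json"
parsed = parse_tags(example)
print(parsed)
```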
|
|
|
|
|
|
-## Querying databases
|
|
|
+## Querying Databases
|
|
|
+
|
|
|
+Next, we query the database associated with the dataset:
|
|
|
|
|
|
```python
|
|
|
db = ds.components[2]
|
|
|
db
|
|
|
```
|
|
|
|
|
|
- <Database url: 'request:calm-brain@https://www.calm-brain.ncbs.res.in/db-request/'>
|
|
|
+This outputs the database information:
|
|
|
+
|
|
|
+ <Database url: 'request:calm-brain@https://calm-brain.ncbs.res.in/db-request/'>
|
|
|
+
|
|
|
+We connect to the database using credentials:
|
|
|
|
|
|
```python
|
|
|
db.connect("username", "password")
|
|
|
+```
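
Rather than typing credentials into a script, one option is to read them from the environment and fall back to an interactive prompt. The sketch below assumes two environment variable names, `CALM_BRAIN_USER` and `CALM_BRAIN_PASSWORD`, which are hypothetical and not defined by almirah; `get_credentials` is likewise an illustrative helper.

```python
import os
from getpass import getpass


def get_credentials(env=os.environ):
    """Return (username, password) from env, prompting for missing values.

    The variable names below are assumptions made for this sketch.
    """
    username = env.get("CALM_BRAIN_USER") or input("Username: ")
    password = env.get("CALM_BRAIN_PASSWORD") or getpass("Password: ")
    return username, password


# Intended use with the connection call shown above:
# db.connect(*get_credentials())
```

This keeps the password out of notebooks and version control while leaving the `db.connect` call itself unchanged.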
|
|
|
+
|
|
|
+Now, let's query a specific table, such as `presenting_disorders`, and
|
|
|
+display some of the data:
|
|
|
+
|
|
|
+```python
|
|
|
df = db.query(table="presenting_disorders")
|
|
|
df[["subject", "session", "addiction"]].head()
|
|
|
```
|
|
|
|
|
|
+This displays the first few rows of the queried table as a DataFrame:
|
|
|
+
|
|
|
<div>
|
|
|
<style scoped>
|
|
|
.dataframe tbody tr th:only-of-type {
|
|
|
- vertical-align: middle;
|
|
|
+ vertical-align: middle;
|
|
|
}
|
|
|
.dataframe tbody tr th {
|
|
|
- vertical-align: top;
|
|
|
+ vertical-align: top;
|
|
|
}
|
|
|
.dataframe thead th {
|
|
|
- text-align: right;
|
|
|
+ text-align: right;
|
|
|
}
|
|
|
</style>
|
|
|
<table border="1" class="dataframe">
|
|
@@ -136,7 +194,7 @@ df[["subject", "session", "addiction"]].head()
|
|
|
<td>0</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
- <th>1</th>
|
|
|
+ <th>1</th>
|
|
|
<td>D0019</td>
|
|
|
<td>111</td>
|
|
|
<td>0</td>
|
|
@@ -163,7 +221,10 @@ df[["subject", "session", "addiction"]].head()
|
|
|
</table>
|
|
|
</div>
|
|
|
|
|
|
-## Generating summaries
|
|
|
+## Generating Summaries
|
|
|
+
|
|
|
+We can also generate summaries based on the dataset queries. For
|
|
|
+example, let's find the number of subjects with anatomical data:
|
|
|
|
|
|
```python
|
|
|
anat_subject_tags = ds.query(returns="subject", datatype="anat")
|
|
@@ -171,27 +232,45 @@ anat_subjects = {subject for t in anat_subject_tags for subject in t}
|
|
|
len(anat_subjects)
|
|
|
```
|
|
|
|
|
|
+This gives us the count of subjects with anatomical data:
|
|
|
+
|
|
|
699
|
|
|
|
|
|
+Similarly, we can find the number of subjects with eye-tracking data:
|
|
|
+
|
|
|
```python
|
|
|
eyetrack_subject_tags = ds.query(returns="subject", datatype="eyetrack")
|
|
|
eyetrack_subjects = {subject for t in eyetrack_subject_tags for subject in t}
|
|
|
len(eyetrack_subjects)
|
|
|
```
|
|
|
|
|
|
+This gives us the count:
|
|
|
+
|
|
|
1075
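
Because each result is reduced to a set of subjects, ordinary set operations can compare modalities. The sketch below uses made-up rows shaped like the tuples the comprehensions above iterate over; the subject IDs are placeholders, not CALM Brain data, and `unique_subjects` is a hypothetical helper.

```python
# Made-up query results: one tuple per row, as the comprehensions above assume.
anat_rows = [("S001",), ("S002",), ("S002",), ("S003",)]
eyetrack_rows = [("S002",), ("S003",), ("S004",)]


def unique_subjects(rows):
    """Flatten row tuples into a set of unique subject IDs."""
    return {subject for row in rows for subject in row}


anat = unique_subjects(anat_rows)
eyetrack = unique_subjects(eyetrack_rows)

both = anat & eyetrack        # subjects with both modalities
anat_only = anat - eyetrack   # subjects with anatomical data only
print(sorted(both), sorted(anat_only))
```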
|
|
|
|
|
|
+Next, let's query the total number of subjects in the database:
|
|
|
+
|
|
|
```python
|
|
|
df = db.query(table="subjects")
|
|
|
len(df)
|
|
|
```
|
|
|
|
|
|
+This returns the total number of subjects:
|
|
|
+
|
|
|
2276
|
|
|
|
|
|
+We can also count the entries in a specific table, such as the modified
|
|
|
+Kuppuswamy socioeconomic scale:
|
|
|
+
|
|
|
```python
|
|
|
df = db.query(table="modified_kuppuswamy_socioeconomic_scale")
|
|
|
len(df)
|
|
|
```
|
|
|
|
|
|
+This gives us the count:
|
|
|
+
|
|
|
1444
|
|
|
|
|
|
+This concludes the tutorial. You've learned how to load the dataset,
|
|
|
+query its components, and generate summaries using the `almirah`
|
|
|
+library.
|