Support for (hierarchical) settings files in file-tree and fsl-pipe
Rationale
The goal is to allow the setting of pipeline parameters in a JSON or YAML file and to allow these parameters to be overridden for specific subjects/sessions.
These parameters could represent acquisition parameters relevant to the processing of the data (e.g., total_readout_time and pe_direction for topup) or flags used in the processing pipeline (e.g., the f-parameter in FSL BET).
Overriding parameters will be possible by providing a separate JSON/YAML file specific to the subject (like in BIDS) or by adding subject-specific values to the top-level JSON/YAML file.
This will allow analysis of data with inconsistent acquisition (e.g., changing the pe_direction for one subject) or that require different parameters (e.g., changing BET's f-parameter for a subject for which the default value failed to give a good brain extraction).
These subject-specific values would be used consistently when the pipeline is rerun, making such an approach far more reproducible than the alternative of running the pipeline separately for individual subjects. The pipeline creator will be able to include default values for any parameters within the pipeline, which would allow the pipeline to run even if the user does not provide a JSON/YAML file including all possible settings.
Proposal
- Settings files will be marked in a file-tree based on their key ("<settings group name>_s1", "<settings group name>_s2", etc.). For example, the BIDS hierarchical settings would look like:

      T1w.json (T1w_s1)
      sub-{subject}
          sub-{subject}_T1w.json (T1w_s2)
          [ses-{session}]
              sub-{subject}[_ses-{session}]_T1w.json (T1w_s3)
              sub-{subject}[_ses-{session}]_T1w.nii.gz (T1w)

- A new FileTree.setting("<settings group name>", "<setting_name>", default=None) method will search the settings iteratively for the given setting. For example, the following retrieves the TE set in the top-level JSON file ("T1w_s1"), as well as the TE for one specific subject (i.e., the value defined in "T1w_s2", with the value in "T1w_s1" used as a fallback).

      import file_tree

      tree = file_tree.read("data.tree")

      # Value from the top-level settings file ("T1w_s1")
      general_TE = tree.setting("T1w", "echo-time")
      print(f"Generic TE is {general_TE} ms.")

      # Subject-specific value ("T1w_s2"), falling back on "T1w_s1"
      for subject_tree in tree.iter("subject"):
          subject_id = subject_tree.placeholders["subject"]
          subject_TE = subject_tree.setting("T1w", "echo-time")
          print(f"Subject {subject_id} has a TE of {subject_TE} ms.")

- These settings will be available in fsl-pipe using Setting. For example, the following job retrieves the echo time (key of "echo-time") and repetition time (key of "TR") for the T1-weighted image to be used for processing:

      @pipe
      def structural_processing(T1w: In, TR: Setting("T1w"), TE: Setting("diffusion", "echo-time"), ...):
          ...

- If no_iter is used for some variable in fsl-pipe, multiple values might be returned. For example, in the following the TR and TE are returned for every subject in an xarray.DataArray (same as for multiple input/output filenames):

      @pipe
      def structural_processing(T1w: In, TR: Setting("T1w"), TE: Setting("diffusion", "echo-time")=10., subject: Var(no_iter=True)):
          ...

- Note that we set a default value of 10 for "echo-time" here. If "echo-time" is not defined in any of the settings files, this default will be used. If there is no value for "TR" in any of the settings files, an error will be raised.
Features
- Support JSON and YAML
- Also allow subject- and session-specific parameters to be set in the top-level settings file. For example, in the following JSON the echo-time for subject A was 4 ms rather than the default 3 ms. Similarly, the first session of subject B had a different TR (800 ms instead of 1000 ms).

      {
          "echo-time": 3.0,
          "TR": 1000,
          "subject": {
              "A": {
                  "echo-time": 4.0
              },
              "B": {
                  "session": {
                      "1": {"TR": 800}
                  }
              }
          }
      }

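
Supporting both formats could be as simple as dispatching on the file extension. The following is only a sketch: the function name load_settings is hypothetical, and YAML reading assumes the third-party PyYAML package, which the proposal does not name.

```python
import json
from pathlib import Path


def load_settings(path):
    """Read a JSON or YAML settings file into a plain dictionary.

    Hypothetical helper; not part of file-tree or fsl-pipe.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open() as f:
            return json.load(f)
    if suffix in (".yaml", ".yml"):
        # Assumption: PyYAML is available. It is imported lazily so that
        # JSON-only users do not need it installed.
        import yaml

        with path.open() as f:
            return yaml.safe_load(f)
    raise ValueError(f"Unsupported settings file type: {path.name}")
```

Note that JSON keys are always strings, whereas YAML may parse a key such as 1 as an integer, so any later comparison with placeholder values would need to account for both.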
More formal specification of hierarchical parameter search
The first parameter to FileTree.setting() or fsl_pipe.Setting is the <settings group name>. We start by searching for any templates in the FileTree matching <settings group name>_s<integer>. These are sorted by the <integer> value from highest number to lowest (numerically, not alphabetically).
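
As an illustration of this ordering, the sketch below selects and sorts template keys for one settings group. The function name ordered_setting_keys and the plain list of keys are assumptions made for the example; they do not reflect how FileTree stores its templates.

```python
import re


def ordered_setting_keys(template_keys, group):
    """Return keys of the form '<group>_s<integer>', highest integer first."""
    pattern = re.compile(rf"{re.escape(group)}_s(\d+)")
    matches = []
    for key in template_keys:
        match = pattern.fullmatch(key)
        if match:
            matches.append((int(match.group(1)), key))
    # Sort on the integer value, so that "_s10" comes before "_s2".
    return [key for _, key in sorted(matches, reverse=True)]


keys = ["T1w", "T1w_s1", "T1w_s10", "T1w_s2", "dwi_s1"]
print(ordered_setting_keys(keys, "T1w"))  # ['T1w_s10', 'T1w_s2', 'T1w_s1']
```

An alphabetical sort would instead place "T1w_s2" ahead of "T1w_s10", which is why the integer is parsed before sorting.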
For each file we try to find a value for the user-requested setting (i.e., second parameter to FileTree.setting() or fsl_pipe.Setting; if not provided in fsl_pipe it defaults to the keyword argument name). As soon as the value is found it is returned to the user.
For each file, we take the following steps to find the setting value:
- Read the file into a dictionary-like object.
- Apply a depth-first search for the parameter value:
    - Identify any keys matching placeholders that have set values (e.g., "subject" in the JSON shown above).
    - Follow down that dictionary if the placeholder value matches one of its keys (e.g., "subject" of "A" or "B" in the JSON shown above).
    - Repeat the process above until the deepest level is reached.
    - If this dictionary contains a value for the requested setting, return it. Otherwise, move back up to find the lowest level on which the value exists.
- Resolve any conflicts that might arise from the procedure described above. For example, in the following examples, it is not clear which value to return for the "echo-time" if the subject is "A" and the session is "1":

      {
          "echo-time": 50,
          "subject": {
              "A": {"echo-time": 100}
          },
          "session": {
              "1": {"echo-time": 30}
          }
      }

      {
          "echo-time": 50,
          "subject": {
              "A": {
                  "echo-time": 100,
                  "session": {
                      "1": {"echo-time": 70}
                  }
              }
          },
          "session": {
              "1": {"echo-time": 30}
          }
      }

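
The search within a single file could be sketched as follows. This is not a definitive implementation: the names candidates, find_setting and AmbiguousSettingError are hypothetical, and the conflict policy (prefer the deepest match, raise an error if different values are found at the same depth) is just one possible choice that the proposal leaves open.

```python
class AmbiguousSettingError(Exception):
    """Hypothetical error for conflicting values found at the same depth."""


def candidates(settings, name, placeholders, depth=0):
    """Yield (depth, value) for every level at which `name` is defined."""
    if name in settings:
        yield depth, settings[name]
    for placeholder, value in placeholders.items():
        branch = settings.get(placeholder)
        if not isinstance(branch, dict):
            continue
        # Each placeholder is followed at most once along a path.
        remaining = {k: v for k, v in placeholders.items() if k != placeholder}
        for key, sub in branch.items():
            # Compare as strings, because JSON keys are always strings.
            if str(key) == str(value) and isinstance(sub, dict):
                yield from candidates(sub, name, remaining, depth + 1)


def find_setting(settings, name, placeholders):
    """Return the deepest value for `name`, or None if it is not defined."""
    found = list(candidates(settings, name, placeholders))
    if not found:
        return None
    deepest = max(depth for depth, _ in found)
    values = [value for depth, value in found if depth == deepest]
    if any(value != values[0] for value in values[1:]):
        raise AmbiguousSettingError(f"Conflicting values for {name!r}: {values}")
    return values[0]


settings = {
    "echo-time": 3.0,
    "TR": 1000,
    "subject": {
        "A": {"echo-time": 4.0},
        "B": {"session": {"1": {"TR": 800}}},
    },
}
print(find_setting(settings, "echo-time", {"subject": "A"}))            # 4.0
print(find_setting(settings, "TR", {"subject": "B", "session": "1"}))   # 800
```

Under this policy, a file that defines "echo-time" separately under "subject" and under "session" at the top level raises AmbiguousSettingError for subject "A" and session "1", since both matches sit at the same depth.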
If after iterating over all files no value for the setting is found, the default value is returned. This is the third parameter to FileTree.setting() or the value assigned to the keyword in fsl_pipe (i.e., TR: Setting("T1w")=<default_value>). If no default value is provided, a SettingNotFoundError is raised.
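
The fallback behaviour across files can be sketched as below. To keep the example short, each file is represented by a flat dictionary and the files are assumed to be already ordered from most to least specific. The function name lookup is hypothetical, and the SettingNotFoundError class defined here is only a stand-in for the error named above.

```python
class SettingNotFoundError(Exception):
    """Stand-in for the error named in the text."""


def lookup(files, name, default=None):
    """Return `name` from the first file defining it, else the default.

    As in the proposed signature, a default of None means "no default".
    """
    for settings in files:
        if name in settings:
            return settings[name]
    if default is None:
        raise SettingNotFoundError(name)
    return default


session_level = {"TR": 800}
top_level = {"TR": 1000, "echo-time": 3.0}
files = [session_level, top_level]  # most specific file first

print(lookup(files, "TR"))                       # 800
print(lookup(files, "echo-time"))                # 3.0
print(lookup(files, "flip-angle", default=8.0))  # 8.0
```

Calling lookup(files, "flip-angle") without a default raises SettingNotFoundError.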