"""
Easy format to define input/output files in a python pipeline.

The goal is to separate the definition of the input/output filenames from the actual code
by defining a directory tree (i.e., FileTree) in a separate file from the code.

Loading FileTrees
-----------------
.. code-block:: python

    from fsl.utils.filetree import FileTree, tree_directories
    tree = FileTree.read('bids_raw')

This creates a `tree` object that describes input/output filenames
for your pipeline based on `this file <trees/bids_raw.tree>`_.

:py:func:`filetree.FileTree.read` will search through the `filetree.tree_directories` list of directories
for any FileTrees matching the given name. By default this list includes the current directory and all FileTrees defined
`here <https://git.fmrib.ox.ac.uk/fsl/fslpy/tree/master/fsl/utils/filetree/trees>`_.
A full path to the requested FileTree can also be provided.

FileTree format
---------------
The FileTrees are defined in a simple-to-type format, where indentation is used to indicate subdirectories, for example:

::

    # Any text following a #-character can be used for comments
    parent
        file1.txt
        child
            file2
        file3.txt
    file4.txt

In the top-level directory this represents one file ("file4.txt") and one directory ("parent"). The directory
contains two files ("file1.txt" and "file3.txt") and one directory ("child") which contains a single file ("file2").
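
As a rough illustration of how indentation maps to paths, the following minimal sketch converts an indented tree into full paths.
It is not the parser used by this package: ``parse_tree`` is a hypothetical helper that assumes consistent space
indentation and ignores short names, variables and sub-trees.

```python
import posixpath


def parse_tree(text):
    # Simplified sketch: map every entry of an indented tree to its full path.
    paths = []
    stack = []  # (indent, path) pairs for the directories that are still open
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        # Close every entry that is not a parent of the current one
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1] if stack else ''
        path = posixpath.join(parent, line.strip())
        paths.append(path)
        stack.append((indent, path))
    return paths


example = '''
parent
    file1.txt
    child
        file2
    file3.txt
file4.txt
'''
paths = parse_tree(example)
print(paths)
```

Each entry is attached to the closest preceding entry with a smaller indentation, which is why "file3.txt" ends up in
"parent" rather than in "child".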

Individual aspects of this format are defined in more detail below.

Short names
^^^^^^^^^^^
Each directory and file in the FileTree is assigned a short name for convenient access.
For example, for the FileTree

::

    parent
        file1.txt
        child
            file2
        file3.txt
    file4.txt

We can load this FileTree using

.. code-block:: python

    >>> tree = FileTree.read(<tree filename>)
    >>> tree.get('file2')
    'parent/child/file2'
    >>> tree.get('child')
    'parent/child'

These filenames will be returned whether the underlying file exists or not (see :py:func:`filetree.FileTree.get`).

By default the short name will be the name of the file or directory without extension (i.e., everything before the first dot).
The short name can be explicitly set by including it in round brackets behind the filename,
so ``left_hippocampus_segment_from_first.nii.gz (Lhipp)`` will have the short name "Lhipp"
rather than "left_hippocampus_segment_from_first". This allows the filenames to be changed
without having to alter the short names used to refer to those filenames in the code.
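
The short name rule can be sketched in a few lines of plain Python. This is only an illustration of the rule
described above, not the code used by this package; ``short_name`` is a hypothetical helper.

```python
def short_name(entry):
    # Return (filename, short name) for a single FileTree entry (simplified sketch).
    entry = entry.strip()
    if entry.endswith(')') and '(' in entry:
        # Explicit short name given in round brackets behind the filename
        filename, _, explicit = entry[:-1].rpartition('(')
        return filename.strip(), explicit.strip()
    # Default: everything before the first dot
    return entry, entry.split('.', 1)[0]


print(short_name('left_hippocampus_segment_from_first.nii.gz (Lhipp)'))
print(short_name('T1w_brain.nii.gz'))
```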

Variables
^^^^^^^^^
FileTrees can have placeholders for variables such as subject id:

::

    {subject}
        T1w.nii.gz
        {hemi}_pial.surf.gii (pial)

Any part of the directory or file names contained within curly brackets will have to be filled when getting the path:

.. code-block:: python

    >>> tree = FileTree.read(<tree filename>, subject='A')
    >>> tree.get('T1w')
    'A/T1w.nii.gz'
    >>> B_tree = tree.update(subject='B')
    >>> B_tree.get('T1w')
    'B/T1w.nii.gz'
    >>> tree.get('pial')  # note that pial was explicitly set as the short name in the file above
    # Raises a MissingVariable error as the hemi variable is not defined

Variables can be either set during initialisation of the FileTree or by :py:func:`filetree.FileTree.update`, which
returns a new `FileTree` rather than updating the existing one.

Finally, initial values for the variables can be set in the FileTree itself, for example in

::

    hemi = left

    {subject}
        T1w.nii.gz
        {hemi}_pial.surf.gii (pial)

the variable "hemi" will be "left" unless explicitly set during initialisation or updating of the `FileTree`.

Optional Variables
^^^^^^^^^^^^^^^^^^
Normally having undefined variables will lead to :py:exc:`filetree.MissingVariable` being raised.
This can be avoided by putting these variables in square brackets, indicating that they can simply
be skipped. For example for the FileTree:

::

    {subject}
        [{session}]
            T1w[_{session}].nii.gz (T1w)

.. code-block:: python

    >>> tree = FileTree.read(<tree filename>, subject='A')
    >>> tree.get('T1w')
    'A/T1w.nii.gz'
    >>> tree.update(session='test').get('T1w')
    'A/test/T1w_test.nii.gz'

Note that if any variable within the square brackets is missing, any text within those square brackets is omitted.
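
The behaviour of optional variables can be mimicked with the following minimal sketch. It is not the implementation
used by this package: ``fill``, ``fill_path`` and the local ``MissingVariable`` class are hypothetical stand-ins,
and nested square brackets are not supported.

```python
class MissingVariable(KeyError):
    # Stand-in for filetree.MissingVariable in this sketch
    pass


def _format(text, variables):
    # Fill the curly-bracket placeholders, raising MissingVariable if one is undefined
    try:
        return text.format(**variables)
    except KeyError as error:
        raise MissingVariable(error.args[0])


def fill(template, **variables):
    # Fill one directory or file template; an optional part in square
    # brackets is dropped if any variable inside it is undefined.
    result = []
    rest = template
    while rest:
        before, bracket, rest = rest.partition('[')
        result.append(_format(before, variables))
        if bracket:
            optional, _, rest = rest.partition(']')
            try:
                result.append(_format(optional, variables))
            except MissingVariable:
                pass  # skip the whole optional part
    return ''.join(result)


def fill_path(parts, **variables):
    # Join the filled templates, skipping directory levels that end up empty
    filled = [fill(part, **variables) for part in parts]
    return '/'.join(part for part in filled if part)


parts = ['{subject}', '[{session}]', 'T1w[_{session}].nii.gz']
print(fill_path(parts, subject='A'))
print(fill_path(parts, subject='A', session='test'))
```

Under these assumptions the two calls reproduce the paths shown above: the optional directory level and the optional
filename suffix both disappear when "session" is undefined.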

Extensive use of optional variables can be found in the
`FileTree of the BIDS raw data format <https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/master/fsl/utils/filetree/trees/bids_raw.tree>`_.

Sub-trees
^^^^^^^^^
FileTrees can include other FileTrees within their directory structure. For example,

::

    {subject}
        topup
            b0.nii.gz
            ->topup basename=out (topup)
        eddy
            ->eddy (eddy)
            nodif_brain_mask.nii.gz
        Diffusion
            ->Diffusion (diff)
            ->dti (dti)

This might represent a diffusion MRI pipeline, which contains references to the predefined
`topup <https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/master/fsl/utils/filetree/trees/topup.tree>`_,
`eddy <https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/master/fsl/utils/filetree/trees/eddy.tree>`_,
`Diffusion <https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/master/fsl/utils/filetree/trees/Diffusion.tree>`_, and
`dti <https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/master/fsl/utils/filetree/trees/dti.tree>`_
FileTrees describing the input/output of various FSL tools.

The general format of this is:
``-><tree name> [<variable in sub-tree>=<value>, ...] (<sub-tree short name>)``

The filenames defined in the sub-trees can be accessed using a "/" in the short name:

.. code-block:: python

    >>> tree = FileTree.read(<tree filename>, subject='A')
    >>> tree.get('dti/FA')
    'A/Diffusion/dti_FA.nii.gz'
    >>> tree.get('topup/fieldcoef')
    'A/topup/out_fieldcoef.nii.gz'

Extensive use of sub-trees can be found in
`the FileTree of the HCP pre-processed directory structure <https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/master/fsl/utils/filetree/trees/HCP_directory.tree>`_,
which amongst others refers to
`the HCP surface directory format FileTree <https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/master/fsl/utils/filetree/trees/HCP_Surface.tree>`_.

Example pipeline
----------------
A very simple pipeline to run BET on every subject can start with a FileTree like
::

    {subject}
        T1w.nii.gz
        T1w_brain.nii.gz (bet_output)
        T1w_brain_mask.nii.gz (bet_mask)


Assuming that the input T1w's already exist, we can then simply run BET for every subject using:

.. code-block:: python

    from fsl.utils.filetree import FileTree
    from fsl.wrappers.bet import bet
    tree = FileTree.read(<tree filename>)

    # Iterates over set of variables that correspond to each T1-weighted image file matching the template
    for T1w_tree in tree.get_all_trees('T1w', glob_vars='all'):
        # get retrieves the filenames based on the current set of variables
        # make_dir=True ensures that the output directory containing the "bet_output" actually exists
        bet(input=T1w_tree.get('T1w'), output=T1w_tree.get('bet_output', make_dir=True), mask=True)

Useful tips
-----------

Changing directory structure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If our input files change later on, because for some subjects we added a second session, we could keep our script
and simply update the FileTree:
::

    {subject}
        [ses-{session}]
            T1w.nii.gz
            T1w_brain.nii.gz (bet_output)
            T1w_brain_mask.nii.gz (bet_mask)

Note the square brackets around the session sub-directory. This indicates that this sub-directory is optional and
will only be present if the "session" variable is defined (see `Optional variables`_).

This means that the script, run with this updated tree, will run BET on each T1-weighted image, even for a directory
structure like:
::

    subjectA/
        T1w.nii.gz
    subjectB/
        ses-01/
            T1w.nii.gz
        ses-02/
            T1w.nii.gz

If we get told off that our script is writing the output to the same directory as our input data,
altering this behaviour is again as simple as altering the FileTree to something like:
::

    raw_data
        {subject} (input_subject_dir)
            [ses-{session}] (input_session_dir)
                T1w.nii.gz
    processed_data
        {subject} (output_subject_dir)
            [ses-{session}] (output_session_dir)
                bet
                    {subject}[_{session}]_T1w_brain.nii.gz (bet_output)
                    {subject}[_{session}]_T1w_brain_mask.nii.gz (bet_mask)

Note that we also encoded the subject and session ID in the output filename.
We also have to explicitly assign short names to the subject and session directories,
even though we don't explicitly reference these in the script.
The reason for this is that each directory and filename template must have a unique short name and
in this case the default short names (respectively, "{subject}" and "[ses-{session}]") would not have been unique.

Output "basenames"
^^^^^^^^^^^^^^^^^^

Some tools like FSL's FAST produce many output files. Rather than entering all
of these files in our FileTree by hand you can include them all at once by including `Sub-trees`_:

::

    raw_data
        {subject} (input_subject_dir)
            [ses-{session}] (input_session_dir)
                T1w.nii.gz
    processed_data
        {subject} (output_subject_dir)
            [ses-{session}] (output_session_dir)
                bet
                    {subject}[_{session}]_T1w_brain.nii.gz (bet_output)
                    {subject}[_{session}]_T1w_brain_mask.nii.gz (bet_mask)
                fast
                    ->fast basename={subject}[_{session}] (segment)

Here we chose to set the "basename" of the FAST output to a combination of the subject and, if available, session ID.

Within the script we can generate the fast output by running

.. code-block:: python

    from fsl.wrappers.fast import fast
    fast(imgs=[T1w_tree.get('T1w')], out=T1w_tree.get('segment/basename'))

The output files will be available as `T1w_tree.get('segment/<variable name>')`, where `<variable name>` is one
of the short variable names defined in the
`FAST FileTree <https://git.fmrib.ox.ac.uk/fsl/fslpy/blob/master/fsl/utils/filetree/trees/fast.tree>`_.

Running a pipeline on a subset of participants/sessions/runs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Suppose you want to run your pipeline on a subset of your data while testing.
You may want to do this if your data has a hierarchy of variables (e.g. participant, session, run) as in the example below.

::

    sub-001
        ses-01
            sub-001_ses-01_run-1.feat
            sub-001_ses-01_run-2.feat
        ses-02
            sub-{participant}_ses-{session}_run-{run}.feat (feat_dir)
            ...
    sub-002
    sub-003
    ...

You can update the FileTree with one or more variables before calling `get_all_trees` as follows:

.. code-block:: python

    for participant in ("001", "002"):
        for t in tree.update(participant=participant, run="1").get_all_trees("feat_dir", glob_vars="all"):
            my_pipeline(t)

This code will iterate over all sessions that have a run="1" for participants "001" and "002".
"""

__author__ = 'Michiel Cottaar <Michiel.Cottaar@ndcn.ox.ac.uk>'

from .filetree import FileTree, register_tree, MissingVariable
from .parse import tree_directories, list_all_trees
from .query import FileTreeQuery

import fsl.utils.deprecated as deprecated

deprecated.warn('fsl.utils.filetree',
                stacklevel=2,
                vin='3.6.0',
                rin='4.0.0',
                msg='The filetree package is now released as a separate '
                    'Python library ("file-tree" on PyPi), and will be '
                    'removed in a future version of fslpy.')