From 65d2deb779df6bd07bced054fc4a8de1677dd162 Mon Sep 17 00:00:00 2001 From: Paul McCarthy <pauldmccarthy@gmail.com> Date: Thu, 15 Feb 2018 10:48:16 +0000 Subject: [PATCH] Adjustments to file management practical --- getting_started/03_file_management.ipynb | 156 ++++++++----------- getting_started/03_file_management.md | 158 +++++++++----------- getting_started/03_file_management_extra.md | 62 ++++++++ 3 files changed, 196 insertions(+), 180 deletions(-) create mode 100644 getting_started/03_file_management_extra.md diff --git a/getting_started/03_file_management.ipynb b/getting_started/03_file_management.ipynb index d1ee18a..b9d93b6 100644 --- a/getting_started/03_file_management.ipynb +++ b/getting_started/03_file_management.ipynb @@ -58,10 +58,6 @@ "* [Exercises](#exercises)\n", " * [Re-name subject directories](#re-name-subject-directories)\n", " * [Re-organise a data set](#re-organise-a-data-set)\n", - " * [Re-name subject files](#re-name-subject-files)\n", - " * [Compress all uncompressed images](#compress-all-uncompressed-images)\n", - " * [Write your own `os.path.splitext`](#write-your-own-os-path-splitext)\n", - " * [Write a function to return a specific image file](#write-a-function-to-return-a-specific-image-file)\n", " * [Solutions](#solutions)\n", "\n", "\n", @@ -161,13 +157,13 @@ "cwd = os.getcwd()\n", "listing = os.listdir(cwd)\n", "print('Directory listing: {}'.format(cwd))\n", - "print('\\n'.join([p for p in listing]))\n", + "print('\\n'.join(listing))\n", "print()\n", "\n", "datadir = 'raw_mri_data'\n", "listing = os.listdir(datadir)\n", "print('Directory listing: {}'.format(datadir))\n", - "print('\\n'.join([p for p in listing]))" + "print('\\n'.join(listing))" ] }, { @@ -326,7 +322,7 @@ "source": [ "> Here we have explicitly named the `topdown` argument when passing it to the\n", "> `os.walk` function. This is referred to as a a _keyword argument_ - unnamed\n", - "> arguments aqe referred to as _positional arguments_. 
We'll give some more\n", + "> arguments are referred to as _positional arguments_. We'll give some more\n", "> examples of positional and keyword arguments below.\n", "\n", "\n", @@ -447,11 +443,6 @@ "metadata": {}, "outputs": [], "source": [ - "# This function takes an optional keyword\n", - "# argument \"existonly\", which controls\n", - "# whether the path is only tested for\n", - "# existence. We can call it either with\n", - "# or without this argument.\n", "def whatisit(path, existonly=False):\n", "\n", " print('Does {} exist? {}'.format(path, op.exists(path)))\n", @@ -465,11 +456,53 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now let's use that function to test some paths.\n", + "> This is the first time in this series of practicals that we have defined our\n", + "> own function, [hooray!](https://www.youtube.com/watch?v=zQiibNVIvK4) All\n", + "> function definitions in Python begin with the `def` keyword:\n", + ">\n", + "> ```\n", + "> def myfunction():\n", + "> function_body\n", + "> ```\n", + ">\n", + "> Just like with other control flow tools, such as `if`, `for`, and `while`\n", + "> statements, the body of a function must be indented (with four spaces\n", + "> please!).\n", + ">\n", + "> Python functions can be written to accept any number of arguments:\n", + ">\n", + "> ```\n", + "> def myfunction(arg1, arg2, arg3):\n", + "> function_body\n", + "> ```\n", + ">\n", + "> Arguments can also be given default values:\n", + ">\n", + "> ```\n", + "> def myfunction(arg1, arg2, arg3=False):\n", + "> function_body\n", + "> ```\n", + ">\n", + "> In our `whatisit` function above, we gave the `existonly` argument (which\n", + "> controls whether the path is only tested for existence) a default value.\n", + "> This makes the `existonly` argument optional - we can call `whatisit` either\n", + "> with or without this argument.\n", + ">\n", + "> To return a value from a function, use the `return` keyword:\n", + ">\n", + "> ```\n", + "> def add(n1, 
n2):\n", + "> return n1 + n2\n", + "> ```\n", + ">\n", + "> Take a look at the [official Python\n", + "> tutorial](https://docs.python.org/3.5/tutorial/controlflow.html#defining-functions)\n", + "> for more details on defining your own functions.\n", "\n", "\n", - "> Here we are using the `op.join` function to construct paths - it is [covered\n", - "> below](#cross-platform-compatbility)." + "Now let's use that function to test some paths. Here we are using the\n", + "`op.join` function to construct paths - it is [covered\n", + "below](#cross-platform-compatbility):" ] }, { @@ -639,18 +672,18 @@ "metadata": {}, "outputs": [], "source": [ - "import glob\n", + "from glob import glob\n", "\n", "root = 'raw_mri_data'\n", "\n", "# find all niftis for subject 1\n", - "images = glob.glob(op.join(root, 'subj_1', '*.nii*'))\n", + "images = glob(op.join(root, 'subj_1', '*.nii*'))\n", "\n", "print('Subject #1 images:')\n", "print('\\n'.join([' {}'.format(i) for i in images]))\n", "\n", "# find all subject directories\n", - "subjdirs = glob.glob(op.join(root, 'subj_*'))\n", + "subjdirs = glob(op.join(root, 'subj_*'))\n", "\n", "print('Subject directories:')\n", "print('\\n'.join([' {}'.format(d) for d in subjdirs]))" @@ -661,7 +694,7 @@ "metadata": {}, "source": [ "As with [`os.walk`](walking-a-directory-tree), the order of the results\n", - "returned by `glob.glob` is arbitrary. Unfortunately the undergraduate who\n", + "returned by `glob` is arbitrary. 
Unfortunately the undergraduate who\n", "acquired this specific data set did not think to use zero-padded subject IDs\n", "(you'll be pleased to know that this student was immediately kicked out of his\n", "college and banned from ever returning), so we can't simply sort the paths\n", @@ -734,7 +767,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As of Python 3.5, `glob.glob` also supports recursive pattern matching via the\n", + "> Note that in Python, we can pass a function around just like any other\n", + "> variable - we passed the `get_subject_id` function as an argument to the\n", + "> `sorted` function. This is possible (and normal) because functions are\n", + "> [first class citizens](https://en.wikipedia.org/wiki/First-class_citizen) in\n", + "> Python!\n", + "\n", + "\n", + "As of Python 3.5, `glob` also supports recursive pattern matching via the\n", "`recursive` flag. Let's say we want a list of all resting-state scans in our\n", "data set:" ] @@ -745,7 +785,7 @@ "metadata": {}, "outputs": [], "source": [ - "rscans = glob.glob('raw_mri_data/**/rest.nii.gz', recursive=True)\n", + "rscans = glob('raw_mri_data/**/rest.nii.gz', recursive=True)\n", "\n", "print('Resting state scans:')\n", "print('\\n'.join(rscans))" @@ -774,7 +814,7 @@ "metadata": {}, "outputs": [], "source": [ - "allimages = glob.glob(op.join('raw_mri_data', '**', '*.nii*'), recursive=True)\n", + "allimages = glob(op.join('raw_mri_data', '**', '*.nii*'), recursive=True)\n", "print('All images in experiment:')\n", "\n", "# Let's just print the first and last few\n", @@ -844,8 +884,8 @@ "\n", "You have [already been\n", "introduced](#querying-and-changing-the-current-directory) to the\n", - "`op.expanduser` function. Another handy function is the `op.expandvars` function.\n", - "which will expand expand any environment variables in a path:" + "`op.expanduser` function. 
Another handy function is the `op.expandvars`\n", + "function, which will expand any environment variables in a path:" ] }, { @@ -942,72 +982,8 @@ " - A list of lists, with each list containing the subject IDs for one group.\n", "\n", "\n", - "<a class=\"anchor\" id=\"re-name-subject-files\"></a>\n", - "### Re-name subject files\n", - "\n", - "\n", - "Write a function which, given a subject directory, renames all of the image\n", - "files for this subject so that they are prefixed with `[group]_subj_[id]`,\n", - "where `[group]` is either `CON` or `PAT`, and `[id]` is the (zero-padded)\n", - "subject ID.\n", - "\n", - "\n", - "This function should accept the following parameters:\n", - " - The subject directory\n", - " - The subject group\n", - "\n", - "\n", - "**Bonus 1** Make your function work with both `.nii` and `.nii.gz` files.\n", - "\n", - "**Bonus 2** If you completed [the previous exercise](#re-organise-a-data-set),\n", - "write a second function which accepts the data set directory as a sole\n", - "parameter, and then calls the first function for every subject.\n", - "\n", - "\n", - "<a class=\"anchor\" id=\"compress-all-uncompressed-images\"></a>\n", - "### Compress all uncompressed images\n", - "\n", - "\n", - "Write a function which recursively scans a directory, and replaces all `.nii`\n", - "files with `.nii.gz` files, using the built-in\n", - "[`gzip`](https://docs.python.org/3.5/library/gzip.html) library to perform\n", - "the compression.\n", - "\n", - "\n", - "<a class=\"anchor\" id=\"write-your-own-os-path-splitext\"></a>\n", - "### Write your own `os.path.splitext`\n", - "\n", - "\n", - "Write an implementation of `os.path.splitext` which works with compressed or\n", - "uncompressed NIFTI images.\n", - "\n", - "\n", - "> Hint: you know what suffixes to expect!\n", - "\n", - "\n", - "<a class=\"anchor\" id=\"write-a-function-to-return-a-specific-image-file\"></a>\n", - "### Write a function to return a specific image file\n", - 
"\n", - "Assuming that you have completed the previous exercises, and re-organised\n", - "`raw_mri_data` so that it has the structure:\n", - "\n", - " `raw_mri_data/[group]/subj_[id]/[group]_subj_[id]_[modality].nii.gz`\n", - "\n", - "write a function which is given:\n", - "\n", - " - the data set directory\n", - " - a group label\n", - " - integer ubject ID\n", - " - modality (`'t1'`, `'t2'`, `'task'`, `'rest'`)\n", - "\n", - "and which returns the fully resolved path to the relevant image file.\n", - "\n", - " > Hint: Python has [regular\n", - " expressions](https://docs.python.org/3.5/library/re.html) - you might want\n", - " to use one to cope with zero-padding.\n", - "\n", - "**Bonus** Modify the function so the group label does not need to be passed in.\n", + "__Extra exercises:__ If you are looking for something more to do, you can find\n", + "some more exercises in the file `03_file_management_extra.md`.\n", "\n", "\n", "<a class=\"anchor\" id=\"solutions\"></a>\n", diff --git a/getting_started/03_file_management.md b/getting_started/03_file_management.md index a3bfea0..73527ca 100644 --- a/getting_started/03_file_management.md +++ b/getting_started/03_file_management.md @@ -52,10 +52,6 @@ other sections as a reference. You might miss out on some neat tricks though. 
* [Exercises](#exercises) * [Re-name subject directories](#re-name-subject-directories) * [Re-organise a data set](#re-organise-a-data-set) - * [Re-name subject files](#re-name-subject-files) - * [Compress all uncompressed images](#compress-all-uncompressed-images) - * [Write your own `os.path.splitext`](#write-your-own-os-path-splitext) - * [Write a function to return a specific image file](#write-a-function-to-return-a-specific-image-file) * [Solutions](#solutions) @@ -126,13 +122,13 @@ command): cwd = os.getcwd() listing = os.listdir(cwd) print('Directory listing: {}'.format(cwd)) -print('\n'.join([p for p in listing])) +print('\n'.join(listing)) print() datadir = 'raw_mri_data' listing = os.listdir(datadir) print('Directory listing: {}'.format(datadir)) -print('\n'.join([p for p in listing])) +print('\n'.join(listing)) ``` @@ -248,7 +244,7 @@ for root, dirs, files in os.walk('raw_mri_data', topdown=False): > Here we have explicitly named the `topdown` argument when passing it to the > `os.walk` function. This is referred to as a a _keyword argument_ - unnamed -> arguments aqe referred to as _positional arguments_. We'll give some more +> arguments are referred to as _positional arguments_. We'll give some more > examples of positional and keyword arguments below. @@ -338,12 +334,8 @@ it exists at all, then the `os.path` module has got your back with its `isfile`, `isdir`, and `exists` functions. Let's define a silly function which will tell us what a path is: + ``` -# This function takes an optional keyword -# argument "existonly", which controls -# whether the path is only tested for -# existence. We can call it either with -# or without this argument. def whatisit(path, existonly=False): print('Does {} exist? {}'.format(path, op.exists(path))) @@ -354,11 +346,53 @@ def whatisit(path, existonly=False): ``` -Now let's use that function to test some paths. 
+> This is the first time in this series of practicals that we have defined our +> own function, [hooray!](https://www.youtube.com/watch?v=zQiibNVIvK4) All +> function definitions in Python begin with the `def` keyword: +> +> ``` +> def myfunction(): +> function_body +> ``` +> +> Just like with other control flow tools, such as `if`, `for`, and `while` +> statements, the body of a function must be indented (with four spaces +> please!). +> +> Python functions can be written to accept any number of arguments: +> +> ``` +> def myfunction(arg1, arg2, arg3): +> function_body +> ``` +> +> Arguments can also be given default values: +> +> ``` +> def myfunction(arg1, arg2, arg3=False): +> function_body +> ``` +> +> In our `whatisit` function above, we gave the `existonly` argument (which +> controls whether the path is only tested for existence) a default value. +> This makes the `existonly` argument optional - we can call `whatisit` either +> with or without this argument. +> +> To return a value from a function, use the `return` keyword: +> +> ``` +> def add(n1, n2): +> return n1 + n2 +> ``` +> +> Take a look at the [official Python +> tutorial](https://docs.python.org/3.5/tutorial/controlflow.html#defining-functions) +> for more details on defining your own functions. -> Here we are using the `op.join` function to construct paths - it is [covered -> below](#cross-platform-compatbility). +Now let's use that function to test some paths. Here we are using the +`op.join` function to construct paths - it is [covered +below](#cross-platform-compatbility): ``` @@ -483,18 +517,18 @@ files, based on unix-style wildcard pattern matching. 
``` -import glob +from glob import glob root = 'raw_mri_data' # find all niftis for subject 1 -images = glob.glob(op.join(root, 'subj_1', '*.nii*')) +images = glob(op.join(root, 'subj_1', '*.nii*')) print('Subject #1 images:') print('\n'.join([' {}'.format(i) for i in images])) # find all subject directories -subjdirs = glob.glob(op.join(root, 'subj_*')) +subjdirs = glob(op.join(root, 'subj_*')) print('Subject directories:') print('\n'.join([' {}'.format(d) for d in subjdirs])) @@ -502,7 +536,7 @@ print('\n'.join([' {}'.format(d) for d in subjdirs])) As with [`os.walk`](walking-a-directory-tree), the order of the results -returned by `glob.glob` is arbitrary. Unfortunately the undergraduate who +returned by `glob` is arbitrary. Unfortunately the undergraduate who acquired this specific data set did not think to use zero-padded subject IDs (you'll be pleased to know that this student was immediately kicked out of his college and banned from ever returning), so we can't simply sort the paths @@ -550,13 +584,21 @@ print('Subject directories, sorted by ID:') print('\n'.join([' {}'.format(d) for d in subjdirs])) ``` -As of Python 3.5, `glob.glob` also supports recursive pattern matching via the + +> Note that in Python, we can pass a function around just like any other +> variable - we passed the `get_subject_id` function as an argument to the +> `sorted` function. This is possible (and normal) because functions are +> [first class citizens](https://en.wikipedia.org/wiki/First-class_citizen) in +> Python! + + +As of Python 3.5, `glob` also supports recursive pattern matching via the `recursive` flag. Let's say we want a list of all resting-state scans in our data set: ``` -rscans = glob.glob('raw_mri_data/**/rest.nii.gz', recursive=True) +rscans = glob('raw_mri_data/**/rest.nii.gz', recursive=True) print('Resting state scans:') print('\n'.join(rscans)) @@ -576,7 +618,7 @@ pattern matching logic. 
For example, let's retrieve all images that are in our data set: ``` -allimages = glob.glob(op.join('raw_mri_data', '**', '*.nii*'), recursive=True) +allimages = glob(op.join('raw_mri_data', '**', '*.nii*'), recursive=True) print('All images in experiment:') # Let's just print the first and last few @@ -627,8 +669,8 @@ for filename in uncompressed: You have [already been introduced](#querying-and-changing-the-current-directory) to the -`op.expanduser` function. Another handy function is the `op.expandvars` function. -which will expand expand any environment variables in a path: +`op.expanduser` function. Another handy function is the `op.expandvars` +function, which will expand any environment variables in a path: ``` @@ -709,72 +751,8 @@ parameters: - A list of lists, with each list containing the subject IDs for one group. -<a class="anchor" id="re-name-subject-files"></a> -### Re-name subject files - - -Write a function which, given a subject directory, renames all of the image -files for this subject so that they are prefixed with `[group]_subj_[id]`, -where `[group]` is either `CON` or `PAT`, and `[id]` is the (zero-padded) -subject ID. - - -This function should accept the following parameters: - - The subject directory - - The subject group - - -**Bonus 1** Make your function work with both `.nii` and `.nii.gz` files. - -**Bonus 2** If you completed [the previous exercise](#re-organise-a-data-set), -write a second function which accepts the data set directory as a sole -parameter, and then calls the first function for every subject. - - -<a class="anchor" id="compress-all-uncompressed-images"></a> -### Compress all uncompressed images - - -Write a function which recursively scans a directory, and replaces all `.nii` -files with `.nii.gz` files, using the built-in -[`gzip`](https://docs.python.org/3.5/library/gzip.html) library to perform -the compression. 
- - -<a class="anchor" id="write-your-own-os-path-splitext"></a> -### Write your own `os.path.splitext` - - -Write an implementation of `os.path.splitext` which works with compressed or -uncompressed NIFTI images. - - -> Hint: you know what suffixes to expect! - - -<a class="anchor" id="write-a-function-to-return-a-specific-image-file"></a> -### Write a function to return a specific image file - - -Assuming that you have completed the previous exercises, and re-organised -`raw_mri_data` so that it has the structure: - - `raw_mri_data/[group]/subj_[id]/[group]_subj_[id]_[modality].nii.gz` - -write a function which is given: - - - the data set directory - - a group label - - integer ubject ID - - modality (`'t1'`, `'t2'`, `'task'`, `'rest'`) - -and which returns the fully resolved path to the relevant image file. - - > Hint: Python has [regular - expressions](https://docs.python.org/3.5/library/re.html) - you might want - to use one to cope with zero-padding. - -**Bonus** Modify the function so the group label does not need to be passed in. +__Extra exercises:__ If you are looking for something more to do, you can find +some more exercises in the file `03_file_management_extra.md`. <a class="anchor" id="solutions"></a> diff --git a/getting_started/03_file_management_extra.md b/getting_started/03_file_management_extra.md new file mode 100644 index 0000000..a56a9fb --- /dev/null +++ b/getting_started/03_file_management_extra.md @@ -0,0 +1,62 @@ +### Re-name subject files + + +Write a function which, given a subject directory, renames all of the image +files for this subject so that they are prefixed with `[group]_subj_[id]`, +where `[group]` is either `CON` or `PAT`, and `[id]` is the (zero-padded) +subject ID. + + +This function should accept the following parameters: + - The subject directory + - The subject group + + +**Bonus 1** Make your function work with both `.nii` and `.nii.gz` files. 
+ +**Bonus 2** If you completed [the previous exercise](#re-organise-a-data-set), +write a second function which accepts the data set directory as a sole +parameter, and then calls the first function for every subject. + + +### Compress all uncompressed images + + +Write a function which recursively scans a directory, and replaces all `.nii` +files with `.nii.gz` files, using the built-in +[`gzip`](https://docs.python.org/3.5/library/gzip.html) library to perform +the compression. + + +### Write your own `os.path.splitext` + + +Write an implementation of `os.path.splitext` which works with compressed or +uncompressed NIFTI images. + + +> Hint: you know what suffixes to expect! + + +### Write a function to return a specific image file + + +Assuming that you have completed the previous exercises, and re-organised +`raw_mri_data` so that it has the structure: + + `raw_mri_data/[group]/subj_[id]/[group]_subj_[id]_[modality].nii.gz` + +write a function which is given: + + - the data set directory + - a group label + - integer subject ID + - modality (`'t1'`, `'t2'`, `'task'`, `'rest'`) + +and which returns the fully resolved path to the relevant image file. + + > Hint: Python has [regular + expressions](https://docs.python.org/3.5/library/re.html) - you might want + to use one to cope with zero-padding. + +**Bonus** Modify the function so the group label does not need to be passed in. -- GitLab
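
Reviewer note: the patch adds a remark that `get_subject_id` is passed to `sorted` as an ordinary value, but the body of that function falls outside the hunks shown here. The following is a minimal sketch of the idea, not the practical's actual code: the body of `get_subject_id` and the example paths are assumptions made for illustration.

```python
import os.path as op


def get_subject_id(subjdir):
    # Hypothetical body: the diff does not show the real implementation.
    # Take the last path component (e.g. 'subj_10') and parse the
    # digits after the final underscore as an integer.
    name = op.basename(op.normpath(subjdir))
    return int(name.split('_')[-1])


# Example paths invented for this sketch (IDs are not zero-padded)
subjdirs = ['raw_mri_data/subj_10',
            'raw_mri_data/subj_2',
            'raw_mri_data/subj_1']

# A plain sort compares strings, so 'subj_10' lands before 'subj_2'
by_name = sorted(subjdirs)

# Passing the function itself (no parentheses) as the key
# makes sorted() order the paths by their integer subject ID
by_id = sorted(subjdirs, key=get_subject_id)

print('\n'.join(by_id))
```

Because `sorted` only needs something callable for `key`, any function taking a path and returning a comparable value would work here; a named function is used rather than a `lambda` to match the wording of the note in the patch.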