08_scripts.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Callable scripts in python\n",
    "\n",
    "In this tutorial we will cover how to write simple stand-alone scripts in\n",
    "python that can be used as alternatives to bash scripts.\n",
    "\n",
    "**Important**: Throughout this series of practicals we have been working\n",
    "entirely within the Jupyter notebook environment. But it's now time to\n",
    "graduate to writing *real* Python  scripts, and running them within a\n",
    "*real* enviromnent.\n",
    "\n",
    "So within this practical there are some code blocks, but instead of running\n",
    "them inside the notebook we **strongly recommend that you write the code in\n",
    "an IDE or editor**,and then run the scripts from a terminal.  [Don't\n",
    "panic](https://www.youtube.com/watch?v=KojYatpLPSE), we're right here,\n",
    "ready to help.\n",
    "\n",
    "## Contents\n",
    "\n",
    "* [Basic script](#basic-script)\n",
    "* [Calling other executables](#calling-other-executables)\n",
    "* [Command line arguments](#command-line-arguments)\n",
    "* [Example script](#example-script)\n",
    "* [Exercise](#exercise)\n",
    "\n",
    "---\n",
    "\n",
    "<a class=\"anchor\" id=\"basic-script\"></a>\n",
    "## Basic script\n",
    "\n",
    "The first line of a python script is usually:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!/usr/bin/env python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "which invokes whichever version of python can be found by `/usr/bin/env` since python can be located in many different places.\n",
    "\n",
    "For FSL scripts we use an alternative, to ensure that we pick up the version of python (and associated packages) that we ship with FSL.  To do this we use the line:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!/usr/bin/env fslpython"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After this line the rest of the file just uses regular python syntax, as in the other tutorials.  Make sure you make the file executable - just like a bash script.\n",
    "\n",
    "<a class=\"anchor\" id=\"calling-other-executables\"></a>\n",
    "## Calling other executables\n",
    "\n",
    "The most essential call that you need to use to replicate the way a bash script calls executables is `subprocess.run()`.  A simple call looks like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import subprocess as sp\n",
    "import shlex\n",
    "sp.run(['ls', '-la'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Passing the arguments as a list is good practice and improves the safety of\n",
    "> the call.\n",
    "\n",
    "To suppress the output do this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spobj = sp.run(['ls'], stdout = sp.PIPE)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To store the output do this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "spobj = sp.run(shlex.split('ls -la'), stdout = sp.PIPE)\n",
    "sout = spobj.stdout.decode('utf-8')\n",
    "print(sout)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> shlex.split and shlex.quote are functions designed to break up and quote\n",
    "> (respectively) shell command lines and arguments. Quoting of user provided\n",
    "> arguments helps to prevent unintended consequences from inappropriate inputs.\n",
    ">\n",
    "> Note that the `decode` call in the middle line converts the string from a byte\n",
    "> string to a normal string. In Python 3 there is a distinction between strings\n",
    "> (sequences of characters, possibly using multiple bytes to store each\n",
    "> character) and bytes (sequences of bytes). The world has moved on from ASCII,\n",
    "> so in this day and age, this distinction is absolutely necessary, and Python\n",
    "> does a fairly good job of it.\n",
    "\n",
    "If the output is numerical then this can be extracted like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "fsldir = os.getenv('FSLDIR')\n",
    "spobj = sp.run([fsldir+'/bin/fslstats', fsldir+'/data/standard/MNI152_T1_1mm_brain', '-V'], stdout = sp.PIPE)\n",
    "sout = spobj.stdout.decode('utf-8')\n",
    "results = sout.split()\n",
    "vol_vox = float(results[0])\n",
    "vol_mm = float(results[1])\n",
    "print('Volumes are: ', vol_vox, ' in voxels and ', vol_mm, ' in mm')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An alternative way to run a set of commands would be like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import shlex\n",
    "commands = \"\"\"\n",
    "{fsldir}/bin/fslmaths {t1} -bin {t1_mask}\n",
    "{fsldir}/bin/fslmaths {t2} -mas {t1_mask} {t2_masked}\n",
    "\"\"\"\n",
    "\n",
    "fsldirpath = os.getenv('FSLDIR')\n",
    "commands = commands.format(t1 = 't1.nii.gz', t1_mask = 't1_mask', t2 = 't2', t2_masked = 't2_masked', fsldir = fsldirpath)\n",
    "\n",
    "sout=[]\n",
    "for cmd in commands.splitlines():\n",
    "    if cmd:   # avoids empty strings getting passed to sp.run()\n",
    "        print('Running command: ', cmd)\n",
    "        spobj = sp.run(shlex.split(cmd), stdout = sp.PIPE)\n",
    "        sout.append(spobj.stdout.decode('utf-8'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Don't be tempted to use the shell=True argument to subprocess.run, especially\n",
    "> if you are dealing with user input - if the user gave\n",
    "> *myfile; rm -f ~*\n",
    "> as a file name and you called the command with shell=True **and** you\n",
    "> passed the command in as a string then bad things happen!\n",
    ">\n",
    "> The safe way to use these kinds of inputs is to pass them through shlex.quote()\n",
    "> before sending.\n",
    ">\n",
    "> ```a = shlex.quote('myfile; rm -f ~')\n",
    "> cmd = \"ls {}\".format(a)\n",
    "> sp.run(shlex.split(cmd))```\n",
    "\n",
    "\n",
    "> If you're calling lots of FSL tools, the `fslpy` library has a number of\n",
    "> *wrapper* functions, which can be used to call an FSL command directly\n",
    "> from Python - check out `advanced_topics/08_fslpy.ipynb`.\n",
    "\n",
    "<a class=\"anchor\" id=\"command-line-arguments\"></a>\n",
    "## Command line arguments\n",
    "\n",
    "The simplest way of dealing with command line arguments is use the module `sys`, which gives access to an `argv` list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "print(len(sys.argv))\n",
    "print(sys.argv[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For more sophisticated argument parsing you can use `argparse` -  good documentation and examples of this can be found on the web.\n",
    "\n",
    "> argparse can automatically produce help text for the user, validate input etc., so it is strongly recommended.\n",
    "\n",
    "---\n",
    "\n",
    "<a class=\"anchor\" id=\"example-script\"></a>\n",
    "## Example script\n",
    "\n",
    "Here is a simple bash script (it masks an image and calculates volumes - just as a random example). DO NOT execute the code blocks here within the notebook/webpage:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!/bin/bash\n",
    "if [ $# -lt 2 ] ; then\n",
    "  echo \"Usage: $0 <input image> <output image>\"\n",
    "  exit 1\n",
    "fi\n",
    "infile=$1\n",
    "outfile=$2\n",
    "# mask input image with MNI\n",
    "$FSLDIR/bin/fslmaths $infile -mas $FSLDIR/data/standard/MNI152_T1_1mm_brain $outfile\n",
    "# calculate volumes of masked image\n",
    "vv=`$FSLDIR/bin/fslstats $outfile -V`\n",
    "vol_vox=`echo $vv | awk '{ print $1 }'`\n",
    "vol_mm=`echo $vv | awk '{ print $2 }'`\n",
    "echo \"Volumes are: $vol_vox in voxels and $vol_mm in mm\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And an alternative in python:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!/usr/bin/env fslpython\n",
    "import os, sys\n",
    "import subprocess as sp\n",
    "fsldir=os.getenv('FSLDIR')\n",
    "if len(sys.argv)<2:\n",
    "  print('Usage: ', sys.argv[0], ' <input image> <output image>')\n",
    "  sys.exit(1)\n",
    "infile = sys.argv[1]\n",
    "outfile = sys.argv[2]\n",
    "# mask input image with MNI\n",
    "spobj = sp.run([fsldir+'/bin/fslmaths', infile, '-mas', fsldir+'/data/standard/MNI152_T1_1mm_brain', outfile], stdout = sp.PIPE)\n",
    "# calculate volumes of masked image\n",
    "spobj = sp.run([fsldir+'/bin/fslstats', outfile, '-V'], stdout = sp.PIPE)\n",
    "sout = spobj.stdout.decode('utf-8')\n",
    "results = sout.split()\n",
    "vol_vox = float(results[0])\n",
    "vol_mm = float(results[1])\n",
    "print('Volumes are: ', vol_vox, ' in voxels and ', vol_mm, ' in mm')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "<a class=\"anchor\" id=\"exercise\"></a>\n",
    "## Exercise\n",
    "\n",
    "Write a simple version of fslstats that is able to calculate either a\n",
    "mean or a _sum_ (and hence can do something that fslstats cannot!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Don't write anything here - do it in a standalone script!"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 2
}