diff --git a/getting_started/basics.ipynb b/getting_started/basics.ipynb index 9facb873c7f07175e4c0508cfb7b244ce956e607..6a2266f2c052aae71d4339e8accfa75a6ab9d159 100644 --- a/getting_started/basics.ipynb +++ b/getting_started/basics.ipynb @@ -4,6 +4,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "# Basic python\n", + "\n", + "This tutorial is aimed at briefly introducing you to the main language\n", + "features of python, with emphasis on some of the difficulties\n", + "and pitfalls that are commonly encountered when moving to python.\n", + "\n", + "When going through this make sure that you _run_ each code block\n", + "and look at the output, as this is crucial for understanding the\n", + "explanations. You can run each block by using _shift + enter_ (including the text blocks, so you can just move down the document with shift + enter).\n", + "\n", + "---\n", + "\n", "# Basic types\n", "\n", "Python has many different types and variables are dynamic and can change types (like MATLAB). 
Some of the most commonly used in-built types are:\n", @@ -18,32 +30,52 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 63, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4\n" + ] + } + ], "source": [ "a = 4\n", "b = 3.6\n", "c = 'abc'\n", - "d = [10,20,30]\n", - "e = {'a' : 10, 'b': 20}" + "d = [10, 20, 30]\n", + "e = {'a' : 10, 'b': 20}\n", + "print(a)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Any variable can be printed using the function `print()`:" + "Any variable or combination of variables can be printed using the function `print()`:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 64, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[10, 20, 30]\n", + "{'b': 20, 'a': 10}\n", + "4 3.6 abc\n" + ] + } + ], "source": [ "print(d)\n", - "print(e)" + "print(e)\n", + "print(a, b, c)" ] }, { @@ -52,7 +84,7 @@ "source": [ "> _*Python 3 versus python 2*_:\n", ">\n", - "> Print - for the print statement the brackets are compulsory for *python 3*, but are optional in python 2. So you will see plenty of code without the brackets but you should get into the habit of using them.\n", + "> Print - for the print statement the brackets are compulsory for *python 3*, but are optional in python 2. 
So you will see plenty of code without the brackets but you should never use `print` without brackets, as this is incompatible with Python 3.\n", ">\n", "> Division - in python 3 all division is floating point (like in MATLAB), even if the values are integers, but in python 2 integer division works like it does in C.\n", "\n", @@ -68,12 +100,21 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 65, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "test string :: another test string\n" + ] + } + ], "source": [ - "s1=\"test string\"\n", - "s2='another test string'" + "s1 = \"test string\"\n", + "s2 = 'another test string'\n", + "print(s1, ' :: ', s2)" ] }, { @@ -85,11 +126,22 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 66, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This is\n", + "a string over\n", + "multiple lines\n", + "\n" + ] + } + ], "source": [ - "s3='''This is\n", + "s3 = '''This is\n", "a string over\n", "multiple lines\n", "'''\n", @@ -100,21 +152,182 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "### Format\n", + "\n", + "More interesting strings can be created using the `format` statement, which is very useful in print statements:" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "The numerical value is 1 and a name is PyTreat\n", + "A name is PyTreat and a number is 1\n" + ] + } + ], + "source": [ + "x = 1\n", + "y = 'PyTreat'\n", + "s = 'The numerical value is {} and a name is {}'.format(x, y)\n", + "print(s)\n", + "print('A name is {} and a number is {}'.format(y, x))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are also other options along these lines, but this is the more modern version, although you will 
see plenty of the other alternatives in old code (i.e., code written before last week). \n", + "\n", + "### String manipulation\n", + "\n", + "The methods `lower()` and `upper()` are useful for strings. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "THIS IS A TEST STRING\n", + "this is a test string\n" + ] + } + ], + "source": [ + "s = 'This is a Test String'\n", + "print(s.upper())\n", + "print(s.lower())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another useful method is:" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This is a Better String\n" + ] + } + ], + "source": [ + "s = 'This is a Test String'\n", + "s2 = s.replace('Test', 'Better')\n", + "print(s2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you like regular expressions then you're in luck as these are well supported in python using the `re` module. To use this (like many other \"extensions\", which are called _modules_ in Python) you need to `import` it. 
For example:" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This is an example of an example String\n" + ] + } + ], + "source": [ + "import re\n", + "s = 'This is a test of a Test String'\n", + "s1 = re.sub(r'a [Tt]est', \"an example\", s)\n", + "print(s1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "where the `r` before the quote is used to force the regular expression specification to be a `raw string`.\n", + "\n", + "For more information on matching and substitutions, look up the regular expression module on the web.\n", + "\n", + "\n", + "You can also split, or tokenize, a string (to turn it into a list) like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['This', 'is', 'a', 'test', 'of', 'a', 'Test', 'String']\n" + ] + } + ], + "source": [ + "print(s.split())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Note that strings in python 3 are _unicode_ so can represent Chinese characters, etc, and are therefore very flexible. However, in general you can just be blissfully ignorant of this fact.\n", + "\n", "---\n", "\n", "## Tuples and Lists\n", "\n", + "Both tuples and lists are builtin python types and are like vectors, \n", + "but for numerical vectors and arrays it is much better to use _numpy_\n", + "arrays (or matrices), which are covered in a later tutorial.\n", + "\n", "A tuple is like a list or a vector, but with less flexibility than a full list, however anything can be stored in either a list or tuple, without any consistency being required. 
For example:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 72, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(3, 7.6, 'str')\n", + "[1, 'mj', -5.4]\n" + ] + } + ], "source": [ - "xtuple=(3, 7.6, 'str')\n", - "xlist=[1,'mj',-5.4]" + "xtuple = (3, 7.6, 'str')\n", + "xlist = [1, 'mj', -5.4]\n", + "print(xtuple)\n", + "print(xlist)" ] }, { @@ -126,14 +339,23 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 73, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "x2 is: ((3, 7.6, 'str'), [1, 'mj', -5.4])\n", + "x3 is: [(3, 7.6, 'str'), [1, 'mj', -5.4]]\n" + ] + } + ], "source": [ - "x2=(xtuple,xlist)\n", - "x3=[xtuple,xlist]\n", - "print(x2)\n", - "print(x3)" + "x2 = (xtuple, xlist)\n", + "x3 = [xtuple, xlist]\n", + "print('x2 is: ', x2)\n", + "print('x3 is: ', x3)" ] }, { @@ -147,13 +369,21 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 74, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[10, 20, 30, 70, 80]\n" + ] + } + ], "source": [ - "a = [10,20,30]\n", + "a = [10, 20, 30]\n", "a = a + [70]\n", - "a += [80]\n", + "a += [80]\n", "print(a)" ] }, @@ -161,18 +391,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Dereferencing\n", + "### Indexing\n", "\n", - "Square brackets are used to dereference tuples, lists, dictionaries, etc. For example:" + "Square brackets are used to index tuples, lists, dictionaries, etc. 
For example:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 75, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "20\n" + ] + } + ], "source": [ - "d = [10,20,30]\n", + "d = [10, 20, 30]\n", "print(d[1])" ] }, @@ -186,11 +424,20 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 76, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "10\n", + "30\n" + ] + } + ], "source": [ - "a=[10,20,30,40,50,60]\n", + "a = [10, 20, 30, 40, 50, 60]\n", "print(a[0])\n", "print(a[2])" ] @@ -204,9 +451,18 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 77, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "60\n", + "10\n" + ] + } + ], "source": [ "print(a[-1])\n", "print(a[-6])" @@ -221,18 +477,42 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 78, "metadata": {}, - "outputs": [], + "outputs": [ + { + "ename": "IndexError", + "evalue": "list index out of range", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m<ipython-input-78-f4cf4536701c>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m7\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mIndexError\u001b[0m: list index out of range" + ] + } + ], "source": [ "print(a[-7])" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 79, "metadata": {}, - "outputs": [], + "outputs": [ + { + "ename": "IndexError", + "evalue": 
"list index out of range", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m<ipython-input-79-52d95fbe5286>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m6\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mIndexError\u001b[0m: list index out of range" + ] + } + ], "source": [ "print(a[6])" ] @@ -246,9 +526,17 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 80, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "6\n" + ] + } + ], "source": [ "print(len(a))" ] @@ -257,16 +545,25 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Nested lists can have nested dereferences:" + "Nested lists can have nested indexing:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 81, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "20\n", + "40\n" + ] + } + ], "source": [ - "b=[[10,20,30],[40,50,60]]\n", + "b = [[10, 20, 30], [40, 50, 60]]\n", "print(b[0][1])\n", "print(b[1][0])" ] @@ -275,10 +572,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "but *not* a dereference like b[0,1].\n", + "but *not* an index like b[0, 1].\n", "\n", "> Note that `len` will only give the length of the top level.\n", - "> In general arrays should be preferred to nested lists when the contents are numerical.\n", + "> In general, numpy arrays should be preferred to nested lists when the contents are numerical.\n", "\n", "### Slicing\n", "\n", @@ -287,9 +584,17 @@ }, { "cell_type": "code", - "execution_count": null, + 
"execution_count": 82, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[10, 20, 30]\n" + ] + } + ], "source": [ "print(a[0:3])" ] @@ -300,16 +605,26 @@ "source": [ "> _*Pitfall:*_\n", ">\n", - "> Slicing syntax is different from MATLAB in that second number is one plus final index - this is in addition to the zero index difference." + "> Slicing syntax is different from MATLAB in that second number is\n", + "> exclusive (i.e., one plus final index) - this is in addition to the zero index difference." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 83, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[10, 20, 30]\n", + "[20, 30]\n" + ] + } + ], "source": [ - "a=[10,20,30,40,50,60]\n", + "a = [10, 20, 30, 40, 50, 60]\n", "print(a[0:3]) # same as a(1:3) in MATLAB\n", "print(a[1:3]) # same as a(2:3) in MATLAB" ] @@ -320,16 +635,29 @@ "source": [ "> _*Pitfall:*_\n", ">\n", - "> Unlike in MATLAB, you cannot use a list as indices instead of an integer or a slice" + "> Unlike in MATLAB, you cannot use a list as indices instead of an\n", + "> integer or a slice (although these can be done in _numpy_)." 
] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 84, "metadata": {}, - "outputs": [], + "outputs": [ + { + "ename": "TypeError", + "evalue": "list indices must be integers or slices, not list", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m<ipython-input-84-aad7915ae3d8>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mb\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", + "\u001b[0;31mTypeError\u001b[0m: list indices must be integers or slices, not list" + ] + } + ], "source": [ - "b=[3,4]\n", + "b = [3, 4]\n", "print(a[b])" ] }, @@ -344,12 +672,20 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 85, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[10, 20, 30, 10, 20, 30, 10, 20, 30, 10, 20, 30]\n" + ] + } + ], "source": [ - "d=[10,20,30]\n", - "print(d*4)" + "d = [10, 20, 30]\n", + "print(d * 4)" ] }, { @@ -361,9 +697,19 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 86, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[10, 20, 30, 40]\n", + "[10, 30, 40]\n", + "[30, 40]\n" + ] + } + ], "source": [ "d.append(40)\n", "print(d)\n", @@ -382,11 +728,21 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 87, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": 
"stdout", + "output_type": "stream", + "text": [ + "10\n", + "20\n", + "30\n" + ] + } + ], "source": [ - "d=[10,20,30]\n", + "d = [10, 20, 30]\n", "for x in d:\n", " print(x)" ] @@ -399,14 +755,210 @@ "\n", "### Getting help\n", "\n", - "The function `dir()` can be used to get information about any variable/object/function in python. It lists the possible operations." + "The function `help()` can be used to get information about any variable/object/function in python. It lists the possible operations. In `ipython` you can also just type `?<blah>` or `<blah>?` instead:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 88, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Help on list object:\n", + "\n", + "class list(object)\n", + " | list() -> new empty list\n", + " | list(iterable) -> new list initialized from iterable's items\n", + " | \n", + " | Methods defined here:\n", + " | \n", + " | __add__(self, value, /)\n", + " | Return self+value.\n", + " | \n", + " | __contains__(self, key, /)\n", + " | Return key in self.\n", + " | \n", + " | __delitem__(self, key, /)\n", + " | Delete self[key].\n", + " | \n", + " | __eq__(self, value, /)\n", + " | Return self==value.\n", + " | \n", + " | __ge__(self, value, /)\n", + " | Return self>=value.\n", + " | \n", + " | __getattribute__(self, name, /)\n", + " | Return getattr(self, name).\n", + " | \n", + " | __getitem__(...)\n", + " | x.__getitem__(y) <==> x[y]\n", + " | \n", + " | __gt__(self, value, /)\n", + " | Return self>value.\n", + " | \n", + " | __iadd__(self, value, /)\n", + " | Implement self+=value.\n", + " | \n", + " | __imul__(self, value, /)\n", + " | Implement self*=value.\n", + " | \n", + " | __init__(self, /, *args, **kwargs)\n", + " | Initialize self. 
See help(type(self)) for accurate signature.\n", + " | \n", + " | __iter__(self, /)\n", + " | Implement iter(self).\n", + " | \n", + " | __le__(self, value, /)\n", + " | Return self<=value.\n", + " | \n", + " | __len__(self, /)\n", + " | Return len(self).\n", + " | \n", + " | __lt__(self, value, /)\n", + " | Return self<value.\n", + " | \n", + " | __mul__(self, value, /)\n", + " | Return self*value.n\n", + " | \n", + " | __ne__(self, value, /)\n", + " | Return self!=value.\n", + " | \n", + " | __new__(*args, **kwargs) from builtins.type\n", + " | Create and return a new object. See help(type) for accurate signature.\n", + " | \n", + " | __repr__(self, /)\n", + " | Return repr(self).\n", + " | \n", + " | __reversed__(...)\n", + " | L.__reversed__() -- return a reverse iterator over the list\n", + " | \n", + " | __rmul__(self, value, /)\n", + " | Return self*value.\n", + " | \n", + " | __setitem__(self, key, value, /)\n", + " | Set self[key] to value.\n", + " | \n", + " | __sizeof__(...)\n", + " | L.__sizeof__() -- size of L in memory, in bytes\n", + " | \n", + " | append(...)\n", + " | L.append(object) -> None -- append object to end\n", + " | \n", + " | clear(...)\n", + " | L.clear() -> None -- remove all items from L\n", + " | \n", + " | copy(...)\n", + " | L.copy() -> list -- a shallow copy of L\n", + " | \n", + " | count(...)\n", + " | L.count(value) -> integer -- return number of occurrences of value\n", + " | \n", + " | extend(...)\n", + " | L.extend(iterable) -> None -- extend list by appending elements from the iterable\n", + " | \n", + " | index(...)\n", + " | L.index(value, [start, [stop]]) -> integer -- return first index of value.\n", + " | Raises ValueError if the value is not present.\n", + " | \n", + " | insert(...)\n", + " | L.insert(index, object) -- insert object before index\n", + " | \n", + " | pop(...)\n", + " | L.pop([index]) -> item -- remove and return item at index (default last).\n", + " | Raises IndexError if list is empty or index is out 
of range.\n", + " | \n", + " | remove(...)\n", + " | L.remove(value) -> None -- remove first occurrence of value.\n", + " | Raises ValueError if the value is not present.\n", + " | \n", + " | reverse(...)\n", + " | L.reverse() -- reverse *IN PLACE*\n", + " | \n", + " | sort(...)\n", + " | L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*\n", + " | \n", + " | ----------------------------------------------------------------------\n", + " | Data and other attributes defined here:\n", + " | \n", + " | __hash__ = None\n", + "\n" + ] + } + ], + "source": [ + "help(d)" + ] + }, + { + "cell_type": "markdown", "metadata": {}, - "outputs": [], + "source": [ + "There is also a `dir()` function that gives a basic listing of the operations:" + ] + }, + { + "cell_type": "code", + "execution_count": 89, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['__add__',\n", + " '__class__',\n", + " '__contains__',\n", + " '__delattr__',\n", + " '__delitem__',\n", + " '__dir__',\n", + " '__doc__',\n", + " '__eq__',\n", + " '__format__',\n", + " '__ge__',\n", + " '__getattribute__',\n", + " '__getitem__',\n", + " '__gt__',\n", + " '__hash__',\n", + " '__iadd__',\n", + " '__imul__',\n", + " '__init__',\n", + " '__iter__',\n", + " '__le__',\n", + " '__len__',\n", + " '__lt__',\n", + " '__mul__',\n", + " '__ne__',\n", + " '__new__',\n", + " '__reduce__',\n", + " '__reduce_ex__',\n", + " '__repr__',\n", + " '__reversed__',\n", + " '__rmul__',\n", + " '__setattr__',\n", + " '__setitem__',\n", + " '__sizeof__',\n", + " '__str__',\n", + " '__subclasshook__',\n", + " 'append',\n", + " 'clear',\n", + " 'copy',\n", + " 'count',\n", + " 'extend',\n", + " 'index',\n", + " 'insert',\n", + " 'pop',\n", + " 'remove',\n", + " 'reverse',\n", + " 'sort']" + ] + }, + "execution_count": 89, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "dir(d)" ] @@ -426,9 +978,20 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 90, 
"metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2\n", + "dict_keys(['b', 'a'])\n", + "dict_values([20, 10])\n", + "10\n" + ] + } + ], "source": [ "e = {'a' : 10, 'b': 20}\n", "print(len(e))\n", @@ -441,7 +1004,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The keys and values can take on any type, even dictionaries! Python is nothing if not flexible. However, each key must be unique.\n", + "The keys and values can take on almost any type, even dictionaries!\n", + "Python is nothing if not flexible. However, each key must be unique\n", + "and must be \"hashable\".\n", "\n", "### Adding to a dictionary\n", "\n", @@ -450,11 +1015,19 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 91, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'c': 555, 'b': 20, 'a': 10}\n" + ] + } + ], "source": [ - "e['c']=555 # just like in Biobank! ;)\n", + "e['c'] = 555 # just like in Biobank! 
;)\n", "print(e)" ] }, @@ -469,9 +1042,18 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 92, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'c': 555, 'a': 10}\n", + "{'a': 10}\n" + ] + } + ], "source": [ "e.pop('b')\n", "print(e)\n", @@ -490,13 +1072,23 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 93, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "('c', 555)\n", + "('b', 20)\n", + "('a', 10)\n" + ] + } + ], "source": [ "e = {'a' : 10, 'b': 20, 'c':555}\n", - "for k,v in e.items():\n", - " print((k,v))" + "for k, v in e.items():\n", + " print((k, v))" ] }, { @@ -510,12 +1102,22 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 94, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "('c', 555)\n", + "('b', 20)\n", + "('a', 10)\n" + ] + } + ], "source": [ "for k in e:\n", - " print((k,e[k]))" + " print((k, e[k]))" ] }, { @@ -523,6 +1125,8 @@ "metadata": {}, "source": [ "> Note that in both cases the order is arbitrary. The `sorted` function can be used if you want keys in a sorted order; e.g. 
`for k in sorted(e):` ...\n", + ">\n", + "> There are also other options (e.g. `collections.OrderedDict`) if you want a dictionary with ordering.\n", "\n", "---\n", "\n", @@ -533,9 +1137,17 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 95, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7\n" + ] + } + ], "source": [ "a = 7\n", "b = a\n", @@ -552,9 +1164,17 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 96, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[8888]\n" + ] + } + ], "source": [ "a = [7]\n", "b = a\n", @@ -571,12 +1191,20 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 97, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[7, 7]\n" + ] + } + ], "source": [ "a = [7]\n", - "b = a*2\n", + "b = a * 2\n", "a[0] = 8888\n", "print(b)" ] @@ -585,17 +1213,25 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "If an explicit copy is necessary then this can be made using the `copy()` function:" + "If an explicit copy is necessary then this can be made using the `list()` constructor:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 98, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[7]\n" + ] + } + ], "source": [ "a = [7]\n", - "b = a.copy()\n", + "b = list(a)\n", "a[0] = 8888\n", "print(b)" ] @@ -604,14 +1240,55 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "> When writing functions this is something to be particularly careful about." 
+ "There is a constructor for each type and this can be useful for converting between types:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 99, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(2, 5, 7)\n", + "[2, 5, 7]\n" + ] + } + ], + "source": [ + "xt = (2, 5, 7)\n", + "xl = list(xt)\n", + "print(xt)\n", + "print(xl)" + ] + }, + { + "cell_type": "markdown", "metadata": {}, - "outputs": [], + "source": [ + "> _*Pitfall:*_\n", + ">\n", + "> When writing functions you need to be particularly careful about references and copies." + ] + }, + { + "cell_type": "code", + "execution_count": 100, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[5]\n", + "[5, 10]\n", + "[5, 10]\n", + "[5, 10]\n" + ] + } + ], "source": [ "def foo1(x):\n", " x.append(10)\n", @@ -620,7 +1297,7 @@ "def foo3(x):\n", " return x + [10]\n", "\n", - "a=[5]\n", + "a = [5]\n", "print(a)\n", "foo1(a)\n", "print(a)\n", @@ -638,14 +1315,398 @@ "\n", "## Control flow\n", "\n", - " - boolean operators\n", - " - if/else/for\n", - " - a if condition else b\n", - " - introduce range/enumerate" + "### Boolean operators\n", + "\n", + "There is a boolean type in python that can be `True` or `False` (note the capitals). 
Other values can also be treated as true or false in a boolean context (e.g., 1 counts as true; 0 or None or [] or {} or \"\" count as false), although most of them are not 'equal' to `True` or `False` in the sense of the operator `==` (only the numbers 1 and 0 compare equal to `True` and `False`).\n", + "\n", + "Relevant boolean and comparison operators include: `not`, `and`, `or`, `==` and `!=`\n", + "\n", + "For example:" + ] + }, + { + "cell_type": "code", + "execution_count": 101, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Not a is: False\n", + "Not 1 is: False\n", + "Not 0 is: True\n", + "Not {} is: True\n", + "{}==0 is: False\n" + ] + } + ], + "source": [ + "a = True\n", + "print('Not a is:', not a)\n", + "print('Not 1 is:', not 1)\n", + "print('Not 0 is:', not 0)\n", + "print('Not {} is:', not {})\n", + "print('{}==0 is:', {}==0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There is also the `in` test for strings, lists, etc:" + ] + }, + { + "cell_type": "code", + "execution_count": 102, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "False\n", + "True\n", + "True\n" + ] + } + ], + "source": [ + "print('the' in 'a number of words')\n", + "print('of' in 'a number of words')\n", + "print(3 in [1, 2, 3, 4])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### If statements\n", + "\n", + "The basic syntax of `if` statements is fairly standard, though don't forget that you _*must*_ indent the statements within the conditional/loop block as this is the way of delineating blocks of code in python. 
For example:" + ] + }, + { + "cell_type": "code", + "execution_count": 103, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0.9933407850276534\n", + "Positive\n" + ] + } + ], + "source": [ + "import random\n", + "a = random.uniform(-1, 1)\n", + "print(a)\n", + "if a>0:\n", + " print('Positive')\n", + "elif a<0:\n", + " print('Negative')\n", + "else:\n", + " print('Zero')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Or more generally:" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Variable is false, or at least empty\n" + ] + } + ], + "source": [ + "a = [] # just one of many examples\n", + "if not a:\n", + " print('Variable is false, or at least empty')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This can be useful for functions where a variety of possible input types are being dealt with. \n", + "\n", + "---\n", + "\n", + "### For loops\n", + "\n", + "The `for` loop works like in bash:" + ] + }, + { + "cell_type": "code", + "execution_count": 105, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2\n", + "is\n", + "more\n", + "than\n", + "1\n" + ] + } + ], + "source": [ + "for x in [2, 'is', 'more', 'than', 1]:\n", + " print(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "where a list or any other sequence (e.g. 
tuple) can be used.\n", + "\n", + "If you want a numerical range then use:" + ] + }, + { + "cell_type": "code", + "execution_count": 106, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2\n", + "3\n", + "4\n", + "5\n", + "6\n", + "7\n", + "8\n" + ] + } + ], + "source": [ + "for x in range(2, 9):\n", + " print(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that, like slicing, the maximum value is one less than the value specified. Also, `range` actually returns an object that can be iterated over but is not just a list of numbers. If you want a list of numbers then `list(range(2, 9))` will give you this.\n", + "\n", + "A very nice feature of python is that multiple variables can be assigned from a tuple or list:" + ] + }, + { + "cell_type": "code", + "execution_count": 107, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4\n", + "7\n" + ] + } + ], + "source": [ + "x, y = [4, 7]\n", + "print(x)\n", + "print(y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "and this can be combined with a function called `zip` to make very convenient dual variable loops:" + ] + }, + { + "cell_type": "code", + "execution_count": 108, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[('Some', 0), ('set', 1), ('of', 2), ('items', 3)]\n", + "0 Some\n", + "1 set\n", + "2 of\n", + "3 items\n" + ] + } + ], + "source": [ + "alist = ['Some', 'set', 'of', 'items']\n", + "blist = list(range(len(alist)))\n", + "print(list(zip(alist, blist)))\n", + "for x, y in zip(alist, blist):\n", + " print(y, x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This type of loop can be used with any two lists (or similar) to iterate over them jointly.\n", + "\n", + "### While loops\n", + "\n", + "The syntax for this is pretty standard:" + ] + }, + { + "cell_type": 
"code", + "execution_count": 109, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "35.041627991396396\n" + ] + } + ], + "source": [ + "import random\n", + "n = 0\n", + "x = 0\n", + "while n<100:\n", + " x += random.uniform(0, 1)**2 # where ** is a power operation\n", + " if x>50:\n", + " break\n", + " n += 1\n", + "print(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also use `continue` as in other languages.\n", + "\n", + "---\n", + "\n", + "### A quick intro to conditional expressions and list comprehensions\n", + "\n", + "These are more advanced bits of python but are really useful and common, so worth having a little familiarity with at this stage.\n", + "\n", + "#### Conditional expressions\n", + "\n", + "A general expression that can be used in python is: A `if` condition `else` B\n", + "\n", + "For example:" + ] + }, + { + "cell_type": "code", + "execution_count": 110, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0.6576892259376057 0.11717666603919556\n" + ] + } + ], + "source": [ + "import random\n", + "x = random.uniform(0, 1)\n", + "y = x**2 if x<0.5 else (1 - x)**2\n", + "print(x, y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### List comprehensions\n", + "\n", + "This is a shorthand syntax for building a list like a for loop but doing it in one line, and is very popular in python. It is quite similar to mathematical set notation. 
For example:" ] + }, + { + "cell_type": "code", + "execution_count": 111, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]\n", + "[0, 1, 4, 9, 16, 25, 36, 64, 81]\n" + ] + } + ], + "source": [ + "v1 = [ x**2 for x in range(10) ]\n", + "print(v1)\n", + "v2 = [ x**2 for x in range(10) if x!=7 ]\n", + "print(v2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You'll find that python programmers use this kind of construction _*a lot*_." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], - "metadata": {}, + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.2" + } + }, "nbformat": 4, "nbformat_minor": 2 } diff --git a/getting_started/basics.md b/getting_started/basics.md index 44da7c938fa178738cde1f5884ed43049e1f85f3..e4093ac8ad84741f0fd7c52c1baa8da28719b4f6 100644 --- a/getting_started/basics.md +++ b/getting_started/basics.md @@ -1,3 +1,15 @@ +# Basic python + +This tutorial is aimed at briefly introducing you to the main language +features of python, with emphasis on some of the common difficulties +and pitfalls that are commonly encountered when moving to python. + +When going through this make sure that you _run_ each code block +and look at the output, as these are crucial for understanding the +explanations. You can run each block by using _shift + enter_ (including the text blocks, so you can just move down the document with shift + enter). + +--- + # Basic types Python has many different types and variables are dynamic and can change types (like MATLAB). 
Some of the most commonly used in-built types are: @@ -13,19 +25,21 @@ N-dimensional arrays and other types are supported through common modules (e.g., a = 4 b = 3.6 c = 'abc' -d = [10,20,30] +d = [10, 20, 30] e = {'a' : 10, 'b': 20} +print(a) ``` -Any variable can be printed using the function `print()`: +Any variable or combination of variables can be printed using the function `print()`: ``` print(d) print(e) +print(a, b, c) ``` > _*Python 3 versus python 2*_: > -> Print - for the print statement the brackets are compulsory for *python 3*, but are optional in python 2. So you will see plenty of code without the brackets but you should get into the habit of using them. +> Print - for the print statement the brackets are compulsory for *python 3*, but are optional in python 2. So you will see plenty of code without the brackets but you should never use `print` without brackets, as this is incompatible with Python 3. > > Division - in python 3 all division is floating point (like in MATLAB), even if the values are integers, but in python 2 integer division works like it does in C. @@ -38,52 +52,107 @@ Strings can be dereferenced like lists (see later). For example: ``` -s1="test string" -s2='another test string' +s1 = "test string" +s2 = 'another test string' +print(s1, ' :: ', s2) ``` You can also use triple quotes to capture multi-line strings. For example: ``` -s3='''This is +s3 = '''This is a string over multiple lines ''' print(s3) ``` +### Format + +More interesting strings can be created using the `format` statement, which is very useful in print statements: +``` +x = 1 +y = 'PyTreat' +s = 'The numerical value is {} and a name is {}'.format(x, y) +print(s) +print('A name is {} and a number is {}'.format(y, x)) +``` + +There are also other options along these lines, but this is the more modern version, although you will see plenty of the other alternatives in old code (i.e., code written before last week). 
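To make the comparison with those other options concrete, here is a small sketch (not part of the original tutorial code) that builds the same string three ways: with positional `{}` fields, with named fields, and with the older `%` operator. The field names `num` and `name` are only illustrative choices.

```python
x = 1
y = 'PyTreat'

# Positional fields are filled in the order the arguments are given
s_positional = 'The numerical value is {} and a name is {}'.format(x, y)

# Named fields make long format strings easier to read
s_named = 'The numerical value is {num} and a name is {name}'.format(num=x, name=y)

# Older printf-style formatting, which you will still see in existing code
s_percent = 'The numerical value is %d and a name is %s' % (x, y)

print(s_positional)
print(s_named == s_positional, s_percent == s_positional)

# Format specifications control things like decimal places and padding
print('{:.2f} | {:>10}'.format(3.14159, y))
```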
+ +### String manipulation + +The methods `lower()` and `upper()` are useful for strings. For example: +``` +s = 'This is a Test String' +print(s.upper()) +print(s.lower()) +``` + +Another useful method is: +``` +s = 'This is a Test String' +s2 = s.replace('Test', 'Better') +print(s2) +``` + +If you like regular expressions then you're in luck as these are well supported in python using the `re` module. To use this (like many other "extensions", which are called _modules_ in Python) you need to `import` it. For example: +``` +import re +s = 'This is a test of a Test String' +s1 = re.sub(r'a [Tt]est', "an example", s) +print(s1) +``` +where the `r` before the quote is used to force the regular expression specification to be a `raw string`. + +For more information on matching and substitutions, look up the regular expression module on the web. + + +You can also split, or tokenize, a string (to turn it into a list) like this: +``` +print(s.split()) +``` + +> Note that strings in python 3 are _unicode_ so can represent Chinese characters, etc, and are therefore very flexible. However, in general you can just be blissfully ignorant of this fact. + --- ## Tuples and Lists +Both tuples and lists are builtin python types and are like vectors, +but for numerical vectors and arrays it is much better to use _numpy_ +arrays (or matrices), which are covered in a later tutorial. + A tuple is like a list or a vector, but with less flexibility than a full list, however anything can be stored in either a list or tuple, without any consistency being required. 
For example: ``` -xtuple=(3, 7.6, 'str') -xlist=[1,'mj',-5.4] +xtuple = (3, 7.6, 'str') +xlist = [1, 'mj', -5.4] +print(xtuple) +print(xlist) ``` They can also be nested: ``` -x2=(xtuple,xlist) -x3=[xtuple,xlist] -print(x2) -print(x3) +x2 = (xtuple, xlist) +x3 = [xtuple, xlist] +print('x2 is: ', x2) +print('x3 is: ', x3) ``` ### Adding to a list This is easy: ``` -a = [10,20,30] +a = [10, 20, 30] a = a + [70] -a += [80] +a += [80] print(a) ``` -### Dereferencing +### Indexing -Square brackets are used to dereference tuples, lists, dictionaries, etc. For example: +Square brackets are used to index tuples, lists, dictionaries, etc. For example: ``` -d = [10,20,30] +d = [10, 20, 30] print(d[1]) ``` @@ -91,7 +160,7 @@ print(d[1]) > Python uses zero-based indexing, unlike MATLAB ``` -a=[10,20,30,40,50,60] +a = [10, 20, 30, 40, 50, 60] print(a[0]) print(a[2]) ``` @@ -115,16 +184,16 @@ Length of a tuple or list is given by the `len()` function: print(len(a)) ``` -Nested lists can have nested dereferences: +Nested lists can have nested indexing: ``` -b=[[10,20,30],[40,50,60]] +b = [[10, 20, 30], [40, 50, 60]] print(b[0][1]) print(b[1][0]) ``` -but *not* a dereference like b[0,1]. +but *not* an index like b[0, 1]. > Note that `len` will only give the length of the top level. -> In general arrays should be preferred to nested lists when the contents are numerical. +> In general, numpy arrays should be preferred to nested lists when the contents are numerical. ### Slicing @@ -135,20 +204,22 @@ print(a[0:3]) > _*Pitfall:*_ > -> Slicing syntax is different from MATLAB in that second number is one plus final index - this is in addition to the zero index difference. +> Slicing syntax is different from MATLAB in that second number is +> exclusive (i.e., one plus final index) - this is in addition to the zero index difference. 
``` -a=[10,20,30,40,50,60] +a = [10, 20, 30, 40, 50, 60] print(a[0:3]) # same as a(1:3) in MATLAB print(a[1:3]) # same as a(2:3) in MATLAB ``` > _*Pitfall:*_ > -> Unlike in MATLAB, you cannot use a list as indices instead of an integer or a slice +> Unlike in MATLAB, you cannot use a list as indices instead of an +> integer or a slice (although this can be done in _numpy_). ``` -b=[3,4] +b = [3, 4] print(a[b]) ``` @@ -157,8 +228,8 @@ print(a[b]) Multiplication can be used with lists, where multiplication implements replication. ``` -d=[10,20,30] -print(d*4) +d = [10, 20, 30] +print(d * 4) ``` There are also other operations such as: @@ -174,7 +245,7 @@ print(d) ### Looping over elements in a list (or tuple) ``` -d=[10,20,30] +d = [10, 20, 30] for x in d: print(x) ``` @@ -183,12 +254,20 @@ for x in d: ### Getting help -The function `dir()` can be used to get information about any variable/object/function in python. It lists the possible operations. +The function `help()` can be used to get information about any variable/object/function in python. It lists the possible operations. In `ipython` you can also just type `?<blah>` or `<blah>?` instead: + +``` +help(d) +``` + + +There is also a `dir()` function that gives a basic listing of the operations: ``` dir(d) ``` + > Note that google is often more helpful! --- @@ -204,13 +283,15 @@ print(e.values()) print(e['a']) ``` -The keys and values can take on any type, even dictionaries! Python is nothing if not flexible. However, each key must be unique. +The keys and values can take on almost any type, even dictionaries! +Python is nothing if not flexible. However, each key must be unique +and each key must be "hashable" (values can be anything, including dictionaries, but a list or dictionary cannot be used as a key). ### Adding to a dictionary This is very easy: ``` -e['c']=555 # just like in Biobank! ;) +e['c'] = 555 # just like in Biobank! 
;) print(e) ``` @@ -230,8 +311,8 @@ print(e) Several variables can jointly work as loop variables in python, which is very convenient. 
For example: ``` e = {'a' : 10, 'b': 20, 'c':555} -for k,v in e.items(): - print((k,v)) +for k, v in e.items(): + print((k, v)) ``` The print statement here constructs a tuple, which is often used in python. @@ -239,10 +320,12 @@ The print statement here constructs a tuple, which is often used in python. Another option is: ``` for k in e: - print((k,e[k])) + print((k, e[k])) ``` > Note that in both cases the order is arbitrary. The `sorted` function can be used if you want keys in a sorted order; e.g. `for k in sorted(e):` ... +> +> There are also other options if you want a dictionary with ordering. --- @@ -267,20 +350,30 @@ print(b) But if an operation is performed then a copy might be made: ``` a = [7] -b = a*2 +b = a * 2 a[0] = 8888 print(b) ``` -If an explicit copy is necessary then this can be made using the `copy()` function: +If an explicit copy is necessary then this can be made using the `list()` constructor: ``` a = [7] -b = a.copy() +b = list(a) a[0] = 8888 print(b) ``` -> When writing functions this is something to be particularly careful about. +There is a constructor for each type and this can be useful for converting between types: +``` +xt = (2, 5, 7) +xl = list(xt) +print(xt) +print(xl) +``` + +> _*Pitfall:*_ +> +> When writing functions you need to be particularly careful about references and copies. ``` def foo1(x): @@ -290,7 +383,7 @@ def foo2(x): def foo3(x): return x + [10] -a=[5] +a = [5] print(a) foo1(a) print(a) @@ -304,8 +397,138 @@ print(a) ## Control flow - - boolean operators - - if/else/for - - a if condition else b - - introduce range/enumerate +### Boolean operators + +There is a boolean type in python that can be `True` or `False` (note the capitals). Other values can also be used for True or False (e.g., 1 for True; 0 or None or [] or {} or "") although, apart from the integers 1 and 0, they are not considered 'equal' to `True` or `False` in the sense that the operator `==` would consider them the same. 
+ +Relevant boolean and comparison operators include: `not`, `and`, `or`, `==` and `!=` + +For example: +``` +a = True +print('Not a is:', not a) +print('Not 1 is:', not 1) +print('Not 0 is:', not 0) +print('Not {} is:', not {}) +print('{}==0 is:', {}==0) +``` + +There is also the `in` test for strings, lists, etc: +``` +print('the' in 'a number of words') +print('of' in 'a number of words') +print(3 in [1, 2, 3, 4]) +``` + + +### If statements + +The basic syntax of `if` statements is fairly standard, though don't forget that you _*must*_ indent the statements within the conditional/loop block as this is the way of delineating blocks of code in python. For example: +``` +import random +a = random.uniform(-1, 1) +print(a) +if a>0: + print('Positive') +elif a<0: + print('Negative') +else: + print('Zero') +``` + +Or more generally: +``` +a = [] # just one of many examples +if not a: + print('Variable is false, or at least empty') +``` +This can be useful for functions where a variety of possible input types are being dealt with. + +--- + +### For loops + +The `for` loop works like in bash: +``` +for x in [2, 'is', 'more', 'than', 1]: + print(x) +``` +where a list or any other sequence (e.g. tuple) can be used. + +If you want a numerical range then use: +``` +for x in range(2, 9): + print(x) +``` +Note that, like slicing, the maximum value is one less than the value specified. Also, `range` actually returns an object that can be iterated over but is not just a list of numbers. If you want a list of numbers then `list(range(2, 9))` will give you this. 
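As a short sketch of the points above (the variable names are only illustrative), `range` also accepts an optional step, and the object it returns can be converted to a list, measured with `len`, or indexed directly:

```python
r = range(2, 9)          # covers 2, 3, ..., 8 (9 itself is excluded)
print(r)                 # shows the range object, not a list of numbers

numbers = list(r)        # explicit conversion to a list
print(numbers)
print(len(r), r[0], r[-1])

evens = list(range(2, 9, 2))       # the optional third argument is the step
print(evens)

countdown = list(range(8, 1, -1))  # a negative step counts downwards
print(countdown)
```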
+ +A very nice feature of python is that multiple variables can be assigned from a tuple or list: +``` +x, y = [4, 7] +print(x) +print(y) +``` + +and this can be combined with a function called `zip` to make very convenient dual variable loops: +``` +alist = ['Some', 'set', 'of', 'items'] +blist = list(range(len(alist))) +print(list(zip(alist, blist))) +for x, y in zip(alist, blist): + print(y, x) +``` + +This type of loop can be used with any two lists (or similar) to iterate over them jointly. + +### While loops + +The syntax for this is pretty standard: +``` +import random +n = 0 +x = 0 +while n<100: + x += random.uniform(0, 1)**2 # where ** is a power operation + if x>50: + break + n += 1 +print(x) +``` + +You can also use `continue` as in other languages. + +--- + +### A quick intro to conditional expressions and list comprehensions + +These are more advanced bits of python but are really useful and common, so worth having a little familiarity with at this stage. + +#### Conditional expressions + +A general expression that can be used in python is: A `if` condition `else` B + +For example: +``` +import random +x = random.uniform(0, 1) +y = x**2 if x<0.5 else (1 - x)**2 +print(x, y) +``` + + +#### List comprehensions + +This is a shorthand syntax for building a list like a for loop but doing it in one line, and is very popular in python. It is quite similar to mathematical set notation. For example: +``` +v1 = [ x**2 for x in range(10) ] +print(v1) +v2 = [ x**2 for x in range(10) if x!=7 ] +print(v2) +``` + +You'll find that python programmers use this kind of construction _*a lot*_. 
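If the one-line form is hard to read at first, it may help to see that a comprehension is roughly equivalent to an explicit loop that appends to a list. The sketch below rebuilds `v2` from the example above the long way (the names `v2_loop` and `labels` are just for illustration):

```python
# List comprehension, as in the example above
v2 = [x**2 for x in range(10) if x != 7]

# The same list built with an explicit for loop
v2_loop = []
for x in range(10):
    if x != 7:
        v2_loop.append(x**2)

print(v2 == v2_loop)

# A conditional expression can also be used inside a comprehension
labels = ['even' if x % 2 == 0 else 'odd' for x in range(4)]
print(labels)
```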
+ + + + diff --git a/getting_started/nifti.ipynb b/getting_started/nifti.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..994edd80c08830bcf5e2d0ab2bb116f1ac5f4029 --- /dev/null +++ b/getting_started/nifti.ipynb @@ -0,0 +1,171 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# NIfTI images and python\n", + "\n", + "The [nibabel](http://nipy.org/nibabel/) module is used to read and write NIfTI images and also some other medical imaging formats (e.g., ANALYZE, GIFTI, MINC, MGH). This module is included within the FSL python environment.\n", + "\n", + "## Reading images\n", + "\n", + "It is easy to read an image:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(182, 218, 182)\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "import nibabel as nib\n", + "filename = '/usr/local/fsl/data/standard/MNI152_T1_1mm.nii.gz'\n", + "imobj = nib.load(filename, mmap=False)\n", + "# get the header object\n", + "imhdr = imobj.header\n", + "# extract data (as a numpy array)\n", + "imdat = imobj.get_data().astype(float)\n", + "print(imdat.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Make sure you use the full filename, including the .nii.gz extension.\n", + "\n", + "Reading the data off the disk is not done until `get_data()` is called.\n", + "\n", + "> Pitfall:\n", + ">\n", + "> The option `mmap=False` is necessary as it turns off memory mapping, which otherwise would be invoked for uncompressed NIfTI files but not for compressed files. 
Since some functionality behaves differently on memory mapped objects, it is advisable to turn this off.\n", + "\n", + "Once the data is read into a numpy array then it is easily manipulated.\n", + "\n", + "> We recommend converting it to float at the start to avoid problems with integer arithmetic and overflow, though this is not compulsory.\n", + "\n", + "## Header info\n", + "\n", + "There are many methods available on the header object - for example, look at `dir(imhdr)` or `help(imhdr)` or the [nibabel webpage about NIfTI images](http://nipy.org/nibabel/nifti_images.html)\n", + "\n", + "### Voxel sizes\n", + "\n", + "Dimensions of the voxels, in mm, can be found from:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(1.0, 1.0, 1.0)\n" + ] + } + ], + "source": [ + "voxsize = imhdr.get_zooms()\n", + "print(voxsize)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Coordinate orientations and mappings\n", + "\n", + "Information about the NIfTI qform and sform matrices can be extracted like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4\n", + "[[ -1. 0. 0. 90.]\n", + " [ 0. 1. 0. -126.]\n", + " [ 0. 0. 1. -72.]\n", + " [ 0. 0. 0. 1.]]\n" + ] + } + ], + "source": [ + "sform = imhdr.get_sform()\n", + "sformcode = imhdr['sform_code']\n", + "qform = imhdr.get_qform()\n", + "qformcode = imhdr['qform_code']\n", + "print(qformcode)\n", + "print(qform)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Writing images\n", + "\n", + "If you have created a modified image by making or modifying a numpy array then you need to put this into a NIfTI image object in order to save it to a file. 
The easiest way to do this is to copy all the header info from an existing image like this:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "newdata = imdat * imdat\n", + "newhdr = imhdr.copy()\n", + "newobj = nib.nifti1.Nifti1Image(newdata, None, header=newhdr)\n", + "nib.save(newobj, \"mynewname.nii.gz\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "where `newdata` is the numpy array (the above is a random example only) and `imhdr` is the existing image header (as above).\n", + "\n", + "If the dimensions of the image are different, then extra modifications will be required. For this, or for making an image from scratch, see the [nibabel documentation](http://nipy.org/nibabel/nifti_images.html) on NIfTI images." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.2" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/getting_started/nifti.md b/getting_started/nifti.md new file mode 100644 index 0000000000000000000000000000000000000000..973250c21744b9aaa5c56b37f49c2aacfa761e4e --- /dev/null +++ b/getting_started/nifti.md @@ -0,0 +1,75 @@ +# NIfTI images and python + +The [nibabel](http://nipy.org/nibabel/) module is used to read and write NIfTI images and also some other medical imaging formats (e.g., ANALYZE, GIFTI, MINC, MGH). This module is included within the FSL python environment. 
+ +## Reading images + +It is easy to read an image: + +``` +import numpy as np +import nibabel as nib +filename = '/usr/local/fsl/data/standard/MNI152_T1_1mm.nii.gz' +imobj = nib.load(filename, mmap=False) +# get the header object +imhdr = imobj.header +# extract data (as a numpy array) +imdat = imobj.get_data().astype(float) +print(imdat.shape) +``` + +> Make sure you use the full filename, including the .nii.gz extension. + +Reading the data off the disk is not done until `get_data()` is called. + +> Pitfall: +> +> The option `mmap=False` is necessary as it turns off memory mapping, which otherwise would be invoked for uncompressed NIfTI files but not for compressed files. Since some functionality behaves differently on memory mapped objects, it is advisable to turn this off. + +Once the data is read into a numpy array then it is easily manipulated. + +> We recommend converting it to float at the start to avoid problems with integer arithmetic and overflow, though this is not compulsory. + +## Header info + +There are many methods available on the header object - for example, look at `dir(imhdr)` or `help(imhdr)` or the [nibabel webpage about NIfTI images](http://nipy.org/nibabel/nifti_images.html) + +### Voxel sizes + +Dimensions of the voxels, in mm, can be found from: + +``` +voxsize = imhdr.get_zooms() +print(voxsize) +``` + +### Coordinate orientations and mappings + +Information about the NIfTI qform and sform matrices can be extracted like this: + +``` +sform = imhdr.get_sform() +sformcode = imhdr['sform_code'] +qform = imhdr.get_qform() +qformcode = imhdr['qform_code'] +print(qformcode) +print(qform) +``` + +## Writing images + +If you have created a modified image by making or modifying a numpy array then you need to put this into a NIfTI image object in order to save it to a file. 
The easiest way to do this is to copy all the header info from an existing image like this: + +``` +newdata = imdat * imdat +newhdr = imhdr.copy() +newobj = nib.nifti1.Nifti1Image(newdata, None, header=newhdr) +nib.save(newobj, "mynewname.nii.gz") +``` +where `newdata` is the numpy array (the above is a random example only) and `imhdr` is the existing image header (as above). + +If the dimensions of the image are different, then extra modifications will be required. For this, or for making an image from scratch, see the [nibabel documentation](http://nipy.org/nibabel/nifti_images.html) on NIfTI images. + + + + diff --git a/getting_started/scripts.ipynb b/getting_started/scripts.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..1d3d705d5b636c281ce97e6a527431c69075847d --- /dev/null +++ b/getting_started/scripts.ipynb @@ -0,0 +1,243 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Callable scripts in python\n", + "\n", + "In this tutorial we will cover how to write simple stand-alone scripts in python that can be used as alternatives to bash scripts.\n", + "\n", + "There are some code blocks within this webpage, but we recommend that you write the code in an IDE or editor instead and then run the scripts from a terminal.\n", + "\n", + "## Basic script\n", + "\n", + "The first line of a python script is usually:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#!/usr/bin/env python" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "which invokes whichever version of python can be found by `/usr/bin/env` since python can be located in many different places.\n", + "\n", + "For FSL scripts we use an alternative, to ensure that we pick up the version of python (and associated packages) that we ship with FSL. 
To do this we use the line:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#!/usr/bin/env fslpython" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After this line the rest of the file just uses regular python syntax, as in the other tutorials. Make sure you make the file executable - just like a bash script.\n", + "\n", + "## Calling other executables\n", + "\n", + "The most essential call that you need to use to replicate the way a bash script calls executables is `subprocess.run()`. A simple call looks like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import subprocess as sp\n", + "sp.run(['ls', '-la'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To suppress the output do this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "spobj = sp.run(['ls'], stdout = sp.PIPE)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To store the output do this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "spobj = sp.run('ls -la'.split(), stdout = sp.PIPE)\n", + "sout = spobj.stdout.decode('utf-8')\n", + "print(sout)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Note that the `decode` call in the middle line converts the string from a byte string to a normal string. In Python 3 there is a distinction between strings (sequences of characters, possibly using multiple bytes to store each character) and bytes (sequences of bytes). 
The world has moved on from ASCII, so in this day and age, this distinction is absolutely necessary, and Python does a fairly good job of it.\n", + "\n", + "If the output is numerical then this can be extracted like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "fsldir = os.getenv('FSLDIR')\n", + "spobj = sp.run([fsldir+'/bin/fslstats', fsldir+'/data/standard/MNI152_T1_1mm_brain', '-V'], stdout = sp.PIPE)\n", + "sout = spobj.stdout.decode('utf-8')\n", + "vol_vox = float(sout.split()[0])\n", + "vol_mm = float(sout.split()[1])\n", + "print('Volumes are: ', vol_vox, ' in voxels and ', vol_mm, ' in mm')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An alternative way to run a set of commands would be like this:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "commands = \"\"\"\n", + "{fsldir}/bin/fslmaths {t1} -bin {t1_mask}\n", + "{fsldir}/bin/fslmaths {t2} -mas {t1_mask} {t2_masked}\n", + "\"\"\"\n", + "\n", + "fsldirpath = os.getenv('FSLDIR')\n", + "commands = commands.format(t1 = 't1.nii.gz', t1_mask = 't1_mask', t2 = 't2', t2_masked = 't2_masked', fsldir = fsldirpath)\n", + "\n", + "sout=[]\n", + "for cmd in commands.split('\\n'):\n", + " if cmd: # avoids empty strings getting passed to sp.run()\n", + " print('Running command: ', cmd)\n", + " spobj = sp.run(cmd.split(), stdout = sp.PIPE)\n", + " sout.append(spobj.stdout.decode('utf-8'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Command line arguments\n", + "\n", + "The simplest way of dealing with command line arguments is to use the module `sys`, which gives access to an `argv` list:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import sys\n", + "print(len(sys.argv))\n", + "print(sys.argv[0])" + ] + }, + { + 
"cell_type": 
"markdown", + "metadata": {}, + "source": [ + "For more sophisticated argument parsing you can use `argparse` - good documentation and examples of this can be found on the web.\n", + "\n", + "\n", + "## Example script\n", + "\n", + "Here is a simple bash script (it masks an image and calculates volumes - just as a random example). DO NOT execute the code blocks here within the notebook/webpage:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#!/bin/bash\n", + "if [ $# -lt 2 ] ; then\n", + " echo \"Usage: $0 <input image> <output image>\"\n", + " exit 1\n", + "fi\n", + "infile=$1\n", + "outfile=$2\n", + "# mask input image with MNI\n", + "$FSLDIR/bin/fslmaths $infile -mas $FSLDIR/data/standard/MNI152_T1_1mm_brain $outfile\n", + "# calculate volumes of masked image \n", + "vv=`$FSLDIR/bin/fslstats $outfile -V`\n", + "vol_vox=`echo $vv | awk '{ print $1 }'`\n", + "vol_mm=`echo $vv | awk '{ print $2 }'`\n", + "echo \"Volumes are: $vol_vox in voxels and $vol_mm in mm\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And an alternative in python:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#!/usr/bin/env fslpython\n", + "import os, sys\n", + "import subprocess as sp\n", + "fsldir=os.getenv('FSLDIR')\n", + "if len(sys.argv)<3: # sys.argv[0] is the script name, so two arguments give a length of 3\n", + " print('Usage: ', sys.argv[0], ' <input image> <output image>')\n", + " sys.exit(1)\n", + "infile = sys.argv[1]\n", + "outfile = sys.argv[2]\n", + "# mask input image with MNI\n", + "spobj = sp.run([fsldir+'/bin/fslmaths', infile, '-mas', fsldir+'/data/standard/MNI152_T1_1mm_brain', outfile], stdout = sp.PIPE)\n", + "# calculate volumes of masked image \n", + "spobj = sp.run([fsldir+'/bin/fslstats', outfile, '-V'], stdout = sp.PIPE)\n", + "sout = spobj.stdout.decode('utf-8')\n", + "vol_vox = 
float(sout.split()[0])\n", + "vol_mm = float(sout.split()[1])\n", + "print('Volumes 
are: ', vol_vox, ' in voxels and ', vol_mm, ' in mm')" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/getting_started/scripts.md b/getting_started/scripts.md new file mode 100644 index 0000000000000000000000000000000000000000..417b2f14172ec66d0e71e27324a5cc21a8cc4ca3 --- /dev/null +++ b/getting_started/scripts.md @@ -0,0 +1,136 @@ +# Callable scripts in python + +In this tutorial we will cover how to write simple stand-alone scripts in python that can be used as alternatives to bash scripts. + +There are some code blocks within this webpage, but we recommend that you write the code in an IDE or editor instead and then run the scripts from a terminal. + +## Basic script + +The first line of a python script is usually: +``` +#!/usr/bin/env python +``` +which invokes whichever version of python can be found by `/usr/bin/env` since python can be located in many different places. + +For FSL scripts we use an alternative, to ensure that we pick up the version of python (and associated packages) that we ship with FSL. To do this we use the line: +``` +#!/usr/bin/env fslpython +``` + +After this line the rest of the file just uses regular python syntax, as in the other tutorials. Make sure you make the file executable - just like a bash script. + +## Calling other executables + +The most essential call that you need to use to replicate the way a bash script calls executables is `subprocess.run()`. A simple call looks like this: + +``` +import subprocess as sp +sp.run(['ls', '-la']) +``` + + +To suppress the output do this: + +``` +spobj = sp.run(['ls'], stdout = sp.PIPE) +``` + +To store the output do this: + +``` +spobj = sp.run('ls -la'.split(), stdout = sp.PIPE) +sout = spobj.stdout.decode('utf-8') +print(sout) +``` + +> Note that the `decode` call in the middle line converts the string from a byte string to a normal string. 
In Python 3 there is a distinction between strings (sequences of characters, possibly using multiple bytes to store each character) and bytes (sequences of bytes). The world has moved on from ASCII, so in this day and age, this distinction is absolutely necessary, and Python does a fairly good job of it. + +If the output is numerical then this can be extracted like this: +``` +import os +fsldir = os.getenv('FSLDIR') +spobj = sp.run([fsldir+'/bin/fslstats', fsldir+'/data/standard/MNI152_T1_1mm_brain', '-V'], stdout = sp.PIPE) +sout = spobj.stdout.decode('utf-8') +vol_vox = float(sout.split()[0]) +vol_mm = float(sout.split()[1]) +print('Volumes are: ', vol_vox, ' in voxels and ', vol_mm, ' in mm') +``` + + + +An alternative way to run a set of commands would be like this: +``` +commands = """ +{fsldir}/bin/fslmaths {t1} -bin {t1_mask} +{fsldir}/bin/fslmaths {t2} -mas {t1_mask} {t2_masked} +""" + +fsldirpath = os.getenv('FSLDIR') +commands = commands.format(t1 = 't1.nii.gz', t1_mask = 't1_mask', t2 = 't2', t2_masked = 't2_masked', fsldir = fsldirpath) + +sout=[] +for cmd in commands.split('\n'): + if cmd: # avoids empty strings getting passed to sp.run() + print('Running command: ', cmd) + spobj = sp.run(cmd.split(), stdout = sp.PIPE) + sout.append(spobj.stdout.decode('utf-8')) +``` + + +## Command line arguments + +The simplest way of dealing with command line arguments is to use the module `sys`, which gives access to an `argv` list: +``` +import sys +print(len(sys.argv)) +print(sys.argv[0]) +``` + +For more sophisticated argument parsing you can use `argparse` - good documentation and examples of this can be found on the web. + + +## Example script + +Here is a simple bash script (it masks an image and calculates volumes - just as a random example). 
DO NOT execute the code blocks here within the notebook/webpage: + +``` +#!/bin/bash +if [ $# -lt 2 ] ; then + echo "Usage: $0 <input image> <output image>" + exit 1 +fi +infile=$1 +outfile=$2 +# mask input image with MNI +$FSLDIR/bin/fslmaths $infile -mas $FSLDIR/data/standard/MNI152_T1_1mm_brain $outfile +# calculate volumes of masked image +vv=`$FSLDIR/bin/fslstats $outfile -V` +vol_vox=`echo $vv | awk '{ print $1 }'` +vol_mm=`echo $vv | awk '{ print $2 }'` +echo "Volumes are: $vol_vox in voxels and $vol_mm in mm" +``` + + +And an alternative in python: + +``` +#!/usr/bin/env fslpython +import os, sys +import subprocess as sp +fsldir=os.getenv('FSLDIR') +if len(sys.argv) < 3: # script name plus two arguments + print('Usage: ', sys.argv[0], ' <input image> <output image>') + sys.exit(1) +infile = sys.argv[1] +outfile = sys.argv[2] +# mask input image with MNI +spobj = sp.run([fsldir+'/bin/fslmaths', infile, '-mas', fsldir+'/data/standard/MNI152_T1_1mm_brain', outfile], stdout = sp.PIPE) +# calculate volumes of masked image +spobj = sp.run([fsldir+'/bin/fslstats', outfile, '-V'], stdout = sp.PIPE) +sout = spobj.stdout.decode('utf-8') +vol_vox = float(sout.split()[0]) +vol_mm = float(sout.split()[1]) +print('Volumes are: ', vol_vox, ' in voxels and ', vol_mm, ' in mm') +``` + +
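The tutorial mentions `argparse` as the more sophisticated option for argument handling. As a minimal sketch, the argument check in the python script above could be written as follows. The function names `parse_args` and `parse_volumes` and the sample values are hypothetical, chosen only for illustration, and the sketch deliberately does not call FSL:

```python
import argparse


def parse_args(argv=None):
    """Parse the two positional arguments used by the example script."""
    parser = argparse.ArgumentParser(
        description='Mask an image with the MNI152 brain and report its volume.')
    parser.add_argument('infile', help='input image')
    parser.add_argument('outfile', help='output (masked) image')
    # With argv=None, argparse reads sys.argv[1:], as a real script would
    return parser.parse_args(argv)


def parse_volumes(text):
    """Convert the first two whitespace-separated values of a string to floats."""
    vol_vox, vol_mm = (float(v) for v in text.split()[:2])
    return vol_vox, vol_mm


if __name__ == '__main__':
    # Made-up arguments and output text, purely for demonstration
    args = parse_args(['in.nii.gz', 'out.nii.gz'])
    print('Input:', args.infile, 'Output:', args.outfile)
    print(parse_volumes('100 800.0'))
```

When an argument is missing, `argparse` prints a usage message and exits with a non-zero status, so the manual `len(sys.argv)` test is no longer needed. In a real script the text passed to `parse_volumes` would be the decoded `stdout` from the `fslstats` call.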