{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Basic python\n",
"\n",
"This tutorial is aimed at briefly introducing you to the main language\n",
"features of python, with emphasis on some of the difficulties\n",
"and pitfalls that people commonly encounter when moving to python.\n",
"\n",
"When going through this make sure that you _run_ each code block and\n",
"look at the output, as these are crucial for understanding the\n",
"explanations. You can run each block by using _shift + enter_\n",
"(including the text blocks, so you can just move down the document\n",
"with shift + enter).\n",
"\n",
"It is also possible to _change_ the contents of each code block (these pages are completely interactive) so do experiment with the code you see and try some variations!\n",
"\n",
"## Contents\n",
"\n",
"* [Basic types](#Basic-types)\n",
" - [Strings](#Strings)\n",
" + [Format](#Format)\n",
" + [String manipulation](#String-manipulation)\n",
" - [Tuples and lists](#Tuples-and-lists)\n",
" + [Adding to a list](#Adding-to-a-list)\n",
" + [Indexing](#Indexing)\n",
" + [Slicing](#Slicing)\n",
" - [List operations](#List-operations)\n",
" + [Looping over elements in a list (or tuple)](#Looping)\n",
" + [Getting help](#Getting-help)\n",
" - [Dictionaries](#Dictionaries)\n",
" + [Adding to a dictionary](#Adding-to-a-dictionary)\n",
" + [Removing elements from a dictionary](#Removing-elements-dictionary)\n",
" + [Looping over everything in a dictionary](#Looping-dictionary)\n",
" - [Copying and references](#Copying-and-references)\n",
"* [Control flow](#Control-flow)\n",
" - [Boolean operators](#Boolean-operators)\n",
" - [If statements](#If-statements)\n",
" - [For loops](#For-loops)\n",
" - [While loops](#While-loops)\n",
" - [A quick intro to conditional expressions and list comprehensions](#quick-intro)\n",
" + [Conditional expressions](#Conditional-expressions)\n",
" + [List comprehensions](#List-comprehensions)\n",
"* [Functions](#functions)\n",
"* [Exercise](#exercise)\n",
"\n",
"<a class=\"anchor\" id=\"Basic-types\"></a>\n",
"# Basic types\n",
"\n",
"Python has many different types and variables are dynamic and can change types (like MATLAB). Some of the most commonly used in-built types are:\n",
"* integer and floating point scalars\n",
"* strings\n",
"* tuples\n",
"* lists\n",
"* dictionaries\n",
"\n",
"N-dimensional arrays and other types are supported through common modules (e.g., numpy, scipy, scikit-learn). These will be covered in a subsequent exercise."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = 4\n",
"b = 3.6\n",
"c = 'abc'\n",
"d = [10, 20, 30]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Any variable or combination of variables can be printed using the function `print()`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(a, b, c)"
]
},
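{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of python 3 division (the variable names are just for illustration): `/` always returns a floating-point value, `//` performs floor division (rounding down) and `%` gives the remainder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"true_div = 7 / 2    # 3.5, even though both values are integers\n",
"floor_div = 7 // 2  # 3, the result is rounded down\n",
"remainder = 7 % 2   # 1\n",
"print(true_div, floor_div, remainder)"
]
},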
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> _*Python 3 versus python 2*_:\n",
">\n",
"> Print - in python 3 `print` is a function, so the brackets are compulsory, whereas in python 2 it was a statement and the brackets were optional. You will therefore see plenty of old code without the brackets, but you should never use `print` without brackets, as this is incompatible with python 3.\n",
">\n",
"> Division - in python 3 the `/` operator always performs floating-point division (like in MATLAB), even if both values are integers, whereas in python 2 dividing two integers gave an integer result (similar to C). In python 3, use `//` for floor (integer) division.\n",
"\n",
"---\n",
"\n",
"<a class=\"anchor\" id=\"Strings\"></a>\n",
"## Strings\n",
"\n",
"Strings can be specified using single quotes *or* double quotes - as long as they are matched.\n",
"Strings can be indexed like lists (see later).\n",
"\n",
"For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s1 = \"test string\"\n",
"s2 = 'another test string'\n",
"print(s1, ' :: ', s2)"
]
},
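{
"cell_type": "markdown",
"metadata": {},
"source": [
"Strings can be indexed and sliced like lists (both are covered in more detail later). A quick sketch, using a new example string `word`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"word = 'python'\n",
"print(word[0])    # first character: p\n",
"print(word[-1])   # last character: n\n",
"print(word[0:3])  # first three characters: pyt"
]
},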
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also use triple quotes to capture multi-line strings. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s3 = '''This is\n",
"a string over\n",
"multiple lines\n",
"'''\n",
"print(s3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a class=\"anchor\" id=\"Format\"></a>\n",
"### Format\n",
"\n",
"More interesting strings can be created using the `format()` method, which is very useful when printing:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = 1\n",
"y = 'PyTreat'\n",
"s = 'The numerical value is {} and a name is {}'.format(x, y)\n",
"print(s)\n",
"print('A name is {} and a number is {}'.format(y, x))"
]
},
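{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `{}` placeholders can also contain a format specification, or an index or name that chooses which argument goes where. A short sketch (the values `val` and `name` are arbitrary examples):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"val = 3.14159\n",
"name = 'PyTreat'\n",
"# {:.2f} shows a float with 2 decimal places\n",
"print('Pi is roughly {:.2f}'.format(val))\n",
"# numbered placeholders can reorder or repeat the arguments\n",
"print('{1} and {0} and {1}'.format('a', 'b'))\n",
"# named placeholders are matched to keyword arguments\n",
"print('{name} has {n} letters'.format(name=name, n=len(name)))"
]
},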
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are also other ways of formatting strings, but `format()` is more modern than the `%` operator that you will see plenty of in \"old\" code (to python coders this means anything written before last week).\n",
"\n",
"<a class=\"anchor\" id=\"String-manipulation\"></a>\n",
"### String manipulation\n",
"\n",
"The methods `lower()` and `upper()` are useful for strings. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"print(s.upper())\n",
"print(s.lower())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Another useful method is `replace()`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s = 'This is a Test String'\n",
"s2 = s.replace('Test', 'Better')\n",
"print(s2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Strings can be concatenated just by using the `+` operator:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s3 = s + ' :: ' + s2\n",
"print(s3)"
]
},
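{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `+` only joins strings to other strings; a number has to be converted with `str()` first. A small sketch (`count` and `msg` are just example variables):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"count = 3\n",
"try:\n",
"    print('count: ' + count)  # fails: cannot add a str and an int\n",
"except TypeError as err:\n",
"    print('TypeError:', err)\n",
"msg = 'count: ' + str(count)  # convert the number to a string first\n",
"print(msg)"
]
},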
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you like regular expressions then you're in luck, as these are well supported in python by the `re` module. To use it (like many other \"extensions\", which are called _modules_ in python) you need to `import` it. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"s = 'This is a test of a Test String'\n",
"s1 = re.sub(r'a [Tt]est', \"an example\", s)\n",
"print(s1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
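"where the `r` before the quote makes the regular expression a _raw string_, so that any backslashes are passed to the regular expression engine unchanged instead of being treated as python escape sequences.\n",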
"\n",
"For more information on matching and substitutions, look up the regular expression module on the web.\n",
"\n",
"Two common and convenient string methods are `strip()` and `split()`. The first will remove any whitespace at the beginning and end of a string:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s2 = ' A very spacy string '\n",
"print('*' + s2 + '*')\n",
"print('*' + s2.strip() + '*')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `split()` we can tokenize a string (to turn it into a list of strings) like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(s.split())\n",
"print(s2.split())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default it splits at whitespace, but it can also split at a specified delimiter:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s4 = ' This is, as you can see , a very weirdly spaced and punctuated string ... '\n",
"print(s4.split(','))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are more powerful ways of dealing with this like csv files/strings, which are covered in later practicals, but even this can get you a long way.\n",
"\n",
"> Note that strings in python 3 are _unicode_, so they can represent Chinese characters, etc., and are therefore very flexible. However, in general you can just be blissfully ignorant of this fact.\n",
"\n",
"Strings can be converted to integer or floating-point values by using the `int()` and `float()` calls:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sint='23'\n",
"sfp='2.03'\n",
"print(sint + sfp)\n",
"print(int(sint) + float(sfp))\n",
"print(float(sint) + float(sfp))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Note that calling `int()` on a non-integer (e.g., on `sfp` above) will raise an error.\n",
"\n",
"<a class=\"anchor\" id=\"Tuples-and-lists\"></a>\n",
"## Tuples and lists\n",
"Both tuples and lists are builtin python types and are like vectors, \n",
"but for numerical vectors and arrays it is much better to use _numpy_\n",
"arrays (or matrices), which are covered in a later tutorial.\n",
"\n",
"A tuple is like a list, but it is _immutable_: once created, its elements cannot be changed, added or removed. Anything can be stored in either a list or a tuple, and the elements do not need to be of the same type. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"xtuple = (3, 7.6, 'str')\n",
"xlist = [1, 'mj', -5.4]\n",
"print(xtuple)\n",
"print(xlist)"
]
},
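{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because tuples are immutable, trying to change one of their elements raises an error, whereas a list can be modified in place. A small sketch with separate example variables `tup` and `lst`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tup = (3, 7.6, 'str')\n",
"lst = [1, 'mj', -5.4]\n",
"lst[0] = 100  # fine: lists are mutable\n",
"print(lst)\n",
"try:\n",
"    tup[0] = 100  # not allowed: tuples are immutable\n",
"except TypeError as err:\n",
"    print('TypeError:', err)"
]
},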
{
"cell_type": "markdown",
"metadata": {},
"source": [
"They can also be nested:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x2 = (xtuple, xlist)\n",
"x3 = [xtuple, xlist]\n",
"print('x2 is: ', x2)\n",
"print('x3 is: ', x3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a class=\"anchor\" id=\"Adding-to-a-list\"></a>\n",
"### Adding to a list\n",
"\n",
"This is easy:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = [10, 20, 30]\n",
"a = a + [70]\n",
"a += [80]\n",
"a.append(90)\n",
"print(a)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a class=\"anchor\" id=\"Indexing\"></a>\n",
"### Indexing\n",
"\n",
"Square brackets are used to index tuples, lists, dictionaries, etc. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(d[1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> _*Pitfall:*_\n",
"> Python uses zero-based indexing, unlike MATLAB"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(a[0])\n",
"print(a[2])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Indices naturally run from 0 to N-1, _but_ negative numbers can be used to reference from the end (circular wrap-around)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(a[-1])\n",
"print(a[-6])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, this only works for -1 to -N. Any index outside the range -N to N-1 raises an `IndexError` (list index out of range)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(a[-7])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(a[6])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The length of a tuple or list is given by the `len()` function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(len(a))"
]
},
{
"cell_type": "markdown",
"metadata": {},
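"source": [
"Nested lists can be indexed by chaining square brackets, one level at a time:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b = [[10, 20, 30], [40, 50, 60]]\n",
"print(b[0][1])\n",
"print(b[1][0])\n",
"print(len(b))"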
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"> Note that `len` will only give the length of the top level.\n",
"> In general, numpy arrays should be preferred to nested lists when the contents are numerical.\n",
"<a class=\"anchor\" id=\"Slicing\"></a>\n",
"### Slicing\n",
"\n",
"A range of values for the indices can be specified to extract values from a list. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(a[0:3])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> _*Pitfall:*_\n",
">\n",
"> Slicing syntax is different from MATLAB in that the second number is\n",
"> exclusive (i.e., one plus final index) - this is in addition to the zero index difference."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(a[0:3]) # same as a(1:3) in MATLAB\n",
"print(a[1:3]) # same as a(2:3) in MATLAB"
]
},
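{
"cell_type": "markdown",
"metadata": {},
"source": [
"Either end of a slice can be left out, and an optional third number gives a step size. A short sketch with a separate example list `v`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"v = [10, 20, 30, 40, 50, 60]\n",
"print(v[:2])    # from the start: [10, 20]\n",
"print(v[3:])    # to the end: [40, 50, 60]\n",
"print(v[::2])   # every second element: [10, 30, 50]\n",
"print(v[::-1])  # reversed: [60, 50, 40, 30, 20, 10]"
]
},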
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> _*Pitfall:*_\n",
">\n",
"> Unlike in MATLAB, you cannot use a list as indices instead of an\n",
"> integer or a slice (although these can be done in _numpy_)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(a[b])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a class=\"anchor\" id=\"List-operations\"></a>\n",
"### List operations\n",
"\n",
"Multiplication can be used with lists, where multiplication implements replication."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d = [10, 20, 30]\n",
"print(d * 4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are also other operations such as:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"d.append(40)\n",
"print(d)\n",
"d.remove(20)\n",
"print(d)\n",
"d.pop(0)\n",
"print(d)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a class=\"anchor\" id=\"Looping\"></a>\n",
"### Looping over elements in a list (or tuple)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for x in d:\n",
" print(x)"
]
},
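{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you also need the index of each element, the built-in `enumerate()` function gives (index, element) pairs, which avoids the MATLAB-style `for i in range(len(d)):` pattern. A minimal sketch with an example list `letters`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"letters = ['x', 'y', 'z']\n",
"for i, item in enumerate(letters):\n",
"    print(i, item)  # 0 x, then 1 y, then 2 z\n",
"print(list(enumerate(letters)))"
]
},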
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Note that the indentation within the loop is _*crucial*_. All python control blocks are delineated purely by indentation.\n",
"\n",
"<a class=\"anchor\" id=\"Getting-help\"></a>\n",
"### Getting help\n",
"\n",
"The function `help()` can be used to get information about any variable/object/function in python. It lists the possible operations. In `ipython` you can also just type `?<blah>` or `<blah>?` instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"help(d)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is also a `dir()` function that gives a basic listing of the operations:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dir(d)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Note that google is often more helpful!\n",
"\n",
"---\n",
"\n",
"<a class=\"anchor\" id=\"Dictionaries\"></a>\n",
"## Dictionaries\n",
"\n",
"These store key-value pairs. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"e = {'a' : 10, 'b': 20}\n",
"print(len(e))\n",
"print(e.keys())\n",
"print(e.values())\n",
"print(e['a'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The values can be of any type, even dictionaries!\n",
"Python is nothing if not flexible. However, each key must be unique\n",
"and must be \"hashable\" (e.g. numbers, strings and tuples of such values can be keys, but lists and dictionaries cannot).\n",
"\n",
"<a class=\"anchor\" id=\"Adding-to-a-dictionary\"></a>\n",
"### Adding to a dictionary\n",
"\n",
"This is very easy:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"e['c'] = 555 # just like in Biobank! ;)\n",
"print(e)"
]
},
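{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dictionary keys must be hashable: immutable values such as numbers, strings and tuples work as keys, but a list does not. A minimal sketch using a new, empty example dictionary `lookup`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup = {}\n",
"lookup[(1, 2)] = 'tuple key'  # works: tuples are hashable\n",
"print(lookup)\n",
"try:\n",
"    lookup[[1, 2]] = 'list key'  # fails: lists are not hashable\n",
"except TypeError as err:\n",
"    print('TypeError:', err)"
]
},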
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a class=\"anchor\" id=\"Removing-elements-dictionary\"></a>\n",
"### Removing elements from a dictionary\n",
"\n",
"There are two main approaches - `pop` and `del`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"e.pop('b')\n",
"print(e)\n",
"del e['c']\n",
"print(e)"
]
},
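{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two approaches differ slightly: `pop` returns the removed value and can take a default to use when the key is missing, whereas `del` returns nothing and raises a `KeyError` for a missing key. A sketch with a separate example dictionary `f`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"f = {'a': 10, 'b': 20}\n",
"val = f.pop('a')              # removes 'a' and returns its value, 10\n",
"print(val, f)\n",
"missing = f.pop('zzz', None)  # key not present, so the default None is returned\n",
"print(missing)\n",
"try:\n",
"    del f['zzz']              # del has no default, so this raises a KeyError\n",
"except KeyError as err:\n",
"    print('KeyError:', err)"
]
},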
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a class=\"anchor\" id=\"Looping-dictionary\"></a>\n",
"### Looping over everything in a dictionary\n",
"\n",
"Several variables can jointly work as loop variables in python, which is very convenient. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"e = {'a' : 10, 'b': 20, 'c':555}\n",
"for k, v in e.items():\n",
" print((k, v))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `print` call here is given a tuple, `(k, v)`, a construct that is often used in python.\n",
"\n",
"Another option is:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for k in e:\n",
"    print((k, e[k]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
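"> Note that from python 3.7 onwards dictionaries preserve insertion order, so in both cases the keys come out in the order they were added (in older versions the order was arbitrary). The `sorted` function can be used if you want the keys in sorted order; e.g. `for k in sorted(e):` ...\n",
">\n",
"> There are also other options, such as `collections.OrderedDict`, if you need more control over ordering.\n",
"\n",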
"<a class=\"anchor\" id=\"Copying-and-references\"></a>\n",
"## Copying and references \n",
"\n",
"In python there are immutable types (e.g. numbers, strings, tuples) and mutable types (e.g. lists, dictionaries). Assignment never copies an object: it makes the new name refer to the same object (similar to references in C++). For immutable types this makes no visible difference, as the object cannot be changed in place, but for mutable types any change made through one name is visible through the other. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = 7\n",
"b = a\n",
"a = 2348\n",
"print(b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As opposed to:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = [7]\n",
"b = a\n",
"a[0] = 8888\n",
"print(b)"
]
},
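{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `is` operator tests whether two names refer to the very same object, which makes the difference between a reference and a copy visible. A sketch with new example variables `p`, `q` and `r`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"p = [7]\n",
"q = p          # q refers to the same list as p\n",
"r = list(p)    # r is a new list with the same contents\n",
"print(q is p)  # True\n",
"print(r is p)  # False\n",
"print(r == p)  # True: the contents are equal"
]
},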
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But if an operation creates a new object (e.g. `b = a * 2`), then `b` is a separate list and is not affected by later changes to `a`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = [7]\n",
"b = a * 2\n",
"a[0] = 8888\n",
"print(b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If an explicit copy is necessary then this can be made using the `list()` constructor:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"a = [7]\n",
"b = list(a)\n",
"a[0] = 8888\n",
"print(b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is a constructor for each type, and these can be useful for converting between types:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"xt = tuple([10, 20, 30])\n",
"xl = list(xt)\n",
"print(xt)\n",
"print(xl)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> _*Pitfall:*_\n",
">\n",
"> When writing functions you need to be particularly careful about references and copies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
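"def foo1(x):\n",
"    x.append(10)  # modifies the list that was passed in\n",
"def foo2(x):\n",
"    x = x + [10]  # rebinds the local name x to a new list; the caller's list is unchanged\n",
"def foo3(x):\n",
"    return x + [10]  # returns a new list, which is discarded below\n",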
"\n",
"print(a)\n",
"foo1(a)\n",
"print(a)\n",
"foo2(a)\n",
"print(a)\n",
"foo3(a)\n",
"print(a)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Note that we have defined some functions here - and the syntax\n",
"> should be relatively intuitive. See <a href=\"#functions\">below</a>\n",
"> for a bit more detail on function definitions.\n",
"\n",
"<a class=\"anchor\" id=\"Control-flow\"></a>\n",
"# Control flow\n",
"\n",
"<a class=\"anchor\" id=\"Boolean-operators\"></a>\n",
"### Boolean operators\n",
"\n",
"There is a boolean type in python that can be `True` or `False` (note the capitals). Other values are also treated as true or false when used as a condition: non-zero numbers and non-empty strings/containers count as true, while 0, `None`, `[]`, `{}` and `\"\"` count as false. However, most of these values do not compare equal to `True` or `False` with `==` (the exceptions are 1 and 0, as `True == 1` and `False == 0`).\n",
"\n",
"Relevant boolean and comparison operators include: `not`, `and`, `or`, `==` and `!=`\n",
"\n",
"For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('Not a is:', not a)\n",
"print('Not 1 is:', not 1)\n",
"print('Not 0 is:', not 0)\n",
"print('Not {} is:', not {})\n",
"print('{}==0 is:', {}==0)"
]
},
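{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short sketch contrasting how values behave as conditions (using `bool()`) with how they compare using `==` (`truthiness` is just an example variable):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"truthiness = [bool([]), bool([0]), bool(''), bool('0')]\n",
"print(truthiness)                  # [False, True, False, True]\n",
"print(1 == True, 0 == False)       # True True\n",
"print([] == False, None == False)  # False False"
]
},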
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is also the `in` test for strings, lists, etc:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('the' in 'a number of words')\n",
"print('of' in 'a number of words')\n",
"print(3 in [1, 2, 3])"
]
},
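{
"cell_type": "markdown",
"metadata": {},
"source": [
"For dictionaries, `in` tests the keys rather than the values. A small sketch with an example dictionary `g`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"g = {'a': 10, 'b': 20}\n",
"print('a' in g)          # True: 'a' is a key\n",
"print(10 in g)           # False: 10 is only a value\n",
"print(10 in g.values())  # True"
]
},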
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a class=\"anchor\" id=\"If-statements\"></a>\n",
"### If statements\n",
"\n",
"The basic syntax of `if` statements is fairly standard, though don't forget that you _*must*_ indent the statements within the conditional/loop block as this is the way of delineating blocks of code in python. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],