diff --git a/getting_started/basics.ipynb b/getting_started/basics.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..9facb873c7f07175e4c0508cfb7b244ce956e607 --- /dev/null +++ b/getting_started/basics.ipynb @@ -0,0 +1,651 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Basic types\n", + "\n", + "Python has many different types and variables are dynamic and can change types (like MATLAB). Some of the most commonly used in-built types are:\n", + "* integer and floating point scalars\n", + "* strings\n", + "* tuples\n", + "* lists\n", + "* dictionary\n", + "\n", + "N-dimensional arrays and other types are supported through common modules (e.g., numpy, scipy, scikit-learn). These will be covered in a subsequent exercise." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 4\n", + "b = 3.6\n", + "c = 'abc'\n", + "d = [10,20,30]\n", + "e = {'a' : 10, 'b': 20}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Any variable can be printed using the function `print()`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(d)\n", + "print(e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> _*Python 3 versus python 2*_:\n", + ">\n", + "> Print - for the print statement the brackets are compulsory for *python 3*, but are optional in python 2. 
So you will see plenty of code without the brackets but you should get into the habit of using them.\n", + ">\n", + "> Division - in python 3 all division is floating point (like in MATLAB), even if the values are integers, but in python 2 integer division works like it does in C.\n", + "\n", + "---\n", + "\n", + "## Strings\n", + "\n", + "Strings can be specified using single quotes *or* double quotes - as long as they are matched.\n", + "Strings can be dereferenced like lists (see later).\n", + "\n", + "For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s1=\"test string\"\n", + "s2='another test string'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also use triple quotes to capture multi-line strings. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s3='''This is\n", + "a string over\n", + "multiple lines\n", + "'''\n", + "print(s3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Tuples and Lists\n", + "\n", + "A tuple is like a list or a vector, but with less flexibility than a full list (a tuple cannot be modified once it has been created). However, anything can be stored in either a list or a tuple, without any consistency being required.
For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "xtuple=(3, 7.6, 'str')\n", + "xlist=[1,'mj',-5.4]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "They can also be nested:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "x2=(xtuple,xlist)\n", + "x3=[xtuple,xlist]\n", + "print(x2)\n", + "print(x3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Adding to a list\n", + "\n", + "This is easy:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [10,20,30]\n", + "a = a + [70]\n", + "a += [80]\n", + "print(a)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Dereferencing\n", + "\n", + "Square brackets are used to dereference tuples, lists, dictionaries, etc. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = [10,20,30]\n", + "print(d[1])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> _*Pitfall:*_\n", + "> Python uses zero-based indexing, unlike MATLAB" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a=[10,20,30,40,50,60]\n", + "print(a[0])\n", + "print(a[2])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Indices naturally run from 0 to N-1, _but_ negative numbers can be used to reference from the end (circular wrap-around)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(a[-1])\n", + "print(a[-6])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "However, this is only true for -1 to -N. Any index outside the range -N to N-1 will generate an `index out of range` error."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(a[-7])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(a[6])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Length of a tuple or list is given by the `len()` function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(len(a))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Nested lists can have nested dereferences:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b=[[10,20,30],[40,50,60]]\n", + "print(b[0][1])\n", + "print(b[1][0])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "but *not* a dereference like b[0,1].\n", + "\n", + "> Note that `len` will only give the length of the top level.\n", + "> In general arrays should be preferred to nested lists when the contents are numerical.\n", + "\n", + "### Slicing\n", + "\n", + "A range of values for the indices can be specified to extract values from a list. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(a[0:3])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> _*Pitfall:*_\n", + ">\n", + "> Slicing syntax is different from MATLAB in that second number is one plus final index - this is in addition to the zero index difference." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a=[10,20,30,40,50,60]\n", + "print(a[0:3]) # same as a(1:3) in MATLAB\n", + "print(a[1:3]) # same as a(2:3) in MATLAB" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> _*Pitfall:*_\n", + ">\n", + "> Unlike in MATLAB, you cannot use a list as indices instead of an integer or a slice" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "b=[3,4]\n", + "print(a[b])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### List operations\n", + "\n", + "Multiplication can be used with lists, where multiplication implements replication." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d=[10,20,30]\n", + "print(d*4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are also other operations such as:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d.append(40)\n", + "print(d)\n", + "d.remove(20)\n", + "print(d)\n", + "d.pop(0)\n", + "print(d)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Looping over elements in a list (or tuple)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d=[10,20,30]\n", + "for x in d:\n", + " print(x)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Note that the indentation within the loop is _*crucial*_. All python control blocks are delineated purely by indentation.\n", + "\n", + "### Getting help\n", + "\n", + "The function `dir()` can be used to get information about any variable/object/function in python. It lists the possible operations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dir(d)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Note that google is often more helpful!\n", + "\n", + "---\n", + "\n", + "## Dictionaries\n", + "\n", + "These store key-value pairs. For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "e = {'a' : 10, 'b': 20}\n", + "print(len(e))\n", + "print(e.keys())\n", + "print(e.values())\n", + "print(e['a'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The keys and values can take on any type, even dictionaries! Python is nothing if not flexible. However, each key must be unique.\n", + "\n", + "### Adding to a dictionary\n", + "\n", + "This is very easy:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "e['c']=555 # just like in Biobank! ;)\n", + "print(e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Removing elements from a dictionary\n", + "\n", + "There are two main approaches - `pop` and `del`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "e.pop('b')\n", + "print(e)\n", + "del e['c']\n", + "print(e)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Looping over everything in a dictionary\n", + "\n", + "Several variables can jointly work as loop variables in python, which is very convenient. 
For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "e = {'a' : 10, 'b': 20, 'c':555}\n", + "for k,v in e.items():\n", + " print((k,v))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The argument `(k,v)` passed to `print` here is a tuple, a construct that is often used in python.\n", + "\n", + "Another option is:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for k in e:\n", + " print((k,e[k]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Note that in both cases the keys are visited in insertion order (guaranteed from python 3.7 onwards; in older versions the order is arbitrary). The `sorted` function can be used if you want keys in a sorted order; e.g. `for k in sorted(e):` ...\n", + "\n", + "---\n", + "\n", + "## Copying and references \n", + "\n", + "In python there are immutable types (e.g. numbers) and mutable types (e.g. lists). The main thing to know is that assignment never copies an object: it only makes a name refer to an existing object (loosely comparable to a reference in C++). For immutable types this behaves just like a copy, because the object can never be changed in place, but for mutable types a change made through one name is visible through every other name that refers to the same object.
For example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = 7\n", + "b = a\n", + "a = 2348\n", + "print(b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As opposed to:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [7]\n", + "b = a\n", + "a[0] = 8888\n", + "print(b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "But if an operation produces a new object (as `a*2` does) then the result is independent of the original:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [7]\n", + "b = a*2\n", + "a[0] = 8888\n", + "print(b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If an explicit copy is necessary then this can be made using the list's `copy()` method:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "a = [7]\n", + "b = a.copy()\n", + "a[0] = 8888\n", + "print(b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Be particularly careful about this when writing functions, because a function can modify a mutable argument in place."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def foo1(x):\n", + " x.append(10)\n", + "def foo2(x):\n", + " x = x + [10]\n", + "def foo3(x):\n", + " return x + [10]\n", + "\n", + "a=[5]\n", + "print(a)\n", + "foo1(a)\n", + "print(a)\n", + "foo2(a)\n", + "print(a)\n", + "foo3(a)\n", + "print(a)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Control flow\n", + "\n", + " - boolean operators\n", + " - if/else/for\n", + " - a if condition else b\n", + " - introduce range/enumerate" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/getting_started/basics.md b/getting_started/basics.md index 4dbbac10e0e1a8639e8261e36d7e0c963c3a0e31..44da7c938fa178738cde1f5884ed43049e1f85f3 100644 --- a/getting_started/basics.md +++ b/getting_started/basics.md @@ -7,7 +7,7 @@ Python has many different types and variables are dynamic and can change types ( * lists * dictionary -N-dimensional arrays and other types are supported through common modules (e.g., numpy, scipy, scikit-learn). +N-dimensional arrays and other types are supported through common modules (e.g., numpy, scipy, scikit-learn). These will be covered in a subsequent exercise. ``` a = 4 @@ -17,15 +17,19 @@ d = [10,20,30] e = {'a' : 10, 'b': 20} ``` -Any variable can be printed using the function: +Any variable can be printed using the function `print()`: ``` -print(...) +print(d) +print(e) ``` -> Python 3 versus python 2: +> _*Python 3 versus python 2*_: +> > Print - for the print statement the brackets are compulsory for *python 3*, but are optional in python 2. So you will see plenty of code without the brackets but you should get into the habit of using them. +> > Division - in python 3 all division is floating point (like in MATLAB), even if the values are integers, but in python 2 integer division works like it does in C. 
+--- ## Strings @@ -45,112 +49,258 @@ a string over multiple lines ''' print(s3) -This is -a string over -multiple lines ``` +--- ## Tuples and Lists -Anything can be stored within a list and consistency is not required. For example: +A tuple is like a list or a vector, but with less flexibility than a full list (a tuple cannot be modified once it has been created). However, anything can be stored in either a list or a tuple, without any consistency being required. For example: +``` +xtuple=(3, 7.6, 'str') +xlist=[1,'mj',-5.4] +``` + +They can also be nested: +``` +x2=(xtuple,xlist) +x3=[xtuple,xlist] +print(x2) +print(x3) +``` + +### Adding to a list + +This is easy: ``` -a=[1,'mj',-5.4] +a = [10,20,30] +a = a + [70] +a += [80] +print(a) ``` ### Dereferencing -Square brackets are used to dereference lists, dictionaries, etc. For example: +Square brackets are used to dereference tuples, lists, dictionaries, etc. For example: ``` d = [10,20,30] -d[1] - 20 - ``` - - ---- use of -1 as an index value +print(d[1]) +``` -> Pitfall: +> _*Pitfall:*_ > Python uses zero-based indexing, unlike MATLAB -> * a=[10,20,30,40,50,60] -> * a[0] -> 10 -> * a[2] -> 30 + +``` +a=[10,20,30,40,50,60] +print(a[0]) +print(a[2]) +``` + +Indices naturally run from 0 to N-1, _but_ negative numbers can be used to reference from the end (circular wrap-around). +``` +print(a[-1]) +print(a[-6]) +``` + +However, this is only true for -1 to -N. Any index outside the range -N to N-1 will generate an `index out of range` error. +``` +print(a[-7]) +``` +``` +print(a[6]) +``` + +Length of a tuple or list is given by the `len()` function: +``` +print(len(a)) +``` + +Nested lists can have nested dereferences: +``` +b=[[10,20,30],[40,50,60]] +print(b[0][1]) +print(b[1][0]) +``` +but *not* a dereference like b[0,1]. + +> Note that `len` will only give the length of the top level. +> In general arrays should be preferred to nested lists when the contents are numerical. + +### Slicing + +A range of values for the indices can be specified to extract values from a list.
For example: +``` +print(a[0:3]) +``` + +> _*Pitfall:*_ +> +> Slicing syntax is different from MATLAB in that second number is one plus final index - this is in addition to the zero index difference. + +``` +a=[10,20,30,40,50,60] +print(a[0:3]) # same as a(1:3) in MATLAB +print(a[1:3]) # same as a(2:3) in MATLAB +``` + +> _*Pitfall:*_ +> +> Unlike in MATLAB, you cannot use a list as indices instead of an integer or a slice + +``` +b=[3,4] +print(a[b]) +``` ### List operations -Addition and multiplication can be used with lists, where multiplication implements replication. +Multiplication can be used with lists, where multiplication implements replication. ``` d=[10,20,30] -d*2 - [10, 20, 30, 10, 20, 30] +print(d*4) ``` There are also other operations such as: ``` d.append(40) +print(d) +d.remove(20) +print(d) +d.pop(0) +print(d) +``` + +### Looping over elements in a list (or tuple) ``` +d=[10,20,30] +for x in d: + print(x) +``` -USE DIR() TO GET MORE INFO FOR ANY PARTICULAR OBJECT - OR GOOGLE! +> Note that the indentation within the loop is _*crucial*_. All python control blocks are delineated purely by indentation. + +### Getting help + +The function `dir()` can be used to get information about any variable/object/function in python. It lists the possible operations. + +``` +dir(d) +``` + +> Note that google is often more helpful! + +--- ## Dictionaries +These store key-value pairs. For example: +``` +e = {'a' : 10, 'b': 20} +print(len(e)) +print(e.keys()) +print(e.values()) +print(e['a']) +``` + +The keys and values can take on any type, even dictionaries! Python is nothing if not flexible. However, each key must be unique. +### Adding to a dictionary + +This is very easy: +``` +e['c']=555 # just like in Biobank! 
;) +print(e) +``` + + +### Removing elements from a dictionary + +There are two main approaches - `pop` and `del`: +``` +e.pop('b') +print(e) +del e['c'] +print(e) +``` + +### Looping over everything in a dictionary + +Several variables can jointly work as loop variables in python, which is very convenient. For example: +``` +e = {'a' : 10, 'b': 20, 'c':555} +for k,v in e.items(): + print((k,v)) +``` -## Combinations +The argument `(k,v)` passed to `print` here is a tuple, a construct that is often used in python. -Can nest these arbitrarily without needing consistency. For example: +Another option is: ``` -a=[ [3,5,7] , ['a','e','i','o','u'] , { } ] +for k in e: + print((k,e[k])) ``` +> Note that in both cases the keys are visited in insertion order (guaranteed from python 3.7 onwards; in older versions the order is arbitrary). The `sorted` function can be used if you want keys in a sorted order; e.g. `for k in sorted(e):` ... +--- ## Copying and references -** demonstrate difference between mutable and immutable types +In python there are immutable types (e.g. numbers) and mutable types (e.g. lists). The main thing to know is that assignment never copies an object: it only makes a name refer to an existing object (loosely comparable to a reference in C++). For immutable types this behaves just like a copy, because the object can never be changed in place, but for mutable types a change made through one name is visible through every other name that refers to the same object.
For example: +``` a = 7 b = a a = 2348 print(b) +``` +As opposed to: +``` a = [7] b = a a[0] = 8888 print(b) +``` +But if an operation produces a new object (as `a*2` does) then the result is independent of the original: +``` +a = [7] +b = a*2 +a[0] = 8888 +print(b) +``` -> Pitfall: -> Slicing syntax is also different from MATLAB (second number is one plus final index) -> * a=[10,20,30,40,50,60] -> * a[0:3] # same as a(1:3) in MATLAB -> [10, 20, 30] -> * a[1:3] # same as a(2:3) in MATLAB -> [20, 30] - -> Pitfall: -> Cannot use a list as indices instead of an integer or a slice -> * b=[3,4] -> * a[b] -> TypeError: list indices must be integers or slices, not list - - +If an explicit copy is necessary then this can be made using the list's `copy()` method: +``` +a = [7] +b = a.copy() +a[0] = 8888 +print(b) +``` +> Be particularly careful about this when writing functions, because a function can modify a mutable argument in place. -def foo(x): +``` +def foo1(x): x.append(10) -def foo(x): +def foo2(x): x = x + [10] - -def foo(x): +def foo3(x): return x + [10] -def foo(y): - y = y + 1 -def foo(y): - return y + 1 +a=[5] +print(a) +foo1(a) +print(a) +foo2(a) +print(a) +foo3(a) +print(a) +``` + +--- ## Control flow