diff --git a/applications/pandas/pandas.ipynb b/applications/pandas/pandas.ipynb index 1245aaef9b4ee333e1bea90787b8a823193e0223..0d5a968b904704735f927dcb7ab57687087ff485 100644 --- a/applications/pandas/pandas.ipynb +++ b/applications/pandas/pandas.ipynb @@ -2,6 +2,7 @@ "cells": [ { "cell_type": "markdown", + "id": "9803940b", "metadata": {}, "source": [ "# Pandas\n", @@ -23,6 +24,7 @@ { "cell_type": "code", "execution_count": null, + "id": "eb7f5417", "metadata": {}, "outputs": [], "source": [ @@ -38,6 +40,7 @@ }, { "cell_type": "markdown", + "id": "9f998231", "metadata": {}, "source": [ "> We will mostly be using `seaborn` instead of `matplotlib` for\n", @@ -54,6 +57,7 @@ { "cell_type": "code", "execution_count": null, + "id": "7020257e", "metadata": {}, "outputs": [], "source": [ @@ -63,6 +67,7 @@ }, { "cell_type": "markdown", + "id": "6a17483f", "metadata": {}, "source": [ "This loads the data into a\n", @@ -78,6 +83,7 @@ { "cell_type": "code", "execution_count": null, + "id": "4d0436b1", "metadata": {}, "outputs": [], "source": [ @@ -86,6 +92,7 @@ }, { "cell_type": "markdown", + "id": "173c767f", "metadata": {}, "source": [ "If you can not connect to the internet, you can run the command below to load\n", @@ -95,6 +102,7 @@ { "cell_type": "code", "execution_count": null, + "id": "9a047455", "metadata": {}, "outputs": [], "source": [ @@ -104,6 +112,7 @@ }, { "cell_type": "markdown", + "id": "6b801a28", "metadata": {}, "source": [ "Note that the titanic dataset was also available to us as one of the standard\n", @@ -113,6 +122,7 @@ { "cell_type": "code", "execution_count": null, + "id": "7be7954c", "metadata": {}, "outputs": [], "source": [ @@ -121,6 +131,7 @@ }, { "cell_type": "markdown", + "id": "112b9665", "metadata": {}, "source": [ "`Dataframes` can also be created from other python objects, using\n", @@ -131,6 +142,7 @@ { "cell_type": "code", "execution_count": null, + "id": "de3236a1", "metadata": {}, "outputs": [], "source": [ @@ -145,18 +157,50 @@ }, { "cell_type": "markdown", + "id": "0761c3c8", "metadata": {}, "source": [ + "## A note on types\n", + "Each column in the pandas dataframe has its own data type, which can be:\n", + "- integer or float for numbers\n", + "- boolean for True/False\n", + "- datetime for defining specific times (and timedelta for durations)\n", + "- categorical, where each element is selected from a finite list of text values\n", + "- object for anything else, used for strings or columns with mixed elements\n", + "Each element in the column must match the type of the whole column. \n", + "When reading in a dataset, pandas will try to assign the most specific type to each column.\n", + "Every pandas datatype also has support for missing data (which we will look at in more detail below).\n", + "\n", + "One can check the type of each column using:" ] }, { "cell_type": "code", "execution_count": null, + "id": "398a2240", "metadata": {}, "outputs": [], "source": [ + "titanic.dtypes" ] }, { "cell_type": "markdown", + "id": "839cfc99", "metadata": {}, "source": [ + "Note that in much of the python ecosystem, data types are referred to as dtypes.\n", "## Getting your data out\n", - "For many applications (e.g., ICA, machine learning) you might want to\n", + "For some applications you might want to\n", "extract your data as a numpy array, even though more and more projects \n", - "support pandas Dataframes directly. 
The underlying numpy array can be \n", - "accessed using the `to_numpy` method" + "support pandas Dataframes directly (including `scikit-learn`). \n", + "The underlying numpy array can be accessed using the `to_numpy` method" ] }, { "cell_type": "code", "execution_count": null, + "id": "987ee604", "metadata": {}, "outputs": [], "source": [ @@ -165,16 +209,25 @@ }, { "cell_type": "markdown", + "id": "1d210770", "metadata": {}, "source": [ - "Note that the type of the returned array is the most common type (in this case\n", - "object). If you just want the numeric parts of the table you can use\n", - "`select_dtypes`, which selects specific columns based on their dtype:" + "Similarly to the `pandas` types discussed above,\n", + "`numpy` also requires all elements to have the same type.\n", + "However, `numpy` requires all elements in the whole array,\n", + "not just a single column, to be the same type.\n", + "In this case, this means that all data had to be converted\n", + "to the generic \"object\" type, which is not particularly useful.\n", + "\n", + "For most analyses, we would only be interested in the numeric columns.\n", + "These can be extracted using `select_dtypes`, which selects specific columns \n", + "based on their data type (dtype):" ] }, { "cell_type": "code", "execution_count": null, + "id": "7836cb90", "metadata": {}, "outputs": [], "source": [ @@ -183,16 +236,26 @@ }, { "cell_type": "markdown", + "id": "85bbba82", "metadata": {}, "source": [ - "Note that the numpy array has no information on the column names or row indices.\n", - "Alternatively, when you want to include the categorical variables in your later\n", - "analysis (e.g., for machine learning), you can extract dummy variables using:" + "Now we get an array with a numeric type rather than the generic \"object\",\n", + "which is a lot more useful, as we can now run math operations on the\n", + "resulting array (e.g., PCA).\n", + "\n", + "Finally, let's have a look at extracting categorical variables.\n", + "These are columns where each element has one of a finite list of possible values\n", + "(e.g., the \"embark_town\" column being \"Southampton\", \"Cherbourg\", or \"Queenstown\",\n", + "which are the three towns where the Titanic docked to let on passengers).\n", + "As we will see below, `pandas` has extensive support for categorical values,\n", + "but many other tools do not. To support those tools, `pandas` allows you to \n", + "replace such columns with dummy variables:" ] }, { "cell_type": "code", "execution_count": null, + "id": "60c7e8fc", "metadata": {}, "outputs": [], "source": [ @@ -201,8 +264,14 @@ }, { "cell_type": "markdown", + "id": "1defde14", "metadata": {}, "source": [ + "Note that rather than having a single \"embark_town\" column with a categorical type,\n", + "we now have three columns named \"embark_town_<name>\" with a 1 for every passenger\n", + "who embarked in that town. 
These numeric columns can then be fed into a GLM or\n", + "a machine learning algorithm.\n", + "\n", "## Accessing parts of the data\n", "\n", "[Documentation on indexing](http://pandas.pydata.org/pandas-docs/stable/indexing.html)\n", @@ -215,6 +284,7 @@ { "cell_type": "code", "execution_count": null, + "id": "fa00ea38", "metadata": {}, "outputs": [], "source": [ @@ -223,6 +293,7 @@ }, { "cell_type": "markdown", + "id": "2bb923fa", "metadata": {}, "source": [ "If the column names is a valid python identifier (i.e., is a string that does not contain stuff like spaces)\n", @@ -232,6 +303,7 @@ { "cell_type": "code", "execution_count": null, + "id": "acf0cfc6", "metadata": {}, "outputs": [], "source": [ @@ -240,6 +312,7 @@ }, { "cell_type": "markdown", + "id": "1bfb79b1", "metadata": {}, "source": [ "Note that this returns a single column is represented by a pandas\n", @@ -253,6 +326,7 @@ { "cell_type": "code", "execution_count": null, + "id": "7c097dc4", "metadata": {}, "outputs": [], "source": [ @@ -261,6 +335,7 @@ }, { "cell_type": "markdown", + "id": "027848a7", "metadata": {}, "source": [ "Note that you have to provide a list here (square brackets). If you provide a\n", @@ -271,6 +346,7 @@ { "cell_type": "code", "execution_count": null, + "id": "5be86a4a", "metadata": {}, "outputs": [], "source": [ @@ -279,6 +355,7 @@ }, { "cell_type": "markdown", + "id": "bbf62da6", "metadata": {}, "source": [ "In this case there is no column called `('class', 'alive')` leading to an\n", @@ -290,6 +367,7 @@ { "cell_type": "code", "execution_count": null, + "id": "8fc35ce3", "metadata": {}, "outputs": [], "source": [ @@ -299,6 +377,7 @@ }, { "cell_type": "markdown", + "id": "766e1a41", "metadata": {}, "source": [ "We can delete a column using:" @@ -307,6 +386,7 @@ { "cell_type": "code", "execution_count": null, + "id": "61d91bdf", "metadata": {}, "outputs": [], "source": [ @@ -315,6 +395,7 @@ }, { "cell_type": "markdown", + "id": "9ea81208", "metadata": {}, "source": [ "### Indexing rows by name or integer\n", @@ -328,6 +409,7 @@ { "cell_type": "code", "execution_count": null, + "id": "e6263074", "metadata": {}, "outputs": [], "source": [ @@ -337,6 +419,7 @@ }, { "cell_type": "markdown", + "id": "0a6d1311", "metadata": {}, "source": [ "Note that the re-sorting did not change the values in the index (i.e., left-most\n", @@ -348,6 +431,7 @@ { "cell_type": "code", "execution_count": null, + "id": "1016cb0b", "metadata": {}, "outputs": [], "source": [ @@ -356,6 +440,7 @@ }, { "cell_type": "markdown", + "id": "00bfb183", "metadata": {}, "source": [ "We can select the row with the index 0 using" @@ -364,6 +449,7 @@ { "cell_type": "code", "execution_count": null, + "id": "63cb04fc", "metadata": {}, "outputs": [], "source": [ @@ -372,6 +458,7 @@ }, { "cell_type": "markdown", + "id": "b0bf4c58", "metadata": {}, "source": [ "Note that this gives the same passenger as the first row of the initial table\n", @@ -381,6 +468,7 @@ { "cell_type": "code", "execution_count": null, + "id": "ece87592", "metadata": {}, "outputs": [], "source": [ @@ -389,6 +477,7 @@ }, { "cell_type": "markdown", + "id": "738c3b6c", "metadata": {}, "source": [ "Another common way to access the first or last N rows of a table is using the\n", @@ -398,6 +487,7 @@ { "cell_type": "code", "execution_count": null, + "id": "8937c751", "metadata": {}, "outputs": [], "source": [ @@ -407,6 +497,7 @@ { "cell_type": "code", "execution_count": null, + "id": "0333e9cc", "metadata": {}, "outputs": [], "source": [ @@ -415,6 +506,7 @@ }, { "cell_type": 
"markdown", + "id": "e846457f", "metadata": {}, "source": [ "Note that nearly all methods in pandas return a new `DataFrame`, which means\n", @@ -427,6 +519,7 @@ { "cell_type": "code", "execution_count": null, + "id": "84cc7089", "metadata": {}, "outputs": [], "source": [ @@ -435,6 +528,7 @@ }, { "cell_type": "markdown", + "id": "0611a8bf", "metadata": {}, "source": [ "> This chaining is usually very efficient, because when creating a new `DataFrame`\n", @@ -446,6 +540,7 @@ { "cell_type": "code", "execution_count": null, + "id": "dbd982e7", "metadata": {}, "outputs": [], "source": [ @@ -454,6 +549,7 @@ }, { "cell_type": "markdown", + "id": "71a1e3a3", "metadata": {}, "source": [ "**Exercise**: use sorting and tail/head or indexing to find the 10 youngest\n", @@ -464,6 +560,7 @@ { "cell_type": "code", "execution_count": null, + "id": "7272cae5", "metadata": {}, "outputs": [], "source": [ @@ -472,6 +569,7 @@ }, { "cell_type": "markdown", + "id": "c836a51b", "metadata": {}, "source": [ "### Indexing rows by value\n", @@ -482,6 +580,7 @@ { "cell_type": "code", "execution_count": null, + "id": "7dbabecf", "metadata": {}, "outputs": [], "source": [ @@ -491,6 +590,7 @@ { "cell_type": "code", "execution_count": null, + "id": "e9503367", "metadata": {}, "outputs": [], "source": [ @@ -500,6 +600,7 @@ }, { "cell_type": "markdown", + "id": "edefb548", "metadata": {}, "source": [ "Note that this required typing `titanic` quite often.\n", @@ -515,6 +616,7 @@ { "cell_type": "code", "execution_count": null, + "id": "0c710cfa", "metadata": {}, "outputs": [], "source": [ @@ -523,6 +625,7 @@ }, { "cell_type": "markdown", + "id": "d8545d1a", "metadata": {}, "source": [ "When selecting a categorical multiple options from a categorical values you \n", @@ -532,6 +635,7 @@ { "cell_type": "code", "execution_count": null, + "id": "93a73be5", "metadata": {}, "outputs": [], "source": [ @@ -540,6 +644,7 @@ }, { "cell_type": "markdown", + "id": "129a27a3", "metadata": {}, "source": [ "Particularly useful when selecting data like this is the `isna` method which\n", @@ -549,6 +654,7 @@ { "cell_type": "code", "execution_count": null, + "id": "66af0870", "metadata": {}, "outputs": [], "source": [ @@ -557,6 +663,7 @@ }, { "cell_type": "markdown", + "id": "ce7d05c3", "metadata": {}, "source": [ "This removing of missing numbers is so common that it has is own method" @@ -565,6 +672,7 @@ { "cell_type": "code", "execution_count": null, + "id": "79d28611", "metadata": {}, "outputs": [], "source": [ @@ -574,6 +682,7 @@ { "cell_type": "code", "execution_count": null, + "id": "34ebdb36", "metadata": {}, "outputs": [], "source": [ @@ -582,6 +691,7 @@ }, { "cell_type": "markdown", + "id": "82ddcb59", "metadata": {}, "source": [ "**Exercise**: use sorting, indexing by value, `dropna` and `tail`/`head` or\n", @@ -592,6 +702,7 @@ { "cell_type": "code", "execution_count": null, + "id": "1fe9d398", "metadata": {}, "outputs": [], "source": [ @@ -600,6 +711,7 @@ }, { "cell_type": "markdown", + "id": "c394e5ac", "metadata": {}, "source": [ "## Plotting the data\n", @@ -611,6 +723,7 @@ { "cell_type": "code", "execution_count": null, + "id": "1d443d44", "metadata": {}, "outputs": [], "source": [ @@ -620,6 +733,7 @@ { "cell_type": "code", "execution_count": null, + "id": "8bd6d770", "metadata": {}, "outputs": [], "source": [ @@ -628,6 +742,7 @@ }, { "cell_type": "markdown", + "id": "dc8e64a9", "metadata": {}, "source": [ "To plot all variables simply call `plot` or `hist` on the full `DataFrame`\n", @@ -638,6 +753,7 @@ { "cell_type": "code", 
"execution_count": null, + "id": "ab6c3514", "metadata": {}, "outputs": [], "source": [ @@ -646,6 +762,7 @@ }, { "cell_type": "markdown", + "id": "8140217e", "metadata": {}, "source": [ "Individual `Series` are essentially 1D arrays, so we can use them as such in\n", @@ -655,6 +772,7 @@ { "cell_type": "code", "execution_count": null, + "id": "48a59b56", "metadata": {}, "outputs": [], "source": [ @@ -663,6 +781,7 @@ }, { "cell_type": "markdown", + "id": "07cf3584", "metadata": {}, "source": [ "However, for most purposes much nicer plots can be obtained using\n", @@ -679,6 +798,7 @@ { "cell_type": "code", "execution_count": null, + "id": "8563a7a1", "metadata": {}, "outputs": [], "source": [ @@ -687,6 +807,7 @@ }, { "cell_type": "markdown", + "id": "0e752be0", "metadata": {}, "source": [ "**Exercise**: check the documentation from `sns.jointplot` (hover the mouse\n", @@ -697,6 +818,7 @@ { "cell_type": "code", "execution_count": null, + "id": "4ccd4177", "metadata": {}, "outputs": [], "source": [ @@ -705,6 +827,7 @@ }, { "cell_type": "markdown", + "id": "4e513fbc", "metadata": {}, "source": [ "Here is just a brief example of how we can use multiple columns to illustrate\n", @@ -714,6 +837,7 @@ { "cell_type": "code", "execution_count": null, + "id": "f5cb00af", "metadata": {}, "outputs": [], "source": [ @@ -723,6 +847,7 @@ }, { "cell_type": "markdown", + "id": "6ec94eac", "metadata": {}, "source": [ "**Exercise**: Split the plot above into two rows with the first row including\n", @@ -734,6 +859,7 @@ { "cell_type": "code", "execution_count": null, + "id": "c6f6e763", "metadata": {}, "outputs": [], "source": [ @@ -743,6 +869,7 @@ }, { "cell_type": "markdown", + "id": "1d54e3ec", "metadata": {}, "source": [ "One of the nice thing of Seaborn is how easy it is to update how these plots\n", @@ -754,6 +881,7 @@ { "cell_type": "code", "execution_count": null, + "id": "52183332", "metadata": {}, "outputs": [], "source": [ @@ -764,6 +892,7 @@ }, { "cell_type": "markdown", + "id": "ac35e133", "metadata": {}, "source": [ "## Summarizing the data (mean, std, etc.)\n", @@ -776,6 +905,7 @@ { "cell_type": "code", "execution_count": null, + "id": "404be564", "metadata": {}, "outputs": [], "source": [ @@ -785,6 +915,7 @@ { "cell_type": "code", "execution_count": null, + "id": "bd6dd429", "metadata": {}, "outputs": [], "source": [ @@ -793,6 +924,7 @@ }, { "cell_type": "markdown", + "id": "3f0eaeb2", "metadata": {}, "source": [ "One very useful one is `describe`, which gives an overview of many common\n", @@ -802,6 +934,7 @@ { "cell_type": "code", "execution_count": null, + "id": "e52493af", "metadata": {}, "outputs": [], "source": [ @@ -810,6 +943,7 @@ }, { "cell_type": "markdown", + "id": "7fd8fba3", "metadata": {}, "source": [ "Note that non-numeric columns are ignored when summarizing data in this way.\n", @@ -822,6 +956,7 @@ { "cell_type": "code", "execution_count": null, + "id": "3fffbcb9", "metadata": {}, "outputs": [], "source": [ @@ -832,6 +967,7 @@ }, { "cell_type": "markdown", + "id": "d4c09639", "metadata": {}, "source": [ "We can also define our own functions to apply to the columns (in this case we\n", @@ -841,6 +977,7 @@ { "cell_type": "code", "execution_count": null, + "id": "e1b90c3f", "metadata": {}, "outputs": [], "source": [ @@ -858,6 +995,7 @@ }, { "cell_type": "markdown", + "id": "f869c17f", "metadata": {}, "source": [ "We can also provide multiple functions to the `apply` method (note that\n", @@ -867,6 +1005,7 @@ { "cell_type": "code", "execution_count": null, + "id": "2dd3d814", 
"metadata": {}, "outputs": [], "source": [ @@ -875,6 +1014,7 @@ }, { "cell_type": "markdown", + "id": "78e7e950", "metadata": {}, "source": [ "### Grouping by\n", @@ -890,6 +1030,7 @@ { "cell_type": "code", "execution_count": null, + "id": "c271697e", "metadata": {}, "outputs": [], "source": [ @@ -899,6 +1040,7 @@ }, { "cell_type": "markdown", + "id": "5537b1e4", "metadata": {}, "source": [ "However, it is more often combined with one of the aggregation functions\n", @@ -911,6 +1053,7 @@ { "cell_type": "code", "execution_count": null, + "id": "580a68d4", "metadata": {}, "outputs": [], "source": [ @@ -919,6 +1062,7 @@ }, { "cell_type": "markdown", + "id": "9a94b1c0", "metadata": {}, "source": [ "We can also group by multiple variables at once" @@ -927,6 +1071,7 @@ { "cell_type": "code", "execution_count": null, + "id": "4d119923", "metadata": {}, "outputs": [], "source": [ @@ -935,6 +1080,7 @@ }, { "cell_type": "markdown", + "id": "9c5c1119", "metadata": {}, "source": [ "When grouping it can help to use the `cut` method to split a continuous variable\n", @@ -944,6 +1090,7 @@ { "cell_type": "code", "execution_count": null, + "id": "e18ac0a4", "metadata": {}, "outputs": [], "source": [ @@ -952,6 +1099,7 @@ }, { "cell_type": "markdown", + "id": "0c3e2145", "metadata": {}, "source": [ "We can use the `aggregate` method to apply a different function to each series" @@ -960,6 +1108,7 @@ { "cell_type": "code", "execution_count": null, + "id": "cf6abd30", "metadata": {}, "outputs": [], "source": [ @@ -968,6 +1117,7 @@ }, { "cell_type": "markdown", + "id": "eaca0a93", "metadata": {}, "source": [ "Note that both the index (on the left) and the column names (on the top) now\n", @@ -981,6 +1131,7 @@ { "cell_type": "code", "execution_count": null, + "id": "79780e3b", "metadata": {}, "outputs": [], "source": [ @@ -990,6 +1141,7 @@ { "cell_type": "code", "execution_count": null, + "id": "fb15d602", "metadata": {}, "outputs": [], "source": [ @@ -999,6 +1151,7 @@ { "cell_type": "code", "execution_count": null, + "id": "e7ba4b48", "metadata": {}, "outputs": [], "source": [ @@ -1007,6 +1160,7 @@ }, { "cell_type": "markdown", + "id": "5414e4c5", "metadata": {}, "source": [ "Remember that indexing based on the index was done through `loc`. The rest is\n", @@ -1016,6 +1170,7 @@ { "cell_type": "code", "execution_count": null, + "id": "ed55f8ee", "metadata": {}, "outputs": [], "source": [ @@ -1025,6 +1180,7 @@ { "cell_type": "code", "execution_count": null, + "id": "1376c35c", "metadata": {}, "outputs": [], "source": [ @@ -1033,6 +1189,7 @@ }, { "cell_type": "markdown", + "id": "1289b2db", "metadata": {}, "source": [ "More advanced use of the `MultiIndex` is possible through `xs`:" @@ -1041,6 +1198,7 @@ { "cell_type": "code", "execution_count": null, + "id": "472127b8", "metadata": {}, "outputs": [], "source": [ @@ -1050,6 +1208,7 @@ { "cell_type": "code", "execution_count": null, + "id": "61e73d0b", "metadata": {}, "outputs": [], "source": [ @@ -1058,6 +1217,7 @@ }, { "cell_type": "markdown", + "id": "7fc120ae", "metadata": {}, "source": [ "## Reshaping tables\n", @@ -1069,6 +1229,7 @@ { "cell_type": "code", "execution_count": null, + "id": "6f3d1ccd", "metadata": {}, "outputs": [], "source": [ @@ -1077,6 +1238,7 @@ }, { "cell_type": "markdown", + "id": "b9a13425", "metadata": {}, "source": [ "However, this single-column table is difficult to read. 
The reason for this is\n", @@ -1091,6 +1253,7 @@ { "cell_type": "code", "execution_count": null, + "id": "c5cf521a", "metadata": {}, "outputs": [], "source": [ @@ -1099,6 +1262,7 @@ }, { "cell_type": "markdown", + "id": "55d2c5a4", "metadata": {}, "source": [ "The former table, where the different groups are defined in different rows, is\n", @@ -1117,6 +1281,7 @@ { "cell_type": "code", "execution_count": null, + "id": "69301d4e", "metadata": {}, "outputs": [], "source": [ @@ -1127,6 +1292,7 @@ }, { "cell_type": "markdown", + "id": "d81eb236", "metadata": {}, "source": [ "> There are also many ways to produce prettier tables in pandas. \n", @@ -1139,6 +1305,7 @@ { "cell_type": "code", "execution_count": null, + "id": "2ddece59", "metadata": {}, "outputs": [], "source": [ @@ -1147,6 +1314,7 @@ }, { "cell_type": "markdown", + "id": "e730a134", "metadata": {}, "source": [ "The first argument is the numeric variable that will be summarised. \n", @@ -1159,6 +1327,7 @@ { "cell_type": "code", "execution_count": null, + "id": "641a14bf", "metadata": {}, "outputs": [], "source": [ @@ -1167,6 +1336,7 @@ }, { "cell_type": "markdown", + "id": "ee37ad6b", "metadata": {}, "source": [ "We can also change the function to be used to aggregate the data (by default the mean is computed)" @@ -1175,6 +1345,7 @@ { "cell_type": "code", "execution_count": null, + "id": "a2ebc6c2", "metadata": {}, "outputs": [], "source": [ @@ -1184,6 +1355,7 @@ }, { "cell_type": "markdown", + "id": "b43f10b8", "metadata": {}, "source": [ "As in `groupby` the aggregation function can be a string of a common aggregation\n", @@ -1195,6 +1367,7 @@ { "cell_type": "code", "execution_count": null, + "id": "dc66e7c0", "metadata": {}, "outputs": [], "source": [ @@ -1204,6 +1377,7 @@ }, { "cell_type": "markdown", + "id": "25a8f2f1", "metadata": {}, "source": [ "The opposite of `pivot_table` is `melt`. 
This can be used to change a wide-form\n", @@ -1215,6 +1389,7 @@ { "cell_type": "code", "execution_count": null, + "id": "2d509083", "metadata": {}, "outputs": [], "source": [ @@ -1228,6 +1403,7 @@ }, { "cell_type": "markdown", + "id": "59b81e11", "metadata": {}, "source": [ "This wide-form table (i.e., all the information is in different columns) makes\n", @@ -1240,6 +1416,7 @@ { "cell_type": "code", "execution_count": null, + "id": "70e01ab3", "metadata": {}, "outputs": [], "source": [ @@ -1249,6 +1426,7 @@ }, { "cell_type": "markdown", + "id": "4c942e3b", "metadata": {}, "source": [ "We can see that `melt` took all the columns (we could also have specified a\n", @@ -1262,6 +1440,7 @@ { "cell_type": "code", "execution_count": null, + "id": "ee385ba6", "metadata": {}, "outputs": [], "source": [ @@ -1272,6 +1451,7 @@ }, { "cell_type": "markdown", + "id": "6145528a", "metadata": {}, "source": [ "Finally we probably do want the FA and MD variables as different columns.\n", @@ -1283,6 +1463,7 @@ { "cell_type": "code", "execution_count": null, + "id": "2fb60939", "metadata": {}, "outputs": [], "source": [ @@ -1291,6 +1472,7 @@ }, { "cell_type": "markdown", + "id": "2932f74b", "metadata": {}, "source": [ "We can now use the tools discussed above to visualize the table (`seaborn`) or\n", @@ -1300,6 +1482,7 @@ { "cell_type": "code", "execution_count": null, + "id": "621dfde3", "metadata": {}, "outputs": [], "source": [ @@ -1308,6 +1491,7 @@ }, { "cell_type": "markdown", + "id": "12aa9cb8", "metadata": {}, "source": [ "In general pandas is better at handling long-form than wide-form data, because\n", @@ -1321,6 +1505,7 @@ { "cell_type": "code", "execution_count": null, + "id": "acd7334f", "metadata": {}, "outputs": [], "source": [ @@ -1329,6 +1514,7 @@ }, { "cell_type": "markdown", + "id": "146c04ca", "metadata": {}, "source": [ "## Linear fitting (`statsmodels`)\n", @@ -1350,6 +1536,7 @@ { "cell_type": "code", "execution_count": null, + "id": "d9b6fae7", "metadata": {}, "outputs": [], "source": [ @@ -1359,6 +1546,7 @@ }, { "cell_type": "markdown", + "id": "ef53b1cd", "metadata": {}, "source": [ "Note that `statsmodels` understands categorical variables and automatically\n", @@ -1372,6 +1560,7 @@ { "cell_type": "code", "execution_count": null, + "id": "aa96dedf", "metadata": {}, "outputs": [], "source": [ @@ -1382,6 +1571,7 @@ }, { "cell_type": "markdown", + "id": "74d2b523", "metadata": {}, "source": [ "Cherbourg passengers clearly paid a lot more...\n", @@ -1400,14 +1590,13 @@ "Other useful features:\n", "- [Concatenating and merging tables](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/08_combine_dataframes.html)\n", "- [Lots of time series support](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/09_timeseries.html)\n", - "- [Rolling Window\n", - " functions](http://pandas.pydata.org/pandas-docs/stable/computation.html#window-\n", - " functions) for after you have meaningfully sorted your data\n", + "- [Rolling Window functions](http://pandas.pydata.org/pandas-docs/stable/computation.html#window-functions) \n", + " for after you have meaningfully sorted your data\n", "- and much, much more" ] } ], "metadata": {}, "nbformat": 4, - "nbformat_minor": 4 + "nbformat_minor": 5 } diff --git a/applications/pandas/pandas.md b/applications/pandas/pandas.md index 76dd30c74254105c8e62cb160bcf713a3822b89c..c627103c42fcf4a47877b6892feeb4d14eafdcad 100644 --- a/applications/pandas/pandas.md +++ b/applications/pandas/pandas.md @@ -80,31 +80,63 @@ 
pd.DataFrame.from_dict({ 'constant_value': 'same_value' }) ``` + +## A note on types +Each column in the pandas dataframe has its own data type, which can be: +- integer or float for numbers +- boolean for True/False +- datetime for defining specific times (and timedelta for durations) +- categorical, where each element is selected from a finite list of text values +- object for anything else, used for strings or columns with mixed elements +Each element in the column must match the type of the whole column. +When reading in a dataset, pandas will try to assign the most specific type to each column. +Every pandas datatype also has support for missing data (which we will look at in more detail below). + +One can check the type of each column using: +``` +titanic.dtypes +``` +Note that in much of the python ecosystem, data types are referred to as dtypes. ## Getting your data out -For many applications (e.g., ICA, machine learning) you might want to +For some applications you might want to extract your data as a numpy array, even though more and more projects -support pandas Dataframes directly. The underlying numpy array can be -accessed using the `to_numpy` method - +support pandas Dataframes directly (including `scikit-learn`). +The underlying numpy array can be accessed using the `to_numpy` method ``` titanic.to_numpy() ``` -Note that the type of the returned array is the most common type (in this case -object). If you just want the numeric parts of the table you can use -`select_dtypes`, which selects specific columns based on their dtype: +Similarly to the `pandas` types discussed above, +`numpy` also requires all elements to have the same type. +However, `numpy` requires all elements in the whole array, +not just a single column, to be the same type. +In this case, this means that all data had to be converted +to the generic "object" type, which is not particularly useful. +For most analyses, we would only be interested in the numeric columns. +These can be extracted using `select_dtypes`, which selects specific columns +based on their data type (dtype): ``` titanic.select_dtypes(include=np.number).to_numpy() ``` +Now we get an array with a numeric type rather than the generic "object", +which is a lot more useful, as we can now run math operations on the +resulting array (e.g., PCA). -Note that the numpy array has no information on the column names or row indices. -Alternatively, when you want to include the categorical variables in your later -analysis (e.g., for machine learning), you can extract dummy variables using: - +Finally, let's have a look at extracting categorical variables. +These are columns where each element has one of a finite list of possible values +(e.g., the "embark_town" column being "Southampton", "Cherbourg", or "Queenstown", +which are the three towns where the Titanic docked to let on passengers). +As we will see below, `pandas` has extensive support for categorical values, +but many other tools do not. To support those tools, `pandas` allows you to +replace such columns with dummy variables: ``` pd.get_dummies(titanic) ``` +Note that rather than having a single "embark_town" column with a categorical type, +we now have three columns named "embark_town_<name>" with a 1 for every passenger +who embarked in that town. These numeric columns can then be fed into a GLM or +a machine learning algorithm. ## Accessing parts of the data @@ -695,7 +727,6 @@ Not all data is well represented by a 2D table. 
If you want more dimensions to f Other useful features: - [Concatenating and merging tables](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/08_combine_dataframes.html) - [Lots of time series support](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/09_timeseries.html) -- [Rolling Window - functions](http://pandas.pydata.org/pandas-docs/stable/computation.html#window- - functions) for after you have meaningfully sorted your data +- [Rolling Window functions](http://pandas.pydata.org/pandas-docs/stable/computation.html#window-functions) + for after you have meaningfully sorted your data - and much, much more \ No newline at end of file