"Adapted from the `scikit-learn` \"Getting Started\" documetation: https://scikit-learn.org/stable/getting_started.html"
"Adapted from the `scikit-learn` \"Getting Started\" documetation: https://scikit-learn.org/stable/getting_started.html\n",
"\n",
"Three important concepts to understand for using `scikit-learn`:\n",
"1. `estimator` objects and their `fit` and `predict` methods for fitting data and making predictions\n",
"2. `tranformer` objects for pre/post-processing transforms\n",
"3. `pipeline` objects for chaining together `transformers` and `estimators` into a machine learning pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Fitting and predicting: estimator basics\n",
"<a class=\"anchor\" id=\"estimators\"></a>\n",
"### Estimators: fitting and predicting\n",
"\n",
"Each machine learning model in `scikit-learn` is implemented as an [`estimator`](https://scikit-learn.org/stable/glossary.html#term-estimators) object. Here we instantiate [Random Forest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier) `estimator`:"
]
...
...
@@ -184,18 +193,104 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Transformers and pre-processors\n",
"<a class=\"anchor\" id=\"transformers\"></a>\n",
"### Transformers\n",
"\n",
"The `transformer` is a special object that follows the same API as an `estimator` and allows you to apply pre-processing and/or post-procssing transform to the data in your machine learning pipeline. The `transformer` object has a `transform` method insted of a `predict` method.\n",
"\n",
"In this example we use a `transformer` to standarise the features (e.g. remove mean and scale to unit variance). The `fit` method calculate the mean and variance parameters from the data, and the `transform` method will to the scaling."
"The `transformer` is a special object that follows the same API as an `estimator` and allows you to apply pre-processing and/or post-procssing transform to the data in your machine learning pipeline."
"# apply transform\n",
"scaler.transform(X)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pipelines: chaining pre-processors and estimators\n",
"<a class=\"anchor\" id=\"pipelines\"></a>\n",
"### Pipelines: chaining transforms and estimators\n",
"\n",
"TBD"
"A typical machine learning pipeline will often involve numerous pre-processing transforms and an estimator. The `pipeline` object can be used to combine `transformer` and `estimator` objects into a single object that has an API that is the same as a regular `estimator`. A pipeline is constructed with the `make_pipeline` function. \n",
"\n",
"In this example we create a very simple `pipeline` that is comprised of a StandardScaler `transform` and a Random Forest `estimator`"
* [Estimators: fitting and predicting](#estimators)
* [Transformers](#transformers)
* [Pipelines: chaining transforms and estimators](#pipelines)
%% Cell type:markdown id: tags:
<aclass="anchor"id="getting-started"></a>
## Getting Started
Adapted from the `scikit-learn` "Getting Started" documentation: https://scikit-learn.org/stable/getting_started.html
Three important concepts to understand for using `scikit-learn`:
1. `estimator` objects and their `fit` and `predict` methods for fitting data and making predictions
2. `transformer` objects for pre/post-processing transforms
3. `pipeline` objects for chaining together `transformers` and `estimators` into a machine learning pipeline
%% Cell type:markdown id: tags:
<aclass="anchor"id="estimators"></a>
### Estimators: fitting and predicting
Each machine learning model in `scikit-learn` is implemented as an [`estimator`](https://scikit-learn.org/stable/glossary.html#term-estimators) object. Here we instantiate a [Random Forest](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier) `estimator`:
%% Cell type:code id: tags:
``` python
# import RF estimator
from sklearn.ensemble import RandomForestClassifier
# instantiate RF estimator
clf = RandomForestClassifier(random_state=0)
print(clf)
```
%% Output
RandomForestClassifier(random_state=0)
%% Cell type:markdown id: tags:
After creation, the `estimator` can be **fit** to the training data using the `fit` method.
The `fit` method accepts two inputs:
1. `X` is the training input samples of shape (`n_samples`, `n_features`), i.e. rows are samples and columns are features.
2. `y` is the target values (class labels in classification, real numbers in regression) of shape (`n_samples`,) for a single output, or (`n_samples`, `n_outputs`) for multiple outputs.
> Note: Both `X` and `y` are usually NumPy arrays or equivalent array-like data types.
%% Cell type:code id: tags:
``` python
# create some toy data
X = [[1, 2, 3],    # 2 samples, 3 features
     [11, 12, 13]]
y = [0, 1]  # classes of each sample
# fit the model to the data
clf.fit(X, y)
```
%% Output
RandomForestClassifier(random_state=0)
%% Cell type:markdown id: tags:
Once trained, the `estimator` can make predictions on new data using the `predict` method.
%% Cell type:code id: tags:
``` python
# predict classes of new data
clf.predict([[4, 5, 6], [14, 15, 16]])
```
%% Output
array([0, 1])
%% Cell type:markdown id: tags:
Importantly, this `fit` and `predict` interface is consistent across different estimators, making it very easy to change estimators within your code. For example, here we swap the Random Forest `estimator` for a Support Vector Machine `estimator`:
%% Cell type:code id: tags:
``` python
# import an SVM estimator
from sklearn.svm import SVC
# instantiate SVM estimator
clf = SVC(random_state=0)
# fit and predict
clf.fit(X, y)
clf.predict([[4, 5, 6], [14, 15, 16]])
```
%% Output
array([0, 1])
%% Cell type:markdown id: tags:
<aclass="anchor"id="transformers"></a>
### Transformers
The `transformer` is a special object that follows the same API as an `estimator` and allows you to apply pre-processing and/or post-processing transforms to the data in your machine learning pipeline. The `transformer` object has a `transform` method instead of a `predict` method.
In this example we use a `transformer` to standardise the features (i.e. remove the mean and scale to unit variance). The `fit` method calculates the mean and variance parameters from the data, and the `transform` method does the scaling.
%% Cell type:code id: tags:
``` python
# import StandardScaler transformer
from sklearn.preprocessing import StandardScaler
# create some toy data
X = [[0, 15],
     [1, -10]]
# instantiate StandardScaler transformer
scaler = StandardScaler()
# fit the transform to the data
scaler.fit(X)
# apply transform
scaler.transform(X)
```
%% Output
array([[-1., 1.],
[ 1., -1.]])
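%% Cell type:markdown id: tags:
Transformers also provide a `fit_transform` convenience method that fits and applies the transform in a single call; a minimal sketch using the same toy data:
%% Cell type:code id: tags:
``` python
# import StandardScaler transformer
from sklearn.preprocessing import StandardScaler
# same toy data as above
X = [[0, 15],
     [1, -10]]
# fit and transform in one step
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled)
```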
%% Cell type:markdown id: tags:
<aclass="anchor"id="pipelines"></a>
### Pipelines: chaining transforms and estimators
A typical machine learning pipeline often involves several pre-processing transforms followed by an estimator. The `pipeline` object combines `transformer` and `estimator` objects into a single object with the same API as a regular `estimator`. A pipeline is constructed with the `make_pipeline` function.
In this example we create a very simple `pipeline` composed of a StandardScaler `transformer` and a Random Forest `estimator`:
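One way this could look, reusing the toy classification data from the estimator example above:
%% Cell type:code id: tags:
``` python
# import the pipeline helper plus the transformer and estimator used above
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
# toy training data (same shape conventions as the estimator example)
X = [[1, 2, 3],
     [11, 12, 13]]
y = [0, 1]
# chain the scaler and classifier into a single estimator-like object
pipe = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0))
# the pipeline exposes the familiar fit/predict API
pipe.fit(X, y)
pred = pipe.predict([[4, 5, 6], [14, 15, 16]])
print(pred)
```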