Commit e61172c4 authored by Paul McCarthy

Merge branch 'rf/combine-expressions' into 'master'

Rf/combine expressions

See merge request !28
parents 386e2f38 215b03ad
......@@ -2,8 +2,8 @@ FUNPACK changelog
=================
1.5.0 (Under development)
-------------------------
1.5.0 (Monday 9th December 2019)
--------------------------------
Added
......@@ -14,10 +14,16 @@ Added
this is simply a wrapper around the UNIX ``wc`` tool.
* New :func:`.util.cat` function to concatenate multiple files together;
this is simply a wrapper around the UNIX ``cat`` tool.
* New :func:`.util.inMainProcess` function so a process can determine whether
it is the main process or a worker process.
* New :meth:`.DataTable.subtable` and :meth:`.DataTable.merge` methods, to aid
in passing data to/from worker processes.
* Processing functions can now be specified to run independently on a subset
of variables by using ``'independent'`` in the variable list.
* New ``any`` and ``all`` operations which can be used in conditional
statements to control how the conditional results are combined across
multiple columns for one variable. These can be used with the ``--subject``
option.
Changed
......
......@@ -6,7 +6,7 @@
#
__version__ = '1.5.0.dev0'
__version__ = '1.5.0'
"""The ``funpack`` versioning scheme roughly follows Semantic Versioning
conventions.
"""
......
......@@ -55,11 +55,13 @@ The *equal to* and *not equal to* operators may be used with a value of
``'na'`` to test whether the values for a variable are missing or present
respectively.
Multiple conditional statements may be combined with ``and``, ``or``, and
``not`` logical operations (specific symbols can be found in the
:attr:`SYMBOLS` dictionary), and precedence may be enforced with the use of
round brackets.
The ``any`` and ``all`` operations can be applied to statements which have
been evaluated on multiple columns to combine the results column-wise.
"""
......@@ -78,6 +80,8 @@ SYMBOLS = {
'and' : '&&',
'or' : '||',
'not' : '~',
'any' : 'any',
'all' : 'all',
'eq' : '==',
'ne' : '!=',
'lt' : '<',
......@@ -132,9 +136,10 @@ class Expression(object):
:arg dtable: The :class:`.DataTable` containing the data.
:arg data: Dictionary containing ``{ variable : column_name }``
:arg data: Dictionary containing ``{ variable : [column_name] }``
mappings from the variables used in the expressions to
columns in ``dtable``.
columns in ``dtable``. Each mapping may also contain a
single column name, instead of a list.
:returns: The outcome of the expression - ``True`` or ``False``.
"""
......@@ -250,8 +255,9 @@ def parseExpression(expr):
Expression functions have a few attributes containing metadata about the
expression:
- ``ftype`` contains the expression type, either ``logical_not``,
``logical`` (for *and*/*or* operations), or ``condition`` (for
- ``ftype`` contains the expression type, either ``unary`` (for *not*,
*any* and *all* operations),
``binary`` (for *and*/*or* operations), or ``condition`` (for
comparison operations)
- ``operation`` contains the operation symbol
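A hedged sketch of inspecting this metadata, assuming `parseExpression` is importable from `funpack.expression` and accepts expression strings of the form shown in the notebook below:

``` python
# Assumption: funpack.expression.parseExpression parses a single expression
# string and returns a callable annotated with the metadata described above.
from funpack.expression import parseExpression

fn = parseExpression('~ (v1 >= 2)')
print(fn.ftype)          # 'unary'     - not/any/all operations
print(fn.operation)      # '~'
print(fn.operand.ftype)  # 'condition' - the wrapped comparison
```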
......@@ -284,13 +290,13 @@ def variablesInExpression(expr):
if expr.ftype == 'condition':
return set([expr.variable])
elif expr.ftype == 'logical':
elif expr.ftype == 'binary':
variables = set()
variables.update(variablesInExpression(expr.operand1))
variables.update(variablesInExpression(expr.operand2))
return variables
elif expr.ftype == 'logical_not':
elif expr.ftype == 'unary':
return variablesInExpression(expr.operand)
......@@ -306,6 +312,8 @@ def makeParser():
CMP = ['eq', 'ne', 'lt', 'le', 'gt', 'ge']
CMP = pp.oneOf([SYMBOLS[c] for c in CMP])
EQS = pp.oneOf([SYMBOLS[c] for c in ['eq', 'ne']])
ANY = pp.CaselessLiteral(SYMBOLS['any'])
ALL = pp.CaselessLiteral(SYMBOLS['all'])
AND = pp.CaselessLiteral(SYMBOLS['and'])
OR = pp.CaselessLiteral(SYMBOLS['or'])
NOT = pp.CaselessLiteral(SYMBOLS['not'])
......@@ -321,12 +329,15 @@ def makeParser():
COND = NUMCOND ^ NACOND
# the infixNotation helper does the heavy
# lifting for boolean operations and precedence
# lifting for boolean/combine operations
# and precedence
parser = pp.infixNotation(
COND,
[(NOT, 1, pp.opAssoc.RIGHT, parseLogicalNot),
(AND, 2, pp.opAssoc.LEFT , parseLogical),
(OR, 2, pp.opAssoc.LEFT, parseLogical)])
[(NOT, 1, pp.opAssoc.RIGHT, parseUnary),
(ANY, 1, pp.opAssoc.RIGHT, parseUnary),
(ALL, 1, pp.opAssoc.RIGHT, parseUnary),
(AND, 2, pp.opAssoc.LEFT , parseBinary),
(OR, 2, pp.opAssoc.LEFT, parseBinary)])
makeParser.parser = parser
return parser
......@@ -339,16 +350,29 @@ def parseVariable(toks):
return toks[1]
def _not(op, *args): return ~ op(*args) # noqa
# conditionals are constructed to produce
# numpy arrays, so that is what these
# operations expect as inputs
def _not(op, *args): return ~op(*args) # noqa
def _and(op1, op2, *args): return op1(*args) & op2(*args) # noqa
def _or( op1, op2, *args): return op1(*args) | op2(*args) # noqa
def _any(op, *args):
result = op(*args)
if len(result.shape) == 2: return result.any(axis=1)
else: return result
def _all(op, *args):
result = op(*args)
if len(result.shape) == 2: return result.all(axis=1)
else: return result
def parseLogicalNot(toks):
"""Called by the parser created by :func:`makeParser`. Parses an expression
of the form ``not expression``, where ``not`` is the corresponding symbol
in the :attr:`OPERATIONS` dictionary, and ``expression`` is a conditional
statement or logical expression.
def parseUnary(toks):
"""Called by the parser created by :func:`makeParser`. Parses an expression of
the form ``[not|any|all] expression``, where ``not``/``any``/``all`` is
the corresponding symbol in the :attr:`SYMBOLS` dictionary, and
``expression`` is a conditional statement or logical expression.
Returns a function which can be used to evaluate the expression.
"""
......@@ -356,20 +380,24 @@ def parseLogicalNot(toks):
operation = toks[0][0]
operand = toks[0][1]
log.debug('Parsing logical: %s %s', operation, operand)
log.debug('Parsing unary: %s %s', operation, operand)
fn = {SYMBOLS['not'] : _not,
SYMBOLS['any'] : _any,
SYMBOLS['all'] : _all}[operation]
fn = ft.partial(_not, operand)
fn.ftype = 'logical_not'
fn = ft.partial(fn, operand)
fn.ftype = 'unary'
fn.operation = operation
fn.operand = operand
return fn
def parseLogical(toks):
def parseBinary(toks):
"""Called by the parser created by :func:`makeParser`. Parses an
expression of the form ``expression1 [and|or] expression2``, where
``and``/``or`` are the corresponding symbols in the :attr:`OPERATIONS`
``and``/``or`` are the corresponding symbols in the :attr:`SYMBOLS`
dictionary, and ``expression1`` and ``expression2`` are conditional
statements or logical expressions.
......@@ -386,7 +414,7 @@ def parseLogical(toks):
elif operation == SYMBOLS['or']: fn = _or
fn = ft.partial(fn, operand1, operand2)
fn.ftype = 'logical'
fn.ftype = 'binary'
fn.operation = operation
fn.operand1 = operand1
fn.operand2 = operand2
......@@ -404,6 +432,20 @@ def _lt( var, val, dt, data): return dt[:, data[var]] < val # noqa
def _le( var, val, dt, data): return dt[:, data[var]] <= val # noqa
def _asarray(func, *args):
"""Calls ``func``, passing it ``*args``.
The return value of ``func`` is assumed to be a ``pandas.DataFrame``.
Its contents are converted to a ``numpy`` array.
This function is used by :func:`parseCondition` to construct functions
for evaluating conditional statements.
"""
# DataFrame.to_numpy is only
# available in pandas >= 0.24
return func(*args).to_numpy()
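A minimal illustration of this conversion (data and column name invented): an element-wise comparison on a `pandas.DataFrame` yields boolean values, and `to_numpy()` (available from pandas 0.24 onwards) turns them into a plain `numpy` array:

``` python
import pandas as pd

# Invented single-column table for variable 1; the comparison mirrors what a
# parsed condition such as "v1 >= 10" produces before _asarray converts it.
df = pd.DataFrame({'1-0.0': [5, 20, 15]})
result = (df >= 10).to_numpy()
print(result.ravel())   # [False  True  True]
```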
def parseCondition(toks):
"""Parses a conditional statement of the form::
......@@ -415,6 +457,8 @@ def parseCondition(toks):
- ``value`` is a numeric value
Returns a function which can be used to evaluate the conditional statement.
The function is constructed such that it expects a ``pandas.DataFrame``,
and will output a boolean ``numpy`` array.
"""
toks = toks[0]
variable = toks[0]
......@@ -432,7 +476,7 @@ def parseCondition(toks):
elif operation == SYMBOLS['le']: fn = _le
elif operation == SYMBOLS['lt']: fn = _lt
fn = ft.partial(fn, variable, value)
fn = ft.partial(_asarray, fn, variable, value)
fn.ftype = 'condition'
fn.operation = operation
fn.variable = variable
......
......@@ -9,7 +9,6 @@ the data importing stage of the ``funpack`` sequence
"""
import os.path as op
import itertools as it
import functools as ft
import multiprocessing.dummy as mpd
......@@ -258,56 +257,98 @@ def removeSubjects(dtable, exclude=None, exprs=None):
else: mask = np.zeros(orignrows, dtype=np.bool)
if exprs is not None:
# Parse the expressions, and get a
# list of all variables that are
# mentioned in them.
exprs = list(it.chain(*[e.split(',') for e in exprs]))
exprs = [expression.Expression(e) for e in exprs]
vids = list(set(it.chain(*[e.variables for e in exprs])))
# list of the variables that are
# mentioned in each of them,
exprs = list(it.chain(*[e.split(',') for e in exprs]))
exprs = [expression.Expression(e) for e in exprs]
vids = [list(e.variables) for e in exprs]
# Build a list of the visits and
# instances in the data for each
# variable used in the expression.
# variable used in each expression.
try:
visits = [dtable.visits( v) for v in vids]
instances = [dtable.instances(v) for v in vids]
visits = [[dtable.visits( v) for v in evs] for evs in vids]
instances = [[dtable.instances(v) for v in evs] for evs in vids]
except KeyError as e:
raise RuntimeError('Unknown variable used in exclude expression: '
'{} ({})'.format(exprs, e))
# Calculate the intersection of visits/
# instances across all variables - we
# evaluate expressions for each visit/
# instance, and only where a visit/
# instance is present for all variables.
# evaluate an expression only on visits/
# instances present for all variables
# in that expression. All other visits/
# instances are not considered.
def intersection(a, b):
return set(a).intersection(b)
intersection = ft.partial(ft.reduce, intersection)
visits = [intersection(evis) for evis in visits]
instances = [intersection(eis) for eis in instances]
# Build a {vid : [column]} dict for
# each expression, as we need such
# a dict to evaluate them.
exprcols = []
for i in range(len(exprs)):
evs = vids[ i]
evis = visits[ i]
eis = instances[i]
cols = collections.defaultdict(list)
for evid, evisit, einstance in it.product(evs, evis, eis):
cols[evid].extend(dtable.columns(evid, evisit, einstance))
exprcols.append(cols)
# List which will contain one boolean
# numpy array for each subject include
# expression.
exprmasks = []
if len(visits) > 0: visits = ft.reduce(intersection, visits)
if len(instances) > 0: instances = ft.reduce(intersection, instances)
# evaluate each expression in parallel
with dtable.pool() as pool:
for i, expr in enumerate(exprs):
# A subject will be retained if *any*
# expression for *any* visit/instance
# evaluates to true.
exprmasks = []
cols = exprcols[i]
for visit, instance in it.product(visits, instances):
if len(cols) == 0:
log.debug('Ignoring expression (%s) - no associated '
'columns are present', str(expr))
continue
colnames = {v : [c.name for c in vcols]
for v, vcols in cols.items()}
cols = list(it.chain(*cols.values()))
subtable = dtable.subtable(cols)
# build a dict of { vid : column } mappings
# for each variable used in the expression
cols = [dtable.columns(v, visit, instance)[0] for v in vids]
cols = {v : c.name for v, c in zip(vids, cols)}
log.debug('Evaluating expression (%s) on columns %s',
expr, colnames)
with dtable.pool() as pool:
for e in exprs:
exprmasks.append(pool.apply_async(
e.evaluate, (dtable, cols, )))
exprmasks.append(pool.apply_async(
expr.evaluate, (subtable, colnames)))
# wait for each expression to complete,
# then combine them using logical OR.
# wait for each expression to complete
exprmasks = [e.get() for e in exprmasks]
mask = ft.reduce(lambda a, b: a | b, exprmasks, mask)
mask = np.array(mask)
# any result which was not combined using
# any() or all() defaults to being combined
# with any(). For example, if "v123 >= 2"
# is applied to columns 123-0.0, 123-1.0,
# and 123-2.0, the final result will be
# a 1D boolean array containing True where
# any of the three columns were >= 2.
for i, em in enumerate(exprmasks):
if len(em.shape) == 2:
exprmasks[i] = em.any(axis=1)
# Finally, all expressions are combined
# in the same manner - i.e. rows which
# passed *any* of the expressions
# are included
mask = ft.reduce(lambda a, b: a | b, exprmasks, mask)
mask = np.array(mask)
# Flag subjects to drop
if exclude is not None:
......@@ -407,7 +448,7 @@ def columnsToLoad(datafiles,
# Turn the unknownVars list
# into a list of variable IDs
unknownVids = list(sorted(set([c.vid for c in unknownVars])))
unknownVids = list(sorted({c.vid for c in unknownVars}))
if isinstance(datafiles, six.string_types):
datafiles = [datafiles]
......
......@@ -676,6 +676,7 @@ def loadTableBases():
(101, 0) : util.CTYPES.compound,
}
# We need pandas >=0.24 to support enums here
def settype(valtype, basetype):
return typecodes[valtype, basetype]
......
%% Cell type:markdown id: tags:
![image.png](attachment:image.png)
# `funpack`
> Paul McCarthy &lt;paul.mccarthy@ndcn.ox.ac.uk&gt; ([WIN@FMRIB](https://www.win.ox.ac.uk/))
`funpack` is a command-line program which you can use to extract data from UK BioBank (and other tabular) data sets.
You can give `funpack` one or more input files (e.g. `.csv`, `.tsv`), and it will merge them together, perform some preprocessing, and produce a single output file.
A large number of rules specific to the UK BioBank data set are built into `funpack`, but you can control and customise everything that `funpack` does to your data, including which rows and columns to extract, and which cleaning/processing steps to perform on each column.
The `funpack` source code is available at https://git.fmrib.ox.ac.uk/fsl/funpack. You can install `funpack` into a Python environment using `pip`:
pip install fmrib-unpack
Get command-line help by typing:
funpack -h
*The examples in this notebook assume that you have installed `funpack` 1.5.0.dev0 or newer.*
*The examples in this notebook assume that you have installed `funpack` 1.5.0 or newer.*
%% Cell type:code id: tags:
``` bash
funpack -V
```
%% Cell type:markdown id: tags:
### Contents
1. [Overview](#Overview)
1. [Import](#1.-Import)
2. [Cleaning](#2.-Cleaning)
3. [Processing](#3.-Processing)
4. [Export](#4.-Export)
2. [Examples](#Examples)
3. [Import examples](#Import-examples)
1. [Selecting variables (columns)](#Selecting-variables-(columns))
1. [Selecting individual variables](#Selecting-individual-variables)
2. [Selecting variable ranges](#Selecting-variable-ranges)
3. [Selecting variables with a file](#Selecting-variables-with-a-file)
4. [Selecting variables from pre-defined categories](#Selecting-variables-from-pre-defined-categories)
2. [Selecting subjects (rows)](#Selecting-subjects-(rows))
1. [Selecting individual subjects](#Selecting-individual-subjects)
2. [Selecting subject ranges](#Selecting-subject-ranges)
3. [Selecting subjects from a file](#Selecting-subjects-from-a-file)
4. [Selecting subjects by variable value](#Selecting-subjects-by-variable-value)
5. [Excluding subjects](#Excluding-subjects)
3. [Selecting visits](#Selecting-visits)
1. [Evaluating expressions across visits](#Evaluating-expressions-across-visits)
4. [Merging multiple input files](#Merging-multiple-input-files)
1. [Merging by subject](#Merging-by-subject)
2. [Merging by column](#Merging-by-column)
    3. [Naive merging](#Naive-merging)
4. [Cleaning examples](#Cleaning-examples)
1. [NA insertion](#NA-insertion)
2. [Variable-specific cleaning functions](#Variable-specific-cleaning-functions)
3. [Categorical recoding](#Categorical-recoding)
4. [Child value replacement](#Child-value-replacement)
5. [Processing examples](#Processing-examples)
1. [Sparsity check](#Sparsity-check)
2. [Redundancy check](#Redundancy-check)
3. [Categorical binarisation](#Categorical-binarisation)
6. [Custom cleaning, processing and loading - funpack plugins](#Custom-cleaning,-processing-and-loading---funpack-plugins)
1. [Custom cleaning functions](#Custom-cleaning-functions)
2. [Custom processing functions](#Custom-processing-functions)
3. [Custom file loaders](#Custom-file-loaders)
7. [Miscellaneous topics](#Miscellaneous-topics)
1. [Non-numeric data](#Non-numeric-data)
2. [Dry run](#Dry-run)
3. [Built-in rules](#Built-in-rules)
4. [Using a configuration file](#Using-a-configuration-file)
5. [Reporting unknown variables](#Reporting-unknown-variables)
6. [Low-memory mode](#Low-memory-mode)
%% Cell type:markdown id: tags:
# Overview
`funpack` performs the following steps:
## 1. Import
All data files are loaded in, unwanted columns and subjects are dropped, and the data files are merged into a single table (a.k.a. data frame). Multiple files can be merged according to an index column (e.g. subject ID). Or, if the input files contain the same columns/subjects, they can be naively concatenated along rows or columns.
## 2. Cleaning
The following cleaning steps are applied to each column:
1. **NA value replacement:** Specific values for some columns are replaced with NA, for example, variables where a value of `-1` indicates *Do not know*.
2. **Variable-specific cleaning functions:** Certain columns are re-formatted - for example, the [ICD10](https://en.wikipedia.org/wiki/ICD-10) disease codes can be converted to integer representations.
3. **Categorical recoding:** Certain categorical columns are re-coded.
4. **Child value replacement:** NA values within some columns which are dependent upon other columns may have values inserted based on the values of their parent columns.
## 3. Processing
During the processing stage, columns may be removed, merged, or expanded into additional columns. For example, a categorical column may be expanded into a set of binary columns, one for each category.
A column may also be removed on the basis of being too sparse, or being redundant with respect to another column.
## 4. Export
The processed data can be saved as a `.csv`, `.tsv`, or `.hdf5` file.
%% Cell type:markdown id: tags:
# Examples
Throughout these examples, we are going to use a few command line options, which you will probably **not** normally want to use:
- `-ow` (short for `--overwrite`): This tells `funpack` not to complain if the output file already exists.
- `-q` (short for `--quiet`): This tells `funpack` to be quiet.
Without the `-q` option, `funpack` can be quite verbose, which can be annoying, but is very useful when things go wrong. A good strategy is to tell `funpack` to produce verbose output using the `--noisy` (`-n` for short) option, and to send all of its output to a log file with the `--log_file` (or `-lf`) option. For example:
funpack -n -n -n -lf log.txt out.tsv in.tsv
Here's the first example input data set, with UK BioBank-style column names:
%% Cell type:code id: tags:
``` bash
cat data_01.tsv
```
%% Cell type:markdown id: tags:
The numbers in each column name typically represent:
1. The variable ID
2. The visit, for variables which were collected at multiple points in time.
3. The "instance", for multi-valued variables.
Note that one **variable** is typically associated with several **columns**, although we're keeping things simple for this first example - there is only one visit for each variable, and there are no multi-valued variables.
> _Most but not all_ variables in the UK BioBank contain data collected at different visits, the times that the participants visited a UK BioBank assessment centre. However, there are some variables (e.g. [ICD10 diagnosis codes](https://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=41202)) for which this is not the case.
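As a quick illustration (this helper is not part of `funpack`), a UK BioBank-style column name such as `34-1.0` decomposes into those three numbers:

``` python
# Hypothetical helper, for illustration only: split a UK BioBank-style
# column name "<variable>-<visit>.<instance>" into its three parts.
def parse_column_name(name):
    variable, rest = name.split('-')
    visit, instance = rest.split('.')
    return int(variable), int(visit), int(instance)

print(parse_column_name('34-1.0'))   # (34, 1, 0)
```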
%% Cell type:markdown id: tags:
# Import examples
## Selecting variables (columns)
You can specify which variables you want to load in the following ways, using the `--variable` (`-v` for short) and `--category` (`-c` for short) command line options:
* By variable ID
* By variable ranges
* By a text file which contains the IDs you want to keep.
* By pre-defined variable categories
* By column name
### Selecting individual variables
Simply provide the IDs of the variables you want to extract:
%% Cell type:code id: tags:
``` bash
funpack -q -ow -v 1 -v 5 out.tsv data_01.tsv
cat out.tsv
```
%% Cell type:markdown id: tags:
### Selecting variable ranges
The `--variable`/`-v` option accepts MATLAB-style ranges of the form `start:step:stop` (where `stop` is inclusive):
%% Cell type:code id: tags:
``` bash
funpack -q -ow -v 1:3:10 out.tsv data_01.tsv
cat out.tsv
```
%% Cell type:markdown id: tags:
### Selecting variables with a file
If your variables of interest are listed in a plain-text file, you can simply pass that file:
%% Cell type:code id: tags:
``` bash
echo -e "1\n6\n9" > vars.txt
funpack -q -ow -v vars.txt out.tsv data_01.tsv
cat out.tsv
```
%% Cell type:markdown id: tags:
### Selecting variables from pre-defined categories
Some UK BioBank-specific categories are baked into `funpack`, but you can also define your own categories - you just need to create a `.tsv` file, and pass it to `funpack` via the `--category_file` option (`-cf` for short):
%% Cell type:code id: tags:
``` bash
echo -e "ID\tCategory\tVariables" > custom_categories.tsv
echo -e "1\tCool variables\t1:5,7" >> custom_categories.tsv
echo -e "2\tUncool variables\t6,8:10" >> custom_categories.tsv
cat custom_categories.tsv
```
%% Cell type:markdown id: tags:
Use the `--category` option (`-c` for short) to select which categories to output. You can refer to categories by their ID:
%% Cell type:code id: tags:
``` bash
funpack -q -ow -cf custom_categories.tsv -c 1 out.tsv data_01.tsv
cat out.tsv
```
%% Cell type:markdown id: tags:
Or by name:
%% Cell type:code id: tags:
``` bash
funpack -q -ow -cf custom_categories.tsv -c uncool out.tsv data_01.tsv
cat out.tsv
```
%% Cell type:markdown id: tags:
### Selecting column names
If you are working with data that has non-UK BioBank style column names, you can use the `--column` option (`-co` for short) to select individual columns by their name, rather than the variable with which they are associated. The `--column` option accepts full column names, and also shell-style wildcard patterns:
%% Cell type:code id: tags:
``` bash
funpack -q -ow -co 4-0.0 -co "??-0.0" out.tsv data_01.tsv
cat out.tsv
```
%% Cell type:markdown id: tags:
## Selecting subjects (rows)
`funpack` assumes that the first column in every input file is a subject ID. You can specify which subjects you want to load via the `--subject` (`-s` for short) option. You can specify subjects in the same way that you specified variables above, and also:
* By specifying a conditional expression on variable values - only subjects for which the expression evaluates to true will be imported
* By specifying subjects to exclude
### Selecting individual subjects
%% Cell type:code id: tags:
``` bash
funpack -q -ow -s 1 -s 3 -s 5 out.tsv data_01.tsv
cat out.tsv
```
%% Cell type:markdown id: tags:
### Selecting subject ranges
%% Cell type:code id: tags:
``` bash
funpack -q -ow -s 2:2:10 out.tsv data_01.tsv
cat out.tsv
```
%% Cell type:markdown id: tags:
### Selecting subjects from a file
%% Cell type:code id: tags:
``` bash
echo -e "5\n6\n7\n8\n9\n10" > subjects.txt
funpack -q -ow -s subjects.txt out.tsv data_01.tsv
cat out.tsv
```
%% Cell type:markdown id: tags:
### Selecting subjects by variable value
The `--subject` option accepts *variable expressions* - you can write an expression performing numerical comparisons on variables (denoted with a leading `v`) and combine these expressions using boolean algebra. Only subjects for which the expression evaluates to true will be imported. For example, to only import subjects where variable 1 is greater than 10, and variable 2 is less than 70, you can type:
%% Cell type:code id: tags:
``` bash
funpack -q -ow -sp -s "v1 > 10 && v2 < 70" out.tsv data_01.tsv
cat out.tsv
```
%% Cell type:markdown id: tags:
The following symbols can be used in variable expressions:
| Symbol | Meaning |
|---------------------------|--------------------------|
| `==` | equal to |
| `!=` | not equal to |
| `>` | greater than |
| `>=` | greater than or equal to |
| `<` | less than |
| `<=` | less than or equal to |
| `na` | N/A |
| `&&` | logical and |
| <code>&#x7c;&#x7c;</code> | logical or |
| `~` | logical not |
| `()` | to denote precedence |
| Symbol | Meaning |
|---------------------------|---------------------------------|
| `==` | equal to |
| `!=` | not equal to |
| `>` | greater than |
| `>=` | greater than or equal to |
| `<` | less than |
| `<=` | less than or equal to |
| `na` | N/A |
| `&&` | logical and |
| <code>&#x7c;&#x7c;</code> | logical or |
| `~` | logical not |
| `all` | all columns must meet condition |
| `any` | any column must meet condition |
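As a rough sketch of what `any` and `all` mean when a variable has several columns (the data and column names here are invented, and this is plain `pandas`/`numpy` rather than `funpack` itself): the condition is evaluated against every column of the variable, and the results are then collapsed row-wise:

``` python
import pandas as pd

# Invented data for variable 1 across two visits (columns 1-0.0 and 1-1.0),
# with the condition "v1 > 10" evaluated on both columns.
df   = pd.DataFrame({'1-0.0': [5, 20, 15], '1-1.0': [30, 8, 12]})
cond = (df > 10).to_numpy()

print(cond.any(axis=1))   # combined with any: [ True  True  True]
print(cond.all(axis=1))   # combined with all: [False False  True]
```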