funpack merge requests
https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests
2023-12-13T22:00:30+00:00

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/111
MNT: Always emit a warning on missing variables (unless `--fail_if_missing` is set)
2023-12-13T22:00:30+00:00
Paul McCarthy

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/110
ENH: New `--fail_if_missing` option, which raises an error if a requested variable is not in input file(s)
2023-12-13T21:49:24+00:00
Paul McCarthy

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/109
MNT: Administrative and documentation updates
2023-09-22T15:46:01+01:00
Paul McCarthy

Most notably, a brief example on merging two data sets together using a UKB bridging file

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/108
MNT: Don't crash if fillVisits(mode) is applied to empty data
2023-09-22T13:02:05+01:00
Paul McCarthy

* Don't crash if fillVisits(mode) is applied to empty data, and don't crash on invalid expressions.
* Fix logic error handling the `--no_download` flag.
* Migrate from `setup.py` to `pyproject.toml`.

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/107
ENH: Recoding value expressions
2023-04-17T11:59:05+01:00
Paul McCarthy

Ability to specify categorical recoding replacement values as expressions which are evaluated on the data. For example,
```
--recoding 123 "1, 2, 3" "100, max() + 1, 300"
```
would cause, for data-field 123, values of 1 and 3 to be respectively replaced with 100 and 300, and value 2 to be replaced with one plus the maximum value of data-field 123.
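The behaviour described above can be sketched in a few lines of plain Python. This is only an illustration, not FUNPACK's implementation: the `recode` function is hypothetical, and the sketch assumes that `max()` refers to the maximum of the original (pre-recoding) values.

```python
def recode(values, raw, new):
    """Replace each raw[i] found in values with new[i].

    Each entry of new is a string which may be a plain number or an
    expression. Hypothetical helper, for illustration only.
    """
    # Functions made available to expressions. They are evaluated on the
    # original data, which is an assumption of this sketch.
    funcs = {"max": lambda: max(values), "min": lambda: min(values)}

    mapping = {}
    for r, expr in zip(raw, new):
        # eval() with builtins disabled; acceptable for a sketch, not for
        # untrusted input.
        mapping[r] = eval(expr, {"__builtins__": {}}, funcs)

    # Values without a recoding rule are passed through unchanged
    return [mapping.get(v, v) for v in values]


data = [1, 2, 3, 7, 2]
recoded = recode(data, [1, 2, 3], ["100", "max() + 1", "300"])
print(recoded)
```

Here 1 and 3 become 100 and 300, and each 2 becomes one plus the largest value in `data`.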
This MR also includes some miscellaneous updates to make the code compatible with Pandas 2.x.

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/106
DOC: update apiref stub files
2023-02-03T11:58:48+00:00
Paul McCarthy

And some other small tweaks

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/105
ENH: Allow variable categories to be used in processing rules
2023-02-02T17:08:53+00:00
Paul McCarthy

e.g. `"all,1,2,cat5 someproc()"` will apply someproc to vars 1, 2, and all vars from category 5

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/104
Download latest versions of UKB schema files
2023-02-02T16:38:15+00:00
Paul McCarthy

Updating FUNPACK to download the latest versions of the UKB data field/encoding schema files when it is executed, rather than relying on the pre-downloaded files stored in `funpack/data/`, which were manually updated on each release.
Some associated re-organisation and refactoring - `funpack/data/` is renamed to `funpack.schema`, and the `funpack.coding` and `funpack.hierarchy` modules moved into there.

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/103
BF: Fix evaluation order calculation in child value replacement
2022-08-16T13:55:55+01:00
Paul McCarthy

The child value replacement cleaning step will insert otherwise missing values into data fields based on the values of parent data fields. This step is driven by rules such as:
```
20406 "v20401 == 0" "0"
20404 "v20401 == 0 || v20406 == 0" "0"
```
which state:
1. Set 20404 to 0 where either 20401 or 20406 are equal to 0
2. Set 20406 to 0 where 20401 is equal to 0
(Note that these are real data fields with real dependent relationships: [20401](https://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=20401), [20404](https://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=20404), [20406](https://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=20406))
The order in which these rules are applied must be consistent, as applying them in different orders would potentially produce different results. To determine the ordering, I build a directed graph which represents the dependent relationships between all of the involved data fields; in the above example, this graph would look like:
```
      20401
      ^   ^
     /     \
    /       \
 20406       |
    ^        |
     \      /
      \    /
       20404
```
i.e.
* 20406 is dependent on the values of 20401
* 20404 is dependent on the values of 20401 and 20406
* 20401 is not dependent on any other data fields
Early on, I decided to apply these rules "bottom-up", i.e. evaluating and applying the rules for the most dependent data fields first (20404 above), and the least dependent data fields last (20406 above). The code which does this is in the `funpack.expression.calculateExpressionEvaluationOrder` function, which I thought was correctly identifying the hierarchy level of each node in the graph. But I didn't spend much time thinking about or working on the problem, and the routine was buggy: it could raise a circular dependency error on perfectly valid expressions. This routine has been rewritten to use Kahn's topological sorting algorithm, so it should now be more robust.
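For reference, a minimal sketch of Kahn's algorithm applied to the example graph is given here. It is not the code in `funpack.expression`; the `evaluation_order` function and the `deps` mapping (child data field to the parent data fields its rule refers to) are hypothetical names chosen for illustration.

```python
from collections import deque


def evaluation_order(deps):
    """Return data fields in "bottom-up" order (most dependent first).

    deps maps each child field to the list of parent fields used in its
    rule. Raises ValueError if the dependencies are circular.
    """
    nodes = set(deps)
    for parents in deps.values():
        nodes.update(parents)

    # For each field, count how many other fields depend on it
    dependants = {n: 0 for n in nodes}
    for parents in deps.values():
        for p in parents:
            dependants[p] += 1

    # Start with fields that nothing else depends on
    queue = deque(sorted(n for n in nodes if dependants[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        # "Remove" this field from the graph; a parent becomes ready once
        # all of the fields depending on it have been emitted
        for p in sorted(deps.get(node, ())):
            dependants[p] -= 1
            if dependants[p] == 0:
                queue.append(p)

    # Any field left over must be part of a cycle
    if len(order) != len(nodes):
        raise ValueError("circular dependency")
    return order


deps = {20406: [20401], 20404: [20401, 20406]}
order = evaluation_order(deps)
print(order)
```

Fields without a rule of their own (20401 here) still appear in the result; a caller would simply skip them when applying rules.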
As an aside, the "bottom-up" approach means that rules must be written so as to consider the values of *all* parent data-fields when inserting a value into a child data field. In the example above, the rule for 20404 must consider both 20406 and 20401. If I were to change the logic to "top-down", this would not be necessary, as the rule for 20406 would have already been applied by the time the rule for 20404 was applied. However, at this point in time, I think it is more important to preserve the existing logic, so these rules will continue to be applied in a bottom-up manner.

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/102
Bf/git call
2022-08-08T17:49:08+01:00
Paul McCarthy

Call out to `git` was raising an error if the `git` command was not found (I'd assumed that `check=False` would suppress this)

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/101
New createDiagnosisColumns(binarised) option
2022-08-05T10:32:25+01:00
Paul McCarthy

Allow the `createDiagnosisColumns` processing function to be run on columns that have already been passed through the `binariseCategorical` function.

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/100
BF: Don't cast values to integers when creating column names in binariseCategorical
2022-08-02T17:46:24+01:00
Paul McCarthy

Casting to `int` will potentially result in duplicate column names - for example, data field [41256](https://biobank.ctsu.ox.ac.uk/crystal/field.cgi?id=41256) uses data coding [259](https://biobank.ctsu.ox.ac.uk/crystal/coding.cgi?id=259&nl=1) which contains distinct values `231` and `0231`

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/99
New createDiagnosisColumns processing function
2022-07-28T10:28:00+01:00
Paul McCarthy

The `createDiagnosisColumns` function can be used to generate binary columns for every unique ICD10 diagnosis code, indicating whether they were a primary or secondary diagnosis.

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/98
BF: need to search multiple config directories
2022-06-28T20:43:34+01:00
Paul McCarthy

Built-in plugins/config files should still be locatable, even if `$FUNPACK_CONFIG_DIR` is set. Therefore, multiple candidate config directories need to be searched when trying to locate a config/plugin file.
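A rough sketch of searching several candidate directories might look like the following. The `find_config_file` function, the file names, and the search order (user directory first, then built-in) are assumptions made for illustration, and may not match what FUNPACK actually does.

```python
import os
import tempfile
from pathlib import Path


def find_config_file(name, builtin_dir, env_var="FUNPACK_CONFIG_DIR"):
    """Return the first existing file called name, or None.

    Hypothetical helper: looks in the directory named by env_var (if
    set), then in builtin_dir. The search order is an assumption.
    """
    candidates = []
    user_dir = os.environ.get(env_var)
    if user_dir:
        candidates.append(Path(user_dir))
    candidates.append(Path(builtin_dir))

    for directory in candidates:
        path = directory / name
        if path.is_file():
            return path
    return None


# Temporary directories stand in for the user and built-in directories
with tempfile.TemporaryDirectory() as user, \
        tempfile.TemporaryDirectory() as builtin:
    (Path(builtin) / "builtin.cfg").write_text("# built-in\n")
    (Path(user) / "custom.cfg").write_text("# user\n")
    os.environ["FUNPACK_CONFIG_DIR"] = user

    found_builtin = find_config_file("builtin.cfg", builtin)
    found_custom = find_config_file("custom.cfg", builtin)
    found_missing = find_config_file("missing.cfg", builtin)
```

With the environment variable set, the built-in file is still found, because the built-in directory remains in the candidate list.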
# `funpack` merge request
Unless the maintainer is being sloppy, this merge request will not be accepted
unless the following criteria are met:
- [ ] Unit tests pass
- [x] Changelog updated
- [x] Version number in `funpack/__init__.py` updated according to
[Semantic Versioning](https://semver.org) conventions

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/97
New --add_aux_var option
2022-08-15T10:14:03+01:00
Paul McCarthy

* New `--add_aux_var` option, which ensures that auxiliary/secondary variables used in processing rules will be imported, if they weren't already selected for import. For example, the rule `41270 binariseCategorical(take=41280)` would be skipped if `41280` were not imported. With `--add_aux_vars`, 41280 will be imported, even if explicitly excluded.
* The FMRIB configuration profile is now maintained separately from the FUNPACK source code at fsl/funpack-fmrib-config>. This is so it can be updated independently of FUNPACK.
* New utility script, `refresh_showcase_schema.py` to update built-in UKB schema. Only intended for use by me.
* New documentation pages for the command-line interface, and FMRIB profile
* Marked the `broadcast_` feature in processing as deprecated. Alternative is for processing functions to implement their own parallelisation.

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/96
Mnt/cats
2022-06-01T16:14:40+01:00
Paul McCarthy

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/95
Mnt/schema
2022-05-31T15:18:27+01:00
Paul McCarthy

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/94
Mnt/categories
2022-05-31T10:23:09+01:00
Paul McCarthy

Never ended up tagging 3.2.1, so I can sneak this tiny change in

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/93
MNT: More category tweaks
2022-05-26T10:48:46+01:00
Paul McCarthy

0 (exclude) now has an ID of 100. Explicitly exclude cat31. Some additional vars

https://git.fmrib.ox.ac.uk/fsl/funpack/-/merge_requests/92
Options to exclude variables/categories
2022-05-13T14:23:37+01:00
Paul McCarthy

This MR contains a few changes:
- New `--exclude_variable` and `--exclude_category` options, allowing variables/categories to be excluded from import. These take precedence over the `--variable` / `--category` options.
- Changes to the FMRIB categories and configuration to exclude a new `exclude` category.
- Change to the `removeIfRedundant` routine to use double precision floating point by default, as the correlation calculation has issues when using 32 bit precision.
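The precedence rule in the first bullet above amounts to a set difference, sketched here in plain Python. The `select_variables` function, the category names, and the variable IDs are all made up for illustration; this is not FUNPACK's own selection code.

```python
def select_variables(variables, categories,
                     exclude_variables, exclude_categories,
                     category_table):
    """Return the sorted variable IDs to import.

    category_table maps a category name to its variable IDs. Exclusions
    are applied last, so they take precedence over inclusions.
    Hypothetical helper, for illustration only.
    """
    selected = set(variables)
    for cat in categories:
        selected.update(category_table[cat])

    excluded = set(exclude_variables)
    for cat in exclude_categories:
        excluded.update(category_table[cat])

    return sorted(selected - excluded)


# Made-up categories and variable IDs
category_table = {"cat5": [10, 11, 12], "cat31": [12, 13]}
selected = select_variables(
    variables=[1, 2, 13],
    categories=["cat5"],
    exclude_variables=[2],
    exclude_categories=["cat31"],
    category_table=category_table)
print(selected)
```

Variables 2, 12 and 13 were all requested, either directly or via `cat5`, but are dropped because they are also excluded.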