Rf/expression logic

Paul McCarthy requested to merge (removed):rf/expression_logic into master

funpack merge request

The way that funpack currently combines expressions on variables with different numbers of columns is slightly flawed. It attempts to combine results within each visit. For example, in the expression v1 > 4 && v2 < 8, if variables 1 and 2 both have columns for 3 visits, the expression will be evaluated separately on the columns for visits 1, 2, and 3, and the per-visit results then combined with a logical OR.
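The per-visit behaviour described above can be sketched in plain Python. This is an illustration only, not funpack's actual code: the `columns` layout, the `per_visit` helper, and the data values are all assumptions made for the example.

```python
# Hypothetical data: one list of values per column, one entry per subject.
# Keys are (variable, visit) pairs; this layout is illustrative only and
# is not how funpack stores its data.
columns = {
    (1, 1): [5, 1, 1], (1, 2): [1, 1, 6], (1, 3): [1, 1, 1],
    (2, 1): [7, 7, 9], (2, 2): [7, 7, 9], (2, 3): [7, 7, 7],
}


def per_visit(columns, visits=(1, 2, 3)):
    """Evaluate v1 > 4 && v2 < 8 within each visit, then OR across visits."""
    n_subjects = len(columns[(1, visits[0])])
    result = [False] * n_subjects
    for visit in visits:
        v1, v2 = columns[(1, visit)], columns[(2, visit)]
        for i in range(n_subjects):
            # The two sub-expressions must both hold in the SAME visit
            result[i] = result[i] or (v1[i] > 4 and v2[i] < 8)
    return result


selected = per_visit(columns)
print(selected)  # [True, False, False]
```

The third subject has v1 > 4 at visit 2 and v2 < 8 at visit 3, but never both in the same visit, so the per-visit scheme does not select them.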

This approach falls apart when combining expressions on variables with different numbers of columns, such as the ICD columns. For example, in the expression v41202 == "A001" && v25010 != na (all subjects with ICD diagnosis A001 who have imaging data), funpack would still attempt to pair up columns with matching visit/instance numbers, with the result that only the first couple of columns of 41202 would be considered in the expression.
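A minimal sketch of this failure mode, assuming a simplified layout in which each variable is a list of columns and columns are paired by position. The `paired` helper and the data are hypothetical and stand in for funpack's real visit/instance matching.

```python
# Hypothetical layout: each variable is a list of columns, and each column
# holds one value per subject. None stands in for a missing value (na).
v41202 = [                 # four diagnosis columns
    ["A001", "B002"],
    [None,   "C003"],
    [None,   "D004"],
    [None,   "A001"],      # subject 1 has A001 only in the last column
]
v25010 = [                 # two imaging columns
    [1.2,  None],
    [None, 3.4],
]


def paired(diag, imaging):
    """Pair columns by index, evaluate within each pair, OR the pairs."""
    n_subjects = len(diag[0])
    result = [False] * n_subjects
    # zip stops at the shorter variable, so the extra diagnosis columns
    # never take part in the evaluation
    for dcol, icol in zip(diag, imaging):
        for i in range(n_subjects):
            result[i] = result[i] or (dcol[i] == "A001" and icol[i] is not None)
    return result


missed = paired(v41202, v25010)
print(missed)  # [True, False]
```

Subject 1 has both an A001 diagnosis and imaging data, but is not selected because the diagnosis sits in a column that has no counterpart in 25010.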

This MR proposes simplifying this logic so that all columns of a variable are always used when evaluating an expression. By default, when binary expressions on variables with different numbers of columns are combined, the results for each variable are first collapsed across its columns with a logical OR, and the collapsed results are then combined to form the final result.
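One reading of the proposed logic, sketched in plain Python. The `any_column` helper and the data layout are assumptions for illustration, not funpack's implementation: each sub-expression is evaluated on every column of its variable, collapsed with OR per subject, and only then combined with AND.

```python
# Hypothetical layout: each variable is a list of columns, and each column
# holds one value per subject. None stands in for a missing value (na).
v41202 = [                 # four diagnosis columns
    ["A001", "B002"],
    [None,   "C003"],
    [None,   "D004"],
    [None,   "A001"],
]
v25010 = [                 # two imaging columns
    [1.2,  None],
    [None, 3.4],
]


def any_column(columns, predicate):
    """Apply predicate to every column and collapse with OR per subject."""
    n_subjects = len(columns[0])
    return [any(predicate(col[i]) for col in columns)
            for i in range(n_subjects)]


# Collapse each variable on its own, using all of its columns ...
has_a001 = any_column(v41202, lambda v: v == "A001")
has_imaging = any_column(v25010, lambda v: v is not None)

# ... then combine the collapsed results with the && from the expression
selected = [a and b for a, b in zip(has_a001, has_imaging)]
print(selected)  # [True, True]
```

Because each variable is collapsed independently, the number of columns no longer has to match between the two sides of the expression.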

Unless the maintainer is being sloppy, this merge request will not be accepted unless the following criteria are met:

[ ] Unit tests pass
[ ] Changelog updated
[ ] Version number in funpack/__init__.py updated according to Semantic Versioning conventions