Skip to content

BF: Fix evaluation order calculation in child value replacement

Paul McCarthy requested to merge bf/calc-expression-order into master

The child value replacement cleaning step will insert otherwise missing values into data fields based on the values of parent data fields. This step is driven by rules such as:

20406    "v20401 == 0"                   "0"
20404    "v20401 == 0 || v20406 == 0"    "0"

which states:

  1. Set 20404 to 0 where either 20401 or 20406 are equal to 0
  2. Set 20406 to 0 where 20401 is equal to 0

(Note that these are real data fields with real dependent relationships: 20401, 20404, 20406)

The order in which these rules are applied must be consistent, as applying them in different orders would potentially produce different results. To determine the ordering, I build a directed graph which represents the dependent relationships between all of the involved data fields; in the above example, this graph would look like:

   20401
    ^ ^
    / \
   /   \
20406   |
   ^    |
   \   /
    \ /
   20404

i.e.

  • 20406 is dependent on the values of 20401
  • 20404 is dependent on the values of 20401 and 20406
  • 20401 is not dependent on any other data fields

Early on, I decided to apply these rules "bottom-up", i.e. evaluating and applying the rules for the most dependent data fields first (20404 above), and the least dependent data fields last (20406 above). The code which does this is in the funpack.expression.calculateExpressionEvaluationOrder function, which I thought was correctly identifying the hierarchy level of each node in the graph. But I didn't spend much time thinking about, or working on the problem, so the routine was buggy in that it could raise a circular dependency error on perfectly valid expressions. This routine has been re-written to use Kahn's topological sorting algorithm, so should now be more robust.

As an aside, the "bottom-up" approach means that rules must be written so as to consider the values of all parent data-fields when inserting a value into a child data field. In the example above, the rule for 20404 must consider both 20406 and 20401. If I were to change the logic to "top-down", this would not be necessary, as the rule for 20406 would have already been applied by the time the rule for 20404 was applied. However, at this point in time, I think it is more important to preserve the existing logic, so these rules will continue to be applied in a bottom-up manner.

Edited by Paul McCarthy

Merge request reports