Problem(s) that meant that eddy couldn't find a volume without outliers to use as shape-ref for slice-to-volume
This should, hopefully, finally fix the problem of eddy not finding volumes without outliers to use as shape-reference for slice-to-volume. It has been a longstanding issue that I have thought depended on timing problems in the sequence, leading to some slices having consistently lower intensity. And I have indeed seen data like that. But since it has been a lot of complaints from the LS-HCP crowd (Mike Harms) I decided to finally look more closely at the problem. But as it turns out there are two distinct mechanisms/problems that can cause this to happen.
- Very large Jacobian determinants in the inverse field can inflate the intensity of the predictions when warped back into observation space. That means that when comparing the observation to the prediction the observation can have appreciatively lower intensity than "expected", and be labelled as an outlier. That slice is the replaced by the prediction (which has inflated intensity), which in turn means that several of the predictions in the nest iteration will have inflated intensities. That can then cause additional volumes for that slice to be labelled as outliers, etc. The end result is that for some slices all volumes (in one or more shells) are labelled as outliers. And in the "corrected" images there may be one or several clusters, typically coinciding with areas of Jacobians>3 in the inverse field, with very high intensities. This problem is not unique to LS-HCP data, and I have even been sent a data set by Fidel where this is a problem in UKB data. In the cases I have seen it has been primarily located near the ear canals, where the Jacobians of the inverse field can get large. The solution to this was to limit (mask) the comparison between prediction and observation to voxels where the Jacobian of the inverse field was < 3 (default, it is accessible at the command line with --jacut). The solution worked well on the UKB data set from Fidel, which is what I used for testing because of the much shorter run-time than for the LS-HCP data. But it turned out not to solve the problem for the LS-HCP data.
- Pooled summary statistics for outlier detection. I must admit I didn't even remember that I did it this way. But what I did was to consider all slice-sum-of-difference values to come from the same distribution, which is unsurprisingly not a very sensible thing to do. If looking at the distributions from different shells it is quite clear that the low b-value shells have a much greater standard deviation than the high b-value shells. If there is a reasonable balance in terms of number of volumes per shell (like in the original HCP data) it will not have any "catastrophic" consequences. The main consequence in that case would be slightly inflated statistics for the lower shell and slightly too low statistics for the higher shell. I am now thinking this is one of the reasons why we tend to detect more outliers in the lower b-value shells. But if there is one (or more) very low b-value shells with a very small number of volumes (in the LS-HCP data there is a b=200 shell with 3 directions and a b=500 shell with 6 directions) the summary statistics (mean and standard deviation) is completely dominated by the higher b-values with more than an order of magnitude more volumes. The means that the slice-sum-of-difference values for the b=200 and b=500 shells are transformed to z-scores using a standard deviation that is much too small, leading to a lot of false positives. When combined with the problem of large Jacobian determinants (see above) it leads to these false positives being located in particular slices (near the ear canals in the examples that I have seen) and on gets a similar situation as described above where the actual data is being replaced by predictions with too high signal. And similar to above this then spreads to the other predictions in the same shell. I have solved that by instead calculating shell-wise summary statistics, something that should obviously have been done from the beginning. Using shell-wise stats is now the default, since this is clearly the right thing to do. That will mean that even on less extreme data sets, e.g. two shell data with ~equal number volumes per shell, there will be appreciable differences. If someone was to re-run their data they would see fewer outliers detected in the lower b-shell and more in the higher b-shell. There is therefore the option to set --ol_ss=pooled to use the same model as before if one wants to keep things constant. This will have to be carefully pointed out in the release notes. I have tested it on one of the problematic data sets sent me by Mike H and Selena, and it seems to run fine. No need to specify reference volumes, and the results look very good. I have also run all the Feeds tests with this new version, and they have all passed.