ENH: CPU implementation of `VolumeBSpline` class
Note that this MR depends on the following:
New CPU implementation of the `VolumeBSpline` class, using the spline interpolation implementation from `miscmaths/splinterpolator.h`. This gives results identical to the existing CUDA-based implementation, up to floating point precision.
The MMORF code has been restructured slightly to accommodate both CPU and GPU implementations:
- The `VolumeBSpline` class (`src/VolumeBSpline.cpp`) has been made CPU/GPU-agnostic. Data access/interpolation is now delegated to a `VolumeBSplineSampler` instance, with its interface defined in `inc/VolumeBSplineSampler.h`.
- CUDA-based interpolation is handled by a `VolumeBSplineSampler` implementation in `src/VolumeBSplineSamplerGPU.cu`.
- CPU-based interpolation is handled by an implementation in `src/VolumeBSplineSamplerCPU.cpp`.
- The implementation is selected at compile time, by linking against either `VolumeBSplineSamplerCPU.o` or `VolumeBSplineSamplerGPU.o`.
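To illustrate the delegation pattern, the sketch below shows what such a sampler interface might look like. The class and method names beyond `VolumeBSplineSampler` itself are hypothetical, and the toy CPU implementation uses nearest-neighbour lookup rather than real B-spline interpolation; the actual interface in `inc/VolumeBSplineSampler.h` will differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: VolumeBSpline delegates all data access/interpolation
// to an implementation of this abstract class. The GPU and CPU samplers would
// both derive from it, and exactly one is linked into the final binary.
class VolumeBSplineSampler {
public:
  virtual ~VolumeBSplineSampler() = default;
  // Sample the interpolated volume at a continuous coordinate (x, y, z).
  virtual float sample(float x, float y, float z) const = 0;
};

// Toy CPU implementation for illustration only: nearest-neighbour lookup
// instead of spline interpolation.
class NearestSamplerCPU : public VolumeBSplineSampler {
public:
  NearestSamplerCPU(std::vector<float> data,
                    std::size_t nx, std::size_t ny, std::size_t nz)
    : data_(std::move(data)), nx_(nx), ny_(ny), nz_(nz) {}

  float sample(float x, float y, float z) const override {
    auto ix = static_cast<std::size_t>(x + 0.5f);  // round to nearest voxel
    auto iy = static_cast<std::size_t>(y + 0.5f);
    auto iz = static_cast<std::size_t>(z + 0.5f);
    return data_[ix + nx_ * (iy + ny_ * iz)];      // x-fastest flattening
  }

private:
  std::vector<float> data_;
  std::size_t nx_, ny_, nz_;
};
```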
The `VolumeBSplineSamplerCPU` implementation parallelises sampling across voxels via `std::thread` objects. This is performed in a general-purpose `parfor` function, defined in `inc/Threading.h`. The `splinterpolator` module also uses `std::thread` objects to parallelise the initial calculation of the spline coefficients. The `parfor` function essentially performs the same task as `std::transform`, but allows explicit control over the number of threads used.
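A `parfor`-style helper along these lines can be sketched as follows. This is an illustration of the idea, not the actual code in `inc/Threading.h`; the real signature may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch of a parfor-style helper: apply func to every index in [0, n),
// splitting the range into contiguous chunks, one per std::thread.
// Unlike std::transform, the caller controls the number of threads.
template <typename Func>
void parfor(std::size_t n, unsigned nthreads, Func func) {
  if (nthreads <= 1) {                       // serial fallback
    for (std::size_t i = 0; i < n; ++i) func(i);
    return;
  }
  std::vector<std::thread> threads;
  std::size_t chunk = (n + nthreads - 1) / nthreads;  // ceil(n / nthreads)
  for (unsigned t = 0; t < nthreads; ++t) {
    std::size_t begin = t * chunk;
    std::size_t end   = std::min(n, begin + chunk);
    if (begin >= end) break;                 // no work left for this thread
    threads.emplace_back([=] {
      for (std::size_t i = begin; i < end; ++i) func(i);
    });
  }
  for (auto& th : threads) th.join();        // wait for all chunks to finish
}
```

Each thread owns a disjoint index range, so `func` may write to distinct elements of a shared output buffer without synchronisation.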
During this work, an issue was found in the CUDA interpolation routines, implemented in the vendored `CubicInterpolationCUDA` library. The built-in `tex3D<float>` linear interpolation routines (which are used for spline interpolation) use 8-bit precision to discretise texture coordinates, which resulted in artifacts in interpolated volume gradient fields. They have therefore been replaced with our own linear interpolation routine, which uses 32-bit precision with no apparent drop in speed. Note that this change does result in slight numeric differences in MMORF outputs.
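The effect of the coordinate quantisation can be illustrated with a 1D host-side sketch (this is not the code in the MR, just a demonstration of the precision difference): quantising the fractional coordinate to 8-bit fixed point means the interpolation weight can only take 256 distinct values per cell, which is too coarse for smooth gradient fields.

```cpp
#include <cassert>
#include <cmath>

// Linear interpolation between v0 and v1 with full float precision,
// as in a software interpolation routine.
float lerp_full(float v0, float v1, float frac) {
  return v0 + frac * (v1 - v0);
}

// Emulation of the texture-unit behaviour described above: the fractional
// coordinate is snapped to 8-bit fixed point (steps of 1/256) before
// interpolating, so small coordinate changes can produce no change in output.
float lerp_quantised(float v0, float v1, float frac) {
  float q = std::round(frac * 256.0f) / 256.0f;
  return v0 + q * (v1 - v0);
}
```

For a fractional coordinate of 0.001, the quantised version snaps to 0 and returns `v0` exactly, while the full-precision version moves smoothly between the two samples; this is the kind of stair-stepping that shows up as artifacts in gradient fields.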
Another change in this MR is a rearrangement of some utility functions for converting between 3D and 1D (flattened/unravelled) indices (e.g. `index_vol_to_lin`, `index_lin_to_vol`, etc.). These functions need to be compiled for both host and device in the CUDA implementation, and also for the CPU implementation. To avoid duplicating code, these routines have been moved into an "inline" file, `src/IndexUtils.inl`, which is included (via `#include "IndexUtils.inl"`) in both `src/IndexUtilsCPU.cpp` and `src/IndexUtilsGPU.cu`. This is slightly clunky, but avoids the need to maintain separate identical implementations.
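For reference, index conversions of this kind can be sketched as below. The function names follow the MR, but the signatures and the x-fastest flattening order are assumptions; in the real `src/IndexUtils.inl` the routines would additionally carry `__host__ __device__` qualifiers when compiled for CUDA.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Sketch of 3D <-> 1D index conversion for a volume of size nx * ny * nz,
// assuming x varies fastest in the flattened array. Illustrative signatures;
// the actual MMORF routines may differ.
inline std::size_t index_vol_to_lin(std::size_t x, std::size_t y, std::size_t z,
                                    std::size_t nx, std::size_t ny) {
  return x + nx * (y + ny * z);
}

inline std::array<std::size_t, 3> index_lin_to_vol(std::size_t lin,
                                                   std::size_t nx,
                                                   std::size_t ny) {
  return { lin % nx,            // x: fastest-varying dimension
           (lin / nx) % ny,     // y
           lin / (nx * ny) };   // z: slowest-varying dimension
}
```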