ENH: CPU implementation of CostFxnKernels
This MR adds CPU implementations of the core cost function kernels, which calculate the values in Hessian matrices and are used by all of the main MMORF cost functions. This was the final change needed to fully port MMORF to the CPU - it is now possible to build and run MMORF without a GPU.
From a practical point of view, the outputs from the GPU and CPU builds are equivalent - with the same inputs and options, they produce effectively identical warp fields (i.e. warp fields which will register volumes equally well).
The outputs do differ numerically by a small amount, however - I think this is down to two main reasons:
- Calculating the value for an element in a Hessian matrix requires a large number of floating point calculations across a large numeric range (i.e. involving both very small and very large numbers). Floating point addition is not associative, so when the order in which these calculations are performed changes, the end result can differ. Consecutive runs of the GPU build, and of the CPU build when using multiple threads, will produce slightly different (but still effectively identical) results. In contrast, consecutive runs of the CPU build, when restricted to a single thread, will produce numerically identical results.
- While the CPU and GPU spline interpolation implementations produce very similar results, there can be very small differences in interpolated volumes at the FOV boundaries (i.e. voxels at end slices). For example, a single voxel on the boundary slice may be thresholded in one output, but preserved in another.
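The first point can be illustrated with a minimal, standalone sketch. It is not taken from the MMORF sources: the function names and input values are made up for illustration, and it assumes IEEE 754 single precision arithmetic compiled without fast-math style reordering.

```cpp
#include <cassert>
#include <vector>

// Accumulate in single precision, from the first element to the last.
float sum_forward(const std::vector<float>& values) {
    float acc = 0.0f;
    for (float x : values) acc += x;
    return acc;
}

// Accumulate the same values, from the last element to the first.
float sum_reverse(const std::vector<float>& values) {
    float acc = 0.0f;
    for (auto it = values.rbegin(); it != values.rend(); ++it) acc += *it;
    return acc;
}

// Hypothetical input mixing very large and very small magnitudes.
std::vector<float> example_values() {
    return {1.0e8f, -1.0e8f, 1.0f};
}

// The two orderings disagree for this input, even though the
// mathematical sum is the same.
void check_order_sensitivity() {
    const std::vector<float> v = example_values();
    assert(sum_forward(v) != sum_reverse(v));
}
```

Summing forwards, the two large values cancel first and the result is 1. Summing in reverse, the 1 is absorbed when it is added to -1e8 (adjacent single precision values are 8 apart at that magnitude), and the result is 0. A multi-threaded or GPU reduction does not fix the order in which partial sums are combined, which is consistent with the run-to-run differences described above.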
This change was accomplished by merging the CostFxnKernels and CostFxnSplineUtils into a single module to provide the MMORF::CostFxnKernels namespace, which is split across several files:
- inc/CostFxnKernels.h declares CPU/GPU-agnostic functions which form the interface used by the cost functions.
- inc/CostFxnKernelsCoreGPU.cuh declares some CUDA kernel functions used by the GPU implementation.
- src/CostFxnKernels.cpp defines a few CPU/GPU-agnostic utility functions.
- src/CostFxnKernelsCPU.cpp provides the CPU implementation of the kernel functions.
- src/CostFxnKernelsGPU.cu provides the GPU implementation of the kernel functions.
- src/CostFxnKernelsCoreGPU.cu provides the CUDA kernels used by CostFxnKernelsGPU.cu. This could be merged into CostFxnKernelsGPU.cu, but has been left in a separate file for the time being.
The implementation can be selected at link time by linking against either CostFxnKernelsCPU.o or CostFxnKernelsGPU.o/CostFxnKernelsCoreGPU.o.
Unit tests for all of the functions declared in inc/CostFxnKernels.h have been added, using benchmark data generated with MMORF v0.3.3.
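Because CPU and GPU results can differ by small numerical amounts, comparisons against benchmark data generally need a tolerance rather than exact equality. The sketch below shows one common way to do this. It is an assumption about how such a check could look, not a description of the tests in this MR, and `all_close` with its tolerance parameters is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if every element of `actual` matches the
// corresponding element of `expected` to within a combined absolute and
// relative tolerance.
bool all_close(const std::vector<double>& actual,
               const std::vector<double>& expected,
               double rel_tol,
               double abs_tol) {
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    if (actual.size() != expected.size()) return false;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const double diff  = std::fabs(actual[i] - expected[i]);
        const double scale = std::max(std::fabs(actual[i]), std::fabs(expected[i]));
        // Written as a negated <= so that NaN values are rejected.
        if (!(diff <= abs_tol + rel_tol * scale)) return false;
    }
    return true;
}
```

A relative term keeps the check meaningful across the wide numeric range found in Hessian entries, while the absolute term handles values close to zero. Suitable tolerance values depend on the data and are not specified here.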