ENH: CPU implementation of CostFxnLogJacobianSingularValues class

Builds on utils!18 (merged), !21 (merged), !24 (merged), !26 (merged), and !32 (merged), all of which must be merged first.

This MR provides a CPU implementation of the CostFxnLogJacobianSingularValues class. After the re-organisation in !26 (merged), achieving this was very straightforward. The functionality has been split into a number of separate files:

  • CostFxnLogJacobianSingularValues (.h/.cpp): GPU/CPU-agnostic class definition and logic which is common to both implementations.
  • CostFxnLogJacobianSingularValuesCore (.h/GPU.cu/CPU.cpp): Core functions with a GPU/CPU-agnostic interface, but with separate GPU/CPU implementations.
  • CostFxnLogJacobianSingularValuesKernels (.inl/GPU.cuh/CPU.h/Impl.inl/GPU.cu/CPU.cpp): Main kernel routines. This code was automatically generated with the MATLAB Symbolic Math Toolbox and is identical for the CPU and GPU implementations, so it lives in "inlined" (.inl) files, which are then included in separate CPU/GPU header and source files.
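As a rough illustration of the shared-inline layout described above, the sketch below shows one common way to let a single function body compile for both targets. It is not the code in this MR: the macro name KERNEL_QUALIFIERS, the function sum_log_squared, and its placeholder arithmetic are all hypothetical, and the "files" are collapsed into one translation unit so the example stands alone.

```cpp
#include <cmath>

// Hypothetical stand-in for the CPU- and GPU-specific headers. In a split
// layout, each header would define the qualifiers and then include the
// shared .inl file. __CUDACC__ is defined by nvcc when compiling CUDA code.
#ifdef __CUDACC__
  #define KERNEL_QUALIFIERS __host__ __device__ inline
#else
  #define KERNEL_QUALIFIERS inline
#endif

// ---- everything below would live in the shared .inl file ----

// Placeholder arithmetic only (NOT the generated kernel code):
// sums the squared logarithms of three positive values.
KERNEL_QUALIFIERS float sum_log_squared(float s0, float s1, float s2) {
    const float l0 = std::log(s0);
    const float l1 = std::log(s1);
    const float l2 = std::log(s2);
    return l0 * l0 + l1 * l1 + l2 * l2;
}
```

With this arrangement the generated body is written once, and only the thin wrapper headers differ between the g++ and nvcc builds.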

The main kernel routines have been adjusted so that, instead of accepting pointers to the input/output arrays plus an offset into those arrays, they accept the input values directly, together with pointers to the output locations. The caller is now responsible for indexing, which allows the routines to be easily re-used by both the CPU and GPU implementations.
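The signature change can be sketched as follows. This is a minimal illustration under assumed names (kernel_with_offset, kernel_by_value, and run_cpu are hypothetical, and the arithmetic is a placeholder), not the generated kernels themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Old style (hypothetical): the kernel receives whole arrays plus an offset,
// so the indexing scheme is baked into the kernel.
inline void kernel_with_offset(const float* a, const float* b,
                               float* sums, float* prods,
                               std::size_t offset) {
    sums[offset]  = a[offset] + b[offset];
    prods[offset] = a[offset] * b[offset];
}

// New style (hypothetical): inputs are passed by value and each output is
// written through a pointer to its destination. The kernel knows nothing
// about how the caller lays out or traverses its data.
inline void kernel_by_value(float a, float b, float* out_sum, float* out_prod) {
    *out_sum  = a + b;
    *out_prod = a * b;
}

// CPU caller: a plain loop decides the indexing. A CUDA caller could instead
// derive the index from blockIdx/threadIdx and call the same kernel body.
inline void run_cpu(const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& sums, std::vector<float>& prods) {
    assert(a.size() == b.size());
    sums.resize(a.size());
    prods.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        kernel_by_value(a[i], b[i], &sums[i], &prods[i]);
    }
}
```

Keeping the indexing in the caller means that a serial loop and a per-thread GPU launch can share one kernel body.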

The CostFxnLogJacobianSingularValues routines still call out to some GPU functions in CostFxnSplineUtils, but this is through a GPU/CPU-agnostic interface, and will be addressed in a separate MR.

This MR also removes the CostFxnSSDWarpFieldSymmetricMaskedExcludedUtils.cpp file, and merges its logic back into the CostFxnSSDWarpFieldSymmetricMaskedExcluded.cpp file. The Utils file was originally split out because it needed to be compiled with nvcc, due to the use of thrust:: data types in the SparseDiagonalMatrixTiled class. The SparseDiagonalMatrixTiled interface is now decoupled from CUDA, so the CostFxnSSDWarpFieldSymmetricMaskedExcluded logic, which was always entirely CPU-based, can now be compiled entirely with g++.

Edited by Paul McCarthy
