ENH: CPU implementation of `SparseDiagonalMatrixTiled` class

Paul McCarthy requested to merge enh/spdt-cpu into master

This MR depends on !21 and !24, which must be merged first.

The SparseDiagonalMatrixTiled class has been restructured to allow for separate CPU and GPU implementations.

  • There is a single SPDT class definition in inc/SparseDiagonalMatrixTiled.h. The class has been adjusted so that it does not use the PIMPL idiom, but instead holds a reference to an opaque SPDT::Data instance.
  • Platform-agnostic logic is contained in src/SparseDiagonalMatrixTiled.cpp.
  • CPU- and GPU-specific logic is contained in src/SparseDiagonalMatrixTiledCPU.cpp and src/SparseDiagonalMatrixTiledGPU.cpp respectively. Each file also contains its own definition of the SPDT::Data class - the GPU implementation uses thrust::device_vector to store the matrix data, and the CPU implementation uses std::vector.
  • The implementation is selected at build time by linking either SparseDiagonalMatrixTiledCPU.o or SparseDiagonalMatrixTiledGPU.o.
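
For illustration, the layout described above could look roughly like the following single-file sketch. It is not the code in this MR: the constructor signature, the size, fill and sum functions, the member names, and the use of std::unique_ptr to hold the SPDT::Data instance are all assumptions made for the example. Only the split into header, platform-agnostic source and CPU source, and the use of std::vector in the CPU SPDT::Data, come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// --- Would live in inc/SparseDiagonalMatrixTiled.h -----------------------
class SPDT {
public:
    struct Data;  // opaque: declared here, defined once per platform

    explicit SPDT(std::size_t n_values);  // hypothetical constructor
    ~SPDT();

    std::size_t size() const;   // platform-agnostic
    void        fill(float v);  // platform-specific (touches Data)
    float       sum() const;    // platform-specific (touches Data)

private:
    std::size_t           n_values_;
    std::unique_ptr<Data> data_;  // assumption: owning pointer to the opaque data
};

// --- Would live in src/SparseDiagonalMatrixTiled.cpp ---------------------
// Needs no knowledge of how the values are stored.
std::size_t SPDT::size() const { return n_values_; }

// --- Would live in src/SparseDiagonalMatrixTiledCPU.cpp ------------------
// The CPU flavour of Data keeps the values in host memory.
struct SPDT::Data {
    std::vector<float> values;
};

SPDT::SPDT(std::size_t n_values)
    : n_values_(n_values),
      data_(new Data{std::vector<float>(n_values, 0.0f)}) {}

// Defined here, where Data is a complete type, so unique_ptr can delete it.
SPDT::~SPDT() = default;

void SPDT::fill(float v) {
    for (float& x : data_->values) x = v;
}

float SPDT::sum() const {
    assert(data_->values.size() == n_values_);
    float total = 0.0f;
    for (float x : data_->values) total += x;
    return total;
}
```

In a real build the three parts would be separate translation units, and a GPU object file would supply the same member functions with an SPDT::Data that holds a thrust::device_vector instead. Client code only sees the header, so it does not change when the other object file is linked.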

Some changes have been made to the SPDT interface to remove coupling to CUDA device memory, and allow for a CPU implementation:

  • The SPDT.get_element and SPDT.set_element functions have been removed. The only use of these functions was by the CostFxnPointCloudWarpField class, which has been adjusted so that the matrix data is created on the CPU, and then copied into the SPDT matrix via a call to the new SPDT.set_elements function. This change has resulted in a substantial speed-up of the initialisation logic in the CostFxnPointCloudWarpField class, and has allowed CPU-GPU synchronisation logic to be removed from the SPDT class.

  • The SPDT.get_raw_pointer and SPDT.get_const_raw_pointer functions have been removed and replaced with SPDT.get_tile and SPDT.set_tile functions, which allow a client to read/write entire matrix tiles. This means that a copy must be made when reading/writing matrix data, but this doesn't seem to have a substantial effect on performance.
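
A minimal CPU-only sketch of the copy-based access pattern described in the two points above is given here. The class name TiledMatrixSketch, the function signatures, and the storage layout (a grid of tiles, each holding tile_size contiguous values in one flat std::vector) are assumptions for the example and may not match the real SPDT interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the CPU SPDT. Assumed layout: tile (r, c) occupies
// tile_size consecutive values starting at (r * tile_cols + c) * tile_size.
class TiledMatrixSketch {
public:
    TiledMatrixSketch(std::size_t tile_rows, std::size_t tile_cols,
                      std::size_t tile_size)
        : tile_rows_(tile_rows), tile_cols_(tile_cols), tile_size_(tile_size),
          values_(tile_rows * tile_cols * tile_size, 0.0f) {}

    // Bulk write: the caller builds all values on the CPU, then hands them
    // over in one call instead of setting elements one at a time.
    void set_elements(const std::vector<float>& values) {
        assert(values.size() == values_.size());
        values_ = values;
    }

    // Read one tile. Returns a copy, so no pointer into internal storage
    // ever reaches the caller.
    std::vector<float> get_tile(std::size_t r, std::size_t c) const {
        const auto first =
            values_.begin() + static_cast<std::ptrdiff_t>(offset(r, c));
        return std::vector<float>(
            first, first + static_cast<std::ptrdiff_t>(tile_size_));
    }

    // Write one tile by copying the caller's values into internal storage.
    void set_tile(std::size_t r, std::size_t c, const std::vector<float>& tile) {
        assert(tile.size() == tile_size_);
        std::copy(tile.begin(), tile.end(),
                  values_.begin() + static_cast<std::ptrdiff_t>(offset(r, c)));
    }

private:
    std::size_t offset(std::size_t r, std::size_t c) const {
        assert(r < tile_rows_ && c < tile_cols_);
        return (r * tile_cols_ + c) * tile_size_;
    }

    std::size_t tile_rows_, tile_cols_, tile_size_;
    std::vector<float> values_;
};
```

A caller would fill a std::vector<float> on the host, pass it to set_elements once, and then use get_tile and set_tile for any later per-tile access. Because every access goes through a copy, the same calls could be backed by device memory without exposing a raw device pointer.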

In order to allow the CPU SPDT implementation to be used with GPU code (e.g. cost functions for which we don't yet have CPU implementations), a few functions have been added in src/SparseDiagonalMatrixTiled_temporary_shim.cu - these functions simply copy between thrust::device_vector/thrust::host_vector and std::vector, and can be removed once we have a full CPU implementation.

Edited by Paul McCarthy
