1. 10 Sep, 2021 1 commit
    • Improve robustness with respect to failed jobs · ca0df73b
      Duncan Mortimer authored
        Jobs can fail in such a way that the epilog doesn't run.
        This now checks the lock file against running SGE shepherd processes and
        automatically removes stale locks that no longer correspond to a running job.
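The stale-lock check described in ca0df73b can be sketched as follows. This is a minimal Python sketch, not the script from the commit, and it rests on assumptions the log does not state: each lock is a hypothetical `gpuN.lock` file whose contents are the owning job ID, and shepherd processes are recognised by an argv[0] of the form `sge_shepherd-<job_id>`, which is how Grid Engine execution hosts usually list them in `ps`. The function names are illustrative only.

```python
from __future__ import annotations

import os
import re
from pathlib import Path

# Assumption: execd starts each shepherd with argv[0] = "sge_shepherd-<job_id>".
SHEPHERD_RE = re.compile(r"^sge_shepherd-(\d+)")


def running_shepherd_job_ids(proc_root: str = "/proc") -> set[str]:
    """Return the job IDs of all sge_shepherd processes visible under proc_root."""
    job_ids: set[str] = set()
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-PID entries such as /proc/self
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited, or its cmdline is unreadable
        # cmdline is NUL-separated; argv[0] is the first field.
        argv0 = os.path.basename(raw.split(b"\0", 1)[0].decode(errors="replace"))
        match = SHEPHERD_RE.match(argv0)
        if match:
            job_ids.add(match.group(1))
    return job_ids


def remove_stale_locks(lock_dir: str, running: set[str]) -> list[str]:
    """Delete hypothetical gpuN.lock files whose recorded job has no shepherd.

    Returns the names of the removed lock files.
    """
    removed = []
    for lock in sorted(Path(lock_dir).glob("*.lock")):
        try:
            owner = lock.read_text().strip()
        except OSError:
            continue  # lock vanished between the glob and the read
        if owner not in running:
            # No live shepherd for this job: its epilog never ran, so clear the lock.
            lock.unlink(missing_ok=True)
            removed.append(lock.name)
    return removed
```

On an execution host this might be invoked as `remove_stale_locks(lock_dir, running_shepherd_job_ids())`, where `lock_dir` is wherever the prolog actually writes its locks. Reading `/proc` directly avoids parsing `ps` output, and a lock whose job still has a live shepherd is left alone, so only locks orphaned by a job that died without running its epilog are removed.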
  2. 15 Jul, 2020 3 commits
  3. 24 Apr, 2020 3 commits
  4. 23 Apr, 2020 2 commits
  5. 18 Feb, 2020 1 commit
  6. 04 Feb, 2019 3 commits
  7. 13 Oct, 2017 1 commit
  8. 11 Oct, 2017 1 commit
  9. 04 Oct, 2017 3 commits
  10. 03 Oct, 2017 2 commits
    • Include documentation about cuda.conf · 6a4ab0cc
      Duncan Mortimer authored
    • Multiple fixes · ac0cab5b
      Duncan Mortimer authored
        Improve security of CUDA epilog
        Ensure the device mode allows access only by the cluster.
        Only assign GPUs designated for cluster use
        Correct behaviour when the first GPU is already allocated: don't fail the script prematurely
        Improve clean-up of temporary files
        Mail admins when a fatal error causes the queue to go into the error state
        Move device special file ownership states and exit states to constants defined at the start of the script
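The allocation behaviour described in ac0cab5b (assign only GPUs designated for cluster use, and don't fail just because the first GPU is already taken) might look like the following minimal Python sketch. The `gpuN.lock` naming, the lock directory and `claim_gpu` itself are hypothetical; the log does not show how the real scripts record allocations.

```python
from __future__ import annotations

import os
from pathlib import Path


def claim_gpu(lock_dir: str, cluster_gpus: list[int], job_id: str) -> int | None:
    """Claim the first free GPU from cluster_gpus by creating its lock file.

    An already-allocated GPU is skipped rather than treated as fatal, so a busy
    first GPU does not end the search. Returns the claimed index, or None.
    """
    for gpu in cluster_gpus:  # only GPUs designated for cluster use are considered
        lock = Path(lock_dir) / f"gpu{gpu}.lock"  # hypothetical lock naming
        try:
            # O_CREAT | O_EXCL fails if the file exists, so the claim is atomic.
            fd = os.open(lock, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            continue  # already allocated: try the next designated GPU
        with os.fdopen(fd, "w") as fh:
            fh.write(f"{job_id}\n")  # record the owning job for later checks
        return gpu
    return None  # every designated GPU is in use
```

Creating the lock with `O_CREAT | O_EXCL` makes check-and-claim a single atomic step, so two jobs starting at the same moment cannot both take the same GPU, and the function only reports failure after every designated GPU has been tried.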
  11. 02 Oct, 2017 1 commit
  12. 21 Sep, 2017 2 commits
  13. 20 Sep, 2017 9 commits
  14. 18 Aug, 2017 1 commit
  15. 16 Aug, 2017 2 commits