gen_STRUCTMOTION/script_predict.py fails

With some success, I've been using the UKB_UNCONFOUND_v2 pipeline, but instead of installing the crucial bb_pipeline_v_2.0 myself, I have been using the installed version on Rescomp.

This worked fine until I ran gen_STRUCTMOTION/script_predict.py, which failed with the following error:

```
Traceback (most recent call last):
  File "./script_predict.py", line 35, in <module>
    final_model = pickle.load(open('MODEL/model.p','rb'))
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/sklearn/grid_search.py", line 24, in <module>
    from .cross_validation import check_cv
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/sklearn/cross_validation.py", line 32, in <module>
    from .metrics.scorer import check_scoring
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/sklearn/metrics/__init__.py", line 7, in <module>
    from .ranking import auc
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/sklearn/metrics/ranking.py", line 32, in <module>
    from ..utils.stats import rankdata
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/sklearn/utils/stats.py", line 2, in <module>
    from scipy.stats import rankdata as _sp_rankdata
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/scipy/stats/__init__.py", line 344, in <module>
    from .stats import *
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/scipy/stats/stats.py", line 176, in <module>
    from . import distributions, mstats_basic, _stats
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/scipy/stats/distributions.py", line 13, in <module>
    from . import _continuous_distns
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/scipy/stats/_continuous_distns.py", line 17, in <module>
    from scipy._lib._numpy_compat import broadcast_to
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/scipy/_lib/_numpy_compat.py", line 10, in <module>
    from numpy.testing.nosetester import import_nose
ImportError: No module named 'numpy.testing.nosetester'
```

This seems to be the same error as in this Stack Exchange posting, where it was identified as a three-way conflict between particular versions of numpy, scipy and sklearn.
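To see which side of that conflict a given environment is on, a short check can be run inside the activated bb_python. This is a sketch, not part of the pipeline: the helper names `module_available`, `package_version` and `report` are hypothetical. It prints the `__version__` of numpy, scipy and sklearn and tests whether `numpy.testing.nosetester`, the module that scipy's `_numpy_compat.py` imports in the traceback, can be located. It avoids f-strings so that it also runs under the Python 3.5 interpreter used here.

```python
import importlib
import importlib.util


def module_available(name):
    """Return True if the (possibly dotted) module `name` can be located.

    find_spec() imports the parent package of a dotted name first. A missing
    parent raises ImportError, and some older Pythons raise AttributeError
    when the parent is not a package; both count as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, AttributeError, ValueError):
        return False


def package_version(name):
    """Import `name` and return its __version__, or a note on why it failed."""
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # broken scipy/sklearn installs may not raise ImportError
        return "not importable ({}: {})".format(type(exc).__name__, exc)
    return getattr(module, "__version__", "unknown (no __version__)")


def report():
    for pkg in ("numpy", "scipy", "sklearn"):
        print("{:8s} {}".format(pkg, package_version(pkg)))
    # scipy/_lib/_numpy_compat.py (scipy 0.18.0 here) imports this module;
    # per the traceback it is missing from the numpy installed on Rescomp.
    print("numpy.testing.nosetester available:",
          module_available("numpy.testing.nosetester"))


if __name__ == "__main__":
    report()
```

Running it once in the Rescomp bb_python and once in a freshly built one should show whether the numpy version is the only difference.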

Notably, the versions specified in bb_pipeline_v_2.0/initvars differ between the repo and Rescomp. In the bb_pipeline_v_2.0 repo, bb_python/python_installation/install_bb_python.sh installs only 10 packages, whereas the same install_bb_python.sh on Rescomp installs 62. Both pin numpy, scipy and sklearn to versions that shouldn't cause this problem, so subsequent installs must be pulling numpy up past its pin:

numpy repo pin: 1.11.1
numpy repo version after all installs: 1.12.0

numpy Rescomp pin: 1.12.0
numpy Rescomp version after all installs: 1.18.1

For scipy and scikit-learn, the repo and Rescomp versions are the same, and the pinned and final installed versions also match (0.18.0 and 0.17.1, respectively).
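Drift like this can be caught right after installation by comparing the pins with what `pip freeze` reports. Below is a minimal sketch under stated assumptions: the `pins` dict is written by hand from the Rescomp pins listed above (the exact pin syntax in install_bb_python.sh is not reproduced here), only plain `name==version` freeze lines are parsed, and `find_drift`, `parse_freeze` and `normalise` are hypothetical helpers.

```python
import re


def normalise(name):
    # pip treats "scikit-learn", "scikit_learn" and "Scikit.Learn" as one project.
    return re.sub(r"[-_.]+", "-", name.strip()).lower()


def parse_freeze(text):
    """Map normalised package names to versions from `pip freeze`-style text.

    Only simple `name==version` lines are used; editable installs, URLs and
    comments are skipped.
    """
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        versions[normalise(name)] = version.strip()
    return versions


def find_drift(pins, freeze_text):
    """Return {name: (pinned, installed)} for each pin that is not honoured.

    `installed` is None when the package is absent from the freeze output.
    """
    installed = parse_freeze(freeze_text)
    drift = {}
    for name, pinned in pins.items():
        actual = installed.get(normalise(name))
        if actual != pinned:
            drift[name] = (pinned, actual)
    return drift


if __name__ == "__main__":
    # Rescomp pins and final versions as reported in this issue.
    pins = {"numpy": "1.12.0", "scipy": "0.18.0", "scikit-learn": "0.17.1"}
    example_freeze = "numpy==1.18.1\nscipy==0.18.0\nscikit-learn==0.17.1\n"
    print(find_drift(pins, example_freeze))  # {'numpy': ('1.12.0', '1.18.1')}
```

In a real environment the freeze text could come from `subprocess.check_output([sys.executable, "-m", "pip", "freeze"]).decode()`, run at the end of the install script so that a silently upgraded numpy is reported immediately rather than surfacing later as an import error.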

Once I switched to my own bb_python (with only the 10 packages), that error went away. Instead I get another error :( , one I'm less able to diagnose:

```
(bb_python) [kfh142@rescomp1 gen_STRUCTMOTION]$ ./script_predict.py 
Traceback (most recent call last):
  File "./script_predict.py", line 36, in <module>
    prediction  = final_model.predict(test_data_norm)
  File "/gpfs2/well/nichols/shared/UK_biobank_pipeline_v_1/bb_python/bb_python/lib/python3.5/site-packages/sklearn/utils/metaestimators.py", line 37, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "/gpfs2/well/nichols/shared/UK_biobank_pipeline_v_1/bb_python/bb_python/lib/python3.5/site-packages/sklearn/grid_search.py", line 435, in predict
    return self.best_estimator_.predict(X)
  File "/gpfs2/well/nichols/shared/UK_biobank_pipeline_v_1/bb_python/bb_python/lib/python3.5/site-packages/sklearn/linear_model/base.py", line 200, in predict
    return self._decision_function(X)
  File "/gpfs2/well/nichols/shared/UK_biobank_pipeline_v_1/bb_python/bb_python/lib/python3.5/site-packages/sklearn/linear_model/base.py", line 185, in _decision_function
    dense_output=True) + self.intercept_
  File "/gpfs2/well/nichols/shared/UK_biobank_pipeline_v_1/bb_python/bb_python/lib/python3.5/site-packages/sklearn/utils/extmath.py", line 184, in safe_sparse_dot
    return fast_dot(a, b)
ValueError: shapes (45480,9) and (11,) not aligned: 9 (dim 1) != 11 (dim 0)
```

I assume this is linked to PREDICT_DATA having the wrong number of columns: the data passed to predict has 9 columns, while the model's coefficient vector has 11 entries. I suspect an SGE error is lurking here that I'm still tracking down.
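Per the traceback, the linear model's `_decision_function` takes a dot product of the input with the fitted coefficients and adds `intercept_`, so the input must have exactly as many columns as `coef_` has entries (here 9 against 11). A minimal pre-flight check is sketched below, assuming the pickled model is the `sklearn.grid_search` search object seen in the traceback, so that the fitted linear model is `final_model.best_estimator_`. `check_feature_count` is a hypothetical helper, not part of the pipeline.

```python
import numpy as np


def check_feature_count(X, coef):
    """Raise a descriptive ValueError if X's column count does not match coef.

    A single-output linear model has coef_ of shape (n_features,) (as in the
    traceback, where it is (11,)); a multi-output one has shape
    (n_targets, n_features). Comparing the last axis covers both cases.
    """
    X = np.asarray(X)
    coef = np.asarray(coef)
    if X.ndim != 2:
        raise ValueError("expected a 2-D data matrix, got shape {}".format(X.shape))
    n_expected = coef.shape[-1]
    if X.shape[1] != n_expected:
        raise ValueError(
            "data has {} columns but the model was trained on {} features; "
            "check how the prediction input was assembled".format(X.shape[1], n_expected))
    return X


if __name__ == "__main__":
    # Reproduce the mismatch from the traceback: 9 data columns vs 11 coefficients.
    try:
        check_feature_count(np.zeros((45480, 9)), np.zeros(11))
    except ValueError as err:
        print(err)
```

In script_predict.py this would go just before the `predict` call, as `check_feature_count(test_data_norm, final_model.best_estimator_.coef_)`. It does not fix anything by itself, but it fails with a message pointing at the input data rather than at a NumPy dot product, which narrows the search to whatever step produces PREDICT_DATA.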

Edited Feb 22, 2021 by Tom Nichols