With some success, I've been using the
UKB_UNCONFOUND_v2 pipeline, but instead of installing the crucial
bb_pipeline_v_2.0 myself, I have been using the installed version on Rescomp.
This worked fine until I ran
gen_STRUCTMOTION/script_predict.py, which failed with the error
Traceback (most recent call last):
  File "./script_predict.py", line 35, in <module>
    final_model = pickle.load(open('MODEL/model.p','rb'))
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/sklearn/grid_search.py", line 24, in <module>
    from .cross_validation import check_cv
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/sklearn/cross_validation.py", line 32, in <module>
    from .metrics.scorer import check_scoring
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/sklearn/metrics/__init__.py", line 7, in <module>
    from .ranking import auc
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/sklearn/metrics/ranking.py", line 32, in <module>
    from ..utils.stats import rankdata
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/sklearn/utils/stats.py", line 2, in <module>
    from scipy.stats import rankdata as _sp_rankdata
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/scipy/stats/__init__.py", line 344, in <module>
    from .stats import *
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/scipy/stats/stats.py", line 176, in <module>
    from . import distributions, mstats_basic, _stats
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/scipy/stats/distributions.py", line 13, in <module>
    from . import _continuous_distns
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/scipy/stats/_continuous_distns.py", line 17, in <module>
    from scipy._lib._numpy_compat import broadcast_to
  File "/gpfs2/well/win/projects/ukbiobank/fbp/bb_pipeline_v_2.0/bb_python/bb_python/lib/python3.5/site-packages/scipy/_lib/_numpy_compat.py", line 10, in <module>
    from numpy.testing.nosetester import import_nose
ImportError: No module named 'numpy.testing.nosetester'
which seems to be the same error reported in a Stack Exchange posting, where it was identified as a three-way conflict between particular versions of numpy, scipy and scikit-learn.
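One quick way to check this diagnosis is to ask the interpreter that runs script_predict.py which numpy, scipy and sklearn it actually imports, from which directory, and whether numpy still provides numpy.testing.nosetester (the module the old scipy in the traceback imports). This is only a minimal sketch: inspect_module and has_nosetester are hypothetical helpers of mine, not part of bb_pipeline_v_2.0.

```python
import importlib
import os
import sys


def inspect_module(name):
    """Return (version, directory) for module `name`, or (None, None) if it won't import."""
    try:
        module = importlib.import_module(name)
    except Exception:  # ImportError, or any error raised while importing
        return None, None
    path = getattr(module, "__file__", None)
    return getattr(module, "__version__", None), (os.path.dirname(path) if path else None)


def has_nosetester():
    """True if numpy.testing.nosetester (imported by this old scipy) is importable."""
    try:
        importlib.import_module("numpy.testing.nosetester")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Run with the same interpreter (bb_python) that runs script_predict.py
    print("python  ", sys.executable)
    for name in ("numpy", "scipy", "sklearn"):
        version, location = inspect_module(name)
        print("{:8s} {} {}".format(name, version, location))
    print("numpy.testing.nosetester importable:", has_nosetester())
```

If this reports numpy 1.18.1 loaded from the Rescomp site-packages and nosetester as not importable, that would match the traceback above.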
Notably, the versions specified in
bb_pipeline_v_2.0/initvars differ between the repo and Rescomp. In the
bb_pipeline_v_2.0 repo, bb_python/python_installation/install_bb_python.sh installs only 10 packages, while the same
install_bb_python.sh file on Rescomp installs 62 packages. While both pin versions of sklearn and its dependencies that shouldn't create this problem, subsequent installs must be bumping up the numpy version:
numpy       pinned    after all installs
repo        1.11.1    1.12.0
Rescomp     1.12.0    1.18.1
For scipy and scikit-learn, the repo and Rescomp versions are the same, and the pinned and final installed versions also match (0.18.0 and 0.17.1, respectively).
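To see exactly which pins get overridden by later installs, one option is to compare the == pins in install_bb_python.sh against pip freeze output from the finished environment. This sketch assumes the script spells its pins as name==version tokens (e.g. pip install numpy==1.12.0); parse_pins and find_drift are hypothetical helpers, and the inputs in the example are built from the numbers quoted above rather than read from the real files.

```python
def parse_pins(text):
    """Collect every 'name==version' token in text into {name: version}.

    Works on `pip freeze` output and, assuming the install script spells its
    pins like `pip install numpy==1.12.0`, on the script itself.
    """
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop shell/requirements comments
        for token in line.split():
            if "==" in token:
                name, version = token.split("==", 1)
                pins[name.strip("'\"").lower()] = version.strip("'\"")
    return pins


def find_drift(pins, installed):
    """Return {name: (pinned, installed)} for every pin that was not honoured."""
    drift = {}
    for name, pinned in pins.items():
        actual = installed.get(name)  # None if the package is missing entirely
        if actual != pinned:
            drift[name] = (pinned, actual)
    return drift


if __name__ == "__main__":
    # Illustrative inputs built from the Rescomp numbers above, not the real files
    pinned = parse_pins("pip install numpy==1.12.0 scipy==0.18.0 scikit-learn==0.17.1")
    frozen = parse_pins("numpy==1.18.1\nscipy==0.18.0\nscikit-learn==0.17.1\n")
    print(find_drift(pinned, frozen))  # prints {'numpy': ('1.12.0', '1.18.1')}
```

If numpy turns out to be the only drifting pin, passing the pins to the later installs as a pip constraints file (pip install -c constraints.txt ...) might keep numpy from being upgraded past 1.12.0.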
Once I switched to using my own
bb_python (built with only the 10 packages), that error went away. Instead I get another error :( one I'm less able to diagnose.
(bb_python) [kfh142@rescomp1 gen_STRUCTMOTION]$ ./script_predict.py
Traceback (most recent call last):
  File "./script_predict.py", line 36, in <module>
    prediction = final_model.predict(test_data_norm)
  File "/gpfs2/well/nichols/shared/UK_biobank_pipeline_v_1/bb_python/bb_python/lib/python3.5/site-packages/sklearn/utils/metaestimators.py", line 37, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "/gpfs2/well/nichols/shared/UK_biobank_pipeline_v_1/bb_python/bb_python/lib/python3.5/site-packages/sklearn/grid_search.py", line 435, in predict
    return self.best_estimator_.predict(X)
  File "/gpfs2/well/nichols/shared/UK_biobank_pipeline_v_1/bb_python/bb_python/lib/python3.5/site-packages/sklearn/linear_model/base.py", line 200, in predict
    return self._decision_function(X)
  File "/gpfs2/well/nichols/shared/UK_biobank_pipeline_v_1/bb_python/bb_python/lib/python3.5/site-packages/sklearn/linear_model/base.py", line 185, in _decision_function
    dense_output=True) + self.intercept_
  File "/gpfs2/well/nichols/shared/UK_biobank_pipeline_v_1/bb_python/bb_python/lib/python3.5/site-packages/sklearn/utils/extmath.py", line 184, in safe_sparse_dot
    return fast_dot(a, b)
ValueError: shapes (45480,9) and (11,) not aligned: 9 (dim 1) != 11 (dim 0)
which I assume means PREDICT_DATA has the wrong number of columns: the data passed to predict has 9 columns, while the fitted model's coefficient vector has 11 entries. I suspect there's an SGE error lurking here that I'm still tracking down.
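Assuming the mismatch really is in PREDICT_DATA, a check just before predict would turn the numpy shape error into a message naming both counts. For a linear model the (11,) in the error is the coef_ vector, so this sketch reads the expected feature count from coef_, unwrapping the GridSearchCV seen in the traceback via best_estimator_. The helper names are hypothetical, and it assumes a single-target linear model.

```python
def n_columns(data):
    """Column count of a 2-D array-like: a numpy array or a list of rows."""
    shape = getattr(data, "shape", None)
    return shape[1] if shape is not None else len(data[0])


def expected_n_features(model):
    """Feature count a fitted linear model expects, read from its coef_.

    A fitted GridSearchCV (as in the traceback) is unwrapped via best_estimator_.
    Assumes a single-target model, so coef_ is 1-D with one entry per feature.
    """
    estimator = getattr(model, "best_estimator_", model)
    coef = estimator.coef_
    shape = getattr(coef, "shape", None)
    return shape[-1] if shape is not None else len(coef)


def feature_mismatch(model, data):
    """Return a readable message if the column counts disagree, else None."""
    got, want = n_columns(data), expected_n_features(model)
    if got == want:
        return None
    return "PREDICT_DATA has {} columns but the model expects {}".format(got, want)


# Hypothetical use in script_predict.py, just before the failing predict call:
#     problem = feature_mismatch(final_model, test_data_norm)
#     if problem:
#         raise SystemExit(problem)
```

On the data above this would report 9 columns against 11 expected, which at least points at whichever upstream step (possibly the SGE job) should have produced the missing two columns.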