Handle duplicate columns when merging input files
Given this command:
funpack out.tsv one.tsv two.tsv
if one.tsv and two.tsv both have a column with the same name, funpack will crash during the import stage:
Traceback (most recent call last):
  File "/opt/fmrib/fsl/bin/funpack", line 10, in <module>
    sys.exit(main())
  File "/opt/fmrib/fsltmp/fsl_59f8789b/fslpython/envs/fslpython/lib/python3.7/site-packages/funpack/main.py", line 124, in main
    dtable, unknowns, uncategorised, drop = doImport(args, mgr)
  File "/opt/fmrib/fsltmp/fsl_59f8789b/fslpython/envs/fslpython/lib/python3.7/site-packages/funpack/main.py", line 209, in doImport
    dryrun=args.dry_run)
  File "/opt/fmrib/fsltmp/fsl_59f8789b/fslpython/envs/fslpython/lib/python3.7/site-packages/funpack/importing/core.py", line 196, in importData
    dryrun=dryrun)
  File "/opt/fmrib/fsltmp/fsl_59f8789b/fslpython/envs/fslpython/lib/python3.7/site-packages/funpack/importing/core.py", line 364, in loadFiles
    pool=pool)
  File "/opt/fmrib/fsltmp/fsl_59f8789b/fslpython/envs/fslpython/lib/python3.7/site-packages/funpack/importing/core.py", line 570, in loadFile
    chunks = pool.starmap(func, offsets)
  File "/opt/fmrib/fsltmp/fsl_59f8789b/fslpython/envs/fslpython/lib/python3.7/multiprocessing/pool.py", line 276, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/opt/fmrib/fsltmp/fsl_59f8789b/fslpython/envs/fslpython/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: Duplicate names are not allowed.
Maybe the default behaviour should be to drop the column with fewer rows, and emit a warning to that effect.
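A rough sketch of what that default could look like, using only the standard library. This is not funpack code: the function name resolve_duplicate_columns and the tables structure are hypothetical, made up for illustration. It assumes "rows" means the number of data rows in each file, that the first column of every file is the index and is always kept, and that column names are unique within a single file.

```python
import warnings


def resolve_duplicate_columns(tables):
    """Decide which file each duplicated column should be loaded from.

    tables maps a file name to (header, rows): header is a list of column
    names, rows is a list of data rows. Returns a dict mapping each file
    name to the column names that should be loaded from that file.
    (Hypothetical helper, not part of funpack.)
    """
    owner = {}  # column name -> file that currently keeps it
    for fname, (header, rows) in tables.items():
        # Assumption: the first column is the index, never dropped.
        for col in header[1:]:
            if col not in owner:
                owner[col] = fname
                continue
            other = owner[col]
            # Keep the copy from the file with more rows.
            # On a tie, the file listed first wins.
            if len(rows) > len(tables[other][1]):
                keep, drop = fname, other
            else:
                keep, drop = other, fname
            owner[col] = keep
            warnings.warn(
                f"Column {col!r} is present in both {other} and {fname}; "
                f"dropping the copy from {drop}")
    return {
        fname: [header[0]] + [c for c in header[1:] if owner[c] == fname]
        for fname, (header, _) in tables.items()
    }


# Example: column "b" is in both files, and two.tsv has fewer rows.
tables = {
    "one.tsv": (["eid", "a", "b"], [[1, 2, 3], [2, 3, 4], [3, 4, 5]]),
    "two.tsv": (["eid", "b", "c"], [[1, 9, 9]]),
}
print(resolve_duplicate_columns(tables))
```

Working out the column list before any data is loaded would mean the duplicate name never reaches the loading step that currently raises the ValueError. Whether row count is the right tie-breaker, as opposed to e.g. always preferring the file given first on the command line, is an open question.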