the database
Also add a -r or --reprocess command line option to reprocess runs which are
already in the database.
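
A minimal sketch of how such a flag might be wired up with argparse (the helper functions here are hypothetical stand-ins, not the script's actual functions):

import argparse

def run_in_database(filename):
    # Hypothetical stand-in: the real script would query its run database.
    return False

def submit(filename):
    # Hypothetical stand-in for the actual job submission.
    print("submitting %s" % filename)

parser = argparse.ArgumentParser()
parser.add_argument("filenames", nargs="+", help="zdab files to process")
parser.add_argument("-r", "--reprocess", action="store_true",
                    help="reprocess runs which are already in the database")
args = parser.parse_args()

for filename in args.filenames:
    if run_in_database(filename) and not args.reprocess:
        continue  # skip runs already in the database
    submit(filename)
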
This commit updates submit-grid-jobs so that it keeps a database of jobs. This
allows the script to make sure that we only have a certain number of jobs in
the job queue at a single time, and to automatically resubmit failed jobs. The
idea is that it can now be run once to add jobs to the database:
$ submit-grid-jobs ~/zdabs/SNOCR_0000010000_000_p4_reduced.xzdab.gz
and then be run periodically via crontab:
PATH=/usr/bin:$HOME/local/bin
SDDM_DATA=$HOME/sddm/src
DQXX_DIR=$HOME/dqxx
0 * * * * submit-grid-jobs --auto --logfile ~/submit.log
Similarly, I updated cat-grid-jobs so that it uses the same database and can
also be run via a cron job:
PATH=/usr/bin:$HOME/local/bin
SDDM_DATA=$HOME/sddm/src
DQXX_DIR=$HOME/dqxx
0 * * * * cat-grid-jobs --logfile cat.log --output-dir $HOME/fit_results
I also updated fit so that it keeps track of the total time elapsed, including
the initial fits, instead of just counting the final fits.
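
A rough sketch of the kind of bookkeeping this implies, using sqlite (the schema, state names, and queue limit here are illustrative, not necessarily the script's actual design):

import sqlite3

MAX_QUEUED = 100  # illustrative cap on jobs in the queue at once

conn = sqlite3.connect("jobs.db")
conn.execute("""CREATE TABLE IF NOT EXISTS jobs (
    filename TEXT,
    state    TEXT DEFAULT 'NEW',   -- NEW -> QUEUED -> DONE or FAILED
    retries  INTEGER DEFAULT 0)""")

def submit(filename):
    # Hypothetical stand-in for the actual grid submission call.
    print("submitting %s" % filename)

def fill_queue():
    """Top up the queue with NEW or FAILED jobs, up to MAX_QUEUED."""
    queued = conn.execute(
        "SELECT count(*) FROM jobs WHERE state = 'QUEUED'").fetchone()[0]
    rows = conn.execute(
        "SELECT rowid, filename FROM jobs WHERE state IN ('NEW', 'FAILED') "
        "LIMIT ?", (max(MAX_QUEUED - queued, 0),)).fetchall()
    for rowid, filename in rows:
        submit(filename)
        conn.execute("UPDATE jobs SET state = 'QUEUED', retries = retries + 1 "
                     "WHERE rowid = ?", (rowid,))
    conn.commit()
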
I noticed that many of my jobs were failing with the following error:
module: command not found
My submit description files *should* only be selecting nodes with modules because of this line:
requirements = (HAS_MODULES =?= true) && (OSGVO_OS_STRING == "RHEL 7") && (OpSys == "LINUX")
which I think I got from
https://support.opensciencegrid.org/support/solutions/articles/12000048518-accessing-software-using-distributed-environment-modules.
I looked up what the =?= operator does: it's HTCondor's case-sensitive "is
identical to" comparison, which never evaluates to UNDEFINED. I also found
another site
(https://support.opensciencegrid.org/support/solutions/articles/5000633467-steer-your-jobs-with-htcondor-job-requirements)
which uses the normal == operator. Therefore, I'm going to switch to the ==
operator and hope that fixes the issue.
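
In a submit description file the change would look something like this (the surrounding lines are illustrative, not the actual file):

universe     = vanilla
executable   = run-fit.sh
# select nodes that provide environment modules, using == instead of =?=
requirements = (HAS_MODULES == true) && (OSGVO_OS_STRING == "RHEL 7") && (OpSys == "LINUX")
queue 1
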
This commit updates the submit-grid-jobs script to use my version of
splitext(), which removes the full extension from the filename. This fixes an
issue where the output HDF5 files had "xzdab" in the name whenever the input
file had the extension .xzdab.gz.
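
A minimal sketch of such a splitext() (assuming the intent is to split at the first dot rather than the last):

import os

def splitext(path):
    """Like os.path.splitext(), but splits off the *full* extension:
    'run.xzdab.gz' -> ('run', '.xzdab.gz') instead of ('run.xzdab', '.gz')."""
    head, tail = os.path.split(path)
    if '.' in tail[1:]:  # ignore a leading dot so hidden files are left alone
        i = tail.index('.', 1)
        return os.path.join(head, tail[:i]), tail[i:]
    return path, ''

# e.g. splitext('SNOCR_0000010000_000_p4_reduced.xzdab.gz')
#      -> ('SNOCR_0000010000_000_p4_reduced', '.xzdab.gz')
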
This commit updates the fit program to accept a particle combo from the command
line so you can fit a single particle combination hypothesis. For example,
running:
$ ./fit ~/zdabs/mu_minus_700_1000.hdf5 -p 2020
would fit only the two electron hypothesis.
The reason for adding this ability is that my grid jobs were getting evicted
when fitting muons in run 10,000, since it takes tens of hours to fit all of
the particle hypotheses. With this change, and a small update to the
submit-grid-jobs script, we now submit a single grid job per particle
combination hypothesis, which should make each grid job run approximately 4
times faster.
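
Judging from the example, the combo is encoded as concatenated two-digit particle IDs, with 20 apparently being the electron ID. A sketch of decoding it (the encoding is inferred from the example above, and the function name is illustrative):

def parse_particle_combo(combo):
    """Split a combo string like '2020' into a list of two-digit particle
    IDs, e.g. '2020' -> [20, 20], i.e. the two electron hypothesis."""
    if len(combo) % 2 != 0:
        raise ValueError("expected an even number of digits: %s" % combo)
    return [int(combo[i:i+2]) for i in range(0, len(combo), 2)]

print(parse_particle_combo("2020"))  # -> [20, 20]
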
This commit updates zdab-cat to output each event as an individual YAML
document. The advantage of this is that we can then iterate over the events one
at a time, without loading the entire YAML document at once in
submit-grid-jobs, which means that we won't use gigabytes of memory on the grid
submission node.
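
With PyYAML, a stream of '---'-separated documents can be consumed lazily; something like this (the per-event handler is a hypothetical stand-in):

import yaml

def handle_event(event):
    # Hypothetical stand-in for whatever submit-grid-jobs does per event.
    print(event)

# yaml.safe_load_all() returns a generator, so each document is parsed
# on demand instead of the whole stream being loaded into memory.
with open("events.yaml") as f:
    for event in yaml.safe_load_all(f):
        handle_event(event)
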
This commit updates submit-grid-jobs so that it keeps track of which files it's
already submitted grid jobs for.
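
A minimal sketch of that bookkeeping (the sqlite schema and filenames here are illustrative, not necessarily what the script actually uses):

import sqlite3

conn = sqlite3.connect("submitted.db")
conn.execute("CREATE TABLE IF NOT EXISTS submitted (filename TEXT PRIMARY KEY)")

def already_submitted(filename):
    """Return True if we've already submitted grid jobs for this file."""
    row = conn.execute(
        "SELECT 1 FROM submitted WHERE filename = ?", (filename,)).fetchone()
    return row is not None

def mark_submitted(filename):
    conn.execute("INSERT OR IGNORE INTO submitted VALUES (?)", (filename,))
    conn.commit()
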