This function pauses execution of an R script while a scheduled qsub job is not yet complete.
Source: R/hpc_functions.R
It is intended to give you control over job dependencies within R when the formal PBS depend approach is insufficient, especially for a script that spawns child jobs that must be scheduled or completed before the parent script continues.
Usage
wait_for_job(
job_ids,
repolling_interval = 60,
max_wait = 60 * 60 * 24,
scheduler = "local",
quiet = TRUE,
stop_on_timeout = TRUE
)
Arguments
- job_ids
One or more job ids of existing PBS or slurm jobs, or process ids of a local process for scheduler="sh".
- repolling_interval
How often to recheck the job status, in seconds. Default: 60
- max_wait
How long to wait on the job before giving up, in seconds. Default: 24 hours (86,400 seconds)
- scheduler
What scheduler is used for job execution. Options: c("torque", "qsub", "slurm", "sbatch", "sh", "local")
- quiet
If TRUE, wait_for_job will not print out any status updates on jobs. If FALSE, the function prints out status updates for each tracked job so that the user knows what's holding up progress.
- stop_on_timeout
Logical. If TRUE, the function throws an error if max_wait is exceeded. If FALSE, it returns FALSE instead of stopping. Default is TRUE.
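The interplay of repolling_interval, max_wait, quiet, and stop_on_timeout can be illustrated with a minimal polling loop. This is only a sketch of the general pattern, not the package's implementation: poll_until_done and its is_done callback are hypothetical names introduced here, and the real function queries the scheduler rather than calling a user-supplied function.

```r
# Hypothetical helper illustrating the polling pattern; NOT the package's code.
# `is_done` is an assumed callback that returns TRUE once the work has finished.
poll_until_done <- function(is_done,
                            repolling_interval = 60,
                            max_wait = 60 * 60 * 24,
                            quiet = TRUE,
                            stop_on_timeout = TRUE) {
  start <- Sys.time()
  repeat {
    # Check status first so an already-finished job returns immediately
    if (isTRUE(is_done())) return(invisible(TRUE))

    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (elapsed >= max_wait) {
      msg <- sprintf("Timed out after %.1f seconds", elapsed)
      # Either raise an error or signal the timeout through the return value
      if (stop_on_timeout) stop(msg, call. = FALSE)
      if (!quiet) message(msg)
      return(invisible(FALSE))
    }

    if (!quiet) message(sprintf("Still waiting (%.1f s elapsed)", elapsed))
    Sys.sleep(repolling_interval)
  }
}
```

With stop_on_timeout = FALSE, a caller can branch on the result, for example `if (!poll_until_done(check, stop_on_timeout = FALSE)) message("gave up")`, instead of wrapping the call in tryCatch.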
Value
Returns TRUE (invisibly) if all jobs completed successfully. Returns FALSE if any job failed, or if the timeout occurred and stop_on_timeout = FALSE.
If the timeout is exceeded and stop_on_timeout = TRUE, execution stops with an error.
Details
Note that for the scheduler argument, "torque" and "qsub" are the same,
"slurm" and "sbatch" are the same, and "sh" and "local" are the same.
Examples
if (FALSE) { # \dontrun{
# example on qsub/torque cluster
wait_for_job("7968857.torque01.util.production.int.aci.ics.psu.edu", scheduler = "torque")
# example of waiting for two jobs on slurm cluster
wait_for_job(c("24147864", "24147876"), scheduler = "slurm")
# example of waiting for two jobs on local machine
wait_for_job(c("9843", "9844"), scheduler = "local")
} # }
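The scheduler aliases described under Details can be pictured as a simple normalization step. The sketch below is an assumption about how such aliasing might be written, not the package's code; normalize_scheduler is a hypothetical name, and the choice of "torque", "slurm", and "local" as the canonical labels is arbitrary.

```r
# Hypothetical helper: map each accepted scheduler name to one canonical label.
# The canonical labels chosen here are an assumption made for illustration.
normalize_scheduler <- function(scheduler) {
  switch(scheduler,
    torque = ,            # empty arm falls through to the next value
    qsub   = "torque",
    slurm  = ,
    sbatch = "slurm",
    sh     = ,
    local  = "local",
    stop("Unknown scheduler: ", scheduler, call. = FALSE)
  )
}

normalize_scheduler("qsub")
normalize_scheduler("sbatch")
```

Collapsing aliases up front means any later status-checking logic only needs to handle three cases.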