Monitoring and Retrieving Extraction Jobs

Overview

extract_batch() submits a table-exporter job that runs asynchronously on the RAP cloud. The job_* functions let you monitor its progress, inspect job history, and load the results once it completes.

Typical Workflow

library(ukbflow)

# 1. Submit extraction job
job_id <- extract_batch(c(31, 53, 21022, 22189), file = "ukb_demographics")

# 2. Wait for completion
job_wait(job_id)

# 3. Load result (RAP only)
df <- job_result(job_id)

Monitoring a Job

Check status

job_status() returns the current state of a job:

job_status(job_id)
#> job-XXXXXXXXXXXX
#>            done

Possible states:

State	Meaning
`idle`	Queued, waiting to be scheduled
`runnable`	Resources being allocated
`running`	Actively executing
`done`	Completed successfully
`failed`	Failed — see failure message
`terminated`	Manually terminated

For failed jobs, the error message is accessible via:

s <- job_status(job_id)
# Interpolate via "{...}" so braces in the message are not parsed as cli markup
if (s == "failed") cli::cli_inform("{attr(s, 'failure_message')}")
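
To see how this check behaves without a live job, the sketch below builds a stand-in for the value job_status() is described as returning: a named character value carrying a failure_message attribute. The job ID and message are invented for illustration, and the exact structure of the real return value is an assumption based on the printed output above.

```r
# Hypothetical stand-in for the value job_status() returns for a failed job.
# The ID and message are made up; the real object may carry more attributes.
s <- structure(
  c("job-XXXXXXXXXXXX" = "failed"),
  failure_message = "Example failure text (invented)"
)

# as.character() drops the name and attributes, leaving a bare state string.
is_failed <- identical(as.character(s), "failed")

# Fall back to a placeholder when no message is attached.
msg <- attr(s, "failure_message")
if (is.null(msg)) msg <- "(no failure message recorded)"

if (is_failed) message("Job ", names(s), " failed: ", msg)
```

Comparing on the stripped string avoids carrying the job-ID name into the logical result, which can be convenient when the check feeds into further conditions.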

Wait for completion

job_wait() polls at regular intervals until the job reaches a terminal state:

job_wait(job_id)                    # wait indefinitely (default)
job_wait(job_id, interval = 60)     # poll every 60 seconds
job_wait(job_id, timeout = 7200)    # give up after 2 hours

job_wait() stops with an error if the job fails or is terminated, so you can safely chain it with job_result():

job_wait(job_id)
df <- job_result(job_id)
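
The polling behaviour described above can be illustrated with a small generic loop. This is a sketch of the general technique, not ukbflow's implementation: poll_until_terminal() and its get_state argument are hypothetical names, and treating done, failed and terminated as the terminal states is an assumption read off the state table.

```r
# Hypothetical helper: poll a status function until a terminal state is seen.
# get_state is any zero-argument function returning a state string.
poll_until_terminal <- function(get_state, interval = 30, timeout = Inf) {
  terminal <- c("done", "failed", "terminated")  # assumed terminal states
  started <- Sys.time()
  repeat {
    state <- get_state()
    if (state %in% terminal) break
    elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
    # Give up if the next sleep would take us past the timeout.
    if (elapsed + interval > timeout) {
      stop("Timed out after ", round(elapsed), " s; last state: ", state)
    }
    Sys.sleep(interval)
  }
  # Mirror the documented behaviour: anything other than "done" is an error.
  if (state != "done") stop("Job ended in state: ", state)
  invisible(state)
}

# Simulated job that reports "running" twice and then "done".
states <- c("running", "running", "done")
i <- 0
fake_state <- function() {
  i <<- i + 1
  states[min(i, length(states))]
}

final <- poll_until_terminal(fake_state, interval = 0.01, timeout = 5)
```

In practice job_wait() already covers this. The sketch is only meant to make the roles of interval and timeout concrete, and to show why an error on failed or terminated jobs makes it safe to call job_result() on the next line.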

Retrieving Results

Get the file path

job_path() returns the /mnt/project/ path of the output CSV on RAP:

path <- job_path(job_id)
#> "/mnt/project/results/ukb_demographics.csv"

Use this to read the file directly or pass it to other tools:

df <- data.table::fread(job_path(job_id))
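
When only a few columns are needed, fread() can skip the rest at read time. The sketch below uses a small temporary CSV in place of the job output, because the real path only exists on RAP. The column names (eid, p31, p21022) are placeholders and may not match what the exporter writes.

```r
library(data.table)

# Temporary stand-in for the file at job_path(job_id); values are invented.
tmp <- tempfile(fileext = ".csv")
fwrite(
  data.table(eid = 1:3, p31 = c(0L, 1L, 1L), p21022 = c(54L, 61L, 47L)),
  tmp
)

# Guard against a missing file before attempting to read it.
stopifnot(file.exists(tmp))

# select= reads only the named columns, which keeps memory use down
# on wide exports.
df <- fread(tmp, select = c("eid", "p21022"))
```

On RAP, the same call would take job_path(job_id) in place of tmp.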

Load into R

job_result() combines job_path() and fread() in one step. It must be run inside the RAP environment:

df <- job_result(job_id)
# returns a data.table, e.g. 502353 rows x 5 cols (incl. eid)

Browsing Job History

job_ls() returns a summary of recent jobs:

job_ls()          # last 20 jobs
job_ls(n = 5)     # last 5 jobs

# Filter by state
job_ls(state = "failed")
job_ls(state = c("done", "failed"))

The result is a data.frame with columns:

Column	Description
`job_id`	Job ID, e.g. `job-XXXXXXXXXXXX`
`name`	Job name (typically `Table exporter`)
`state`	Current state
`created`	Job creation time (`POSIXct`)
`runtime`	Runtime string, e.g. `0:04:36` (`NA` if still running)
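
Because the result is an ordinary data.frame, it can be filtered and summarised with base R. The sketch below works on a hand-built data frame with the columns listed above rather than on a live job_ls() call. All IDs, times and runtimes are invented, and runtime_secs() is a hypothetical helper that assumes the H:MM:SS format shown in the example.

```r
# Hand-built stand-in for job_ls() output; every value here is invented.
jobs <- data.frame(
  job_id  = c("job-AAAA", "job-BBBB", "job-CCCC"),
  name    = "Table exporter",
  state   = c("done", "failed", "running"),
  created = as.POSIXct(
    c("2024-01-01 09:00:00", "2024-01-01 10:00:00", "2024-01-01 11:00:00"),
    tz = "UTC"
  ),
  runtime = c("0:04:36", "0:01:10", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert "H:MM:SS" strings to seconds, keeping NA as NA.
runtime_secs <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(p) {
    if (length(p) != 3 || anyNA(p)) return(NA_real_)
    sum(as.numeric(p) * c(3600, 60, 1))
  }, numeric(1))
}

jobs$runtime_s <- runtime_secs(jobs$runtime)

# Jobs that finished successfully, most recent first.
done <- jobs[jobs$state == "done", ]
done <- done[order(done$created, decreasing = TRUE), ]
```

A numeric runtime column makes it straightforward to sort jobs by duration or to spot exports that took unusually long.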

Getting Help

?job_status, ?job_wait, ?job_path, ?job_result, ?job_ls
vignette("extract") — submitting extraction jobs
GitHub Issues