Monitoring and Retrieving Extraction Jobs

Overview

When extract_batch() submits a table-exporter job, it runs asynchronously on the RAP cloud. The job_* functions let you monitor progress, inspect job history, and load results once the job completes.


Typical Workflow

library(ukbflow)

# 1. Submit extraction job
job_id <- extract_batch(c(31, 53, 21022, 22189), file = "ukb_demographics")

# 2. Wait for completion
job_wait(job_id)

# 3. Load result (RAP only)
df <- job_result(job_id)

Monitoring a Job

Check status

job_status() returns the current state of a job:

job_status(job_id)
#> job-XXXXXXXXXXXX
#>            done

Possible states:

State Meaning
idle Queued, waiting to be scheduled
runnable Resources being allocated
running Actively executing
done Completed successfully
failed Failed — see failure message
terminated Manually terminated

For failed jobs, the error message is accessible via:

s <- job_status(job_id)
if (s == "failed") cli::cli_inform(attr(s, "failure_message"))

Wait for completion

job_wait() polls at regular intervals until the job reaches a terminal state:

job_wait(job_id)                    # wait indefinitely (default)
job_wait(job_id, interval = 60)     # poll every 60 seconds
job_wait(job_id, timeout = 7200)    # give up after 2 hours

job_wait() stops with an error if the job fails or is terminated, so you can safely chain it with job_result():

job_wait(job_id)
df <- job_result(job_id)

Retrieving Results

Get the file path

job_path() returns the /mnt/project/ path of the output CSV on RAP:

path <- job_path(job_id)
#> "/mnt/project/results/ukb_demographics.csv"

Use this to read the file directly or pass it to other tools:

df <- data.table::fread(job_path(job_id))

Load into R

job_result() combines job_path() and fread() in one step. Must be run inside the RAP environment:

df <- job_result(job_id)
# returns a data.table, e.g. 502353 rows x 5 cols (incl. eid)

Browsing Job History

job_ls() returns a summary of recent jobs:

job_ls()          # last 20 jobs
job_ls(n = 5)     # last 5 jobs

# Filter by state
job_ls(state = "failed")
job_ls(state = c("done", "failed"))

The result is a data.frame with columns:

Column Description
job_id Job ID, e.g. job-XXXXXXXXXXXX
name Job name (typically Table exporter)
state Current state
created Job creation time (POSIXct)
runtime Runtime string, e.g. 0:04:36 (NA if still running)

Getting Help