Perform a list of requests in parallel — req_perform

This variation on req_perform_sequential() performs multiple requests in parallel. Exercise caution when using this function; it's easy to pummel a server with many simultaneous requests. Only use it with hosts designed to serve many files at once, which are typically web servers, not API servers.

req_perform_parallel() has a few limitations:

Will not retrieve a new OAuth token if it expires part way through the requests.
Does not perform throttling with req_throttle().
Does not attempt retries as described by req_retry().
Only consults the cache set by req_cache() before/after all requests.

If any of these limitations are problematic for your use case, we recommend req_perform_sequential() instead.

Usage

req_perform_parallel(
  reqs,
  paths = NULL,
  pool = NULL,
  on_error = c("stop", "return", "continue"),
  progress = TRUE
)

Arguments

reqs

A list of requests.

paths

An optional list of paths, if you want to download the request bodies to disks. If supplied, must be the same length as reqs.

pool

Optionally, a curl pool made by curl::new_pool(). Supply this if you want to override the defaults for total concurrent connections (100) or concurrent connections per host (6).

on_error

What should happen if one of the requests fails?

stop, the default: stop iterating with an error.
return: stop iterating, returning all the successful responses received so far, as well as an error object for the failed request.
continue: continue iterating, recording errors in the result.

progress

Display a progress bar? Use TRUE to turn on a basic progress bar, use a string to give it a name, or see progress_bars to customise it in other ways.

Value

A list, the same length as reqs, containing responses and possibly error objects, if on_error is "return" or "continue" and one of the responses errors. If on_error is "return" and it errors on the ith request, the ith element of the result will be an error object, and the remaining elements will be NULL. If on_error is "continue", it will be a mix of requests and error objects.

Only httr2 errors are captured; see req_error() for more details.

Examples

# Requesting these 4 pages one at a time would take 2 seconds:
request_base <- request(example_url())
reqs <- list(
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5")
)
# But it's much faster if you request in parallel
system.time(resps <- req_perform_parallel(reqs))
#>    user  system elapsed 
#>   0.373   0.658   1.031 

# req_perform_parallel() will fail on error
reqs <- list(
  request_base |> req_url_path("/status/200"),
  request_base |> req_url_path("/status/400"),
  request("FAILURE")
)
try(resps <- req_perform_parallel(reqs))
#> Error in req_perform_parallel(reqs) : Could not resolve host: FAILURE

# but can use on_error to capture all successful results
resps <- req_perform_parallel(reqs, on_error = "continue")

# Inspect the successful responses
resps |> resps_successes()
#> [[1]]
#> <httr2_response>
#> GET http://127.0.0.1:37349/status/200
#> Status: 200 OK
#> Content-Type: text/plain
#> Body: None
#> 

# And the failed responses
resps |> resps_failures() |> resps_requests()
#> [[1]]
#> <httr2_request>
#> GET http://127.0.0.1:37349/status/400
#> Body: empty
#> 
#> [[2]]
#> <httr2_request>
#> GET FAILURE
#> Body: empty
#>