Perform a list of requests in parallel — req_perform

This variation on req_perform_sequential() performs multiple requests in parallel. Never use it without req_throttle(); otherwise it's too easy to pummel a server with a very large number of simultaneous requests.

While running, you'll get a progress bar that looks like: [working] (1 + 4) -> 5 -> 5. The string tells you the current status of the queue (e.g. working, waiting, errored) followed by (the number of pending requests + pending retried requests) -> the number of active requests -> the number of complete requests.

Limitations

The main limitation of req_perform_parallel() is that it assumes applies req_throttle() and req_retry() are across all requests. This means, for example, that if request 1 is throttled, but request 2 is not, req_perform_parallel() will wait for request 1 before performing request 2. This makes it most suitable for performing many parallel requests to the same host, rather than a mix of different hosts. It's probably possible to remove these limitation, but it's enough work that I'm unlikely to do it unless I know that people would fine it useful: so please let me know!

Additionally, it does not respect the max_tries argument to req_retry() because if you have five requests in flight and the first one gets rate limited, it's likely that all the others do too. This also means that the circuit breaker is never triggered.

Usage

req_perform_parallel(
  reqs,
  paths = NULL,
  pool = deprecated(),
  on_error = c("stop", "return", "continue"),
  progress = TRUE,
  max_active = 10
)

Arguments

reqs

A list of requests.

paths

An optional character vector of paths, if you want to download the response bodies to disk. If supplied, must be the same length as reqs.

pool

. No longer supported; to control the maximum number of concurrent requests, set max_active.

on_error

What should happen if one of the requests fails?

stop, the default: stop iterating with an error.
return: stop iterating, returning all the successful responses received so far, as well as an error object for the failed request.
continue: continue iterating, recording errors in the result.

progress

Display a progress bar for the status of all requests? Use TRUE to turn on a basic progress bar, use a string to give it a name, or see progress_bars to customize it in other ways. Not compatible with req_progress(), as httr2 can only display a single progress bar at a time.

max_active

Maximum number of concurrent requests.

Value

A list, the same length as reqs, containing responses and possibly error objects, if on_error is "return" or "continue" and one of the responses errors. If on_error is "return" and it errors on the ith request, the ith element of the result will be an error object, and the remaining elements will be NULL. If on_error is "continue", it will be a mix of requests and error objects.

Only httr2 errors are captured; see req_error() for more details.

Examples

# Requesting these 4 pages one at a time would take 2 seconds:
request_base <- request(example_url()) |>
  req_throttle(capacity = 100, fill_time_s = 60)
reqs <- list(
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5")
)
# But it's much faster if you request in parallel
system.time(resps <- req_perform_parallel(reqs))
#> [working] (0 + 0) -> 2 -> 2 | ■■■■■■■■■■■■■■■■                  50%
#> [working] (0 + 0) -> 0 -> 4 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■  100%
#>    user  system elapsed 
#>   0.049   0.001   1.063 

# req_perform_parallel() will fail on error
reqs <- list(
  request_base |> req_url_path("/status/200"),
  request_base |> req_url_path("/status/400"),
  request("FAILURE")
)
try(resps <- req_perform_parallel(reqs))
#> Error in req_perform_parallel(reqs) : HTTP 400 Bad Request.

# but can use on_error to capture all successful results
resps <- req_perform_parallel(reqs, on_error = "continue")

# Inspect the successful responses
resps |> resps_successes()
#> [[1]]
#> <httr2_response>
#> GET http://127.0.0.1:44111/status/200
#> Status: 200 OK
#> Content-Type: text/plain
#> Body: None
#> 

# And the failed responses
resps |> resps_failures() |> resps_requests()
#> [[1]]
#> <httr2_request>
#> GET http://127.0.0.1:44111/status/400
#> Body: empty
#> Policies:
#> • throttle_realm: "127.0.0.1"
#> 
#> [[2]]
#> <httr2_request>
#> GET FAILURE
#> Body: empty
#>