This variation on req_perform_sequential() performs multiple requests in
parallel. Never use it without req_throttle(); otherwise it's too easy to
pummel a server with a very large number of simultaneous requests.
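As a minimal sketch of the advice above, one way to make sure every request is throttled is to apply req_throttle() to a shared base request and derive each individual request from it. The host `https://api.example.com` and the `/items/<id>` paths are hypothetical and used only for illustration, so the final call is left commented out.

```r
library(httr2)

# Hypothetical host; the throttle is attached once to the shared base request
base <- request("https://api.example.com") |>
  req_throttle(capacity = 10, fill_time_s = 60)

# Derive one request per (made-up) item id; each inherits the throttle
ids <- 1:3
reqs <- lapply(ids, function(id) req_url_path(base, paste0("/items/", id)))

# Not run, because the host is fictional:
# resps <- req_perform_parallel(reqs, max_active = 5)
```

Because the throttle lives on the base request, requests added to the list later pick it up automatically.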
While running, you'll get a progress bar that looks like:
[working] (1 + 4) -> 5 -> 5. The string tells you the current status of
the queue (e.g. working, waiting, errored) followed by (the
number of pending requests + pending retried requests) -> the number of
active requests -> the number of complete requests.
Limitations
The main limitation of req_perform_parallel() is that it applies
req_throttle() and req_retry() across all requests. This means,
for example, that if request 1 is throttled, but request 2 is not,
req_perform_parallel() will wait for request 1 before performing request 2.
This makes it most suitable for performing many parallel requests to the same
host, rather than a mix of different hosts. It's probably possible to remove
this limitation, but it's enough work that I'm unlikely to do it unless
I know that people would find it useful: so please let me know!
Additionally, it does not respect the max_tries argument to req_retry()
because if you have five requests in flight and the first one gets rate
limited, it's likely that all the others will be too. This also means that
the circuit breaker is never triggered.
Usage
req_perform_parallel(
  reqs,
  paths = NULL,
  pool = deprecated(),
  on_error = c("stop", "return", "continue"),
  progress = TRUE,
  max_active = 10
)
Arguments
- reqs
A list of requests.
- paths
An optional character vector of paths, if you want to download the response bodies to disk. If supplied, must be the same length as reqs.
- pool
No longer supported; to control the maximum number of concurrent requests, set max_active.
- on_error
What should happen if one of the requests fails?
  - stop, the default: stop iterating with an error.
  - return: stop iterating, returning all the successful responses received so far, as well as an error object for the failed request.
  - continue: continue iterating, recording errors in the result.
- progress
Display a progress bar for the status of all requests? Use TRUE to turn on a basic progress bar, use a string to give it a name, or see progress_bars to customize it in other ways. Not compatible with req_progress(), as httr2 can only display a single progress bar at a time.
- max_active
Maximum number of concurrent requests.
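To illustrate how these arguments fit together, here is a small sketch that builds a paths vector with one destination file per request. The host, the `/a` and `/b` paths, and the file-name pattern are assumptions made for the example, so the call itself is shown commented out.

```r
library(httr2)

# Hypothetical host, throttled as recommended above
base <- request("https://api.example.com") |>
  req_throttle(capacity = 10, fill_time_s = 60)

reqs <- list(
  req_url_path(base, "/a"),
  req_url_path(base, "/b")
)

# One destination file per request; paths must be the same length as reqs
paths <- file.path(tempdir(), sprintf("resp-%02d.json", seq_along(reqs)))
stopifnot(length(paths) == length(reqs))

# Not run, because the host is fictional:
# resps <- req_perform_parallel(
#   reqs,
#   paths = paths,
#   on_error = "continue",
#   progress = "Downloading",
#   max_active = 2
# )
```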
Value
A list, the same length as reqs, containing responses and possibly
error objects, if on_error is "return" or "continue" and one of the
requests errors. If on_error is "return" and it errors on the ith
request, the ith element of the result will be an error object, and the
remaining elements will be NULL. If on_error is "continue", it will
be a mix of responses and error objects.
Only httr2 errors are captured; see req_error() for more details.
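The three kinds of element described above (response, error object, NULL) can be told apart with base R class checks. The sketch below does this on a hand-built stand-in for a result list, so it runs without a network: response() creates a fake response for testing, and simpleError() stands in for the error object that httr2 would return. Checking for the "error" class is an assumption about how the error objects can be recognised.

```r
library(httr2)

# Stand-in for a result of req_perform_parallel(reqs, on_error = "return"):
# a response, then an error, then a request that was never performed
results <- list(
  response(status_code = 200),
  simpleError("HTTP 400 Bad Request."),
  NULL
)

is_resp <- vapply(results, inherits, logical(1), what = "httr2_response")
is_err <- vapply(results, inherits, logical(1), what = "error")

# Label each element by what happened to the corresponding request
status <- ifelse(is_resp, "done", ifelse(is_err, "failed", "not performed"))
status
```

In everyday use, resps_successes() and resps_failures(), shown in the examples, do the same filtering for you.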
Examples
# Requesting these 4 pages one at a time would take 2 seconds:
request_base <- request(example_url()) |>
  req_throttle(capacity = 100, fill_time_s = 60)
reqs <- list(
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5")
)
# But it's much faster if you request in parallel
system.time(resps <- req_perform_parallel(reqs))
#> [working] (0 + 0) -> 2 -> 2 | ■■■■■■■■■■■■■■■■ 50%
#> [working] (0 + 0) -> 0 -> 4 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100%
#> user system elapsed
#> 0.05 0.00 1.09
# req_perform_parallel() will fail on error
reqs <- list(
  request_base |> req_url_path("/status/200"),
  request_base |> req_url_path("/status/400"),
  request("FAILURE")
)
try(resps <- req_perform_parallel(reqs))
#> Error in req_perform_parallel(reqs) : HTTP 400 Bad Request.
# but can use on_error to capture all successful results
resps <- req_perform_parallel(reqs, on_error = "continue")
# Inspect the successful responses
resps |> resps_successes()
#> [[1]]
#> <httr2_response>
#> GET http://127.0.0.1:41919/status/200
#> Status: 200 OK
#> Content-Type: text/plain
#> Body: None
#>
# And the failed responses
resps |> resps_failures() |> resps_requests()
#> [[1]]
#> <httr2_request>
#> GET http://127.0.0.1:41919/status/400
#> Body: empty
#> Policies:
#> • throttle_realm: "127.0.0.1"
#>
#> [[2]]
#> <httr2_request>
#> GET FAILURE
#> Body: empty
#>