This variation on req_perform_sequential() performs multiple requests in
parallel. Never use it without req_throttle(); otherwise it's too easy to
pummel a server with a very large number of simultaneous requests.
While running, you'll get a progress bar that looks like:
[working] (1 + 4) -> 5 -> 5. The string tells you the current status of
the queue (e.g. working, waiting, errored, finishing) followed by (the
number of pending requests + pending retried requests) -> the number of
active requests -> the number of complete requests.
Limitations
The main limitation of req_perform_parallel() is that it assumes
req_throttle() and req_retry() apply across all requests. This means,
for example, that if request 1 is throttled, but request 2 is not,
req_perform_parallel() will wait for request 1 before performing request 2.
This makes it most suitable for performing many parallel requests to the same
host, rather than a mix of different hosts. It's probably possible to remove
this limitation, but it's enough work that I'm unlikely to do it unless
I know that people would find it useful: so please let me know!
Additionally, it does not respect the max_tries argument to req_retry()
because if you have five requests in flight and the first one gets rate
limited, it's likely that all the others do too.
Usage
req_perform_parallel(
  reqs,
  paths = NULL,
  pool = deprecated(),
  on_error = c("stop", "return", "continue"),
  progress = TRUE,
  max_active = 10
)
Arguments
- reqs
A list of requests.
- paths
An optional character vector of paths, if you want to download the response bodies to disk. If supplied, must be the same length as reqs.
- pool
No longer supported; to control the maximum number of concurrent requests, set max_active.
- on_error
What should happen if one of the requests fails?
"stop", the default: stop iterating with an error.
"return": stop iterating, returning all the successful responses received so far, as well as an error object for the failed request.
"continue": continue iterating, recording errors in the result.
- progress
Display a progress bar for the status of all requests? Use TRUE to turn on a basic progress bar, use a string to give it a name, or see progress_bars to customize it in other ways. Not compatible with req_progress(), as httr2 can only display a single progress bar at a time.
- max_active
Maximum number of concurrent requests.
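As a rough sketch of how paths and max_active fit together, the following saves three response bodies to temporary files with at most two requests in flight at once. It assumes httr2 is attached and reuses example_url() and the /delay/0.5 endpoint from the examples on this page; the file name pattern, the throttle settings, and the choice of max_active = 2 are illustrative assumptions, not recommendations.

```r
library(httr2)

# Throttle the shared base request, as advised above
request_base <- request(example_url()) |>
  req_throttle(capacity = 100, fill_time_s = 60)

reqs <- list(
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5")
)

# One destination file per request; paths must be the same length as reqs.
# The "resp-N-" file name pattern is a hypothetical choice for this sketch.
paths <- tempfile(pattern = paste0("resp-", seq_along(reqs), "-"))

resps <- req_perform_parallel(
  reqs,
  paths = paths,
  max_active = 2,   # at most two concurrent requests
  progress = FALSE  # keep the output quiet for this sketch
)

# Check that each body was written to its file
file.exists(paths)
```

Lowering max_active trades total run time for a gentler load on the server; it limits concurrency only and is not a substitute for req_throttle().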
Value
A list, the same length as reqs, containing responses and possibly
error objects, if on_error is "return" or "continue" and one of the
requests errors. If on_error is "return" and it errors on the ith
request, the ith element of the result will be an error object, and the
remaining elements will be NULL. If on_error is "continue", it will
be a mix of responses and error objects.
Only httr2 errors are captured; see req_error() for more details.
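To make the shape of the return value concrete, here is a minimal base R sketch that sorts a result list into responses, error objects, and requests that were never performed (the NULL entries left behind when on_error is "return"). The helper split_results() is hypothetical and not part of httr2, and the sketch assumes that error objects inherit from the base R "error" condition class. The mock list stands in for real results so the code runs without a network connection.

```r
# Hypothetical helper (not part of httr2): split a result list shaped like
# the return value described above into its three kinds of element.
split_results <- function(results) {
  # Assumption: error objects inherit from the base R "error" class
  is_error <- vapply(results, inherits, logical(1), what = "error")
  is_null <- vapply(results, is.null, logical(1))
  list(
    responses = results[!is_error & !is_null],
    errors = results[is_error],
    not_performed = which(is_null)  # positions of requests never performed
  )
}

# Stand-in data: a placeholder "response", a base R condition in place of
# an httr2 error, and a NULL for a request that was never performed
mock <- list(
  structure(list(status_code = 200L), class = "mock_response"),
  simpleError("HTTP 400 Bad Request."),
  NULL
)

out <- split_results(mock)
lengths(out)
```

For real results, the resps_successes() and resps_failures() helpers used in the examples on this page cover the common cases; a hand-rolled split like this is mainly useful when you also need the positions of the NULL entries.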
Examples
# Requesting these 4 pages one at a time would take 2 seconds:
request_base <- request(example_url()) |>
  req_throttle(capacity = 100, fill_time_s = 60)
reqs <- list(
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5"),
  request_base |> req_url_path("/delay/0.5")
)
# But it's much faster if you request in parallel
system.time(resps <- req_perform_parallel(reqs))
#> [finishing] (0 + ) -> 2 -> 2 | ■■■■■■■■■■■■■■■■ 50%
#> [finishing] (0 + ) -> 0 -> 4 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100%
#> user system elapsed
#> 0.044 0.001 1.085
# req_perform_parallel() will fail on error
reqs <- list(
  request_base |> req_url_path("/status/200"),
  request_base |> req_url_path("/status/400"),
  request("FAILURE")
)
try(resps <- req_perform_parallel(reqs))
#> Error in req_perform_parallel(reqs) : HTTP 400 Bad Request.
# but can use on_error to capture all successful results
resps <- req_perform_parallel(reqs, on_error = "continue")
# Inspect the successful responses
resps |> resps_successes()
#> [[1]]
#> <httr2_response>
#> GET http://127.0.0.1:33733/status/200
#> Status: 200 OK
#> Content-Type: text/plain
#> Body: None
#>
# And the failed responses
resps |> resps_failures() |> resps_requests()
#> [[1]]
#> <httr2_request>
#> GET http://127.0.0.1:33733/status/400
#> Body: empty
#> Policies:
#> • throttle_realm: "127.0.0.1"
#>
#> [[2]]
#> <httr2_request>
#> GET FAILURE
#> Body: empty
#>