The retry errors feature in the web scraping agent lets you retry failed requests automatically, to increase the chance of successful data scraping. When the retry-errors feature is enabled, Agenty automatically retries the pages where the website you are crawling returns a 4xx-5xx status code.
A web page may return an error code for several reasons, such as:
- 5xx: Website down or temporarily unavailable
- 4xx: Timed out, IP address rate limited, or blocked
- See more here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#Client_error_responses
The fail-retry feature in the web scraping agent is designed with all of those errors in mind. When it is in use, Agenty retries the same request with another IP address, user-agent, or geo-proxy that the agent is configured for.
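As a rough illustration of the idea above (not Agenty's actual implementation), the following sketch treats any 4xx-5xx status as retryable and hands out a different identity for each attempt. The names `is_retryable_status` and `next_identity`, and the sample proxy and user-agent values, are hypothetical.

```python
from itertools import cycle

# Hypothetical identity pool; in Agenty these values would come from
# the agent's own proxy and user-agent configuration.
IDENTITIES = cycle([
    {"proxy": "proxy-a.example:8080", "user_agent": "ExampleBrowser/1.0"},
    {"proxy": "proxy-b.example:8080", "user_agent": "ExampleBrowser/2.0"},
])


def is_retryable_status(status_code: int) -> bool:
    """Return True for any 4xx or 5xx HTTP status code."""
    return 400 <= status_code <= 599


def next_identity() -> dict:
    """Return the next proxy/user-agent pair, cycling through the pool."""
    return next(IDENTITIES)
```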
- Edit the agent by clicking on the Edit tab
- Scroll down to the RETRY ERRORS section and enable the Failed Request Retry switch
- Set the Max retry(n) value between
- Set the Retry with Interval(seconds) value between
300 seconds, if you want to delay a few seconds before retrying the same request
- You may also set the Max Time to Spent in Retry(seconds) to tell Agenty when to stop if the error continues
- Then Save the scraping agent configuration.
- Finally, re-run your agent.
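To make the three settings above more concrete, here is a minimal sketch of how a retry loop could combine a maximum retry count, a delay between attempts, and an overall time budget. It is an assumption about the general technique, not Agenty's code; `fetch_with_retry` and its parameters are hypothetical names that loosely mirror the Max retry, Retry with Interval, and Max Time to Spent in Retry fields.

```python
import time
from typing import Callable, Optional, Tuple


def fetch_with_retry(
    fetch: Callable[[], int],
    max_retries: int,
    retry_interval: float,
    max_retry_time: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, int]:
    """Call fetch() until it returns a non-error status or a limit is hit.

    fetch is any callable returning an HTTP status code (hypothetical).
    Returns (last_status, retries_used).
    """
    started = time.monotonic()
    status = fetch()
    retries = 0
    while 400 <= status <= 599 and retries < max_retries:
        # Stop early once the overall retry time budget is used up.
        if max_retry_time is not None and time.monotonic() - started >= max_retry_time:
            break
        sleep(retry_interval)  # wait before retrying the same request
        status = fetch()
        retries += 1
    return status, retries
```

For example, with `max_retries=4` a page that returns 502 twice and then 200 would be fetched three times in total, using two retries.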
Agenty doesn’t charge any extra page credits when retry-errors is enabled; you’ll be charged only for successful requests. So, it’s a web scraping best practice to have this feature enabled in your scraping agent.
To find out which requests failed and were retried by your scraping agent, check your agent logs. For example, in this screenshot the 502 HTTP error was retried 4 times.
Retries are not limited to status codes. You can also use the advanced rules option to define custom rules that tell Agenty when a request should be retried automatically.
For example, in this screenshot I added the rule selector-not-match : .price to retry scraping the web page if it doesn’t contain an element matching the selector given in the value.
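The logic behind a rule like this can be sketched as follows. This is only an illustration of the concept using Python's standard library, and it assumes the rule simply checks whether any element carries the given class; `selector_not_match` and `ClassFinder` are hypothetical names, and the sketch supports only simple `.class` selectors.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Record whether any start tag carries the given CSS class."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        # The class attribute may be absent or valueless, so default to "".
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.found = True


def selector_not_match(html: str, selector: str) -> bool:
    """Return True when no element matches a simple '.class' selector."""
    if not selector.startswith("."):
        raise ValueError("this sketch only supports '.class' selectors")
    finder = ClassFinder(selector[1:])
    finder.feed(html)
    return not finder.found
```

A page for which `selector_not_match(html, ".price")` returns True would be a candidate for a retry under such a rule.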