The Throttling feature in a scraping agent lets you add a delay of a few seconds between requests when scraping a batch of URLs. Use it to slow down your crawler when scraping a complex website, or to follow the Crawl-delay guideline given in the site's robots.txt.
The website scraping agent is blazing fast and can make multiple concurrent requests to make the most of the capacity available under your pricing plan and the power of the cloud machines backed by Agenty. However, with great power comes great responsibility, and the first rule of website scraping is not to harm the target website.
So, we can use the Throttling feature to add a delay of a few seconds between sequential requests. Agenty offers two delay types that make your scraping agent wait (n) seconds before making the next request:
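To decide how many seconds of delay to configure, it can help to check whether the target site publishes a Crawl-delay value. The sketch below is independent of Agenty: it uses Python's standard `urllib.robotparser` module, and the sample robots.txt content and the helper name `crawl_delay_from_robots` are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a live download.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def crawl_delay_from_robots(robots_txt, user_agent="*"):
    """Return the Crawl-delay in seconds for user_agent, or None if not set."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


print(crawl_delay_from_robots(SAMPLE_ROBOTS_TXT))  # 5
```

If the function returns a number, that value is a reasonable starting point for the delay setting; if it returns None, the site has not stated a preference.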
- Fixed Delay
- Random Delay
With Fixed Delay, the agent waits a fixed number of seconds after each sequential request in a web scraping job.
In the screenshot below, the Throttling feature is not used in this agent, so there is no delay and the scraping agent runs continuously through all the input URLs one by one.
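Conceptually, a fixed delay behaves like the minimal sketch below. This is not Agenty's implementation: the function name, the `fetch` callback, and the injectable `sleep` parameter are all hypothetical, chosen so the behavior can be observed without actually waiting.

```python
import time


def scrape_with_fixed_delay(urls, fetch, delay_seconds, sleep=time.sleep):
    """Fetch each URL in order, pausing delay_seconds after every request."""
    results = []
    for url in urls:
        results.append(fetch(url))
        sleep(delay_seconds)  # same pause after each sequential request
    return results


# Demo with a stand-in fetch function and a recorder in place of real sleeping.
waits = []
pages = scrape_with_fixed_delay(
    ["https://example.com/a", "https://example.com/b"],
    fetch=lambda url: f"<html for {url}>",
    delay_seconds=5,
    sleep=waits.append,
)
print(waits)  # [5, 5]
```

Passing `time.sleep` (the default) instead of `waits.append` makes the loop really pause between requests.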
Steps to configure Throttling in a scraping agent
Go to your agent page, and edit the agent by clicking on the Edit tab
Scroll down to the Advance option, find the Throttling section, and turn on the Enable Request Throttling switch
Select Fixed delay as the Delay Type from the drop-down
Now enter the seconds to delay value
Then Save the scraping agent configuration.
Finally, re-run your agent to see the difference.
We can check the agent logs to confirm that the agent waits for the given interval before making the next request, as in the screenshot below:
The Random Delay option adds a random delay, in seconds, after each request. Agenty automatically generates a random number between 0 and the value you enter in the Seconds of delay option, and then uses that number as the delay between the pages you are scraping.
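The random delay described above can be sketched as follows. This is an illustration only, not Agenty's code: the function name, the `fetch` callback, and the `sleep` and `rng` parameters are hypothetical, and drawing whole seconds with an inclusive upper bound is an assumption.

```python
import random
import time


def scrape_with_random_delay(urls, fetch, max_delay_seconds,
                             sleep=time.sleep, rng=random):
    """Fetch each URL in order, pausing a random 0..max seconds after each."""
    results = []
    for url in urls:
        results.append(fetch(url))
        # Assumption: whole seconds, both ends inclusive.
        wait = rng.randint(0, max_delay_seconds)
        sleep(wait)
    return results


# Demo with a stand-in fetch function and a recorder in place of real sleeping.
random_waits = []
scrape_with_random_delay(
    ["https://example.com/a", "https://example.com/b"],
    fetch=lambda url: f"<html for {url}>",
    max_delay_seconds=10,
    sleep=random_waits.append,
)
print(random_waits)  # two values, each between 0 and 10
```

A varying delay makes the request pattern less regular than a fixed one while keeping the average wait at roughly half the maximum.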
Follow the same steps as for Fixed delay, but select the Random delay option in the Type of delay drop-down instead, as in the screenshot below:
Now, if you read the agent logs, they show the agent waiting 7 seconds on the first URL and then 6 seconds on the second URL. So, Agenty keeps generating a random number between 0 and 10 (the maximum value given) and applies that number as the delay automatically.