Throttling in Web Scraping

The Throttling feature in a scraping agent lets you add a delay of a few seconds between requests when scraping a batch of URLs. Use it to slow down your crawler when scraping a complex website, or to follow the Crawl-delay guideline a website gives in its robots.txt.
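To pick a delay value that matches a site's Crawl-delay guideline, you can read the value from its robots.txt before configuring the agent. The sketch below is not part of Agenty; it uses Python's standard `urllib.robotparser` module, and the function name `crawl_delay_from_robots` and the sample robots.txt content are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a live download.
EXAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def crawl_delay_from_robots(robots_txt, user_agent="*"):
    """Return the Crawl-delay (in seconds) for user_agent, or None if not set."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    delay = crawl_delay_from_robots(EXAMPLE_ROBOTS)
    print("Suggested delay in seconds:", delay)
```

If the function returns a number, that number is a reasonable starting point for the seconds of delay in the throttling settings.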

The website scraping agent is fast and can make multiple concurrent requests, making the most of the capacity available under your pricing plan and the power of the cloud machines behind Agenty. However, with great power comes great responsibility, and the first rule of website scraping is not to harm the target website.

So, we can use the Throttling feature to add a delay of a few seconds between sequential requests. Agenty offers two delay options that make your scraping agent wait n seconds before making the next request:

  1. Fixed Delay
  2. Random Delay

Fixed Delay

A fixed number of seconds to wait after each sequential request in a web scraping job.
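Conceptually, a fixed delay behaves like the loop below. This is a minimal sketch of the idea, not Agenty's implementation; the function `scrape_with_fixed_delay`, the `fetch` callback, and the example URLs are all hypothetical.

```python
import time


def scrape_with_fixed_delay(urls, fetch, delay_seconds, sleep=time.sleep):
    """Call fetch(url) for each URL, waiting delay_seconds after each request."""
    results = []
    for url in urls:
        results.append(fetch(url))
        # Same wait after every request, regardless of the URL.
        sleep(delay_seconds)
    return results


if __name__ == "__main__":
    # Stand-in for a real HTTP request, so the sketch runs offline.
    def fake_fetch(url):
        return "scraped " + url

    pages = scrape_with_fixed_delay(
        ["https://example.com/a", "https://example.com/b"],
        fake_fetch,
        delay_seconds=1,
    )
    print(pages)
```

The `sleep` parameter is only there so the wait can be replaced in tests; in normal use the default `time.sleep` applies.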

Before

In the screenshot below, the Throttling feature is not enabled for this agent, so there is no delay and the scraping agent processes all the input URLs one after another without pausing.

scraping data before throttling

Steps to configure Throttling in a scraping agent

Steps

  1. Go to your agent page, and edit the agent by clicking on the Edit tab

  2. Scroll down to the Advance option, find the Throttling section, and turn on the Enable Request Throttling switch

  3. Select Fixed delay as the Delay Type from the drop-down

  4. Enter the number of seconds to delay

  5. Then Save the scraping agent configuration.

    fixed seconds throttling

  6. Finally, re-run your agent to see the difference.

After

The agent logs confirm that the agent now waits for the given interval before making the next request, as in the screenshot below:

scraping data with throttling

Random Delay

The Random Delay option adds a random number of seconds of delay after each request. Agenty automatically generates a random number between 0 and the value you enter in the Seconds of delay option, and then waits that many seconds before scraping the next page.
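The random delay can be pictured as drawing a new wait time for every request. The sketch below illustrates the idea only and is not Agenty's implementation; the function names are hypothetical, and drawing whole seconds with both bounds included is an assumption.

```python
import random
import time


def random_delay(max_seconds, rng):
    """Pick a whole number of seconds between 0 and max_seconds (inclusive)."""
    # Assumption: whole seconds, with 0 and max_seconds both possible.
    return rng.randint(0, max_seconds)


def scrape_with_random_delay(urls, fetch, max_seconds, sleep=time.sleep, rng=None):
    """Call fetch(url) for each URL, waiting a freshly drawn delay after each."""
    rng = rng or random.Random()
    results = []
    for url in urls:
        results.append(fetch(url))
        # A new random wait is drawn for every request.
        sleep(random_delay(max_seconds, rng))
    return results


if __name__ == "__main__":
    # Print the drawn delays instead of actually sleeping.
    scrape_with_random_delay(
        ["https://example.com/a", "https://example.com/b"],
        fetch=lambda url: "scraped " + url,
        max_seconds=10,
        sleep=lambda s: print("waiting", s, "seconds"),
    )
```

Varying the wait makes the request pattern less regular than a fixed delay, while the maximum value still caps how long any single pause can be.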

Steps

Follow the same steps as for Fixed Delay, but select the Random delay option in the Type of delay drop-down, as in the screenshot below:

random throttling feature

After

Now the agent logs show a 7-second wait after the first URL and a 6-second wait after the second. Agenty keeps generating a random number between 0 and 10 (the maximum value entered) and applies it as the delay automatically.

random seconds delay in web scraping