Infinite Scrolling, Load More and Next Click Pagination in Web Scraping

Pagination is a common technique web developers use to display a large set of products or items across search or list pages, instead of loading the entire set in a single page load.

Setting up pagination in Agenty to scrape multiple pages, whether by clicking a Next button, infinite scrolling, or a Load More button, is easy and in most cases doesn’t require any technical skills.

In this article, I discuss the different kinds of pagination used on websites, along with some pro techniques, and show how to configure your web scraping agent to paginate automatically and scrape data from paginated websites.

Options

  • Enable pagination : True/False
  • Pagination type : Click, Infinite-Scroll or Load-More. The type of pagination you want to run in your scraping agent
  • Next page selector : The unique CSS selector of the Next button. The agent clicks that button to paginate until it is hidden or disabled
  • Script : Advanced JavaScript expression that lets developers write their own pagination code to handle complex sites
  • Page limits : Maximum number of pages to paginate. The value can be anything, such as 100 or 1000, but the web scraper exits pagination early if the “Next” button is not found or is disabled, or if the end of the page is reached. In other words, pagination keeps running until it reaches the page limit you set or the Next button becomes invisible or disabled on the web page.
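The stop conditions described for the page limit can be sketched as a small function. This is only an illustration of the logic, not Agenty's implementation: the name shouldPaginate, the pagesDone counter, and the minimal button object are assumptions made for the example.

```javascript
// Hypothetical helper (not an Agenty API): decides whether a click-pagination
// loop should continue. `nextButton` is the element matched by the Next page
// selector, or null when nothing matched.
function shouldPaginate(pagesDone, pageLimit, nextButton) {
  if (pagesDone >= pageLimit) return false; // page limit reached
  if (!nextButton) return false;            // "Next" button not found
  if (nextButton.disabled || nextButton.hidden) return false; // disabled or hidden
  return true;
}
```

With a page limit of 100, the loop would still end on page 7 if that is where the Next button disappears.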

Next Button Pagination

Next button pagination is the most common type of pagination on websites: the page has a button (or hyperlink) labeled “Next” that you click to go to the next page. For example, on the web page in this screenshot :

  • It has a Next button at the bottom of the page.
  • If you use the Agenty Chrome extension and click on the button, you can easily find its CSS selector. Alternatively, view it in the page source or with Inspect Element if you are comfortable with Chrome Developer Tools.

Go to the web page you want to crawl and find the unique CSS selector of the Next button, either with the Agenty Chrome extension or manually by inspecting the element in Chrome Developer Tools if you are a developer like me :)

For example, using the extension on this page, I found that a.next is the unique selector for the Next button to click.

next button pagination web scraping
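If you find the selector by hand, a quick check that it really is unique can save a failed run. The helper below is a minimal sketch; isUniqueSelector is a name made up for this example, and it only assumes the standard querySelectorAll method.

```javascript
// Returns true when `selector` matches exactly one element under `root`.
// In the Chrome console, pass `document` as `root`.
function isUniqueSelector(root, selector) {
  return root.querySelectorAll(selector).length === 1;
}
```

In the Chrome console on the page being scraped, isUniqueSelector(document, "a.next") returns true only when a.next matches a single element.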

Configure Pagination

  • Go to your scraping agent page and click on the Edit tab, which takes you to the advanced agent editor shown in the screenshot below.

edit the scraper

  • Scroll down to find the Pagination section and enable the pagination switch
  • Select the pagination type : Click
  • Enter the Next button selector in “Next page CSS selector” box
  • Then, enter the “Max pages” value to limit the maximum number of pages to scrape

configure next button pagination to scrape multiple pages

  • Once the pagination configuration is complete, save the agent (or scraper, if you call it that) and re-run it to scrape data from multiple pages automatically.

Infinite Scrolling Pagination

Infinite scrolling is a web design technique that loads content on list pages continuously as the user scrolls down the page in the browser, eliminating the need for pagination with next-previous buttons. This is mostly done in JavaScript, using AJAX requests and front-end libraries or frameworks such as jQuery, AngularJS, React, or Vue.js, and the responses to those requests are mostly in JSON or XML format.

A typical infinite scrolling page sends an HTTP GET or POST request to the server in the background to fetch the data. The response handler function then parses the response and appends the new items to the list or search container on the web page, so more and more items appear as the user scrolls down the page.
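To make the mechanism concrete, here is a rough sketch of what such a response handler might look like. The JSON shape (an items array and a has_next flag) and the function name handleResponse are assumptions for illustration; every site defines its own response format.

```javascript
// Assumed response shape: {"items": [{"title": "..."}], "has_next": true}
// `container` is the list element and `doc` is the page's document object.
function handleResponse(responseText, container, doc) {
  const data = JSON.parse(responseText);
  for (const item of data.items) {
    const row = doc.createElement("li");
    row.textContent = item.title; // textContent avoids injecting raw HTML
    container.appendChild(row);
  }
  return data.has_next; // tells the caller whether another page exists
}
```

Because the new items are appended to the same container, the page URL never changes, which is why a scraper has to trigger the scroll instead of following a link.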

So scraping data from infinite scrolling pages is a bit different from the usual next-previous pagination we saw at the start of this post, where we just clicked the Next button to load the next page and continued scraping until the button was no longer there.

Infinite Scrolling Website

To start scraping infinite scrolling web pages, follow these steps :

  • Edit your scraping agent and enable the Pagination
  • Select the pagination type : Infinite-Scroll
  • Next page CSS selector : Leave it blank if there is no selector to enter, or enter a particular element selector if you want Agenty to scroll or mouse over to somewhere specific instead of scrolling to the bottom of the page. By default, Agenty goes to the end of the page.
  • Max pages : Set the maximum number of scrolls to limit how many pages you want to scrape with infinite scrolling
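Conceptually, the steps above amount to a scroll loop like the sketch below. It is not Agenty's code: the page adapter (scrollToBottom and getHeight), the one-second wait, and the height comparison used to detect the end of the list are all assumptions made for the example.

```javascript
// `page` is a hypothetical adapter, not an Agenty API. In a browser it could be:
//   { scrollToBottom: () => window.scrollTo(0, document.body.scrollHeight),
//     getHeight: () => document.body.scrollHeight }
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrollUntilDone(page, maxPages, wait = sleep) {
  let scrolls = 0;
  let lastHeight = await page.getHeight();
  while (scrolls < maxPages) {
    await page.scrollToBottom();
    await wait(1000); // give the background request time to finish
    const height = await page.getHeight();
    if (height === lastHeight) break; // nothing new loaded: end of the list
    lastHeight = height;
    scrolls += 1;
  }
  return scrolls; // number of scrolls that loaded new content
}
```

The loop ends on whichever comes first, the Max pages value or a scroll that loads nothing new.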

Infinite Scrolling Website Scraping

  • Just save the agent and run it to scrape data from the infinite scrolling website.

Infinite Scrolling Website Scraping Output

  • If you want to try it out, the scraping agent is available in the demo agents under the name “Quotes- Infinite scrolling pagination”. Just clone it into your account and learn how to crawl infinite scrolling AJAX websites.

Load More Pagination

Load More pagination is almost the same as infinite scroll. The only difference is that you will see a Load More or View More button at the end of the page.

So, instead of just scrolling down, we also need to click the Load More button to load more items on the web page.

Load More Pagination Scraping

Follow the steps below to scrape data from pages with Load-More pagination :

  • Select the pagination type : Load-More
  • Enter the CSS selector of the button that Agenty should click to load more items
  • Set the max pages limit (n) to tell Agenty the maximum number of pages to crawl
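The behaviour configured by these steps can be pictured as a click loop. The sketch below is an assumption about the general technique, not Agenty's internals; clickLoadMore, the one-second wait, and the example selector button.load-more are hypothetical.

```javascript
// Clicks the button matched by `selector` until it disappears, is disabled,
// or `maxPages` clicks have been made. `root` is `document` in a browser.
async function clickLoadMore(root, selector, maxPages, wait) {
  let clicks = 0;
  while (clicks < maxPages) {
    const button = root.querySelector(selector);
    if (!button || button.disabled) break; // no more items to load
    button.click();
    clicks += 1;
    await wait(1000); // let the new items render before the next click
  }
  return clicks;
}
```

In a browser this could be called as clickLoadMore(document, "button.load-more", 5, (ms) => new Promise((r) => setTimeout(r, ms))).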

Load More Pagination Configuration

Pagination with JavaScript Injection

If you are a professional web scraper, you know the web is vast and that websites differ in how complex they are to scrape and in the techniques they require. Sometimes you need to wait a few seconds before clicking the Next button so that the pagination looks more like a real human, and sometimes you need to wait for a particular element to become visible before scraping the pages behind the pagination.

So the JavaScript injection option in the scraping agent allows developers to write their own code and insert it into the page to take full control of pagination. Just bring your own code and logic to tell Agenty :

  • What element to wait or watch for
  • Where to click/hover for pagination
  • When to stop the pagination (or exit to continue on next input URL)

That’s it. Nothing gets in your way.
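As an example of the "wait or watch for" case, the sketch below polls the page until an element appears. It is a generic browser-side pattern; waitForElement, the timeout values, and the example selector are assumptions, and whether the Script option waits for asynchronous code to finish is not covered in this article, so test it on your own agent first.

```javascript
// Polls until `selector` matches an element under `root`, or gives up after
// `timeoutMs` and returns null. `root` is `document` in a browser.
async function waitForElement(root, selector, timeoutMs = 10000, intervalMs = 250) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const element = root.querySelector(selector);
    if (element) return element;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return null; // the element never appeared within the timeout
}
```

For instance, waitForElement(document, ".quote") resolves with the first matching element, and a null result can be used as the signal to stop paginating.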

Test your Script

  • Go to the page you are crawling
  • Open Developer tools in Chrome and go to Sources tab
  • Click on the Snippets option and then New snippet to open the code editor
  • Here, you can write and debug your script. Press Ctrl + Enter to execute it, or click the Play icon in the bottom right corner of the code editor, as in this screenshot.

Test JavaScript in Chrome Developer Snippets

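    // Find the Next link; the selector depends on the page being scraped
    var element = document.querySelector(".next a");
    if (element && element.getAttribute("href") != "#")
    {
        element.click(); // Click on the Next button
    }
    else
    {
        throw "No more pages"; // Next link missing or inactive: exit the pagination
    }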

Apply script in agent

  • Go to the agent page
  • Enable pagination and select the appropriate pagination type
  • Enter the JavaScript code so that your custom JS runs instead of Agenty's built-in pagination module

Remember: It’s important to select the right pagination type. It tells Agenty whether more data will appear on the same page (infinite-scroll pagination) or on the next page (click pagination), so that it can handle the data extraction accordingly.

Configure JavaScript for Web Scraping Pagination

  • Save your agent configuration and re-run it.