Scraping Agent

An Agenty "Scraping Agent" is a container that holds the configuration (fields, selectors, URLs, etc.) for scraping a particular website. A scraping agent can extract data from public websites, password-protected websites, sitemaps, RSS feeds, XML pages, web APIs, flat JSON files and many other sources.

A scraping agent can be created using our Chrome extension, available on the Chrome Web Store.

One Agent for One Website

A single scraping agent can extract data from millions of similarly structured pages when you add a URL list to the agent input. It can also use advanced features such as pagination, crawling of password-protected sites by supplying the credentials automatically, and scripting to clean, validate or manipulate the result data.

Most websites use their own structure to display page content, so a single scraping agent can extract data only from the particular website it was set up for. On that website, however, it can extract any number of pages with a similar structure using pagination or a URL list.
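The one-agent-per-website idea can be sketched in Python: a single field configuration is written once and then applied to every page of the same site, which is what a URL list input does. This is a minimal sketch, not Agenty's actual agent format. The `FIELDS` mapping, its simplified `(tag, class)` selectors and the `scrape_pages` helper are hypothetical, and the pages are in-memory HTML strings instead of fetched URLs so the example runs offline.

```python
from html.parser import HTMLParser

# Hypothetical agent configuration: field name -> simplified (tag, class) selector.
# Real agents use CSS/XPath selectors; this sketch only matches a tag carrying a class.
FIELDS = {
    "name": ("h1", "product-name"),
    "price": ("span", "price"),
}


class FieldExtractor(HTMLParser):
    """Collects the text of the first element matching each field's (tag, class) selector."""

    def __init__(self, fields):
        super().__init__()
        self.fields = fields
        self.result = {name: None for name in fields}
        self.current = None  # field whose text is currently being captured
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if self.current is not None:
            return  # already capturing a field; nested matches are ignored
        classes = (dict(attrs).get("class") or "").split()
        for name, (sel_tag, sel_class) in self.fields.items():
            if self.result[name] is None and tag == sel_tag and sel_class in classes:
                self.current = name
                self.buffer = []
                return

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag of the selector's type ends the field.
        if self.current is not None and tag == self.fields[self.current][0]:
            self.result[self.current] = "".join(self.buffer).strip()
            self.current = None


def scrape_pages(pages, fields):
    """Apply the same field configuration to every page, like a URL list input."""
    rows = []
    for html in pages:
        parser = FieldExtractor(fields)
        parser.feed(html)
        parser.close()
        rows.append(parser.result)
    return rows


# Two hypothetical pages from the same site, sharing one structure.
pages = [
    '<h1 class="product-name">Ball Bearing 608</h1><span class="price">$4.99</span>',
    '<h1 class="product-name">Ball Bearing 6202</h1><span class="price">$7.50</span>',
]
for row in scrape_pages(pages, FIELDS):
    print(row)
```

A page that lacks one of the fields yields `None` for it, which is the kind of gap that cleaning and validation scripts would then handle. A page from a different website with a different structure would yield `None` for every field, which is why one agent is normally tied to one website.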

One Agent for Many Websites

In some cases, one agent can be used for many websites, for example:

  1. Meta tags: If you want to extract the meta tags used for SEO (title, description, canonical, etc.) from thousands of websites, you can use the same agent for all of them, because these tags follow the same standard HTML structure on every website.
  2. Structured Data: Google and most other search engines use structured data to display rich results about an organization, product, review, rating and more, and many popular websites add structured data markup to their pages to make them search-engine friendly. You can therefore create one agent that extracts structured data and reuse it on any website that publishes such markup, for example this JSON-LD block:
    <script type="application/ld+json">
    {
      "@context": "http://schema.org",
      "@type": "Organization",
      "url": "http://www.example.com",
      "name": "Unlimited Ball Bearings Corp.",
      "contactPoint": {
        "@type": "ContactPoint",
        "telephone": "+1-401-555-1212",
        "contactType": "Customer service"
      }
    }
    </script>
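Both cases work because this markup does not depend on a site's layout. The following Python sketch, using only the standard library, extracts the title, meta description, canonical URL and any JSON-LD blocks from an arbitrary page. The `SeoExtractor` class, the `extract_seo` function and the sample page are hypothetical; they only illustrate why one set of rules can be reused across websites and are not how Agenty implements its agents.

```python
import json
from html.parser import HTMLParser


class SeoExtractor(HTMLParser):
    """Site-independent extractor for SEO tags and JSON-LD structured data."""

    def __init__(self):
        super().__init__()
        self.data = {"title": None, "description": None, "canonical": None, "structured_data": []}
        self._in_title = False
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
            self._buffer = []
        elif tag == "meta" and (a.get("name") or "").lower() == "description":
            self.data["description"] = a.get("content")
        elif tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.data["canonical"] = a.get("href")
        elif tag == "script" and (a.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_title or self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self.data["title"] = "".join(self._buffer).strip()
            self._in_title = False
        elif tag == "script" and self._in_jsonld:
            try:
                self.data["structured_data"].append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip a malformed JSON-LD block instead of failing the whole page
            self._in_jsonld = False


def extract_seo(html):
    """Run the same extraction rules on any page, regardless of the site it comes from."""
    parser = SeoExtractor()
    parser.feed(html)
    parser.close()
    return parser.data


# Hypothetical sample page reusing the JSON-LD Organization markup shown above.
page = """<html><head>
<title>Unlimited Ball Bearings Corp.</title>
<meta name="description" content="Ball bearings for every application">
<link rel="canonical" href="http://www.example.com/">
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Organization",
 "url": "http://www.example.com", "name": "Unlimited Ball Bearings Corp."}
</script>
</head><body></body></html>"""

result = extract_seo(page)
print(result["title"])
print(result["structured_data"][0]["@type"])
```

Because the rules look only at standard tags (`title`, `meta name="description"`, `link rel="canonical"`, `script type="application/ld+json"`), the same function can be pointed at pages from any number of websites.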