Creating a Scraping Agent

Agenty scraping agents are an easy and powerful tool for website scraping. Using an Agenty scraping agent, you can:

  1. Create and host web scraping agents online
  2. Enter URLs manually as input, or upload a CSV file for batch crawling
  3. Start, schedule and paginate agents to extract data automatically
  4. Scrape data anonymously using our managed, distributed servers with thousands of proxies in the cloud
  5. Crawl password-protected websites easily
  6. Get email alerts when your extraction jobs complete, or configure a webhook to post the scraped data to your server automatically
  7. Use the REST API to start and schedule agents, fetch the scraped data, change URLs and more
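As a rough illustration of the webhook option in item 6, the sketch below shows a tiny server that accepts a posted result and keeps it in memory. It is not Agenty's documented format: the JSON body, the row shape, the port and the names parse_rows, WebhookHandler, received_rows and serve are all assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received_rows = []  # in-memory store, for illustration only


def parse_rows(body):
    """Decode a JSON request body into a list of row dicts."""
    payload = json.loads(body.decode("utf-8"))
    # Assumption: the body is either a list of rows or a single row object.
    return payload if isinstance(payload, list) else [payload]


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            rows = parse_rows(self.rfile.read(length))
        except ValueError:
            # Covers invalid JSON and invalid UTF-8 (both are ValueError subclasses).
            self.send_response(400)
            self.end_headers()
            return
        received_rows.extend(rows)
        self.send_response(200)
        self.end_headers()


def serve(port=8000):
    """Block and handle incoming webhook posts until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener. A real receiver would write the rows to a database or file rather than a list, and would check the actual payload format against the product documentation.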

To set up a scraping agent, you need to install the Agenty Chrome extension.

Create Agents

Go to the webpage you want to scrape, then launch the extension. It will display a panel on the right side of the page, as shown in the screenshot.

Once the extension panel is visible, click the New button to add a field and give the field a name; I named my first field ProductName. Then click the (asterisk) button to enable the point-and-click feature, which generates CSS selectors automatically when you click the HTML element you want to scrape. For example, I wanted to scrape product names into this field, so I clicked on a product name. The extension generated the selector and highlighted the other products in the list that match the same selector.

Sometimes other items are selected as well, because they share the same CSS class or selector. You can click on the yellow highlighted items to deselect them or, once you have learned the CSS selector syntax, write your selector manually.

The extension highlights the matching elements and shows a preview of the result. Once you are satisfied with the result and the number of records matches your expectation, click the Accept button to save the field in your agent.
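To see why unrelated items can end up highlighted, here is a minimal sketch of class-based matching using only Python's standard library. It is not how the extension works internally. The sample HTML, the "title" class and the names ClassTextCollector and select_text are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag or text content.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1             # nested tag inside a match
        elif self.css_class in classes:
            self.depth = 1              # a new match starts here
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def select_text(html, css_class):
    parser = ClassTextCollector(css_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]


# Hypothetical page: two products plus a sidebar item that reuses the class.
SAMPLE_HTML = """
<ul>
  <li><span class="title">Red Shirt</span></li>
  <li><span class="title">Blue Jeans</span></li>
</ul>
<div class="sidebar"><span class="title">Sponsored Item</span></div>
"""

product_names = select_text(SAMPLE_HTML, "title")
print(product_names)  # ['Red Shirt', 'Blue Jeans', 'Sponsored Item']
```

The sidebar entry is picked up only because it shares the class. That is the kind of extra match you would deselect in the panel, or avoid by writing a narrower selector.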

Now follow the same process to add as many fields as you want, whether text, attribute or HTML items, to scrape anything from HTML pages. If you want to extract a link, image or any other attribute from an HTML tag, use the ATTR option in the Extract drop-down. It displays a new text box where you can enter the name of the attribute to extract, instead of the plain TEXT or HTML.

For example -

  • Image scraping: For images I want to extract the "src" value, so after generating my selector I selected the ATTR option and entered "src" in the corresponding text box. This tells the extractor that I need the value of src in the output instead of the entire HTML of the image.
  • Link scraping: To scrape URL links, write your selector, select the ATTR option and enter "href" in the corresponding text box. This tells the extractor that you need the value of href in the output instead of the entire HTML or text.
  • The ATTR (attribute) option is a powerful extractor feature and can be used to extract any attribute from an HTML tag.
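The idea behind ATTR extraction can be sketched in a few lines of standard-library Python. This is an illustration, not the extension's own code: it matches elements by tag name instead of a full CSS selector to keep the example short, and the sample HTML and the names AttrExtractor and extract_attr are hypothetical.

```python
from html.parser import HTMLParser


class AttrExtractor(HTMLParser):
    """Collect one attribute's value from every occurrence of a tag."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag = tag
        self.attr = attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            value = dict(attrs).get(self.attr)
            if value is not None:   # skip tags that lack the attribute
                self.values.append(value)


def extract_attr(html, tag, attr):
    parser = AttrExtractor(tag, attr)
    parser.feed(html)
    parser.close()
    return parser.values


# Hypothetical product listing with linked images.
SAMPLE_HTML = """
<a href="/products/red-shirt"><img src="/img/red-shirt.jpg" alt="Red Shirt"></a>
<a href="/products/blue-jeans"><img src="/img/blue-jeans.jpg" alt="Blue Jeans"></a>
"""

image_urls = extract_attr(SAMPLE_HTML, "img", "src")   # like ATTR = src
link_urls = extract_attr(SAMPLE_HTML, "a", "href")     # like ATTR = href
print(image_urls)  # ['/img/red-shirt.jpg', '/img/blue-jeans.jpg']
print(link_urls)   # ['/products/red-shirt', '/products/blue-jeans']
```

The same function returns the alt text when called with "alt", which mirrors how a single ATTR option covers any attribute on a tag.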

Once you have set up all the fields, click the Done button and a dialog box will appear. Enter your API Id and Admin API Key in the text boxes under Send to Cloud Hosted App and click the Save button; the agent will be created in your online account. (If you don't have an API Id and key, you can get them by logging in and going to your account page in the hosted app.)

The API Id and Key are stored in Chrome local storage the first time you enter them, so they are remembered for future use. If you want to change them later, just paste the new values; otherwise the stored ones continue to be used.

Once the agent is created, you can click the link in the success message, which takes you to the agent page. There you can start, schedule and further configure the agent to manage and automate your data collection using the hosted scraping app.
