Input Types in Web Scraping Agent

The Input Type setting controls which URLs a scraping agent runs against, and it can also be used to connect agents to each other through a URL. There are 4 input types in Agenty.

  1. Source URL Only
  2. Manual URLs
  3. Select a URL List
  4. URL From Source Agent

Source URL Only

When we create an agent from a URL, that URL becomes the source URL for the agent. The source URL is mandatory: we can edit it, but we cannot remove it. For example, we used the source URL https://cdn.agenty.com/sample_content/list/ecommerce-product-list.html to create an agent with 4 fields (ProductName, ProductPrice, ProductImage, ProductCartLink), as shown in the screenshot below.

To run the agent against the source URL only, select it as the input type.

Steps

  1. Go to your Scraping Agent page
  2. Click on the Input tab
  3. Now select the Input Type as “Source URL Only”
  4. Save the input configuration
  5. Now, re-run the agent to execute the job for the selected source URL.

Manual URLs

The Manual URLs input type is used to extract data in bulk from several pages that share the same structure. For example, I have these two URLs:

  1. https://cdn.agenty.com/sample_content/list/simple-list.html
  2. https://cdn.agenty.com/sample_content/list/list-2.html

As you can see, the two URLs have the same structure. So I created the agent from the first URL https://cdn.agenty.com/sample_content/list/simple-list.html with 5 fields (URL, Name, Brand, Color, Price), as shown in the screenshot below.

Before Manual URLs

Now I manually add all the URLs to my scraping agent (Manual URLs Example) to get the same fields.

Steps

  1. Go to your Scraping Agent page
  2. Click on the Input tab
  3. Now select the Input Type as “Manual URLs”
  4. Add the other URL to the URLs list
  5. Save the input configuration
  6. Re-run the agent to execute the job for the selected “Manual URLs”.

After Manual URLs

If you look at the updated result, the agent output now also contains the values from the other URL.
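Before adding URLs to the URLs list, it can help to tidy them up first. The snippet below is a minimal sketch in plain Python and does not use any Agenty API; the helper name `clean_url_list` is hypothetical, and printing one URL per line is an assumption about how the list is entered, not something this page confirms.

```python
from urllib.parse import urlparse


def clean_url_list(raw_urls):
    """Return unique, well-formed http(s) URLs in their original order."""
    seen = set()
    cleaned = []
    for url in raw_urls:
        url = url.strip()
        parsed = urlparse(url)
        # Skip anything that is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        # Skip duplicates so the same page is not scraped twice.
        if url in seen:
            continue
        seen.add(url)
        cleaned.append(url)
    return cleaned


urls = clean_url_list([
    "https://cdn.agenty.com/sample_content/list/simple-list.html",
    "https://cdn.agenty.com/sample_content/list/list-2.html",
    "https://cdn.agenty.com/sample_content/list/simple-list.html",  # duplicate
    "not a url",  # dropped: no scheme or host
])
print("\n".join(urls))
```

Only the two distinct sample URLs survive the cleanup, in the order they were given.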


Select a URL List

The Select a URL List input type lets us create and manage large numbers of input URLs for an agent. Entering a very large number of URLs in the manual input text area on the agent page is impractical and might freeze your browser because of the size of the in-memory text. This feature is especially helpful when we are scraping a big website whose pages share the same structure and we have a list of more than 5000 URLs. For example, we have this scraping agent (“Select a URL List Example”) with 5 fields (URL, Title, Description, Keywords, Canonical).


Now we want to run this agent against more URLs, so we use the Select a URL List input type.

Steps

  1. Click on the Input tab and select the Input type as “Select a URL List”
  2. Click on the Create new list button to create a list; the list page will open
  3. Enter the list Name and then choose the delimited file to upload
  4. Select the “Delimiter” that matches your file. For example, Comma (,) for CSV
  5. Tick the Has headers? check box if your file has headers, or un-check it if it does not, and Agenty will auto-generate headings with names like Field1, Field2…
  6. Before uploading the file, click on the Upload Preview button to make sure that Agenty is reading the file correctly with the settings you have applied
  7. If the data is populated correctly in the table preview, click on the Confirm upload button to finally upload the file
  8. Now come back to the Input tab and select the list that you want to use as input
  9. Finally, select the field in your list that contains the URL
  10. Save the input configuration
  11. Re-run the agent to see the updated result.
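The Delimiter and Has headers? settings in steps 4 and 5 can be checked locally before uploading. The following is a rough sketch using Python's standard `csv` module; it only imitates the idea of the preview and is not how Agenty reads the file. The function name `preview_delimited`, the sample column names, and the example.com URLs are hypothetical.

```python
import csv
import io


def preview_delimited(text, delimiter=",", has_headers=True, max_rows=5):
    """Parse delimited text and return (headers, first rows) for a quick check."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if has_headers:
        headers, data = rows[0], rows[1:]
    else:
        # No header row: generate names in the Field1, Field2 style.
        headers = [f"Field{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return headers, data[:max_rows]


# Hypothetical sample file content with a header row.
sample = (
    "url,category\n"
    "https://example.com/page-1,books\n"
    "https://example.com/page-2,toys\n"
)

headers, rows = preview_delimited(sample, delimiter=",", has_headers=True)
print(headers)
print(rows)

# The same content read as if it had no header row.
auto_headers, auto_rows = preview_delimited(sample, has_headers=False)
print(auto_headers)
```

If the wrong delimiter is chosen, every row collapses into a single column, which is the kind of problem the Upload Preview step is meant to catch.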


URL From Source Agent

The URL From Source Agent input type is used to connect a List agent and a Details agent. The List scraping agent is the source agent, and the Details scraping agent extracts data from each individual page using the URLs collected by the List scraping agent. It is also used to extract data in bulk from the different pages behind those links. For example, I have this source URL https://news.ycombinator.com/news. If you look at the content on that page, you will find a “Page URL” and a corresponding “Website URL” for each item. Now we create a scraping agent for both fields.

Steps

  1. Create the List agent with 2 fields (Page_URL, Website_URL). Here is the List agent: https://cloud.agenty.com/app/agents/34507ed25b
  2. Create the Details agent with 4 fields (Title, User_name, Votes, Comments). Here is the Details agent: https://cloud.agenty.com/app/agents/d76738cf2e
  3. Now go to the Input tab in the Details agent
  4. Select the Input type as “URL from Source Agent”
  5. Select the List agent in the “select the Agent” drop-down list
  6. Select “Collection1.Page_URL” in the “select the Field contains URL” drop-down list
  7. Save the input changes
  8. Re-run the agent to see the updated result: https://cloud.agenty.com/app/agents/d76738cf2e
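Conceptually, the chaining set up in the steps above means that one field from the List agent's rows becomes the input URL list of the Details agent. The sketch below illustrates that data flow in plain Python; it is not Agenty's implementation or API. The function names, the placeholder item URLs, and the `fake_scrape_page` stub are all hypothetical, and no network request is made.

```python
def urls_from_source(list_rows, url_field):
    """Collect the non-empty values of url_field from the list agent's rows."""
    return [row[url_field] for row in list_rows if row.get(url_field)]


def run_details(urls, scrape_page):
    """Call scrape_page once per URL and gather the results."""
    return [scrape_page(url) for url in urls]


def fake_scrape_page(url):
    # Placeholder: a real Details agent would extract these fields from the page.
    return {"URL": url, "Title": None, "User_name": None, "Votes": None, "Comments": None}


# Hypothetical rows shaped like the List agent's output (placeholder URLs).
list_rows = [
    {"Page_URL": "https://news.ycombinator.com/item?id=1", "Website_URL": "https://example.com/a"},
    {"Page_URL": "https://news.ycombinator.com/item?id=2", "Website_URL": "https://example.com/b"},
    {"Page_URL": "", "Website_URL": "https://example.com/c"},  # skipped: no Page_URL
]

details_input = urls_from_source(list_rows, "Page_URL")
results = run_details(details_input, fake_scrape_page)
for row in results:
    print(row["URL"])
```

Choosing Page_URL rather than Website_URL as the field mirrors step 6, where the field that contains the URL is selected for the Details agent.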
