Input Types in Web Scraping Agent

An agent's input type defines which URLs the agent will process, and it can also connect one agent to another through a URL. There are 4 input types in Agenty:

  1. Source URL Only
  2. Manual URLs
  3. Select a URL List
  4. URL From Source Agent

Source URL Only

When we create an agent from a URL, that URL becomes the source URL for that agent. A source URL is mandatory: we can edit it, but we cannot remove it. For example, we have the source URL https://cdn.agenty.com/sample_content/list/ecommerce-product-list.html and have created an agent with 4 fields (ProductName, ProductPrice, ProductImage, ProductCartLink), as shown in the screenshot below.

[Screenshot: Source URL Example, before]

Now we can set the agent to use only its source URL as input.

Steps

  1. Go to your Scraping Agent page
  2. Click on the Input tab
  3. Select the Input Type "Source URL Only"
  4. Save the input configuration
  5. Re-run the agent to execute the job for the selected source URL.

Manual URLs

The Manual URLs input type is used to extract data in bulk from different pages that share the same structure. For example, I have these two URLs:

  1. https://cdn.agenty.com/sample_content/list/simple-list.html
  2. https://cdn.agenty.com/sample_content/list/list-2.html

As you can see, both URLs have the same structure. So I created the agent from the first URL, https://cdn.agenty.com/sample_content/list/simple-list.html, with 5 fields (URL, Name, Brand, Color, Price), as shown in the screenshot below.

Before Manual URLs

[Screenshot: Manual URLs Example, before]

Now I manually add all the URLs to my scraping agent ("Manual URLs Example") to extract the same fields from each of them.

Steps

  1. Go to your Scraping Agent page

  2. Click on the Input tab

  3. Select the Input Type "Manual URLs"

  4. Enter the additional URLs in the URL list

  5. Save the input configuration

  6. Re-run the agent to execute the job for the selected "Manual URLs".
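
Before pasting a batch of URLs into the URL list, it can help to clean the batch first. The following is a minimal Python sketch, not part of Agenty: the helper name `prepare_manual_urls` is hypothetical, and it assumes the list holds one URL per line, which the steps above do not state. It drops blank lines, entries that are not absolute http(s) URLs, and duplicates.

```python
from urllib.parse import urlparse


def prepare_manual_urls(raw_text):
    """Return unique, well-formed http(s) URLs from raw_text, one per line.

    Hypothetical helper for illustration; it is not part of Agenty.
    """
    seen = set()
    urls = []
    for line in raw_text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # skip anything that is not an absolute http(s) URL
        if url in seen:
            continue  # skip duplicates, keeping the first occurrence
        seen.add(url)
        urls.append(url)
    return urls


# The two sample URLs from this section, plus a duplicate and a bad entry.
raw = """
https://cdn.agenty.com/sample_content/list/simple-list.html
https://cdn.agenty.com/sample_content/list/list-2.html
https://cdn.agenty.com/sample_content/list/simple-list.html
not-a-url
"""
manual_urls = prepare_manual_urls(raw)
print("\n".join(manual_urls))
```

The printed lines can then be pasted into the URL list in step 4. The sketch only checks that each URL is well formed; it does not check that the pages share the same structure.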

After Manual URLs

Now, if you look at the updated result, it also contains the values from the additional URLs.

[Screenshot: Manual URLs Example, after]

Select a URL List

The Select a URL List input type lets us create and manage large numbers of input URLs for an agent. Entering that many URLs in the manual input text area on the agent page is impractical and might freeze your browser because of the size of the in-memory text. This feature is especially helpful when we are scraping a big website whose pages share the same structure and we have a list of more than 5000 URLs. For example, we have this scraping agent ("Select a URL List Example") with 5 fields (URL, Title, Description, Keywords, Canonical).

[Screenshot: Select a URL List Example, before]

Now we want to extract these fields from more URLs, so we use the Select a URL List input type.

Steps

  1. Click on the Input tab and select Input type as "Select a URL List"
  2. Click the Create new list button to create a list; the list page appears
  3. Enter the list Name and then choose the delimited file to upload
  4. Select the "Delimiter" that matches your file, for example Comma (,) for CSV
  5. Check the Has headers? box if your file has a header row; uncheck it if it does not, and Agenty will auto-generate headings with names like Field1, Field2, ...
  6. Before uploading the file, click the Upload Preview button to check that Agenty is reading the file correctly with the settings you have applied
  7. If the data is populated correctly in the table preview, click the Confirm upload button to upload the file
  8. Return to the Input tab and select the list you want to use as input
  9. Finally, select the field which contains the URL in your list
  10. Save the input configuration
  11. Re-run the agent to see the updated result.
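
To check a delimited file locally before uploading it, the settings from the steps above (delimiter, header row, URL field) can be mimicked in a few lines of Python. This is only a sketch under stated assumptions, not Agenty's implementation: the function names `read_url_list` and `urls_from_field` are hypothetical, the example.com URLs are placeholders, and the Field1, Field2, ... naming follows the description in step 5.

```python
import csv
import io


def read_url_list(text, delimiter=",", has_headers=True):
    """Parse delimited text into (headers, list of row dicts).

    When has_headers is False, headings are generated as Field1, Field2, ...
    Hypothetical helper for illustration only.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # drop empty lines
    if not rows:
        return [], []
    if has_headers:
        headers, data = rows[0], rows[1:]
    else:
        headers = [f"Field{i}" for i in range(1, len(rows[0]) + 1)]
        data = rows
    return headers, [dict(zip(headers, row)) for row in data]


def urls_from_field(records, field):
    """Return the non-empty values of the column that contains the URL."""
    return [rec[field].strip() for rec in records if rec.get(field, "").strip()]


# A comma-separated file with a header row (placeholder data).
with_headers = "name,url\nPage A,https://example.com/a\nPage B,https://example.com/b\n"
headers, records = read_url_list(with_headers, delimiter=",", has_headers=True)
print(headers)
print(urls_from_field(records, "url"))

# A semicolon-separated file without a header row (placeholder data).
no_headers = "Page A;https://example.com/a\nPage B;https://example.com/b\n"
headers2, records2 = read_url_list(no_headers, delimiter=";", has_headers=False)
print(headers2)
print(urls_from_field(records2, "Field2"))
```

If the wrong delimiter is chosen, each row collapses into a single column, which is the kind of problem the Upload Preview step is meant to catch.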

[Screenshot: Select a URL List Example, after]

URL From Source Agent

The URL From Source Agent input type is used to connect a List agent and a Details agent. The List scraping agent is the source agent, and the Details scraping agent extracts data from each individual page using the URLs produced by the List scraping agent. It is also used to extract data in bulk from the different pages those links point to. For example, I have the source URL https://news.ycombinator.com/news. If you look at the content on that page, you will find a "Page URL" corresponding to each "Website URL". Now we create a scraping agent for both fields.

Steps

  1. Create the List agent with 2 fields (Page_URL, Website_URL). Here is the List agent: https://cloud.agenty.com/app/agents/34507ed25b
  2. Create the Details agent with 4 fields (Title, User_name, Votes, Comments). Here is the Details agent: https://cloud.agenty.com/app/agents/d76738cf2e
  3. Go to the Input tab in the Details agent
  4. Select the Input type "URL from Source Agent"
  5. Select the List agent in the "select the Agent" drop-down list
  6. Select "Collection1.Page_URL" in the "select the Field contains URL" drop-down list
    [Screenshot: Details agent input]
  7. Save the input changes
  8. Re-run the agent to see the updated result: https://cloud.agenty.com/app/agents/d76738cf2e
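
The List-to-Details hand-off in the steps above can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the shape of the List agent's result (a dictionary keyed by collection name), the function name `urls_from_source_agent`, and the sample rows are hypothetical, and no Agenty API is called. It only shows how a field path such as "Collection1.Page_URL" selects the URLs that become the Details agent's input.

```python
def urls_from_source_agent(source_result, field_path):
    """Pick the input URLs for a Details agent from a List agent's result.

    field_path has the form "<collection>.<field>", for example
    "Collection1.Page_URL". Hypothetical helper for illustration only.
    """
    collection, field = field_path.split(".", 1)
    rows = source_result.get(collection, [])
    # Keep only rows that actually have a value in the chosen field.
    return [row[field] for row in rows if row.get(field)]


# Assumed shape of the List agent's output; the rows are made-up samples.
list_agent_result = {
    "Collection1": [
        {"Page_URL": "https://news.ycombinator.com/item?id=1",
         "Website_URL": "https://example.com/first"},
        {"Page_URL": "https://news.ycombinator.com/item?id=2",
         "Website_URL": "https://example.com/second"},
        {"Page_URL": "", "Website_URL": "https://example.com/no-page"},
    ]
}

details_input = urls_from_source_agent(list_agent_result, "Collection1.Page_URL")
for url in details_input:
    print(url)
```

Choosing "Collection1.Website_URL" instead would send the Details agent to the external sites rather than to the pages that hold Title, User_name, Votes, and Comments.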