Join Two or More Extracted Rows into Single Cell

The join option in a scraping agent lets you combine multiple extracted values into one cell. It is especially helpful when you are scraping an element that has multiple matches and want to combine them into a single delimited string.

For example, suppose you are scraping a product website and the product page displays multiple categories, sizes, images or colour variants. By default, the scraping agent displays each match in a separate row. Enabling the JoinResult option combines all matches into a single cell, so the data table keeps a one-product, one-row format.

The default join delimiter is a comma (,). You can also pass a custom delimiter using the JoinDelimiter post-processing function to tell Agenty which delimiter to use when joining the values.
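The effect of joining can be pictured with a few lines of Python. This is only a sketch of the outcome, not Agenty's implementation: the function name `join_matches` and its `delimiter` parameter are hypothetical names chosen for this illustration.

```python
def join_matches(values, delimiter=","):
    """Combine several extracted values into one delimited string.

    Hypothetical helper for illustration only; it mimics the result of
    enabling JoinResult, with `delimiter` standing in for JoinDelimiter.
    """
    return delimiter.join(values)


# Three matches from one field, as they would appear in separate rows.
matches = ["Home", "Books", "Poetry"]

default_joined = join_matches(matches)         # comma is the default
custom_joined = join_matches(matches, " > ")   # custom delimiter

print(default_joined)
print(custom_joined)
```

With the default delimiter the three matches become `Home,Books,Poetry`; with a custom delimiter of ` > ` they become `Home > Books > Poetry`.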

Example

In the product page screenshot below, the breadcrumb shows the category as Home > Books > Poetry, followed by the book name. The .breadcrumb a selector extracts 3 matches into separate rows, while product_name and price appear only on the first row.

http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html

[Screenshot: product page with multiple breadcrumb matches]

Before

[Screenshot: result table before enabling JoinResult]

Steps

  1. Edit the scraping agent by clicking on the Edit tab.
  2. Go to the field you want to join (in this case, Category) and enable the JoinResult switch.
    Enable the join result option in scraper
  3. Save the scraping agent configuration.
  4. Finally, re-run the scraping agent to apply the changes.

After

After re-running your web scraping agent, the matches for the field are joined into a single cell, as the Category column shows in the screenshot below.

After joining the scraping agent field
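The before and after tables can also be sketched in Python. The snippet below is an assumption-laden illustration, not Agenty's internal logic: `collapse_rows` is a hypothetical helper, and the row values (including the price) are sample data standing in for the screenshots.

```python
def collapse_rows(rows, join_field, delimiter=","):
    """Merge multi-row output into one row by joining one field's values.

    Hypothetical helper: other fields keep their first non-empty value.
    """
    merged = {}
    joined = []
    for row in rows:
        for key, value in row.items():
            if key == join_field:
                if value:
                    joined.append(value)
            elif value and key not in merged:
                merged[key] = value
    merged[join_field] = delimiter.join(joined)
    return merged


# "Before": product_name and price only on the first row, one category per row.
# Sample values for illustration only.
before = [
    {"product_name": "A Light in the Attic", "price": "51.77", "category": "Home"},
    {"product_name": "", "price": "", "category": "Books"},
    {"product_name": "", "price": "", "category": "Poetry"},
]

# "After": a single row with the category values joined by the default comma.
after = collapse_rows(before, "category")
print(after)
```

Here the three `category` values end up in one cell as `Home,Books,Poetry`, while `product_name` and `price` are carried over from the first row.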