CSS Extractor

The Agenty CSS extractor offers multiple extract types for pulling data out of an HTML element. You can define the CSS selector with the Agenty Chrome extension or manually in the scraping agent editor, and then choose the extract type from the "Extract" drop-down box, without writing any code. These 5 extract types are available in a scraping agent when the field type is set to CSS:

  1. TEXT
  2. HTML
  3. ATTR
  4. InnerHTML
  5. OuterText

TEXT

The TEXT extractor is used to extract the plain text of the given CSS selector. For example, if I want to extract only the text of the p tag from the HTML sample below, I can use the .page > p selector and then set the extract type to TEXT in the Agenty CSS extractor.

Sample HTML :

<div class="page">
<h1>Some Heading</h1>
<p>This is some text with <a href="https://www.domain.com">link to other page.</a></p>
</div>

Selector :

.page > p

Type : CSS

Extract : TEXT

Result :

This is some text with link to other page.
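
The TEXT behaviour can be reproduced outside Agenty for comparison. The sketch below is only an illustration using Python's standard xml.etree.ElementTree module, not Agenty's own implementation. It assumes the sample is well-formed (outer div closed), and since ElementTree has no CSS selector support, the path "./p" stands in for the .page > p selector.

```python
import xml.etree.ElementTree as ET

# Sample markup from above, with the outer div closed so it parses as XML.
SAMPLE = (
    '<div class="page">'
    '<h1>Some Heading</h1>'
    '<p>This is some text with '
    '<a href="https://www.domain.com">link to other page.</a></p>'
    '</div>'
)

root = ET.fromstring(SAMPLE)  # root is the div.page element
p = root.find("./p")          # stand-in for the ".page > p" selector

# TEXT: join every text node under the element, dropping the tags.
text = "".join(p.itertext())
print(text)  # This is some text with link to other page.
```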

HTML

The HTML extract type is used to extract the complete HTML source of the given selector. For example, if I want to extract the complete p tag from the HTML sample below, I can use the .page > p selector and then set the extract type to HTML in the CSS extractor.

Sample HTML :

<div class="page">
<h1>Some Heading</h1>
<p>This is some text with <a href="https://www.domain.com">link to other page.</a></p>
</div>

Selector :

.page > p

Type : CSS

Extract : HTML

Result :

<p>This is some text with <a href="https://www.domain.com">link to other page.</a></p>

ATTR

The ATTR extractor is used to extract the value of any attribute on the selected element. It is mostly used to extract hyperlinks and images, but it can extract any attribute value when given the name of the attribute to extract. For example, to extract the href link from the sample HTML below, I can use .page > p > a as my selector and then the ATTR option with href as the attribute name to extract its value. Because the a tag is nested inside the p tag, the child combinator has to pass through p.

Sample HTML :

<div class="page">
<h1>Some Heading</h1>
<p>This is some text with <a href="https://www.domain.com">link to other page.</a></p>
</div>

Selector :

.page > p > a

Type : CSS

Extract : ATTR

Attribute: href

Result :

https://www.domain.com
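
As a rough cross-check of the ATTR result, the same lookup can be sketched with Python's standard xml.etree.ElementTree module. This is an assumption-laden illustration, not how Agenty works internally: the sample is closed so it parses as XML, and the path "./p/a" is a hand-written stand-in for a CSS selector, descending through the p tag because that is where the link sits.

```python
import xml.etree.ElementTree as ET

# Sample markup from above, with the outer div closed so it parses as XML.
SAMPLE = (
    '<div class="page">'
    '<h1>Some Heading</h1>'
    '<p>This is some text with '
    '<a href="https://www.domain.com">link to other page.</a></p>'
    '</div>'
)

root = ET.fromstring(SAMPLE)  # root is the div.page element
link = root.find("./p/a")     # the a tag is a child of p, not of div.page

# ATTR: read one named attribute from the selected element.
href = link.get("href")
print(href)  # https://www.domain.com
```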

InnerHTML

The InnerHTML extract type is used to extract the inner HTML of the given selector. For example, if I want the content of the p tag from the sample HTML below, but without the p tag itself in my result, I can use the .page > p selector and then set the extract type to InnerHTML in the CSS extractor.

Sample HTML :

<div class="page">
<h1>Some Heading</h1>
<p>This is some text with <a href="https://www.domain.com">link to other page.</a></p>
</div>

Selector :

.page > p

Type : CSS

Extract : InnerHTML

Result :

This is some text with <a href="https://www.domain.com">link to other page.</a>
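
The difference between the HTML and InnerHTML extract types is easiest to see side by side. The following is a minimal sketch with Python's standard xml.etree.ElementTree module, assuming a well-formed sample and using "./p" in place of the .page > p selector. The variable names outer_html and inner_html are illustrative only and are not Agenty terms.

```python
import xml.etree.ElementTree as ET

# Sample markup from above, with the outer div closed so it parses as XML.
SAMPLE = (
    '<div class="page">'
    '<h1>Some Heading</h1>'
    '<p>This is some text with '
    '<a href="https://www.domain.com">link to other page.</a></p>'
    '</div>'
)

root = ET.fromstring(SAMPLE)
p = root.find("./p")  # stand-in for the ".page > p" selector

# HTML: the element serialized together with its own opening and closing tag.
outer_html = ET.tostring(p, encoding="unicode")

# InnerHTML: the leading text plus each serialized child, without the <p> wrapper.
inner_html = (p.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in p
)

print(outer_html)
print(inner_html)
```

The only difference between the two results is the enclosing p tag, which matches the Result lines shown for each extract type.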

OuterText

The OuterText extract type extracts only the element's own text, by removing all the child elements of the given selector first. This helps when the target HTML contains text you want to extract, but that text is not wrapped in a child element of its own, so no nested selector can pick up that text alone instead of the entire TEXT of the selector.

For example, the sample HTML below has a discount value: Discount (15%). It is just text content after the br tag and has no HTML tag of its own that a scraping agent field could select.

If you use the .price selector with the extract type TEXT, the result is: $49 Discount (15%) $41.65. The TEXT extractor is designed to extract the entire text of the given selector, and the div tags with the old-price and new-price classes are also part of the .price selector.

So we need the OuterText option here: the extractor first deletes all the child HTML tags from the given selector, and then extracts the leftover text.

Sample HTML :

<div class="page">
<h1>Some Heading</h1>
<div class="price">
    <div class="old-price">$49</div> 
    <br/> Discount (15%)
    <div class="new-price">$41.65</div>
</div>
</div>

Selector :

.price

Type : CSS

Extract : OuterText

Result :

Discount (15%)
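
The idea behind OuterText, keeping the element's own text and discarding everything inside its children, can be sketched as follows. This uses Python's standard xml.etree.ElementTree module and is only an approximation of the behaviour described above, not Agenty's code. The helper outer_text is a hypothetical name, the sample is assumed to be well-formed, and the path "./div[@class='price']" stands in for the .price selector.

```python
import xml.etree.ElementTree as ET

# Sample markup from above, with both closing div tags written out.
SAMPLE = """
<div class="page">
<h1>Some Heading</h1>
<div class="price">
    <div class="old-price">$49</div>
    <br/> Discount (15%)
    <div class="new-price">$41.65</div>
</div>
</div>
"""


def outer_text(element):
    """Return the text that sits directly in element, skipping child content."""
    # element.text is the text before the first child; each child's tail is
    # the text that follows that child but still belongs to the parent.
    parts = [element.text or ""]
    parts.extend(child.tail or "" for child in element)
    return " ".join("".join(parts).split())  # collapse whitespace


root = ET.fromstring(SAMPLE.strip())
price = root.find("./div[@class='price']")  # stand-in for ".price"

# TEXT-style result for comparison: all text, including the child divs.
full_text = " ".join("".join(price.itertext()).split())

print(full_text)          # $49 Discount (15%) $41.65
print(outer_text(price))  # Discount (15%)
```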