How to extract value based on parent class value?

I am very new to even the idea of web scraping, data mining, harvesting, what have you, but I'm a quick study. I recently finished a project for a client, and if it were not for Data Scraping Studio I would never have completed the task. I LOVE DATA SCRAPING STUDIO!!! -- stick that in your customer reviews, LOL

Here's my issue: I need to scrape a span class value depending on whether its parent is a specific value. For example...

http://www.encuentra24.com/nicaragua-es/bienes-raices-alquiler-casas/carretera-sur-linda-casa-grande-en-alquiler/3923265?catslug=

(link in alt text)

As you can see, what I have achieved so far is extracting a value that is a child of a parent category. If you look in the Chrome extension, the values I need are listed in sequential order.

  • prop_type | .ad-info li:nth-child(1) .info-value = Casa
  • prop_size | .ad-info li:nth-child(4) .info-value = 1,300
  • prop_location | li:nth-child(8) .info-value = CARRETERA SUR/SOUTH PANAMERICAN ROAD

The issue is that these selectors depend on the position of each nested span, so when a particular property has more or less information, the order of the .info-value spans shifts and the scrape spits out the wrong information.

The page source for the section mentioned is as follows...

The picture above is a snippet of the values I am trying to scrape, but keep in mind there are many more and, as mentioned, if a property's posting includes more or less info to fill these boxes, the order changes. I have been looking at if-then statements for regex, which I am still very new to, thinking it might be possible to add some sort of rule: if the span with class "info-name" has the value "Categoria:", then output its adjacent info-value, in this case "Casas", to make sure the scrape obtains the value I'm... well, scraping for.

Any help or direction would be greatly appreciated,

-Erick

 

Posted by Erick 1 years ago


@Erick First, I'd recommend extracting all the features vertically. That way you only need to write two selectors (.info-name and .info-value), and you can include the URL or something similar in the output to map it later in Excel or any other data management tool. Or, if you want to stick with the current approach, you can use the :contains('text here') selector, passing it the text to search for, and then combine it with the + (adjacent sibling), child, or nth-child selectors to build your complete selector.

For instance, in your example you want to extract the category "Casas" from the HTML below

<span class="info-name">Categoria:</span>
<span class="info-value">Casas</span>

The selector will be 

.info-name:contains('Categoria') + span

And the result will be the HTML below (you can select TXT from the drop-down if you want text only in the output)

<span class="info-value">Casas</span>
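If you ever need the same label-based lookup outside the tool, here is a minimal Python sketch of the idea using only the standard library. Note that :contains() is a jQuery-style extension rather than standard CSS, so plain CSS engines will not accept it. The sample markup is made up for illustration (only the "Categoria:" / "Casas" pair comes from your page), and the names InfoPairParser and extract_pairs are hypothetical. It also assumes the info-name and info-value spans contain no nested spans.

```python
from html.parser import HTMLParser

# Made-up sample markup; only the "Categoria:" / "Casas" pair is from the thread.
SAMPLE_HTML = """
<ul class="ad-info">
  <li><span class="info-name">Example label:</span>
      <span class="info-value">Example value</span></li>
  <li><span class="info-name">Categoria:</span>
      <span class="info-value">Casas</span></li>
</ul>
"""


class InfoPairParser(HTMLParser):
    """Pair each info-name span with the info-value span that follows it."""

    def __init__(self):
        super().__init__()
        self.pairs = {}          # label text -> value text
        self._current = None     # class of the span being read, if any
        self._buffer = []        # text chunks of the span being read
        self._last_label = None  # most recent label not yet paired

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        for wanted in ("info-name", "info-value"):
            if wanted in classes:
                self._current = wanted
                self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Assumes no nested spans inside info-name / info-value.
        if tag != "span" or self._current is None:
            return
        text = "".join(self._buffer).strip()
        if self._current == "info-name":
            # Drop the trailing colon so "Categoria:" is keyed as "Categoria".
            self._last_label = text.rstrip(":").strip()
        elif self._last_label is not None:
            self.pairs[self._last_label] = text
            self._last_label = None
        self._current = None


def extract_pairs(html):
    """Return a dict mapping each label to its adjacent value."""
    parser = InfoPairParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    print(extract_pairs(SAMPLE_HTML).get("Categoria"))  # prints: Casas
```

Because the lookup is keyed by the label text rather than by position, it keeps working when a listing has more or fewer rows, which is the same reason the :contains(...) + span selector is more robust than li:nth-child(n).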


Posted by anonymous 1 years ago


Such a quick response, I owe you, brother... and yeah, I do need to get a lot more CSS classes than just three from each page. Sorting from the get-go would be much better, as opposed to writing more Excel formulas to reorder my scrape... it's about 4k properties, so remapping would be too much of a pain ;p I would totally send you a little something for a cup of coffee if you would PM me your PayPal email.

 my regards,

-Erick

Posted by Erick 1 years ago

Topic Closed! This question is closed and doesn't accept new posts.
