I have a list of around a million websites. How can I extract the email addresses in bulk from their about or contact pages automatically, without writing a web scraping agent for each website?
Use this REGEX option with this expression:
This will find and scrape all valid emails on any website URL you crawl. Here is my test on the Rubular site - https://rubular.com/r/sJHIDEwCZHmcLk - which extracted all 6 valid emails from this test string:
If you want to send an email, please email us at firstname.lastname@example.org
Contact us for business inquiry on email@example.com
Github link with example HTML - https://agenty.github.io/Agenty.TestData/forum/forum-33.html
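As a rough illustration of how such an expression behaves, here is a minimal Python sketch. The pattern `EMAIL_RE` is an assumed, simplified one (not necessarily the expression tested on Rubular), and `extract_emails` is a hypothetical helper name. The whole address sits in capture group 1, matching the Group : 1 setting used in the agent.

```python
import re

# Assumed, simplified email pattern (illustrative only).
# Capture group 1 holds the complete address.
EMAIL_RE = re.compile(
    r"([A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,})"
)

def extract_emails(text):
    """Return the unique email addresses in text, in order of appearance."""
    seen = []
    for match in EMAIL_RE.finditer(text):
        email = match.group(1)
        if email not in seen:
            seen.append(email)
    return seen

sample = (
    "If you want to send an email, please email us at firstname.lastname@example.org\n"
    "Contact us for business inquiry on email@example.com\n"
)
print(extract_emails(sample))
# ['firstname.lastname@example.org', 'email@example.com']
```

A simple pattern like this will miss unusual but valid addresses and can match some invalid ones, so treat it as a starting point and test it against your own pages.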
Then create an agent (you may clone any sample agent) and follow the steps:
- Go to the Edit tab
- Add or edit a field and set its Type to REGEX
- Enter your regex expression and set Group to 1
Save it, then enter (or upload) the URLs to extract emails from all the websites.
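If you want to see what the agent is doing conceptually, the same idea can be sketched in plain Python. This is not how the agent is implemented; it is a minimal sketch that assumes the pages live at `/`, `/about` and `/contact` (`CANDIDATE_PATHS`), uses an assumed simplified pattern (`EMAIL_RE`), and introduces the hypothetical helpers `fetch_html` and `collect_emails`.

```python
import re
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Assumed, simplified email pattern; group 1 is the whole address.
EMAIL_RE = re.compile(
    r"([A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,})"
)
# Assumed page locations; real sites may use other paths.
CANDIDATE_PATHS = ("/", "/about", "/contact")

def fetch_html(url, timeout=10):
    """Download a page and return its body as text ('' on any failure)."""
    try:
        req = Request(url, headers={"User-Agent": "email-sketch/0.1"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        # Broad catch on purpose: one bad site should not stop the run.
        return ""

def collect_emails(sites, fetch=fetch_html):
    """Map each site to the sorted emails found on its candidate pages."""
    results = {}
    for site in sites:
        found = set()
        for path in CANDIDATE_PATHS:
            html = fetch(urljoin(site, path))
            found.update(m.group(1) for m in EMAIL_RE.finditer(html))
        results[site] = sorted(found)
    return results
```

Calling `collect_emails(["https://example.com"])` returns a dictionary keyed by site. For a list of a million sites you would also need concurrency, rate limiting and retries, which this sketch leaves out.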