Algolia Integration: Crawl your website and import pages into Algolia indices to create a custom search engine

Agenty’s Algolia integration automatically crawls your website pages and adds them to your Algolia indices, so you can create, re-create or refresh your custom search engine on a schedule or on demand.

Algolia is a web search product company with a SaaS pricing model. It has a track record of delivering some of the fastest and most relevant search available in the custom search engine market. If you want to add a search engine to your website or blog, Algolia is the way to go.

But Algolia doesn’t crawl your website automatically, as Google does, to refresh the search engine and index new pages, blog posts or products as soon as they are published on your site.

So the problem with Algolia, and with almost every custom search tool, is this: you need to manually upload a CSV file, or write your own code to crawl your website pages or pull content from a database, and then use Algolia’s open-source library to add those pages to your indices (search index) to make them searchable. That’s the pain!

That’s where Agenty comes in. Our Algolia integration allows you to:

  1. Automatically crawl your website on demand, on a schedule or via API
  2. Extract the fields of your choice, such as title, canonical, html_body, description, crawled_at etc.
  3. Crawl HTML sitemaps, RSS, JSON or XML feeds and chain them into a workflow of steps. For example, scrape the sitemap first with web scraping agent #1, then scrape each page’s details using agent #2
  4. Schedule the crawler to run daily, weekly etc.
  5. Send the agent result to your Algolia indices to refresh them
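
To make the two-step workflow in item 3 more concrete, here is a minimal sketch in plain JavaScript. It is not Agenty’s implementation: the function names `extractSitemapUrls` and `extractPageFields` are hypothetical, and the regular expressions assume simple, well-formed markup (a real crawler would use a proper HTML parser). The first function plays the role of agent #1, the second the role of agent #2.

```javascript
// Illustrative only: hypothetical helpers, not Agenty's code.

// Step 1 (agent #1): collect page URLs from the <loc> entries of an XML sitemap.
function extractSitemapUrls(sitemapXml) {
  const urls = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match;
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// Step 2 (agent #2): pull the fields of choice out of one page's HTML.
// Assumption: attributes appear in the order rel/href and name/content.
function extractPageFields(url, html, now = new Date()) {
  const pick = (pattern) => {
    const m = html.match(pattern);
    return m ? m[1].trim() : null;
  };
  return {
    url,
    title: pick(/<title[^>]*>([\s\S]*?)<\/title>/i),
    canonical: pick(/<link[^>]+rel=["']canonical["'][^>]*href=["']([^"']+)["']/i),
    description: pick(/<meta[^>]+name=["']description["'][^>]*content=["']([^"']*)["']/i),
    crawled_at: now.toISOString(),
  };
}
```

Each object returned by `extractPageFields` corresponds to one row of the agent result, which is what ends up as a record in the search index.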

Prerequisites

  1. An Agenty Professional or higher plan to get access to the Algolia plugin
  2. An Algolia account to get the application_id and api_key that authorize Agenty to connect to your indices

Algolia API Key

Algolia API Keys

  • Agenty uses this application ID and API key to authenticate with your Algolia account and to add or update objects in your indices

Set up your web crawler

Setting up a crawler is easy with our Chrome extension, available on the Chrome store. Go to the website page you want to scrape and add the fields of your choice by clicking on the elements to generate a selector, or write the selector manually if you know how CSS selectors work. See this detailed article to learn how to create a scraping agent, or the video tutorial here.

So, in this example I created two scraping agents to crawl the agenty.com website:

  • The first agent crawls hyperlinks from the sitemap and home page

Hyperlinks crawler

  • The second agent scrapes the details of each hyperlink found by the first agent

Scrape details of hyperlinks

Configure Algolia plugin

  • Go to the Agenty plugin page
  • Click on the Add button in the Algolia plugin row

Configure Algolia plugin in Agenty

  • The plugin page will open. Select the final agent to attach the plugin to (that is the agent holding the final result we want to send to Algolia), then enter the application ID, the API key and the name of the index where the crawling job result will be sent:

Algolia credentials and api key

  • Click on the save button to attach this plugin to your agent

The plugin fires on the job completion event. For example, if you are crawling more than 5,000 pages from your website, the plugin starts executing only once all of those pages have been crawled.

Start your web crawler

Once the web crawling agent has been created and the plugin has been attached, we are ready to start our web crawling job.

Start web crawler

When the job completes, check the logs:

2019-05-30 13:00:09.1279 TRACE Algolia plugin started with timeout: 15 minutes
2019-05-30 13:00:15.7217 TRACE Algolia Indices: Cleared successfully
2019-05-30 13:00:15.7217 TRACE Rows 0 to 1000 sent to Algolia successfully
2019-05-30 13:00:15.7217 TRACE Plugin task completed successfully. Duration: 00:00:06.5624766
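
The log above suggests that the plugin first clears the index and then sends rows in batches (here, rows 0 to 1000). The sketch below shows that general pattern in JavaScript for readers who want to reproduce it themselves. It is an assumption about the approach, not Agenty’s actual code: `index` is a hypothetical adapter with two promise-returning methods, `clear()` and `saveBatch(records)`, and using the canonical URL as the record’s `objectID` is also an illustrative choice.

```javascript
// Illustrative sketch: `index` is a hypothetical adapter, not a real Algolia class.
// It only needs two promise-returning methods: clear() and saveBatch(records).

// Split rows into consecutive batches of at most `size` items.
function chunk(rows, size) {
  const batches = [];
  for (let start = 0; start < rows.length; start += size) {
    batches.push(rows.slice(start, start + size));
  }
  return batches;
}

// Clear the index, then send every row in batches. Returns the number of rows sent.
async function refreshIndex(index, rows, batchSize = 1000) {
  await index.clear();
  let sent = 0;
  for (const batch of chunk(rows, batchSize)) {
    // Assumption: the canonical URL serves as a stable objectID for each record.
    const records = batch.map((row) => ({ objectID: row.canonical, ...row }));
    await index.saveBatch(records);
    sent += records.length;
  }
  return sent;
}
```

With version 4 of the `algoliasearch` JavaScript client, `clear` and `saveBatch` would likely map to `index.clearObjects()` and `index.saveObjects(records)`; other client versions use different method names, so check the documentation for the one you install.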

Preview Algolia Custom Search Engine

  • Now the search index is ready. We can integrate Algolia into our website, or use their built-in UI for searching.
  • Go back to your Algolia account
  • Go to the Indices page, and you’ll find that your index has been created, re-created or refreshed with the data sent from Agenty.

Algolia Search Indices

  • Now you can generate the UI demo, or use one of their open-source libraries, available in almost every language, to add the search feature to your website. For example, we are using the InstantSearch.js JavaScript library to add the search engine to our website.
  • Include the algoliasearch client and the main instantsearch.js library
<script src="https://cdn.jsdelivr.net/npm/algoliasearch@3.32.0"></script> 
<script src="https://cdn.jsdelivr.net/npm/instantsearch.js"></script>
  • Replace your_app_id and your_api_key in this code with your own values, and adjust the other optional settings if needed. Because this code runs in the browser, use a search-only API key.
// 1. Instantiate the search
const search = instantsearch({
  indexName: 'Agenty-Search-Index',
  searchClient: algoliasearch('your_app_id', 'your_api_key'),
});

// 2. Create an interactive search box
search.addWidget(
  instantsearch.widgets.searchBox({
    container: '#searchbox',
    placeholder: 'Search...',
  })
);

// 3. Plug the search results into the product container
search.addWidget(
  instantsearch.widgets.hits({
    container: '#searchResult',
    templates: {
      item: '{{#helpers.highlight}}{ "attribute": "title" }{{/helpers.highlight}}',
    },
  })
);

// 4. Start the search!
search.start();
  • Publish your website to a server or test it on localhost
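
The hits template shown earlier renders only the highlighted title. As a hedged alternative, a template can also be written as a function that returns HTML for each hit. The sketch below assumes each record carries the title, canonical and description fields extracted by the crawler; `renderHit` and `escapeHtml` are hypothetical helper names, and this version gives up highlighting in exchange for a linked title with a description.

```javascript
// Illustrative only: hypothetical helpers for rendering one search hit.

// Escape text before placing it inside HTML, so page content cannot inject markup.
function escapeHtml(text) {
  return String(text == null ? '' : text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Render a hit as a linked title followed by its description.
// Assumption: hits carry `canonical`, `title` and `description` fields.
function renderHit(hit) {
  const href = escapeHtml(hit.canonical);
  const title = escapeHtml(hit.title);
  const description = escapeHtml(hit.description);
  return `<a href="${href}">${title}</a><p>${description}</p>`;
}
```

If the instantsearch.js version you load accepts function templates, `renderHit` could be passed as `templates.item` in the hits widget in place of the string template.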

Algolia Search Preview of Agenty Website