
Setting up Federated Search Tutorial

Setting up Federated Search involves several moving parts, so some familiarity with the following concepts will help before you start:

  • Creating connections
  • Jobs
    • Tasks
    • Mappings

This tutorial walks you through indexing and searching content on your filesystem.

Prerequisites

This tutorial assumes you have Simflofy Admin and TSearch installed and configured, and that Elasticsearch is installed on your system. The default Elasticsearch configuration is fine.
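Before you start, you may want to confirm that Elasticsearch is reachable. The Python sketch below uses only the standard library and assumes a local install listening on http://localhost:9200 over plain HTTP with security disabled. Elasticsearch 8.x enables TLS and authentication by default, in which case you would need https and credentials, which this sketch does not handle. The function names are illustrative and are not part of Simflofy or TSearch.

```python
import json
import urllib.request

# Assumption: a local install listening on plain HTTP with security
# disabled. Elasticsearch 8.x enables TLS and authentication by default.
DEFAULT_ES_URL = "http://localhost:9200"


def parse_cluster_info(body: str) -> tuple[str, str]:
    """Return (cluster_name, version_number) from the root endpoint's JSON."""
    info = json.loads(body)
    return info["cluster_name"], info["version"]["number"]


def check_elasticsearch(url: str = DEFAULT_ES_URL, timeout: float = 5.0) -> tuple[str, str]:
    """GET the root endpoint and report which cluster and version answered."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_cluster_info(response.read().decode("utf-8"))
```

Calling check_elasticsearch() returns the cluster name and version reported by the root endpoint, or raises an exception (typically urllib.error.URLError) if nothing usable is listening at that address.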

Wizards

The Federation Wizard can generate all the necessary connections, tasks, and jobs to index content. This tutorial will walk you through the manual approach.

  1. Connections
    1. Under Connections > Integration, create a Filesystem Connection. If you already have one, skip this step.
    2. Under Connections > Content Service click Create New Content Service Connection
      1. Use fs as the connector ID; a short ID keeps things simple.
      2. Give it whatever description you like
      3. Under Type select Filesystem Content Service Connector
      4. In the Connection Configuration tab, add the folder you will crawl for content later.
        1. This connection will be used to interact with the content in your content view.
    3. Under Connections > Authentication, create an ElasticSearch Authentication Connection.
      1. You can test this connection from the Authentication Connection list page by clicking the icon under Actions that looks like a clipboard with a check mark.
    4. Under Connections > Integration, create an ElasticSearch Integration Connection and set its authentication connection to the one from the previous step.
    5. Under Connections > Content Search click Create New Content Search Connection
      1. Use el for the connectorId
      2. Give it whatever description you like
      3. Under Type select Elasticsearch Search Connector
      4. Under Security Mode, select Authentication Connection, then select your authentication connection for Elasticsearch.
      5. In the Search Configuration tab, enter a name in the Collection field. This is where the indexed documents will be stored.
        1. Make note of this for later, as you will need it for your indexing job
      6. In the Facet Fields input, add simflofy_content_type, then click the Add All Default Fields button.
      7. Click Save
  2. The Indexing Job
    1. Under the Integration sidebar header, click List Jobs.
    2. Create a Simple Migration job (name it whatever you like) with your Filesystem connection as the repository and your Elasticsearch connection as the output.
    3. In your job's Details tab, select your filesystem content service connection in the field under the job's name.
      1. This adds the connection's connectorId to each document's data, so TSearch can access it later.
    4. Also in the Details tab, make sure the Start Time and End Time are broad enough to capture the documents you wish to index.
      1. Simflofy checks each document's last modified date against this date range.
    5. If you wish to index documents without extracting their text content, open Advanced Options, uncheck Include Binaries, and skip to step 8.
    6. If you wish to index documents without extracting their text content, skip this step.
      1. Click the Tasks tab and select the Tika Extractor Task from the dropdown.
      2. Click the plus-circle icon.
      3. Uncheck Fail Document on Extraction Error, in case some files you attempt to index are not supported by Tika.
      4. Click Done
    7. If you wish to index documents without extracting their text content, skip this step.
      1. Click the Mappings Tab and, under the "Add Additional Job Mappings or Mapping Groups" menu, select Basic ElasticSearch Mapping
        1. This maps the Tika Content Field produced by the previous step's task to the "content" field in Elasticsearch.
    8. Under the tab named after your repository connection, open the Paths tab and add the path of the folder you wish to crawl for documents.
    9. Under the tab named after your output connection, in the Server tab, enter the collection name you noted when creating the Content Search connection.
    10. Click Save.
  3. Run the job
    1. Under the Integration sidebar header, click Run and Monitor jobs
    2. Click the arrow to the left of the job name to run the job.
    3. You should see the Read and Written numbers rise as the job runs.
  4. Content View
    1. Under Federation > Content Views, click Create New View
      1. Set the short name to tutorial.
      2. Select the Simflofy Virtual Repository Template
      3. Choose whatever display name you like
      4. Select your search connection from the previous steps
      5. Click Save
    2. This is the view builder, where you can add widgets to your view. Right now the view only has a result set and pagination, so we'll add some more widgets.
    3. Select Body from the Sections dropdown
      1. On the left, click the plus sign for the Content Download and Metadata Display widgets.
    4. Select Left Sidebar from the Sections dropdown
      1. On the left, click the plus sign for the Content Type Dropdown and Content Size Slider widgets.
    5. Select Top Menu from the Sections dropdown
      1. On the left, click the plus sign for the Drag and Drop and Bulk Download widgets.
    6. Click Save and Publish
      1. Publishing generates a shorthand version of the view for TSearch to read. Content views can be saved without being published.
      2. Changes to the view's search connections are also not reflected in TSearch until the view is published.
  5. Viewing the results
    1. Go to TSearch, which should be on the same host as Simflofy Admin, under the /tsearch path. By default this is localhost:8080/tsearch.
    2. Log in using your Simflofy credentials
    3. You should see the Display Name of your view under the Search Views window. Click it.
    4. You will be taken to your view, whose URL will be (TSearchUrl)/view/simflofy/tutorial.
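If the view shows no results, it can help to check Elasticsearch directly. The sketch below asks the standard _count endpoint how many documents an index holds. It assumes Elasticsearch at http://localhost:9200 without authentication, and it assumes the collection name you chose in the Content Search connection is used as the Elasticsearch index name; neither assumption comes from the Simflofy documentation, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: Elasticsearch on localhost:9200 without authentication, and the
# Content Search connection's collection name is used as the index name.
ES_URL = "http://localhost:9200"


def count_url(collection: str, base_url: str = ES_URL) -> str:
    """Build the URL of the _count endpoint for a single index."""
    index = urllib.parse.quote(collection, safe="")
    return f"{base_url.rstrip('/')}/{index}/_count"


def parse_count(body: str) -> int:
    """Extract the document total from a _count response body."""
    return int(json.loads(body)["count"])


def indexed_document_count(collection: str, base_url: str = ES_URL) -> int:
    """Ask Elasticsearch how many documents the index currently holds."""
    with urllib.request.urlopen(count_url(collection, base_url), timeout=5.0) as response:
        return parse_count(response.read().decode("utf-8"))
```

For example, indexed_document_count("your_collection"), with your_collection replaced by the name you entered, would be expected to track the Written count shown on the Run and Monitor page. Newly written documents only become visible to searches after an index refresh, so a count taken while the job is still running may lag slightly.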

This completes the walk-through.
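If the job read fewer documents than you expected, recall that Simflofy compares each file's last modified date against the job's Start Time and End Time. The sketch below lists which files under a folder fall inside a given window, so you can preview the effect of a date range before running the job. It only illustrates the idea: Simflofy's actual boundary handling and time-zone rules are not described here and may differ, and files_in_window is a hypothetical helper.

```python
import os
from datetime import datetime
from pathlib import Path


def files_in_window(root: str, start: datetime, end: datetime) -> list[Path]:
    """Return files under root whose last-modified time lies in [start, end].

    Illustration only: naive datetimes are interpreted as local time, and the
    inclusive bounds are a guess at how the job's date filter behaves.
    """
    lo, hi = start.timestamp(), end.timestamp()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Compare the file's modification time (seconds since the epoch).
            if lo <= path.stat().st_mtime <= hi:
                matches.append(path)
    return sorted(matches)
```

For example, files_in_window("/path/to/folder", datetime(2024, 1, 1), datetime.now()) lists files modified since the start of 2024, with both datetimes interpreted in local time.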

Look Around!

Nothing you do in this view affects your files, except that any files you upload are placed in the root folder of your Content Service Connection. Feel free to experiment with the widgets, and check their documentation for more in-depth information on configuring them.
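The Content Type Dropdown widget presumably draws its options from the simflofy_content_type facet field configured on the Content Search connection. To see which values such a facet would offer, you can run an Elasticsearch terms aggregation against the index yourself. This sketch assumes Elasticsearch at http://localhost:9200 without authentication, that the collection name is used as the index name, and that simflofy_content_type is mapped as a keyword field; if it is a text field with a keyword subfield, pass simflofy_content_type.keyword instead. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: Elasticsearch on localhost:9200 without authentication, the
# collection name is the index name, and the facet field is keyword-typed.
ES_URL = "http://localhost:9200"
FACET_FIELD = "simflofy_content_type"


def facet_query(field: str = FACET_FIELD, size: int = 20) -> dict:
    """Search body that returns only a terms aggregation over one field."""
    return {
        "size": 0,  # skip the hits; only the bucket counts are needed
        "aggs": {"facet": {"terms": {"field": field, "size": size}}},
    }


def parse_facet_counts(body: str) -> dict[str, int]:
    """Map each bucket key to its document count in a _search response."""
    buckets = json.loads(body)["aggregations"]["facet"]["buckets"]
    return {str(bucket["key"]): bucket["doc_count"] for bucket in buckets}


def fetch_facet_counts(collection: str, base_url: str = ES_URL) -> dict[str, int]:
    """POST the aggregation to the index's _search endpoint."""
    index = urllib.parse.quote(collection, safe="")
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(facet_query()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5.0) as response:
        return parse_facet_counts(response.read().decode("utf-8"))
```

Calling fetch_facet_counts("your_collection"), with your_collection replaced by your collection name, returns a dictionary mapping each content type value to the number of documents that carry it. Setting size to 0 keeps the response small because no matching documents are returned, only the bucket counts.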