Elasticsearch

Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java.

AWS Compatibility

As of September 2021 this connector will not work with AWS instances of ElasticSearch. AWS has its own version, now called OpenSearch, which is incompatible with current ElasticSearch libraries.

Version Support

Simflofy currently only supports version 7.15 of ElasticSearch and does not support version 8.


Authentication Connection

  • Name: Unique connection name
  • Username: Username for Authentication or blank when no auth needed.
  • Password: Password for Authentication or blank when no auth needed.
  • Server URL: Server URL with protocol, host and port, for example http://127.0.0.1:9200/
  • Socket Timeout in milliseconds: How long to wait before requests fail
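
As a rough illustration of how these fields fit together, the sketch below turns them into request options for a call to the Server URL. The helper name `buildRequestOptions` and its option names are hypothetical and not part of Simflofy. It also assumes HTTP Basic authentication, which this page does not confirm.

```javascript
// Hypothetical helper: combines the connection fields into request options.
// Assumption: credentials are sent as HTTP Basic auth when a username is set.
function buildRequestOptions({ username, password, serverUrl, socketTimeoutMs }) {
  const headers = { 'Content-Type': 'application/json' };
  if (username) {
    const token = Buffer.from(`${username}:${password}`).toString('base64');
    headers.Authorization = `Basic ${token}`;
  }
  return {
    url: new URL(serverUrl).toString(), // throws if protocol or host is missing
    headers,
    timeoutMs: Number(socketTimeoutMs),
  };
}

const options = buildRequestOptions({
  username: 'elastic',
  password: 'changeme',
  serverUrl: 'http://127.0.0.1:9200/',
  socketTimeoutMs: 30000,
});
const anonymous = buildRequestOptions({
  username: '',
  password: '',
  serverUrl: 'http://127.0.0.1:9200/',
  socketTimeoutMs: 30000,
});
console.log(options.url, Object.keys(options.headers));
```

Leaving the username blank produces options without an Authorization header, which mirrors the "blank when no auth needed" behavior described above.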

Discovery Connector

Note: There is no Discovery Instance available for Elasticsearch.


Integration Connection

Integration Connection Configuration

  • Connection Name: This is a unique name given to the connector instance upon creation.
  • Description: A description of the connector to help identify it better.
  • Authentication Connector: Your Elasticsearch Auth Connector

Job Configuration

ID Encoding

Simflofy uses the source repository id of a document as a default value for the id in ElasticSearch. These can sometimes contain illegal characters, especially if they are file paths, such as from a Filesystem or Amazon S3. As part of the indexing process, the value of this field will be encoded to ensure its validity. Currently, only slashes, spaces and apostrophes are encoded, but this will likely change to full encoding in the future to better support non-standard character sets.
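
The page does not say which encoding is applied, so the following is only a sketch of the idea. It assumes percent-encoding and treats both forward slashes and backslashes as "slashes". The function name `encodeDocumentId` is hypothetical.

```javascript
// Hypothetical sketch: encodes only slashes, spaces and apostrophes.
// Assumption: percent-encoding is used; Simflofy's actual scheme may differ.
function encodeDocumentId(sourceId) {
  const replacements = { '/': '%2F', '\\': '%5C', ' ': '%20', "'": '%27' };
  return sourceId.replace(/[\/\\ ']/g, (ch) => replacements[ch]);
}

const encodedId = encodeDocumentId("/docs/Bob's file.txt");
console.log(encodedId); // %2Fdocs%2FBob%27s%20file.txt
```

All other characters pass through unchanged, which is why a move to full encoding would matter for non-standard character sets.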

File Content

If Include Binaries is checked in the Details tab, the connector converts the file content to a base64-encoded string and stores it in the binaryData field.
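
A minimal sketch of that conversion, assuming a Node.js environment. The helper `attachBinary` and the sample document are hypothetical. Only the binaryData field name comes from this page.

```javascript
// Hypothetical sketch: stores file bytes as a base64 string in binaryData.
function attachBinary(doc, contentBytes) {
  return { ...doc, binaryData: Buffer.from(contentBytes).toString('base64') };
}

const indexedDoc = attachBinary({ id: 'doc-1' }, Buffer.from('hello'));
console.log(indexedDoc.binaryData); // aGVsbG8=

// Decoding the field gives back the original bytes.
const roundTrip = Buffer.from(indexedDoc.binaryData, 'base64').toString('utf8');
```

Base64 output is roughly a third larger than the original bytes, so including binaries increases the size of each indexed document.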

  • ID Attribute: The field that will be used to set the document id
  • Index Name: The name of the collection where the indexes will be created.
    • If the collection already exists and does not have the required mappings, Simflofy will attempt to update the mappings.
  • Batch size: The number of documents to generate before sending a request.
  • Output Renditions as array to the renditionData field: If there are multiple renditions, they will be stored as a list of base64 encoded strings.
  • Term Vectors: Term vectors increase the size of an index but are required for highlighting and More Like This searches.
    • All text-based default Simflofy fields are included by default.
    • Term vectors can only be applied to text fields.
    • Term vectors will be enabled for any custom text field added to mappings
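
For reference, term vectors are controlled in Elasticsearch by the `term_vector` mapping parameter on text fields. The sketch below builds such a mapping. The helper name and field names are hypothetical, and `with_positions_offsets` is an assumed setting, since this page does not say which value Simflofy applies.

```javascript
// Hypothetical sketch: builds an Elasticsearch mapping with term vectors
// enabled on text fields. The term_vector value is an assumption.
function buildTermVectorMappings(textFields) {
  const properties = {};
  for (const field of textFields) {
    properties[field] = { type: 'text', term_vector: 'with_positions_offsets' };
  }
  return { properties };
}

const mappings = buildTermVectorMappings(['content', 'summary']);
console.log(JSON.stringify(mappings, null, 2));
```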

Content Search Connection

Content Search Connection Configuration

A Content View Connector defines the who, what and how of search. A better term may be "Data Set", because the data you search and find is based on the configuration of the Content View Connection.

Legacy Fields

All other fields in this tab are legacy features used for the Solr Search Connection and will be removed in future releases.

  • Collection: The name of the collection to query against. Elasticsearch refers to these as "Indexes", but for our purposes they are collections.
  • Sort Field/Order: Will contain the values in your field list. Allows you to choose which field to sort on and whether to sort ascending or descending.
  • Facet Fields: Facet fields are simply occurrence counts for the entered fields. Content type counting is the most common example. Facet fields are required for a number of sidebar widgets.
  • Field List: The field values to return in a result set. Similar to the SELECT Field1, Field2 clause in SQL.
  • Result Link: Used on the TSearch UI to determine what to do when a user clicks on the link to the document.
  • Facet Limit: Maximum number of facet values to return.
  • Highlight: Yes if you want contextual highlighting, No otherwise.
  • Highlighted Fields: Comma-delimited list of fields for highlighting (e.g. content).
  • Highlight Field Length: The maximum number of characters to highlight.
  • External Links: Setup external links for the search results. The widget is not
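
To show how the settings above relate to an Elasticsearch request, here is a sketch that maps them onto a standard search body (`_source`, `sort`, `aggs` and `highlight`). How Simflofy actually builds its queries is not documented here, so the function `buildSearchBody`, the config property names and the use of a `query_string` query are all assumptions.

```javascript
// Hypothetical sketch: maps the search connection settings onto an
// Elasticsearch search request body. Config property names are invented.
function buildSearchBody(config, queryText) {
  const body = {
    query: { query_string: { query: queryText } },
    _source: config.fieldList,                                   // Field List
    sort: [{ [config.sortField]: { order: config.sortOrder } }], // Sort Field/Order
    aggs: {},
  };
  for (const field of config.facetFields) {
    // Facet Fields and Facet Limit become terms aggregations.
    body.aggs[field] = { terms: { field, size: config.facetLimit } };
  }
  if (config.highlight) {
    body.highlight = { fields: {} };
    for (const field of config.highlightedFields) {
      body.highlight.fields[field] = { fragment_size: config.highlightFieldLength };
    }
  }
  return body;
}

const searchBody = buildSearchBody({
  fieldList: ['title', 'content'],
  sortField: 'modified',
  sortOrder: 'desc',
  facetFields: ['contentType'],
  facetLimit: 10,
  highlight: true,
  highlightedFields: ['content'],
  highlightFieldLength: 200,
}, 'invoice');
console.log(JSON.stringify(searchBody, null, 2));
```

The field names `title`, `modified` and `contentType` are placeholders. In a real index, facet and sort fields generally need to be keyword, numeric or date fields rather than analyzed text.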

Indexing Content into ElasticSearch for Federation

Prerequisites and the Federation Wizard

These steps can be performed automatically by using the Federation Wizard, but will still require job configuration. If you use the wizard, skip steps 1 and 2.

For indexing content you will need:

  • A working Authentication Connection for your source system
  • An Integration Connection for your source system
  • A Content Service Connection for your source system
  • A working Authentication Connection for ElasticSearch
  • An Integration Connection for ElasticSearch
  1. Create a job using your two connections.
  2. In the Details tab, set the source repository's content service connection directly below the job name.
  3. In the Details tab, make sure the start and end times are set to a wide enough range to capture all the data you wish to index.
  4. In the Tasks tab, select the Tika Extractor Task.
    1. This task will extract the content from a file and set it as a field on the document for indexing
  5. In the Mappings tab, select "Basic Elasticsearch Mapping" from the Additional Mappings dropdown
    1. If this is not present, simply add the field you set on the task in step 4 as a field mapping.
      1. The default is content so the mapping would be content ----Field Mapping----> content
    2. (optional) Add any additional mappings. The target fields will be created and mapped dynamically as part of the migration
  6. In the Output Specification, select your id attribute (or leave it as the default) and pick what collection to index to.
  7. (optional) If you wish to enable highlighting and your extracted content is not in the "content" field, place the name of your content field from your Tika task in the Term Vector field.
  8. (optional) If you wish to use More Like This (MLT) searches on custom fields, add them to the Term Vector field.
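
To make the ID attribute, collection and batch size settings more concrete, the sketch below groups documents into request bodies in the format of Elasticsearch's bulk API (one action line followed by one document line). Whether Simflofy uses the bulk API internally is an assumption, and `buildBulkBodies` and its option names are hypothetical.

```javascript
// Hypothetical sketch: splits documents into batches and formats each batch
// as a newline-delimited body for the Elasticsearch _bulk endpoint.
function buildBulkBodies(docs, { indexName, idAttribute, batchSize }) {
  const bodies = [];
  for (let start = 0; start < docs.length; start += batchSize) {
    const lines = [];
    for (const doc of docs.slice(start, start + batchSize)) {
      // Action line: target index and the document id taken from idAttribute.
      lines.push(JSON.stringify({ index: { _index: indexName, _id: String(doc[idAttribute]) } }));
      // Source line: the document itself.
      lines.push(JSON.stringify(doc));
    }
    bodies.push(lines.join('\n') + '\n'); // bulk bodies must end with a newline
  }
  return bodies;
}

const bulkBodies = buildBulkBodies(
  [
    { docId: 'a', content: 'first' },
    { docId: 'b', content: 'second' },
    { docId: 'c', content: 'third' },
  ],
  { indexName: 'simflofy', idAttribute: 'docId', batchSize: 2 }
);
console.log(bulkBodies.length); // 2
```

With three documents and a batch size of 2, the first request carries two documents and the second carries the remaining one.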

Viewing Indexed Content

  1. Create a Search Connection for ElasticSearch if you have not already. Use the authentication connection you used for indexing.
  2. Using the configuration section above, pick the fields you wish to see and get counts for.
    1. You can add the basic Simflofy metadata by clicking Add All Default Fields.
  3. Under the Federation Menu > Content Views, create a new Content View.

Indexing Document Level Permissions

Simflofy content views offer a number of security layers. Using the JavaScript processor, permissions can be added to each document. These permissions can restrict widget usage and the ability to search for the document.

Repository Document ACLs

Each document, whether it has source permissions or not, will have an Allow and a Deny ACL (Access Control List). Both exist as a list of strings (String[]) on the document and can be accessed through JavaScript. To apply document level permissions, each entry needs to take the form

action=principal1,principal2,principal3

where action can be Search or the id of a Widget Definition. The principals are Simflofy user logins or User Group names.
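
The entry format can be illustrated with a small parser. This is only a sketch of how such strings could be read. The functions `parseAclEntry` and `aclMatches` are hypothetical and are not part of the Simflofy API.

```javascript
// Hypothetical sketch: parses "action=principal1,principal2" entries and
// checks whether any of a user's names (login or groups) is listed.
function parseAclEntry(entry) {
  const separator = entry.indexOf('=');
  if (separator < 1) {
    throw new Error(`Malformed ACL entry: ${entry}`);
  }
  return {
    action: entry.slice(0, separator).trim(),
    principals: entry.slice(separator + 1).split(',').map((p) => p.trim()).filter(Boolean),
  };
}

function aclMatches(entries, action, names) {
  return entries.map(parseAclEntry).some(
    (acl) => acl.action === action && acl.principals.some((p) => names.includes(p))
  );
}

const denyEntries = ['DownloadWidget=group1', 'Search=user1'];
console.log(aclMatches(denyEntries, 'Search', ['user1']));           // true
console.log(aclMatches(denyEntries, 'Search', ['user2', 'group1'])); // false
```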

Limitations

Only the Search permission is checked at the API level. This means a user can still access documents directly through the Content Services API. The other permissions simply alter content views to prevent users from performing those actions via Widgets.

{
  "allow": true,
  "action": "Search",
  "principals": [
    "everyone"
  ]
}

Here is an example of some JavaScript that will prevent users in group1 from downloading documents through TSearch. It will also stop user1 from searching for the document.


// Each entry has the form action=principal1,principal2
var deny = ['DownloadWidget=group1', 'Search=user1'];
rd.setDenyAcl(deny);


Content Service Connector

This section covers the Elasticsearch-specific configuration of the Content Service Connector. For a description of how to set up a content services connector generically, see Content Service Connectors.

Supported Methods

  • Create File
  • Create Folder
  • Delete Object By Id
  • Get File Content
  • Get Id By Path
  • Get Object Properties
  • Get Types
  • List Folder Items
  • List Versions
  • Update File