SharePoint Online to Elasticsearch Integration Tutorial
Overview
In this tutorial we will walk through setting up a job in Simflofy that indexes the content of your SharePoint Online repository into Elasticsearch, so that you can view and manage your documents in our TSearch platform. The tutorial also uses the PII Detection Task, which determines which documents contain Personally Identifiable Information (PII) and how many.
Step 1. Setting up your SharePoint Online Repository Authentication Connection.
This will allow Simflofy to connect to your SharePoint Online Repository.
To set up your SharePoint Authentication Connection:
- Select Authentication from the Connections section of the navigation menu.
- At the bottom of the Authentication Connections page select the Create New Auth Connection button.
- In the New Auth Connection form
- Name your connection. In this example we will use SPO
- Select the connection type which for this example will be SharePoint REST Auth Connector
- Next edit your new SharePoint Authentication Connection with the following input
- Give your connection a description. Example: SharePoint REST Auth Connector
- Enter the username and password for your SharePoint System
- Enter the Server URL for your SharePoint system. Example:
https://simflofy.sharepoint.com
- Leave the rest of the fields as is and click the Save button at the top of the page to save your changes and return to the Authentication Connections Page
Step 2. Set up your Elasticsearch Authentication Connection.
This will allow Simflofy to connect to your Elasticsearch Repository.
To set up your Elasticsearch Authentication Connection:
- At the bottom of the Authentication Connections page select the Create New Auth Connection button.
- In the New Auth Connection form
- Name your connection. In this example we will use rm-Elastic Search Authentication Connector
- Select the connection type. In this case we will use Elasticsearch Authentication Connector
- On the next page, edit your new Elasticsearch connection
- In the server URL field enter the URL for your Elasticsearch Server. For example:
http://127.0.0.1:9200/
- Leave the other fields as is and click on the Save button to save these changes.
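Before saving the connection, it can help to confirm that the Server URL actually points at a reachable Elasticsearch node. The sketch below is not part of Simflofy; it only uses the standard Elasticsearch cluster health endpoint (`_cluster/health`). The helper names `health_url` and `cluster_status` are our own, and the sketch assumes the server accepts unauthenticated HTTP requests, as the example URL above suggests.

```python
import json
import urllib.request
from urllib.parse import urljoin


def health_url(server_url: str) -> str:
    """Build the cluster health endpoint for an Elasticsearch server URL."""
    # Add a trailing slash so urljoin appends to the base instead of
    # replacing its last path segment.
    base = server_url if server_url.endswith("/") else server_url + "/"
    return urljoin(base, "_cluster/health")


def cluster_status(server_url: str, timeout: float = 5.0) -> str:
    """Return the cluster status reported by Elasticsearch.

    Assumption: no authentication is required. This performs a network call.
    """
    with urllib.request.urlopen(health_url(server_url), timeout=timeout) as resp:
        return json.load(resp)["status"]
```

Calling `cluster_status("http://127.0.0.1:9200/")` against a running server returns its health status (green, yellow, or red); a connection error here usually means the URL entered in the form needs another look.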
Step 3. Set up your SharePoint Integration Connection.
Using this connection Simflofy will generate a query, or use one provided, to retrieve unique ids for documents.
To set up your SharePoint Integration Connection:
- Select Integration from the Connections section of the navigation menu.
- At the bottom of the Integration Connections page click the Create Integration Connection button.
- In the New Integration Form
- Name your integration connection. In this example we will use the name rm-SPO
- For the connection type choose SharePoint REST Connector from the drop-down list. You can also begin typing SharePoint in the search field to filter the drop-down list and then select it.
- Click Save to create the integration connection and open it for editing
- In the Edit Connection: RM-SPO page enter the following
- Give your connection a description. Example: SharePoint Online
- Choose the Authentication Connection you created in Step 1 (SPO)
- Leave the rest of the fields as is and click Save to lock in your changes and return to the Integration Connections page
Step 4. Set up your Elasticsearch Integration Connection
In Output mode, connectors push content and metadata. Many of them can also build version series from the source systems.
To set up your Elasticsearch Integration Connection:
- At the bottom of the Integration Connections page click the Create Integration Connection button.
- In the New Integration Form
- Name your integration connection. Here we will use the name rm-ElasticSearch
- Choose the Elasticsearch Authentication Connection we created in Step 2 (rm-Elastic Search Authentication Connector)
- Leave the rest of the fields as is and click Save to finish creating this connection and return to the Integration Connections page.
Step 5. Create your Content Service Connection.
Simflofy Content Service connections offer public REST endpoints that allow for integration with external applications.
To create a content service connection for your Simflofy Repository:
- Select Content Service from the Connections section of the navigation menu
- At the bottom of the Content Service Connection page click the Create New Content Service Connection button
- In the new Content Service Connection page enter the following information
- Connector ID: Name your content service connection. For this example we will use the name SharePointRM
- Description: Give your connection a description. Here we will use SharePoint Online RM Demo Content Service
- Type: The type for this connection will be SharePoint REST Content Service Connector
- Security Mode: Choose Authentication Connection as the security mode, then select the SharePoint Authentication Connection we created in Step 1 (SPO) from the dropdown
- Under the Connection Configuration Tab enter your SharePoint Site Name. Example: sites/SimflofyDemo
- Leave all other fields as is and click Save to finish the creation of this connection and return to the Content Service Connection Page
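To see how the Server URL from Step 1 and the Site Name entered here fit together, the sketch below combines them into a full site address and the SharePoint REST `_api/web` endpoint for that site. This is an illustration only: the helper names `site_url` and `web_endpoint` are hypothetical, and how Simflofy combines these values internally is not documented here.

```python
from urllib.parse import urlsplit


def site_url(server_url: str, site_name: str) -> str:
    """Combine a SharePoint Online server URL and a site name.

    Example inputs from this tutorial:
      server_url = "https://simflofy.sharepoint.com"
      site_name  = "sites/SimflofyDemo"
    """
    parts = urlsplit(server_url)
    # Assumption: SharePoint Online tenants are always addressed over HTTPS.
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"expected an https:// server URL, got {server_url!r}")
    # Strip stray slashes so "sites/SimflofyDemo" and "sites/SimflofyDemo/"
    # produce the same address.
    return f"https://{parts.netloc}/{site_name.strip('/')}"


def web_endpoint(server_url: str, site_name: str) -> str:
    """Return the SharePoint REST endpoint describing the site itself."""
    return site_url(server_url, site_name) + "/_api/web"
```

With the example values, `site_url` yields `https://simflofy.sharepoint.com/sites/SimflofyDemo`, which is also the address you can open in a browser to confirm the Site Name is correct.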
Step 6. Create a Job Mapping for your Content Service Connection
Content mappings will allow you to map custom parameters to properties in the destination system.
To create a Job Mapping for Elasticsearch:
- Under the Integrations section of the navigation menu select Job Mappings
- At the bottom of the Job Mappings page click the Create New Job Mapping button
- If you don't name your job mapping, Simflofy will assign a name to it. Ours is Job Mapping 1628095318249
- Enter the following mappings for Title and Date Created
- For the Title field mapping enter
- Source: Document.Title
- Target: cmis:description
- Type: String
- Click Add New Mapping
- Select Field Mapping from the drop-down
- For the Date Created field mapping enter
- Source: Document.Created
- Target: cmis:creationDate
- Type: Date Time
- Click Add New Mapping
- Select Field Mapping from the drop-down
- Click Save to save this mapping for use in your Job Configuration.
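Conceptually, each field mapping copies one source property to one target property and converts it to the chosen type. The sketch below models the two mappings from this step as plain data and applies them to a sample record. It is only an illustration of the idea: the `FieldMapping` class, the `apply_mappings` function, and the ISO 8601 date format of the sample value are assumptions, not Simflofy's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FieldMapping:
    source: str  # property name in the source system
    target: str  # property name in the destination system
    type: str    # "String" or "Date Time", as chosen in the mapping form


# The two mappings entered in this step.
MAPPINGS = [
    FieldMapping("Document.Title", "cmis:description", "String"),
    FieldMapping("Document.Created", "cmis:creationDate", "Date Time"),
]


def apply_mappings(source_doc: dict, mappings=MAPPINGS) -> dict:
    """Copy mapped properties from source_doc, converting by type."""
    out = {}
    for m in mappings:
        if m.source not in source_doc:
            continue  # unmapped or missing properties are simply skipped
        value = source_doc[m.source]
        if m.type == "Date Time":
            # Assumption: dates arrive as ISO 8601 strings with an offset.
            value = datetime.fromisoformat(value)
        else:
            value = str(value)
        out[m.target] = value
    return out
```

For example, a record with `Document.Title` set to "Quarterly report" comes out with `cmis:description` set to the same text, and its `Document.Created` string becomes a `datetime` stored under `cmis:creationDate`.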
Step 7. Set up your integration job.
This will integrate your SharePoint repository with your Elasticsearch Server, allowing you to view your SharePoint files in TSearch through your Elasticsearch connection.
To set up your integration job:
- Select List Jobs from the Integration section of the navigation menu
- At the bottom of the Jobs page click the button Create New Job
- In the Create New Job Form
- Give your new job a name. For this example we will use rm-SPO
- For Repository Connection select the SharePoint Integration connection we created in Step 3 (rm-SPO) from the drop-down
- For Output Connection select the Elasticsearch Connection we created in Step 4 (rm-ElasticSearch) from the drop-down
- Leave the other fields as is and click Save to continue setting up this new integration job.
- In the Details Tab
- Under Content Service Connector, add the SharePoint Content Service Connection you created in Step 5 (SharePointRM)
- In the Tasks Tab we will be adding the Default Tika Extractor Task.
- Select Tika Extractor Task from the dropdown
- Click the plus button to edit the task properties
- Uncheck the Fail Document on Extract Error box
- Uncheck the Remove Binary After Extraction box
- Leave all other fields as is and select Done
- Next, in the Tasks Tab, we will be adding the Default PII Detection Task
- Select PII Detection Task from the dropdown
- Click the plus button to edit the task properties
- Check the Break up pii data into individual fields box.
- Leave all other fields as is and select Done
- In the Mappings Tab under the Select Additional Mappings section
- Select the Job Mapping we created in Step 6 (Job Mapping 1628095318249)
- The RM-SPO tab is where you will add any necessary configurations for the SharePoint Integration connection you are using as the repository.
- Repository Tab:
- Enter your SharePoint Site Name. Example: sites/SimflofyDemo/
- Leave all other fields as is
- The RM-Elasticsearch Tab is where you will enter any additional settings needed to use Elasticsearch as your Output integration connection under the Server Tab.
- Under Index Name add the index where the documents will be stored. In this example we will use the index rm
- Leave all other fields as is
- Click Save to save your job configurations and return to the Jobs page
Step 8. Run and Monitor the job
This will integrate the chosen content from your SharePoint repository to your Elasticsearch repository allowing you to view the content in the TSearch Platform. It will also identify and add the necessary fields for PII.
To run and monitor this job:
- Select Run and Monitor Jobs from the Integration section of the navigation menu
- Find the job created in Step 7 (RM-SPO)
- Select the triangle next to the job to run this Integration job.
Depending on how many files you have stored, this could take more than a few minutes. To monitor the progress of this integration, click the Refresh button at the top of the page. You can also set Simflofy to refresh automatically every 30 seconds, every minute, or every 5 minutes.
Once the integration is complete, the status will change to green and read Complete.
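Once the job reports Complete, you may want to confirm outside of Simflofy that documents really arrived in the index. The sketch below uses the standard Elasticsearch `_count` API and an `exists` query against the example server URL and the rm index from this tutorial. The helper names are our own, the server is assumed to accept unauthenticated requests, and using `cmis:description` as the stored field name is an assumption based on the Step 6 mapping; the names of the PII fields added by the task are not listed in this tutorial, so they are not guessed here.

```python
import json
import urllib.request


def count_url(server_url: str, index: str) -> str:
    """Build the _count endpoint for an index."""
    return f"{server_url.rstrip('/')}/{index}/_count"


def exists_query(field: str) -> dict:
    """Query body matching documents that have any value for the field."""
    return {"query": {"exists": {"field": field}}}


def count_documents(server_url: str, index: str, query=None, timeout: float = 5.0) -> int:
    """Return how many documents in the index match the query (all if None).

    Assumption: no authentication is required. This performs a network call.
    """
    data = json.dumps(query).encode("utf-8") if query is not None else None
    req = urllib.request.Request(
        count_url(server_url, index),
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST" if data is not None else "GET",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["count"]
```

For instance, `count_documents("http://127.0.0.1:9200/", "rm")` returns the total number of indexed documents, and passing `exists_query("cmis:description")` as the query narrows the count to documents that carry the mapped title field.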