Alfresco
The Alfresco connector only operates in write mode. For read operations from Alfresco, use the CMIS connector.
Batch API Required
This connector was built to work in tandem with the Simflofy Batch API to offer a more performant migration.
If you're planning to use the Transparent Content Services for Manage In Place, this API is already included in that module. However, it can be installed separately for standard migrations.
Alfresco Authentication
Alfresco Webscript Auth Connector
This connector uses basic authentication to retrieve a ticket from Alfresco. That ticket will be used to perform operations. The authenticating user will need the rights to access the folder you're attempting to write to.
- Name: The name of the authentication connection
- Username: Name of the authenticating user
- Password: Password of the authenticating user
- Service URL: Url to contact Alfresco. Takes the form
(Alfresco)/alfresco/service
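As a rough illustration of the Service URL form and of basic authentication, the sketch below builds the URL from a base address and computes a standard HTTP Basic `Authorization` header value. The host name and function names are hypothetical, and the exact request the connector sends to obtain its ticket is not shown here.

```python
import base64


def build_service_url(alfresco_base: str) -> str:
    """Return the Service URL in the form (Alfresco)/alfresco/service."""
    return alfresco_base.rstrip("/") + "/alfresco/service"


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of a standard HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Hypothetical host and credentials; replace with your own values.
service_url = build_service_url("http://alfresco.example.com:8080/")
print(service_url)
print(basic_auth_header("admin", "admin"))
```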
Discovery Connector
- Name: Unique Name for the Discovery Connection to identify it in the UI.
- Authentication Connection: The authentication connection you want to search for
- Ignore Types (comma delimited list): Comma-delimited list of types to ignore. Regular expressions are also accepted, so to ignore all types with "workflow" in the name, enter (.*)workflow(.*) into the Ignore Types text box.
- Alfresco Ticket Key: The ticket key is used to generate credentials.
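To make the Ignore Types behavior concrete, here is a minimal sketch of how a comma-delimited list of patterns might be applied to discovered type names. It assumes each pattern must match the whole type name and that matching is case-sensitive; the function name and sample type names are hypothetical, and the connector's actual matching rules may differ.

```python
import re


def filter_ignored_types(type_names, ignore_list):
    """Return the type names that match none of the comma-delimited patterns."""
    patterns = [re.compile(p.strip()) for p in ignore_list.split(",") if p.strip()]
    # Assumption: a pattern has to match the entire type name.
    return [t for t in type_names if not any(p.fullmatch(t) for p in patterns)]


# Hypothetical type names for illustration.
types = ["cm:content", "bpm:workflowTask", "cm:folder", "wf:workflowPackage"]
print(filter_ignored_types(types, "(.*)workflow(.*), cm:folder"))
```

Plain names such as cm:folder work as exact matches, while the regex entry removes every type containing "workflow".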
Integration Connection
Batching
The Alfresco Connector requires batching, meaning it will batch up documents before attempting to upload them. Batch size is set under the Advanced Options under the Details job tab. When using a connector that requires batching, the value will be automatically set to 50.
When batching is on, batch ids will be generated based on the job run id, which is the job id plus the timestamp of the current run. The batch number will then be added to that. Batch IDs will be prepended to parent folders of each document in a batch leading to the following:
/1612814334091_1624028461422_1/parentFolder/myfile.txt
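The path above can be reproduced with a short sketch. It assumes the three parts are joined with underscores in the order job id, run timestamp, batch number (inferred from the example path); the function names are hypothetical.

```python
def build_batch_id(job_id: str, run_timestamp: int, batch_number: int) -> str:
    """Combine the job run id (job id plus run timestamp) with the batch number."""
    job_run_id = f"{job_id}_{run_timestamp}"
    return f"{job_run_id}_{batch_number}"


def batched_path(batch_id: str, parent_folder: str, file_name: str) -> str:
    """Prepend the batch id to the document's parent folder."""
    return "/" + "/".join([batch_id, parent_folder.strip("/"), file_name])


batch_id = build_batch_id("1612814334091", 1624028461422, 1)
print(batched_path(batch_id, "parentFolder", "myfile.txt"))
# /1612814334091_1624028461422_1/parentFolder/myfile.txt
```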
- Connection Name: Name the Connector
- Description: Add a description for the connector
- Authentication Connection: Alfresco Auth connector
- Secondary Auth Connection: For Alfresco, the secondary authentication will route the file content to different stores. Available Options are:
- Amazon S3 - Requires Alfresco instance with the Amazon S3 Module
- Azure Blob - Requires Alfresco instance with the Azure Blob Connector
- None - Binaries will be output to the filesystem if Include Binaries in the Details tab is checked.
Job Configuration
- Configuration
- Advanced Settings
Alfresco Path: The output folder path in your alfresco repository.
Binary Output Path: Binary Output Path for Azure or S3. If populated and no Secondary Auth Connection is present, will output to the filesystem.
Create Content Url Always: Only used for Manage In Place (MIP). Simflofy will generate an Alfresco Content Url based on several configuration settings.
Be aware that these are checked in order:
- If the content service connector is set in the Details tab the content url will be
connectorId://documentId
- If the document has the metadata field simflofy.contenturl, it will be checked for validity and used.
- If the secondary auth connection is for Amazon S3 the content url will be
s3v2://documentId
- If the secondary auth connection is for Azure Blob the content url will be
azb://documentId
- If none of these are true, the content url will be
store://documentId
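The order of these checks can be sketched as a simple chain of conditions. This is an illustration only: the function and parameter names are hypothetical, the validity check on simflofy.contenturl is a placeholder, and falling through to the next rule when that value is invalid is an assumption.

```python
def resolve_content_url(document_id, connector_id=None, metadata=None, secondary_auth=None):
    """Pick a content url by applying the documented checks in order."""
    metadata = metadata or {}
    # 1. A content service connector set in the Details tab wins.
    if connector_id:
        return f"{connector_id}://{document_id}"
    # 2. An existing simflofy.contenturl value, if it looks valid.
    candidate = metadata.get("simflofy.contenturl")
    if candidate and "://" in candidate:  # placeholder validity check (assumption)
        return candidate
    # 3. and 4. The secondary auth connection decides the scheme.
    if secondary_auth == "s3":
        return f"s3v2://{document_id}"
    if secondary_auth == "azure":
        return f"azb://{document_id}"
    # 5. Fallback when none of the above apply.
    return f"store://{document_id}"


print(resolve_content_url("doc-1", secondary_auth="s3"))
print(resolve_content_url("doc-1"))
```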
- Included Un-Mapped properties: Add all properties from the source to the target metadata. If unchecked, only mapped properties will be added
- Do not convert metadata keys to lowercase: Simflofy converts all type and field values to lowercase by default. If this is checked all fields will keep their original case
- Debug mode: Each document's metadata will be output as a JSON file to a specified location
- Debug Path: Appears when debug mode is active. A path on the filesystem where the debug JSON files will be written.
- Alternative Thread Count: If using a secondary auth connection, this will control the number of worker threads. Should never be set to 0, or the job will fail
- Multi-Value Separator: Multi-value fields will be combined into a list using this separator
- Replace Existing Content: If a document already exists at the output location, it will be overwritten
- Use File Hash: If checked, the file keys for documents will be a SHA hash of their parent path + file name + modified date, plus the extension '.sim'
- Process ACLS: Requires the use of the ACL Mapper or ACL Conversion Task.
- Will transform the product of these tasks into Alfresco's format for processing
- Inherit ACLS: If processing ACLs, the parent folder's permissions will be merged into the list of acls.
- Include Aspects with no Field Mappings: Mapped aspects will be applied even if none of the aspect's fields are present
- Aspect Remove Field Mapping: Takes a JSON string. Aspects are removed if the listed fields are not present. Example value, as entered in the UI:
{"myaspect:two":["field1","field2"],"myaspect:one":["field1","field2"]}
Meaning that if field1 or field2 is not present, the aspect is not added.
- Date Format: Date field mappings will be formatted this way.
- Date Time Format: Date/Time field mappings will be formatted this way.
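The Aspect Remove Field Mapping option described above can be illustrated with a small sketch. It assumes an aspect is dropped as soon as any one of its listed fields is missing from the document, which is how the example reads; the function name and the extra aspect name are hypothetical.

```python
import json


def aspects_to_apply(candidate_aspects, removal_mapping_json, document_fields):
    """Return the aspects that survive the Aspect Remove Field Mapping."""
    removal = json.loads(removal_mapping_json)
    present = set(document_fields)
    kept = []
    for aspect in candidate_aspects:
        required = removal.get(aspect, [])
        # Assumption: every listed field must be present for the aspect to stay.
        if all(field in present for field in required):
            kept.append(aspect)
    return kept


mapping_json = '{"myaspect:two":["field1","field2"],"myaspect:one":["field1","field2"]}'
aspects = ["myaspect:one", "myaspect:two", "cm:titled"]
print(aspects_to_apply(aspects, mapping_json, ["field1", "field2"]))
print(aspects_to_apply(aspects, mapping_json, ["field1"]))
```

Aspects that are not listed in the JSON string are left alone in this sketch.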
Simflofy Batch API for Alfresco
Installation
If you're planning on using Transparent Content Services, the batch api is included in that module.
Compatibility
The batch API is only compatible with Alfresco v6.2 and higher.
- Retrieve the latest batch api version from Simflofy Launch
- Stop Alfresco
- Inside the Tomcat folder which contains ACS (referred to as TOMCAT_HOME), add the following folder if it does not exist
/modules/platform
- Place the batch api jar in the folder
- Start Alfresco
Configuration Options
These properties can be added to the alfresco-global.properties file
- batchapi.behaviours.disableall: If this is false, reattempts to import data can cause an infinite loop.
- batchapi.rules.disableall: If this is false, reattempts to import data can cause an infinite loop.
Rules and Behaviors
Both of these values must be true to avoid conflicts with Alfresco's content store rules and behaviors
- batchapi.jsonthreadcount: The number of threads to delegate to the JSON batch integration queue. Default: 20
- batchapi.batchthreadcount: The number of threads to delegate to the batch integration queue. Default: 50
- batchapi.userid: The Alfresco user to perform the Batch API actions as. Must be a valid Alfresco user. Default: admin
- batchapi.licensekey: Required, with no default. The license key of the Simflofy instance you are connecting to. Jobs will not write batches to Alfresco unless this is set to a valid license key.
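Before starting Alfresco, it can help to sanity-check these settings. The sketch below is a simplified reader for properties text (it ignores line continuations and ":" separators) that flags the cases described above. It assumes both disableall values must be set explicitly; the function names and the license key placeholder are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_batch_api(props):
    """Return a list of problems with the Batch API properties."""
    problems = []
    for key in ("batchapi.behaviours.disableall", "batchapi.rules.disableall"):
        if props.get(key, "").lower() != "true":
            problems.append(f"{key} should be true")
    if not props.get("batchapi.licensekey"):
        problems.append("batchapi.licensekey is required")
    return problems


sample = """
# Batch API settings
batchapi.behaviours.disableall=true
batchapi.rules.disableall=true
batchapi.userid=admin
batchapi.licensekey=YOUR-LICENSE-KEY
"""
print(check_batch_api(parse_properties(sample)))
```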
Versioning
Alfresco turns on the initial version and auto versioning by default. However, this can cause Alfresco to have an incorrect initial version, such as version 0.1, which will lead to incorrect numbering of your version series. In order to make versioning work with the Batch API, add the following mapping to your job
(source type) ----Aspect Mapping----> cm:versionable
Or, as an alternative, disable some of Alfresco versioning features with the following
version.store.initialVersion=false
version.store.enableAutoVersioning=false
Alfresco configuration for better ingestion speeds
Disable subsystems, precursor for bulk ingestion
notification.email.siteinvite=false
alfresco.jmx.connector.enabled=false
index.subsystem.name=noindex
ooo.enabled=false
jodconverter.enabled=false
smart.folders.enabled=false
system.thumbnail.generate=false
extracter.RFC822.enabled=false
extracter.TikaAuto.enabled=false
Access and remote protocols
ftp.enabled=false
system.webdav.servlet.enabled=false
cifs.enabled=false
transferservice.receiver.enabled=false
syncService.mode=OFF
Workflow
system.workflow.engine.activiti.enabled=false
system.workflow.engine.jbpm.enabled=false
Indexing
index.recovery.mode=NONE
index.tracking.disableInTransactionIndexing=true
Auditing
audit.enabled=false
audit.alfresco-access.enabled=false
audit.filter.alfresco-access.default.enabled=false
Repo and file management
system.content.caching.cacheOnInbound=false
home.folder.creation.eager=false
system.enableTimestampPropagation=false
system.usages.enabled=false
activities.feed.notifier.enabled=false
alfresco.cluster.enabled=false
db.schema.update=true
system.acl.maxPermissionChecks=100000
system.readpermissions.bulkfetchsize=100000
Transparent Content Services (TCS) and Manage In Place (MIP)
Transparent Content Services is an Alfresco module that enables Manage In Place capabilities using Simflofy's content services. This means you can manage multiple repositories through Alfresco without having to move the file contents.
In the following sections we'll walk you through how to add TCS to your Alfresco instance, and configure jobs to create MIP Content. The addition of MIP will have no effect on your current content.
If you prefer to follow a walk through, one can be found here
note
If you're currently using an extension that extends or adds to the existing StoreSelectorAspectContentStore, there will be functionality and compatibility issues, as TCS overrides that service.
TCS: How it works
TCS works as a new content store that leverages Simflofy's REST API to push and pull content.
By extending Alfresco's StoreSelectorAspectContentStore, we give users the ability to change the location of the content managed by Alfresco.
Similar to the standard store selector store, a user can change the content store of an item by simply adding an aspect and changing a property.
This plugin goes far beyond adding new Filesystem store locations. TCS provides access to any repository listed in your Content Service Connections
Configured connections from Simflofy will be added to the list of available stores in the cm:storeName property.
TCS: FAQS
How do we get content into Alfresco?
There are several ways to add the content to Alfresco for TCS to manage. Two easy and popular methods are:
Use a Simflofy job to connect to your source repository and output content to Alfresco via a CMIS connector.
- This method is good for smaller batches of content (hundreds to thousands of documents, but not hundreds of thousands)
- Fast set up and ingestion.
Use a Simflofy job to connect to your source repository and output to BFS format. Leverage the BFS import tool to create content in Alfresco
- Ideal for very large data sets.
Remember, we are only importing information about the location of the content and some metadata. The import process is very fast.
How are my Alfresco users affected by this change?
They won't be. All the standard content features are available. Meaning users can still preview, download, and update content from within Alfresco.
What if I don't want users to change content in my remote system?
TCS provides an option to implement read-only mode. This is done when configuring a new store (see below). You also must set the deep delete option if you want to allow Alfresco (and Simflofy) to delete source content.
TCS Installation
Prerequisites
- Alfresco Enterprise version 6.2+
- Alfresco Records Management/Alfresco Governance Services
- Simflofy 3.X+
- TCS Plugin Jars (available on Launch)
Files
- transparent-content-services-share.jar
- This file contains the Share customizations for Alfresco.
- transparent-content-services-platform.jar
- This file contains the core repository extensions and the content services required to communicate to Simflofy
Installation Process
Inside the Tomcat folder which contains ACS (referred to as TOMCAT_HOME), add the following folders if they do not exist
/modules/platform
/modules/share
- Stop Alfresco
- Place the transparent-content-services-platform.jar in [TOMCAT_HOME]/modules/platform
- Place the transparent-content-services-share.jar in [TOMCAT_HOME]/modules/share
- Once you have completed the configuration (see the next section), start Alfresco.
TCS Logging
To see all the logging for TCS in the alfresco.log or catalina.out file, a logger for the following package must be added to Alfresco's log4j properties:
com.fikatechnologies
The available levels are info, debug, and trace, in increasing order of granularity.
TCS Configuration and Properties
The properties marked as required should be added to your alfresco-global.properties file. The rest are optional configuration. The property groups are:
- TCS Basic Properties (Required)
- Simflofy Batch API Properties (Required)
- TCS Basic Content Caching Properties (Optional)
- TCS Content Caching Encryption Properties (Optional)
- Simflofy Content Service Connection Properties (Optional)
tcs.user: The Simflofy username. Default: admin
tcs.pass: The Simflofy password. Default: admin
tcs.url: The full url to Simflofy's simflofy-admin application. Default: http://localhost:8080/simflofy-admin
tcs.read.timeout: The amount of time the connection to Simflofy should wait on a read operation before timing out.
tcs.connect.timeout: The amount of time the attempt to make a connection to Simflofy will wait before timing out.
tcs.defaultStoreName: The name that will be given to the default filestore. Should be DefaultStoreName except in special circumstances.
tcs.connectorIds: A comma-delimited list of content service connector IDs that should be included in TCS.
- batchapi.behaviours.disableall: If this is false, reattempts to import data can cause an infinite loop.
- batchapi.rules.disableall: If this is false, reattempts to import data can cause an infinite loop.
Rules and Behaviors
Both of these values must be true to avoid conflicts with Alfresco's content store rules and behaviors
- batchapi.jsonthreadcount: The number of threads to delegate to the JSON batch integration queue. Default: 20
- batchapi.batchthreadcount: The number of threads to delegate to the batch integration queue. Default: 50
- batchapi.userid: The Alfresco user to perform the Batch API actions as. Must be a valid Alfresco user. Default: admin
- batchapi.licensekey: Required, with no default. The license key of the Simflofy instance you are connecting to. Jobs will not write batches to Alfresco unless this is set to a valid license key.
The caching properties control various levels of caching. By default, TCS caches some basic information at runtime to improve performance. It can also store that information as a manifest, and it can cache and store content on the local filesystem. A button to clear the cache is available in the TCS Admin Console.
tcs.cache.content.enabled: (true or false) Tells TCS whether to cache and store content. These cached files will be used to generate faster previews
tcs.cache.content.dirLocation: (String/filepath) When content caching is enabled, the root folder where content is cached. The folder paths are in the structure of (fileLocation)/yyyy/MM/dd/HH. Will use java.io.tmpdir/temp if left blank
tcs.cache.content.maxCacheFileSize: (long) When content caching is enabled, the max number of bytes before the cache is partially cleared (makes an estimate of what potential file size will be). Default is 5000000 (around 5 MB)
tcs.cache.content.maxCacheEntries: (integer) The number of entries held in the content cache. Default is Integer.MAX_VALUE (around 2 billion)
tcs.cache.content.daysToKeep: How many days before the cache is cleared. Default is 3
tcs.cache.manifest.enabled: (true or false) - Tells TCS whether to store the cache in a manifest. The manifest will be used to persist the cache between restarts of Alfresco
tcs.cache.manifest.fileLocation: (String/filepath) - The location of the manifest file. Created on shutdown and attempted to load on startup. Allows the cache to persist through shutdown.
tcs.cache.manifest.maxCacheFileSize: (long) When manifest cache storage is enabled, the max number of bytes before the cache is partially cleared (makes an estimate of what potential file size will be). Default is 5000000 (around 5 MB)
tcs.cache.manifest.maxCacheEntries: (integer) When manifest storage is not enabled, the number of entries held in the cache. Default is INTEGER.MAX (around 2 billion)
tcs.cache.manifest.daysToKeep: (integer) How many days before the cache is cleared. Default is 3
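As an illustration of the yyyy/MM/dd/HH folder layout used under tcs.cache.content.dirLocation, the sketch below derives the cache folder for a given time. The root path and function name are hypothetical, and which clock or time zone TCS uses for the folder name is an assumption.

```python
from datetime import datetime
from pathlib import PurePosixPath


def cache_folder(root: str, when: datetime) -> str:
    """Return (root)/yyyy/MM/dd/HH for the given timestamp."""
    # %Y/%m/%d/%H mirrors the zero-padded Java pattern yyyy/MM/dd/HH.
    return str(PurePosixPath(root) / when.strftime("%Y/%m/%d/%H"))


# Hypothetical cache root.
print(cache_folder("/opt/tcs-cache", datetime(2021, 6, 18, 9, 5)))
# /opt/tcs-cache/2021/06/18/09
```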
note
Currently, TCS Cache Encryption only supports one alias, so if a keystore contains more than one alias, only the one designated in the property will be used.
tcs.cache.encryption.enabled: (true or false) Tells TCS whether to use encryption on its caches. Defaults to false.
tcs.cache.encryption.providerName: (String) The JCE Security provider to use. Optional. Leave blank for default (in most cases this should stay blank).
tcs.cache.encryption.keystore.type: (String) The type of keystore being used. Defaults to JCEKS. Required if encryption is enabled.
tcs.cache.encryption.keystore.path: (String) The absolute path to the keystore file. Required if encryption is enabled.
tcs.cache.encryption.keystore.password: (String) The password for the keystore itself. Required if encryption is enabled.
tcs.cache.encryption.key.alias: (String) The alias of the key to use in the keystore. Required if encryption is enabled.
tcs.cache.encryption.key.password: (String) The password to the key for the alias provided. Required if encryption is enabled.
tcs.cache.encryption.key.algorithm: (String) The key algorithm used. Defaults to AES. Required if encryption is enabled.
The Simflofy Content Service Connection properties are the only TCS configuration that takes place inside Simflofy Admin. The content service connectors used for TCS have a few parameters that can be set in the Connection Configuration tab by clicking Add Custom Parameter.
- deepDelete - If true, deleting content in Alfresco will also delete it in the source repository. (Default=true)
- readOnly - If true, you will not be able to edit files or upload documents through Alfresco. (Default=false)
- root_folder - Dictates where content will be pushed to when a file is uploaded in Alfresco. If blank, the content service connector will use the Alfresco path. Some content service connectors create this value as a default.
Setting up MIP Jobs in Simflofy
Once the configuration is complete, setting up MIP Jobs only requires a few additions to the standard job configuration:
- In the Job Details tab, a content service connector for the source repository must be set.
- In the Job Details tab under Advanced Settings, Include Binaries should not be checked.
- In the Alfresco Output Specification, Create Content Url Always must be checked.
- The following mappings are required for all MIP jobs.
{"mappings":[{"sourceType":"TEXT","watch":"false","mappingType":"CALCULATED_FIELD","targetType":"TEXT","source":"'#{rd.fileName}'","position":0,"target":"cmis:name"},{"sourceType":"TEXT","watch":"false","mappingType":"ASPECT_MAPPING","targetType":"TEXT","source":"Document","position":0,"target":"cm:storeSelector"},{"sourceType":"TEXT","watch":"false","mappingType":"CALCULATED_FIELD","targetType":"TEXT","source":"'#{rd.simflofyContentServiceConnector}'","position":0,"target":"cm:storeName"}]}
They can be added all at once using the Import Mappings button. Just paste and save.
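If you would rather generate the mapping JSON than copy it by hand, a sketch like the following builds the same three mappings and prints them in a compact form suitable for pasting. The helper names are hypothetical; the field names and values are taken from the mapping shown above.

```python
import json


def mapping(mapping_type, source, target):
    """Build one mapping entry with the fixed fields used by the MIP mappings."""
    return {
        "sourceType": "TEXT",
        "watch": "false",
        "mappingType": mapping_type,
        "targetType": "TEXT",
        "source": source,
        "position": 0,
        "target": target,
    }


def mip_mappings():
    """Return the three mappings required for all MIP jobs."""
    return {
        "mappings": [
            mapping("CALCULATED_FIELD", "'#{rd.fileName}'", "cmis:name"),
            mapping("ASPECT_MAPPING", "Document", "cm:storeSelector"),
            mapping("CALCULATED_FIELD", "'#{rd.simflofyContentServiceConnector}'", "cm:storeName"),
        ]
    }


# Compact separators keep the output on one line for pasting.
print(json.dumps(mip_mappings(), separators=(",", ":")))
```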
Job Mapping
If Simflofy (3.1 or later) was run with initialize.bootstrap=true, this mapping will be included as a preloaded Job Mapping.
TCS Admin Console
The admin console for TCS can be found at:
[alfrescoUrl]/alfresco/service/enterprise/admin/tcs-admin
It will display the currently configured content stores which are available for MIP operations, as well as provide a chart breaking down the number of documents stored in each Simflofy content store.
The Connect To Simflofy button will re-run the process of building the store selector and adding content stores. This can be useful if you need to alter the Content Service Configuration for a store (see the properties listed above).
If no content stores appear when loading the page or after hitting the button, there is an issue with your configuration, and you should check the logging to see if any notable errors showed up. Make sure you have your TCS Logging set.
It also has a button which will clear the current cache.