MongoDB GridFS
Note: In Simflofy, GridFS is a mode of the MongoDB connector.
GridFS is the MongoDB specification for storing and retrieving large files such as images, audio files, and video files. It behaves like a file system, but its data is stored within MongoDB collections. GridFS can store files larger than the BSON document size limit of 16 MB.
GridFS divides a file into chunks and stores each chunk in a separate document. By default, each chunk holds at most 255 kB of data.
By default, GridFS uses two collections, [collection].files and [collection].chunks, to store the file's metadata and its chunks. Each chunk is identified by its unique _id ObjectId field. The document in [collection].files serves as the parent document. The files_id field in each [collection].chunks document links the chunk to its parent.
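The chunking scheme described above can be sketched in plain Python. This is an illustration only, not how a MongoDB driver or Simflofy implements it: the function name `split_into_chunks`, the string ID, and the file name are hypothetical, and a real driver also assigns each chunk its own `_id` and records fields such as `uploadDate`.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # 261,120 bytes, the GridFS default chunk size


def split_into_chunks(data: bytes, files_id: str, chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Return (files_doc, chunk_docs) shaped roughly like GridFS documents.

    Hypothetical helper for illustration; real drivers use ObjectId values
    and store more fields than shown here.
    """
    # One chunk document per slice; "n" is the chunk's position in the file
    # and "files_id" points back to the parent document.
    chunk_docs = [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    # The parent document would live in [collection].files.
    files_doc = {
        "_id": files_id,
        "length": len(data),
        "chunkSize": chunk_size,
        "filename": "example.bin",  # illustrative name
    }
    return files_doc, chunk_docs


payload = bytes(600_000)  # a 600,000-byte file of zeros
files_doc, chunk_docs = split_into_chunks(payload, files_id="demo-file-1")
print(len(chunk_docs), [len(c["data"]) for c in chunk_docs])
```

Reassembling the file means reading the chunks with the matching `files_id` in order of `n` and concatenating their `data` fields.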
GridFS Authentication Connection
- Name: The name of your auth connector.
- Username: The username of the MongoDB admin user you want to authenticate as.
- Password: The password of the MongoDB admin user you want to authenticate as.
- Mongo URI: The URI of your MongoDB instance. For example, mongodb://localhost:27017 connects to a Mongo database hosted locally (relative to Simflofy) on port 27017.
- Database: The name of the database that you want to authenticate against.
- Use MongoDB GridFS Services?: Required for binary storage. This checkbox enables GridFS services.
Mongo URI
Simflofy inserts the username and password into the connection string. To mark where they belong in the URI, use the placeholder:
[[USER]]:[[PASS]]
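As a rough sketch of what such a substitution could look like, the snippet below fills the placeholders in a URI template. The function `fill_credentials`, the template string, and the credentials are hypothetical. Percent-encoding the values is an assumption based on the standard MongoDB connection string format; this page does not say whether Simflofy encodes them for you.

```python
from urllib.parse import quote_plus


def fill_credentials(uri_template: str, username: str, password: str) -> str:
    """Replace the [[USER]] and [[PASS]] placeholders in a URI template.

    Hypothetical helper: values are percent-encoded so that characters
    such as '@' or ':' do not break the URI (an assumption, see lead-in).
    """
    return (
        uri_template
        .replace("[[USER]]", quote_plus(username))
        .replace("[[PASS]]", quote_plus(password))
    )


# Assumed template: placeholders sit before the host, separated by '@'.
template = "mongodb://[[USER]]:[[PASS]]@localhost:27017"
uri = fill_credentials(template, "admin", "p@ss:word")
print(uri)
```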
Integration Connection
The Simflofy MongoDB Connector allows organizations to read from and write to a Mongo database using GridFS. This means that, using Simflofy and your MongoDB instance of choice, you can connect to these instances and retrieve data and content from them.
Job Configuration
- Output Specification
- Advanced Options
- Repository Configuration
- Collection : The name of the collection to create/write to. Simflofy will handle the "files" and "chunks" collections internally
- Insert Simflofy Metadata : Required for Federation. Write Simflofy metadata onto objects. Automatically included if GridFS is enabled in auth connection.
- Use bulk write operations (MongoDB only) : Ignore for GridFS
- Number of documents to write per bulk operations : Ignore for GridFS
- Include Un-Mapped Properties : Add all metadata on the document to the metadata object in [collection].files.
- Drop and Build Indexes: Indexes are created to speed up searches. This will rebuild them entirely. Should always be checked for the first run for Federation, so you can include a text index for full text search.
- Index Keys: A comma-delimited list of field keys to index. For multiple collections use key1:collection1,key2:collection2.
- Text Index Keys: Text index keys for full-text searching (comma-delimited). Full-text search will fail if this is not defined. For multiple collections use key1:collection1,key2:collection2.
- Upsert Key: Upserting means updating the document if it already exists and creating it otherwise. This key is checked to see whether the document already exists. Leave blank to use the Simflofy source repository key. If it is not set and Insert Simflofy Metadata is not checked, only creates will be called.
- MongoDB Write Concern: Write concern describes the level of acknowledgement requested from MongoDB for write operations.
- File Store Connector ID: Connector ID of the Content Service Connector to use as the File Store. Leave blank to use native GridFS. This field is only used if MongoDB GridFS is enabled in the Authentication Connector.
- Comma delimited list of collections to crawl : Do not append ".files" or ".chunks".
- Select what field will act as the source id for the document. : The field that will appear as the "source_repository_id" field in the output document.
- Query : A Mongo query. If left blank, the query will be "{}", which means "get all".
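The key1:collection1,key2:collection2 format used by the index key options can be read as a mapping from collection to keys. The sketch below parses such a string. The function `parse_index_keys`, the sample keys, and the fallback to a default collection when no collection is given are assumptions for illustration, not a description of Simflofy's internal parsing.

```python
from collections import defaultdict


def parse_index_keys(spec: str, default_collection: str) -> dict:
    """Group comma-delimited 'key:collection' entries by collection.

    Hypothetical helper: entries without a ':collection' suffix are
    assigned to default_collection (an assumption).
    """
    keys_by_collection = defaultdict(list)
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and blank entries
        key, _, collection = entry.partition(":")
        target = collection.strip() or default_collection
        keys_by_collection[target].append(key.strip())
    return dict(keys_by_collection)


multi = parse_index_keys("title:docs, author:docs,created:audit", "docs")
single = parse_index_keys("title,author", "docs")
print(multi)
print(single)
```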
Content Service Connection
Source repository IDs for GridFS are compound IDs that take the form:
[collection]:[mongodbId]:[version]
For example:
demo:61684139dc5eb835dbf0a0c2:1
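A client that receives one of these compound IDs may want to split it back into its parts. The following is a minimal sketch, assuming the three parts never contain a colon themselves and that the version is an integer, as in the example above. The names `GridFsObjectId` and `parse_compound_id` are hypothetical.

```python
from typing import NamedTuple


class GridFsObjectId(NamedTuple):
    """Parts of a [collection]:[mongodbId]:[version] compound ID."""
    collection: str
    mongodb_id: str
    version: int


def parse_compound_id(compound_id: str) -> GridFsObjectId:
    """Split a compound ID into collection, MongoDB ID, and version."""
    parts = compound_id.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected collection:mongodbId:version, got {compound_id!r}")
    collection, mongodb_id, version = parts
    # Treating the version as an integer is an assumption based on the example.
    return GridFsObjectId(collection, mongodb_id, int(version))


parsed = parse_compound_id("demo:61684139dc5eb835dbf0a0c2:1")
print(parsed)
```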
- Insert Simflofy Metadata: Uses Simflofy specific metadata with objects. File Store Connector ID is only used if this is enabled. Check-in and check-out functionality is only used if this is enabled.
- GridFS Bucket for file uploads: Name of a single collection
- File Store Connector ID: Connector ID of the Content Service Connector to use as the File Store. Leave blank to use native GridFS. This field is only used if MongoDB GridFS is enabled in the Authentication Connector.
- File Store Root Folder ID: If using a File Store, this value is required. This field is only used if MongoDB GridFS is Enabled in the Authentication Connector.
This connector supports the following methods:
- Check In *
- Check Out *
- Create File *
- Create Folder (id = bucket name to create)
- Delete Folder (id = bucket name to delete)
- Delete Object By ID
- Get File Content
- Get Version Content
- Get Object Properties
- Get Version Properties
- List Versions
- Update File Content
- Update File Properties
- Get Object ID By Path (folderPath will be the bucket in this case)
Content Search Connection
Default Query: (3.1.1+) This field allows you to add a default MongoDB query to all incoming queries. The query in this box will be wrapped in an $and clause with all other search parameters. The "metadata" prefix is required for all fields except the following:
- length
- filename
- chunkSize
- uploadDate
- md5
To add a default query which filters out all .txt documents:
{
  "metadata.simflofy_content_type": {
    "$not": {
      "$eq": "text/plain"
    }
  }
}
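To illustrate the prefix rule and the $and wrapping described above, here is a small sketch. The helpers `qualify` and `combine_with_default` and the sample search clause are hypothetical. How Simflofy assembles the final query internally is not documented on this page, so treat the output shape as an assumption.

```python
import json

# Fields that live at the top level of a [collection].files document
# and therefore do not take the "metadata." prefix.
UNPREFIXED_FIELDS = {"length", "filename", "chunkSize", "uploadDate", "md5"}


def qualify(field: str) -> str:
    """Add the 'metadata.' prefix unless the field is exempt or already prefixed."""
    if field in UNPREFIXED_FIELDS or field.startswith("metadata."):
        return field
    return f"metadata.{field}"


def combine_with_default(default_query: dict, search_clauses: list) -> dict:
    """Wrap the default query and the other search clauses in one $and clause."""
    clauses = ([default_query] if default_query else []) + list(search_clauses)
    return {"$and": clauses} if clauses else {}


# The default query from the example above, built with the prefix helper.
default_query = {qualify("simflofy_content_type"): {"$not": {"$eq": "text/plain"}}}
# A hypothetical incoming search clause on a top-level field.
query = combine_with_default(default_query, [{qualify("filename"): "report.pdf"}])
print(json.dumps(query, indent=2))
```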
Get all versions and display versions on file name: When enabled, search retrieves all versions of each document and labels each result with its version.