Term Vectors - Term vectors are additional data stored with each document that record the terms in a field and how often each one appears. Term vectors increase the size of an index but are required for highlighting and More Like This (MLT) searches. They can only be applied to text fields.
The content, simflofy_typename, and simflofy_filename fields have term vectors enabled by default.
MLT searches won't work properly without a sufficient number of documents, likely at least 200-300 with relevant metadata. This number is hard to pin down, but the default configuration of the widget is very permissive.
If a field mapping already exists in ElasticSearch without term vectors, attempting to add term vectors to it will cause an exception. You can review mapping properties by either using Kibana or checking the following endpoint in ElasticSearch
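One way to review a mapping outside Kibana is Elasticsearch's get-mapping API (`GET /<index>/_mapping`). The sketch below is not taken from Simflofy. It assumes the response shape returned by recent Elasticsearch versions (typeless mappings) and uses a placeholder index name, `my-index`, plus a hypothetical helper, `fields_without_term_vectors`, to list the text fields that do not yet have term vectors.

```python
# Sample of what GET /my-index/_mapping might return. "my-index" and the
# fields are placeholders for illustration, not a real Simflofy index.
sample_response = {
    "my-index": {
        "mappings": {
            "properties": {
                "content": {"type": "text", "term_vector": "with_positions_offsets"},
                "title": {"type": "text"},
                "size": {"type": "long"},
            }
        }
    }
}


def fields_without_term_vectors(mapping_response, index):
    """Return the names of top-level text fields that have no term vectors."""
    properties = mapping_response[index]["mappings"].get("properties", {})
    missing = []
    for name, spec in properties.items():
        # Elasticsearch omits "term_vector" when it is left at its default, "no".
        if spec.get("type") == "text" and spec.get("term_vector", "no") == "no":
            missing.append(name)
    return sorted(missing)


print(fields_without_term_vectors(sample_response, "my-index"))
```

Fields reported by a check like this are the ones that would trigger the exception described above if term vectors were later requested for them.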
Any Indexing Job with ElasticSearch can apply term vectors. The job specification takes a comma-delimited list of fields for which term vectors will be enabled. The fields included in this list must also be mapped, as seen below:
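As a rough illustration of what such a field list amounts to on the ElasticSearch side, the sketch below turns a comma-delimited string into mapping properties with term vectors enabled. The helper name `term_vector_properties` and the choice of `with_positions_offsets` are assumptions made for this example. The value Simflofy actually writes may differ.

```python
def term_vector_properties(field_list, term_vector="with_positions_offsets"):
    """Build Elasticsearch mapping properties for a comma-delimited field list."""
    # Split on commas and drop surrounding whitespace and empty entries.
    names = [name.strip() for name in field_list.split(",") if name.strip()]
    # Term vectors can only be applied to text fields, so each field is typed as text.
    return {name: {"type": "text", "term_vector": term_vector} for name in names}


properties = term_vector_properties("content, simflofy_typename,simflofy_filename")
mapping_body = {"properties": properties}
print(mapping_body)
```

A body of this shape is what ElasticSearch expects when new fields are added to a mapping. It would be rejected for a field that already exists without term vectors.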
The MLT Widget
Simflofy does not have a widget instance for the MLT Widget out of the box. To create one, go to the Federation menu and select Widget Instances.
Then, select Create New Widget Instance.
Select MLTWidget from the dropdown.
You will then be taken to the Widget Instance page.
Configuring the Widget
It is highly recommended that you read up on how MLT searches work. Most of the configuration for the widget is pulled directly from ElasticSearch.
|refDocs||The sample size of documents used. The MLT search requires a source set of documents to start using as search criteria.|
|maxqt||The maximum number of query terms that will be selected, that is, how many of the most significant terms are taken from the source documents to build the search. Increasing this value gives greater accuracy at the expense of query execution speed.|
|mintf||The minimum term frequency (how often a term appears in the input document) below which terms will be ignored. A setting of 1 means that a term appearing only once in the input document can still be used in the search.|
|mindf||The minimum document frequency (how many documents in the index contain the term) below which terms will be ignored.|
|minwl||The minimum word length below which the terms will be ignored.|
|maxwl||The maximum word length above which the terms will be ignored. Defaults to unbound (0).|
|mltfl||The list of fields checked for similarities. The default values are fields that have term vectors by default.|
|btnLabel||The label for the More Like This search button.|
These default values are meant to be a starting point, as they require very few matches to be considered like another document. More refined searches will require more tuning.
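To make the settings above more concrete, the sketch below shows how they could translate into an ElasticSearch `more_like_this` query body. The pairing of widget fields with query parameters (for example, `maxqt` with `max_query_terms`) is inferred from the names and is an assumption. The helper name `build_mlt_query` and the sample values are also assumptions, and the values are not the widget's actual defaults.

```python
def build_mlt_query(like_docs, config):
    """Build a more_like_this query body from widget-style settings."""
    mlt = {
        "fields": config["mltfl"],
        "like": like_docs,
        "max_query_terms": config["maxqt"],
        "min_term_freq": config["mintf"],
        "min_doc_freq": config["mindf"],
        "min_word_length": config["minwl"],
    }
    # A maximum word length of 0 means unbounded, so the parameter is left out.
    if config.get("maxwl", 0) > 0:
        mlt["max_word_length"] = config["maxwl"]
    return {"query": {"more_like_this": mlt}}


# Illustrative values only.
sample_config = {
    "mltfl": ["content", "simflofy_typename", "simflofy_filename"],
    "maxqt": 25,
    "mintf": 1,
    "mindf": 1,
    "minwl": 0,
    "maxwl": 0,
}
# Reference documents are identified by index and id; both are placeholders.
sample_like = [{"_index": "my-index", "_id": "doc-1"}]
mlt_query = build_mlt_query(sample_like, sample_config)
print(mlt_query)
```

Raising `mintf` or `mindf` in a sketch like this narrows the set of terms that qualify, which is one way to tune the permissive starting configuration.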
The widget can be placed on the sidebar of any view using the simflofy template. Upon initial load, assuming your indexes have enough data for MLT to get results, you should see something like the following:
The information icons can be used to view the metadata of the sample documents.
After a search is completed (including the initial one), the widget performs a separate search using the results (number determined by refDocs) to provide a sample of the MLT documents.
Pressing the button on the widget runs an MLT search that uses the sample documents as a reference and loads the results as normal. Note that the widget will then perform its own MLT search on the results of this search as well.
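One reading of the reference step described above can be sketched as follows: take the first refDocs hits of a search response and turn them into `like` entries for a follow-up MLT query. The response shape is the standard ElasticSearch search response. The helper name `reference_docs` and the sample hits are hypothetical, and the widget's real implementation may differ.

```python
def reference_docs(search_response, ref_docs):
    """Pick the first ref_docs hits and return them as more_like_this 'like' items."""
    hits = search_response.get("hits", {}).get("hits", [])
    # Each reference document is identified by the index it lives in and its id.
    return [{"_index": hit["_index"], "_id": hit["_id"]} for hit in hits[:ref_docs]]


# Placeholder search response with three hits.
sample_search = {
    "hits": {
        "hits": [
            {"_index": "my-index", "_id": "a", "_source": {"content": "first"}},
            {"_index": "my-index", "_id": "b", "_source": {"content": "second"}},
            {"_index": "my-index", "_id": "c", "_source": {"content": "third"}},
        ]
    }
}
like_items = reference_docs(sample_search, 2)
print(like_items)
```

Because every completed search yields a new set of hits, running this step again after the button is pressed produces a fresh set of reference documents.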
Interactions with Other Facet Widgets
MLT results will return facet counts, but those values cannot be drilled down into by other widgets as of now.
For example, you cannot perform an MLT search and then use the Facet Select widget to select all PDFs. For situations like this, it is recommended that you separately index the content type using the following calculated mapping
Then add content_type to your Term Vector field list in the output specification and the field list for the widget.
simflofy_content_type will have its term vector added automatically as part of the migration and can be added to the field