Overview
This article describes how to configure the Word Breaker in GFI Archiver.
Word breaking is the process of splitting text into individual tokens, or words. Many languages, especially those written in the Roman alphabet, use word separators (e.g., a blank space) and punctuation to distinguish words, phrases, and sentences. Word breakers rely on language-specific heuristics to produce reliable results.
Word breaking is more complex for character-based systems of writing or script-based alphabets, where the meaning of individual characters is determined from context.
A Word Breaker is vital for properly indexing many languages, including most Asian languages (e.g., Japanese, Chinese, and Arabic).
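To illustrate why separator-based word breaking works for space-delimited text but fails for scripts written without spaces, here is a minimal Python sketch. The function name `naive_word_break` is hypothetical, and this is not how the GFI Archiver or Windows word breakers are implemented; it simply treats whitespace and punctuation as separators.

```python
import re

def naive_word_break(text: str) -> list[str]:
    """Split text into tokens using whitespace and punctuation as separators.

    Illustrative only: real word breakers apply language-specific rules.
    """
    # \w+ matches runs of Unicode letters, digits, and underscores,
    # so anything else (spaces, commas, periods) acts as a separator.
    return re.findall(r"\w+", text)

english = "Word breakers split text, like this."
japanese = "私は学生です"  # "I am a student", written without spaces

print(naive_word_break(english))
print(naive_word_break(japanese))
```

The English sentence is split into six separate words, while the Japanese sentence comes back as a single token because it contains no separators. A language-aware word breaker is needed to index such text in a searchable way.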
Process
To configure the Word Breaker, set up the Language Analyzer as follows:
- Open GFI Archiver.
- Navigate to the Configuration tab and click Archive Stores.
- Click Index Management.
- Select one of the following language analysis options:
Option | Description |
---|---|
Enable built-in word breaker | The GFI Archiver language analyzer is enabled by default. It is highly recommended to keep this option enabled for optimal indexing performance. |
Enable Microsoft Windows word breaker | Choose this option to disable the GFI Archiver built-in word breaker and use the word breaker of your Windows operating system. Use the Default Language drop-down list to specify the language to be used to index archived data. NOTE: If the required language is not listed in the Default Language drop-down list, add the required language from the Regional settings option within the Windows® control panel. Alternatively, check the Enable automatic language detection box to let Windows detect the language automatically. |