Commit Graph

11 Commits

Author SHA1 Message Date
zonble 8ba4b9dfdf Prevents loading data models repeatedly. 2022-01-30 20:27:33 +08:00
zonble a75c7b7086 Allows users to type Latin letters while using shift + letter keys.
Fixes issue #162.
2022-01-17 00:48:29 +08:00
zonble c4259c4c4e Updates comments and fixes a typo. 2022-01-16 15:04:20 +08:00
zonble 5c0a14deeb Refactors the function to filter and transform unigrams in McBopomofoLM. 2022-01-16 15:04:20 +08:00
zonble b627e8e3b6 Adds an option to let users choose the Chinese conversion style.
Option 0: converts the output.
Option 1: converts the models.
2022-01-16 15:04:20 +08:00
zonble b348a05735 Filters duplicated unigram values properly. 2022-01-16 15:04:18 +08:00
zonble 136ac34f22 Introduces in-place phrase replacement.
Since we have implemented the functions to add and exclude phrases, this
commit allows users to use a table to change the output of a phrase
without changing its BPMF reading and score, when the "phrase replacement"
mode is on.

This helps users switch between a specific input scenario and the
ordinary one. For example, a user working with financial Chinese numerals
such as 壹、貳、參 may want those characters to score as high as the
ordinary numerals 一、二、三. This commit lets users temporarily replace
一、二、三 with 壹、貳、參 by turning on the "phrase replacement" mode and
preparing a custom table.

The conversion is not done in the output phase, unlike the
Traditional/Simplified Chinese conversion. Instead, the phrase
replacement table slightly modifies the language model: the replacement
takes place when walking the nodes and the candidate list.

A user can enable the mode and edit the table from the input menu. Since
the function is quite advanced, the menu items are hidden until the user
holds the option key.

The table is a plain text file. Each line contains a "from" phrase and a
"to" phrase. For example:

```
一 壹
```

However, if the user also wants every other phrase containing 一 to use
壹, each of those phrases has to be added to the table:

```
一百 壹佰
一千 壹仟
一萬 壹萬
一百萬 壹佰萬
```
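
A minimal C++17 sketch of the whole-phrase lookup described above, to
show why longer phrases need their own entries. The names
ReplacementTable, loadTable, and replacePhrase are hypothetical and are
not taken from this commit; the sketch also skips the language-model
integration and only shows the table semantics.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical type: maps a "from" phrase to its "to" phrase.
using ReplacementTable = std::unordered_map<std::string, std::string>;

// Builds a table from text in which each line holds a "from" phrase and
// a "to" phrase separated by whitespace. Lines with fewer than two
// fields are ignored.
inline ReplacementTable loadTable(const std::string& text) {
    ReplacementTable table;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string from, to;
        if (fields >> from >> to) table[from] = to;
    }
    return table;
}

// Returns the replacement when the whole phrase matches a "from" entry;
// otherwise returns the phrase unchanged. No substring matching is done.
inline std::string replacePhrase(const ReplacementTable& table,
                                 const std::string& phrase) {
    const auto it = table.find(phrase);
    return it == table.end() ? phrase : it->second;
}
```

With a table that contains only the line "一 壹", replacePhrase leaves
一百 untouched, because only exact phrase matches are replaced.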
2022-01-15 06:23:09 +08:00
Lukhnos Liu d6cc5479f6 Use a more tolerant parser for user phrases
A generic key-value blob reader, KeyValueBlobReader, is implemented to
allow more flexibility in user-editable files. For example, this allows
comments in the file, as well as tolerating leading or trailing spaces,
tabs, or even Windows CR LF line endings.
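
A minimal sketch of the kind of tolerance described above, assuming one
key-value pair per line separated by spaces or tabs and "#" as the
comment marker. Both assumptions, and the names trim and parseLine, are
illustrative only and are not the actual KeyValueBlobReader interface.

```cpp
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical result type: views into the caller's buffer, no copies.
using KeyValue = std::pair<std::string_view, std::string_view>;

// Strips leading and trailing spaces, tabs, and CR/LF characters.
inline std::string_view trim(std::string_view s) {
    constexpr std::string_view kBlank = " \t\r\n";
    const auto first = s.find_first_not_of(kBlank);
    if (first == std::string_view::npos) return {};
    const auto last = s.find_last_not_of(kBlank);
    return s.substr(first, last - first + 1);
}

// Parses one line into a key-value pair. Returns std::nullopt for blank
// lines, comment lines (assumed to start with '#'), and lines that have
// a key but no value.
inline std::optional<KeyValue> parseLine(std::string_view line) {
    line = trim(line);
    if (line.empty() || line.front() == '#') return std::nullopt;
    const auto sep = line.find_first_of(" \t");
    if (sep == std::string_view::npos) return std::nullopt;
    return KeyValue{line.substr(0, sep), trim(line.substr(sep))};
}
```

Because the returned views point into the input buffer, the buffer has
to outlive the parsed pairs.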

Unit tests are supplied for KeyValueBlobReader although they are not
part of the Xcode project. A separate CMakeLists.txt is provided.

UserPhrasesLM is refactored to use KeyValueBlobReader. A small stylistic
change is applied to reduce "using namespace" uses, but otherwise no
major style changes were applied to UserPhrasesLM.

Please note that McBopomofo's user phrase LM uses the value in a
key-value pair as the reading, and the key as the actual "value". We
don't plan to change that order so that we don't have to migrate data.

std::string_view is used to allow efficient reference to char buffers
and interop with std::string (and so no c_str() is needed). C++17 is now
enabled for the project to enable the use of std::string_view.

Copyright headers are added to McBopomofoLM and UserPhrasesLM.
2022-01-13 23:27:31 -08:00
zonble f1e56a7e01 Lets McBopomofoLM accept NULL as the parameter in loadUserPhrases. 2022-01-12 13:17:41 +08:00
zonble 84fc2f068b Removes unused code and fixes a typo. 2022-01-12 13:16:10 +08:00
zonble abdf97f652 Adds McBopomofoLM as the facade of three language models.
- main language model
- user phrases
- user excluded phrases
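
A rough C++17 sketch of how a facade over the three models listed above
might combine them. FacadeLM, PhraseMap, and phrasesFor are hypothetical
names, and the ordering (user phrases before main-model phrases) and the
phrase-only exclusion key are assumptions, not details from this commit.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical simplified model: a reading mapped to candidate phrases.
using PhraseMap = std::unordered_map<std::string, std::vector<std::string>>;

// Hypothetical facade, not the actual McBopomofoLM interface.
class FacadeLM {
public:
    FacadeLM(PhraseMap mainModel, PhraseMap userPhrases,
             std::unordered_set<std::string> excludedPhrases)
        : mainModel_(std::move(mainModel)),
          userPhrases_(std::move(userPhrases)),
          excludedPhrases_(std::move(excludedPhrases)) {}

    // Returns user phrases first (an assumption), then main-model
    // phrases, skipping excluded phrases and duplicates.
    std::vector<std::string> phrasesFor(const std::string& reading) const {
        std::vector<std::string> result;
        std::unordered_set<std::string> seen;
        collect(userPhrases_, reading, seen, result);
        collect(mainModel_, reading, seen, result);
        return result;
    }

private:
    void collect(const PhraseMap& model, const std::string& reading,
                 std::unordered_set<std::string>& seen,
                 std::vector<std::string>& out) const {
        const auto it = model.find(reading);
        if (it == model.end()) return;
        for (const auto& phrase : it->second) {
            if (excludedPhrases_.count(phrase) != 0) continue;
            if (seen.insert(phrase).second) out.push_back(phrase);
        }
    }

    PhraseMap mainModel_;
    PhraseMap userPhrases_;
    std::unordered_set<std::string> excludedPhrases_;
};
```

Callers query a single object and do not need to know which of the three
underlying models produced or suppressed a candidate.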
2022-01-12 12:26:24 +08:00