vChewing-macOS

Commit Graph

Author	SHA1	Message	Date
Lukhnos Liu	75f321f088	Update copyright headers (fixes #213)	2022-01-18 14:21:55 -08:00
zonble	b627e8e3b6	Adds an option to let users choose the Chinese conversion style. Option 0: converts the output. Option 1: converts the models.	2022-01-16 15:04:20 +08:00
Lukhnos Liu	d064f420e4	Use a parseless phrase db to speed up LM loading. We take advantage of the fact that no one is able to modify the phrase databases shipped with the binary (guaranteed by macOS's integrity check for notarized apps), so we can simply pre-sort the phrases in the database files. With this change, we can speed up McBopomofo's language model loading during app initialization by about 500-800x on a 2018 Intel MacBook Pro. The LM loading used to take 300-400 ms, but now it's done within a sub-millisecond range (0.5-0.6 ms). Microbenchmarking shows that ParselessLM is about 16000x faster than FastLM. We amortize the latency during query time, and even with the deferred parsing, ParselessLM is only ~1.5x slower than FastLM, and both LM classes serve queries under 6 microseconds (that's 0.006 ms), which means the tradeoff contributes only negligible overall latency. This PR requires some small changes to the phrase db cooking scripts. Python 3 is now used, and the (value, reading, score) tuples are rearranged to (reading, value, score) and sorted by reading ("key"). A header is added to the phrase databases to call out the fact that these are pre-sorted. clang-format is used to apply WebKit C++ style to the new code. This also applies to KeyValueBlobReader, which was added recently. Microbenchmark result below: ``` --------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------- BM_ParselessLMOpenClose 17710 ns 17199 ns 33422 BM_FastLMOpenClose 376520248 ns 367526500 ns 2 BM_ParselessLMFindUnigrams 5967 ns 5899 ns 113729 BM_FastLMFindUnigrams 2268 ns 2265 ns 307038 ```	2022-01-15 16:15:02 -08:00
zonble	136ac34f22	Introduces in-place phrase replacement. Since we have implemented the functions to add and exclude phrases, this commit allows users to use a table to change the output of a phrase without changing its BPMF reading and score when the "phrase replacement" mode is on. It helps users switch between a specific input scenario and the ordinary one. For example, if a user wants to work with financial Chinese numerals like 壹、貳、參, they may want these characters to score as high as the ordinary numerals 一、二、三. This commit lets users temporarily replace 一、二、三 with 壹、貳、參 by just turning on "phrase replacement" mode and preparing a custom table. The conversion is not done in the output phase, unlike the Traditional/Simplified Chinese conversion. Instead, the phrase replacement table slightly modifies the language model. The replacement takes place when walking the nodes and the candidate list. A user can enable the mode and edit the table from the input menu. Since the function is quite advanced, the menu items are hidden until the user holds the option key. The table is a plain text file. Each line contains a "from" and a "to". For example ``` 一 壹 ``` However, if the user also wants all other phrases containing 一 to use 壹, each of those phrases has to be listed in the table ``` 一百 壹佰 一千 壹仟 一萬 壹萬 一百萬 壹百萬 ```	2022-01-15 06:23:09 +08:00
zonble	abdf97f652	Adds McBopomofoLM as the facade of three language models. - main language model - user phrases - user excluded phrases	2022-01-12 12:26:24 +08:00
zonble	9b485b799c	Implements excluding phrases.	2022-01-12 00:16:55 +08:00
zonble	144d133463	Adds Language Model Manager. The references to the global language models were stored in the class InputMethodController. However, the global models are global and not a part of the input method controller, and the input method controller only uses one of the models (McBopomofo/Plain Bopomofo). I guess this somehow violates SRP, and there should be a better place for the global models.	2022-01-11 17:12:58 +08:00
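
Commits abdf97f652, 9b485b799c and 136ac34f22 describe a facade over a main model, user phrases and user-excluded phrases, with an optional replacement table applied inside the model rather than at output time. The following is a minimal Python sketch of that idea, not the project's code. The names `FacadeLM`, `parse_replacements` and `unigrams_for` are hypothetical, and three things are assumptions: the whitespace-separated table format, the priority of user phrases over the main model, and the order of the exclusion and replacement steps.

```python
def parse_replacements(text):
    """Parse a plain-text table with one "from to" pair per line (format assumed)."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


class FacadeLM:
    """Hypothetical facade over a main model, user phrases and excluded phrases."""

    def __init__(self, main, user, excluded, replacements=None):
        self.main = main                  # reading -> list of (value, score)
        self.user = user                  # reading -> list of (value, score)
        self.excluded = excluded          # reading -> set of excluded values
        self.replacements = replacements or {}
        self.replacement_enabled = False  # the "phrase replacement" mode

    def unigrams_for(self, reading):
        blocked = self.excluded.get(reading, set())
        seen = set()
        results = []
        # Assumption: user phrases are listed before the main model's phrases.
        for value, score in self.user.get(reading, []) + self.main.get(reading, []):
            if value in blocked:
                continue
            if self.replacement_enabled:
                # Only the value is swapped; reading and score stay unchanged.
                value = self.replacements.get(value, value)
            if value in seen:
                continue
            seen.add(value)
            results.append((value, score))
        return results
```

With replacement enabled and a table mapping 一 to 壹, a query for the reading ㄧ yields 壹 carrying the score that 一 had, which matches the behaviour the commit message describes.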

7 Commits
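
Commit d064f420e4 relies on phrase databases that are pre-sorted by reading, so that loading does no parsing and each query locates its rows by binary search. Below is a rough Python sketch of that technique under stated assumptions. It is not the C++ ParselessLM implementation. The function names (`cook`, `key_of`, `lower_bound`, `find_unigrams`) and the space-separated line format are hypothetical, and the sketch sorts by Python string order where the real database may use a different ordering.

```python
def cook(entries):
    """Rearrange (value, reading, score) into text rows sorted by reading."""
    rows = sorted((reading, value, score) for value, reading, score in entries)
    return [f"{reading} {value} {score}" for reading, value, score in rows]


def key_of(line):
    """Return the reading ("key") of a row without parsing the rest of it."""
    return line.split(" ", 1)[0]


def lower_bound(lines, reading):
    """Binary search for the first row whose key is not less than `reading`."""
    lo, hi = 0, len(lines)
    while lo < hi:
        mid = (lo + hi) // 2
        if key_of(lines[mid]) < reading:
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_unigrams(lines, reading):
    """Parse only the rows that match `reading`; every other row stays unparsed."""
    results = []
    i = lower_bound(lines, reading)
    while i < len(lines) and key_of(lines[i]) == reading:
        _, value, score = lines[i].split(" ")
        results.append((value, float(score)))
        i += 1
    return results
```

The cost of parsing moves from load time to query time, which is the tradeoff the commit message reports: opening is far cheaper, while each lookup does slightly more work.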