1613 Commits

Author SHA1 Message Date
Lukhnos Liu d064f420e4 Use a parseless phrase db to speed up LM loading
We take advantage of the fact that no one is able to modify the phrase
databases shipped with the binary (guaranteed by macOS's integrity check
for notarized apps), and we can simply pre-sort the phrases in the
database files.

With this change, we can speed up McBopomofo's language model loading
during the app initialization by about 500-800x on a 2018 Intel MacBook
Pro. The LM loading used to take 300-400 ms, but now it's done within a
sub-millisecond range (0.5-0.6 ms). Microbenchmarking shows that
ParselessLM is about 16000x faster than FastLM at loading. The parsing
cost is deferred to and amortized over query time; even so, ParselessLM
is only ~1.5x slower than FastLM per query, and both LM classes serve
queries in under 6 microseconds (that's 0.006 ms), which means the
tradeoff adds only negligible overall latency.
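
To illustrate the idea of deferring parsing to query time, here is a minimal Python sketch, not the actual ParselessLM code. It assumes a database of newline-terminated "reading value score" lines sorted by their UTF-8 bytes, and it ignores the header. The names `line_start`, `line_end`, and `find_unigrams` are hypothetical. Opening the database costs nothing beyond holding the bytes. Each query binary-searches the raw buffer and parses only the matching lines.

```python
def line_start(blob: bytes, pos: int) -> int:
    """Offset of the first byte of the line that contains pos."""
    return blob.rfind(b"\n", 0, pos) + 1


def line_end(blob: bytes, start: int) -> int:
    """Offset of the newline that ends the line starting at start."""
    end = blob.find(b"\n", start)
    return len(blob) if end == -1 else end


def find_unigrams(blob: bytes, reading: str):
    """Return (value, score) pairs for reading, parsing only matching lines."""
    prefix = reading.encode("utf-8") + b" "
    lo, hi = 0, len(blob)
    # Binary search for the first line that is >= prefix in byte order.
    while lo < hi:
        start = line_start(blob, (lo + hi) // 2)
        end = line_end(blob, start)
        if blob[start:end] < prefix:
            lo = min(end + 1, len(blob))
        else:
            hi = start
    results = []
    pos = lo
    # Matching lines are contiguous because the file is sorted.
    while pos < len(blob):
        end = line_end(blob, pos)
        line = blob[pos:end]
        if not line.startswith(prefix):
            break
        _, value, score = line.decode("utf-8").split(" ")
        results.append((value, float(score)))
        pos = end + 1
    return results


# Made-up sample entries; sorting here stands in for the offline cooking step.
entries = ["ㄇㄚˇ 馬 -3.1", "ㄅㄚ 八 -2.9", "ㄇㄚˇ 碼 -4.5", "ㄉㄚˋ 大 -2.5"]
blob = b"".join(sorted(e.encode("utf-8") + b"\n" for e in entries))
matches = find_unigrams(blob, "ㄇㄚˇ")
```

The per-query cost grows to a logarithmic search plus a small parse, which is the same kind of tradeoff described above.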

This PR requires some small changes to the phrase db cooking scripts.
Python 3 is now used and the (value, reading, score) tuples are
rearranged to (reading, value, score) and sorted by reading ("key"). A
header is added to the phrase databases to call out the fact that these
are pre-sorted.
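
A rough sketch of the cooking step described above, written in Python 3. The function name `cook`, the header text, the space-separated layout, and the choice of UTF-8 byte order are all assumptions for illustration. The real scripts may differ.

```python
# Hypothetical header text; the actual header used by the project is not shown here.
HEADER = "# pre-sorted by reading"


def cook(entries):
    """Turn (value, reading, score) tuples into pre-sorted database text."""
    # Rearrange each tuple so the reading ("key") comes first.
    rows = [(reading, value, score) for value, reading, score in entries]
    # Sort by the reading's UTF-8 bytes (assumed to match the reader's
    # comparison). The sort is stable, so rows sharing a reading keep
    # their input order.
    rows.sort(key=lambda row: row[0].encode("utf-8"))
    lines = [HEADER]
    lines += [f"{reading} {value} {score}" for reading, value, score in rows]
    return "\n".join(lines) + "\n"


# Made-up sample entries.
text = cook([("馬", "ㄇㄚˇ", -3.1), ("八", "ㄅㄚ", -2.9)])
```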

clang-format is used to apply WebKit C++ style to the new code. This
also applies to KeyValueBlobReader that was added recently.

Microbenchmark result below:

```
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
BM_ParselessLMOpenClose         17710 ns        17199 ns        33422
BM_FastLMOpenClose          376520248 ns    367526500 ns            2
BM_ParselessLMFindUnigrams       5967 ns         5899 ns       113729
BM_FastLMFindUnigrams            2268 ns         2265 ns       307038
```
2022-01-15 16:15:02 -08:00
zonble 5a1779b436 Updates the issue templates. 2022-01-15 22:25:33 +08:00
zonble 1140228a4b Updates issue templates. 2022-01-15 22:20:55 +08:00
Weizhong Yang a.k.a zonble 8467b82860 Update issue templates 2022-01-15 21:52:06 +08:00
zonble 136ac34f22 Introduces in-place phrase replacement.
Since we have implemented the functions to add and exclude phrases, this
commit allows users to use a table to change the output of a phrase
without changing its BPMF reading and score, when the "phrase replacement"
mode is on.

It helps users switch between a specific input scenario and the ordinary
one. For example, if a user wants to work on financial Chinese numbers
like 壹、貳、參, he or she may want those characters to have a higher score
than the normal numbers like 一、二、三. This commit lets users
temporarily replace 一、二、三 with 壹、貳、參 by just turning on "phrase
replacement" mode and preparing a custom table.

The conversion is not done in the output phase, unlike how we do
Traditional/Simplified Chinese conversion. What the phrase replacement
table does is slightly modify the language model. The replacement
takes place when walking the nodes and the candidate list.

A user can enable the mode and edit the table from the input menu. Since
the function is quite advanced, the menu items are hidden until the user
holds the option key.

The table is a plain text file. Each line contains a "from" and a "to".
For example:

```
一 壹
```

However, if the user also wants all other phrases that contain 一 to use
壹, all of those phrases have to be added to the table:

```
一百 壹佰
一千 壹仟
一萬 壹萬
一百萬 壹百萬
```
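
The behavior described above can be sketched in Python as follows. This is a simplified illustration, not the actual implementation. The function names and the (reading, value, score) tuple shape are assumptions. It shows that only the output value changes, and that a phrase is replaced only when the whole phrase appears in the table.

```python
def load_replacements(text):
    """Parse a replacement table: one whitespace-separated from/to pair per line."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


def apply_replacements(unigrams, table, enabled=True):
    """Swap values found in the table; keep each reading and score untouched."""
    if not enabled:
        return list(unigrams)
    return [(reading, table.get(value, value), score)
            for reading, value, score in unigrams]


table = load_replacements("一 壹\n一百 壹佰\n")
# Made-up unigrams; 一千 is not in the table, so it is left alone.
unigrams = [("ㄧ", "一", -2.0), ("ㄧ-ㄑㄧㄢ", "一千", -4.0)]
replaced = apply_replacements(unigrams, table)
```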
2022-01-15 06:23:09 +08:00
Weizhong Yang a.k.a zonble 825ed4f122 Merge pull request #223 from zonble/master
Extracts preferences and emacs key detection from the input controller
2022-01-15 01:40:05 +08:00
zonble 7edf011e42 Fixes a typo. 2022-01-14 22:36:58 +08:00
zonble 5ce581e0c6 Brings back VXHanConvert. 2022-01-14 22:15:17 +08:00
zonble 7a5cb635e9 Fixes the bugs in the preferences like typos. 2022-01-14 20:31:39 +08:00
zonble aa325f73aa Fixes the typo for the enum of McBopomofo keys. 2022-01-14 20:18:45 +08:00
zonble 95648caa0c Simplifies the code to build the input menu. 2022-01-14 19:55:08 +08:00
zonble d11daacbd2 Refactors the keyboard layout enum. 2022-01-14 19:47:53 +08:00
zonble 83354f7c48 Adds icons for keyboard layouts in preference. 2022-01-14 19:37:48 +08:00
zonble 9faed2153f Uses property wrappers to manage preferences. 2022-01-14 18:06:26 +08:00
Lukhnos Liu c698c61432 Merge pull request #220 from lukhnos/custom-phrase-reader
Use a more tolerant parser for user phrases
2022-01-13 23:43:35 -08:00
Lukhnos Liu d6cc5479f6 Use a more tolerant parser for user phrases
A generic key-value blob reader, KeyValueBlobReader, is implemented to
allow more flexibility in user-editable files. For example, this allows
comments in the file, as well as tolerating leading or trailing spaces,
tabs, or even Windows CR LF line endings.
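
As a rough Python illustration of this kind of tolerance (not the C++ KeyValueBlobReader itself), the sketch below skips blank lines and comments, trims spaces and tabs, and accepts CR LF line endings. The `#` comment marker, the whitespace separator, and the decision to skip malformed lines are assumptions. The sample reading format is also made up.

```python
def read_key_values(blob: str):
    """Return (key, value) pairs from a loosely formatted text blob."""
    pairs = []
    # splitlines() treats "\n", "\r\n", and "\r" as line endings.
    for raw in blob.splitlines():
        line = raw.strip(" \t")
        if not line or line.startswith("#"):
            continue  # blank line or comment (comment marker is assumed)
        fields = line.split()
        if len(fields) < 2:
            continue  # tolerate a malformed line by skipping it
        pairs.append((fields[0], fields[1]))
    return pairs


sample = "# my phrases\r\n  小麥 ㄒㄧㄠˇ-ㄇㄞˋ \t\r\n\r\nbroken\r\n"
pairs = read_key_values(sample)
```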

Unit tests are supplied for KeyValueBlobReader although they are not
part of the Xcode project. A separate CMakeLists.txt is provided.

UserPhrasesLM is refactored to use KeyValueBlobReader. A small stylistic
change is applied to reduce "using namespace" uses, but otherwise no
major style changes were applied to UserPhrasesLM.

Please note that McBopomofo's user phrase LM uses the value in a
key-value pair as the reading, and the key as the actual "value". We
don't plan to change that order so that we don't have to migrate data.

std::string_view is used to allow efficient reference to char buffers
and interop with std::string (and so no c_str() is needed). C++17 is now
enabled for the project to enable the use of std::string_view.

Copyright headers are added to McBopomofoLM and UserPhrasesLM.
2022-01-13 23:27:31 -08:00
Weizhong Yang a.k.a zonble 915693bc28 Merge pull request #219 from zonble/dev/half_size_punctuation
Various new functions and UI components
2022-01-14 14:40:20 +08:00
zonble 3ca0eddd23 Makes some members private. 2022-01-14 02:48:21 +08:00
zonble 34e193df21 Also updates the README file. 2022-01-14 02:34:23 +08:00
zonble 536aff1070 Removes unused files. 2022-01-14 01:38:32 +08:00
zonble fcdd59dd6b Wraps OpenCCBridge into an SPM package. 2022-01-14 00:57:41 +08:00
zonble e01eb46c9f Wraps InputSourceHelper into an SPM package. 2022-01-14 00:43:21 +08:00
zonble d4772ffa99 Adds notifier UI to notify the user when Chinese conversion is turned on/off. 2022-01-13 23:38:56 +08:00
zonble a7e38b5b2d Fine-tunes tooltip UI. 2022-01-13 22:00:29 +08:00
zonble 366453820d Adds a tiny tooltip for shift-left/right selections. 2022-01-13 21:47:52 +08:00
zonble 4c1781d970 Renames half-size to half-width. 2022-01-13 19:59:49 +08:00
zonble 9cd2306313 Adds emacs-style hotkeys. 2022-01-13 19:57:08 +08:00
zonble 232a944953 Implements half-size punctuations. 2022-01-13 17:07:22 +08:00
Lukhnos Liu c31e390122 Merge pull request #218 from zonble/dev/user_phrases_lm
Adds UserPhrasesLM for user phrases.
2022-01-12 07:51:37 -08:00
zonble d590d748f8 Adds UserPhrasesLM for user phrases.
Since there is no probability information for users' custom phrases,
they should be stored in a format that differs from data.txt. Using the
same format and FastLM to parse user phrases was done just out of
laziness, and it is not the right way.

The pull request adds a new language model class to parse user phrases.
It also updates the input method controller to adopt the new user phrase
format.
2022-01-12 16:53:51 +08:00
Weizhong Yang a.k.a zonble fb513f51b0 Merge pull request #216 from zonble/dev/lm_management
Introduces custom excluded phrases
2022-01-12 13:25:06 +08:00
zonble f1e56a7e01 Lets McBopomofoLM accept NULL as the parameter in loadUserPhrases. 2022-01-12 13:17:41 +08:00
zonble 84fc2f068b Removes unused code and fixes a typo. 2022-01-12 13:16:10 +08:00
Weizhong Yang a.k.a zonble 819e1be8d6 Merge pull request #217 from lukhnos/fix-vertical-candidate-ui
Fix regression in vertical candidate UI
2022-01-12 12:52:19 +08:00
Lukhnos Liu 7c354a5b6c Fix regression in vertical candidate UI
The table view style must be set before it's added as the scroll view's
content view. See [1].

[1] https://github.com/openvanilla/McBopomofo/blob/1.1/Source/CandidateUI/VTVerticalCandidateController.m#L110
2022-01-11 20:43:31 -08:00
zonble abdf97f652 Adds McBopomofoLM as the facade of three language models.
- main language model
- user phrases
- user excluded phrases
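
One way such a facade could combine the three models is sketched below in Python. It is only an illustration under assumptions. The class name, the dictionary-backed models, the fixed score for user phrases, and the ordering (user phrases first, then the main model, with excluded phrases filtered out) are not taken from the actual McBopomofoLM code.

```python
# Assumption: user phrases carry no score of their own, so use a fixed one.
USER_PHRASE_SCORE = 0.0


class LanguageModelFacade:
    """Presents three underlying models as a single lookup interface."""

    def __init__(self, main, user_phrases, excluded):
        self.main = main                  # reading -> list of (value, score)
        self.user_phrases = user_phrases  # reading -> list of values
        self.excluded = excluded          # reading -> set of values to hide

    def unigrams(self, reading):
        banned = self.excluded.get(reading, set())
        user = [(value, USER_PHRASE_SCORE)
                for value in self.user_phrases.get(reading, [])]
        seen = set()
        merged = []
        # User phrases come first; duplicates and excluded values are dropped.
        for value, score in user + self.main.get(reading, []):
            if value in banned or value in seen:
                continue
            seen.add(value)
            merged.append((value, score))
        return merged


# Made-up sample data.
lm = LanguageModelFacade(
    main={"ㄇㄚˇ": [("馬", -3.1), ("碼", -4.5)]},
    user_phrases={"ㄇㄚˇ": ["瑪"]},
    excluded={"ㄇㄚˇ": {"碼"}},
)
candidates = lm.unigrams("ㄇㄚˇ")
```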
2022-01-12 12:26:24 +08:00
zonble 56896625e3 Removes unused comments. 2022-01-12 01:17:39 +08:00
zonble cbd21cbe1d Updates localization. 2022-01-12 01:10:39 +08:00
zonble ea36061a41 Implements excluding punctuations. 2022-01-12 00:36:55 +08:00
zonble 9b485b799c Implements excluding phrases. 2022-01-12 00:16:55 +08:00
zonble 144d133463 Adds Language Model Manager.
References to the global language models were stored in the class
InputMethodController. However, the global models are global and not
part of the input method controller, and the input method controller
only uses one of the models (McBopomofo/Plain Bopomofo). I guess this
somehow violates SRP and there should be a better place for the global
models.
2022-01-11 17:12:58 +08:00
zonble f339948219 Fixes duplicated code and typos. 2022-01-11 13:46:29 +08:00
Weizhong Yang a.k.a zonble 11d33c0b42 Merge pull request #212 from zonble/dev/swiftify
Converts most of the Objective-C classes into Swift
2022-01-11 02:53:32 +08:00
Weizhong Yang a.k.a zonble b6ad33967a Merge pull request #211 from zonble/master
Fixes the bug that I forgot to create the user phrases folder.
2022-01-11 02:44:48 +08:00
zonble df3914eeed Fixes a minor bug in the new Swift app delegate. 2022-01-11 02:35:31 +08:00
zonble a7b2edcf26 Converts AppDelegate to Swift. 2022-01-11 02:22:13 +08:00
zonble 52bf2d67c5 Updates copyright information. 2022-01-11 01:13:28 +08:00
zonble 61e2751702 Converts candidate UI to a Swift package. 2022-01-11 01:07:17 +08:00
zonble 867a828722 Fixes minor layout issues. 2022-01-11 00:47:48 +08:00
zonble f7e927d67d Starts to use Swift candidate UI.
There are bugs still.
2022-01-11 00:30:02 +08:00