Commit Graph

323 Commits

Author SHA1 Message Date
zonble b627e8e3b6 Adds an option to let users choose the Chinese conversion style.
Option 0: converts the output.
Option 1: converts the models.
2022-01-16 15:04:20 +08:00
zonble b348a05735 Filters duplicated unigram values properly. 2022-01-16 15:04:18 +08:00
Lukhnos Liu d064f420e4 Use a parseless phrase db to speed up LM loading
We take advantage of the fact that no one is able to modify the phrase
databases shipped with the binary (guaranteed by macOS's integrity check
for notarized apps), and we can simply pre-sort the phrases in the
database files.

With this change, we can speed up McBopomofo's language model loading
during the app initialization by about 500-800x on a 2018 Intel MacBook
Pro. The LM loading used to take 300-400 ms, but now it's done within a
sub-millisecond range (0.5-0.6 ms). Microbenchmarking shows that
ParselessLM is about 16000x faster than FastLM. We amortize the latency
during the query time, and even by deferring the parsing, ParselessLM is
only ~1.5x slower than FastLM, and both LM classes serve queries under 6
microseconds (that's 0.006 ms), which means the tradeoff only
contributes negligible overall latency.

This PR requires some small changes to the phrase db cooking scripts.
Python 3 is now used and the (value, reading, score) tuples are
rearranged to (reading, value, score) and sorted by reading ("key"). A
header is added to the phrase databases to call out the fact that these
are pre-sorted.
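
A minimal sketch of how a pre-sorted database can be queried without parsing
it up front. This is not the actual ParselessLM code: the `Unigram` type, the
`findUnigrams` name, and the single-space, header-free line layout are all
assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical record type for this sketch; not the project's actual API.
struct Unigram {
    std::string value;
    double score;
};

// Returns the reading, i.e. the first space-delimited field of a line.
static std::string_view readingOf(std::string_view line)
{
    return line.substr(0, line.find(' '));
}

// Looks up `reading` in `blob`, assumed to contain "reading value score"
// lines separated by '\n' and pre-sorted by reading in byte order.
std::vector<Unigram> findUnigrams(std::string_view blob, std::string_view reading)
{
    constexpr std::size_t npos = std::string_view::npos;

    // Binary search over byte offsets for the first line whose reading is
    // not less than `reading`. `lo` always points at the start of a line.
    std::size_t lo = 0, hi = blob.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        std::size_t start = mid;
        while (start > lo && blob[start - 1] != '\n') --start;
        std::size_t end = blob.find('\n', mid);
        if (end == npos) end = blob.size();
        if (readingOf(blob.substr(start, end - start)) < reading) {
            lo = (end < blob.size()) ? end + 1 : blob.size();
        } else {
            hi = start;
        }
    }

    // Only the matching lines are parsed, and only at query time.
    std::vector<Unigram> result;
    for (std::size_t pos = lo; pos < blob.size();) {
        std::size_t end = blob.find('\n', pos);
        if (end == npos) end = blob.size();
        std::string_view line = blob.substr(pos, end - pos);
        if (readingOf(line) != reading) break;
        std::size_t s1 = line.find(' ');
        std::size_t s2 = (s1 == npos) ? npos : line.find(' ', s1 + 1);
        if (s2 != npos) {
            result.push_back({ std::string(line.substr(s1 + 1, s2 - s1 - 1)),
                               std::stod(std::string(line.substr(s2 + 1))) });
        }
        pos = end + 1;
    }
    return result;
}
```

Opening such a database costs almost nothing because no index is built; the
price is a slightly slower lookup, since each query does the search and the
parsing itself.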

clang-format is used to apply WebKit C++ style to the new code. This
also applies to KeyValueBlobReader that was added recently.

Microbenchmark result below:

```
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
BM_ParselessLMOpenClose         17710 ns        17199 ns        33422
BM_FastLMOpenClose          376520248 ns    367526500 ns            2
BM_ParselessLMFindUnigrams       5967 ns         5899 ns       113729
BM_FastLMFindUnigrams            2268 ns         2265 ns       307038
```
2022-01-15 16:15:02 -08:00
zonble 136ac34f22 Introduces in-place phrase replacement.
Since we have implemented the functions to add and exclude phrases, this
commit allows users to use a table to change the output of a phrase
without changing its BPMF reading and score, when the "phrase replacement"
mode is on.

This helps users switch between a specific input scenario and the ordinary
one. For example, a user who works with financial Chinese numerals like
壹、貳、參 may want those characters to score higher than the normal
numerals 一、二、三. This commit lets users temporarily replace 一、二、三
with 壹、貳、參 by turning on the "phrase replacement" mode and preparing
a custom table.

The conversion is not done in the output phase, unlike our
Traditional/Simplified Chinese conversion. Instead, the phrase replacement
table slightly modifies the language model: the replacement takes place
while walking the nodes and the candidate list.

A user can enable the mode and edit the table from the input menu. Since
the function is quite advanced, the menu items are hidden until the user
holds the option key.

The table is a plain text file. Each line contains a "from" and a "to"
phrase. For example:

```
一 壹
```

However, if the user also wants every other phrase that contains 一 to use
壹, each of those phrases has to be listed in the table:

```
一百 壹佰
一千 壹仟
一萬 壹萬
一百萬 壹百萬
```
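
A rough sketch of the table lookup described above, assuming
whitespace-separated "from to" pairs and exact-match replacement only. The
`Unigram` struct and both function names are hypothetical and do not come
from the McBopomofo code base.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical unigram type for this sketch.
struct Unigram {
    std::string reading;
    std::string value;
    double score;
};

using ReplacementTable = std::unordered_map<std::string, std::string>;

// Parses a table with one whitespace-separated "from to" pair per line.
// Lines that do not contain two fields are ignored.
ReplacementTable parseReplacementTable(const std::string& text)
{
    ReplacementTable table;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string from, to;
        if (fields >> from >> to) {
            table[from] = to;
        }
    }
    return table;
}

// Swaps the value of every unigram that has an exact match in the table,
// leaving its reading and score untouched.
std::vector<Unigram> applyReplacements(std::vector<Unigram> unigrams,
                                       const ReplacementTable& table)
{
    for (auto& unigram : unigrams) {
        auto it = table.find(unigram.value);
        if (it != table.end()) {
            unigram.value = it->second;
        }
    }
    return unigrams;
}
```

Because matching is exact, a table that only lists 一 leaves 一千 unchanged,
which is why longer phrases need their own entries.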
2022-01-15 06:23:09 +08:00
zonble 7edf011e42 Fixes a typo. 2022-01-14 22:36:58 +08:00
zonble 5ce581e0c6 Brings back VXHanConvert. 2022-01-14 22:15:17 +08:00
zonble 7a5cb635e9 Fixes the bugs in the preferences like typos. 2022-01-14 20:31:39 +08:00
zonble aa325f73aa Fixes the typo for the enum of McBopomofo keys. 2022-01-14 20:18:45 +08:00
zonble 95648caa0c Simplifies the code to build the input menu. 2022-01-14 19:55:08 +08:00
zonble d11daacbd2 Refactors the keyboard layout enum. 2022-01-14 19:47:53 +08:00
zonble 83354f7c48 Adds icons for keyboard layouts in preference. 2022-01-14 19:37:48 +08:00
zonble 9faed2153f Uses property wrappers to manage preferences. 2022-01-14 18:06:26 +08:00
Lukhnos Liu c698c61432 Merge pull request #220 from lukhnos/custom-phrase-reader
Use a more tolerant parser for user phrases
2022-01-13 23:43:35 -08:00
Lukhnos Liu d6cc5479f6 Use a more tolerant parser for user phrases
A generic key-value blob reader, KeyValueBlobReader, is implemented to
allow more flexibility in user-editable files. For example, this allows
comments in the file, as well as tolerating leading or trailing spaces,
tabs, or even Windows CR LF line endings.

Unit tests are supplied for KeyValueBlobReader although they are not
part of the Xcode project. A separate CMakeLists.txt is provided.

UserPhrasesLM is refactored to use KeyValueBlobReader. A small stylistic
change is applied to reduce "using namespace" uses, but otherwise no
major style changes were applied to UserPhrasesLM.

Please note that McBopomofo's user phrase LM uses the value in a
key-value pair as the reading, and the key as the actual "value". We
don't plan to change that order so that we don't have to migrate data.

std::string_view is used to allow efficient reference to char buffers
and interop with std::string (and so no c_str() is needed). C++17 is now
enabled for the project to enable the use of std::string_view.
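
A simplified illustration of a tolerant line parser built on
std::string_view. It is not the real KeyValueBlobReader: the `#` comment
marker, the function names, and the rule that everything after the first
separator is the value are assumptions for this sketch.

```cpp
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string_view, std::string_view>;

// Strips leading and trailing spaces, tabs, and carriage returns.
static std::string_view trim(std::string_view s)
{
    constexpr std::string_view kWhitespace = " \t\r";
    const auto first = s.find_first_not_of(kWhitespace);
    if (first == std::string_view::npos) return {};
    const auto last = s.find_last_not_of(kWhitespace);
    return s.substr(first, last - first + 1);
}

// Parses one line into a key-value pair. Returns std::nullopt for blank
// lines, comment lines, and lines that have a key but no value.
std::optional<KeyValue> parseLine(std::string_view line)
{
    line = trim(line);
    if (line.empty() || line.front() == '#') return std::nullopt;
    const auto sep = line.find_first_of(" \t");
    if (sep == std::string_view::npos) return std::nullopt;
    return KeyValue { line.substr(0, sep), trim(line.substr(sep)) };
}

// Splits a blob on '\n' and collects every well-formed pair. The returned
// views point into `blob`, so the caller must keep the buffer alive.
std::vector<KeyValue> parseBlob(std::string_view blob)
{
    std::vector<KeyValue> pairs;
    while (!blob.empty()) {
        const auto end = blob.find('\n');
        if (auto kv = parseLine(blob.substr(0, end))) pairs.push_back(*kv);
        if (end == std::string_view::npos) break;
        blob.remove_prefix(end + 1);
    }
    return pairs;
}
```

Treating '\r' as ordinary trailing whitespace is what makes CR LF line
endings harmless here, and no strings are copied while parsing.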

Copyright headers are added to McBopomofoLM and UserPhrasesLM.
2022-01-13 23:27:31 -08:00
zonble fcdd59dd6b Wraps OpenCCBridge into a SPM package. 2022-01-14 00:57:41 +08:00
zonble e01eb46c9f Wraps InputSourceHelper to a SPM package. 2022-01-14 00:43:21 +08:00
zonble d4772ffa99 Adds notifier UI to notify user Chinese conversion on/off. 2022-01-13 23:38:56 +08:00
zonble a7e38b5b2d Fine-tunes tooltip UI. 2022-01-13 22:00:29 +08:00
zonble 366453820d Adds a tiny tooltip for shift-left/right selections. 2022-01-13 21:47:52 +08:00
zonble 4c1781d970 Renames half-size to half-width. 2022-01-13 19:59:49 +08:00
zonble 9cd2306313 Adds emacs-style hotkeys. 2022-01-13 19:57:08 +08:00
zonble 232a944953 Implements half-size punctuations. 2022-01-13 17:07:22 +08:00
zonble d590d748f8 Adds UserPhrasesLM for user phrases.
Since there is no probability information for users' custom phrases,
they should be stored in a format that differs from data.txt. Reusing the
same format and FastLM to parse user phrases was done out of laziness, and
it is not the right way.

The pull request adds a new language model class to parse user phrases.
It also updates the input method controller to adopt the new user phrase
format.
2022-01-12 16:53:51 +08:00
zonble f1e56a7e01 Lets McBopomofoLM accept NULL as the parameter of loadUserPhrases. 2022-01-12 13:17:41 +08:00
zonble 84fc2f068b Removes unused code and fixes a typo. 2022-01-12 13:16:10 +08:00
zonble abdf97f652 Adds McBopomofoLM as the facade of three language models.
- main language model
- user phrases
- user excluded phrases
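
One possible shape for such a facade, sketched under assumptions: the
`CombinedLM` class, its members, the choice to list user phrases first, and
the decision to apply exclusions to both sources are illustrative guesses,
not the actual McBopomofoLM design.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical types for this sketch.
struct Unigram {
    std::string value;
    double score;
};
using Model = std::unordered_map<std::string, std::vector<Unigram>>;
using Exclusions = std::unordered_map<std::string, std::unordered_set<std::string>>;

// A single entry point that hides three underlying sources.
class CombinedLM {
public:
    Model mainModel;
    Model userPhrases;
    Exclusions excludedPhrases;

    // Returns user phrases first, then main-model unigrams, skipping
    // excluded values and values that were already emitted.
    std::vector<Unigram> unigramsFor(const std::string& reading) const
    {
        std::vector<Unigram> result;
        std::unordered_set<std::string> seen;
        const auto excluded = excludedPhrases.find(reading);

        auto append = [&](const Model& model) {
            const auto it = model.find(reading);
            if (it == model.end()) return;
            for (const auto& unigram : it->second) {
                const bool isExcluded = excluded != excludedPhrases.end()
                    && excluded->second.count(unigram.value) > 0;
                if (!isExcluded && seen.insert(unigram.value).second) {
                    result.push_back(unigram);
                }
            }
        };

        append(userPhrases);
        append(mainModel);
        return result;
    }
};
```

Callers query one object and never need to know which source a candidate
came from.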
2022-01-12 12:26:24 +08:00
zonble 56896625e3 Removes unused comments. 2022-01-12 01:17:39 +08:00
zonble cbd21cbe1d Updates localization. 2022-01-12 01:10:39 +08:00
zonble ea36061a41 Implements excluding punctuations. 2022-01-12 00:36:55 +08:00
zonble 9b485b799c Implements excluding phrases. 2022-01-12 00:16:55 +08:00
zonble 144d133463 Adds Language Model Manager.
The references to the global language models were stored in the class
InputMethodController. However, the models are global and not part of
the input method controller, and the input method controller only uses
one of them (McBopomofo/Plain Bopomofo). This arguably violates SRP, so
there should be a better place for the global models.
2022-01-11 17:12:58 +08:00
zonble f339948219 Fixes duplicated code and typos. 2022-01-11 13:46:29 +08:00
zonble df3914eeed Fixes a minor bug in the new Swift app delegate. 2022-01-11 02:35:31 +08:00
zonble a7b2edcf26 Converts AppDelegate to Swift. 2022-01-11 02:22:13 +08:00
zonble 61e2751702 Converts candidate UI to a Swift package. 2022-01-11 01:07:17 +08:00
zonble 867a828722 Fixes minor layout issues. 2022-01-11 00:47:48 +08:00
zonble f7e927d67d Starts to use Swift candidate UI.
There are bugs still.
2022-01-11 00:30:02 +08:00
zonble a97cc5ca6c Converts VerticalCandidateController to Swift. 2022-01-11 00:03:32 +08:00
zonble 5aafe64751 Starts to convert candidate UI to Swift. 2022-01-10 22:01:40 +08:00
zonble ba6889fa63 Converts OVInputSourceHelper to Swift. 2022-01-10 22:01:40 +08:00
zonble 84849bdb3d Converts the preference and non modal view controller to Swift. 2022-01-10 22:01:40 +08:00
zonble 75a0f68a9c Fixes the bug that I forgot to create the user phrases folder.
There was a legacy user override model which creates a folder and a
plist file. If a user uses McBopomofo for years, the folder would
exist. However, when the old override model was removed, I forgot
to create the folder for the new user phrase file.

The bug prevented users with a new installation of McBopomofo from
adding user phrases.
2022-01-10 21:59:18 +08:00
zonble 6bdd2aab44 Fixes a bug on building the unigrams. 2022-01-09 13:00:19 -08:00
zonble b4276f0488 Fixes a bug on building the vector for unigrams from both global language model and user phrases. 2022-01-09 13:00:19 -08:00
zonble 1e5bad20c2 Removes unused references. 2022-01-09 08:38:32 -08:00
zonble 5b72e48a4e Minor fine-tunes on the preference window. 2022-01-09 08:38:32 -08:00
zonble 0af238ef79 Cleans up unused logs. 2022-01-09 08:38:32 -08:00
zonble 3763688275 Fixes a typo. 2022-01-09 08:38:32 -08:00
zonble a5247d958c Makes it possible to reload user phrases. 2022-01-09 08:38:32 -08:00
zonble e909dc20b5 Uses user phrases in the block builder. 2022-01-09 08:38:32 -08:00
zonble 6f761ecbcd Implements adding phrase from shift and arrow keys. 2022-01-09 08:38:32 -08:00
zonble 358462dff1 [WIP] Starts to work on the user phrases. 2022-01-09 08:38:32 -08:00
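ovadmin 789d2a5687 Treat common punctuation as the end of a sentence when computing selection events
This way, a single-character word that follows punctuation is treated, for scoring purposes, as the first word of a sentence.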
2022-01-06 18:28:37 -08:00
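ovadmin aeb774a8ed Slightly refactor duplicated code 2022-01-06 18:28:37 -08:00
ovadmin 2e8e78971c Do not remember user selections in Plain Bopomofo mode 2022-01-06 18:28:37 -08:00
ovadmin eef6f8c0ce Increase the capacity and half-life of the user selection model 2022-01-06 18:28:37 -08:00
ovadmin 3e0e859feb Integrate the user selection memory mechanism into InputMethodController 2022-01-06 18:28:37 -08:00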
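ovadmin d672136843 Implement a simple user selection memory model
This model basically just remembers the user's current selection based on
the two unigrams before the cursor. When there is more than one user
selection, each of them has to be scored. The score is the selection
frequency multiplied by a factor that decays, with a half-life, over the
time elapsed since the most recent selection. This strikes a balance between
"rarely used but recently selected" and "frequently used but not selected
recently". The half-life is chosen by rule of thumb.

For now this simple model is not written to disk, so it is wiped and starts
over after the next reboot. This choice is made purely because the model has
a half-life, so it would forget its contents after long-term storage anyway.

The advantage of this model is that it has little effect on the phrases
supplied by the existing phrase database, and it helps quite a bit with
overriding consecutive single-character words. It should therefore reduce
how often users have to pick candidates for proper nouns such as personal
names, place names, and company names. The drawback in practice is this: if
the phrase the user modifies was originally a two-character word, for
example the previous two unigrams are A and BB, and the user wants to change
the second character of BB so that the three characters after selection are
A, B', C, then this C usually cannot be remembered. However, if the user
selects character by character from the start, that is, picks B' when only
the B of BB has appeared and then types C, the combination A, B', C is
usually remembered correctly. The actual cause is not discussed here, but it
is related to the architecture of the underlying phrase-composition lattice.
A real improvement would require re-architecting from the bottom up, but at
least the suggestions this model gives are on the conservative side and do
not interfere with the existing default selections. Weighing the trade-offs,
this model provides some marginal improvement and should still be worth
adopting.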
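
The scoring rule described in this message can be sketched as follows. The
function name, the parameter names, and the use of seconds as the time unit
are assumptions; the actual half-life constant is not stated in the message.

```cpp
#include <cmath>

// Score of a remembered selection: how often it was chosen, multiplied by
// a factor that halves every `halfLifeSeconds` since it was last chosen.
// `halfLifeSeconds` must be positive.
double decayedScore(unsigned selectionCount, double secondsSinceLastSelection,
                    double halfLifeSeconds)
{
    const double decay = std::pow(0.5, secondsSinceLastSelection / halfLifeSeconds);
    return static_cast<double>(selectionCount) * decay;
}
```

With a half-life of 100 seconds, a candidate picked once just now scores 1,
while one picked five times but last used 300 seconds ago scores
5 × 0.125 = 0.625, so the recent choice wins.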
2022-01-06 18:28:37 -08:00
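ovadmin a17438b67a Fix incomplete #include directives in some candidate-selection C++ files 2022-01-06 18:28:37 -08:00
ovadmin 3760d24350 Remove the early candidate history memory mechanism
This mechanism was never officially released and its design has many flaws, so we decided to remove it.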
2022-01-06 18:28:29 -08:00
zonble 23100153cc Adds an option to clear entire input buffer by ESC key.
This fixes #146.
2022-01-02 22:09:23 +08:00
Lukhnos Liu 25ea443891 Correctly locate a candidate panel's screen
Previously only the x value was used to determine the screen to which a
candidate panel should belong. That was incorrect. The entire point needs
to be considered.
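
The difference can be illustrated with a small geometry sketch. The `Point`
and `Rect` structs and the `screenIndexFor` function are stand-ins invented
for this example; the real code works with AppKit screen frames.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical geometry types for this sketch.
struct Point {
    double x, y;
};
struct Rect {
    double x, y, width, height;
};

// True if the point lies inside the rectangle on both axes.
bool contains(const Rect& rect, const Point& point)
{
    return point.x >= rect.x && point.x < rect.x + rect.width
        && point.y >= rect.y && point.y < rect.y + rect.height;
}

// Returns the index of the first screen frame that contains the point,
// or -1 if the point is on no screen.
int screenIndexFor(const std::vector<Rect>& frames, const Point& point)
{
    for (std::size_t i = 0; i < frames.size(); ++i) {
        if (contains(frames[i], point)) return static_cast<int>(i);
    }
    return -1;
}
```

With two displays stacked vertically, both frames cover the same x range, so
a check on x alone would always pick the first one.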

This fixes the same issue that affected OpenVanilla:
https://github.com/openvanilla/openvanilla/issues/49
2021-11-24 23:22:29 -08:00
Lukhnos Liu 5ff3efb385 Revert "Stop using IMK's showPreferences:"
This reverts commit 69e463958e.
2021-11-24 21:16:20 -08:00
Lukhnos Liu ad81de87a0 Bump version to 1.1 2021-11-23 22:55:21 -08:00
Lukhnos Liu 3a027ba8fb Update copyright years 2021-11-23 19:04:42 -08:00
Lukhnos Liu 75b4bfac31 Localize new strings
Also fine-tune the Chinese Conversion menu item text.
2021-11-23 19:04:42 -08:00
Lukhnos Liu 69e463958e Stop using IMK's showPreferences:
This turns out to be unreliable on macOS 12.
2021-11-23 19:04:42 -08:00
Lukhnos Liu c1bea8c382 Fix IME activation issues on macOS 12
We now let the Installer call the TextInputSources API. Since macOS
12, users are prompted to allow enabling of third-party IMEs in
Preferences.app the moment TISRegisterInputSource or
TISEnableInputSource is called. By moving the activation to the
Installer, a user will clearly see that it's the Installer that wants to
enable the IME.

In addition, we had to make necessary changes so that on macOS 12 and
later, the Installer always enables the default input source. This is due
to the observation that the kTISPropertyInputSourceIsEnabled becomes
unreliable on macOS 12--it may be true even if the user has removed the
input mode from their active input mode list in Preferences.app.
2021-11-23 19:04:42 -08:00
Lukhnos Liu b85029dec1 Fix non-existent font in .xib 2021-11-22 20:51:24 -08:00
zonble 164705e6f3 Allows users to use the left and right keys to go to another candidate page in the vertical candidates list.
This fixes #61.
2021-11-20 22:43:23 -08:00
zonble e27f5babe1 Allows auto-committing the first candidate when users input a punctuation in plain BPMF mode. 2021-11-20 22:43:23 -08:00
zonble 385638c3b9 Allows committing the first candidate while typing a punctuation in plain BPMF mode. 2021-11-20 22:43:23 -08:00
zonble c17d991718 Also applies Chinese conversion on popped text.
Fixes issue #172.
2021-11-20 18:14:08 -08:00
zonble 21252e6c55 Removes NSUserDefault for selection key if a user chooses to use the default setting. 2021-11-12 00:36:41 +08:00
zonble da8e6c6fa5 Adds selection key settings in the preference window. 2021-11-12 00:02:01 +08:00
zonble 723a8402ab Fixes typos. 2021-11-11 00:14:49 +08:00
zonble 1f8cd8d06f Updates SwiftOpenCC. 2021-11-10 21:38:04 +08:00
zonble c8bad0913b Removes unused code. 2021-03-01 22:48:46 +08:00
zonble f6c36fe325 Bridges SwiftyOpenCC to create a simple Chinese conversion function. 2021-03-01 22:43:02 +08:00
zonble 4e27b5ecfa Adopts modern Objective-C syntax. 2021-02-28 22:45:36 +08:00
zonble 6341270696 Converts to Objective-C ARC. 2021-02-28 21:38:59 +08:00
zonble 0f05e245a5 Converts to Objective-C ARC. 2021-02-28 21:30:10 +08:00
zonble 7626d21a90 Merge branch 'master' of github.com:openvanilla/McBopomofo 2020-12-21 00:31:49 +08:00
Lukhnos Liu b754acdf07 Bump to version 1.0 2020-10-28 12:42:55 -07:00
Lukhnos Liu 9cbcee5b1f Bump to 1.0-beta3 to prepare for 1.0 release 2020-10-19 20:27:33 -07:00
Lukhnos Liu c44db5b000 Provide the UI to disable auto update check (#80) 2020-10-19 20:25:39 -07:00
Lukhnos Liu 4d2cf36b61 Add a preferences key to disable update checks 2020-10-19 20:09:17 -07:00
Lukhnos Liu 60aa005e2d Upgrade preferences.xib format (zh-Hant only) 2020-10-19 20:02:01 -07:00
Lukhnos Liu 688ae64723 Bump to 1.0-beta2 to prepare for 1.0 release 2020-10-19 14:52:18 -07:00
Lukhnos Liu 81748ae7fe Enable explicit update check in Release builds 2020-10-19 14:51:05 -07:00
Lukhnos Liu 7e3ee1742d Bump to 1.0-beta1 to prepare for 1.0 release 2020-10-18 20:08:51 -07:00
Lukhnos Liu eae12c04b4 Update copyright years 2020-10-18 12:48:15 -07:00
Lukhnos Liu 56dbbbc3b7 Delete UpdateNotificationController
This unifies the version update checker's UI. It also allows us to show
detailed info for a new version.
2020-10-18 12:48:15 -07:00
Lukhnos Liu 3bc70769df Show no update available when checking explicitly
This imports OpenVanilla's OVNonModalAlertWindowController for the
alerts.
2020-10-18 12:48:14 -07:00
Lukhnos Liu 4adf3c1b42 Update keyboard and app icons 2020-10-18 12:48:08 -07:00
Lukhnos Liu 3ac018f6c0 Retire IconMaker 2020-10-17 06:23:49 -07:00
Lukhnos Liu 4c8270c42f Cancel candidate by Bksp or Del when Plain Bopomofo
Fixes #152
2020-10-17 06:19:47 -07:00
Lukhnos Liu a71b354908 Fix broken CI builds by guarding new API usage 2020-10-10 07:47:29 -07:00
Lukhnos Liu 2f2f18d9e0 Check if the translocated app is still mounted
This ensures that, after the Installer has killed the current input method
process, the Installer can tell if the translocated input method bundle is no
longer mounted. It turns out that getfsstat() may return cached results and a
call to statfs() is necessary.

This fixes the bug that the Installer did not always correctly report that a
new version of the input method has been installed over a previous version.
The bug only manifests when getfsstat() returns cached results. That seems to
be the case on newer versions of macOS.
2020-10-10 07:24:50 -07:00
Lukhnos Liu 7d13ea0b41 Use NSTableViewStyleFullWidth on supported macOS
This prevents the vertical candidate table view from using the inset style [1].

The full-width style serves the purpose. The inset style places the first
candidate too far away from the cursor in the composing buffer.

[1] https://developer.apple.com/design/human-interface-guidelines/macos/overview/whats-new-in-macos/
2020-10-10 06:37:32 -07:00