vChewing-macOS

Commit Graph

Author	SHA1	Message	Date
zonble	8ba4b9dfdf	Prevents loading data models repeatedly.	2022-01-30 20:27:33 +08:00
zonble	5ba7365cd3	Fixes typos.	2022-01-30 08:26:32 +08:00
zonble	c3d953c618	Converts input mode into a typed enum.	2022-01-30 08:06:22 +08:00
zonble	56c393cefa	Avoids using global state as much as possible.	2022-01-27 23:19:27 +08:00
zonble	1ad9e23918	Refactors the input controller.	2022-01-27 22:54:53 +08:00
zonble	177cba5d56	[WIP] Starts to extract input states from the input controller.	2022-01-24 02:13:18 +08:00
Lukhnos Liu	202b1fa058	Also make PhraseReplacementMap more tolerant. This also clarifies the test expectations and how parsing errors are handled.	2022-01-18 22:46:26 -08:00
Weizhong Yang a.k.a zonble	9bc3536630	Merge branch 'master' into more-tolerant-userphraseslm	2022-01-19 14:01:23 +08:00
Lukhnos Liu	c8f65580bb	Make UserPhrasesLM more tolerant. This lets UserPhrasesLM consume as much user data as possible before bailing. This makes it more tolerant of data errors, so it will not fail entirely just because the user has one faulty line in a data file. Also removes FastLM from the benchmarking suite. This also runs the CMake-based C++ tests as part of the GitHub CI.	2022-01-18 16:20:25 -08:00
Lukhnos Liu	75f321f088	Update copyright headers (fixes #213)	2022-01-18 14:21:55 -08:00
zonble	a75c7b7086	Allows users to type Latin letters while using shift + letter keys. Fixes issue #162.	2022-01-17 00:48:29 +08:00
zonble	4ec4eed562	Removes unused files.	2022-01-16 15:15:41 +08:00
zonble	c4259c4c4e	Updates comments and fixes a typo.	2022-01-16 15:04:20 +08:00
zonble	5c0a14deeb	Refactors the function to filter and transform unigrams in McBopomofoLM.	2022-01-16 15:04:20 +08:00
zonble	b627e8e3b6	Adds an option to let users choose the Chinese conversion style. Option 0: converts the output. Option 1: converts the models.	2022-01-16 15:04:20 +08:00
zonble	b348a05735	Filters duplicated unigram values properly.	2022-01-16 15:04:18 +08:00
Lukhnos Liu	d064f420e4	Use a parseless phrase db to speed up LM loading. We take advantage of the fact that no one is able to modify the phrase databases shipped with the binary (guaranteed by macOS's integrity check for notarized apps), and we can simply pre-sort the phrases in the database files. With this change, we can speed up McBopomofo's language model loading during the app initialization by about 500-800x on a 2018 Intel MacBook Pro. The LM loading used to take 300-400 ms, but now it's done within a sub-millisecond range (0.5-0.6 ms). Microbenchmarking shows that ParselessLM is about 16000x faster than FastLM. We amortize the latency during the query time, and even by deferring the parsing, ParselessLM is only ~1.5x slower than FastLM, and both LM classes serve queries under 6 microseconds (that's 0.006 ms), which means the tradeoff only contributes negligible overall latency. This PR requires some small changes to the phrase db cooking scripts. Python 3 is now used and the (value, reading, score) tuples are rearranged to (reading, value, score) and sorted by reading ("key"). A header is added to the phrase databases to call out the fact that these are pre-sorted. clang-format is used to apply WebKit C++ style to the new code. This also applies to KeyValueBlobReader that was added recently. Microbenchmark result below: ``` --------------------------------------------------------------------- Benchmark Time CPU Iterations --------------------------------------------------------------------- BM_ParselessLMOpenClose 17710 ns 17199 ns 33422 BM_FastLMOpenClose 376520248 ns 367526500 ns 2 BM_ParselessLMFindUnigrams 5967 ns 5899 ns 113729 BM_FastLMFindUnigrams 2268 ns 2265 ns 307038 ```	2022-01-15 16:15:02 -08:00
zonble	136ac34f22	Introduces in-place phrase replacement. Since we have implemented the functions to add and exclude phrases, this commit allows users to use a table to change the output of a phrase without changing its BPMF reading and score when the "phrase replacement" mode is on. It could help users switch between a specific input scenario and the ordinary one. For example, if a user wants to work on financial Chinese numbers like 壹、貳、參, he or she may want those characters to have a higher score than the normal numbers like 一、二、三. The commit lets users temporarily replace 一、二、三 with 壹、貳、參 by just turning on "phrase replacement" mode and preparing a custom table. The conversion is not done in the output phase, unlike how we do Traditional/Simplified Chinese conversion. What the phrase replacement table does is slightly modify the language model. The replacement takes place while walking the nodes and the candidate list. A user can enable the mode and edit the table from the input menu. Since the function is quite advanced, the menu items are hidden until the user holds the option key. The table is a plain text file. Each line contains a "from" and a "to". For example ``` 一 壹 ``` However, if the user also wants all other phrases containing 一 to use 壹, all of those phrases have to be built into the table ``` 一百 壹佰 一千 壹仟 一萬 壹萬 一百萬 壹百萬 ```	2022-01-15 06:23:09 +08:00
Lukhnos Liu	d6cc5479f6	Use a more tolerant parser for user phrases. A generic key-value blob reader, KeyValueBlobReader, is implemented to allow more flexibility in user-editable files. For example, this allows comments in the file, as well as tolerating leading or trailing spaces, tabs, or even Windows CR LF line endings. Unit tests are supplied for KeyValueBlobReader although they are not part of the Xcode project. A separate CMakeLists.txt is provided. UserPhrasesLM is refactored to use KeyValueBlobReader. A small stylistic change is applied to reduce "using namespace" uses, but otherwise no major style changes were applied to UserPhrasesLM. Please note that McBopomofo's user phrase LM uses the value in a key-value pair as the reading, and the key as the actual "value". We don't plan to change that order so that we don't have to migrate data. std::string_view is used to allow efficient reference to char buffers and interop with std::string (and so no c_str() is needed). C++17 is now enabled for the project to enable the use of std::string_view. Copyright headers are added to McBopomofoLM and UserPhrasesLM.	2022-01-13 23:27:31 -08:00
zonble	d590d748f8	Adds UserPhrasesLM for user phrases. Since there is no probability information for users' custom phrases, they should be stored in a format that differs from data.txt. Using the same format and FastLM to parse user phrases was done out of laziness, but it is not the right way. The pull request adds a new language model class to parse user phrases. It also updates the input method controller to adopt the new user phrase format.	2022-01-12 16:53:51 +08:00
zonble	f1e56a7e01	Lets McBopomofoLM accept NULL as the parameter in loadUserPhrases.	2022-01-12 13:17:41 +08:00
zonble	84fc2f068b	Removes unused code and fixes a typo.	2022-01-12 13:16:10 +08:00
zonble	abdf97f652	Adds McBopomofoLM as the facade of three language models: the main language model, user phrases, and user excluded phrases.	2022-01-12 12:26:24 +08:00
zonble	9b485b799c	Implements excluding phrases.	2022-01-12 00:16:55 +08:00
zonble	84849bdb3d	Converts the preference and non modal view controller to Swift.	2022-01-10 22:01:40 +08:00
zonble	6bdd2aab44	Fixes a bug in building the unigrams.	2022-01-09 13:00:19 -08:00
zonble	b4276f0488	Fixes a bug in building the vector for unigrams from both the global language model and user phrases.	2022-01-09 13:00:19 -08:00
zonble	e909dc20b5	Uses user phrases in the block builder.	2022-01-09 08:38:32 -08:00
zonble	6f761ecbcd	Implements adding phrases with shift and arrow keys.	2022-01-09 08:38:32 -08:00
zonble	358462dff1	[WIP] Starts to work on the user phrases.	2022-01-09 08:38:32 -08:00
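ovadmin	aeb774a8ed	Slightly refactors duplicated code	2022-01-06 18:28:37 -08:00
ovadmin	3e0e859feb	Integrates the user candidate-selection memory mechanism into InputMethodController	2022-01-06 18:28:37 -08:00
ovadmin	a17438b67a	Fixes incomplete #include directives in some C++ files of the candidate-selection mechanism	2022-01-06 18:28:37 -08:00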
Lukhnos Liu	fa224c2657	Reset other nodes' fixed state when fixing a node. This fixes a bug where, when a span covers several nodes and a long node has already been candidate-fixed, fixing a short node does not cause the walk to reflect the result. A concrete example: 1. type 高中生. 2. move the cursor to 中 and change to 鐘聲: 高鐘聲. 3. with cursor position unchanged, select the candidate to 忠. 4. the expected result should be 高忠生 but instead it is stuck with 高鐘聲 because the node representing "鐘聲" is still fixed. Fixes #54	2020-10-09 22:16:06 -07:00
Lukhnos Liu	71b97f82b3	Simplify candidate fixing by moving code to Grid	2020-10-09 22:16:06 -07:00
Lukhnos Liu	8058f37fff	Modernize project and bump min version to 10.10. 32-bit architecture support is removed as a result.	2018-11-24 21:47:15 -08:00
Lukhnos Liu	b4eea515c3	Fix Span removal bug when linked against libc++	2013-06-14 23:54:37 -07:00
Mengjuei	beee34b96c	Enable IBM Keyboard Layout, no update to xib yet	2012-11-13 00:40:26 -08:00
Lukhnos Liu	c300e9cc10	Detab source code.	2012-10-31 22:12:50 -07:00
Lukhnos Liu	e68845381c	Revise DFA for parsing language models.	2012-10-31 21:55:13 -07:00
Lukhnos Liu	362801eb6c	Remove SimpleLM.	2012-09-10 23:27:00 -07:00
Lukhnos Liu	67775e3ccf	Implement an mmap-based LM parser.	2012-09-10 22:55:40 -07:00
Lukhnos Liu	71921b848a	Use stable sort in the engine so that unigram nodes with the same log probability are sorted according to the order in which they were added to the language model.	2012-09-10 19:02:24 -07:00
Mengjuei	7476edf12a	Use at most six characters to form a phrase	2011-10-18 16:06:51 -07:00
Mengjuei Hsieh	8549045ef5	Accepting 5-char phrases	2011-10-01 10:20:18 -07:00
Mengjuei Hsieh	5f976e4642	first commit	2011-09-01 23:56:26 -07:00
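
Commit d064f420e4 in the log relies on the phrase database being pre-sorted by reading, so a lookup can be a binary search rather than a full parse at load time. The following is a minimal sketch of that idea only, not the ParselessLM implementation: the `Entry` struct, the `findUnigrams` signature, and the in-memory vector are all hypothetical and stand in for whatever layout the real database uses.

```cpp
#include <algorithm>
#include <string_view>
#include <vector>

// Hypothetical record mirroring the (reading, value, score) tuple order
// mentioned in the commit message; the real storage layout is not shown there.
struct Entry {
    std::string_view reading; // sort key
    std::string_view value;
    double score;
};

// Returns every entry whose reading equals `key`.
// Precondition (assumed): `sorted` is already ordered by `reading`.
std::vector<Entry> findUnigrams(const std::vector<Entry>& sorted, std::string_view key)
{
    // Binary search for the first entry whose reading is not less than `key`.
    auto first = std::lower_bound(
        sorted.begin(), sorted.end(), key,
        [](const Entry& e, std::string_view k) { return e.reading < k; });

    // Collect the contiguous run of entries that share the reading.
    std::vector<Entry> result;
    for (auto it = first; it != sorted.end() && it->reading == key; ++it) {
        result.push_back(*it);
    }
    return result;
}
```

Because the data is sorted ahead of time, nothing has to be indexed when the file is opened; the cost moves to query time, which is the tradeoff the commit message describes.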

46 Commits
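
Commit d6cc5479f6 in the log describes a parser for user-editable files that tolerates comments, leading or trailing spaces and tabs, and Windows CR LF line endings. The sketch below illustrates that kind of tolerance for a single line using `std::string_view`. It is not the KeyValueBlobReader API: the `trim` and `parseLine` helpers are hypothetical, and treating `#` as the comment marker is an assumption.

```cpp
#include <optional>
#include <string_view>
#include <utility>

// Strips leading and trailing spaces, tabs, and carriage returns.
inline std::string_view trim(std::string_view s)
{
    constexpr std::string_view ws = " \t\r";
    const auto begin = s.find_first_not_of(ws);
    if (begin == std::string_view::npos) {
        return {}; // the line is entirely whitespace
    }
    const auto end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Parses one line into a (key, value) pair of views into `line`.
// Blank lines, comment lines (assumed to start with '#'), and lines
// without a value yield std::nullopt instead of aborting the whole file.
inline std::optional<std::pair<std::string_view, std::string_view>>
parseLine(std::string_view line)
{
    line = trim(line);
    if (line.empty() || line.front() == '#') {
        return std::nullopt;
    }
    const auto sep = line.find_first_of(" \t");
    if (sep == std::string_view::npos) {
        return std::nullopt; // a key with no value: skip rather than fail
    }
    return std::make_pair(line.substr(0, sep), trim(line.substr(sep)));
}
```

A caller can feed each line of a file to `parseLine` and skip the `std::nullopt` results, so one faulty line does not prevent the rest of the file from loading. The returned views point into the caller's buffer, which must outlive them.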