| Age | Commit message | Author | Files | Lines |
|
* Update cdict
* scripts/subst_of_compose.py: Compute substitutions
from compose mappings. They are used when building dictionaries.
* Add substitutions compose data
* Better suggestion with diacritics
This improves the suggestions for words that contain diacritics and
uppercase letters.
This works by stripping diacritics both when building the dictionaries
(using word aliases added in cdict: https://github.com/Julow/cdict/pull/3)
and during lookup. Cdict then takes care of resolving the correct word.
The substitutions are generated using mappings from `fn`, `shift` and
all the `accent_*` modifiers into srcs/compose/substitutions.json.
This can be updated easily when more mappings are added.
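
A minimal sketch of the lookup-side idea. It assumes Unicode NFD decomposition plus lowercasing as a stand-in for the generated substitutions; the project builds its own table from compose mappings, and the format of substitutions.json is not shown here. `strip_diacritics` and `aliases` are hypothetical names.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Return `word` lowercased with combining marks removed.

    Assumption: NFD decomposition approximates the substitutions that
    the project generates from its compose mappings.
    """
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical alias table: stripped form -> real dictionary words.
aliases = {}
for w in ["été", "Noël", "ete"]:
    aliases.setdefault(strip_diacritics(w), []).append(w)
```

Looking up `strip_diacritics(typed_word)` in such a table yields the candidate words; choosing between them is left to the dictionary.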
|
|
compile.py is changed to report when compose+Upper+Upper exists but
compose+Upper+Lower does not.
With these findings, many sequences are added.
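
The check could look roughly like this. This is a sketch, not the code in compile.py: `missing_lower_variants` is a hypothetical name, and sequences are assumed to be a dict from tuples of single-character keys to results.

```python
def missing_lower_variants(sequences):
    """Yield (existing, missing) pairs of two-key sequences.

    A pair is reported when Upper+Upper is defined but the matching
    Upper+Lower sequence is not.
    """
    for seq in sorted(sequences):
        if len(seq) != 2:
            continue
        a, b = seq
        if a.isupper() and b.isupper():
            candidate = (a, b.lower())
            if candidate not in sequences:
                yield seq, candidate


seqs = {("A", "E"): "Æ", ("O", "E"): "Œ", ("O", "e"): "Œ"}
report = list(missing_lower_variants(seqs))
```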
|
|
Appending the ':' character to a sequence result forces it to be a
string final state. This will cause a KeyValue lookup that would
normally not happen for single-character results.
This is useful to make Tamil letters smaller, even when they are the
result of a Shift.
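
One way to read this rule, as a sketch only: the classification below is an assumption about how results map to final-state kinds, and `classify_result` is a hypothetical name.

```python
def classify_result(result):
    """Return (kind, value) for a sequence result.

    Assumption: single-character results become 'char' final states and
    anything else a 'string' final state. A trailing ':' on a longer
    result is treated as a marker: it is stripped and forces 'string'.
    """
    if len(result) > 1 and result.endswith(":"):
        return ("string", result[:-1])
    if len(result) == 1:
        return ("char", result)
    return ("string", result)
```

Under these assumptions, `"க"` stays a character final state while `"க:"` becomes a string final state holding `"க"`.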
|
|
Sequences longer than two characters were not read correctly from json
files, creating conflicts and causing dropped sequences.
The detection of collisions in sequences is also improved. Two colliding
sequences are removed.
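
A sketch of one kind of collision check, assuming a collision means that one key sequence is a strict prefix of another (the shorter one would need to be both a final state and an intermediate state). The real detection may cover more cases; `find_collisions` is a hypothetical name.

```python
def find_collisions(sequences):
    """Return (shorter, longer) pairs where one key sequence is a strict
    prefix of another.

    `sequences` maps tuples of key names to results. After sorting, every
    extension of a sequence directly follows it, so the inner loop can
    stop at the first non-matching entry.
    """
    keys = sorted(sequences)
    collisions = []
    for i, short in enumerate(keys):
        for longer in keys[i + 1:]:
            if longer[:len(short)] != short:
                break
            collisions.append((short, longer))
    return collisions


seqs = {("a", "e"): "æ", ("a", "e", "'"): "ǽ", ("o", "e"): "œ"}
collisions = find_collisions(seqs)
```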
|
|
|
|
Make compose sequences ending in the same character share the ending
state.
This reduces the compiled compose key data size from 33kB to 27kB.
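
The sharing can be illustrated with a small trie builder. It is a sketch that assumes "ending in the same character" means producing the same result, and that colliding sequences have already been removed. `build_states` is a hypothetical name.

```python
def build_states(sequences):
    """Build a trie of dict states whose final states are shared.

    Returns (root, finals), where `finals` maps each result to the single
    final-state object used by every sequence that produces it.
    """
    root, finals = {}, {}
    for seq, result in sequences.items():
        node = root
        for key in seq[:-1]:
            node = node.setdefault(key, {})
        # Reuse the existing final state for this result, if any.
        node[seq[-1]] = finals.setdefault(result, ("final", result))
    return root, finals


seqs = {("'", "e"): "é", ("e", "'"): "é", ("a", "e"): "æ"}
root, finals = build_states(seqs)
```

Here three sequences need only two final states, because both ways of typing "é" point at the same object.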
|
|
Useful to manage the growing collection of sequences.
|
|
This allows adding more compose sequences without modifying
en_US_UTF_8_Compose.pre.
This is done by grouping sequence files that should be merged together
into a directory. This also allows moving keysymdef.h into that
directory.
|
|
Sequences from several files are no longer merged but compiled to
separate starting states.
The plan is to use that to represent the diacritics.
|
|
Encoding errors in the compose data compiler due to:
- 'UTF-16' adds a BOM, use 'UTF-16-LE' instead
- 'str.encode' returns a byte array, use 'array' to have a 16-bit char
array.
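
Both points can be seen in a few lines of standard-library Python. This is a standalone sketch, not the compiler's code; the sample text is arbitrary.

```python
import array
import sys

text = "é€"

with_bom = text.encode("UTF-16")        # a 2-byte BOM is prepended
without_bom = text.encode("UTF-16-LE")  # no BOM, explicit little-endian

# 'H' is an unsigned short; the guard checks it is 16 bits wide here.
units = array.array("H")
assert units.itemsize == 2
units.frombytes(without_bom)
if sys.byteorder == "big":
    units.byteswap()  # the encoded bytes are little-endian
```

`without_bom` is a `bytes` object indexed byte by byte, whereas `units` is indexed by 16-bit code unit.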
|
|
Parse key names from keysymdef.h, which is distributed with Xorg. The
Greek, Cyrillic and Hebrew sequences referenced these keysyms.
This increases the number of sequences from 2043 to 2668.
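
A sketch of such a parser, assuming only the `#define XK_<name> <hex> /* U+XXXX ... */` line shape is needed; the real parser may handle more variants. `parse_keysyms` is a hypothetical name and the sample lines are written in the style of keysymdef.h.

```python
import re

# Matches e.g. "#define XK_Greek_alpha 0x07e1 /* U+03B1 GREEK ... */".
KEYSYM_RE = re.compile(
    r"^#define\s+XK_(\w+)\s+0x[0-9a-fA-F]+\s*/\*\s*\(?U\+([0-9a-fA-F]{4,6})\b"
)


def parse_keysyms(lines):
    """Map keysym names (without the XK_ prefix) to their characters."""
    names = {}
    for line in lines:
        m = KEYSYM_RE.match(line)
        if m:
            names[m.group(1)] = chr(int(m.group(2), 16))
    return names


sample = [
    "#define XK_Greek_alpha                   0x07e1  /* U+03B1 GREEK SMALL LETTER ALPHA */",
    "#define XK_Cyrillic_a                    0x06c1  /* U+0430 CYRILLIC SMALL LETTER A */",
    "#define XK_VoidSymbol                  0xffffff  /* Void symbol */",
]
names = parse_keysyms(sample)
```

Lines without a `U+` annotation, like the last one, are skipped.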
|
|
Change the compose state machine definition to allow final states that
are wider than 16 bits.
This increases the number of sequences that can be used from
en_US_UTF_8_Compose.pre from 2013 to 2043 (of 3201).
|
|
There's no json file yet, this was part of an experiment.
Add a missing escape rule and detect colliding sequences.
|
|
compile.py implements a parser for X11's Compose.pre files. A lot of
code is necessary to interpret character names but thankfully, the name
of most characters is contained in the file.
The state machine is compiled into two char arrays which unfortunately
requires an expensive initialisation and allocation.
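
A much reduced sketch of the parsing step, assuming lines of the form `<key> <key> ... : "result" name # comment`. It handles only simple backslash escapes and ignores everything else a Compose.pre file can contain; `parse_compose_line` is a hypothetical name, not the function in compile.py.

```python
import re

# One or more <keyname> tokens, a colon, then a double-quoted result.
LINE_RE = re.compile(r'^((?:<\w+>\s*)+):\s*"((?:[^"\\]|\\.)*)"')
KEY_RE = re.compile(r"<(\w+)>")


def parse_compose_line(line):
    """Return (keys, result) for a sequence line, or None otherwise."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    keys = tuple(KEY_RE.findall(m.group(1)))
    # Simplification: a backslash escapes the next character literally.
    result = re.sub(r"\\(.)", r"\1", m.group(2))
    return keys, result


parsed = parse_compose_line('<Multi_key> <a> <e> \t: "æ"   ae # LATIN SMALL LETTER AE')
```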
|
|
The COMPOSE_PENDING modifier indicates whether a compose sequence is in
progress. The new key of kind Compose_pending sets the current state of
the sequence.
The compose sequences are compiled by a Python script into a state
machine with a compact encoding.
The state of the pending compose is determined by the index of a state.
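
To illustrate how a pending sequence can be tracked by a state index, here is a sketch with an assumed flat layout. The layout, `compile_sequences` and `step` are illustrative choices, not necessarily the project's encoding; results are assumed non-empty and collisions already removed.

```python
def compile_sequences(sequences):
    """Flatten sequences into parallel lists `states` and `edges`.

    Assumed layout:
    - intermediate state at index i: states[i] == "" and edges[i] is the
      number of transitions, stored at i+1 .. i+edges[i] with the key in
      `states` and the target state index in `edges`;
    - final state at index i: states[i] is the result and edges[i] == 0.
    """
    trie = {}
    for seq, result in sequences.items():
        node = trie
        for key in seq[:-1]:
            node = node.setdefault(key, {})
        node[seq[-1]] = result

    states, edges = [], []

    def emit(node):
        index = len(states)
        if isinstance(node, str):  # final state
            states.append(node)
            edges.append(0)
            return index
        keys = sorted(node)
        states.append("")
        edges.append(len(keys))
        states.extend(keys)
        edges.extend([0] * len(keys))  # placeholders, patched below
        for offset, key in enumerate(keys, start=1):
            edges[index + offset] = emit(node[key])
        return index

    emit(trie)
    return states, edges


def step(states, edges, state, key):
    """Return the next state index after `key`, or None if there is none."""
    for i in range(state + 1, state + 1 + edges[state]):
        if states[i] == key:
            return edges[i]
    return None


states, edges = compile_sequences({("a", "e"): "æ", ("o", "e"): "œ"})
pending = 0  # index of the initial state
for key in ("a", "e"):
    pending = step(states, edges, pending, key)
result = states[pending]
```

Only the integer `pending` has to be remembered between key presses, which matches the idea of identifying the pending compose by a state index.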
|