Chromium Code Reviews

Side by Side Diff: third_party/hyphen/README.nonstandard

Issue 20860003: Remove hyphenation code from Chromium. (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/src
Patch Set: rebase Created 7 years, 4 months ago
Non-standard hyphenation
------------------------

Some languages use non-standard hyphenation: `discretionary'
character changes at hyphenation points. For example,
Catalan: paral·lel -> paral-lel,
Dutch: omaatje -> oma-tje,
German (before the new orthography): Schiffahrt -> Schiff-fahrt,
Hungarian: asszonnyal -> asz-szony-nyal (multiple occurrences!),
Swedish: tillata -> till-lata.

Using this extended library, you can define
non-standard hyphenation patterns. For example:

l·1l/l=l
a1atje./a=t,1,3
.schif1fahrt/ff=f,5,2
.as3szon/sz=sz,2,3
n1nyal./ny=ny,1,3
.til1lata./ll=l,3,2

or with narrow boundaries:

l·1l/l=,1,2
a1atje./a=,1,1
.schif1fahrt/ff=,5,1
.as3szon/sz=,2,1
n1nyal./ny=,1,1
.til1lata./ll=,3,1

Note: Libhnj uses patterns that have been preprocessed by substrings.pl.
Unfortunately, this conversion step (non-standard -> standard pattern
conversion) can currently generate bad non-standard patterns, so using
narrow boundaries may be better with recent Libhnj. For example,
substrings.pl generates a few bad patterns from the Hungarian hyphenation
patterns, resulting in bad non-standard hyphenation in a few cases. Using
narrow boundaries solves this problem. The Java HyFo module can check for
this problem.

Syntax of the non-standard hyphenation patterns
------------------------------------------------

pat1tern/change[,start,cut]

If this pattern matches the word, and this pattern wins (see README.hyphen)
in the change region of the pattern, then the substring
pattern[start, start + cut - 1] will be replaced with the "change".

For example, a German ff -> ff-f hyphenation:

f1f/ff=f

or in expanded form:

f1f/ff=f,1,2

will replace every "ff" with "ff=f" at hyphenation points.
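
The replacement rule described above can be sketched in Python. This is only an illustration, not the library's code: parse_rule() and apply_rule() are hypothetical names, the sketch ignores word-boundary dots and the competition between patterns, it rewrites only the first match, and it assumes that a rule without start and cut treats the whole pattern as the change region (as the f1f/ff=f and f1f/ff=f,1,2 pair suggests).

```python
import re

def parse_rule(rule):
    """Split 'pat1tern/change[,start,cut]' into (letters, change, start, cut)."""
    pattern, _, tail = rule.partition("/")
    # Dots and digits in the pattern do not count as character positions.
    letters = re.sub(r"[.0-9]", "", pattern)
    match = re.fullmatch(r"(.*?)(?:,(\d+),(\d+))?", tail)
    change = match.group(1)
    if match.group(2) is None:
        # Assumption: without start/cut the whole pattern is the change region.
        start, cut = 1, len(letters)
    else:
        start, cut = int(match.group(2)), int(match.group(3))
    return letters, change, start, cut

def apply_rule(word, rule):
    """Replace pattern[start, start + cut - 1] with the change (first match only)."""
    letters, change, start, cut = parse_rule(rule)
    at = word.lower().find(letters)
    if at < 0:
        return word  # pattern does not match: leave the word untouched
    begin = at + start - 1  # start is 1-based and relative to the pattern
    return word[:begin] + change + word[begin + cut:]

print(apply_rule("schiffahrt", ".schif1fahrt/ff=f,5,2"))  # wide boundaries
print(apply_rule("schiffahrt", ".schif1fahrt/ff=,5,1"))   # narrow boundaries
```

Under these assumptions the wide and the narrow form of the German rule both yield schiff=fahrt, and applying .as3szon/sz=sz,2,3 followed by n1nyal./ny=ny,1,3 to asszonnyal yields asz=szony=nyal.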

A more realistic example:

% simple ff -> f-f hyphenation
f1f
% Schiffahrt -> Schiff-fahrt hyphenation
%
schif3fahrt/ff=f,5,2

Specification

- Pattern: a matching pattern of Liang's original algorithm
  - patterns must contain only one hyphenation point in the change region,
    marked with a one-digit odd number (1, 3, 5, 7 or 9).
    This point may be at a subregion boundary: schif3fahrt/ff=,5,1
  - only the greater value guarantees the win (don't mix standard and
    non-standard patterns with the same value; for example,
    instead of f3f and schif3fahrt/ff=f,5,2 use f3f and schif5fahrt/ff=f,5,2)

- Change: new characters.
  Arbitrary character sequence. The equals sign (=) marks hyphenation points
  for OpenOffice.org (as in the example). (In a possible German LaTeX
  preprocessor, ff could be replaced with "ff; for a Hungarian one, ssz
  with `ssz, according to the German and Hungarian Babel settings.)

- Start: starting position of the change region.
  - begins with 1 (not 0): schif3fahrt/ff=f,5,2
  - the leading dot doesn't count: .schif3fahrt/ff=f,5,2
  - numbers don't count: .s2c2h2i2f3f2ahrt/ff=f,5,2
  - In UTF-8 encoding, use Unicode character positions: össze/sz=sz,2,3
    ("össze" appears as "Ã¶ssze" in an ISO 8859-1 8-bit editor).

- Cut: length of the removed character sequence in the original word.
  - In UTF-8 encoding, use Unicode character length: paral·1lel/l=l,5,3
    ("paral·1lel" appears as "paralÂ·1lel" in an ISO 8859-1 8-bit editor).
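
The "greater value wins" rule in the Pattern item can be illustrated with a small Python sketch of how Liang-style pattern values compete at each gap between letters. It is a simplified model, not the library's implementation: split_pattern() and winners() are hypothetical names, and the tie handling (the first pattern keeps the gap) is an assumption made only for this sketch.

```python
def split_pattern(pattern):
    """Return (letters, values); values[i] is the digit written before letters[i]."""
    letters, values, pending = [], [], 0
    for ch in pattern:
        if ch.isdigit():
            pending = int(ch)
        else:
            letters.append(ch)
            values.append(pending)
            pending = 0
    values.append(pending)  # digit after the last letter, if any
    return "".join(letters), values

def winners(word, rules):
    """Map each gap (number of word letters before it) to (value, winning rule)."""
    text = "." + word + "."  # dots mark the word boundaries
    best = {}
    for rule in rules:
        letters, values = split_pattern(rule.split("/")[0])
        at = text.find(letters)
        while at >= 0:
            for offset, value in enumerate(values):
                gap = at + offset - 1
                # Only a strictly greater value takes over the gap.
                if value > best.get(gap, (0, None))[0]:
                    best[gap] = (value, rule)
            at = text.find(letters, at + 1)
    return best

print(winners("schiffahrt", ["f3f", "schif3fahrt/ff=f,5,2"]))
print(winners("schiffahrt", ["f3f", "schif5fahrt/ff=f,5,2"]))
```

In this model the gap after "schif" stays with f3f when both patterns carry the value 3, and goes to the non-standard pattern only when it is raised to 5, which matches the advice above to give the two patterns different values.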

Dictionary development
----------------------

No extended PatGen pattern generator exists yet for non-standard
hyphenation patterns.

Fortunately, non-standard hyphenation points are forbidden in
PatGen-generated hyphenation patterns, so non-standard hyphenation
patterns can still be developed in this case with a small patch.

Warning: If you use UTF-8 Unicode encoding in your patterns, call
substrings.pl with the UTF-8 parameter to calculate the right
character positions for non-standard hyphenation:

./substrings.pl input output UTF-8
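
Why character positions and byte offsets differ in UTF-8 patterns can be seen in a few lines of Python. The helper names below are hypothetical and the snippet is independent of substrings.pl; it only shows that a non-ASCII letter occupies one character position but two bytes, and how such a word appears when its UTF-8 bytes are read as ISO 8859-1.

```python
def char_position(word, sub):
    """1-based position of sub in word, counted in Unicode characters."""
    return word.find(sub) + 1

def byte_position(word, sub):
    """1-based position of sub in word, counted in UTF-8 bytes."""
    return word.encode("utf-8").find(sub.encode("utf-8")) + 1

def as_latin1(text):
    """Show how UTF-8 encoded text appears in an ISO 8859-1 editor."""
    return text.encode("utf-8").decode("latin-1")

word = "\u00f6ssze"  # össze
print(char_position(word, "ssz"))  # position to use in össze/sz=sz,2,3
print(byte_position(word, "ssz"))  # byte offset, one larger because of ö
print(as_latin1(word))
```

The start value 2 in össze/sz=sz,2,3 is the character position; counting bytes would give 3 instead.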

Programming
-----------

Use hyphenate2() or hyphenate3() to handle non-standard hyphenation.
See hyphen.h for the documentation of the hyphenate*() functions.
See example.c for processing the output of the hyphenate*() functions.

Warning: change characters are lower-cased in the source, so you may need
to convert the case of the change characters based on the detected case
of the input word.
For example, see the OpenOffice.org source
(lingucomponent/source/hyphenator/altlinuxhyph/hyphen/hyphenimp.cxx).
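
A minimal sketch of such a case conversion, written in Python for brevity. It is not the OpenOffice.org logic: match_case() is a hypothetical helper, and the two rules it applies (upper-case everything for an all-caps word, capitalize the first change character when the change starts the word) are assumptions chosen for illustration.

```python
def match_case(word, change, begin):
    """Adapt a lower-case change string to the case of the input word.

    begin is the 0-based index in word where the change region starts.
    """
    if word.isupper():
        # All-caps word: the replacement must be all caps as well.
        return change.upper()
    if begin == 0 and word[:1].isupper():
        # Capitalized word whose first letter lies in the change region.
        return change[:1].upper() + change[1:]
    return change

print(match_case("SCHIFFAHRT", "ff=f", 4))
print(match_case("Schiffahrt", "ff=f", 4))
```

With these assumptions SCHIFFAHRT receives FF=F, while Schiffahrt keeps the lower-case ff=f because the change region does not include the first letter.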

László Németh
<nemeth (at) openoffice.org>