OLD | NEW |
| (Empty) |
1 // Copyright (c) 2012, the Dart project authors. Please see the AUTHORS file | |
2 // for details. All rights reserved. Use of this source code is governed by a | |
3 // BSD-style license that can be found in the LICENSE file. | |
4 | |
5 /** | |
6 * Bidi stands for Bi-directional text. | |
7 * According to [Wikipedia](http://en.wikipedia.org/wiki/Bi-directional_text): | |
8 * Bi-directional text is text containing text in both text directionalities, | |
9 * both right-to-left (RTL) and left-to-right (LTR). It generally involves text | |
10 * containing different types of alphabets, but may also refer to boustrophedon, | |
11 * which is changing text directionality in each row. | |
12 * | |
13 * Utility class for formatting display text in a potentially | |
14 * opposite-directionality context without garbling the layout. | |
15 * Mostly a very "slimmed-down" and dart-ified port of the Closure | |
16 * Bidirectional formatting library. If there is a utility in the Closure | |
17 * library (or ICU, or elsewhere) that you would like this formatter to make | |
18 * available, please contact the Dart team. | |
19 * | |
20 * Provides the following functionality: | |
21 * | |
22 * 1. *BiDi Wrapping* | |
23 * When text in one language is mixed into a document in another, opposite- | |
24 * directionality language, e.g. when an English business name is embedded in a | |
25 * Hebrew web page, both the inserted string and the text following it may be | |
26 * displayed incorrectly unless the inserted string is explicitly separated | |
27 * from the surrounding text in a "wrapper" that declares its directionality at | |
28 * the start and then resets it back at the end. This wrapping can be done in | |
29 * HTML mark-up (e.g. a 'span dir=rtl' tag) or - only in contexts where mark-up | |
30 * cannot be used - in Unicode BiDi formatting codes (LRE|RLE and PDF). | |
31 * Providing such wrapping services is the basic purpose of the BiDi formatter. | |
32 * | |
33 * 2. *Directionality estimation* | |
34 * How does one know whether a string about to be inserted into surrounding | |
35 * text has the same directionality? Well, in many cases, one knows that this | |
36 * must be the case when writing the code doing the insertion, e.g. when a | |
37 * localized message is inserted into a localized page. In such cases there is | |
38 * no need to involve the BiDi formatter at all. In the remaining cases, e.g. | |
39 * when the string is user-entered or comes from a database, the language of | |
40 * the string (and thus its directionality) is not known a priori, and must be | |
41 * estimated at run-time. The BiDi formatter does this automatically. | |
42 * | |
43 * 3. *Escaping* | |
44 * When wrapping plain text - i.e. text that is not already HTML or HTML- | |
45 * escaped - in HTML mark-up, the text must first be HTML-escaped to prevent XSS | |
46 * attacks and other nasty business. This of course is always true, but the | |
47 * escaping cannot be done after the string has already been wrapped in | |
48 * mark-up, so the BiDi formatter also serves as a last chance and includes | |
49 * escaping services. | |
50 * | |
51 * Thus, in a single call, the formatter will escape the input string as | |
52 * specified, determine its directionality, and wrap it as necessary. It is | |
53 * then up to the caller to insert the return value in the output. | |
54 */ | |
55 | |
56 class BidiFormatter { | |
57 | |
58 /** The direction of the surrounding text (the context). */ | |
59 TextDirection contextDirection; | |
60 | |
61 /** | |
62 * Indicates if we should always wrap the formatted text in a `span` tag. | |
63 */ | |
64 bool _alwaysSpan; | |
65 | |
66 /** | |
67 * Create a formatting object with a direction. If [alwaysSpan] is true we | |
68 * should always use a `span` tag, even when the input directionality is | |
69 * neutral or matches the context, so that the DOM structure of the output | |
70 * does not depend on the combination of directionalities. | |
71 */ | |
72 BidiFormatter.LTR([alwaysSpan=false]) : contextDirection = TextDirection.LTR, | |
73 _alwaysSpan = alwaysSpan; | |
74 BidiFormatter.RTL([alwaysSpan=false]) : contextDirection = TextDirection.RTL, | |
75 _alwaysSpan = alwaysSpan; | |
76 BidiFormatter.UNKNOWN([alwaysSpan=false]) : | |
77 contextDirection = TextDirection.UNKNOWN, _alwaysSpan = alwaysSpan; | |
78 | |
79 /** Is true if the known context direction for this formatter is RTL. */ | |
80 bool get isRTL() => contextDirection == TextDirection.RTL; | |
81 | |
82 /** | |
83 * Formats a string of a given (or estimated, if not provided) | |
84 * [direction] for use in HTML output of the context directionality, so | |
85 * an opposite-directionality string is neither garbled nor garbles what | |
86 * follows it. | |
87 * If the input string's directionality doesn't match the context | |
88 * directionality, we wrap it with a `span` tag and add a `dir` attribute | |
89 * (either "dir=rtl" or "dir=ltr"). | |
90 * If alwaysSpan was true when constructing the formatter, the input is always | |
91 * wrapped with a `span` tag, omitting the dir attribute when not needed. | |
92 * | |
93 * If [resetDir] is true and the overall directionality or the exit | |
94 * directionality of [text] is opposite to the context directionality, | |
95 * a trailing unicode BiDi mark matching the context directionality is | |
96 * appended (LRM or RLM). If [isHtml] is false, we HTML-escape the [text]. | |
97 */ | |
98 String wrapWithSpan(String text, [bool isHtml=false, bool resetDir=true, | |
99 TextDirection direction]) { | |
100 if (direction == null) direction = estimateDirection(text, isHtml); | |
101 var result; | |
102 if (!isHtml) text = htmlEscape(text); | |
103 var directionChange = contextDirection.isDirectionChange(direction); | |
104 if (_alwaysSpan || directionChange) { | |
105 var spanDirection = ''; | |
106 if (directionChange) { | |
107 spanDirection = ' dir=${direction.spanText}'; | |
108 } | |
109 result = '<span$spanDirection>$text</span>'; | |
110 } else { | |
111 result = text; | |
112 } | |
113 return result.concat(resetDir ? _resetDir(text, direction, isHtml) : ''); | |
114 } | |
115 | |
116 /** | |
117 * Format [text] of a known (if specified) or estimated [direction] for use | |
118 * in *plain-text* output of the context directionality, so an | |
119 * opposite-directionality text is neither garbled nor garbles what follows | |
120 * it. Unlike wrapWithSpan, this makes use of unicode BiDi formatting | |
121 * characters instead of spans for wrapping. The returned string would be | |
122 * RLE+text+PDF for RTL text, or LRE+text+PDF for LTR text. | |
123 * | |
124 * If [resetDir] is true, and if the overall directionality or the exit | |
125 * directionality of [text] is opposite to the context directionality, | |
126 * a trailing unicode BiDi mark matching the context directionality is | |
127 * appended (LRM or RLM). | |
128 * | |
129 * In HTML, the *only* valid use of this function is inside of elements that | |
130 * do not allow markup, e.g. an 'option' tag. | |
131 * This function does *not* do HTML-escaping regardless of the value of | |
132 * [isHtml]. [isHtml] is used to designate if the text contains HTML (escaped | |
133 * or unescaped). | |
134 */ | |
135 String wrapWithUnicode(String text, [bool isHtml=false, bool resetDir=true, | |
136 TextDirection direction]) { | |
137 if (direction == null) direction = estimateDirection(text, isHtml); | |
138 var result = text; | |
139 if (contextDirection.isDirectionChange(direction)) { | |
140 result = '''${direction == TextDirection.RTL ? RLE : LRE}$text$PDF'''; | |
141 } | |
142 return result.concat(resetDir ? _resetDir(text, direction, isHtml) : ''); | |
143 } | |
144 | |
145 /** | |
146 * Estimates the directionality of [text] using the best known | |
147 * general-purpose method (using relative word counts). A | |
148 * TextDirection.UNKNOWN return value indicates completely neutral input. | |
149 * [isHtml] is true if [text] is HTML or HTML-escaped. | |
150 */ | |
151 TextDirection estimateDirection(String text, [bool isHtml=false]) { | |
152 return estimateDirectionOfText(text, isHtml); //TODO~!!! | |
153 } | |
154 | |
155 /** | |
156 * Returns a unicode BiDi mark matching [contextDirection], the direction of | |
157 * the surrounding context (not necessarily the direction of [text]). Returns | |
158 * an LRM or RLM if the overall directionality ([direction]) or the exit | |
159 * directionality of [text] is opposite to the context directionality; | |
160 * otherwise returns the empty string. [isHtml] is true if [text] is HTML or | |
161 * HTML-escaped. | |
162 */ | |
163 String _resetDir(String text, TextDirection direction, bool isHtml) { | |
164 // endsWithRtl and endsWithLtr are called only if needed (short-circuit). | |
165 if ((contextDirection == TextDirection.LTR && | |
166 (direction == TextDirection.RTL || | |
167 endsWithRtl(text, isHtml))) || | |
168 (contextDirection == TextDirection.RTL && | |
169 (direction == TextDirection.LTR || | |
170 endsWithLtr(text, isHtml)))) { | |
171 if (contextDirection == TextDirection.LTR) { | |
172 return LRM; | |
173 } else { | |
174 return RLM; | |
175 } | |
176 } else { | |
177 return ''; | |
178 } | |
179 } | |
180 } | |