third_party/harfbuzz-ng/src/hb-ot-shape-normalize.cc - Issue 10510004: Roll harfbuzz-ng 3b8fd9c48f4bde368bf2d465c148b9743a9216ee

Side by Side Diff: third_party/harfbuzz-ng/src/hb-ot-shape-normalize.cc

Issue 10510004: Roll harfbuzz-ng 3b8fd9c48f4bde368bf2d465c148b9743a9216ee (Closed) Base URL: http://git.chromium.org/chromium/src.git@master

Patch Set: Created 8 years, 6 months ago


OLD	NEW
1 /*	1 /*

2 * Copyright © 2011 Google, Inc.	2 * Copyright © 2011,2012 Google, Inc.

3 *	3 *

4 * This is part of HarfBuzz, a text shaping library.	4 * This is part of HarfBuzz, a text shaping library.

5 *	5 *

6 * Permission is hereby granted, without written agreement and without	6 * Permission is hereby granted, without written agreement and without

7 * license or royalty fees, to use, copy, modify, and distribute this	7 * license or royalty fees, to use, copy, modify, and distribute this

8 * software and its documentation for any purpose, provided that the	8 * software and its documentation for any purpose, provided that the

9 * above copyright notice and the following two paragraphs appear in	9 * above copyright notice and the following two paragraphs appear in

10 * all copies of this software.	10 * all copies of this software.

11 *	11 *

12 * IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE TO ANY PARTY FOR	12 * IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE TO ANY PARTY FOR

13 * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES	13 * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES

14 * ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN	14 * ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN

15 * IF THE COPYRIGHT HOLDER HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH	15 * IF THE COPYRIGHT HOLDER HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH

16 * DAMAGE.	16 * DAMAGE.

17 *	17 *

18 * THE COPYRIGHT HOLDER SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING,	18 * THE COPYRIGHT HOLDER SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING,

19 * BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND	19 * BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND

20 * FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS	20 * FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS

21 * ON AN "AS IS" BASIS, AND THE COPYRIGHT HOLDER HAS NO OBLIGATION TO	21 * ON AN "AS IS" BASIS, AND THE COPYRIGHT HOLDER HAS NO OBLIGATION TO

22 * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.	22 * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

23 *	23 *

24 * Google Author(s): Behdad Esfahbod	24 * Google Author(s): Behdad Esfahbod

25 */	25 */

26	26

	27 #include "hb-ot-shape-normalize-private.hh"

27 #include "hb-ot-shape-private.hh"	28 #include "hb-ot-shape-private.hh"

28 #include "hb-ot-shape-complex-private.hh"

29	29

30	30

31 /*	31 /*

32 * HIGHLEVEL DESIGN:	32 * HIGHLEVEL DESIGN:

33 *	33 *

34 * This file exports one main function: _hb_ot_shape_normalize().	34 * This file exports one main function: _hb_ot_shape_normalize().

35 *	35 *

36 * This function closely reflects the Unicode Normalization Algorithm,	36 * This function closely reflects the Unicode Normalization Algorithm,

37 * yet it's different. The shaper an either prefer decomposed (NFD) or	37 * yet it's different.

38 * composed (NFC).	38 *

	39 * Each shaper specifies whether it prefers decomposed (NFD) or composed (NFC).

	40 * The logic however tries to use whatever the font can support.

39 *	41 *

40 * In general what happens is that: each grapheme is decomposed in a chain	42 * In general what happens is that: each grapheme is decomposed in a chain

41 * of 1:2 decompositions, marks reordered, and then recomposed if desired,	43 * of 1:2 decompositions, marks reordered, and then recomposed if desired,

42 * so far it's like Unicode Normalization. However, the decomposition and	44 * so far it's like Unicode Normalization. However, the decomposition and

43 * recomposition only happens if the font supports the resulting characters.	45 * recomposition only happens if the font supports the resulting characters.

44 *	46 *

45 * The goals are:	47 * The goals are:

46 *	48 *

47 * - Try to render all canonically equivalent strings similarly. To really	49 * - Try to render all canonically equivalent strings similarly. To really

48 * achieve this we have to always do the full decomposition and then	50 * achieve this we have to always do the full decomposition and then

49 * selectively recompose from there. It's kinda too expensive though, so	51 * selectively recompose from there. It's kinda too expensive though, so

50 * we skip some cases. For example, if composed is desired, we simply	52 * we skip some cases. For example, if composed is desired, we simply

51 * don't touch 1-character clusters that are supported by the font, even	53 * don't touch 1-character clusters that are supported by the font, even

52 * though their NFC may be different.	54 * though their NFC may be different.

53 *	55 *

54 * - When a font has a precomposed character for a sequence but the 'ccmp'	56 * - When a font has a precomposed character for a sequence but the 'ccmp'

55 * feature in the font is not adequate, use the precomposed character	57 * feature in the font is not adequate, use the precomposed character

56 * which typically has better mark positioning.	58 * which typically has better mark positioning.

57 *	59 *

58 * - When a font does not support a combining mark, but supports it precomposed	60 * - When a font does not support a combining mark, but supports it precomposed

59 * with previous base. This needs the itemizer to have this knowledge too.	61 * with previous base, use that. This needs the itemizer to have this

60 * We need ot provide assistance to the itemizer.	62 * knowledge too. We need to provide assistance to the itemizer.

61 *	63 *

62 * - When a font does not support a character but supports its decomposition,	64 * - When a font does not support a character but supports its decomposition,

63 * well, use the decomposition.	65 * well, use the decomposition.

64 *	66 *

65 * - The Indic shaper requests decomposed output. This will handle splitting	67 * - The Indic shaper requests decomposed output. This will handle splitting

66 * matra for the Indic shaper.	68 * matra for the Indic shaper.

67 */	69 */

68	70

69 static void	71 static void

70 output_glyph (hb_ot_shape_context_t *c,	72 output_glyph (hb_buffer_t *buffer, hb_codepoint_t glyph)

71 » hb_codepoint_t glyph)

72 {	73 {

73 hb_buffer_t *buffer = c->buffer;

74

75 buffer->output_glyph (glyph);	74 buffer->output_glyph (glyph);

76 hb_glyph_info_set_unicode_props (&buffer->out_info[buffer->out_len - 1], buffer->unicode);	75 _hb_glyph_info_set_unicode_props (&buffer->prev(), buffer->unicode);

77 }	76 }

78	77

79 static bool	78 static bool

80 decompose (hb_ot_shape_context_t *c,	79 decompose (hb_font_t *font, hb_buffer_t *buffer,

81 bool shortest,	80 bool shortest,

82 hb_codepoint_t ab)	81 hb_codepoint_t ab)

83 {	82 {

84 hb_codepoint_t a, b, glyph;	83 hb_codepoint_t a, b, glyph;

85	84

86 if (!hb_unicode_decompose (c->buffer->unicode, ab, &a, &b) ||	85 if (!hb_unicode_decompose (buffer->unicode, ab, &a, &b) ||

87 (b && !hb_font_get_glyph (c->font, b, 0, &glyph)))	86 (b && !hb_font_get_glyph (font, b, 0, &glyph)))

88 return FALSE;	87 return FALSE;

89	88

90 bool has_a = hb_font_get_glyph (c->font, a, 0, &glyph);	89 bool has_a = hb_font_get_glyph (font, a, 0, &glyph);

91 if (shortest && has_a) {	90 if (shortest && has_a) {

92 /* Output a and b */	91 /* Output a and b */

93 output_glyph (c, a);	92 output_glyph (buffer, a);

94 if (b)	93 if (b)

95 output_glyph (c, b);	94 output_glyph (buffer, b);

96 return TRUE;	95 return TRUE;

97 }	96 }

98	97

99 if (decompose (c, shortest, a)) {	98 if (decompose (font, buffer, shortest, a)) {

100 if (b)	99 if (b)

101 output_glyph (c, b);	100 output_glyph (buffer, b);

102 return TRUE;	101 return TRUE;

103 }	102 }

104	103

105 if (has_a) {	104 if (has_a) {

106 output_glyph (c, a);	105 output_glyph (buffer, a);

107 if (b)	106 if (b)

108 output_glyph (c, b);	107 output_glyph (buffer, b);

109 return TRUE;	108 return TRUE;

110 }	109 }

111	110

112 return FALSE;	111 return FALSE;

113 }	112 }

114	113

115 static void	114 static void

116 decompose_current_glyph (hb_ot_shape_context_t *c,	115 decompose_current_glyph (hb_font_t *font, hb_buffer_t *buffer,

117 bool shortest)	116 bool shortest)

118 {	117 {

119 if (decompose (c, shortest, c->buffer->info[c->buffer->idx].codepoint))	118 if (decompose (font, buffer, shortest, buffer->cur().codepoint))

120 c->buffer->skip_glyph ();	119 buffer->skip_glyph ();

121 else	120 else

122 c->buffer->next_glyph ();	121 buffer->next_glyph ();

123 }	122 }

124	123

125 static void	124 static void

126 decompose_single_char_cluster (hb_ot_shape_context_t *c,	125 decompose_single_char_cluster (hb_font_t *font, hb_buffer_t *buffer,

127 bool will_recompose)	126 bool will_recompose)

128 {	127 {

129 hb_codepoint_t glyph;	128 hb_codepoint_t glyph;

130	129

131 /* If recomposing and font supports this, we're good to go */	130 /* If recomposing and font supports this, we're good to go */

132 if (will_recompose && hb_font_get_glyph (c->font, c->buffer->info[c->buffer->idx].codepoint, 0, &glyph)) {	131 if (will_recompose && hb_font_get_glyph (font, buffer->cur().codepoint, 0, &glyph)) {

133 c->buffer->next_glyph ();	132 buffer->next_glyph ();

134 return;	133 return;

135 }	134 }

136	135

137 decompose_current_glyph (c, will_recompose);	136 decompose_current_glyph (font, buffer, will_recompose);

138 }	137 }

139	138

140 static void	139 static void

141 decompose_multi_char_cluster (hb_ot_shape_context_t *c,	140 decompose_multi_char_cluster (hb_font_t *font, hb_buffer_t *buffer,

142 unsigned int end)	141 unsigned int end)

143 {	142 {

144 /* TODO Currently if there's a variation-selector we give-up, it's just too hard. */	143 /* TODO Currently if there's a variation-selector we give-up, it's just too hard. */

145 for (unsigned int i = c->buffer->idx; i < end; i++)	144 for (unsigned int i = buffer->idx; i < end; i++)

146 if (unlikely (is_variation_selector (c->buffer->info[i].codepoint))) {	145 if (unlikely (_hb_unicode_is_variation_selector (buffer->info[i].codepoint))) {

147 while (c->buffer->idx < end)	146 while (buffer->idx < end)

148 » c->buffer->next_glyph ();	147 » buffer->next_glyph ();

149 return;	148 return;

150 }	149 }

151	150

152 while (c->buffer->idx < end)	151 while (buffer->idx < end)

153 decompose_current_glyph (c, FALSE);	152 decompose_current_glyph (font, buffer, FALSE);

154 }	153 }

155	154

156 static int	155 static int

157 compare_combining_class (const hb_glyph_info_t *pa, const hb_glyph_info_t *pb)	156 compare_combining_class (const hb_glyph_info_t *pa, const hb_glyph_info_t *pb)

158 {	157 {

159 unsigned int a = pa->combining_class();	158 unsigned int a = _hb_glyph_info_get_modified_combining_class (pa);

160 unsigned int b = pb->combining_class();	159 unsigned int b = _hb_glyph_info_get_modified_combining_class (pb);

161	160

162 return a < b ? -1 : a == b ? 0 : +1;	161 return a < b ? -1 : a == b ? 0 : +1;

163 }	162 }

164	163

165 void	164 void

166 _hb_ot_shape_normalize (hb_ot_shape_context_t *c)	165 _hb_ot_shape_normalize (hb_font_t *font, hb_buffer_t *buffer,

	166 » » » hb_ot_shape_normalization_mode_t mode)

167 {	167 {

168 hb_buffer_t *buffer = c->buffer;	168 bool recompose = mode != HB_OT_SHAPE_NORMALIZATION_MODE_DECOMPOSED;

169 bool recompose = !hb_ot_shape_complex_prefer_decomposed (c->plan->shaper);

170 bool has_multichar_clusters = FALSE;	169 bool has_multichar_clusters = FALSE;

171 unsigned int count;	170 unsigned int count;

172	171

173 /* We do a fairly straightforward yet custom normalization process in three	172 /* We do a fairly straightforward yet custom normalization process in three

174 * separate rounds: decompose, reorder, recompose (if desired). Currently	173 * separate rounds: decompose, reorder, recompose (if desired). Currently

175 * this makes two buffer swaps. We can make it faster by moving the last	174 * this makes two buffer swaps. We can make it faster by moving the last

176 * two rounds into the inner loop for the first round, but it's more readable	175 * two rounds into the inner loop for the first round, but it's more readable

177 * this way. */	176 * this way. */

178	177

179	178

180 /* First round, decompose */	179 /* First round, decompose */

181	180

182 buffer->clear_output ();	181 buffer->clear_output ();

183 count = buffer->len;	182 count = buffer->len;

184 for (buffer->idx = 0; buffer->idx < count;)	183 for (buffer->idx = 0; buffer->idx < count;)

185 {	184 {

186 unsigned int end;	185 unsigned int end;

187 for (end = buffer->idx + 1; end < count; end++)	186 for (end = buffer->idx + 1; end < count; end++)

188 if (buffer->info[buffer->idx].cluster != buffer->info[end].cluster)	187 if (buffer->cur().cluster != buffer->info[end].cluster)

189 break;	188 break;

190	189

191 if (buffer->idx + 1 == end)	190 if (buffer->idx + 1 == end)

192 decompose_single_char_cluster (c, recompose);	191 decompose_single_char_cluster (font, buffer, recompose);

193 else {	192 else {

194 decompose_multi_char_cluster (c, end);	193 decompose_multi_char_cluster (font, buffer, end);

195 has_multichar_clusters = TRUE;	194 has_multichar_clusters = TRUE;

196 }	195 }

197 }	196 }

198 buffer->swap_buffers ();	197 buffer->swap_buffers ();

199	198

200	199

201 /* Technically speaking, two characters with ccc=0 may combine. But all	200 if (mode != HB_OT_SHAPE_NORMALIZATION_MODE_COMPOSED_FULL && !has_multichar_clusters)

202 * those cases are in languages that the indic module handles (which expects

203 * decomposed), or in Hangul jamo, which again, we want decomposed anyway.

204 * So we don't bother combining across cluster boundaries. This is a huge

205 * performance saver if the compose() callback is slow.

206 *

207 * TODO: Am I right about Hangul? If I am, we should add a Hangul module

208 * that requests decomposed. If for Hangul we end up wanting composed, we

209 * can do that in the Hangul module.

210 */

211

212 if (!has_multichar_clusters)

213 return; /* Done! */	201 return; /* Done! */

214	202

215	203

216 /* Second round, reorder (inplace) */	204 /* Second round, reorder (inplace) */

217	205

218 count = buffer->len;	206 count = buffer->len;

219 for (unsigned int i = 0; i < count; i++)	207 for (unsigned int i = 0; i < count; i++)

220 {	208 {

221 if (buffer->info[i].combining_class() == 0)	209 if (_hb_glyph_info_get_modified_combining_class (&buffer->info[i]) == 0)

222 continue;	210 continue;

223	211

224 unsigned int end;	212 unsigned int end;

225 for (end = i + 1; end < count; end++)	213 for (end = i + 1; end < count; end++)

226 if (buffer->info[end].combining_class() == 0)	214 if (_hb_glyph_info_get_modified_combining_class (&buffer->info[end]) == 0)

227 break;	215 break;

228	216

229 /* We are going to do a bubble-sort. Only do this if the	217 /* We are going to do a bubble-sort. Only do this if the

230 * sequence is short. Doing it on long sequences can result	218 * sequence is short. Doing it on long sequences can result

231 * in an O(n^2) DoS. */	219 * in an O(n^2) DoS. */

232 if (end - i > 10) {	220 if (end - i > 10) {

233 i = end;	221 i = end;

234 continue;	222 continue;

235 }	223 }

236	224

(...skipping 10 matching lines...)
247	235

248 /* As noted in the comment earlier, we don't try to combine	236 /* As noted in the comment earlier, we don't try to combine

249 * ccc=0 chars with their previous Starter. */	237 * ccc=0 chars with their previous Starter. */

250	238

251 buffer->clear_output ();	239 buffer->clear_output ();

252 count = buffer->len;	240 count = buffer->len;

253 unsigned int starter = 0;	241 unsigned int starter = 0;

254 buffer->next_glyph ();	242 buffer->next_glyph ();

255 while (buffer->idx < count)	243 while (buffer->idx < count)

256 {	244 {

257 if (buffer->info[buffer->idx].combining_class() == 0) {	245 hb_codepoint_t composed, glyph;

258 starter = buffer->out_len;	246 if (/* If mode is NOT COMPOSED_FULL (ie. it's COMPOSED_DIACRITICS), we don't try to

259 buffer->next_glyph ();	247 » * compose a CCC=0 character with it's preceding starter. */

	248 » (mode == HB_OT_SHAPE_NORMALIZATION_MODE_COMPOSED_FULL ||

	249 » _hb_glyph_info_get_modified_combining_class (&buffer->cur()) != 0) &&

	250 » /* If there's anything between the starter and this char, they should have CCC

	251 » * smaller than this character's. */

	252 » (starter == buffer->out_len - 1 ||

	253 » _hb_glyph_info_get_modified_combining_class (&buffer->prev()) < _hb_glyph_info_get_modified_combining_class (&buffer->cur())) &&

	254 » /* And compose. */

	255 » hb_unicode_compose (buffer->unicode,

	256 » » » buffer->out_info[starter].codepoint,

	257 » » » buffer->cur().codepoint,

	258 » » » &composed) &&

	259 » /* And the font has glyph for the composite. */

	260 » hb_font_get_glyph (font, composed, 0, &glyph))

	261 {

	262 /* Composes. Modify starter and carry on. */

	263 buffer->out_info[starter].codepoint = composed;

	264 /* XXX update cluster */

	265 _hb_glyph_info_set_unicode_props (&buffer->out_info[starter], buffer->unicode);

	266

	267 buffer->skip_glyph ();

260 continue;	268 continue;

261 }	269 }

262	270

263 hb_codepoint_t composed, glyph;	271 /* Blocked, or doesn't compose. */

264 if ((buffer->out_info[buffer->out_len - 1].combining_class() >=	272 buffer->next_glyph ();

265 » buffer->info[buffer->idx].combining_class()) ||

266 » !hb_unicode_compose (c->buffer->unicode,

267 » » » buffer->out_info[starter].codepoint,

268 » » » buffer->info[buffer->idx].codepoint,

269 » » » &composed) ||

270 » !hb_font_get_glyph (c->font, composed, 0, &glyph))

271 {

272 /* Blocked, or doesn't compose. */

273 buffer->next_glyph ();

274 continue;

275 }

276	273

277 /* Composes. Modify starter and carry on. */	274 if (_hb_glyph_info_get_modified_combining_class (&buffer->prev()) == 0)

278 buffer->out_info[starter].codepoint = composed;	275 starter = buffer->out_len - 1;

279 hb_glyph_info_set_unicode_props (&buffer->out_info[starter], buffer->unicode);

280

281 buffer->skip_glyph ();

282 }	276 }

283 buffer->swap_buffers ();	277 buffer->swap_buffers ();

284	278

285 }	279 }

286

OLD	NEW
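
Reviewer note on the "Second round, reorder (inplace)" step: the sorting lines themselves are collapsed in this view, so the following is only a minimal sketch of how a bounded, stable bubble sort over runs of marks could look. It is not the code in the patch. `Info`, `reorder_marks`, and `max_run` are hypothetical names introduced for illustration; `Info` stands in for `hb_glyph_info_t`, and its `ccc` field stands in for the value returned by `_hb_glyph_info_get_modified_combining_class()`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for hb_glyph_info_t: a code point plus its
// (modified) canonical combining class.
struct Info {
  unsigned int codepoint;
  unsigned int ccc;
};

// Sort every run of marks (ccc != 0) by ascending combining class.
// Runs longer than max_run are left untouched, mirroring the
// "end - i > 10" guard that avoids an O(n^2) DoS on long sequences.
static void
reorder_marks (std::vector<Info> &info, std::size_t max_run = 10)
{
  const std::size_t count = info.size ();
  for (std::size_t i = 0; i < count; i++)
  {
    if (info[i].ccc == 0)
      continue; // Starters are never moved.

    // Find the end of this run of marks.
    std::size_t end;
    for (end = i + 1; end < count; end++)
      if (info[end].ccc == 0)
        break;

    if (end - i <= max_run)
    {
      // Bubble sort. Swapping only strictly out-of-order neighbours keeps
      // marks with equal ccc in their original relative order (stable).
      for (std::size_t n = end - i; n > 1; n--)
        for (std::size_t j = i; j + 1 < i + n; j++)
          if (info[j].ccc > info[j + 1].ccc)
            std::swap (info[j], info[j + 1]);
    }

    i = end; // Skip past the run; info[end] is a starter or end of buffer.
  }
}
```

Under these assumptions, a buffer holding U+0061, U+0301 (ccc 230), U+0323 (ccc 220) would be reordered so that U+0323 precedes U+0301, while a run of eleven or more marks would pass through unchanged.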