Side by Side Diff: recipes/src/core/strings.md

Issue 12335109: Strings recipes for the Dart Cookbook (Closed) Base URL: https://github.com/dart-lang/cookbook.git@master
Patch Set: Made most changes requested by Kathy. Created 7 years, 9 months ago
1 # Strings
2
3 Dart strings are immutable: once you create a string, you cannot change it.
4 You can always build a string out of other strings, or assign the results
5 of calling a method on a string to a new string.
6
7 String literals can be written in three ways: with single quotes ('with
8 embedded "double" quotes'), with double quotes ("with embedded 'single'
9 quotes"), or with triple quotes ('''With single quotes''', """With double
10 quotes"""). Triple-quoted strings can span multiple lines with associated
11 whitespace preserved.
12
13 Dart does not have a char type. Indexing operations on strings give you
14 one-character strings.
15
16 Dart strings support string concatenation and expression interpolation. The
17 String class provides methods for searching inside a string, extracting
18 substrings, handling case, trimming whitespace, replacing a part of a
19 string, and more. The StringBuffer class lets you programmatically build
20 up a string in an efficient manner. You can use regular expressions
21 (RegExp objects) to search within strings and to replace parts of strings.
22
23 Dart string characters are encoded in UTF-16. Decoding UTF-16 yields Unicode
24 code points. Borrowing terminology from Go, Dart uses the term `rune` for an
25 integer representing a Unicode code point. The runes of a String are accessible
26 through the `runes` getter.
27
28 Dart strings support the full Unicode range, and cover every alphabetic system
29 in use in the whole world. The String library provides support for the correct
30 handling of extended UTF-16 characters.
31
32 ## Concatenating Strings
33
34 ### Problem
35
36 You want to concatenate strings in Dart. You tried using `+`, but
37 that resulted in an error.
38
39 ### Solution
40
41 Use adjacent string literals:
42
43 var fact = 'Dart ' 'is' ' fun!'; // 'Dart is fun!'
44
45 ### Discussion
46
47 Adjacent literals also work over multiple lines:
48
49 var fact = 'Dart '
50 'is '
51 'fun!'; // 'Dart is fun!'
52
53 They also work when using multiline strings:
54
55 var lunch = '''Peanut
56 butter'''
57 ''' and
58 jelly'''; // 'Peanut\nbutter and\njelly'
59
60 You can concatenate adjacent single line literals with multiline strings:
61
62 var funnyGuys = 'Dewey ' 'Cheatem'
63 ''' and
64 Howe'''; // 'Dewey Cheatem and\n Howe'
65
66
67 #### Alternatives to adjacent string literals
68
69 You can also use the `concat()` method on a string to concatenate it to another
70 string:
71
72 var film = filmToWatch();
73 film = film.concat('\n'); // 'The Big Lebowski\n'
74
75 Since `concat()` creates a new string every time it is invoked, a long chain of
76 `concat()`s can be expensive. Avoid those. Use a StringBuffer instead (see
77 _Incrementally building a string efficiently using a StringBuffer_, below).
78
79 You can use `join()` to combine a sequence of strings:
80
81 var film = ['The', 'Big', 'Lebowski'].join(' '); // 'The Big Lebowski'
82
83 You can also use string interpolation to concatenate strings (see
84 _Interpolating expressions inside strings_, below).
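
The concatenation styles above can be compared side by side. This is a minimal sketch, assuming a current Dart SDK; `fact`, `joinWords`, and `greet` are hypothetical names chosen for illustration and are not part of the cookbook.

```dart
// Hypothetical helpers illustrating the concatenation styles discussed above.

// Adjacent string literals are joined into a single string.
const fact = 'Dart ' 'is ' 'fun!';

// join() combines a sequence of strings, inserting a separator between them.
String joinWords(List<String> words) => words.join(' ');

// Interpolation embeds a value directly in a string literal.
String greet(String name) => 'Hello, $name!';

void main() {
  print(fact);
  print(joinWords(['The', 'Big', 'Lebowski']));
  print(greet('Dude'));
}
```

Note that adjacent literals do not insert any whitespace of their own, so each space has to appear inside one of the literals.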
85
86
87 ## Interpolating expressions inside strings
88
89 ### Problem
90
91 You want to create strings that contain Dart expressions and identifiers.
92
93 ### Solution
94
95 You can put the value of an expression inside a string by using ${expression}.
96
97 var favFood = 'sushi';
98 var whatDoILove = 'I love ${favFood.toUpperCase()}'; // 'I love SUSHI'
99
100 You can skip the {} if the expression is an identifier:
101
102 var whatDoILove = 'I love $favFood'; // 'I love sushi'
103
104 ### Discussion
105
106 An interpolated string, `string ${expression}` is equivalent to the
107 concatenation of the strings 'string ' and `expression.toString()`.
108 Consider this code:
109
110 var four = 4;
111 var seasons = 'The $four seasons'; // 'The 4 seasons'
112
113 It is equivalent to the following:
114
115 var seasons = 'The '.concat(4.toString()).concat(' seasons'); // 'The 4 seasons'
116
117 You should consider implementing a `toString()` method for user-defined
118 objects. Here's what happens if you don't:
119
120 class Point {
121 num x, y;
122 Point(this.x, this.y);
123 }
124
125 var point = new Point(3, 4);
126 print('Point: $point'); // "Point: Instance of 'Point'"
127
128 Probably not what you wanted. Here is the same example with an explicit
129 `toString()`:
130
131 class Point {
132 ...
133
134 String toString() => 'x: $x, y: $y';
135 }
136
137 print('Point: $point'); // 'Point: x: 3, y: 4'
138
139
140 ## Escaping special characters
141
142 ### Problem
143
144 You want to put newlines, dollar signs, or other special characters in your strings.
145
146 ### Solution
147
148 Prefix special characters with a `\`.
149
150 print('Wile\nCoyote');
151 // Wile
152 // Coyote
153
154 ### Discussion
155
156 Dart designates a few characters as special, and these can be escaped:
157
158 - \n for newline, equivalent to \x0A.
159 - \r for carriage return, equivalent to \x0D.
160 - \f for form feed, equivalent to \x0C.
161 - \b for backspace, equivalent to \x08.
162 - \t for tab, equivalent to \x09.
163 - \v for vertical tab, equivalent to \x0B.
164
165 If you prefer, you can use `\x` or `\u` notation to indicate the special
166 character:
167
168 print('Wile\x0ACoyote'); // same as print('Wile\nCoyote');
169 print('Wile\u000ACoyote'); // same as print('Wile\nCoyote');
170
171 You can also use `\u{}` notation:
172
173 print('Wile\u{000A}Coyote'); // same as print('Wile\nCoyote');
174
175 You can also escape the `$` used in string interpolation:
176
177 var superGenius = 'Wile Coyote';
178 print('$superGenius and Road Runner'); // 'Wile Coyote and Road Runner'
179 print('\$superGenius and Road Runner'); // '$superGenius and Road Runner'
180
181 If you escape a non-special character, the `\` is ignored:
182
183 print('Wile \E Coyote'); // 'Wile E Coyote'
184
185
186 ## Incrementally building a string efficiently using a StringBuffer
187
188 ### Problem
189
190 You want to collect string fragments and combine them in an efficient manner.
191
192 ### Solution
193
194 Use a StringBuffer to programmatically generate a string. A StringBuffer
195 collects the string fragments, but does not generate a new string until
196 `toString()` is called:
197
198 var sb = new StringBuffer();
199 sb.write('John, ');
200 sb.write('Paul, ');
201 sb.write('George, ');
202 sb.write('and Ringo');
203 var beatles = sb.toString(); // 'John, Paul, George, and Ringo'
204
205 ### Discussion
206
207 In addition to `write()`, the StringBuffer class provides methods to write a
208 list of strings (`writeAll()`), write a numerical character code
209 (`writeCharCode()`), write with an added newline (`writeln()`), and more. Here
210 is a simple example that shows the use of these methods:
211
212 var sb = new StringBuffer();
213 sb.writeln('The Beatles:');
214 sb.writeAll(['John, ', 'Paul, ', 'George, and Ringo']);
215 sb.writeCharCode(33); // charCode for '!'.
216 var beatles = sb.toString(); // 'The Beatles:\nJohn, Paul, George, and Ringo!'
217
218 Since a StringBuffer waits until the call to `toString()` to generate the
219 concatenated string, it represents a more efficient way of combining strings
220 than `concat()`. See the _Concatenating Strings_ recipe for a description of
221 `concat()`.
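
A StringBuffer is most useful when the number of fragments is not known in advance, for example inside a loop. The following is a small sketch under that assumption; `numberedList` is a hypothetical helper, not something defined in this chapter.

```dart
// Hypothetical helper: builds a numbered list, one item per line,
// collecting the fragments in a StringBuffer.
String numberedList(List<String> items) {
  final sb = StringBuffer();
  for (var i = 0; i < items.length; i++) {
    // writeln() appends the text followed by a newline.
    sb.writeln('${i + 1}. ${items[i]}');
  }
  // The combined string is only produced here.
  return sb.toString();
}

void main() {
  print(numberedList(['John', 'Paul', 'George', 'Ringo']));
}
```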
222
223 ## Converting between string characters and numerical codes
224
225 ### Problem
226
227 You want to convert string characters into numerical codes and back.
228
229 ### Solution
230
231 Use the `runes` getter to access a string's code points:
232
233 'Dart'.runes.toList(); // [68, 97, 114, 116]
234
235 var smileyFace = '\u263A'; // ☺
236 smileyFace.runes.toList(); // [9786]
237
238 The number 9786 represents the code point '\u263A'.
239
240 Use `string.codeUnits` to get a string's UTF-16 code units:
241
242 'Dart'.codeUnits.toList(); // [68, 97, 114, 116]
243 smileyFace.codeUnits.toList(); // [9786]
244
245 ### Discussion
246
247 Notice that using `runes` and `codeUnits` produces identical results
248 in the examples above. That happens because each character in 'Dart' and in
249 `smileyFace` fits within 16 bits, resulting in a code unit corresponding
250 neatly with a code point.
251
252 Consider an example where a character cannot be represented within 16-bits,
253 the Unicode character for a Treble clef ('\u{1F3BC}'). This character consists
254 of a surrogate pair: '\uD83C', '\uDFBC'. Getting the numerical value of this
255 character using `codeUnits` and `runes` produces the following result:
256
257 var clef = '\u{1F3BC}'; // 🎼
258 clef.codeUnits.toList(); // [55356, 57276]
259 clef.runes.toList(); // [127932]
260
261 The numbers 55356 and 57276 represent `clef`'s surrogate pair, '\uD83C' and
262 '\uDFBC', respectively. The number 127932 represents the code point '\u{1F3BC}'.
263
264 #### Using codeUnitAt() to access individual code units
265
266 To access the 16-bit UTF-16 code unit at a particular index, use
267 `codeUnitAt()`:
268
269 'Dart'.codeUnitAt(0); // 68
270 smileyFace.codeUnitAt(0); // 9786
271
272 Using `codeUnitAt()` with the non-BMP `clef` character leads to problems:
273
274 clef.codeUnitAt(0); // 55356
275 clef.codeUnitAt(1); // 57276
276
277 In either call to `clef.codeUnitAt()`, the values returned represent strings
278 that are only one half of a UTF-16 surrogate pair. These are not valid UTF-16
279 strings.
280
281
282 #### Converting numerical codes to strings
283
284 You can generate a new string from runes or code units using the factory
285 `String.fromCharCodes(charCodes)`:
286
287 new String.fromCharCodes([68, 97, 114, 116]); // 'Dart'
288
289 new String.fromCharCodes([73, 32, 9825, 32, 76, 117, 99, 121]);
290 // 'I ♡ Lucy'
291
292 new String.fromCharCodes([55356, 57276]); // 🎼
293 new String.fromCharCodes([127932]); // 🎼
294
295 You can use the `String.fromCharCode()` factory to convert a single rune or
296 code unit to a string:
297
298 new String.fromCharCode(68); // 'D'
299 new String.fromCharCode(9786); // ☺
300 new String.fromCharCode(127932); // 🎼
301
302 Creating a string with only one half of a surrogate pair is permitted, but not
303 recommended.
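
The conversions in this recipe can be combined into a round trip from a string to its code points and back. This is a minimal sketch, assuming a current Dart SDK; `roundTrip` is a hypothetical helper name.

```dart
// Hypothetical helper: string -> code points (runes) -> string.
// String.fromCharCodes accepts any iterable of ints, including `runes`,
// and encodes code points above 0xFFFF as surrogate pairs.
String roundTrip(String s) => String.fromCharCodes(s.runes);

void main() {
  const clef = '\u{1F3BC}';
  print(clef.runes.toList()); // one code point
  print(clef.codeUnits); // two code units
  print(roundTrip(clef) == clef);
}
```

Going through `runes` rather than `codeUnits` keeps each non-BMP character as a single number, which is usually easier to reason about.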
304
305 ## Determining if a string is empty
306
307 ### Problem
308
309 You want to know if a string is empty. You tried ` if(string) {...}`, but that
310 did not work.
311
312 ### Solution
313
314 Use `string.isEmpty`:
315
316 var emptyString = '';
317 emptyString.isEmpty; // true
318
319 A string with a space is not empty:
320
321 var space = ' ';
322 space.isEmpty; // false
323
324 ### Discussion
325
326 Don't use `if (string)` to test the emptiness of a string. In Dart, all
327 objects except the boolean true evaluate to false. `if(string)` will always
328 be false.
329
330
331 ## Removing leading and trailing whitespace
332
333 ### Problem
334
335 You want to remove leading and trailing whitespace from a string.
336
337 ### Solution
338
339 Use `string.trim()`:
340
341 var space = '\n\r\f\t\v'; // We'll use a variety of space characters.
342 var string = '$space X $space';
343 var newString = string.trim(); // 'X'
344
345 The String class has no methods to remove only leading or only trailing
346 whitespace. But you can always use regExps.
347
348 Remove only leading whitespace:
349
350 var newString = string.replaceFirst(new RegExp(r'^\s+'), ''); // 'X $space'
351
352 Remove only trailing whitespace:
353
354 var newString = string.replaceFirst(new RegExp(r'\s+$'), ''); // '$space X'
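
If the two regExps above are needed in several places, they can be wrapped in small functions. The sketch below assumes a current Dart SDK; `stripLeading` and `stripTrailing` are hypothetical names. Newer SDKs also provide `trimLeft()` and `trimRight()` on String, which may be preferable where available.

```dart
// Hypothetical helpers wrapping the regExps shown above.
final _leadingSpace = RegExp(r'^\s+');
final _trailingSpace = RegExp(r'\s+$');

// Removes whitespace at the start of the string only.
String stripLeading(String s) => s.replaceFirst(_leadingSpace, '');

// Removes whitespace at the end of the string only.
String stripTrailing(String s) => s.replaceFirst(_trailingSpace, '');

void main() {
  const padded = '\n\t X \t\n';
  print('[${stripLeading(padded)}]');
  print('[${stripTrailing(padded)}]');
}
```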
355
356
357 ## Calculating the length of a string
358
359 ### Problem
360
361 You want to get the length of a string, but are not sure how to
362 correctly calculate the length when working with Unicode.
363
364 ### Solution
365
366 Use string.length to get the number of UTF-16 code units in a string:
367
368 'I love music'.length; // 12
369 'I love music'.runes.length; // 12
370
371 ### Discussion
372
373 For characters that fit into 16 bits, the code unit length is the same as the
374 rune length:
375
376 var hearts = '\u2661'; // ♡
377 hearts.length; // 1
378 hearts.runes.length; // 1
379
380 If the string contains any characters outside the Basic Multilingual
381 Plane (BMP), the rune length will be less than the code unit length:
382
383 var clef = '\u{1F3BC}'; // 🎼
384 clef.length; // 2
385 clef.runes.length; // 1
386
387 var music = 'I $hearts $clef'; // 'I ♡ 🎼 '
388 music.length; // 6
389 music.runes.length; // 5
390
391 Use `length` if you want the number of code units; use `runes.length` if you
392 want the number of runes.
393
394
395 ## Subscripting a string
396
397 ### Problem
398
399 You want to be able to access a character in a string at a particular index.
400
401 ### Solution
402
403 Subscript runes:
404
405 var teacup = '\u{1F375}'; // 🍵
406 teacup.runes.toList()[0]; // 127861
407
408 The number 127861 represents the code point for teacup, '\u{1F375}' (🍵 ).
409
410 ### Discussion
411
412 Subscripting a string directly can be problematic. This is because the default
413 `[]` implementation subscripts along code units. This means that
414 for non-BMP characters, subscripting yields invalid UTF-16 characters:
415
416 'Dart'[0]; // 'D'
417
418 var hearts = '\u2661'; // ♡
419 hearts[0]; // '\u2661' (♡)
420
421 teacup[0]; // '\uD83C', an invalid string: half of a surrogate pair.
422 teacup.codeUnits.toList()[0]; // 55356, the code unit for that same half.
423
424
425 ## Processing a string one character at a time
426
427 ### Problem
428
429 You want to do something with each individual character in a string.
430
431 ### Solution
432
433 To access an individual character, map the string runes:
434
435 var charList = "Dart".runes.map((rune) => '*${new String.fromCharCode(rune)}*').toList();
436 // ['*D*', '*a*', '*r*', '*t*']
437
438 var runeList = happy.runes.map((rune) => [rune, new String.fromCharCode(rune)]).toList();
439 // [[73, 'I'], [32, ' '], [97, 'a'], [109, 'm'], [32, ' '], [9786, '☺']]
440
441 If you are sure that the string is in the Basic Multilingual Plane (BMP), you
442 can use string.split(''):
443
444 'Dart'.split(''); // ['D', 'a', 'r', 't']
445 smileyFace.split('').length; // 1
446
447 Since `split('')` splits at the UTF-16 code unit boundaries,
448 invoking it on a non-BMP character yields the string's surrogate pair:
449
450 var clef = '\u{1F3BC}'; // 🎼 , not in BMP.
451 clef.split('').length; // 2
452
453 The surrogate pair members are not valid UTF-16 strings.
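
The difference between the two approaches shows up as soon as a string contains a non-BMP character. This is a small sketch, assuming a current Dart SDK; `charsOf` is a hypothetical helper name. It splits along code points, so it does not attempt to group combining sequences into a single user-perceived character.

```dart
// Hypothetical helper: splits a string into one-character strings,
// one per code point, so surrogate pairs stay together.
List<String> charsOf(String s) =>
    s.runes.map((rune) => String.fromCharCode(rune)).toList();

void main() {
  const music = 'I \u{1F3BC}';
  print(charsOf(music).length); // counts code points
  print(music.split('').length); // counts code units
}
```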
454
455
456 ## Splitting a string into substrings
457
458 ### Problem
459
460 You want to split a string into substrings.
461
462 ### Solution
463
464 Use the `split()` method with a string or a regExp as an argument.
465
466 var smileyFace = '\u263A';
467 var happy = 'I am $smileyFace';
468 happy.split(' '); // ['I', 'am', '☺']
469
470 Here is an example of using `split()` with a regExp:
471
472 var nums = '2/7 3 4/5 3~/5';
473 var numsRegExp = new RegExp(r'(\s|/|~/)');
474 nums.split(numsRegExp); // ['2', '7', '3', '4', '5', '3', '5']
475
476 In the code above, the string `nums` contains various numbers, some of which
477 are expressed as fractions or as int-divisions. A regExp is used to split the
478 string to extract just the numbers.
479
480 You can perform operations on the matched and unmatched portions of a string
481 by using `splitMapJoin()` with a regExp:
482
483 'Eats SHOOTS leaves'.splitMapJoin((new RegExp(r'SHOOTS')),
484 onMatch: (m) => '*${m.group(0).toLowerCase()}*',
485 onNonMatch: (n) => n.toUpperCase()); // 'EATS *shoots* LEAVES'
486
487 The regExp matches the middle word ('SHOOTS'). A pair of callbacks are
488 registered to transform the matched and unmatched substrings before the
489 substrings are joined together again.
490
491
492 ## Changing string case
493
494 ### Problem
495
496 You want to change the case of strings.
497
498 ### Solution
499
500 Use `string.toUpperCase()` and `string.toLowerCase()` to convert a string to
501 upper-case or lower-case, respectively:
502
503 var theOneILove = 'I love Lucy';
504 theOneILove.toUpperCase(); // 'I LOVE LUCY'
505 theOneILove.toLowerCase(); // 'i love lucy'
506
507 ### Discussion
508
509 Case changes affect the characters of bi-cameral scripts like Greek and Latin (the script used for French):
510 var zeus = '\u0394\u03af\u03b1\u03c2'; // 'Δίας' (Zeus in modern Greek)
511 zeus.toUpperCase(); // 'ΔΊΑΣ'
512
513 var resume = '\u0052\u00e9\u0073\u0075\u006d\u00e9'; // 'Résumé'
514 resume.toLowerCase(); // 'résumé'
515
516 They do not affect the characters of uni-cameral scripts like Devanagari (used for
517 writing many of the languages of India):
518
519 var chickenKebab = '\u091a\u093f\u0915\u0928 \u0915\u092c\u093e\u092c';
520 // 'चिकन कबाब' (in Devanagari)
521 chickenKebab.toLowerCase(); // 'चिकन कबाब'
522 chickenKebab.toUpperCase(); // 'चिकन कबाब'
523
524 If a character's case does not change when using `toUpperCase()` and
525 `toLowerCase()`, it is most likely because the character only has one
526 form.
527
528 ## Determining whether a string contains another string
529
530 ### Problem
531
532 You want to find out if a string is a substring of another string.
533
534 ### Solution
535
536 Use `string.contains()`:
537
538 var string = 'Dart strings are immutable';
539 string.contains('immutable'); // True.
540
541 You can indicate a startIndex as a second argument:
542
543 string.contains('Dart', 2); // False
544
545 ### Discussion
546
547 The String library provides a couple of shortcuts for testing whether a string
548 is a substring of another:
549
550 string.startsWith('Dart'); // True.
551 string.endsWith('e'); // True.
552
553 You can also use `string.indexOf()`, which returns -1 if the substring is
554 not found within a string, and its matching index, if it is:
555
556 string.indexOf('art') != -1; // True, `art` is found in `Dart`
557
558 You can also use a regExp and `hasMatch()`:
559
560 new RegExp(r'ar[et]').hasMatch(string); // True, 'art' and 'are' match.
561
562
563 ## Finding matches of a regExp pattern in a string
564
565 ### Problem
566
567 You want to use regExp to match a pattern in a string, and
568 want to be able to access the matches.
569
570 ### Solution
571
572 Construct a regular expression using the RegExp class and find matches using
573 the `allMatches()` method:
574
575 var neverEatingThat = 'Not with a fox, not in a box';
576 var regExp = new RegExp(r'[fb]ox');
577 var matches = regExp.allMatches(neverEatingThat); // An Iterable of matches.
578 matches.map((match) => match.group(0)).toList(); // ['fox', 'box']
579
580 ### Discussion
581
582 You can query the object returned by `allMatches()` to find out the number of
583 matches:
584
585 matches.length; // 2
586
587 To find the first match, use `firstMatch()`:
588
589 regExp.firstMatch(neverEatingThat).group(0); // 'fox'
590
591 To directly access the matched string, use `stringMatch()`:
592
593 regExp.stringMatch(neverEatingThat); // 'fox'
594 regExp.stringMatch('I like bagels and lox'); // null
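
Collecting the matched substrings is a common enough task to factor out. The following sketch assumes a current, null-safe Dart SDK, where `group(0)` is typed as nullable even though it always holds the full match; `matchesOf` is a hypothetical helper name.

```dart
// Hypothetical helper: returns every substring of `input` that
// matches `regExp`, in order of appearance.
List<String> matchesOf(RegExp regExp, String input) =>
    regExp.allMatches(input).map((m) => m.group(0)!).toList();

void main() {
  final regExp = RegExp(r'[fb]ox');
  print(matchesOf(regExp, 'Not with a fox, not in a box'));
  print(matchesOf(regExp, 'I like bagels and lox'));
}
```

Returning an empty list when nothing matches avoids the null check that `stringMatch()` requires.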
595
596
597 ## Substituting strings based on regExp matches
598
599 ### Problem
600
601 You want to match substrings within a string and make substitutions based on
602 the matches.
603
604 ### Solution
605
606 Construct a regular expression using the RegExp class and make replacements
607 using the `replaceAll()` method:
608
609 'resume'.replaceAll(new RegExp(r'e'), '\u00E9'); // 'résumé'
610
611 If you want to replace just the first match, use `replaceFirst()`:
612
613 '0.0001'.replaceFirst(new RegExp(r'0+'), ''); // '.0001'
614
615 The regExp matches a run of one or more 0s; the first run is replaced with ''.
616
617 You can use `replaceAllMapped()` and register a function to modify the
618 matches:
619
620 var heart = '\u2661'; // '♡'
621 var string = 'I like Ike but I $heart Lucy';
622 var regExp = new RegExp(r'[A-Z]\w+');
623 string.replaceAllMapped(regExp, (match) => match.group(0).toUpperCase());
624 // 'I like IKE but I ♡ LUCY'
625 ==============================================================================
626
627
628 The string recipes included in this chapter assume that you have some
629 familiarity with Unicode and UTF-16. Here is a brief refresher:
630
631 ### What is the Basic Multilingual Plane?
632
633 The Unicode code space is divided into seventeen planes of 65,536 points each.
634 The first plane (code points U+0000 to U+FFFF) contains the most
635 frequently used characters and is called the Basic Multilingual Plane or BMP.
636
637 ### What is a Surrogate Pair?
638
639 The term 'surrogate pair' refers to a means of encoding Unicode characters
640 outside the Basic Multilingual Plane.
641
642 In UTF-16, two-byte (16-bit) code units are used to store Unicode
643 characters. Since two bytes can only represent the 65,536 values in the 0x0
644 to 0xFFFF range, a pair of code units is used to store values in the
645 0x10000 to 0x10FFFF range.
646
647 For example, the Unicode character for the musical Treble clef (🎼), with
648 a value of '\u{1F3BC}', is too large to fit in 16 bits.
649
650 var clef = '\u{1F3BC}'; // 🎼
651
652 '\u{1F3BC}' is composed of a UTF-16 surrogate pair: [\uD83C, \uDFBC].
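
The surrogate pair above can be derived by hand: subtract 0x10000 from the code point, then add the top 10 bits of the result to 0xD800 and the bottom 10 bits to 0xDC00. The sketch below works through that arithmetic; `surrogatePair` is a hypothetical function written for illustration, since Dart performs this encoding internally.

```dart
// Hypothetical helper: computes the UTF-16 surrogate pair for a
// code point outside the Basic Multilingual Plane.
List<int> surrogatePair(int codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw ArgumentError.value(
        codePoint, 'codePoint', 'must be in the range 0x10000 to 0x10FFFF');
  }
  final offset = codePoint - 0x10000; // a 20-bit value
  final high = 0xD800 + (offset >> 10); // top 10 bits
  final low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

void main() {
  final pair = surrogatePair(0x1F3BC);
  // Print both halves in hexadecimal.
  print(pair.map((unit) => unit.toRadixString(16)).toList());
}
```

For 0x1F3BC the offset is 0xF3BC, whose top 10 bits are 0x3C and bottom 10 bits are 0x3BC, giving 0xD83C and 0xDFBC.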
653
654 ### What is the difference between a code point and a code unit?
655
656 Within the Basic Multilingual Plane, the code point for a character is
657 numerically the same as the code unit for that character.
658
659 'D'.runes.first; // 68
660 'D'.codeUnits.first; // 68
661
662 For non-BMP characters, each code point is represented by two code units.
663
664 var clef = '\u{1F3BC}'; // 🎼
665 clef.runes.length; // 1
666 clef.codeUnits.length; // 2
667
668 ### What exactly is a character?
669
670 A character is a string contained in the Universal Character Set.
671 Each character maps to a single rune value (code point); BMP characters
672 map to 1 code unit; non-BMP characters map to 2 code units.
673
674 You can read more about the Universal Character Set at
675 http://en.wikipedia.org/wiki/Universal_Character_Set.
676
677