recipes/src/core/strings.md - Issue 12335109: Strings recipes for the Dart Cookbook


Issue 12335109: Strings recipes for the Dart Cookbook (Closed) Base URL: https://github.com/dart-lang/cookbook.git@master

Patch Set: Made most changes requested my Kathy. Created 7 years, 9 months ago




# Strings

Dart strings are immutable: once you create a string, you cannot change it.
You can always build a new string out of other strings, or assign the result
of calling a method on a string to a new variable.

String literals can be written in three ways: with single quotes ('with
embedded "double" quotes'), with double quotes ("with embedded 'single'
quotes"), or with triple quotes ('''With single quotes''', """With double
quotes"""). Triple-quoted strings can span multiple lines, and their line
breaks and whitespace are preserved.

Dart does not have a char type. Indexing a string gives you a one-character
string (more precisely, a string containing a single UTF-16 code unit).

Dart strings support concatenation and expression interpolation. The
String class provides methods for searching inside a string, extracting
substrings, changing case, trimming whitespace, replacing part of a
string, and more. The StringBuffer class lets you build up a string
programmatically and efficiently. You can use regular expressions
(RegExp objects) to search within strings and to replace parts of strings.

Dart strings are sequences of UTF-16 code units. Decoding UTF-16 yields Unicode
code points. Borrowing terminology from Go, Dart uses the term `rune` for an
integer representing a Unicode code point. The runes of a string are accessible
through the `runes` getter.

Dart strings support the full Unicode range, so they can represent text in
every writing system that Unicode covers. The String class also provides
support, such as the `runes` getter, for correctly handling characters that
UTF-16 encodes as surrogate pairs.

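A small sketch of the literal forms and the indexing behavior described above, assuming Dart 3; the constant and function names are illustrative only:

```dart
// Illustrative constants (names are hypothetical) showing the literal forms.
const singleQuoted = 'with embedded "double" quotes';
const doubleQuoted = "with embedded 'single' quotes";

// A triple-quoted literal keeps its line break and the two leading spaces
// of its second line.
const tripleQuoted = '''First line
  second line''';

// Indexing returns a String of length 1; there is no separate char type.
String firstCharacter(String s) => s[0];
```

For example, `firstCharacter('Dart')` returns the string `'D'`.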
## Concatenating Strings

### Problem

You want to concatenate strings in Dart. You tried using `+`, but
that resulted in an error.

### Solution

Use adjacent string literals:

    var fact = 'Dart ' 'is' ' fun!'; // 'Dart is fun!'

### Discussion

Adjacent literals also work over multiple lines:

    var fact = 'Dart '
        'is '
        'fun!'; // 'Dart is fun!'

They also work when using multiline strings:

    var lunch = '''Peanut
    butter'''
    ''' and
    jelly'''; // 'Peanut\nbutter and\njelly'

You can concatenate adjacent single-line literals with multiline strings:

    var funnyGuys = 'Dewey ' 'Cheatem'
    ''' and
    Howe'''; // 'Dewey Cheatem and\nHowe'

#### Alternatives to adjacent string literals

You can also use the `concat()` method on a string to concatenate it to another
string:

    var film = filmToWatch(); // Suppose this returns 'The Big Lebowski'.
    film = film.concat('\n'); // 'The Big Lebowski\n'

Because `concat()` creates a new string every time it is invoked, a long chain of
`concat()` calls can be expensive. Avoid such chains; use a StringBuffer instead (see
_Incrementally building a string efficiently using a StringBuffer_, below).

You can use `join()` to combine a sequence of strings:

    var film = ['The', 'Big', 'Lebowski'].join(' '); // 'The Big Lebowski'

You can also use string interpolation to concatenate strings (see
_Interpolating expressions inside strings_, below).

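The adjacent-literal and `join()` techniques can be checked with a short sketch, assuming Dart 3; the function names are hypothetical. Note that adjacent literals are glued together exactly as written, so any separating spaces must appear inside the literals:

```dart
/// Adjacent literals are joined into one string; the spaces come from
/// inside the literals themselves.
String adjacentFact() => 'Dart ' 'is' ' fun!';

/// Iterable.join() puts the separator only between elements.
String titleFromWords(List<String> words) => words.join(' ');

/// A single-line literal followed by a triple-quoted one.
String funnyGuys() => 'Dewey ' 'Cheatem' ''' and
Howe''';
```

For instance, `titleFromWords(['The', 'Big', 'Lebowski'])` yields `'The Big Lebowski'`.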
## Interpolating expressions inside strings

### Problem

You want to create strings that contain the values of Dart expressions and identifiers.

### Solution

You can put the value of an expression inside a string by using `${expression}`:

    var favFood = 'sushi';
    var whatDoILove = 'I love ${favFood.toUpperCase()}'; // 'I love SUSHI'

You can omit the `{}` if the expression is an identifier:

    var whatDoILove = 'I love $favFood'; // 'I love sushi'

### Discussion

An interpolated string, `'string ${expression}'`, is equivalent to the
concatenation of the strings `'string '` and `expression.toString()`.
Consider this code:

    var four = 4;
    var seasons = 'The $four seasons'; // 'The 4 seasons'

It is equivalent to the following:

    var seasons = 'The '.concat(four.toString()).concat(' seasons'); // 'The 4 seasons'

You should consider implementing a `toString()` method for user-defined
classes. Here's what happens if you don't:

    class Point {
      num x, y;
      Point(this.x, this.y);
    }

    var point = new Point(3, 4);
    print('Point: $point'); // "Point: Instance of 'Point'"

Probably not what you wanted. Here is the same example with an explicit
`toString()`:

    class Point {
      ...

      @override
      String toString() => 'x: $x, y: $y';
    }

    print('Point: $point'); // 'Point: x: 3, y: 4'

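Here is a self-contained version of the `toString()` example above, as a sketch assuming Dart 3; `describe()` and `shout()` are hypothetical helpers. Interpolation calls `toString()` on the embedded object, so overriding it controls what appears in the string:

```dart
/// A minimal Point class that overrides toString().
class Point {
  final num x, y;
  const Point(this.x, this.y);

  @override
  String toString() => 'x: $x, y: $y';
}

/// Interpolation calls toString() on the Point.
String describe(Point p) => 'Point: $p';

/// The expression inside ${} is evaluated first, then converted.
String shout(String food) => 'I love ${food.toUpperCase()}';
```

Calling `describe(const Point(3, 4))` returns `'Point: x: 3, y: 4'`.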
## Escaping special characters

### Problem

You want to put newlines, dollar signs, or other special characters in your strings.

### Solution

Prefix special characters with a `\`:

    print('Wile\nCoyote');
    // Wile
    // Coyote

### Discussion

Dart designates a few characters as special, and these can be escaped:

- `\n` for newline, equivalent to `\x0A`.
- `\r` for carriage return, equivalent to `\x0D`.
- `\f` for form feed, equivalent to `\x0C`.
- `\b` for backspace, equivalent to `\x08`.
- `\t` for tab, equivalent to `\x09`.
- `\v` for vertical tab, equivalent to `\x0B`.

If you prefer, you can use `\x` notation (exactly two hex digits) or `\u`
notation (exactly four hex digits) to indicate the special character:

    print('Wile\x0ACoyote'); // same as print('Wile\nCoyote');
    print('Wile\u000ACoyote'); // same as print('Wile\nCoyote');

You can also use `\u{}` notation, which accepts one to six hex digits and so
can also express characters outside the BMP:

    print('Wile\u{000A}Coyote'); // same as print('Wile\nCoyote');

You can also escape the `$` used in string interpolation:

    var superGenius = 'Wile Coyote';
    print('$superGenius and Road Runner'); // 'Wile Coyote and Road Runner'
    print('\$superGenius and Road Runner'); // '$superGenius and Road Runner'

If you escape a non-special character, the `\` is ignored:

    print('Wile \E Coyote'); // 'Wile E Coyote'

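The equivalences listed above can be verified with a short sketch, assuming Dart 3; the function names are hypothetical:

```dart
/// Every spelling of a newline escape produces the same one-character string.
bool newlineSpellingsAgree() {
  const spellings = ['\n', '\x0A', '\u000A', '\u{000A}', '\u{A}'];
  return spellings.every((s) => s == '\n');
}

/// A backslash before `$` suppresses interpolation.
String literalDollar() => '\$superGenius and Road Runner';

/// Escaping a non-special character just yields that character.
String ignoredEscape() => 'Wile \E Coyote';
```

The raw string `r'$superGenius and Road Runner'` is another way to write the value `literalDollar()` returns, because raw strings perform no escaping or interpolation.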
## Incrementally building a string efficiently using a StringBuffer

### Problem

You want to collect string fragments and combine them in an efficient manner.

### Solution

Use a StringBuffer to programmatically generate a string. A StringBuffer
collects the string fragments, but does not generate a new string until
`toString()` is called:

    var sb = new StringBuffer();
    sb.write('John, ');
    sb.write('Paul, ');
    sb.write('George, ');
    sb.write('and Ringo');
    var beatles = sb.toString(); // 'John, Paul, George, and Ringo'

### Discussion

In addition to `write()`, the StringBuffer class provides methods to write a
collection of strings (`writeAll()`), write a numerical character code
(`writeCharCode()`), write with an added newline (`writeln()`), and more. Here
is a simple example that shows the use of these methods:

    var sb = new StringBuffer();
    sb.writeln('The Beatles:');
    sb.writeAll(['John, ', 'Paul, ', 'George, and Ringo']);
    sb.writeCharCode(33); // charCode for '!'.
    var beatles = sb.toString(); // 'The Beatles:\nJohn, Paul, George, and Ringo!'

Since a StringBuffer waits until the call to `toString()` to generate the
concatenated string, it is a more efficient way of combining many strings
than a chain of `concat()` calls. See the _Concatenating Strings_ recipe for a
description of `concat()`.

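A sketch of the same ideas packaged as functions, assuming Dart 3. `englishList()` is a hypothetical helper that uses the optional separator argument of `writeAll()`, and it assumes at least three names:

```dart
/// Builds "A, B, and C" style lists. Assumes three or more names.
String englishList(List<String> names) {
  final sb = StringBuffer();
  // writeAll() writes each element, inserting the separator between them.
  sb.writeAll(names.sublist(0, names.length - 1), ', ');
  sb.write(', and ');
  sb.write(names.last);
  return sb.toString(); // The final string is created only here.
}

/// The writeln/writeAll/writeCharCode example, written with cascades.
String beatlesBanner() {
  final sb = StringBuffer()
    ..writeln('The Beatles:')
    ..writeAll(['John, ', 'Paul, ', 'George, and Ringo'])
    ..writeCharCode(33); // 33 is the code for '!'.
  return sb.toString();
}
```

For example, `englishList(['John', 'Paul', 'George', 'Ringo'])` produces `'John, Paul, George, and Ringo'`.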
## Converting between string characters and numerical codes

### Problem

You want to convert string characters into numerical codes and back.

### Solution

Use the `runes` getter to access a string's code points:

    'Dart'.runes.toList(); // [68, 97, 114, 116]

    var smileyFace = '\u263A'; // ☺
    smileyFace.runes.toList(); // [9786]

The number 9786 is the code point U+263A (`'\u263A'`).

Use `string.codeUnits` to get a string's UTF-16 code units:

    'Dart'.codeUnits.toList(); // [68, 97, 114, 116]
    smileyFace.codeUnits.toList(); // [9786]

### Discussion

Notice that using `runes` and `codeUnits` produces identical results
in the examples above. That happens because each character in 'Dart' and in
`smileyFace` fits within 16 bits, so each code unit corresponds exactly to
one code point.

Consider an example where a character cannot be represented within 16 bits:
the Unicode character MUSICAL SCORE (`'\u{1F3BC}'`), which shows a treble clef
on a staff. This character is encoded as a surrogate pair: `'\uD83C'`,
`'\uDFBC'`. Getting the numerical value of this character using `codeUnits`
and `runes` produces the following result:

    var clef = '\u{1F3BC}'; // 🎼
    clef.codeUnits.toList(); // [55356, 57276]
    clef.runes.toList(); // [127932]

The numbers 55356 and 57276 represent `clef`'s surrogate pair, `'\uD83C'` and
`'\uDFBC'`, respectively. The number 127932 represents the code point
`'\u{1F3BC}'`.

#### Using codeUnitAt() to access individual code units

To access the 16-bit UTF-16 code unit at a particular index, use
`codeUnitAt()`:

    'Dart'.codeUnitAt(0); // 68
    smileyFace.codeUnitAt(0); // 9786

Using `codeUnitAt()` with the non-BMP `clef` character, which occupies two
code units, leads to problems:

    clef.codeUnitAt(0); // 55356
    clef.codeUnitAt(1); // 57276

Neither value returned by `clef.codeUnitAt()` is a code point: each is only
one half of a UTF-16 surrogate pair, and a string containing just one of them
is not valid UTF-16.

#### Converting numerical codes to strings

You can generate a new string from runes or code units using the factory
constructor `String.fromCharCodes(charCodes)`:

    new String.fromCharCodes([68, 97, 114, 116]); // 'Dart'

    new String.fromCharCodes([73, 32, 9825, 32, 76, 117, 99, 121]);
    // 'I ♡ Lucy'

    new String.fromCharCodes([55356, 57276]); // 🎼
    new String.fromCharCodes([127932]); // 🎼

You can use the `String.fromCharCode()` factory constructor to convert a
single rune or code unit to a string:

    new String.fromCharCode(68); // 'D'
    new String.fromCharCode(9786); // ☺
    new String.fromCharCode(127932); // 🎼

Creating a string with only one half of a surrogate pair is permitted, but not
recommended.

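The two numeric views and the round trip back to a string can be summarized in a sketch, assuming Dart 3 and well-formed input (no lone surrogates); the helper names are hypothetical:

```dart
/// UTF-16 code units; a non-BMP character contributes two values.
List<int> codeUnitsOf(String s) => s.codeUnits;

/// Unicode code points (runes); every character contributes one value.
List<int> codePointsOf(String s) => s.runes.toList();

/// String.fromCharCodes accepts either list: code points above 0xFFFF are
/// re-encoded as surrogate pairs, and an intact pair of code units is kept
/// as-is, so both conversions give back the original string.
bool roundTrips(String s) =>
    String.fromCharCodes(codeUnitsOf(s)) == s &&
    String.fromCharCodes(codePointsOf(s)) == s;
```

With `'\u{1F3BC}'`, `codeUnitsOf()` returns `[55356, 57276]` and `codePointsOf()` returns `[127932]`, matching the values shown above.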
## Determining if a string is empty

### Problem

You want to know if a string is empty. You tried `if (string) {...}`, but that
did not work.

### Solution

Use `string.isEmpty`:

    var emptyString = '';
    emptyString.isEmpty; // true

A string with a space is not empty:

    var space = ' ';
    space.isEmpty; // false

### Discussion

Don't use `if (string)` to test the emptiness of a string. Dart has no
"truthy" values. In early versions of Dart, every object other than the
boolean `true` was treated as false in a condition, so `if (string)` was
always false. Since Dart 2, a condition must have type `bool`, so
`if (string)` is a compile-time error.

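A sketch that contrasts `isEmpty` with a whitespace-only check, assuming Dart 3. `isBlank()` is a hypothetical helper built from `trim()` and `isEmpty`; it is not a String method:

```dart
/// True only for the empty string ''.
bool isEmptyString(String s) => s.isEmpty;

/// Hypothetical "blank" test: true for '' and for whitespace-only strings,
/// which isEmpty alone reports as non-empty.
bool isBlank(String s) => s.trim().isEmpty;
```

`String` also offers `isNotEmpty`, which reads better than `!s.isEmpty` in conditions.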
## Removing leading and trailing whitespace

### Problem

You want to remove leading and trailing whitespace from a string.

### Solution

Use `string.trim()`:

    var space = '\n\r\f\t\v'; // A variety of whitespace characters.
    var string = '$space X $space';
    var newString = string.trim(); // 'X'

Older versions of the String class had no methods to remove only leading or
only trailing whitespace; current versions provide `trimLeft()` and
`trimRight()`. You can also use regular expressions.

Remove only leading whitespace:

    var newString = string.replaceFirst(new RegExp(r'^\s+'), ''); // 'X $space'

Remove only trailing whitespace:

    var newString = string.replaceFirst(new RegExp(r'\s+$'), ''); // '$space X'

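The RegExp approach can be wrapped in helpers and compared with the built-in `trimLeft()` and `trimRight()`, as a sketch assuming Dart 3 (the helper names are hypothetical). For the ASCII whitespace used here both approaches agree; the RegExp `\s` class and `trim()` may not classify every Unicode whitespace character identically:

```dart
/// Removes only leading whitespace. `^` anchors at the start of the string.
String trimLeading(String s) => s.replaceFirst(RegExp(r'^\s+'), '');

/// Removes only trailing whitespace. The first match of `\s+$` is the
/// whitespace run that reaches the end of the string.
String trimTrailing(String s) => s.replaceFirst(RegExp(r'\s+$'), '');
```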
## Calculating the length of a string

### Problem

You want to get the length of a string, but are not sure how to
calculate the length correctly when working with Unicode.

### Solution

Use `string.length` to get the number of UTF-16 code units in a string, and
`string.runes.length` to get the number of runes (code points):

    'I love music'.length; // 12
    'I love music'.runes.length; // 12

### Discussion

For characters that fit into 16 bits, the code unit length is the same as the
rune length:

    var hearts = '\u2661'; // ♡
    hearts.length; // 1
    hearts.runes.length; // 1

If the string contains any characters outside the Basic Multilingual
Plane (BMP), the rune length will be less than the code unit length:

    var clef = '\u{1F3BC}'; // 🎼
    clef.length; // 2
    clef.runes.length; // 1

    var music = 'I $hearts $clef'; // 'I ♡ 🎼'
    music.length; // 6
    music.runes.length; // 5

Use `length` if you want the number of code units; use `runes.length` if you
want the number of runes.

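Because each non-BMP character adds exactly one extra code unit, the difference between the two lengths counts such characters. A sketch, assuming Dart 3 and well-formed strings; the helper names are hypothetical:

```dart
/// Number of UTF-16 code units (what String.length reports).
int codeUnitLength(String s) => s.length;

/// Number of Unicode code points.
int runeLength(String s) => s.runes.length;

/// Each non-BMP character occupies two code units but is one rune,
/// so the difference counts the non-BMP characters.
int nonBmpCount(String s) => codeUnitLength(s) - runeLength(s);
```

For `'I \u2661 \u{1F3BC}'`, the lengths are 6 and 5, and `nonBmpCount()` returns 1.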
## Subscripting a string

### Problem

You want to access the character at a particular index in a string.

### Solution

Subscript the runes:

    var teacup = '\u{1F375}'; // 🍵
    teacup.runes.toList()[0]; // 127861

The number 127861 is the code point for the teacup character, `'\u{1F375}'` (🍵).

### Discussion

Subscripting a string directly can be problematic, because the `[]` operator
indexes by code units. For non-BMP characters, this means subscripting yields
half of a surrogate pair, which is not a valid UTF-16 string on its own:

    'Dart'[0]; // 'D'

    var hearts = '\u2661'; // ♡
    hearts[0]; // '\u2661' (♡)

    teacup[0]; // '\uD83C': invalid on its own, half of a surrogate pair.
    teacup.codeUnits[0]; // 55356, the code unit of that same half.

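A hypothetical helper that subscripts by rune index and returns a string, sketched for Dart 3. Unlike `s[i]`, it walks the runes, so it takes time proportional to `i`:

```dart
/// Returns the character at rune index [i] as a String.
/// Hypothetical helper; throws a RangeError if [i] is out of bounds.
String charAtRune(String s, int i) =>
    String.fromCharCode(s.runes.elementAt(i));
```

In `'a\u{1F375}b'`, `charAtRune(s, 2)` is `'b'`, whereas `s[2]` is the low surrogate of the teacup.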
## Processing a string one character at a time

### Problem

You want to do something with each individual character in a string.

### Solution

To process each character, map over the string's runes:

    var charList = 'Dart'.runes.map((rune) => new String.fromCharCode(rune)).toList();
    // ['D', 'a', 'r', 't']

    var happy = 'I am $smileyFace';
    var runeList = happy.runes.map((rune) => [rune, new String.fromCharCode(rune)]).toList();
    // [[73, 'I'], [32, ' '], [97, 'a'], [109, 'm'], [32, ' '], [9786, '☺']]

If you are sure that the string contains only Basic Multilingual Plane (BMP)
characters, you can use `string.split('')`:

    'Dart'.split(''); // ['D', 'a', 'r', 't']
    smileyFace.split('').length; // 1

Since `split('')` splits at UTF-16 code unit boundaries, invoking it on a
non-BMP character yields the two halves of its surrogate pair:

    var clef = '\u{1F3BC}'; // 🎼, not in the BMP.
    clef.split('').length; // 2

The surrogate pair members are not valid UTF-16 strings on their own.

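The rune-based approach can be wrapped in a reusable function, as a sketch assuming Dart 3; `charactersOf()` is a hypothetical name. It keeps non-BMP characters in one piece, which `split('')` does not:

```dart
/// Splits [s] into one string per code point.
List<String> charactersOf(String s) =>
    s.runes.map((rune) => String.fromCharCode(rune)).toList();
```

`charactersOf('\u{1F3BC}')` has length 1, while `'\u{1F3BC}'.split('')` has length 2.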
## Splitting a string into substrings

### Problem

You want to split a string into substrings.

### Solution

Use the `split()` method with a string or a RegExp as an argument:

    var smileyFace = '\u263A';
    var happy = 'I am $smileyFace';
    happy.split(' '); // ['I', 'am', '☺']

Here is an example of using `split()` with a RegExp:

    var nums = '2/7 3 4/5 3~/5';
    var numsRegExp = new RegExp(r'\s|/|~/');
    nums.split(numsRegExp); // ['2', '7', '3', '4', '5', '3', '5']

In the code above, the string `nums` contains various numbers, some of which
are expressed as fractions or as integer divisions (`~/`). The RegExp matches
a whitespace character, a `/`, or a `~/`, so splitting on it extracts just the
numbers.

To transform the matched and unmatched portions of a string instead of
discarding the matches, use `splitMapJoin()` with a RegExp:

    'Eats SHOOTS leaves'.splitMapJoin(new RegExp(r'SHOOTS'),
        onMatch: (m) => m.group(0)!.toLowerCase(),
        onNonMatch: (n) => n.toUpperCase()); // 'EATS shoots LEAVES'

The RegExp matches the middle word ('SHOOTS'). A pair of callbacks is
registered to transform the matched and unmatched substrings before the
substrings are joined together again.

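Both techniques as small functions, sketched for Dart 3 with hypothetical names. `numbersIn()` also drops empty pieces, which `split()` would produce if two separators were adjacent; that filtering is an added assumption, not part of the recipe above:

```dart
/// Extracts the number tokens from an expression by splitting on runs of
/// whitespace, '~/', or '/', discarding any empty pieces.
List<String> numbersIn(String expr) =>
    expr.split(RegExp(r'\s+|~/|/')).where((p) => p.isNotEmpty).toList();

/// Lower-cases the matches of [word] and upper-cases everything else.
String invertEmphasis(String s, Pattern word) => s.splitMapJoin(
      word,
      onMatch: (m) => m[0]!.toLowerCase(),
      onNonMatch: (n) => n.toUpperCase(),
    );
```

`invertEmphasis('Eats SHOOTS leaves', 'SHOOTS')` gives `'EATS shoots LEAVES'`; a plain String works here because String implements Pattern.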
## Changing string case

### Problem

You want to change the case of strings.

### Solution

Use `string.toUpperCase()` and `string.toLowerCase()` to convert a string to
upper case or lower case, respectively:

    var theOneILove = 'I love Lucy';
    theOneILove.toUpperCase(); // 'I LOVE LUCY'
    theOneILove.toLowerCase(); // 'i love lucy'

### Discussion

Case changes affect the characters of bicameral scripts like Greek and Latin
(the script used to write French, among other languages):

    var zeus = '\u0394\u03af\u03b1\u03c2'; // 'Δίας' (Zeus in modern Greek)
    zeus.toUpperCase(); // 'ΔΊΑΣ'

    var resume = '\u0052\u00e9\u0073\u0075\u006d\u00e9'; // 'Résumé'
    resume.toLowerCase(); // 'résumé'

They do not affect the characters of unicameral scripts like Devanagari (used
for writing many of the languages of India):

    var chickenKebab = '\u091a\u093f\u0915\u0928 \u0915\u092c\u093e\u092c';
    // 'चिकन कबाब' (in Devanagari)
    chickenKebab.toLowerCase(); // 'चिकन कबाब'
    chickenKebab.toUpperCase(); // 'चिकन कबाब'

If a character's case does not change when using `toUpperCase()` and
`toLowerCase()`, it is most likely because the character only has one
form.

## Determining whether a string contains another string

### Problem

You want to find out if a string is a substring of another string.

### Solution

Use `string.contains()`:

    var fact = 'Dart strings are immutable';
    fact.contains('immutable'); // true

You can indicate a start index as a second argument:

    fact.contains('Dart', 2); // false

### Discussion

The String class provides a couple of shortcuts for testing whether a string
starts or ends with another string:

    fact.startsWith('Dart'); // true
    fact.endsWith('e'); // true

You can also use `string.indexOf()`, which returns -1 if the substring is
not found within a string, and its matching index if it is:

    fact.indexOf('art') != -1; // true: 'art' is found in 'Dart'

You can also use a RegExp and `hasMatch()`:

    new RegExp(r'ar[et]').hasMatch(fact); // true: 'art' and 'are' match.

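The three substring tests above agree for any literal substring, which a sketch can check (Dart 3 assumed; the function names are hypothetical). `RegExp.escape()` is used so that characters such as `.` in the search text are matched literally:

```dart
bool viaContains(String s, String sub) => s.contains(sub);

bool viaIndexOf(String s, String sub) => s.indexOf(sub) != -1;

/// Escapes [sub] so it is matched literally rather than as a pattern.
bool viaRegExp(String s, String sub) =>
    RegExp(RegExp.escape(sub)).hasMatch(s);
```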
## Finding matches of a regExp pattern in a string

### Problem

You want to use a RegExp to match a pattern in a string, and
want to be able to access the matches.

### Solution

Construct a regular expression using the RegExp class and find matches using
the `allMatches()` method:

    var neverEatingThat = 'Not with a fox, not in a box';
    var regExp = new RegExp(r'[fb]ox');
    var matches = regExp.allMatches(neverEatingThat);
    matches.map((match) => match.group(0)).toList(); // ['fox', 'box']

### Discussion

You can query the Iterable returned by `allMatches()` to find out the number
of matches:

    matches.length; // 2

To find the first match, use `firstMatch()`, which returns null if there is
no match:

    regExp.firstMatch(neverEatingThat)!.group(0); // 'fox'

To directly access the matched string, use `stringMatch()`:

    regExp.stringMatch(neverEatingThat); // 'fox'
    regExp.stringMatch('I like bagels and lox'); // null

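A sketch collecting both the matched text and the match positions, assuming Dart 3; the helper names are hypothetical. Each element of `allMatches()` is a match object with `start` and `end` offsets in addition to its groups:

```dart
/// Every substring of [input] matched by [pattern].
List<String> allMatchedStrings(RegExp pattern, String input) =>
    pattern.allMatches(input).map((m) => m[0]!).toList();

/// The start offset of each match, for example to highlight matches.
List<int> matchStarts(RegExp pattern, String input) =>
    pattern.allMatches(input).map((m) => m.start).toList();
```

For the fox-and-box sentence, the matches are `'fox'` and `'box'`, starting at offsets 11 and 25.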
## Substituting strings based on regExp matches

### Problem

You want to match substrings within a string and make substitutions based on
the matches.

### Solution

Construct a regular expression using the RegExp class and make replacements
using the `replaceAll()` method:

    'resume'.replaceAll(new RegExp(r'e'), '\u00E9'); // 'résumé'

If you want to replace just the first match, use `replaceFirst()`:

    '0.0001'.replaceFirst(new RegExp(r'0+'), ''); // '.0001'

The RegExp matches a run of one or more 0s; `replaceFirst()` replaces only
the first such run (the leading `0`) with an empty string.

You can use `replaceAllMapped()` and register a function to modify the
matches:

    var heart = '\u2661'; // '♡'
    var string = 'I like Ike but I $heart Lucy';
    var regExp = new RegExp(r'[A-Z]\w+');
    string.replaceAllMapped(regExp, (match) => match.group(0)!.toUpperCase());
    // 'I like IKE but I ♡ LUCY'

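The substitutions above as reusable functions, sketched for Dart 3 with hypothetical names. `stripLeadingZeros()` adds a `^` anchor, a variation on the recipe's pattern, so that only zeros at the very start are removed:

```dart
/// Upper-cases every capitalized word of two or more word characters.
String shoutProperNames(String s) =>
    s.replaceAllMapped(RegExp(r'[A-Z]\w+'), (m) => m[0]!.toUpperCase());

/// Removes zeros only at the start of the string (note the ^ anchor).
String stripLeadingZeros(String s) => s.replaceFirst(RegExp(r'^0+'), '');
```

`replaceAll()` also accepts a plain String pattern, so `'resume'.replaceAll('e', '\u00E9')` produces the same `'résumé'`.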
* * *

The string recipes included in this chapter assume that you have some
familiarity with Unicode and UTF-16. Here is a brief refresher:

### What is the Basic Multilingual Plane?

The Unicode code space is divided into seventeen planes of 65,536 code points
each. The first plane (code points U+0000 to U+FFFF) contains the most
frequently used characters and is called the Basic Multilingual Plane, or BMP.

### What is a Surrogate Pair?

The term 'surrogate pair' refers to a means of encoding Unicode characters
outside the Basic Multilingual Plane.

In UTF-16, two-byte (16-bit) code units are used to store Unicode
characters. Since one code unit can hold only the 65,536 values in the 0x0
to 0xFFFF range, a pair of code units (a high surrogate in the range 0xD800
to 0xDBFF, followed by a low surrogate in the range 0xDC00 to 0xDFFF) is used
to store values in the 0x10000 to 0x10FFFF range.

For example, the Unicode character MUSICAL SCORE (🎼), with a value of
`'\u{1F3BC}'`, is too large to fit in 16 bits.

    var clef = '\u{1F3BC}'; // 🎼

`'\u{1F3BC}'` is composed of a UTF-16 surrogate pair: `[\uD83C, \uDFBC]`.

### What is the difference between a code point and a code unit?

Within the Basic Multilingual Plane, the code point for a character is
numerically the same as the code unit for that character.

    'D'.runes.first; // 68
    'D'.codeUnits.first; // 68

For non-BMP characters, each code point is represented by two code units.

    var clef = '\u{1F3BC}'; // 🎼
    clef.runes.length; // 1
    clef.codeUnits.length; // 2

### What exactly is a character?

In this chapter, a character is a single element of the Universal Character
Set. Each character maps to a single rune value (code point); BMP characters
map to one code unit, and non-BMP characters map to two code units.

You can read more about the Universal Character Set at
http://en.wikipedia.org/wiki/Universal_Character_Set.

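The surrogate-pair encoding described in this refresher is plain arithmetic: subtract 0x10000, put the top 10 bits of the 20-bit result in the high surrogate and the bottom 10 bits in the low surrogate. Here is a sketch of the standard UTF-16 formulas in Dart 3 (the function names are hypothetical), which can be checked against `codeUnits`:

```dart
/// Encodes a code point in U+10000..U+10FFFF as [high, low] surrogates.
List<int> toSurrogatePair(int codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw RangeError.range(codePoint, 0x10000, 0x10FFFF, 'codePoint');
  }
  final offset = codePoint - 0x10000; // A 20-bit value.
  return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)];
}

/// The inverse: combines a high and a low surrogate into one code point.
int fromSurrogatePair(int high, int low) =>
    0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
```

For `0x1F3BC`, the offset is `0xF3BC`, giving the pair `[0xD83C, 0xDFBC]` (55356 and 57276), the same values `clef.codeUnits` reports.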