recipes/src/core/strings.md - Issue 12335109: Strings recipes for the Dart Cookbook


Issue 12335109: Strings recipes for the Dart Cookbook (Closed) Base URL: https://github.com/dart-lang/cookbook.git@master

Patch Set: Total rewrite of string recipes. Created 7 years, 9 months ago

# Strings

A Dart string represents a sequence of characters encoded in UTF-16. Decoding
UTF-16 yields Unicode code points. Borrowing terminology from Go, Dart uses
the term `rune` for an integer representing a Unicode code point.

> Alan Knight (2013/03/07 21:15:00): A Dart string
>
> shailentuli (2013/03/08 22:38:26): Done.

The string recipes included in this chapter assume that you have some
familiarity with Unicode and UTF-16. Here is a brief refresher:

> Alan Knight (2013/03/07 21:15:00): I agree with Erik's comments that this is not what you want to start with. I'd probably start with "I want to construct strings that have dynamic content": answer interpolation, and then go into concatenation, StringBuffers, escaping, etc. I think it's important to put the preferred answer first, as people reading the doc are likely to stop once they see one answer that does what they want. I'd mention at the beginning that they are UTF-16 and refer to a later section on Unicode and runes. Also, right now there are multiple different sections that deal with the issues around codeUnits versus runes. I think all of that discussion should be grouped together.

### What is the Basic Multilingual Plane?

The Unicode code space is divided into seventeen planes of 65,536 code points
each. The first plane (code points U+0000 to U+FFFF) contains the most
frequently used characters and is called the Basic Multilingual Plane, or BMP.

> floitsch (2013/03/07 17:22:07): Should we stick to "rune" (here and in the rest)? It is basically the first thing we explain in the section.

### What is a Surrogate Pair?

The term 'surrogate pair' refers to a means of encoding Unicode characters
outside the Basic Multilingual Plane.

In UTF-16, characters are stored as 16-bit (two-byte) code units. Since a single
code unit can hold only the 65,536 values in the 0x0 to 0xFFFF range, a pair
of code units is used to store code points in the 0x10000 to 0x10FFFF range.

For example, the Unicode character for a musical score (🎼), code point
U+1F3BC, is too large to fit in 16 bits:

```dart
var clef = '\u{1F3BC}'; // 🎼
```

In UTF-16, '\u{1F3BC}' is encoded as the surrogate pair '\uD83C', '\uDFBC'.

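The surrogate-pair arithmetic described above can be sketched in Dart as follows. `toSurrogatePair()` and `fromSurrogatePair()` are hypothetical helpers for illustration, not part of the Dart core library, and the sketch assumes its input is a valid supplementary code point (0x10000 to 0x10FFFF):

```dart
/// Hypothetical helper: splits a supplementary code point
/// (0x10000..0x10FFFF) into its UTF-16 high and low surrogates.
List<int> toSurrogatePair(int codePoint) {
  final offset = codePoint - 0x10000; // a 20-bit value
  final high = 0xD800 + (offset >> 10); // top 10 bits
  final low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

/// Hypothetical helper: recombines a high and a low surrogate
/// into the code point they encode.
int fromSurrogatePair(int high, int low) =>
    0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);

void main() {
  final pair = toSurrogatePair(0x1F3BC); // 🎼
  print(pair.map((u) => u.toRadixString(16)).toList()); // [d83c, dfbc]
  print(fromSurrogatePair(pair[0], pair[1]).toRadixString(16)); // 1f3bc
  // Dart's own UTF-16 encoding agrees with the hand-computed pair.
  print('\u{1F3BC}'.codeUnits); // [55356, 57276]
}
```
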
### What is the difference between a code point and a code unit?

Within the Basic Multilingual Plane, the code point for a character is
numerically the same as the code unit for that character.

> floitsch (2013/03/07 17:22:07): as the code unit for that character.
>
> shailentuli (2013/03/08 22:38:26): Done.

```dart
'D'.runes.first; // 68
'D'.codeUnits.first; // 68
```

For non-BMP characters, each code point is represented by two code units.

```dart
var clef = '\u{1F3BC}'; // 🎼
clef.runes.length; // 1
clef.codeUnits.length; // 2
```

### What exactly is a character?

A character is an element of the Universal Character Set. Each character
maps to a single rune value (code point); BMP characters map to 1 code
unit; non-BMP characters map to 2 code units.

> floitsch (2013/03/07 17:22:07): map to
>
> shailentuli (2013/03/08 22:38:26): Done.

You can read more about the Universal Character Set at
http://en.wikipedia.org/wiki/Universal_Character_Set.

### Do I really have to deal with Unicode?

Yes, if you want to build robust international applications, you do.
Besides, the String library makes working with Unicode relatively painless,
so there's no great overhead in doing things right.

> Alan Knight (2013/03/07 21:15:00): This seems confusing. Dealing with Unicode is not the same as dealing with non-UTF-16 characters. I'm not entirely convinced about the second sentence, but I do think that most people can ignore the UTF-16 issues most of the time and get by fine, even when they deal with non-BMP data. It's really only if you split the data out and need to deal with it as semantic characters that it's a problem, and even then you would still run into normalization issues that Florian refers to with e-accent-aigu.
>
> shailentuli (2013/03/08 22:38:26): You are quite right. I am removing this section. When I write a full introduction to begin this chapter, I will include some of these issues there.

## Concatenating Strings

### Problem

You want to concatenate strings in Dart. You tried using `+`, but
that resulted in an error.

### Solution

Use adjacent string literals:

```dart
var fact = 'Dart ' 'is' ' fun!'; // 'Dart is fun!'
```

### Discussion

Adjacent literals also work over multiple lines:

```dart
var fact = 'Dart '
    'is '
    'fun!'; // 'Dart is fun!'
```

They also work when using multiline strings:

```dart
var lunch = '''Peanut
butter'''
    ''' and
jelly'''; // 'Peanut\nbutter and\njelly'
```

You can concatenate adjacent single-line literals with multiline strings:

```dart
var funnyGuys = 'Dewey ' 'Cheatem'
    ''' and
Howe'''; // 'Dewey Cheatem and\nHowe'
```

#### Alternatives to adjacent string literals

You can also use the `concat()` method on a string to concatenate it to another
string:

```dart
var film = filmToWatch();
film = film.concat('\n'); // 'The Big Lebowski\n'
```

Since `concat()` creates a new string every time it is invoked, a long chain of
`concat()` calls can be expensive, so avoid them. Use a StringBuffer instead (see
_Incrementally building a string efficiently using a StringBuffer_, below).

You can use `join()` to combine a sequence of strings:

```dart
var film = ['The', 'Big', 'Lebowski'].join(' '); // 'The Big Lebowski'
```

You can also use string interpolation to concatenate strings (see
_Interpolating expressions inside strings_, below).

## Interpolating expressions inside strings

### Problem

You want to build a string whose content depends on the values of variables
or other expressions.

> sethladd (2013/03/07 05:15:16): This sounds like I already know what the solution is. Can you reword this to sound like someone asking who doesn't know what Dart is?

### Solution

You can put the value of an expression inside a string by using `${expression}`:

```dart
var favFood = 'sushi';
var whatDoILove = 'I love ${favFood.toUpperCase()}'; // 'I love SUSHI'
```

You can omit the `{}` if the expression is an identifier:

```dart
var whatDoILove = 'I love $favFood'; // 'I love sushi'
```

### Discussion

An interpolated string such as `'string ${expression}'` produces the same
result as concatenating the string `'string '` with `expression.toString()`.
Consider this code:

```dart
var four = 4;
var seasons = 'The $four seasons'; // 'The 4 seasons'
```

It produces the same result as the following:

> floitsch (2013/03/07 17:22:07): I'm not sure we should call it "equivalent". It should be more efficient.

```dart
var seasons = 'The '.concat(four.toString()).concat(' seasons'); // 'The 4 seasons'
```

You should consider implementing a `toString()` method for user-defined
classes. Here's what happens if you don't:

```dart
class Point {
  num x, y;
  Point(this.x, this.y);
}

var point = new Point(3, 4);
print('Point: $point'); // "Point: Instance of 'Point'"
```

That is probably not what you wanted. Here is the same example with an explicit
`toString()`:

```dart
class Point {
  num x, y;
  Point(this.x, this.y);

  String toString() => 'x: $x, y: $y';
}

var point = new Point(3, 4);
print('Point: $point'); // 'Point: x: 3, y: 4'
```

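Interpolation accepts any expression, including a conditional, and it calls `toString()` on whatever the expression produces. The minimal sketch below assumes a hypothetical helper, `describeCount()`, that chooses a naive English plural inline, and a `Point` class similar to the one above; it uses current Dart syntax (no `new`, an `@override` annotation):

```dart
/// Hypothetical helper: formats a count with a naive English plural
/// (it only appends 's'), using a conditional expression inside `${}`.
String describeCount(int count, String noun) =>
    '$count $noun${count == 1 ? '' : 's'}';

class Point {
  final num x, y;
  Point(this.x, this.y);

  // Without this override, interpolation would print "Instance of 'Point'".
  @override
  String toString() => '($x, $y)';
}

void main() {
  print(describeCount(1, 'season')); // 1 season
  print(describeCount(4, 'season')); // 4 seasons
  print('Origin: ${Point(0, 0)}'); // Origin: (0, 0)
}
```
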
## Escaping special characters

> sethladd (2013/03/07 05:15:16): Is this the right usage of the terminology? This is not how I think of escaping. The problem here is vague and the solution doesn't look like escaping... I would think the problem would be "how do I insert a newline character?" with the solution you use here.
>
> shailentuli (2013/03/08 22:38:26): I've used Alan's recommendation below.

### Problem

You want to put newlines, dollar signs, or other special characters in your
strings.

> Alan Knight (2013/03/07 21:15:00): Better phrased in terms of the problem, e.g. "You want to put newlines, dollar signs, or other special characters in your strings."
>
> shailentuli (2013/03/08 22:38:26): Done.

### Solution

Prefix special characters with a `\`.

```dart
print('Wile\nCoyote');
// Wile
// Coyote
```

### Discussion

Dart designates a few characters as special, and these can be escaped:

- `\n` for newline, equivalent to `\x0A`.
- `\r` for carriage return, equivalent to `\x0D`.
- `\f` for form feed, equivalent to `\x0C`.
- `\b` for backspace, equivalent to `\x08`.
- `\t` for tab, equivalent to `\x09`.
- `\v` for vertical tab, equivalent to `\x0B`.

> Alan Knight (2013/03/07 21:15:00): Also `\$`. And I don't see any mention of raw strings here.
>
> shailentuli (2013/03/08 22:38:26): I'm saving them for a separate recipe which will follow this one.

If you prefer, you can use `\x`, `\u`, or `\u{}` notation to indicate the special
character:

> floitsch (2013/03/07 17:22:07): We also have `\u{A}`
>
> shailentuli (2013/03/08 22:38:26): I've added an example of this.

```dart
print('Wile\x0ACoyote'); // same as print('Wile\nCoyote');
print('Wile\u000ACoyote'); // same as print('Wile\nCoyote');
print('Wile\u{A}Coyote'); // same as print('Wile\nCoyote');
```

If you escape a non-special character, the `\` is ignored:

```dart
print('Wile \E Coyote'); // 'Wile E Coyote'
```

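Because `$` starts an interpolation, a literal dollar sign is one of the most common characters to escape. The following minimal sketch (the `price` constant is hypothetical) contrasts an escaped `\$` with an interpolated value, escapes a quote and a backslash, and checks that the `\x`, `\u`, and `\u{}` spellings produce the same newline as `\n`:

```dart
// Hypothetical value for illustration.
const price = 5;

// `\$` produces a literal dollar sign; `$price` interpolates the constant.
const label = 'Cost: \$$price';

// Escaped single quote and escaped backslash.
const quoted = 'It\'s a \\ backslash';

// Four spellings of the same newline character.
const newlines = ['\n', '\x0A', '\u000A', '\u{A}'];

void main() {
  print(label); // Cost: $5
  print(quoted); // It's a \ backslash
  print(newlines.every((s) => s == '\n')); // true
}
```
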
## Incrementally building a string efficiently using a StringBuffer

### Problem

You want to collect string fragments and combine them in an efficient manner.

### Solution

Use a StringBuffer to programmatically generate a string. A StringBuffer
collects the string fragments, but does not generate a new string until
`toString()` is called:

```dart
var sb = new StringBuffer();
sb.write('John, ');
sb.write('Paul, ');
sb.write('George, ');
sb.write('and Ringo');
var beatles = sb.toString(); // 'John, Paul, George, and Ringo'
```

> sethladd (2013/03/07 05:15:16): This example doesn't show off the right way to use a string buffer, because what you have here could easily be written with adjacent string literals. How about a for loop that builds a string?

### Discussion

In addition to `write()`, the StringBuffer class provides methods to write a
list of strings (`writeAll()`), write a numerical character code
(`writeCharCode()`), write with an added newline (`writeln()`), and more. Here
is a simple example that shows the use of these methods:

```dart
var sb = new StringBuffer();
sb.writeln('The Beatles:');
sb.writeAll(['John, ', 'Paul, ', 'George, and Ringo']);
sb.writeCharCode(33); // charCode for '!'.
var beatles = sb.toString(); // 'The Beatles:\nJohn, Paul, George, and Ringo!'
```

Since a StringBuffer waits until the call to `toString()` to generate the
concatenated string, it is a more efficient way of combining strings
than `concat()`. See the _Concatenating Strings_ recipe for a description of
`concat()`.

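Following the reviewer's suggestion, a StringBuffer pays off when the number of fragments is only known at run time, for example inside a loop. This is a minimal sketch; `numberedList()` is a hypothetical helper, not a library function:

```dart
/// Hypothetical helper: builds a numbered, newline-terminated list.
String numberedList(List<String> items) {
  final sb = StringBuffer();
  for (var i = 0; i < items.length; i++) {
    // Each line is appended to the buffer instead of being
    // concatenated onto a growing result string.
    sb.writeln('${i + 1}. ${items[i]}');
  }
  return sb.toString(); // The combined string is produced here.
}

void main() {
  print(numberedList(['John', 'Paul', 'George', 'Ringo']));
}
```
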
## Converting between string characters and numerical codes

### Problem

You want to convert string characters into numerical codes and back.

> sethladd (2013/03/07 05:15:16): Is there a more real-life problem here? Why would I want to do this ever?
>
> Alan Knight (2013/03/07 21:15:00): I need to compare characters in a string to numerical values coming from another source. Or I need to split a string up into individual characters, or operate on the individual characters. If the semantics of the characters are important, and the set isn't known to be restricted, then I need to do it in terms of runes.

### Solution

Use the `runes` getter to access a string's code points:

```dart
'Dart'.runes.toList(); // [68, 97, 114, 116]

var smileyFace = '\u263A'; // ☺
smileyFace.runes.toList(); // [9786]
```

The number 9786 represents the code point '\u263A'.

Use `string.codeUnits` to get a string's UTF-16 code units:

```dart
'Dart'.codeUnits.toList(); // [68, 97, 114, 116]
smileyFace.codeUnits.toList(); // [9786]
```

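One practical reason to work with numerical codes, as the reviewer notes above, is comparing characters against numeric values. The sketch below counts ASCII digits by comparing each rune with the code points for '0' (48, 0x30) and '9' (57, 0x39); `countAsciiDigits()` is a hypothetical helper:

```dart
/// Hypothetical helper: counts the ASCII digits '0'..'9' in [s]
/// by comparing each rune with the digits' code points.
int countAsciiDigits(String s) {
  const zero = 0x30; // code point of '0' (48)
  const nine = 0x39; // code point of '9' (57)
  return s.runes.where((r) => r >= zero && r <= nine).length;
}

void main() {
  print(countAsciiDigits('Dart 3 \u263A 2013')); // 5
  print('0'.runes.first == 0x30); // true
}
```
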
### Discussion

Notice that using `runes` and `codeUnits` produces identical results
in the examples above. That happens because each character in 'Dart' and in
`smileyFace` fits within 16 bits, so each code unit corresponds
neatly to a code point.

> floitsch (2013/03/07 17:22:07): no () for codeUnits
>
> shailentuli (2013/03/08 22:38:26): Done.

Consider an example where a character cannot be represented within 16 bits:
the Unicode character for a musical score ('\u{1F3BC}'). This character consists
of a surrogate pair: '\uD83C', '\uDFBC'. Getting the numerical value of this
character using `codeUnits` and `runes` produces the following result:

> floitsch (2013/03/07 17:22:07): no "()".
>
> shailentuli (2013/03/08 22:38:26): Done.

```dart
var clef = '\u{1F3BC}'; // 🎼
clef.codeUnits.toList(); // [55356, 57276]
clef.runes.toList(); // [127932]
```

The numbers 55356 and 57276 represent `clef`'s surrogate pair, '\uD83C' and
'\uDFBC', respectively. The number 127932 represents the code point '\u{1F3BC}'.

#### Using codeUnitAt() to access individual code units

To access the 16-bit UTF-16 code unit at a particular index, use
`codeUnitAt()`:

```dart
'Dart'.codeUnitAt(0); // 68
smileyFace.codeUnitAt(0); // 9786
```

Using `codeUnitAt()` with the non-BMP `clef` character leads to problems:

```dart
clef.codeUnitAt(0); // 55356
clef.codeUnitAt(1); // 57276
```

In either call to `clef.codeUnitAt()`, the value returned is only one half of a
UTF-16 surrogate pair and does not represent a valid character on its own.
Surrogate code units occupy the reserved range 0xD800 to 0xDFFF, which no
character uses, so half of a surrogate pair can never be mistaken for a valid
character.

> Alan Knight (2013/03/07 21:15:00): It's worth pointing out that this is always the case. Half of a surrogate pair can never be mistaken for a single valid character.

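Because surrogates occupy their own reserved range, you can detect them by value. This sketch assumes hypothetical helpers, `isHighSurrogate()`, `isLowSurrogate()`, and `startsSurrogatePair()`, based on the ranges 0xD800 to 0xDBFF (high) and 0xDC00 to 0xDFFF (low); they are not Dart core library functions:

```dart
/// Hypothetical helpers based on the reserved UTF-16 surrogate ranges.
bool isHighSurrogate(int codeUnit) => codeUnit >= 0xD800 && codeUnit <= 0xDBFF;
bool isLowSurrogate(int codeUnit) => codeUnit >= 0xDC00 && codeUnit <= 0xDFFF;

/// Returns true if the code unit at [index] is the first half of a pair,
/// meaning the full character spans [index] and [index] + 1.
bool startsSurrogatePair(String s, int index) =>
    index + 1 < s.length &&
    isHighSurrogate(s.codeUnitAt(index)) &&
    isLowSurrogate(s.codeUnitAt(index + 1));

void main() {
  var clef = '\u{1F3BC}'; // 🎼
  print(isHighSurrogate(clef.codeUnitAt(0))); // true
  print(isLowSurrogate(clef.codeUnitAt(1))); // true
  print(startsSurrogatePair('D$clef', 0)); // false
  print(startsSurrogatePair('D$clef', 1)); // true
}
```
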
#### Converting numerical codes to strings

You can generate a new string from runes or code units using the factory
`String.fromCharCodes(charCodes)`. The list can contain runes, UTF-16 code
units, or both: a surrogate pair given as two code units produces the same
string as the single rune it encodes.

> Alan Knight (2013/03/07 21:15:00): I think it's important to point out that you can give it either runes or code units and it can tell the difference and do the right thing automatically.

```dart
new String.fromCharCodes([68, 97, 114, 116]); // 'Dart'

new String.fromCharCodes([73, 32, 9825, 32, 76, 117, 99, 121]);
// 'I ♡ Lucy'

new String.fromCharCodes([55356, 57276]); // 🎼
new String.fromCharCodes([127932]); // 🎼
```

You can use the `String.fromCharCode()` factory to convert a single rune or
code unit to a string:

```dart
new String.fromCharCode(68); // 'D'
new String.fromCharCode(9786); // ☺
new String.fromCharCode(127932); // 🎼
```

Creating a string with only one half of a surrogate pair is permitted, but not
recommended.

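A quick way to see that `String.fromCharCodes()` accepts either form is a round trip: rebuilding a string from its `runes` or from its `codeUnits` yields the original string. The `roundTrips()` helper below is hypothetical, for illustration only:

```dart
/// Hypothetical check: rebuilds [s] from its runes and from its
/// UTF-16 code units and reports whether both match the original.
bool roundTrips(String s) =>
    String.fromCharCodes(s.runes) == s &&
    String.fromCharCodes(s.codeUnits) == s;

void main() {
  var music = 'I \u2661 \u{1F3BC}'; // 'I ♡ 🎼'
  print(music.runes.length); // 5
  print(music.codeUnits.length); // 6
  print(roundTrips(music)); // true
}
```
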
## Determining if a string is empty

### Problem

You want to know if a string is empty. You tried `if (string) {...}`, but that
did not work.

> sethladd (2013/03/07 05:15:16): This is a great problem, well worded!

### Solution

Use `string.isEmpty`:

> floitsch (2013/03/07 17:22:07): or just string == "". Both are fine.

```dart
var emptyString = '';
emptyString.isEmpty; // true
```

A string with a space is not empty:

```dart
var space = ' ';
space.isEmpty; // false
```

### Discussion

Don't use `if (string)` to test the emptiness of a string. In Dart, all
objects except the boolean `true` evaluate to false, so `if (string)` will
always be false.

> Alan Knight (2013/03/07 21:15:00): And you will see a warning in the editor if you use an "if" statement with a non-boolean.

Comparing against an empty string literal also works, although `isEmpty`
states the intent more directly:

```dart
if (anotherString == '') {...}
```

Neither check treats a string that contains only whitespace as empty:

```dart
emptyString == '\u0020'; // false (space)
emptyString == '\u2004'; // false (three-per-em space)
```

> Alan Knight (2013/03/07 21:15:00): Are you saying that the string '\u0020' is supposed to be considered empty? I wouldn't have thought so, and it doesn't appear to work that way if I test it. I don't see why emptyString == anotherString wouldn't work. But I also don't see why you'd want to do that instead of anotherString.isEmpty
>
> shailentuli (2013/03/08 22:38:26): This was erroneously added here. Removed.

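If whitespace-only input should count as empty (for example, a form field that contains only spaces), one common approach is to trim first. `isBlank()` is a hypothetical helper name, not a String method:

```dart
/// Hypothetical helper: true for '' and for strings that contain
/// only whitespace, as defined by String.trim().
bool isBlank(String s) => s.trim().isEmpty;

void main() {
  print(''.isEmpty); // true
  print(' '.isEmpty); // false
  print(isBlank(' ')); // true
  print(isBlank('\u2004\n')); // true
  print(isBlank(' x ')); // false
}
```
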
## Removing leading and trailing whitespace

### Problem

You want to remove leading and trailing whitespace from a string.

### Solution

Use `string.trim()`:

```dart
var space = '\n\r\f\t\v'; // We'll use a variety of whitespace characters.
var string = '$space X $space';
var newString = string.trim(); // 'X'
```

The String class has no methods to remove just leading or just trailing
whitespace, but you can use regular expressions.

> Alan Knight (2013/03/07 21:15:00): This seems unclear. I think you mean "to remove *just* leading or just trailing whitespace". As written it sounds like it's contradicting the lines immediately previous.
>
> shailentuli (2013/03/08 22:38:26): Done.

Remove only leading whitespace:

```dart
var newString = string.replaceFirst(new RegExp(r'^\s+'), ''); // 'X $space'
```

Remove only trailing whitespace:

```dart
var newString = string.replaceFirst(new RegExp(r'\s+$'), ''); // '$space X'
```

> Alan Knight (2013/03/07 21:15:00): Or you could do this with the runes or codePoints, but RegExp is likely a good bit more efficient.

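The two regular expressions above can be wrapped in small functions. This is a minimal sketch with hypothetical names, `stripLeading()` and `stripTrailing()`; later Dart SDKs also provide `trimLeft()` and `trimRight()` on String, which are likely preferable where available. The set of characters matched by `\s` may differ slightly from the set removed by `trim()`, so the comparison in `main()` uses only common whitespace:

```dart
// Compile the patterns once rather than on every call.
final _leadingWhitespace = RegExp(r'^\s+');
final _trailingWhitespace = RegExp(r'\s+$');

/// Hypothetical helper: removes whitespace at the start of [s] only.
String stripLeading(String s) => s.replaceFirst(_leadingWhitespace, '');

/// Hypothetical helper: removes whitespace at the end of [s] only.
String stripTrailing(String s) => s.replaceFirst(_trailingWhitespace, '');

void main() {
  print('[${stripLeading('  X  ')}]'); // [X  ]
  print('[${stripTrailing('  X  ')}]'); // [  X]

  var padded = '\n\t X \t\n';
  print(stripLeading(padded) == padded.trimLeft()); // true
  print(stripTrailing(padded) == padded.trimRight()); // true
}
```
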
## Calculating the length of a string

### Problem

You want to get the length of a string, but are not sure how to
correctly calculate the length when working with Unicode.

### Solution

Use `string.length` to get the number of UTF-16 code units in a string, and
`string.runes.length` to get the number of runes:

```dart
'I love music'.length; // 12
'I love music'.runes.length; // 12
```

### Discussion

For characters that fit into 16 bits, the code unit length is the same as the
rune length:

> floitsch (2013/03/07 17:22:07): bits
>
> shailentuli (2013/03/08 22:38:26): Done.

```dart
var hearts = '\u2661'; // ♡
hearts.length; // 1
hearts.runes.length; // 1
```

If the string contains any characters outside the Basic Multilingual
Plane (BMP), the rune length will be less than the code unit length:

```dart
var clef = '\u{1F3BC}'; // 🎼
clef.length; // 2
clef.runes.length; // 1

var music = 'I $hearts $clef'; // 'I ♡ 🎼'
music.length; // 6
music.runes.length; // 5
```

Use `length` if you want the number of code units; use `runes.length` if you
want the number of runes (code points).

> floitsch (2013/03/07 17:22:07): But what is a "character"? For example "é" is not necessarily just one rune...
>
> Alan Knight (2013/03/07 21:15:00): Yes, we probably need yet another section on these, and a reference from here.
>
> shailentuli (2013/03/08 22:38:26): Changing to runes.

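The difference matters when you cut a string to a maximum length: truncating by code units can split a surrogate pair, whereas truncating by runes cannot. `truncateRunes()` is a hypothetical helper; like the rest of this recipe it counts runes, not user-perceived characters, so a combining accent can still be separated from its base letter:

```dart
/// Hypothetical helper: keeps at most [maxRunes] code points of [s].
String truncateRunes(String s, int maxRunes) =>
    String.fromCharCodes(s.runes.take(maxRunes));

void main() {
  var music = 'I \u2661 \u{1F3BC}'; // 'I ♡ 🎼'

  // Cutting by code units leaves a lone high surrogate at the end.
  var cutByUnits = music.substring(0, 5);
  print(cutByUnits.codeUnitAt(4).toRadixString(16)); // d83c

  // Cutting by runes keeps whole code points.
  print(truncateRunes(music, 5) == music); // true
  print(truncateRunes(music, 3)); // I ♡
}
```
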
## Subscripting a string

### Problem

You want to be able to access a character in a string at a particular index.

### Solution

Convert the string's runes to a list and subscript the list:

```dart
var teacup = '\u{1F375}'; // 🍵
teacup.runes.toList()[0]; // 127861
```

The number 127861 represents the code point for a teacup without handle,
'\u{1F375}' (🍵).

> Alan Knight (2013/03/07 21:15:00): Technically I think that's "Teacup without handle". Which is a shame, the idea of a Unicode character for coffee is very appealing.
>
> shailentuli (2013/03/08 22:38:26): You are quite correct. Renaming to teacup.

### Discussion

Subscripting a string directly can be problematic. This is because the default
`[]` implementation subscripts along code units. This means that
for non-BMP characters, subscripting yields invalid UTF-16 characters:

```dart
'Dart'[0]; // 'D'

var hearts = '\u2661'; // ♡
hearts[0]; // '\u2661' (♡)

teacup[0]; // '\uD83C': invalid string, half of a surrogate pair.
teacup.codeUnits.toList()[0]; // 55356, the same code unit as a number.
```

> Alan Knight (2013/03/07 21:15:00): I think the recommended answer is that you just use indexing, with a caveat that if you have non-BMP characters and that's what you're worried about then you need to work in terms of runes or split the string up into individual characters. For the great majority of uses simple indexing will be fine, even if there are non-BMP characters.

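If you need to index by rune repeatedly, converting once avoids re-walking the string on every lookup. The sketch below uses a hypothetical helper, `runeStrings()`, that returns each rune as its own string so the result can be subscripted directly; as the reviewer notes, plain `[]` indexing remains fine when the text is known to contain only BMP characters:

```dart
/// Hypothetical helper: splits [s] into one string per rune, so that
/// subscripting the result never yields half of a surrogate pair.
List<String> runeStrings(String s) =>
    s.runes.map((r) => String.fromCharCode(r)).toList();

void main() {
  var tea = 'Tea: \u{1F375}'; // 'Tea: 🍵'
  print(tea.length); // 7 (code units)
  var chars = runeStrings(tea);
  print(chars.length); // 6 (runes)
  print(chars[5] == '\u{1F375}'); // true
  print(tea[5] == '\u{1F375}'); // false: tea[5] is a lone high surrogate
}
```
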
## Processing a string one character at a time

### Problem

You want to do something with each individual character in a string.

### Solution

To access individual characters, map the string's runes:

```dart
var charList = 'Dart'.runes.map((rune) => new String.fromCharCode(rune)).toList();
// ['D', 'a', 'r', 't']

var happy = 'I am \u263A'; // 'I am ☺'
var runeList = happy.runes.map((rune) => [rune, new String.fromCharCode(rune)]).toList();
// [[73, 'I'], [32, ' '], [97, 'a'], [109, 'm'], [32, ' '], [9786, '☺']]
```

> floitsch (2013/03/07 17:22:07): I'm questioning the utility of these strings. "é" could be decomposed into two runes. One string would be an "e" the other the combining accent.

If you are sure that the string is in the Basic Multilingual Plane (BMP), you
can use `string.split('')`:

> floitsch (2013/03/07 17:22:07): But then you can also just index into the string.

```dart
'Dart'.split(''); // ['D', 'a', 'r', 't']
smileyFace.split('').length; // 1
```

Since `split('')` splits at UTF-16 code unit boundaries,
invoking it on a non-BMP character yields the string's surrogate pair:

```dart
var clef = '\u{1F3BC}'; // 🎼, not in BMP.
clef.split('').length; // 2
```

The members of the surrogate pair are not valid UTF-16 strings on their own.

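A typical per-character task is reversing a string. Reversing code units would swap the two halves of a surrogate pair, so the sketch below reverses runes instead. `reverseRunes()` is a hypothetical helper; as the reviewer comment above notes, it still does not handle combining marks (such as a decomposed 'é'), which would end up attached to the wrong letter:

```dart
/// Hypothetical helper: reverses [s] rune by rune, keeping each
/// surrogate pair intact.
String reverseRunes(String s) =>
    String.fromCharCodes(s.runes.toList().reversed);

void main() {
  var text = 'ab\u{1F3BC}'; // 'ab🎼'
  print(reverseRunes(text)); // 🎼ba

  // Reversing code units instead puts the low surrogate first (invalid).
  var byUnits = String.fromCharCodes(text.codeUnits.reversed);
  print(byUnits.codeUnitAt(0).toRadixString(16)); // dfbc
}
```
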
## Splitting a string into substrings

### Problem

You want to split a string into substrings based on a delimiter or a pattern.

> sethladd (2013/03/07 05:15:16): Can you add "based on some pattern". An example helps here.

### Solution

Use the `split()` method with a string or a regExp as an argument:

```dart
var smileyFace = '\u263A';
var happy = 'I am $smileyFace';
happy.split(' '); // ['I', 'am', '☺']
```

Here is an example of using `split()` with a regExp:

```dart
var nums = '2/7 3 4/5 3~/5';
var numsRegExp = new RegExp(r'(\s|/|~/)');
nums.split(numsRegExp); // ['2', '7', '3', '4', '5', '3', '5']
```

In the code above, the string `nums` contains various numbers, some of which
are expressed as fractions or as integer divisions. A regExp is used to split
the string to extract just the numbers.

You can perform operations on the matched and unmatched portions of a string
by using `splitMapJoin()` with a regExp:

```dart
'Eats SHOOTS leaves'.splitMapJoin(new RegExp(r'SHOOTS'),
    onMatch: (m) => m.group(0).toLowerCase(),
    onNonMatch: (n) => n.toUpperCase()); // 'EATS shoots LEAVES'
```

The regExp matches the middle word ('SHOOTS'). A pair of callbacks
transforms the matched and unmatched substrings before the
substrings are joined together again.

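For a pattern-based split you might meet in practice, the sketch below splits a list of tags on commas with optional surrounding whitespace and drops empty entries. `parseTags()` is a hypothetical helper, and the input format is an assumption for illustration; it does not handle quoted fields the way a real CSV parser would:

```dart
// Matches a comma together with any whitespace around it.
final _tagSeparator = RegExp(r'\s*,\s*');

/// Hypothetical helper: splits a comma-separated tag list such as
/// ' dart, strings ,unicode,, ' into ['dart', 'strings', 'unicode'].
List<String> parseTags(String input) => input
    .trim()
    .split(_tagSeparator)
    .where((tag) => tag.isNotEmpty)
    .toList();

void main() {
  print(parseTags(' dart, strings ,unicode,, ')); // [dart, strings, unicode]
}
```
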
## Changing string case

### Problem

You want to change the case of strings.

### Solution

Use `string.toUpperCase()` and `string.toLowerCase()` to convert a string to
upper case or lower case, respectively:

> floitsch (2013/03/07 17:22:07): convert
>
> shailentuli (2013/03/08 22:38:26): Done.

> floitsch (2013/03/07 17:22:07): No. This is not a good solution. This only works in some languages, but not in all. A good solution is much more complex. The discussion only touches parts of the problem.
>
> shailentuli (2013/03/08 22:38:26): Agreed. The goal is to show people how to correctly use string.toUpperCase() and string.toLowerCase(). In the context of that, what specifically would you like included in the discussion?
>
> floitsch (2013/03/09 00:01:41): Well, that it doesn't work except in some languages. For example in Turkish it fails: http://stackoverflow.com/questions/1850232/turkish-case-conversion-in-javascript Ideally we wouldn't have a toUpperCase and toLowerCase on String, but there were too many requests to add it.

```dart
var theOneILove = 'I love Lucy';
theOneILove.toUpperCase(); // 'I LOVE LUCY'
theOneILove.toLowerCase(); // 'i love lucy'
```

### Discussion

Case changes affect the characters of bicameral scripts such as Greek and
Latin (the script used for French, among many other languages):

```dart
var zeus = '\u0394\u03af\u03b1\u03c2'; // 'Δίας' (Zeus in modern Greek)
zeus.toUpperCase(); // 'ΔΊΑΣ'

var resume = '\u0052\u00e9\u0073\u0075\u006d\u00e9'; // 'Résumé'
resume.toLowerCase(); // 'résumé'
```

They do not affect the characters of unicameral scripts such as Devanagari
(used for writing many of the languages of India):

```dart
var chickenKebab = '\u091a\u093f\u0915\u0928 \u0915\u092c\u093e\u092c';
// 'चिकन कबाब' (in Devanagari)
chickenKebab.toLowerCase(); // 'चिकन कबाब'
chickenKebab.toUpperCase(); // 'चिकन कबाब'
```

If a character's case does not change when using `toUpperCase()` and
`toLowerCase()`, it is most likely because the character only has one
form. Note also that these methods use language-independent Unicode case
mappings, so they give incorrect results for some languages, such as Turkish.

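A common use of case conversion is comparing strings case-insensitively. The minimal sketch below, built on a hypothetical `equalsIgnoreCase()` helper, lowercases both sides; it inherits the limitations just described, so treat it as adequate for English identifiers and similar text rather than as general-purpose linguistic comparison:

```dart
/// Hypothetical helper: compares two strings after lowercasing both.
/// Uses Dart's language-independent case mapping, so results can be
/// wrong for languages such as Turkish.
bool equalsIgnoreCase(String a, String b) =>
    a.toLowerCase() == b.toLowerCase();

void main() {
  print(equalsIgnoreCase('Dart', 'DART')); // true
  print(equalsIgnoreCase('dArT', 'DaRt')); // true
  print(equalsIgnoreCase('Dart', 'Darts')); // false
}
```
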
## Determining whether a string contains another string

> Alan Knight (2013/03/07 21:15:00): These should be much earlier, as they seem like very common tasks.

### Problem

You want to find out whether one string is a substring of another string.

### Solution

Use `string.contains()`:

```dart
var fact = 'Dart strings are immutable';
fact.contains('immutable'); // true
```

You can indicate a start index as a second argument:

```dart
fact.contains('Dart', 2); // false
```

### Discussion

The String library provides a couple of shortcuts for testing whether a string
starts or ends with another string:

```dart
fact.startsWith('Dart'); // true
fact.endsWith('e'); // true
```

You can also use `string.indexOf()`, which returns -1 if the substring is
not found within a string, and the index of the first match if it is:

```dart
fact.indexOf('art') != -1; // true, 'art' is found in 'Dart'
```

You can also use a regExp and `hasMatch()`:

```dart
new RegExp(r'ar[et]').hasMatch(fact); // true, 'art' and 'are' match.
```

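`contains()` is case-sensitive. One way to search without regard to case is a regExp built with `caseSensitive: false`, where `RegExp.escape()` ensures that characters such as `.` or `$` in the search text are matched literally. `containsIgnoreCase()` is a hypothetical helper name:

```dart
/// Hypothetical helper: case-insensitive substring test.
bool containsIgnoreCase(String haystack, String needle) =>
    RegExp(RegExp.escape(needle), caseSensitive: false).hasMatch(haystack);

void main() {
  var fact = 'Dart strings are immutable';
  print(fact.contains('dart')); // false
  print(containsIgnoreCase(fact, 'dart')); // true
  // Escaping keeps '$' and '.' literal instead of acting as regExp syntax.
  print(containsIgnoreCase('Cost: \$5.00', '\$5.00')); // true
  print(containsIgnoreCase('Cost: \$5000', '\$5.00')); // false
}
```
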
## Finding matches of a regExp pattern in a string

### Problem

You want to use a regular expression to match a pattern in a string, and
want to be able to access the matches.

### Solution

Construct a regular expression using the RegExp class and find matches using
the `allMatches()` method:

```dart
var neverEatingThat = 'Not with a fox, not in a box';
var regExp = new RegExp(r'[fb]ox');
var matches = regExp.allMatches(neverEatingThat);
matches.map((match) => match.group(0)).toList(); // ['fox', 'box']
```

### Discussion

You can query the object returned by `allMatches()` to find out the number of
matches:

```dart
matches.length; // 2
```

To find the first match, use `firstMatch()`:

```dart
regExp.firstMatch(neverEatingThat).group(0); // 'fox'
```

To directly access the matched string, use `stringMatch()`:

```dart
regExp.stringMatch(neverEatingThat); // 'fox'
regExp.stringMatch('I like bagels and lox'); // null
```

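Capture groups let you pull structured pieces out of each match. The sketch below extracts name/number pairs from a string such as `'width=640 height=480'`; that input format and the `parseDimensions()` name are hypothetical, chosen for illustration:

```dart
// Captures a word (group 1), then '=', then digits (group 2).
final _pair = RegExp(r'(\w+)=(\d+)');

/// Hypothetical helper: collects every name=number pair in [input].
Map<String, int> parseDimensions(String input) => {
      for (final m in _pair.allMatches(input)) m[1]!: int.parse(m[2]!),
    };

void main() {
  print(parseDimensions('width=640 height=480')); // {width: 640, height: 480}
}
```
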
## Substituting strings based on regExp matches

### Problem

You want to match substrings within a string and make substitutions based on
the matches.

### Solution

Construct a regular expression using the RegExp class and make replacements
using the `replaceAll()` method:

```dart
'resume'.replaceAll(new RegExp(r'e'), '\u00E9'); // 'résumé'
```

If you want to replace just the first match, use `replaceFirst()`:

```dart
'0.0001'.replaceFirst(new RegExp(r'0+'), ''); // '.0001'
```

The regExp matches one or more 0s, and `replaceFirst()` replaces only the first
such run with an empty string.

You can use `replaceAllMapped()` and register a function to modify the
matches:

```dart
var heart = '\u2661'; // '♡'
var string = 'I like Ike but I $heart Lucy';
var regExp = new RegExp(r'[A-Z]\w+');
string.replaceAllMapped(regExp, (match) => match.group(0).toUpperCase());
// 'I like IKE but I ♡ LUCY'
```

> Alan Knight (2013/03/07 21:15:00): I think it would be nice to see some discussion of some of the things people might not be expecting to see that could be quite useful, e.g. Iterable.join and splitMapJoin.

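The callback passed to `replaceAllMapped()` can also use capture groups to rearrange text. This minimal sketch turns 'Last, First' names into 'First Last'; the name format and the `swapNames()` helper are hypothetical, and names containing spaces or hyphens would need a broader pattern:

```dart
// Group 1: last name; group 2: first name.
final _lastFirst = RegExp(r'(\w+), (\w+)');

/// Hypothetical helper: rewrites every 'Last, First' as 'First Last'.
String swapNames(String input) =>
    input.replaceAllMapped(_lastFirst, (m) => '${m[2]} ${m[1]}');

void main() {
  print(swapNames('Coyote, Wile; Runner, Road')); // Wile Coyote; Road Runner
}
```
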