Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(177)

Side by Side Diff: lib/src/line_splitter.dart

Issue 1255643002: New, simpler and faster line splitter. (Closed) Base URL: https://github.com/dart-lang/dart_style.git@master
Patch Set: Optimize nesting. Reformat. Created 5 years, 5 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View unified diff | Download patch
« no previous file with comments | « lib/src/line_prefix.dart ('k') | lib/src/line_writer.dart » ('j') | no next file with comments »
Toggle Intra-line Diffs ('i') | Expand Comments ('e') | Collapse Comments ('c') | Show Comments Hide Comments ('s')
OLDNEW
1 // Copyright (c) 2014, the Dart project authors. Please see the AUTHORS file 1 // Copyright (c) 2015, the Dart project authors. Please see the AUTHORS file
2 // for details. All rights reserved. Use of this source code is governed by a 2 // for details. All rights reserved. Use of this source code is governed by a
3 // BSD-style license that can be found in the LICENSE file. 3 // BSD-style license that can be found in the LICENSE file.
4 4
5 library dart_style.src.line_splitter; 5 library dart_style.src.line_splitter;
6 6
7 import 'dart:math' as math; 7 import 'package:collection/priority_queue.dart';
8 8
9 import 'chunk.dart'; 9 import 'chunk.dart';
10 import 'debug.dart' as debug; 10 import 'debug.dart' as debug;
11 import 'line_prefix.dart';
12 import 'line_writer.dart'; 11 import 'line_writer.dart';
13 import 'rule/rule.dart'; 12 import 'rule/rule.dart';
14 13 import 'rule_set.dart';
15 /// Takes a series of [Chunk]s and determines the best way to split them into 14
16 /// lines of output that fit within the page width (if possible). 15 /// To ensure the solver doesn't go totally pathological on giant code, we cap
17 /// 16 /// it at a fixed number of attempts.
18 /// Imagine a naïve solution. You start at the first chunk in the list and work 17 ///
19 /// your way forward. For each chunk, you try every possible value of the rule 18 /// If the optimal solution isn't found after this many tries, it just uses the
20 /// that the chunk uses. Then, you ask the rule if that chunk should split or 19 /// best it found so far.
21 /// not when the rule has that value. If it does, then you try every possible 20 const _maxAttempts = 50000;
22 /// way you could modify the expression nesting context at that point. 21
23 /// 22 /// Takes a set of chunks and determines the best values for its rules in order
24 /// For each of those possible results, you start with that as a partial 23 /// to fit it inside the page boundary.
25 /// solution. You move onto the next chunk and repeat the process, building on 24 ///
26 /// to that partial solution. When you reach the last chunk, you determine the 25 /// This problem is exponential in the number of rules and a single expression
27 /// cost of that set of rule values and keep track of the best one. 26 /// in Dart can be quite large, so it isn't feasible to brute force this. For
28 /// 27 /// example:
29 /// Once you've tried every single possible permutation of rule values and 28 ///
30 /// nesting stacks, you will have exhaustively covered every possible way the 29 /// outer(
31 /// line can be split and you'll know which one had the lowest cost. That's 30 /// fn(1 + 2, 3 + 4, 5 + 6, 7 + 8),
32 /// your result. 31 /// fn(1 + 2, 3 + 4, 5 + 6, 7 + 8),
33 /// 32 /// fn(1 + 2, 3 + 4, 5 + 6, 7 + 8),
34 /// The obvious problem is that this is exponential in the number of values for 33 /// fn(1 + 2, 3 + 4, 5 + 6, 7 + 8));
35 /// each rule and the expression nesting levels, both of which can be quite 34 ///
36 /// large with things like long method chains containing function literals, 35 /// There are 509,607,936 ways this can be split.
37 /// large collection literals, etc. To tame that, this uses dynamic programming. 36 ///
38 /// 37 /// The problem is even harder because we may not be able to easily tell if a
39 /// As the naïve solver is working its way down the chunks, it's building up a 38 /// given solution is the best one. It's possible that there is *no* solution
40 /// partial solution. This solution has: 39 /// that fits in the page (due to long strings or identifiers) so the winning
41 /// 40 /// solution may still have overflow characters. This makes it hard to know
42 /// * A length—the number of chunks in it. 41 /// when we are done and can stop looking.
43 /// * The set of rule values we have selected for the rules used by those 42 ///
44 /// chunks. 43 /// There are a couple of pieces of domain knowledge we use to cope with this:
45 /// * The expression nesting stack that is currently active at the point where 44 ///
46 /// the current chunk splits. 45 /// - Changing a rule from unsplit to split will never lower its cost. A
47 /// 46 /// solution with all rules unsplit will always be the one with the lowest
48 /// We'll call this partial solution a [LinePrefix]. The naïve solution starts 47 /// cost (zero). Conversely, setting all of its rules to the maximum split
49 /// with an empty prefix (zero length, no rule values, no expression nesting). 48 /// value will always have the highest cost.
50 /// It grabs the next chunk after that prefix and says, OK, given this prefix 49 ///
51 /// what are all of the possible ways we can split that chunk (rule values and 50 /// (You might think there is a converse rule about overflow characters. The
52 /// nesting stacks). Create a new, one-chunk-longer prefix for each of those 51 /// solution with the fewest splits will have the most overflow, and the
53 /// and then recursively solve each of their suffixes. 52 /// solution with the most splits will have the least overflow. Alas, because
54 /// 53 /// of indentation, that isn't always the case. Adding a split may *increase*
55 /// Normally, this would basically be a depth-first traversal of the entire 54 /// overflow in some cases.)
56 /// possible solution space. However, there is a simple optimization we can 55 ///
57 /// make. When brute forcing the solution tree, we will often solve many 56 /// - If all of the chunks for a rule are inside lines that already fit in the
58 /// prefixes with the same length. 57 /// page, then splitting that rule will never improve the solution.
59 /// 58 ///
60 /// For example, if the first chunk's rule can take three different values, we 59 /// We start off with a [SolveState] where all rules are unbound (which
61 /// will solve three line prefixes of length one, one for each rule value. 60 /// implicitly treats them as unsplit). For a given solve state, we can produce
62 /// Those in turn may have lots of permutations leading to even more solutions 61 /// a set of expanded states that takes some of the rules in the first long
63 /// all with line prefixes of length two, and so on. 62 /// line and bind them to split values. This always produces new solve states
64 /// 63 /// with higher cost (but often fewer overflow characters) than the parent
65 /// In many cases, we have to treat these line prefixes separately. If two 64 /// state.
66 /// prefixes have the same length, but fix a different value for some rule, they 65 ///
67 /// may lead to different ways of splitting the suffix since that rule may 66 /// We take these expanded states and add them to a work list sorted by cost.
68 /// appear in a later chunk too. 67 /// Since unsplit rules always have lower cost solutions, we know that no state
69 /// 68 /// we enqueue later will ever have a lower cost than the ones we already have
70 /// But in many cases, the line prefixes are equivalent. For example, consider: 69 /// enqueued.
71 /// 70 ///
72 /// function( 71 /// Then we keep pulling states off the work list and expanding them and adding
73 /// first(arg, arg, arg, arg, arg, arg), 72 /// the results back into the list. We do this until we hit a solution where
74 /// second, 73 /// all characters fit in the page. The first one we find will have the lowest
75 /// third(arg, arg, arg, arg, arg, arg)); 74 /// cost and we're done.
76 /// 75 ///
77 /// The line splitter will try a number of different ways to split the argument 76 /// We also keep running track of the best solution we've found so far that
78 /// list to `first`. For each of those, it then has to try all of the different 77 /// has the fewest overflow characters and the lowest cost. If no solution fits,
79 /// ways it could split `third`. *However*, those two are unrelated. The way 78 /// we'll use this one.
80 /// you split `first` has no impact on how you split `third`. 79 ///
81 /// 80 /// As a final escape hatch for pathologically nasty code, after trying some
82 /// The precise way to state that is that as the line prefix grows *past* 81 /// fixed maximum number of solve states, we just bail and return the best
83 /// `first()` and its argument list, it contains *less* state. It discards the 82 /// solution found so far.
84 /// rule value it selected for the argument list rule since that rule doesn't 83 ///
85 /// appear anywhere in the suffix and can't affect it. Any expression nesting 84 /// Even with the above algorithmic optimizations, complex code may still
86 /// inside the argument list has been popped off by the time we get to `second`. 85 /// require a lot of exploring to find an optimal solution. To make that fast,
87 /// 86 /// this code is carefully profiled and optimized. If you modify this, make
88 /// We'll try every possible way to split `first()`'s argument list and then 87 /// sure to test against the benchmark to ensure you don't regress performance.
89 /// recursively solve the remainder of the line for each of those possible
90 /// splits. But each of those remainders will end up having identical prefixes.
91 /// For any given [LinePrefix], there is a single best solution for the suffix.
92 /// So, once we've found it, we can just reuse it the next time that same
93 /// prefix comes up.
94 ///
95 /// In other words, we add a memoization table, [_bestSplits]. It is a map from
96 /// [LinePrefix]es to [SplitSet]s. For each prefix, it stores the best set of
97 /// splits to use for the following suffix. With this, instead of exploring the
98 /// entire solution *tree*, which would be exponential, we just traverse the
99 /// solution *graph* where entire branches of the tree that are identical
100 /// collapse back together.
101 ///
102 /// The trick to making this work is storing the minimal amount of data in the
103 /// line prefix to *uniquely* identify a suffix while still collapsing as many
104 /// branches as possible. In practice, this means aggressively forgetting state
105 /// when the prefix grows. In particular, we do not need to keep track of
106 /// rule values we selected in the prefix if that rule does not affect anything
107 /// in the suffix.
108 ///
109 /// There are a couple of other smaller scale optimizations too. If we ever hit
110 /// a solution whose cost is zero, we know we won't do better, so we stop
111 /// searching. If we make it to the end of the chunk list and we haven't gone
112 /// past the end of the line, we know we don't need to split anywhere.
113 ///
114 /// But the memoization is the main thing that makes this tractable.
115 class LineSplitter { 88 class LineSplitter {
116 final LineWriter _writer; 89 final LineWriter _writer;
117 90
118 /// The list of chunks being split. 91 /// The list of chunks being split.
119 final List<Chunk> _chunks; 92 final List<Chunk> _chunks;
120 93
94 /// The set of soft rules whose values are being selected.
95 final List<Rule> _rules;
96
121 /// The number of characters of additional indentation to apply to each line. 97 /// The number of characters of additional indentation to apply to each line.
122 /// 98 ///
123 /// This is used when formatting blocks to get the output into the right 99 /// This is used when formatting blocks to get the output into the right
124 /// column based on where the block appears. 100 /// column based on where the block appears.
125 final int _blockIndentation; 101 final int _blockIndentation;
126 102
127 /// Memoization table for the best set of splits for the remainder of the 103 /// The starting column of the first line.
128 /// line following a given prefix. 104 final int _firstLineIndent;
129 final _bestSplits = <LinePrefix, SplitSet>{}; 105
130 106 /// The list of solve states to explore further.
131 /// The rules that appear in the first `n` chunks of the line where `n` is 107 ///
132 /// the index into the list. 108 /// This is sorted lowest-cost first. This ensures that as soon as we find a
133 final _prefixRules = <Set<Rule>>[]; 109 /// solution that fits in the page, we know it will be the lowest cost one
134 110 /// and can stop looking.
135 /// The rules that appear after the first `n` chunks of the line where `n` is 111 final _workList = new HeapPriorityQueue<SolveState>();
136 /// the index into the list. 112
137 final _suffixRules = <Set<Rule>>[]; 113 /// The lowest cost solution found so far.
114 SolveState _bestSolution;
138 115
139 /// Creates a new splitter for [_writer] that tries to fit [chunks] into the 116 /// Creates a new splitter for [_writer] that tries to fit [chunks] into the
140 /// page width. 117 /// page width.
141 LineSplitter(this._writer, this._chunks, this._blockIndentation); 118 LineSplitter(this._writer, List<Chunk> chunks, int blockIndentation,
142 119 int firstLineIndent,
143 /// Convert the line to a [String] representation. 120 {bool flushLeft: false})
144 /// 121 : _chunks = chunks,
145 /// It will determine how best to split it into multiple lines of output and 122 // Collect the set of soft rules that we need to select values for.
146 /// write the result to [writer]. 123 _rules = chunks
124 .map((chunk) => chunk.rule)
125 .where((rule) => rule != null && rule is! HardSplitRule)
126 .toSet()
127 .toList(growable: false),
128 _blockIndentation = blockIndentation,
129 _firstLineIndent = flushLeft ? 0 : firstLineIndent + blockIndentation {
130 // Store the rule's index in the rule so we can get from a chunk to a rule
131 // index quickly.
132 for (var i = 0; i < _rules.length; i++) {
133 _rules[i].index = i;
134 }
135 }
136
137 /// Determine the best way to split the chunks into lines that fit in the
138 /// page, if possible.
139 ///
140 /// Returns a [SplitSet] that defines where each split occurs and the
141 /// indentation of each line.
147 /// 142 ///
148 /// [firstLineIndent] is the number of characters of whitespace to prefix the 143 /// [firstLineIndent] is the number of characters of whitespace to prefix the
149 /// first line of output with. 144 /// first line of output with.
150 SplitSolution apply(int firstLineIndent, {bool flushLeft}) { 145 SplitSet apply() {
146 // Start with a completely unbound, unsplit solution.
147 _workList.add(new SolveState(this, new RuleSet(_rules.length)));
148
149 var attempts = 0;
150 while (!_workList.isEmpty) {
151 var state = _workList.removeFirst();
152
153 if (state.isBetterThan(_bestSolution)) {
154 _bestSolution = state;
155
156 // Since we sort solutions by cost the first solution we find that
157 // fits is the winner.
158 if (_bestSolution.overflowChars == 0) break;
159 }
160
161 if (debug.traceSplitter) {
162 var best = state == _bestSolution ? " (best)" : "";
163 debug.log("$state$best");
164 debug.dumpLines(_chunks, _firstLineIndent, state.splits);
165 debug.log();
166 }
167
168 if (attempts++ > _maxAttempts) break;
169
170 // Try bumping the rule values for rules whose chunks are on long lines.
171 state.expand();
172 }
173
151 if (debug.traceSplitter) { 174 if (debug.traceSplitter) {
152 debug.log(debug.green("\nSplitting:")); 175 debug.log("$_bestSolution (winner)");
153 debug.dumpChunks(0, _chunks); 176 debug.dumpLines(_chunks, _firstLineIndent, _bestSolution.splits);
154 debug.log(); 177 debug.log();
155 } 178 }
156 179
157 // Ignore the trailing rule on the last chunk since it isn't used for 180 return _bestSolution.splits;
158 // anything. 181 }
159 var ruleChunks = _chunks.take(_chunks.length - 1); 182 }
160 183
161 // Pre-calculate the set of rules appear before and after each length. We 184 /// A possibly incomplete solution in the line splitting search space.
162 // use these frequently when creating [LinePrefix]es and they only depend 185 ///
163 // on the length, so we can cache them up front. 186 /// A single [SolveState] binds some subset of the rules to values while
164 for (var i = 0; i < _chunks.length; i++) { 187 /// leaving the rest unbound. If every rule is bound, the solve state describes
165 _prefixRules.add(ruleChunks.take(i).map((chunk) => chunk.rule).toSet()); 188 /// a complete solution to the line splitting problem. Even if rules are
166 _suffixRules.add(ruleChunks.skip(i).map((chunk) => chunk.rule).toSet()); 189 /// unbound, a state can also usually be used as a solution by treating all
167 } 190 /// unbound rules as unsplit. (The usually is because a state that constrains
168 191 /// an unbound rule to split can't be used with that rule unsplit.)
169 var prefix = new LinePrefix(firstLineIndent + _blockIndentation, 192 ///
170 flushLeft: flushLeft); 193 /// From a given solve state, we can explore the search tree to more refined
171 var solution = new SplitSolution(prefix); 194 /// solve states by producing new ones that add more bound rules to the current
172 _tryChunkRuleValues(solution, prefix); 195 /// state.
173 return solution; 196 class SolveState implements Comparable<SolveState> {
174 } 197 final LineSplitter _splitter;
175 198 final RuleSet _ruleValues;
176 /// Finds the best set of splits to apply to the remainder of the chunks 199
177 /// following [prefix]. 200 /// The unbound rules in this state that can be bound to produce new more
178 /// 201 /// refined states.
179 /// This can only be called for a suffix that begins a new line. (In other 202 ///
180 /// words, the last chunk in the prefix cannot be unsplit.) 203 /// Keeping this set small is the key to make the entire line splitter
181 SplitSet _findBestSplits(LinePrefix prefix) { 204 /// perform well. If we consider too make rules at each state, our
182 // Use the memoized result if we have it. 205 /// exploration of the solution space is too branchy and we waste time on
183 if (_bestSplits.containsKey(prefix)) { 206 /// dead end solutions.
184 if (debug.traceSplitter) { 207 ///
185 debug.log("memoized splits for $prefix = ${_bestSplits[prefix]}"); 208 /// Here is the key insight. The line splitter treats any unbound rule as
186 } 209 /// being unsplit. This means refining a solution always means taking a rule
187 return _bestSplits[prefix]; 210 /// that is unsplit and making it split. That monotonically increases the
188 } 211 /// cost, but may help fit the solution inside the page.
189 212 ///
190 if (debug.traceSplitter) { 213 /// We want to keep the cost low, so the only reason to consider making a
191 debug.log("find splits for $prefix"); 214 /// rule split is if it reduces an overflowing line. It's also the case that
192 debug.indent(); 215 /// splitting an earlier rule will often reshuffle the rest of the line.
193 } 216 ///
194 217 /// Taking that into account, the only rules we consider binding to extend a
195 var solution = new SplitSolution(prefix); 218 /// solve state are *unbound rules inside the first line that is overflowing*.
196 _tryChunkRuleValues(solution, prefix); 219 /// Even if a line has dozens of rules, this generally keeps the branching
197 220 /// down to a few. It also means rules inside lines that already fit are
198 if (debug.traceSplitter) { 221 /// never touched.
199 debug.unindent(); 222 ///
200 debug.log("best splits for $prefix = ${solution.splits}"); 223 /// There is one other set of rules that go in here. Sometimes a bound rule
201 } 224 /// in the solve state constrains some other unbound rule to split. In that
202 225 /// case, we also consider that active so we know to not leave it at zero.
203 return _bestSplits[prefix] = solution.splits; 226 final _liveRules = new Set<Rule>();
204 } 227
205 228 /// The set of splits chosen for this state.
206 /// Updates [solution] with the best rule value selection for the chunk 229 SplitSet get splits => _splits;
207 /// immediately following [prefix]. 230 SplitSet _splits;
208 void _tryChunkRuleValues(SplitSolution solution, LinePrefix prefix) { 231
209 // If we made it to the end, this prefix can be solved without splitting 232 /// The number of characters that do not fit inside the page with this set of
210 // any chunks. 233 /// splits.
211 if (prefix.length == _chunks.length - 1) { 234 int get overflowChars => _overflowChars;
212 solution.update(this, new SplitSet()); 235 int _overflowChars;
213 return; 236
214 } 237 /// Whether we can treat this state as a complete solution by leaving its
215 238 /// unbound rules unsplit.
216 var chunk = _chunks[prefix.length]; 239 ///
217 240 /// This is generally true but will be false if the state contains any
218 // See if we've already selected a value for the rule. 241 /// unbound rules that are constrained to not be zero by other bound rules.
219 var value = prefix.ruleValues[chunk.rule]; 242 /// This avoids picking a solution that leaves those rules at zero when they
220 243 /// aren't allowed to be.
221 if (value == null) { 244 bool _isComplete = true;
222 // No, so try every possible value for the rule. 245
223 for (value = 0; value < chunk.rule.numValues; value++) { 246 SolveState(this._splitter, this._ruleValues) {
224 _tryRuleValue(solution, prefix, value); 247 _calculateSplits();
225 } 248 _calculateCost();
226 } else if (value == -1) { 249 }
227 // A -1 "value" means, "any non-zero value". In other words, the rule has 250
228 // to split somehow, but can split however it chooses. 251 /// Orders this state relative to [other].
229 for (value = 1; value < chunk.rule.numValues; value++) { 252 ///
230 _tryRuleValue(solution, prefix, value); 253 /// This is the best-first ordering that the [LineSplitter] uses in its
231 } 254 /// worklist. It prefers cheaper states even if they overflow because this
232 } else { 255 /// ensures it finds the best solution first as soon as it finds one that
233 // Otherwise, it's constrained to a single value, so use it. 256 /// fits in the page so it can early out.
234 _tryRuleValue(solution, prefix, value); 257 int compareTo(SolveState other) {
235 } 258 // TODO(rnystrom): It may be worth sorting by the estimated lowest number
236 } 259 // of overflow characters first. That doesn't help in cases where there is
237 260 // a solution that fits, but may help in corner cases where there is no
238 /// Updates [solution] with the best solution that can be found by setting 261 // fitting solution.
239 /// the chunk after [prefix]'s rule to [value]. 262
240 void _tryRuleValue(SplitSolution solution, LinePrefix prefix, int value) { 263 if (splits.cost != other.splits.cost) {
241 // If we already have an unbeatable solution, don't bother trying this one. 264 return splits.cost.compareTo(other.splits.cost);
242 if (solution.cost == 0) return; 265 }
243 266
244 var chunk = _chunks[prefix.length]; 267 if (overflowChars != other.overflowChars) {
245 268 return overflowChars.compareTo(other.overflowChars);
246 if (chunk.rule.isSplit(value, chunk)) { 269 }
247 // The chunk is splitting in an expression, so try all of the possible 270
248 // nesting combinations. 271 // Distinguish states based on the rule values just so that states with the
249 var ruleValues = _advancePrefix(prefix, value); 272 // same cost range but different rule values don't get considered identical
250 var longerPrefixes = prefix.split(chunk, _blockIndentation, ruleValues); 273 // by HeapPriorityQueue.
251 for (var longerPrefix in longerPrefixes) { 274 for (var rule in _splitter._rules) {
252 _tryLongerPrefix(solution, prefix, longerPrefix); 275 var value = _ruleValues.getValue(rule);
253 276 var otherValue = other._ruleValues.getValue(rule);
254 // If we find an unbeatable solution, don't try any more. 277
255 if (solution.cost == 0) break; 278 if (value != otherValue) return value.compareTo(otherValue);
256 } 279 }
257 } else { 280
258 // We didn't split here, so add this chunk and its rule value to the 281 // If we get here, this state is identical to [other].
259 // prefix and continue on to the next. 282 return 0;
260 var extended = prefix.extend(_advancePrefix(prefix, value)); 283 }
261 _tryChunkRuleValues(solution, extended); 284
262 } 285 /// Returns `true` if this state is a better solution to use as the final
263 } 286 /// result than [other].
264 287 bool isBetterThan(SolveState other) {
265 /// Updates [solution] with the solution for [prefix] assuming it uses 288 // If this state contains an unbound rule that we know can't be left
266 /// [longerPrefix] for the next chunk. 289 // unsplit, we can't pick this as a solution.
267 void _tryLongerPrefix( 290 if (!_isComplete) return false;
268 SplitSolution solution, LinePrefix prefix, LinePrefix longerPrefix) { 291
269 var remaining = _findBestSplits(longerPrefix); 292 // Anything is better than nothing.
270 293 if (other == null) return true;
271 // If it wasn't possible to split the suffix given this nesting stack, 294
272 // skip it. 295 // Prefer the solution that fits the most in the page.
273 if (remaining == null) return; 296 if (overflowChars != other.overflowChars) {
274 297 return overflowChars < other.overflowChars;
275 solution.update(this, remaining.add(prefix.length, longerPrefix.column)); 298 }
276 } 299
277 300 // Otherwise, prefer the best cost.
278 /// Determines the set of rule values for a new [LinePrefix] one chunk longer 301 return splits.cost < other.splits.cost;
279 /// than [prefix] whose rule on the new last chunk has [value]. 302 }
280 /// 303
281 /// Returns a map of [Rule]s to values for those rules for the values that 304 /// Enqueues more solve states to consider based on this one.
282 /// span the prefix and suffix of the [LinePrefix]. 305 ///
283 Map<Rule, int> _advancePrefix(LinePrefix prefix, int value) { 306 /// For each unbound rule in this state that occurred in the first long line,
284 // Get the rules that appear in both in and after the new prefix. These are 307 /// enqueue solve states that bind that rule to each value it can have and
285 // the rules that already have values that the suffix needs to honor. 308 /// bind all previous rules to zero. (In other words, try all subsolutions
286 var prefixRules = _prefixRules[prefix.length + 1]; 309 /// where that rule becomes the first new rule to split at.)
287 var suffixRules = _suffixRules[prefix.length + 1]; 310 void expand() {
288 311 var unsplitRules = _ruleValues.clone();
289 var nextRule = _chunks[prefix.length].rule; 312
290 var updatedValues = {}; 313 // Walk down the rules looking for unbound ones to try.
291 314 var triedRules = 0;
292 for (var prefixRule in prefixRules) { 315 for (var rule in _splitter._rules) {
293 var ruleValue = 316 if (_liveRules.contains(rule)) {
294 prefixRule == nextRule ? value : prefix.ruleValues[prefixRule]; 317 // We found one worth trying, so try all of its values.
295 318 for (var value = 1; value < rule.numValues; value++) {
296 if (suffixRules.contains(prefixRule)) { 319 var boundRules = unsplitRules.clone();
297 // If the same rule appears in both the prefix and suffix, then preserve 320
298 // its exact value. 321 var mustSplitRules;
299 updatedValues[prefixRule] = ruleValue; 322 var valid = boundRules.tryBind(_splitter._rules, rule, value, (rule) {
300 } 323 if (mustSplitRules == null) mustSplitRules = [];
301 324 mustSplitRules.add(rule);
302 // If we haven't specified any value for this rule in the prefix, it 325 });
303 // doesn't place any constraint on the suffix. 326
304 if (ruleValue == null) continue; 327 // Make sure we don't violate the constraints of the bound rules.
305 328 if (!valid) continue;
306 // Enforce the constraints between rules. 329
307 for (var suffixRule in suffixRules) { 330 var state = new SolveState(_splitter, boundRules);
308 if (suffixRule == prefixRule) continue; 331
309 332 // If some unbound rules are constrained to split, remember that.
310 // See if the prefix rule's value constrains any values in the suffix. 333 if (mustSplitRules != null) {
311 var value = prefixRule.constrain(ruleValue, suffixRule); 334 state._isComplete = false;
312 335 state._liveRules.addAll(mustSplitRules);
313 // Also consider the backwards case, where a later rule in the suffix 336 }
314 // constrains a rule in the prefix. 337
315 if (value == null) { 338 _splitter._workList.add(state);
316 value = suffixRule.reverseConstrain(ruleValue, prefixRule);
317 } 339 }
318 340
319 if (value != null) { 341 // Stop once we've tried all of the ones we can.
320 updatedValues[prefixRule] = ruleValue; 342 if (++triedRules == _liveRules.length) break;
321 updatedValues[suffixRule] = value; 343 }
344
345 // Fill in previous unbound rules with zero.
346 if (!_ruleValues.contains(rule)) {
347 // Pass a dummy callback because zero will never fail. (If it would
348 // have, that rule would already be bound to some other value.)
349 if (!unsplitRules.tryBind(_splitter._rules, rule, 0, (_) {})) {
350 break;
322 } 351 }
323 } 352 }
324 } 353 }
325 354 }
326 return updatedValues; 355
356 /// Calculates the [SplitSet] for this solve state, assuming any unbound
357 /// rules are set to zero.
358 void _calculateSplits() {
359 // Figure out which expression nesting levels got split and need to be
360 // assigned columns.
361 var usedNestingLevels = new Set();
362 for (var i = 0; i < _splitter._chunks.length - 1; i++) {
363 var chunk = _splitter._chunks[i];
364 if (chunk.rule.isSplit(_getValue(chunk.rule), chunk)) {
365 usedNestingLevels.add(chunk.nesting);
366 chunk.nesting.clearTotalUsedIndent();
367 }
368 }
369
370 for (var nesting in usedNestingLevels) {
371 nesting.refreshTotalUsedIndent(usedNestingLevels);
372 }
373
374 _splits = new SplitSet(_splitter._chunks.length);
375 for (var i = 0; i < _splitter._chunks.length - 1; i++) {
376 var chunk = _splitter._chunks[i];
377 if (chunk.rule.isSplit(_getValue(chunk.rule), chunk)) {
378 var indent = 0;
379 if (!chunk.flushLeftAfter) {
380 // Add in the chunk's indent.
381 indent = _splitter._blockIndentation + chunk.indent;
382
383 // And any expression nesting.
384 indent += chunk.nesting.totalUsedIndent;
385 }
386
387 _splits.add(i, indent);
388 }
389 }
390 }
391
392 /// Gets the value to use for [rule], either the bound value or `0` if it
393 /// isn't bound.
394 int _getValue(Rule rule) {
395 if (rule is HardSplitRule) return 0;
396
397 return _ruleValues.getValue(rule);
327 } 398 }
328 399
329 /// Evaluates the cost (i.e. the relative "badness") of splitting the line 400 /// Evaluates the cost (i.e. the relative "badness") of splitting the line
330 /// into [lines] physical lines based on the current set of rules. 401 /// into [lines] physical lines based on the current set of rules.
331 int _evaluateCost(LinePrefix prefix, SplitSet splits) { 402 void _calculateCost() {
332 assert(splits != null); 403 assert(_splits != null);
333 404
334 // Calculate the length of each line and apply the cost of any spans that 405 // Calculate the length of each line and apply the cost of any spans that
335 // get split. 406 // get split.
336 var cost = 0; 407 var cost = 0;
337 var length = prefix.column; 408 _overflowChars = 0;
338 409
339 var splitRules = new Set(); 410 var length = _splitter._firstLineIndent;
340 411
341 endLine() { 412 // The unbound rules in use by the current line. This will be null after
342 // Punish lines that went over the length. We don't rule these out 413 // the first long line has completed.
343 // completely because it may be that the only solution still goes over 414 var currentLineRules = [];
344 // (for example with long string literals). 415
345 if (length > _writer.pageWidth) { 416 endLine(int end) {
346 cost += (length - _writer.pageWidth) * Cost.overflowChar; 417 // Track lines that went over the length. It is only rules contained in
347 } 418 // long lines that we may want to split.
348 } 419 if (length > _splitter._writer.pageWidth) {
349 420 _overflowChars += length - _splitter._writer.pageWidth;
421
422 // Only try rules that are in the first long line, since we know at
423 // least one of them *will* be split.
424 if (currentLineRules != null && currentLineRules.isNotEmpty) {
425 _liveRules.addAll(currentLineRules);
426 currentLineRules = null;
427 }
428 } else {
429 // The line fit, so don't keep track of its rules.
430 if (currentLineRules != null) {
431 currentLineRules.clear();
432 }
433 }
434 }
435
350 // The set of spans that contain chunks that ended up splitting. We store 436 // The set of spans that contain chunks that ended up splitting. We store
351 // these in a set so a span's cost doesn't get double-counted if more than 437 // these in a set so a span's cost doesn't get double-counted if more than
352 // one split occurs in it. 438 // one split occurs in it.
353 var splitSpans = new Set(); 439 var splitSpans = new Set();
354 440
355 for (var i = prefix.length; i < _chunks.length; i++) { 441 for (var i = 0; i < _splitter._chunks.length; i++) {
356 var chunk = _chunks[i]; 442 var chunk = _splitter._chunks[i];
357 443
358 length += chunk.text.length; 444 length += chunk.text.length;
359 445
360 if (i < _chunks.length - 1) { 446 // Ignore the split after the last chunk.
361 if (splits.shouldSplitAt(i)) { 447 if (i == _splitter._chunks.length - 1) break;
362 endLine();
363 448
364 splitSpans.addAll(chunk.spans); 449 if (_splits.shouldSplitAt(i)) {
450 endLine(i);
365 451
366 if (chunk.rule != null && !splitRules.contains(chunk.rule)) { 452 splitSpans.addAll(chunk.spans);
367 // Don't double-count rules if multiple splits share the same
368 // rule.
369 splitRules.add(chunk.rule);
370 cost += chunk.rule.cost;
371 }
372 453
373 // Include the cost of the nested block. 454 // Include the cost of the nested block.
374 if (chunk.blockChunks.isNotEmpty) { 455 if (chunk.blockChunks.isNotEmpty) {
375 cost += _writer.formatBlock(chunk, splits.getColumn(i)).cost; 456 cost +=
376 } 457 _splitter._writer.formatBlock(chunk, _splits.getColumn(i)).cost;
458 }
377 459
378 // Start the new line. 460 // Start the new line.
379 length = splits.getColumn(i); 461 length = _splits.getColumn(i);
380 } else { 462 } else {
381 if (chunk.spaceWhenUnsplit) length++; 463 if (chunk.spaceWhenUnsplit) length++;
382 464
383 // Include the nested block inline, if any. 465 // Include the nested block inline, if any.
384 length += chunk.unsplitBlockLength; 466 length += chunk.unsplitBlockLength;
467
468 // If we might be in the first overly long line, keep track of any
469 // unbound rules we encounter. These are ones that we'll want to try to
470 // bind to shorten the long line.
471 if (currentLineRules != null &&
472 chunk.rule != null &&
473 !chunk.isHardSplit &&
474 !_ruleValues.contains(chunk.rule)) {
475 currentLineRules.add(chunk.rule);
385 } 476 }
386 } 477 }
387 } 478 }
388 479
480 // Add the costs for the rules that split.
481 _ruleValues.forEach(_splitter._rules, (rule, value) {
482 // A rule may be bound to zero if another rule constrains it to not split.
483 if (value != 0) cost += rule.cost;
484 });
485
389 // Add the costs for the spans containing splits. 486 // Add the costs for the spans containing splits.
390 for (var span in splitSpans) cost += span.cost; 487 for (var span in splitSpans) cost += span.cost;
391 488
392 // Finish the last line. 489 // Finish the last line.
393 endLine(); 490 endLine(_splitter._chunks.length);
394 491
395 return cost; 492 _splits.setCost(cost);
493 }
494
495 String toString() {
496 var buffer = new StringBuffer();
497
498 buffer.writeAll(
499 _splitter._rules.map((rule) {
500 var valueLength = "${rule.fullySplitValue}".length;
501
502 var value = "?";
503 if (_ruleValues.contains(rule)) {
504 value = "${_ruleValues.getValue(rule)}";
505 }
506
507 value = value.padLeft(valueLength);
508 if (_liveRules.contains(rule)) {
509 value = debug.bold(value);
510 } else {
511 value = debug.gray(value);
512 }
513
514 return value;
515 }),
516 " ");
517
518 buffer.write(" \$${splits.cost}");
519
520 if (overflowChars > 0) buffer.write(" (${overflowChars} over)");
521 if (!_isComplete) buffer.write(" (incomplete)");
522 if (splits == null) buffer.write(" invalid");
523
524 return buffer.toString();
396 } 525 }
397 } 526 }
398
399 /// Keeps track of the best set of splits found so far for a suffix of some
400 /// prefix.
401 class SplitSolution {
402 /// The prefix whose suffix we are finding a solution for.
403 final LinePrefix _prefix;
404
405 /// The best set of splits currently found.
406 SplitSet get splits => _bestSplits;
407 SplitSet _bestSplits;
408
409 /// The lowest cost currently found.
410 int get cost => _lowestCost;
411 int _lowestCost;
412
413 /// Whether a solution that fits within a page has been found yet.
414 bool get isAdequate => _lowestCost != null && _lowestCost < Cost.overflowChar;
415
416 SplitSolution(this._prefix);
417
418 /// Compares [splits] to the best solution found so far and keeps it if it's
419 /// better.
420 void update(LineSplitter splitter, SplitSet splits) {
421 var cost = splitter._evaluateCost(_prefix, splits);
422
423 if (_lowestCost == null || cost < _lowestCost) {
424 _bestSplits = splits;
425 _lowestCost = cost;
426 }
427
428 if (debug.traceSplitter) {
429 var best = _bestSplits == splits ? " (best)" : "";
430 debug.log(debug.gray("$_prefix $splits \$$cost$best"));
431 debug.dumpLines(splitter._chunks, _prefix, splits);
432 debug.log();
433 }
434 }
435 }
436
437 /// An immutable, persistent set of enabled split [Chunk]s.
438 ///
439 /// For each chunk, this tracks if it has been split and, if so, what the
440 /// chosen column is for the following line.
441 ///
442 /// Internally, this uses a sparse parallel list where each element corresponds
443 /// to the column of the chunk at that index in the chunk list, or `null` if
444 /// there is no active split there. This had about a 10% perf improvement over
445 /// using a [Set] of splits or a persistent linked list of split index/indent
446 /// pairs.
447 class SplitSet {
448 List<int> _columns;
449
450 /// Creates a new empty split set.
451 SplitSet() : this._(const []);
452
453 SplitSet._(this._columns);
454
455 /// Returns a new [SplitSet] containing the union of this one and the split
456 /// at [index] with next line starting at [column].
457 SplitSet add(int index, int column) {
458 var newIndents = new List(math.max(index + 1, _columns.length));
459 newIndents.setAll(0, _columns);
460 newIndents[index] = column;
461
462 return new SplitSet._(newIndents);
463 }
464
465 /// Returns `true` if the chunk at [splitIndex] should be split.
466 bool shouldSplitAt(int index) =>
467 index < _columns.length && _columns[index] != null;
468
469 /// Gets the zero-based starting column for the chunk at [index].
470 int getColumn(int index) => _columns[index];
471
472 String toString() {
473 var result = [];
474 for (var i = 0; i < _columns.length; i++) {
475 if (_columns[i] != null) {
476 result.add("$i:${_columns[i]}");
477 }
478 }
479
480 return result.join(" ");
481 }
482 }
OLDNEW
« no previous file with comments | « lib/src/line_prefix.dart ('k') | lib/src/line_writer.dart » ('j') | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698