| OLD | NEW |
| 1 // Copyright (c) 2014, the Dart project authors. Please see the AUTHORS file | 1 // Copyright (c) 2015, the Dart project authors. Please see the AUTHORS file |
| 2 // for details. All rights reserved. Use of this source code is governed by a | 2 // for details. All rights reserved. Use of this source code is governed by a |
| 3 // BSD-style license that can be found in the LICENSE file. | 3 // BSD-style license that can be found in the LICENSE file. |
| 4 | 4 |
| 5 library dart_style.src.line_splitter; | 5 library dart_style.src.line_splitter; |
| 6 | 6 |
| 7 import 'dart:math' as math; | 7 import 'package:collection/priority_queue.dart'; |
| 8 | 8 |
| 9 import 'chunk.dart'; | 9 import 'chunk.dart'; |
| 10 import 'debug.dart' as debug; | 10 import 'debug.dart' as debug; |
| 11 import 'line_prefix.dart'; | |
| 12 import 'line_writer.dart'; | 11 import 'line_writer.dart'; |
| 13 import 'rule/rule.dart'; | 12 import 'rule/rule.dart'; |
| 14 | 13 import 'rule_set.dart'; |
| 15 /// Takes a series of [Chunk]s and determines the best way to split them into | 14 |
| 16 /// lines of output that fit within the page width (if possible). | 15 /// To ensure the solver doesn't go totally pathological on giant code, we cap |
| 17 /// | 16 /// it at a fixed number of attempts. |
| 18 /// Imagine a naïve solution. You start at the first chunk in the list and work | 17 /// |
| 19 /// your way forward. For each chunk, you try every possible value of the rule | 18 /// If the optimal solution isn't found after this many tries, it just uses the |
| 20 /// that the chunk uses. Then, you ask the rule if that chunk should split or | 19 /// best it found so far. |
| 21 /// not when the rule has that value. If it does, then you try every possible | 20 const _maxAttempts = 50000; |
| 22 /// way you could modify the expression nesting context at that point. | 21 |
| 23 /// | 22 /// Takes a set of chunks and determines the best values for its rules in order |
| 24 /// For each of those possible results, you start with that as a partial | 23 /// to fit it inside the page boundary. |
| 25 /// solution. You move onto the next chunk and repeat the process, building on | 24 /// |
| 26 /// to that partial solution. When you reach the last chunk, you determine the | 25 /// This problem is exponential in the number of rules and a single expression |
| 27 /// cost of that set of rule values and keep track of the best one. | 26 /// in Dart can be quite large, so it isn't feasible to brute force this. For |
| 28 /// | 27 /// example: |
| 29 /// Once you've tried every single possible permutation of rule values and | 28 /// |
| 30 /// nesting stacks, you will have exhaustively covered every possible way the | 29 /// outer( |
| 31 /// line can be split and you'll know which one had the lowest cost. That's | 30 /// fn(1 + 2, 3 + 4, 5 + 6, 7 + 8), |
| 32 /// your result. | 31 /// fn(1 + 2, 3 + 4, 5 + 6, 7 + 8), |
| 33 /// | 32 /// fn(1 + 2, 3 + 4, 5 + 6, 7 + 8), |
| 34 /// The obvious problem is that this is exponential in the number of values for | 33 /// fn(1 + 2, 3 + 4, 5 + 6, 7 + 8)); |
| 35 /// each rule and the expression nesting levels, both of which can be quite | 34 /// |
| 36 /// large with things like long method chains containing function literals, | 35 /// There are 509,607,936 ways this can be split. |
| 37 /// large collection literals, etc. To tame that, this uses dynamic programming. | 36 /// |
| 38 /// | 37 /// The problem is even harder because we may not be able to easily tell if a |
| 39 /// As the naïve solver is working its way down the chunks, it's building up a | 38 /// given solution is the best one. It's possible that there is *no* solution |
| 40 /// partial solution. This solution has: | 39 /// that fits in the page (due to long strings or identifiers) so the winning |
| 41 /// | 40 /// solution may still have overflow characters. This makes it hard to know |
| 42 /// * A length—the number of chunks in it. | 41 /// when we are done and can stop looking. |
| 43 /// * The set of rule values we have selected for the rules used by those | 42 /// |
| 44 /// chunks. | 43 /// There are a couple of pieces of domain knowledge we use to cope with this: |
| 45 /// * The expression nesting stack that is currently active at the point where | 44 /// |
| 46 /// the current chunk splits. | 45 /// - Changing a rule from unsplit to split will never lower its cost. A |
| 47 /// | 46 /// solution with all rules unsplit will always be the one with the lowest |
| 48 /// We'll call this partial solution a [LinePrefix]. The naïve solution starts | 47 /// cost (zero). Conversely, setting all of its rules to the maximum split |
| 49 /// with an empty prefix (zero length, no rule values, no expression nesting). | 48 /// value will always have the highest cost. |
| 50 /// It grabs the next chunk after that prefix and says, OK, given this prefix | 49 /// |
| 51 /// what are all of the possible ways we can split that chunk (rule values and | 50 /// (You might think there is a converse rule about overflow characters. The |
| 52 /// nesting stacks). Create a new, one-chunk-longer prefix for each of those | 51 /// solution with the fewest splits will have the most overflow, and the |
| 53 /// and then recursively solve each of their suffixes. | 52 /// solution with the most splits will have the least overflow. Alas, because |
| 54 /// | 53 /// of indentation, that isn't always the case. Adding a split may *increase* |
| 55 /// Normally, this would basically be a depth-first traversal of the entire | 54 /// overflow in some cases.) |
| 56 /// possible solution space. However, there is a simple optimization we can | 55 /// |
| 57 /// make. When brute forcing the solution tree, we will often solve many | 56 /// - If all of the chunks for a rule are inside lines that already fit in the |
| 58 /// prefixes with the same length. | 57 /// page, then splitting that rule will never improve the solution. |
| 59 /// | 58 /// |
| 60 /// For example, if the first chunk's rule can take three different values, we | 59 /// We start off with a [SolveState] where all rules are unbound (which |
| 61 /// will solve three line prefixes of length one, one for each rule value. | 60 /// implicitly treats them as unsplit). For a given solve state, we can produce |
| 62 /// Those in turn may have lots of permutations leading to even more solutions | 61 /// a set of expanded states that takes some of the rules in the first long |
| 63 /// all with line prefixes of length two, and so on. | 62 /// line and bind them to split values. This always produces new solve states |
| 64 /// | 63 /// with higher cost (but often fewer overflow characters) than the parent |
| 65 /// In many cases, we have to treat these line prefixes separately. If two | 64 /// state. |
| 66 /// prefixes have the same length, but fix a different value for some rule, they | 65 /// |
| 67 /// may lead to different ways of splitting the suffix since that rule may | 66 /// We take these expanded states and add them to a work list sorted by cost. |
| 68 /// appear in a later chunk too. | 67 /// Since unsplit rules always have lower cost solutions, we know that no state |
| 69 /// | 68 /// we enqueue later will ever have a lower cost than the ones we already have |
| 70 /// But in many cases, the line prefixes are equivalent. For example, consider: | 69 /// enqueued. |
| 71 /// | 70 /// |
| 72 /// function( | 71 /// Then we keep pulling states off the work list and expanding them and adding |
| 73 /// first(arg, arg, arg, arg, arg, arg), | 72 /// the results back into the list. We do this until we hit a solution where |
| 74 /// second, | 73 /// all characters fit in the page. The first one we find will have the lowest |
| 75 /// third(arg, arg, arg, arg, arg, arg)); | 74 /// cost and we're done. |
| 76 /// | 75 /// |
| 77 /// The line splitter will try a number of different ways to split the argument | 76 /// We also keep running track of the best solution we've found so far that |
| 78 /// list to `first`. For each of those, it then has to try all of the different | 77 /// has the fewest overflow characters and the lowest cost. If no solution fits, |
| 79 /// ways it could split `third`. *However*, those two are unrelated. The way | 78 /// we'll use this one. |
| 80 /// you split `first` has no impact on how you split `third`. | 79 /// |
| 81 /// | 80 /// As a final escape hatch for pathologically nasty code, after trying some |
| 82 /// The precise way to state that is that as the line prefix grows *past* | 81 /// fixed maximum number of solve states, we just bail and return the best |
| 83 /// `first()` and its argument list, it contains *less* state. It discards the | 82 /// solution found so far. |
| 84 /// rule value it selected for the argument list rule since that rule doesn't | 83 /// |
| 85 /// appear anywhere in the suffix and can't affect it. Any expression nesting | 84 /// Even with the above algorithmic optimizations, complex code may still |
| 86 /// inside the argument list has been popped off by the time we get to `second`. | 85 /// require a lot of exploring to find an optimal solution. To make that fast, |
| 87 /// | 86 /// this code is carefully profiled and optimized. If you modify this, make |
| 88 /// We'll try every possible way to split `first()`'s argument list and then | 87 /// sure to test against the benchmark to ensure you don't regress performance. |
| 89 /// recursively solve the remainder of the line for each of those possible | |
| 90 /// splits. But each of those remainders will end up having identical prefixes. | |
| 91 /// For any given [LinePrefix], there is a single best solution for the suffix. | |
| 92 /// So, once we've found it, we can just reuse it the next time that same | |
| 93 /// prefix comes up. | |
| 94 /// | |
| 95 /// In other words, we add a memoization table, [_bestSplits]. It is a map from | |
| 96 /// [LinePrefix]es to [SplitSet]s. For each prefix, it stores the best set of | |
| 97 /// splits to use for the following suffix. With this, instead of exploring the | |
| 98 /// entire solution *tree*, which would be exponential, we just traverse the | |
| 99 /// solution *graph* where entire branches of the tree that are identical | |
| 100 /// collapse back together. | |
| 101 /// | |
| 102 /// The trick to making this work is storing the minimal amount of data in the | |
| 103 /// line prefix to *uniquely* identify a suffix while still collapsing as many | |
| 104 /// branches as possible. In practice, this means aggressively forgetting state | |
| 105 /// when the prefix grows. In particular, we do not need to keep track of | |
| 106 /// rule values we selected in the prefix if that rule does not affect anything | |
| 107 /// in the suffix. | |
| 108 /// | |
| 109 /// There are a couple of other smaller scale optimizations too. If we ever hit | |
| 110 /// a solution whose cost is zero, we know we won't do better, so we stop | |
| 111 /// searching. If we make it to the end of the chunk list and we haven't gone | |
| 112 /// past the end of the line, we know we don't need to split anywhere. | |
| 113 /// | |
| 114 /// But the memoization is the main thing that makes this tractable. | |
| 115 class LineSplitter { | 88 class LineSplitter { |
| 116 final LineWriter _writer; | 89 final LineWriter _writer; |
| 117 | 90 |
| 118 /// The list of chunks being split. | 91 /// The list of chunks being split. |
| 119 final List<Chunk> _chunks; | 92 final List<Chunk> _chunks; |
| 120 | 93 |
| 94 /// The set of soft rules whose values are being selected. |
| 95 final List<Rule> _rules; |
| 96 |
| 121 /// The number of characters of additional indentation to apply to each line. | 97 /// The number of characters of additional indentation to apply to each line. |
| 122 /// | 98 /// |
| 123 /// This is used when formatting blocks to get the output into the right | 99 /// This is used when formatting blocks to get the output into the right |
| 124 /// column based on where the block appears. | 100 /// column based on where the block appears. |
| 125 final int _blockIndentation; | 101 final int _blockIndentation; |
| 126 | 102 |
| 127 /// Memoization table for the best set of splits for the remainder of the | 103 /// The starting column of the first line. |
| 128 /// line following a given prefix. | 104 final int _firstLineIndent; |
| 129 final _bestSplits = <LinePrefix, SplitSet>{}; | 105 |
| 130 | 106 /// The list of solve states to explore further. |
| 131 /// The rules that appear in the first `n` chunks of the line where `n` is | 107 /// |
| 132 /// the index into the list. | 108 /// This is sorted lowest-cost first. This ensures that as soon as we find a |
| 133 final _prefixRules = <Set<Rule>>[]; | 109 /// solution that fits in the page, we know it will be the lowest cost one |
| 134 | 110 /// and can stop looking. |
| 135 /// The rules that appear after the first `n` chunks of the line where `n` is | 111 final _workList = new HeapPriorityQueue<SolveState>(); |
| 136 /// the index into the list. | 112 |
| 137 final _suffixRules = <Set<Rule>>[]; | 113 /// The lowest cost solution found so far. |
| 114 SolveState _bestSolution; |
| 138 | 115 |
| 139 /// Creates a new splitter for [_writer] that tries to fit [chunks] into the | 116 /// Creates a new splitter for [_writer] that tries to fit [chunks] into the |
| 140 /// page width. | 117 /// page width. |
| 141 LineSplitter(this._writer, this._chunks, this._blockIndentation); | 118 LineSplitter(this._writer, List<Chunk> chunks, int blockIndentation, |
| 142 | 119 int firstLineIndent, |
| 143 /// Convert the line to a [String] representation. | 120 {bool flushLeft: false}) |
| 144 /// | 121 : _chunks = chunks, |
| 145 /// It will determine how best to split it into multiple lines of output and | 122 // Collect the set of soft rules that we need to select values for. |
| 146 /// write the result to [writer]. | 123 _rules = chunks |
| 124 .map((chunk) => chunk.rule) |
| 125 .where((rule) => rule != null && rule is! HardSplitRule) |
| 126 .toSet() |
| 127 .toList(growable: false), |
| 128 _blockIndentation = blockIndentation, |
| 129 _firstLineIndent = flushLeft ? 0 : firstLineIndent + blockIndentation { |
| 130 // Store the rule's index in the rule so we can get from a chunk to a rule |
| 131 // index quickly. |
| 132 for (var i = 0; i < _rules.length; i++) { |
| 133 _rules[i].index = i; |
| 134 } |
| 135 } |
| 136 |
| 137 /// Determine the best way to split the chunks into lines that fit in the |
| 138 /// page, if possible. |
| 139 /// |
| 140 /// Returns a [SplitSet] that defines where each split occurs and the |
| 141 /// indentation of each line. |
| 147 /// | 142 /// |
| 148 /// [firstLineIndent] is the number of characters of whitespace to prefix the | 143 /// [firstLineIndent] is the number of characters of whitespace to prefix the |
| 149 /// first line of output with. | 144 /// first line of output with. |
| 150 SplitSolution apply(int firstLineIndent, {bool flushLeft}) { | 145 SplitSet apply() { |
| 146 // Start with a completely unbound, unsplit solution. |
| 147 _workList.add(new SolveState(this, new RuleSet(_rules.length))); |
| 148 |
| 149 var attempts = 0; |
| 150 while (!_workList.isEmpty) { |
| 151 var state = _workList.removeFirst(); |
| 152 |
| 153 if (state.isBetterThan(_bestSolution)) { |
| 154 _bestSolution = state; |
| 155 |
| 156 // Since we sort solutions by cost the first solution we find that |
| 157 // fits is the winner. |
| 158 if (_bestSolution.overflowChars == 0) break; |
| 159 } |
| 160 |
| 161 if (debug.traceSplitter) { |
| 162 var best = state == _bestSolution ? " (best)" : ""; |
| 163 debug.log("$state$best"); |
| 164 debug.dumpLines(_chunks, _firstLineIndent, state.splits); |
| 165 debug.log(); |
| 166 } |
| 167 |
| 168 if (attempts++ > _maxAttempts) break; |
| 169 |
| 170 // Try bumping the rule values for rules whose chunks are on long lines. |
| 171 state.expand(); |
| 172 } |
| 173 |
| 151 if (debug.traceSplitter) { | 174 if (debug.traceSplitter) { |
| 152 debug.log(debug.green("\nSplitting:")); | 175 debug.log("$_bestSolution (winner)"); |
| 153 debug.dumpChunks(0, _chunks); | 176 debug.dumpLines(_chunks, _firstLineIndent, _bestSolution.splits); |
| 154 debug.log(); | 177 debug.log(); |
| 155 } | 178 } |
| 156 | 179 |
| 157 // Ignore the trailing rule on the last chunk since it isn't used for | 180 return _bestSolution.splits; |
| 158 // anything. | 181 } |
| 159 var ruleChunks = _chunks.take(_chunks.length - 1); | 182 } |
| 160 | 183 |
| 161 // Pre-calculate the set of rules appear before and after each length. We | 184 /// A possibly incomplete solution in the line splitting search space. |
| 162 // use these frequently when creating [LinePrefix]es and they only depend | 185 /// |
| 163 // on the length, so we can cache them up front. | 186 /// A single [SolveState] binds some subset of the rules to values while |
| 164 for (var i = 0; i < _chunks.length; i++) { | 187 /// leaving the rest unbound. If every rule is bound, the solve state describes |
| 165 _prefixRules.add(ruleChunks.take(i).map((chunk) => chunk.rule).toSet()); | 188 /// a complete solution to the line splitting problem. Even if rules are |
| 166 _suffixRules.add(ruleChunks.skip(i).map((chunk) => chunk.rule).toSet()); | 189 /// unbound, a state can also usually be used as a solution by treating all |
| 167 } | 190 /// unbound rules as unsplit. (The usually is because a state that constrains |
| 168 | 191 /// an unbound rule to split can't be used with that rule unsplit.) |
| 169 var prefix = new LinePrefix(firstLineIndent + _blockIndentation, | 192 /// |
| 170 flushLeft: flushLeft); | 193 /// From a given solve state, we can explore the search tree to more refined |
| 171 var solution = new SplitSolution(prefix); | 194 /// solve states by producing new ones that add more bound rules to the current |
| 172 _tryChunkRuleValues(solution, prefix); | 195 /// state. |
| 173 return solution; | 196 class SolveState implements Comparable<SolveState> { |
| 174 } | 197 final LineSplitter _splitter; |
| 175 | 198 final RuleSet _ruleValues; |
| 176 /// Finds the best set of splits to apply to the remainder of the chunks | 199 |
| 177 /// following [prefix]. | 200 /// The unbound rules in this state that can be bound to produce new more |
| 178 /// | 201 /// refined states. |
| 179 /// This can only be called for a suffix that begins a new line. (In other | 202 /// |
| 180 /// words, the last chunk in the prefix cannot be unsplit.) | 203 /// Keeping this set small is the key to make the entire line splitter |
| 181 SplitSet _findBestSplits(LinePrefix prefix) { | 204 /// perform well. If we consider too make rules at each state, our |
| 182 // Use the memoized result if we have it. | 205 /// exploration of the solution space is too branchy and we waste time on |
| 183 if (_bestSplits.containsKey(prefix)) { | 206 /// dead end solutions. |
| 184 if (debug.traceSplitter) { | 207 /// |
| 185 debug.log("memoized splits for $prefix = ${_bestSplits[prefix]}"); | 208 /// Here is the key insight. The line splitter treats any unbound rule as |
| 186 } | 209 /// being unsplit. This means refining a solution always means taking a rule |
| 187 return _bestSplits[prefix]; | 210 /// that is unsplit and making it split. That monotonically increases the |
| 188 } | 211 /// cost, but may help fit the solution inside the page. |
| 189 | 212 /// |
| 190 if (debug.traceSplitter) { | 213 /// We want to keep the cost low, so the only reason to consider making a |
| 191 debug.log("find splits for $prefix"); | 214 /// rule split is if it reduces an overflowing line. It's also the case that |
| 192 debug.indent(); | 215 /// splitting an earlier rule will often reshuffle the rest of the line. |
| 193 } | 216 /// |
| 194 | 217 /// Taking that into account, the only rules we consider binding to extend a |
| 195 var solution = new SplitSolution(prefix); | 218 /// solve state are *unbound rules inside the first line that is overflowing*. |
| 196 _tryChunkRuleValues(solution, prefix); | 219 /// Even if a line has dozens of rules, this generally keeps the branching |
| 197 | 220 /// down to a few. It also means rules inside lines that already fit are |
| 198 if (debug.traceSplitter) { | 221 /// never touched. |
| 199 debug.unindent(); | 222 /// |
| 200 debug.log("best splits for $prefix = ${solution.splits}"); | 223 /// There is one other set of rules that go in here. Sometimes a bound rule |
| 201 } | 224 /// in the solve state constrains some other unbound rule to split. In that |
| 202 | 225 /// case, we also consider that active so we know to not leave it at zero. |
| 203 return _bestSplits[prefix] = solution.splits; | 226 final _liveRules = new Set<Rule>(); |
| 204 } | 227 |
| 205 | 228 /// The set of splits chosen for this state. |
| 206 /// Updates [solution] with the best rule value selection for the chunk | 229 SplitSet get splits => _splits; |
| 207 /// immediately following [prefix]. | 230 SplitSet _splits; |
| 208 void _tryChunkRuleValues(SplitSolution solution, LinePrefix prefix) { | 231 |
| 209 // If we made it to the end, this prefix can be solved without splitting | 232 /// The number of characters that do not fit inside the page with this set of |
| 210 // any chunks. | 233 /// splits. |
| 211 if (prefix.length == _chunks.length - 1) { | 234 int get overflowChars => _overflowChars; |
| 212 solution.update(this, new SplitSet()); | 235 int _overflowChars; |
| 213 return; | 236 |
| 214 } | 237 /// Whether we can treat this state as a complete solution by leaving its |
| 215 | 238 /// unbound rules unsplit. |
| 216 var chunk = _chunks[prefix.length]; | 239 /// |
| 217 | 240 /// This is generally true but will be false if the state contains any |
| 218 // See if we've already selected a value for the rule. | 241 /// unbound rules that are constrained to not be zero by other bound rules. |
| 219 var value = prefix.ruleValues[chunk.rule]; | 242 /// This avoids picking a solution that leaves those rules at zero when they |
| 220 | 243 /// aren't allowed to be. |
| 221 if (value == null) { | 244 bool _isComplete = true; |
| 222 // No, so try every possible value for the rule. | 245 |
| 223 for (value = 0; value < chunk.rule.numValues; value++) { | 246 SolveState(this._splitter, this._ruleValues) { |
| 224 _tryRuleValue(solution, prefix, value); | 247 _calculateSplits(); |
| 225 } | 248 _calculateCost(); |
| 226 } else if (value == -1) { | 249 } |
| 227 // A -1 "value" means, "any non-zero value". In other words, the rule has | 250 |
| 228 // to split somehow, but can split however it chooses. | 251 /// Orders this state relative to [other]. |
| 229 for (value = 1; value < chunk.rule.numValues; value++) { | 252 /// |
| 230 _tryRuleValue(solution, prefix, value); | 253 /// This is the best-first ordering that the [LineSplitter] uses in its |
| 231 } | 254 /// worklist. It prefers cheaper states even if they overflow because this |
| 232 } else { | 255 /// ensures it finds the best solution first as soon as it finds one that |
| 233 // Otherwise, it's constrained to a single value, so use it. | 256 /// fits in the page so it can early out. |
| 234 _tryRuleValue(solution, prefix, value); | 257 int compareTo(SolveState other) { |
| 235 } | 258 // TODO(rnystrom): It may be worth sorting by the estimated lowest number |
| 236 } | 259 // of overflow characters first. That doesn't help in cases where there is |
| 237 | 260 // a solution that fits, but may help in corner cases where there is no |
| 238 /// Updates [solution] with the best solution that can be found by setting | 261 // fitting solution. |
| 239 /// the chunk after [prefix]'s rule to [value]. | 262 |
| 240 void _tryRuleValue(SplitSolution solution, LinePrefix prefix, int value) { | 263 if (splits.cost != other.splits.cost) { |
| 241 // If we already have an unbeatable solution, don't bother trying this one. | 264 return splits.cost.compareTo(other.splits.cost); |
| 242 if (solution.cost == 0) return; | 265 } |
| 243 | 266 |
| 244 var chunk = _chunks[prefix.length]; | 267 if (overflowChars != other.overflowChars) { |
| 245 | 268 return overflowChars.compareTo(other.overflowChars); |
| 246 if (chunk.rule.isSplit(value, chunk)) { | 269 } |
| 247 // The chunk is splitting in an expression, so try all of the possible | 270 |
| 248 // nesting combinations. | 271 // Distinguish states based on the rule values just so that states with the |
| 249 var ruleValues = _advancePrefix(prefix, value); | 272 // same cost range but different rule values don't get considered identical |
| 250 var longerPrefixes = prefix.split(chunk, _blockIndentation, ruleValues); | 273 // by HeapPriorityQueue. |
| 251 for (var longerPrefix in longerPrefixes) { | 274 for (var rule in _splitter._rules) { |
| 252 _tryLongerPrefix(solution, prefix, longerPrefix); | 275 var value = _ruleValues.getValue(rule); |
| 253 | 276 var otherValue = other._ruleValues.getValue(rule); |
| 254 // If we find an unbeatable solution, don't try any more. | 277 |
| 255 if (solution.cost == 0) break; | 278 if (value != otherValue) return value.compareTo(otherValue); |
| 256 } | 279 } |
| 257 } else { | 280 |
| 258 // We didn't split here, so add this chunk and its rule value to the | 281 // If we get here, this state is identical to [other]. |
| 259 // prefix and continue on to the next. | 282 return 0; |
| 260 var extended = prefix.extend(_advancePrefix(prefix, value)); | 283 } |
| 261 _tryChunkRuleValues(solution, extended); | 284 |
| 262 } | 285 /// Returns `true` if this state is a better solution to use as the final |
| 263 } | 286 /// result than [other]. |
| 264 | 287 bool isBetterThan(SolveState other) { |
| 265 /// Updates [solution] with the solution for [prefix] assuming it uses | 288 // If this state contains an unbound rule that we know can't be left |
| 266 /// [longerPrefix] for the next chunk. | 289 // unsplit, we can't pick this as a solution. |
| 267 void _tryLongerPrefix( | 290 if (!_isComplete) return false; |
| 268 SplitSolution solution, LinePrefix prefix, LinePrefix longerPrefix) { | 291 |
| 269 var remaining = _findBestSplits(longerPrefix); | 292 // Anything is better than nothing. |
| 270 | 293 if (other == null) return true; |
| 271 // If it wasn't possible to split the suffix given this nesting stack, | 294 |
| 272 // skip it. | 295 // Prefer the solution that fits the most in the page. |
| 273 if (remaining == null) return; | 296 if (overflowChars != other.overflowChars) { |
| 274 | 297 return overflowChars < other.overflowChars; |
| 275 solution.update(this, remaining.add(prefix.length, longerPrefix.column)); | 298 } |
| 276 } | 299 |
| 277 | 300 // Otherwise, prefer the best cost. |
| 278 /// Determines the set of rule values for a new [LinePrefix] one chunk longer | 301 return splits.cost < other.splits.cost; |
| 279 /// than [prefix] whose rule on the new last chunk has [value]. | 302 } |
| 280 /// | 303 |
| 281 /// Returns a map of [Rule]s to values for those rules for the values that | 304 /// Enqueues more solve states to consider based on this one. |
| 282 /// span the prefix and suffix of the [LinePrefix]. | 305 /// |
| 283 Map<Rule, int> _advancePrefix(LinePrefix prefix, int value) { | 306 /// For each unbound rule in this state that occurred in the first long line, |
| 284 // Get the rules that appear in both in and after the new prefix. These are | 307 /// enqueue solve states that bind that rule to each value it can have and |
| 285 // the rules that already have values that the suffix needs to honor. | 308 /// bind all previous rules to zero. (In other words, try all subsolutions |
| 286 var prefixRules = _prefixRules[prefix.length + 1]; | 309 /// where that rule becomes the first new rule to split at.) |
| 287 var suffixRules = _suffixRules[prefix.length + 1]; | 310 void expand() { |
| 288 | 311 var unsplitRules = _ruleValues.clone(); |
| 289 var nextRule = _chunks[prefix.length].rule; | 312 |
| 290 var updatedValues = {}; | 313 // Walk down the rules looking for unbound ones to try. |
| 291 | 314 var triedRules = 0; |
| 292 for (var prefixRule in prefixRules) { | 315 for (var rule in _splitter._rules) { |
| 293 var ruleValue = | 316 if (_liveRules.contains(rule)) { |
| 294 prefixRule == nextRule ? value : prefix.ruleValues[prefixRule]; | 317 // We found one worth trying, so try all of its values. |
| 295 | 318 for (var value = 1; value < rule.numValues; value++) { |
| 296 if (suffixRules.contains(prefixRule)) { | 319 var boundRules = unsplitRules.clone(); |
| 297 // If the same rule appears in both the prefix and suffix, then preserve | 320 |
| 298 // its exact value. | 321 var mustSplitRules; |
| 299 updatedValues[prefixRule] = ruleValue; | 322 var valid = boundRules.tryBind(_splitter._rules, rule, value, (rule) { |
| 300 } | 323 if (mustSplitRules == null) mustSplitRules = []; |
| 301 | 324 mustSplitRules.add(rule); |
| 302 // If we haven't specified any value for this rule in the prefix, it | 325 }); |
| 303 // doesn't place any constraint on the suffix. | 326 |
| 304 if (ruleValue == null) continue; | 327 // Make sure we don't violate the constraints of the bound rules. |
| 305 | 328 if (!valid) continue; |
| 306 // Enforce the constraints between rules. | 329 |
| 307 for (var suffixRule in suffixRules) { | 330 var state = new SolveState(_splitter, boundRules); |
| 308 if (suffixRule == prefixRule) continue; | 331 |
| 309 | 332 // If some unbound rules are constrained to split, remember that. |
| 310 // See if the prefix rule's value constrains any values in the suffix. | 333 if (mustSplitRules != null) { |
| 311 var value = prefixRule.constrain(ruleValue, suffixRule); | 334 state._isComplete = false; |
| 312 | 335 state._liveRules.addAll(mustSplitRules); |
| 313 // Also consider the backwards case, where a later rule in the suffix | 336 } |
| 314 // constrains a rule in the prefix. | 337 |
| 315 if (value == null) { | 338 _splitter._workList.add(state); |
| 316 value = suffixRule.reverseConstrain(ruleValue, prefixRule); | |
| 317 } | 339 } |
| 318 | 340 |
| 319 if (value != null) { | 341 // Stop once we've tried all of the ones we can. |
| 320 updatedValues[prefixRule] = ruleValue; | 342 if (++triedRules == _liveRules.length) break; |
| 321 updatedValues[suffixRule] = value; | 343 } |
| 344 |
| 345 // Fill in previous unbound rules with zero. |
| 346 if (!_ruleValues.contains(rule)) { |
| 347 // Pass a dummy callback because zero will never fail. (If it would |
| 348 // have, that rule would already be bound to some other value.) |
| 349 if (!unsplitRules.tryBind(_splitter._rules, rule, 0, (_) {})) { |
| 350 break; |
| 322 } | 351 } |
| 323 } | 352 } |
| 324 } | 353 } |
| 325 | 354 } |
| 326 return updatedValues; | 355 |
| 356 /// Calculates the [SplitSet] for this solve state, assuming any unbound |
| 357 /// rules are set to zero. |
| 358 void _calculateSplits() { |
| 359 // Figure out which expression nesting levels got split and need to be |
| 360 // assigned columns. |
| 361 var usedNestingLevels = new Set(); |
| 362 for (var i = 0; i < _splitter._chunks.length - 1; i++) { |
| 363 var chunk = _splitter._chunks[i]; |
| 364 if (chunk.rule.isSplit(_getValue(chunk.rule), chunk)) { |
| 365 usedNestingLevels.add(chunk.nesting); |
| 366 chunk.nesting.clearTotalUsedIndent(); |
| 367 } |
| 368 } |
| 369 |
| 370 for (var nesting in usedNestingLevels) { |
| 371 nesting.refreshTotalUsedIndent(usedNestingLevels); |
| 372 } |
| 373 |
| 374 _splits = new SplitSet(_splitter._chunks.length); |
| 375 for (var i = 0; i < _splitter._chunks.length - 1; i++) { |
| 376 var chunk = _splitter._chunks[i]; |
| 377 if (chunk.rule.isSplit(_getValue(chunk.rule), chunk)) { |
| 378 var indent = 0; |
| 379 if (!chunk.flushLeftAfter) { |
| 380 // Add in the chunk's indent. |
| 381 indent = _splitter._blockIndentation + chunk.indent; |
| 382 |
| 383 // And any expression nesting. |
| 384 indent += chunk.nesting.totalUsedIndent; |
| 385 } |
| 386 |
| 387 _splits.add(i, indent); |
| 388 } |
| 389 } |
| 390 } |
| 391 |
| 392 /// Gets the value to use for [rule], either the bound value or `0` if it |
| 393 /// isn't bound. |
| 394 int _getValue(Rule rule) { |
| 395 if (rule is HardSplitRule) return 0; |
| 396 |
| 397 return _ruleValues.getValue(rule); |
| 327 } | 398 } |
| 328 | 399 |
| 329 /// Evaluates the cost (i.e. the relative "badness") of splitting the line | 400 /// Evaluates the cost (i.e. the relative "badness") of splitting the line |
| 330 /// into [lines] physical lines based on the current set of rules. | 401 /// into [lines] physical lines based on the current set of rules. |
| 331 int _evaluateCost(LinePrefix prefix, SplitSet splits) { | 402 void _calculateCost() { |
| 332 assert(splits != null); | 403 assert(_splits != null); |
| 333 | 404 |
| 334 // Calculate the length of each line and apply the cost of any spans that | 405 // Calculate the length of each line and apply the cost of any spans that |
| 335 // get split. | 406 // get split. |
| 336 var cost = 0; | 407 var cost = 0; |
| 337 var length = prefix.column; | 408 _overflowChars = 0; |
| 338 | 409 |
| 339 var splitRules = new Set(); | 410 var length = _splitter._firstLineIndent; |
| 340 | 411 |
| 341 endLine() { | 412 // The unbound rules in use by the current line. This will be null after |
| 342 // Punish lines that went over the length. We don't rule these out | 413 // the first long line has completed. |
| 343 // completely because it may be that the only solution still goes over | 414 var currentLineRules = []; |
| 344 // (for example with long string literals). | 415 |
| 345 if (length > _writer.pageWidth) { | 416 endLine(int end) { |
| 346 cost += (length - _writer.pageWidth) * Cost.overflowChar; | 417 // Track lines that went over the length. It is only rules contained in |
| 347 } | 418 // long lines that we may want to split. |
| 348 } | 419 if (length > _splitter._writer.pageWidth) { |
| 349 | 420 _overflowChars += length - _splitter._writer.pageWidth; |
| 421 |
| 422 // Only try rules that are in the first long line, since we know at |
| 423 // least one of them *will* be split. |
| 424 if (currentLineRules != null && currentLineRules.isNotEmpty) { |
| 425 _liveRules.addAll(currentLineRules); |
| 426 currentLineRules = null; |
| 427 } |
| 428 } else { |
| 429 // The line fit, so don't keep track of its rules. |
| 430 if (currentLineRules != null) { |
| 431 currentLineRules.clear(); |
| 432 } |
| 433 } |
| 434 } |
| 435 |
| 350 // The set of spans that contain chunks that ended up splitting. We store | 436 // The set of spans that contain chunks that ended up splitting. We store |
| 351 // these in a set so a span's cost doesn't get double-counted if more than | 437 // these in a set so a span's cost doesn't get double-counted if more than |
| 352 // one split occurs in it. | 438 // one split occurs in it. |
| 353 var splitSpans = new Set(); | 439 var splitSpans = new Set(); |
| 354 | 440 |
| 355 for (var i = prefix.length; i < _chunks.length; i++) { | 441 for (var i = 0; i < _splitter._chunks.length; i++) { |
| 356 var chunk = _chunks[i]; | 442 var chunk = _splitter._chunks[i]; |
| 357 | 443 |
| 358 length += chunk.text.length; | 444 length += chunk.text.length; |
| 359 | 445 |
| 360 if (i < _chunks.length - 1) { | 446 // Ignore the split after the last chunk. |
| 361 if (splits.shouldSplitAt(i)) { | 447 if (i == _splitter._chunks.length - 1) break; |
| 362 endLine(); | |
| 363 | 448 |
| 364 splitSpans.addAll(chunk.spans); | 449 if (_splits.shouldSplitAt(i)) { |
| 450 endLine(i); |
| 365 | 451 |
| 366 if (chunk.rule != null && !splitRules.contains(chunk.rule)) { | 452 splitSpans.addAll(chunk.spans); |
| 367 // Don't double-count rules if multiple splits share the same | |
| 368 // rule. | |
| 369 splitRules.add(chunk.rule); | |
| 370 cost += chunk.rule.cost; | |
| 371 } | |
| 372 | 453 |
| 373 // Include the cost of the nested block. | 454 // Include the cost of the nested block. |
| 374 if (chunk.blockChunks.isNotEmpty) { | 455 if (chunk.blockChunks.isNotEmpty) { |
| 375 cost += _writer.formatBlock(chunk, splits.getColumn(i)).cost; | 456 cost += |
| 376 } | 457 _splitter._writer.formatBlock(chunk, _splits.getColumn(i)).cost; |
| 458 } |
| 377 | 459 |
| 378 // Start the new line. | 460 // Start the new line. |
| 379 length = splits.getColumn(i); | 461 length = _splits.getColumn(i); |
| 380 } else { | 462 } else { |
| 381 if (chunk.spaceWhenUnsplit) length++; | 463 if (chunk.spaceWhenUnsplit) length++; |
| 382 | 464 |
| 383 // Include the nested block inline, if any. | 465 // Include the nested block inline, if any. |
| 384 length += chunk.unsplitBlockLength; | 466 length += chunk.unsplitBlockLength; |
| 467 |
| 468 // If we might be in the first overly long line, keep track of any |
| 469 // unbound rules we encounter. These are ones that we'll want to try to |
| 470 // bind to shorten the long line. |
| 471 if (currentLineRules != null && |
| 472 chunk.rule != null && |
| 473 !chunk.isHardSplit && |
| 474 !_ruleValues.contains(chunk.rule)) { |
| 475 currentLineRules.add(chunk.rule); |
| 385 } | 476 } |
| 386 } | 477 } |
| 387 } | 478 } |
| 388 | 479 |
| 480 // Add the costs for the rules that split. |
| 481 _ruleValues.forEach(_splitter._rules, (rule, value) { |
| 482 // A rule may be bound to zero if another rule constrains it to not split. |
| 483 if (value != 0) cost += rule.cost; |
| 484 }); |
| 485 |
| 389 // Add the costs for the spans containing splits. | 486 // Add the costs for the spans containing splits. |
| 390 for (var span in splitSpans) cost += span.cost; | 487 for (var span in splitSpans) cost += span.cost; |
| 391 | 488 |
| 392 // Finish the last line. | 489 // Finish the last line. |
| 393 endLine(); | 490 endLine(_splitter._chunks.length); |
| 394 | 491 |
| 395 return cost; | 492 _splits.setCost(cost); |
| 493 } |
| 494 |
| 495 String toString() { |
| 496 var buffer = new StringBuffer(); |
| 497 |
| 498 buffer.writeAll( |
| 499 _splitter._rules.map((rule) { |
| 500 var valueLength = "${rule.fullySplitValue}".length; |
| 501 |
| 502 var value = "?"; |
| 503 if (_ruleValues.contains(rule)) { |
| 504 value = "${_ruleValues.getValue(rule)}"; |
| 505 } |
| 506 |
| 507 value = value.padLeft(valueLength); |
| 508 if (_liveRules.contains(rule)) { |
| 509 value = debug.bold(value); |
| 510 } else { |
| 511 value = debug.gray(value); |
| 512 } |
| 513 |
| 514 return value; |
| 515 }), |
| 516 " "); |
| 517 |
| 518 buffer.write(" \$${splits.cost}"); |
| 519 |
| 520 if (overflowChars > 0) buffer.write(" (${overflowChars} over)"); |
| 521 if (!_isComplete) buffer.write(" (incomplete)"); |
| 522 if (splits == null) buffer.write(" invalid"); |
| 523 |
| 524 return buffer.toString(); |
| 396 } | 525 } |
| 397 } | 526 } |
| 398 | |
| 399 /// Keeps track of the best set of splits found so far for a suffix of some | |
| 400 /// prefix. | |
| 401 class SplitSolution { | |
| 402 /// The prefix whose suffix we are finding a solution for. | |
| 403 final LinePrefix _prefix; | |
| 404 | |
| 405 /// The best set of splits currently found. | |
| 406 SplitSet get splits => _bestSplits; | |
| 407 SplitSet _bestSplits; | |
| 408 | |
| 409 /// The lowest cost currently found. | |
| 410 int get cost => _lowestCost; | |
| 411 int _lowestCost; | |
| 412 | |
| 413 /// Whether a solution that fits within a page has been found yet. | |
| 414 bool get isAdequate => _lowestCost != null && _lowestCost < Cost.overflowChar; | |
| 415 | |
| 416 SplitSolution(this._prefix); | |
| 417 | |
| 418 /// Compares [splits] to the best solution found so far and keeps it if it's | |
| 419 /// better. | |
| 420 void update(LineSplitter splitter, SplitSet splits) { | |
| 421 var cost = splitter._evaluateCost(_prefix, splits); | |
| 422 | |
| 423 if (_lowestCost == null || cost < _lowestCost) { | |
| 424 _bestSplits = splits; | |
| 425 _lowestCost = cost; | |
| 426 } | |
| 427 | |
| 428 if (debug.traceSplitter) { | |
| 429 var best = _bestSplits == splits ? " (best)" : ""; | |
| 430 debug.log(debug.gray("$_prefix $splits \$$cost$best")); | |
| 431 debug.dumpLines(splitter._chunks, _prefix, splits); | |
| 432 debug.log(); | |
| 433 } | |
| 434 } | |
| 435 } | |
| 436 | |
| 437 /// An immutable, persistent set of enabled split [Chunk]s. | |
| 438 /// | |
| 439 /// For each chunk, this tracks if it has been split and, if so, what the | |
| 440 /// chosen column is for the following line. | |
| 441 /// | |
| 442 /// Internally, this uses a sparse parallel list where each element corresponds | |
| 443 /// to the column of the chunk at that index in the chunk list, or `null` if | |
| 444 /// there is no active split there. This had about a 10% perf improvement over | |
| 445 /// using a [Set] of splits or a persistent linked list of split index/indent | |
| 446 /// pairs. | |
| 447 class SplitSet { | |
| 448 List<int> _columns; | |
| 449 | |
| 450 /// Creates a new empty split set. | |
| 451 SplitSet() : this._(const []); | |
| 452 | |
| 453 SplitSet._(this._columns); | |
| 454 | |
| 455 /// Returns a new [SplitSet] containing the union of this one and the split | |
| 456 /// at [index] with next line starting at [column]. | |
| 457 SplitSet add(int index, int column) { | |
| 458 var newIndents = new List(math.max(index + 1, _columns.length)); | |
| 459 newIndents.setAll(0, _columns); | |
| 460 newIndents[index] = column; | |
| 461 | |
| 462 return new SplitSet._(newIndents); | |
| 463 } | |
| 464 | |
| 465 /// Returns `true` if the chunk at [splitIndex] should be split. | |
| 466 bool shouldSplitAt(int index) => | |
| 467 index < _columns.length && _columns[index] != null; | |
| 468 | |
| 469 /// Gets the zero-based starting column for the chunk at [index]. | |
| 470 int getColumn(int index) => _columns[index]; | |
| 471 | |
| 472 String toString() { | |
| 473 var result = []; | |
| 474 for (var i = 0; i < _columns.length; i++) { | |
| 475 if (_columns[i] != null) { | |
| 476 result.add("$i:${_columns[i]}"); | |
| 477 } | |
| 478 } | |
| 479 | |
| 480 return result.join(" "); | |
| 481 } | |
| 482 } | |
| OLD | NEW |