|
|
DescriptionImprove phi hinting heuristics.
The basic intention is to try to remove unnecessary moves caused by
hints in otherwise empty blocks. Roughly:
Before After
-----------------------------------------------------------
B0: add x1, ... B0: add x1, ...
b.ne B2 b.eq B3
B1: mov x0, x1 B1: [empty]
b B3
B2: add x0, x1, ... B2: add x1, x1, ...
B3: phi(B1,B2) in x0 B3: phi(B0,B1) in x1
Hinting is also improved in cases where one of the inputs is already
allocated. This occurs commonly on architectures with instructions which
write into fixed registers, for example.
BUG=
Committed: https://crrev.com/2e360cd83f60ecfa9eaa6d0648450ef937a15a5f
Cr-Commit-Position: refs/heads/master@{#40498}
Patch Set 1 #
Total comments: 10
Patch Set 2 : Rebase only. #Patch Set 3 : Fix regressions on the compile-time benchmarks. #
Total comments: 11
Patch Set 4 : Review fixes. #Messages
Total messages: 24 (4 generated)
jacob.bramley@arm.com changed reviewers: + bmeurer@chromium.org, danno@chromium.org, dcarney@chromium.org, jarin@chromium.org, mtrofin@chromium.org
Looks generally good, I'm curious about the perf impact (hence started the try perf runs), also on the comment on line 2264. https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... src/compiler/register-allocator.cc:2208: // jump). This helps to avoid otherwise useless move + jump sequences. Nit: the moves wouldn't be useless, I think you're trying to say that if the hint comes from there and we succeed in eliding the move ("the hint is taken"), then maybe the jump threader can elide the jump, too? If so, mind expanding the sentence for the benefit of the reader? https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... src/compiler/register-allocator.cc:2228: // Phis are assigned in an END move after the last instruction in each I think you meant to say "in the END position of the last instruction". https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... src/compiler/register-allocator.cc:2250: predecessor_hint_preference += kNotDeferredBlockPreference; Nit: replace with bitwise or, I think that'd make it more clear. https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... src/compiler/register-allocator.cc:2264: // from an allocated (or explicit) operand. This happens if the previous instruction (the one before predecessor_instr) happens to output to a fixed register, which then participates in the phi. Is this a common pattern? Or could you comment why not walk up the instruction sequence, finding the def? On that, I think you could avoid the search for the def through the moves by fetching the live range and picking its first use position.
Performance depends on the platform. For ARM and ARM64, we get +3% on a couple of benchmarks (depending on exactly which core). Small cores seem to benefit more, but the patch seems to be beneficial regardless. https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... src/compiler/register-allocator.cc:2208: // jump). This helps to avoid otherwise useless move + jump sequences. On 2016/07/04 16:32:15, Mircea Trofin wrote: > Nit: the moves wouldn't be useless, I think you're trying to say that if the > hint comes from there and we succeed in eliding the move ("the hint is taken"), > then maybe the jump threader can elide the jump, too? > > If so, mind expanding the sentence for the benefit of the reader? Yes, that's what I meant. I'll re-word it. https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... src/compiler/register-allocator.cc:2228: // Phis are assigned in an END move after the last instruction in each On 2016/07/04 16:32:15, Mircea Trofin wrote: > I think you meant to say "in the END position of the last instruction". Acknowledged. https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... src/compiler/register-allocator.cc:2250: predecessor_hint_preference += kNotDeferredBlockPreference; On 2016/07/04 16:32:15, Mircea Trofin wrote: > Nit: replace with bitwise or, I think that'd make it more clear. I'd half planned to support an arithmetic score, though it didn't turn out that way in the end. (It's co-incidental that the values are powers of two.) Nonetheless, I thought that addition would be more flexible for future modifications. https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... src/compiler/register-allocator.cc:2264: // from an allocated (or explicit) operand. On 2016/07/04 16:32:15, Mircea Trofin wrote: > This happens if the previous instruction (the one before predecessor_instr) > happens to output to a fixed > register, which then participates in the phi. Is this a common pattern? Reasonably common. Without this, the empty-block check causes regressions in the cases you described. > Or could you comment why not walk > up the instruction sequence, finding the def? I actually considered it, but I thought it would be too expensive to be worthwhile in practice. (I examined a lot of benchmarks and didn't see it occur, yet we'd do the search for every case that would otherwise not hit.) I think that the negative case is the common case, and needs to be the fast. > On that, I think you could avoid the search for the def through the moves by > fetching the live range and picking > its first use position. If that is likely to be fast in the negative case, it would probably be wise. I didn't realise it was feasible here. Otherwise, I'll add a comment to explain why we don't look beyond the last instruction.
https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... src/compiler/register-allocator.cc:2250: predecessor_hint_preference += kNotDeferredBlockPreference; On 2016/07/04 16:58:54, jbramley wrote: > On 2016/07/04 16:32:15, Mircea Trofin wrote: > > Nit: replace with bitwise or, I think that'd make it more clear. > > I'd half planned to support an arithmetic score, though it didn't turn out that > way in the end. (It's co-incidental that the values are powers of two.) > Nonetheless, I thought that addition would be more flexible for future > modifications. Oh - the fact that they are powers of 2 threw me off. Mind adding a comment saying "they just happen they are" or something like that, to avoid confusion later? Thanks! https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc... src/compiler/register-allocator.cc:2264: // from an allocated (or explicit) operand. On 2016/07/04 16:58:54, jbramley wrote: > On 2016/07/04 16:32:15, Mircea Trofin wrote: > > This happens if the previous instruction (the one before predecessor_instr) > > happens to output to a fixed > > register, which then participates in the phi. Is this a common pattern? > > Reasonably common. Without this, the empty-block check causes regressions in the > cases you described. > > > Or could you comment why not walk > > up the instruction sequence, finding the def? > > I actually considered it, but I thought it would be too expensive to be > worthwhile in practice. (I examined a lot of benchmarks and didn't see it occur, > yet we'd do the search for every case that would otherwise not hit.) I think > that the negative case is the common case, and needs to be the fast. > > > On that, I think you could avoid the search for the def through the moves by > > fetching the live range and picking > > its first use position. > > If that is likely to be fast in the negative case, it would probably be wise. I > didn't realise it was feasible here. Otherwise, I'll add a comment to explain > why we don't look beyond the last instruction. It may be worth investigating. The compile benchmark shows a 12-13% regression. What I'm talking about is O(1): TopLevelLiveRange* top =data()->live_ranges()[whatever the vreg] top->first_pos() (etc, etc)
On 2016/07/04 21:20:35, Mircea Trofin wrote: > It may be worth investigating. The compile benchmark shows a 12-13% regression. Oh, wow. Yes, I need to look into that. How can I get hold of the benchmark? I don't see any regression like that on the benchmarks that I have run. I tried to measure the compile-time cost using --turbo-stats, but I didn't find any regressions there either. I also can't see the try perf runs, at least not linked from this review. Have I missed something obvious?
On 2016/07/04 21:20:35, Mircea Trofin wrote: > > > On that, I think you could avoid the search for the def through the moves by > > > fetching the live range and picking > > > its first use position. > > > > If that is likely to be fast in the negative case, it would probably be wise. > I > > didn't realise it was feasible here. Otherwise, I'll add a comment to explain > > why we don't look beyond the last instruction. > > It may be worth investigating. The compile benchmark shows a 12-13% regression. > What I'm talking about is O(1): > > TopLevelLiveRange* top =data()->live_ranges()[whatever the vreg] > top->first_pos() (etc, etc) I'm not complete sure about what these data structures should look like at this stage, but it appears that the live ranges for the phi inputs are often empty at this point, such that top->first_pos() returns NULL and top->Print() prints an empty range. I'm not sure how, to be honest; I had assumed that the live ranges would all have been calculated by this point. Indeed, none of the cases where the inputs _do_ have live ranges appear to have allocated operands. Is there any way around this?
On 2016/07/19 16:46:39, jbramley wrote: > On 2016/07/04 21:20:35, Mircea Trofin wrote: > > > > On that, I think you could avoid the search for the def through the moves > by > > > > fetching the live range and picking > > > > its first use position. > > > > > > If that is likely to be fast in the negative case, it would probably be > wise. > > I > > > didn't realise it was feasible here. Otherwise, I'll add a comment to > explain > > > why we don't look beyond the last instruction. > > > > It may be worth investigating. The compile benchmark shows a 12-13% > regression. > > What I'm talking about is O(1): > > > > TopLevelLiveRange* top =data()->live_ranges()[whatever the vreg] > > top->first_pos() (etc, etc) > > I'm not complete sure about what these data structures should look like at this > stage, but it appears that the live ranges for the phi inputs are often empty at > this point, such that top->first_pos() returns NULL and top->Print() prints an > empty range. I'm not sure how, to be honest; I had assumed that the live ranges > would all have been calculated by this point. Indeed, none of the cases where > the inputs _do_ have live ranges appear to have allocated operands. Is there any > way around this? That is surprising. I'd have to take a look, would it be OK if it waited a week, until I'm back from my vacation? Just a thoght-might the case when the input is empty/null be a case where the input is a fixed range?
On 2016/07/19 16:52:40, Mircea Trofin wrote: > On 2016/07/19 16:46:39, jbramley wrote: > > On 2016/07/04 21:20:35, Mircea Trofin wrote: > > > > > On that, I think you could avoid the search for the def through the > moves > > by > > > > > fetching the live range and picking > > > > > its first use position. > > > > > > > > If that is likely to be fast in the negative case, it would probably be > > wise. > > > I > > > > didn't realise it was feasible here. Otherwise, I'll add a comment to > > explain > > > > why we don't look beyond the last instruction. > > > > > > It may be worth investigating. The compile benchmark shows a 12-13% > > regression. > > > What I'm talking about is O(1): > > > > > > TopLevelLiveRange* top =data()->live_ranges()[whatever the vreg] > > > top->first_pos() (etc, etc) > > > > I'm not complete sure about what these data structures should look like at > this > > stage, but it appears that the live ranges for the phi inputs are often empty > at > > this point, such that top->first_pos() returns NULL and top->Print() prints an > > empty range. I'm not sure how, to be honest; I had assumed that the live > ranges > > would all have been calculated by this point. Indeed, none of the cases where > > the inputs _do_ have live ranges appear to have allocated operands. Is there > any > > way around this? > > That is surprising. I'd have to take a look, would it be OK if it waited a week, > until I'm back from my vacation? Yes, of course! > Just a thoght-might the case when the input is empty/null be a case where the > input is a fixed range? Possibly, though I'm not quite sure how I'd tell. I'll try to investigate further.
On 2016/07/19 16:56:14, jbramley wrote: > On 2016/07/19 16:52:40, Mircea Trofin wrote: > > On 2016/07/19 16:46:39, jbramley wrote: > > > On 2016/07/04 21:20:35, Mircea Trofin wrote: > > > > > > On that, I think you could avoid the search for the def through the > > moves > > > by > > > > > > fetching the live range and picking > > > > > > its first use position. > > > > > > > > > > If that is likely to be fast in the negative case, it would probably be > > > wise. > > > > I > > > > > didn't realise it was feasible here. Otherwise, I'll add a comment to > > > explain > > > > > why we don't look beyond the last instruction. > > > > > > > > It may be worth investigating. The compile benchmark shows a 12-13% > > > regression. > > > > What I'm talking about is O(1): > > > > > > > > TopLevelLiveRange* top =data()->live_ranges()[whatever the vreg] > > > > top->first_pos() (etc, etc) > > > > > > I'm not complete sure about what these data structures should look like at > > this > > > stage, but it appears that the live ranges for the phi inputs are often > empty > > at > > > this point, such that top->first_pos() returns NULL and top->Print() prints > an > > > empty range. I'm not sure how, to be honest; I had assumed that the live > > ranges > > > would all have been calculated by this point. Indeed, none of the cases > where > > > the inputs _do_ have live ranges appear to have allocated operands. Is there > > any > > > way around this? > > > > That is surprising. I'd have to take a look, would it be OK if it waited a > week, > > until I'm back from my vacation? > > Yes, of course! > > > Just a thoght-might the case when the input is empty/null be a case where the > > input is a fixed range? > > Possibly, though I'm not quite sure how I'd tell. I'll try to investigate > further. Range-> IsFixed() Also, its vreg would be negative.
On 2016/07/19 16:57:20, Mircea Trofin wrote: > On 2016/07/19 16:56:14, jbramley wrote: > > On 2016/07/19 16:52:40, Mircea Trofin wrote: > > > On 2016/07/19 16:46:39, jbramley wrote: > > > > On 2016/07/04 21:20:35, Mircea Trofin wrote: > > > > > > > On that, I think you could avoid the search for the def through the > > > moves > > > > by > > > > > > > fetching the live range and picking > > > > > > > its first use position. > > > > > > > > > > > > If that is likely to be fast in the negative case, it would probably > be > > > > wise. > > > > > I > > > > > > didn't realise it was feasible here. Otherwise, I'll add a comment to > > > > explain > > > > > > why we don't look beyond the last instruction. > > > > > > > > > > It may be worth investigating. The compile benchmark shows a 12-13% > > > > regression. > > > > > What I'm talking about is O(1): > > > > > > > > > > TopLevelLiveRange* top =data()->live_ranges()[whatever the vreg] > > > > > top->first_pos() (etc, etc) > > > > > > > > I'm not complete sure about what these data structures should look like at > > > this > > > > stage, but it appears that the live ranges for the phi inputs are often > > empty > > > at > > > > this point, such that top->first_pos() returns NULL and top->Print() > prints > > an > > > > empty range. I'm not sure how, to be honest; I had assumed that the live > > > ranges > > > > would all have been calculated by this point. Indeed, none of the cases > > where > > > > the inputs _do_ have live ranges appear to have allocated operands. Is > there > > > any > > > > way around this? > > > > > > That is surprising. I'd have to take a look, would it be OK if it waited a > > week, > > > until I'm back from my vacation? > > > > Yes, of course! > > > > > Just a thoght-might the case when the input is empty/null be a case where > the > > > input is a fixed range? > > > > Possibly, though I'm not quite sure how I'd tell. I'll try to investigate > > further. > > Range-> IsFixed() > > Also, its vreg would be negative. The vregs I've been examining haven't been negative, so I suppose that rules out fixed ranges.
On 2016/07/19 17:00:10, jbramley wrote: > On 2016/07/19 16:57:20, Mircea Trofin wrote: > > On 2016/07/19 16:56:14, jbramley wrote: > > > On 2016/07/19 16:52:40, Mircea Trofin wrote: > > > > On 2016/07/19 16:46:39, jbramley wrote: > > > > > On 2016/07/04 21:20:35, Mircea Trofin wrote: > > > > > > > > On that, I think you could avoid the search for the def through > the > > > > moves > > > > > by > > > > > > > > fetching the live range and picking > > > > > > > > its first use position. > > > > > > > > > > > > > > If that is likely to be fast in the negative case, it would probably > > be > > > > > wise. > > > > > > I > > > > > > > didn't realise it was feasible here. Otherwise, I'll add a comment > to > > > > > explain > > > > > > > why we don't look beyond the last instruction. > > > > > > > > > > > > It may be worth investigating. The compile benchmark shows a 12-13% > > > > > regression. > > > > > > What I'm talking about is O(1): > > > > > > > > > > > > TopLevelLiveRange* top =data()->live_ranges()[whatever the vreg] > > > > > > top->first_pos() (etc, etc) > > > > > > > > > > I'm not complete sure about what these data structures should look like > at > > > > this > > > > > stage, but it appears that the live ranges for the phi inputs are often > > > empty > > > > at > > > > > this point, such that top->first_pos() returns NULL and top->Print() > > prints > > > an > > > > > empty range. I'm not sure how, to be honest; I had assumed that the live > > > > ranges > > > > > would all have been calculated by this point. Indeed, none of the cases > > > where > > > > > the inputs _do_ have live ranges appear to have allocated operands. Is > > there > > > > any > > > > > way around this? > > > > > > > > That is surprising. I'd have to take a look, would it be OK if it waited a > > > week, > > > > until I'm back from my vacation? > > > > > > Yes, of course! > > > > > > > Just a thoght-might the case when the input is empty/null be a case where > > the > > > > input is a fixed range? > > > > > > Possibly, though I'm not quite sure how I'd tell. I'll try to investigate > > > further. > > > > Range-> IsFixed() > > > > Also, its vreg would be negative. > > The vregs I've been examining haven't been negative, so I suppose that rules out > fixed ranges. True. Let's chat next week then!
On 2016/07/19 17:03:29, Mircea Trofin wrote: > On 2016/07/19 17:00:10, jbramley wrote: > > On 2016/07/19 16:57:20, Mircea Trofin wrote: > > > On 2016/07/19 16:56:14, jbramley wrote: > > > > On 2016/07/19 16:52:40, Mircea Trofin wrote: > > > > > On 2016/07/19 16:46:39, jbramley wrote: > > > > > > On 2016/07/04 21:20:35, Mircea Trofin wrote: > > > > > > > > > On that, I think you could avoid the search for the def through > > the > > > > > moves > > > > > > by > > > > > > > > > fetching the live range and picking > > > > > > > > > its first use position. > > > > > > > > > > > > > > > > If that is likely to be fast in the negative case, it would > probably > > > be > > > > > > wise. > > > > > > > I > > > > > > > > didn't realise it was feasible here. Otherwise, I'll add a comment > > to > > > > > > explain > > > > > > > > why we don't look beyond the last instruction. > > > > > > > > > > > > > > It may be worth investigating. The compile benchmark shows a 12-13% > > > > > > regression. > > > > > > > What I'm talking about is O(1): > > > > > > > > > > > > > > TopLevelLiveRange* top =data()->live_ranges()[whatever the vreg] > > > > > > > top->first_pos() (etc, etc) > > > > > > > > > > > > I'm not complete sure about what these data structures should look > like > > at > > > > > this > > > > > > stage, but it appears that the live ranges for the phi inputs are > often > > > > empty > > > > > at > > > > > > this point, such that top->first_pos() returns NULL and top->Print() > > > prints > > > > an > > > > > > empty range. I'm not sure how, to be honest; I had assumed that the > live > > > > > ranges > > > > > > would all have been calculated by this point. Indeed, none of the > cases > > > > where > > > > > > the inputs _do_ have live ranges appear to have allocated operands. Is > > > there > > > > > any > > > > > > way around this? > > > > > > > > > > That is surprising. I'd have to take a look, would it be OK if it waited > a > > > > week, > > > > > until I'm back from my vacation? > > > > > > > > Yes, of course! > > > > > > > > > Just a thoght-might the case when the input is empty/null be a case > where > > > the > > > > > input is a fixed range? > > > > > > > > Possibly, though I'm not quite sure how I'd tell. I'll try to investigate > > > > further. > > > > > > Range-> IsFixed() > > > > > > Also, its vreg would be negative. > > > > The vregs I've been examining haven't been negative, so I suppose that rules > out > > fixed ranges. > > True. Let's chat next week then! I took a look - the problem is that we currently call LiveRangeBuilder::ProcessPhis as we're building the live ranges, which is why you're seeing the live ranges of phi input operands as empty. I suppose you could pull out of the loop the hinting part of phi processing.
On 2016/07/25 16:22:45, Mircea Trofin wrote: > I took a look - the problem is that we currently call > LiveRangeBuilder::ProcessPhis as we're building the live ranges, > which is why you're seeing the live ranges of phi input operands as empty. I > suppose you could pull out of the loop > the hinting part of phi processing. I tried pulling it out of the loop but it proved to be more complicated than I expected. Phases such as ProcessInstructions rely on the hinting that ProcessPhis applies, so pulling it out doesn't seem to be feasible, unfortunately. That said, this loop doesn't seem to be particularly hot in practice (according to perf). The regression on the compile-time benchmark occurred because it has some very large switch statements. These result in phis with a huge number of inputs, so we iterate through the loop many times. As the number of phi inputs grows, the compile time increases, but also the benefit of this optimisation drops. A simple and effective fix is to limit the number of iterations that we look at. This eliminates the regression on the compile-time benchmark and improves some other benchmarks too. This patch implements this fix and also addresses your other comments. Thanks, Jacob
some nits, lgtm otherwise. https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2185: int predecessor_limit = 2; static const int https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2216: // heuristics. It is co-incidental that they are currently powers of two. could you expand the comment to explain what a non-trivial heuristic is? https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2241: // called. Could you add a TODO (v8) here, perhaps we find a way to avoid this. It should also point back to the predecessor_limit not being needed, should we find such an alternative.
Thanks for the review! https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2185: int predecessor_limit = 2; On 2016/10/19 23:00:41, Mircea Trofin wrote: > static const int It's decremented in the loop so it can't be const. (I couldn't think of a name that would make this clearer.) If you prefer I could make this a static const and then use a separate loop counter. https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2216: // heuristics. It is co-incidental that they are currently powers of two. On 2016/10/19 23:00:41, Mircea Trofin wrote: > could you expand the comment to explain what a non-trivial heuristic is? "Non-trivial" was probably the wrong phrase; it's just additive rather than combinatorial. Perhaps it would be simpler to just say this: "The scores are combined by addition. It is co-incidental that they are currently powers of two." Alternatively, would you prefer if I just combined them with a bitwise OR, as you originally suggested? It doesn't make any functional difference for these values and if that's more intuitive then it's probably the right thing to do. https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2241: // called. On 2016/10/19 23:00:41, Mircea Trofin wrote: > Could you add a TODO (v8) here, perhaps we find a way to avoid this. It should > also point back to the predecessor_limit not being needed, should we find such > an alternative. Done. I think predecessor_limit would still be needed, though; according to perf, it is the first (and original) loop ("// Look up the predecessor instruction") that is hot on the compile-time benchmark.
https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2185: int predecessor_limit = 2; On 2016/10/20 10:16:12, jbramley wrote: > On 2016/10/19 23:00:41, Mircea Trofin wrote: > > static const int > > It's decremented in the loop so it can't be const. (I couldn't think of a name > that would make this clearer.) If you prefer I could make this a static const > and then use a separate loop counter. I think the "2" could be a constant, but perhaps it's fine as-is, especially given the comment above. Fine to leave as-is. https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2216: // heuristics. It is co-incidental that they are currently powers of two. On 2016/10/20 10:16:12, jbramley wrote: > On 2016/10/19 23:00:41, Mircea Trofin wrote: > > could you expand the comment to explain what a non-trivial heuristic is? > > "Non-trivial" was probably the wrong phrase; it's just additive rather than > combinatorial. Perhaps it would be simpler to just say this: "The scores are > combined by addition. It is co-incidental that they are currently powers of > two." > > Alternatively, would you prefer if I just combined them with a bitwise OR, as > you originally suggested? It doesn't make any functional difference for these > values and if that's more intuitive then it's probably the right thing to do. I think it's more intuitive to go with the bitwise or. https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2241: // called. On 2016/10/20 10:16:12, jbramley wrote: > On 2016/10/19 23:00:41, Mircea Trofin wrote: > > Could you add a TODO (v8) here, perhaps we find a way to avoid this. It should > > also point back to the predecessor_limit not being needed, should we find such > > an alternative. > > Done. I think predecessor_limit would still be needed, though; according to > perf, it is the first (and original) loop ("// Look up the predecessor > instruction") that is hot on the compile-time benchmark. Acknowledged.
https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2185: int predecessor_limit = 2; On 2016/10/20 16:41:11, Mircea Trofin wrote: > On 2016/10/20 10:16:12, jbramley wrote: > > On 2016/10/19 23:00:41, Mircea Trofin wrote: > > > static const int > > > > It's decremented in the loop so it can't be const. (I couldn't think of a name > > that would make this clearer.) If you prefer I could make this a static const > > and then use a separate loop counter. > > I think the "2" could be a constant, but perhaps it's fine as-is, especially > given the comment above. Fine to leave as-is. Acknowledged. https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2216: // heuristics. It is co-incidental that they are currently powers of two. On 2016/10/20 16:41:11, Mircea Trofin wrote: > On 2016/10/20 10:16:12, jbramley wrote: > > On 2016/10/19 23:00:41, Mircea Trofin wrote: > > > could you expand the comment to explain what a non-trivial heuristic is? > > > > "Non-trivial" was probably the wrong phrase; it's just additive rather than > > combinatorial. Perhaps it would be simpler to just say this: "The scores are > > combined by addition. It is co-incidental that they are currently powers of > > two." > > > > Alternatively, would you prefer if I just combined them with a bitwise OR, as > > you originally suggested? It doesn't make any functional difference for these > > values and if that's more intuitive then it's probably the right thing to do. > > I think it's more intuitive to go with the bitwise or. Done.
The CQ bit was checked by jacob.bramley@arm.com
The patchset sent to the CQ was uploaded after l-g-t-m from mtrofin@chromium.org Link to the patchset: https://codereview.chromium.org/2125463002/#ps60001 (title: "Review fixes.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Committed patchset #4 (id:60001)
Message was sent while issue was closed.
Description was changed from ========== Improve phi hinting heuristics. The basic intention is to try to remove unnecessary moves caused by hints in otherwise empty blocks. Roughly: Before After ----------------------------------------------------------- B0: add x1, ... B0: add x1, ... b.ne B2 b.eq B3 B1: mov x0, x1 B1: [empty] b B3 B2: add x0, x1, ... B2: add x1, x1, ... B3: phi(B1,B2) in x0 B3: phi(B0,B1) in x1 Hinting is also improved in cases where one of the inputs is already allocated. This occurs commonly on architectures with instructions which write into fixed registers, for example. BUG= ========== to ========== Improve phi hinting heuristics. The basic intention is to try to remove unnecessary moves caused by hints in otherwise empty blocks. Roughly: Before After ----------------------------------------------------------- B0: add x1, ... B0: add x1, ... b.ne B2 b.eq B3 B1: mov x0, x1 B1: [empty] b B3 B2: add x0, x1, ... B2: add x1, x1, ... B3: phi(B1,B2) in x0 B3: phi(B0,B1) in x1 Hinting is also improved in cases where one of the inputs is already allocated. This occurs commonly on architectures with instructions which write into fixed registers, for example. BUG= Committed: https://crrev.com/2e360cd83f60ecfa9eaa6d0648450ef937a15a5f Cr-Commit-Position: refs/heads/master@{#40498} ==========
Message was sent while issue was closed.
Patchset 4 (id:??) landed as https://crrev.com/2e360cd83f60ecfa9eaa6d0648450ef937a15a5f Cr-Commit-Position: refs/heads/master@{#40498} |