Issue 2125463002: Improve phi hinting heuristics.

jbramley

jacob.bramley@arm.com changed reviewers: + bmeurer@chromium.org, danno@chromium.org, dcarney@chromium.org, jarin@chromium.org, mtrofin@chromium.org

4 years, 5 months ago (2016-07-04 14:50:45 UTC) #1

Mircea Trofin

Looks generally good, I'm curious about the perf impact (hence started the try perf runs), ...

4 years, 5 months ago (2016-07-04 16:32:15 UTC) #3

jbramley

4 years, 5 months ago (2016-07-04 16:58:54 UTC) #4

Performance depends on the platform. For ARM and ARM64, we get +3% on a couple
of benchmarks (depending on exactly which core). Small cores seem to benefit
more, but the patch seems to be beneficial regardless.

https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc...
File src/compiler/register-allocator.cc (right):

https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc...
src/compiler/register-allocator.cc:2208: //   jump). This helps to avoid
otherwise useless move + jump sequences.
On 2016/07/04 16:32:15, Mircea Trofin wrote:
> Nit: the moves wouldn't be useless, I think you're trying to say that if the
> hint comes from there and we succeed in eliding the move ("the hint is taken"),
> then maybe the jump threader can elide the jump, too?
> 
> If so, mind expanding the sentence for the benefit of the reader?

Yes, that's what I meant. I'll re-word it.

https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc...
src/compiler/register-allocator.cc:2228: // Phis are assigned in an END move
after the last instruction in each
On 2016/07/04 16:32:15, Mircea Trofin wrote:
> I think you meant to say "in the END position of the last instruction".

Acknowledged.

https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc...
src/compiler/register-allocator.cc:2250: predecessor_hint_preference +=
kNotDeferredBlockPreference;
On 2016/07/04 16:32:15, Mircea Trofin wrote:
> Nit: replace with bitwise or, I think that'd make it more clear. 

I'd half planned to support an arithmetic score, though it didn't turn out that
way in the end. (It's co-incidental that the values are powers of two.)
Nonetheless, I thought that addition would be more flexible for future
modifications.

https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc...
src/compiler/register-allocator.cc:2264: // from an allocated (or explicit)
operand.
On 2016/07/04 16:32:15, Mircea Trofin wrote:
> This happens if the previous instruction (the one before predecessor_instr)
> happens to output to a fixed 
> register, which then participates in the phi. Is this a common pattern? 

Reasonably common. Without this, the empty-block check causes regressions in the
cases you described.

> Or could you comment why not walk 
> up the instruction sequence, finding the def?

I actually considered it, but I thought it would be too expensive to be
worthwhile in practice. (I examined a lot of benchmarks and didn't see it occur,
yet we'd do the search for every case that would otherwise not hit.) I think
that the negative case is the common case, and needs to be fast.

> On that, I think you could avoid the search for the def through the moves by
> fetching the live range and picking
> its first use position.

If that is likely to be fast in the negative case, it would probably be wise. I
didn't realise it was feasible here. Otherwise, I'll add a comment to explain
why we don't look beyond the last instruction.

Mircea Trofin

4 years, 5 months ago (2016-07-04 21:20:35 UTC) #5

https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc...
File src/compiler/register-allocator.cc (right):

https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc...
src/compiler/register-allocator.cc:2250: predecessor_hint_preference +=
kNotDeferredBlockPreference;
On 2016/07/04 16:58:54, jbramley wrote:
> On 2016/07/04 16:32:15, Mircea Trofin wrote:
> > Nit: replace with bitwise or, I think that'd make it more clear. 
> 
> I'd half planned to support an arithmetic score, though it didn't turn out that
> way in the end. (It's co-incidental that the values are powers of two.)
> Nonetheless, I thought that addition would be more flexible for future
> modifications.

Oh - the fact that they are powers of 2 threw me off. Mind adding a comment
saying "they just happen to be powers of two" or something like that, to avoid
confusion later? Thanks!

https://codereview.chromium.org/2125463002/diff/1/src/compiler/register-alloc...
src/compiler/register-allocator.cc:2264: // from an allocated (or explicit)
operand.
On 2016/07/04 16:58:54, jbramley wrote:
> On 2016/07/04 16:32:15, Mircea Trofin wrote:
> > This happens if the previous instruction (the one before predecessor_instr)
> > happens to output to a fixed 
> > register, which then participates in the phi. Is this a common pattern? 
> 
> Reasonably common. Without this, the empty-block check causes regressions in the
> cases you described.
> 
> > Or could you comment why not walk 
> > up the instruction sequence, finding the def?
> 
> I actually considered it, but I thought it would be too expensive to be
> worthwhile in practice. (I examined a lot of benchmarks and didn't see it occur,
> yet we'd do the search for every case that would otherwise not hit.) I think
> that the negative case is the common case, and needs to be fast.
> 
> > On that, I think you could avoid the search for the def through the moves by
> > fetching the live range and picking
> > its first use position.
> 
> If that is likely to be fast in the negative case, it would probably be wise. I
> didn't realise it was feasible here. Otherwise, I'll add a comment to explain
> why we don't look beyond the last instruction.

It may be worth investigating. The compile benchmark shows a 12-13% regression. 
What I'm talking about is O(1):

TopLevelLiveRange* top =data()->live_ranges()[whatever the vreg]
top->first_pos() (etc, etc)
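
To make the O(1) claim concrete, here is a minimal sketch of that lookup. The
Toy* types and FirstPositionOf are hypothetical stand-ins written for this
illustration, not V8's real classes; only the live_ranges()/first_pos() shape
is taken from the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-ins for illustration; these are not V8's real classes.
struct ToyUsePosition {
  int instruction_index;
};

struct ToyLiveRange {
  std::vector<ToyUsePosition> positions;

  // Mirrors the shape of first_pos() in the snippet above: null when empty.
  const ToyUsePosition* first_pos() const {
    return positions.empty() ? nullptr : &positions.front();
  }
};

struct ToyAllocationData {
  std::vector<ToyLiveRange> live_ranges;  // indexed by virtual register
};

// Constant-time lookup: one vector index plus one pointer check, with no walk
// over the instruction sequence. Returns -1 when the range has no positions.
int FirstPositionOf(const ToyAllocationData& data, int vreg) {
  assert(vreg >= 0 &&
         static_cast<std::size_t>(vreg) < data.live_ranges.size());
  const ToyUsePosition* pos = data.live_ranges[vreg].first_pos();
  return pos != nullptr ? pos->instruction_index : -1;
}
```

The negative case (an empty range) costs the same as the positive one, which
is the property asked for above.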

jbramley

On 2016/07/04 21:20:35, Mircea Trofin wrote: > It may be worth investigating. The compile benchmark ...

4 years, 5 months ago (2016-07-05 12:41:54 UTC) #6

jbramley

On 2016/07/04 21:20:35, Mircea Trofin wrote: > > > On that, I think you could ...

4 years, 5 months ago (2016-07-19 16:46:39 UTC) #7

Mircea Trofin

On 2016/07/19 16:46:39, jbramley wrote: > On 2016/07/04 21:20:35, Mircea Trofin wrote: > > > ...

4 years, 5 months ago (2016-07-19 16:52:40 UTC) #8

jbramley

On 2016/07/19 16:52:40, Mircea Trofin wrote: > On 2016/07/19 16:46:39, jbramley wrote: > > On ...

4 years, 5 months ago (2016-07-19 16:56:14 UTC) #9

Mircea Trofin

On 2016/07/19 16:56:14, jbramley wrote: > On 2016/07/19 16:52:40, Mircea Trofin wrote: > > On ...

4 years, 5 months ago (2016-07-19 16:57:20 UTC) #10

jbramley

4 years, 5 months ago (2016-07-19 17:00:10 UTC) #11

On 2016/07/19 16:57:20, Mircea Trofin wrote:
> On 2016/07/19 16:56:14, jbramley wrote:
> > On 2016/07/19 16:52:40, Mircea Trofin wrote:
> > > On 2016/07/19 16:46:39, jbramley wrote:
> > > > On 2016/07/04 21:20:35, Mircea Trofin wrote:
> > > > > > > On that, I think you could avoid the search for the def through the moves by
> > > > > > > fetching the live range and picking
> > > > > > > its first use position.
> > > > > > 
> > > > > > If that is likely to be fast in the negative case, it would probably be wise. I
> > > > > > didn't realise it was feasible here. Otherwise, I'll add a comment to explain
> > > > > > why we don't look beyond the last instruction.
> > > > > 
> > > > > It may be worth investigating. The compile benchmark shows a 12-13% regression. 
> > > > > What I'm talking about is O(1):
> > > > > 
> > > > > TopLevelLiveRange* top =data()->live_ranges()[whatever the vreg]
> > > > > top->first_pos() (etc, etc)
> > > > 
> > > > I'm not completely sure about what these data structures should look like at this
> > > > stage, but it appears that the live ranges for the phi inputs are often empty at
> > > > this point, such that top->first_pos() returns NULL and top->Print() prints an
> > > > empty range. I'm not sure how, to be honest; I had assumed that the live ranges
> > > > would all have been calculated by this point. Indeed, none of the cases where
> > > > the inputs _do_ have live ranges appear to have allocated operands. Is there any
> > > > way around this?
> > > 
> > > That is surprising. I'd have to take a look, would it be OK if it waited a week,
> > > until I'm back from my vacation?
> > 
> > Yes, of course!
> > 
> > > Just a thought - might the case when the input is empty/null be a case where the
> > > input is a fixed range?
> > 
> > Possibly, though I'm not quite sure how I'd tell. I'll try to investigate
> > further.
> 
> Range-> IsFixed()
> 
> Also, its vreg would be negative.

The vregs I've been examining haven't been negative, so I suppose that rules out
fixed ranges.

Mircea Trofin

4 years, 5 months ago (2016-07-19 17:03:29 UTC) #12

On 2016/07/19 17:00:10, jbramley wrote:
> The vregs I've been examining haven't been negative, so I suppose that rules out
> fixed ranges.

True. Let's chat next week then!

Mircea Trofin

4 years, 5 months ago (2016-07-25 16:22:45 UTC) #13

On 2016/07/19 17:03:29, Mircea Trofin wrote:
> True. Let's chat next week then!

I took a look - the problem is that we currently call
LiveRangeBuilder::ProcessPhis as we're building the live ranges, which is why
you're seeing the live ranges of phi input operands as empty. I suppose you
could pull the hinting part of phi processing out of the loop.
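
A toy model of the ordering problem, as a sketch only. It assumes blocks are
processed last-to-first while ranges are built, and every name in it (ToyBlock,
HintWhileBuilding, HintAfterBuilding) is hypothetical rather than taken from
register-allocator.cc. It counts how many phi inputs already have a range at
the moment hinting runs.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy block: not V8's InstructionBlock.
struct ToyBlock {
  std::vector<int> defined_vregs;    // vregs whose ranges this block creates
  std::vector<int> phi_input_vregs;  // inputs of the phis at the block's top
};

// Counts the phi inputs of `block` that already have a built range.
int CountVisibleInputs(const ToyBlock& block,
                       const std::vector<bool>& range_built) {
  int visible = 0;
  for (int vreg : block.phi_input_vregs) {
    if (range_built[vreg]) ++visible;
  }
  return visible;
}

// Hinting interleaved with range building (assumed last-to-first order):
// inputs defined in earlier blocks have no range yet when the phi is seen.
int HintWhileBuilding(const std::vector<ToyBlock>& blocks, int vreg_count) {
  std::vector<bool> range_built(vreg_count, false);
  int visible = 0;
  for (std::size_t i = blocks.size(); i-- > 0;) {
    for (int vreg : blocks[i].defined_vregs) range_built[vreg] = true;
    visible += CountVisibleInputs(blocks[i], range_built);
  }
  return visible;
}

// Hinting pulled out of the loop: build every range first, then hint.
int HintAfterBuilding(const std::vector<ToyBlock>& blocks, int vreg_count) {
  std::vector<bool> range_built(vreg_count, false);
  for (const ToyBlock& block : blocks) {
    for (int vreg : block.defined_vregs) range_built[vreg] = true;
  }
  int visible = 0;
  for (const ToyBlock& block : blocks) {
    visible += CountVisibleInputs(block, range_built);
  }
  return visible;
}
```

With two predecessor blocks defining v0 and v1 and a third block holding a phi
over both, the interleaved version sees no built inputs while the two-pass
version sees both, at the cost of a second walk over the blocks.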

jbramley

On 2016/07/25 16:22:45, Mircea Trofin wrote: > I took a look - the problem is ...

4 years, 2 months ago (2016-10-18 13:57:29 UTC) #14

Mircea Trofin

some nits, lgtm otherwise. https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-allocator.cc File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-allocator.cc#newcode2185 src/compiler/register-allocator.cc:2185: int predecessor_limit = 2; static ...

4 years, 2 months ago (2016-10-19 23:00:41 UTC) #15

jbramley

Thanks for the review! https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-allocator.cc File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-allocator.cc#newcode2185 src/compiler/register-allocator.cc:2185: int predecessor_limit = 2; On ...

4 years, 2 months ago (2016-10-20 10:16:12 UTC) #16

Mircea Trofin

4 years, 2 months ago (2016-10-20 16:41:11 UTC) #17

https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a...
File src/compiler/register-allocator.cc (right):

https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a...
src/compiler/register-allocator.cc:2185: int predecessor_limit = 2;
On 2016/10/20 10:16:12, jbramley wrote:
> On 2016/10/19 23:00:41, Mircea Trofin wrote:
> > static const int
> 
> It's decremented in the loop so it can't be const. (I couldn't think of a name
> that would make this clearer.) If you prefer I could make this a static const
> and then use a separate loop counter.

I think the "2" could be a constant, but perhaps it's fine as-is, especially
given the comment above. Fine to leave as-is.
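
For reference, the "static const plus a separate loop counter" alternative
mentioned above might look roughly like this. It is a sketch under assumptions:
kMaxPredecessorsToExamine, PickHintPredecessor and the plain vector of scores
are hypothetical, and only the limit of 2 comes from the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical name; the value 2 matches the initial predecessor_limit.
static const int kMaxPredecessorsToExamine = 2;

// `scores[i]` stands in for the hint preference computed for predecessor i.
// Returns the index of the best predecessor among those examined, or -1 if
// there are none. Ties keep the earliest predecessor.
int PickHintPredecessor(const std::vector<int>& scores) {
  int best_index = -1;
  int best_score = -1;
  int examined = 0;  // separate counter, so the limit itself can stay const
  for (std::size_t i = 0;
       i < scores.size() && examined < kMaxPredecessorsToExamine;
       ++i, ++examined) {
    if (scores[i] > best_score) {
      best_score = scores[i];
      best_index = static_cast<int>(i);
    }
  }
  return best_index;
}
```

The trade-off is the one discussed above: a better hint from a later
predecessor is never seen, in exchange for bounded work per phi.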

https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a...
src/compiler/register-allocator.cc:2216: // heuristics. It is co-incidental that
they are currently powers of two.
On 2016/10/20 10:16:12, jbramley wrote:
> On 2016/10/19 23:00:41, Mircea Trofin wrote:
> > could you expand the comment to explain what a non-trivial heuristic is?
> 
> "Non-trivial" was probably the wrong phrase; it's just additive rather than
> combinatorial. Perhaps it would be simpler to just say this: "The scores are
> combined by addition. It is co-incidental that they are currently powers of
> two."
> 
> Alternatively, would you prefer if I just combined them with a bitwise OR, as
> you originally suggested? It doesn't make any functional difference for these
> values and if that's more intuitive then it's probably the right thing to do.

I think it's more intuitive to go with the bitwise or.
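
A small sketch of why the choice makes no functional difference for these
values: when every score occupies a distinct bit, addition and bitwise OR give
the same result for any combination. Only kNotDeferredBlockPreference is a name
from the patch; its value here and the other two constants are assumptions made
for the illustration.

```cpp
#include <cassert>

// Value assumed for illustration; the name is from the patch.
constexpr int kNotDeferredBlockPreference = 1 << 0;
// Hypothetical additional scores, each on its own bit.
constexpr int kHypotheticalPreferenceB = 1 << 1;
constexpr int kHypotheticalPreferenceC = 1 << 2;

int CombineByAddition(bool not_deferred, bool b, bool c) {
  int score = 0;
  if (not_deferred) score += kNotDeferredBlockPreference;
  if (b) score += kHypotheticalPreferenceB;
  if (c) score += kHypotheticalPreferenceC;
  return score;
}

int CombineByOr(bool not_deferred, bool b, bool c) {
  int score = 0;
  if (not_deferred) score |= kNotDeferredBlockPreference;
  if (b) score |= kHypotheticalPreferenceB;
  if (c) score |= kHypotheticalPreferenceC;
  return score;
}

// Checks all eight combinations of the three conditions.
void CheckCombinationsAgree() {
  for (int mask = 0; mask < 8; ++mask) {
    const bool a = (mask & 1) != 0;
    const bool b = (mask & 2) != 0;
    const bool c = (mask & 4) != 0;
    assert(CombineByAddition(a, b, c) == CombineByOr(a, b, c));
  }
}
```

The two would diverge only if a score were not a single distinct bit, or if
the same score were applied twice, which is where addition keeps the extra
flexibility mentioned earlier in the thread.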

https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-a...
src/compiler/register-allocator.cc:2241: // called.
On 2016/10/20 10:16:12, jbramley wrote:
> On 2016/10/19 23:00:41, Mircea Trofin wrote:
> > Could you add a TODO (v8) here, perhaps we find a way to avoid this. It should
> > also point back to the predecessor_limit not being needed, should we find such
> > an alternative.
> 
> Done. I think predecessor_limit would still be needed, though; according to
> perf, it is the first (and original) loop ("// Look up the predecessor
> instruction") that is hot on the compile-time benchmark.

Acknowledged.

jbramley

https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-allocator.cc File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2125463002/diff/40001/src/compiler/register-allocator.cc#newcode2185 src/compiler/register-allocator.cc:2185: int predecessor_limit = 2; On 2016/10/20 16:41:11, Mircea Trofin ...

4 years, 2 months ago (2016-10-21 10:11:14 UTC) #18

jbramley

The patchset sent to the CQ was uploaded after l-g-t-m from mtrofin@chromium.org Link to the ...

4 years, 2 months ago (2016-10-21 10:15:45 UTC) #20

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2125463002/60001

4 years, 2 months ago (2016-10-21 10:15:52 UTC) #21

commit-bot: I haz the power

Description was changed from ========== Improve phi hinting heuristics. The basic intention is to try ...

4 years, 1 month ago (2016-11-17 22:09:31 UTC) #23

commit-bot: I haz the power

4 years, 1 month ago (2016-11-17 22:09:32 UTC) #24

Message was sent while issue was closed.

Patchset 4 (id:??) landed as
https://crrev.com/2e360cd83f60ecfa9eaa6d0648450ef937a15a5f
Cr-Commit-Position: refs/heads/master@{#40498}

Issue 2125463002: Improve phi hinting heuristics. (Closed)

Description

Patch Set 1 #

Patch Set 2 : Rebase only. #

Patch Set 3 : Fix regressions on the compile-time benchmarks. #

Patch Set 4 : Review fixes. #