Created: 4 years, 5 months ago by shiyu.zhang
Modified: 4 years, 1 month ago
CC: Pan, Weiliang, tianyou.li
Base URL: https://chromium.googlesource.com/v8/v8.git@master
Target Ref: refs/pending/heads/master
Project: v8
Visibility: Public.
Description
Add basic instruction latency modeling for ia32 and x64 respectively.
The bigcore shares the same instruction latency table as the smallcore (ATOM).
The more accurate latency modeling will benefit the instruction scheduler for
ia32 and x64 without introducing extra regressions.
Committed: https://crrev.com/1b08c7a777d613ee433886749c94c86fce9d20b2
Cr-Commit-Position: refs/heads/master@{#40493}
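For context, the latency model the description refers to is a per-opcode table that the backend hands to TurboFan's instruction scheduler. Below is a minimal, self-contained sketch of the shape such a table takes; the opcode names and cycle counts are invented for illustration and are not the values added by this CL.

// Self-contained sketch of a per-opcode latency table. The opcode names and
// cycle counts are invented for illustration, not the values in this CL.
#include <cstdio>

enum class ArchOpcode { kAdd32, kImul32, kIdiv32, kFloat64Mul, kFloat64Div, kLoad };

// Estimated cycles until an instruction's result is available. The scheduler
// uses these estimates to start long-latency instructions early so their
// latency overlaps independent work.
int GetInstructionLatency(ArchOpcode opcode) {
  switch (opcode) {
    case ArchOpcode::kIdiv32:
    case ArchOpcode::kFloat64Div:
      return 20;  // division: long latency, the most worth hiding
    case ArchOpcode::kFloat64Mul:
      return 5;
    case ArchOpcode::kImul32:
    case ArchOpcode::kLoad:
      return 3;
    default:
      return 1;  // cheap ALU ops and anything unmodeled
  }
}

int main() {
  std::printf("idiv latency: %d\n", GetInstructionLatency(ArchOpcode::kIdiv32));
  std::printf("add latency:  %d\n", GetInstructionLatency(ArchOpcode::kAdd32));
}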
Patch Set 1: Add instruction latency modeling only for Atom platform
Patch Set 2: Big-core shares the same latency model as Atom
Patch Set 3: Rebase on Sep. 6
Patch Set 4: Rebase on Oct. 8
Messages
Total messages: 25 (14 generated)
Description was changed from

==========
[turbofan] Enable instruction scheduling for Intel Atom platform

Enable instruction scheduling in turbofan for ia32 and x64 on Atom. Basic instruction latency modeling is added for ia32 and x64.

Instruction scheduling can introduce performance improvement on Atom. For example, after turn on FLAG_turbo_instruction_scheduling: Octane2-zlib improves 5% on ia32 and 1.25% on x64, bench_skinning improves 12% on ia32 and 10% on x64, JetStream-float-mm.c improves 4% on both ia32 and x64, JetStream-n-body.c improves 5% on both ia32 and x64.

BUG=
==========

to

==========
[turbofan] Enable instruction scheduling for ia32 and x64 platform

Enable instruction scheduling in turbofan for ia32 & x64. Basic instruction latency modeling is added for ia32 and x64 respectively.

Instruction scheduling can introduce performance improvement on Atom. For example, after turn on FLAG_turbo_instruction_scheduling: Octane2-zlib improves 5% on ia32 and 1.25% on x64, bench_skinning improves 12% on ia32 and 10% on x64, JetStream-float-mm.c improves 4% on both ia32 and x64, JetStream-n-body.c improves 5% on both ia32 and x64.

BUG=
==========
Description was changed from

==========
[turbofan] Enable instruction scheduling for ia32 and x64 platform

Enable instruction scheduling in turbofan for ia32 & x64. Basic instruction latency modeling is added for ia32 and x64 respectively.

Instruction scheduling can introduce performance improvement on Atom. For example, after turn on FLAG_turbo_instruction_scheduling: Octane2-zlib improves 5% on ia32 and 1.25% on x64, bench_skinning improves 12% on ia32 and 10% on x64, JetStream-float-mm.c improves 4% on both ia32 and x64, JetStream-n-body.c improves 5% on both ia32 and x64.

BUG=
==========

to

==========
[turbofan] Enable instruction scheduling for ia32 and x64 platform

Add basic instruction latency modeling for ia32 and x64 respectively. Fix an instruction selector related test case error caused by instruction scheduling. The bigcore and smallcore (ATOM) share same instruction latency table per Danno's guide for better validation confidence.

Instruction scheduling introduces performance improvement on Atom since scheduled instruction helps less-OOO ATOM processor to get better instruction latency hidden. For example, after turn on FLAG_turbo_instruction_scheduling on Atom: Octane2-zlib improves 5% on ia32 and 1.3% on x64, bench_skinning improves 12% on ia32 and 10% on x64, JetStream-float-mm.c improves 4% on both ia32 and x64, JetStream-n-body.c improves 5% on both ia32 and x64.

Improvement on bigcore platform (Haswell) is also observed, for example: bench_copy improves 7.5% on ia32 and 2% on x64. However, two regressions occur on bigcore platform (Haswell): bench_corrections regresses 7% on ia32, bench_skinning regresses 8% on x64. The most likely reason for the regressions is that the benefit from instruction latency hidden is less in such strong-OOO processor, as well as the scheduled code may increase the register pressure. The accumulated result of those 2 exceptional cases is regression. The balance between instruction latency hidden and register pressure when implementing instruction scheduling is worth studying.
==========
shiyu.zhang@intel.com changed reviewers: + danno@chromium.org
Please take a look. Thanks!
Thanks for the patch! This matches my expectations about the implementation we discussed, but the big-core regressions seem quite large, so I am hesitant to land this as-is. I have some ideas about how to mitigate the register allocation problems that might be the root cause of the slowdown; let's discuss in today's Chrome/Intel meeting.
Description was changed from

==========
[turbofan] Enable instruction scheduling for ia32 and x64 platform

Add basic instruction latency modeling for ia32 and x64 respectively. Fix an instruction selector related test case error caused by instruction scheduling. The bigcore and smallcore (ATOM) share same instruction latency table per Danno's guide for better validation confidence.

Instruction scheduling introduces performance improvement on Atom since scheduled instruction helps less-OOO ATOM processor to get better instruction latency hidden. For example, after turn on FLAG_turbo_instruction_scheduling on Atom: Octane2-zlib improves 5% on ia32 and 1.3% on x64, bench_skinning improves 12% on ia32 and 10% on x64, JetStream-float-mm.c improves 4% on both ia32 and x64, JetStream-n-body.c improves 5% on both ia32 and x64.

Improvement on bigcore platform (Haswell) is also observed, for example: bench_copy improves 7.5% on ia32 and 2% on x64. However, two regressions occur on bigcore platform (Haswell): bench_corrections regresses 7% on ia32, bench_skinning regresses 8% on x64. The most likely reason for the regressions is that the benefit from instruction latency hidden is less in such strong-OOO processor, as well as the scheduled code may increase the register pressure. The accumulated result of those 2 exceptional cases is regression. The balance between instruction latency hidden and register pressure when implementing instruction scheduling is worth studying.
==========

to

==========
Add basic instruction latency modeling for ia32 and x64 respectively. The bigcore shares same instruction latency table as smallcore (ATOM). The accurate latency modeling will benefit the instruction scheduler for ia32 and x64 without introducing extra regression.

Instruction scheduling introduces performance improvement on Atom since scheduled instruction helps less-OOO ATOM processor to get better instruction latency hidden. For example, after turn on FLAG_turbo_instruction_scheduling on Atom: Octane2-zlib improves 5% on ia32, bench_skinning improves 12% on ia32 and 10% on x64, JetStream-float-mm.c improves 4% on both ia32 and x64, JetStream-n-body.c improves 5% on both ia32 and x64.

Improvement on bigcore platform (Haswell) is also observed, for example: bench_copy improves 6% on ia32 and 2% on x64. However, one regression occurs on bigcore platform (Haswell): bench_skinning regresses 8% on x64. This regression exists whether this patch applies or not. The most likely reason for the regression is that the benefit from instruction latency hidden is less in such strong-OOO processor, as well as the scheduled code may increase the register pressure. The accumulated result of this exceptional case is regression. The balance between instruction latency hidden and register pressure when implementing instruction scheduling is worth studying.
==========
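To make the latency-hiding vs. register-pressure trade-off discussed above concrete, here is a small, self-contained model of a simple in-order pipeline. All latencies and the instruction sequences are invented for the example; this is not TurboFan code. Placing independent work between a load and its consumer removes stall cycles, but it also keeps the load's result live across that work, which is where the extra register pressure on a big, strongly out-of-order core comes from.

// Toy illustration of latency hiding on an in-order, single-issue model.
// Latencies and instruction sequences are invented; not TurboFan code.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Op {
  const char* name;
  int latency;  // cycles until this op's result is ready
  int input;    // index of the op whose result this op consumes, or -1
};

// Issue one op per cycle, but stall an op until its input's result is ready.
int Cycles(const std::vector<Op>& seq) {
  std::vector<int> done(seq.size(), 0);
  int cycle = 0, total = 0;
  for (size_t i = 0; i < seq.size(); ++i) {
    int start = cycle;
    if (seq[i].input >= 0) start = std::max(start, done[seq[i].input]);
    done[i] = start + seq[i].latency;
    total = std::max(total, done[i]);
    cycle = start + 1;
  }
  return total;
}

int main() {
  // Unscheduled: the add consumes the load's result immediately and stalls.
  std::vector<Op> unscheduled = {
      {"load", 3, -1}, {"add", 1, 0}, {"mov", 1, -1}, {"mov", 1, -1}};
  // Scheduled: the independent moves fill the load's latency, but the load's
  // result now stays live across them, i.e. register pressure goes up.
  std::vector<Op> scheduled = {
      {"load", 3, -1}, {"mov", 1, -1}, {"mov", 1, -1}, {"add", 1, 0}};
  std::printf("unscheduled: %d cycles\n", Cycles(unscheduled));  // 6
  std::printf("scheduled:   %d cycles\n", Cycles(scheduled));    // 4
}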
shiyu.zhang@intel.com changed reviewers: - danno@chromium.org
Description was changed from

==========
Add basic instruction latency modeling for ia32 and x64 respectively. The bigcore shares same instruction latency table as smallcore (ATOM). The accurate latency modeling will benefit the instruction scheduler for ia32 and x64 without introducing extra regression.

Instruction scheduling introduces performance improvement on Atom since scheduled instruction helps less-OOO ATOM processor to get better instruction latency hidden. For example, after turn on FLAG_turbo_instruction_scheduling on Atom: Octane2-zlib improves 5% on ia32, bench_skinning improves 12% on ia32 and 10% on x64, JetStream-float-mm.c improves 4% on both ia32 and x64, JetStream-n-body.c improves 5% on both ia32 and x64.

Improvement on bigcore platform (Haswell) is also observed, for example: bench_copy improves 6% on ia32 and 2% on x64. However, one regression occurs on bigcore platform (Haswell): bench_skinning regresses 8% on x64. This regression exists whether this patch applies or not. The most likely reason for the regression is that the benefit from instruction latency hidden is less in such strong-OOO processor, as well as the scheduled code may increase the register pressure. The accumulated result of this exceptional case is regression. The balance between instruction latency hidden and register pressure when implementing instruction scheduling is worth studying.
==========

to

==========
Add basic instruction latency modeling for ia32 and x64 respectively. The bigcore shares same instruction latency table as smallcore (ATOM). The accurate latency modeling will benefit the instruction scheduler for ia32 and x64 without introducing extra regression.
==========
After enabling instruction scheduling, the following performance improvements are observed on Atom. With the tuned latency model, some of the improvements are larger and a few regressions are fixed.

                        ia32                  x64
                        Untuned    Tuned      Untuned    Tuned
Octane2-NavierStokes    3.3%       4.3%       5.9%       9.7%
Octane2-Zlib            4.9%       5.1%       0.8%       1.1%
Bench_skinning          -7.5%      12.9%      8.6%       10.4%
JetStream-float-mm.c    4.5%       4.5%       4.3%       4.3%
JetStream-n-body.c      -1%        5.3%       0.4%       6.1%
shiyu.zhang@intel.com changed reviewers: + danno@chromium.org
Hi Danno, thanks for the comments. Here is the instruction latency model for ia32 and x64 that I think is necessary for the instruction scheduler. Please take a look. Thanks!
Thanks for the patch. LGTM!
The CQ bit was checked by shiyu.zhang@intel.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: v8_presubmit on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_presubmit/builds/27014)
shiyu.zhang@intel.com changed reviewers: + bmeurer@chromium.org, jarin@chromium.org
mstarzinger@chromium.org changed reviewers: + mstarzinger@chromium.org
LGTM (rubber-stamped).
The CQ bit was checked by shiyu.zhang@intel.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from

==========
Add basic instruction latency modeling for ia32 and x64 respectively. The bigcore shares same instruction latency table as smallcore (ATOM). The accurate latency modeling will benefit the instruction scheduler for ia32 and x64 without introducing extra regression.
==========

to

==========
Add basic instruction latency modeling for ia32 and x64 respectively. The bigcore shares same instruction latency table as smallcore (ATOM). The accurate latency modeling will benefit the instruction scheduler for ia32 and x64 without introducing extra regression.
==========
Message was sent while issue was closed.
Committed patchset #4 (id:60001)
Message was sent while issue was closed.
Description was changed from

==========
Add basic instruction latency modeling for ia32 and x64 respectively. The bigcore shares same instruction latency table as smallcore (ATOM). The accurate latency modeling will benefit the instruction scheduler for ia32 and x64 without introducing extra regression.
==========

to

==========
Add basic instruction latency modeling for ia32 and x64 respectively. The bigcore shares same instruction latency table as smallcore (ATOM). The accurate latency modeling will benefit the instruction scheduler for ia32 and x64 without introducing extra regression.

Committed: https://crrev.com/1b08c7a777d613ee433886749c94c86fce9d20b2
Cr-Commit-Position: refs/heads/master@{#40493}
==========
Message was sent while issue was closed.
Patchset 4 (id:??) landed as https://crrev.com/1b08c7a777d613ee433886749c94c86fce9d20b2
Cr-Commit-Position: refs/heads/master@{#40493}