Issue 2425423006: libyuv r1629 roll for AVX2 optimized HalfFloatPlane and ExtractAlpha.

fbarchard1

The CQ bit was checked by fbarchard@google.com to run a CQ dry run

4 years, 2 months ago (2016-10-20 01:53:29 UTC) #1

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2425423006/1

4 years, 2 months ago (2016-10-20 01:54:00 UTC) #2

fbarchard1

Previous roll was rolled back due to test failure in cc_unittests on Mac for HalfFloat ...

4 years, 2 months ago (2016-10-20 01:59:51 UTC) #3

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 2 months ago (2016-10-20 03:45:03 UTC) #4

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/164491)

4 years, 2 months ago (2016-10-20 03:45:04 UTC) #5

fbarchard1

Description was changed from ========== libyuv r1628 for improved HalfFloat and ExtractAlpha AVX2 support HalfFloat ...

4 years, 2 months ago (2016-10-20 18:29:32 UTC) #6

hubbe

https://codereview.chromium.org/2425423006/diff/20001/cc/resources/video_resource_updater_unittest.cc
File cc/resources/video_resource_updater_unittest.cc (right):

https://codereview.chromium.org/2425423006/diff/20001/cc/resources/video_resource_updater_unittest.cc#newcode558
cc/resources/video_resource_updater_unittest.cc:558: if (i < num_values - 1) {
Why is ...

4 years, 2 months ago (2016-10-20 18:55:11 UTC) #7

fbarchard1

The CQ bit was checked by fbarchard@google.com to run a CQ dry run

4 years, 2 months ago (2016-10-20 22:55:50 UTC) #8

fbarchard1

Description was changed from ========== libyuv r1628 for improved HalfFloat and ExtractAlpha AVX2 support HalfFloat ...

4 years, 2 months ago (2016-10-20 22:56:41 UTC) #9

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2425423006/60001

4 years, 2 months ago (2016-10-20 22:56:48 UTC) #10

fbarchard1

The AVX2 version had a bug in the gcc version. This is fixed in 1629. ...

4 years, 2 months ago (2016-10-20 22:57:34 UTC) #11

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 2 months ago (2016-10-21 02:32:40 UTC) #13

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 2 months ago (2016-10-21 02:32:41 UTC) #14

fbarchard1

Description was changed from ========== libyuv r1629 for improved HalfFloat and ExtractAlpha AVX2 support HalfFloat ...

4 years, 2 months ago (2016-10-21 17:55:37 UTC) #15

fbarchard1

Description was changed from ========== libyuv r1629 for improved HalfFloat and ExtractAlpha AVX2 support HalfFloat ...

4 years, 2 months ago (2016-10-21 17:59:39 UTC) #16

Description was changed from

==========
libyuv r1629 for improved HalfFloat and ExtractAlpha AVX2 support

HalfFloat AVX2 ported from SSE2 using same magic number method, which
is 20% faster than vcvtps2ph method and produces identical results.

HalfFloat Neon version adapted from inner loop of vectorized C, but folds shift
and narrow into one instruction and uses element multiply instead of vector to
save a register and dup instruction.  Neon version is also full performance
with -Os.

This CL enables -O2 for libyuv_neon as well.

ExtractAlpha ported to AVX2.
ARGB4444ToI420 ported to MSA.
F16C cpu detection for AVX hardware that has halffloat conversion support.

Change log:
https://chromium.googlesource.com/libyuv/libyuv/+log/198bce39..550cf829
Full changes
https://chromium.googlesource.com/libyuv/libyuv/+/198bce39..550cf829


TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560
R=hubbe@chromium.org
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel
==========

to

==========
libyuv r1629 roll for AVX2 optimized HalfFloatPlane and ExtractAlpha.

Important changes included:
AVX2 and NEON halffloat conversion.
Add F16C cpu detection.
Support for I411 removed.
Side by side UV for I420 output improves memory coherency.

HalfFloat AVX2 ported from SSE2 using same magic number method, which
is 20% faster than vcvtps2ph method and produces identical results.

HalfFloat Neon version adapted from inner loop of vectorized C, but folds shift
and narrow into one instruction and uses element multiply instead of vector to
save a register and dup instruction.  Neon version is also full performance
with -Os.

This CL enables -O2 for libyuv_neon as well.

ExtractAlpha ported to AVX2.
ARGB4444ToI420 ported to MSA.
F16C cpu detection for AVX hardware that has halffloat conversion support.

Change log:
https://chromium.googlesource.com/libyuv/libyuv/+log/198bce39..550cf829
Full changes
https://chromium.googlesource.com/libyuv/libyuv/+/198bce39..550cf829

TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560, libyuv:650, libyuv:572, libyuv:645, libyuv:649
R=hubbe@chromium.org
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel
==========
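
For readers unfamiliar with the "magic number method" mentioned in the description, the following is a minimal scalar sketch in C, not the code from this roll. It assumes the idea is to multiply each 16-bit sample by scale * 2^-112, which rebiases the float exponent from 127 to the half-float bias of 15, and then shift the float bits right by 13 so the exponent and top mantissa bits land in the 16-bit half layout. The names half_float_row and kHalfBias are hypothetical, and the result is truncated rather than rounded.

```c
#include <stdint.h>
#include <string.h>

/* 2^-112: moves the float exponent bias (127) down to the half bias (15).
   Hypothetical name, chosen for this sketch only. */
static const float kHalfBias = 1.9259299443872359e-34f;

/* Converts `width` unsigned 16-bit samples to IEEE half-float bit patterns.
   `scale` maps the integer range to the desired float range, for example
   1.0f / 1023.0f for 10-bit video. Inputs are non-negative, so the sign bit
   is always zero. Values are truncated, not rounded. */
static void half_float_row(const uint16_t* src, uint16_t* dst,
                           float scale, int width) {
  float mult = scale * kHalfBias;
  for (int i = 0; i < width; ++i) {
    float value = (float)src[i] * mult;
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits); /* type-pun without aliasing issues */
    dst[i] = (uint16_t)(bits >> 13);    /* drop the 13 low mantissa bits */
  }
}
```

With scale 1.0f, an input of 1 yields 0x3C00 (half-float 1.0) and an input of 2 yields 0x4000 (half-float 2.0). A SIMD version would apply the same multiply and shift to several lanes at once, which is presumably what lets it avoid vcvtps2ph.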

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2425423006/60001

4 years, 2 months ago (2016-10-21 18:00:30 UTC) #18

commit-bot: I haz the power

Description was changed from ========== libyuv r1629 roll for AVX2 optimized HalfFloatPlane and ExtractAlpha. Important ...

4 years, 2 months ago (2016-10-21 18:27:06 UTC) #19

Message was sent while issue was closed.

commit-bot: I haz the power

Description was changed from ========== libyuv r1629 roll for AVX2 optimized HalfFloatPlane and ExtractAlpha. Important ...

4 years, 2 months ago (2016-10-21 18:42:41 UTC) #21

Message was sent while issue was closed.

Description was changed from

==========
libyuv r1629 roll for AVX2 optimized HalfFloatPlane and ExtractAlpha.

Important changes included:
AVX2 and NEON halffloat conversion.
Add F16C cpu detection.
Support for I411 removed.
Side by side UV for I420 output improves memory coherency.

HalfFloat AVX2 ported from SSE2 using same magic number method, which
is 20% faster than vcvtps2ph method and produces identical results.

HalfFloat Neon version adapted from inner loop of vectorized C, but folds shift
and narrow into one instruction and uses element multiply instead of vector to
save a register and dup instruction.  Neon version is also full performance
with -Os.

This CL enables -O2 for libyuv_neon as well.

ExtractAlpha ported to AVX2.
ARGB4444ToI420 ported to MSA.
F16C cpu detection for AVX hardware that has halffloat conversion support.

Change log:
https://chromium.googlesource.com/libyuv/libyuv/+log/198bce39..550cf829
Full changes
https://chromium.googlesource.com/libyuv/libyuv/+/198bce39..550cf829

TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560, libyuv:650, libyuv:572, libyuv:645, libyuv:649
R=hubbe@chromium.org
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel
==========

to

==========
libyuv r1629 roll for AVX2 optimized HalfFloatPlane and ExtractAlpha.

Important changes included:
AVX2 and NEON halffloat conversion.
Add F16C cpu detection.
Support for I411 removed.
Side by side UV for I420 output improves memory coherency.

HalfFloat AVX2 ported from SSE2 using same magic number method, which
is 20% faster than vcvtps2ph method and produces identical results.

HalfFloat Neon version adapted from inner loop of vectorized C, but folds shift
and narrow into one instruction and uses element multiply instead of vector to
save a register and dup instruction.  Neon version is also full performance
with -Os.

This CL enables -O2 for libyuv_neon as well.

ExtractAlpha ported to AVX2.
ARGB4444ToI420 ported to MSA.
F16C cpu detection for AVX hardware that has halffloat conversion support.

Change log:
https://chromium.googlesource.com/libyuv/libyuv/+log/198bce39..550cf829
Full changes
https://chromium.googlesource.com/libyuv/libyuv/+/198bce39..550cf829

TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560, libyuv:650, libyuv:572, libyuv:645, libyuv:649
R=hubbe@chromium.org
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel

Committed: https://crrev.com/6f32b27bf0f9e369fe4e94c499cec6afe2abb56e
Cr-Commit-Position: refs/heads/master@{#426849}
==========
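
As a point of reference for the ExtractAlpha port, here is a hedged scalar sketch in C of what an alpha extraction row function does. It assumes libyuv's ARGB byte order in memory (B, G, R, A), so alpha is the fourth byte of each pixel. The name extract_alpha_row is hypothetical and this is not the AVX2 code from the roll.

```c
#include <stdint.h>

/* Copies the alpha byte of each 4-byte ARGB pixel into a packed 8-bit plane.
   Assumes B, G, R, A byte order in memory, so alpha sits at offset 3. */
static void extract_alpha_row(const uint8_t* src_argb, uint8_t* dst_a,
                              int width) {
  for (int i = 0; i < width; ++i) {
    dst_a[i] = src_argb[i * 4 + 3];
  }
}
```

An AVX2 variant would typically load several pixels per iteration and gather the alpha bytes with shifts and packs or a byte shuffle. The scalar loop above is the behavior such a variant has to match.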

commit-bot: I haz the power

4 years, 2 months ago (2016-10-21 18:42:42 UTC) #22

Message was sent while issue was closed.

Patchset 4 (id:??) landed as
https://crrev.com/6f32b27bf0f9e369fe4e94c499cec6afe2abb56e
Cr-Commit-Position: refs/heads/master@{#426849}

Issue 2425423006: libyuv r1629 roll for AVX2 optimized HalfFloatPlane and ExtractAlpha. (Closed)

Patch Set 1
Patch Set 2: avoid reading off end of buffer VideoResourceUpdaterTest.MakeHalfFloatTest
Patch Set 3: avoid reading off end of buffer VideoResourceUpdaterTest.MakeHalfFloatTest
Patch Set 4: unittest change removed
