Description: Move video encode accelerator IPC messages to GPU IO thread
This CL moves video encode accelerator IPC messages to the GPU IO thread
instead of the main thread. This helps stabilize the frame rate and
reduce jitter on Windows. Currently, many of these calls incur large
delays, resulting in dropped frames.
In order to do this per platform,
TryToSetupEncodeOnSeparateThread() is added to the VideoEncodeAccelerator
interface. If this method returns false, we keep all IPC messages on the
main thread as before. It returns false by default.
Note: Initially, I only moved the AcceleratedVideoEncoderMsg_Encode call
to the IO thread, but then we waited a long time for output buffers and
again stayed below 30 fps. Therefore I moved all three functions to the
IO thread and reached ~30 fps, even when switching between tabs.
BUG=657217, 649275
TEST=RunsAudioVideoCall60SecsAndLogsInternalMetricsH264 browser test
on Windows results in a stable 30 fps.
Committed: https://crrev.com/66ea9831c5714e7d25e544f31bce201cdb35c4af
Cr-Commit-Position: refs/heads/master@{#430882}
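The interface change described above can be sketched roughly as follows. This is a hedged illustration, not the actual Chromium code: `TaskRunner` is a stand-in for base::SingleThreadTaskRunner, and `IoThreadCapableVEA` is a hypothetical platform implementation that opts in.

```cpp
#include <cassert>
#include <memory>

// Stand-in for base::SingleThreadTaskRunner so the sketch is self-contained.
struct TaskRunner {};

class VideoEncodeAccelerator {
 public:
  virtual ~VideoEncodeAccelerator() = default;

  // Returns true if this VEA can handle encode calls on
  // |encode_task_runner| (the GPU IO thread). The default returns false,
  // which keeps all IPC on the main thread, matching the old behavior.
  virtual bool TryToSetupEncodeOnSeparateThread(
      const std::shared_ptr<TaskRunner>& encode_task_runner) {
    return false;
  }
};

// A platform encoder opts in by overriding and storing the runner.
class IoThreadCapableVEA : public VideoEncodeAccelerator {
 public:
  bool TryToSetupEncodeOnSeparateThread(
      const std::shared_ptr<TaskRunner>& encode_task_runner) override {
    encode_task_runner_ = encode_task_runner;
    return true;
  }

 private:
  std::shared_ptr<TaskRunner> encode_task_runner_;
};
```

The default-false virtual keeps every existing platform VEA unchanged; only encoders that explicitly override the hook move their IPC to the IO thread.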
Patch Set 1 #
Total comments: 20
Patch Set 2 : sanders@ comments. #
Patch Set 3 : Fix log order. #
Total comments: 36
Patch Set 4 : posciak@ comments. #
Total comments: 6
Patch Set 5 : posciak@ comments. #
Total comments: 4
Patch Set 6 : posciak@ comments. #
Patch Set 7 : Add an encoder worker thread for map(). #
Patch Set 8 : Use encoder worker thread only on IO. #
Patch Set 9 : posciak@ comments. #
Total comments: 18
Patch Set 10 : posciak@ comments. #
Total comments: 4
Patch Set 11 : posciak@ comments. #
Messages
Total messages: 122 (93 generated)
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
Description was changed from ========== move to message filter BUG= ========== to ========== Move video encode accelerator IPC messages to GPU IO thread BUG=656021 ==========
Patchset #1 (id:1) has been deleted
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
Description was changed from ========== Move video encode accelerator IPC messages to GPU IO thread BUG=656021 ========== to ========== Move video encode accelerator IPC messages to GPU IO thread BUG=657217, 649275 ==========
Description was changed from ========== Move video encode accelerator IPC messages to GPU IO thread BUG=657217, 649275 ========== to ========== Move video encode accelerator IPC messages to GPU IO thread This CL moves video encode accelerator IPC messages to GPU IO thread instead of the main thread. This helps stabilize frame rate as well as reduce jitter on Windows. In order to do this with respect to each platform, TryToSetupEncodeOnSeparateThread() is added to VideoEncodeAccelerator interface. BUG=657217, 649275 TEST=Browser tests on Windows result in stable 30 fps. ==========
emircan@chromium.org changed reviewers: + wuchengli@chromium.org
emircan@chromium.org changed reviewers: - wuchengli@chromium.org
Patchset #1 (id:20001) has been deleted
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
Patchset #1 (id:40001) has been deleted
emircan@chromium.org changed reviewers: + sandersd@chromium.org, wuchengli@chromium.org
PTAL.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...)
Patchset #1 (id:60001) has been deleted
Patchset #1 (id:80001) has been deleted
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
On 2016/10/19 22:13:25, emircan wrote: > PTAL. To help with this review, I have one overall question: For VDAs, the threading is complicated by requirements that certain callbacks, in certain conditions, must be made on the main thread. Are there any constraints on callback threading for VEAs? And more generally, how sure are we that MF VEA won't be adding latency on the IO thread? Thanks!
On 2016/10/20 23:12:49, sandersd wrote: > For VDAs, the threading is complicated by requirements that certain callbacks, > in certain conditions, must be made on the main thread. Are there any > constraints on callback threading for VEAs? VEA does not have that limitation as far as I can tell. It is important to keep the calls synchronized though, and I am calling Encode(), UseOutputBitstreamBuffer(), BitstreamBufferReady() and RequestEncodingParametersChange() on the same thread (the IO thread) for that purpose. > And more generally, how sure are we that MF VEA won't be adding latency on the > IO thread? There is definitely some overhead that it adds to the IO thread. However, - It is minimal, as I am immediately posting tasks to the internal encoder thread of MFVEA. See Encode(), UseOutputBitstreamBuffer(), RequestEncodingParametersChange() here: https://cs.chromium.org/chromium/src/media/gpu/media_foundation_video_encode_... - Currently we only get 20-22 fps while switching/opening tabs, scrolling, etc. when using main-thread commands. This makes the encoder pretty much useless for real-time communication purposes, see https://chromeperf.appspot.com/report?sid=4ca2d4909073f99d431883b2321502521f8.... We need to offload those messages somewhere.
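The low-overhead hand-off emircan describes (the IO-thread call does nothing but post a task to the encoder's own thread) can be sketched with standard C++ primitives. This is an illustrative sketch, not the MFVEA code: `EncoderThread` stands in for the encoder's internal base::Thread, and `EncodeOnIOThread` for the IO-thread entry point.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Minimal stand-in for the encoder's internal thread: callers only enqueue
// a closure and return immediately; all real work runs on the worker.
class EncoderThread {
 public:
  EncoderThread() : worker_([this] { Run(); }) {}

  ~EncoderThread() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();  // Drains remaining tasks, then joins.
  }

  // Called on the "IO thread"; cheap, returns immediately.
  void PostTask(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
        if (tasks_.empty())
          return;  // done_ was set and the queue is drained.
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool done_ = false;
  std::thread worker_;  // Declared last so it starts after other members.
};

// Hypothetical Encode() path: the IO thread only enqueues; the equivalent
// of the real EncodeTask() runs on the encoder thread.
void EncodeOnIOThread(EncoderThread& encoder, int* encoded_count) {
  encoder.PostTask([encoded_count] { ++*encoded_count; });
}
```

The point of the design is that the IO thread's cost per frame is one lock/enqueue/notify, so even a slow hardware encoder cannot stall other IPC traffic.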
Looks good on first pass, but I want to review the main logic more thoroughly still. I'll try to get that done by noon tomorrow! https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.cc (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:91: DVLOG(3) << __func__; Nit: Prefer __func__ logging first. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.h (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.h:32: // TryToSetupEncodeOnSeparateThread() is called, it use=s the given Remove '='
Just a few nits and low-priority requests. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:114: // This will delete |owner_| and |this|. Misleading comment, since it makes the (cross-class) assumption that RemoveFilter() was called by the destructor. I would prefer the simplicity of a model where the filter lifecycle is not so directly tied to GVEA. (No specific suggestions to achieve that though, the existing callback model certainly doesn't make it easy.) https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:143: CHECK(rv); ? https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:253: DLOG(ERROR) << __func__ << " failed."; Should GVEA be entering some sort of error state on these kinds of failures? https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:315: // We're destroying; cancel all callbacks. Are there no GPU shutdown paths where the channel filter can be removed without GVEA being fully destructed first? https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:459: DCHECK(io_task_runner_->BelongsToCurrentThread()); DCHECK() is redundant. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:468: (main_task_runner_->BelongsToCurrentThread() && !filter_); Nit: Putting |filter_| condition first makes more sense. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... 
File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.h:131: // destroy the VDA. VEA https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.cc (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:266: client_ptr_factory_.reset(); Note that this no longer has the stated side effect. It's a bit ugly to have invalidations happening in multiple places, perhaps there should be two different client pointers, or all invalidation should happen on one of the two classes?
https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:114: // This will delete |owner_| and |this|. On 2016/10/21 18:41:03, sandersd wrote: > Misleading comment, since it makes the (cross-class) assumption that > RemoveFilter() was called by the destructor. I would prefer the simplicity of a > model where the filter lifecycle is not so directly tied to GVEA. (No specific > suggestions to achieve that though, the existing callback model certainly > doesn't make it easy.) I agree that it sounds like it is coming from the dtor, so I removed the comment. I actually got this model from GPUVDA directly. OnWillDestroyStub() calls OnFilterRemoved(), then destroys GPUVEA and the filter as well. https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_video_decode_a... https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:143: CHECK(rv); On 2016/10/21 18:41:03, sandersd wrote: > ? It was a paranoia check. Removing it. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:253: DLOG(ERROR) << __func__ << " failed."; On 2016/10/21 18:41:03, sandersd wrote: > Should GVEA be entering some sort of error state on these kinds of failures? I added these to have logs if any error occurs, but it looks like we generally do not expect these to fail at all. There was no error handling here, and VDA seems to also just log. I think the policy is to try sending the next message even if one fails. https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_video_decode_a... https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... 
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:315: // We're destroying; cancel all callbacks. On 2016/10/21 18:41:03, sandersd wrote: > Are there no GPU shutdown paths where the channel filter can be removed without > GVEA being fully destructed first? The filter is actually removed before GVEA is destructed. The only shutdown path here is OnWillDestroyStub(); the lifetimes of all these objects are tied to the stub as it is. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:459: DCHECK(io_task_runner_->BelongsToCurrentThread()); On 2016/10/21 18:41:03, sandersd wrote: > DCHECK() is redundant. Done. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:468: (main_task_runner_->BelongsToCurrentThread() && !filter_); On 2016/10/21 18:41:03, sandersd wrote: > Nit: Putting |filter_| condition first makes more sense. Done. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.h:131: // destroy the VDA. On 2016/10/21 18:41:03, sandersd wrote: > VEA Done. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.cc (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:91: DVLOG(3) << __func__; On 2016/10/21 01:01:00, sandersd wrote: > Nit: Prefer __func__ logging first. Done. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... 
media/gpu/media_foundation_video_encode_accelerator_win.cc:266: client_ptr_factory_.reset(); On 2016/10/21 18:41:03, sandersd wrote: > Note that this no longer has the stated side effect. It's a bit ugly to have > invalidations happening in multiple places, perhaps there should be two > different client pointers, or all invalidation should happen on one of the two > classes? Thanks for pointing this out. I added a reset to make sure |client_| is invalid. This method, Destroy(), is called by the default dtor in two scenarios: - Initialize() fails. Then we don't have an external |client_| and this works. - The GPUVEA::OnWillDestroyStub() sequence. Then we have an external |client_|, but it is already invalidated by GPUVEA. GPUVEA attaches a callback to itself on EncodeFrameFinished(), so I think it makes sense that it owns a WeakPtr to itself. The problem is that this class needs to create its own WeakPtrFactory for clients, as it receives a naked ptr in Initialize(). I can add a TODO to change that behavior in later CLs. The issue actually exists in V4L2VEA as well, see below. https://cs.chromium.org/chromium/src/media/gpu/v4l2_video_decode_accelerator.... https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.h (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.h:32: // TryToSetupEncodeOnSeparateThread() is called, it use=s the given On 2016/10/21 01:01:00, sandersd wrote: > Remove '=' Done.
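The shutdown ordering discussed above (the filter is removed on the IO thread before GVEA teardown finishes, matching the `filter_removed_.Wait()` in the CL) can be sketched with std::promise/std::future in place of base::WaitableEvent. All names here are illustrative, not the Chromium classes:

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <thread>

// Stand-in for the IPC message filter. OnFilterRemoved() runs on the IO
// thread once the channel guarantees no further messages will arrive.
struct MessageFilter {
  std::promise<void> removed;
  void OnFilterRemoved() { removed.set_value(); }
};

// Sketch of the owner: its destructor must not free state the filter may
// still touch, so it blocks until the IO thread has signaled removal.
class GpuVideoEncodeAcceleratorSketch {
 public:
  explicit GpuVideoEncodeAcceleratorSketch(
      std::shared_ptr<MessageFilter> filter)
      : filter_(std::move(filter)),
        filter_removed_(filter_->removed.get_future()) {}

  ~GpuVideoEncodeAcceleratorSketch() {
    // Equivalent of filter_removed_.Wait(): main-thread teardown blocks
    // here until the IO thread has removed the filter.
    filter_removed_.wait();
  }

 private:
  std::shared_ptr<MessageFilter> filter_;
  std::future<void> filter_removed_;
};
```

This is why the filter removal is always sequenced before destruction: the wait would deadlock if no IO-thread removal were guaranteed to happen.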
lgtm
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
posciak@chromium.org changed reviewers: + posciak@chromium.org
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:107: void OnChannelError() override { sender_ = NULL; } sender_ = nullptr; and similarly below please. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:134: CHECK(!message->is_sync()); Perhaps DCHECK and return false to avoid crashing the process? Do we need to delete message too? https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:149: IPC::Sender* sender_; = nullptr ? https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:150: }; Should we also have some DISALLOW_COPY_AND_ASSIGN ? https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:246: if (!Send(new AcceleratedVideoEncoderHostMsg_RequireBitstreamBuffers( DCHECK(CheckCalledOnMessageFilterThread()) if we expect to do this? https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:283: filter_removed_.Wait(); Please add a comment similar to the one in gvda.cc explaining why we are waiting here. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| Could we move the remainder of this method off of the IO thread please? Mapping may be costly. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... 
File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.h:107: bool CheckCalledOnMessageFilterThread(); Since this checks whether we are on message filter or main thread, perhaps should be called CheckIfCalledOnCorrectThread() or something like that instead? https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.h:127: // The message filter to run VDA::Decode on IO thread if VDA supports it. "VDA::Decode", "VDA"... https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:226: client_->NotifyError(kInvalidArgumentError); client_ is a WeakPtr, so we should always if (client_) first, here and in other places, if dereferencing. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:267: client_.reset(); Hm, do we need to InvalidateWeakPtrs() on factory instead? https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... File media/video/video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.cc:22: const scoped_refptr<base::SingleThreadTaskRunner>& decode_task_runner) { encode_task_runner ? https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... File media/video/video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... 
media/video/video_encode_accelerator.h:157: // Encode tasks include all calls, except Initialize() and Destroy(), that are Could we document the complete set of calls that are made on IO instead of wording it this way, please? If we say all except X() and Y(), and we add additional methods later on, we may forget to update this doc. Also, I think we probably shouldn't need the RequireBitstreamBuffers and NotifyError callbacks to be on IO as well? https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.h:170: virtual bool TryToSetupEncodeOnSeparateThread( Could you add this call to video_encode_accelerator_unittest as well please?
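The guarded-dereference pattern posciak@ asks for ("always `if (client_)` first") looks roughly like this. std::weak_ptr stands in for base::WeakPtr, and `Client`/`NotifyErrorIfAlive` are illustrative names, not the Chromium API:

```cpp
#include <cassert>
#include <memory>

// Illustrative client interface; the real code uses media's error codes
// such as kInvalidArgumentError.
struct Client {
  int errors = 0;
  void NotifyError(int /*error_code*/) { ++errors; }
};

// Every dereference of the weak client pointer is guarded, so a callback
// that arrives after the client has been destroyed becomes a safe no-op
// instead of a use-after-free.
void NotifyErrorIfAlive(const std::weak_ptr<Client>& weak_client,
                        int error_code) {
  if (std::shared_ptr<Client> client = weak_client.lock())
    client->NotifyError(error_code);
}
```

This matters precisely because encode callbacks and client destruction now happen on different threads: the weak pointer turns the race into a dropped notification rather than a crash.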
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:107: void OnChannelError() override { sender_ = NULL; } On 2016/10/25 01:44:04, Pawel Osciak wrote: > sender_ = nullptr; > > and similarly below please. Done. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:134: CHECK(!message->is_sync()); On 2016/10/25 01:44:05, Pawel Osciak wrote: > Perhaps DCHECK and return false to avoid crashing the process? Do we need to > delete message too? I merged this error case with l.135 and made it a DCHECK crash. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:149: IPC::Sender* sender_; On 2016/10/25 01:44:04, Pawel Osciak wrote: > = nullptr ? Added to the initializer list. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:150: }; On 2016/10/25 01:44:04, Pawel Osciak wrote: > Should we also have some DISALLOW_COPY_AND_ASSIGN ? Done. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:246: if (!Send(new AcceleratedVideoEncoderHostMsg_RequireBitstreamBuffers( On 2016/10/25 01:44:04, Pawel Osciak wrote: > DCHECK(CheckCalledOnMessageFilterThread()) if we expect to do this? I moved it to main thread after your comment and added a check. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:283: filter_removed_.Wait(); On 2016/10/25 01:44:04, Pawel Osciak wrote: > Please add a comment similar to the one in gvda.cc why we are waiting here. Done. 
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| On 2016/10/25 01:44:04, Pawel Osciak wrote: > Could we move the remainder of this method off of the IO thread please? Mapping > may be costly. I added a trace to measure how long it takes here. It is 0.058 ms on average on my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator also does this on the IO thread. I don't think it is worth adding another thread hop here. https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_ac... Also, what thread would you suggest to use? The main task runner is pretty busy as well here. One option would be passing the shmem handle to the encoder's own thread directly, but that would require an interface change. I can add a TODO for a separate CL. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.h:107: bool CheckCalledOnMessageFilterThread(); On 2016/10/25 01:44:05, Pawel Osciak wrote: > Since this checks whether we are on message filter or main thread, perhaps > should be called CheckIfCalledOnCorrectThread() or something like that instead? Done. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.h:127: // The message filter to run VDA::Decode on IO thread if VDA supports it. On 2016/10/25 01:44:05, Pawel Osciak wrote: > "VDA::Decode", "VDA"... Done. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... 
media/gpu/media_foundation_video_encode_accelerator_win.cc:226: client_->NotifyError(kInvalidArgumentError); On 2016/10/25 01:44:05, Pawel Osciak wrote: > client_ is a WeakPtr, so we should always if (client_) first, here and in other > places, if dereferencing. Done. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:267: client_.reset(); On 2016/10/25 01:44:05, Pawel Osciak wrote: > Hm, do we need to InvalidateWeakPtrs() on factory instead? It already gets invalidated from GPUVEA's OnWillDestroyStub sequence. I will add a comment to explain it here. I cannot use DCHECK(!client_) though as it is on a different thread. I added this reset to be explicit. https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... File media/video/video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.cc:22: const scoped_refptr<base::SingleThreadTaskRunner>& decode_task_runner) { On 2016/10/25 01:44:05, Pawel Osciak wrote: > encode_task_runner ? Done. https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... File media/video/video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.h:157: // Encode tasks include all calls, except Initialize() and Destroy(), that are On 2016/10/25 01:44:05, Pawel Osciak wrote: > Could we instead document the complete set of calls that are called on IO > instead of wording it this way please? If we say all except X() and Y(), if we > add additional methods later on, we may forget to update this doc. > > Also, I think we shouldn't probably need RequireBitstreamBuffers and NotifyError > callbacks to be on IO as well? Sgtm. I moved RequireBitstreamBuffers and NotifyError back to main and updated the comment. 
https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.h:170: virtual bool TryToSetupEncodeOnSeparateThread( On 2016/10/25 01:44:05, Pawel Osciak wrote: > Could you add this call to video_encode_accelerator_unittest as well please? VEAUnittest is not running on Win bots right now. I will leave a TODO. I am going to enable it once I stabilize WebRTC perf and quality browser tests, and H264 problems that come up before[0]. [0] https://codereview.chromium.org/2205623002/#msg86
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| On 2016/10/25 21:57:50, emircan wrote: > On 2016/10/25 01:44:04, Pawel Osciak wrote: > > Could we move the reminder of this method off of the IO thread please? Mapping > > may be costly. > > I added a trace to measure how long it takes here. It is 0.058 ms on average on > my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator also > does this on IO thread. I don't think it is worth adding another thread hop > here. > https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_ac... > From my understanding, the rule of thumb is that nothing should be done on IO thread if it can be helped in a reasonable way, it's timing critical for responsiveness, etc. Also, even if we have small overheads in multiple places, they add up. Moreover, I'm concerned that 0.058ms in a particular situation on a particular device/OS may be a different number on another system, for different source memory type, under different memory pressure, virtual address space layout, etc. This may also sleep. GpuJpegDecodeAccelerator should also be fixed for this. > Also, what thread would you suggest to use? Main task runner is pretty busy as > well here. One option would be passing shmemhandle to the encoder's own thread > directly but that would require an interface change. I can add a TODO for a > seperate CL. Making VideoFrame::data() and visible_data() lazy-map the memory on first access if IsMappable() would be one solution. 
The simplest solution though would probably be to have a separate thread in this class to jump from IO onto, map there and then post back directly to IO to encoder_->Encode() from there afterwards. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:267: client_.reset(); On 2016/10/25 21:57:51, emircan wrote: > On 2016/10/25 01:44:05, Pawel Osciak wrote: > > Hm, do we need to InvalidateWeakPtrs() on factory instead? > > It already gets invalidated from GPUVEA's OnWillDestroyStub sequence. I will add > a comment to explain it here. I cannot use DCHECK(!client_) though as it is on a > different thread. I added this reset to be explicit. In that case I feel InvalidateWeakPtrs() is more explicitly saying what we want to do... https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... File media/video/video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.h:170: virtual bool TryToSetupEncodeOnSeparateThread( On 2016/10/25 21:57:51, emircan wrote: > On 2016/10/25 01:44:05, Pawel Osciak wrote: > > Could you add this call to video_encode_accelerator_unittest as well please? > > VEAUnittest is not running on Win bots right now. I will leave a TODO. I am > going to enable it once I stabilize WebRTC perf and quality browser tests, and > H264 problems that come up before[0]. > > [0] https://codereview.chromium.org/2205623002/#msg86 The test is currently run on CrOS bots on CrOS devices however. Would it be possible to spawn a dummy thread in the test with a threadchecker, and only use it for checking the expected calls go to it and post back to main immediately afterwards please? 
https://codereview.chromium.org/2427053002/diff/160001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/160001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:137: DCHECK(!message->is_sync()); We just deleted message... https://codereview.chromium.org/2427053002/diff/160001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.h (right): https://codereview.chromium.org/2427053002/diff/160001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.h:139: // Used to run client callback BitstreamBufferReady() on the encode task s/encode task runner/encode_client_task_runner_/ https://codereview.chromium.org/2427053002/diff/160001/media/video/video_enco... File media/video/video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/160001/media/video/video_enco... media/video/video_encode_accelerator.h:165: // |encode_task_runner|, and then expect Client method to be called on s/Client method/Client::BitstreamBufferReady()/
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| On 2016/10/27 01:45:19, Pawel Osciak wrote: > On 2016/10/25 21:57:50, emircan wrote: > > On 2016/10/25 01:44:04, Pawel Osciak wrote: > > > Could we move the reminder of this method off of the IO thread please? > Mapping > > > may be costly. > > > > I added a trace to measure how long it takes here. It is 0.058 ms on average > on > > my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator also > > does this on IO thread. I don't think it is worth adding another thread hop > > here. > > > https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_ac... > > > > From my understanding, the rule of thumb is that nothing should be done on IO > thread if it can be helped in a reasonable way, it's timing critical for > responsiveness, etc. Also, even if we have small overheads in multiple places, > they add up. > > Moreover, I'm concerned that 0.058ms in a particular situation on a particular > device/OS may be a different number on another system, for different source > memory type, under different memory pressure, virtual address space layout, etc. > This may also sleep. > > GpuJpegDecodeAccelerator should also be fixed for this. > > > Also, what thread would you suggest to use? Main task runner is pretty busy as > > well here. One option would be passing shmemhandle to the encoder's own thread > > directly but that would require an interface change. I can add a TODO for a > > seperate CL. > > Making VideoFrame::data() and visible_data() lazy-map the memory on first access > if IsMappable() would be one solution. 
> > The simplest solution though would probably be to have a separate thread in this > class to jump from IO onto, map there and then post back directly to IO to > encoder_->Encode() from there afterwards. I like the idea of letting VideoFrame lazy evaluate. That would help in many places including GpuJpegDecodeAccelerator. I added a bug, see crbug.com/660082. I understand that we want to avoid doing work on IO thread. Same principle applies to the main thread as well, although it is currently used for map. What I was trying to point out was that thread hopping would be more expensive than the actual map operation on this platform. Two posted tasks are additional costs and it would be a clear regression. I just wanted to avoid that. https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... File media/video/video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.h:170: virtual bool TryToSetupEncodeOnSeparateThread( On 2016/10/27 01:45:19, Pawel Osciak wrote: > On 2016/10/25 21:57:51, emircan wrote: > > On 2016/10/25 01:44:05, Pawel Osciak wrote: > > > Could you add this call to video_encode_accelerator_unittest as well please? > > > > VEAUnittest is not running on Win bots right now. I will leave a TODO. I am > > going to enable it once I stabilize WebRTC perf and quality browser tests, and > > H264 problems that come up before[0]. > > > > [0] https://codereview.chromium.org/2205623002/#msg86 > > The test is currently run on CrOS bots on CrOS devices however. Would it be > possible to spawn a dummy thread in the test with a threadchecker, and only use > it for checking the expected calls go to it and post back to main immediately > afterwards please? I added a dummy IO thread where calls run to the test. However, it will first call this method which returns false on CrOS VAAPI and V4L2. 
Therefore, all the calls will still go through the main thread, and the new thread won't be exercised until this runs on Windows. https://codereview.chromium.org/2427053002/diff/160001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/160001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:137: DCHECK(!message->is_sync()); On 2016/10/27 01:45:19, Pawel Osciak wrote: > We just deleted message... Thanks for the notice. I just swapped the lines. https://codereview.chromium.org/2427053002/diff/160001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.h (right): https://codereview.chromium.org/2427053002/diff/160001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.h:139: // Used to run client callback BitstreamBufferReady() on the encode task On 2016/10/27 01:45:19, Pawel Osciak wrote: > s/encode task runner/encode_client_task_runner_/ Done. https://codereview.chromium.org/2427053002/diff/160001/media/video/video_enco... File media/video/video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/160001/media/video/video_enco... media/video/video_encode_accelerator.h:165: // |encode_task_runner|, and then expect Client method to be called on On 2016/10/27 01:45:19, Pawel Osciak wrote: > s/Client method/Client::BitstreamBufferReady()/ Done.
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| On 2016/10/27 17:53:30, emircan wrote: > On 2016/10/27 01:45:19, Pawel Osciak wrote: > > On 2016/10/25 21:57:50, emircan wrote: > > > On 2016/10/25 01:44:04, Pawel Osciak wrote: > > > > Could we move the reminder of this method off of the IO thread please? > > Mapping > > > > may be costly. > > > > > > I added a trace to measure how long it takes here. It is 0.058 ms on average > > on > > > my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator also > > > does this on IO thread. I don't think it is worth adding another thread hop > > > here. > > > > > > https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_ac... > > > > > > > From my understanding, the rule of thumb is that nothing should be done on IO > > thread if it can be helped in a reasonable way, it's timing critical for > > responsiveness, etc. Also, even if we have small overheads in multiple places, > > they add up. > > > > Moreover, I'm concerned that 0.058ms in a particular situation on a particular > > device/OS may be a different number on another system, for different source > > memory type, under different memory pressure, virtual address space layout, > etc. > > This may also sleep. > > > > GpuJpegDecodeAccelerator should also be fixed for this. > > > > > Also, what thread would you suggest to use? Main task runner is pretty busy > as > > > well here. One option would be passing shmemhandle to the encoder's own > thread > > > directly but that would require an interface change. I can add a TODO for a > > > seperate CL. 
> > > > Making VideoFrame::data() and visible_data() lazy-map the memory on first > access > > if IsMappable() would be one solution. > > > > The simplest solution though would probably be to have a separate thread in > this > > class to jump from IO onto, map there and then post back directly to IO to > > encoder_->Encode() from there afterwards. > > I like the idea of letting VideoFrame lazy evaluate. That would help in many > places including GpuJpegDecodeAccelerator. I added a bug, see crbug.com/660082. > > I understand that we want to avoid doing work on IO thread. Same principle > applies to the main thread as well, although it is currently used for map. What > I was trying to point out was that thread hopping would be more expensive than > the actual map operation on this platform. Two posted tasks are additional costs > and it would be a clear regression. I just wanted to avoid that. From our experience with this, the main performance issue here is not the cost of thread hops, but the fact that GpuMain is heavily overloaded and thus Encode() etc. calls have to wait a long time for other tasks on Main before they can execute. My suggestion is to use a separate thread here, but not GpuMain, for mapping. That should still improve latency of these calls - no need to wait for busy GpuMain to become free, but without doing them on IO. Also, the mapping cost you benchmarked was in that particular situation and for that particular platform. This may not be the case always. https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode... File media/gpu/video_encode_accelerator_unittest.cc (right): https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode... media/gpu/video_encode_accelerator_unittest.cc:1399: if (encode_setup_on_io_thread_ && If you'd accept my suggestion below, we could ASSERT that io_thread_task_runner_->BelongsToCurrentThread() here to verify this is called on correct thread. 
We could also always post to main from here as well then. https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode... media/gpu/video_encode_accelerator_unittest.cc:1585: if (encode_setup_on_io_thread_) { Perhaps we could remove encode_setup_on_io_thread_, initially assign encode_task_runner_ = main_task_runner_, only replacing encode_task_runner_ if TryToSetup...() succeeded. Then instead of having these checks, we could always just post to encode_task_runner_, regardless of whether it was main or io.
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| On 2016/10/31 07:40:21, Pawel Osciak wrote: > On 2016/10/27 17:53:30, emircan wrote: > > On 2016/10/27 01:45:19, Pawel Osciak wrote: > > > On 2016/10/25 21:57:50, emircan wrote: > > > > On 2016/10/25 01:44:04, Pawel Osciak wrote: > > > > > Could we move the reminder of this method off of the IO thread please? > > > Mapping > > > > > may be costly. > > > > > > > > I added a trace to measure how long it takes here. It is 0.058 ms on > average > > > on > > > > my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator > also > > > > does this on IO thread. I don't think it is worth adding another thread > hop > > > > here. > > > > > > > > > > https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_ac... > > > > > > > > > > From my understanding, the rule of thumb is that nothing should be done on > IO > > > thread if it can be helped in a reasonable way, it's timing critical for > > > responsiveness, etc. Also, even if we have small overheads in multiple > places, > > > they add up. > > > > > > Moreover, I'm concerned that 0.058ms in a particular situation on a > particular > > > device/OS may be a different number on another system, for different source > > > memory type, under different memory pressure, virtual address space layout, > > etc. > > > This may also sleep. > > > > > > GpuJpegDecodeAccelerator should also be fixed for this. > > > > > > > Also, what thread would you suggest to use? Main task runner is pretty > busy > > as > > > > well here. 
One option would be passing shmemhandle to the encoder's own > > thread > > > > directly but that would require an interface change. I can add a TODO for > a > > > > seperate CL. > > > > > > Making VideoFrame::data() and visible_data() lazy-map the memory on first > > access > > > if IsMappable() would be one solution. > > > > > > The simplest solution though would probably be to have a separate thread in > > this > > > class to jump from IO onto, map there and then post back directly to IO to > > > encoder_->Encode() from there afterwards. > > > > I like the idea of letting VideoFrame lazy evaluate. That would help in many > > places including GpuJpegDecodeAccelerator. I added a bug, see > crbug.com/660082. > > > > I understand that we want to avoid doing work on IO thread. Same principle > > applies to the main thread as well, although it is currently used for map. > What > > I was trying to point out was that thread hopping would be more expensive than > > the actual map operation on this platform. Two posted tasks are additional > costs > > and it would be a clear regression. I just wanted to avoid that. > > From our experience with this, the main performance issue here is not the cost > of thread hops, but the fact that GpuMain is heavily overloaded and thus > Encode() etc. calls have to wait a long time for other tasks on Main before they > can execute. > > My suggestion is to use a separate thread here, but not GpuMain, for mapping. > That should still improve latency of these calls - no need to wait for busy > GpuMain to become free, but without doing them on IO. > > Also, the mapping cost you benchmarked was in that particular situation and for > that particular platform. This may not be the case always. Sorry if my statement earlier wasn't clear. I am not suggesting posting this task to GPU main either and I understand the reasons behind. I wanted to point out that with the current code, map() is happening on GPU main and that is as bad as doing it on IO. 
I think the lazy map() you suggested would be the ideal solution. map() would be delayed until the thread that actually accesses the data (the encoder thread in this case) and would solve the problem in many places including GpuJpeg. We wouldn't need to spin up a thread in each of these classes; we could use the current ones and avoid thread hops. https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode... File media/gpu/video_encode_accelerator_unittest.cc (right): https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode... media/gpu/video_encode_accelerator_unittest.cc:1399: if (encode_setup_on_io_thread_ && On 2016/10/31 07:40:22, Pawel Osciak wrote: > If you'd accept my suggestion below, we could ASSERT that > io_thread_task_runner_->BelongsToCurrentThread() here to verify this is called > on correct thread. > > We could also always post to main from here as well then. Done. https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode... media/gpu/video_encode_accelerator_unittest.cc:1585: if (encode_setup_on_io_thread_) { On 2016/10/31 07:40:21, Pawel Osciak wrote: > Perhaps we could remove encode_setup_on_io_thread_, initially assign > encode_task_runner_ = main_task_runner_, only replacing encode_task_runner_ if > TryToSetup...() succeded. > > Then instead of having these checks, we could always just post to > encode_task_runner_, regardless of whether it was main or io. Done.
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| On 2016/10/31 19:45:08, emircan wrote: > On 2016/10/31 07:40:21, Pawel Osciak wrote: > > On 2016/10/27 17:53:30, emircan wrote: > > > On 2016/10/27 01:45:19, Pawel Osciak wrote: > > > > On 2016/10/25 21:57:50, emircan wrote: > > > > > On 2016/10/25 01:44:04, Pawel Osciak wrote: > > > > > > Could we move the reminder of this method off of the IO thread please? > > > > Mapping > > > > > > may be costly. > > > > > > > > > > I added a trace to measure how long it takes here. It is 0.058 ms on > > average > > > > on > > > > > my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator > > also > > > > > does this on IO thread. I don't think it is worth adding another thread > > hop > > > > > here. > > > > > > > > > > > > > > > https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_ac... > > > > > > > > > > > > > From my understanding, the rule of thumb is that nothing should be done on > > IO > > > > thread if it can be helped in a reasonable way, it's timing critical for > > > > responsiveness, etc. Also, even if we have small overheads in multiple > > places, > > > > they add up. > > > > > > > > Moreover, I'm concerned that 0.058ms in a particular situation on a > > particular > > > > device/OS may be a different number on another system, for different > source > > > > memory type, under different memory pressure, virtual address space > layout, > > > etc. > > > > This may also sleep. > > > > > > > > GpuJpegDecodeAccelerator should also be fixed for this. > > > > > > > > > Also, what thread would you suggest to use? 
Main task runner is pretty > > busy > > > as > > > > > well here. One option would be passing shmemhandle to the encoder's own > > > thread > > > > > directly but that would require an interface change. I can add a TODO > for > > a > > > > > seperate CL. > > > > > > > > Making VideoFrame::data() and visible_data() lazy-map the memory on first > > > access > > > > if IsMappable() would be one solution. > > > > > > > > The simplest solution though would probably be to have a separate thread > in > > > this > > > > class to jump from IO onto, map there and then post back directly to IO to > > > > encoder_->Encode() from there afterwards. > > > > > > I like the idea of letting VideoFrame lazy evaluate. That would help in many > > > places including GpuJpegDecodeAccelerator. I added a bug, see > > crbug.com/660082. > > > > > > I understand that we want to avoid doing work on IO thread. Same principle > > > applies to the main thread as well, although it is currently used for map. > > What > > > I was trying to point out was that thread hopping would be more expensive > than > > > the actual map operation on this platform. Two posted tasks are additional > > costs > > > and it would be a clear regression. I just wanted to avoid that. > > > > From our experience with this, the main performance issue here is not the cost > > of thread hops, but the fact that GpuMain is heavily overloaded and thus > > Encode() etc. calls have to wait a long time for other tasks on Main before > they > > can execute. > > > > My suggestion is to use a separate thread here, but not GpuMain, for mapping. > > That should still improve latency of these calls - no need to wait for busy > > GpuMain to become free, but without doing them on IO. > > > > Also, the mapping cost you benchmarked was in that particular situation and > for > > that particular platform. This may not be the case always. > > Sorry if my statement earlier wasn't clear. 
I am not suggesting posting this > task to GPU main either and I understand the reasons behind. I wanted to point > out that with the current code, map() is happening on GPU main and that is as > bad as doing it on IO. > > I think lazy map() you suggested would be the ideal solution. map() would be > delayed until the thread that actually access data-encoder thread in this case- > and would solve the problem in many places including GpuJpeg. We wouldn't need > to spin a thread in each of these classes, use the current ones and avoid thread > hops. SGTM, thanks! That would be in this CL, right?
On 2016/11/01 04:59:58, Pawel Osciak wrote: > > SGTM, thanks! That would be in this CL, right? I added a bug (crbug.com/660082) and a TODO in gpu_video_encode_accelerator.cc. I would do it in a different CL, as it would touch different files and tests and serve a different purpose.
On 2016/11/01 05:05:49, emircan wrote: > On 2016/11/01 04:59:58, Pawel Osciak wrote: > > > > SGTM, thanks! That would be in this CL, right? > > I added a bug -crbug.com/660082- and TODO in gpu_video_encode_accelerator.cc. I > would do it in a different CL as it would touch different files&tests and serve > a different purpose. SGTM, could we do that before landing this CL please?
On 2016/11/01 05:13:25, Pawel Osciak wrote: > On 2016/11/01 05:05:49, emircan wrote: > > On 2016/11/01 04:59:58, Pawel Osciak wrote: > > > > > > SGTM, thanks! That would be in this CL, right? > > > > I added a bug -crbug.com/660082- and TODO in gpu_video_encode_accelerator.cc. > I > > would do it in a different CL as it would touch different files&tests and > serve > > a different purpose. > > SGTM, could we do that before landing this CL please? In that case, if you think this issue is a blocker for this CL, I will take your other suggestion and add a new thread :) PTAL at PS#7. We are currently running a Finch experiment with MediaFoundationVEA, and I would like to land this CL asap to fix the current performance issues and leave time for more test coverage; see the CL description for more details. I don't see the VideoFrame change landing quickly, as it would involve handling errors in the data() call.
Description was changed from ========== Move video encode accelerator IPC messages to GPU IO thread This CL moves video encode accelerator IPC messages to GPU IO thread instead of the main thread. This helps stabilize frame rate as well as reduce jitter on Windows. Currently, a lot of these calls get huge delays and results in dropped frames. In order to do this with respect to each platform, TryToSetupEncodeOnSeparateThread() is added to VideoEncodeAccelerator interface. If this method return false, we keep all IPC messages on main thread like before. It returns false by default. Note: Initially, I only moved AcceleratedVideoEncoderMsg_Encode call to IO thread, but then we started waiting for output buffers for a long time and again stayes below 30 fps. Therefore I moved all three functions to IO thread and reached ~30 fps, even when switching between tabs. BUG=657217, 649275 TEST=RunsAudioVideoCall60SecsAndLogsInternalMetricsH264 browser test on Windows result in stable 30 fps. ========== to ========== Move video encode accelerator IPC messages to GPU IO thread This CL moves video encode accelerator IPC messages to GPU IO thread instead of the main thread. This helps stabilize frame rate as well as reduce jitter on Windows. Currently, a lot of these calls get huge delays and results in dropped frames. In order to do this with respect to each platform, TryToSetupEncodeOnSeparateThread() is added to VideoEncodeAccelerator interface. If this method return false, we keep all IPC messages on main thread like before. It returns false by default. Note: Initially, I only moved AcceleratedVideoEncoderMsg_Encode call to IO thread, but then we started waiting for output buffers for a long time and again stayed below 30 fps. Therefore I moved all three functions to IO thread and reached ~30 fps, even when switching between tabs. BUG=657217, 649275 TEST=RunsAudioVideoCall60SecsAndLogsInternalMetricsH264 browser test on Windows result in stable 30 fps. ==========
On Mon, Oct 31, 2016 at 12:45 PM, <emircan@chromium.org> wrote:
>
> https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/gpu_video_encode_accelerator.cc
> File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right):
>
> https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/gpu_video_encode_accelerator.cc#newcode352
> media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle|
> On 2016/10/31 07:40:21, Pawel Osciak wrote:
> > On 2016/10/27 17:53:30, emircan wrote:
> > > On 2016/10/27 01:45:19, Pawel Osciak wrote:
> > > > On 2016/10/25 21:57:50, emircan wrote:
> > > > > On 2016/10/25 01:44:04, Pawel Osciak wrote:
> > > > > > Could we move the remainder of this method off of the IO thread please? Mapping may be costly.
> > > > >
> > > > > I added a trace to measure how long it takes here. It is 0.058 ms on average on my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator also does this on IO thread. I don't think it is worth adding another thread hop here.
> > > > >
> > > > > https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_accelerator.cc?rcl=0&l=194
> > > >
> > > > From my understanding, the rule of thumb is that nothing should be done on IO thread if it can be helped in a reasonable way, it's timing critical for responsiveness, etc. Also, even if we have small overheads in multiple places, they add up.
> > > >
> > > > Moreover, I'm concerned that 0.058 ms in a particular situation on a particular device/OS may be a different number on another system, for different source memory type, under different memory pressure, virtual address space layout, etc. This may also sleep.
> > > > GpuJpegDecodeAccelerator should also be fixed for this.
> > > >
> > > > > Also, what thread would you suggest to use? Main task runner is pretty busy as well here. One option would be passing shmemhandle to the encoder's own thread directly but that would require an interface change. I can add a TODO for a separate CL.
> > > >
> > > > Making VideoFrame::data() and visible_data() lazy-map the memory on first access if IsMappable() would be one solution.
> > > >
> > > > The simplest solution though would probably be to have a separate thread in this class to jump from IO onto, map there and then post back directly to IO to encoder_->Encode() from there afterwards.
> > >
> > > I like the idea of letting VideoFrame lazy evaluate. That would help in many places including GpuJpegDecodeAccelerator. I added a bug, see crbug.com/660082.
> > >
> > > I understand that we want to avoid doing work on IO thread. Same principle applies to the main thread as well, although it is currently used for map. What I was trying to point out was that thread hopping would be more expensive than the actual map operation on this platform. Two posted tasks are additional costs and it would be a clear regression. I just wanted to avoid that.
> >
> > From our experience with this, the main performance issue here is not the cost of thread hops, but the fact that GpuMain is heavily overloaded and thus Encode() etc. calls have to wait a long time for other tasks on Main before they can execute.
> >
> > My suggestion is to use a separate thread here, but not GpuMain, for mapping. That should still improve latency of these calls - no need to wait for busy GpuMain to become free, but without doing them on IO.
> >
> > Also, the mapping cost you benchmarked was in that particular situation and for that particular platform. This may not be the case always.
>
> Sorry if my statement earlier wasn't clear. I am not suggesting posting this task to GPU main either and I understand the reasons behind it. I wanted to point out that with the current code, map() is happening on GPU main and that is as bad as doing it on IO.

drive-by: it's not. Costly code needs to stay out of the IO thread. The GPU main thread is expected to be busy and less responsive, the IO thread is not. There are places where we need the IO thread to be very responsive. In particular, mouse cursor responsiveness, as well as raw compositor frame throughput require the IO thread to be responsive (because we need round trips to the IO thread for synchronization primitives).

> I think lazy map() you suggested would be the ideal solution. map() would be delayed until the thread that actually accesses the data (the encoder thread in this case), and would solve the problem in many places including GpuJpeg. We wouldn't need to spin up a thread in each of these classes, use the current ones and avoid thread hops.
>
> https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode_accelerator_unittest.cc
> File media/gpu/video_encode_accelerator_unittest.cc (right):
>
> https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode_accelerator_unittest.cc#newcode1399
> media/gpu/video_encode_accelerator_unittest.cc:1399: if (encode_setup_on_io_thread_ &&
> On 2016/10/31 07:40:22, Pawel Osciak wrote:
> > If you'd accept my suggestion below, we could ASSERT that io_thread_task_runner_->BelongsToCurrentThread() here to verify this is called on correct thread.
> >
> > We could also always post to main from here as well then.
>
> Done.
>
> https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode_accelerator_unittest.cc#newcode1585
> media/gpu/video_encode_accelerator_unittest.cc:1585: if (encode_setup_on_io_thread_) {
> On 2016/10/31 07:40:21, Pawel Osciak wrote:
> > Perhaps we could remove encode_setup_on_io_thread_, initially assign encode_task_runner_ = main_task_runner_, only replacing encode_task_runner_ if TryToSetup...() succeeded.
> >
> > Then instead of having these checks, we could always just post to encode_task_runner_, regardless of whether it was main or io.
>
> Done.
>
> https://codereview.chromium.org/2427053002/
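The lazy map() idea discussed above (filed as crbug.com/660082) can be sketched in isolation. This is a hypothetical illustration, not the VideoFrame implementation; LazyMappedFrame, MapMemory(), and the std::call_once guard are all assumptions of this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Sketch of the lazy-map idea: the backing memory is "mapped" on the first
// data() access, on whichever thread first touches the pixels, instead of
// eagerly on the IO thread. MapMemory() stands in for SharedMemory::Map().
class LazyMappedFrame {
 public:
  explicit LazyMappedFrame(std::size_t size) : size_(size) {}

  // The first call performs the map; later calls are cheap. std::call_once
  // keeps this safe if two threads race on the first access.
  const std::uint8_t* data() {
    std::call_once(mapped_, [this] { MapMemory(); });
    return buffer_.data();
  }

 private:
  // Stand-in for the real mapping; here we simply allocate.
  void MapMemory() { buffer_.resize(size_); }

  std::size_t size_;
  std::once_flag mapped_;
  std::vector<std::uint8_t> buffer_;
};
```

With this shape, callers on the IO thread that only forward the frame never pay for the map at all; the encoder thread pays it exactly once.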
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Patchset #7 (id:220001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Based on offline discussion, map() is within the threshold of using gpu main thread. (Gpu main 10 ms, IO thread 10-100 µs) Therefore, I updated the change such that thread hop only happens when we are on IO. PTAL at PS#8.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromeos_amd64-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_amd64-...) chromeos_daisy_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_daisy_...)
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Patchset #8 (id:260001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/11/02 21:19:21, emircan wrote: > Based on offline discussion, map() is within the threshold of using gpu main > thread. (Gpu main 10 ms, IO thread 10-100 µs) Therefore, I updated the change > such that thread hop only happens when we are on IO. PTAL at PS#8. I think it should always be beneficial to map on separate thread, whichever thread we'd be calling VEA::Encode() on afterwards, and most of all that should also simplify the code? I.e., similarly to the unittest change, I believe we could have an encode_task_runner pointing to either gpu main or io, depending on the result of TryToSetupEncodeOnSeparateThread(), in Encode() always jump off to the worker thread, map there, and post VEA::Encode() to encode_task_runner when done, without having to do any checking which case we are in (IO or Main)?
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/11/04 04:56:48, Pawel Osciak wrote: > On 2016/11/02 21:19:21, emircan wrote: > > Based on offline discussion, map() is within the threshold of using gpu main > > thread. (Gpu main 10 ms, IO thread 10-100 µs) Therefore, I updated the change > > such that thread hop only happens when we are on IO. PTAL at PS#8. > > I think it should always be beneficial to map on separate thread, whichever > thread we'd be calling VEA::Encode() on afterwards, and most of all that should > also simplify the code? > > I.e., similarly to the unittest change, I believe we could have an > encode_task_runner pointing to either gpu main or io, depending on the result of > TryToSetupEncodeOnSeparateThread(), in Encode() always jump off to the worker > thread, map there, and post VEA::Encode() to encode_task_runner when done, > without having to do any checking which case we are in (IO or Main)? Sure. I reverted back to PS#7 and replaced callbacks with |encode_task_runner_| as you described. PTAL at PS#9.
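The flow converged on here can be sketched outside Chromium with a toy task runner standing in for base::SingleThreadTaskRunner. Everything below is illustrative (EncoderHost, MapFrame(), the runner wiring); only TryToSetupEncodeOnSeparateThread() mirrors a name from the CL. Encode() always hops to a dedicated worker to map, then posts the real encode to |encode_task_runner_|, which was pointed at either the main or the IO runner once during setup:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Toy single-thread task runner; a sketch, not Chromium code.
class TaskRunner {
 public:
  TaskRunner() : worker_([this] { Run(); }) {}
  ~TaskRunner() {
    PostTask(nullptr);  // A null task is used as a quit marker.
    worker_.join();
  }
  void PostTask(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      if (!task) return;  // Quit marker reached.
      task();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::thread worker_;
};

// Sketch of the encode path: map on the worker, encode on encode_task_runner_.
class EncoderHost {
 public:
  EncoderHost(TaskRunner* main, TaskRunner* io, TaskRunner* worker)
      : worker_(worker), encode_task_runner_(main) {
    // encode_task_runner_ starts as main; it is redirected to IO only if the
    // platform encoder supports handling IPC off the main thread.
    if (TryToSetupEncodeOnSeparateThread())
      encode_task_runner_ = io;
  }

  // Always hop to the worker for the potentially costly map(), then post the
  // actual encode to whichever runner was chosen - no branching here.
  void Encode(std::function<void()> do_encode) {
    worker_->PostTask([this, do_encode] {
      MapFrame();
      encode_task_runner_->PostTask(do_encode);
    });
  }

  std::thread::id map_thread_id() const { return map_thread_id_; }

 private:
  bool TryToSetupEncodeOnSeparateThread() { return true; }  // Platform-dependent.
  void MapFrame() { map_thread_id_ = std::this_thread::get_id(); }

  TaskRunner* worker_;
  TaskRunner* encode_task_runner_;
  std::thread::id map_thread_id_;
};
```

The caller never checks which case it is in (IO or Main), which is the simplification the review converged on.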
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Thanks!

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:109: : owner_(owner), host_route_id_(host_route_id), sender_(nullptr) {}
Nit: perhaps sender_ = nullptr in the declaration below?

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:273: DLOG(ERROR) << __func__ << " failed.";
Perhaps return here?

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:486: frame->AddDestructionObserver(
Please add a comment why we are doing this.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:40: : public IPC::Listener,
Should we also derive from IPC::Sender now that we implement Send()?

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:80: void OnFilterRemoved();
I think in c++11 this can be private.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:163: weak_this_factory_for_encoder_worker;
s/weak_this_factory_for_encoder_worker/weak_this_factory_for_encoder_worker_/

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/media_founda...
File media/gpu/media_foundation_video_encode_accelerator_win.cc (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/media_founda...
media/gpu/media_foundation_video_encode_accelerator_win.cc:201: encode_client_task_runner_ = main_client_task_runner_;
Perhaps do this only if (!encoder_client_task_runner_), in case client calls TryToSetup...() before Initialize()?

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/video_encode...
File media/gpu/video_encode_accelerator_unittest.cc (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/video_encode...
media/gpu/video_encode_accelerator_unittest.cc:1278: client_weak_factory_.InvalidateWeakPtrs();
I think this should be run on main?

https://codereview.chromium.org/2427053002/diff/300001/media/video/video_enco...
File media/video/video_encode_accelerator.h (right):

https://codereview.chromium.org/2427053002/diff/300001/media/video/video_enco...
media/video/video_encode_accelerator.h:171: // TODO(emircan): Add this method to video_encode_accelerator_unittest.
This TODO no longer needed?
https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:109: : owner_(owner), host_route_id_(host_route_id), sender_(nullptr) {}
On 2016/11/07 02:00:35, Pawel Osciak wrote:
> Nit: perhaps sender_ = nullptr in the declaration below?

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:273: DLOG(ERROR) << __func__ << " failed.";
On 2016/11/07 02:00:35, Pawel Osciak wrote:
> Perhaps return here?

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:486: frame->AddDestructionObserver(
On 2016/11/07 02:00:35, Pawel Osciak wrote:
> Please add a comment why we are doing this.

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:40: : public IPC::Listener,
On 2016/11/07 02:00:36, Pawel Osciak wrote:
> Should we also derive from IPC::Sender now that we implement Send()?

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:80: void OnFilterRemoved();
On 2016/11/07 02:00:35, Pawel Osciak wrote:
> I think in c++11 this can be private.

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:163: weak_this_factory_for_encoder_worker;
On 2016/11/07 02:00:36, Pawel Osciak wrote:
> s/weak_this_factory_for_encoder_worker/weak_this_factory_for_encoder_worker_/

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/media_founda...
File media/gpu/media_foundation_video_encode_accelerator_win.cc (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/media_founda...
media/gpu/media_foundation_video_encode_accelerator_win.cc:201: encode_client_task_runner_ = main_client_task_runner_;
On 2016/11/07 02:00:36, Pawel Osciak wrote:
> Perhaps do this only if (!encoder_client_task_runner_), in case client calls
> TryToSetup...() before Initialize()?

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/video_encode...
File media/gpu/video_encode_accelerator_unittest.cc (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/video_encode...
media/gpu/video_encode_accelerator_unittest.cc:1278: client_weak_factory_.InvalidateWeakPtrs();
On 2016/11/07 02:00:36, Pawel Osciak wrote:
> I think this should be run on main?

|client_weak_factory_| is used only when TryToSetupEncodeOnSeparateThread() returns true to have weak pointers to use when posting tasks on |io_thread_|. It is safe to invalidate here because |encode_task_runner_| points to |io_thread_| in this case. If not, it is safe to invalidate it on |main_thread_task_runner_| as no weak pointers are used. I renamed it to |client_weak_factory_for_io_| and added a comment to explain it.

https://codereview.chromium.org/2427053002/diff/300001/media/video/video_enco...
File media/video/video_encode_accelerator.h (right):

https://codereview.chromium.org/2427053002/diff/300001/media/video/video_enco...
media/video/video_encode_accelerator.h:171: // TODO(emircan): Add this method to video_encode_accelerator_unittest.
On 2016/11/07 02:00:36, Pawel Osciak wrote:
> This TODO no longer needed?

Done.
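The invalidation rule being discussed (weak pointers are invalidated on the thread that dereferences them, so in-flight tasks become no-ops instead of touching freed state) can be roughly illustrated with std::weak_ptr as a stand-in for base::WeakPtr. Client and BitstreamBufferReady() are made-up names for this sketch, and std::weak_ptr is only an analogue: unlike base::WeakPtr it is thread-safe and carries no thread affinity.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// A posted task holds a weak reference to its client. If the client was
// invalidated before the task runs, lock() fails and the task is a no-op.
struct Client {
  int buffers_ready = 0;
  void BitstreamBufferReady() { ++buffers_ready; }
};

std::function<void()> MakeTask(std::weak_ptr<Client> weak_client) {
  return [weak_client] {
    if (auto client = weak_client.lock())  // Client still alive?
      client->BitstreamBufferReady();
  };
}
```

This is why the factory must be invalidated on the thread the tasks run on: otherwise a task could pass the aliveness check and then race with destruction.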
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_android on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/cast_shell_a...)
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Patchset #10 (id:320001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2427053002/diff/340001/media/gpu/ipc/service/...
File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right):

https://codereview.chromium.org/2427053002/diff/340001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:115: // Notifies renderer that input is completed.
Nit: perhaps we could rephrase "input is completed", as it feels unclear what that means...

https://codereview.chromium.org/2427053002/diff/340001/media/gpu/media_founda...
File media/gpu/media_foundation_video_encode_accelerator_win.cc (right):

https://codereview.chromium.org/2427053002/diff/340001/media/gpu/media_founda...
media/gpu/media_foundation_video_encode_accelerator_win.cc:203: encode_client_ = main_client_;
This should be under the if as well...
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
https://codereview.chromium.org/2427053002/diff/340001/media/gpu/ipc/service/...
File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right):

https://codereview.chromium.org/2427053002/diff/340001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:115: // Notifies renderer that input is completed.
On 2016/11/09 01:03:02, Pawel Osciak wrote:
> Nit: perhaps we could rephrase "input is completed", as it feels unclear what
> that means...

Done.

https://codereview.chromium.org/2427053002/diff/340001/media/gpu/media_founda...
File media/gpu/media_foundation_video_encode_accelerator_win.cc (right):

https://codereview.chromium.org/2427053002/diff/340001/media/gpu/media_founda...
media/gpu/media_foundation_video_encode_accelerator_win.cc:203: encode_client_ = main_client_;
On 2016/11/09 01:03:02, Pawel Osciak wrote:
> This should be under the if as well...

Done.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm
The CQ bit was unchecked by emircan@chromium.org
The CQ bit was checked by emircan@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from sandersd@chromium.org Link to the patchset: https://codereview.chromium.org/2427053002/#ps360001 (title: "posciak@ comments.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Committed patchset #11 (id:360001)
Message was sent while issue was closed.
Description was changed from

==========
Move video encode accelerator IPC messages to GPU IO thread

This CL moves video encode accelerator IPC messages to GPU IO thread instead of the main thread. This helps stabilize frame rate as well as reduce jitter on Windows. Currently, a lot of these calls get huge delays and results in dropped frames.

In order to do this with respect to each platform, TryToSetupEncodeOnSeparateThread() is added to VideoEncodeAccelerator interface. If this method return false, we keep all IPC messages on main thread like before. It returns false by default.

Note: Initially, I only moved AcceleratedVideoEncoderMsg_Encode call to IO thread, but then we started waiting for output buffers for a long time and again stayed below 30 fps. Therefore I moved all three functions to IO thread and reached ~30 fps, even when switching between tabs.

BUG=657217, 649275
TEST=RunsAudioVideoCall60SecsAndLogsInternalMetricsH264 browser test on Windows result in stable 30 fps.
==========

to

==========
Move video encode accelerator IPC messages to GPU IO thread

This CL moves video encode accelerator IPC messages to GPU IO thread instead of the main thread. This helps stabilize frame rate as well as reduce jitter on Windows. Currently, a lot of these calls get huge delays and results in dropped frames.

In order to do this with respect to each platform, TryToSetupEncodeOnSeparateThread() is added to VideoEncodeAccelerator interface. If this method return false, we keep all IPC messages on main thread like before. It returns false by default.

Note: Initially, I only moved AcceleratedVideoEncoderMsg_Encode call to IO thread, but then we started waiting for output buffers for a long time and again stayed below 30 fps. Therefore I moved all three functions to IO thread and reached ~30 fps, even when switching between tabs.

BUG=657217, 649275
TEST=RunsAudioVideoCall60SecsAndLogsInternalMetricsH264 browser test on Windows result in stable 30 fps.

Committed: https://crrev.com/66ea9831c5714e7d25e544f31bce201cdb35c4af
Cr-Commit-Position: refs/heads/master@{#430882}
==========
Message was sent while issue was closed.
Patchset 11 (id:??) landed as https://crrev.com/66ea9831c5714e7d25e544f31bce201cdb35c4af Cr-Commit-Position: refs/heads/master@{#430882} |