Description: Move video encode accelerator IPC messages to GPU IO thread
This CL moves video encode accelerator IPC messages to the GPU IO thread
instead of the main thread. This helps stabilize the frame rate and
reduce jitter on Windows. Currently, many of these calls incur large
delays, resulting in dropped frames.
In order to do this per platform,
TryToSetupEncodeOnSeparateThread() is added to the VideoEncodeAccelerator
interface. If this method returns false, we keep all IPC messages on the
main thread as before. It returns false by default.
Note: Initially, I only moved the AcceleratedVideoEncoderMsg_Encode call
to the IO thread, but then we waited a long time for output buffers and
again stayed below 30 fps. Therefore I moved all three functions to the
IO thread and reached ~30 fps, even when switching between tabs.
BUG=657217, 649275
TEST=RunsAudioVideoCall60SecsAndLogsInternalMetricsH264 browser test
on Windows results in a stable 30 fps.
Committed: https://crrev.com/66ea9831c5714e7d25e544f31bce201cdb35c4af
Cr-Commit-Position: refs/heads/master@{#430882}
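The interface change described above can be sketched roughly as follows. This is a hedged illustration, not the actual Chromium code: `TaskRunner` is a stand-in for base::SingleThreadTaskRunner, and `IoThreadCapableVEA` is a hypothetical platform implementation that opts in.

```cpp
#include <cassert>
#include <memory>

// Stand-in for base::SingleThreadTaskRunner so the sketch is self-contained.
struct TaskRunner {};

class VideoEncodeAccelerator {
 public:
  virtual ~VideoEncodeAccelerator() = default;

  // Returns true if this VEA can handle encode calls on
  // |encode_task_runner| (the GPU IO thread). The default returns false,
  // which keeps all IPC on the main thread, matching the old behavior.
  virtual bool TryToSetupEncodeOnSeparateThread(
      const std::shared_ptr<TaskRunner>& encode_task_runner) {
    return false;
  }
};

// A platform encoder opts in by overriding and storing the runner.
class IoThreadCapableVEA : public VideoEncodeAccelerator {
 public:
  bool TryToSetupEncodeOnSeparateThread(
      const std::shared_ptr<TaskRunner>& encode_task_runner) override {
    encode_task_runner_ = encode_task_runner;
    return true;
  }

 private:
  std::shared_ptr<TaskRunner> encode_task_runner_;
};
```

The default-false virtual keeps every existing platform VEA unchanged; only encoders that explicitly override the hook move their IPC to the IO thread.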
Patch Set 1 #
Total comments: 20
Patch Set 2 : sanders@ comments. #
Patch Set 3 : Fix log order. #
Total comments: 36
Patch Set 4 : posciak@ comments. #
Total comments: 6
Patch Set 5 : posciak@ comments. #
Total comments: 4
Patch Set 6 : posciak@ comments. #
Patch Set 7 : Add an encoder worker thread for map(). #
Patch Set 8 : Use encoder worker thread only on IO. #
Patch Set 9 : posciak@ comments. #
Total comments: 18
Patch Set 10 : posciak@ comments. #
Total comments: 4
Patch Set 11 : posciak@ comments. #
Messages
Total messages: 122 (93 generated)
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
Description was changed from ========== move to message filter BUG= ========== to ========== Move video encode accelerator IPC messages to GPU IO thread BUG=656021 ==========
Patchset #1 (id:1) has been deleted
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
Description was changed from ========== Move video encode accelerator IPC messages to GPU IO thread BUG=656021 ========== to ========== Move video encode accelerator IPC messages to GPU IO thread BUG=657217, 649275 ==========
Description was changed from ========== Move video encode accelerator IPC messages to GPU IO thread BUG=657217, 649275 ========== to ========== Move video encode accelerator IPC messages to GPU IO thread This CL moves video encode accelerator IPC messages to GPU IO thread instead of the main thread. This helps stabilize frame rate as well as reduce jitter on Windows. In order to do this with respect to each platform, TryToSetupEncodeOnSeparateThread() is added to VideoEncodeAccelerator interface. BUG=657217, 649275 TEST=Browser tests on Windows result in stable 30 fps. ==========
emircan@chromium.org changed reviewers: + wuchengli@chromium.org
emircan@chromium.org changed reviewers: - wuchengli@chromium.org
Patchset #1 (id:20001) has been deleted
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
Patchset #1 (id:40001) has been deleted
emircan@chromium.org changed reviewers: + sandersd@chromium.org, wuchengli@chromium.org
PTAL.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...)
Patchset #1 (id:60001) has been deleted
Patchset #1 (id:80001) has been deleted
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
On 2016/10/19 22:13:25, emircan wrote: > PTAL. To help with this review, I have one overall question: For VDAs, the threading is complicated by requirements that certain callbacks, in certain conditions, must be made on the main thread. Are there any constraints on callback threading for VEAs? And more generally, how sure are we that MF VEA won't be adding latency on the IO thread? Thanks!
On 2016/10/20 23:12:49, sandersd wrote: > For VDAs, the threading is complicated by requirements that certain callbacks, > in certain conditions, must be made on the main thread. Are there any > constraints on callback threading for VEAs? VEA does not have that limitation as far as I can tell. It is important to keep the calls synchronized though, and I am calling Encode(), UseOutputBitstreamBuffer(), BitstreamBufferReady() and RequestEncodingParametersChange() on the same thread (the IO thread) for that purpose. > And more generally, how sure are we that MF VEA won't be adding latency on the > IO thread? There is definitely some overhead that it adds to the IO thread. However, - It is minimal, as I am immediately posting tasks to the internal encoder thread of MFVEA. See Encode(), UseOutputBitstreamBuffer(), RequestEncodingParametersChange() here: https://cs.chromium.org/chromium/src/media/gpu/media_foundation_video_encode_... - Currently we only get 20-22 fps while switching/opening tabs, scrolling, etc. when using main-thread commands. This makes the encoder pretty much useless for real-time communication purposes, see https://chromeperf.appspot.com/report?sid=4ca2d4909073f99d431883b2321502521f8.... We need to offload those messages somewhere.
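The low-overhead hand-off emircan describes (the IO-thread call does nothing but post a task to the encoder's own thread) can be sketched with standard C++ primitives. This is an illustrative sketch, not the MFVEA code: `EncoderThread` stands in for the encoder's internal base::Thread, and `EncodeOnIOThread` for the IO-thread entry point.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Minimal stand-in for the encoder's internal thread: callers only enqueue
// a closure and return immediately; all real work runs on the worker.
class EncoderThread {
 public:
  EncoderThread() : worker_([this] { Run(); }) {}

  ~EncoderThread() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();  // Drains remaining tasks, then joins.
  }

  // Called on the "IO thread"; cheap, returns immediately.
  void PostTask(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
        if (tasks_.empty())
          return;  // done_ was set and the queue is drained.
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool done_ = false;
  std::thread worker_;  // Declared last so it starts after other members.
};

// Hypothetical Encode() path: the IO thread only enqueues; the equivalent
// of the real EncodeTask() runs on the encoder thread.
void EncodeOnIOThread(EncoderThread& encoder, int* encoded_count) {
  encoder.PostTask([encoded_count] { ++*encoded_count; });
}
```

The point of the design is that the IO thread's cost per frame is one lock/enqueue/notify, so even a slow hardware encoder cannot stall other IPC traffic.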
Looks good on first pass, but I want to review the main logic more thoroughly still. I'll try to get that done by noon tomorrow! https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.cc (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:91: DVLOG(3) << __func__; Nit: Prefer __func__ logging first. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.h (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.h:32: // TryToSetupEncodeOnSeparateThread() is called, it use=s the given Remove '='
Just a few nits and low-priority requests. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:114: // This will delete |owner_| and |this|. Misleading comment, since it makes the (cross-class) assumption that RemoveFilter() was called by the destructor. I would prefer the simplicity of a model where the filter lifecycle is not so directly tied to GVEA. (No specific suggestions to achieve that though, the existing callback model certainly doesn't make it easy.) https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:143: CHECK(rv); ? https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:253: DLOG(ERROR) << __func__ << " failed."; Should GVEA be entering some sort of error state on these kinds of failures? https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:315: // We're destroying; cancel all callbacks. Are there no GPU shutdown paths where the channel filter can be removed without GVEA being fully destructed first? https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:459: DCHECK(io_task_runner_->BelongsToCurrentThread()); DCHECK() is redundant. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:468: (main_task_runner_->BelongsToCurrentThread() && !filter_); Nit: Putting |filter_| condition first makes more sense. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... 
File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.h:131: // destroy the VDA. VEA https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.cc (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:266: client_ptr_factory_.reset(); Note that this no longer has the stated side effect. It's a bit ugly to have invalidations happening in multiple places, perhaps there should be two different client pointers, or all invalidation should happen on one of the two classes?
https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:114: // This will delete |owner_| and |this|. On 2016/10/21 18:41:03, sandersd wrote: > Misleading comment, since it makes the (cross-class) assumption that > RemoveFilter() was called by the destructor. I would prefer the simplicity of a > model where the filter lifecycle is not so directly tied to GVEA. (No specific > suggestions to achieve that though, the existing callback model certainly > doesn't make it easy.) I agree that it sounds like it is coming from the dtor, so I removed the comment. I actually got this model from GPUVDA directly. OnWillDestroyStub() calls OnFilterRemoved(), then destroys GPUVEA and the filter as well. https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_video_decode_a... https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:143: CHECK(rv); On 2016/10/21 18:41:03, sandersd wrote: > ? It was a paranoia check. Removing it. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:253: DLOG(ERROR) << __func__ << " failed."; On 2016/10/21 18:41:03, sandersd wrote: > Should GVEA be entering some sort of error state on these kinds of failures? I added these to have logs if any error occurs, but it looks like we generally do not expect these to fail at all. There was no error handling here, and VDA seems to also just log. I think the policy is to try sending the next message even if one fails. https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_video_decode_a... https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... 
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:315: // We're destroying; cancel all callbacks. On 2016/10/21 18:41:03, sandersd wrote: > Are there no GPU shutdown paths where the channel filter can be removed without > GVEA being fully destructed first? The filter is actually removed before GVEA is destructed. The only shutdown path here is OnWillDestroyStub(); the lifetimes of all these objects are tied to the stub as it is. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:459: DCHECK(io_task_runner_->BelongsToCurrentThread()); On 2016/10/21 18:41:03, sandersd wrote: > DCHECK() is redundant. Done. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:468: (main_task_runner_->BelongsToCurrentThread() && !filter_); On 2016/10/21 18:41:03, sandersd wrote: > Nit: Putting |filter_| condition first makes more sense. Done. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.h:131: // destroy the VDA. On 2016/10/21 18:41:03, sandersd wrote: > VEA Done. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.cc (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:91: DVLOG(3) << __func__; On 2016/10/21 01:01:00, sandersd wrote: > Nit: Prefer __func__ logging first. Done. https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... 
media/gpu/media_foundation_video_encode_accelerator_win.cc:266: client_ptr_factory_.reset(); On 2016/10/21 18:41:03, sandersd wrote: > Note that this no longer has the stated side effect. It's a bit ugly to have > invalidations happening in multiple places, perhaps there should be two > different client pointers, or all invalidation should happen on one of the two > classes? Thanks for pointing this out. I added a reset to make sure |client_| is invalid. This method, Destroy(), is called by the default dtor in two scenarios: - Initialize() fails. Then we don't have an external |client_| and this works. - The GPUVEA::OnWillDestroyStub() sequence. Then we have an external |client_|, but it is already invalidated by GPUVEA. GPUVEA attaches a callback to itself on EncodeFrameFinished(), so I think it makes sense that it owns a WeakPtr to itself. The problem is that this class needs to create its own WeakPtrFactory for clients, as it receives a naked ptr in Initialize(). I can add a TODO to change that behavior in later CLs. The issue actually exists in V4L2VEA as well, see below. https://cs.chromium.org/chromium/src/media/gpu/v4l2_video_decode_accelerator.... https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.h (right): https://codereview.chromium.org/2427053002/diff/100001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.h:32: // TryToSetupEncodeOnSeparateThread() is called, it use=s the given On 2016/10/21 01:01:00, sandersd wrote: > Remove '=' Done.
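The shutdown ordering discussed above (the filter is removed on the IO thread before GVEA teardown finishes, matching the `filter_removed_.Wait()` in the CL) can be sketched with std::promise/std::future in place of base::WaitableEvent. All names here are illustrative, not the Chromium classes:

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <thread>

// Stand-in for the IPC message filter. OnFilterRemoved() runs on the IO
// thread once the channel guarantees no further messages will arrive.
struct MessageFilter {
  std::promise<void> removed;
  void OnFilterRemoved() { removed.set_value(); }
};

// Sketch of the owner: its destructor must not free state the filter may
// still touch, so it blocks until the IO thread has signaled removal.
class GpuVideoEncodeAcceleratorSketch {
 public:
  explicit GpuVideoEncodeAcceleratorSketch(
      std::shared_ptr<MessageFilter> filter)
      : filter_(std::move(filter)),
        filter_removed_(filter_->removed.get_future()) {}

  ~GpuVideoEncodeAcceleratorSketch() {
    // Equivalent of filter_removed_.Wait(): main-thread teardown blocks
    // here until the IO thread has removed the filter.
    filter_removed_.wait();
  }

 private:
  std::shared_ptr<MessageFilter> filter_;
  std::future<void> filter_removed_;
};
```

This is why the filter removal is always sequenced before destruction: the wait would deadlock if no IO-thread removal were guaranteed to happen.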
lgtm
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
posciak@chromium.org changed reviewers: + posciak@chromium.org
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:107: void OnChannelError() override { sender_ = NULL; } sender_ = nullptr; and similarly below please. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:134: CHECK(!message->is_sync()); Perhaps DCHECK and return false to avoid crashing the process? Do we need to delete message too? https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:149: IPC::Sender* sender_; = nullptr ? https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:150: }; Should we also have some DISALLOW_COPY_AND_ASSIGN ? https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:246: if (!Send(new AcceleratedVideoEncoderHostMsg_RequireBitstreamBuffers( DCHECK(CheckCalledOnMessageFilterThread()) if we expect to do this? https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:283: filter_removed_.Wait(); Please add a comment similar to the one in gvda.cc explaining why we are waiting here. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| Could we move the remainder of this method off of the IO thread please? Mapping may be costly. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... 
File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.h:107: bool CheckCalledOnMessageFilterThread(); Since this checks whether we are on message filter or main thread, perhaps should be called CheckIfCalledOnCorrectThread() or something like that instead? https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.h:127: // The message filter to run VDA::Decode on IO thread if VDA supports it. "VDA::Decode", "VDA"... https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:226: client_->NotifyError(kInvalidArgumentError); client_ is a WeakPtr, so we should always if (client_) first, here and in other places, if dereferencing. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:267: client_.reset(); Hm, do we need to InvalidateWeakPtrs() on factory instead? https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... File media/video/video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.cc:22: const scoped_refptr<base::SingleThreadTaskRunner>& decode_task_runner) { encode_task_runner ? https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... File media/video/video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... 
media/video/video_encode_accelerator.h:157: // Encode tasks include all calls, except Initialize() and Destroy(), that are Could we document the complete set of calls that are made on IO instead of wording it this way, please? If we say all except X() and Y(), and we add additional methods later on, we may forget to update this doc. Also, I think we probably shouldn't need the RequireBitstreamBuffers and NotifyError callbacks to be on IO as well? https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.h:170: virtual bool TryToSetupEncodeOnSeparateThread( Could you add this call to video_encode_accelerator_unittest as well please?
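The guarded-dereference pattern posciak@ asks for ("always `if (client_)` first") looks roughly like this. std::weak_ptr stands in for base::WeakPtr, and `Client`/`NotifyErrorIfAlive` are illustrative names, not the Chromium API:

```cpp
#include <cassert>
#include <memory>

// Illustrative client interface; the real code uses media's error codes
// such as kInvalidArgumentError.
struct Client {
  int errors = 0;
  void NotifyError(int /*error_code*/) { ++errors; }
};

// Every dereference of the weak client pointer is guarded, so a callback
// that arrives after the client has been destroyed becomes a safe no-op
// instead of a use-after-free.
void NotifyErrorIfAlive(const std::weak_ptr<Client>& weak_client,
                        int error_code) {
  if (std::shared_ptr<Client> client = weak_client.lock())
    client->NotifyError(error_code);
}
```

This matters precisely because encode callbacks and client destruction now happen on different threads: the weak pointer turns the race into a dropped notification rather than a crash.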
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:107: void OnChannelError() override { sender_ = NULL; } On 2016/10/25 01:44:04, Pawel Osciak wrote: > sender_ = nullptr; > > and similarly below please. Done. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:134: CHECK(!message->is_sync()); On 2016/10/25 01:44:05, Pawel Osciak wrote: > Perhaps DCHECK and return false to avoid crashing the process? Do we need to > delete message too? I merged this error case with l.135 and made it a DCHECK crash. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:149: IPC::Sender* sender_; On 2016/10/25 01:44:04, Pawel Osciak wrote: > = nullptr ? Added to the initializer list. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:150: }; On 2016/10/25 01:44:04, Pawel Osciak wrote: > Should we also have some DISALLOW_COPY_AND_ASSIGN ? Done. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:246: if (!Send(new AcceleratedVideoEncoderHostMsg_RequireBitstreamBuffers( On 2016/10/25 01:44:04, Pawel Osciak wrote: > DCHECK(CheckCalledOnMessageFilterThread()) if we expect to do this? I moved it to main thread after your comment and added a check. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:283: filter_removed_.Wait(); On 2016/10/25 01:44:04, Pawel Osciak wrote: > Please add a comment similar to the one in gvda.cc why we are waiting here. Done. 
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| On 2016/10/25 01:44:04, Pawel Osciak wrote: > Could we move the remainder of this method off of the IO thread please? Mapping > may be costly. I added a trace to measure how long it takes here. It is 0.058 ms on average on my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator also does this on the IO thread. I don't think it is worth adding another thread hop here. https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_ac... Also, what thread would you suggest to use? The main task runner is pretty busy as well here. One option would be passing the shmem handle to the encoder's own thread directly, but that would require an interface change. I can add a TODO for a separate CL. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.h:107: bool CheckCalledOnMessageFilterThread(); On 2016/10/25 01:44:05, Pawel Osciak wrote: > Since this checks whether we are on message filter or main thread, perhaps > should be called CheckIfCalledOnCorrectThread() or something like that instead? Done. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.h:127: // The message filter to run VDA::Decode on IO thread if VDA supports it. On 2016/10/25 01:44:05, Pawel Osciak wrote: > "VDA::Decode", "VDA"... Done. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... 
media/gpu/media_foundation_video_encode_accelerator_win.cc:226: client_->NotifyError(kInvalidArgumentError); On 2016/10/25 01:44:05, Pawel Osciak wrote: > client_ is a WeakPtr, so we should always if (client_) first, here and in other > places, if dereferencing. Done. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:267: client_.reset(); On 2016/10/25 01:44:05, Pawel Osciak wrote: > Hm, do we need to InvalidateWeakPtrs() on factory instead? It already gets invalidated from GPUVEA's OnWillDestroyStub sequence. I will add a comment to explain it here. I cannot use DCHECK(!client_) though as it is on a different thread. I added this reset to be explicit. https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... File media/video/video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.cc:22: const scoped_refptr<base::SingleThreadTaskRunner>& decode_task_runner) { On 2016/10/25 01:44:05, Pawel Osciak wrote: > encode_task_runner ? Done. https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... File media/video/video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.h:157: // Encode tasks include all calls, except Initialize() and Destroy(), that are On 2016/10/25 01:44:05, Pawel Osciak wrote: > Could we instead document the complete set of calls that are called on IO > instead of wording it this way please? If we say all except X() and Y(), if we > add additional methods later on, we may forget to update this doc. > > Also, I think we shouldn't probably need RequireBitstreamBuffers and NotifyError > callbacks to be on IO as well? Sgtm. I moved RequireBitstreamBuffers and NotifyError back to main and updated the comment. 
https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.h:170: virtual bool TryToSetupEncodeOnSeparateThread( On 2016/10/25 01:44:05, Pawel Osciak wrote: > Could you add this call to video_encode_accelerator_unittest as well please? VEAUnittest is not running on Win bots right now. I will leave a TODO. I am going to enable it once I stabilize WebRTC perf and quality browser tests, and H264 problems that come up before[0]. [0] https://codereview.chromium.org/2205623002/#msg86
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| On 2016/10/25 21:57:50, emircan wrote: > On 2016/10/25 01:44:04, Pawel Osciak wrote: > > Could we move the reminder of this method off of the IO thread please? Mapping > > may be costly. > > I added a trace to measure how long it takes here. It is 0.058 ms on average on > my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator also > does this on IO thread. I don't think it is worth adding another thread hop > here. > https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_ac... > From my understanding, the rule of thumb is that nothing should be done on IO thread if it can be helped in a reasonable way, it's timing critical for responsiveness, etc. Also, even if we have small overheads in multiple places, they add up. Moreover, I'm concerned that 0.058ms in a particular situation on a particular device/OS may be a different number on another system, for different source memory type, under different memory pressure, virtual address space layout, etc. This may also sleep. GpuJpegDecodeAccelerator should also be fixed for this. > Also, what thread would you suggest to use? Main task runner is pretty busy as > well here. One option would be passing shmemhandle to the encoder's own thread > directly but that would require an interface change. I can add a TODO for a > seperate CL. Making VideoFrame::data() and visible_data() lazy-map the memory on first access if IsMappable() would be one solution. 
The simplest solution though would probably be to have a separate thread in this class to jump from IO onto, map there and then post back directly to IO to encoder_->Encode() from there afterwards. https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.cc:267: client_.reset(); On 2016/10/25 21:57:51, emircan wrote: > On 2016/10/25 01:44:05, Pawel Osciak wrote: > > Hm, do we need to InvalidateWeakPtrs() on factory instead? > > It already gets invalidated from GPUVEA's OnWillDestroyStub sequence. I will add > a comment to explain it here. I cannot use DCHECK(!client_) though as it is on a > different thread. I added this reset to be explicit. In that case I feel InvalidateWeakPtrs() is more explicitly saying what we want to do... https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... File media/video/video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.h:170: virtual bool TryToSetupEncodeOnSeparateThread( On 2016/10/25 21:57:51, emircan wrote: > On 2016/10/25 01:44:05, Pawel Osciak wrote: > > Could you add this call to video_encode_accelerator_unittest as well please? > > VEAUnittest is not running on Win bots right now. I will leave a TODO. I am > going to enable it once I stabilize WebRTC perf and quality browser tests, and > H264 problems that come up before[0]. > > [0] https://codereview.chromium.org/2205623002/#msg86 The test is currently run on CrOS bots on CrOS devices however. Would it be possible to spawn a dummy thread in the test with a threadchecker, and only use it for checking the expected calls go to it and post back to main immediately afterwards please? 
https://codereview.chromium.org/2427053002/diff/160001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/160001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:137: DCHECK(!message->is_sync()); We just deleted message... https://codereview.chromium.org/2427053002/diff/160001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.h (right): https://codereview.chromium.org/2427053002/diff/160001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.h:139: // Used to run client callback BitstreamBufferReady() on the encode task s/encode task runner/encode_client_task_runner_/ https://codereview.chromium.org/2427053002/diff/160001/media/video/video_enco... File media/video/video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/160001/media/video/video_enco... media/video/video_encode_accelerator.h:165: // |encode_task_runner|, and then expect Client method to be called on s/Client method/Client::BitstreamBufferReady()/
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| On 2016/10/27 01:45:19, Pawel Osciak wrote: > On 2016/10/25 21:57:50, emircan wrote: > > On 2016/10/25 01:44:04, Pawel Osciak wrote: > > > Could we move the reminder of this method off of the IO thread please? > Mapping > > > may be costly. > > > > I added a trace to measure how long it takes here. It is 0.058 ms on average > on > > my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator also > > does this on IO thread. I don't think it is worth adding another thread hop > > here. > > > https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_ac... > > > > From my understanding, the rule of thumb is that nothing should be done on IO > thread if it can be helped in a reasonable way, it's timing critical for > responsiveness, etc. Also, even if we have small overheads in multiple places, > they add up. > > Moreover, I'm concerned that 0.058ms in a particular situation on a particular > device/OS may be a different number on another system, for different source > memory type, under different memory pressure, virtual address space layout, etc. > This may also sleep. > > GpuJpegDecodeAccelerator should also be fixed for this. > > > Also, what thread would you suggest to use? Main task runner is pretty busy as > > well here. One option would be passing shmemhandle to the encoder's own thread > > directly but that would require an interface change. I can add a TODO for a > > seperate CL. > > Making VideoFrame::data() and visible_data() lazy-map the memory on first access > if IsMappable() would be one solution. 
> > The simplest solution though would probably be to have a separate thread in this > class to jump from IO onto, map there and then post back directly to IO to > encoder_->Encode() from there afterwards. I like the idea of letting VideoFrame lazy evaluate. That would help in many places including GpuJpegDecodeAccelerator. I added a bug, see crbug.com/660082. I understand that we want to avoid doing work on IO thread. Same principle applies to the main thread as well, although it is currently used for map. What I was trying to point out was that thread hopping would be more expensive than the actual map operation on this platform. Two posted tasks are additional costs and it would be a clear regression. I just wanted to avoid that. https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... File media/video/video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/140001/media/video/video_enco... media/video/video_encode_accelerator.h:170: virtual bool TryToSetupEncodeOnSeparateThread( On 2016/10/27 01:45:19, Pawel Osciak wrote: > On 2016/10/25 21:57:51, emircan wrote: > > On 2016/10/25 01:44:05, Pawel Osciak wrote: > > > Could you add this call to video_encode_accelerator_unittest as well please? > > > > VEAUnittest is not running on Win bots right now. I will leave a TODO. I am > > going to enable it once I stabilize WebRTC perf and quality browser tests, and > > H264 problems that come up before[0]. > > > > [0] https://codereview.chromium.org/2205623002/#msg86 > > The test is currently run on CrOS bots on CrOS devices however. Would it be > possible to spawn a dummy thread in the test with a threadchecker, and only use > it for checking the expected calls go to it and post back to main immediately > afterwards please? I added a dummy IO thread where calls run to the test. However, it will first call this method which returns false on CrOS VAAPI and V4L2. 
Therefore, all the calls will still go through the main thread, and the new thread won't be exercised until this runs on Windows. https://codereview.chromium.org/2427053002/diff/160001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/160001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:137: DCHECK(!message->is_sync()); On 2016/10/27 01:45:19, Pawel Osciak wrote: > We just deleted message... Thanks for the notice. I just swapped the lines. https://codereview.chromium.org/2427053002/diff/160001/media/gpu/media_founda... File media/gpu/media_foundation_video_encode_accelerator_win.h (right): https://codereview.chromium.org/2427053002/diff/160001/media/gpu/media_founda... media/gpu/media_foundation_video_encode_accelerator_win.h:139: // Used to run client callback BitstreamBufferReady() on the encode task On 2016/10/27 01:45:19, Pawel Osciak wrote: > s/encode task runner/encode_client_task_runner_/ Done. https://codereview.chromium.org/2427053002/diff/160001/media/video/video_enco... File media/video/video_encode_accelerator.h (right): https://codereview.chromium.org/2427053002/diff/160001/media/video/video_enco... media/video/video_encode_accelerator.h:165: // |encode_task_runner|, and then expect Client method to be called on On 2016/10/27 01:45:19, Pawel Osciak wrote: > s/Client method/Client::BitstreamBufferReady()/ Done.
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| On 2016/10/27 17:53:30, emircan wrote: > On 2016/10/27 01:45:19, Pawel Osciak wrote: > > On 2016/10/25 21:57:50, emircan wrote: > > > On 2016/10/25 01:44:04, Pawel Osciak wrote: > > > > Could we move the reminder of this method off of the IO thread please? > > Mapping > > > > may be costly. > > > > > > I added a trace to measure how long it takes here. It is 0.058 ms on average > > on > > > my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator also > > > does this on IO thread. I don't think it is worth adding another thread hop > > > here. > > > > > > https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_ac... > > > > > > > From my understanding, the rule of thumb is that nothing should be done on IO > > thread if it can be helped in a reasonable way, it's timing critical for > > responsiveness, etc. Also, even if we have small overheads in multiple places, > > they add up. > > > > Moreover, I'm concerned that 0.058ms in a particular situation on a particular > > device/OS may be a different number on another system, for different source > > memory type, under different memory pressure, virtual address space layout, > etc. > > This may also sleep. > > > > GpuJpegDecodeAccelerator should also be fixed for this. > > > > > Also, what thread would you suggest to use? Main task runner is pretty busy > as > > > well here. One option would be passing shmemhandle to the encoder's own > thread > > > directly but that would require an interface change. I can add a TODO for a > > > seperate CL. 
> > > > Making VideoFrame::data() and visible_data() lazy-map the memory on first > access > > if IsMappable() would be one solution. > > > > The simplest solution though would probably be to have a separate thread in > this > > class to jump from IO onto, map there and then post back directly to IO to > > encoder_->Encode() from there afterwards. > > I like the idea of letting VideoFrame lazy evaluate. That would help in many > places including GpuJpegDecodeAccelerator. I added a bug, see crbug.com/660082. > > I understand that we want to avoid doing work on IO thread. Same principle > applies to the main thread as well, although it is currently used for map. What > I was trying to point out was that thread hopping would be more expensive than > the actual map operation on this platform. Two posted tasks are additional costs > and it would be a clear regression. I just wanted to avoid that. From our experience with this, the main performance issue here is not the cost of thread hops, but the fact that GpuMain is heavily overloaded and thus Encode() etc. calls have to wait a long time for other tasks on Main before they can execute. My suggestion is to use a separate thread here, but not GpuMain, for mapping. That should still improve latency of these calls - no need to wait for busy GpuMain to become free, but without doing them on IO. Also, the mapping cost you benchmarked was in that particular situation and for that particular platform. This may not be the case always. https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode... File media/gpu/video_encode_accelerator_unittest.cc (right): https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode... media/gpu/video_encode_accelerator_unittest.cc:1399: if (encode_setup_on_io_thread_ && If you'd accept my suggestion below, we could ASSERT that io_thread_task_runner_->BelongsToCurrentThread() here to verify this is called on correct thread. 
We could also always post to main from here as well then. https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode... media/gpu/video_encode_accelerator_unittest.cc:1585: if (encode_setup_on_io_thread_) { Perhaps we could remove encode_setup_on_io_thread_, initially assign encode_task_runner_ = main_task_runner_, only replacing encode_task_runner_ if TryToSetup...() succeeded. Then instead of having these checks, we could always just post to encode_task_runner_, regardless of whether it was main or io.
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| On 2016/10/31 07:40:21, Pawel Osciak wrote: > On 2016/10/27 17:53:30, emircan wrote: > > On 2016/10/27 01:45:19, Pawel Osciak wrote: > > > On 2016/10/25 21:57:50, emircan wrote: > > > > On 2016/10/25 01:44:04, Pawel Osciak wrote: > > > > > Could we move the reminder of this method off of the IO thread please? > > > Mapping > > > > > may be costly. > > > > > > > > I added a trace to measure how long it takes here. It is 0.058 ms on > average > > > on > > > > my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator > also > > > > does this on IO thread. I don't think it is worth adding another thread > hop > > > > here. > > > > > > > > > > https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_ac... > > > > > > > > > > From my understanding, the rule of thumb is that nothing should be done on > IO > > > thread if it can be helped in a reasonable way, it's timing critical for > > > responsiveness, etc. Also, even if we have small overheads in multiple > places, > > > they add up. > > > > > > Moreover, I'm concerned that 0.058ms in a particular situation on a > particular > > > device/OS may be a different number on another system, for different source > > > memory type, under different memory pressure, virtual address space layout, > > etc. > > > This may also sleep. > > > > > > GpuJpegDecodeAccelerator should also be fixed for this. > > > > > > > Also, what thread would you suggest to use? Main task runner is pretty > busy > > as > > > > well here. 
One option would be passing shmemhandle to the encoder's own > > thread > > > > directly but that would require an interface change. I can add a TODO for > a > > > > seperate CL. > > > > > > Making VideoFrame::data() and visible_data() lazy-map the memory on first > > access > > > if IsMappable() would be one solution. > > > > > > The simplest solution though would probably be to have a separate thread in > > this > > > class to jump from IO onto, map there and then post back directly to IO to > > > encoder_->Encode() from there afterwards. > > > > I like the idea of letting VideoFrame lazy evaluate. That would help in many > > places including GpuJpegDecodeAccelerator. I added a bug, see > crbug.com/660082. > > > > I understand that we want to avoid doing work on IO thread. Same principle > > applies to the main thread as well, although it is currently used for map. > What > > I was trying to point out was that thread hopping would be more expensive than > > the actual map operation on this platform. Two posted tasks are additional > costs > > and it would be a clear regression. I just wanted to avoid that. > > From our experience with this, the main performance issue here is not the cost > of thread hops, but the fact that GpuMain is heavily overloaded and thus > Encode() etc. calls have to wait a long time for other tasks on Main before they > can execute. > > My suggestion is to use a separate thread here, but not GpuMain, for mapping. > That should still improve latency of these calls - no need to wait for busy > GpuMain to become free, but without doing them on IO. > > Also, the mapping cost you benchmarked was in that particular situation and for > that particular platform. This may not be the case always. Sorry if my statement earlier wasn't clear. I am not suggesting posting this task to GPU main either and I understand the reasons behind. I wanted to point out that with the current code, map() is happening on GPU main and that is as bad as doing it on IO. 
I think the lazy map() you suggested would be the ideal solution. map() would be delayed until the thread that actually accesses the data (the encoder thread in this case) and would solve the problem in many places including GpuJpeg. We wouldn't need to spin up a thread in each of these classes; we could use the current ones and avoid thread hops. https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode... File media/gpu/video_encode_accelerator_unittest.cc (right): https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode... media/gpu/video_encode_accelerator_unittest.cc:1399: if (encode_setup_on_io_thread_ && On 2016/10/31 07:40:22, Pawel Osciak wrote: > If you'd accept my suggestion below, we could ASSERT that > io_thread_task_runner_->BelongsToCurrentThread() here to verify this is called > on correct thread. > > We could also always post to main from here as well then. Done. https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode... media/gpu/video_encode_accelerator_unittest.cc:1585: if (encode_setup_on_io_thread_) { On 2016/10/31 07:40:21, Pawel Osciak wrote: > Perhaps we could remove encode_setup_on_io_thread_, initially assign > encode_task_runner_ = main_task_runner_, only replacing encode_task_runner_ if > TryToSetup...() succeded. > > Then instead of having these checks, we could always just post to > encode_task_runner_, regardless of whether it was main or io. Done.
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right): https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/... media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle| On 2016/10/31 19:45:08, emircan wrote: > On 2016/10/31 07:40:21, Pawel Osciak wrote: > > On 2016/10/27 17:53:30, emircan wrote: > > > On 2016/10/27 01:45:19, Pawel Osciak wrote: > > > > On 2016/10/25 21:57:50, emircan wrote: > > > > > On 2016/10/25 01:44:04, Pawel Osciak wrote: > > > > > > Could we move the reminder of this method off of the IO thread please? > > > > Mapping > > > > > > may be costly. > > > > > > > > > > I added a trace to measure how long it takes here. It is 0.058 ms on > > average > > > > on > > > > > my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator > > also > > > > > does this on IO thread. I don't think it is worth adding another thread > > hop > > > > > here. > > > > > > > > > > > > > > > https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_ac... > > > > > > > > > > > > > From my understanding, the rule of thumb is that nothing should be done on > > IO > > > > thread if it can be helped in a reasonable way, it's timing critical for > > > > responsiveness, etc. Also, even if we have small overheads in multiple > > places, > > > > they add up. > > > > > > > > Moreover, I'm concerned that 0.058ms in a particular situation on a > > particular > > > > device/OS may be a different number on another system, for different > source > > > > memory type, under different memory pressure, virtual address space > layout, > > > etc. > > > > This may also sleep. > > > > > > > > GpuJpegDecodeAccelerator should also be fixed for this. > > > > > > > > > Also, what thread would you suggest to use? 
Main task runner is pretty > > busy > > > as > > > > > well here. One option would be passing shmemhandle to the encoder's own > > > thread > > > > > directly but that would require an interface change. I can add a TODO > for > > a > > > > > seperate CL. > > > > > > > > Making VideoFrame::data() and visible_data() lazy-map the memory on first > > > access > > > > if IsMappable() would be one solution. > > > > > > > > The simplest solution though would probably be to have a separate thread > in > > > this > > > > class to jump from IO onto, map there and then post back directly to IO to > > > > encoder_->Encode() from there afterwards. > > > > > > I like the idea of letting VideoFrame lazy evaluate. That would help in many > > > places including GpuJpegDecodeAccelerator. I added a bug, see > > crbug.com/660082. > > > > > > I understand that we want to avoid doing work on IO thread. Same principle > > > applies to the main thread as well, although it is currently used for map. > > What > > > I was trying to point out was that thread hopping would be more expensive > than > > > the actual map operation on this platform. Two posted tasks are additional > > costs > > > and it would be a clear regression. I just wanted to avoid that. > > > > From our experience with this, the main performance issue here is not the cost > > of thread hops, but the fact that GpuMain is heavily overloaded and thus > > Encode() etc. calls have to wait a long time for other tasks on Main before > they > > can execute. > > > > My suggestion is to use a separate thread here, but not GpuMain, for mapping. > > That should still improve latency of these calls - no need to wait for busy > > GpuMain to become free, but without doing them on IO. > > > > Also, the mapping cost you benchmarked was in that particular situation and > for > > that particular platform. This may not be the case always. > > Sorry if my statement earlier wasn't clear. 
I am not suggesting posting this > task to GPU main either and I understand the reasons behind. I wanted to point > out that with the current code, map() is happening on GPU main and that is as > bad as doing it on IO. > > I think lazy map() you suggested would be the ideal solution. map() would be > delayed until the thread that actually access data-encoder thread in this case- > and would solve the problem in many places including GpuJpeg. We wouldn't need > to spin a thread in each of these classes, use the current ones and avoid thread > hops. SGTM, thanks! That would be in this CL, right?
On 2016/11/01 04:59:58, Pawel Osciak wrote: > > SGTM, thanks! That would be in this CL, right? I added a bug (crbug.com/660082) and a TODO in gpu_video_encode_accelerator.cc. I would do it in a different CL, as it would touch different files and tests and serve a different purpose.
On 2016/11/01 05:05:49, emircan wrote: > On 2016/11/01 04:59:58, Pawel Osciak wrote: > > > > SGTM, thanks! That would be in this CL, right? > > I added a bug -crbug.com/660082- and TODO in gpu_video_encode_accelerator.cc. I > would do it in a different CL as it would touch different files&tests and serve > a different purpose. SGTM, could we do that before landing this CL please?
On 2016/11/01 05:13:25, Pawel Osciak wrote: > On 2016/11/01 05:05:49, emircan wrote: > > On 2016/11/01 04:59:58, Pawel Osciak wrote: > > > > > > SGTM, thanks! That would be in this CL, right? > > > > I added a bug -crbug.com/660082- and TODO in gpu_video_encode_accelerator.cc. > I > > would do it in a different CL as it would touch different files&tests and > serve > > a different purpose. > > SGTM, could we do that before landing this CL please? In that case, if you think this issue is a blocker for this CL, I will take your other suggestion and add a new thread :) PTAL at PS#7. We are currently running a Finch experiment with MediaFoundationVEA, and I would like to land this CL asap to fix the current performance issues and leave time for more test coverage; see the CL description for more details. I don't see the VideoFrame change landing quickly, as it would involve handling errors in the data() call.
Description was changed from ========== Move video encode accelerator IPC messages to GPU IO thread This CL moves video encode accelerator IPC messages to GPU IO thread instead of the main thread. This helps stabilize frame rate as well as reduce jitter on Windows. Currently, a lot of these calls get huge delays and results in dropped frames. In order to do this with respect to each platform, TryToSetupEncodeOnSeparateThread() is added to VideoEncodeAccelerator interface. If this method return false, we keep all IPC messages on main thread like before. It returns false by default. Note: Initially, I only moved AcceleratedVideoEncoderMsg_Encode call to IO thread, but then we started waiting for output buffers for a long time and again stayes below 30 fps. Therefore I moved all three functions to IO thread and reached ~30 fps, even when switching between tabs. BUG=657217, 649275 TEST=RunsAudioVideoCall60SecsAndLogsInternalMetricsH264 browser test on Windows result in stable 30 fps. ========== to ========== Move video encode accelerator IPC messages to GPU IO thread This CL moves video encode accelerator IPC messages to GPU IO thread instead of the main thread. This helps stabilize frame rate as well as reduce jitter on Windows. Currently, a lot of these calls get huge delays and results in dropped frames. In order to do this with respect to each platform, TryToSetupEncodeOnSeparateThread() is added to VideoEncodeAccelerator interface. If this method return false, we keep all IPC messages on main thread like before. It returns false by default. Note: Initially, I only moved AcceleratedVideoEncoderMsg_Encode call to IO thread, but then we started waiting for output buffers for a long time and again stayed below 30 fps. Therefore I moved all three functions to IO thread and reached ~30 fps, even when switching between tabs. BUG=657217, 649275 TEST=RunsAudioVideoCall60SecsAndLogsInternalMetricsH264 browser test on Windows result in stable 30 fps. ==========
On Mon, Oct 31, 2016 at 12:45 PM, <emircan@chromium.org> wrote:
>
> https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/gpu_video_encode_accelerator.cc
> File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right):
>
> https://codereview.chromium.org/2427053002/diff/140001/media/gpu/ipc/service/gpu_video_encode_accelerator.cc#newcode352
> media/gpu/ipc/service/gpu_video_encode_accelerator.cc:352: // Wrap into a SharedMemory in the beginning, so that |params.buffer_handle|
> On 2016/10/31 07:40:21, Pawel Osciak wrote:
> > On 2016/10/27 17:53:30, emircan wrote:
> > > On 2016/10/27 01:45:19, Pawel Osciak wrote:
> > > > On 2016/10/25 21:57:50, emircan wrote:
> > > > > On 2016/10/25 01:44:04, Pawel Osciak wrote:
> > > > > > Could we move the remainder of this method off of the IO thread please? Mapping may be costly.
> > > > >
> > > > > I added a trace to measure how long it takes here. It is 0.058 ms on average on my Dell E74440 Win laptop. Also, it looks like GpuJpegDecodeAccelerator also does this on IO thread. I don't think it is worth adding another thread hop here.
> > > > >
> > > > > https://cs.chromium.org/chromium/src/media/gpu/ipc/service/gpu_jpeg_decode_accelerator.cc?rcl=0&l=194
> > > >
> > > > From my understanding, the rule of thumb is that nothing should be done on IO thread if it can be helped in a reasonable way, it's timing critical for responsiveness, etc. Also, even if we have small overheads in multiple places, they add up.
> > > >
> > > > Moreover, I'm concerned that 0.058 ms in a particular situation on a particular device/OS may be a different number on another system, for different source memory type, under different memory pressure, virtual address space layout, etc. This may also sleep.
> > > > GpuJpegDecodeAccelerator should also be fixed for this.
> > > >
> > > > > Also, what thread would you suggest to use? Main task runner is pretty busy as well here. One option would be passing shmemhandle to the encoder's own thread directly but that would require an interface change. I can add a TODO for a separate CL.
> > > >
> > > > Making VideoFrame::data() and visible_data() lazy-map the memory on first access if IsMappable() would be one solution.
> > > >
> > > > The simplest solution though would probably be to have a separate thread in this class to jump from IO onto, map there and then post back directly to IO to encoder_->Encode() from there afterwards.
> > >
> > > I like the idea of letting VideoFrame lazy evaluate. That would help in many places including GpuJpegDecodeAccelerator. I added a bug, see crbug.com/660082.
> > >
> > > I understand that we want to avoid doing work on IO thread. Same principle applies to the main thread as well, although it is currently used for map. What I was trying to point out was that thread hopping would be more expensive than the actual map operation on this platform. Two posted tasks are additional costs and it would be a clear regression. I just wanted to avoid that.
> >
> > From our experience with this, the main performance issue here is not the cost of thread hops, but the fact that GpuMain is heavily overloaded and thus Encode() etc. calls have to wait a long time for other tasks on Main before they can execute.
> >
> > My suggestion is to use a separate thread here, but not GpuMain, for mapping. That should still improve latency of these calls - no need to wait for busy GpuMain to become free, but without doing them on IO.
> >
> > Also, the mapping cost you benchmarked was in that particular situation and for that particular platform. This may not be the case always.
>
> Sorry if my statement earlier wasn't clear. I am not suggesting posting this task to GPU main either and I understand the reasons behind it. I wanted to point out that with the current code, map() is happening on GPU main and that is as bad as doing it on IO.

drive-by: it's not. Costly code needs to stay out of the IO thread. The GPU main thread is expected to be busy and less responsive, the IO thread is not. There are places where we need the IO thread to be very responsive. In particular, mouse cursor responsiveness, as well as raw compositor frame throughput require the IO thread to be responsive (because we need round trips to the IO thread for synchronization primitives).

> I think lazy map() you suggested would be the ideal solution. map() would be delayed until the thread that actually accesses the data (the encoder thread in this case), and would solve the problem in many places including GpuJpeg. We wouldn't need to spin up a thread in each of these classes, use the current ones and avoid thread hops.
>
> https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode_accelerator_unittest.cc
> File media/gpu/video_encode_accelerator_unittest.cc (right):
>
> https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode_accelerator_unittest.cc#newcode1399
> media/gpu/video_encode_accelerator_unittest.cc:1399: if (encode_setup_on_io_thread_ &&
> On 2016/10/31 07:40:22, Pawel Osciak wrote:
> > If you'd accept my suggestion below, we could ASSERT that io_thread_task_runner_->BelongsToCurrentThread() here to verify this is called on correct thread.
> >
> > We could also always post to main from here as well then.
>
> Done.
>
> https://codereview.chromium.org/2427053002/diff/180001/media/gpu/video_encode_accelerator_unittest.cc#newcode1585
> media/gpu/video_encode_accelerator_unittest.cc:1585: if (encode_setup_on_io_thread_) {
> On 2016/10/31 07:40:21, Pawel Osciak wrote:
> > Perhaps we could remove encode_setup_on_io_thread_, initially assign encode_task_runner_ = main_task_runner_, only replacing encode_task_runner_ if TryToSetup...() succeeded.
> >
> > Then instead of having these checks, we could always just post to encode_task_runner_, regardless of whether it was main or io.
>
> Done.
>
> https://codereview.chromium.org/2427053002/
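The lazy map() idea discussed above (filed as crbug.com/660082) can be sketched in isolation. This is a hypothetical illustration, not the VideoFrame implementation; LazyMappedFrame, MapMemory(), and the std::call_once guard are all assumptions of this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Sketch of the lazy-map idea: the backing memory is "mapped" on the first
// data() access, on whichever thread first touches the pixels, instead of
// eagerly on the IO thread. MapMemory() stands in for SharedMemory::Map().
class LazyMappedFrame {
 public:
  explicit LazyMappedFrame(std::size_t size) : size_(size) {}

  // The first call performs the map; later calls are cheap. std::call_once
  // keeps this safe if two threads race on the first access.
  const std::uint8_t* data() {
    std::call_once(mapped_, [this] { MapMemory(); });
    return buffer_.data();
  }

 private:
  // Stand-in for the real mapping; here we simply allocate.
  void MapMemory() { buffer_.resize(size_); }

  std::size_t size_;
  std::once_flag mapped_;
  std::vector<std::uint8_t> buffer_;
};
```

With this shape, callers on the IO thread that only forward the frame never pay for the map at all; the encoder thread pays it exactly once.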
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Patchset #7 (id:220001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Based on offline discussion, map() is within the threshold of using gpu main thread. (Gpu main 10 ms, IO thread 10-100 µs) Therefore, I updated the change such that thread hop only happens when we are on IO. PTAL at PS#8.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromeos_amd64-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_amd64-...) chromeos_daisy_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_daisy_...)
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Patchset #8 (id:260001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/11/02 21:19:21, emircan wrote: > Based on offline discussion, map() is within the threshold of using gpu main > thread. (Gpu main 10 ms, IO thread 10-100 µs) Therefore, I updated the change > such that thread hop only happens when we are on IO. PTAL at PS#8. I think it should always be beneficial to map on separate thread, whichever thread we'd be calling VEA::Encode() on afterwards, and most of all that should also simplify the code? I.e., similarly to the unittest change, I believe we could have an encode_task_runner pointing to either gpu main or io, depending on the result of TryToSetupEncodeOnSeparateThread(), in Encode() always jump off to the worker thread, map there, and post VEA::Encode() to encode_task_runner when done, without having to do any checking which case we are in (IO or Main)?
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/11/04 04:56:48, Pawel Osciak wrote: > On 2016/11/02 21:19:21, emircan wrote: > > Based on offline discussion, map() is within the threshold of using gpu main > > thread. (Gpu main 10 ms, IO thread 10-100 µs) Therefore, I updated the change > > such that thread hop only happens when we are on IO. PTAL at PS#8. > > I think it should always be beneficial to map on separate thread, whichever > thread we'd be calling VEA::Encode() on afterwards, and most of all that should > also simplify the code? > > I.e., similarly to the unittest change, I believe we could have an > encode_task_runner pointing to either gpu main or io, depending on the result of > TryToSetupEncodeOnSeparateThread(), in Encode() always jump off to the worker > thread, map there, and post VEA::Encode() to encode_task_runner when done, > without having to do any checking which case we are in (IO or Main)? Sure. I reverted back to PS#7 and replaced callbacks with |encode_task_runner_| as you described. PTAL at PS#9.
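The flow converged on here can be sketched outside Chromium with a toy task runner standing in for base::SingleThreadTaskRunner. Everything below is illustrative (EncoderHost, MapFrame(), the runner wiring); only TryToSetupEncodeOnSeparateThread() mirrors a name from the CL. Encode() always hops to a dedicated worker to map, then posts the real encode to |encode_task_runner_|, which was pointed at either the main or the IO runner once during setup:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Toy single-thread task runner; a sketch, not Chromium code.
class TaskRunner {
 public:
  TaskRunner() : worker_([this] { Run(); }) {}
  ~TaskRunner() {
    PostTask(nullptr);  // A null task is used as a quit marker.
    worker_.join();
  }
  void PostTask(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      if (!task) return;  // Quit marker reached.
      task();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  std::thread worker_;
};

// Sketch of the encode path: map on the worker, encode on encode_task_runner_.
class EncoderHost {
 public:
  EncoderHost(TaskRunner* main, TaskRunner* io, TaskRunner* worker)
      : worker_(worker), encode_task_runner_(main) {
    // encode_task_runner_ starts as main; it is redirected to IO only if the
    // platform encoder supports handling IPC off the main thread.
    if (TryToSetupEncodeOnSeparateThread())
      encode_task_runner_ = io;
  }

  // Always hop to the worker for the potentially costly map(), then post the
  // actual encode to whichever runner was chosen - no branching here.
  void Encode(std::function<void()> do_encode) {
    worker_->PostTask([this, do_encode] {
      MapFrame();
      encode_task_runner_->PostTask(do_encode);
    });
  }

  std::thread::id map_thread_id() const { return map_thread_id_; }

 private:
  bool TryToSetupEncodeOnSeparateThread() { return true; }  // Platform-dependent.
  void MapFrame() { map_thread_id_ = std::this_thread::get_id(); }

  TaskRunner* worker_;
  TaskRunner* encode_task_runner_;
  std::thread::id map_thread_id_;
};
```

The caller never checks which case it is in (IO or Main), which is the simplification the review converged on.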
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Thanks!

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:109: : owner_(owner), host_route_id_(host_route_id), sender_(nullptr) {}
Nit: perhaps sender_ = nullptr in the declaration below?

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:273: DLOG(ERROR) << __func__ << " failed.";
Perhaps return here?

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:486: frame->AddDestructionObserver(
Please add a comment why we are doing this.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:40: : public IPC::Listener,
Should we also derive from IPC::Sender now that we implement Send()?

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:80: void OnFilterRemoved();
I think in c++11 this can be private.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:163: weak_this_factory_for_encoder_worker;
s/weak_this_factory_for_encoder_worker/weak_this_factory_for_encoder_worker_/

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/media_founda...
File media/gpu/media_foundation_video_encode_accelerator_win.cc (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/media_founda...
media/gpu/media_foundation_video_encode_accelerator_win.cc:201: encode_client_task_runner_ = main_client_task_runner_;
Perhaps do this only if (!encoder_client_task_runner_), in case client calls TryToSetup...() before Initialize()?

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/video_encode...
File media/gpu/video_encode_accelerator_unittest.cc (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/video_encode...
media/gpu/video_encode_accelerator_unittest.cc:1278: client_weak_factory_.InvalidateWeakPtrs();
I think this should be run on main?

https://codereview.chromium.org/2427053002/diff/300001/media/video/video_enco...
File media/video/video_encode_accelerator.h (right):

https://codereview.chromium.org/2427053002/diff/300001/media/video/video_enco...
media/video/video_encode_accelerator.h:171: // TODO(emircan): Add this method to video_encode_accelerator_unittest.
This TODO no longer needed?
https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
File media/gpu/ipc/service/gpu_video_encode_accelerator.cc (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:109: : owner_(owner), host_route_id_(host_route_id), sender_(nullptr) {}
On 2016/11/07 02:00:35, Pawel Osciak wrote:
> Nit: perhaps sender_ = nullptr in the declaration below?

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:273: DLOG(ERROR) << __func__ << " failed.";
On 2016/11/07 02:00:35, Pawel Osciak wrote:
> Perhaps return here?

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.cc:486: frame->AddDestructionObserver(
On 2016/11/07 02:00:35, Pawel Osciak wrote:
> Please add a comment why we are doing this.

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:40: : public IPC::Listener,
On 2016/11/07 02:00:36, Pawel Osciak wrote:
> Should we also derive from IPC::Sender now that we implement Send()?

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:80: void OnFilterRemoved();
On 2016/11/07 02:00:35, Pawel Osciak wrote:
> I think in c++11 this can be private.

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:163: weak_this_factory_for_encoder_worker;
On 2016/11/07 02:00:36, Pawel Osciak wrote:
> s/weak_this_factory_for_encoder_worker/weak_this_factory_for_encoder_worker_/

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/media_founda...
File media/gpu/media_foundation_video_encode_accelerator_win.cc (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/media_founda...
media/gpu/media_foundation_video_encode_accelerator_win.cc:201: encode_client_task_runner_ = main_client_task_runner_;
On 2016/11/07 02:00:36, Pawel Osciak wrote:
> Perhaps do this only if (!encoder_client_task_runner_), in case client calls
> TryToSetup...() before Initialize()?

Done.

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/video_encode...
File media/gpu/video_encode_accelerator_unittest.cc (right):

https://codereview.chromium.org/2427053002/diff/300001/media/gpu/video_encode...
media/gpu/video_encode_accelerator_unittest.cc:1278: client_weak_factory_.InvalidateWeakPtrs();
On 2016/11/07 02:00:36, Pawel Osciak wrote:
> I think this should be run on main?

|client_weak_factory_| is used only when TryToSetupEncodeOnSeparateThread() returns true to have weak pointers to use when posting tasks on |io_thread_|. It is safe to invalidate here because |encode_task_runner_| points to |io_thread_| in this case. If not, it is safe to invalidate it on |main_thread_task_runner_| as no weak pointers are used. I renamed it to |client_weak_factory_for_io_| and added a comment to explain it.

https://codereview.chromium.org/2427053002/diff/300001/media/video/video_enco...
File media/video/video_encode_accelerator.h (right):

https://codereview.chromium.org/2427053002/diff/300001/media/video/video_enco...
media/video/video_encode_accelerator.h:171: // TODO(emircan): Add this method to video_encode_accelerator_unittest.
On 2016/11/07 02:00:36, Pawel Osciak wrote:
> This TODO no longer needed?

Done.
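The invalidation rule being discussed (weak pointers are invalidated on the thread that dereferences them, so in-flight tasks become no-ops instead of touching freed state) can be roughly illustrated with std::weak_ptr as a stand-in for base::WeakPtr. Client and BitstreamBufferReady() are made-up names for this sketch, and std::weak_ptr is only an analogue: unlike base::WeakPtr it is thread-safe and carries no thread affinity.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// A posted task holds a weak reference to its client. If the client was
// invalidated before the task runs, lock() fails and the task is a no-op.
struct Client {
  int buffers_ready = 0;
  void BitstreamBufferReady() { ++buffers_ready; }
};

std::function<void()> MakeTask(std::weak_ptr<Client> weak_client) {
  return [weak_client] {
    if (auto client = weak_client.lock())  // Client still alive?
      client->BitstreamBufferReady();
  };
}
```

This is why the factory must be invalidated on the thread the tasks run on: otherwise a task could pass the aliveness check and then race with destruction.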
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_android on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/cast_shell_a...)
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
Patchset #10 (id:320001) has been deleted
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2427053002/diff/340001/media/gpu/ipc/service/...
File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right):

https://codereview.chromium.org/2427053002/diff/340001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:115: // Notifies renderer that input is completed.
Nit: perhaps we could rephrase "input is completed", as it feels unclear what that means...

https://codereview.chromium.org/2427053002/diff/340001/media/gpu/media_founda...
File media/gpu/media_foundation_video_encode_accelerator_win.cc (right):

https://codereview.chromium.org/2427053002/diff/340001/media/gpu/media_founda...
media/gpu/media_foundation_video_encode_accelerator_win.cc:203: encode_client_ = main_client_;
This should be under the if as well...
The CQ bit was checked by emircan@chromium.org to run a CQ dry run
https://codereview.chromium.org/2427053002/diff/340001/media/gpu/ipc/service/...
File media/gpu/ipc/service/gpu_video_encode_accelerator.h (right):

https://codereview.chromium.org/2427053002/diff/340001/media/gpu/ipc/service/...
media/gpu/ipc/service/gpu_video_encode_accelerator.h:115: // Notifies renderer that input is completed.
On 2016/11/09 01:03:02, Pawel Osciak wrote:
> Nit: perhaps we could rephrase "input is completed", as it feels unclear what
> that means...

Done.

https://codereview.chromium.org/2427053002/diff/340001/media/gpu/media_founda...
File media/gpu/media_foundation_video_encode_accelerator_win.cc (right):

https://codereview.chromium.org/2427053002/diff/340001/media/gpu/media_founda...
media/gpu/media_foundation_video_encode_accelerator_win.cc:203: encode_client_ = main_client_;
On 2016/11/09 01:03:02, Pawel Osciak wrote:
> This should be under the if as well...

Done.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm
The CQ bit was unchecked by emircan@chromium.org
The CQ bit was checked by emircan@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from sandersd@chromium.org Link to the patchset: https://codereview.chromium.org/2427053002/#ps360001 (title: "posciak@ comments.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Committed patchset #11 (id:360001)
Message was sent while issue was closed.
Description was changed from

==========
Move video encode accelerator IPC messages to GPU IO thread

This CL moves video encode accelerator IPC messages to GPU IO thread instead of the main thread. This helps stabilize frame rate as well as reduce jitter on Windows. Currently, a lot of these calls get huge delays and results in dropped frames.

In order to do this with respect to each platform, TryToSetupEncodeOnSeparateThread() is added to VideoEncodeAccelerator interface. If this method return false, we keep all IPC messages on main thread like before. It returns false by default.

Note: Initially, I only moved AcceleratedVideoEncoderMsg_Encode call to IO thread, but then we started waiting for output buffers for a long time and again stayed below 30 fps. Therefore I moved all three functions to IO thread and reached ~30 fps, even when switching between tabs.

BUG=657217, 649275
TEST=RunsAudioVideoCall60SecsAndLogsInternalMetricsH264 browser test on Windows result in stable 30 fps.
==========

to

==========
Move video encode accelerator IPC messages to GPU IO thread

This CL moves video encode accelerator IPC messages to GPU IO thread instead of the main thread. This helps stabilize frame rate as well as reduce jitter on Windows. Currently, a lot of these calls get huge delays and results in dropped frames.

In order to do this with respect to each platform, TryToSetupEncodeOnSeparateThread() is added to VideoEncodeAccelerator interface. If this method return false, we keep all IPC messages on main thread like before. It returns false by default.

Note: Initially, I only moved AcceleratedVideoEncoderMsg_Encode call to IO thread, but then we started waiting for output buffers for a long time and again stayed below 30 fps. Therefore I moved all three functions to IO thread and reached ~30 fps, even when switching between tabs.

BUG=657217, 649275
TEST=RunsAudioVideoCall60SecsAndLogsInternalMetricsH264 browser test on Windows result in stable 30 fps.

Committed: https://crrev.com/66ea9831c5714e7d25e544f31bce201cdb35c4af
Cr-Commit-Position: refs/heads/master@{#430882}
==========
Message was sent while issue was closed.
Patchset 11 (id:??) landed as https://crrev.com/66ea9831c5714e7d25e544f31bce201cdb35c4af Cr-Commit-Position: refs/heads/master@{#430882} |