|
|
Created:
8 years, 1 month ago by ananta Modified:
8 years, 1 month ago CC:
chromium-reviews, joi+watch-content_chromium.org, darin-cc_chromium.org, jam, apatrick_chromium, feature-media-reviews_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src/ Visibility:
Public. |
DescriptionDon't wait endlessly for flushing the batched queries sent to the GPU driver
during H/W decode
We currently have a loop in the DXVA decoder on Windows 7 and above where we flush
the queued D3D command bufers and wait for the operation to complete. This causes an
infinite loop on certain multicore machines due to a bug in the IDirect3DQuery9::GetData
call not returning the correct result. This seems to be a very old bug dating back to April 2008
http://us.generation-nt.com/idirect3dquery9-getdata-fails-return-ok-multi-core-cpus-help-26986872.html
Fix for now is to have an upper limit of 10 iterations while we wait for the flush to complete. ld
BUG=149139
R=apatrick
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=168093
Patch Set 1 #
Total comments: 13
Patch Set 2 : #Patch Set 3 : #
Total comments: 4
Patch Set 4 : #Patch Set 5 : #Patch Set 6 : #
Total comments: 10
Patch Set 7 : #Patch Set 8 : #
Total comments: 1
Patch Set 9 : #Patch Set 10 : #Patch Set 11 : #Patch Set 12 : #Patch Set 13 : #
Total comments: 4
Patch Set 14 : #Messages
Total messages: 16 (0 generated)
lgtm
https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... File content/common/gpu/media/dxva_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:77: // the batched queries to the driver. Why is it safe to give up? What is the cost to correctness? Just torn frames or something else? Should we be UMA-reporting the event and/or the # of iterations it takes to get through the flush? Is there a blocking version of the flush you could use instead? Is it possible that the Sleep(1) isn't enough of a yield for the driver in question, and that's the problem? https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:78: #define MAX_ITERATIONS_FOR_D3D_FLUSH 10 prefer enum to #define enum { kMaxiterationsForD3dFlush = 10 }; https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:142: hr = buffer->SetCurrentLength(size); what's this move about? unmentioned in CL description. https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:405: // don't want to loop endlessly waiting for the driver to return success. ...because we've seen driver bugs? https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:410: if (iterations > MAX_ITERATIONS_FOR_D3D_FLUSH) putting this test in the while condition would obviate the need for the if/break https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:417: } while (hr == S_FALSE); l.408-417 could be rewritten as: int iterations = 0; while (query_->GetData(NULL, 0, D3DGETDATA_FLUSH) == S_FALSE) && ++iterations < MAX_ITERATIONS_FOR_D3D_FLUSH) { Sleep(1); // Poor-man's Yield(). } Not only would it be >50% shorter, it would also not sleep right before giving up on iterations being too large. https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:836: hr = MFSetAttributeSize(media_type, MF_MT_PIXEL_ASPECT_RATIO, 2, 1); wat?
https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... File content/common/gpu/media/dxva_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:77: // the batched queries to the driver. On 2012/11/13 23:07:13, Ami Fischman wrote: > Why is it safe to give up? What is the cost to correctness? > Just torn frames or something else? > Should we be UMA-reporting the event and/or the # of iterations it takes to get > through the flush? > > Is there a blocking version of the flush you could use instead? > > Is it possible that the Sleep(1) isn't enough of a yield for the driver in > question, and that's the problem? > Cost to correctness would be torn frames in the worst case. On my machine it seems to work correctly even without the flush code. We are doing this to prevent a race condition. There is no blocking version of the flush call. As regards the Sleep(1), needs more investigation. will do that in a separate change. This bug seems very bad. would like this to be fixed ASAP. Will also do the UMA reporting in a subsequent change. https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:142: hr = buffer->SetCurrentLength(size); On 2012/11/13 23:07:13, Ami Fischman wrote: > what's this move about? unmentioned in CL description. Sorry. needless change crept in. was debugging something else. https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:410: if (iterations > MAX_ITERATIONS_FOR_D3D_FLUSH) On 2012/11/13 23:07:13, Ami Fischman wrote: > putting this test in the while condition would obviate the need for the if/break Done. https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:417: } while (hr == S_FALSE); On 2012/11/13 23:07:13, Ami Fischman wrote: > l.408-417 could be rewritten as: > > int iterations = 0; > while (query_->GetData(NULL, 0, D3DGETDATA_FLUSH) == S_FALSE) && > ++iterations < MAX_ITERATIONS_FOR_D3D_FLUSH) { > Sleep(1); // Poor-man's Yield(). > } > > Not only would it be >50% shorter, it would also not sleep right before giving > up on iterations being too large. Done. https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:836: hr = MFSetAttributeSize(media_type, MF_MT_PIXEL_ASPECT_RATIO, 2, 1); On 2012/11/13 23:07:13, Ami Fischman wrote: > wat? Unrelated change. Was debugging the small video painting bug. removed.
I don't understand the rush, considering this was reported by exactly one person on stable channel, and we're over a month away from the next branch point. I'd much rather we be in a position to understand the ramifications of this CL than replace a visible hang with silent corruption. If the flush doesn't go through, can a decoded picture from one video be rendered into another video's texture? Possibly from another site/tab? https://chromiumcodereview.appspot.com/11369229/diff/5003/content/common/gpu/... File content/common/gpu/media/dxva_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/11369229/diff/5003/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:78: enum { kMaxiterationsForD3dFlush = 10 }; s/xi/xI/ https://chromiumcodereview.appspot.com/11369229/diff/5003/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:407: // iterations. This comment is useless; it restates in English what the code is clearly doing in C++. A useful comment would explain: a) why a FLUSH is needed/desired b) why sometimes we need to give up without actually getting a FLUSH c) why (b) is ok in the presence of (a)
On 2012/11/13 23:31:25, Ami Fischman wrote: > I don't understand the rush, considering this was reported by exactly one person > on stable channel, and we're over a month away from the next branch point. > I'd much rather we be in a position to understand the ramifications of this CL > than replace a visible hang with silent corruption. > > If the flush doesn't go through, can a decoded picture from one video be > rendered into another video's texture? Possibly from another site/tab? Addressed these concerns by handling the case where we reach the upper limit on iterations. We do this by forcing a readback from the angle surface which ensures that the operations are completed. > > https://chromiumcodereview.appspot.com/11369229/diff/5003/content/common/gpu/... > File content/common/gpu/media/dxva_video_decode_accelerator.cc (right): > > https://chromiumcodereview.appspot.com/11369229/diff/5003/content/common/gpu/... > content/common/gpu/media/dxva_video_decode_accelerator.cc:78: enum { > kMaxiterationsForD3dFlush = 10 }; > s/xi/xI/ > > https://chromiumcodereview.appspot.com/11369229/diff/5003/content/common/gpu/... > content/common/gpu/media/dxva_video_decode_accelerator.cc:407: // iterations. > This comment is useless; it restates in English what the code is clearly doing > in C++. > > A useful comment would explain: > a) why a FLUSH is needed/desired > b) why sometimes we need to give up without actually getting a FLUSH > c) why (b) is ok in the presence of (a)
https://chromiumcodereview.appspot.com/11369229/diff/5003/content/common/gpu/... File content/common/gpu/media/dxva_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/11369229/diff/5003/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:78: enum { kMaxiterationsForD3dFlush = 10 }; On 2012/11/13 23:31:25, Ami Fischman wrote: > s/xi/xI/ Done. https://chromiumcodereview.appspot.com/11369229/diff/5003/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:407: // iterations. On 2012/11/13 23:31:25, Ami Fischman wrote: > This comment is useless; it restates in English what the code is clearly doing > in C++. > > A useful comment would explain: > a) why a FLUSH is needed/desired > b) why sometimes we need to give up without actually getting a FLUSH > c) why (b) is ok in the presence of (a) Rephrased. There is also some new code which addresses the case where if we reach the upper limit on iterations, we force a readback from the ANGLE surface to a surface created in system memory which ensures that the pending operations are completed.
https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... File content/common/gpu/media/dxva_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:1: // Copyright (c) 2012 The Chromium Authors. All rights reserved. CL description needs update. https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:418: // ensures that the pending operations on the ANGLE surface complete. s/complete/completes/ https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:425: if (iterations > kMaxIterationsForD3DFlush) { This can never be true. https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:427: hr = device_->CreateOffscreenPlainSurface(surface_desc.Width, IIUC systems that have this problem exhibit it for every frame. So this patch will cause each frame to allocate a new texture, copy to it synchronously, and then discard it? That sounds like it might be less efficient than simply doing the decode in SW in the first place. If that's the case, then maybe instead of this CL we should simply blacklist the offending driver/card?
https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... File content/common/gpu/media/dxva_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/11369229/diff/1/content/common/gpu/med... content/common/gpu/media/dxva_video_decode_accelerator.cc:77: // the batched queries to the driver. I expect it will be torn frames or similar. It won't crash. We could test that by replacing the loop with a sleep. https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... File content/common/gpu/media/dxva_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:435: hr = device_->GetRenderTargetData(d3d_surface, temp_surface); This call could be asynchronous as well (don't know if it is in practice). I think you might have to also lock the temporary surface with LockRect. It could be a read-only lock.
https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... File content/common/gpu/media/dxva_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:1: // Copyright (c) 2012 The Chromium Authors. All rights reserved. On 2012/11/14 01:26:07, Ami Fischman wrote: > CL description needs update. Done. https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:418: // ensures that the pending operations on the ANGLE surface complete. On 2012/11/14 01:26:07, Ami Fischman wrote: > s/complete/completes/ Rephrased https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:425: if (iterations > kMaxIterationsForD3DFlush) { On 2012/11/14 01:26:07, Ami Fischman wrote: > This can never be true. This code has been removed in favor of the BeginFrame/EndFrame calls encapsulating StretchRect. This is safe because as per docs StretchRect will only fail in a BeginFrame/EndFrame sequence for depth/stencil surfaces. https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:427: hr = device_->CreateOffscreenPlainSurface(surface_desc.Width, On 2012/11/14 01:26:07, Ami Fischman wrote: > IIUC systems that have this problem exhibit it for every frame. > > So this patch will cause each frame to allocate a new texture, copy to it > synchronously, and then discard it? > That sounds like it might be less efficient than simply doing the decode in SW > in the first place. If that's the case, then maybe instead of this CL we should > simply blacklist the offending driver/card? Removed. https://chromiumcodereview.appspot.com/11369229/diff/9002/content/common/gpu/... content/common/gpu/media/dxva_video_decode_accelerator.cc:435: hr = device_->GetRenderTargetData(d3d_surface, temp_surface); On 2012/11/14 01:27:49, apatrick_chromium wrote: > This call could be asynchronous as well (don't know if it is in practice). I > think you might have to also lock the temporary surface with LockRect. It could > be a read-only lock. Code has been removed.
https://chromiumcodereview.appspot.com/11369229/diff/10002/content/common/gpu... File content/common/gpu/media/dxva_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/11369229/diff/10002/content/common/gpu... content/common/gpu/media/dxva_video_decode_accelerator.cc:390: hr = query_->Issue(D3DISSUE_END); Issuing the query upfront causes the loop below to execute in most cases once.
This change LGTM but I wonder whether it'll actually fix Derek's problem from the linked bug. Can we get confirmation of the fix on that HW/driver combo before landing this as a fix for that? Also please update the CL subject in rietveld.
On 2012/11/14 17:13:10, Ami Fischman wrote: > This change LGTM but I wonder whether it'll actually fix Derek's problem from > the linked bug. Can we get confirmation of the fix on that HW/driver combo > before landing this as a fix for that? > > Also please update the CL subject in rietveld. I reverted back to the initial approach of having an upper bound on the number of iterations we spin in the IDirect3DQuery9::GetData call. This appears to be a long standing bug in Windows which occurs at times on multi core machines. http://us.generation-nt.com/idirect3dquery9-getdata-fails-return-ok-multi-cor... I was able to reproduce this on a test laptop with a dual core machine with an ATI card. dschuff mentioned that his home machine where he sees this problem is an AMD 4 core machine. I was able to reproduce this at my home desktop which is a 4 core AMD machine with a NVIDIA GE Force card. Given that after the StretchRect call we issue a Flush via the query object in a loop upto 10 iterations, and the additional latency of the picture ready IPC message getting delivered and processed by the caller, this should be a relatively safe fix. The likelihood of a garbage frame showing up in the video is very remote. PTAL.
LGTM % nits https://chromiumcodereview.appspot.com/11369229/diff/31002/content/common/gpu... File content/common/gpu/media/dxva_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/11369229/diff/31002/content/common/gpu... content/common/gpu/media/dxva_video_decode_accelerator.cc:76: // the batched queries to the driver. s/driver/driver (and allow torn/corrupt frames to be rendered)/ https://chromiumcodereview.appspot.com/11369229/diff/31002/content/common/gpu... content/common/gpu/media/dxva_video_decode_accelerator.cc:344: // Workaround is to have a upper limit of 10 on the number of iterations we s/a upper/an upper/
lgtm
https://chromiumcodereview.appspot.com/11369229/diff/31002/content/common/gpu... File content/common/gpu/media/dxva_video_decode_accelerator.cc (right): https://chromiumcodereview.appspot.com/11369229/diff/31002/content/common/gpu... content/common/gpu/media/dxva_video_decode_accelerator.cc:76: // the batched queries to the driver. On 2012/11/15 21:57:49, Ami Fischman wrote: > s/driver/driver (and allow torn/corrupt frames to be rendered)/ Done. https://chromiumcodereview.appspot.com/11369229/diff/31002/content/common/gpu... content/common/gpu/media/dxva_video_decode_accelerator.cc:344: // Workaround is to have a upper limit of 10 on the number of iterations we On 2012/11/15 21:57:49, Ami Fischman wrote: > s/a upper/an upper/ Done. |