Created: 7 years ago by jadahl
Modified: 6 years, 8 months ago
CC: chromium-reviews, piman+watch_chromium.org, vangelis, Ken Russell, Vangelis Kokkevis, jamesr, vmiura
Base URL: http://git.chromium.org/chromium/src.git@master
Visibility: Public.
Description
gpu: Reuse transfer buffers more aggressively
By keeping track of transfer buffer usage (both sync and async), it is
possible to reuse an existing transfer buffer earlier than is done today.
For synchronous uploads, this is done by inserting tokens marking a
buffer's last usage. If such a token has passed, the buffer can be
reused.
For asynchronous uploads, this is done by adding an internal async
upload token to the GLES2Implementation and GLES2CmdDecoderImpl that
enumerates all asynchronous uploads. When an upload is completed, the
token is synchronized with the client, which can then free the memory
associated with the async token.
The fenced allocator and mapped memory manager get a callback that
allows the fenced allocator to make its user, in this case
GLES2Implementation, poll the current async upload token and free any
memory associated with it.
BUG=328808
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=260177
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=260507
Patch Set 1 #
Total comments: 1
Patch Set 2 : [RFC] gpu: Reuse transfer buffers more aggressively #
Total comments: 19
Patch Set 3 : [WIP] gpu: Reuse transfer buffers more aggressively #
Total comments: 26
Patch Set 4 : [WIP] Review comments follow-up #
Total comments: 7
Patch Set 5 : [WIP] Introduced internal SetAsyncToken command buffer command #
Total comments: 20
Patch Set 6 : Async upload token part of existing Async command; use separate shared memory to sync async upload … #
Total comments: 35
Patch Set 7 : Added glWaitAllAsyncTexImage2DCHROMIUM; other review issues addressed #
Total comments: 10
Patch Set 8 : Added AsyncUploadSync test, FencedAllocator test, addressed review issues #
Total comments: 3
Patch Set 9 : Rebased #
Patch Set 10 : Rebased; removed unnecessary barrier; use CheckedNumeric #
Total comments: 2
Patch Set 11 : Fixed comment typos #
Patch Set 12 : Lint fixes; updated AsyncPixelTransferManagerSync to the new approach #
Patch Set 13 : Rebased #
Total comments: 1
Patch Set 14 : Addressed review issues #
Patch Set 15 : Fixed bug in unittest #
Total comments: 4
Patch Set 16 : Unset texture and texture_ref after deleting #
Messages
Total messages: 138 (0 generated)
Hi,

This patch introduces the ability to aggressively reuse transfer buffers. It is a work in progress, but I'm creating this review request in order to get some input and assistance on the approach.

The main problem with this patch is that it does not separate reusable not-yet-free buffers from non-reusable ones. If this is the way forward, we need a way to distinguish these two kinds of buffers, and I'm not sure what the best way to do that would be. One idea I had was to use another kind of target for buffers like this, but I do not know if that would be problematic in some way, or if there is some much better way of doing this (thus this WIP review).

What is already done is to separate the mapped memory managers depending on what they are used for; for example, queries do not use the same mapped memory managers as transfer buffers. I suppose the same would have to be done for the two kinds of transfer buffers as well, in order to be able to pick any block in a chunk managed by some mapped memory manager.

I would also like to get some input on the overall approach, whether this is the right level at which to make the change, etc. I have already made some measurements (graphs attached to the associated bug), and there are some measurable improvements.

Thanks
Jonas
If I understand this correctly, I like (at least in concept) where this is going. I like the idea of a different target, which could identify buffers that can be reclaimed immediately, vs buffers that must go through the FREE_PENDING stage first. Note that if we go this route, the compositor also has some work to do, as there are still a few straggling cases where we don't respect the full upload life-time (primarily 'upload waits' and deletion). Perhaps we could implement this first and then have a separate bug for the compositor side?
https://codereview.chromium.org/116863003/diff/1/gpu/command_buffer/client/fe... File gpu/command_buffer/client/fenced_allocator.h (right): https://codereview.chromium.org/116863003/diff/1/gpu/command_buffer/client/fe... gpu/command_buffer/client/fenced_allocator.h:38: bool aggressive_reuse, Rather than a separate allocator, do you think it would be possible to flag deletions for immediate reuse (ie. skip free-pending state)? This is partially a question for reveman/sievers (as to how best this could be communicated from the client).
On 2013/12/18 19:21:14, epennerAtGoogle wrote: > https://codereview.chromium.org/116863003/diff/1/gpu/command_buffer/client/fe... > File gpu/command_buffer/client/fenced_allocator.h (right): > > https://codereview.chromium.org/116863003/diff/1/gpu/command_buffer/client/fe... > gpu/command_buffer/client/fenced_allocator.h:38: bool aggressive_reuse, > Rather than a separate allocator, do you think it would be possible to flag > deletions for immediate reuse (ie. skip free-pending state)? This is partially > a question for reveman/sievers (as to how best this could be communicated from > the client). I think the problem with that is that we'd get a chunk with reusable and non-reusable blocks mixed up together, which would reduce the chance of being able to collapse several FREE and FREE_PENDING_TOKEN blocks into one as is done in the patch.
> I think the problem with that is that we'd get a chunk with reusable and > non-reusable blocks mixed up together, which would reduce the chance of being > able to collapse several FREE and FREE_PENDING_TOKEN blocks into one as is done > in the patch. Isn't there a trade-off that we need two sets of memory blocks? We will always have some non-aggressive allocations (eg. we create some vertex buffers that need to be 'free-pending'). If we need two sets of blocks for those different allocation types, it seems less efficient than having a bit of fragmentation. Personally I think having an 'aggressive free' would be best, but I think the other reviewers here should also chime in. Aggressive free could be per-transfer buffer (via different xfer buffer type), or (slightly more ugly) via a flag: glEnable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) ... // Free buffer that we know is no longer being used glDisable(GL_AGGRESSIVE_FREE_BUFFERS_HINT)
On 2013/12/19 19:27:43, epennerAtGoogle wrote: > > I think the problem with that is that we'd get a chunk with reusable and > > non-reusable blocks mixed up together, which would reduce the chance of being > > able to collapse several FREE and FREE_PENDING_TOKEN blocks into one as is > done > > in the patch. > > Isn't there a trade-off that we need two sets of memory blocks? We will always > have some non-aggressive allocations (eg. we create some vertex buffers that > need to be 'free-pending'). If we need two sets of blocks for those different > allocation types, it seems less efficient than having a bit of fragmentation. Hmm. At least for WebGL usage. For queries and vertex buffers I think the memory manager can be much more restrictive in allocation (no need to allocate a 2MB chunk), but that won't work as well for WebGL I suppose. > > Personally I think having an 'aggressive free' would be best, but I think the > other reviewers here should also chime in. With this, if I understand correctly, you mean to not FreePendingToken() in GLES2Implementation, but to Free(), for the appropriate buffers. This is something I agree will be simpler. And it will eliminate the changes to MappedMemoryManager and the FencedAllocators. > > Aggressive free could be per-transfer buffer (via different xfer buffer type), > or (slightly more ugly) via a flag: > glEnable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > ... // Free buffer that we know is no longer being used > glDisable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) Wouldn't we achieve the same by just having a GL_PIXEL_SYNC_UNPACK_TRANSFER_BUFFER_CHROMIUM (or something similar to the current GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM) that would just work the same but mark the buffer with a flag saying it can be freed aggressively? Keeping that kind of state would be easier than the somewhat more implicit state that we'd have with glEnable/glDisable.
> With this, if I understand correctly, you mean to not FreePendingToken() in > GLES2Implementation, but to Free(), for the appropriate buffers. This is > something I agree on will be simpler. And it will eliminate the changes to > MappedMemoryManager and the FencedAllocators. > > > > > Aggressive free could be per-transfer buffer (via different xfer buffer type), > > or (slightly more ugly) via a flag: > > glEnable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > ... // Free buffer that we know is no longer being used > > glDisable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > Wouldn't we achieve the same, by just having a > GL_PIXEL_SYNC_UNPACK_TRANSFER_BUFFER_CHROMIUM (or something other similar to the > current GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM) that would just work the same > but mark the buffer with a flag saying it can be free:ed aggressively? Keeping > that kind of state would be easier than the a bit more implicit state that we'd > have with glEnable/glDisable. Yes that would work too. The only issue there is it's a bit less flexible. That's fine perhaps, and maybe more clean. Currently there are a few cases that would prevent us from using that in the compositor though, and we'd need to fix those before enabling it. Lastly Sievers@ and I were chatting, and there is one last option that might be the best of all. If we tracked the 'last-usage-token' for these buffers, we could optimize aggressive deletion under the hood for all users. Rather than creating a token on unmap, we could use the 'last-usage-token', which might have already passed! (in this case we can re-use immediately) Unfortunately, async-uploads add a wrinkle to this option, as we would also need to store the 'last-async-query'. The async-query is the equivalent of tokens when using async uploads, and we would need to confirm that also passed before recycling the buffer. I'm guessing sievers/reveman will like the last option since it's completely 'automatic', with no new API required.
On 2013/12/19 22:41:03, epennerAtGoogle wrote: > > With this, if I understand correctly, you mean to not FreePendingToken() in > > GLES2Implementation, but to Free(), for the appropriate buffers. This is > > something I agree on will be simpler. And it will eliminate the changes to > > MappedMemoryManager and the FencedAllocators. > > > > > > > > Aggressive free could be per-transfer buffer (via different xfer buffer > type), > > > or (slightly more ugly) via a flag: > > > glEnable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > > ... // Free buffer that we know is no longer being used > > > glDisable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > > > Wouldn't we achieve the same, by just having a > > GL_PIXEL_SYNC_UNPACK_TRANSFER_BUFFER_CHROMIUM (or something other similar to > the > > current GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM) that would just work the > same > > but mark the buffer with a flag saying it can be free:ed aggressively? Keeping > > that kind of state would be easier than the a bit more implicit state that > we'd > > have with glEnable/glDisable. > > Yes that would work too. The only issue there is it's a bit less flexible. > That's fine perhaps, and maybe more clean. Currently there are a few cases that > would prevent us from using that in the compositor though, and we'd need to fix > those before enabling it. > > Lastly Sievers@ and I were chatting, and there is one last option that might be > the best of all. If we tracked the 'last-usage-token' for these buffers, we > could optimize aggressive deletion under the hood for all users. Rather than > creating a token on unmap, we could use the 'last-usage-token', which might have > already passed! (in this case we can re-use immediately) Unfortunately, > async-uploads add a wrinkle to this option, as we would also need to store the > 'last-async-query'. The async-query is the equivalent of tokens when using async > uploads, and we would need to confirm that also passed before recycling the > buffer. 
> > I'm guessing sievers/reveman will like the last option since it's completely > 'automatic', with no new API required. This last option sgtm. 'last-async-query' could be a query we create and manage internally whenever a buffer is used for async uploads.
On 2013/12/20 00:00:17, David Reveman wrote: > On 2013/12/19 22:41:03, epennerAtGoogle wrote: > > > With this, if I understand correctly, you mean to not FreePendingToken() in > > > GLES2Implementation, but to Free(), for the appropriate buffers. This is > > > something I agree on will be simpler. And it will eliminate the changes to > > > MappedMemoryManager and the FencedAllocators. > > > > > > > > > > > Aggressive free could be per-transfer buffer (via different xfer buffer > > type), > > > > or (slightly more ugly) via a flag: > > > > glEnable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > > > ... // Free buffer that we know is no longer being used > > > > glDisable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > > > > > Wouldn't we achieve the same, by just having a > > > GL_PIXEL_SYNC_UNPACK_TRANSFER_BUFFER_CHROMIUM (or something other similar to > > the > > > current GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM) that would just work the > > same > > > but mark the buffer with a flag saying it can be free:ed aggressively? > Keeping > > > that kind of state would be easier than the a bit more implicit state that > > we'd > > > have with glEnable/glDisable. > > > > Yes that would work too. The only issue there is it's a bit less flexible. > > That's fine perhaps, and maybe more clean. Currently there are a few cases > that > > would prevent us from using that in the compositor though, and we'd need to > fix > > those before enabling it. > > > > Lastly Sievers@ and I were chatting, and there is one last option that might > be > > the best of all. If we tracked the 'last-usage-token' for these buffers, we > > could optimize aggressive deletion under the hood for all users. Rather than > > creating a token on unmap, we could use the 'last-usage-token', which might > have > > already passed! (in this case we can re-use immediately) Unfortunately, > > async-uploads add a wrinkle to this option, as we would also need to store the > > 'last-async-query'. 
The async-query is the equivalent of tokens when using > async > > uploads, and we would need to confirm that also passed before recycling the > > buffer. Interesting idea. So, rephrasing (for clarity and to confirm I understand correctly), for transfer buffers specifically (not mapped memory used by the QueryTracker), when the buffer is unbound (either via glBindBuffer replacing the previous binding or glDeleteBuffer) we mark the buffer as "unused_pending_token". Async-upload buffers are marked with a separate "async-usage" flag which should be tracked separately. Where today we call BufferTracker::FreePendingToken(), we should call BufferTracker::Free() for non-async-upload buffers, which should just MappedMemoryManager::Free() except if the "last_usage_token" has not yet passed; then it should MappedMemoryManager::FreePendingToken() with that token instead. > > > > I'm guessing sievers/reveman will like the last option since it's completely > > 'automatic', with no new API required. > > This last option sgtm. 'last-async-query' could be a query we create and manage > internally whenever a buffer is used for async uploads. So what you mean here is more or less do what the compositor is already doing? I.e. using the query associated with |gl_upload_query_id| in ResourceProvider, but internally in GLES2Implementation. Sounds like the compositor should be able to piggy-back on that query some how, in order to not waste memory on having its own query doing the exact same thing. There is an artificial limit in GLES2Implementation only allowing one query at a time. Assuming that one would circumvent that limit (just query_tracker_->CreateQuery() etc), are there any assumptions in other places that there will only be one query alive at a time?
On 2013/12/20 15:53:06, jadahl wrote: > On 2013/12/20 00:00:17, David Reveman wrote: > > On 2013/12/19 22:41:03, epennerAtGoogle wrote: > > > > With this, if I understand correctly, you mean to not FreePendingToken() > in > > > > GLES2Implementation, but to Free(), for the appropriate buffers. This is > > > > something I agree on will be simpler. And it will eliminate the changes to > > > > MappedMemoryManager and the FencedAllocators. > > > > > > > > > > > > > > Aggressive free could be per-transfer buffer (via different xfer buffer > > > type), > > > > > or (slightly more ugly) via a flag: > > > > > glEnable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > > > > ... // Free buffer that we know is no longer being used > > > > > glDisable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > > > > > > > Wouldn't we achieve the same, by just having a > > > > GL_PIXEL_SYNC_UNPACK_TRANSFER_BUFFER_CHROMIUM (or something other similar > to > > > the > > > > current GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM) that would just work the > > > same > > > > but mark the buffer with a flag saying it can be free:ed aggressively? > > Keeping > > > > that kind of state would be easier than the a bit more implicit state that > > > we'd > > > > have with glEnable/glDisable. > > > > > > Yes that would work too. The only issue there is it's a bit less flexible. > > > That's fine perhaps, and maybe more clean. Currently there are a few cases > > that > > > would prevent us from using that in the compositor though, and we'd need to > > fix > > > those before enabling it. > > > > > > Lastly Sievers@ and I were chatting, and there is one last option that might > > be > > > the best of all. If we tracked the 'last-usage-token' for these buffers, we > > > could optimize aggressive deletion under the hood for all users. Rather than > > > creating a token on unmap, we could use the 'last-usage-token', which might > > have > > > already passed! 
(in this case we can re-use immediately) Unfortunately, > > > async-uploads add a wrinkle to this option, as we would also need to store > the > > > 'last-async-query'. The async-query is the equivalent of tokens when using > > async > > > uploads, and we would need to confirm that also passed before recycling the > > > buffer. > > Interesting idea. So, rephrasing (for clarity and to confirm I understand > correctly), for transfer buffers specifically (not mapped memory used by the > QueryTracker), when the buffer is unbound (either via glBindBuffer replacing the > previous binding or glDeleteBuffer) we mark the buffer as > "unused_pending_token". Async-upload buffers are marked with a separate > "async-usage" flag which should be tracked separately. > > Where today we call BufferTracker::FreePendingToken(), we should call > BufferTracker::Free() for non-async-upload buffers, which should just > MappedMemoryManager::Free() except if the "last_usage_token" has not yet passed; > then it should MappedMemoryManager::FreePendingToken() with that token instead. > > > > > > > I'm guessing sievers/reveman will like the last option since it's completely > > > 'automatic', with no new API required. > > > > This last option sgtm. 'last-async-query' could be a query we create and > manage > > internally whenever a buffer is used for async uploads. > > So what you mean here is more or less do what the compositor is already doing? > I.e. using the query associated with |gl_upload_query_id| in ResourceProvider, > but internally in GLES2Implementation. Sounds like the compositor should be able > to piggy-back on that query some how, in order to not waste memory on having its > own query doing the exact same thing. > > There is an artificial limit in GLES2Implementation only allowing one query at a > time. 
Assuming that one would circumvent that limit (just > query_tracker_->CreateQuery() etc), are there any assumptions in other places > that there will only be one query alive at a time? That assumption is also made on the service side but I don't think it should be too hard to fix so different queries of different types can be active at the same time. Maybe we can use an internal query type for all async uploads, which we can reliably use to determine when a buffer can be freed, and the client facing query type can just be built on top of this.
Hi, This is a WIP, and since I'm not overly familiar with these areas I'd like to get some sanity checks before going further. What is done so far is a proof-of-concept implementation of the ideas previously discussed in this review. Buffers are now marked as unused and a token is inserted, which is later used when freeing, potentially freeing buffers directly where they would previously only be freed after an added round trip. Async uploads are tracked with a private/internal query that is identical to the already existing upload-complete query. The service side is updated to be able to deal with multiple queries at the same time, given that they have different targets. This is done by putting them in a target->query map, instead of just having a single current query.
How about we start by landing the changes to support multiple queries on the service side first? That seems useful on its own. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... File gpu/command_buffer/build_gles2_cmd_buffer.py (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... gpu/command_buffer/build_gles2_cmd_buffer.py:831: 'GL_ASYNC_PIXEL_UNPACK_COMPLETED_PRIVATE_CHROMIUM', Are clients prevented from using this somehow? Did you consider adding a new pair of Begin/EndQuery commands instead? ie. BeginInternalQueryEXT and EndInternalQueryEXT. Or a "bool private" parameter to the existing Begin/EndQuery commands. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT CommandBufferHelper { What if we enumerate all internal queries and add API to this interface that allows the mapped memory manager to handle FreePendingQuery in a way that's consistent with how FreePendingToken currently works? I guess we'd need: void WaitForQuery(int32 query); int32 last_query_completed() const; Wdyt? https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:89: bool HasTokenPassed(int32 token); why is this needed in addition to last_token_read()?
> How about we start by landing the changes to support > multiple queries on the > service side first? That seems useful on its own. Sure. I'll submit a separate code review request for that part. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... File gpu/command_buffer/build_gles2_cmd_buffer.py (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... gpu/command_buffer/build_gles2_cmd_buffer.py:831: 'GL_ASYNC_PIXEL_UNPACK_COMPLETED_PRIVATE_CHROMIUM', On 2014/01/02 01:31:29, David Reveman wrote: > Are clients prevented from using this somehow? > > Did you consider adding a new pair of Begin/EndQuery commands instead? ie. > BeginInternalQueryEXT and EndInternalQueryEXT. Or a "bool private" parameter to > the existing Begin/EndQuery commands. The idea is that this target should be private, and not allowed to be used by clients. Not sure how to enforce that rule except with error reporting. Any ideas here? What would we gain from having a "bool private"? Would it be applicable to all types of queries? Also, the assumption that one type of target only has one running query would no longer hold, and any such assumptions throughout the code would no longer be true. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT CommandBufferHelper { On 2014/01/02 01:31:29, David Reveman wrote: > What if we enumerate all internal queries and add API to this interface that > allows the mapped memory manager to handle FreePendingQuery in a way that's > consistent with how FreePendingToken currently works? > > I guess we'd need: > void WaitForQuery(int32 query); > int32 last_query_completed() const; > > Wdyt? When would we WaitForQuery?
In the current implementation buffers with a query are stored separately and not freed, then checked and freed before creating a new buffer (in order to flush out finished buffers before getting a new one). A query can take a very long time (especially if it was for an upload added very late, with a long queue in front of it), so it feels like that function could stall more than would be acceptable. The idea you propose also makes assumptions about ordering. Will such assumptions always hold, for example if there are two uploader threads (i.e. the order in which uploads complete won't necessarily be the same as the order in which they were queued)? https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:89: bool HasTokenPassed(int32 token); On 2014/01/02 01:31:29, David Reveman wrote: > why is this needed in addition to last_token_read()? No need. Will remove. No idea why I added it here in the first place :P
https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... File gpu/command_buffer/build_gles2_cmd_buffer.py (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... gpu/command_buffer/build_gles2_cmd_buffer.py:831: 'GL_ASYNC_PIXEL_UNPACK_COMPLETED_PRIVATE_CHROMIUM', On 2014/01/02 10:59:54, jadahl wrote: > On 2014/01/02 01:31:29, David Reveman wrote: > > Are clients prevented from using this somehow? > > > > Did you consider adding a new pair of Begin/EndQuery commands instead? ie. > > BeginInternalQueryEXT and EndInternalQueryEXT. Or a "bool private" parameter > to > > the existing Begin/EndQuery commands. > > The idea is that this target should be private, and not allowed to be used by > some client. Not sure how to enforce that rule except with error reporting. Any > ideas here? > > What we'd gain from having a "bool private"? Would it be applicable to all types > of queries? Also the assumption that one type of target only has one running > query would longer hold, and any such assumptions throughout the code would no > longer be true. The idea is that this would make it easy to keep it internal as we'd simply not expose it in the client API. It could be limited to ASYNC_PIXEL_UNPACK_COMPLETED for now but it would be trivial to add support for other targets if necessary. The key type for the QueryMap you introduced in this patch would instead be a std::pair<bool, GLuint>. Where "bool" is whether the query is private. Are we making more assumptions than this related to running queries? https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... 
gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT CommandBufferHelper { On 2014/01/02 10:59:54, jadahl wrote: > On 2014/01/02 01:31:29, David Reveman wrote: > > What if we enumerate all internal queries and add API to this interface that > > allows the mapped memory manager to handle FreePendingQuery in a way that's > > consistent with how FreePendingToken currently works? > > > > I guess we'd need: > > void WaitForQuery(int32 query); > > int32 last_query_completed() const; > > > > Wdyt? > > When would we WaitForQuery? In the current implementation buffers with a query > are stored separately and not free:ed, and checked and free:ed before creating a > new buffer (in order to flush out finished buffers before getting a new one). > > A query can take a very long time (especially if it was for an upload added very > late, with a long line in front of it) so it feels like that function could > stall more than what would be acceptable. Maybe we wouldn't block on queries when trying to allocate (that might be useful to limit memory usage though) but I think we'd need it during tear down of the allocator. > > The idea you propose also makes assumption of ordering. Will such assumptions > always hold, for example if there are two uploader threads (i.e. ordering of > uploads completed won't necessarily be the same as uploads queued). Async upload commands are guaranteed to complete in sequential order. The compositor is already making assumptions based on this so I think it'd be OK to make the same assumption internally as well.
On 2014/01/02 11:56:43, David Reveman wrote: > https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... > File gpu/command_buffer/build_gles2_cmd_buffer.py (right): > > https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... > gpu/command_buffer/build_gles2_cmd_buffer.py:831: > 'GL_ASYNC_PIXEL_UNPACK_COMPLETED_PRIVATE_CHROMIUM', > On 2014/01/02 10:59:54, jadahl wrote: > > On 2014/01/02 01:31:29, David Reveman wrote: > > > Are clients prevented from using this somehow? > > > > > > Did you consider adding a new pair of Begin/EndQuery commands instead? ie. > > > BeginInternalQueryEXT and EndInternalQueryEXT. Or a "bool private" parameter > > to > > > the existing Begin/EndQuery commands. > > > > The idea is that this target should be private, and not allowed to be used by > > some client. Not sure how to enforce that rule except with error reporting. > Any > > ideas here? > > > > What we'd gain from having a "bool private"? Would it be applicable to all > types > > of queries? Also the assumption that one type of target only has one running > > query would longer hold, and any such assumptions throughout the code would no > > longer be true. > > The idea is that this would make it easy to keep it internal as we'd simply not > expose it in the client API. It could be limited to ASYNC_PIXEL_UNPACK_COMPLETED > for now but it would be trivial to add support for other targets if necessary. > > The key type for the QueryMap you introduced in this patch would instead be a > std::pair<bool, GLuint>. Where "bool" is whether the query is private. Are we > making more assumptions than this related to running queries? The map might not be the problem, I was more suspecting that there might be assumptions elsewhere. It did work when I tested with UNPACK_COMPLETED (well, not the same target, but more or less so). Anyhow, I can give it a try and see if I run into any blocker. 
> > https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... > File gpu/command_buffer/client/cmd_buffer_helper.h (right): > > https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... > gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT > CommandBufferHelper { > On 2014/01/02 10:59:54, jadahl wrote: > > On 2014/01/02 01:31:29, David Reveman wrote: > > > What if we enumerate all internal queries and add API to this interface that > > > allows the mapped memory manager to handle FreePendingQuery in a way that's > > > consistent with how FreePendingToken currently works? > > > > > > I guess we'd need: > > > void WaitForQuery(int32 query); > > > int32 last_query_completed() const; > > > > > > Wdyt? > > > > When would we WaitForQuery? In the current implementation buffers with a query > > are stored separately and not free:ed, and checked and free:ed before creating > a > > new buffer (in order to flush out finished buffers before getting a new one). > > > > A query can take a very long time (especially if it was for an upload added > very > > late, with a long line in front of it) so it feels like that function could > > stall more than what would be acceptable. > > Maybe we wouldn't block on queries when trying to allocate (that might be useful > to limit memory usage though) but I think we'd need it during tear down of the > allocator. Right, for that case it would make sense, true. > > > > > The idea you propose also makes assumption of ordering. Will such assumptions > > always hold, for example if there are two uploader threads (i.e. ordering of > > uploads completed won't necessarily be the same as uploads queued). > > Async upload commands are guaranteed to complete in sequential order. The > compositor is already making assumptions based on this so I think it'd OK to > make the same assumption internally as well. I see, then I guess that won't be an issue now then. 
It would complicate things even more than it already does if this ever changed, though.
https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT CommandBufferHelper { On 2014/01/02 11:56:44, David Reveman wrote: > On 2014/01/02 10:59:54, jadahl wrote: > > On 2014/01/02 01:31:29, David Reveman wrote: > > > What if we enumerate all internal queries and add API to this interface that > > > allows the mapped memory manager to handle FreePendingQuery in a way that's > > > consistent with how FreePendingToken currently works? > > > > > > I guess we'd need: > > > void WaitForQuery(int32 query); > > > int32 last_query_completed() const; > > > > > > Wdyt? > > > > When would we WaitForQuery? In the current implementation buffers with a query > > are stored separately and not free:ed, and checked and free:ed before creating > a > > new buffer (in order to flush out finished buffers before getting a new one). > > > > A query can take a very long time (especially if it was for an upload added > very > > late, with a long line in front of it) so it feels like that function could > > stall more than what would be acceptable. > > Maybe we wouldn't block on queries when trying to allocate (that might be useful > to limit memory usage though) but I think we'd need it during tear down of the > allocator. Thinking about tear down, is it really needed to wait for queries at that stage? Since the command buffer is going down, there won't be more drawing going through it, so even if we unmap while the uploader is still using it, it won't cause reads of freed memory, as the service won't have unmapped it yet. In other words, with the assumption that buffers still alive at the time of destruction won't be used by anything anywhere, can't we just free and unmap without waiting for queries? 
> > > > > The idea you propose also makes assumption of ordering. Will such assumptions > > always hold, for example if there are two uploader threads (i.e. ordering of > > uploads completed won't necessarily be the same as uploads queued). > > Async upload commands are guaranteed to complete in sequential order. The > compositor is already making assumptions based on this so I think it'd OK to > make the same assumption internally as well. Thinking some more about this, the idea I had was to not spread out async upload logic outside of gles2_implementation.cc. Wouldn't it be better to limit the number of units that need to care about queries? For example maybe limit the exposure to gles2_implementation.cc and buffer_tracker.cc instead of adding more state to lower levels (mapped memory, fenced allocator). Maybe we could add enumeration to internal queries via this helper as you proposed but have the buffer tracker manage removed-but-pending-query buffers instead of the mapped memory manager (i.e. fenced allocator (wrapper))?
On 2014/01/03 14:13:27, jadahl wrote: > https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... > File gpu/command_buffer/client/cmd_buffer_helper.h (right): > > https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... > gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT > CommandBufferHelper { > On 2014/01/02 11:56:44, David Reveman wrote: > > On 2014/01/02 10:59:54, jadahl wrote: > > > On 2014/01/02 01:31:29, David Reveman wrote: > > > > What if we enumerate all internal queries and add API to this interface > that > > > > allows the mapped memory manager to handle FreePendingQuery in a way > that's > > > > consistent with how FreePendingToken currently works? > > > > > > > > I guess we'd need: > > > > void WaitForQuery(int32 query); > > > > int32 last_query_completed() const; > > > > > > > > Wdyt? > > > > > > When would we WaitForQuery? In the current implementation buffers with a > query > > > are stored separately and not free:ed, and checked and free:ed before > creating > > a > > > new buffer (in order to flush out finished buffers before getting a new > one). > > > > > > A query can take a very long time (especially if it was for an upload added > > very > > > late, with a long line in front of it) so it feels like that function could > > > stall more than what would be acceptable. > > > > Maybe we wouldn't block on queries when trying to allocate (that might be > useful > > to limit memory usage though) but I think we'd need it during tear down of the > > allocator. > > Thinking about tear down, is it really needed to wait for queries at that stage? > Since the command buffer is going down, there won't be more drawing going > through it, so even if we unmap while the uploader is still using it, it wont > cause reading free:ed memory as the service will yet to have unmapped. 
> > In other words, with the assumptions that buffers that are still alive at the > time of destruction won't be used by anything anywhere, can't we just free and > unmap without waiting for queries? Maybe but I think we'd have to wait for a token instead then to guarantee that we're not producing any errors. I suggest that we do whatever is easiest for now. Tear down is not really performance critical. > > > > > > > > > The idea you propose also makes assumption of ordering. Will such > assumptions > > > always hold, for example if there are two uploader threads (i.e. ordering of > > > uploads completed won't necessarily be the same as uploads queued). > > > > Async upload commands are guaranteed to complete in sequential order. The > > compositor is already making assumptions based on this so I think it'd OK to > > make the same assumption internally as well. > > Thinking some more about this, the idea I had was to not spread out async upload > logic outside of gles2_implementation.cc. Wouldn't it be better to limit the > number of units that need to care about queries? For example maybe limit the > exposure to gles2_implementation.cc and buffer_tracker.cc instead of adding more > state to lower levels (mapped memory, fenced allocator). > > Maybe we could add enumeration to internal queries via this helper as you > proposed but have the buffer tracker manage removed-but-pending-query buffers > instead of the mapped memory manager (i.e. fenced allocator (wrapper))? I think the query API and the details of how to use it should be limited to gles2_implementation.cc. As far as buffer_tracker.cc and mapped memory manager cares it's just a sequential number, like the tokens we already have. I think it's best to handle all this in a way that's consistent with how tokens are used with buffer objects. It will be harder to understand and control the memory usage if buffers can be temporarily held on to by gles2_implementation.cc.
Thanks for working on this :) A couple of comments inline. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/buffer_tracker.h:115: int32 unused_token_; Does this mean: "The buffer is unused after this token passes"? I mildly prefer "last_usage_token" or something like that. Similarly, I mildly prefer "last_async_query_id". https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:89: bool HasTokenPassed(int32 token); On 2014/01/02 10:59:54, jadahl wrote: > On 2014/01/02 01:31:29, David Reveman wrote: > > why is this needed in addition to last_token_read()? > > No need. Will remove. No idea why I added it herein the first place :P Just FYI, I actually like this (as an inline function though). Even minor reductions in logic make code clearer and less likely to have bugs. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/gles2_implementation.cc:1545: void GLES2Implementation::CheckBuffersPendingAsyncComplete() { Hmm, it's unfortunate that we have to check all of these every time we create a buffer. Uploads are guaranteed to complete in order, but I guess we can't use that assumption to break out of the loop, since there is no guarantee they are deleted in order... I'm not sure how to solve this yet, but I'm hesitant about adding a loop over all pending buffers.
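For illustration, the inline convenience being discussed can be expressed entirely in terms of last_token_read(). This is a hedged sketch, not the actual CommandBufferHelper code; it is written as a free function and ignores token wraparound for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: a token has passed once the service's last read token has
// reached it. In the real helper this would be an inline member that
// compares against last_token_read().
inline bool HasTokenPassed(int32_t token, int32_t last_token_read) {
  return token <= last_token_read;
}
```

Keeping this as a tiny inline helper costs nothing at runtime while removing a repeated comparison from call sites.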
https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/buffer_tracker.h:115: int32 unused_token_; On 2014/01/07 00:47:59, epennerAtGoogle wrote: > Does this mean: "The buffer is unused after this token passes"? I mildly prefer > "last_usage_token" or something like that. Similarly, I mildly prefer > "last_async_query_id". +1 https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/gles2_implementation.cc:1545: void GLES2Implementation::CheckBuffersPendingAsyncComplete() { On 2014/01/07 00:47:59, epennerAtGoogle wrote: > Hmm, it's unfortunate that we have to check all of these every time we create a > buffer. > > Uploads are guaranteed to complete in order, but I guess we can't use that > assumption to break out of the loop, since there is no guarantee they are > deleted in order... I'm not sure how to solve this yet but I'm hesitant of > adding a loop over all pending buffers. you could just keep the list sorted based on query age, which would be the case if we follow the suggestion I made earlier about enumerating queries and handle the release of buffers in the mapped memory manager consistent with how we free based on passed tokens.
https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/buffer_tracker.h:115: int32 unused_token_; On 2014/01/07 06:32:06, David Reveman wrote: > On 2014/01/07 00:47:59, epennerAtGoogle wrote: > > Does this mean: "The buffer is unused after this token passes"? I mildly > prefer > > "last_usage_token" or something like that. Similarly, I mildly prefer > > "last_async_query_id". > > +1 last_usage_token np. last_async_query_id might be confusing since it's the *active* async query. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/gles2_implementation.cc:1545: void GLES2Implementation::CheckBuffersPendingAsyncComplete() { On 2014/01/07 06:32:06, David Reveman wrote: > On 2014/01/07 00:47:59, epennerAtGoogle wrote: > > Hmm, it's unfortunate that we have to check all of these every time we create > a > > buffer. > > > > Uploads are guaranteed to complete in order, but I guess we can't use that > > assumption to break out of the loop, since there is no guarantee they are > > deleted in order... I'm not sure how to solve this yet but I'm hesitant of > > adding a loop over all pending buffers. > > you could just keep the list sorted based on query age, which would be the case > if we follow the suggestion I made earlier about enumerating queries and handle > the release of buffers in the mapped memory manager consistent with how we free > based on passed tokens. The enumeration will improve this situation indeed.
We will want unit tests for the new behavior. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT CommandBufferHelper { On 2014/01/02 01:31:29, David Reveman wrote: > What if we enumerate all internal queries and add API to this interface that > allows the mapped memory manager to handle FreePendingQuery in a way that's > consistent with how FreePendingToken currently works? > > I guess we'd need: > void WaitForQuery(int32 query); > int32 last_query_completed() const; > > Wdyt? I would prefer if GL concepts didn't leak into CommandBufferHelper which is GL-agnostic. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/gles2_implementation.cc:1522: if (helper_->HasTokenPassed(token)) Maybe this logic (if the token is passed, free immediately) makes sense as low as possible, maybe FencedAllocator. That avoids some duplication of logic, and might also hit some other cases.
https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT CommandBufferHelper { On 2014/01/08 05:08:04, piman (OOO back 2014-1-7) wrote: > On 2014/01/02 01:31:29, David Reveman wrote: > > What if we enumerate all internal queries and add API to this interface that > > allows the mapped memory manager to handle FreePendingQuery in a way that's > > consistent with how FreePendingToken currently works? > > > > I guess we'd need: > > void WaitForQuery(int32 query); > > int32 last_query_completed() const; > > > > Wdyt? > > I would prefer if GL concepts didn't leak into CommandBufferHelper which is > GL-agnostic. Makes sense. What if we just call this sequence number an "async command token" or just an "async token"? That's really what it is: a sequence number representing a position in our async command processing pipeline. The fact that GL queries are used by GLImplementation to determine this does not need to be exposed here.
On Tue, Jan 7, 2014 at 9:56 PM, <reveman@chromium.org> wrote: > > https://codereview.chromium.org/116863003/diff/70001/gpu/ > command_buffer/client/cmd_buffer_helper.h > File gpu/command_buffer/client/cmd_buffer_helper.h (right): > > https://codereview.chromium.org/116863003/diff/70001/gpu/ > command_buffer/client/cmd_buffer_helper.h#newcode35 > gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT > CommandBufferHelper { > On 2014/01/08 05:08:04, piman (OOO back 2014-1-7) wrote: > >> On 2014/01/02 01:31:29, David Reveman wrote: >> > What if we enumerate all internal queries and add API to this >> > interface that > >> > allows the mapped memory manager to handle FreePendingQuery in a way >> > that's > >> > consistent with how FreePendingToken currently works? >> > >> > I guess we'd need: >> > void WaitForQuery(int32 query); >> > int32 last_query_completed() const; >> > >> > Wdyt? >> > > I would prefer if GL concepts didn't leak into CommandBufferHelper >> > which is > >> GL-agnostic. >> > > Makes sense. What if we just call this sequence number a "async command > token" or just "async token"? As that's really what it is, a sequence > number representing a position in our async command processing pipeline. > The fact that GL queries are used by GLImplementation to determine this, > does not need to be exposed here. > That's fine in principle, but keep in mind that CommandBuffer doesn't have a notion of what this async token is, and can't update it by itself, so I'm not sure how you would implement WaitForAsyncToken at this level. > > https://codereview.chromium.org/116863003/
On 2014/01/08 06:03:35, piman (OOO back 2014-1-7) wrote: > On Tue, Jan 7, 2014 at 9:56 PM, <mailto:reveman@chromium.org> wrote: > > > > > https://codereview.chromium.org/116863003/diff/70001/gpu/ > > command_buffer/client/cmd_buffer_helper.h > > File gpu/command_buffer/client/cmd_buffer_helper.h (right): > > > > https://codereview.chromium.org/116863003/diff/70001/gpu/ > > command_buffer/client/cmd_buffer_helper.h#newcode35 > > gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT > > CommandBufferHelper { > > On 2014/01/08 05:08:04, piman (OOO back 2014-1-7) wrote: > > > >> On 2014/01/02 01:31:29, David Reveman wrote: > >> > What if we enumerate all internal queries and add API to this > >> > > interface that > > > >> > allows the mapped memory manager to handle FreePendingQuery in a way > >> > > that's > > > >> > consistent with how FreePendingToken currently works? > >> > > >> > I guess we'd need: > >> > void WaitForQuery(int32 query); > >> > int32 last_query_completed() const; > >> > > >> > Wdyt? > >> > > > > I would prefer if GL concepts didn't leak into CommandBufferHelper > >> > > which is > > > >> GL-agnostic. > >> > > > > Makes sense. What if we just call this sequence number a "async command > > token" or just "async token"? As that's really what it is, a sequence > > number representing a position in our async command processing pipeline. > > The fact that GL queries are used by GLImplementation to determine this, > > does not need to be exposed here. > > > > That's fine in principle, but keep in mind that CommandBuffer doesn't have > a notion of what this async token is, and can't update it by itself, so I'm > not sure how you would implement WaitForAsyncToken at this level. > I don't think we should use the term token here, as it doesn't work as the other tokens do. I'd prefer something making the difference obvious, such as naming it serial, or sequence number. Regarding WaitFor(AsyncToken|Serial), do we really need to? 
That we now track async uploads internally doesn't change current usage, where the upload is tracked externally. What is the reason to start waiting for these uploads to complete when we didn't wait for them before? FencedAllocator would then have FREE_PENDING_TOKEN and FREE_PENDING_SERIAL, where a serial is defined as a sequential number that has nothing to do with tokens and round trips, and that should be ignored on tear-down. When the serial is synced, FencedAllocator would change the state to either FREE_PENDING_TOKEN or FREE, depending on whether a token was set that has passed.
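The state transition described above could look roughly like this sketch. The names (Block, BlockState, UpdateBlockState) are illustrative, not the actual FencedAllocator API, and wraparound of tokens/serials is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative block states: a block pending an async serial can become
// FREE directly, or fall back to waiting on its last-usage token.
enum class BlockState { IN_USE, FREE_PENDING_SERIAL, FREE_PENDING_TOKEN, FREE };

struct Block {
  BlockState state;
  int32_t token;   // Last-usage token, valid in FREE_PENDING_* states.
  int32_t serial;  // Async upload serial, valid in FREE_PENDING_SERIAL.
};

// Once the serial has synced, free the block if its token already passed,
// otherwise keep waiting on the token only.
void UpdateBlockState(Block* block,
                      int32_t last_serial_read,
                      int32_t last_token_read) {
  if (block->state == BlockState::FREE_PENDING_SERIAL &&
      block->serial <= last_serial_read) {
    block->state = (block->token <= last_token_read)
                       ? BlockState::FREE
                       : BlockState::FREE_PENDING_TOKEN;
  } else if (block->state == BlockState::FREE_PENDING_TOKEN &&
             block->token <= last_token_read) {
    block->state = BlockState::FREE;
  }
}
```

The key property is that a block is never reused while either its async upload or its last synchronous use could still touch the memory.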
Hi, I uploaded a WIP version of what we have been discussing here. This version introduces a new state to the command buffer, "serial", that is similar to a token except that how it is updated is implementation specific. In the GLES2 command buffer client/service, the serial is used to enumerate internal queries and is synchronized by the service when the query is completed. Only async uploads can use serials for now. Note that this is a WIP and I'm uploading it to get input. There are no unit tests yet, but whenever we have a design that we can agree upon I will add tests. There are some implications that should be discussed (that I can think of now): * Serials are implemented partly in the command buffer, partly by the user. An idea I had here was to introduce something similar to QuerySync, i.e. shared memory that the client can read without going through the command buffer. The serial would then be "set" by GLES2Implementation on the command buffer helper so that the fenced allocator could read the last read serial. * Serials are updated by posting a closure to the correct thread: extra latency. Could this be an issue? Could possibly be fixed by QuerySync-type shared memory. * The fenced allocator doesn't wait for serials to free blocks. I can't see that we do that now, so is it a problem? * Waiting for both a serial and a token isn't supported yet. Should it be? Jonas
https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/buffer_tracker.h:79: void set_used(bool used) { What does used mean exactly? https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/fenced_allocator.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/fenced_allocator.cc:52: CollapseFreeBlock(i); This is probably ok in the current use, if we can assume that the FencedAllocator is destroyed as we delete the SHM it manages allocations on. In that case it would actually be ok to not WaitForToken in the FREE_PENDING_TOKEN case, either. We should document the behavior though. It does mean that the user can't reuse the SHM (e.g. if we keep a pool of open SHMs) without a Finish(). Alternatively, I suppose we could extract the set of serials/tokens to wait for, and let the client deal with waiting - though we don't have a way to wait-for-a-serial, I'm not sure the client could really do anything. Mmh. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/gles2_implementation.cc:1384: } Where does the last_usage_token get set if it's not bound to GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/gles2_implementation.cc:2429: BufferTracker::Buffer* buffer; nit: = NULL https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... 
gpu/command_buffer/client/gles2_implementation.cc:3713: buffer->set_async_query_id(id); What if we do several AsyncTexImage2DCHROMIUM on the same buffer? https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/query_tracker.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.cc:279: return NextSerial(); In case of wraparound we need to Finish(). The rest of the code assumes last_read_serial > |serial| means serial has passed. It would not be true with this code, after wraparound. We do Finish when tokens wrap (see CommandBufferHelper::InsertToken). If we're worried that we might actually wrap in practice and don't want to Finish() in that case, we can do something like double-buffer the range, and only call Finish() if the last_read_serial has the same high bit as the new serial, when that high bit changes. That should not happen in practice (unless we can have 2^31 queries actually in flight). https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... gpu/command_buffer/service/query_manager.cc:74: manager_->decoder()->engine()->set_serial(serial_); So, the rest of the logic assumes serials get set on the CommandBuffer in the same order they are created. Is it currently true? (assumes queries always complete in exactly the same order as they are created). How can we defend that in the future, if someone wants to extend the serial to other queries?
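One reading of the high-bit idea above, sketched as a standalone check (the function names are illustrative, and this is only an interpretation of the suggestion, not code from the patch): when the newly issued serial flips the high bit, a Finish() is only needed if last_read_serial is still a full half-range behind, i.e. it already shares the new serial's high bit from the previous lap.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helpers for the wraparound guard discussed above.
inline bool HighBit(uint32_t v) { return (v & 0x80000000u) != 0; }

// Returns true when issuing |new_serial| would make "<=" comparisons against
// |last_read_serial| ambiguous: the high bit just flipped, yet the reader is
// still in the half-range that the new serial is entering (~2^31 serials
// in flight).
bool NeedsFinishOnWrap(uint32_t prev_serial,
                       uint32_t new_serial,
                       uint32_t last_read_serial) {
  bool high_bit_changed = HighBit(prev_serial) != HighBit(new_serial);
  return high_bit_changed &&
         HighBit(last_read_serial) == HighBit(new_serial);
}
```

As the comment in the thread notes, the dangerous case should never occur in practice, so the Finish() would be a safety net rather than a hot path.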
https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/buffer_tracker.h:79: void set_used(bool used) { On 2014/01/11 02:02:32, piman wrote: > What does used mean exactly? It's meant to keep track of whether a buffer is used, so the user can properly insert a last-usage token. set_bound/bound might be better names, or maybe it can be removed, not sure. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/11 02:02:32, piman wrote: > Where does the last_usage_token get set if it's not bound to > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? Right, should do the same for other buffer targets in BindBufferHelper. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/gles2_implementation.cc:3713: buffer->set_async_query_id(id); On 2014/01/11 02:02:32, piman wrote: > What if we do several AsyncTexImage2DCHROMIUM on the same buffer? That would not work, true. I guess it could work by associating a buffer with a query serial instead, and if a later AsyncTexImage2DCHROMIUM occurred, it'd be updated with the new serial. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... gpu/command_buffer/service/query_manager.cc:74: manager_->decoder()->engine()->set_serial(serial_); On 2014/01/11 02:02:32, piman wrote: > So, the rest of the logic assumes serials get set on the CommandBuffer in the > same order they are created. 
Is it currently true? (assumes queries always > complete in exactly the same order as they are created). How can we defend for > that in the future, if someone wants to extend the serial to other queries? AFAIK uploads complete in order, so queries should as well, and this function should likewise be invoked in order. Usage in parallel with async uploads is problematic and somewhat of a limitation of this approach, as such uses would have to follow the order of uploads.
If we're plumbing this all the way to the service side, would it make more sense to just add a real "async pixel transfer token" rather than using queries? So basically add a "SetAsyncPixelTransferToken" command that is the same as "SetToken" except that it needs to be processed by the async pixel transfer manager in sequence with async pixel transfers. On the client side, we'd simply call InsertAsyncPixelTransferToken() in a way that's consistent with how InsertToken() is used. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/fenced_allocator.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/fenced_allocator.cc:228: block.serial <= last_serial_read) { maybe use a switch statement here and above https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/11 11:35:29, jadahl wrote: > On 2014/01/11 02:02:32, piman wrote: > > Where does the last_usage_token get set if it's not bound to > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > Right, should do the same for other buffer targets in BindBufferHelper. What if we set last usage token when issuing the command that will use the buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of set_used() and the token would be more tightly packed with the usage. Also more consistent with how we handle AsyncTexImage2D. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... gpu/command_buffer/service/query_manager.cc:62: this)); Hm, I think this will be a performance problem. 
We used to post tasks back to the main thread but we switched to writing directly to the shared memory from the transfer thread instead. Two reasons for doing this: 1. The main gpu thread might be busy so it can take a while for it to process the task. 2. And I think even more importantly, posting a task after each transfer results in a lot of context-switching overhead. This is because the transfer thread has lower priority, so posting a task to the main thread will consistently cause the OS to switch to that thread and process the task. I think we need to write directly to some shared memory from here instead of posting a task.
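A minimal sketch of the direct shared-memory approach: the transfer thread publishes the highest completed async serial with a release store, and the client polls it with an acquire load. The struct layout and names here are assumptions for illustration (the eventual patch's AsyncUploadSync may differ), and serial wraparound is ignored.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Sketch of a shared-memory sync point for async upload serials.
struct AsyncUploadSync {
  std::atomic<uint32_t> last_completed_serial{0};

  // Called on the transfer thread after an upload finishes. Release order
  // ensures the upload's memory writes are visible before the serial is.
  void SetCompletedSerial(uint32_t serial) {
    last_completed_serial.store(serial, std::memory_order_release);
  }

  // Called on the client side; acquire order pairs with the release store,
  // so a passed serial implies the upload's writes are visible too.
  bool HasSerialPassed(uint32_t serial) const {
    return last_completed_serial.load(std::memory_order_acquire) >= serial;
  }
};
```

This avoids both problems listed above: no task is queued on the busy gpu thread, and the low-priority transfer thread never forces a context switch just to publish a counter.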
On 2014/01/11 23:39:03, David Reveman wrote: > If we're plumbing this all the way to the service side, would it make more sense > to just add a real "async pixel transfer token" rather than using queries? > > So basically add a "SetAsyncPixelTransferToken" command that is the same as > "SetToken" except that it needs to be processed by the async pixel transfer > manager in sequence with async pixel transfers. On the client side, we'd simply > call InsertAsyncPixelTransferToken() in a way that's consistent with how > InsertToken() is used. Correct me if I'm wrong, but the concern with this was that it'd add GL and query logic to the command buffer. At the same time, this patch already introduces such logic on the service side, except that it doesn't call it anything related to pixels or uploads. Maybe what would be best is to introduce a generic async token API to the command buffer that the user may use in any way, and if multiple enumerated tokens would be used in the future, it'd only be another enum (or array index or something) separating them as far as the command buffer is concerned. > > https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... > File gpu/command_buffer/client/fenced_allocator.cc (right): > > https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... > gpu/command_buffer/client/fenced_allocator.cc:228: block.serial <= > last_serial_read) { > maybe use a switch statement here and above > > https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... > File gpu/command_buffer/client/gles2_implementation.cc (right): > > https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... > gpu/command_buffer/client/gles2_implementation.cc:1384: } > On 2014/01/11 11:35:29, jadahl wrote: > > On 2014/01/11 02:02:32, piman wrote: > > > Where does the last_usage_token get set if it's not bound to > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? 
> > > > Right, should do the same for other buffer targets in BindBufferHelper. > > What if we set last usage token when issuing the command that will use the > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of > set_used() and the token would be more tightly packed with the usage. Also more > consistent with how we handle AsyncTexImage2D. > > https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... > File gpu/command_buffer/service/query_manager.cc (right): > > https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... > gpu/command_buffer/service/query_manager.cc:62: this)); > Hm, I think this will be a performance problem. We used to post tasks back to > the main thread but we switched to writing directly to the shared memory from > the transfer thread instead. Two reason for doing this: > > 1. The main gpu thread might be busy so it can take awhile for it to process the > task. > > 2. And I think even more importantly, posting a task after each transfer results > in a lot of context switching overhead. This is because the transfer thread has > lower priority so posting a task to the main thread will consistently cause the > OS to switch to that thread and process the task. > > I think we need to write directly to some shared memory from here instead of > posting a task.
https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/11 23:39:04, David Reveman wrote: > On 2014/01/11 11:35:29, jadahl wrote: > > On 2014/01/11 02:02:32, piman wrote: > > > Where does the last_usage_token get set if it's not bound to > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > > > Right, should do the same for other buffer targets in BindBufferHelper. > > What if we set last usage token when issuing the command that will use the > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of > set_used() and the token would be more tightly packed with the usage. Also more > consistent with how we handle AsyncTexImage2D. I think this sounds like a good idea. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... gpu/command_buffer/service/query_manager.cc:62: this)); On 2014/01/11 23:39:04, David Reveman wrote: > Hm, I think this will be a performance problem. We used to post tasks back to > the main thread but we switched to writing directly to the shared memory from > the transfer thread instead. Two reason for doing this: > > 1. The main gpu thread might be busy so it can take awhile for it to process the > task. > > 2. And I think even more importantly, posting a task after each transfer results > in a lot of context switching overhead. This is because the transfer thread has > lower priority so posting a task to the main thread will consistently cause the > OS to switch to that thread and process the task. > > I think we need to write directly to some shared memory from here instead of > posting a task. 
This is what I did initially, but changed to posting as I hadn't done it in a thread safe manner. Is there a way to write to the command buffer state safely directly from the upload thread?
On 2014/01/12 09:51:56, jadahl wrote: > On 2014/01/11 23:39:03, David Reveman wrote: > > If we're plumbing this all the way to the service side, would it make more > sense > > to just add a real "async pixel transfer token" rather than using queries? > > > > So basically add a "SetAsyncPixelTransferToken" command that is the same as > > "SetToken" except that it needs to be processed by the async pixel transfer > > manager in sequence with async pixel transfers. On the client side, we'd > simply > > call InsertAsyncPixelTransferToken() in a way that's consistent with how > > InsertToken() is used. > > Correct me if I'm wrong, but the concern with this was that it'd add GL and > query logic to the command buffer. At the same time, this patch already > introduces such logic on the service side, except that it doesn't call it > anything related to pixels or uploads. It would be completely separate from queries and other GL concepts and simply a mechanism for efficiently determining what the latest processed async command is. The only thing we'd introduce at the command buffer level is the notion of commands that are processed asynchronously on a separate thread. I think that'd be OK. piman@, wdyt? > > Maybe what would be best is to introduce a generic async token API to the > command buffer that the user may use in any way, and if multiple enumerated > tokens would be used in the future, it'd only be another enum (or array index or > something) separating them as far as the command buffer is concerned. We'd only need different types of async tokens if all async commands are not guaranteed to be processed in sequence. I don't think we'd want to change that.
https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... gpu/command_buffer/service/query_manager.cc:62: this)); On 2014/01/12 09:52:24, jadahl wrote: > On 2014/01/11 23:39:04, David Reveman wrote: > > Hm, I think this will be a performance problem. We used to post tasks back to > > the main thread but we switched to writing directly to the shared memory from > > the transfer thread instead. Two reason for doing this: > > > > 1. The main gpu thread might be busy so it can take awhile for it to process > the > > task. > > > > 2. And I think even more importantly, posting a task after each transfer > results > > in a lot of context switching overhead. This is because the transfer thread > has > > lower priority so posting a task to the main thread will consistently cause > the > > OS to switch to that thread and process the task. > > > > I think we need to write directly to some shared memory from here instead of > > posting a task. > > This is what I did initially, but changed to posting as I hadn't done it in a > thread safe manner. Is there a way to write to the command buffer state safely > directly from the upload thread? You'll probably need to add a mutex to do that safely. FYI, if we go the async token route then we'd want to handle this at the decoder level instead.
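For reference, the mutex-based variant being discussed could look roughly like the sketch below. TokenState and its methods are illustrative names, not the actual CommandBufferService interface: the upload thread publishes the async token under a lock, and the client-facing getter snapshots it under the same lock.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of the mutex suggestion; TokenState is an
// illustrative stand-in, not the real CommandBufferService API.
class TokenState {
 public:
  // Called from the transfer (upload) thread when an async upload completes.
  void SetAsyncToken(uint32_t token) {
    std::lock_guard<std::mutex> hold(lock_);
    async_token_ = token;
  }

  // Called from the thread that services client GetState requests.
  uint32_t GetAsyncToken() {
    std::lock_guard<std::mutex> hold(lock_);
    return async_token_;
  }

 private:
  std::mutex lock_;  // protects |async_token_| across the two threads
  uint32_t async_token_ = 0;
};
```

As noted above, if the async-token route is taken instead, this synchronization would live at the decoder level rather than in a query callback.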
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/12 09:52:24, jadahl wrote: > On 2014/01/11 23:39:04, David Reveman wrote: > > On 2014/01/11 11:35:29, jadahl wrote: > > > On 2014/01/11 02:02:32, piman wrote: > > > > Where does the last_usage_token get set if it's not bound to > > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > > > > > Right, should do the same for other buffer targets in BindBufferHelper. > > > > What if we set last usage token when issuing the command that will use the > > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of > > set_used() and the token would be more tightly packed with the usage. Also > more > > consistent with how we handle AsyncTexImage2D. > > I think this sounds like a good idea. Hmm, is what you mean to do this for the functions that deal with transfer buffers? i.e. UnmapBufferCHROMIUM, (Compressed)(Sub)TexImage2D? and not DrawArrays, which does not deal with pixel transfer buffers at all.
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/14 14:15:47, jadahl wrote: > On 2014/01/12 09:52:24, jadahl wrote: > > On 2014/01/11 23:39:04, David Reveman wrote: > > > On 2014/01/11 11:35:29, jadahl wrote: > > > > On 2014/01/11 02:02:32, piman wrote: > > > > > Where does the last_usage_token get set if it's not bound to > > > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > > > > > > > Right, should do the same for other buffer targets in BindBufferHelper. > > > > > > What if we set last usage token when issuing the command that will use the > > > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of > > > set_used() and the token would be more tightly packed with the usage. Also > > more > > > consistent with how we handle AsyncTexImage2D. > > > > I think this sounds like a good idea. > > Hmm, is what you mean to do this for the functions that deal with transfer > buffers? i.e. UnmapBufferCHROMIUM, (Compressed)(Sub)TexImage2D? and not > DrawArrays, which does not deal with pixel transfer buffers at all. Not only transfer buffers but all buffer objects so that you can get rid of set_used() completely. So both (Compressed)(Sub)TexImage2D and DrawArrays, as well as any other command that can use a buffer object.
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/14 15:27:20, David Reveman wrote: > On 2014/01/14 14:15:47, jadahl wrote: > > On 2014/01/12 09:52:24, jadahl wrote: > > > On 2014/01/11 23:39:04, David Reveman wrote: > > > > On 2014/01/11 11:35:29, jadahl wrote: > > > > > On 2014/01/11 02:02:32, piman wrote: > > > > > > Where does the last_usage_token get set if it's not bound to > > > > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > > > > > > > > > Right, should do the same for other buffer targets in BindBufferHelper. > > > > > > > > What if we set last usage token when issuing the command that will use the > > > > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of > > > > set_used() and the token would be more tightly packed with the usage. Also > > > more > > > > consistent with how we handle AsyncTexImage2D. > > > > > > I think this sounds like a good idea. > > > > Hmm, is what you mean to do this for the functions that deal with transfer > > buffers? i.e. UnmapBufferCHROMIUM, (Compressed)(Sub)TexImage2D? and not > > DrawArrays, which does not deal with pixel transfer buffers at all. > > Not only transfer buffers but all buffer objects so that you can get rid of > set_used() completely. So both (Compressed)(Sub)TexImage2D and DrawArrays, as > well as any other command that can use a buffer object. Sorry, DrawArrays doesn't use a buffer object as we don't have shared memory VBOs. The general idea is that any command that will use the contents of a buffer object on the service side needs a set_last_usage_token call. ie. BufferSubData in case of VBOs and (Compressed)(Sub)TexImage2D in case of pixel transfer buffers. Think this will also allow us to remove set_transfer_ready_token().
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/14 15:54:26, David Reveman wrote: > On 2014/01/14 15:27:20, David Reveman wrote: > > On 2014/01/14 14:15:47, jadahl wrote: > > > On 2014/01/12 09:52:24, jadahl wrote: > > > > On 2014/01/11 23:39:04, David Reveman wrote: > > > > > On 2014/01/11 11:35:29, jadahl wrote: > > > > > > On 2014/01/11 02:02:32, piman wrote: > > > > > > > Where does the last_usage_token get set if it's not bound to > > > > > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > > > > > > > > > > > Right, should do the same for other buffer targets in > BindBufferHelper. > > > > > > > > > > What if we set last usage token when issuing the command that will use > the > > > > > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of > > > > > set_used() and the token would be more tightly packed with the usage. > Also > > > > more > > > > > consistent with how we handle AsyncTexImage2D. > > > > > > > > I think this sounds like a good idea. > > > > > > Hmm, is what you mean to do this for the functions that deal with transfer > > > buffers? i.e. UnmapBufferCHROMIUM, (Compressed)(Sub)TexImage2D? and not > > > DrawArrays, which does not deal with pixel transfer buffers at all. > > > > Not only transfer buffers but all buffer objects so that you can get rid of > > set_used() completely. So both (Compressed)(Sub)TexImage2D and DrawArrays, as > > well as any other command that can use a buffer object. > > Sorry, DrawArrays doesn't use a buffer object as we don't have shared memory > VBOs. The general idea is that any command that will use the contents of a > buffer object on the service side needs a set_last_usage_token call. ie. 
> BufferSubData in case of VBOs and (Compressed)(Sub)TexImage2D in case of pixel > transfer buffers. I don't see how BufferSubData would be used service side for a BufferTracker::Buffer, as in those cases it'll be a simple client side memcpy. > > Think this will also allow us to remove set_transfer_ready_token().
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/14 16:32:09, jadahl wrote: > On 2014/01/14 15:54:26, David Reveman wrote: > > On 2014/01/14 15:27:20, David Reveman wrote: > > > On 2014/01/14 14:15:47, jadahl wrote: > > > > On 2014/01/12 09:52:24, jadahl wrote: > > > > > On 2014/01/11 23:39:04, David Reveman wrote: > > > > > > On 2014/01/11 11:35:29, jadahl wrote: > > > > > > > On 2014/01/11 02:02:32, piman wrote: > > > > > > > > Where does the last_usage_token get set if it's not bound to > > > > > > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > > > > > > > > > > > > > Right, should do the same for other buffer targets in > > BindBufferHelper. > > > > > > > > > > > > What if we set last usage token when issuing the command that will use > > the > > > > > > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid > of > > > > > > set_used() and the token would be more tightly packed with the usage. > > Also > > > > > more > > > > > > consistent with how we handle AsyncTexImage2D. > > > > > > > > > > I think this sounds like a good idea. > > > > > > > > Hmm, is what you mean to do this for the functions that deal with transfer > > > > buffers? i.e. UnmapBufferCHROMIUM, (Compressed)(Sub)TexImage2D? and not > > > > DrawArrays, which does not deal with pixel transfer buffers at all. > > > > > > Not only transfer buffers but all buffer objects so that you can get rid of > > > set_used() completely. So both (Compressed)(Sub)TexImage2D and DrawArrays, > as > > > well as any other command that can use a buffer object. > > > > Sorry, DrawArrays doesn't use a buffer object as we don't have shared memory > > VBOs. 
The general idea is that any command that will use the contents of a > > buffer object on the service side needs a set_last_usage_token call. ie. > > BufferSubData in case of VBOs and (Compressed)(Sub)TexImage2D in case of pixel > > transfer buffers. > > I don't see how BufferSubData would be used service side for a > BufferTracker::Buffer, as in those cases it'll be a simple client side memcpy. Right. I guess it's really just (Compressed)(Sub)TexImage2D that need a set_last_usage_token.
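As a rough illustration of the scheme converged on here (all names hypothetical): the client encodes the command that consumes the buffer's contents, inserts a token immediately after it, and records that token on the buffer, so the memory is reusable as soon as the token has passed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of tagging a buffer at command-issue time rather
// than on bind/unbind. Buffer and IssueTexImage2D are illustrative
// stand-ins, not the actual BufferTracker/GLES2Implementation code.
struct Buffer {
  int32_t last_usage_token = 0;
  void set_last_usage_token(int32_t token) { last_usage_token = token; }
};

// |insert_token| stands in for CommandBufferHelper::InsertToken().
int32_t IssueTexImage2D(Buffer* buffer, int32_t (*insert_token)()) {
  // ... encode the (Compressed)(Sub)TexImage2D command that reads from
  // |buffer|'s shared memory ...
  int32_t token = insert_token();  // token lands right after the command
  buffer->set_last_usage_token(token);
  return token;
}
```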
Hi, I uploaded another WIP that addresses some of the issues raised in the review (thanks for the review!). What is left is to make a decision if these internal queries should be using their own command ala InsertToken and not go via query_manager.cc at all, and as such not involve OpenGL queries at all, or if we should (somewhat ab)use them for our needs. I have also waited with the mutex based serial updating as its implementation would depend on the decision mentioned above. All that, and tests. https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/fenced_allocator.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:52: CollapseFreeBlock(i); On 2014/01/11 02:02:32, piman wrote: > This is probably ok in the current use, if we can assume that the > FencedAllocator is destroyed as we delete the SHM it manages allocations on. In > that case it would actually be ok to not WaitForToken in the FREE_PENDING_TOKEN > case, either. > > We should document the behavior though. It does mean that the user can't reuse > the SHM (e.g. if we keep a pool of open SHMs) without a Finish(). Alternatively, > I suppose we could extract the set of serials/tokens to wait for, and let the > client deal with waiting - though we don't have a way to wait-for-a-serial, I'm > not sure the client could really do anything. Mmh. Where do you suggest we document this? Here? https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/query_tracker.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/query_tracker.cc:279: return NextSerial(); On 2014/01/11 02:02:32, piman wrote: > In case of wrap around we need to Finish(). > The rest of the code assumes last_read_serial > |serial| means serial has > passed. 
> It would not be true with this code, after wraparound. > > We do Finish when tokens wrap (see CommandBufferHelper::InsertToken). > > If we're worried that we might actually wrap in practice and don't want to > Finish() in that case, we can do something like double-buffer the range, and > only call Finish() if the last_read_serial has the same high bit as the new > serial, when that high bit changes. That should not happen in practice (unless > we can have 2^31 queries actually in flight). Will Finish() really help anything? In the version I just uploaded I simply provide an API via CommandBufferHelper (HasSerialPassed(uint32)) that just checks the bits. This will work if we have less than 2^31 concurrent queries, and if we'd ever have that many I think we have bigger problems already :)
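The wraparound-safe comparison being described can be sketched like this (a guess at the idea, not the actual CommandBufferHelper implementation): compute the difference in unsigned arithmetic and reinterpret it as signed, which gives the correct ordering as long as fewer than 2^31 serials are ever outstanding.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of a wraparound-safe HasSerialPassed. Unsigned
// subtraction wraps modulo 2^32, so reinterpreting the difference as
// signed yields the right answer whenever the two serials are less than
// 2^31 apart.
bool HasSerialPassed(uint32_t last_read_serial, uint32_t serial) {
  return static_cast<int32_t>(last_read_serial - serial) >= 0;
}
```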
IMO, I think it would be significantly cleaner if we avoided the use of queries for this and instead introduced an internal async token. https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/fenced_allocator.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:53: CollapseFreeBlock(i); I don't think it makes sense to free this immediately but wait for a token in the above case. If this doesn't need to wait for a serial then it at least has to wait for a token. I assume that the WaitForToken code above protects against freeing a block before all commands that might reference it have been processed on the service side. Why wouldn't we need the same protection for blocks used with async transfers?
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:3713: buffer->set_async_query_id(id); On 2014/01/11 11:35:29, jadahl wrote: > On 2014/01/11 02:02:32, piman wrote: > > What if we do several AsyncTexImage2DCHROMIUM on the same buffer? > > That would not work, true. I guess it could work by associating a buffer with a > query serial instead, and if a later AsyncTexImage2DCHROMIUM would occur, it'd > be updated with the new serial. We need to fix this, or change the API to prevent multiple AsyncTexImage2DCHROMIUM on the same buffer (and document that), but I think the latter is very limiting. https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/buffer_tracker.h (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/buffer_tracker.h:97: bool used_ : 1; nit: remove unused field. https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/fenced_allocator.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:53: CollapseFreeBlock(i); On 2014/01/16 17:24:49, David Reveman wrote: > I don't think it makes sense free this immediately but wait for a token in the > above case. If this doesn't need to wait for a serial then it at least has to > wait for a token. > > I assume that the WaitForToken code above protects against freeing a block > before all commands that might reference it have been processed on the service > side. 
That is actually not needed to ensure things on the service side, since the commands that refer to the SHM would execute before the IPC to destroy the SHM gets handled (assuming proper order of operations). As I mentioned earlier, If the assumption is that the SHM will be destroyed after this, then there is no synchronization issue. If the SHM will get reused with a fresh new FencedAllocator, then we need some sort of synchronization, yes. > Why wouldn't we need the same protection for blocks used with async > transfers? I agree that being consistent is better. However to be able to "finish", we need to dependency-inject a way to call the right WaitAsyncTexImage2DCHROMIUM that corresponds to the serial. An alternative is to document that it's the responsibility of the client to either destroy the SHM, or ensure synchronization before reuse. https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1381: FreeTransferBuffer(buffer); How does synchronization work wrt asynchronous ReadPixels?
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:3713: buffer->set_async_query_id(id); On 2014/01/16 21:22:50, piman wrote: > On 2014/01/11 11:35:29, jadahl wrote: > > On 2014/01/11 02:02:32, piman wrote: > > > What if we do several AsyncTexImage2DCHROMIUM on the same buffer? > > > > That would not work, true. I guess it could work by associating a buffer with > a > > query serial instead, and if a later AsyncTexImage2DCHROMIUM would occur, it'd > > be updated with the new serial. > > We need to fix this, or change the API to prevent multiple > AsyncTexImage2DCHROMIUM on the same buffer (and document that), but I think the > latter is very limiting. Then should we just introduce a new internal command as discussed, without having to hack gl queries to deal with this? https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/fenced_allocator.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:53: CollapseFreeBlock(i); On 2014/01/16 21:22:50, piman wrote: > On 2014/01/16 17:24:49, David Reveman wrote: > > I don't think it makes sense free this immediately but wait for a token in the > > above case. If this doesn't need to wait for a serial then it at least has to > > wait for a token. > > > > I assume that the WaitForToken code above protects against freeing a block > > before all commands that might reference it have been processed on the service > > side. > > That is actually not needed to ensure things on the service side, since the > commands that refer to the SHM would execute before the IPC to destroy the SHM > gets handled (assuming proper order of operations). 
> > As I mentioned earlier, If the assumption is that the SHM will be destroyed > after this, then there is no synchronization issue. If the SHM will get reused > with a fresh new FencedAllocator, then we need some sort of synchronization, > yes. > > > Why wouldn't we need the same protection for blocks used with async > > transfers? > > I agree that being consistent is better. However to be able to "finish", we need > to dependency-inject a way to call the right WaitAsyncTexImage2DCHROMIUM that > corresponds to the serial. > An alternative is to document that it's the responsibility of the client to > either destroy the SHM, or ensure synchronization before reuse. We could just helper_->Finish() before the loop to make sure all potential tokens have passed; but that wouldn't synchronize the async commands. To do that now we'd need to busy-wait, hoping the uploader didn't stop uploading. If we really don't want to put the responsibility to some layer above, could we put some kind of waitable here and add another flush-serial command that either signals the waitable when done, or when aborted? https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1381: FreeTransferBuffer(buffer); On 2014/01/16 21:22:50, piman wrote: > How does synchronization work wrt asynchronous ReadPixels? Hmm. Good point. Reading how ReadPixels is implemented, it looks like it should be enough to mark last usage after helper_->ReadPixels() when the pack_transfer_buffer is bound, assuming the service side doesn't read async. Do you think that'd be enough?
https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/fenced_allocator.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:53: CollapseFreeBlock(i); On 2014/01/17 08:50:25, jadahl wrote: > On 2014/01/16 21:22:50, piman wrote: > > On 2014/01/16 17:24:49, David Reveman wrote: > > > I don't think it makes sense free this immediately but wait for a token in > the > > > above case. If this doesn't need to wait for a serial then it at least has > to > > > wait for a token. > > > > > > I assume that the WaitForToken code above protects against freeing a block > > > before all commands that might reference it have been processed on the > service > > > side. > > > > That is actually not needed to ensure things on the service side, since the > > commands that refer to the SHM would execute before the IPC to destroy the SHM > > gets handled (assuming proper order of operations). > > > > As I mentioned earlier, If the assumption is that the SHM will be destroyed > > after this, then there is no synchronization issue. If the SHM will get reused > > with a fresh new FencedAllocator, then we need some sort of synchronization, > > yes. > > > > > Why wouldn't we need the same protection for blocks used with async > > > transfers? > > > > I agree that being consistent is better. However to be able to "finish", we > need > > to dependency-inject a way to call the right WaitAsyncTexImage2DCHROMIUM that > > corresponds to the serial. > > An alternative is to document that it's the responsibility of the client to > > either destroy the SHM, or ensure synchronization before reuse. > > We could just helper_->Finish() before the loop to make sure all potential > tokens have passed; but that wouldn't synchronize the async commands. To do that > now we'd need to busy-wait, hoping the uploader didn't stop uploading. 
> > If we really don't want to put the responsibility to some layer above, could we > put some kind of waitable here and add another flush-serial command that either > signals the waitable when done, or when aborted? If we're adding an async-token, then some command that will wait for the token on the service side probably makes sense too. We already have a command that will wait for an async upload to finish so this shouldn't be hard to implement.
Hi,

Another WIP uploaded. In this version I rolled back the usage of "internal queries", leaving them be altogether. Instead, I introduced a common command "SetAsyncToken" that works similarly to "SetToken". It is meant to be command buffer implementation specific, i.e. the GLES2 command buffer will use it for async uploads. I considered adding a SetAsyncUploadToken command internal to the gles2 command buffer, but there is currently no way to add internal functions that don't have a gl* function. We could change the generated functions to start from something other than 256 and have 256+somenumber allocated for internal gles2 command buffer commands, or change the python script to allow generating code without an associated gl* function.

The other problem is thread safety on the service side. I see no guarantees that async upload tasks (i.e. callbacks on that particular message loop) are flushed before GLES2DecoderImpl (and its engine() and CommandBufferService) are destroyed. If this observation is true, then we need to flush before destruction to avoid the callback trying to set the async token on a non-existing service. Or do any of you have a better idea?

The other thread issue is where to put the mutex. For now I have put it in CommandBufferService so that UpdateState, and every other GetState, is synchronous. This would protect the generation int and the state copying, but is it enough?

Lastly, I have not yet added a WaitForAsyncToken or tests, but I'll get to that.
https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/client/buffer_tracker.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/buffer_tracker.h:97: GLuint async_token_; GLuint? "int32 last_async_usage_token_;" instead? https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/client/fenced_allocator.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:56: break; nit: include all enum values here instead of "default:" so failing to update this code when adding a new enum value is caught by the compiler. Also nice to have a NOTREACHED() statement in case "blocks_[i].state" is somehow set to an invalid value. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:162: break; nit: same suggestion here as above. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:250: } nit: and here. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:3672: } IMO, the code is easier to understand without these helper functions. Especially if s/set_async_token/set_last_async_usage_token/ https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/common/command_buffer.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/common/command_buffer.h:52: uint32 async_token; is there a good reason this is unsigned but |token| is signed? 
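The exhaustive-switch suggestion looks like this in miniature (BlockState and CanCollapse are illustrative names, and Chromium's NOTREACHED() is approximated with assert(false)): with every enumerator listed and no "default:", the compiler's -Wswitch warning flags any newly added state the code forgets to handle.

```cpp
#include <cassert>

// Illustrative block states, loosely modeled on FencedAllocator.
enum class BlockState { FREE, IN_USE, FREE_PENDING_TOKEN };

bool CanCollapse(BlockState state) {
  switch (state) {
    case BlockState::FREE:
      return true;
    case BlockState::IN_USE:
    case BlockState::FREE_PENDING_TOKEN:
      return false;
  }
  // Reached only if |state| holds an invalid value; Chromium would use
  // NOTREACHED() here.
  assert(false);
  return false;
}
```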
https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/async_pixel_transfer_manager.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/async_pixel_transfer_manager.h:66: virtual void AsyncRunWhenCompleted(base::Callback<void()> callback) = 0; Can you drop "When" from this somehow? "Async" prefix already indicate that this will run asynchronously sometime in the future. Maybe AsyncRunCompletionCallback()? Also s/base::Callback<void()>/base::Closure/ https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/async_pixel_transfer_manager_egl.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/async_pixel_transfer_manager_egl.cc:751: make_scoped_refptr(observer))); Maybe this should now call ::AsyncRun instead of PostTask directly. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/async_pixel_transfer_manager_idle.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/async_pixel_transfer_manager_idle.cc:309: make_scoped_refptr(observer)))); Would be nice to use AsyncRun here instead of push_back directly. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/async_pixel_transfer_manager_share_group.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/async_pixel_transfer_manager_share_group.h:27: virtual void AsyncRunWhenCompleted(base::Callback<void()> callback) OVERRIDE; where's the implementation of this? https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... 
File gpu/command_buffer/service/command_buffer_service.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/command_buffer_service.cc:47: CommandBufferService::State CommandBufferService::GetStateImpl() { Please add a lock_.AssertAcquired() call here to protect against this being called without first acquiring |lock_|. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/command_buffer_service.cc:187: async_token_ = token; Do you need to acquire the lock before writing to |async_token_|? https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/command_buffer_service.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/command_buffer_service.h:72: State GetStateImpl(); Can you adjust the name of this so it's clear |lock_| must be acquired before calling this function? Maybe GetStateImplWithLockAcquired(). https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/command_buffer_service.h:90: base::Lock lock_; Would be nice with a comment here describing what this lock is protecting. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/gles2_cmd_decoder.cc:680: void AsyncUploadCompleted(uint32 async_token); This is not really a completed upload, it's the passing of a token. s/AsyncUploadCompleted/OnAsyncTokenPassed/ ?
I'm not a big fan of this approach:
- adding a GL-specific command in the generic code (though hiding it behind a generic name)... I don't see how it can scale to multiple asynchronous processes we might have (e.g. async readbacks), since there's only a single token.
- adding a lock on the command buffer state. Some implementations don't even have threads, and we're punishing all the code that doesn't even use async uploads.
- it seems redundant with the existing query mechanism.

Maybe what we could do to implement this entirely in the GL layer is:
- give an extra buffer shm/offset pair to AsyncTexImage2DCHROMIUM/AsyncTexSubImage2DCHROMIUM, which would just be a single atomic word (similar to queries), that would be initialized to 0 on the client side, and atomically set to 1 on completion on the service side (maybe use QuerySync for that and reuse the QuerySyncManager?). Similar to the previous version with queries, but skipping the query namespace and stuff. The thread can directly write to the QuerySync to signal completion.
- GLES2Implementation keeps track of which buffer is associated with which such completion token, and does not free the buffer until it's passed.

WDYT?
https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/async_pixel_transfer_manager.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/async_pixel_transfer_manager.h:66: virtual void AsyncRunWhenCompleted(base::Callback<void()> callback) = 0; This is very similar to AsyncNotifyCompletion both in how it works and what it does in this instance. Could merge this somehow? Some thoughts: - Could we use some kind of token API rather than queries in the client and ditch the queries? - Could the queries just record the token in the client and ditch the service side stuff? Nit: use 'base::Closure'
https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/client/cmd_buffer_helper.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/cmd_buffer_helper.cc:195: uint32 CommandBufferHelper::InsertAsyncToken() { Perhaps async token is meant to be more generic, but if it was 'async_upload_token', could we just use the async upload commands rather than have a new command? If it is meant to be generic for other 'async things', it feels like we need an indicator of what 'async thing' we are talking about, to be clear about who owns the async_token and is allowed to change it. Perhaps there could be an enum of async-token-types which currently only has ASYNC_TRANSFER? https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/cmd_buffer_helper.h:108: // buffer does mean it has passed. Nit: 'doesn't' https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/command_buffer_service.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/command_buffer_service.h:90: base::Lock lock_; I'm agreeing with Piman that a lock_ doesn't seem very nice here. It's primarily just an ownership and/or lifetime thing right? Could clear ownership/lifetime of the async-token relative to the transfer thread solve this? That might not be trivial but I think it can be done without locks.
On 2014/01/22 21:53:58, piman wrote:
> I'm not a big fan of this approach:
> - adding a GL-specific command in the generic code (though hiding it behind a
> generic name)... I don't see how it can scale to multiple asynchronous processes
> we might have (e.g. async readbacks), since there's only a single token.
> - adding a lock on the command buffer state. Some implementations don't even
> have threads, and we're punishing all the code that doesn't even use async
> uploads.
> - it seems redundant with the existing query mechanism.
>
> Maybe what we could do to implement this entirely in the GL layer is:
> - give an extra buffer shm/offset pair to
> AsyncTexImage2DCHROMIUM/AsyncTexSubImage2DCHROMIUM, which would just be a single
> atomic word (similar to queries), that would be initialized to 0 on the client
> side, and atomically set to 1 on completion on the service side (maybe use
> QuerySync for that and reuse the QuerySyncManager?). Similar the previous
> version with queries, but skipping the query namespace and stuff. The thread can
> directly write to the QuerySync to signal completion.
> - GLES2Implementation keeps track of which buffer is associated with which such
> completion token, and not free the buffer until it's passed.
>
> WDYT?

The thing is that we need a generic way for the FencedAllocator to check if some async token has passed, in order to allow us to keep it in control of its allocations. I.e. we need some "HasAsyncTokenPassed" and probably "WaitForAsyncToken" in CommandBufferHelper. Without making FencedAllocator (and its wrappers, MappedMemoryManager and BufferTracker) GLES2 aware, we still need a generic API for it to use. That being said, we don't need the other generic half of the API (SetAsyncToken command, OnSetAsyncToken, etc.) and we could replace that with something else (similar to how it was before this last upload).
On 2014/01/23 02:52:42, epenner wrote:
> https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff...
> File gpu/command_buffer/client/cmd_buffer_helper.cc (right):
>
> https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff...
> gpu/command_buffer/client/cmd_buffer_helper.cc:195: uint32
> CommandBufferHelper::InsertAsyncToken() {
> Perhaps async token is meant to be more generic, but if it was
> 'async_upload_token', could we just use the async upload commands rather than
> have a new command?

My first plan for this last version was to make the SetAsyncToken command actually be a SetAsyncUploadToken command, but there was currently no way of adding internal commands without having a corresponding gl* function, so I took the easy path in order to get input on the general approach (using new commands).

>
> If it is meant to be generic for other 'async things', it feels like we need an
> indicator of what 'async thing' we are talking about, to be clear about who owns
> the async_token and is allowed to change it. Perhaps there could be an enum of
> async-token-types which currently only has ASYNC_TRANSFER?

I suggested this (an enum-based async token) before as well. If we'd go for the generic SetAsyncToken approach, I think this would be better.

>
> https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff...
> File gpu/command_buffer/client/cmd_buffer_helper.h (right):
>
> https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff...
> gpu/command_buffer/client/cmd_buffer_helper.h:108: // buffer does mean it has
> passed.
> Nit: 'doesn't'
>
> https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff...
> File gpu/command_buffer/service/command_buffer_service.h (right):
>
> https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff...
> gpu/command_buffer/service/command_buffer_service.h:90: base::Lock lock_;
> I'm agreeing with Piman that a lock_ doesn't seem very nice here. It's primarily
> just an ownership and/or lifetime thing right? Could clear ownership/lifetime of
> the async-token relative to the transfer thread solve this? That might not be
> trivial but I think it can be done without locks.

GetState() doesn't only get a state, it also increases an int, and thus needs to be protected. The lock was also put there to protect writes to the shared state, which could now happen from different threads. If *only* GLES2DecoderImpl interacted with the CommandBufferService, we could probably add locks there instead, but that doesn't feel right either.
https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/client/buffer_tracker.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/buffer_tracker.h:97: GLuint async_token_;
On 2014/01/22 17:30:04, David Reveman wrote:
> GLuint? "int32 last_async_usage_token_;" instead?

According to what I read in the code, negative tokens are used for reporting errors. Async tokens do not deal with errors as things stand, and thus don't need to be signed. It should be uint32 instead of GLuint, though. GLuint is a left-over from when |async_token_| was |async_query_id_|.
So there seem to be several options for going forward with this change. I will list them so that it's easier to compare them.

client -> service
=================

1. Generic commands (with or without identifying enums), as done in the most recent upload
2. Specific commands (e.g. SetAsyncUploadToken), similar to the most recent upload, except the command is not generic
3. Extra parameter(s) to AsyncTex(Sub)Image2DCHROMIUM

service -> client
=================

1. CommandBuffer::State, similar to the most recent upload. Tricky because it will need to deal with multiple threads updating it.
2. Separate shared memory state ala QuerySync. Easier access model (only the upload thread). The client needs to keep track of more state and update the "last async token" on the CommandBufferHelper.
3. Reuse QuerySync and QueryManager, but don't use the Begin/EndQueryEXT commands.

client -> fenced allocator
==========================

1. I think the only solution that has been discussed recently is the HasAsyncTokenPassed() way.

Regarding client -> service, I think it's not a hard problem; we just need to agree on which way is better.

For service -> client, I'm leaning towards 2, as it has a simpler access model, and tracking the state client side is probably easier than dealing with the various threads on the service side. With 3, I think there is a risk of having to deal with query details for something that is not a query. As detecting "completion" is more or less an invoked closure posted to a message loop, there is not much to gain from reusing it.

What do people think about this?
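For the service -> client options built on a shared completion token, the HasAsyncTokenPassed() check mentioned above could look roughly like this sketch. The function name follows the discussion; the wraparound-safe comparison is an assumption modeled on how monotonically increasing command buffer tokens are commonly compared:

```cpp
#include <cstdint>

// The service publishes the last completed async upload token through a
// shared-memory word; the client asks whether a given token has passed.
// The signed subtraction treats tokens as a circular sequence, so the
// check stays correct across uint32 wraparound.
inline bool HasAsyncTokenPassed(uint32_t last_completed_token,
                                uint32_t token) {
  // token has passed iff last_completed_token >= token, modulo 2^32.
  return static_cast<int32_t>(last_completed_token - token) >= 0;
}
```

A WaitForAsyncToken would then just poll this predicate (yielding or pumping the message loop between polls) until it returns true.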
This is just regarding CommandBuffer::State. I'll defer to the other reviewers for the other bits as I agree those all sound reasonable. The lock was used for CommandBuffer::State which is shared memory right? Could part of the state (the async_token) be accessed only by the transfer thread and not touched by the GPU thread? It looked like a method that we wanted to use on the transfer thread changed part of the state that is shared with the GPU thread. Could that bit be duplicated so GPU/transfer threads have their own state that is exclusive to the thread that touches it?
On Thu, Jan 23, 2014 at 2:49 AM, <jadahl@opera.com> wrote: > So there seems to be several options of going forward regarding this > change. I > will list them, so that its easier to compare them. > > client -> service > ================= > > 1. Generic commands (with or without identifying enum's), as done in most > recent > upload > The problem with the current way is that it doesn't scale with other types of asynchronous processes. > 2. Specific commands (e.g. SetAsyncUploadToken), similar to most recent > upload, > except the command is not generic > That may be workable, but needs to be at the GLES2 level. > 3. Extra parameter(s) to AsyncTex(Sub)Image2DCHROMIUM > That's currently my preference. > > service -> client > ================= > > 1. CommandBuffer::State, similar to most recent upload. Tricky because it > will > need to deal with multiple threads updating it. > There's also the (tricky) protocol to access it between the service and the client. For the other reasons mentioned above, it doesn't scale either and/or adds GL concepts at that level which is a layering violation. > > 2. Separate shared memory state ala QuerySync. Easier access model (only > upload > thread). Client needs to keep track of more state, and update "last async > token" > on the CommandBufferHelper. > > 3. Reuse QuerySync and QueryManager, but don't use Begin/EndQueryEXT > commands. > 2 or 3 are fine with me > > client -> fenced allocator > ========================== > > 1. I think the only solution that has been in discussion recently is the > HasAsyncTokenPassed() way. > Really, the problem is essentially that you want to be able to poll the state when allocating, to more aggressively reuse. We can add a generic concept where the FenceAllocator client can register a Poll function, that would be called by the FenceAllocator when it wants to try to get more memory. The responsibility of the Poll function would be to free any buffer that it can. 
Then all the state can stay in the GLES2Implementation (that would implement the Poll function, keeping track of the buffers and associated queries/tokens). > > > I think regarding client -> service, it's not a hard problem, and just > need to > agree on what is the better way. > > service -> client, I'm leaning towards being in favor of 2, as it has a > simpler > access model, and tracking the state client side is probably easier than > dealing > with the various threads on the service side. 3 I think there is a risk of > having to deal with query details for something that is not a query. As > detecting "completion" is more or less an invoked closure posted to a > message > loop, there is not much to win with reusing. > > What do people think about this? > > https://chromiumcodereview.appspot.com/116863003/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On Fri, Jan 24, 2014 at 4:50 PM, <epenner@chromium.org> wrote:
> This is just regarding CommandBuffer::State. I'll defer to the other
> reviewers
> for the other bits as I agree those all sound reasonable.
>
> The lock was used for CommandBuffer::State which is shared memory right?
> Could
> part of the state (the async_token) be accessed only by the transfer
> thread and
> not touched by the GPU thread? It looked like a method that we wanted to
> use on
> the transfer thread changed part of the state that is shared with the GPU
> thread. Could that bit be duplicated so GPU/transfer threads have their own
> state that is exclusive to the thread that touches it?
>

The way the State is communicated with the client is tricky - see gpu/command_buffer/common/command_buffer_shared.h
I'm super uncomfortable trying to reason about the atomicity of updates if multiple threads are accessing the structure.
> The way the State is communicated with the client is tricky - see
> gpu/command_buffer/common/command_buffer_shared.h
> I'm super uncomfortable trying to reason about the atomicity of updates if
> multiple threads are accessing the structure.

Thanks, I knew there must be some gnarly memory barrier code somewhere. We shouldn't change individual bits of that structure, but could we duplicate the entire thing, perhaps? That seems like effectively what 2/3 are doing, but using queries instead. The nice thing about a token IMO is that there is just one, instead of one query per operation. I don't have a strong preference either way though, just throwing out possibilities.
On 2014/01/25 01:03:51, piman wrote: > On Thu, Jan 23, 2014 at 2:49 AM, <mailto:jadahl@opera.com> wrote: > > > So there seems to be several options of going forward regarding this > > change. I > > will list them, so that its easier to compare them. > > > > client -> service > > ================= > > > > 1. Generic commands (with or without identifying enum's), as done in most > > recent > > upload > > > > The problem with the current way is that it doesn't scale with other types > of asynchronous processes. > > > > 2. Specific commands (e.g. SetAsyncUploadToken), similar to most recent > > upload, > > except the command is not generic > > > > That may be workable, but needs to be at the GLES2 level. > > > > 3. Extra parameter(s) to AsyncTex(Sub)Image2DCHROMIUM > > > > That's currently my preference. > > > > > > service -> client > > ================= > > > > 1. CommandBuffer::State, similar to most recent upload. Tricky because it > > will > > need to deal with multiple threads updating it. > > > > There's also the (tricky) protocol to access it between the service and the > client. > For the other reasons mentioned above, it doesn't scale either and/or adds > GL concepts at that level which is a layering violation. > > > > > > 2. Separate shared memory state ala QuerySync. Easier access model (only > > upload > > thread). Client needs to keep track of more state, and update "last async > > token" > > on the CommandBufferHelper. > > > > 3. Reuse QuerySync and QueryManager, but don't use Begin/EndQueryEXT > > commands. > > > > 2 or 3 are fine with me > > > > > > client -> fenced allocator > > ========================== > > > > 1. I think the only solution that has been in discussion recently is the > > HasAsyncTokenPassed() way. > > > > Really, the problem is essentially that you want to be able to poll the > state when allocating, to more aggressively reuse. 
> We can add a generic concept where the FenceAllocator client can register a
> Poll function, that would be called by the FenceAllocator when it wants to
> try to get more memory. The responsibility of the Poll function would be to
> free any buffer that it can.
> Then all the state can stay in the GLES2Implementation (that would
> implement the Poll function, keeping track of the buffers and associated
> queries/tokens).

This sounds doable, and would get rid of exposing a GLES2 implementation detail through the command buffer layer.

However, what should we do with non-freed memory on destruction? We'd need some extra Wait next to the Poll that would do something equivalent to WaitForAsyncToken, i.e. poll until all its buffers have their associated async upload token passed. Or should the destruction of the fenced allocator just do a Poll in a busy loop until all memory is freed?

>
> > I think regarding client -> service, it's not a hard problem, and just
> > need to
> > agree on what is the better way.
> >
> > service -> client, I'm leaning towards being in favor of 2, as it has a
> > simpler
> > access model, and tracking the state client side is probably easier than
> > dealing
> > with the various threads on the service side. 3 I think there is a risk of
> > having to deal with query details for something that is not a query. As
> > detecting "completion" is more or less an invoked closure posted to a
> > message
> > loop, there is not much to win with reusing.
> >
> > What do people think about this?
> >
> > https://chromiumcodereview.appspot.com/116863003/
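The registered-Poll idea being discussed could be sketched like this; `PollingAllocator` and its members are illustrative names under assumed semantics, not the actual FencedAllocator API. On destruction, the owner could drain pending memory by repeating the poll until everything is returned, which is the busy-loop question raised above:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// The allocator is handed a callback at construction and invokes it when
// an allocation would otherwise fail, giving the client (here, the role
// GLES2Implementation would play) a chance to free any buffers whose
// async tokens have passed.
class PollingAllocator {
 public:
  using PollCallback = std::function<void()>;

  PollingAllocator(size_t size, PollCallback poll)
      : free_bytes_(size), poll_(std::move(poll)) {}

  // Returns true on success. Under memory pressure, polls once so the
  // client can return reusable memory, then retries.
  bool Alloc(size_t bytes) {
    if (free_bytes_ < bytes && poll_)
      poll_();  // Client frees completed buffers back to us here.
    if (free_bytes_ < bytes)
      return false;
    free_bytes_ -= bytes;
    return true;
  }

  void Free(size_t bytes) { free_bytes_ += bytes; }

 private:
  size_t free_bytes_;
  PollCallback poll_;
};
```

The point of the design is that the allocator stays GLES2-agnostic: all token tracking lives behind the callback, in the client.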
As a conclusion, the current implementation plan is:

* GLES2Implementation mmaps an AsyncUploadSync struct that has a uint32 async_upload_token (not one per request, one per GLES2Implementation instance).

* The AsyncTex(Sub)Image2D commands are extended with 3 new arguments: uint32 async_upload_token, void *sync_data (i.e. shm id and offset)
  - may be 0, 0, 0 for requests with no data.

* GLES2CmdDecoderImpl queues closures that write the associated async_upload_token to the mmapped AsyncUploadSync (writing *only* on the thread used for uploading).

* GLES2Implementation keeps track of released transfer buffer memory (detached from the BufferTracker, still managed by the MappedMemoryManager) pending a completed async upload.
  - Slightly similar to a previous version, where an issue was raised regarding the unmanaging of buffers.

* Add a generic Poll hook to mapped_memory and (wrapped_)fenced_allocator.

* Hook up GLES2Implementation to the Poll hook; on each Poll, read the last completed async_upload_token from the AsyncUploadSync and free mapped memory accordingly.

This means:
FREE_PENDING_ASYNC_TOKEN is dropped completely.
No more write-from-multiple-threads to generic CommandBuffer state.
No more single-use generic-sounding "tokens" in the generic CommandBuffer layer.

Please raise any issues you can think of regarding this most recent approach.
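Under the assumptions of this plan, the shared AsyncUploadSync and the client-side "free pending token" bookkeeping might look like the following sketch. `std::atomic` stands in for the mmapped shared memory, and the surrounding class and function names are illustrative, not the actual CL:

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <vector>

// One per GLES2Implementation, shared with the service; the struct name
// and member follow the plan above.
struct AsyncUploadSync {
  std::atomic<uint32_t> async_upload_token{0};
};

// Service side: runs on the upload thread as each transfer completes.
// Tokens are issued in increasing order, so the last store is the
// highest completed token.
void SignalUploadComplete(AsyncUploadSync* sync, uint32_t token) {
  sync->async_upload_token.store(token, std::memory_order_release);
}

// Client side: buffers detached from the BufferTracker, keyed by the
// async upload token they wait on.
class AsyncUploadGarbage {
 public:
  void AddPending(uint32_t token, int shm_id) {
    pending_.emplace(token, shm_id);
  }

  // The Poll hook body: collect every buffer whose token has completed
  // (the caller would hand these back to the MappedMemoryManager).
  std::vector<int> PollAndCollectFreeable(const AsyncUploadSync& sync) {
    uint32_t done = sync.async_upload_token.load(std::memory_order_acquire);
    std::vector<int> freeable;
    for (auto it = pending_.begin(); it != pending_.end();) {
      if (it->first <= done) {
        freeable.push_back(it->second);
        it = pending_.erase(it);
      } else {
        ++it;
      }
    }
    return freeable;
  }

 private:
  std::multimap<uint32_t, int> pending_;
};
```

This keeps all writes to the shared word on the upload thread and all frees on the client, matching the "no write-from-multiple-threads" goal; a production version would also need the wraparound-safe token comparison rather than a plain `<=`.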
On 2014/02/03 13:31:10, jadahl wrote: > On 2014/01/25 01:03:51, piman wrote: > > On Thu, Jan 23, 2014 at 2:49 AM, <mailto:jadahl@opera.com> wrote: > > > > > So there seems to be several options of going forward regarding this > > > change. I > > > will list them, so that its easier to compare them. > > > > > > client -> service > > > ================= > > > > > > 1. Generic commands (with or without identifying enum's), as done in most > > > recent > > > upload > > > > > > > The problem with the current way is that it doesn't scale with other types > > of asynchronous processes. > > > > > > > 2. Specific commands (e.g. SetAsyncUploadToken), similar to most recent > > > upload, > > > except the command is not generic > > > > > > > That may be workable, but needs to be at the GLES2 level. > > > > > > > 3. Extra parameter(s) to AsyncTex(Sub)Image2DCHROMIUM > > > > > > > That's currently my preference. > > > > > > > > > > service -> client > > > ================= > > > > > > 1. CommandBuffer::State, similar to most recent upload. Tricky because it > > > will > > > need to deal with multiple threads updating it. > > > > > > > There's also the (tricky) protocol to access it between the service and the > > client. > > For the other reasons mentioned above, it doesn't scale either and/or adds > > GL concepts at that level which is a layering violation. > > > > > > > > > > 2. Separate shared memory state ala QuerySync. Easier access model (only > > > upload > > > thread). Client needs to keep track of more state, and update "last async > > > token" > > > on the CommandBufferHelper. > > > > > > 3. Reuse QuerySync and QueryManager, but don't use Begin/EndQueryEXT > > > commands. > > > > > > > 2 or 3 are fine with me > > > > > > > > > > client -> fenced allocator > > > ========================== > > > > > > 1. I think the only solution that has been in discussion recently is the > > > HasAsyncTokenPassed() way. 
> > > > > > > Really, the problem is essentially that you want to be able to poll the > > state when allocating, to more aggressively reuse. > > We can add a generic concept where the FenceAllocator client can register a > > Poll function, that would be called by the FenceAllocator when it wants to > > try to get more memory. The responsibility of the Poll function would be to > > free any buffer that it can. > > Then all the state can stay in the GLES2Implementation (that would > > implement the Poll function, keeping track of the buffers and associated > > queries/tokens). > > This sounds do-able, and would get rid of exposing a GLES2 implementation detail > through the command buffer layer. > > However, what should we do with non-freed memory on destruction. We'd need some > extra Wait next to the Poll that would do something equivalent with > WaitForAsyncToken, i.e. poll until all its buffers have their associated async > upload token passed. Or should the destruction of fenced allocator just do a > Poll in a busy-loop until all memory is freed? > > > > > > > > > > > > > > I think regarding client -> service, it's not a hard problem, and just > > > need to > > > agree on what is the better way. > > > > > > service -> client, I'm leaning towards being in favor of 2, as it has a > > > simpler > > > access model, and tracking the state client side is probably easier than > > > dealing > > > with the various threads on the service side. 3 I think there is a risk of > > > having to deal with query details for something that is not a query. As > > > detecting "completion" is more or less an invoked closure posted to a > > > message > > > loop, there is not much to win with reusing. > > > > > > What do people think about this? > > > > > > https://chromiumcodereview.appspot.com/116863003/ > > > > > > > To unsubscribe from this group and stop receiving emails from it, send an > email > > to mailto:chromium-reviews+unsubscribe@chromium.org. 
>
> As conclusion current implementation plan is:
>
> * GLES2Implementation mmaps a AsyncUpladSync struct that has an uint32
> async_upload_token (not one per request, one per GLES2Implementation instance)
>
> * AsyncTex(Sub)Image2D commands are extended with 3 new arguments: uin32
> async_upload_token, void *sync_data (i.e. shm id and offset)
> - may be 0, 0, 0 for requests with no data.
>
> * GLES2CmdDecoderImpl queues closures that writes the associated
> async_upload_token to the mmap:ed AsyncUploadSync (writing *only* on the thread
> used for uploading)
>
> * GLES2Implementation keeps track of released transfer buffer memory (detached
> from BufferTracker, still managed by MappedMemoryManager) pending completed
> async upload
> - Slightly similar to a previous version, where an issue was raised regarding
> the unmanaging of buffers
>
> * Add generic Poll hook to mapped_memory and (wrapped_)fenced_allocator

Rethinking this, there is no need to have a Poll hook in the FencedAllocator, simply because we can just do that check before allocating a transfer buffer. This would still mean that we'd need to "unmanage" buffer data by removing it from the BufferTracker without freeing it, or freeing it pending a token, from the MappedMemoryManager. As I mentioned, this is a change that was reverted before, on the grounds that it's easier to manage memory if the BufferTracker manages all the memory that it ever owned. Any opinions regarding this?

>
> * Hook up GLES2Implementation to Poll hook, each Poll, read last completed
> async_upload_token from AsyncUploadSync, Free mapped memory accordingly
>
>
> This means:
> FREE_PENDING_ASYNC_TOKEN is dropped completely.
> No more write-from-multiple-threads to generic CommandBuffer state.
> No more single-usage generic-sounding "tokens" in generic CommandBuffer layer.
>
> Please raise any issues you can think of regarding this most recent approach.
On 2014/02/03 14:38:47, jadahl wrote: > On 2014/02/03 13:31:10, jadahl wrote: > > On 2014/01/25 01:03:51, piman wrote: > > > On Thu, Jan 23, 2014 at 2:49 AM, <mailto:jadahl@opera.com> wrote: > > > > > > > So there seems to be several options of going forward regarding this > > > > change. I > > > > will list them, so that its easier to compare them. > > > > > > > > client -> service > > > > ================= > > > > > > > > 1. Generic commands (with or without identifying enum's), as done in most > > > > recent > > > > upload > > > > > > > > > > The problem with the current way is that it doesn't scale with other types > > > of asynchronous processes. > > > > > > > > > > 2. Specific commands (e.g. SetAsyncUploadToken), similar to most recent > > > > upload, > > > > except the command is not generic > > > > > > > > > > That may be workable, but needs to be at the GLES2 level. > > > > > > > > > > 3. Extra parameter(s) to AsyncTex(Sub)Image2DCHROMIUM > > > > > > > > > > That's currently my preference. > > > > > > > > > > > > > > service -> client > > > > ================= > > > > > > > > 1. CommandBuffer::State, similar to most recent upload. Tricky because it > > > > will > > > > need to deal with multiple threads updating it. > > > > > > > > > > There's also the (tricky) protocol to access it between the service and the > > > client. > > > For the other reasons mentioned above, it doesn't scale either and/or adds > > > GL concepts at that level which is a layering violation. > > > > > > > > > > > > > > 2. Separate shared memory state ala QuerySync. Easier access model (only > > > > upload > > > > thread). Client needs to keep track of more state, and update "last async > > > > token" > > > > on the CommandBufferHelper. > > > > > > > > 3. Reuse QuerySync and QueryManager, but don't use Begin/EndQueryEXT > > > > commands. 
> > > > > > > > > > 2 or 3 are fine with me > > > > > > > > > > > > > > client -> fenced allocator > > > > ========================== > > > > > > > > 1. I think the only solution that has been in discussion recently is the > > > > HasAsyncTokenPassed() way. > > > > > > > > > > Really, the problem is essentially that you want to be able to poll the > > > state when allocating, to more aggressively reuse. > > > We can add a generic concept where the FenceAllocator client can register a > > > Poll function, that would be called by the FenceAllocator when it wants to > > > try to get more memory. The responsibility of the Poll function would be to > > > free any buffer that it can. > > > Then all the state can stay in the GLES2Implementation (that would > > > implement the Poll function, keeping track of the buffers and associated > > > queries/tokens). > > > > This sounds do-able, and would get rid of exposing a GLES2 implementation > detail > > through the command buffer layer. > > > > However, what should we do with non-freed memory on destruction. We'd need > some > > extra Wait next to the Poll that would do something equivalent with > > WaitForAsyncToken, i.e. poll until all its buffers have their associated async > > upload token passed. Or should the destruction of fenced allocator just do a > > Poll in a busy-loop until all memory is freed? > > > > > > > > > > > > > > > > > > > > I think regarding client -> service, it's not a hard problem, and just > > > > need to > > > > agree on what is the better way. > > > > > > > > service -> client, I'm leaning towards being in favor of 2, as it has a > > > > simpler > > > > access model, and tracking the state client side is probably easier than > > > > dealing > > > > with the various threads on the service side. 3 I think there is a risk of > > > > having to deal with query details for something that is not a query. 
As > > > > detecting "completion" is more or less an invoked closure posted to a > > > > message > > > > loop, there is not much to win with reusing. > > > > > > > > What do people think about this? > > > > > > > > https://chromiumcodereview.appspot.com/116863003/ > > > > > > > > > > To unsubscribe from this group and stop receiving emails from it, send an > > email > > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > > > As conclusion current implementation plan is: > > > > * GLES2Implementation mmaps a AsyncUpladSync struct that has an uint32 > > async_upload_token (not one per request, one per GLES2Implementation instance) > > > > * AsyncTex(Sub)Image2D commands are extended with 3 new arguments: uin32 > > async_upload_token, void *sync_data (i.e. shm id and offset) > > - may be 0, 0, 0 for requests with no data. > > > > * GLES2CmdDecoderImpl queues closures that writes the associated > > async_upload_token to the mmap:ed AsyncUploadSync (writing *only* on the > thread > > used for uploading) > > > > * GLES2Implementation keeps track of released transfer buffer memory (detached > > from BufferTracker, still managed by MappedMemoryManager) pending completed > > async upload > > - Slightly similar to a previous version, where an issue was raised > regarding > > the unmanaging of buffers > > > > * Add generic Poll hook to mapped_memory and (wrapped_)fenced_allocator > > Rethinking this, there is no need to have a Poll hook in the FencedAllocator > simply because we can just do that check before allocating a transfer buffer. > This would still mean that we'd need to "unmanage" buffer data by removing it > from the BufferTracker without free:ing or free:ing pending token it from the > MappedMemoryManager. As I mentioned, this was a change that was reverted before > with the reason that its easier to manage memory if the BufferTracker manages > all memory that it ever owned. Any opinions regarding this? 
Well, polling only before allocating a buffer would not be enough; we'd need to poll before allocating anything from the mapped memory manager, so maybe it's better anyway to have a hook instead of trying to cover all the cases. Sorry for the noise :P > > > > > * Hook up GLES2Implementation to Poll hook, each Poll, read last completed > > async_upload_token from AsyncUploadSync, Free mapped memory accordingly > > > > > > This means: > > FREE_PENDING_ASYNC_TOKEN is dropped completely. > > No more write-from-multiple-threads to generic CommandBuffer state. > > No more single-usage generic-sounding "tokens" in generic CommandBuffer layer. > > > > Please raise any issues you can think of regarding this most recent approach.
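A minimal sketch of what the Poll hook discussed above could look like. The names (SimpleFencedAllocator, PollCallback, a single byte counter) are illustrative stand-ins, not the real FencedAllocator API: the point is only that the allocator invokes a client-registered callback before failing an allocation, so the client can reclaim buffers whose async upload token has passed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Illustrative sketch (not the Chromium API) of an allocator with a Poll
// hook: before reporting out-of-memory, it calls a client-registered
// callback so the client (GLES2Implementation in the plan above) can free
// buffers whose async upload token has passed.
class SimpleFencedAllocator {
 public:
  using PollCallback = std::function<void()>;

  explicit SimpleFencedAllocator(std::size_t capacity)
      : free_bytes_(capacity) {}

  void set_poll_callback(PollCallback cb) { poll_callback_ = std::move(cb); }

  // The client's poll callback calls this when an async upload completes.
  void Free(std::size_t bytes) { free_bytes_ += bytes; }

  // Polls the client before giving up, so memory held pending a completed
  // async upload can be reclaimed without exposing GLES2 details here.
  bool Alloc(std::size_t bytes) {
    if (bytes > free_bytes_ && poll_callback_)
      poll_callback_();
    if (bytes > free_bytes_)
      return false;
    free_bytes_ -= bytes;
    return true;
  }

 private:
  std::size_t free_bytes_;
  PollCallback poll_callback_;
};
```

This keeps all async-upload bookkeeping on the client side, with the allocator knowing nothing about GLES2 tokens or queries.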
On Mon, Feb 3, 2014 at 5:31 AM, <jadahl@opera.com> wrote: > On 2014/01/25 01:03:51, piman wrote: > >> On Thu, Jan 23, 2014 at 2:49 AM, <mailto:jadahl@opera.com> wrote: >> > > > So there seems to be several options of going forward regarding this >> > change. I >> > will list them, so that its easier to compare them. >> > >> > client -> service >> > ================= >> > >> > 1. Generic commands (with or without identifying enum's), as done in >> most >> > recent >> > upload >> > >> > > The problem with the current way is that it doesn't scale with other types >> of asynchronous processes. >> > > > > 2. Specific commands (e.g. SetAsyncUploadToken), similar to most recent >> > upload, >> > except the command is not generic >> > >> > > That may be workable, but needs to be at the GLES2 level. >> > > > > 3. Extra parameter(s) to AsyncTex(Sub)Image2DCHROMIUM >> > >> > > That's currently my preference. >> > > > > >> > service -> client >> > ================= >> > >> > 1. CommandBuffer::State, similar to most recent upload. Tricky because >> it >> > will >> > need to deal with multiple threads updating it. >> > >> > > There's also the (tricky) protocol to access it between the service and >> the >> client. >> For the other reasons mentioned above, it doesn't scale either and/or adds >> GL concepts at that level which is a layering violation. >> > > > > >> > 2. Separate shared memory state ala QuerySync. Easier access model (only >> > upload >> > thread). Client needs to keep track of more state, and update "last >> async >> > token" >> > on the CommandBufferHelper. >> > >> > 3. Reuse QuerySync and QueryManager, but don't use Begin/EndQueryEXT >> > commands. >> > >> > > 2 or 3 are fine with me >> > > > > >> > client -> fenced allocator >> > ========================== >> > >> > 1. I think the only solution that has been in discussion recently is the >> > HasAsyncTokenPassed() way. 
>> > >> > > Really, the problem is essentially that you want to be able to poll the >> state when allocating, to more aggressively reuse. >> We can add a generic concept where the FenceAllocator client can register >> a >> Poll function, that would be called by the FenceAllocator when it wants to >> try to get more memory. The responsibility of the Poll function would be >> to >> free any buffer that it can. >> Then all the state can stay in the GLES2Implementation (that would >> implement the Poll function, keeping track of the buffers and associated >> queries/tokens). >> > > This sounds do-able, and would get rid of exposing a GLES2 implementation > detail > through the command buffer layer. > > However, what should we do with non-freed memory on destruction. We'd need > some > extra Wait next to the Poll that would do something equivalent with > WaitForAsyncToken, i.e. poll until all its buffers have their associated > async > upload token passed. Or should the destruction of fenced allocator just do > a > Poll in a busy-loop until all memory is freed? > I suppose there's 2 approaches we can take: - either make the client responsible for ensuring that, e.g. by making sure it calls glFinish before destroying the FenceAllocator, and DCHECK-ing in the FenceAllocator destructor that the Poll indicate everything is passed - add a Wait function, or a parameter to Poll to force a wait by the FencedAllocator. > > > > > >> > >> > I think regarding client -> service, it's not a hard problem, and just >> > need to >> > agree on what is the better way. >> > >> > service -> client, I'm leaning towards being in favor of 2, as it has a >> > simpler >> > access model, and tracking the state client side is probably easier than >> > dealing >> > with the various threads on the service side. 3 I think there is a risk >> of >> > having to deal with query details for something that is not a query. 
As >> > detecting "completion" is more or less an invoked closure posted to a >> > message >> > loop, there is not much to win with reusing. >> > >> > What do people think about this? >> > >> > https://chromiumcodereview.appspot.com/116863003/ >> > >> > > To unsubscribe from this group and stop receiving emails from it, send an >> > email > >> to mailto:chromium-reviews+unsubscribe@chromium.org. >> > > As conclusion current implementation plan is: > > * GLES2Implementation mmaps a AsyncUpladSync struct that has an uint32 > async_upload_token (not one per request, one per GLES2Implementation > instance) > > * AsyncTex(Sub)Image2D commands are extended with 3 new arguments: uin32 > async_upload_token, void *sync_data (i.e. shm id and offset) > - may be 0, 0, 0 for requests with no data. > > * GLES2CmdDecoderImpl queues closures that writes the associated > async_upload_token to the mmap:ed AsyncUploadSync (writing *only* on the > thread > used for uploading) > > * GLES2Implementation keeps track of released transfer buffer memory > (detached > from BufferTracker, still managed by MappedMemoryManager) pending completed > async upload > - Slightly similar to a previous version, where an issue was raised > regarding > the unmanaging of buffers > > * Add generic Poll hook to mapped_memory and (wrapped_)fenced_allocator > > * Hook up GLES2Implementation to Poll hook, each Poll, read last completed > async_upload_token from AsyncUploadSync, Free mapped memory accordingly > > > This means: > FREE_PENDING_ASYNC_TOKEN is dropped completely. > No more write-from-multiple-threads to generic CommandBuffer state. > No more single-usage generic-sounding "tokens" in generic CommandBuffer > layer. > > Please raise any issues you can think of regarding this most recent > approach. > Just make sure you use base::subtle::Atomic32 and the right operations to modify the shared memory (AsyncUploadSync), but otherwise this sounds good. 
> > https://chromiumcodereview.appspot.com/116863003/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Hi, New version uploaded. This one uses the existing Async(Sub)TexImage2D to communicate the next async upload token, and an AsyncUploadSync that is not per command but per GLES2Implementation instance. FencedAllocator Poll doesn't have a wait flag, but I made it so that the destructor of GLES2Implementation makes sure all async uploads are finished. I didn't use base::subtle::Atomic32 in AsyncUploadSync. What would be the point of this? Only the upload thread will ever change that value, so there is no need for a thread-safe int on the service side. I added a couple of unit tests to mapped_memory_unittest.cc and buffer_tracker_unittest.cc. There seem to be no tests on the client side testing Async(Sub)TexImage2D AFAICS, so I didn't write any tests for that this time. Regarding the service side, tests there would require valid shared memory, so I'll wait on that until I get your feedback. Jonas
On Fri, Feb 7, 2014 at 4:59 AM, <jadahl@opera.com> wrote: > Hi, > > New version uploaded. This one uses the existing Async(Sub)TexImage2D to > communicate the next async upload token, and a AsyncUploadSync that is not > per > command but per GLES2Implementation instance. > > FencedAllocator Poll doesn't have a wait flag, but I made it so that the > destructor of GLES2Implementation makes sure all async uploads are > finished. > > I didn't use base::subtle::Atomic32 in AsyncUploadSync. What would be the > point > of this? Only the upload thread will ever change that value, so there is > no need > for thread safe int on the service side. > It's written on the upload thread side, but read on the client thread, without synchronization otherwise. > > I added a couple of unit tests to mapped_memory_unittest.cc and > buffer_tracker_unittest.cc. There seems to be no tests on the client side > testing Async(Sub)TexImage2D AFAICS, so didn't write any tests for that > this > time. Regarding the service side, tests there would require valid shared > memory, > so waited with that until I get your feedback. > > Jonas > > https://codereview.chromium.org/116863003/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
This is starting to look pretty great IMO. Thanks! https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/buffer_tracker.h:39: async_token_(0) { Naming nit: 'last_async_token'? I recall there was a reason to not change the name when using queries, but tokens are the same, right? So the name should be the same? https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:162: base::Unretained(this), Nit: Maybe overkill in this case, but it never hurts to comment why Unretained is being used. Eg: // Unretained since 'this' outlives mapped_memory_, and it's called on this thread. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:1529: base::subtle::MemoryBarrier(); See nit on other MemoryBarrier. This one also seems extraneous if RemoveTransferBuffer is called several times in a row. Maybe I'm wrong, but see my other comment on the other MemoryBarrier for my reasoning. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:3696: base::subtle::MemoryBarrier(); It wouldn't hurt to document each of these base::subtle calls, and why exactly it is needed. It seems mostly clear now, but for posterity it's always nice to know. Two things on this one: First, do we really need a MemoryBarrier here? Don't we just need the latest token, and could that be done without a full MemoryBarrier? Second, do we need to have a memory barrier in a loop here? 
The chances of the token changing while we are in this loop seems low and not worth caring about. If we did this only once would the token just have a slight chance of being out of date? Alternatively could we get the token once and use it for testing all uploads? https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/async_pixel_transfer_manager.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/async_pixel_transfer_manager.h:67: virtual void AsyncRun(const base::Closure& callback) = 0; This leads to duplicate work happening here where we notify the query and also update the token. It's fine for this patch, but do you think we could use the token in the client eventually to replace the queries being used? https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10296: base::subtle::MemoryBarrier(); See my other comment on MemoryBarriers. Wouldn't hurt to have a comment. And, do we need a full barrier here since we are only writing to the value? I have to admit I'm not completely familiar with what is required on either side.
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/cmd_buffer_helper.h:10: #include <list> nit: you don't need this. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/cmd_buffer_helper.h:14: #include "base/bind.h" nit: or this https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/fenced_allocator.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/fenced_allocator.cc:55: // DCHECK_EQ(blocks_[0].state, FREE); I would like to reinstate those checks, for the async case - to ensure we guaranteed we finished all async uploads before destroying the MemoryManager. For the lost context case, once we know the context is lost, we know that either the service already used the buffer that's in flight, or it won't use it in the future. At that point it is safe to free/reuse (we could check that condition in FreeUnused). https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/fenced_allocator.cc:214: if (block.state == FREE_PENDING_TOKEN && block.token <= last_token_read) { While you're here... it looks like the 'block.token <= last_token_read' check doesn't take wraparound into account. Could you make it use your new HasTokenPassed ? https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... 
gpu/command_buffer/client/gles2_implementation.cc:111: async_upload_sync_(NULL), Also need initializers for async_upload_token_, async_upload_sync_shm_id_, async_upload_sync_shm_offset_ https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:291: mapped_memory_->Free(static_cast<void*>(async_upload_sync_)); nit: static_cast isn't needed. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:3696: base::subtle::MemoryBarrier(); On 2014/02/07 20:43:58, epennerAtGoogle wrote: > It wouldn't hurt to document each of these base::subtle calls, and why exactly > it is needed. It seems mostly clear now, but for posterity it's always nice to > know. > > Two things on this one: > > First, do we really need a MemoryBarrier here? Don't we just need the latest > token, and could that be done without a full MemoryBarrier? > > Second, do we need to have a memory barrier in a loop here? The chances of the > token changing while we are in this loop seems low and not worth caring about. > If we did this only once would the token just have a slight chance of being out > of date? Alternatively could we get the token once and use it for testing all > uploads? TBH, I'd rather make the access to the AsyncUploadSync be explicit about the memory semantics (i.e. use Acquire_Load to read and Release_Store to write). Basically encapsulating the atomicity as low as possible (in AsyncUploadSync), so that 'subtle' doesn't leak everywhere. Like we did for e.g. gpu::SharedState https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:3700: if (HasAsyncUploadTokenPassed(it->second)) { Similar to what I mentioned in FencedAllocator, we can free this if the context is lost too (it either has consumed the buffer already, or it will never do so). 
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:3708: WaitForCmd(); We need more than this, otherwise we'll spin loop (possibly preventing the GPU process from making progress). WaitForCmd only ensures all the commands have been consumed by the service side, but doesn't ensure completion of the async uploads. As is, nothing ensures that on the next loop the GPU process will have made progress (in truth, WaitForCmd becomes a noop after the first one). I think we want the semantics of WaitAsyncTexImage2DCHROMIUM, but applied to all async uploads. Maybe a new WaitAllAsyncTexImage2DCHROMIUM? Also, we need this to exit the loop if the context is lost and the service side doesn't make progress any more (WaitForCmd is a noop). https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.h:609: !(async_upload_sync_->async_token & 0x80000000)) { Reading async_token twice seems race-prone, it could have changed between line 606 and 609. Instead read it once and use that value twice. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/async_pixel_transfer_manager.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/async_pixel_transfer_manager.h:66: // This includes asynchronous upload tasks. I'm not the biggest fan of this pattern, because it's very hard to reason with. It's not inconceivable that an async upload needs to do several hops between threads, in which case "AsyncRun" becomes hard to define. 
I'd rather pass the callback to AsyncPixelTransferDelegate::AsyncTexImage2D with the explicit semantic of "this callback is run as soon as the input buffer has been consumed, possibly run on another thread", and have the implementation of that semantic being the responsibility of each delegate (or manager). https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/command_buffer_service.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/command_buffer_service.h:10: #include "base/synchronization/lock.h" You don't need this, or the other changes in this file any more. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10324: sync_data_shm_id, sync_data_shm_offset, sizeof(*sync)); The shared memory backing the AsyncUploadSync could be destroyed / unmapped before the completion. For the transfer data, we dup the shm (for better or worse), so we should do something similar or add some refcounting of the transfer buffers, or something.
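The callback-on-consumption semantic proposed above could be sketched as follows. The class name, the single queue standing in for an upload thread, and the method signature are all illustrative, not the real AsyncPixelTransferDelegate interface; the point is only that the completion closure travels with the AsyncTexImage2D call and each delegate decides when the input buffer has been consumed.

```cpp
#include <cassert>
#include <functional>
#include <queue>

// Hypothetical stand-in for a delegate that receives the completion
// callback together with the upload request. The explicit contract:
// |buffer_consumed| runs as soon as |pixels| will no longer be read,
// possibly on another thread (here, a simple queue stands in for the
// upload thread).
class FakeAsyncPixelTransferDelegate {
 public:
  using Closure = std::function<void()>;

  // Queues the upload; the callback runs only once the buffer is consumed.
  void AsyncTexImage2D(const void* pixels, Closure buffer_consumed) {
    pending_.push([pixels, buffer_consumed] {
      // ... the actual texture upload from |pixels| would happen here ...
      (void)pixels;
      buffer_consumed();  // Safe: |pixels| is no longer needed.
    });
  }

  // Stand-in for the upload thread draining its queue.
  void RunPendingUploads() {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

 private:
  std::queue<std::function<void()>> pending_;
};
```

With this shape, how many thread hops an upload takes is each delegate's private business; the caller only relies on the callback's "buffer consumed" contract.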
> https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... > gpu/command_buffer/service/gles2_cmd_decoder.cc:10324: sync_data_shm_id, > sync_data_shm_offset, sizeof(*sync)); > The shared memory backing the AsyncUploadSync could be destroyed / unmapped > before the completion. For the transfer data, we dup the shm (for better or > worse), so we should do something similar or add some refcounting of the > transfer buffers, or something. SafeSharedMemoryPool handles this for the other ones. It's not a very nice way to solve this :(. The whole thing should be replaced with a ref-counting scheme and a more-opaque buffer object (no direct access to shared-memory handle etc.). I've got a bug against me for that.
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/buffer_tracker.h:39: async_token_(0) { On 2014/02/07 20:43:58, epennerAtGoogle wrote: > Naming nit: 'last_async_token'? I recall there was reason to not change the > name when using queries, but tokens are thee same right? So the name should be > the same? > Before I called it "serial" and changed it to async_token when putting the implementation "in parallel" to regular tokens. Should we stay with token naming or go back to something sounding unrelated? async_upload_serial, last_async_upload_serial, last_async_upload_token, ... https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/fenced_allocator.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/fenced_allocator.cc:55: // DCHECK_EQ(blocks_[0].state, FREE); On 2014/02/07 22:58:20, piman wrote: > I would like to reinstate those checks, for the async case - to ensure we > guaranteed we finished all async uploads before destroying the MemoryManager. > > For the lost context case, once we know the context is lost, we know that either > the service already used the buffer that's in flight, or it won't use it in the > future. At that point it is safe to free/reuse (we could check that condition in > FreeUnused). Should maybe the ~GLES2Implementation() take care of free:ing all of the memory it manages without waiting in the lost context case? Regarding the blocks pending token, sounds a bit unrelated to this patch. Wouldn't it be better doing that separately, for bisectability? I could upload another patch once this has landed that reinstates this check. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... 
gpu/command_buffer/client/fenced_allocator.cc:214: if (block.state == FREE_PENDING_TOKEN && block.token <= last_token_read) { On 2014/02/07 22:58:20, piman wrote: > While you're here... it looks like the 'block.token <= last_token_read' check > doesn't take wraparound into account. Could you make it use your new > HasTokenPassed ? Sure. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:3696: base::subtle::MemoryBarrier(); On 2014/02/07 20:43:58, epennerAtGoogle wrote: > It wouldn't hurt to document each of these base::subtle calls, and why exactly > it is needed. It seems mostly clear now, but for posterity it's always nice to > know. > > Two things on this one: > > First, do we really need a MemoryBarrier here? Don't we just need the latest > token, and could that be done without a full MemoryBarrier? > > Second, do we need to have a memory barrier in a loop here? The chances of the > token changing while we are in this loop seems low and not worth caring about. > If we did this only once would the token just have a slight chance of being out > of date? Alternatively could we get the token once and use it for testing all > uploads? To be honest, I added the barrier here mostly because it was used in a similar way by the query manager. But I don't think it is really necessary here, as I don't see how we could ever write out-of-order tokens as whoever writes next will do so from the same message loop at some later point in time. I guess we could just go with very explicit inc-store/load here and in the client. But when would this really matter? Atomic increase is used to protect writing from multiple threads, but when would it matter if we read without synchronization? 
The only risk is we read one token before the most recent, and that would be extremely rare and harmless. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.h:609: !(async_upload_sync_->async_token & 0x80000000)) { On 2014/02/07 22:58:20, piman wrote: > Reading async_token twice seems race-prone, it could have changed between line > 606 and 609. > Instead read it once and use that value twice. True. Good catch. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/async_pixel_transfer_manager.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/async_pixel_transfer_manager.h:66: // This includes asynchronous upload tasks. On 2014/02/07 22:58:20, piman wrote: > I'm not the biggest fan of this pattern, because it's very hard to reason with. > It's not inconceivable that an async upload needs to do several hops between > threads, in which case "AsyncRun" becomes hard to define. > I'd rather pass the callback to AsyncPixelTransferDelegate::AsyncTexImage2D with > the explicit semantic of "this callback is run as soon as the input buffer has > been consumed, possibly run on another thread", and have the implementation of > that semantic being the responsibility of each delegate (or manager). Sounds reasonable to pass the callback to AsyncTexImage2D, I'll update the patch to do that. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/async_pixel_transfer_manager.h:67: virtual void AsyncRun(const base::Closure& callback) = 0; On 2014/02/07 20:43:58, epennerAtGoogle wrote: > This leads to duplicate work happening here where we notify the query and also > update the token. 
It's fine for this patch, but do you think we could use the > token in the client eventually to replace the queries being used? I'm thinking, can't we just handle those queries completely on the client side after this? https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/command_buffer_service.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/command_buffer_service.h:10: #include "base/synchronization/lock.h" On 2014/02/07 22:58:20, piman wrote: > You don't need this, or the other changes in this file any more. Oops. Left them here unintentionally. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10324: sync_data_shm_id, sync_data_shm_offset, sizeof(*sync)); On 2014/02/07 22:58:20, piman wrote: > The shared memory backing the AsyncUploadSync could be destroyed / unmapped > before the completion. For the transfer data, we dup the shm (for better or > worse), so we should do something similar or add some refcounting of the > transfer buffers, or something. As epenner replied elsewhere, SafeSharedMemoryPool handles this other places, so I guess I should look into using that. And then it should be replaced with ref counting and some layering. epenner, what bug number is that?
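The wraparound-safe comparison agreed on above (reusing HasTokenPassed instead of `block.token <= last_token_read`) relies on unsigned subtraction: the difference stays below 2^31 as long as the token was issued no more than 2^31 tokens before the last one read, even when the 32-bit counter wraps. A sketch, with an illustrative signature:

```cpp
#include <cassert>
#include <cstdint>

// Wraparound-safe "has this token passed?" check. Unlike a plain
// 'token <= last_token_read', this stays correct when the 32-bit token
// counter wraps around, provided fewer than 2^31 tokens are in flight.
bool HasTokenPassed(uint32_t token, uint32_t last_token_read) {
  return last_token_read - token < 0x80000000u;
}
```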
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/async_pixel_transfer_manager.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/async_pixel_transfer_manager.h:66: // This includes asynchronous upload tasks. On 2014/02/08 09:18:25, jadahl wrote: > On 2014/02/07 22:58:20, piman wrote: > > I'm not the biggest fan of this pattern, because it's very hard to reason > with. > > It's not inconceivable that an async upload needs to do several hops between > > threads, in which case "AsyncRun" becomes hard to define. > > I'd rather pass the callback to AsyncPixelTransferDelegate::AsyncTexImage2D > with > > the explicit semantic of "this callback is run as soon as the input buffer has > > been consumed, possibly run on another thread", and have the implementation of > > that semantic being the responsibility of each delegate (or manager). > > Sounds reasonable to pass the callback to AsyncTexImage2D, I'll update the patch > to do that. > Reconsidering this, I think we still need something like this, because not every AsyncTexImage2D command results in a call to Manager::AsyncTexImage2D (in case of error), and we cannot write that upload token directly, but we don't want to drop it because it'll block free:ing the memory until some successful upload.
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/buffer_tracker.h:39: async_token_(0) { Ahh I see. +1 to last_async_upload_token. No strong preference otherwise. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10324: sync_data_shm_id, sync_data_shm_offset, sizeof(*sync)); On 2014/02/08 09:18:25, jadahl wrote: > On 2014/02/07 22:58:20, piman wrote: > > The shared memory backing the AsyncUploadSync could be destroyed / unmapped > > before the completion. For the transfer data, we dup the shm (for better or > > worse), so we should do something similar or add some refcounting of the > > transfer buffers, or something. > > As epenner replied elsewhere, SafeSharedMemoryPool handles this other places, so > I guess I should look into using that. And then it should be replaced with ref > counting and some layering. epenner, what bug number is that? Bug for reference counting SharedMemory: https://code.google.com/p/chromium/issues/detail?id=177063
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:3696: base::subtle::MemoryBarrier(); On 2014/02/08 09:18:25, jadahl wrote: > On 2014/02/07 20:43:58, epennerAtGoogle wrote: > > It wouldn't hurt to document each of these base::subtle calls, and why exactly > > it is needed. It seems mostly clear now, but for posterity it's always nice to > > know. > > > > Two things on this one: > > > > First, do we really need a MemoryBarrier here? Don't we just need the latest > > token, and could that be done without a full MemoryBarrier? > > > > Second, do we need to have a memory barrier in a loop here? The chances of the > > token changing while we are in this loop seems low and not worth caring about. > > If we did this only once would the token just have a slight chance of being > out > > of date? Alternatively could we get the token once and use it for testing all > > uploads? > > To be honest, I added the barrier here mostly because it was used in a similar > way by the query manager. But I don't think it is really necessary here, as I > don't see how we could ever write out-of-order tokens as whoever writes next > will do so from the same message loop at some later point in time. > I guess we could just go with very explicit inc-store/load here and in the > client. But when would this really matter? Atomic increase is used to protect > writing from multiple threads, but when would it matter if we read without > synchronization? The only risk is we read one token before the most recent, and > that would be extremely rare and harmless. Some useful background: http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could... The compiler/cpu only guarantees serialization as seen from one thread. 
But what can appear as serial in one thread may be seen out-of-order (or worse, "impossible" states) from another thread. Just throwing MemoryBarriers around isn't so useful because: - it's hard to convince oneself of whether it's necessary and/or sufficient. - they don't convey the semantics. In this case, I'm pretty sure that this one MemoryBarrier is neither necessary nor sufficient. Acquire_Load and Release_Store convey the proper semantics. The AsyncUploadSync is your semaphore that guards the shared state which is the transfer buffer. On the GPU side, you need to make sure that all memory accesses to the transfer buffer are not reordered to after the write to the AsyncUploadSync is visible to other threads. That's the semantic of Release_Store. On the client side, you need to make sure all memory accesses to the transfer buffer are not reordered to before the read of the AsyncUploadSync. That's the semantic of Acquire_Load. The good news is that on x86, you can elide the memory barrier in those 2 cases - see the implementation of Acquire_Load/Release_Store (they just need a compiler barrier to make sure the compiler itself doesn't reorder). That's properly abstracted by the per-platform implementations of Acquire_Load/Release_Store. Since the behavior is very subtle (hence the namespace), the use of atomic operations should be invisible/irrelevant to the higher-level code, and instead should be abstracted as low as possible. Best, I think, would be to have it in AsyncUploadSync, encapsulating the semantic that we care about, something like: void SignalCompletion(uint32_t token) { Release_Store(&async_token, token); } bool IsCompleted(uint32_t token) { uint32_t current_token = Acquire_Load(&async_token); return (current_token - token < 0x80000000); } After that, all the memory logic is irrelevant in the rest of the code. We can test the hell out of AsyncUploadSync for "strange" races, and make sure the rest of the code will be correct.
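A sketch of that encapsulation, using std::atomic with explicit acquire/release ordering as a stand-in for base::subtle's Release_Store / Acquire_Load (the method names follow the snippet above; everything else is illustrative):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Encapsulates all memory-ordering subtlety in one place, so the rest of
// the code never touches atomics directly. std::memory_order_release /
// std::memory_order_acquire stand in for base::subtle's Release_Store /
// Acquire_Load.
struct AsyncUploadSync {
  std::atomic<uint32_t> async_token{0};

  // Upload thread: all prior writes to the transfer buffer become visible
  // to other threads before the new token value does (release semantics).
  void SignalCompletion(uint32_t token) {
    async_token.store(token, std::memory_order_release);
  }

  // Client thread: reads of the transfer buffer cannot be reordered to
  // before this load (acquire semantics). The comparison is
  // wraparound-safe via unsigned subtraction.
  bool IsCompleted(uint32_t token) {
    uint32_t current_token = async_token.load(std::memory_order_acquire);
    return current_token - token < 0x80000000u;
  }
};
```

On x86 both operations compile down to plain loads and stores plus a compiler barrier, matching the "elide the memory barrier" observation above.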
On 2014/02/11 00:09:08, piman wrote: > https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... > File gpu/command_buffer/client/gles2_implementation.cc (right): > > https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... > gpu/command_buffer/client/gles2_implementation.cc:3696: > base::subtle::MemoryBarrier(); > On 2014/02/08 09:18:25, jadahl wrote: > > On 2014/02/07 20:43:58, epennerAtGoogle wrote: > > > It wouldn't hurt to document each of these base::subtle calls, and why > exactly > > > it is needed. It seems mostly clear now, but for posterity it's always nice > to > > > know. > > > > > > Two things on this one: > > > > > > First, do we really need a MemoryBarrier here? Don't we just need the latest > > > token, and could that be done without a full MemoryBarrier? > > > > > > Second, do we need to have a memory barrier in a loop here? The chances of > the > > > token changing while we are in this loop seems low and not worth caring > about. > > > If we did this only once would the token just have a slight chance of being > > out > > > of date? Alternatively could we get the token once and use it for testing > all > > > uploads? > > > > To be honest, I added the barrier here mostly because it was used in a similar > > way by the query manager. But I don't think it is really necessary here, as I > > don't see how we could ever write out-of-order tokens as whoever writes next > > will do so from the same message loop at some later point in time. > > > I guess we could just go with very explicit inc-store/load here and in the > > client. But when would this really matter? Atomic increase is used to protect > > writing from multiple threads, but when would it matter if we read without > > synchronization? The only risk is we read one token before the most recent, > and > > that would be extremely rare and harmless. 
> > Some useful background: > http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could... > The compiler/cpu only guarantees serialization as seen from one thread. But what > can appear as serial in one thread may be seen out-of-order (or worse, > "impossible" states) from another thread. > > Just throwing MemoryBarriers around isn't so useful because: > - it's hard to convince oneself of whether it's necessary and/or sufficient. > - they don't convey the semantics. > In this case, I'm pretty sure that this one MemoryBarrier is neither necessary > nor sufficient. > > > Aquire_Load and Release_Store convey the proper semantics. The AsyncUploadSync > is your semaphore that guards the shared state which is the transfer buffer. > On the GPU side, you need to make sure that all memory accesses to the transfer > buffer are not reordered to after the write to the AsyncUploadSync is visible to > other threads. That's the semantic of Release_Store. > On the client side, you need to make sure all memory accesses to the transfer > buffer are not reordered to before the read of the AsyncUploadSync. That's the > semantic of Acquire_Load. > > The good news is that on x86, you can elide the memory barrier in those 2 cases > - see the implementation of Acquire_Load/Release_Store (they just need a > compiler barrier to make sure the compiler itself doesn't reorder). That's > properly abstracted by the per-platform implementations of > Acquire_Load/Release_Store > > Since the behavior is very sublte (hence the namespace), the use of atomic > operations should be invisible/irrelevant to the higher-level code, and instead > should be abstracted as low as possible. 
> Best, I think, would be to have it in AsyncUploadSync, encapsulating the > semantic that we care about, something like: > void SignalCompletion(uint32_t token) { > Release_Store(&async_toke, token); > } > > bool IsCompleted(uint32_t token) { > uint32_t current_token = Acquire_Load(&async_token); > return (current_token - token < 0x80000000); > } > > > After that, all the memory logic is irrelevant in the rest of the code. We can > test the hell out of AsyncUploadSync for "strange" races, and make sure the rest > of the code will be correct. Thanks for the explanation and link, I think I understand more what's going on now, more or less. It seems that on the client side I implemented semantics similar to Release_Read instead of Acquire_Read, which would not ensure the order of the data read, as it was not put behind a fence (in the ARM case at least), if I understand correctly. I agree it's a good idea to put all this kind of memory logic in one place, and AsyncUploadSync seems like a good place.
> Thanks for the explanation and link, I think I understand more what's going on > now, more or less. It seems that on the client side I implemented semantics > similar to Release_Read instead of Acquire_Read which would not ensure the order > of the data read, as it was not put behind a fence (in the ARM case at least) > (if I understand correctly). > > I agree its a good idea put all this kind of memory logic in one place, and > AsyncUploadSync seems like a good place. I also liked that link! It sounds like Acquire_Load is relatively inexpensive. If the cost is still significant, it still sounds like we could separate GetLatestToken() from IsCompleted(), since that would be more conservative as opposed to not conservative enough? But if it's in the noise, forget this then as it's just over-optimizing.
On Thu, Feb 13, 2014 at 12:35 PM, <epenner@chromium.org> wrote: > Thanks for the explanation and link, I think I understand more what's >> going on >> now, more or less. It seems that on the client side I implemented >> semantics >> similar to Release_Read instead of Acquire_Read which would not ensure the >> > order > >> of the data read, as it was not put behind a fence (in the ARM case at >> least) >> (if I understand correctly). >> > > I agree its a good idea put all this kind of memory logic in one place, >> and >> AsyncUploadSync seems like a good place. >> > > I also liked that link! > > It sounds like Acquire_Load is relatively inexpensive. It's probably on the order of 100ns. Similar order of magnitude to a thread safe refcount operation. > If the cost is still > significant, it still sounds like we could separate GetLatestToken() from > IsCompleted(), since that would be more conservative as opposed to not > conservative enough? But if it's in the noise, forget this then as it's > just > over-optimizing. > We could. I think we should start with something simple though. > > > > > https://chromiumcodereview.appspot.com/116863003/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2014/02/13 20:35:12, epenner wrote: > > Thanks for the explanation and link, I think I understand more what's going on > > now, more or less. It seems that on the client side I implemented semantics > > similar to Release_Read instead of Acquire_Read which would not ensure the > order > > of the data read, as it was not put behind a fence (in the ARM case at least) > > (if I understand correctly). > > > > I agree its a good idea put all this kind of memory logic in one place, and > > AsyncUploadSync seems like a good place. > > I also liked that link! > > It sounds like Acquire_Load is relatively inexpensive. If the cost is still > significant, it still sounds like we could separate GetLatestToken() from > IsCompleted(), since that would be more conservative as opposed to not > conservative enough? But if it's in the noise, forget this then as it's just > over-optimizing. I also considered the performance impact, so I took a look at how it is implemented in the kernel[0] when running on ARM. It seems to more or less only involve invoking a special memory fence instructions (dmb[1] on >= armv7), so I think the performance impact is negligible. [0] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm... [1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGHHIE...
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/fenced_allocator.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/fenced_allocator.cc:55: // DCHECK_EQ(blocks_[0].state, FREE); On 2014/02/08 09:18:25, jadahl wrote: > On 2014/02/07 22:58:20, piman wrote: > > I would like to reinstate those checks, for the async case - to ensure we > > guaranteed we finished all async uploads before destroying the MemoryManager. > > > > For the lost context case, once we know the context is lost, we know that > either > > the service already used the buffer that's in flight, or it won't use it in > the > > future. At that point it is safe to free/reuse (we could check that condition > in > > FreeUnused). > > Should maybe the ~GLES2Implementation() take care of free:ing all of the memory > it manages without waiting in the lost context case? > > Regarding the blocks pending token, sounds a bit unrelated to this patch. > Wouldn't it be better doing that separately, for bisectability? I could upload > another patch once this has landed that reinstates this check. I added lost-context handling (freeing all non-free blocks, both in use and pending token) to FreeUnused(), and freeing of all unmanaged memory to the Poll callback in gles2_implementation.cc. Commenting out these two lines leads to tons of asserts when running the unit tests, since they assume it's ok to leak. I suggest leaving these two commented out, fixing context-lost handling as well as making the tests pass in a separate patch.
New version uploaded. In this one I added a glWaitAllAsyncTexImage2D that behaves more or less like glWaitAsyncTexImage2D applied to whatever upload was queued last. It is called in ~GLES2Implementation in order to properly clean out the async uploads before destructing the mapped memory manager. I did not remove the two DCHECK's as I got too many unrelated failing test cases. Another change worth noting is the revert of the AsyncPixelTransferManager::AsyncRun() function. Instead I just use the already existing AsyncNotifyCompletion which has the shared memory guards that were needed. I did not make async upload token updating part of AsyncPixelTransferDelegate::AsyncTex(Sub)Image2D as it would block freeing a buffer if the AsyncTex(Sub)Image2D command failed. Other than that, it mostly addresses review issues. I'd like to know if there are any more unit tests you are expecting from this patch. Also, thanks for all the reviewing and valuable input! Jonas https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:111: async_upload_sync_(NULL), On 2014/02/07 22:58:20, piman wrote: > Also need initializers for async_upload_token_, async_upload_sync_shm_id_, > async_upload_sync_shm_offset_ async_upload_token_ is needed indeed, but shm_id and shm_offset would be redundant since they are coupled with |async_upload_sync_|. |async_upload_sync_| is used as the guard for checking the validity/no-oom of async upload sync.
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:111: async_upload_sync_(NULL), On 2014/02/19 14:06:20, jadahl wrote: > On 2014/02/07 22:58:20, piman wrote: > > Also need initializers for async_upload_token_, async_upload_sync_shm_id_, > > async_upload_sync_shm_offset_ > > async_upload_token_ is needed indeed, but shm_id and shm_offset would be > redundant since they are coupled with |async_upload_sync_|. |async_upload_sync_| > is used as the guard for checking the validity/no-oom of async upload sync. Per style guide ( http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml?showone=Initia... ), please initialize all fields that are not default-initialized (i.e. POD). https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/cli... File gpu/command_buffer/client/fenced_allocator_test.cc (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/cli... gpu/command_buffer/client/fenced_allocator_test.cc:107: } Could we add a test that exercises the poll feature at the low level? Create a FencedAllocator passing a mock Poll function. Allocate several entries. Call FreeUnused, with/without the mock Poll function set up to free an entry. Check consistency. https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:299: } It would be nice to do this in FreeEverything too. FreeEverything is called when a tab goes offscreen, so: 1- it's ok to take long (i.e. waiting for things to happen).
2- we want to free as much as possible at that point https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:1540: base::subtle::MemoryBarrier(); Because the necessary memory barrier is handled in HasAsyncUploadTokenPassed, you can remove this. https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/com... File gpu/command_buffer/common/gles2_cmd_format.h (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/com... gpu/command_buffer/common/gles2_cmd_format.h:169: }; Can we add a unit test for this, that mimics our usage pattern? Something along the lines of: void ReadCompareAndSignal(uint32* value, uint32 expected, uint32 token, AsyncUploadSync* sync) { // Simulate a "read" of the shared data, similar to what texture upload would do EXPECT_EQ(expected, *value); sync->SetAsyncUploadToken(token); } TEST() { const size_t kSize = 10; const size_t kCount = 10000; // some value that keeps running time to below 1s or so but making sure it's >> kSize so we reuse memory many times. AsyncUploadSync sync; sync.Reset(); uint32 token = 0; // or some value to exercise wrapping. scoped_ptr<uint32[]> buffer(new uint32[kSize]); scoped_ptr<uint32[]> tokens(new uint32[kSize]); memset(tokens.get(), 0, sizeof(uint32) * kSize); base::Thread thread("Upload Thread"); thread.Start(); for (uint32 i = 0; i < kCount; ++i) { size_t offset = i % kSize; while (!sync.HasAsyncUploadTokenPassed(tokens[offset])) PlatformThread::YieldCurrentThread(); // Or something to back off without explicit synchronization. uint32* data = buffer.get()+offset; *data = i; ++token; tokens[offset] = token; thread.message_loop()->PostTask( FROM_HERE, base::Bind(&ReadCompareAndSignal, data, i, token, &sync)); } } https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser...
File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:486: AsyncUploadTokenCompletionObserver(uint32 async_upload_token) nit: explicit https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:498: uint32 async_upload_token_; nit: DISALLOW_COPY_AND_ASSIGN https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10330: mem_params.shm_data_size = sizeof(AsyncUploadSync); There's a DCHECK in AsyncNotifyCompletion that says it expects shm_data_offset+shm_data_size <= shm_size. For security we need an overflow-safe check on the size.
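The AsyncUploadSync stress test piman sketches above can be condensed into a self-contained C++11 version, with std::thread and std::atomic standing in for base::Thread and base::subtle; the ring-of-slots structure and the function name are illustrative, not the actual Chromium test:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Producer ("client") writes values into a small ring of slots and reuses a
// slot only after the consumer ("upload thread") has read it and released a
// token for it, mirroring how transfer buffer memory is recycled once the
// async upload token passes. Returns the number of value mismatches the
// consumer observed (expected: 0).
inline int RunUploadStressTest(uint32_t slots, uint32_t iterations) {
  std::vector<uint32_t> buffer(slots, 0);
  std::atomic<uint32_t> produced{0};  // highest token published by producer
  std::atomic<uint32_t> consumed{0};  // highest token released by consumer
  int mismatches = 0;

  std::thread consumer([&] {
    for (uint32_t i = 1; i <= iterations; ++i) {
      // Acquire pairs with the producer's release store of |produced|, so
      // the write to buffer[] for token i is visible here.
      while (produced.load(std::memory_order_acquire) < i)
        std::this_thread::yield();
      if (buffer[(i - 1) % slots] != i)
        ++mismatches;
      consumed.store(i, std::memory_order_release);
    }
  });

  for (uint32_t i = 1; i <= iterations; ++i) {
    // Before reusing a slot, wait for the token of its previous occupant.
    while (i > slots && consumed.load(std::memory_order_acquire) < i - slots)
      std::this_thread::yield();
    buffer[(i - 1) % slots] = i;
    produced.store(i, std::memory_order_release);
  }
  consumer.join();
  return mismatches;
}
```

If the acquire/release pair were replaced with relaxed loads and stores, the consumer could in principle observe a stale slot value on weakly ordered hardware, which is exactly the race the test is meant to exercise.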
https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10330: mem_params.shm_data_size = sizeof(AsyncUploadSync); On 2014/02/20 01:52:37, piman (OOO back 2014-3-4) wrote: > There's a DCHECK in AsyncNotifyCompletion that says it expects > shm_data_offset+shm_data_size <= shm_size > > For security we need an overflow-safe check on the size. This seems to be missing for the mem_params passed to delegate->AsyncTex(Sub)Image2D as well, or am I missing something? I'm considering how we should handle these errors. As it is now, if GetSharedMemoryBuffer() returns a buffer with shared_memory == NULL, or if we added this check, we'd not free the async upload buffer on the client side until (and only if) we either lose context or successfully queue another upload. I guess that the cases we'd handle here are misbehaving clients, and we'd more or less make them misbehave some more this way. Any preferences on how to deal with this?
https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10330: mem_params.shm_data_size = sizeof(AsyncUploadSync); It's checked in ValidateTexSubImage2D. The 'pixels' should be null if the memory is invalid in any way for the operation. I'll find where that null is coming from though. For any of these we should return the same error as ValidateTexSubImage2D.
https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10330: mem_params.shm_data_size = sizeof(AsyncUploadSync); It's in GetSharedMemoryAs: https://code.google.com/p/chromium/codesearch#chromium/src/gpu/command_buffer...
Hi, Another one uploaded. This new version adds a test that checks the usage of the poll function of FencedAllocator, checks the usage pattern of AsyncUploadSync, adds a sanity check for shared memory parameters in GLES2DecoderImpl, as well as more minor review fixes. One of the review issues that this new upload addressed was regarding error handling in the async upload path in GLES2DecoderImpl. The existing potential problem is that if an async upload fails because the AsyncUploadSync shared memory parameters were invalid, it would effectively stall any freeing of transfer buffers on the client side until a later async upload gets past those two checks. In those cases the corresponding function (HandleAsyncTex(Sub)Image2DCHROMIUM()) would return an error. Should we in those cases instead fail "harder" so that the client loses its context? Any ideas? Jonas
> In those cases the corresponding function (HandleAsyncTex(Sub)Image2DCHROMIUM()) > would return an error. Should we in those cases instead fail "harder" so that > the client looses its context? Any ideas? > > Jonas I need to confirm this, but I believe the client process gets killed on this type of out-of-bounds error. It effectively means that malicious code is running in the client. So, we should make the client crash immediately.
On 2014/02/21 22:47:36, epennerAtGoogle wrote: > > In those cases the corresponding function > (HandleAsyncTex(Sub)Image2DCHROMIUM()) > > would return an error. Should we in those cases instead fail "harder" so that > > the client looses its context? Any ideas? > > > > Jonas > > I need to confirm this, but I believe the client process gets killed on this > type of out-of-bounds error. It effectively means that malicious code is running > in the client. So, we should make the client crash immediately. I did some digging and it looks like we'd end up in GpuCommandBufferStub::OnParseError() sending a GpuCommandBufferMsg_Destroyed message to the client and a GpuHostMsg_DidLoseContext message to gpu host. I tried invoking the error path during rendering to see how the client is dealt with but it mostly seemed to die because of DCHECKs here and there (the renderer host as well for that matter).
> I did some digging and it looks like we'd end up in > GpuCommandBufferStub::OnParseError() sending a GpuCommandBufferMsg_Destroyed > message to the client and a GpuHostMsg_DidLoseContext message to gpu host. I > tried invoking the error path during rendering to see how the client is dealt > with but it mostly seemed to die because of DCHECKs here and there (the renderer > host as well for that matter). Looks like piman will be out for a bit. It's concerning that we are hitting DCHECKs on the service side. We might want to file bugs for those. Also it might be worth trying with a release build to see what happens. This patch is starting to look really good from my perspective. Are you seeing the memory savings as a result? If we ever go over the limit now, that's a bug, likely in the compositor, and I can perhaps start looking into it in parallel.
On 2014/02/27 05:37:42, epenner wrote: > > I did some digging and it looks like we'd end up in > > GpuCommandBufferStub::OnParseError() sending a GpuCommandBufferMsg_Destroyed > > message to the client and a GpuHostMsg_DidLoseContext message to gpu host. I > > tried invoking the error path during rendering to see how the client is dealt > > with but it mostly seemed to die because of DCHECKs here and there (the > renderer > > host as well for that matter). > > Looks like piman will be out for a bit. That's concerning we are hitting DCHECKs > in the service side. We might want to file bugs for those. Also it might be > worth trying with release build to see what happens. This patch is starting to > look really good from my perspective. Are you seeing the memory savings as a > result? If we ever go over the limit now, that's a bug, likely in the > compositor and I can start looking into it in parallel perhaps. I did measurements on an early version which gave promising results and I plan to rebase my utility (it more or less adds a parallel path to trace events pushing arbitrary JSON objects to a file that I then process in Javascript drawing graphs on a canvas) and measure again. FYI, I will be away most of next week, so piman being unresponsive for a bit is no problem for me. I'll see if I can track down the DCHECK's as well (I stopped commenting them out after 2-3 of them).
On 2014/02/27 08:20:43, jadahl wrote: > On 2014/02/27 05:37:42, epenner wrote: > > > I did some digging and it looks like we'd end up in > > > GpuCommandBufferStub::OnParseError() sending a GpuCommandBufferMsg_Destroyed > > > message to the client and a GpuHostMsg_DidLoseContext message to gpu host. I > > > tried invoking the error path during rendering to see how the client is > dealt > > > with but it mostly seemed to die because of DCHECKs here and there (the > > renderer > > > host as well for that matter). > > > > Looks like piman will be out for a bit. That's concerning we are hitting > DCHECKs > > in the service side. We might want to file bugs for those. Also it might be > > worth trying with release build to see what happens. This patch is starting to > > look really good from my perspective. Are you seeing the memory savings as a > > result? If we ever go over the limit now, that's a bug, likely in the > > compositor and I can start looking into it in parallel perhaps. > > I did measurements on an early version which gave promising results and I plan > to rebase my utility (it more or less adds a parallel path to trace events > pushing arbitrary JSON objects to a file that I then process in Javascript > drawing graphs on a canvas) and measure again. FYI, I will be away most of next > week, so piman being unresponsive for a bit is no problem for me. I'll see if I > can track down the DCHECK's as well (I stopped commenting them out after 2-3 of > them). Any progress regarding review? Also, I attached some new measurements to the issue that shows some quite promising results.
I realize I had old comments that I somehow never submitted. https://codereview.chromium.org/116863003/diff/2100001/gpu/command_buffer/com... File gpu/command_buffer/common/gles2_cmd_format_test.cc (right): https://codereview.chromium.org/116863003/diff/2100001/gpu/command_buffer/com... gpu/command_buffer/common/gles2_cmd_format_test.cc:98: base::subtle::MemoryBarrier(); Why the barrier? On the contrary, we want to make sure it works without that barrier (i.e. that HasAsyncUploadTokenPassed has the right Acquire semantic). Note: the PostTask implies before->after data semantics for buffer_tokens, if that's what you're concerned about. https://codereview.chromium.org/116863003/diff/2100001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/2100001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10334: if (mem_params.shm_data_offset + mem_params.shm_data_size > Beware of overflows. If shm_data_offset is UINT32_MAX, then this will pass, but we'll dereference out-of-bounds later. We can use CheckedNumerics, like we did in https://codereview.chromium.org/196633009
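To illustrate the overflow piman describes: with uint32_t arithmetic, shm_data_offset + shm_data_size can wrap around, so the naive comparison passes while a later dereference goes out of bounds. Rearranging the comparison avoids the addition entirely; this is a generic sketch with illustrative names (Chromium's actual fix uses base::CheckedNumeric, per the linked CL):

```cpp
#include <cstdint>

// Overflow-safe form of "offset + size <= total" for 32-bit values.
// The naive form fails when offset + size wraps: e.g. with
// offset == UINT32_MAX and size == 4, the sum wraps to 3, which
// compares <= total and passes the naive check.
inline bool IsInBounds(uint32_t offset, uint32_t size, uint32_t total) {
  // No addition, so no wrap: total - size cannot underflow once size <= total.
  return size <= total && offset <= total - size;
}
```

The same rearrangement works for any unsigned width, which is why it is a common idiom for validating untrusted shared memory parameters.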
https://codereview.chromium.org/116863003/diff/2100001/gpu/command_buffer/com... File gpu/command_buffer/common/gles2_cmd_format_test.cc (right): https://codereview.chromium.org/116863003/diff/2100001/gpu/command_buffer/com... gpu/command_buffer/common/gles2_cmd_format_test.cc:98: base::subtle::MemoryBarrier(); On 2014/03/20 21:18:26, piman wrote: > Why the barrier? On the contrary, we want to make sure it works without that > barrier (i.e. that HasAsyncUploadTokenPassed has the right Acquire semantic). > Note: the PostTask implies before->after data semantics for buffer_tokens, if > that's what you're concerned about. I added it to ensure the buffer_tokens[buffer] store, but as you say PostTask implies it, I will remove the barrier.
New rebased version uploaded that also addresses the recent raised issues.
LGTM+nit https://codereview.chromium.org/116863003/diff/2230001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): https://codereview.chromium.org/116863003/diff/2230001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8407: // asynchronously and AsyncTexSubImage2D does not involved binding. nit: typo involved->involves
Uploaded new version. Changes are comment typo corrections. https://codereview.chromium.org/116863003/diff/2230001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): https://codereview.chromium.org/116863003/diff/2230001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8407: // asynchronously and AsyncTexSubImage2D does not involved binding. On 2014/03/21 22:35:51, piman wrote: > nit: typo involved->involves Hmm, isn't "involve" more correct? Changed to that here and above (where this was copied from).
On Mon, Mar 24, 2014 at 3:09 AM, <jadahl@opera.com> wrote: > Uploaded new version. Changes are comment typo corrections. > > > > https://codereview.chromium.org/116863003/diff/2230001/ > gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc > File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): > > https://codereview.chromium.org/116863003/diff/2230001/ > gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc#newcode8407 > gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8407: // > asynchronously and AsyncTexSubImage2D does not involved binding. > On 2014/03/21 22:35:51, piman wrote: > >> nit: typo involved->involves >> > > Hmm, isn't "involve" more correct? Changed to that here and above (where > this was copied from). > Yes, sorry, you're right. Thanks. LGTM still.
lgtm
The CQ bit was checked by jadahl@opera.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jadahl@opera.com/116863003/2240001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: tryserver.chromium on mac_chromium_compile_dbg
New upload. This one fixes the compilation/lint issues reported by the try-bots. Updated *Sync manager to the new approach. I noticed the conflicting CL, but have not attempted any rebase yet as it has not landed. Will do so if it lands before this one.
Uploaded rebased version.
https://codereview.chromium.org/116863003/diff/2300001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/2300001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10374: if (!end.IsValid() || end.ValueOrDie() > mem_params.buffer()->size()) nit: do this check before creating the AsyncMemoryParams, since the constructor DCHECKs that this passes. We probably need to check that the buffer itself is valid. Also the size check can be rewritten with GetDataAddress: scoped_refptr<Buffer> buffer = GetSharedMemoryBuffer(sync_data_shm_id); if (!buffer || !buffer->GetDataAddress(sync_data_shm_offset, sizeof(AsyncUploadSync))) return base::Closure(); AsyncMemoryParams mem_params(buffer, sync_data_shm_offset, sizeof(AsyncUploadSync)); ...
> nit: do this check before creating the AsyncMemoryParams, since the constructor > DCHECKs that this passes. Antoine, I removed the DCHECK in the constructor because tests were hitting it, and it was just an FYI (I didn't consider it important since we need to check GetDataAddress() != null in all cases anyway or we'll crash 'for real'). Sorry I didn't bring that up before landing. I can add it back if you like.
New upload; addresses review issue
lgtm
The CQ bit was checked by jadahl@opera.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jadahl@opera.com/116863003/2320001
The CQ bit was unchecked by commit-bot@chromium.org
Retried try job too often on ios_dbg_simulator for step(s) base_unittests http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=ios_dbg_si...
The CQ bit was checked by jadahl@opera.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jadahl@opera.com/116863003/2320001
Message was sent while issue was closed.
Change committed as 260177
Message was sent while issue was closed.
The new upload contains the following change: --- a/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc +++ b/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc @@ -8399,7 +8400,6 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) { DoDeleteTexture(client_texture_id_, kServiceTextureId); EXPECT_FALSE( decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( texture_ref)); - texture->SetImmutable(false); delegate = NULL; { // Get a fresh texture since the existing texture cannot be respecified The failed test reported use after free where use was at the removed line, and it had been freed on the DoDeleteTexture line. I have not verified locally, but I think this fix is correct and should fix the reported issue.
Message was sent while issue was closed.
https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8402: texture_ref)); nit: actually, texture_ref should probably be a scoped_refptr. Otherwise we may be comparing a pointer to deleted memory. While we don't dereference it, it's in theory possible that another TextureRef would be allocated at the same address (actually, with allocators like tcmalloc, it becomes very likely), and inserted into the map. It probably doesn't happen here given how the test is written, but if we want to be thorough, that'd be the right thing to do.
Message was sent while issue was closed.
https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8402: texture_ref)); On 2014/03/28 20:04:35, piman wrote: > nit: actually, texture_ref should probably be a scoped_refptr. Otherwise we may > be comparing a pointer to deleted memory. While we don't dereference it, it's in > theory possible that another TextureRef would be allocated at the same address > (actually, with allocators like tcmalloc, it becomes very likely), and inserted > into the map. It probably doesn't happen here given how the test is written, but > if we want to be thorough, that'd be the right thing to do. Do you want that change in this patch? It'll change texture_ref to texture_ref.get() in several places spread all over this function.
Message was sent while issue was closed.
https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8402: texture_ref)); On 2014/03/28 20:11:38, jadahl wrote: > On 2014/03/28 20:04:35, piman wrote: > > nit: actually, texture_ref should probably be a scoped_refptr. Otherwise we > may > > be comparing a pointer to deleted memory. While we don't dereference it, it's > in > > theory possible that another TextureRef would be allocated at the same address > > (actually, with allocators like tcmalloc, it becomes very likely), and > inserted > > into the map. It probably doesn't happen here given how the test is written, > but > > if we want to be thorough, that'd be the right thing to do. > > Do you want that change in this patch? It'll change texture_ref to > texture_ref.get() in several places spread all over this function. Either that or add a comment to indicate that texture_ref is actually deleted, so one must be extra careful when touching this. I prefer the former though.
Message was sent while issue was closed.
https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8402: texture_ref)); On 2014/03/28 20:11:38, jadahl wrote: > On 2014/03/28 20:04:35, piman wrote: > > nit: actually, texture_ref should probably be a scoped_refptr. Otherwise we > may > > be comparing a pointer to deleted memory. While we don't dereference it, it's > in > > theory possible that another TextureRef would be allocated at the same address > > (actually, with allocators like tcmalloc, it becomes very likely), and > inserted > > into the map. It probably doesn't happen here given how the test is written, > but > > if we want to be thorough, that'd be the right thing to do. > > Do you want that change in this patch? It'll change texture_ref to > texture_ref.get() in several places spread all over this function. The tests also assume no reference is held, and fail because we don't dereference. If we do (after DoDelete, before GetPixelTransferDelegate) we would have to compare with the pointer texture_ref had before we dereferenced, which would still have the same issue as before.
Message was sent while issue was closed.
The diff from the previous upload follows: --- a/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc +++ b/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc @@ -8358,6 +8358,8 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) { EXPECT_FALSE( decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( texture_ref)); + texture = NULL; + texture_ref = NULL; delegate = NULL; } @@ -8400,6 +8402,8 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) { EXPECT_FALSE( decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( texture_ref)); + texture = NULL; + texture_ref = NULL; delegate = NULL; { // Get a fresh texture since the existing texture cannot be respecified I cleared the pointers to make it clear that they are now expected to be invalid. I did not change to scoped_refptr since it wouldn't fix the issue (as mentioned in the previous e-mail). If the DoDelete() call should indeed result in GetPixelTransferDelegate() not returning the delegate even though the texture ref was not destroyed, then that sounds like an issue not directly related to this patch.
On 2014/03/28 22:08:17, jadahl wrote: > The diff from the previous upload follows: > > --- a/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc > +++ b/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc > @@ -8358,6 +8358,8 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) { > EXPECT_FALSE( > decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( > texture_ref)); > + texture = NULL; > + texture_ref = NULL; > delegate = NULL; > } > > @@ -8400,6 +8402,8 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) { > EXPECT_FALSE( > decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( > texture_ref)); > + texture = NULL; > + texture_ref = NULL; > delegate = NULL; > { > // Get a fresh texture since the existing texture cannot be respecified > > I cleared the pointers to make it clear that they are now expected to be > invalid. I did not change to scoped_refptr since it wouldn't fix the issue (as > mentioned in previous e-mail). If the DoDelete() call should indeed result > GetPixelTransferDelegate() not returning the delegate even though the texture > ref was not destroyed, then that's sounds an issue not directly related to this > patch. LGTM for now. It's just that calling GetPixelTransferDelegate with a deleted pointer is meaningless
On 2014/03/28 22:30:33, piman wrote: > On 2014/03/28 22:08:17, jadahl wrote: > > The diff from the previous upload follows: > > > > --- a/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc > > +++ b/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc > > @@ -8358,6 +8358,8 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) > { > > EXPECT_FALSE( > > decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( > > texture_ref)); > > + texture = NULL; > > + texture_ref = NULL; > > delegate = NULL; > > } > > > > @@ -8400,6 +8402,8 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) > { > > EXPECT_FALSE( > > decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( > > texture_ref)); > > + texture = NULL; > > + texture_ref = NULL; > > delegate = NULL; > > { > > // Get a fresh texture since the existing texture cannot be respecified > > > > I cleared the pointers to make it clear that they are now expected to be > > invalid. I did not change to scoped_refptr since it wouldn't fix the issue (as > > mentioned in previous e-mail). If the DoDelete() call should indeed result > > GetPixelTransferDelegate() not returning the delegate even though the texture > > ref was not destroyed, then that's sounds an issue not directly related to > this > > patch. > > LGTM for now. It's just that calling GetPixelTransferDelegate with a deleted > pointer is meaningless It's awkward indeed. Do you think it's better to remove these asserts all together?
The CQ bit was checked by jadahl@opera.com
On Fri, Mar 28, 2014 at 3:40 PM, <jadahl@opera.com> wrote: > It's awkward indeed. Do you think it's better to remove these asserts all together? I guess we don't have anything else that verifies this behavior, so let's leave it for now.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jadahl@opera.com/116863003/2360001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: tryserver.chromium on linux_chromium_chromeos_rel
The CQ bit was checked by jadahl@opera.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jadahl@opera.com/116863003/2360001
Message was sent while issue was closed.
Change committed as 260507 |