Created: 7 years ago by jadahl
Modified: 6 years, 8 months ago
CC: chromium-reviews, piman+watch_chromium.org, vangelis, Ken Russell, Vangelis Kokkevis, jamesr, vmiura
Base URL: http://git.chromium.org/chromium/src.git@master
Visibility: Public.
Description
gpu: Reuse transfer buffers more aggressively
By keeping track of transfer buffer usage (both sync and async), it is
possible to reuse an existing transfer buffer earlier than is done today.
For synchronous uploads, this is done by inserting tokens marking a
buffer's last usage. If such a token has passed, the buffer can be
reused.
For asynchronous uploads, this is done by adding an internal async
upload token to the GLES2Implementation and GLES2CmdDecoderImpl that
enumerates all asynchronous uploads. When an upload is completed, the
token is synchronized with the client, which can then free the memory
associated with the async token.
The fenced allocator and mapped memory manager get a callback that
allows the fenced allocator to make its user, in this case
GLES2Implementation, poll the current async upload token and free any
memory associated with it.
BUG=328808
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=260177
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=260507
Patch Set 1 #
Total comments: 1
Patch Set 2 : [RFC] gpu: Reuse transfer buffers more aggressively #
Total comments: 19
Patch Set 3 : [WIP] gpu: Reuse transfer buffers more aggressively #
Total comments: 26
Patch Set 4 : [WIP] Review comments follow-up #
Total comments: 7
Patch Set 5 : [WIP] Introduced internal SetAsyncToken command buffer command #
Total comments: 20
Patch Set 6 : Async upload token part of existing Async command; use separate shared memory to sync async upload … #
Total comments: 35
Patch Set 7 : Added glWaitAllAsyncTexImage2DCHROMIUM; other review issues addressed #
Total comments: 10
Patch Set 8 : Added AsyncUploadSync test, FencedAllocator test, addressed review issues #
Total comments: 3
Patch Set 9 : Rebased #
Patch Set 10 : Rebased; removed unnecessary barrier; use CheckedNumeric #
Total comments: 2
Patch Set 11 : Fixed comment typos #
Patch Set 12 : Lint fixes; updated AsyncPixelTransferManagerSync to the new approach #
Patch Set 13 : Rebased #
Total comments: 1
Patch Set 14 : Addressed review issues #
Patch Set 15 : Fixed bug in unittest #
Total comments: 4
Patch Set 16 : Unset texture and texture_ref after deleting #
Messages
Total messages: 138 (0 generated)
Hi,

This patch introduces the ability to aggressively reuse transfer buffers. It is a work in progress, but I'm creating this review request in order to get some input and assistance on the approach.

The main problem with this patch is that it does not separate reusable not-yet-free buffers from non-reusable ones. If this is the way forward, we need a way to distinguish these two kinds of buffers, and I'm not sure what the best way to do that would be. One idea I had was to use another kind of target for buffers like this, but I do not know if that would be problematic in some way, or if there is some much better way of doing this (thus this WIP review).

What is already done is to separate the mapped memory managers depending on what they are used for; for example, queries do not use the same mapped memory managers as transfer buffers. I suppose the same would have to be done for the two kinds of transfer buffers as well, in order to be able to pick any block in a chunk managed by some mapped memory manager.

I would also like to get some input on the overall approach, whether this is the right level at which to make the change, etc. I have already made some measurements (graphs attached to the associated bug), and there are some measurable improvements.

Thanks
Jonas
If I understand this correctly, I like (at least in concept) where this is going. I like the idea of a different target, which could identify buffers that can be reclaimed immediately, vs buffers that must go through the FREE_PENDING stage first. Note that if we go this route, the compositor also has some work to do, as there are still a few straggling cases where we don't respect the full upload life-time (primarily 'upload waits' and deletion). Perhaps we could implement this first and then have a separate bug for the compositor side?
https://codereview.chromium.org/116863003/diff/1/gpu/command_buffer/client/fe... File gpu/command_buffer/client/fenced_allocator.h (right): https://codereview.chromium.org/116863003/diff/1/gpu/command_buffer/client/fe... gpu/command_buffer/client/fenced_allocator.h:38: bool aggressive_reuse, Rather than a separate allocator, do you think it would be possible to flag deletions for immediate reuse (ie. skip free-pending state)? This is partially a question for reveman/sievers (as to how best this could be communicated from the client).
On 2013/12/18 19:21:14, epennerAtGoogle wrote: > https://codereview.chromium.org/116863003/diff/1/gpu/command_buffer/client/fe... > File gpu/command_buffer/client/fenced_allocator.h (right): > > https://codereview.chromium.org/116863003/diff/1/gpu/command_buffer/client/fe... > gpu/command_buffer/client/fenced_allocator.h:38: bool aggressive_reuse, > Rather than a separate allocator, do you think it would be possible to flag > deletions for immediate reuse (ie. skip free-pending state)? This is partially > a question for reveman/sievers (as to how best this could be communicated from > the client). I think the problem with that is that we'd get a chunk with reusable and non-reusable blocks mixed up together, which would reduce the chance of being able to collapse several FREE and FREE_PENDING_TOKEN blocks into one as is done in the patch.
> I think the problem with that is that we'd get a chunk with reusable and > non-reusable blocks mixed up together, which would reduce the chance of being > able to collapse several FREE and FREE_PENDING_TOKEN blocks into one as is done > in the patch. Isn't there a trade-off that we need two sets of memory blocks? We will always have some non-aggressive allocations (eg. we create some vertex buffers that need to be 'free-pending'). If we need two sets of blocks for those different allocation types, it seems less efficient than having a bit of fragmentation. Personally I think having an 'aggressive free' would be best, but I think the other reviewers here should also chime in. Aggressive free could be per-transfer buffer (via different xfer buffer type), or (slightly more ugly) via a flag: glEnable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) ... // Free buffer that we know is no longer being used glDisable(GL_AGGRESSIVE_FREE_BUFFERS_HINT)
On 2013/12/19 19:27:43, epennerAtGoogle wrote: > > I think the problem with that is that we'd get a chunk with reusable and > > non-reusable blocks mixed up together, which would reduce the chance of being > > able to collapse several FREE and FREE_PENDING_TOKEN blocks into one as is > done > > in the patch. > > Isn't there a trade-off that we need two sets of memory blocks? We will always > have some non-aggressive allocations (eg. we create some vertex buffers that > need to be 'free-pending'). If we need two sets of blocks for those different > allocation types, it seems less efficient than having a bit of fragmentation. Hmm. At least for WebGL usage. For queries and vertex buffers I think the memory manager can be much more restrictive in allocation (no need to allocate a 2MB chunk), but that won't work as well for WebGL I suppose. > > Personally I think having an 'aggressive free' would be best, but I think the > other reviewers here should also chime in. With this, if I understand correctly, you mean to not FreePendingToken() in GLES2Implementation, but to Free(), for the appropriate buffers. This is something I agree will be simpler. And it will eliminate the changes to MappedMemoryManager and the FencedAllocators. > > Aggressive free could be per-transfer buffer (via different xfer buffer type), > or (slightly more ugly) via a flag: > glEnable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > ... // Free buffer that we know is no longer being used > glDisable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) Wouldn't we achieve the same by just having a GL_PIXEL_SYNC_UNPACK_TRANSFER_BUFFER_CHROMIUM (or something similar to the current GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM) that would just work the same but mark the buffer with a flag saying it can be freed aggressively? Keeping that kind of state would be easier than the somewhat more implicit state that we'd have with glEnable/glDisable.
> With this, if I understand correctly, you mean to not FreePendingToken() in > GLES2Implementation, but to Free(), for the appropriate buffers. This is > something I agree on will be simpler. And it will eliminate the changes to > MappedMemoryManager and the FencedAllocators. > > > > > Aggressive free could be per-transfer buffer (via different xfer buffer type), > > or (slightly more ugly) via a flag: > > glEnable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > ... // Free buffer that we know is no longer being used > > glDisable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > Wouldn't we achieve the same, by just having a > GL_PIXEL_SYNC_UNPACK_TRANSFER_BUFFER_CHROMIUM (or something other similar to the > current GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM) that would just work the same > but mark the buffer with a flag saying it can be free:ed aggressively? Keeping > that kind of state would be easier than the a bit more implicit state that we'd > have with glEnable/glDisable. Yes that would work too. The only issue there is it's a bit less flexible. That's fine perhaps, and maybe more clean. Currently there are a few cases that would prevent us from using that in the compositor though, and we'd need to fix those before enabling it. Lastly Sievers@ and I were chatting, and there is one last option that might be the best of all. If we tracked the 'last-usage-token' for these buffers, we could optimize aggressive deletion under the hood for all users. Rather than creating a token on unmap, we could use the 'last-usage-token', which might have already passed! (in this case we can re-use immediately) Unfortunately, async-uploads add a wrinkle to this option, as we would also need to store the 'last-async-query'. The async-query is the equivalent of tokens when using async uploads, and we would need to confirm that also passed before recycling the buffer. I'm guessing sievers/reveman will like the last option since it's completely 'automatic', with no new API required.
On 2013/12/19 22:41:03, epennerAtGoogle wrote: > > With this, if I understand correctly, you mean to not FreePendingToken() in > > GLES2Implementation, but to Free(), for the appropriate buffers. This is > > something I agree on will be simpler. And it will eliminate the changes to > > MappedMemoryManager and the FencedAllocators. > > > > > > > > Aggressive free could be per-transfer buffer (via different xfer buffer > type), > > > or (slightly more ugly) via a flag: > > > glEnable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > > ... // Free buffer that we know is no longer being used > > > glDisable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > > > Wouldn't we achieve the same, by just having a > > GL_PIXEL_SYNC_UNPACK_TRANSFER_BUFFER_CHROMIUM (or something other similar to > the > > current GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM) that would just work the > same > > but mark the buffer with a flag saying it can be free:ed aggressively? Keeping > > that kind of state would be easier than the a bit more implicit state that > we'd > > have with glEnable/glDisable. > > Yes that would work too. The only issue there is it's a bit less flexible. > That's fine perhaps, and maybe more clean. Currently there are a few cases that > would prevent us from using that in the compositor though, and we'd need to fix > those before enabling it. > > Lastly Sievers@ and I were chatting, and there is one last option that might be > the best of all. If we tracked the 'last-usage-token' for these buffers, we > could optimize aggressive deletion under the hood for all users. Rather than > creating a token on unmap, we could use the 'last-usage-token', which might have > already passed! (in this case we can re-use immediately) Unfortunately, > async-uploads add a wrinkle to this option, as we would also need to store the > 'last-async-query'. The async-query is the equivalent of tokens when using async > uploads, and we would need to confirm that also passed before recycling the > buffer. 
> > I'm guessing sievers/reveman will like the last option since it's completely > 'automatic', with no new API required. This last option sgtm. 'last-async-query' could be a query we create and manage internally whenever a buffer is used for async uploads.
On 2013/12/20 00:00:17, David Reveman wrote: > On 2013/12/19 22:41:03, epennerAtGoogle wrote: > > > With this, if I understand correctly, you mean to not FreePendingToken() in > > > GLES2Implementation, but to Free(), for the appropriate buffers. This is > > > something I agree on will be simpler. And it will eliminate the changes to > > > MappedMemoryManager and the FencedAllocators. > > > > > > > > > > > Aggressive free could be per-transfer buffer (via different xfer buffer > > type), > > > > or (slightly more ugly) via a flag: > > > > glEnable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > > > ... // Free buffer that we know is no longer being used > > > > glDisable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > > > > > Wouldn't we achieve the same, by just having a > > > GL_PIXEL_SYNC_UNPACK_TRANSFER_BUFFER_CHROMIUM (or something other similar to > > the > > > current GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM) that would just work the > > same > > > but mark the buffer with a flag saying it can be free:ed aggressively? > Keeping > > > that kind of state would be easier than the a bit more implicit state that > > we'd > > > have with glEnable/glDisable. > > > > Yes that would work too. The only issue there is it's a bit less flexible. > > That's fine perhaps, and maybe more clean. Currently there are a few cases > that > > would prevent us from using that in the compositor though, and we'd need to > fix > > those before enabling it. > > > > Lastly Sievers@ and I were chatting, and there is one last option that might > be > > the best of all. If we tracked the 'last-usage-token' for these buffers, we > > could optimize aggressive deletion under the hood for all users. Rather than > > creating a token on unmap, we could use the 'last-usage-token', which might > have > > already passed! (in this case we can re-use immediately) Unfortunately, > > async-uploads add a wrinkle to this option, as we would also need to store the > > 'last-async-query'. 
The async-query is the equivalent of tokens when using > async > > uploads, and we would need to confirm that also passed before recycling the > > buffer. Interesting idea. So, rephrasing (for clarity and to confirm I understand correctly), for transfer buffers specifically (not mapped memory used by the QueryTracker), when the buffer is unbound (either via glBindBuffer replacing the previous binding or glDeleteBuffer) we mark the buffer as "unused_pending_token". Async-upload buffers are marked with a separate "async-usage" flag which should be tracked separately. Where today we call BufferTracker::FreePendingToken(), we should call BufferTracker::Free() for non-async-upload buffers, which should just MappedMemoryManager::Free() except if the "last_usage_token" has not yet passed; then it should MappedMemoryManager::FreePendingToken() with that token instead. > > > > I'm guessing sievers/reveman will like the last option since it's completely > > 'automatic', with no new API required. > > This last option sgtm. 'last-async-query' could be a query we create and manage > internally whenever a buffer is used for async uploads. So what you mean here is more or less do what the compositor is already doing? I.e. using the query associated with |gl_upload_query_id| in ResourceProvider, but internally in GLES2Implementation. Sounds like the compositor should be able to piggy-back on that query some how, in order to not waste memory on having its own query doing the exact same thing. There is an artificial limit in GLES2Implementation only allowing one query at a time. Assuming that one would circumvent that limit (just query_tracker_->CreateQuery() etc), are there any assumptions in other places that there will only be one query alive at a time?
On 2013/12/20 15:53:06, jadahl wrote: > On 2013/12/20 00:00:17, David Reveman wrote: > > On 2013/12/19 22:41:03, epennerAtGoogle wrote: > > > > With this, if I understand correctly, you mean to not FreePendingToken() > in > > > > GLES2Implementation, but to Free(), for the appropriate buffers. This is > > > > something I agree on will be simpler. And it will eliminate the changes to > > > > MappedMemoryManager and the FencedAllocators. > > > > > > > > > > > > > > Aggressive free could be per-transfer buffer (via different xfer buffer > > > type), > > > > > or (slightly more ugly) via a flag: > > > > > glEnable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > > > > ... // Free buffer that we know is no longer being used > > > > > glDisable(GL_AGGRESSIVE_FREE_BUFFERS_HINT) > > > > > > > > Wouldn't we achieve the same, by just having a > > > > GL_PIXEL_SYNC_UNPACK_TRANSFER_BUFFER_CHROMIUM (or something other similar > to > > > the > > > > current GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM) that would just work the > > > same > > > > but mark the buffer with a flag saying it can be free:ed aggressively? > > Keeping > > > > that kind of state would be easier than the a bit more implicit state that > > > we'd > > > > have with glEnable/glDisable. > > > > > > Yes that would work too. The only issue there is it's a bit less flexible. > > > That's fine perhaps, and maybe more clean. Currently there are a few cases > > that > > > would prevent us from using that in the compositor though, and we'd need to > > fix > > > those before enabling it. > > > > > > Lastly Sievers@ and I were chatting, and there is one last option that might > > be > > > the best of all. If we tracked the 'last-usage-token' for these buffers, we > > > could optimize aggressive deletion under the hood for all users. Rather than > > > creating a token on unmap, we could use the 'last-usage-token', which might > > have > > > already passed! 
(in this case we can re-use immediately) Unfortunately, > > > async-uploads add a wrinkle to this option, as we would also need to store > the > > > 'last-async-query'. The async-query is the equivalent of tokens when using > > async > > > uploads, and we would need to confirm that also passed before recycling the > > > buffer. > > Interesting idea. So, rephrasing (for clarity and to confirm I understand > correctly), for transfer buffers specifically (not mapped memory used by the > QueryTracker), when the buffer is unbound (either via glBindBuffer replacing the > previous binding or glDeleteBuffer) we mark the buffer as > "unused_pending_token". Async-upload buffers are marked with a separate > "async-usage" flag which should be tracked separately. > > Where today we call BufferTracker::FreePendingToken(), we should call > BufferTracker::Free() for non-async-upload buffers, which should just > MappedMemoryManager::Free() except if the "last_usage_token" has not yet passed; > then it should MappedMemoryManager::FreePendingToken() with that token instead. > > > > > > > I'm guessing sievers/reveman will like the last option since it's completely > > > 'automatic', with no new API required. > > > > This last option sgtm. 'last-async-query' could be a query we create and > manage > > internally whenever a buffer is used for async uploads. > > So what you mean here is more or less do what the compositor is already doing? > I.e. using the query associated with |gl_upload_query_id| in ResourceProvider, > but internally in GLES2Implementation. Sounds like the compositor should be able > to piggy-back on that query some how, in order to not waste memory on having its > own query doing the exact same thing. > > There is an artificial limit in GLES2Implementation only allowing one query at a > time. 
Assuming that one would circumvent that limit (just > query_tracker_->CreateQuery() etc), are there any assumptions in other places > that there will only be one query alive at a time? That assumption is also made on the service side but I don't think it should be too hard to fix so different queries of different types can be active at the same time. Maybe we can use an internal query type for all async uploads, which we can reliably use to determine when a buffer can be freed, and the client facing query type can just be built on top of this.
Hi, This is a WIP, and since I'm not overly familiar with these areas I'd like to get some sanity checks before going further. What is done so far is a proof-of-concept implementation of the ideas previously discussed in this review. Buffers are now marked as unused and a token is inserted, which is later used when freeing, potentially freeing buffers directly where they would previously only be freed after an added round trip. Async uploads are tracked with a private/internal query that is identical to the already existing upload-complete query. The service side is updated to be able to deal with multiple queries at the same time, given that they have different targets. This is done by putting them in a target->query map, instead of just having a single current query.
How about we start by landing the changes to support multiple queries on the service side first? That seems useful on its own. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... File gpu/command_buffer/build_gles2_cmd_buffer.py (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... gpu/command_buffer/build_gles2_cmd_buffer.py:831: 'GL_ASYNC_PIXEL_UNPACK_COMPLETED_PRIVATE_CHROMIUM', Are clients prevented from using this somehow? Did you consider adding a new pair of Begin/EndQuery commands instead? ie. BeginInternalQueryEXT and EndInternalQueryEXT. Or a "bool private" parameter to the existing Begin/EndQuery commands. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT CommandBufferHelper { What if we enumerate all internal queries and add API to this interface that allows the mapped memory manager to handle FreePendingQuery in a way that's consistent with how FreePendingToken currently works? I guess we'd need: void WaitForQuery(int32 query); int32 last_query_completed() const; Wdyt? https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:89: bool HasTokenPassed(int32 token); why is this needed in addition to last_token_read()?
> How about we start by landing the changes to support > multiple queries on the > service side first? That seems useful on its own. Sure. I'll submit a separate code review request for that part. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... File gpu/command_buffer/build_gles2_cmd_buffer.py (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... gpu/command_buffer/build_gles2_cmd_buffer.py:831: 'GL_ASYNC_PIXEL_UNPACK_COMPLETED_PRIVATE_CHROMIUM', On 2014/01/02 01:31:29, David Reveman wrote: > Are clients prevented from using this somehow? > > Did you consider adding a new pair of Begin/EndQuery commands instead? ie. > BeginInternalQueryEXT and EndInternalQueryEXT. Or a "bool private" parameter to > the existing Begin/EndQuery commands. The idea is that this target should be private, and not allowed to be used by clients. Not sure how to enforce that rule except with error reporting. Any ideas here? What would we gain from having a "bool private"? Would it be applicable to all types of queries? Also, the assumption that one type of target only has one running query would no longer hold, and any such assumptions throughout the code would no longer be true. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT CommandBufferHelper { On 2014/01/02 01:31:29, David Reveman wrote: > What if we enumerate all internal queries and add API to this interface that > allows the mapped memory manager to handle FreePendingQuery in a way that's > consistent with how FreePendingToken currently works? > > I guess we'd need: > void WaitForQuery(int32 query); > int32 last_query_completed() const; > > Wdyt? When would we WaitForQuery?
In the current implementation buffers with a query are stored separately and not freed, then checked and freed before creating a new buffer (in order to flush out finished buffers before getting a new one). A query can take a very long time (especially if it was for an upload added very late, with a long queue in front of it), so it feels like that function could stall more than would be acceptable. The idea you propose also makes assumptions about ordering. Will such assumptions always hold, for example if there are two uploader threads (i.e. the order in which uploads complete won't necessarily be the same as the order in which they were queued)? https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:89: bool HasTokenPassed(int32 token); On 2014/01/02 01:31:29, David Reveman wrote: > why is this needed in addition to last_token_read()? No need. Will remove. No idea why I added it here in the first place :P
https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... File gpu/command_buffer/build_gles2_cmd_buffer.py (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... gpu/command_buffer/build_gles2_cmd_buffer.py:831: 'GL_ASYNC_PIXEL_UNPACK_COMPLETED_PRIVATE_CHROMIUM', On 2014/01/02 10:59:54, jadahl wrote: > On 2014/01/02 01:31:29, David Reveman wrote: > > Are clients prevented from using this somehow? > > > > Did you consider adding a new pair of Begin/EndQuery commands instead? ie. > > BeginInternalQueryEXT and EndInternalQueryEXT. Or a "bool private" parameter > to > > the existing Begin/EndQuery commands. > > The idea is that this target should be private, and not allowed to be used by > some client. Not sure how to enforce that rule except with error reporting. Any > ideas here? > > What we'd gain from having a "bool private"? Would it be applicable to all types > of queries? Also the assumption that one type of target only has one running > query would longer hold, and any such assumptions throughout the code would no > longer be true. The idea is that this would make it easy to keep it internal as we'd simply not expose it in the client API. It could be limited to ASYNC_PIXEL_UNPACK_COMPLETED for now but it would be trivial to add support for other targets if necessary. The key type for the QueryMap you introduced in this patch would instead be a std::pair<bool, GLuint>. Where "bool" is whether the query is private. Are we making more assumptions than this related to running queries? https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... 
gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT CommandBufferHelper { On 2014/01/02 10:59:54, jadahl wrote: > On 2014/01/02 01:31:29, David Reveman wrote: > > What if we enumerate all internal queries and add API to this interface that > > allows the mapped memory manager to handle FreePendingQuery in a way that's > > consistent with how FreePendingToken currently works? > > > > I guess we'd need: > > void WaitForQuery(int32 query); > > int32 last_query_completed() const; > > > > Wdyt? > > When would we WaitForQuery? In the current implementation buffers with a query > are stored separately and not free:ed, and checked and free:ed before creating a > new buffer (in order to flush out finished buffers before getting a new one). > > A query can take a very long time (especially if it was for an upload added very > late, with a long line in front of it) so it feels like that function could > stall more than what would be acceptable. Maybe we wouldn't block on queries when trying to allocate (that might be useful to limit memory usage though) but I think we'd need it during tear down of the allocator. > > The idea you propose also makes assumption of ordering. Will such assumptions > always hold, for example if there are two uploader threads (i.e. ordering of > uploads completed won't necessarily be the same as uploads queued). Async upload commands are guaranteed to complete in sequential order. The compositor is already making assumptions based on this so I think it'd be OK to make the same assumption internally as well.
On 2014/01/02 11:56:43, David Reveman wrote: > https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... > File gpu/command_buffer/build_gles2_cmd_buffer.py (right): > > https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/build... > gpu/command_buffer/build_gles2_cmd_buffer.py:831: > 'GL_ASYNC_PIXEL_UNPACK_COMPLETED_PRIVATE_CHROMIUM', > On 2014/01/02 10:59:54, jadahl wrote: > > On 2014/01/02 01:31:29, David Reveman wrote: > > > Are clients prevented from using this somehow? > > > > > > Did you consider adding a new pair of Begin/EndQuery commands instead? ie. > > > BeginInternalQueryEXT and EndInternalQueryEXT. Or a "bool private" parameter > > to > > > the existing Begin/EndQuery commands. > > > > The idea is that this target should be private, and not allowed to be used by > > some client. Not sure how to enforce that rule except with error reporting. > Any > > ideas here? > > > > What we'd gain from having a "bool private"? Would it be applicable to all > types > > of queries? Also the assumption that one type of target only has one running > > query would longer hold, and any such assumptions throughout the code would no > > longer be true. > > The idea is that this would make it easy to keep it internal as we'd simply not > expose it in the client API. It could be limited to ASYNC_PIXEL_UNPACK_COMPLETED > for now but it would be trivial to add support for other targets if necessary. > > The key type for the QueryMap you introduced in this patch would instead be a > std::pair<bool, GLuint>. Where "bool" is whether the query is private. Are we > making more assumptions than this related to running queries? The map might not be the problem, I was more suspecting that there might be assumptions elsewhere. It did work when I tested with UNPACK_COMPLETED (well, not the same target, but more or less so). Anyhow, I can give it a try and see if I run into any blocker. 
> > https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... > File gpu/command_buffer/client/cmd_buffer_helper.h (right): > > https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... > gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT > CommandBufferHelper { > On 2014/01/02 10:59:54, jadahl wrote: > > On 2014/01/02 01:31:29, David Reveman wrote: > > > What if we enumerate all internal queries and add API to this interface that > > > allows the mapped memory manager to handle FreePendingQuery in a way that's > > > consistent with how FreePendingToken currently works? > > > > > > I guess we'd need: > > > void WaitForQuery(int32 query); > > > int32 last_query_completed() const; > > > > > > Wdyt? > > > > When would we WaitForQuery? In the current implementation buffers with a query > > are stored separately and not free:ed, and checked and free:ed before creating > a > > new buffer (in order to flush out finished buffers before getting a new one). > > > > A query can take a very long time (especially if it was for an upload added > very > > late, with a long line in front of it) so it feels like that function could > > stall more than what would be acceptable. > > Maybe we wouldn't block on queries when trying to allocate (that might be useful > to limit memory usage though) but I think we'd need it during tear down of the > allocator. Right, for that case it would make sense, true. > > > > > The idea you propose also makes assumption of ordering. Will such assumptions > > always hold, for example if there are two uploader threads (i.e. ordering of > > uploads completed won't necessarily be the same as uploads queued). > > Async upload commands are guaranteed to complete in sequential order. The > compositor is already making assumptions based on this so I think it'd OK to > make the same assumption internally as well. I see, then I guess that won't be an issue now then. 
It would complicate things even more than it already does if this ever changed, though.
https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT CommandBufferHelper { On 2014/01/02 11:56:44, David Reveman wrote: > On 2014/01/02 10:59:54, jadahl wrote: > > On 2014/01/02 01:31:29, David Reveman wrote: > > > What if we enumerate all internal queries and add API to this interface that > > > allows the mapped memory manager to handle FreePendingQuery in a way that's > > > consistent with how FreePendingToken currently works? > > > > > > I guess we'd need: > > > void WaitForQuery(int32 query); > > > int32 last_query_completed() const; > > > > > > Wdyt? > > > > When would we WaitForQuery? In the current implementation buffers with a query > > are stored separately and not free:ed, and checked and free:ed before creating > a > > new buffer (in order to flush out finished buffers before getting a new one). > > > > A query can take a very long time (especially if it was for an upload added > very > > late, with a long line in front of it) so it feels like that function could > > stall more than what would be acceptable. > > Maybe we wouldn't block on queries when trying to allocate (that might be useful > to limit memory usage though) but I think we'd need it during tear down of the > allocator. Thinking about tear down, is it really needed to wait for queries at that stage? Since the command buffer is going down, there won't be more drawing going through it, so even if we unmap while the uploader is still using it, it won't cause reads of freed memory, as the service won't have unmapped it yet. In other words, with the assumption that buffers still alive at the time of destruction won't be used by anything anywhere, can't we just free and unmap without waiting for queries? 
> > > > > The idea you propose also makes assumption of ordering. Will such assumptions > > always hold, for example if there are two uploader threads (i.e. ordering of > > uploads completed won't necessarily be the same as uploads queued). > > Async upload commands are guaranteed to complete in sequential order. The > compositor is already making assumptions based on this so I think it'd OK to > make the same assumption internally as well. Thinking some more about this, the idea I had was to not spread out async upload logic outside of gles2_implementation.cc. Wouldn't it be better to limit the number of units that need to care about queries? For example maybe limit the exposure to gles2_implementation.cc and buffer_tracker.cc instead of adding more state to lower levels (mapped memory, fenced allocator). Maybe we could add enumeration to internal queries via this helper as you proposed but have the buffer tracker manage removed-but-pending-query buffers instead of the mapped memory manager (i.e. fenced allocator (wrapper))?
On 2014/01/03 14:13:27, jadahl wrote: > https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... > File gpu/command_buffer/client/cmd_buffer_helper.h (right): > > https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... > gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT > CommandBufferHelper { > On 2014/01/02 11:56:44, David Reveman wrote: > > On 2014/01/02 10:59:54, jadahl wrote: > > > On 2014/01/02 01:31:29, David Reveman wrote: > > > > What if we enumerate all internal queries and add API to this interface > that > > > > allows the mapped memory manager to handle FreePendingQuery in a way > that's > > > > consistent with how FreePendingToken currently works? > > > > > > > > I guess we'd need: > > > > void WaitForQuery(int32 query); > > > > int32 last_query_completed() const; > > > > > > > > Wdyt? > > > > > > When would we WaitForQuery? In the current implementation buffers with a > query > > > are stored separately and not free:ed, and checked and free:ed before > creating > > a > > > new buffer (in order to flush out finished buffers before getting a new > one). > > > > > > A query can take a very long time (especially if it was for an upload added > > very > > > late, with a long line in front of it) so it feels like that function could > > > stall more than what would be acceptable. > > > > Maybe we wouldn't block on queries when trying to allocate (that might be > useful > > to limit memory usage though) but I think we'd need it during tear down of the > > allocator. > > Thinking about tear down, is it really needed to wait for queries at that stage? > Since the command buffer is going down, there won't be more drawing going > through it, so even if we unmap while the uploader is still using it, it wont > cause reading free:ed memory as the service will yet to have unmapped. 
> > In other words, with the assumptions that buffers that are still alive at the > time of destruction won't be used by anything anywhere, can't we just free and > unmap without waiting for queries? Maybe but I think we'd have to wait for a token instead then to guarantee that we're not producing any errors. I suggest that we do whatever is easiest for now. Tear down is not really performance critical. > > > > > > > > > The idea you propose also makes assumption of ordering. Will such > assumptions > > > always hold, for example if there are two uploader threads (i.e. ordering of > > > uploads completed won't necessarily be the same as uploads queued). > > > > Async upload commands are guaranteed to complete in sequential order. The > > compositor is already making assumptions based on this so I think it'd OK to > > make the same assumption internally as well. > > Thinking some more about this, the idea I had was to not spread out async upload > logic outside of gles2_implementation.cc. Wouldn't it be better to limit the > number of units that need to care about queries? For example maybe limit the > exposure to gles2_implementation.cc and buffer_tracker.cc instead of adding more > state to lower levels (mapped memory, fenced allocator). > > Maybe we could add enumeration to internal queries via this helper as you > proposed but have the buffer tracker manage removed-but-pending-query buffers > instead of the mapped memory manager (i.e. fenced allocator (wrapper))? I think the query API and the details of how to use it should be limited to gles2_implementation.cc. As far as buffer_tracker.cc and mapped memory manager cares it's just a sequential number, like the tokens we already have. I think it's best to handle all this in a way that's consistent with how tokens are used with buffer objects. It will be harder to understand and control the memory usage if buffers can be temporarily held on to by gles2_implementation.cc.
Thanks for working on this :) A couple of comments inline. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/buffer_tracker.h:115: int32 unused_token_; Does this mean: "The buffer is unused after this token passes"? I mildly prefer "last_usage_token" or something like that. Similarly, I mildly prefer "last_async_query_id". https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:89: bool HasTokenPassed(int32 token); On 2014/01/02 10:59:54, jadahl wrote: > On 2014/01/02 01:31:29, David Reveman wrote: > > why is this needed in addition to last_token_read()? > > No need. Will remove. No idea why I added it herein the first place :P Just FYI, I actually like this (as an inline function though). Even minor reductions in logic make code clearer and less likely to have bugs. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/gles2_implementation.cc:1545: void GLES2Implementation::CheckBuffersPendingAsyncComplete() { Hmm, it's unfortunate that we have to check all of these every time we create a buffer. Uploads are guaranteed to complete in order, but I guess we can't use that assumption to break out of the loop, since there is no guarantee they are deleted in order... I'm not sure how to solve this yet, but I'm hesitant about adding a loop over all pending buffers.
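For illustration, the inline convenience being discussed can be expressed entirely in terms of last_token_read(). This is a hedged sketch, not the actual CommandBufferHelper code; it is written as a free function and ignores token wraparound for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: a token has passed once the service's last read token has
// reached it. In the real helper this would be an inline member that
// compares against last_token_read().
inline bool HasTokenPassed(int32_t token, int32_t last_token_read) {
  return token <= last_token_read;
}
```

Keeping this as a tiny inline helper costs nothing at runtime while removing a repeated comparison from call sites.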
https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/buffer_tracker.h:115: int32 unused_token_; On 2014/01/07 00:47:59, epennerAtGoogle wrote: > Does this mean: "The buffer is unused after this token passes"? I mildly prefer > "last_usage_token" or something like that. Similarly, I mildly prefer > "last_async_query_id". +1 https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/gles2_implementation.cc:1545: void GLES2Implementation::CheckBuffersPendingAsyncComplete() { On 2014/01/07 00:47:59, epennerAtGoogle wrote: > Hmm, it's unfortunate that we have to check all of these every time we create a > buffer. > > Uploads are guaranteed to complete in order, but I guess we can't use that > assumption to break out of the loop, since there is no guarantee they are > deleted in order... I'm not sure how to solve this yet but I'm hesitant of > adding a loop over all pending buffers. you could just keep the list sorted based on query age, which would be the case if we follow the suggestion I made earlier about enumerating queries and handle the release of buffers in the mapped memory manager consistent with how we free based on passed tokens.
https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/buffer_tracker.h:115: int32 unused_token_; On 2014/01/07 06:32:06, David Reveman wrote: > On 2014/01/07 00:47:59, epennerAtGoogle wrote: > > Does this mean: "The buffer is unused after this token passes"? I mildly > prefer > > "last_usage_token" or something like that. Similarly, I mildly prefer > > "last_async_query_id". > > +1 last_usage_token np. last_async_query_id might be confusing since it's the *active* async query. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/gles2_implementation.cc:1545: void GLES2Implementation::CheckBuffersPendingAsyncComplete() { On 2014/01/07 06:32:06, David Reveman wrote: > On 2014/01/07 00:47:59, epennerAtGoogle wrote: > > Hmm, it's unfortunate that we have to check all of these every time we create > a > > buffer. > > > > Uploads are guaranteed to complete in order, but I guess we can't use that > > assumption to break out of the loop, since there is no guarantee they are > > deleted in order... I'm not sure how to solve this yet but I'm hesitant of > > adding a loop over all pending buffers. > > you could just keep the list sorted based on query age, which would be the case > if we follow the suggestion I made earlier about enumerating queries and handle > the release of buffers in the mapped memory manager consistent with how we free > based on passed tokens. The enumeration will improve this situation indeed.
We will want unit tests for the new behavior. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT CommandBufferHelper { On 2014/01/02 01:31:29, David Reveman wrote: > What if we enumerate all internal queries and add API to this interface that > allows the mapped memory manager to handle FreePendingQuery in a way that's > consistent with how FreePendingToken currently works? > > I guess we'd need: > void WaitForQuery(int32 query); > int32 last_query_completed() const; > > Wdyt? I would prefer if GL concepts didn't leak into CommandBufferHelper which is GL-agnostic. https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/gles2_implementation.cc:1522: if (helper_->HasTokenPassed(token)) Maybe this logic (if the token is passed, free immediately) makes sense as low as possible, maybe FencedAllocator. That avoids some duplication of logic, and might also hit some other cases.
https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/70001/gpu/command_buffer/clien... gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT CommandBufferHelper { On 2014/01/08 05:08:04, piman (OOO back 2014-1-7) wrote: > On 2014/01/02 01:31:29, David Reveman wrote: > > What if we enumerate all internal queries and add API to this interface that > > allows the mapped memory manager to handle FreePendingQuery in a way that's > > consistent with how FreePendingToken currently works? > > > > I guess we'd need: > > void WaitForQuery(int32 query); > > int32 last_query_completed() const; > > > > Wdyt? > > I would prefer if GL concepts didn't leak into CommandBufferHelper which is > GL-agnostic. Makes sense. What if we just call this sequence number an "async command token" or just an "async token"? That's really what it is: a sequence number representing a position in our async command processing pipeline. The fact that GL queries are used by GLImplementation to determine this does not need to be exposed here.
On Tue, Jan 7, 2014 at 9:56 PM, <reveman@chromium.org> wrote: > > https://codereview.chromium.org/116863003/diff/70001/gpu/ > command_buffer/client/cmd_buffer_helper.h > File gpu/command_buffer/client/cmd_buffer_helper.h (right): > > https://codereview.chromium.org/116863003/diff/70001/gpu/ > command_buffer/client/cmd_buffer_helper.h#newcode35 > gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT > CommandBufferHelper { > On 2014/01/08 05:08:04, piman (OOO back 2014-1-7) wrote: > >> On 2014/01/02 01:31:29, David Reveman wrote: >> > What if we enumerate all internal queries and add API to this >> > interface that > >> > allows the mapped memory manager to handle FreePendingQuery in a way >> > that's > >> > consistent with how FreePendingToken currently works? >> > >> > I guess we'd need: >> > void WaitForQuery(int32 query); >> > int32 last_query_completed() const; >> > >> > Wdyt? >> > > I would prefer if GL concepts didn't leak into CommandBufferHelper >> > which is > >> GL-agnostic. >> > > Makes sense. What if we just call this sequence number a "async command > token" or just "async token"? As that's really what it is, a sequence > number representing a position in our async command processing pipeline. > The fact that GL queries are used by GLImplementation to determine this, > does not need to be exposed here. > That's fine in principle, but keep in mind that CommandBuffer doesn't have a notion of what this async token is, and can't update it by itself, so I'm not sure how you would implement WaitForAsyncToken at this level. > > https://codereview.chromium.org/116863003/
On 2014/01/08 06:03:35, piman (OOO back 2014-1-7) wrote: > On Tue, Jan 7, 2014 at 9:56 PM, <mailto:reveman@chromium.org> wrote: > > > > > https://codereview.chromium.org/116863003/diff/70001/gpu/ > > command_buffer/client/cmd_buffer_helper.h > > File gpu/command_buffer/client/cmd_buffer_helper.h (right): > > > > https://codereview.chromium.org/116863003/diff/70001/gpu/ > > command_buffer/client/cmd_buffer_helper.h#newcode35 > > gpu/command_buffer/client/cmd_buffer_helper.h:35: class GPU_EXPORT > > CommandBufferHelper { > > On 2014/01/08 05:08:04, piman (OOO back 2014-1-7) wrote: > > > >> On 2014/01/02 01:31:29, David Reveman wrote: > >> > What if we enumerate all internal queries and add API to this > >> > > interface that > > > >> > allows the mapped memory manager to handle FreePendingQuery in a way > >> > > that's > > > >> > consistent with how FreePendingToken currently works? > >> > > >> > I guess we'd need: > >> > void WaitForQuery(int32 query); > >> > int32 last_query_completed() const; > >> > > >> > Wdyt? > >> > > > > I would prefer if GL concepts didn't leak into CommandBufferHelper > >> > > which is > > > >> GL-agnostic. > >> > > > > Makes sense. What if we just call this sequence number a "async command > > token" or just "async token"? As that's really what it is, a sequence > > number representing a position in our async command processing pipeline. > > The fact that GL queries are used by GLImplementation to determine this, > > does not need to be exposed here. > > > > That's fine in principle, but keep in mind that CommandBuffer doesn't have > a notion of what this async token is, and can't update it by itself, so I'm > not sure how you would implement WaitForAsyncToken at this level. > I don't think we should use the term token here, as it doesn't work as the other tokens do. I'd prefer something making the difference obvious, such as naming it serial, or sequence number. Regarding WaitFor(AsyncToken|Serial), do we really need to? 
That we now track async uploads internally doesn't change current usage, where the upload is tracked externally. What is the reason to start waiting for these uploads to complete when we didn't wait for them before? FencedAllocator would then have FREE_PENDING_TOKEN and FREE_PENDING_SERIAL, where a serial is defined as a sequential number that has nothing to do with tokens and round trips, and that should be ignored on tear-down. When the serial is synced, FencedAllocator would change the state to either FREE_PENDING_TOKEN or FREE, depending on whether a token was set that has passed.
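The state transition described above could look roughly like this sketch. The names (Block, BlockState, UpdateBlockState) are illustrative, not the actual FencedAllocator API, and wraparound of tokens/serials is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative block states: a block pending an async serial can become
// FREE directly, or fall back to waiting on its last-usage token.
enum class BlockState { IN_USE, FREE_PENDING_SERIAL, FREE_PENDING_TOKEN, FREE };

struct Block {
  BlockState state;
  int32_t token;   // Last-usage token, valid in FREE_PENDING_* states.
  int32_t serial;  // Async upload serial, valid in FREE_PENDING_SERIAL.
};

// Once the serial has synced, free the block if its token already passed,
// otherwise keep waiting on the token only.
void UpdateBlockState(Block* block,
                      int32_t last_serial_read,
                      int32_t last_token_read) {
  if (block->state == BlockState::FREE_PENDING_SERIAL &&
      block->serial <= last_serial_read) {
    block->state = (block->token <= last_token_read)
                       ? BlockState::FREE
                       : BlockState::FREE_PENDING_TOKEN;
  } else if (block->state == BlockState::FREE_PENDING_TOKEN &&
             block->token <= last_token_read) {
    block->state = BlockState::FREE;
  }
}
```

The key property is that a block is never reused while either its async upload or its last synchronous use could still touch the memory.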
Hi, I uploaded a WIP version of what we have been discussing here. This version introduces a new state to the command buffer, "serial", that is similar to a token except that how it is updated is implementation specific. In the GLES2 command buffer client/service, the serial is used to enumerate internal queries and is synchronized by the service when the query is completed. Only async uploads can use serials for now. Note that this is a WIP and I'm uploading it to get input. There are no unit tests yet, but whenever we have a design that we can agree upon I will add tests. There are some implications that should be discussed (that I can think of now): * Serials are implemented partly in the command buffer, partly by the user. An idea I had here was to introduce something similar to QuerySync, i.e. shared memory that the client can read without going through the command buffer. The serial would then be "set" by GLES2Implementation on the command buffer helper so that the fenced allocator could read the last read serial. * Serials are updated by posting a closure to the correct thread: extra latency. Could this be an issue? Could possibly be fixed by QuerySync-type shared memory. * The fenced allocator doesn't wait for serials to free blocks. I can't see that we do that now, so is it a problem? * Waiting for both a serial and a token isn't supported yet. Should it be? Jonas
https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/buffer_tracker.h:79: void set_used(bool used) { What does used mean exactly? https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/fenced_allocator.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/fenced_allocator.cc:52: CollapseFreeBlock(i); This is probably ok in the current use, if we can assume that the FencedAllocator is destroyed as we delete the SHM it manages allocations on. In that case it would actually be ok to not WaitForToken in the FREE_PENDING_TOKEN case, either. We should document the behavior though. It does mean that the user can't reuse the SHM (e.g. if we keep a pool of open SHMs) without a Finish(). Alternatively, I suppose we could extract the set of serials/tokens to wait for, and let the client deal with waiting - though we don't have a way to wait-for-a-serial, I'm not sure the client could really do anything. Mmh. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/gles2_implementation.cc:1384: } Where does the last_usage_token get set if it's not bound to GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/gles2_implementation.cc:2429: BufferTracker::Buffer* buffer; nit: = NULL https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... 
gpu/command_buffer/client/gles2_implementation.cc:3713: buffer->set_async_query_id(id); What if we do several AsyncTexImage2DCHROMIUM on the same buffer? https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/query_tracker.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/query_tracker.cc:279: return NextSerial(); In case of wraparound we need to Finish(). The rest of the code assumes last_read_serial > |serial| means serial has passed. It would not be true with this code, after wraparound. We do Finish when tokens wrap (see CommandBufferHelper::InsertToken). If we're worried that we might actually wrap in practice and don't want to Finish() in that case, we can do something like double-buffer the range, and only call Finish() if the last_read_serial has the same high bit as the new serial, when that high bit changes. That should not happen in practice (unless we can have 2^31 queries actually in flight). https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... gpu/command_buffer/service/query_manager.cc:74: manager_->decoder()->engine()->set_serial(serial_); So, the rest of the logic assumes serials get set on the CommandBuffer in the same order they are created. Is it currently true? (assumes queries always complete in exactly the same order as they are created). How can we defend that in the future, if someone wants to extend the serial to other queries?
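One reading of the high-bit idea above, sketched as a standalone check (the function names are illustrative, and this is only an interpretation of the suggestion, not code from the patch): when the newly issued serial flips the high bit, a Finish() is only needed if last_read_serial is still a full half-range behind, i.e. it already shares the new serial's high bit from the previous lap.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helpers for the wraparound guard discussed above.
inline bool HighBit(uint32_t v) { return (v & 0x80000000u) != 0; }

// Returns true when issuing |new_serial| would make "<=" comparisons against
// |last_read_serial| ambiguous: the high bit just flipped, yet the reader is
// still in the half-range that the new serial is entering (~2^31 serials
// in flight).
bool NeedsFinishOnWrap(uint32_t prev_serial,
                       uint32_t new_serial,
                       uint32_t last_read_serial) {
  bool high_bit_changed = HighBit(prev_serial) != HighBit(new_serial);
  return high_bit_changed &&
         HighBit(last_read_serial) == HighBit(new_serial);
}
```

As the comment in the thread notes, the dangerous case should never occur in practice, so the Finish() would be a safety net rather than a hot path.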
https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/buffer_tracker.h:79: void set_used(bool used) { On 2014/01/11 02:02:32, piman wrote: > What does used mean exactly? It's meant to keep track of whether a buffer is used, so the user can properly insert a last-usage token. set_bound/bound might be better names, or maybe it can be removed, not sure. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/11 02:02:32, piman wrote: > Where does the last_usage_token get set if it's not bound to > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? Right, should do the same for other buffer targets in BindBufferHelper. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/gles2_implementation.cc:3713: buffer->set_async_query_id(id); On 2014/01/11 02:02:32, piman wrote: > What if we do several AsyncTexImage2DCHROMIUM on the same buffer? That would not work, true. I guess it could work by associating a buffer with a query serial instead, and if a later AsyncTexImage2DCHROMIUM occurred, it'd be updated with the new serial. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... gpu/command_buffer/service/query_manager.cc:74: manager_->decoder()->engine()->set_serial(serial_); On 2014/01/11 02:02:32, piman wrote: > So, the rest of the logic assumes serials get set on the CommandBuffer in the > same order they are created. 
Is it currently true? (assumes queries always > complete in exactly the same order as they are created). How can we defend for > that in the future, if someone wants to extend the serial to other queries? AFAIK uploads complete in order, so queries should as well, and this function should likewise be invoked in order. Usage in parallel with async uploads is problematic and somewhat of a limitation of this approach, as such uses would have to follow the order of uploads.
If we're plumbing this all the way to the service side, would it make more sense to just add a real "async pixel transfer token" rather than using queries? So basically add a "SetAsyncPixelTransferToken" command that is the same as "SetToken" except that it needs to be processed by the async pixel transfer manager in sequence with async pixel transfers. On the client side, we'd simply call InsertAsyncPixelTransferToken() in a way that's consistent with how InsertToken() is used. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/fenced_allocator.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/fenced_allocator.cc:228: block.serial <= last_serial_read) { maybe use a switch statement here and above https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/11 11:35:29, jadahl wrote: > On 2014/01/11 02:02:32, piman wrote: > > Where does the last_usage_token get set if it's not bound to > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > Right, should do the same for other buffer targets in BindBufferHelper. What if we set last usage token when issuing the command that will use the buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of set_used() and the token would be more tightly packed with the usage. Also more consistent with how we handle AsyncTexImage2D. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... gpu/command_buffer/service/query_manager.cc:62: this)); Hm, I think this will be a performance problem. 
We used to post tasks back to the main thread but we switched to writing directly to the shared memory from the transfer thread instead. Two reasons for doing this: 1. The main gpu thread might be busy so it can take a while for it to process the task. 2. And I think even more importantly, posting a task after each transfer results in a lot of context-switching overhead. This is because the transfer thread has lower priority, so posting a task to the main thread will consistently cause the OS to switch to that thread and process the task. I think we need to write directly to some shared memory from here instead of posting a task.
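A minimal sketch of the direct shared-memory approach: the transfer thread publishes the highest completed async serial with a release store, and the client polls it with an acquire load. The struct layout and names here are assumptions for illustration (the eventual patch's AsyncUploadSync may differ), and serial wraparound is ignored.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Sketch of a shared-memory sync point for async upload serials.
struct AsyncUploadSync {
  std::atomic<uint32_t> last_completed_serial{0};

  // Called on the transfer thread after an upload finishes. Release order
  // ensures the upload's memory writes are visible before the serial is.
  void SetCompletedSerial(uint32_t serial) {
    last_completed_serial.store(serial, std::memory_order_release);
  }

  // Called on the client side; acquire order pairs with the release store,
  // so a passed serial implies the upload's writes are visible too.
  bool HasSerialPassed(uint32_t serial) const {
    return last_completed_serial.load(std::memory_order_acquire) >= serial;
  }
};
```

This avoids both problems listed above: no task is queued on the busy gpu thread, and the low-priority transfer thread never forces a context switch just to publish a counter.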
On 2014/01/11 23:39:03, David Reveman wrote: > If we're plumbing this all the way to the service side, would it make more sense > to just add a real "async pixel transfer token" rather than using queries? > > So basically add a "SetAsyncPixelTransferToken" command that is the same as > "SetToken" except that it needs to be processed by the async pixel transfer > manager in sequence with async pixel transfers. On the client side, we'd simply > call InsertAsyncPixelTransferToken() in a way that's consistent with how > InsertToken() is used. Correct me if I'm wrong, but the concern with this was that it'd add GL and query logic to the command buffer. At the same time, this patch already introduces such logic on the service side, except that it doesn't call it anything related to pixels or uploads. Maybe what would be best is to introduce a generic async token API to the command buffer that the user may use in any way, and if multiple enumerated tokens would be used in the future, it'd only be another enum (or array index or something) separating them as far as the command buffer is concerned. > > https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... > File gpu/command_buffer/client/fenced_allocator.cc (right): > > https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... > gpu/command_buffer/client/fenced_allocator.cc:228: block.serial <= > last_serial_read) { > maybe use a switch statement here and above > > https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... > File gpu/command_buffer/client/gles2_implementation.cc (right): > > https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... > gpu/command_buffer/client/gles2_implementation.cc:1384: } > On 2014/01/11 11:35:29, jadahl wrote: > > On 2014/01/11 02:02:32, piman wrote: > > > Where does the last_usage_token get set if it's not bound to > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? 
> > > > Right, should do the same for other buffer targets in BindBufferHelper. > > What if we set last usage token when issuing the command that will use the > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of > set_used() and the token would be more tightly packed with the usage. Also more > consistent with how we handle AsyncTexImage2D. > > https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... > File gpu/command_buffer/service/query_manager.cc (right): > > https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... > gpu/command_buffer/service/query_manager.cc:62: this)); > Hm, I think this will be a performance problem. We used to post tasks back to > the main thread but we switched to writing directly to the shared memory from > the transfer thread instead. Two reason for doing this: > > 1. The main gpu thread might be busy so it can take awhile for it to process the > task. > > 2. And I think even more importantly, posting a task after each transfer results > in a lot of context switching overhead. This is because the transfer thread has > lower priority so posting a task to the main thread will consistently cause the > OS to switch to that thread and process the task. > > I think we need to write directly to some shared memory from here instead of > posting a task.
https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/clie... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/11 23:39:04, David Reveman wrote: > On 2014/01/11 11:35:29, jadahl wrote: > > On 2014/01/11 02:02:32, piman wrote: > > > Where does the last_usage_token get set if it's not bound to > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > > > Right, should do the same for other buffer targets in BindBufferHelper. > > What if we set last usage token when issuing the command that will use the > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of > set_used() and the token would be more tightly packed with the usage. Also more > consistent with how we handle AsyncTexImage2D. I think this sounds like a good idea. https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... gpu/command_buffer/service/query_manager.cc:62: this)); On 2014/01/11 23:39:04, David Reveman wrote: > Hm, I think this will be a performance problem. We used to post tasks back to > the main thread but we switched to writing directly to the shared memory from > the transfer thread instead. Two reason for doing this: > > 1. The main gpu thread might be busy so it can take awhile for it to process the > task. > > 2. And I think even more importantly, posting a task after each transfer results > in a lot of context switching overhead. This is because the transfer thread has > lower priority so posting a task to the main thread will consistently cause the > OS to switch to that thread and process the task. > > I think we need to write directly to some shared memory from here instead of > posting a task. 
This is what I did initially, but changed to posting as I hadn't done it in a thread safe manner. Is there a way to write to the command buffer state safely directly from the upload thread?
On 2014/01/12 09:51:56, jadahl wrote: > On 2014/01/11 23:39:03, David Reveman wrote: > > If we're plumbing this all the way to the service side, would it make more > sense > > to just add a real "async pixel transfer token" rather than using queries? > > > > So basically add a "SetAsyncPixelTransferToken" command that is the same as > > "SetToken" except that it needs to be processed by the async pixel transfer > > manager in sequence with async pixel transfers. On the client side, we'd > simply > > call InsertAsyncPixelTransferToken() in a way that's consistent with how > > InsertToken() is used. > > Correct me if I'm wrong, but the concern with this was that it'd add GL and > query logic to the command buffer. At the same time, this patch already > introduces such logic on the service side, except that it doesn't call it > anything related to pixels or uploads. It would be completely separate from queries and other GL concepts and simply a mechanism for efficiently determining what the latest processed async command is. The only thing we'd introduce at the command buffer level is the notion of commands that are processed asynchronously on a separate thread. I think that'd be OK. piman@, wdyt? > > Maybe what would be best is to introduce a generic async token API to the > command buffer that the user may use in any way, and if multiple enumerated > tokens would be used in the future, it'd only be another enum (or array index or > something) separating them as far as the command buffer is concerned. We'd only need different types of async tokens if all async commands are not guaranteed to be processed in sequence. I don't think we'd want to change that.
https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... File gpu/command_buffer/service/query_manager.cc (right): https://codereview.chromium.org/116863003/diff/330001/gpu/command_buffer/serv... gpu/command_buffer/service/query_manager.cc:62: this)); On 2014/01/12 09:52:24, jadahl wrote: > On 2014/01/11 23:39:04, David Reveman wrote: > > Hm, I think this will be a performance problem. We used to post tasks back to > > the main thread but we switched to writing directly to the shared memory from > > the transfer thread instead. Two reason for doing this: > > > > 1. The main gpu thread might be busy so it can take awhile for it to process > the > > task. > > > > 2. And I think even more importantly, posting a task after each transfer > results > > in a lot of context switching overhead. This is because the transfer thread > has > > lower priority so posting a task to the main thread will consistently cause > the > > OS to switch to that thread and process the task. > > > > I think we need to write directly to some shared memory from here instead of > > posting a task. > > This is what I did initially, but changed to posting as I hadn't done it in a > thread safe manner. Is there a way to write to the command buffer state safely > directly from the upload thread? You'll probably need to add a mutex to do that safely. FYI, if we go the async token route then we'd want to handle this at the decoder level instead.
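For reference, the mutex-based variant being discussed could look roughly like the sketch below. TokenState and its methods are illustrative names, not the actual CommandBufferService interface: the upload thread publishes the async token under a lock, and the client-facing getter snapshots it under the same lock.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of the mutex suggestion; TokenState is an
// illustrative stand-in, not the real CommandBufferService API.
class TokenState {
 public:
  // Called from the transfer (upload) thread when an async upload completes.
  void SetAsyncToken(uint32_t token) {
    std::lock_guard<std::mutex> hold(lock_);
    async_token_ = token;
  }

  // Called from the thread that services client GetState requests.
  uint32_t GetAsyncToken() {
    std::lock_guard<std::mutex> hold(lock_);
    return async_token_;
  }

 private:
  std::mutex lock_;  // protects |async_token_| across the two threads
  uint32_t async_token_ = 0;
};
```

As noted above, if the async-token route is taken instead, this synchronization would live at the decoder level rather than in a query callback.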
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/12 09:52:24, jadahl wrote: > On 2014/01/11 23:39:04, David Reveman wrote: > > On 2014/01/11 11:35:29, jadahl wrote: > > > On 2014/01/11 02:02:32, piman wrote: > > > > Where does the last_usage_token get set if it's not bound to > > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > > > > > Right, should do the same for other buffer targets in BindBufferHelper. > > > > What if we set last usage token when issuing the command that will use the > > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of > > set_used() and the token would be more tightly packed with the usage. Also > more > > consistent with how we handle AsyncTexImage2D. > > I think this sounds like a good idea. Hmm, is what you mean to do this for the functions that deal with transfer buffers? i.e. UnmapBufferCHROMIUM, (Compressed)(Sub)TexImage2D? and not DrawArrays, which does not deal with pixel transfer buffers at all.
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/14 14:15:47, jadahl wrote: > On 2014/01/12 09:52:24, jadahl wrote: > > On 2014/01/11 23:39:04, David Reveman wrote: > > > On 2014/01/11 11:35:29, jadahl wrote: > > > > On 2014/01/11 02:02:32, piman wrote: > > > > > Where does the last_usage_token get set if it's not bound to > > > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > > > > > > > Right, should do the same for other buffer targets in BindBufferHelper. > > > > > > What if we set last usage token when issuing the command that will use the > > > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of > > > set_used() and the token would be more tightly packed with the usage. Also > > more > > > consistent with how we handle AsyncTexImage2D. > > > > I think this sounds like a good idea. > > Hmm, is what you mean to do this for the functions that deal with transfer > buffers? i.e. UnmapBufferCHROMIUM, (Compressed)(Sub)TexImage2D? and not > DrawArrays, which does not deal with pixel transfer buffers at all. Not only transfer buffers but all buffer objects so that you can get rid of set_used() completely. So both (Compressed)(Sub)TexImage2D and DrawArrays, as well as any other command that can use a buffer object.
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/14 15:27:20, David Reveman wrote: > On 2014/01/14 14:15:47, jadahl wrote: > > On 2014/01/12 09:52:24, jadahl wrote: > > > On 2014/01/11 23:39:04, David Reveman wrote: > > > > On 2014/01/11 11:35:29, jadahl wrote: > > > > > On 2014/01/11 02:02:32, piman wrote: > > > > > > Where does the last_usage_token get set if it's not bound to > > > > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > > > > > > > > > Right, should do the same for other buffer targets in BindBufferHelper. > > > > > > > > What if we set last usage token when issuing the command that will use the > > > > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of > > > > set_used() and the token would be more tightly packed with the usage. Also > > > more > > > > consistent with how we handle AsyncTexImage2D. > > > > > > I think this sounds like a good idea. > > > > Hmm, is what you mean to do this for the functions that deal with transfer > > buffers? i.e. UnmapBufferCHROMIUM, (Compressed)(Sub)TexImage2D? and not > > DrawArrays, which does not deal with pixel transfer buffers at all. > > Not only transfer buffers but all buffer objects so that you can get rid of > set_used() completely. So both (Compressed)(Sub)TexImage2D and DrawArrays, as > well as any other command that can use a buffer object. Sorry, DrawArrays doesn't use a buffer object as we don't have shared memory VBOs. The general idea is that any command that will use the contents of a buffer object on the service side needs a set_last_usage_token call. ie. BufferSubData in case of VBOs and (Compressed)(Sub)TexImage2D in case of pixel transfer buffers. Think this will also allow us to remove set_transfer_ready_token().
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/14 15:54:26, David Reveman wrote: > On 2014/01/14 15:27:20, David Reveman wrote: > > On 2014/01/14 14:15:47, jadahl wrote: > > > On 2014/01/12 09:52:24, jadahl wrote: > > > > On 2014/01/11 23:39:04, David Reveman wrote: > > > > > On 2014/01/11 11:35:29, jadahl wrote: > > > > > > On 2014/01/11 02:02:32, piman wrote: > > > > > > > Where does the last_usage_token get set if it's not bound to > > > > > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > > > > > > > > > > > Right, should do the same for other buffer targets in > BindBufferHelper. > > > > > > > > > > What if we set last usage token when issuing the command that will use > the > > > > > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid of > > > > > set_used() and the token would be more tightly packed with the usage. > Also > > > > more > > > > > consistent with how we handle AsyncTexImage2D. > > > > > > > > I think this sounds like a good idea. > > > > > > Hmm, is what you mean to do this for the functions that deal with transfer > > > buffers? i.e. UnmapBufferCHROMIUM, (Compressed)(Sub)TexImage2D? and not > > > DrawArrays, which does not deal with pixel transfer buffers at all. > > > > Not only transfer buffers but all buffer objects so that you can get rid of > > set_used() completely. So both (Compressed)(Sub)TexImage2D and DrawArrays, as > > well as any other command that can use a buffer object. > > Sorry, DrawArrays doesn't use a buffer object as we don't have shared memory > VBOs. The general idea is that any command that will use the contents of a > buffer object on the service side needs a set_last_usage_token call. ie. 
> BufferSubData in case of VBOs and (Compressed)(Sub)TexImage2D in case of pixel > transfer buffers. I don't see how BufferSubData would be used service side for a BufferTracker::Buffer, as in those cases it'll be a simple client side memcpy. > > Think this will also allow us to remove set_transfer_ready_token().
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1384: } On 2014/01/14 16:32:09, jadahl wrote: > On 2014/01/14 15:54:26, David Reveman wrote: > > On 2014/01/14 15:27:20, David Reveman wrote: > > > On 2014/01/14 14:15:47, jadahl wrote: > > > > On 2014/01/12 09:52:24, jadahl wrote: > > > > > On 2014/01/11 23:39:04, David Reveman wrote: > > > > > > On 2014/01/11 11:35:29, jadahl wrote: > > > > > > > On 2014/01/11 02:02:32, piman wrote: > > > > > > > > Where does the last_usage_token get set if it's not bound to > > > > > > > > GL_PIXEL_UNPACK_TRANSFER_BUFFER_CHROMIUM ? > > > > > > > > > > > > > > Right, should do the same for other buffer targets in > > BindBufferHelper. > > > > > > > > > > > > What if we set last usage token when issuing the command that will use > > the > > > > > > buffer (ie. DrawArrays) instead of bind/unbind? Then you can get rid > of > > > > > > set_used() and the token would be more tightly packed with the usage. > > Also > > > > > more > > > > > > consistent with how we handle AsyncTexImage2D. > > > > > > > > > > I think this sounds like a good idea. > > > > > > > > Hmm, is what you mean to do this for the functions that deal with transfer > > > > buffers? i.e. UnmapBufferCHROMIUM, (Compressed)(Sub)TexImage2D? and not > > > > DrawArrays, which does not deal with pixel transfer buffers at all. > > > > > > Not only transfer buffers but all buffer objects so that you can get rid of > > > set_used() completely. So both (Compressed)(Sub)TexImage2D and DrawArrays, > as > > > well as any other command that can use a buffer object. > > > > Sorry, DrawArrays doesn't use a buffer object as we don't have shared memory > > VBOs. 
The general idea is that any command that will use the contents of a > > buffer object on the service side needs a set_last_usage_token call. ie. > > BufferSubData in case of VBOs and (Compressed)(Sub)TexImage2D in case of pixel > > transfer buffers. > > I don't see how BufferSubData would be used service side for a > BufferTracker::Buffer, as in those cases it'll be a simple client side memcpy. Right. I guess it's really just (Compressed)(Sub)TexImage2D that need a set_last_usage_token.
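As a rough illustration of the scheme converged on here (all names hypothetical): the client encodes the command that consumes the buffer's contents, inserts a token immediately after it, and records that token on the buffer, so the memory is reusable as soon as the token has passed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of tagging a buffer at command-issue time rather
// than on bind/unbind. Buffer and IssueTexImage2D are illustrative
// stand-ins, not the actual BufferTracker/GLES2Implementation code.
struct Buffer {
  int32_t last_usage_token = 0;
  void set_last_usage_token(int32_t token) { last_usage_token = token; }
};

// |insert_token| stands in for CommandBufferHelper::InsertToken().
int32_t IssueTexImage2D(Buffer* buffer, int32_t (*insert_token)()) {
  // ... encode the (Compressed)(Sub)TexImage2D command that reads from
  // |buffer|'s shared memory ...
  int32_t token = insert_token();  // token lands right after the command
  buffer->set_last_usage_token(token);
  return token;
}
```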
Hi, I uploaded another WIP that addresses some of the issues raised in the review (thanks for the review!). What is left is to make a decision if these internal queries should be using their own command ala InsertToken and not go via query_manager.cc at all, and as such not involve OpenGL queries at all, or if we should (somewhat ab)use them for our needs. I have also waited with the mutex based serial updating as its implementation would depend on the decision mentioned above. All that, and tests. https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/fenced_allocator.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:52: CollapseFreeBlock(i); On 2014/01/11 02:02:32, piman wrote: > This is probably ok in the current use, if we can assume that the > FencedAllocator is destroyed as we delete the SHM it manages allocations on. In > that case it would actually be ok to not WaitForToken in the FREE_PENDING_TOKEN > case, either. > > We should document the behavior though. It does mean that the user can't reuse > the SHM (e.g. if we keep a pool of open SHMs) without a Finish(). Alternatively, > I suppose we could extract the set of serials/tokens to wait for, and let the > client deal with waiting - though we don't have a way to wait-for-a-serial, I'm > not sure the client could really do anything. Mmh. Where do you suggest we document this? Here? https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/query_tracker.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/query_tracker.cc:279: return NextSerial(); On 2014/01/11 02:02:32, piman wrote: > In case of wrap around we need to Finish(). > The rest of the code assumes last_read_serial > |serial| means serial has > passed. 
> It would not be true with this code, after wraparound. > > We do Finish when tokens wrap (see CommandBufferHelper::InsertToken). > > If we're worried that we might actually wrap in practice and don't want to > Finish() in that case, we can do something like double-buffer the range, and > only call Finish() if the last_read_serial has the same high bit as the new > serial, when that high bit changes. That should not happen in practice (unless > we can have 2^31 queries actually in flight). Will Finish() really help anything? In the version I just uploaded I simply provide an API via CommandBufferHelper (HasSerialPassed(uint32)) that just checks the bits. This will work if we have less than 2^31 concurrent queries, and if we'd ever have that many I think we have bigger problems already :)
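The wraparound-safe comparison being described can be sketched like this (a guess at the idea, not the actual CommandBufferHelper implementation): compute the difference in unsigned arithmetic and reinterpret it as signed, which gives the correct ordering as long as fewer than 2^31 serials are ever outstanding.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of a wraparound-safe HasSerialPassed. Unsigned
// subtraction wraps modulo 2^32, so reinterpreting the difference as
// signed yields the right answer whenever the two serials are less than
// 2^31 apart.
bool HasSerialPassed(uint32_t last_read_serial, uint32_t serial) {
  return static_cast<int32_t>(last_read_serial - serial) >= 0;
}
```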
IMO, I think it would be significantly cleaner if we avoided the use of queries for this and instead introduced an internal async token. https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/fenced_allocator.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:53: CollapseFreeBlock(i); I don't think it makes sense to free this immediately but wait for a token in the above case. If this doesn't need to wait for a serial then it at least has to wait for a token. I assume that the WaitForToken code above protects against freeing a block before all commands that might reference it have been processed on the service side. Why wouldn't we need the same protection for blocks used with async transfers?
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:3713: buffer->set_async_query_id(id); On 2014/01/11 11:35:29, jadahl wrote: > On 2014/01/11 02:02:32, piman wrote: > > What if we do several AsyncTexImage2DCHROMIUM on the same buffer? > > That would not work, true. I guess it could work by associating a buffer with a > query serial instead, and if a later AsyncTexImage2DCHROMIUM would occur, it'd > be updated with the new serial. We need to fix this, or change the API to prevent multiple AsyncTexImage2DCHROMIUM on the same buffer (and document that), but I think the latter is very limiting. https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/buffer_tracker.h (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/buffer_tracker.h:97: bool used_ : 1; nit: remove unused field. https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/fenced_allocator.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:53: CollapseFreeBlock(i); On 2014/01/16 17:24:49, David Reveman wrote: > I don't think it makes sense free this immediately but wait for a token in the > above case. If this doesn't need to wait for a serial then it at least has to > wait for a token. > > I assume that the WaitForToken code above protects against freeing a block > before all commands that might reference it have been processed on the service > side. 
That is actually not needed to ensure things on the service side, since the commands that refer to the SHM would execute before the IPC to destroy the SHM gets handled (assuming proper order of operations). As I mentioned earlier, If the assumption is that the SHM will be destroyed after this, then there is no synchronization issue. If the SHM will get reused with a fresh new FencedAllocator, then we need some sort of synchronization, yes. > Why wouldn't we need the same protection for blocks used with async > transfers? I agree that being consistent is better. However to be able to "finish", we need to dependency-inject a way to call the right WaitAsyncTexImage2DCHROMIUM that corresponds to the serial. An alternative is to document that it's the responsibility of the client to either destroy the SHM, or ensure synchronization before reuse. https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1381: FreeTransferBuffer(buffer); How does synchronization work wrt asynchronous ReadPixels?
https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/330001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:3713: buffer->set_async_query_id(id); On 2014/01/16 21:22:50, piman wrote: > On 2014/01/11 11:35:29, jadahl wrote: > > On 2014/01/11 02:02:32, piman wrote: > > > What if we do several AsyncTexImage2DCHROMIUM on the same buffer? > > > > That would not work, true. I guess it could work by associating a buffer with > a > > query serial instead, and if a later AsyncTexImage2DCHROMIUM would occur, it'd > > be updated with the new serial. > > We need to fix this, or change the API to prevent multiple > AsyncTexImage2DCHROMIUM on the same buffer (and document that), but I think the > latter is very limiting. Then should we just introduce a new internal command as discussed, without having to hack gl queries to deal with this? https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/fenced_allocator.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:53: CollapseFreeBlock(i); On 2014/01/16 21:22:50, piman wrote: > On 2014/01/16 17:24:49, David Reveman wrote: > > I don't think it makes sense free this immediately but wait for a token in the > > above case. If this doesn't need to wait for a serial then it at least has to > > wait for a token. > > > > I assume that the WaitForToken code above protects against freeing a block > > before all commands that might reference it have been processed on the service > > side. > > That is actually not needed to ensure things on the service side, since the > commands that refer to the SHM would execute before the IPC to destroy the SHM > gets handled (assuming proper order of operations). 
> > As I mentioned earlier, If the assumption is that the SHM will be destroyed > after this, then there is no synchronization issue. If the SHM will get reused > with a fresh new FencedAllocator, then we need some sort of synchronization, > yes. > > > Why wouldn't we need the same protection for blocks used with async > > transfers? > > I agree that being consistent is better. However to be able to "finish", we need > to dependency-inject a way to call the right WaitAsyncTexImage2DCHROMIUM that > corresponds to the serial. > An alternative is to document that it's the responsibility of the client to > either destroy the SHM, or ensure synchronization before reuse. We could just helper_->Finish() before the loop to make sure all potential tokens have passed; but that wouldn't synchronize the async commands. To do that now we'd need to busy-wait, hoping the uploader didn't stop uploading. If we really don't want to put the responsibility to some layer above, could we put some kind of waitable here and add another flush-serial command that either signals the waitable when done, or when aborted? https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:1381: FreeTransferBuffer(buffer); On 2014/01/16 21:22:50, piman wrote: > How does synchronization work wrt asynchronous ReadPixels? Hmm. Good point. Reading how ReadPixels is implemented, it looks like it should be enough to mark last usage after helper_->ReadPixels() when the pack_transfer_buffer is bound, assuming the service side doesn't read async. Do you think that'd be enough?
https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... File gpu/command_buffer/client/fenced_allocator.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/550001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:53: CollapseFreeBlock(i); On 2014/01/17 08:50:25, jadahl wrote: > On 2014/01/16 21:22:50, piman wrote: > > On 2014/01/16 17:24:49, David Reveman wrote: > > > I don't think it makes sense free this immediately but wait for a token in > the > > > above case. If this doesn't need to wait for a serial then it at least has > to > > > wait for a token. > > > > > > I assume that the WaitForToken code above protects against freeing a block > > > before all commands that might reference it have been processed on the > service > > > side. > > > > That is actually not needed to ensure things on the service side, since the > > commands that refer to the SHM would execute before the IPC to destroy the SHM > > gets handled (assuming proper order of operations). > > > > As I mentioned earlier, If the assumption is that the SHM will be destroyed > > after this, then there is no synchronization issue. If the SHM will get reused > > with a fresh new FencedAllocator, then we need some sort of synchronization, > > yes. > > > > > Why wouldn't we need the same protection for blocks used with async > > > transfers? > > > > I agree that being consistent is better. However to be able to "finish", we > need > > to dependency-inject a way to call the right WaitAsyncTexImage2DCHROMIUM that > > corresponds to the serial. > > An alternative is to document that it's the responsibility of the client to > > either destroy the SHM, or ensure synchronization before reuse. > > We could just helper_->Finish() before the loop to make sure all potential > tokens have passed; but that wouldn't synchronize the async commands. To do that > now we'd need to busy-wait, hoping the uploader didn't stop uploading. 
> > If we really don't want to put the responsibility to some layer above, could we > put some kind of waitable here and add another flush-serial command that either > signals the waitable when done, or when aborted? If we're adding an async-token, then some command that will wait for the token on the service side probably makes sense too. We already have a command that will wait for an async upload to finish so this shouldn't be hard to implement.
Hi,

Another WIP uploaded. In this version I rolled back the usage of "internal queries", leaving them be altogether. Instead, I introduced a common command "SetAsyncToken" that works similarly to "SetToken". It is meant to be command buffer implementation specific, i.e. the GLES2 command buffer will use it for async uploads. I considered adding a SetAsyncUploadToken command internal to the gles2 command buffer, but there is currently no way to add internal functions that don't have a gl* function. We could change the generated functions to start from something other than 256 and have 256+somenumber allocated for internal gles2 command buffer commands, or change the python script to allow generating code without an associated gl* function.

The other problem is thread safety on the service side. I see no guarantees that async upload tasks (i.e. callbacks on that particular message loop) are flushed before GLES2DecoderImpl (and its engine() and CommandBufferService) are destroyed. If this observation is true, then we need to flush before destruction to avoid the callback trying to set the async token on a non-existing service. Or do any of you have a better idea?

The other thread issue is where to put the mutex. For now I have put it in CommandBufferService so that UpdateState, and every other GetState, is synchronous. This would protect the generation int and the state copying, but is it enough?

Lastly, I have not yet added a WaitForAsyncToken or tests, but I'll get to that.
https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/client/buffer_tracker.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/buffer_tracker.h:97: GLuint async_token_; GLuint? "int32 last_async_usage_token_;" instead? https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/client/fenced_allocator.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:56: break; nit: include all enum values here instead of "default:" so failing to update this code when adding a new enum value is caught by the compiler. Also nice to have a NOTREACHED() statement in case "blocks_[i].state" is somehow set to an invalid value. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:162: break; nit: same suggestion here as above. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/fenced_allocator.cc:250: } nit: and here. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/client/gles2_implementation.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/gles2_implementation.cc:3672: } IMO, the code is easier to understand without these helper functions. Especially if s/set_async_token/set_last_async_usage_token/ https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/common/command_buffer.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/common/command_buffer.h:52: uint32 async_token; is there a good reason this is unsigned but |token| is signed? 
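The exhaustive-switch suggestion looks like this in miniature (BlockState and CanCollapse are illustrative names, and Chromium's NOTREACHED() is approximated with assert(false)): with every enumerator listed and no "default:", the compiler's -Wswitch warning flags any newly added state the code forgets to handle.

```cpp
#include <cassert>

// Illustrative block states, loosely modeled on FencedAllocator.
enum class BlockState { FREE, IN_USE, FREE_PENDING_TOKEN };

bool CanCollapse(BlockState state) {
  switch (state) {
    case BlockState::FREE:
      return true;
    case BlockState::IN_USE:
    case BlockState::FREE_PENDING_TOKEN:
      return false;
  }
  // Reached only if |state| holds an invalid value; Chromium would use
  // NOTREACHED() here.
  assert(false);
  return false;
}
```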
https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/async_pixel_transfer_manager.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/async_pixel_transfer_manager.h:66: virtual void AsyncRunWhenCompleted(base::Callback<void()> callback) = 0; Can you drop "When" from this somehow? "Async" prefix already indicate that this will run asynchronously sometime in the future. Maybe AsyncRunCompletionCallback()? Also s/base::Callback<void()>/base::Closure/ https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/async_pixel_transfer_manager_egl.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/async_pixel_transfer_manager_egl.cc:751: make_scoped_refptr(observer))); Maybe this should now call ::AsyncRun instead of PostTask directly. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/async_pixel_transfer_manager_idle.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/async_pixel_transfer_manager_idle.cc:309: make_scoped_refptr(observer)))); Would be nice to use AsyncRun here instead of push_back directly. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/async_pixel_transfer_manager_share_group.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/async_pixel_transfer_manager_share_group.h:27: virtual void AsyncRunWhenCompleted(base::Callback<void()> callback) OVERRIDE; where's the implementation of this? https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... 
File gpu/command_buffer/service/command_buffer_service.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/command_buffer_service.cc:47: CommandBufferService::State CommandBufferService::GetStateImpl() { Please add a lock_.AssertAcquired() call here to protect against this being called without first acquiring |lock_|. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/command_buffer_service.cc:187: async_token_ = token; Do you need to acquire the lock before writing to |async_token_|? https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/command_buffer_service.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/command_buffer_service.h:72: State GetStateImpl(); Can you adjust the name of this so it's clear |lock_| must be acquired before calling this function? Maybe GetStateImplWithLockAcquired(). https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/command_buffer_service.h:90: base::Lock lock_; Would be nice with a comment here describing what this lock is protecting. https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/gles2_cmd_decoder.cc:680: void AsyncUploadCompleted(uint32 async_token); This is not really a completed upload, it's the passing of a token. s/AsyncUploadCompleted/OnAsyncTokenPassed/ ?
I'm not a big fan of this approach:
- adding a GL-specific command in the generic code (though hiding it behind a generic name)... I don't see how it can scale to multiple asynchronous processes we might have (e.g. async readbacks), since there's only a single token.
- adding a lock on the command buffer state. Some implementations don't even have threads, and we're punishing all the code that doesn't even use async uploads.
- it seems redundant with the existing query mechanism.

Maybe what we could do to implement this entirely in the GL layer is:
- give an extra buffer shm/offset pair to AsyncTexImage2DCHROMIUM/AsyncTexSubImage2DCHROMIUM, which would just be a single atomic word (similar to queries), that would be initialized to 0 on the client side, and atomically set to 1 on completion on the service side (maybe use QuerySync for that and reuse the QuerySyncManager?). Similar to the previous version with queries, but skipping the query namespace and stuff. The thread can directly write to the QuerySync to signal completion.
- GLES2Implementation keeps track of which buffer is associated with which such completion token, and does not free the buffer until it's passed.

WDYT?
https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/async_pixel_transfer_manager.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/async_pixel_transfer_manager.h:66: virtual void AsyncRunWhenCompleted(base::Callback<void()> callback) = 0; This is very similar to AsyncNotifyCompletion both in how it works and what it does in this instance. Could merge this somehow? Some thoughts: - Could we use some kind of token API rather than queries in the client and ditch the queries? - Could the queries just record the token in the client and ditch the service side stuff? Nit: use 'base::Closure'
https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/client/cmd_buffer_helper.cc (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/cmd_buffer_helper.cc:195: uint32 CommandBufferHelper::InsertAsyncToken() { Perhaps async token is meant to be more generic, but if it was 'async_upload_token', could we just use the async upload commands rather than have a new command? If it is meant to be generic for other 'async things', it feels like we need an indicator of what 'async thing' we are talking about, to be clear about who owns the async_token and is allowed to change it. Perhaps there could be an enum of async-token-types which currently only has ASYNC_TRANSFER? https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/cmd_buffer_helper.h:108: // buffer does mean it has passed. Nit: 'doesn't' https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/service/command_buffer_service.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/service/command_buffer_service.h:90: base::Lock lock_; I'm agreeing with Piman that a lock_ doesn't seem very nice here. It's primarily just an ownership and/or lifetime thing right? Could clear ownership/lifetime of the async-token relative to the transfer thread solve this? That might not be trivial but I think it can be done without locks.
On 2014/01/22 21:53:58, piman wrote:
> I'm not a big fan of this approach:
> - adding a GL-specific command in the generic code (though hiding it behind a
> generic name)... I don't see how it can scale to multiple asynchronous processes
> we might have (e.g. async readbacks), since there's only a single token.
> - adding a lock on the command buffer state. Some implementations don't even
> have threads, and we're punishing all the code that doesn't even use async
> uploads.
> - it seems redundant with the existing query mechanism.
>
> Maybe what we could do to implement this entirely in the GL layer is:
> - give an extra buffer shm/offset pair to
> AsyncTexImage2DCHROMIUM/AsyncTexSubImage2DCHROMIUM, which would just be a single
> atomic word (similar to queries), that would be initialized to 0 on the client
> side, and atomically set to 1 on completion on the service side (maybe use
> QuerySync for that and reuse the QuerySyncManager?). Similar the previous
> version with queries, but skipping the query namespace and stuff. The thread can
> directly write to the QuerySync to signal completion.
> - GLES2Implementation keeps track of which buffer is associated with which such
> completion token, and not free the buffer until it's passed.
>
> WDYT?

The thing is that we need a generic way for the FencedAllocator to check if some async token has passed, in order to allow us to keep it in control of its allocations. I.e. we need some "HasAsyncTokenPassed" and probably "WaitForAsyncToken" in CommandBufferHelper. Without making FencedAllocator (and its wrappers, MappedMemoryManager and BufferTracker) GLES2 aware, we still need a generic API for it to use. That being said, we don't need the other generic half of the API (SetAsyncToken command, OnSetAsyncToken, etc.) and we could replace that with something else (similar to how it was before this last upload).
On 2014/01/23 02:52:42, epenner wrote:
> https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff...
> File gpu/command_buffer/client/cmd_buffer_helper.cc (right):
>
> https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff...
> gpu/command_buffer/client/cmd_buffer_helper.cc:195: uint32
> CommandBufferHelper::InsertAsyncToken() {
> Perhaps async token is meant to be more generic, but if it was
> 'async_upload_token', could we just use the async upload commands rather than
> have a new command?

My first plan for this last version was to make the SetAsyncToken command actually be a SetAsyncUploadToken command, but there was currently no way of adding internal commands without having a corresponding gl* function, so I took the easy path in order to get input on the general approach (using new commands).

>
> If it is meant to be generic for other 'async things', it feels like we need an
> indicator of what 'async thing' we are talking about, to be clear about who owns
> the async_token and is allowed to change it. Perhaps there could be an enum of
> async-token-types which currently only has ASYNC_TRANSFER?

I suggested this (an enum-based async token) before as well. If we'd go for the generic SetAsyncToken approach, I think this would be better.

>
> https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff...
> File gpu/command_buffer/client/cmd_buffer_helper.h (right):
>
> https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff...
> gpu/command_buffer/client/cmd_buffer_helper.h:108: // buffer does mean it has
> passed.
> Nit: 'doesn't'
>
> https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff...
> File gpu/command_buffer/service/command_buffer_service.h (right):
>
> https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff...
> gpu/command_buffer/service/command_buffer_service.h:90: base::Lock lock_;
> I'm agreeing with Piman that a lock_ doesn't seem very nice here. It's primarily
> just an ownership and/or lifetime thing right? Could clear ownership/lifetime of
> the async-token relative to the transfer thread solve this? That might not be
> trivial but I think it can be done without locks.

GetState() doesn't only get a state, it also increases an int, and thus needs to be protected. The lock was also put there to protect writes to the shared state, which could now happen from different threads. If *only* GLES2DecoderImpl interacted with the CommandBufferService, we could probably add locks there instead, but that doesn't feel right either.
https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... File gpu/command_buffer/client/buffer_tracker.h (right): https://chromiumcodereview.appspot.com/116863003/diff/910001/gpu/command_buff... gpu/command_buffer/client/buffer_tracker.h:97: GLuint async_token_;
On 2014/01/22 17:30:04, David Reveman wrote:
> GLuint? "int32 last_async_usage_token_;" instead?

According to what I read in the code, negative tokens are used for reporting errors. Async tokens do not deal with errors as things stand, and thus don't need to be signed. It should be uint32 instead of GLuint, though. GLuint is a left-over from when |async_token_| was |async_query_id_|.
So there seem to be several options for going forward with this change. I will list them so that it's easier to compare them.

client -> service
=================

1. Generic commands (with or without identifying enums), as done in the most recent upload
2. Specific commands (e.g. SetAsyncUploadToken), similar to the most recent upload, except the command is not generic
3. Extra parameter(s) to AsyncTex(Sub)Image2DCHROMIUM

service -> client
=================

1. CommandBuffer::State, similar to the most recent upload. Tricky because it will need to deal with multiple threads updating it.
2. Separate shared memory state ala QuerySync. Easier access model (only the upload thread). The client needs to keep track of more state and update the "last async token" on the CommandBufferHelper.
3. Reuse QuerySync and QueryManager, but don't use the Begin/EndQueryEXT commands.

client -> fenced allocator
==========================

1. I think the only solution that has been discussed recently is the HasAsyncTokenPassed() way.

Regarding client -> service, I think it's not a hard problem; we just need to agree on which way is better.

For service -> client, I'm leaning towards 2, as it has a simpler access model, and tracking the state client side is probably easier than dealing with the various threads on the service side. With 3, I think there is a risk of having to deal with query details for something that is not a query. As detecting "completion" is more or less an invoked closure posted to a message loop, there is not much to gain from reusing it.

What do people think about this?
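For the service -> client options built on a shared completion token, the HasAsyncTokenPassed() check mentioned above could look roughly like this sketch. The function name follows the discussion; the wraparound-safe comparison is an assumption modeled on how monotonically increasing command buffer tokens are commonly compared:

```cpp
#include <cstdint>

// The service publishes the last completed async upload token through a
// shared-memory word; the client asks whether a given token has passed.
// The signed subtraction treats tokens as a circular sequence, so the
// check stays correct across uint32 wraparound.
inline bool HasAsyncTokenPassed(uint32_t last_completed_token,
                                uint32_t token) {
  // token has passed iff last_completed_token >= token, modulo 2^32.
  return static_cast<int32_t>(last_completed_token - token) >= 0;
}
```

A WaitForAsyncToken would then just poll this predicate (yielding or pumping the message loop between polls) until it returns true.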
This is just regarding CommandBuffer::State. I'll defer to the other reviewers for the other bits as I agree those all sound reasonable. The lock was used for CommandBuffer::State which is shared memory right? Could part of the state (the async_token) be accessed only by the transfer thread and not touched by the GPU thread? It looked like a method that we wanted to use on the transfer thread changed part of the state that is shared with the GPU thread. Could that bit be duplicated so GPU/transfer threads have their own state that is exclusive to the thread that touches it?
On Thu, Jan 23, 2014 at 2:49 AM, <jadahl@opera.com> wrote: > So there seems to be several options of going forward regarding this > change. I > will list them, so that its easier to compare them. > > client -> service > ================= > > 1. Generic commands (with or without identifying enum's), as done in most > recent > upload > The problem with the current way is that it doesn't scale with other types of asynchronous processes. > 2. Specific commands (e.g. SetAsyncUploadToken), similar to most recent > upload, > except the command is not generic > That may be workable, but needs to be at the GLES2 level. > 3. Extra parameter(s) to AsyncTex(Sub)Image2DCHROMIUM > That's currently my preference. > > service -> client > ================= > > 1. CommandBuffer::State, similar to most recent upload. Tricky because it > will > need to deal with multiple threads updating it. > There's also the (tricky) protocol to access it between the service and the client. For the other reasons mentioned above, it doesn't scale either and/or adds GL concepts at that level which is a layering violation. > > 2. Separate shared memory state ala QuerySync. Easier access model (only > upload > thread). Client needs to keep track of more state, and update "last async > token" > on the CommandBufferHelper. > > 3. Reuse QuerySync and QueryManager, but don't use Begin/EndQueryEXT > commands. > 2 or 3 are fine with me > > client -> fenced allocator > ========================== > > 1. I think the only solution that has been in discussion recently is the > HasAsyncTokenPassed() way. > Really, the problem is essentially that you want to be able to poll the state when allocating, to more aggressively reuse. We can add a generic concept where the FenceAllocator client can register a Poll function, that would be called by the FenceAllocator when it wants to try to get more memory. The responsibility of the Poll function would be to free any buffer that it can. 
Then all the state can stay in the GLES2Implementation (that would implement the Poll function, keeping track of the buffers and associated queries/tokens). > > > I think regarding client -> service, it's not a hard problem, and just > need to > agree on what is the better way. > > service -> client, I'm leaning towards being in favor of 2, as it has a > simpler > access model, and tracking the state client side is probably easier than > dealing > with the various threads on the service side. 3 I think there is a risk of > having to deal with query details for something that is not a query. As > detecting "completion" is more or less an invoked closure posted to a > message > loop, there is not much to win with reusing. > > What do people think about this? > > https://chromiumcodereview.appspot.com/116863003/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On Fri, Jan 24, 2014 at 4:50 PM, <epenner@chromium.org> wrote:
> This is just regarding CommandBuffer::State. I'll defer to the other
> reviewers
> for the other bits as I agree those all sound reasonable.
>
> The lock was used for CommandBuffer::State which is shared memory right?
> Could
> part of the state (the async_token) be accessed only by the transfer
> thread and
> not touched by the GPU thread? It looked like a method that we wanted to
> use on
> the transfer thread changed part of the state that is shared with the GPU
> thread. Could that bit be duplicated so GPU/transfer threads have their own
> state that is exclusive to the thread that touches it?
>

The way the State is communicated with the client is tricky - see gpu/command_buffer/common/command_buffer_shared.h
I'm super uncomfortable trying to reason about the atomicity of updates if multiple threads are accessing the structure.
> The way the State is communicated with the client is tricky - see
> gpu/command_buffer/common/command_buffer_shared.h
> I'm super uncomfortable trying to reason about the atomicity of updates if
> multiple threads are accessing the structure.

Thanks, I knew there must be some gnarly memory barrier code somewhere. We shouldn't change individual bits of that structure, but could we duplicate the entire thing, perhaps? That seems like effectively what 2/3 are doing, but using queries instead. The nice thing about a token IMO is that there is just one, instead of one query per operation. I don't have a strong preference either way though, just throwing out possibilities.
On 2014/01/25 01:03:51, piman wrote: > On Thu, Jan 23, 2014 at 2:49 AM, <mailto:jadahl@opera.com> wrote: > > > So there seems to be several options of going forward regarding this > > change. I > > will list them, so that its easier to compare them. > > > > client -> service > > ================= > > > > 1. Generic commands (with or without identifying enum's), as done in most > > recent > > upload > > > > The problem with the current way is that it doesn't scale with other types > of asynchronous processes. > > > > 2. Specific commands (e.g. SetAsyncUploadToken), similar to most recent > > upload, > > except the command is not generic > > > > That may be workable, but needs to be at the GLES2 level. > > > > 3. Extra parameter(s) to AsyncTex(Sub)Image2DCHROMIUM > > > > That's currently my preference. > > > > > > service -> client > > ================= > > > > 1. CommandBuffer::State, similar to most recent upload. Tricky because it > > will > > need to deal with multiple threads updating it. > > > > There's also the (tricky) protocol to access it between the service and the > client. > For the other reasons mentioned above, it doesn't scale either and/or adds > GL concepts at that level which is a layering violation. > > > > > > 2. Separate shared memory state ala QuerySync. Easier access model (only > > upload > > thread). Client needs to keep track of more state, and update "last async > > token" > > on the CommandBufferHelper. > > > > 3. Reuse QuerySync and QueryManager, but don't use Begin/EndQueryEXT > > commands. > > > > 2 or 3 are fine with me > > > > > > client -> fenced allocator > > ========================== > > > > 1. I think the only solution that has been in discussion recently is the > > HasAsyncTokenPassed() way. > > > > Really, the problem is essentially that you want to be able to poll the > state when allocating, to more aggressively reuse. 
> We can add a generic concept where the FenceAllocator client can register a
> Poll function, that would be called by the FenceAllocator when it wants to
> try to get more memory. The responsibility of the Poll function would be to
> free any buffer that it can.
> Then all the state can stay in the GLES2Implementation (that would
> implement the Poll function, keeping track of the buffers and associated
> queries/tokens).

This sounds doable, and would get rid of exposing a GLES2 implementation detail through the command buffer layer.

However, what should we do with non-freed memory on destruction? We'd need some extra Wait next to the Poll that would do something equivalent to WaitForAsyncToken, i.e. poll until all its buffers have their associated async upload token passed. Or should the destruction of the fenced allocator just do a Poll in a busy loop until all memory is freed?

>
> > I think regarding client -> service, it's not a hard problem, and just
> > need to
> > agree on what is the better way.
> >
> > service -> client, I'm leaning towards being in favor of 2, as it has a
> > simpler
> > access model, and tracking the state client side is probably easier than
> > dealing
> > with the various threads on the service side. 3 I think there is a risk of
> > having to deal with query details for something that is not a query. As
> > detecting "completion" is more or less an invoked closure posted to a
> > message
> > loop, there is not much to win with reusing.
> >
> > What do people think about this?
> >
> > https://chromiumcodereview.appspot.com/116863003/
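The registered-Poll idea being discussed could be sketched like this; `PollingAllocator` and its members are illustrative names under assumed semantics, not the actual FencedAllocator API. On destruction, the owner could drain pending memory by repeating the poll until everything is returned, which is the busy-loop question raised above:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// The allocator is handed a callback at construction and invokes it when
// an allocation would otherwise fail, giving the client (here, the role
// GLES2Implementation would play) a chance to free any buffers whose
// async tokens have passed.
class PollingAllocator {
 public:
  using PollCallback = std::function<void()>;

  PollingAllocator(size_t size, PollCallback poll)
      : free_bytes_(size), poll_(std::move(poll)) {}

  // Returns true on success. Under memory pressure, polls once so the
  // client can return reusable memory, then retries.
  bool Alloc(size_t bytes) {
    if (free_bytes_ < bytes && poll_)
      poll_();  // Client frees completed buffers back to us here.
    if (free_bytes_ < bytes)
      return false;
    free_bytes_ -= bytes;
    return true;
  }

  void Free(size_t bytes) { free_bytes_ += bytes; }

 private:
  size_t free_bytes_;
  PollCallback poll_;
};
```

The point of the design is that the allocator stays GLES2-agnostic: all token tracking lives behind the callback, in the client.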
As a conclusion, the current implementation plan is:

* GLES2Implementation mmaps an AsyncUploadSync struct that has a uint32 async_upload_token (not one per request, one per GLES2Implementation instance).

* The AsyncTex(Sub)Image2D commands are extended with 3 new arguments: uint32 async_upload_token, void *sync_data (i.e. shm id and offset)
  - may be 0, 0, 0 for requests with no data.

* GLES2CmdDecoderImpl queues closures that write the associated async_upload_token to the mmapped AsyncUploadSync (writing *only* on the thread used for uploading).

* GLES2Implementation keeps track of released transfer buffer memory (detached from the BufferTracker, still managed by the MappedMemoryManager) pending a completed async upload.
  - Slightly similar to a previous version, where an issue was raised regarding the unmanaging of buffers.

* Add a generic Poll hook to mapped_memory and (wrapped_)fenced_allocator.

* Hook up GLES2Implementation to the Poll hook; on each Poll, read the last completed async_upload_token from the AsyncUploadSync and free mapped memory accordingly.

This means:
FREE_PENDING_ASYNC_TOKEN is dropped completely.
No more write-from-multiple-threads to generic CommandBuffer state.
No more single-use generic-sounding "tokens" in the generic CommandBuffer layer.

Please raise any issues you can think of regarding this most recent approach.
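Under the assumptions of this plan, the shared AsyncUploadSync and the client-side "free pending token" bookkeeping might look like the following sketch. `std::atomic` stands in for the mmapped shared memory, and the surrounding class and function names are illustrative, not the actual CL:

```cpp
#include <atomic>
#include <cstdint>
#include <map>
#include <vector>

// One per GLES2Implementation, shared with the service; the struct name
// and member follow the plan above.
struct AsyncUploadSync {
  std::atomic<uint32_t> async_upload_token{0};
};

// Service side: runs on the upload thread as each transfer completes.
// Tokens are issued in increasing order, so the last store is the
// highest completed token.
void SignalUploadComplete(AsyncUploadSync* sync, uint32_t token) {
  sync->async_upload_token.store(token, std::memory_order_release);
}

// Client side: buffers detached from the BufferTracker, keyed by the
// async upload token they wait on.
class AsyncUploadGarbage {
 public:
  void AddPending(uint32_t token, int shm_id) {
    pending_.emplace(token, shm_id);
  }

  // The Poll hook body: collect every buffer whose token has completed
  // (the caller would hand these back to the MappedMemoryManager).
  std::vector<int> PollAndCollectFreeable(const AsyncUploadSync& sync) {
    uint32_t done = sync.async_upload_token.load(std::memory_order_acquire);
    std::vector<int> freeable;
    for (auto it = pending_.begin(); it != pending_.end();) {
      if (it->first <= done) {
        freeable.push_back(it->second);
        it = pending_.erase(it);
      } else {
        ++it;
      }
    }
    return freeable;
  }

 private:
  std::multimap<uint32_t, int> pending_;
};
```

This keeps all writes to the shared word on the upload thread and all frees on the client, matching the "no write-from-multiple-threads" goal; a production version would also need the wraparound-safe token comparison rather than a plain `<=`.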
On 2014/02/03 13:31:10, jadahl wrote: > On 2014/01/25 01:03:51, piman wrote: > > On Thu, Jan 23, 2014 at 2:49 AM, <mailto:jadahl@opera.com> wrote: > > > > > So there seems to be several options of going forward regarding this > > > change. I > > > will list them, so that its easier to compare them. > > > > > > client -> service > > > ================= > > > > > > 1. Generic commands (with or without identifying enum's), as done in most > > > recent > > > upload > > > > > > > The problem with the current way is that it doesn't scale with other types > > of asynchronous processes. > > > > > > > 2. Specific commands (e.g. SetAsyncUploadToken), similar to most recent > > > upload, > > > except the command is not generic > > > > > > > That may be workable, but needs to be at the GLES2 level. > > > > > > > 3. Extra parameter(s) to AsyncTex(Sub)Image2DCHROMIUM > > > > > > > That's currently my preference. > > > > > > > > > > service -> client > > > ================= > > > > > > 1. CommandBuffer::State, similar to most recent upload. Tricky because it > > > will > > > need to deal with multiple threads updating it. > > > > > > > There's also the (tricky) protocol to access it between the service and the > > client. > > For the other reasons mentioned above, it doesn't scale either and/or adds > > GL concepts at that level which is a layering violation. > > > > > > > > > > 2. Separate shared memory state ala QuerySync. Easier access model (only > > > upload > > > thread). Client needs to keep track of more state, and update "last async > > > token" > > > on the CommandBufferHelper. > > > > > > 3. Reuse QuerySync and QueryManager, but don't use Begin/EndQueryEXT > > > commands. > > > > > > > 2 or 3 are fine with me > > > > > > > > > > client -> fenced allocator > > > ========================== > > > > > > 1. I think the only solution that has been in discussion recently is the > > > HasAsyncTokenPassed() way. 
> > > > > > > Really, the problem is essentially that you want to be able to poll the > > state when allocating, to more aggressively reuse. > > We can add a generic concept where the FenceAllocator client can register a > > Poll function, that would be called by the FenceAllocator when it wants to > > try to get more memory. The responsibility of the Poll function would be to > > free any buffer that it can. > > Then all the state can stay in the GLES2Implementation (that would > > implement the Poll function, keeping track of the buffers and associated > > queries/tokens). > > This sounds do-able, and would get rid of exposing a GLES2 implementation detail > through the command buffer layer. > > However, what should we do with non-freed memory on destruction. We'd need some > extra Wait next to the Poll that would do something equivalent with > WaitForAsyncToken, i.e. poll until all its buffers have their associated async > upload token passed. Or should the destruction of fenced allocator just do a > Poll in a busy-loop until all memory is freed? > > > > > > > > > > > > > > I think regarding client -> service, it's not a hard problem, and just > > > need to > > > agree on what is the better way. > > > > > > service -> client, I'm leaning towards being in favor of 2, as it has a > > > simpler > > > access model, and tracking the state client side is probably easier than > > > dealing > > > with the various threads on the service side. 3 I think there is a risk of > > > having to deal with query details for something that is not a query. As > > > detecting "completion" is more or less an invoked closure posted to a > > > message > > > loop, there is not much to win with reusing. > > > > > > What do people think about this? > > > > > > https://chromiumcodereview.appspot.com/116863003/ > > > > > > > To unsubscribe from this group and stop receiving emails from it, send an > email > > to mailto:chromium-reviews+unsubscribe@chromium.org. 
>
> As conclusion current implementation plan is:
>
> * GLES2Implementation mmaps a AsyncUpladSync struct that has an uint32
> async_upload_token (not one per request, one per GLES2Implementation instance)
>
> * AsyncTex(Sub)Image2D commands are extended with 3 new arguments: uin32
> async_upload_token, void *sync_data (i.e. shm id and offset)
> - may be 0, 0, 0 for requests with no data.
>
> * GLES2CmdDecoderImpl queues closures that writes the associated
> async_upload_token to the mmap:ed AsyncUploadSync (writing *only* on the thread
> used for uploading)
>
> * GLES2Implementation keeps track of released transfer buffer memory (detached
> from BufferTracker, still managed by MappedMemoryManager) pending completed
> async upload
> - Slightly similar to a previous version, where an issue was raised regarding
> the unmanaging of buffers
>
> * Add generic Poll hook to mapped_memory and (wrapped_)fenced_allocator

Rethinking this, there is no need to have a Poll hook in the FencedAllocator, simply because we can just do that check before allocating a transfer buffer. This would still mean that we'd need to "unmanage" buffer data by removing it from the BufferTracker without freeing it, or freeing it pending a token, from the MappedMemoryManager. As I mentioned, this is a change that was reverted before, on the grounds that it's easier to manage memory if the BufferTracker manages all the memory that it ever owned. Any opinions regarding this?

>
> * Hook up GLES2Implementation to Poll hook, each Poll, read last completed
> async_upload_token from AsyncUploadSync, Free mapped memory accordingly
>
>
> This means:
> FREE_PENDING_ASYNC_TOKEN is dropped completely.
> No more write-from-multiple-threads to generic CommandBuffer state.
> No more single-usage generic-sounding "tokens" in generic CommandBuffer layer.
>
> Please raise any issues you can think of regarding this most recent approach.
On 2014/02/03 14:38:47, jadahl wrote: > On 2014/02/03 13:31:10, jadahl wrote: > > On 2014/01/25 01:03:51, piman wrote: > > > On Thu, Jan 23, 2014 at 2:49 AM, <mailto:jadahl@opera.com> wrote: > > > > > > > So there seems to be several options of going forward regarding this > > > > change. I > > > > will list them, so that its easier to compare them. > > > > > > > > client -> service > > > > ================= > > > > > > > > 1. Generic commands (with or without identifying enum's), as done in most > > > > recent > > > > upload > > > > > > > > > > The problem with the current way is that it doesn't scale with other types > > > of asynchronous processes. > > > > > > > > > > 2. Specific commands (e.g. SetAsyncUploadToken), similar to most recent > > > > upload, > > > > except the command is not generic > > > > > > > > > > That may be workable, but needs to be at the GLES2 level. > > > > > > > > > > 3. Extra parameter(s) to AsyncTex(Sub)Image2DCHROMIUM > > > > > > > > > > That's currently my preference. > > > > > > > > > > > > > > service -> client > > > > ================= > > > > > > > > 1. CommandBuffer::State, similar to most recent upload. Tricky because it > > > > will > > > > need to deal with multiple threads updating it. > > > > > > > > > > There's also the (tricky) protocol to access it between the service and the > > > client. > > > For the other reasons mentioned above, it doesn't scale either and/or adds > > > GL concepts at that level which is a layering violation. > > > > > > > > > > > > > > 2. Separate shared memory state ala QuerySync. Easier access model (only > > > > upload > > > > thread). Client needs to keep track of more state, and update "last async > > > > token" > > > > on the CommandBufferHelper. > > > > > > > > 3. Reuse QuerySync and QueryManager, but don't use Begin/EndQueryEXT > > > > commands. 
> > > > > > > > > > 2 or 3 are fine with me > > > > > > > > > > > > > > client -> fenced allocator > > > > ========================== > > > > > > > > 1. I think the only solution that has been in discussion recently is the > > > > HasAsyncTokenPassed() way. > > > > > > > > > > Really, the problem is essentially that you want to be able to poll the > > > state when allocating, to more aggressively reuse. > > > We can add a generic concept where the FenceAllocator client can register a > > > Poll function, that would be called by the FenceAllocator when it wants to > > > try to get more memory. The responsibility of the Poll function would be to > > > free any buffer that it can. > > > Then all the state can stay in the GLES2Implementation (that would > > > implement the Poll function, keeping track of the buffers and associated > > > queries/tokens). > > > > This sounds do-able, and would get rid of exposing a GLES2 implementation > detail > > through the command buffer layer. > > > > However, what should we do with non-freed memory on destruction. We'd need > some > > extra Wait next to the Poll that would do something equivalent with > > WaitForAsyncToken, i.e. poll until all its buffers have their associated async > > upload token passed. Or should the destruction of fenced allocator just do a > > Poll in a busy-loop until all memory is freed? > > > > > > > > > > > > > > > > > > > > I think regarding client -> service, it's not a hard problem, and just > > > > need to > > > > agree on what is the better way. > > > > > > > > service -> client, I'm leaning towards being in favor of 2, as it has a > > > > simpler > > > > access model, and tracking the state client side is probably easier than > > > > dealing > > > > with the various threads on the service side. 3 I think there is a risk of > > > > having to deal with query details for something that is not a query. 
As > > > > detecting "completion" is more or less an invoked closure posted to a > > > > message > > > > loop, there is not much to win with reusing. > > > > > > > > What do people think about this? > > > > > > > > https://chromiumcodereview.appspot.com/116863003/ > > > > > > > > > > To unsubscribe from this group and stop receiving emails from it, send an > > email > > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > > > As conclusion current implementation plan is: > > > > * GLES2Implementation mmaps a AsyncUpladSync struct that has an uint32 > > async_upload_token (not one per request, one per GLES2Implementation instance) > > > > * AsyncTex(Sub)Image2D commands are extended with 3 new arguments: uin32 > > async_upload_token, void *sync_data (i.e. shm id and offset) > > - may be 0, 0, 0 for requests with no data. > > > > * GLES2CmdDecoderImpl queues closures that writes the associated > > async_upload_token to the mmap:ed AsyncUploadSync (writing *only* on the > thread > > used for uploading) > > > > * GLES2Implementation keeps track of released transfer buffer memory (detached > > from BufferTracker, still managed by MappedMemoryManager) pending completed > > async upload > > - Slightly similar to a previous version, where an issue was raised > regarding > > the unmanaging of buffers > > > > * Add generic Poll hook to mapped_memory and (wrapped_)fenced_allocator > > Rethinking this, there is no need to have a Poll hook in the FencedAllocator > simply because we can just do that check before allocating a transfer buffer. > This would still mean that we'd need to "unmanage" buffer data by removing it > from the BufferTracker without free:ing or free:ing pending token it from the > MappedMemoryManager. As I mentioned, this was a change that was reverted before > with the reason that its easier to manage memory if the BufferTracker manages > all memory that it ever owned. Any opinions regarding this? 
Well, polling only before allocating a buffer would not be enough; we'd need to poll before allocating anything from the mapped memory manager, so maybe it's better anyway to have a hook instead of trying to cover all the cases. Sorry for the noise :P > > > > > * Hook up GLES2Implementation to Poll hook, each Poll, read last completed > > async_upload_token from AsyncUploadSync, Free mapped memory accordingly > > > > > > This means: > > FREE_PENDING_ASYNC_TOKEN is dropped completely. > > No more write-from-multiple-threads to generic CommandBuffer state. > > No more single-usage generic-sounding "tokens" in generic CommandBuffer layer. > > > > Please raise any issues you can think of regarding this most recent approach.
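A minimal sketch of what the Poll hook discussed above could look like. The names (SimpleFencedAllocator, PollCallback, a single byte counter) are illustrative stand-ins, not the real FencedAllocator API: the point is only that the allocator invokes a client-registered callback before failing an allocation, so the client can reclaim buffers whose async upload token has passed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Illustrative sketch (not the Chromium API) of an allocator with a Poll
// hook: before reporting out-of-memory, it calls a client-registered
// callback so the client (GLES2Implementation in the plan above) can free
// buffers whose async upload token has passed.
class SimpleFencedAllocator {
 public:
  using PollCallback = std::function<void()>;

  explicit SimpleFencedAllocator(std::size_t capacity)
      : free_bytes_(capacity) {}

  void set_poll_callback(PollCallback cb) { poll_callback_ = std::move(cb); }

  // The client's poll callback calls this when an async upload completes.
  void Free(std::size_t bytes) { free_bytes_ += bytes; }

  // Polls the client before giving up, so memory held pending a completed
  // async upload can be reclaimed without exposing GLES2 details here.
  bool Alloc(std::size_t bytes) {
    if (bytes > free_bytes_ && poll_callback_)
      poll_callback_();
    if (bytes > free_bytes_)
      return false;
    free_bytes_ -= bytes;
    return true;
  }

 private:
  std::size_t free_bytes_;
  PollCallback poll_callback_;
};
```

This keeps all async-upload bookkeeping on the client side, with the allocator knowing nothing about GLES2 tokens or queries.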
On Mon, Feb 3, 2014 at 5:31 AM, <jadahl@opera.com> wrote: > On 2014/01/25 01:03:51, piman wrote: > >> On Thu, Jan 23, 2014 at 2:49 AM, <mailto:jadahl@opera.com> wrote: >> > > > So there seems to be several options of going forward regarding this >> > change. I >> > will list them, so that its easier to compare them. >> > >> > client -> service >> > ================= >> > >> > 1. Generic commands (with or without identifying enum's), as done in >> most >> > recent >> > upload >> > >> > > The problem with the current way is that it doesn't scale with other types >> of asynchronous processes. >> > > > > 2. Specific commands (e.g. SetAsyncUploadToken), similar to most recent >> > upload, >> > except the command is not generic >> > >> > > That may be workable, but needs to be at the GLES2 level. >> > > > > 3. Extra parameter(s) to AsyncTex(Sub)Image2DCHROMIUM >> > >> > > That's currently my preference. >> > > > > >> > service -> client >> > ================= >> > >> > 1. CommandBuffer::State, similar to most recent upload. Tricky because >> it >> > will >> > need to deal with multiple threads updating it. >> > >> > > There's also the (tricky) protocol to access it between the service and >> the >> client. >> For the other reasons mentioned above, it doesn't scale either and/or adds >> GL concepts at that level which is a layering violation. >> > > > > >> > 2. Separate shared memory state ala QuerySync. Easier access model (only >> > upload >> > thread). Client needs to keep track of more state, and update "last >> async >> > token" >> > on the CommandBufferHelper. >> > >> > 3. Reuse QuerySync and QueryManager, but don't use Begin/EndQueryEXT >> > commands. >> > >> > > 2 or 3 are fine with me >> > > > > >> > client -> fenced allocator >> > ========================== >> > >> > 1. I think the only solution that has been in discussion recently is the >> > HasAsyncTokenPassed() way. 
>> > >> > > Really, the problem is essentially that you want to be able to poll the >> state when allocating, to more aggressively reuse. >> We can add a generic concept where the FenceAllocator client can register >> a >> Poll function, that would be called by the FenceAllocator when it wants to >> try to get more memory. The responsibility of the Poll function would be >> to >> free any buffer that it can. >> Then all the state can stay in the GLES2Implementation (that would >> implement the Poll function, keeping track of the buffers and associated >> queries/tokens). >> > > This sounds do-able, and would get rid of exposing a GLES2 implementation > detail > through the command buffer layer. > > However, what should we do with non-freed memory on destruction. We'd need > some > extra Wait next to the Poll that would do something equivalent with > WaitForAsyncToken, i.e. poll until all its buffers have their associated > async > upload token passed. Or should the destruction of fenced allocator just do > a > Poll in a busy-loop until all memory is freed? > I suppose there's 2 approaches we can take: - either make the client responsible for ensuring that, e.g. by making sure it calls glFinish before destroying the FenceAllocator, and DCHECK-ing in the FenceAllocator destructor that the Poll indicate everything is passed - add a Wait function, or a parameter to Poll to force a wait by the FencedAllocator. > > > > > >> > >> > I think regarding client -> service, it's not a hard problem, and just >> > need to >> > agree on what is the better way. >> > >> > service -> client, I'm leaning towards being in favor of 2, as it has a >> > simpler >> > access model, and tracking the state client side is probably easier than >> > dealing >> > with the various threads on the service side. 3 I think there is a risk >> of >> > having to deal with query details for something that is not a query. 
As >> > detecting "completion" is more or less an invoked closure posted to a >> > message >> > loop, there is not much to win with reusing. >> > >> > What do people think about this? >> > >> > https://chromiumcodereview.appspot.com/116863003/ >> > >> > > To unsubscribe from this group and stop receiving emails from it, send an >> > email > >> to mailto:chromium-reviews+unsubscribe@chromium.org. >> > > As conclusion current implementation plan is: > > * GLES2Implementation mmaps a AsyncUpladSync struct that has an uint32 > async_upload_token (not one per request, one per GLES2Implementation > instance) > > * AsyncTex(Sub)Image2D commands are extended with 3 new arguments: uin32 > async_upload_token, void *sync_data (i.e. shm id and offset) > - may be 0, 0, 0 for requests with no data. > > * GLES2CmdDecoderImpl queues closures that writes the associated > async_upload_token to the mmap:ed AsyncUploadSync (writing *only* on the > thread > used for uploading) > > * GLES2Implementation keeps track of released transfer buffer memory > (detached > from BufferTracker, still managed by MappedMemoryManager) pending completed > async upload > - Slightly similar to a previous version, where an issue was raised > regarding > the unmanaging of buffers > > * Add generic Poll hook to mapped_memory and (wrapped_)fenced_allocator > > * Hook up GLES2Implementation to Poll hook, each Poll, read last completed > async_upload_token from AsyncUploadSync, Free mapped memory accordingly > > > This means: > FREE_PENDING_ASYNC_TOKEN is dropped completely. > No more write-from-multiple-threads to generic CommandBuffer state. > No more single-usage generic-sounding "tokens" in generic CommandBuffer > layer. > > Please raise any issues you can think of regarding this most recent > approach. > Just make sure you use base::subtle::Atomic32 and the right operations to modify the shared memory (AsyncUploadSync), but otherwise this sounds good. 
> > https://chromiumcodereview.appspot.com/116863003/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Hi, New version uploaded. This one uses the existing Async(Sub)TexImage2D to communicate the next async upload token, and an AsyncUploadSync that is not per command but per GLES2Implementation instance. FencedAllocator Poll doesn't have a wait flag, but I made it so that the destructor of GLES2Implementation makes sure all async uploads are finished. I didn't use base::subtle::Atomic32 in AsyncUploadSync. What would be the point of this? Only the upload thread will ever change that value, so there is no need for a thread-safe int on the service side. I added a couple of unit tests to mapped_memory_unittest.cc and buffer_tracker_unittest.cc. There seem to be no tests on the client side testing Async(Sub)TexImage2D AFAICS, so I didn't write any tests for that this time. Regarding the service side, tests there would require valid shared memory, so I'll wait on that until I get your feedback. Jonas
On Fri, Feb 7, 2014 at 4:59 AM, <jadahl@opera.com> wrote: > Hi, > > New version uploaded. This one uses the existing Async(Sub)TexImage2D to > communicate the next async upload token, and a AsyncUploadSync that is not > per > command but per GLES2Implementation instance. > > FencedAllocator Poll doesn't have a wait flag, but I made it so that the > destructor of GLES2Implementation makes sure all async uploads are > finished. > > I didn't use base::subtle::Atomic32 in AsyncUploadSync. What would be the > point > of this? Only the upload thread will ever change that value, so there is > no need > for thread safe int on the service side. > It's written on the upload thread side, but read on the client thread, without synchronization otherwise. > > I added a couple of unit tests to mapped_memory_unittest.cc and > buffer_tracker_unittest.cc. There seems to be no tests on the client side > testing Async(Sub)TexImage2D AFAICS, so didn't write any tests for that > this > time. Regarding the service side, tests there would require valid shared > memory, > so waited with that until I get your feedback. > > Jonas > > https://codereview.chromium.org/116863003/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
This is starting to look pretty great IMO. Thanks! https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/buffer_tracker.h:39: async_token_(0) { Naming nit: 'last_async_token'? I recall there was a reason to not change the name when using queries, but tokens are the same, right? So the name should be the same? https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:162: base::Unretained(this), Nit: Maybe overkill in this case, but it never hurts to comment why Unretained is being used. Eg: // Unretained since 'this' outlives mapped_memory_, and it's called on this thread. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:1529: base::subtle::MemoryBarrier(); See nit on other MemoryBarrier. This one also seems extraneous if RemoveTransferBuffer is called several times in a row. Maybe I'm wrong, but see my other comment on the other MemoryBarrier for my reasoning. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:3696: base::subtle::MemoryBarrier(); It wouldn't hurt to document each of these base::subtle calls, and why exactly it is needed. It seems mostly clear now, but for posterity it's always nice to know. Two things on this one: First, do we really need a MemoryBarrier here? Don't we just need the latest token, and could that be done without a full MemoryBarrier? Second, do we need to have a memory barrier in a loop here? 
The chances of the token changing while we are in this loop seems low and not worth caring about. If we did this only once would the token just have a slight chance of being out of date? Alternatively could we get the token once and use it for testing all uploads? https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/async_pixel_transfer_manager.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/async_pixel_transfer_manager.h:67: virtual void AsyncRun(const base::Closure& callback) = 0; This leads to duplicate work happening here where we notify the query and also update the token. It's fine for this patch, but do you think we could use the token in the client eventually to replace the queries being used? https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10296: base::subtle::MemoryBarrier(); See my other comment on MemoryBarriers. Wouldn't hurt to have a comment. And, do we need a full barrier here since we are only writing to the value? I have to admit I'm not completely familiar with what is required on either side.
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/cmd_buffer_helper.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/cmd_buffer_helper.h:10: #include <list> nit: you don't need this. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/cmd_buffer_helper.h:14: #include "base/bind.h" nit: or this https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/fenced_allocator.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/fenced_allocator.cc:55: // DCHECK_EQ(blocks_[0].state, FREE); I would like to reinstate those checks, for the async case - to ensure we guaranteed we finished all async uploads before destroying the MemoryManager. For the lost context case, once we know the context is lost, we know that either the service already used the buffer that's in flight, or it won't use it in the future. At that point it is safe to free/reuse (we could check that condition in FreeUnused). https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/fenced_allocator.cc:214: if (block.state == FREE_PENDING_TOKEN && block.token <= last_token_read) { While you're here... it looks like the 'block.token <= last_token_read' check doesn't take wraparound into account. Could you make it use your new HasTokenPassed ? https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... 
gpu/command_buffer/client/gles2_implementation.cc:111: async_upload_sync_(NULL), Also need initializers for async_upload_token_, async_upload_sync_shm_id_, async_upload_sync_shm_offset_ https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:291: mapped_memory_->Free(static_cast<void*>(async_upload_sync_)); nit: static_cast isn't needed. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:3696: base::subtle::MemoryBarrier(); On 2014/02/07 20:43:58, epennerAtGoogle wrote: > It wouldn't hurt to document each of these base::subtle calls, and why exactly > it is needed. It seems mostly clear now, but for posterity it's always nice to > know. > > Two things on this one: > > First, do we really need a MemoryBarrier here? Don't we just need the latest > token, and could that be done without a full MemoryBarrier? > > Second, do we need to have a memory barrier in a loop here? The chances of the > token changing while we are in this loop seems low and not worth caring about. > If we did this only once would the token just have a slight chance of being out > of date? Alternatively could we get the token once and use it for testing all > uploads? TBH, I'd rather make the access to the AsyncUploadSync be explicit about the memory semantics (i.e. use Acquire_Load to read and Release_Store to write). Basically encapsulating the atomicity as low as possible (in AsyncUploadSync), so that 'subtle' doesn't leak everywhere. Like we did for e.g. gpu::SharedState https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:3700: if (HasAsyncUploadTokenPassed(it->second)) { Similar to what I mentioned in FencedAllocator, we can free this if the context is lost too (it either has consumed the buffer already, or it will never do so). 
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:3708: WaitForCmd(); We need more than this, otherwise we'll spin loop (possibly preventing the GPU process from making progress). WaitForCmd only ensures all the commands have been consumed by the service side, but doesn't ensure completion of the async uploads. As is, nothing ensures that on the next loop the GPU process will have made progress (in truth, WaitForCmd becomes a noop after the first one). I think we want the semantics of WaitAsyncTexImage2DCHROMIUM, but applied to all async uploads. Maybe a new WaitAllAsyncTexImage2DCHROMIUM? Also, we need this to exit the loop if the context is lost and the service side doesn't make progress any more (WaitForCmd is a noop). https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.h:609: !(async_upload_sync_->async_token & 0x80000000)) { Reading async_token twice seems race-prone, it could have changed between line 606 and 609. Instead read it once and use that value twice. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/async_pixel_transfer_manager.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/async_pixel_transfer_manager.h:66: // This includes asynchronous upload tasks. I'm not the biggest fan of this pattern, because it's very hard to reason with. It's not inconceivable that an async upload needs to do several hops between threads, in which case "AsyncRun" becomes hard to define. 
I'd rather pass the callback to AsyncPixelTransferDelegate::AsyncTexImage2D with the explicit semantic of "this callback is run as soon as the input buffer has been consumed, possibly run on another thread", and have the implementation of that semantic being the responsibility of each delegate (or manager). https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/command_buffer_service.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/command_buffer_service.h:10: #include "base/synchronization/lock.h" You don't need this, or the other changes in this file any more. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10324: sync_data_shm_id, sync_data_shm_offset, sizeof(*sync)); The shared memory backing the AsyncUploadSync could be destroyed / unmapped before the completion. For the transfer data, we dup the shm (for better or worse), so we should do something similar or add some refcounting of the transfer buffers, or something.
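The callback-on-consumption semantic proposed above could be sketched as follows. The class name, the single queue standing in for an upload thread, and the method signature are all illustrative, not the real AsyncPixelTransferDelegate interface; the point is only that the completion closure travels with the AsyncTexImage2D call and each delegate decides when the input buffer has been consumed.

```cpp
#include <cassert>
#include <functional>
#include <queue>

// Hypothetical stand-in for a delegate that receives the completion
// callback together with the upload request. The explicit contract:
// |buffer_consumed| runs as soon as |pixels| will no longer be read,
// possibly on another thread (here, a simple queue stands in for the
// upload thread).
class FakeAsyncPixelTransferDelegate {
 public:
  using Closure = std::function<void()>;

  // Queues the upload; the callback runs only once the buffer is consumed.
  void AsyncTexImage2D(const void* pixels, Closure buffer_consumed) {
    pending_.push([pixels, buffer_consumed] {
      // ... the actual texture upload from |pixels| would happen here ...
      (void)pixels;
      buffer_consumed();  // Safe: |pixels| is no longer needed.
    });
  }

  // Stand-in for the upload thread draining its queue.
  void RunPendingUploads() {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

 private:
  std::queue<std::function<void()>> pending_;
};
```

With this shape, how many thread hops an upload takes is each delegate's private business; the caller only relies on the callback's "buffer consumed" contract.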
> https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... > gpu/command_buffer/service/gles2_cmd_decoder.cc:10324: sync_data_shm_id, > sync_data_shm_offset, sizeof(*sync)); > The shared memory backing the AsyncUploadSync could be destroyed / unmapped > before the completion. For the transfer data, we dup the shm (for better or > worse), so we should do something similar or add some refcounting of the > transfer buffers, or something. SafeSharedMemoryPool handles this for the other ones. It's not a very nice way to solve this :(. The whole thing should be replaced with a ref-counting scheme and a more-opaque buffer object (no direct access to shared-memory handle etc.). I've got a bug against me for that.
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/buffer_tracker.h:39: async_token_(0) { On 2014/02/07 20:43:58, epennerAtGoogle wrote: > Naming nit: 'last_async_token'? I recall there was reason to not change the > name when using queries, but tokens are thee same right? So the name should be > the same? > Before I called it "serial" and changed it to async_token when putting the implementation "in parallel" to regular tokens. Should we stay with token naming or go back to something sounding unrelated? async_upload_serial, last_async_upload_serial, last_async_upload_token, ... https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/fenced_allocator.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/fenced_allocator.cc:55: // DCHECK_EQ(blocks_[0].state, FREE); On 2014/02/07 22:58:20, piman wrote: > I would like to reinstate those checks, for the async case - to ensure we > guaranteed we finished all async uploads before destroying the MemoryManager. > > For the lost context case, once we know the context is lost, we know that either > the service already used the buffer that's in flight, or it won't use it in the > future. At that point it is safe to free/reuse (we could check that condition in > FreeUnused). Should maybe the ~GLES2Implementation() take care of free:ing all of the memory it manages without waiting in the lost context case? Regarding the blocks pending token, sounds a bit unrelated to this patch. Wouldn't it be better doing that separately, for bisectability? I could upload another patch once this has landed that reinstates this check. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... 
gpu/command_buffer/client/fenced_allocator.cc:214: if (block.state == FREE_PENDING_TOKEN && block.token <= last_token_read) { On 2014/02/07 22:58:20, piman wrote: > While you're here... it looks like the 'block.token <= last_token_read' check > doesn't take wraparound into account. Could you make it use your new > HasTokenPassed ? Sure. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:3696: base::subtle::MemoryBarrier(); On 2014/02/07 20:43:58, epennerAtGoogle wrote: > It wouldn't hurt to document each of these base::subtle calls, and why exactly > it is needed. It seems mostly clear now, but for posterity it's always nice to > know. > > Two things on this one: > > First, do we really need a MemoryBarrier here? Don't we just need the latest > token, and could that be done without a full MemoryBarrier? > > Second, do we need to have a memory barrier in a loop here? The chances of the > token changing while we are in this loop seems low and not worth caring about. > If we did this only once would the token just have a slight chance of being out > of date? Alternatively could we get the token once and use it for testing all > uploads? To be honest, I added the barrier here mostly because it was used in a similar way by the query manager. But I don't think it is really necessary here, as I don't see how we could ever write out-of-order tokens as whoever writes next will do so from the same message loop at some later point in time. I guess we could just go with very explicit inc-store/load here and in the client. But when would this really matter? Atomic increase is used to protect writing from multiple threads, but when would it matter if we read without synchronization? 
The only risk is we read one token before the most recent, and that would be extremely rare and harmless. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.h:609: !(async_upload_sync_->async_token & 0x80000000)) { On 2014/02/07 22:58:20, piman wrote: > Reading async_token twice seems race-prone, it could have changed between line > 606 and 609. > Instead read it once and use that value twice. True. Good catch. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/async_pixel_transfer_manager.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/async_pixel_transfer_manager.h:66: // This includes asynchronous upload tasks. On 2014/02/07 22:58:20, piman wrote: > I'm not the biggest fan of this pattern, because it's very hard to reason with. > It's not inconceivable that an async upload needs to do several hops between > threads, in which case "AsyncRun" becomes hard to define. > I'd rather pass the callback to AsyncPixelTransferDelegate::AsyncTexImage2D with > the explicit semantic of "this callback is run as soon as the input buffer has > been consumed, possibly run on another thread", and have the implementation of > that semantic being the responsibility of each delegate (or manager). Sounds reasonable to pass the callback to AsyncTexImage2D, I'll update the patch to do that. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/async_pixel_transfer_manager.h:67: virtual void AsyncRun(const base::Closure& callback) = 0; On 2014/02/07 20:43:58, epennerAtGoogle wrote: > This leads to duplicate work happening here where we notify the query and also > update the token. 
It's fine for this patch, but do you think we could use the > token in the client eventually to replace the queries being used? I'm thinking, can't we just handle those queries completely on the client side after this? https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/command_buffer_service.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/command_buffer_service.h:10: #include "base/synchronization/lock.h" On 2014/02/07 22:58:20, piman wrote: > You don't need this, or the other changes in this file any more. Oops. Left them here unintentionally. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10324: sync_data_shm_id, sync_data_shm_offset, sizeof(*sync)); On 2014/02/07 22:58:20, piman wrote: > The shared memory backing the AsyncUploadSync could be destroyed / unmapped > before the completion. For the transfer data, we dup the shm (for better or > worse), so we should do something similar or add some refcounting of the > transfer buffers, or something. As epenner replied elsewhere, SafeSharedMemoryPool handles this other places, so I guess I should look into using that. And then it should be replaced with ref counting and some layering. epenner, what bug number is that?
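The wraparound-safe comparison agreed on above (reusing HasTokenPassed instead of `block.token <= last_token_read`) relies on unsigned subtraction: the difference stays below 2^31 as long as the token was issued no more than 2^31 tokens before the last one read, even when the 32-bit counter wraps. A sketch, with an illustrative signature:

```cpp
#include <cassert>
#include <cstdint>

// Wraparound-safe "has this token passed?" check. Unlike a plain
// 'token <= last_token_read', this stays correct when the 32-bit token
// counter wraps around, provided fewer than 2^31 tokens are in flight.
bool HasTokenPassed(uint32_t token, uint32_t last_token_read) {
  return last_token_read - token < 0x80000000u;
}
```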
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/async_pixel_transfer_manager.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/async_pixel_transfer_manager.h:66: // This includes asynchronous upload tasks. On 2014/02/08 09:18:25, jadahl wrote: > On 2014/02/07 22:58:20, piman wrote: > > I'm not the biggest fan of this pattern, because it's very hard to reason > with. > > It's not inconceivable that an async upload needs to do several hops between > > threads, in which case "AsyncRun" becomes hard to define. > > I'd rather pass the callback to AsyncPixelTransferDelegate::AsyncTexImage2D > with > > the explicit semantic of "this callback is run as soon as the input buffer has > > been consumed, possibly run on another thread", and have the implementation of > > that semantic being the responsibility of each delegate (or manager). > > Sounds reasonable to pass the callback to AsyncTexImage2D, I'll update the patch > to do that. > Reconsidering this, I think we still need something like this, because not every AsyncTexImage2D command results in a call to Manager::AsyncTexImage2D (in case of error), and we cannot write that upload token directly, but we don't want to drop it because it'll block free:ing the memory until some successful upload.
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/buffer_tracker.h (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/buffer_tracker.h:39: async_token_(0) { Ahh I see. +1 to last_async_upload_token. No strong preference otherwise. https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10324: sync_data_shm_id, sync_data_shm_offset, sizeof(*sync)); On 2014/02/08 09:18:25, jadahl wrote: > On 2014/02/07 22:58:20, piman wrote: > > The shared memory backing the AsyncUploadSync could be destroyed / unmapped > > before the completion. For the transfer data, we dup the shm (for better or > > worse), so we should do something similar or add some refcounting of the > > transfer buffers, or something. > > As epenner replied elsewhere, SafeSharedMemoryPool handles this other places, so > I guess I should look into using that. And then it should be replaced with ref > counting and some layering. epenner, what bug number is that? Bug for reference counting SharedMemory: https://code.google.com/p/chromium/issues/detail?id=177063
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:3696: base::subtle::MemoryBarrier(); On 2014/02/08 09:18:25, jadahl wrote: > On 2014/02/07 20:43:58, epennerAtGoogle wrote: > > It wouldn't hurt to document each of these base::subtle calls, and why exactly > > it is needed. It seems mostly clear now, but for posterity it's always nice to > > know. > > > > Two things on this one: > > > > First, do we really need a MemoryBarrier here? Don't we just need the latest > > token, and could that be done without a full MemoryBarrier? > > > > Second, do we need to have a memory barrier in a loop here? The chances of the > > token changing while we are in this loop seems low and not worth caring about. > > If we did this only once would the token just have a slight chance of being > out > > of date? Alternatively could we get the token once and use it for testing all > > uploads? > > To be honest, I added the barrier here mostly because it was used in a similar > way by the query manager. But I don't think it is really necessary here, as I > don't see how we could ever write out-of-order tokens as whoever writes next > will do so from the same message loop at some later point in time. > I guess we could just go with very explicit inc-store/load here and in the > client. But when would this really matter? Atomic increase is used to protect > writing from multiple threads, but when would it matter if we read without > synchronization? The only risk is we read one token before the most recent, and > that would be extremely rare and harmless. Some useful background: http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could... The compiler/cpu only guarantees serialization as seen from one thread. 
But what can appear as serial in one thread may be seen out-of-order (or worse, "impossible" states) from another thread. Just throwing MemoryBarriers around isn't so useful because: - it's hard to convince oneself of whether it's necessary and/or sufficient. - they don't convey the semantics. In this case, I'm pretty sure that this one MemoryBarrier is neither necessary nor sufficient. Acquire_Load and Release_Store convey the proper semantics. The AsyncUploadSync is your semaphore that guards the shared state which is the transfer buffer. On the GPU side, you need to make sure that all memory accesses to the transfer buffer are not reordered to after the write to the AsyncUploadSync is visible to other threads. That's the semantic of Release_Store. On the client side, you need to make sure all memory accesses to the transfer buffer are not reordered to before the read of the AsyncUploadSync. That's the semantic of Acquire_Load. The good news is that on x86, you can elide the memory barrier in those 2 cases - see the implementation of Acquire_Load/Release_Store (they just need a compiler barrier to make sure the compiler itself doesn't reorder). That's properly abstracted by the per-platform implementations of Acquire_Load/Release_Store. Since the behavior is very subtle (hence the namespace), the use of atomic operations should be invisible/irrelevant to the higher-level code, and instead should be abstracted as low as possible. Best, I think, would be to have it in AsyncUploadSync, encapsulating the semantic that we care about, something like: void SignalCompletion(uint32_t token) { Release_Store(&async_token, token); } bool IsCompleted(uint32_t token) { uint32_t current_token = Acquire_Load(&async_token); return (current_token - token < 0x80000000); } After that, all the memory logic is irrelevant in the rest of the code. We can test the hell out of AsyncUploadSync for "strange" races, and make sure the rest of the code will be correct.
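A sketch of that encapsulation, using std::atomic with explicit acquire/release ordering as a stand-in for base::subtle's Release_Store / Acquire_Load (the method names follow the snippet above; everything else is illustrative):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Encapsulates all memory-ordering subtlety in one place, so the rest of
// the code never touches atomics directly. std::memory_order_release /
// std::memory_order_acquire stand in for base::subtle's Release_Store /
// Acquire_Load.
struct AsyncUploadSync {
  std::atomic<uint32_t> async_token{0};

  // Upload thread: all prior writes to the transfer buffer become visible
  // to other threads before the new token value does (release semantics).
  void SignalCompletion(uint32_t token) {
    async_token.store(token, std::memory_order_release);
  }

  // Client thread: reads of the transfer buffer cannot be reordered to
  // before this load (acquire semantics). The comparison is
  // wraparound-safe via unsigned subtraction.
  bool IsCompleted(uint32_t token) {
    uint32_t current_token = async_token.load(std::memory_order_acquire);
    return current_token - token < 0x80000000u;
  }
};
```

On x86 both operations compile down to plain loads and stores plus a compiler barrier, matching the "elide the memory barrier" observation above.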
On 2014/02/11 00:09:08, piman wrote: > https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... > File gpu/command_buffer/client/gles2_implementation.cc (right): > > https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... > gpu/command_buffer/client/gles2_implementation.cc:3696: > base::subtle::MemoryBarrier(); > On 2014/02/08 09:18:25, jadahl wrote: > > On 2014/02/07 20:43:58, epennerAtGoogle wrote: > > > It wouldn't hurt to document each of these base::subtle calls, and why > exactly > > > it is needed. It seems mostly clear now, but for posterity it's always nice > to > > > know. > > > > > > Two things on this one: > > > > > > First, do we really need a MemoryBarrier here? Don't we just need the latest > > > token, and could that be done without a full MemoryBarrier? > > > > > > Second, do we need to have a memory barrier in a loop here? The chances of > the > > > token changing while we are in this loop seems low and not worth caring > about. > > > If we did this only once would the token just have a slight chance of being > > out > > > of date? Alternatively could we get the token once and use it for testing > all > > > uploads? > > > > To be honest, I added the barrier here mostly because it was used in a similar > > way by the query manager. But I don't think it is really necessary here, as I > > don't see how we could ever write out-of-order tokens as whoever writes next > > will do so from the same message loop at some later point in time. > > > I guess we could just go with very explicit inc-store/load here and in the > > client. But when would this really matter? Atomic increase is used to protect > > writing from multiple threads, but when would it matter if we read without > > synchronization? The only risk is we read one token before the most recent, > and > > that would be extremely rare and harmless. 
> > Some useful background: > http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could... > The compiler/cpu only guarantees serialization as seen from one thread. But what > can appear as serial in one thread may be seen out-of-order (or worse, > "impossible" states) from another thread. > > Just throwing MemoryBarriers around isn't so useful because: > - it's hard to convince oneself of whether it's necessary and/or sufficient. > - they don't convey the semantics. > In this case, I'm pretty sure that this one MemoryBarrier is neither necessary > nor sufficient. > > > Aquire_Load and Release_Store convey the proper semantics. The AsyncUploadSync > is your semaphore that guards the shared state which is the transfer buffer. > On the GPU side, you need to make sure that all memory accesses to the transfer > buffer are not reordered to after the write to the AsyncUploadSync is visible to > other threads. That's the semantic of Release_Store. > On the client side, you need to make sure all memory accesses to the transfer > buffer are not reordered to before the read of the AsyncUploadSync. That's the > semantic of Acquire_Load. > > The good news is that on x86, you can elide the memory barrier in those 2 cases > - see the implementation of Acquire_Load/Release_Store (they just need a > compiler barrier to make sure the compiler itself doesn't reorder). That's > properly abstracted by the per-platform implementations of > Acquire_Load/Release_Store > > Since the behavior is very sublte (hence the namespace), the use of atomic > operations should be invisible/irrelevant to the higher-level code, and instead > should be abstracted as low as possible. 
> Best, I think, would be to have it in AsyncUploadSync, encapsulating the > semantic that we care about, something like: > void SignalCompletion(uint32_t token) { > Release_Store(&async_toke, token); > } > > bool IsCompleted(uint32_t token) { > uint32_t current_token = Acquire_Load(&async_token); > return (current_token - token < 0x80000000); > } > > > After that, all the memory logic is irrelevant in the rest of the code. We can > test the hell out of AsyncUploadSync for "strange" races, and make sure the rest > of the code will be correct. Thanks for the explanation and link, I think I understand more what's going on now, more or less. It seems that on the client side I implemented semantics similar to Release_Read instead of Acquire_Read, which would not ensure the order of the data read, as it was not put behind a fence (in the ARM case at least), if I understand correctly. I agree it's a good idea to put all this kind of memory logic in one place, and AsyncUploadSync seems like a good place.
> Thanks for the explanation and link, I think I understand more what's going on > now, more or less. It seems that on the client side I implemented semantics > similar to Release_Read instead of Acquire_Read which would not ensure the order > of the data read, as it was not put behind a fence (in the ARM case at least) > (if I understand correctly). > > I agree its a good idea put all this kind of memory logic in one place, and > AsyncUploadSync seems like a good place. I also liked that link! It sounds like Acquire_Load is relatively inexpensive. If the cost is still significant, it still sounds like we could separate GetLatestToken() from IsCompleted(), since that would be more conservative as opposed to not conservative enough? But if it's in the noise, forget this then as it's just over-optimizing.
On Thu, Feb 13, 2014 at 12:35 PM, <epenner@chromium.org> wrote: > Thanks for the explanation and link, I think I understand more what's >> going on >> now, more or less. It seems that on the client side I implemented >> semantics >> similar to Release_Read instead of Acquire_Read which would not ensure the >> > order > >> of the data read, as it was not put behind a fence (in the ARM case at >> least) >> (if I understand correctly). >> > > I agree its a good idea put all this kind of memory logic in one place, >> and >> AsyncUploadSync seems like a good place. >> > > I also liked that link! > > It sounds like Acquire_Load is relatively inexpensive. It's probably on the order of 100ns. Similar order of magnitude to a thread safe refcount operation. > If the cost is still > significant, it still sounds like we could separate GetLatestToken() from > IsCompleted(), since that would be more conservative as opposed to not > conservative enough? But if it's in the noise, forget this then as it's > just > over-optimizing. > We could. I think we should start with something simple though. > > > > > https://chromiumcodereview.appspot.com/116863003/ > To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2014/02/13 20:35:12, epenner wrote: > > Thanks for the explanation and link, I think I understand more what's going on > > now, more or less. It seems that on the client side I implemented semantics > > similar to Release_Read instead of Acquire_Read which would not ensure the > order > > of the data read, as it was not put behind a fence (in the ARM case at least) > > (if I understand correctly). > > > > I agree its a good idea put all this kind of memory logic in one place, and > > AsyncUploadSync seems like a good place. > > I also liked that link! > > It sounds like Acquire_Load is relatively inexpensive. If the cost is still > significant, it still sounds like we could separate GetLatestToken() from > IsCompleted(), since that would be more conservative as opposed to not > conservative enough? But if it's in the noise, forget this then as it's just > over-optimizing. I also considered the performance impact, so I took a look at how it is implemented in the kernel[0] when running on ARM. It seems to more or less only involve invoking a special memory fence instructions (dmb[1] on >= armv7), so I think the performance impact is negligible. [0] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm... [1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGHHIE...
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/fenced_allocator.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/fenced_allocator.cc:55: // DCHECK_EQ(blocks_[0].state, FREE); On 2014/02/08 09:18:25, jadahl wrote: > On 2014/02/07 22:58:20, piman wrote: > > I would like to reinstate those checks, for the async case - to ensure we > > guaranteed we finished all async uploads before destroying the MemoryManager. > > > > For the lost context case, once we know the context is lost, we know that > either > > the service already used the buffer that's in flight, or it won't use it in > the > > future. At that point it is safe to free/reuse (we could check that condition > in > > FreeUnused). > > Should maybe the ~GLES2Implementation() take care of free:ing all of the memory > it manages without waiting in the lost context case? > > Regarding the blocks pending token, sounds a bit unrelated to this patch. > Wouldn't it be better doing that separately, for bisectability? I could upload > another patch once this has landed that reinstates this check. I added lost-context handling (freeing all non-free blocks, both in use and pending token) to FreeUnused(), and freeing of all unmanaged memory to the Poll callback in gles2_implementation.cc. Commenting out these two lines leads to tons of asserts when running the unit tests, since they assume it's ok to leak. I suggest leaving these two commented out, fixing context-lost handling as well as making the tests pass in a separate patch.
New version uploaded. In this one I added a glWaitAllAsyncTexImage2D that behaves more or less like glWaitAsyncTexImage2D applied to whatever upload was queued last. It is called in ~GLES2Implementation in order to properly clean out the async uploads before destructing the mapped memory manager. I did not remove the two DCHECK's as I got too many unrelated failing test cases. Another change worth noting is the revert of the AsyncPixelTransferManager::AsyncRun() function. Instead I just use the already existing AsyncNotifyCompletion which has the shared memory guards that were needed. I did not make async upload token updating part of AsyncPixelTransferDelegate::AsyncTex(Sub)Image2D as it would block freeing a buffer if the AsyncTex(Sub)Image2D command failed. Other than that, it mostly addresses review issues. I'd like to know if there are any more unit tests you are expecting from this patch. Also, thanks for all the reviewing and valuable input! Jonas https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:111: async_upload_sync_(NULL), On 2014/02/07 22:58:20, piman wrote: > Also need initializers for async_upload_token_, async_upload_sync_shm_id_, > async_upload_sync_shm_offset_ async_upload_token_ is needed indeed, but shm_id and shm_offset would be redundant since they are coupled with |async_upload_sync_|. |async_upload_sync_| is used as the guard for checking the validity/no-oom of async upload sync.
https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1400001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:111: async_upload_sync_(NULL), On 2014/02/19 14:06:20, jadahl wrote: > On 2014/02/07 22:58:20, piman wrote: > > Also need initializers for async_upload_token_, async_upload_sync_shm_id_, > > async_upload_sync_shm_offset_ > > async_upload_token_ is needed indeed, but shm_id and shm_offset would be > redundant since they are coupled with |async_upload_sync_|. |async_upload_sync_| > is used as the guard for checking the validity/no-oom of async upload sync. Per style guide ( http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml?showone=Initia... ), please initialize all fields that are not default-initialized (i.e. POD). https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/cli... File gpu/command_buffer/client/fenced_allocator_test.cc (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/cli... gpu/command_buffer/client/fenced_allocator_test.cc:107: } Could we add a test that exercises the poll feature at the low level? Create a FencedAllocator passing a mock Poll function. Allocate several entries. Call FreeUnused, with/without the mock Poll function set up to free an entry. Check consistency. https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/cli... File gpu/command_buffer/client/gles2_implementation.cc (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:299: } It would be nice to do this in FreeEverything too. FreeEverything is called when a tab goes offscreen, so: 1- it's ok to take long (i.e. waiting for things to happen).
2- we want to free as much as possible at that point https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/cli... gpu/command_buffer/client/gles2_implementation.cc:1540: base::subtle::MemoryBarrier(); Because the necessary memory barrier is handled in HasAsyncUploadTokenPassed, you can remove this. https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/com... File gpu/command_buffer/common/gles2_cmd_format.h (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/com... gpu/command_buffer/common/gles2_cmd_format.h:169: }; Can we add a unit test for this, that mimics our usage pattern? Something along the lines of: void ReadCompareAndSignal(uint32* value, uint32 expected, uint32 token, AsyncUploadSync* sync) { // Simulate a "read" of the shared data, similar to what texture upload would do EXPECT_EQ(expected, *value); sync->SetAsyncUploadToken(token); } TEST() { const size_t kSize = 10; const size_t kCount = 10000; // some value that keeps running time to below 1s or so but making sure it's >> kSize so we reuse memory many times. AsyncUploadSync sync; sync.Reset(); uint32 token = 0; // or some value to exercise wrapping. scoped_ptr<uint32[]> buffer(new uint32[kSize]); scoped_ptr<uint32[]> tokens(new uint32[kSize]); memset(tokens.get(), 0, sizeof(uint32) * kSize); base::Thread thread("Upload Thread"); thread.Start(); for (uint32 i = 0; i < kCount; ++i) { size_t offset = i % kSize; while (!sync.HasAsyncUploadTokenPassed(tokens[offset])) PlatformThread::YieldCurrentThread(); // Or something to back off without explicit synchronization. uint32* data = buffer.get()+offset; *data = i; ++token; tokens[offset] = token; thread.message_loop()->PostTask( FROM_HERE, base::Bind(&ReadCompareAndSignal, data, i, token, &sync)); } } https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser...
File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:486: AsyncUploadTokenCompletionObserver(uint32 async_upload_token) nit: explicit https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:498: uint32 async_upload_token_; nit: DISALLOW_COPY_AND_ASSIGN https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10330: mem_params.shm_data_size = sizeof(AsyncUploadSync); There's a DCHECK in AsyncNotifyCompletion that says it expects shm_data_offset+shm_data_size <= shm_size. For security we need an overflow-safe check on the size.
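The AsyncUploadSync stress test piman sketches above can be condensed into a self-contained C++11 version, with std::thread and std::atomic standing in for base::Thread and base::subtle; the ring-of-slots structure and the function name are illustrative, not the actual Chromium test:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Producer ("client") writes values into a small ring of slots and reuses a
// slot only after the consumer ("upload thread") has read it and released a
// token for it, mirroring how transfer buffer memory is recycled once the
// async upload token passes. Returns the number of value mismatches the
// consumer observed (expected: 0).
inline int RunUploadStressTest(uint32_t slots, uint32_t iterations) {
  std::vector<uint32_t> buffer(slots, 0);
  std::atomic<uint32_t> produced{0};  // highest token published by producer
  std::atomic<uint32_t> consumed{0};  // highest token released by consumer
  int mismatches = 0;

  std::thread consumer([&] {
    for (uint32_t i = 1; i <= iterations; ++i) {
      // Acquire pairs with the producer's release store of |produced|, so
      // the write to buffer[] for token i is visible here.
      while (produced.load(std::memory_order_acquire) < i)
        std::this_thread::yield();
      if (buffer[(i - 1) % slots] != i)
        ++mismatches;
      consumed.store(i, std::memory_order_release);
    }
  });

  for (uint32_t i = 1; i <= iterations; ++i) {
    // Before reusing a slot, wait for the token of its previous occupant.
    while (i > slots && consumed.load(std::memory_order_acquire) < i - slots)
      std::this_thread::yield();
    buffer[(i - 1) % slots] = i;
    produced.store(i, std::memory_order_release);
  }
  consumer.join();
  return mismatches;
}
```

If the acquire/release pair were replaced with relaxed loads and stores, the consumer could in principle observe a stale slot value on weakly ordered hardware, which is exactly the race the test is meant to exercise.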
https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10330: mem_params.shm_data_size = sizeof(AsyncUploadSync); On 2014/02/20 01:52:37, piman (OOO back 2014-3-4) wrote: > There's a DCHECK in AsyncNotifyCompletion that says it expects > shm_data_offset+shm_data_size <= shm_size > > For security we need an overflow-safe check on the size. This seems to be missing for the mem_params passed to delegate->AsyncTex(Sub)Image2D as well, or am I missing something? I'm considering how we should handle these errors. As it is now, if GetSharedMemoryBuffer() returns a buffer with shared_memory == NULL, or if we added this check, we'd not free the async upload buffer on the client side until (and only if) we either lose context or successfully queue another upload. I guess that the cases we'd handle here are misbehaving clients, and we'd more or less make them misbehave some more this way. Any preferences on how to deal with this?
https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10330: mem_params.shm_data_size = sizeof(AsyncUploadSync); It's checked in ValidateTexSubImage2D. The 'pixels' should be null if the memory is invalid in any way for the operation. I'll find where that null is coming from though. For any of these we should return the same error as ValidateTexSubImage2D.
https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/1870001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10330: mem_params.shm_data_size = sizeof(AsyncUploadSync); It's in GetSharedMemoryAs: https://code.google.com/p/chromium/codesearch#chromium/src/gpu/command_buffer...
Hi, Another one uploaded. This new version adds a test that checks the usage of the poll function of FencedAllocator, checks the usage pattern of AsyncUploadSync, adds a sanity check for shared memory parameters in GLES2DecoderImpl, as well as more minor review fixes. One of the review issues that this new upload addressed was regarding error handling in the async upload path in GLES2DecoderImpl. The existing potential problem is that if an async upload fails because the AsyncUploadSync shared memory parameters were invalid, it would effectively stall any freeing of transfer buffers on the client side until a later async upload gets past those two checks. In those cases the corresponding function (HandleAsyncTex(Sub)Image2DCHROMIUM()) would return an error. Should we in those cases instead fail "harder" so that the client loses its context? Any ideas? Jonas
> In those cases the corresponding function (HandleAsyncTex(Sub)Image2DCHROMIUM()) > would return an error. Should we in those cases instead fail "harder" so that > the client looses its context? Any ideas? > > Jonas I need to confirm this, but I believe the client process gets killed on this type of out-of-bounds error. It effectively means that malicious code is running in the client. So, we should make the client crash immediately.
On 2014/02/21 22:47:36, epennerAtGoogle wrote: > > In those cases the corresponding function > (HandleAsyncTex(Sub)Image2DCHROMIUM()) > > would return an error. Should we in those cases instead fail "harder" so that > > the client looses its context? Any ideas? > > > > Jonas > > I need to confirm this, but I believe the client process gets killed on this > type of out-of-bounds error. It effectively means that malicious code is running > in the client. So, we should make the client crash immediately. I did some digging and it looks like we'd end up in GpuCommandBufferStub::OnParseError() sending a GpuCommandBufferMsg_Destroyed message to the client and a GpuHostMsg_DidLoseContext message to gpu host. I tried invoking the error path during rendering to see how the client is dealt with but it mostly seemed to die because of DCHECKs here and there (the renderer host as well for that matter).
> I did some digging and it looks like we'd end up in > GpuCommandBufferStub::OnParseError() sending a GpuCommandBufferMsg_Destroyed > message to the client and a GpuHostMsg_DidLoseContext message to gpu host. I > tried invoking the error path during rendering to see how the client is dealt > with but it mostly seemed to die because of DCHECKs here and there (the renderer > host as well for that matter). Looks like piman will be out for a bit. It's concerning that we are hitting DCHECKs on the service side. We might want to file bugs for those. Also it might be worth trying with a release build to see what happens. This patch is starting to look really good from my perspective. Are you seeing the memory savings as a result? If we ever go over the limit now, that's a bug, likely in the compositor, and I can perhaps start looking into it in parallel.
On 2014/02/27 05:37:42, epenner wrote: > > I did some digging and it looks like we'd end up in > > GpuCommandBufferStub::OnParseError() sending a GpuCommandBufferMsg_Destroyed > > message to the client and a GpuHostMsg_DidLoseContext message to gpu host. I > > tried invoking the error path during rendering to see how the client is dealt > > with but it mostly seemed to die because of DCHECKs here and there (the > renderer > > host as well for that matter). > > Looks like piman will be out for a bit. That's concerning we are hitting DCHECKs > in the service side. We might want to file bugs for those. Also it might be > worth trying with release build to see what happens. This patch is starting to > look really good from my perspective. Are you seeing the memory savings as a > result? If we ever go over the limit now, that's a bug, likely in the > compositor and I can start looking into it in parallel perhaps. I did measurements on an early version which gave promising results and I plan to rebase my utility (it more or less adds a parallel path to trace events pushing arbitrary JSON objects to a file that I then process in Javascript drawing graphs on a canvas) and measure again. FYI, I will be away most of next week, so piman being unresponsive for a bit is no problem for me. I'll see if I can track down the DCHECK's as well (I stopped commenting them out after 2-3 of them).
On 2014/02/27 08:20:43, jadahl wrote: > On 2014/02/27 05:37:42, epenner wrote: > > > I did some digging and it looks like we'd end up in > > > GpuCommandBufferStub::OnParseError() sending a GpuCommandBufferMsg_Destroyed > > > message to the client and a GpuHostMsg_DidLoseContext message to gpu host. I > > > tried invoking the error path during rendering to see how the client is > dealt > > > with but it mostly seemed to die because of DCHECKs here and there (the > > renderer > > > host as well for that matter). > > > > Looks like piman will be out for a bit. That's concerning we are hitting > DCHECKs > > in the service side. We might want to file bugs for those. Also it might be > > worth trying with release build to see what happens. This patch is starting to > > look really good from my perspective. Are you seeing the memory savings as a > > result? If we ever go over the limit now, that's a bug, likely in the > > compositor and I can start looking into it in parallel perhaps. > > I did measurements on an early version which gave promising results and I plan > to rebase my utility (it more or less adds a parallel path to trace events > pushing arbitrary JSON objects to a file that I then process in Javascript > drawing graphs on a canvas) and measure again. FYI, I will be away most of next > week, so piman being unresponsive for a bit is no problem for me. I'll see if I > can track down the DCHECK's as well (I stopped commenting them out after 2-3 of > them). Any progress regarding review? Also, I attached some new measurements to the issue that shows some quite promising results.
I realize I had old comments that I somehow never submitted. https://codereview.chromium.org/116863003/diff/2100001/gpu/command_buffer/com... File gpu/command_buffer/common/gles2_cmd_format_test.cc (right): https://codereview.chromium.org/116863003/diff/2100001/gpu/command_buffer/com... gpu/command_buffer/common/gles2_cmd_format_test.cc:98: base::subtle::MemoryBarrier(); Why the barrier? On the contrary, we want to make sure it works without that barrier (i.e. that HasAsyncUploadTokenPassed has the right Acquire semantic). Note: the PostTask implies before->after data semantics for buffer_tokens, if that's what you're concerned about. https://codereview.chromium.org/116863003/diff/2100001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/2100001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10334: if (mem_params.shm_data_offset + mem_params.shm_data_size > Beware of overflows. If shm_data_offset is UINT32_MAX, then this will pass, but we'll dereference out-of-bounds later. We can use CheckedNumerics, like we did in https://codereview.chromium.org/196633009
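To illustrate the overflow piman describes: with uint32_t arithmetic, shm_data_offset + shm_data_size can wrap around, so the naive comparison passes while a later dereference goes out of bounds. Rearranging the comparison avoids the addition entirely; this is a generic sketch with illustrative names (Chromium's actual fix uses base::CheckedNumeric, per the linked CL):

```cpp
#include <cstdint>

// Overflow-safe form of "offset + size <= total" for 32-bit values.
// The naive form fails when offset + size wraps: e.g. with
// offset == UINT32_MAX and size == 4, the sum wraps to 3, which
// compares <= total and passes the naive check.
inline bool IsInBounds(uint32_t offset, uint32_t size, uint32_t total) {
  // No addition, so no wrap: total - size cannot underflow once size <= total.
  return size <= total && offset <= total - size;
}
```

The same rearrangement works for any unsigned width, which is why it is a common idiom for validating untrusted shared memory parameters.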
https://codereview.chromium.org/116863003/diff/2100001/gpu/command_buffer/com... File gpu/command_buffer/common/gles2_cmd_format_test.cc (right): https://codereview.chromium.org/116863003/diff/2100001/gpu/command_buffer/com... gpu/command_buffer/common/gles2_cmd_format_test.cc:98: base::subtle::MemoryBarrier(); On 2014/03/20 21:18:26, piman wrote: > Why the barrier? On the contrary, we want to make sure it works without that > barrier (i.e. that HasAsyncUploadTokenPassed has the right Acquire semantic). > Note: the PostTask implies before->after data semantics for buffer_tokens, if > that's what you're concerned about. I added it to ensure the buffer_tokens[buffer] store, but as you say PostTask implies it, I will remove the barrier.
New rebased version uploaded that also addresses the recent raised issues.
LGTM+nit https://codereview.chromium.org/116863003/diff/2230001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): https://codereview.chromium.org/116863003/diff/2230001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8407: // asynchronously and AsyncTexSubImage2D does not involved binding. nit: typo involved->involves
Uploaded new version. Changes are comment typo corrections. https://codereview.chromium.org/116863003/diff/2230001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): https://codereview.chromium.org/116863003/diff/2230001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8407: // asynchronously and AsyncTexSubImage2D does not involved binding. On 2014/03/21 22:35:51, piman wrote: > nit: typo involved->involves Hmm, isn't "involve" more correct? Changed to that here and above (where this was copied from).
On Mon, Mar 24, 2014 at 3:09 AM, <jadahl@opera.com> wrote: > Uploaded new version. Changes are comment typo corrections. > > > > https://codereview.chromium.org/116863003/diff/2230001/ > gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc > File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): > > https://codereview.chromium.org/116863003/diff/2230001/ > gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc#newcode8407 > gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8407: // > asynchronously and AsyncTexSubImage2D does not involved binding. > On 2014/03/21 22:35:51, piman wrote: > >> nit: typo involved->involves >> > > Hmm, isn't "involve" more correct? Changed to that here and above (where > this was copied from). > Yes, sorry, you're right. Thanks. LGTM still.
lgtm
The CQ bit was checked by jadahl@opera.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jadahl@opera.com/116863003/2240001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: tryserver.chromium on mac_chromium_compile_dbg
New upload. This one fixes the compilation/lint issues reported by the try-bots. Updated *Sync manager to the new approach. I noticed the conflicting CL, but have not attempted any rebase yet as it has not landed. Will do so if it lands before this one.
Uploaded rebased version.
https://codereview.chromium.org/116863003/diff/2300001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/116863003/diff/2300001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder.cc:10374: if (!end.IsValid() || end.ValueOrDie() > mem_params.buffer()->size()) nit: do this check before creating the AsyncMemoryParams, since the constructor DCHECKs that this passes. We probably need to check that the buffer itself is valid. Also the size check can be rewritten with GetDataAddress: scoped_refptr<Buffer> buffer = GetSharedMemoryBuffer(sync_data_shm_id); if (!buffer || !buffer->GetDataAddress(sync_data_shm_offset, sizeof(AsyncUploadSync))) return base::Closure(); AsyncMemoryParams mem_params(buffer, sync_data_shm_offset, sizeof(AsyncUploadSync)); ...
> nit: do this check before creating the AsyncMemoryParams, since the constructor > DCHECKs that this passes. Antoine, I removed the DCHECK in the constructor because tests were hitting it, and it was just an FYI (I didn't consider it important since we need to check GetDataAddress() != null in all cases anyway or we'll crash 'for real'). Sorry I didn't bring that up before landing. I can add it back if you like.
New upload; addresses review issue
lgtm
The CQ bit was checked by jadahl@opera.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jadahl@opera.com/116863003/2320001
The CQ bit was unchecked by commit-bot@chromium.org
Retried try job too often on ios_dbg_simulator for step(s) base_unittests http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=ios_dbg_si...
The CQ bit was checked by jadahl@opera.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jadahl@opera.com/116863003/2320001
Message was sent while issue was closed.
Change committed as 260177
Message was sent while issue was closed.
The new upload contains the following change: --- a/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc +++ b/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc @@ -8399,7 +8400,6 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) { DoDeleteTexture(client_texture_id_, kServiceTextureId); EXPECT_FALSE( decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( texture_ref)); - texture->SetImmutable(false); delegate = NULL; { // Get a fresh texture since the existing texture cannot be respecified The failed test reported use after free where use was at the removed line, and it had been freed on the DoDeleteTexture line. I have not verified locally, but I think this fix is correct and should fix the reported issue.
Message was sent while issue was closed.
https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8402: texture_ref)); nit: actually, texture_ref should probably be a scoped_refptr. Otherwise we may be comparing a pointer to deleted memory. While we don't dereference it, it's in theory possible that another TextureRef would be allocated at the same address (actually, with allocators like tcmalloc, it becomes very likely), and inserted into the map. It probably doesn't happen here given how the test is written, but if we want to be thorough, that'd be the right thing to do.
Message was sent while issue was closed.
https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8402: texture_ref)); On 2014/03/28 20:04:35, piman wrote: > nit: actually, texture_ref should probably be a scoped_refptr. Otherwise we may > be comparing a pointer to deleted memory. While we don't dereference it, it's in > theory possible that another TextureRef would be allocated at the same address > (actually, with allocators like tcmalloc, it becomes very likely), and inserted > into the map. It probably doesn't happen here given how the test is written, but > if we want to be thorough, that'd be the right thing to do. Do you want that change in this patch? It'll change texture_ref to texture_ref.get() in several places spread all over this function.
Message was sent while issue was closed.
https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8402: texture_ref)); On 2014/03/28 20:11:38, jadahl wrote: > On 2014/03/28 20:04:35, piman wrote: > > nit: actually, texture_ref should probably be a scoped_refptr. Otherwise we > may > > be comparing a pointer to deleted memory. While we don't dereference it, it's > in > > theory possible that another TextureRef would be allocated at the same address > > (actually, with allocators like tcmalloc, it becomes very likely), and > inserted > > into the map. It probably doesn't happen here given how the test is written, > but > > if we want to be thorough, that'd be the right thing to do. > > Do you want that change in this patch? It'll change texture_ref to > texture_ref.get() in several places spread all over this function. Either that or add a comment to indicate that texture_ref is actually deleted, so one must be extra careful when touching this. I prefer the former though.
Message was sent while issue was closed.
https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... File gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc (right): https://codereview.chromium.org/116863003/diff/2340001/gpu/command_buffer/ser... gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc:8402: texture_ref)); On 2014/03/28 20:11:38, jadahl wrote: > On 2014/03/28 20:04:35, piman wrote: > > nit: actually, texture_ref should probably be a scoped_refptr. Otherwise we > may > > be comparing a pointer to deleted memory. While we don't dereference it, it's > in > > theory possible that another TextureRef would be allocated at the same address > > (actually, with allocators like tcmalloc, it becomes very likely), and > inserted > > into the map. It probably doesn't happen here given how the test is written, > but > > if we want to be thorough, that'd be the right thing to do. > > Do you want that change in this patch? It'll change texture_ref to > texture_ref.get() in several places spread all over this function. The tests also assume no reference is held, and fail because we don't dereference. If we do (after DoDelete, before GetPixelTransferDelegate) we would have to compare with the pointer texture_ref had before we dereferenced, which would still have the same issue as before.
Message was sent while issue was closed.
The diff from the previous upload follows: --- a/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc +++ b/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc @@ -8358,6 +8358,8 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) { EXPECT_FALSE( decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( texture_ref)); + texture = NULL; + texture_ref = NULL; delegate = NULL; } @@ -8400,6 +8402,8 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) { EXPECT_FALSE( decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( texture_ref)); + texture = NULL; + texture_ref = NULL; delegate = NULL; { // Get a fresh texture since the existing texture cannot be respecified I cleared the pointers to make it clear that they are now expected to be invalid. I did not change to scoped_refptr since it wouldn't fix the issue (as mentioned in the previous e-mail). If the DoDelete() call should indeed result in GetPixelTransferDelegate() not returning the delegate even though the texture ref was not destroyed, then that sounds like an issue not directly related to this patch.
On 2014/03/28 22:08:17, jadahl wrote: > The diff from the previous upload follows: > > --- a/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc > +++ b/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc > @@ -8358,6 +8358,8 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) { > EXPECT_FALSE( > decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( > texture_ref)); > + texture = NULL; > + texture_ref = NULL; > delegate = NULL; > } > > @@ -8400,6 +8402,8 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) { > EXPECT_FALSE( > decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( > texture_ref)); > + texture = NULL; > + texture_ref = NULL; > delegate = NULL; > { > // Get a fresh texture since the existing texture cannot be respecified > > I cleared the pointers to make it clear that they are now expected to be > invalid. I did not change to scoped_refptr since it wouldn't fix the issue (as > mentioned in previous e-mail). If the DoDelete() call should indeed result > GetPixelTransferDelegate() not returning the delegate even though the texture > ref was not destroyed, then that's sounds an issue not directly related to this > patch. LGTM for now. It's just that calling GetPixelTransferDelegate with a deleted pointer is meaningless
On 2014/03/28 22:30:33, piman wrote: > On 2014/03/28 22:08:17, jadahl wrote: > > The diff from the previous upload follows: > > > > --- a/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc > > +++ b/gpu/command_buffer/service/gles2_cmd_decoder_unittest.cc > > @@ -8358,6 +8358,8 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) > { > > EXPECT_FALSE( > > decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( > > texture_ref)); > > + texture = NULL; > > + texture_ref = NULL; > > delegate = NULL; > > } > > > > @@ -8400,6 +8402,8 @@ TEST_F(GLES2DecoderManualInitTest, AsyncPixelTransfers) > { > > EXPECT_FALSE( > > decoder_->GetAsyncPixelTransferManager()->GetPixelTransferDelegate( > > texture_ref)); > > + texture = NULL; > > + texture_ref = NULL; > > delegate = NULL; > > { > > // Get a fresh texture since the existing texture cannot be respecified > > > > I cleared the pointers to make it clear that they are now expected to be > > invalid. I did not change to scoped_refptr since it wouldn't fix the issue (as > > mentioned in previous e-mail). If the DoDelete() call should indeed result > > GetPixelTransferDelegate() not returning the delegate even though the texture > > ref was not destroyed, then that's sounds an issue not directly related to > this > > patch. > > LGTM for now. It's just that calling GetPixelTransferDelegate with a deleted > pointer is meaningless It's awkward indeed. Do you think it's better to remove these asserts all together?
The CQ bit was checked by jadahl@opera.com
On Fri, Mar 28, 2014 at 3:40 PM, <jadahl@opera.com> wrote: > It's awkward indeed. Do you think it's better to remove these asserts all together? I guess we don't have anything else that verifies this behavior, so let's leave it for now.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jadahl@opera.com/116863003/2360001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: tryserver.chromium on linux_chromium_chromeos_rel
The CQ bit was checked by jadahl@opera.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jadahl@opera.com/116863003/2360001
Message was sent while issue was closed.
Change committed as 260507 |