Issue 18083015: Add queued_time_ms trace for events in message loop (Closed)

Created:
7 years, 5 months ago by Xianzhu

Modified:
7 years, 5 months ago

Reviewers:
Mark Mentovai, darin (slow to review), nduca, jar (doing other things)

CC:
chromium-reviews, erikwright+watch_chromium.org, n.s.buttar

Base URL:
https://chromium.googlesource.com/chromium/src.git@master

Visibility:
Public.

Description

Add queued_time_ms trace for events in message loop

These traces can be useful for examining the status of message loop queues when debugging some performance issues.

BUG=none

Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=210880

Patch Set 1 #

Patch Set 2 : Combine queued_time into TRACE_EVENT("RunTask") and remove queue length counters #

Patch Set 3 : Use ConvertableToTraceFormat #

Total comments: 11

Patch Set 4 : Naming etc #

Total comments: 4

Patch Set 5 : Add queue_duration argument into TRACE_EVENT_FLOW_BEGIN #

Total comments: 4

Patch Set 6 : #

Created: 7 years, 5 months ago

Stats (+22 lines, -16 lines)
M	base/message_loop/message_loop.cc	2 chunks	+8 lines, -5 lines	0 comments
M	base/tracked_objects.cc	2 chunks	+2 lines, -11 lines	0 comments
M	base/tracking_info.h	2 chunks	+12 lines, -0 lines	0 comments

Messages

Total messages: 24 (0 generated)

nduca

What about putting the queuing times into an arg of the trace_event for the runtask ...

7 years, 5 months ago (2013-07-02 22:34:48 UTC) #4

jar (doing other things)

On 2013/07/02 22:34:48, nduca wrote: > What about putting the queuing times into an arg ...

7 years, 5 months ago (2013-07-02 22:43:23 UTC) #5

Xianzhu

On 2013/07/02 22:34:48, nduca wrote: > What about putting the queuing times into an arg ...

7 years, 5 months ago (2013-07-02 23:24:24 UTC) #6

Xianzhu

Can we omit "src_file" to make space for "queued_time_ms"? Looked at several traces. Seems in ...

7 years, 5 months ago (2013-07-02 23:43:35 UTC) #7

Xianzhu

FYI, http://www/~wangxianzhu/profile-touch-delay-theverge is the trace containing the new counters. The incoming_queue_length and working_queue_length counters may ...

7 years, 5 months ago (2013-07-02 23:53:04 UTC) #8

jar (doing other things)

I would expect you'd be more concerned with the queuing delay, than the queued task ...

7 years, 5 months ago (2013-07-03 00:04:23 UTC) #9

Xianzhu

On 2013/07/03 00:04:23, jar wrote: > I would expect you'd be more concerned with the ...

7 years, 5 months ago (2013-07-03 00:14:32 UTC) #10

nduca

You can use trace_event_impl's ConvertableToTraceFormat to capture more than two arguments.

7 years, 5 months ago (2013-07-03 00:18:05 UTC) #11

Xianzhu

On 2013/07/03 00:18:05, nduca wrote: > You can use trace_event_impl's ConvertableToTraceFormat to capture >2 arguments. ...

7 years, 5 months ago (2013-07-03 18:21:17 UTC) #12

jar (doing other things)

7 years, 5 months ago (2013-07-08 17:08:33 UTC) #13

https://codereview.chromium.org/18083015/diff/23001/base/location.cc
File base/location.cc (right):

https://codereview.chromium.org/18083015/diff/23001/base/location.cc#newcode30
base/location.cc:30: virtual void AppendAsTraceFormat(std::string* out) const
OVERRIDE {
When/how often does this get called? Is it only at display time?

This is probably going to be expensive.

https://codereview.chromium.org/18083015/diff/23001/base/message_loop/message...
File base/message_loop/message_loop.cc (right):

https://codereview.chromium.org/18083015/diff/23001/base/message_loop/message...
base/message_loop/message_loop.cc:464:
pending_task.DurationSinceExpectedRunTime().InMilliseconds());
Possibly not a big deal...  I'm always wary of the cost of conversion to
milliseconds.  It uses 64-bit arithmetic (on Windows) to do a divide-by-1000,
which commonly requires a library call (sadly), and hence is also pretty
expensive.
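
One way to act on this concern is to keep the raw delta on the hot path and defer the divide until the value is serialized. The following is a minimal sketch of that idea only; QueuedTimeSample, RecordQueuedTime and FormatQueuedTimeMs are hypothetical names, not Chromium APIs, and the microsecond unit is an assumption.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a queued-time sample; not a Chromium type.
struct QueuedTimeSample {
  int64_t delta_us;  // raw microsecond delta, stored without any division
};

// Hot path: a single subtraction, no 64-bit divide.
inline QueuedTimeSample RecordQueuedTime(int64_t posted_us, int64_t run_us) {
  return QueuedTimeSample{run_us - posted_us};
}

// Cold path (for example, when a trace is written out): the divide-by-1000
// is paid here, once per serialized value. Integer division truncates.
inline std::string FormatQueuedTimeMs(const QueuedTimeSample& sample) {
  return std::to_string(sample.delta_us / 1000);
}
```

With this split, the cost of the conversion is only incurred for samples that are actually displayed or saved.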

https://codereview.chromium.org/18083015/diff/23001/base/tracked_objects.cc
File base/tracked_objects.cc (right):

https://codereview.chromium.org/18083015/diff/23001/base/tracked_objects.cc#n...
base/tracked_objects.cc:476: TrackedTime
expected_run_time(completed_task.ExpectedRunTime());
Can you move across the comment that was deleted on line 468-473?

https://codereview.chromium.org/18083015/diff/23001/base/tracked_objects.cc#n...
base/tracked_objects.cc:477: queue_duration = (start_of_run -
expected_run_time).InMilliseconds();
I'm not sure if this is a better name.

I would think "expected_run_time" would be the "time of posting" plus the
"expected queuing delay."  Sadly, that is not the intent.

This variable is more like "post time," except for the fact that the posting
time for delayed tasks was looooong ago, and not significant to calculating
queueing delay.

IMO, effective_post_time is a better name for the variable, and probably for the
method call "completed_task.EffectivePostTime()".

It might also be listed as "zero_queue_time_execution_time," but that is pretty
long... and probably more confusing.

Perhaps you have other better suggestions for the name?

https://codereview.chromium.org/18083015/diff/23001/base/tracking_info.h
File base/tracking_info.h (right):

https://codereview.chromium.org/18083015/diff/23001/base/tracking_info.h#newc...
base/tracking_info.h:35: return TimeTicks::Now() - ExpectedRunTime();
nit: Now() is generally considered an expensive call. This is not only because
it calls 64-bit library code (on Windows) and grabs a lock.  The lock is also
shared across all threads, and can routinely cause contention (potentially
blocking a thread, increasing queueing delay... which is, I think, what you're
after measuring).

Usually we "try hard" to not call this repeatedly in a pass through the message
loop.  That said, I think this extra call is limited to cases where we're
Trace()ing... so it may only be an issue for your code.

I think that you could probably get the info you needed by hooking into the
profiler a tad deeper, and then you wouldn't need to call Now().

YMMV.
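
The suggestion to avoid an extra Now() can be illustrated by reading the clock once at the start of a task and sharing that timestamp between the queueing-delay and run-duration measurements. This is a sketch under stated assumptions: std::chrono::steady_clock stands in for the real time source, and Timings and RunAndMeasure are hypothetical names that do not appear in the patch.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

// Hypothetical result type: both measurements derived from shared timestamps.
struct Timings {
  Clock::duration queue_delay;
  Clock::duration run_duration;
};

// Runs |task| and measures it. The start-of-run timestamp is read once and
// reused for the queueing delay, instead of calling the clock a second time.
template <typename Fn>
Timings RunAndMeasure(Clock::time_point effective_post_time, Fn&& task) {
  const Clock::time_point start_of_run = Clock::now();  // single shared read
  task();
  const Clock::time_point end_of_run = Clock::now();
  return Timings{start_of_run - effective_post_time,
                 end_of_run - start_of_run};
}
```

The design choice here is that the queueing delay costs no additional clock read beyond what run-time profiling already needs.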

Xianzhu

7 years, 5 months ago (2013-07-08 18:51:39 UTC) #14

PTAL.

https://codereview.chromium.org/18083015/diff/23001/base/location.cc
File base/location.cc (right):

https://codereview.chromium.org/18083015/diff/23001/base/location.cc#newcode30
base/location.cc:30: virtual void AppendAsTraceFormat(std::string* out) const
OVERRIDE {
On 2013/07/08 17:08:33, jar wrote:
> When/how often does this get called? Is it only at display time?
> 
> This probably going to be expensive.

It's only called at display (or save) time, once per event, when TraceLog
converts each event into JSON. It costs almost the same as what EventLog does
for other trace values (or a bit more if conversion to ConvertableToTraceFormat
and the virtual call are considered).
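
The deferred formatting described here can be sketched as an interface whose string conversion runs only when events are dumped. The AppendAsTraceFormat signature is taken from the diff quoted above; everything else (LazyTraceValue, LocationValue, DumpEvents) is a hypothetical stand-in rather than Chromium's actual classes, and JSON escaping is deliberately omitted.

```cpp
#include <memory>
#include <string>
#include <vector>

// Stand-in for a trace argument that is formatted lazily.
class LazyTraceValue {
 public:
  virtual ~LazyTraceValue() = default;
  // Called at dump time only, once per recorded event.
  virtual void AppendAsTraceFormat(std::string* out) const = 0;
};

// Recording stores raw pointers and ints; no string work happens here.
class LocationValue : public LazyTraceValue {
 public:
  LocationValue(const char* function, int line)
      : function_(function), line_(line) {}

  void AppendAsTraceFormat(std::string* out) const override {
    out->append("{\"function\":\"");
    out->append(function_);  // assumes the name needs no JSON escaping
    out->append("\",\"line\":");
    out->append(std::to_string(line_));
    out->append("}");
  }

 private:
  const char* function_;
  int line_;
};

// Dump step: the only place where the virtual formatting call is made.
std::string DumpEvents(
    const std::vector<std::unique_ptr<LazyTraceValue>>& events) {
  std::string json = "[";
  bool first = true;
  for (const auto& event : events) {
    if (!first) json.append(",");
    first = false;
    event->AppendAsTraceFormat(&json);
  }
  json.append("]");
  return json;
}
```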

https://codereview.chromium.org/18083015/diff/23001/base/message_loop/message...
File base/message_loop/message_loop.cc (right):

https://codereview.chromium.org/18083015/diff/23001/base/message_loop/message...
base/message_loop/message_loop.cc:464:
pending_task.DurationSinceExpectedRunTime().InMilliseconds());
On 2013/07/08 17:08:33, jar wrote:
> Possibly not a big deal...  I'm always wary of the cost of conversion to
> milliseconds.  It uses 64 bit arithmetic (on Windows) to do a divide-by-1000,
> which commonly requires calling a library call (sadly), and hence is also
> pretty expensive.

Changed DurationSinceExpectedRunTime() to return tracked_objects::Duration.

https://codereview.chromium.org/18083015/diff/23001/base/tracked_objects.cc
File base/tracked_objects.cc (right):

https://codereview.chromium.org/18083015/diff/23001/base/tracked_objects.cc#n...
base/tracked_objects.cc:476: TrackedTime
expected_run_time(completed_task.ExpectedRunTime());
On 2013/07/08 17:08:33, jar wrote:
> Can you move across the comment that was deleted on line 468-473?

Done.

https://codereview.chromium.org/18083015/diff/23001/base/tracked_objects.cc#n...
base/tracked_objects.cc:477: queue_duration = (start_of_run -
expected_run_time).InMilliseconds();
On 2013/07/08 17:08:33, jar wrote:
> I'm not sure if this is a better name.
> 
> I would think "expected_run_time" would be the "time of posting" plus the
> "expected queuing delay."  Sadly, that is not the intent.

Realized I had missed "expected queuing delay" :)

> 
> This variable is more like "post time," except for the fact that the posting
> time for delayed tasks was looooong ago, and not significant to calculating
> queueing delay.
> 
> IMO, effective_post_time is a better name for the variable, and probably for
> the method call "completed_task.EffectivePostTime()".
> 
> It might also be listed as "zero_queue_time_execution_time," but that is
> pretty long... and probably more confusing.
> 
> Perhaps you have other better suggestions for the name?

Now use EffectiveTimePosted() to be consistent with TrackingInfo::time_posted.
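
A minimal sketch of what such an accessor might look like, assuming (as the discussion above suggests) that a delayed task's effective post time is its scheduled run time and an immediate task's is its actual post time. TrackingInfoSketch and the use of std::chrono are stand-ins for illustration; only the names time_posted and EffectiveTimePosted come from the thread, and delayed_run_time is an assumed field name.

```cpp
#include <chrono>

using TimeTicks = std::chrono::steady_clock::time_point;

// Hypothetical stand-in for the tracking record discussed in the review.
struct TrackingInfoSketch {
  TimeTicks time_posted;
  TimeTicks delayed_run_time;  // default-constructed means "not delayed"

  // For delayed tasks the original post time was long ago and says nothing
  // about queueing delay, so the scheduled run time is used instead.
  TimeTicks EffectiveTimePosted() const {
    return delayed_run_time == TimeTicks() ? time_posted : delayed_run_time;
  }
};
```

Queueing delay is then the start-of-run time minus EffectiveTimePosted(), for both immediate and delayed tasks.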

https://codereview.chromium.org/18083015/diff/23001/base/tracking_info.h
File base/tracking_info.h (right):

https://codereview.chromium.org/18083015/diff/23001/base/tracking_info.h#newc...
base/tracking_info.h:35: return TimeTicks::Now() - ExpectedRunTime();
On 2013/07/08 17:08:33, jar wrote:
> nit: Now() is generally considered an expensive call. This is not only because
> it calls 64bit library code (on windows), and grabs a lock.  The lock is also
> a shared lock across all threads, and can routinely cause contention
> (potentially blocking a thread, increasing queueing delay... which is I think
> what you're after measuring).
> 
> Usually we "try hard" to not call this repeatedly in a pass through the
> message loop.  That said, I think this extra call is limited to cases where
> we're Trace()ing.... so it may all be an issue only for your code.
> 
> I think that you could probably get the info you needed by hooking into the
> profiler a tad deeper, and then you wouldn't need to call Now().
> 
> YMMV.

Changed to use TrackedTime to avoid the lock on Windows. Is that the correct approach?

jar (doing other things)

7 years, 5 months ago (2013-07-08 23:14:18 UTC) #15

https://codereview.chromium.org/18083015/diff/23001/base/tracking_info.h
File base/tracking_info.h (right):

https://codereview.chromium.org/18083015/diff/23001/base/tracking_info.h#newc...
base/tracking_info.h:35: return TimeTicks::Now() - ExpectedRunTime();
On 2013/07/08 18:51:39, Xianzhu wrote:
> On 2013/07/08 17:08:33, jar wrote:
> > nit: Now() is generally considered an expensive call. This is not only
> > because it calls 64bit library code (on windows), and grabs a lock.  The
> > lock is also a shared lock across all threads, and can routinely cause
> > contention (potentially blocking a thread, increasing queueing delay...
> > which is I think what you're after measuring).
> > 
> > Usually we "try hard" to not call this repeatedly in a pass through the
> > message loop.  That said, I think this extra call is limited to cases where
> > we're Trace()ing.... so it may all be an issue only for your code.
> > 
> > I think that you could probably get the info you needed by hooking into the
> > profiler a tad deeper, and then you wouldn't need to call Now().
> > 
> > YMMV.
> > 
> 
> Changed to use TrackedTime to avoid lock on Windows. Is that a correct way?

Yeah... that will go a lot faster.  It uses a truncated 32-bit timer... which is
more than enough to track queue delay. It gets you up to around 4K seconds, or
around an hour... which is way beyond plausible queueing delay on a thread.
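
The range of a 32-bit tick counter depends on the tick unit, which the thread does not state. As a worked check, the "4K seconds, or around an hour" figure matches 2^32 microsecond ticks; the millisecond case is shown only for comparison. WrapSeconds and both constants are hypothetical names.

```cpp
// Seconds until an unsigned 32-bit tick counter wraps at a given tick rate.
constexpr double WrapSeconds(double ticks_per_second) {
  return 4294967296.0 / ticks_per_second;  // 2^32 ticks
}

// If one tick is a microsecond: about 4295 s, or roughly 71.6 minutes.
constexpr double kWrapIfMicroseconds = WrapSeconds(1e6);

// If one tick is a millisecond: about 4.29 million s, or roughly 49.7 days.
constexpr double kWrapIfMilliseconds = WrapSeconds(1e3);
```

Under either assumption the range is far larger than a plausible queueing delay on a thread, which is the point being made.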

https://codereview.chromium.org/18083015/diff/30001/base/location.cc
File base/location.cc (right):

https://codereview.chromium.org/18083015/diff/30001/base/location.cc#newcode41
base/location.cc:41: };
nit: NO_DEFAULT_COPY_OR_ASSIGN

https://codereview.chromium.org/18083015/diff/30001/base/tracked_objects.cc
File base/tracked_objects.cc (right):

https://codereview.chromium.org/18083015/diff/30001/base/tracked_objects.cc#n...
base/tracked_objects.cc:475: if (!start_of_run.is_null()) {
I think that start_of_run is probably now always set (i.e., I suspect that this
was written when I had thought of turning on profiling, rather than leaving it
on all the time).  If however this is still an issue, you probably need to take
corresponding precautions as you generate your queue_duration externally.

If you wanted to avoid repeating similar logic, you *could* change this method
to return queue_duration... and propagate that up to the message loop... and
just snag that value.

Looking at how you do all this in the message loop... it looks like you really
wanted the queueing_time calculated sooner (when you used the macro), so that
might not work.  I'm not sure what other options you have... but it wouldn't
surprise me if there was a flavor of the macro that allowed you to augment an
event (being tallied) with the additional info (queueing_time).

If you decide not to do this... you should probably add a DCHECK() to validate
the assumption that we always have a start_of_run time that is real.  (...and
you could probably assign a bug for me to clean up profiling, now that it is
always on... I just left the code in place to allow me to test the impact (on vs
off)).
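
The two suggestions above (return the queue duration from one place, and assert that start_of_run is real) could be combined roughly as follows. This is a sketch with assumptions: TrackedTimeSketch is a hypothetical 32-bit millisecond timestamp in which 0 means "null", and the standard assert() stands in for DCHECK().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical truncated 32-bit timestamp; 0 is treated as "null".
struct TrackedTimeSketch {
  uint32_t ms;
  bool is_null() const { return ms == 0; }
};

// Single place that computes the queue duration, so the profiler and the
// trace event can share one implementation.
inline int32_t QueueDurationMs(TrackedTimeSketch effective_post_time,
                               TrackedTimeSketch start_of_run) {
  // Stands in for DCHECK(): validates that a real start time was recorded.
  assert(!start_of_run.is_null());
  // Unsigned subtraction stays correct across a 32-bit counter wrap.
  return static_cast<int32_t>(start_of_run.ms - effective_post_time.ms);
}
```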

https://codereview.chromium.org/18083015/diff/30001/base/tracking_info.h
File base/tracking_info.h (right):

https://codereview.chromium.org/18083015/diff/30001/base/tracking_info.h#newc...
base/tracking_info.h:43: tracked_objects::TrackedTime(EffectiveTimePosted());
Now that you're always converting to TrackedTime() (both here, and in the
profile code), you may as well have EffectiveTimePosted() return an instance of
that class directly.

Xianzhu

7 years, 5 months ago (2013-07-09 00:39:07 UTC) #16

https://codereview.chromium.org/18083015/diff/30001/base/tracked_objects.cc
File base/tracked_objects.cc (right):

https://codereview.chromium.org/18083015/diff/30001/base/tracked_objects.cc#n...
base/tracked_objects.cc:475: if (!start_of_run.is_null()) {
On 2013/07/08 23:14:19, jar wrote:
> I think that start_of_run is probably now always set (i.e., I suspect that
> this was written when I had thought of turning on profiling, rather than
> leaving it on all the time).  If however this is still an issue, you probably
> need to take corresponding precautions as you generate your queue_duration
> externally.
> 
> If you wanted to avoid repeating similar logic, you *could* change this method
> to return queue_duration... and propagate that up to the message loop... and
> just snag that value.
> 
> Looking at how you do all this in the message loop... it looks like you really
> wanted the queueing_time calculated sooner (when you used the macro), so that
> might not work.  I'm not sure what other options you have... but it wouldn't
> surprise me if there was a flavor of the macro that allowed you to augment an
> event (being tallied) with the additional info (queueing_time).
> 

Yes, we can use TRACE_EVENT_BEGIN and TRACE_EVENT_END and set the argument in
TRACE_EVENT_END. However, I just noticed the TRACE_EVENT_FLOW_END0("task",
"MessageLoop::PostTask"...) at the top of MessageLoop::RunTask. It seems better
to add the queueing_time argument to that event than to TRACE_EVENT2("task",
"MessageLoop::RunTask"), because the queueing_time is more closely related to
the flow from PostTask to RunTask than to RunTask itself. With
TRACE_EVENT_FLOW_END we must calculate the queueing_time there, so it seems
impossible to combine the calculation with the profiler.

The small amount of repeated logic looks OK to me.

Uploaded a much simpler change set as we don't need to compact the location into
one argument.

> If you decide not to do this... you should probably add a DCHECK() to validate
> the assumption that we always have a start_of_run time that is real.  (...and
> you could probably assign a bug for me to cleanup up profiling, now that it is
> always on... I just left the code in place to allow me to test the impact (on
> vs off)).

I'll file a bug for you :)

jar (doing other things)

https://codereview.chromium.org/18083015/diff/31010/base/message_loop/message_loop.cc File base/message_loop/message_loop.cc (right): https://codereview.chromium.org/18083015/diff/31010/base/message_loop/message_loop.cc#newcode462 base/message_loop/message_loop.cc:462: (tracked_objects::TrackedTime::Now() - I believe that we get the value ...

7 years, 5 months ago (2013-07-09 16:50:53 UTC) #17

Xianzhu

7 years, 5 months ago (2013-07-09 17:22:42 UTC) #18

Xianzhu

Looked at the code. It seems that the profiler can be disabled with enable-profiling=0, when ...

7 years, 5 months ago (2013-07-09 17:49:17 UTC) #19

jar (doing other things)

On 2013/07/09 17:49:17, Xianzhu wrote: > Looked at the code. It seems that the profiler ...

7 years, 5 months ago (2013-07-10 15:48:09 UTC) #20

jar (doing other things)

LGTM This is now a nice compact change. Thanks!

7 years, 5 months ago (2013-07-10 15:48:47 UTC) #21

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/wangxianzhu@chromium.org/18083015/17011

7 years, 5 months ago (2013-07-10 15:51:54 UTC) #22

Xianzhu

7 years, 5 months ago (2013-07-10 21:28:40 UTC) #24

Message was sent while issue was closed.

https://codereview.chromium.org/18083015/diff/30001/base/tracked_objects.cc
File base/tracked_objects.cc (right):

https://codereview.chromium.org/18083015/diff/30001/base/tracked_objects.cc#n...
base/tracked_objects.cc:475: if (!start_of_run.is_null()) {

On 2013/07/08 23:14:19, jar wrote:
> I think that start_of_run is probably now always set...
>
> ... add a DCHECK() to validate
> the assumption that we always have a start_of_run time that is real.  (...and
> you could probably assign a bug for me to cleanup up profiling, now that it is
> always on... I just left the code in place to allow me to test the impact (on
> vs off)).

I'm not sure I have understood all the comments. Should the bug be something
like "Turn profiling always on and clean up conditions"?
