|
|
Created:
7 years, 5 months ago by Xianzhu Modified:
7 years, 5 months ago CC:
chromium-reviews, erikwright+watch_chromium.org, n.s.buttar Base URL:
https://chromium.googlesource.com/chromium/src.git@master Visibility:
Public. |
DescriptionAdd queued_time_ms trace for events in message loop
These trace can be useful for examining the status of message loop
queues when debugging some performance issues.
BUG=none
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=210880
Patch Set 1 #Patch Set 2 : Combine queued_time into TRACE_EVENT("RunTask") and remove queue length counters #Patch Set 3 : Use ConvertableToTraceFormat #
Total comments: 11
Patch Set 4 : Naming etc #
Total comments: 4
Patch Set 5 : Add queue_duration argument into TRACE_EVENT_FLOW_BEGIN #
Total comments: 4
Patch Set 6 : #
Messages
Total messages: 24 (0 generated)
Looks fine, but → jar for tracey stuff.
What about putting the queuing times into an arg of the trace_event for the runtask itself? I have a notion on how to add UI in trace-viewer that would visualize those in the timeline JUSt when the message loop slices are selected...
On 2013/07/02 22:34:48, nduca wrote: > What about putting the queuing times into an arg of the trace_event for the > runtask itself? I have a notion on how to add UI in trace-viewer that would > visualize those in the timeline JUSt when the message loop slices are > selected... The cool trick might be to land tracing instrumentation into the profiler itself, or worst case parallel to the profiling hooks in the message loop. The profiler extracts total queue delay (from post time, until execution, combining message loop internal and external queue delays). Take a look at where the profiler hooks grab this from, and consider copying that. You'll also be able to get the name of the task posting point (which usually implies the target method name), which I assume will be helpful data... if you can record it. The string value is in a persistent/fixed location (for a given binary), so you don't need to do the string copy (and can decode it during the run I'd presume?).
On 2013/07/02 22:34:48, nduca wrote: > What about putting the queuing times into an arg of the trace_event for the > runtask itself? I have a notion on how to add UI in trace-viewer that would > visualize those in the timeline JUSt when the message loop slices are > selected... Didn't combine with TRACE_EVENT2("task", "RunTask"...) because the two parameters are already used and there is no available slot for the new parameter. Would you like StringPrintf to combine "src_file" and "src_func" to "src" in order to make an empty slot for the new parameter? (or create TRACE_EVENT3?) On 2013/07/02 22:43:23, jar wrote: > On 2013/07/02 22:34:48, nduca wrote: > > What about putting the queuing times into an arg of the trace_event for the > > runtask itself? I have a notion on how to add UI in trace-viewer that would > > visualize those in the timeline JUSt when the message loop slices are > > selected... > > The cool trick might be to land tracing instrumentation into the profiler > itself, or worst case parallel to the profiling hooks in the message loop. > > The profiler extracts total queue delay (from post time, until execution, > combining message loop internal and external queue delays). > > Take a look at where the profiler hooks grab this from, and consider copying > that. Will change according to the profiler code to include the queued time for delayed_run events. > You'll also be able to get the name of the task posting point (which > usually implies the target method name), which I assume will be helpful data... > if you can record it. The string value is in a persistent/fixed location (for a > given binary), so you don't need to do the string copy (and can decode it during > the run I'd presume?). Yes, the current RunTask trace event is using the posting point as original pointers without copying the strings.
Can we omit "src_file" to make space for "queued_time_ms"? Looked at several traces. Seems in most cases "src_func" is enough. There may be cases that static or anonymous namespace functions in different files having the same name, but I think it won't be difficult to distinguish them from the context.
FYI, http://www/~wangxianzhu/profile-touch-delay-theverge is the trace containing the new counters. The incoming_queue_length and working_queue_length counters may look annoying, but they do expose the problem: about 300 events queued for several seconds when the main thread is blocked by layout. Maybe we should let the new counters "disabled-by-default"?
I would expect you'd be more concerned with the queuing delay, than the queued task count. I guess there could be cases where a bazillion tiny tasks were run... and *that* was what caused the accumulated delay. Is that the concern? On Tue, Jul 2, 2013 at 4:53 PM, <wangxianzhu@chromium.org> wrote: > FYI, http://www/~wangxianzhu/**profile-touch-delay-theverge<http://www/~wangxianzh... the trace > containing the new counters. The incoming_queue_length and > working_queue_length > counters may look annoying, but they do expose the problem: about 300 > events > queued for several seconds when the main thread is blocked by layout. > Maybe we > should let the new counters "disabled-by-default"? > > https://codereview.chromium.**org/18083015/<https://codereview.chromium.org/1... >
On 2013/07/03 00:04:23, jar wrote: > I would expect you'd be more concerned with the queuing delay, than the > queued task count. I guess there could be cases where a bazillion tiny > tasks were run... and *that* was what caused the accumulated delay. Is > that the concern? > Yes, the queuing delay is more important. Uploaded a new patch set. PTAL.
You can use trace_Event_impl's ConvertableToTraceLog to capture >2 arguments.
On 2013/07/03 00:18:05, nduca wrote: > You can use trace_Event_impl's ConvertableToTraceLog to capture >2 arguments. Thanks, that's useful. In this case, compared to the ToString() method, ConvertableToTraceLog method saves a std::string object and one copy of the contents (requiring https://codereview.chromium.org/18648003/), though needs a copy of the Location object and 30 lines of code. Seems worth it. Uploaded Patch Set 3. PTAL.
https://codereview.chromium.org/18083015/diff/23001/base/location.cc File base/location.cc (right): https://codereview.chromium.org/18083015/diff/23001/base/location.cc#newcode30 base/location.cc:30: virtual void AppendAsTraceFormat(std::string* out) const OVERRIDE { When/how often does this get called? Is it only at display time? This probably going to be expensive. https://codereview.chromium.org/18083015/diff/23001/base/message_loop/message... File base/message_loop/message_loop.cc (right): https://codereview.chromium.org/18083015/diff/23001/base/message_loop/message... base/message_loop/message_loop.cc:464: pending_task.DurationSinceExpectedRunTime().InMilliseconds()); Possibly not a big deal... I'm always wary of the cost of conversion to milliseconds. It uses 64 bit arithmetic (on Windows) to do a divide-by-1000, which commonly requires calling a library call (sadly), and hence is also pretty expensive. https://codereview.chromium.org/18083015/diff/23001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/18083015/diff/23001/base/tracked_objects.cc#n... base/tracked_objects.cc:476: TrackedTime expected_run_time(completed_task.ExpectedRunTime()); Can you move across the comment that was deleted on line 468-473? https://codereview.chromium.org/18083015/diff/23001/base/tracked_objects.cc#n... base/tracked_objects.cc:477: queue_duration = (start_of_run - expected_run_time).InMilliseconds(); I'm not sure if this is a better name. I would think "expected_run_time" would be the "time of posting" plus the "expected queuing delay." Sadly, that is not the intent. This variable is more like "post time," except for the fact that the posting time for delayed tasks was looooong ago, and not significant to calculating queueing delay. IMO, effective_post_time is a better name for the variable, and probably for the method call "completed_task.EffectivePostTime()". It might also be listed as "zero_queue_time_execution_time," but that is pretty long... and probably more confusing. Perhaps you have other better suggestions for the name? https://codereview.chromium.org/18083015/diff/23001/base/tracking_info.h File base/tracking_info.h (right): https://codereview.chromium.org/18083015/diff/23001/base/tracking_info.h#newc... base/tracking_info.h:35: return TimeTicks::Now() - ExpectedRunTime(); nit: Now() is generally considered an expensive call. This is not only because it calls 64bit library code (on windows), and grabs a lock. The lock is also a shared lock across all threads, and can routinely cause contention (potentially blocking a thread, increasing queueing delay... which is I think what you're after measuring). Usually we "try hard" to not call this repeatedly in a pass through the message loop. That said, I think this extra call is limited to cases where we're Trace()ing.... so it may all be an issue only for your code. I think that you could probably get the info you needed by hooking into the profiler a tad deeper, and then you woudln't need to call Now(). YMMV.
PTAL. https://codereview.chromium.org/18083015/diff/23001/base/location.cc File base/location.cc (right): https://codereview.chromium.org/18083015/diff/23001/base/location.cc#newcode30 base/location.cc:30: virtual void AppendAsTraceFormat(std::string* out) const OVERRIDE { On 2013/07/08 17:08:33, jar wrote: > When/how often does this get called? Is it only at display time? > > This probably going to be expensive. It's only called at display (or save) time, once per event, when TraceLog converts each event into JSON. It costs almost the same as what EventLog does for other trace values (or a bit more if conversion to ConvertableToTraceFormat and the virtual call are considered). https://codereview.chromium.org/18083015/diff/23001/base/message_loop/message... File base/message_loop/message_loop.cc (right): https://codereview.chromium.org/18083015/diff/23001/base/message_loop/message... base/message_loop/message_loop.cc:464: pending_task.DurationSinceExpectedRunTime().InMilliseconds()); On 2013/07/08 17:08:33, jar wrote: > Possibly not a big deal... I'm always wary of the cost of conversion to > milliseconds. It uses 64 bit arithmetic (on Windows) to do a divide-by-1000, > which commonly requires calling a library call (sadly), and hence is also pretty > expensive. Changed DurationSinceExpectedRunTime() to return tracked_objects::Duration. https://codereview.chromium.org/18083015/diff/23001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/18083015/diff/23001/base/tracked_objects.cc#n... base/tracked_objects.cc:476: TrackedTime expected_run_time(completed_task.ExpectedRunTime()); On 2013/07/08 17:08:33, jar wrote: > Can you move across the comment that was deleted on line 468-473? Done. https://codereview.chromium.org/18083015/diff/23001/base/tracked_objects.cc#n... base/tracked_objects.cc:477: queue_duration = (start_of_run - expected_run_time).InMilliseconds(); On 2013/07/08 17:08:33, jar wrote: > I'm not sure if this is a better name. > > I would think "expected_run_time" would be the "time of posting" plus the > "expected queuing delay." Sadly, that is not the intent. Realized I had missed "expected queuing delay" :) > > This variable is more like "post time," except for the fact that the posting > time for delayed tasks was looooong ago, and not significant to calculating > queueing delay. > > IMO, effective_post_time is a better name for the variable, and probably for the > method call "completed_task.EffectivePostTime()". > > It might also be listed as "zero_queue_time_execution_time," but that is pretty > long... and probably more confusing. > j > Perhaps you have other better suggestions for the name? > Now use EffectiveTimePosted() to be consistent with TrackingInfo::time_posted. https://codereview.chromium.org/18083015/diff/23001/base/tracking_info.h File base/tracking_info.h (right): https://codereview.chromium.org/18083015/diff/23001/base/tracking_info.h#newc... base/tracking_info.h:35: return TimeTicks::Now() - ExpectedRunTime(); On 2013/07/08 17:08:33, jar wrote: > nit: Now() is generally considered an expensive call. This is not only because > it calls 64bit library code (on windows), and grabs a lock. The lock is also a > shared lock across all threads, and can routinely cause contention (potentially > blocking a thread, increasing queueing delay... which is I think what you're > after measuring). > > Usually we "try hard" to not call this repeatedly in a pass through the message > loop. That said, I think this extra call is limited to cases where we're > Trace()ing.... so it may all be an issue only for your code. > > I think that you could probably get the info you needed by hooking into the > profiler a tad deeper, and then you woudln't need to call Now(). > > YMMV. > Changed to use TrackedTime to avoid lock on Windows. Is that a correct way?
https://codereview.chromium.org/18083015/diff/23001/base/tracking_info.h File base/tracking_info.h (right): https://codereview.chromium.org/18083015/diff/23001/base/tracking_info.h#newc... base/tracking_info.h:35: return TimeTicks::Now() - ExpectedRunTime(); On 2013/07/08 18:51:39, Xianzhu wrote: > On 2013/07/08 17:08:33, jar wrote: > > nit: Now() is generally considered an expensive call. This is not only because > > it calls 64bit library code (on windows), and grabs a lock. The lock is also > a > > shared lock across all threads, and can routinely cause contention > (potentially > > blocking a thread, increasing queueing delay... which is I think what you're > > after measuring). > > > > Usually we "try hard" to not call this repeatedly in a pass through the > message > > loop. That said, I think this extra call is limited to cases where we're > > Trace()ing.... so it may all be an issue only for your code. > > > > I think that you could probably get the info you needed by hooking into the > > profiler a tad deeper, and then you woudln't need to call Now(). > > > > YMMV. > > > > Changed to use TrackedTime to avoid lock on Windows. Is that a correct way? Yeah... that will go a lot faster. It uses a truncated 32-bit timer... which is more than enough to track queue delay. It gets you up around 4K seconds, or around an hour... which is way beyond plausible queueing delay on a thread. https://codereview.chromium.org/18083015/diff/30001/base/location.cc File base/location.cc (right): https://codereview.chromium.org/18083015/diff/30001/base/location.cc#newcode41 base/location.cc:41: }; nit: NO_DEFAULT_COPY_OR_ASSIGN https://codereview.chromium.org/18083015/diff/30001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/18083015/diff/30001/base/tracked_objects.cc#n... base/tracked_objects.cc:475: if (!start_of_run.is_null()) { I think that start_of_run is probably now always set (i.e., I suspect that this was written when I had thought of turning on profiling, rather than leaving it on all the time). If however this is still an issue, you probably need to take corresponding precautions as you generate your queue_duration externally. If you wanted to avoid repeating similar logic, you *could* change this method to return queue_duration... and propagate that up to the message loop... and just snag that value. Looking at how you do all this in the message loop... it looks like you really wanted the queueing_time calculated sooner (when you used the macro), so that might not work. I'm not sure what other options you have... but it wouldn't surprise me if there was a flavor of the macro that allowed you to augment an event (being tallied) with the additional info (queueing_time). If you decide not to do this... you should probably add a DCHECK() to validate the assumption that we always have a start_of_run time that is real. (...and you could probably assign a bug for me to cleanup up profiling, now that it is always on... I just left the code in place to allow me to test the impact (on vs off)). https://codereview.chromium.org/18083015/diff/30001/base/tracking_info.h File base/tracking_info.h (right): https://codereview.chromium.org/18083015/diff/30001/base/tracking_info.h#newc... base/tracking_info.h:43: tracked_objects::TrackedTime(EffectiveTimePosted()); Now that you're always converting to TrackedTime() (both here, and in the profile code), you may as well have EffectiveTimePosted() return an instance of that class directly.
https://codereview.chromium.org/18083015/diff/30001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/18083015/diff/30001/base/tracked_objects.cc#n... base/tracked_objects.cc:475: if (!start_of_run.is_null()) { On 2013/07/08 23:14:19, jar wrote: > I think that start_of_run is probably now always set (i.e., I suspect that this > was written when I had thought of turning on profiling, rather than leaving it > on all the time). If however this is still an issue, you probably need to take > corresponding precautions as you generate your queue_duration externally. > > If you wanted to avoid repeating similar logic, you *could* change this method > to return queue_duration... and propagate that up to the message loop... and > just snag that value. > > Looking at how you do all this in the message loop... it looks like you really > wanted the queueing_time calculated sooner (when you used the macro), so that > might not work. I'm not sure what other options you have... but it wouldn't > surprise me if there was a flavor of the macro that allowed you to augment an > event (being tallied) with the additional info (queueing_time). > Yes, we can use TRACE_EVENT_BEGIN and TRACE_EVENT_END and set the argument in TRACE_EVENT_END. However, I just noticed the TRACE_EVENT_FLOW_END0("task", "MessageLoop::PostTask"...) at the top of MessageLoop::RunTask. It seems better to add the queueing_time argument to that event than TRACE_EVENT2("task", "MessageLoop:RunTask") because the queuing_time is more related the flow between PostTask to RunTask instead of RunTask itself. Using the TRACE_EVENT_FLOW_END we must calculate the queueing_time there so it seems impossible to combine the calculation with the profiler. The small repeating similar logic looks ok to me. Uploaded a much simpler change set as we don't need to compact the location into one argument. > If you decide not to do this... you should probably add a DCHECK() to validate > the assumption that we always have a start_of_run time that is real. (...and > you could probably assign a bug for me to cleanup up profiling, now that it is > always on... I just left the code in place to allow me to test the impact (on vs > off)). I'll file a bug for you :)
https://codereview.chromium.org/18083015/diff/31010/base/message_loop/message... File base/message_loop/message_loop.cc (right): https://codereview.chromium.org/18083015/diff/31010/base/message_loop/message... base/message_loop/message_loop.cc:462: (tracked_objects::TrackedTime::Now() - I believe that we get the value of Now() on line 483 below. If you relocated this pair of events down to line 485, you could use that value directly. In truth, you've switched to use the TrackedTime flavor of Now(), so it is not as expensive as Time::Now(). On the other hand, I don't see any time consuming functions that you'd be bypassing.... and using the same value of Now() will cause your stats to align with the profiler's. I might also be convinced to move line 483 up above line 459, to give you access to it another way. https://codereview.chromium.org/18083015/diff/31010/base/tracking_info.h File base/tracking_info.h (right): https://codereview.chromium.org/18083015/diff/31010/base/tracking_info.h#newc... base/tracking_info.h:36: TimeTicks EffectiveTimePosted() const { Both uses of the return value immediately convert to: tracked_objects::TrackedTime You probably should adjust the return value here, and simplify that code.
https://codereview.chromium.org/18083015/diff/31010/base/message_loop/message... File base/message_loop/message_loop.cc (right): https://codereview.chromium.org/18083015/diff/31010/base/message_loop/message... base/message_loop/message_loop.cc:462: (tracked_objects::TrackedTime::Now() - On 2013/07/09 16:50:53, jar wrote: > I believe that we get the value of Now() on line 483 below. If you relocated > this pair of events down to line 485, you could use that value directly. > > In truth, you've switched to use the TrackedTime flavor of Now(), so it is not > as expensive as Time::Now(). > > On the other hand, I don't see any time consuming functions that you'd be > bypassing.... and using the same value of Now() will cause your stats to align > with the profiler's. > > I might also be convinced to move line 483 up above line 459, to give you access > to it another way. Done. I had been afraid that sometimes TrackedObject::Now() would return null before you confirmed that the profiler will be always on. https://codereview.chromium.org/18083015/diff/31010/base/tracking_info.h File base/tracking_info.h (right): https://codereview.chromium.org/18083015/diff/31010/base/tracking_info.h#newc... base/tracking_info.h:36: TimeTicks EffectiveTimePosted() const { On 2013/07/09 16:50:53, jar wrote: > Both uses of the return value immediately convert to: > > tracked_objects::TrackedTime > > You probably should adjust the return value here, and simplify that code. Done. I thought of consistency with the type of time_posted and delayed_run_time. It's fine to use TrackedTime as it's only used for tracing/profiling.
Looked at the code. It seems that the profiler can be disabled with enable-profiling=0, when TrackedObjects::Now() will return null. I think it's fine that the added tracing argument needs the profiler to be activated, but ThreadData::TallyRunOnNamedThreadIfTracking() seems to get null start_of_run and end_of_run so the conditions to check null of them seem still needed. Am I missing anything?
On 2013/07/09 17:49:17, Xianzhu wrote: > Looked at the code. It seems that the profiler can be disabled with > enable-profiling=0, when TrackedObjects::Now() will return null. I think it's > fine that the added tracing argument needs the profiler to be activated, but > ThreadData::TallyRunOnNamedThreadIfTracking() seems to get null start_of_run and > end_of_run so the conditions to check null of them seem still needed. Am I > missing anything? I think I was asking about whether you needed corresponding protection in your tracing macros. You could insist that "you can't do tracing and disable profiling."
LGTM This is now a nice compact change. Thanks!
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/wangxianzhu@chromium.org/18083015/17011
Message was sent while issue was closed.
Change committed as 210880
Message was sent while issue was closed.
https://codereview.chromium.org/18083015/diff/30001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/18083015/diff/30001/base/tracked_objects.cc#n... base/tracked_objects.cc:475: if (!start_of_run.is_null()) { On 2013/07/08 23:14:19, jar wrote: > I think that start_of_run is probably now always set... > > ... add a DCHECK() to validate > the assumption that we always have a start_of_run time that is real. (...and > you could probably assign a bug for me to cleanup up profiling, now that it is > always on... I just left the code in place to allow me to test the impact (on vs > off)). I'm not sure if I have understood all the comments. Should the bug be like "Turn profiling always on and cleanup conditions"? |