|
|
Created:
7 years, 3 months ago by Roger McFarlane (Chromium) Modified:
7 years, 3 months ago Reviewers:
cpu_(ooo_6.6-7.5), rvargas (doing something else), jar (doing other things), grt (UTC plus 2) CC:
chromium-reviews, asvitkine+watch_chromium.org, kareng, Ilya Sherman, sukolsak, gab Base URL:
https://chromium.googlesource.com/chromium/src.git@master Visibility:
Public. |
DescriptionChrome.BrowserCrashDumpAttempts needs to account for multiple dumps from the same browser process.
This CL fixes a bug wherein a non-fatal crash report would prevent subsequent (fatal or non-fatal) crash reports in the same browser process from being counted in the histograms.
It has been further expanded to define separate histograms for crashing, non-crashing and total-number-of attenoted crash reports.
R= cpu, jar, rvargas, grt
BUG=290196
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=223314
Patch Set 1 #
Total comments: 9
Patch Set 2 : Multi-dump logic was faulty, fixed key proliferation, send smoke signal in filter callback. #
Total comments: 18
Patch Set 3 : Count dumps with/without crashes separately. #
Total comments: 7
Patch Set 4 : fix nits #Patch Set 5 : Declare the new histograms #
Total comments: 13
Patch Set 6 : use base::strings::SafeSPrintf #
Total comments: 2
Patch Set 7 : Use NOTREACHED() #
Total comments: 2
Patch Set 8 : Retry git cl upload #
Messages
Total messages: 27 (0 generated)
PTAL?
what are we trying to solve? that there could be two (or more threads crashing at the same time? https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc#ne... chrome/app/breakpad_win.cc:131: static base::subtle::Atomic32 g_browser_crash_dump_count = 0; aka int32 btw
https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc#ne... chrome/app/breakpad_win.cc:595: we don't have the smoke signal here ...
The problem we're trying to solve: The current code has a bug in that it assumes we're about to crash when generating the smoke signal. I.e., after the first minidump is generated it closes the registry key. But we sometimes generate minidumps without crashing, which would mean a subsequent crash (or a seconds minidump without crashing) would not get counted. We're not particularly trying to ensure that two threads can crash at the same time. https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc#ne... chrome/app/breakpad_win.cc:131: static base::subtle::Atomic32 g_browser_crash_dump_count = 0; On 2013/09/04 23:29:45, cpu wrote: > aka int32 btw Yes, I know. But this communicates the intended usage. https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc#ne... chrome/app/breakpad_win.cc:595: On 2013/09/04 23:32:20, cpu wrote: > we don't have the smoke signal here ... We send the smoke signal before generating the dump, or at least that's the intent. AFAICT, I've hit all of the spots just before we write a minidump. I did it in this order to count all the times we intend to generate a dump, and if the dump generation should fail we should still have a count of the attempt.
drive-by https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc#ne... chrome/app/breakpad_win.cc:141: if (regkey.Create(HKEY_CURRENT_USER, when are these keys deleted?
https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc#ne... chrome/app/breakpad_win.cc:141: if (regkey.Create(HKEY_CURRENT_USER, On 2013/09/05 21:09:13, grt wrote: > when are these keys deleted? one possibility: the installer could delete the keys for previous versions when an update is applied, and likewise delete all keys when chrome is uninstalled. if you wish, you could preserve the key for the version-from-which-chrome-was-just-updated when an update is applied so that you could somehow send this data up rather than losing it.
On 2013/09/06 02:21:35, grt wrote: > https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc > File chrome/app/breakpad_win.cc (right): > > https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc#ne... > chrome/app/breakpad_win.cc:141: if (regkey.Create(HKEY_CURRENT_USER, > On 2013/09/05 21:09:13, grt wrote: > > when are these keys deleted? > > one possibility: the installer could delete the keys for previous versions when > an update is applied, and likewise delete all keys when chrome is uninstalled. > if you wish, you could preserve the key for the > version-from-which-chrome-was-just-updated when an update is applied so that you > could somehow send this data up rather than losing it. hmm, except that this won't work for system-level installs since they don't have access to individual users' registry hives. nor do uninstalls of Chrome originating from the MSI (which run as SYSTEM). so, yeah, how do you delete these keys? could the keys be written in the omaha ClientState key for user-level installs (in HKCU) or ClientStateMedium key for system-level installs (in HKLM)? then the keys for all users would be accessible by the installer.
Another look? I walked through this again and realized that multiple dumps from a given process were still a problem. In the previous incarnation, if a user ran more than one browser instance they could end up double counting crashes. I think this is more or less correct. Wrapping the counting code in a registry transaction would be slightly better, but may not be worth it. I've put the version specificity into the value names, so no proliferation of keys. There's still the possibility of left-over unreported values across version upgrades, but presumably the update process could be made to handle this? Not sure if there is any histogram reporting in there already. I also moved the smoke signal generation into the crash filter callbacks, as these are automatically invoked by breakpad before it generated a dump. https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc#ne... chrome/app/breakpad_win.cc:141: if (regkey.Create(HKEY_CURRENT_USER, On 2013/09/05 21:09:13, grt wrote: > when are these keys deleted? Indeed. I got another ping from a concerned dev shortly after you wrote this. But he was asking about post uninstall. Another approach would be to put the version identifier into the value. All chromen would share a key but write/delete version specific values. That still doesn't address uninstall, but it does prevent the proliferation of keys. https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc#ne... chrome/app/breakpad_win.cc:141: if (regkey.Create(HKEY_CURRENT_USER, On 2013/09/06 02:21:35, grt wrote: > On 2013/09/05 21:09:13, grt wrote: > > when are these keys deleted? > > one possibility: the installer could delete the keys for previous versions when > an update is applied, and likewise delete all keys when chrome is uninstalled. > if you wish, you could preserve the key for the > version-from-which-chrome-was-just-updated when an update is applied so that you > could somehow send this data up rather than losing it. That sounds like the makings of a plan. https://codereview.chromium.org/23453032/diff/1/chrome/app/breakpad_win.cc#ne... chrome/app/breakpad_win.cc:595: On 2013/09/05 13:57:53, Roger McFarlane (Chromium) wrote: > On 2013/09/04 23:32:20, cpu wrote: > > we don't have the smoke signal here ... > > We send the smoke signal before generating the dump, or at least that's the > intent. AFAICT, I've hit all of the spots just before we write a minidump. > > I did it in this order to count all the times we intend to generate a dump, and > if the dump generation should fail we should still have a count of the attempt. I moved the call to the filter callbacks (which are called by breakpad before generating any dumps).
https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:130: wchar_t g_browser_crash_dump_prefix[kBrowserCrashDumpPrefixLength + 1] = { 0 }; grt's pet peeve: "{ 0 }" -> "{}" https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:163: DCHECK_GT(length, 0); how about being a little more defensive here: if (length > 0 && length < arraysize(g_browser_crash_dump_prefix)) { // Hold the registry key in a global for update on crash dump. g_browser_crash_dump_regkey = regkey.Take(); } else { g_browser_crash_dump_prefix[0] = '\0'; } https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:164: DCHECK_LT(length, kBrowserCrashDumpPrefixLength); DCHECK_LE since kBrowserCrashDumpPrefixLength is already one less than arraysize(g_browser_crash_dump_prefix)? https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:167: void SendSmokeSignalForCrashDump() { Send -> Write or Store since this function doesn't send anything. https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:175: wchar_t value[2 * arraysize(g_browser_crash_dump_prefix)] = { 0 }; initializer isn't necessary here if you check the return value below. https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:180: if (length > 0 && length < arraysize(value)) { // Write the value to the registry. .... } https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:183: regkey.WriteValue(value, 1); since you don't need data in the value, how about: regkey.WriteValue(value, NULL, 0, REG_NONE); https://codereview.chromium.org/23453032/diff/13001/chrome/browser/metrics/me... File chrome/browser/metrics/metrics_service.cc (right): https://codereview.chromium.org/23453032/diff/13001/chrome/browser/metrics/me... chrome/browser/metrics/metrics_service.cc:813: if (regkey.GetValueNameAt(i, &temp_name) == ERROR_SUCCESS && i don't think that this will work the way you hope it will since there's no guarantee on a stable ordering of registry values. i suggest you do a simple walk through the values via RegistryValueIterator and keep track of the names of the to-be-deleted values as you go (in a deque or vector, for example) which you then delete after you finish iterating. one idea: { base::win::RegKey regkey; // Don't report any value in the histogram if the registry key doesn't exist // so that the zero-crashes bucket isn't polluted. if (regkey.Open(HKEY_CURRENT_USER, chrome::kBrowserCrashDumpAttemptsRegistryPath, KEY_SET_VALUE) == ERROR_SUCCESS) { base::string16 chrome_version(base::ASCIIToUTF16(chrome::kChromeVersion)); std::deque<base::string16> crash_dump_attempts; for (base::win::RegistryValueIterator iter( HKEY_CURRENT_USER, chrome::kBrowserCrashDumpAttemptsRegistryPath); iter.Valid(); ++iter) { if (StartsWith(iter.Name(), chrome_version, false)) crash_dump_attempts.push_back(base::string16(iter.Name())); } UMA_HISTOGRAM_COUNTS("Chrome.BrowserCrashDumpAttempts", crash_dump_attempts.size()); while (!crash_dump_attempts.empty()) { regkey.DeleteValue(crash_dump_attempts.back().c_str()); crash_dump_attempts.pop_back(); } } } https://codereview.chromium.org/23453032/diff/13001/chrome/browser/metrics/me... chrome/browser/metrics/metrics_service.cc:814: StartsWith(temp_name, chrome_version, false) && how about also deleting the values for older versions? or maybe unconditionally delete all values after counting the ones corresponding to this version? is it useful to have another UMA stat for the number of other-version crashes?
Drive by. As far as I can see, at a high level the problem that we have today is that the count returned by this histogram seems too low when compared against other sources, but if something, it is overcounting* crashes when compared against UMA stability numbers (even though it is still below that number). (*) it is overcounting because it reports a crash when the browser has not crashed yet (so UMA stability will not count this as a crash). I think we want to report two numbers instead of one, so that we get meaningful points of comparison with our other two sources (stability and crash data). First, a counter that only gets incremented with real crashes, and not dumps without crashes. Close to what we do today, as a browser can crash only once! (but skipping the dump only paths, to avoid overcounting) Second, a new counter that reports all attempts to upload dumps... to compare against crash data.
if you are adding a counter for real crashes can you please do it in the new process, we usually launch a new browser with the 'woa! chrome has crashed" That is to avoid doing more work on a crashed process.
Thanks. Another look? https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:130: wchar_t g_browser_crash_dump_prefix[kBrowserCrashDumpPrefixLength + 1] = { 0 }; On 2013/09/06 15:53:23, grt wrote: > grt's pet peeve: "{ 0 }" -> "{}" Done. https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:163: DCHECK_GT(length, 0); On 2013/09/06 15:53:23, grt wrote: > how about being a little more defensive here: > if (length > 0 && length < arraysize(g_browser_crash_dump_prefix)) { > // Hold the registry key in a global for update on crash dump. > g_browser_crash_dump_regkey = regkey.Take(); > } else { > g_browser_crash_dump_prefix[0] = '\0'; > } Done. https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:164: DCHECK_LT(length, kBrowserCrashDumpPrefixLength); On 2013/09/06 15:53:23, grt wrote: > DCHECK_LE since kBrowserCrashDumpPrefixLength is already one less than > arraysize(g_browser_crash_dump_prefix)? Done. https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:167: void SendSmokeSignalForCrashDump() { On 2013/09/06 15:53:23, grt wrote: > Send -> Write or Store since this function doesn't send anything. RecordCrashDumpAttempt (the smoke signal terminology is defunct) https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:175: wchar_t value[2 * arraysize(g_browser_crash_dump_prefix)] = { 0 }; On 2013/09/06 15:53:23, grt wrote: > initializer isn't necessary here if you check the return value below. Don't we always initialize our vars in Chrome? https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:180: On 2013/09/06 15:53:23, grt wrote: > if (length > 0 && length < arraysize(value)) { > // Write the value to the registry. > .... > } Done. https://codereview.chromium.org/23453032/diff/13001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:183: regkey.WriteValue(value, 1); On 2013/09/06 15:53:23, grt wrote: > since you don't need data in the value, how about: > regkey.WriteValue(value, NULL, 0, REG_NONE); I now need to data in the value (to tell between a crashing and non-crashing dump). https://codereview.chromium.org/23453032/diff/13001/chrome/browser/metrics/me... File chrome/browser/metrics/metrics_service.cc (right): https://codereview.chromium.org/23453032/diff/13001/chrome/browser/metrics/me... chrome/browser/metrics/metrics_service.cc:813: if (regkey.GetValueNameAt(i, &temp_name) == ERROR_SUCCESS && On 2013/09/06 15:53:23, grt wrote: > i don't think that this will work the way you hope it will since there's no > guarantee on a stable ordering of registry values. i suggest you do a simple > walk through the values via RegistryValueIterator and keep track of the names of > the to-be-deleted values as you go (in a deque or vector, for example) which you > then delete after you finish iterating. one idea: > { > base::win::RegKey regkey; > // Don't report any value in the histogram if the registry key doesn't exist > // so that the zero-crashes bucket isn't polluted. > if (regkey.Open(HKEY_CURRENT_USER, > chrome::kBrowserCrashDumpAttemptsRegistryPath, > KEY_SET_VALUE) == ERROR_SUCCESS) { > base::string16 chrome_version(base::ASCIIToUTF16(chrome::kChromeVersion)); > std::deque<base::string16> crash_dump_attempts; > for (base::win::RegistryValueIterator iter( > HKEY_CURRENT_USER, > chrome::kBrowserCrashDumpAttemptsRegistryPath); > iter.Valid(); > ++iter) { > if (StartsWith(iter.Name(), chrome_version, false)) > crash_dump_attempts.push_back(base::string16(iter.Name())); > } > > UMA_HISTOGRAM_COUNTS("Chrome.BrowserCrashDumpAttempts", > crash_dump_attempts.size()); > > while (!crash_dump_attempts.empty()) { > regkey.DeleteValue(crash_dump_attempts.back().c_str()); > crash_dump_attempts.pop_back(); > } > } > } I've done as you suggest per the walkthrough and delete after. I've also limited the reporting to non-zero counts. It was a suggestion on the initial Cl creating this code that it always be reported, but have the zero-valued samples does seem to confuse consumers of the histograms. https://codereview.chromium.org/23453032/diff/13001/chrome/browser/metrics/me... chrome/browser/metrics/metrics_service.cc:814: StartsWith(temp_name, chrome_version, false) && On 2013/09/06 15:53:23, grt wrote: > how about also deleting the values for older versions? or maybe unconditionally > delete all values after counting the ones corresponding to this version? is it > useful to have another UMA stat for the number of other-version crashes? Hmm... That would be wonky in multi-install, which is probably rare outside the chromium developer community. Since I'm not immediately sure what the right answer is on that one, how about I punt it to a future CL?
code lgtm w/ two nits. thanks. https://codereview.chromium.org/23453032/diff/29001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/29001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:162: if (length > 0 && length < arraysize(g_browser_crash_dump_prefix)) { nit: remove braces since the conditional and the two bodies are each one liners. https://codereview.chromium.org/23453032/diff/29001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:177: const int kMaxValueSize = 2 * arraysize(g_browser_crash_dump_prefix); int -> size_t since this is counting things (and arraysize returns size_t)
Thanks. I've addressed the remaining nits and updated histograms.xml. cpu? rvargas? jar? https://codereview.chromium.org/23453032/diff/29001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/29001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:162: if (length > 0 && length < arraysize(g_browser_crash_dump_prefix)) { On 2013/09/11 03:25:56, grt wrote: > nit: remove braces since the conditional and the two bodies are each one liners. Done. https://codereview.chromium.org/23453032/diff/29001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:177: const int kMaxValueSize = 2 * arraysize(g_browser_crash_dump_prefix); On 2013/09/11 03:25:56, grt wrote: > int -> size_t since this is counting things (and arraysize returns size_t) Done.
https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:179: int length = swprintf( This is a non trivial CL. Do you mind creating a bug so that it is easier to track what we are doing and why? I wonder if we want to do this work pre-exception processing (as is currently the case) or post- exception. I mean, we could as well do work on the callback as opposed to the filter. One approach focuses on issues uploading crash reports (I though that was the main objective), while the other includes issues generating the dump as well, but has the potential to significantly affect the object being measured. On that topic, see ExceptionHandler::HandleInvalidParameter... breakpad code is careful to minimize work when handling exceptions, as anything may result in not being able to generate a report, or getting wrong reports. printf is one of those things... https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:184: DCHECK_GT(length, 0); What does it mean to DCHECK in the middle of handling an exception?
Thanks. Created a bug, updated the CL description, and responded to the questions raised by rvargas. Another look? https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:179: int length = swprintf( On 2013/09/12 02:50:21, rvargas wrote: > This is a non trivial CL. Do you mind creating a bug so that it is easier to > track what we are doing and why? Done. http://crbug/290196 > I wonder if we want to do this work pre-exception processing (as is currently > the case) or post- exception. I mean, we could as well do work on the callback > as opposed to the filter. I put it pre-exception in case there are instances where the crash reporting mechanism itself might be failing, which is a large part of what these histograms are trying to confirm/deny. > One approach focuses on issues uploading crash reports (I though that was the > main objective), while the other includes issues generating the dump as well, > but has the potential to significantly affect the object being measured. > > On that topic, see ExceptionHandler::HandleInvalidParameter... breakpad code is > careful to minimize work when handling exceptions, as anything may result in not > being able to generate a report, or getting wrong reports. printf is one of > those things... So, in theory, this could be problematic. But we're not calling printf in a potentially error inducing manner. The alternative is to implement a custom string formatting function here. While certainly do-able, I think that would be hard to justify. Ideally, I would have pre-computed the entirety of the string outside the exception handler, but we need to be able to reliably differentiate between multiple instances occurring in the same process, and the exception handler chain is our only hook for doing so (AFAIK). https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:184: DCHECK_GT(length, 0); On 2013/09/12 02:50:21, rvargas wrote: > What does it mean to DCHECK in the middle of handling an exception? DCHECKs look for programming errors, so in reality this should never fire. That said, in a debug build this should cause chrome to die a stupendously fiery death (after giving the debugger a chance to inspect the problem)... alerting us to something wrong. :)
https://codereview.chromium.org/23453032/diff/29001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/29001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:177: const int kMaxValueSize = 2 * arraysize(g_browser_crash_dump_prefix); On 2013/09/11 03:25:56, grt wrote: > int -> size_t since this is counting things (and arraysize returns size_t) Small clarification about the first part: something that is counting things should be an int (http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Integer_Types), even if negative things doesn't make sense. Of course I'm fine with size_t given that we interact with arraysize... https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:153: arraysize(g_browser_crash_dump_prefix), why not use kBrowserCrashDumpPrefixLength directly? (and yes, I understand there's a + 1 going on) https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:177: const size_t kMaxValueSize = 2 * arraysize(g_browser_crash_dump_prefix); kBrowserCrashDumpPrefixLength ? https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:179: int length = swprintf( On 2013/09/12 17:51:38, Roger McFarlane (Chromium) wrote: > On 2013/09/12 02:50:21, rvargas wrote: > > This is a non trivial CL. Do you mind creating a bug so that it is easier to > > track what we are doing and why? > > Done. http://crbug/290196 > > > I wonder if we want to do this work pre-exception processing (as is currently > > the case) or post- exception. I mean, we could as well do work on the callback > > as opposed to the filter. > > I put it pre-exception in case there are instances where the crash reporting > mechanism itself might be failing, which is a large part of what these > histograms are trying to confirm/deny. > > > One approach focuses on issues uploading crash reports (I though that was the > > main objective), while the other includes issues generating the dump as well, > > but has the potential to significantly affect the object being measured. > > > > On that topic, see ExceptionHandler::HandleInvalidParameter... breakpad code > is > > careful to minimize work when handling exceptions, as anything may result in > not > > being able to generate a report, or getting wrong reports. printf is one of > > those things... > > So, in theory, this could be problematic. But we're not calling printf in a > potentially error inducing manner. The alternative is to implement a custom famous last words :) > string formatting function here. While certainly do-able, I think that would be > hard to justify. That code already exists: base::strings::SafeSPrintf. That said, what is the issue with pre-computing the string? Why not two fixed names with the counter as the value? Do we really worry about races between two threads attempting to dump without crash at the same time? I think the code would be simpler that way, no? > > Ideally, I would have pre-computed the entirety of the string outside the > exception handler, but we need to be able to reliably differentiate between > multiple instances occurring in the same process, and the exception handler > chain is our only hook for doing so (AFAIK). https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:184: DCHECK_GT(length, 0); On 2013/09/12 17:51:38, Roger McFarlane (Chromium) wrote: > On 2013/09/12 02:50:21, rvargas wrote: > > What does it mean to DCHECK in the middle of handling an exception? > > DCHECKs look for programming errors, so in reality this should never fire. That > said, in a debug build this should cause chrome to die a stupendously fiery > death (after giving the debugger a chance to inspect the problem)... alerting us > to something wrong. > > :) Maybe it is just me, but I don't think intentionally reentering code while processing exceptions is a sane thing to do, even for debug builds.
Thanks. PTAL? https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:153: arraysize(g_browser_crash_dump_prefix), On 2013/09/12 20:15:59, rvargas wrote: > why not use kBrowserCrashDumpPrefixLength directly? (and yes, I understand > there's a + 1 going on) Switched to use SafeSPrintf here as well, no need to specify the size. https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:177: const size_t kMaxValueSize = 2 * arraysize(g_browser_crash_dump_prefix); On 2013/09/12 20:15:59, rvargas wrote: > kBrowserCrashDumpPrefixLength ? Done. https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:179: int length = swprintf( On 2013/09/12 20:15:59, rvargas wrote: > On 2013/09/12 17:51:38, Roger McFarlane (Chromium) wrote: > > On 2013/09/12 02:50:21, rvargas wrote: > > > This is a non trivial CL. Do you mind creating a bug so that it is easier to > > > track what we are doing and why? > > > > Done. http://crbug/290196 > > > > > I wonder if we want to do this work pre-exception processing (as is > currently > > > the case) or post- exception. I mean, we could as well do work on the > callback > > > as opposed to the filter. > > > > I put it pre-exception in case there are instances where the crash reporting > > mechanism itself might be failing, which is a large part of what these > > histograms are trying to confirm/deny. > > > > > One approach focuses on issues uploading crash reports (I though that was > the > > > main objective), while the other includes issues generating the dump as > well, > > > but has the potential to significantly affect the object being measured. > > > > > > On that topic, see ExceptionHandler::HandleInvalidParameter... breakpad code > > is > > > careful to minimize work when handling exceptions, as anything may result in > > not > > > being able to generate a report, or getting wrong reports. printf is one of > > > those things... > > > > So, in theory, this could be problematic. But we're not calling printf in a > > potentially error inducing manner. The alternative is to implement a custom > > famous last words :) > > > > string formatting function here. While certainly do-able, I think that would > be > > hard to justify. > > That code already exists: base::strings::SafeSPrintf. > > That said, what is the issue with pre-computing the string? Why not two fixed > names with the counter as the value? Do we really worry about races between two > threads attempting to dump without crash at the same time? I think the code > would be simpler that way, no? > > > > > Ideally, I would have pre-computed the entirety of the string outside the > > exception handler, but we need to be able to reliably differentiate between > > multiple instances occurring in the same process, and the exception handler > > chain is our only hook for doing so (AFAIK). > Re: famous last words Indeed. After I hit 'send' I thought: I'm going to live to regret those words. re: SafeSPrintf Nice! Didn't know that was there (looks like it landed just over a week ago). Will use that instead. Re: Concurrent non-crashing dumps While unlikely, we've got a lot a users, so eventually this is going to happen (it's called out in the comments for the non-crashing filter callback as well). Actually, the problem is slightly bigger than just concurrency, it's that writing the value as a counter (instead of counted booleans) makes it possible to double count dump attempts. 1 - B1 records a dump attempt but keeps running 2 - B2 consumes the registry key 3 - B1 records another dump attempt, value now equals 2 4 - Next consumer double counts the first attempt. https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:184: DCHECK_GT(length, 0); On 2013/09/12 20:15:59, rvargas wrote: > On 2013/09/12 17:51:38, Roger McFarlane (Chromium) wrote: > > On 2013/09/12 02:50:21, rvargas wrote: > > > What does it mean to DCHECK in the middle of handling an exception? > > > > DCHECKs look for programming errors, so in reality this should never fire. > That > > said, in a debug build this should cause chrome to die a stupendously fiery > > death (after giving the debugger a chance to inspect the problem)... alerting > us > > to something wrong. > > > > :) > > Maybe it is just me, but I don't think intentionally reentering code while > processing exceptions is a sane thing to do, even for debug builds. Fair enough, I guess. The intent of the DCHECK was call attention to a problem instead of just silently becoming a NOP, which is what the defensive programming will result in. In this case, I agree that falling back to a NOP in release builds is the right approach, but in debug it would be nice to fail noticeably. While in principle, I'd prefer to leave the DCHECKs there, I'm not averse to removing them. Done.
https://codereview.chromium.org/23453032/diff/29001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/29001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:177: const int kMaxValueSize = 2 * arraysize(g_browser_crash_dump_prefix); On 2013/09/12 20:15:59, rvargas wrote: > On 2013/09/11 03:25:56, grt wrote: > > int -> size_t since this is counting things (and arraysize returns size_t) > > Small clarification about the first part: something that is counting things > should be an int > (http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Integer_Types), > even if negative things doesn't make sense. Of course I'm fine with size_t given > that we interact with arraysize... ah, i chose words poorly. thanks for pointing that out, Ricardo. my understanding of the style guide was that when you're specifically talking about the size of something measured by sizeof() or an STL thing that is size_type, then size_t is the right choice. i didn't mean "counting sheep" sort of counting. does that seem correct to you?
lgtm https://codereview.chromium.org/23453032/diff/29001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/29001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:177: const int kMaxValueSize = 2 * arraysize(g_browser_crash_dump_prefix); On 2013/09/13 18:39:29, grt wrote: > On 2013/09/12 20:15:59, rvargas wrote: > > On 2013/09/11 03:25:56, grt wrote: > > > int -> size_t since this is counting things (and arraysize returns size_t) > > > > Small clarification about the first part: something that is counting things > > should be an int > > > (http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Integer_Types), > > even if negative things doesn't make sense. Of course I'm fine with size_t > given > > that we interact with arraysize... > > ah, i chose words poorly. thanks for pointing that out, Ricardo. my > understanding of the style guide was that when you're specifically talking about > the size of something measured by sizeof() or an STL thing that is size_type, > then size_t is the right choice. i didn't mean "counting sheep" sort of > counting. does that seem correct to you? yes, that's also my reading of the guide. Once something is returning size_t, the code should adapt to it and the simplest way to do it is to also use size_t around it (reduce unsafe casts etc). It is unfortunate that size_t is unsigned and 64 bits wide, but there's nothing to do about it and it hardly ever matters. https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/52001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:179: int length = swprintf( On 2013/09/13 18:34:23, Roger McFarlane (Chromium) wrote: > On 2013/09/12 20:15:59, rvargas wrote: > > On 2013/09/12 17:51:38, Roger McFarlane (Chromium) wrote: > > > On 2013/09/12 02:50:21, rvargas wrote: > > > > This is a non trivial CL. Do you mind creating a bug so that it is easier > to > > > > track what we are doing and why? > > > > > > Done. http://crbug/290196 > > > > > > > I wonder if we want to do this work pre-exception processing (as is > > currently > > > > the case) or post- exception. I mean, we could as well do work on the > > callback > > > > as opposed to the filter. > > > > > > I put it pre-exception in case there are instances where the crash reporting > > > mechanism itself might be failing, which is a large part of what these > > > histograms are trying to confirm/deny. > > > > > > > One approach focuses on issues uploading crash reports (I though that was > > the > > > > main objective), while the other includes issues generating the dump as > > well, > > > > but has the potential to significantly affect the object being measured. > > > > > > > > On that topic, see ExceptionHandler::HandleInvalidParameter... breakpad > code > > > is > > > > careful to minimize work when handling exceptions, as anything may result > in > > > not > > > > being able to generate a report, or getting wrong reports. printf is one > of > > > > those things... > > > > > > So, in theory, this could be problematic. But we're not calling printf in a > > > potentially error inducing manner. The alternative is to implement a custom > > > > famous last words :) > > > > > > > string formatting function here. While certainly do-able, I think that would > > be > > > hard to justify. > > > > That code already exists: base::strings::SafeSPrintf. > > > > That said, what is the issue with pre-computing the string? Why not two fixed > > names with the counter as the value? Do we really worry about races between > two > > threads attempting to dump without crash at the same time? I think the code > > would be simpler that way, no? > > > > > > > > Ideally, I would have pre-computed the entirety of the string outside the > > > exception handler, but we need to be able to reliably differentiate between > > > multiple instances occurring in the same process, and the exception handler > > > chain is our only hook for doing so (AFAIK). > > > > Re: famous last words > > Indeed. After I hit 'send' I thought: I'm going to live to regret those words. > > re: SafeSPrintf > > Nice! Didn't know that was there (looks like it landed just over a week ago). > Will use that instead. yeah... it took quite a while to land. > > Re: Concurrent non-crashing dumps > > While unlikely, we've got a lot a users, so eventually this is going to happen > (it's called out in the comments for the non-crashing filter callback as well). sure, but for this code we only care if it actually affects the result of the measurement in a way that we can notice :) > > Actually, the problem is slightly bigger than just concurrency, it's that > writing the value as a counter (instead of counted booleans) makes it possible > to double count dump attempts. > > 1 - B1 records a dump attempt but keeps running > 2 - B2 consumes the registry key > 3 - B1 records another dump attempt, value now equals 2 > 4 - Next consumer double counts the first attempt. That should not be a concern because B2 should have a lock against B1 before MetricsService starts. If that is not the case (or we worry about the very small group of users who have multiple user data dirs), then this code is also wrong in that reading back the registry keys is not locked against another browser doing the same... two browsers may gather keys, report and attempt to delete keys at the same time. I'm not saying that we should add more synchronization to cover more edge cases. And of course I'm not opposed to code that actually takes into account more edge cases. What I was trying to say is that sometimes having simpler code is better, as long as the consequences of not being exhaustive are acceptable. The goal here is to correlate other metrics that have a 10x factor of disagreement, so unless we have a significant source of errors (we miscount a very large percentage of cases), the end result is not going to change. On the other hand, I'm always supper cautious about doing any work at all while dealing with a process that is in the middle of crashing, so my opinion will be weighted towards simpler code. https://codereview.chromium.org/23453032/diff/62001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/62001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:158: DCHECK_GT(length, 0); nit: use notreached inside the if below
Thanks. https://codereview.chromium.org/23453032/diff/62001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/62001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:158: DCHECK_GT(length, 0); On 2013/09/13 19:45:29, rvargas wrote: > nit: use notreached inside the if below Done.
https://codereview.chromium.org/23453032/diff/72001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/72001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:627: RecordCrashDumpAttempt(true); Is this thing here temporary? I want to keep the code we execute while crashed (for realz) to the minimun possible. For example, if the stack is corrupted we would get less crashes the more we executue code here.
https://codereview.chromium.org/23453032/diff/72001/chrome/app/breakpad_win.cc File chrome/app/breakpad_win.cc (right): https://codereview.chromium.org/23453032/diff/72001/chrome/app/breakpad_win.c... chrome/app/breakpad_win.cc:627: RecordCrashDumpAttempt(true); On 2013/09/13 21:04:28, cpu wrote: > Is this thing here temporary? I want to keep the code we execute while crashed > (for realz) to the minimun possible. For example, if the stack is corrupted we > would get less crashes the more we executue code here. It's here for as long as want to keep these histograms alive. For the most recent snapshots of this CL I've simply move the call from being sprinkled before every WriteMiniDump call to being in the callback(s) invoked by WriteMiniDump, so we don't miss any. It's otherwise the same basic set of actions as the previous iterations. In sum, the "new" stuff happening on crash dump are a SafeSPrintf and a RegSetValueEx.
lgtm
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/rogerm@chromium.org/23453032/53004
Message was sent while issue was closed.
Change committed as 223314 |