|
|
Created:
8 years, 1 month ago by Paweł Hajdan Jr. Modified:
8 years, 1 month ago CC:
chromium-reviews, erikwright+watch_chromium.org, Markus (顧孟勤) Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionGTTF: Make Linux stack dump signal handler async-signal safe.
This also re-enables in-process stack dumping on Linux.
BUG=101155
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=167714
Patch Set 1 #
Total comments: 13
Patch Set 2 : fixes #
Total comments: 2
Patch Set 3 : fixes #
Total comments: 12
Patch Set 4 : fixes #Patch Set 5 : fixes #
Total comments: 7
Patch Set 6 : fixes #Patch Set 7 : trybot debugging ios & android #
Total comments: 2
Patch Set 8 : fixes #Patch Set 9 : hopefully the ultimate android fix #
Total comments: 4
Patch Set 10 : shess' comments #
Messages
Total messages: 26 (0 generated)
Please review: Jim: entire CL Julien: is there some way for me to share code between base/ and sandbox-bpf?
On 2012/11/01 22:54:46, Paweł Hajdan Jr. wrote: > Please review: > > Jim: entire CL > Julien: is there some way for me to share code between base/ and sandbox-bpf? Excellent to see this! sandbox/linux/sandbox-bpf can use base, even though in general we don't because we need to operate at a very low level. We also try to keep the code re-usable outside of Chrome, so there is a "SECCOMP_BPF_STANDALONE" #ifdef for that. What did you have in mind?
I didn't have time to go though the rest of the code, and stopped after itoa (which I probably spent too much time thinking about). Please have a look at the comments, and at a minimum write the tests that show that I got confused somewhere. https://codereview.chromium.org/11362048/diff/1/base/debug/debugger_posix.cc File base/debug/debugger_posix.cc (right): https://codereview.chromium.org/11362048/diff/1/base/debug/debugger_posix.cc#... base/debug/debugger_posix.cc:84: // While some code used below may be async-signal unsafe, note how Please be specific (don't ask me to find the offending code). I didn't see any statics... so I'm missing what you mean by cached. https://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.cc File base/debug/stack_trace_posix.cc (right): https://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:58: // Handle negative numbers. Negative numbers are only supposed to be handled when the base is 10. https://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:80: // Loop until we have converted the entire number. Output at least one At this point you should convert to an unsigned value, so that the high-order bit will not be distorted to be a sign bit. uintptr_t j = reinterpret_cast<uintptr_t>(i); https://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:87: goto truncate; nit: better is "break" https://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:93: // in 2, 4, 6, or 8. The comment about the least-significant-character is only correct if the base is 10. https://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:94: *ptr++ = "0123456789abcdef"[i % base] + minint; This then seems wrong, in the case where minint == 1, and a base of 16. I suspect (not sure) it will print (for a 32 bit int): -7ffffffg rather than the vaguely desired (assuming you print negative hex values): -100000000 Please write a test case for: itoa(MAX_INT + 1) (with sufficiently large buffers, and a base of 16). You should probably also have that test case for a base of 10.
On 2012/11/02 01:17:48, Julien Tinnes wrote: > Excellent to see this! sandbox/linux/sandbox-bpf can use base, even though in > general we don't because we need to operate at a very low level. How should I modify the hand-written Makefile there so I can link to base then? > We also try to keep the code re-usable outside of Chrome, so there is a > "SECCOMP_BPF_STANDALONE" #ifdef for that. What am I supposed to do in the standalone case then? Keep the duplicated code? :-/ > What did you have in mind? I'm duplicating code here and asking for advice. I'd prefer to have only one (tested!) copy in the codebase. http://codereview.chromium.org/11362048/diff/1/base/debug/debugger_posix.cc File base/debug/debugger_posix.cc (right): http://codereview.chromium.org/11362048/diff/1/base/debug/debugger_posix.cc#n... base/debug/debugger_posix.cc:84: // While some code used below may be async-signal unsafe, note how On 2012/11/02 02:36:57, jar wrote: > Please be specific (don't ask me to find the offending code). > > I didn't see any statics... so I'm missing what you mean by cached. Done. http://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.cc File base/debug/stack_trace_posix.cc (right): http://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.c... base/debug/stack_trace_posix.cc:58: // Handle negative numbers. On 2012/11/02 02:36:57, jar wrote: > Negative numbers are only supposed to be handled when the base is 10. Done. http://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.c... base/debug/stack_trace_posix.cc:80: // Loop until we have converted the entire number. Output at least one On 2012/11/02 02:36:57, jar wrote: > At this point you should convert to an unsigned value, so that the high-order > bit will not be distorted to be a sign bit. > > uintptr_t j = reinterpret_cast<uintptr_t>(i); I don't really understand this (read: I see a scary reinterpret_cast here and I'd learn a lot from more detailed explanation). http://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.c... base/debug/stack_trace_posix.cc:87: goto truncate; On 2012/11/02 02:36:57, jar wrote: > nit: better is "break" Done. http://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.c... base/debug/stack_trace_posix.cc:93: // in 2, 4, 6, or 8. On 2012/11/02 02:36:57, jar wrote: > The comment about the least-significant-character is only correct if the base is > 10. Done. http://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.c... base/debug/stack_trace_posix.cc:94: *ptr++ = "0123456789abcdef"[i % base] + minint; On 2012/11/02 02:36:57, jar wrote: > This then seems wrong, in the case where minint == 1, and a base of 16. > > I suspect (not sure) it will print (for a 32 bit int): > -7ffffffg > > rather than the vaguely desired (assuming you print negative hex values): > > -100000000 > > > Please write a test case for: > itoa(MAX_INT + 1) (with sufficiently large buffers, and a base of 16). > > You should probably also have that test case for a base of 10. Right, I was thinking about tests. For that, I'd need to move itoa_r to some separate file (any suggestions?) and write the tests. Could you just suggest the best place?
On 2012/11/06 17:58:45, Paweł Hajdan Jr. wrote: > On 2012/11/02 01:17:48, Julien Tinnes wrote: > > Excellent to see this! sandbox/linux/sandbox-bpf can use base, even though in > > general we don't because we need to operate at a very low level. > > How should I modify the hand-written Makefile there so I can link to base then? > > > We also try to keep the code re-usable outside of Chrome, so there is a > > "SECCOMP_BPF_STANDALONE" #ifdef for that. > > What am I supposed to do in the standalone case then? Keep the duplicated code? > :-/ > > > What did you have in mind? > > I'm duplicating code here and asking for advice. I'd prefer to have only one > (tested!) copy in the codebase. Ohh, I had missed that you were duplicating code from demo.cc, now I understand. You can roughly consider demo.cc and the Makefile as not being part of Chrome. It's just a convenient helper that can be used for standalone projects and for debugging. I dislike code duplication as much as anyone, but in this particular case I don't think it is an issue. I wouldn't be surprised if reasonably soon we started to consider that maintaining anything that works outside of Chrome is not worth the effort and it would disappear altogether. I've added Markus who wrote this code to this thread.
Much better... and I just realized something else to make it much much simpler. You should put the corresponding itoa tests into the adjacent file, since we have the source in this stack trace file. It could be that it should be in some utility.... but I'd wait for a better reason to push it further into base. https://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.cc File base/debug/stack_trace_posix.cc (right): https://codereview.chromium.org/11362048/diff/1/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:80: // Loop until we have converted the entire number. Output at least one Yeah... unsigned is broader... you don't need the cast (I think) and could use static_cast if the compiler(s) complain. On 2012/11/06 17:58:46, Paweł Hajdan Jr. wrote: > On 2012/11/02 02:36:57, jar wrote: > > At this point you should convert to an unsigned value, so that the high-order > > bit will not be distorted to be a sign bit. > > > > uintptr_t j = reinterpret_cast<uintptr_t>(i); > > I don't really understand this (read: I see a scary reinterpret_cast here and > I'd learn a lot from more detailed explanation). https://codereview.chromium.org/11362048/diff/8001/base/debug/stack_trace_pos... File base/debug/stack_trace_posix.cc (right): https://codereview.chromium.org/11362048/diff/8001/base/debug/stack_trace_pos... base/debug/stack_trace_posix.cc:78: i = -i; You don't actually need the minint game, now that you convert it to unsigned in line 83. In the weird/problematic case where we are exactly something like: 10000000000B This is indeed the most negative number... and it is the case that when we negate it (take complement, and add one) we do indeed get back the same negative number. The good news is that this number *is* is range when we convert it to unsigned, and hence we don't need to subtract one here, and then add one on line 96. The code just works ;-). The trick is that the range of unsigned (on the positive end) is twice as large as signed, so we have no trouble representing the maximally negative number in an unsigned variable. https://codereview.chromium.org/11362048/diff/8001/base/debug/stack_trace_pos... base/debug/stack_trace_posix.cc:97: if (base == 10) { nit: Don't bother to test here. Minint will only be set to 1 if there was a problem, and it was base 10. The old code on lines 94 was fine (with j instead of i now).
PTAL
https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_posix.cc File base/debug/stack_trace_posix.cc (right): https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:256: if (i < 0) { This should only be handled if base is 10. https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:265: uintptr_t j = (i > 0) ? i : -i; The fixup for signed conversion should only be done if this is base 10. https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_unitte... File base/debug/stack_trace_unittest.cc (right): https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_unitte... base/debug/stack_trace_unittest.cc:158: EXPECT_EQ("-1", itoa_r_wrapper(-1, 128, 10)); You should do this base 16, and get (on your 64 bit machine): "FFFFFFFFFFFFFFFF" https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_unitte... base/debug/stack_trace_unittest.cc:161: EXPECT_EQ("-9223372036854775808", On a 32 bit machine, this will give a different answer. You can base it in sizeof(intptr_t). https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_unitte... base/debug/stack_trace_unittest.cc:164: itoa_r_wrapper(std::numeric_limits<intptr_t>::max(), 128, 10)); Also do this verification, base 16. The results should be (on your 64 bit machine): max ==> "7FFFFFFFFFFFFFFF" min==> "8000000000000000" Again, results will vary depending on word size. There should be no
https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_unitte... File base/debug/stack_trace_unittest.cc (right): https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_unitte... base/debug/stack_trace_unittest.cc:161: EXPECT_EQ("-9223372036854775808", On 2012/11/07 00:32:07, jar wrote: > On a 32 bit machine, this will give a different answer. You can base it in > sizeof(intptr_t). Do we have any 32-bit machines we test on? In case a trybot or buildbot fails on this I'm defnitely going to add 32-bit case.
Sorry about the late and nit-picky comments. This feels kind of excessive, like maybe rather than handling edge cases in that code, we should just forbid them from being possible. Like if you just forbid negative values, only accept base-10 and base-16, and provide constant for the required buffer size. Since it's for signal handling, I don't see the value in having a lot of flexibility, here. Before running with that suggestion, maybe jar@ could comment pro/con. https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_posix.cc File base/debug/stack_trace_posix.cc (right): https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:244: namespace internal { itoa() is defined as handling bases between 2 and 36. This only handles bases between 2 and 16. At minimum you need to fail if base is outside the range [2,16]. I think it would be reasonable to only allow bases 10 and 16, honestly. https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:247: char *itoa_r(intptr_t i, char *buf, size_t sz, int base) { You might consider just requiring sz to be able to contain enough space to accomodate sizeof(i)*8+1 (largest 2-bit output plus nul terminator). Adjust accordingly if you decide to restrict to bases 10 and 16. https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:256: if (i < 0) { On 2012/11/07 00:32:07, jar wrote: > This should only be handled if base is 10. As currently used, you could just push the negative case into the caller. Actually, I'm not entirely clear whether the negative case actually happens in practice for us. https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:273: buf = NULL; I think this would make more sense as return NULL, or the same thing as the case above where it also sets *buf='\000'. As-is, it's stuffing a truncated representation of the low-order bits into the output buffer. https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_posix.... base/debug/stack_trace_posix.cc:274: break; I think this should just return NULL, or *buf='\000' and return NULL. If someone depends on the nul-terminated low-order digits to be in the buffer, that's weird. https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_unitte... File base/debug/stack_trace_unittest.cc (right): https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_unitte... base/debug/stack_trace_unittest.cc:161: EXPECT_EQ("-9223372036854775808", On 2012/11/07 00:52:28, Paweł Hajdan Jr. wrote: > On 2012/11/07 00:32:07, jar wrote: > > On a 32 bit machine, this will give a different answer. You can base it in > > sizeof(intptr_t). > > Do we have any 32-bit machines we test on? In case a trybot or buildbot fails on > this I'm defnitely going to add 32-bit case. OSX still builds 32-bit.
Windows is only built in 32 bit mode, and hence pointers will only be 32 bits. Jim On 2012/11/07 00:52:28, Paweł Hajdan Jr. wrote: > https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_unitte... > File base/debug/stack_trace_unittest.cc (right): > > https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_unitte... > base/debug/stack_trace_unittest.cc:161: EXPECT_EQ("-9223372036854775808", > On 2012/11/07 00:32:07, jar wrote: > > On a 32 bit machine, this will give a different answer. You can base it in > > sizeof(intptr_t). > > Do we have any 32-bit machines we test on? In case a trybot or buildbot fails on > this I'm defnitely going to add 32-bit case.
FYI: Linux is built and shipped both as a 32 and 64 bit binary. I'd assume we test both. Jim On 2012/11/07 01:07:04, jar wrote: > Windows is only built in 32 bit mode, and hence pointers will only be 32 bits. > > Jim > > On 2012/11/07 00:52:28, Paweł Hajdan Jr. wrote: > > > https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_unitte... > > File base/debug/stack_trace_unittest.cc (right): > > > > > https://codereview.chromium.org/11362048/diff/5/base/debug/stack_trace_unitte... > > base/debug/stack_trace_unittest.cc:161: EXPECT_EQ("-9223372036854775808", > > On 2012/11/07 00:32:07, jar wrote: > > > On a 32 bit machine, this will give a different answer. You can base it in > > > sizeof(intptr_t). > > > > Do we have any 32-bit machines we test on? In case a trybot or buildbot fails > on > > this I'm defnitely going to add 32-bit case.
PTAL I'm trying to keep itoa_r general, so if it needs to be re-used later in slightly different context, it can. Also, note that when running in a signal handler, especially after catching SIGSEGV, we might get unexpected negative values, and I'd prefer the handler to write negative values if they are negative, and not something different and confusing.
https://codereview.chromium.org/11362048/diff/1006/base/debug/stack_trace_pos... File base/debug/stack_trace_posix.cc (right): https://codereview.chromium.org/11362048/diff/1006/base/debug/stack_trace_pos... base/debug/stack_trace_posix.cc:64: #endif // defined(USE_SYMBOLIZE) I might be missing something, but AFAICT this means on OSX we'll just get hexcodes on output? https://codereview.chromium.org/11362048/diff/1006/base/debug/stack_trace_pos... base/debug/stack_trace_posix.cc:249: DCHECK_GE(16, base); If someone passes 1, the code is going to zero-fill then give up, if someone passes zero it's going to divide-by-zero, if someone passes >16 it's going to access memory off the end of the digit buffer. The only thing you know about the point where this is being called is that bad things are happening. You should check the bounds and fail safely in production. I have no idea how to model what happens when ou run DCHECK in a signal handler. Probably not a good thing. https://codereview.chromium.org/11362048/diff/1006/base/debug/stack_trace_pos... base/debug/stack_trace_posix.cc:278: *ptr = '\000'; To be consistent, *buf=0. As-is you'll leave the low-order digits in reverse order, perhaps with a leading -.
http://codereview.chromium.org/11362048/diff/1006/base/debug/stack_trace_posi... File base/debug/stack_trace_posix.cc (right): http://codereview.chromium.org/11362048/diff/1006/base/debug/stack_trace_posi... base/debug/stack_trace_posix.cc:64: #endif // defined(USE_SYMBOLIZE) On 2012/11/07 17:58:07, shess wrote: > I might be missing something, but AFAICT this means on OSX we'll just get > hexcodes on output? Unfortunately, yes. The previous code was _not_ async-signal safe, so I deleted it. Do you have some suggestions what to do about that? I'm not happy about losing in-process symbolization on Mac. On the other hand, hangs are worse. http://codereview.chromium.org/11362048/diff/1006/base/debug/stack_trace_posi... base/debug/stack_trace_posix.cc:249: DCHECK_GE(16, base); On 2012/11/07 17:58:07, shess wrote: > I have no idea how to model what happens when ou run DCHECK in a signal handler. > Probably not a good thing. Oops, that'd be the very hang I'm trying to prevent. Fixed now. Good catch. http://codereview.chromium.org/11362048/diff/1006/base/debug/stack_trace_posi... base/debug/stack_trace_posix.cc:278: *ptr = '\000'; On 2012/11/07 17:58:07, shess wrote: > To be consistent, *buf=0. As-is you'll leave the low-order digits in reverse > order, perhaps with a leading -. Done.
https://codereview.chromium.org/11362048/diff/1006/base/debug/stack_trace_pos... File base/debug/stack_trace_posix.cc (right): https://codereview.chromium.org/11362048/diff/1006/base/debug/stack_trace_pos... base/debug/stack_trace_posix.cc:64: #endif // defined(USE_SYMBOLIZE) On 2012/11/07 19:45:11, Paweł Hajdan Jr. wrote: > On 2012/11/07 17:58:07, shess wrote: > > I might be missing something, but AFAICT this means on OSX we'll just get > > hexcodes on output? > > Unfortunately, yes. The previous code was _not_ async-signal safe, so I deleted > it. Do you have some suggestions what to do about that? I'm not happy about > losing in-process symbolization on Mac. On the other hand, hangs are worse. This is general-purpose code which the signal handler happens to call, not code targetted to only be used by signal handlers. Dropping the readable stack frames for all users is not acceptable. AFAICT, the hang is only a problem for ASAN's malloc implementation? Do you have any evidence that it's a problem anywhere else? Because I think you'll find lots of pushback even if you only drop stack frames for the signal handler. Hex codes are useless without the original binary, so this would be a big impact for OSX. AFAICT, backtrace_symbols_fd() might be safe for this. It uses snprintf(), but not the format specifiers which call into malloc() (for vectors, doubles and wide-char conversion, from what I can follow).
On 2012/11/07 20:29:26, shess wrote: > This is general-purpose code which the signal handler happens to call, not code > targetted to only be used by signal handlers. Dropping the readable stack > frames for all users is not acceptable. Well, what do you suggest then? I could plumb a bool parameter like "running inside a signal handler" through things. > AFAICT, the hang is only a problem for ASAN's malloc implementation? Do you > have any evidence that it's a problem anywhere else? Yes. In fact, this has nothing to do with ASAN. This signal handler calls malloc, which is just not safe to do. Oh, and it used to call fflush too, which is also unsafe. ;-) > AFAICT, backtrace_symbols_fd() might be safe for this. It uses snprintf(), but > not the format specifiers which call into malloc() (for vectors, doubles and > wide-char conversion, from what I can follow). It's not obvious how to implement OutputToStream based on backtrace_symbols_fd - but maybe I'm overlooking something. Do you have some ideas?
On 2012/11/07 21:13:20, Paweł Hajdan Jr. wrote: > On 2012/11/07 20:29:26, shess wrote: > > This is general-purpose code which the signal handler happens to call, not > code > > targetted to only be used by signal handlers. Dropping the readable stack > > frames for all users is not acceptable. > > Well, what do you suggest then? I could plumb a bool parameter like "running > inside a signal handler" through things. > > > AFAICT, backtrace_symbols_fd() might be safe for this. It uses snprintf(), > but > > not the format specifiers which call into malloc() (for vectors, doubles and > > wide-char conversion, from what I can follow). > > It's not obvious how to implement OutputToStream based on backtrace_symbols_fd - > but maybe I'm overlooking something. Do you have some ideas? StackTrace attempts to provide a portable thing for generating stack traces, which is great for debugging in general. There's no reason it has to also be great for signal handling, and there's no reason that the signal-handling version has to go via OutputToStream. I think it would be perfectly fine to just call backtrace() and backtrace_symbols_fd() right in the signal handler, for OSX. Linux might gripe about losing demangling in that case, but if different platforms have different needs, that justifies a different implementation for Linux. > > AFAICT, the hang is only a problem for ASAN's malloc implementation? Do you > > have any evidence that it's a problem anywhere else? > > Yes. In fact, this has nothing to do with ASAN. This signal handler calls > malloc, which is just not safe to do. Oh, and it used to call fflush too, which > is also unsafe. ;-) As noted on the bug, this is in a bit of code which ends with "CRASH EVERYTHING". So it doesn't exactly have to be 100% safe, it just has to be safe from a practical viewpoint. If it caused a hand 1/1000 times, I'd be concerned, but 1/1000000? Less so. I think it would be reasonable to just disable the tombstone for ASAN, or make it different. The OSX malloc library is entirely different from tcmalloc, I'd be more concerned about causing recursion into the signal handler than I would be about deadlock. Hangs on trybots are bad, because they mess with cycle time, but do we have any evidence that this code is causing actual hangs, or are we just worried about the potential for hangs? If we're unlucky, it could be hanging before we could detect it, but that could be easily enough remedied by making sure the first thing the handler does is emit something on the stderr fd, then we could look for cases where things hung and the final output was from the signal handler.
Commenting only on the itoa (which looks much much better now), and leaving it to Shess to work out something that is mac friendly, I have only one nit below. https://chromiumcodereview.appspot.com/11362048/diff/24001/base/debug/stack_t... File base/debug/stack_trace_posix.cc (right): https://chromiumcodereview.appspot.com/11362048/diff/24001/base/debug/stack_t... base/debug/stack_trace_posix.cc:249: return NULL; nit: Several distinct NULL returns (looking at the buffer) line 249: Nothing written to buffer (maybe we could). line 254: Nothing written to buffer ('cause we can't). line 257: Null written to buf[0] line 279: Null written after partial (reversed) translation. You should probably unify this: a) Always write a \0 if we can (move test in lines 252-254 up). b) When we fail, and we have a buffer to write to, always put a \0 into buf[0]. (no reason to return a partial and reversed result).
PTAL This restores some of the older logic for demangling. http://codereview.chromium.org/11362048/diff/24001/base/debug/stack_trace_pos... File base/debug/stack_trace_posix.cc (right): http://codereview.chromium.org/11362048/diff/24001/base/debug/stack_trace_pos... base/debug/stack_trace_posix.cc:249: return NULL; On 2012/11/09 04:20:41, jar wrote: > nit: Several distinct NULL returns (looking at the buffer) > line 249: Nothing written to buffer (maybe we could). > line 254: Nothing written to buffer ('cause we can't). > line 257: Null written to buf[0] > line 279: Null written after partial (reversed) translation. > > You should probably unify this: > a) Always write a \0 if we can (move test in lines 252-254 up). > > b) When we fail, and we have a buffer to write to, always put a \0 into buf[0]. > (no reason to return a partial and reversed result). Done.
On 2012/11/07 21:48:24, shess wrote: > As noted on the bug, this is in a bit of code which ends with "CRASH > EVERYTHING". So it doesn't exactly have to be 100% safe, it just has to be safe > from a practical viewpoint. If it caused a hand 1/1000 times, I'd be concerned, > but 1/1000000? Less so. I think it would be reasonable to just disable the > tombstone for ASAN, or make it different. The OSX malloc library is entirely > different from tcmalloc, I'd be more concerned about causing recursion into the > signal handler than I would be about deadlock. None of this is dealing with crashes in the signal handler. It's all about the hangs. > Hangs on trybots are bad, because they mess with cycle time, but do we have any > evidence that this code is causing actual hangs, or are we just worried about > the potential for hangs? This is causing real issues, and actually that was one of the main reasons https://codereview.chromium.org/11340054/ has been applied. > If we're unlucky, it could be hanging before we could > detect it, but that could be easily enough remedied by making sure the first > thing the handler does is emit something on the stderr fd, then we could look > for cases where things hung and the final output was from the signal handler. I think the best thing to do is just to make it correct.
itoa looks nice, safe, and clean. The tests for the function also look great... so LGTM. I'll leave it to Scott to agree that he likes the rest of the code.
I still don't feel convinced that the problem you are solving is an actual problem for OSX, but since you feel strongly about it, LGTM. If people complain I'm going to point them at you. https://codereview.chromium.org/11362048/diff/10004/base/debug/stack_trace_po... File base/debug/stack_trace_posix.cc (right): https://codereview.chromium.org/11362048/diff/10004/base/debug/stack_trace_po... base/debug/stack_trace_posix.cc:185: strncat(buf, "Received signal ", sizeof(buf) - 1); Why not just assign the string to buf[] from the get-go? Also, I think strlcat() is probably less error-prone. https://codereview.chromium.org/11362048/diff/10004/base/debug/stack_trace_po... base/debug/stack_trace_posix.cc:186: internal::itoa_r(signal, buf + strlen(buf), sizeof(buf) - strlen(buf), 10); I'd prefer to pull strlen(buf) out, not so much for performance, but to guarantee that the result is consistent.
Thank you for the review! https://codereview.chromium.org/11362048/diff/10004/base/debug/stack_trace_po... File base/debug/stack_trace_posix.cc (right): https://codereview.chromium.org/11362048/diff/10004/base/debug/stack_trace_po... base/debug/stack_trace_posix.cc:185: strncat(buf, "Received signal ", sizeof(buf) - 1); On 2012/11/14 01:12:35, shess wrote: > Why not just assign the string to buf[] from the get-go? > > Also, I think strlcat() is probably less error-prone. Done. https://codereview.chromium.org/11362048/diff/10004/base/debug/stack_trace_po... base/debug/stack_trace_posix.cc:186: internal::itoa_r(signal, buf + strlen(buf), sizeof(buf) - strlen(buf), 10); On 2012/11/14 01:12:35, shess wrote: > I'd prefer to pull strlen(buf) out, not so much for performance, but to > guarantee that the result is consistent. Done.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/phajdan.jr@chromium.org/11362048/13013
Change committed as 167714 |