|
|
Created:
8 years, 3 months ago by Scott Hess - ex-Googler Modified:
8 years, 3 months ago CC:
chromium-reviews, erikwright+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
Descriptionunchecked_malloc() for Skia on OSX.
On tcmalloc-based versions of Chromium, skia resets the global new
handler to prevent the oom-killer from kicking in. Traditionally the
OSX developers have been agin this, because it's not thread-safe, and
we have questions about whether it's even the right thing to do. This
change should be thread-safe because g_old_malloc is only set on the
main thread at startup.
As far as whether it's the right thing to do, I'm tired and want to
quit hearing about it.
Additionally, remove a bit of leftover pre-10.6 code.
BUG=103980, 73751, 117949
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=157646
Patch Set 1 #
Total comments: 4
Patch Set 2 : Oops, GetPurgeableZone() was exported for tests. #
Total comments: 3
Patch Set 3 : Assume purgeable zone always present. Drop another couple 10.5 things. Light IWYU cleanup. #Patch Set 4 : -> unchecked_malloc #
Total comments: 3
Patch Set 5 : -> UncheckedMalloc and EBADF #
Total comments: 2
Patch Set 6 : Back out EBADF, allow syslog logging from sandbox. #
Total comments: 2
Patch Set 7 : Back out sandbox changes (and accidental safe-browsing changes), override _simple_asl_log(). #Patch Set 8 : Encode errno into death_ptr. #Patch Set 9 : Back out _simple_asl_log() override, use TLS to flag EBADF check. #
Total comments: 3
Patch Set 10 : Protect against crazy errno #Patch Set 11 : Put EBADF/syslog commentary back. #
Messages
Total messages: 43 (0 generated)
The design works so far, but it doesn't handle cases like crbug.com/42342 where we'd need to expose this as try_malloc in WebKit. https://codereview.chromium.org/10908245/diff/1/base/process_util.h File base/process_util.h (right): https://codereview.chromium.org/10908245/diff/1/base/process_util.h#newcode866 base/process_util.h:866: BASE_EXPORT void* oom_safe_malloc(size_t size); I don't like the name "oom safe". While it's safe from the oom killer, it's a very _unsafe_ call overall. I'd lean towards "unsafe_malloc" or the like. https://codereview.chromium.org/10908245/diff/1/base/process_util_mac.mm File base/process_util_mac.mm (left): https://codereview.chromium.org/10908245/diff/1/base/process_util_mac.mm#oldc... base/process_util_mac.mm:848: return NULL; Thanks for removing this.
On 2012/09/13 19:45:54, Avi wrote: > The design works so far, but it doesn't handle cases like crbug.com/42342 where > we'd need to expose this as try_malloc in WebKit. I've managed to get this far without ever hearing about that, but the image-bitmap version seems to come up every three months or so, so I'm willing to just punt on 42342 :-). https://codereview.chromium.org/10908245/diff/1/base/process_util.h File base/process_util.h (right): https://codereview.chromium.org/10908245/diff/1/base/process_util.h#newcode866 base/process_util.h:866: BASE_EXPORT void* oom_safe_malloc(size_t size); On 2012/09/13 19:45:54, Avi wrote: > I don't like the name "oom safe". While it's safe from the oom killer, it's a > very _unsafe_ call overall. I'd lean towards "unsafe_malloc" or the like. Well, it's not _unsafe_, per se. Just against general policy. unchecked_malloc()? original_malloc() or system_malloc() seem kind implementation-driven, but since this is explicitly a case about avoiding a bit of our implementation, seems fair. I suppose it would also be worth putting in a scary comment, like "If you use this without running it by a few of the OSX developers, your code will be reverted." https://codereview.chromium.org/10908245/diff/1/base/process_util_mac.mm File base/process_util_mac.mm (left): https://codereview.chromium.org/10908245/diff/1/base/process_util_mac.mm#oldc... base/process_util_mac.mm:848: return NULL; On 2012/09/13 19:45:54, Avi wrote: > Thanks for removing this. Oops, added about this to the CL description.
The WebKit name for this is TryMalloc, though I'm not a fan of that. And this _is_ unsafe, or at least more unsafe than the oom malloc. Do we want to keep watch over the codebase? We could add it as a private function on a class and require users to explicitly list themselves as a friend to use it. That way we'd only need to keep an eye on one place. Yeah, I know. Eeew.
https://codereview.chromium.org/10908245/diff/9001/base/process_util_unittest.cc File base/process_util_unittest.cc (right): https://codereview.chromium.org/10908245/diff/9001/base/process_util_unittest... base/process_util_unittest.cc:1122: if (zone) Do we need these if calls? The default purgeable zone will exist post 10.6.
On 2012/09/13 20:21:17, Avi wrote: > The WebKit name for this is TryMalloc, though I'm not a fan of that. And this > _is_ unsafe, or at least more unsafe than the oom malloc. I don't really like TryMalloc, either. unsafe_malloc() feels mis-leading - the malloc itself is not unsafe, it's what you do with the results that is unsafe. An argument can perhaps be made that any caller who can do the right thing necessarily can handle the naming mis-match, but, still. > Do we want to keep watch over the codebase? We could add it as a private > function on a class and require users to explicitly list themselves as a friend > to use it. That way we'd only need to keep an eye on one place. > > Yeah, I know. Eeew. Aiigh! I'll poke at it to see how evil it looks. But I'm inclined to think that if we're to the point where we really need to do that, rather than try to address social problems with technological solutions it would be saner to just quit the project and go work on something else. Since this is inherently non-portable, nobody is likely to use it accidentally, and if they use it intentionally, so they'll either have a compelling argument, or will work around your barriers (for instance by calling sk_malloc_flags). https://codereview.chromium.org/10908245/diff/9001/base/process_util_unittest.cc File base/process_util_unittest.cc (right): https://codereview.chromium.org/10908245/diff/9001/base/process_util_unittest... base/process_util_unittest.cc:1122: if (zone) On 2012/09/13 20:21:25, Avi wrote: > Do we need these if calls? The default purgeable zone will exist post 10.6. Tough crowd! https://codereview.chromium.org/10908245/diff/9001/base/process_util_unittest... base/process_util_unittest.cc:1161: // runtime because it may not be present in the SDK used for compilation. Looks like I can fix this, too.
Scott, I won't LG this with the word "safe" in the name of the function. Everything else looks nice, but we need a different name. As bad as try_malloc is, it's better than oom_safe_malloc.
On 2012/09/13 21:15:47, shess wrote: > On 2012/09/13 20:21:17, Avi wrote: > > Do we want to keep watch over the codebase? We could add it as a private > > function on a class and require users to explicitly list themselves as a friend > > to use it. That way we'd only need to keep an eye on one place. > > > > Yeah, I know. Eeew. > > Aiigh! > > I'll poke at it to see how evil it looks. Wow. I have been entirely unable to make sk_malloc_flags() a friend function. The only way I could manage this was to declare a hider class which was friends with a provider class over in SkMemory_new_handler.cpp. I don't think it was a namespacing issue, because I didn't do anything special with the gateway class in terms of namespacing (well, except it couldn't be anon, of course). I think that would sum up to a class in process_util friending a class in base/mac somewhere which in turn friended a global class in skia-land. I'm not feeling enthusiastic about all that, really.
OK, so that's not worth it. Never mind then about that.
On 2012/09/13 21:30:48, Avi wrote: > Scott, I won't LG this with the word "safe" in the name of the function. > Everything else looks nice, but we need a different name. As bad as try_malloc > is, it's better than oom_safe_malloc. Yeah, I'm fine with no lg. try_malloc() feels funky, but maybe I can come around to it. - unsafe_malloc() - unchecked_malloc() - try_malloc() - system_malloc() - malloc_no_throw() - malloc_no_oom_killer() skia_malloc_no_throw() doesn't look half bad in terms of fitting with the context (!(flags & SK_MALLOC_THROW)), and in terms of preventing opportunistic users. Admittedly, the entire concept of screwing around with throwing in a C function is abnormal, but that's where they're at over there. Later we can add webkit_try_malloc(), implemented identically.
On 2012/09/13 22:06:55, shess wrote: > - unsafe_malloc() > - unchecked_malloc() I kinda like "unchecked_malloc", and if you're not thrilled with "unsafe" I can dig this. > - try_malloc() > - system_malloc() > - malloc_no_throw() Throw? In C? > - malloc_no_oom_killer() too_many_words
Looping in mark@ for OWNERS, and another shot at the bike shed. [man, the autocomplete for "mark" sucks.] On 2012/09/13 22:09:17, Avi wrote: > On 2012/09/13 22:06:55, shess wrote: > > - unsafe_malloc() > > - unchecked_malloc() > > I kinda like "unchecked_malloc", and if you're not thrilled with "unsafe" I can > dig this. Ima take that and run wit it! > > - try_malloc() > > - system_malloc() > > - malloc_no_throw() > > Throw? In C? Yeah, like I said, in skia-land it makes sense, even though it obvious does not make sense. > > - malloc_no_oom_killer() > > too_many_words i_am_smarter_than_all_of_you_malloc()
I'm happy. LGTM
LGTM https://codereview.chromium.org/10908245/diff/14001/base/process_util.h File base/process_util.h (right): https://codereview.chromium.org/10908245/diff/14001/base/process_util.h#newco... base/process_util.h:865: BASE_EXPORT void* unchecked_malloc(size_t size); I only wonder if this should be named UncheckedMalloc.
ping
https://codereview.chromium.org/10908245/diff/14001/base/process_util.h File base/process_util.h (right): https://codereview.chromium.org/10908245/diff/14001/base/process_util.h#newco... base/process_util.h:865: BASE_EXPORT void* unchecked_malloc(size_t size); UncheckedMalloc. Or UnsafeUncheckedMalloc. https://codereview.chromium.org/10908245/diff/14001/base/process_util_mac.mm File base/process_util_mac.mm (left): https://codereview.chromium.org/10908245/diff/14001/base/process_util_mac.mm#... base/process_util_mac.mm:839: malloc_zone_t* GetPurgeableZone() { Good cleanup.
Oooooohhhhhkay. I figured WTF, I should actually, you know, test to see if it works. In practice, rather than in theory. CrMallocErrorBreak() crashes, and when I change the LOG(ERROR) to PLOG(ERROR), I get errno 9, EBADF "Bad file descriptor." Full output is like: Chromium Helper(58127,0xa0c69540) malloc: *** mmap(size=2120433664) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug [58127:-1597598400:0917/130240:ERROR:process_util_mac.mm(580)] Terminating process due to a potential for future heap corruption: Bad file descriptor The 12 is the ENOMEM expected. AFAICT, the EBADF is from where malloc.c in _malloc_vprintf() calls _simple_asl_log(), which will call write(-1, junk, junk_len). Kinda hating malloc_error_break() right now. I'm kind of wondering if I'm missing something key, here, because AFAICT this means CrMallocErrorBreak() won't work at all. It's a debug build, would that be different? Any ideas of whether this can even be addressed?
Converted to UncheckedMalloc(). Adding EBADF check allows things to work, though it makes me ill. By "work", I mean that the image in http://crbug.com/103980 ends up as a giant white image (when you click on it), rather than crash, which matches Windows. The failure to allocate is logged on console. On 2012/09/17 20:30:27, shess wrote: > Oooooohhhhhkay. I figured WTF, I should actually, you know, test to see if it > works. In practice, rather than in theory. > > CrMallocErrorBreak() crashes, and when I change the LOG(ERROR) to PLOG(ERROR), I > get errno 9, EBADF "Bad file descriptor." Full output is like: > > Chromium Helper(58127,0xa0c69540) malloc: *** mmap(size=2120433664) failed > (error code=12) > *** error: can't allocate region > *** set a breakpoint in malloc_error_break to debug > [58127:-1597598400:0917/130240:ERROR:process_util_mac.mm(580)] Terminating > process due to a potential for future heap corruption: Bad file descriptor > > The 12 is the ENOMEM expected. AFAICT, the EBADF is from where malloc.c in > _malloc_vprintf() calls _simple_asl_log(), which will call write(-1, junk, > junk_len). > > Kinda hating malloc_error_break() right now. I'm kind of wondering if I'm > missing something key, here, because AFAICT this means CrMallocErrorBreak() > won't work at all. It's a debug build, would that be different? Any ideas of > whether this can even be addressed?
https://codereview.chromium.org/10908245/diff/11002/base/process_util_mac.mm File base/process_util_mac.mm (right): https://codereview.chromium.org/10908245/diff/11002/base/process_util_mac.mm#... base/process_util_mac.mm:572: if (errno == ENOMEM || errno == EBADF) I don’t think this is cool. Can you be more surgical?
https://codereview.chromium.org/10908245/diff/11002/base/process_util_mac.mm File base/process_util_mac.mm (right): https://codereview.chromium.org/10908245/diff/11002/base/process_util_mac.mm#... base/process_util_mac.mm:572: if (errno == ENOMEM || errno == EBADF) On 2012/09/18 00:18:22, Mark Mentovai wrote: > I don’t think this is cool. Can you be more surgical? I don't really think it's cool, either, but ... I also have no idea what the alternative is. Tomorrow I'll review the malloc code to see if there are any other likely errno-setting operations which can lead here, but that's obviously a failure waiting to happen. Presumably there wouldn't be a ton of file-descriptor accesses in the malloc library, outside of logging and debugging code. We could hook szone_error() to pre-empt the problem. That's terrifying in its own way, but it would have the side benefit of allowing us to suppress the logging currently done on NULL pointer return. [I'm away from my Libc checkout right now, so that logging may only happen on debug.] I'm waiting for a rsesek response, though. This started out as "Oh, yeah, we could just <x>", but it's quickly building momentum...
On 2012/09/18 01:55:26, shess wrote: > https://codereview.chromium.org/10908245/diff/11002/base/process_util_mac.mm > File base/process_util_mac.mm (right): > > https://codereview.chromium.org/10908245/diff/11002/base/process_util_mac.mm#... > base/process_util_mac.mm:572: if (errno == ENOMEM || errno == EBADF) > On 2012/09/18 00:18:22, Mark Mentovai wrote: > > I don’t think this is cool. Can you be more surgical? > > I don't really think it's cool, either, but ... I also have no idea what the > alternative is. Tomorrow I'll review the malloc code to see if there are any > other likely errno-setting operations which can lead here, but that's obviously > a failure waiting to happen. Presumably there wouldn't be a ton of > file-descriptor accesses in the malloc library, outside of logging and debugging > code. > > We could hook szone_error() to pre-empt the problem. That's terrifying in its > own way, but it would have the side benefit of allowing us to suppress the > logging currently done on NULL pointer return. [I'm away from my Libc checkout > right now, so that logging may only happen on debug.] > > I'm waiting for a rsesek response, though. This started out as "Oh, yeah, we > could just <x>", but it's quickly building momentum... What specifically are you wanting me to respond to? I looked at hooking szone_error() but that's a bit tricky since it's varargs and has no va_list version. We could also use our own TLS slot instead of errno for controlling when to fatalize CrMallocErrorBreak, which we talked about before. That may suck for performance, though.
On 2012/09/18 21:14:38, rsesek wrote: > On 2012/09/18 01:55:26, shess wrote: > > I'm waiting for a rsesek response, though. This started out as "Oh, yeah, we > > could just <x>", but it's quickly building momentum... > > What specifically are you wanting me to respond to? I am ... befuddled. In the context of ProcessUtilTest.MacMallocFailureDoesNotTerminate, errno is ENOMEM. But in the context of this CL, loading the giant image from issue 103980 into Chromium, it sees EBADF instead of ENOMEM (with "error code=12" logged as noted above, which means it was ENOMEM at that point in the malloc code). AHAHA! It's syslog stuff in Libc's _simple.c. It is sent to a unix-domain socket /var/run/syslog, which isn't accessible in renderers. For base_unittests, there's no problem. --no-sandbox fixes it right up for Chromium. Libc-763.13 (10.7.4) uses a Mach port retrieved via bootstrap port for this logging, so that'll probably be different.
I think it's _probably_ reasonable to allow renderers to log to syslog. I think someone could DoS the system via the stderr logging to the console-logging facility. But maybe it's more scary than that?
nit: agin->against in CL description. https://codereview.chromium.org/10908245/diff/17002/DEPS File DEPS (right): https://codereview.chromium.org/10908245/diff/17002/DEPS#newcode90 DEPS:90: (Var("googlecode_url") % "google-safe-browsing") + "/trunk/testing@111", ? https://codereview.chromium.org/10908245/diff/17002/content/common/common.sb File content/common/common.sb (right): https://codereview.chromium.org/10908245/diff/17002/content/common/common.sb#... content/common/common.sb:41: (remote unix-socket (path-literal "/private/var/run/syslog"))) ; 10.6 Is there a problem not adding this and letting the log messages fall on the floor? Just trying to understand if it's absolutely necessary to poke a hole in the sandbox for this...
On 2012/09/19 04:19:39, jeremy wrote: > nit: agin->against in CL description. "agin" is a perfectly cromulent word. > https://codereview.chromium.org/10908245/diff/17002/DEPS > File DEPS (right): > > https://codereview.chromium.org/10908245/diff/17002/DEPS#newcode90 > DEPS:90: (Var("googlecode_url") % "google-safe-browsing") + > "/trunk/testing@111", > ? Sorry, multi-tasking. Someone thought my sb change had broke sb, I was proving otherwise. > https://codereview.chromium.org/10908245/diff/17002/content/common/common.sb > File content/common/common.sb (right): > > https://codereview.chromium.org/10908245/diff/17002/content/common/common.sb#... > content/common/common.sb:41: (remote unix-socket (path-literal > "/private/var/run/syslog"))) ; 10.6 > Is there a problem not adding this and letting the log messages fall on the > floor? > Just trying to understand if it's absolutely necessary to poke a hole in the > sandbox for this... Justin suggests that we shouldn't allow renderers to write to syslog. I'll go debate that with him tomorrow. The gist of this CL is allowing malloc() to return NULL in a special case. In the course of returning NULL, the malloc library attempts to log to syslog. When it fails, errno changes from ENOMEM to EBADF, and our failure handler has to detect that case in order to allow it through. Obviously, EBADF is a very weird thing to look for in a malloc library! Fortunately for other apps, the actual malloc() implementation checks for NULL return from the lower-level code and resets errno to ENOMEM. So, we could: - allow syslog in the renderer. - check EBADF - wire up an alternative strategy for the error handler. - give up. We took a pass on the alternative strategy long ago, because it would slow things down. But it may be reasonable to slow down this special case call to malloc().
On Wed, Sep 19, 2012 at 7:45 AM, <shess@chromium.org> wrote: > On 2012/09/19 04:19:39, jeremy wrote: > >> nit: agin->against in CL description. >> > > "agin" is a perfectly cromulent word. :) https://codereview.chromium.**org/10908245/diff/17002/** >> content/common/common.sb<https://codereview.chromium.org/10908245/diff/17002/content/common/common.sb> >> File content/common/common.sb (right): >> > > > https://codereview.chromium.**org/10908245/diff/17002/** > content/common/common.sb#**newcode41<https://codereview.chromium.org/10908245/diff/17002/content/common/common.sb#newcode41> > >> content/common/common.sb:41: (remote unix-socket (path-literal >> "/private/var/run/syslog"))) ; 10.6 >> Is there a problem not adding this and letting the log messages fall on >> the >> floor? >> Just trying to understand if it's absolutely necessary to poke a hole in >> the >> sandbox for this... >> > > Justin suggests that we shouldn't allow renderers to write to syslog. > I'll go > debate that with him tomorrow. > > The gist of this CL is allowing malloc() to return NULL in a special case. > In > the course of returning NULL, the malloc library attempts to log to syslog. > When it fails, errno changes from ENOMEM to EBADF, and our failure handler > has > to detect that case in order to allow it through. Obviously, EBADF is a > very > weird thing to look for in a malloc library! Fortunately for other apps, > the > actual malloc() implementation checks for NULL return from the lower-level > code > and resets errno to ENOMEM. > > So, we could: > - allow syslog in the renderer. > - check EBADF > - wire up an alternative strategy for the error handler. > - give up. > > We took a pass on the alternative strategy long ago, because it would slow > things down. But it may be reasonable to slow down this special case call > to > malloc(). > > https://codereview.chromium.**org/10908245/<https://codereview.chromium.org/1... > Is checking EBADF an acceptable solution? I'd personally prefer to keep the sandbox definition as small as possible if it's ok. Otherwise if you and Justin think that poking a hole in the sandbox for this is the way to go then LGTM :)
On 2012/09/19 04:45:49, shess wrote: > So, we could: > - allow syslog in the renderer. -1 I'd prefer to not poke a hole in the sandbox. > - check EBADF +0 Not really sure we can reason about all the failure cases this will/won't hit. > - wire up an alternative strategy for the error handler. ±1 We have two of those, AFAIK: - Use TLS and set a flag that will only be checked if errno!=ENOMEM. That probably won't hit performance too bad, especially because we will likely crash the browser after that. But for large allocations, it will slow things down even more :/. - Do a quick stackwalk to see if we're coming up through UncheckedMalloc. That may not be to bad, considering that'll only be like 3-5 frames away. > - give up. +0.25
> - Do a quick stackwalk to see if we're coming up through UncheckedMalloc. That > may not be to bad, considering that'll only be like 3-5 frames away. This shouldn’t be in your ±1 section, or any other section ≥ 0.
On 2012/09/19 16:56:04, rsesek wrote: > On 2012/09/19 04:45:49, shess wrote: > > So, we could: > > - allow syslog in the renderer. > > -1 I'd prefer to not poke a hole in the sandbox. I think I'm against this, too. About the only thing I can think of along those lines is to have the sandboxing code warm up asl_socket, and use nlist to test for -1 and reset it to /dev/null. But that's super AWESOME. > > - check EBADF > > +0 Not really sure we can reason about all the failure cases this will/won't > hit. I'm actually not disliking this as much today. We're in the corrupt-metadata handler, after all. There is no SLA for this, we shouldn't even be here. All we know is that something poor has happened, but there could have been any number of operations executed between the point where the corruption happened and where it was detected, so it's not a matter of preventing anything bad from happening, it's a matter of how quickly we react. Checking against EBADF might mask some potential crash cases, but most likely we'll just get called again real soon now. We could add a tailored crash path for EBADF to test things in the wild. Hmm, or we could make death_ptr be NULL+errno and get that for free. Anyhow, I think we'd see every CrMallocErrorBreak on 10.7 in the sandbox with errno EBADF. > > - wire up an alternative strategy for the error handler. > > ±1 We have two of those, AFAIK: > - Use TLS and set a flag that will only be checked if errno!=ENOMEM. That > probably won't hit performance too bad, especially because we will likely crash > the browser after that. But for large allocations, it will slow things down even > more :/. Large allocations imply large operations. So something like setting the TLS only for sizes over 1M we add negligible cost. For smaller sizes, all of the arguments about "It just moves the OOM crash forward 50ms" apply. Even with TLS, though then the condition looks like: if (errno == ENOMEM || (errno == EBADF && IsSetTLS()) return; because I doubt we'd want to allow other than EBADF through at that point. > - Do a quick stackwalk to see if we're coming up through UncheckedMalloc. That > may not be to bad, considering that'll only be like 3-5 frames away. We should do a STACKWALK using a REGEX. That would be AWESOME.
On 2012/09/19 18:09:12, shess wrote: > On 2012/09/19 16:56:04, rsesek wrote: > > On 2012/09/19 04:45:49, shess wrote: > > > So, we could: > > > - allow syslog in the renderer. > > > > -1 I'd prefer to not poke a hole in the sandbox. > > I think I'm against this, too. About the only thing I can think of along those > lines is to have the sandboxing code warm up asl_socket, and use nlist to test > for -1 and reset it to /dev/null. But that's super AWESOME. We actually just talked about doing something like this at lunch :-). But -1. > > > > - check EBADF > > > > +0 Not really sure we can reason about all the failure cases this will/won't > > hit. > > I'm actually not disliking this as much today. We're in the corrupt-metadata > handler, after all. There is no SLA for this, we shouldn't even be here. All > we know is that something poor has happened, but there could have been any > number of operations executed between the point where the corruption happened > and where it was detected, so it's not a matter of preventing anything bad from > happening, it's a matter of how quickly we react. Checking against EBADF might > mask some potential crash cases, but most likely we'll just get called again > real soon now. > > We could add a tailored crash path for EBADF to test things in the wild. Hmm, > or we could make death_ptr be NULL+errno and get that for free. Anyhow, I think > we'd see every CrMallocErrorBreak on 10.7 in the sandbox with errno EBADF. +0.5 If we do this, also do the death_ptr+errno because that's cute. I'm sure in this state we'd crash again real soon, but it may not point directly at the smoking gun because we deferred the crash until the next time. CrMallocErrorBreak has been pretty good at pointing at the thing that was wrongly freed/managed. If we deferred, though, we could get these crashes for just some random thing because the heap would now be corrupt. > > > - wire up an alternative strategy for the error handler. > > > > ±1 We have two of those, AFAIK: > > - Use TLS and set a flag that will only be checked if errno!=ENOMEM. That > > probably won't hit performance too bad, especially because we will likely > crash > > the browser after that. But for large allocations, it will slow things down > even > > more :/. > > Large allocations imply large operations. So something like setting the TLS > only for sizes over 1M we add negligible cost. For smaller sizes, all of the > arguments about "It just moves the OOM crash forward 50ms" apply. > > Even with TLS, though then the condition looks like: > > if (errno == ENOMEM || (errno == EBADF && IsSetTLS()) > return; > > because I doubt we'd want to allow other than EBADF through at that point. I think this is the most technically correct solution. So +1 to this. > > - Do a quick stackwalk to see if we're coming up through UncheckedMalloc. That > > may not be to bad, considering that'll only be like 3-5 frames away. > > We should do a STACKWALK using a REGEX. That would be AWESOME. +∞ if you can do it with some extra dynamic memory allocations.
On 2012/09/19 18:20:10, rsesek wrote: > On 2012/09/19 18:09:12, shess wrote: > > On 2012/09/19 16:56:04, rsesek wrote: > > > On 2012/09/19 04:45:49, shess wrote: > > > > So, we could: > > > > - allow syslog in the renderer. > > > > > > -1 I'd prefer to not poke a hole in the sandbox. > > > > I think I'm against this, too. About the only thing I can think of along > those > > lines is to have the sandboxing code warm up asl_socket, and use nlist to test > > for -1 and reset it to /dev/null. But that's super AWESOME. > > We actually just talked about doing something like this at lunch :-). But -1. I just thought of another solution: mach_override() _simple_asl_log() . It works, so I posted the change. AFAICT, the only downside is losing the logging to syslog, but stderr still shows up on Console.app (I cannot recall anyone ever suggesting to look in any syslog logs for Chrome info). Performance-wise I expect this would actually provide a nominal speedup.
Oh no, no, I don’t think so, no, not at all.
On 2012/09/19 19:31:45, Mark Mentovai wrote: > Oh no, no, I don’t think so, no, not at all. dammit!
OK! TLS version. Argument for not gating the TLS on size is that tcmalloc platforms have an explicit lock protecting the malloc() and surrounding set_new_handler() calls, so obviously someone decided that that was an appropriate amount of overhead. But I can go there if desired.
https://codereview.chromium.org/10908245/diff/26001/base/process_util_mac.mm File base/process_util_mac.mm (right): https://codereview.chromium.org/10908245/diff/26001/base/process_util_mac.mm#... base/process_util_mac.mm:605: death_ptr += errno; Oh, so now you trust errno to not be corrupt?
https://codereview.chromium.org/10908245/diff/26001/base/process_util_mac.mm File base/process_util_mac.mm (right): https://codereview.chromium.org/10908245/diff/26001/base/process_util_mac.mm#... base/process_util_mac.mm:605: death_ptr += errno; On 2012/09/19 20:12:33, Mark Mentovai wrote: > Oh, so now you trust errno to not be corrupt? Done. Though at some point we kind of have to assume that it's just corrupt all the way down...
https://codereview.chromium.org/10908245/diff/26001/base/process_util_mac.mm File base/process_util_mac.mm (right): https://codereview.chromium.org/10908245/diff/26001/base/process_util_mac.mm#... base/process_util_mac.mm:605: death_ptr += errno; shess wrote: > On 2012/09/19 20:12:33, Mark Mentovai wrote: > > Oh, so now you trust errno to not be corrupt? > > Done. Though at some point we kind of have to assume that it's just corrupt all > the way down... Yeah, but we absolutely need to crash when we hit this code. If we return from this, we’re totally lame.
I consider this satisfactory. You can treat that as an LGTM if you like, or you can consider the feedback from four or more other reviewers.
LGTM++ This is awesome.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/shess@chromium.org/10908245/16004
What endorsements! Moving forward. Last patch was comment-only, I realized that I had removed the syslog/EBADF commentary, which is perhaps essential to future readers.
No sandbox changes so LGTM too, thanks!
Change committed as 157646 |