Issue 10908245: unchecked_malloc() for Skia on OSX.

Issue 10908245: unchecked_malloc() for Skia on OSX. (Closed)

Created:
8 years, 3 months ago by Scott Hess - ex-Googler

Modified:
8 years, 3 months ago

Reviewers:
Robert Sesek, Mark Mentovai, Avi (use Gerrit), jeremy, playmobil

CC:
chromium-reviews, erikwright+watch_chromium.org

Base URL:
svn://svn.chromium.org/chrome/trunk/src

Visibility:
Public.

More Reviews

Description

unchecked_malloc() for Skia on OSX. On tcmalloc-based versions of Chromium, skia resets the global new handler to prevent the oom-killer from kicking in. Traditionally the OSX developers have been agin this, because it's not thread-safe, and we have questions about whether it's even the right thing to do. This change should be thread-safe because g_old_malloc is only set on the main thread at startup. As far as whether it's the right thing to do, I'm tired and want to quit hearing about it. Additionally, remove a bit of leftover pre-10.6 code. BUG=103980, 73751, 117949 Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=157646

Patch Set 1 #

Total comments: 4

Patch Set 2 : Oops, GetPurgeableZone() was exported for tests. #

Total comments: 3

Patch Set 3 : Assume purgeable zone always present. Drop another couple 10.5 things. Light IWYU cleanup. #

Patch Set 4 : -> unchecked_malloc #

Total comments: 3

Patch Set 5 : -> UncheckedMalloc and EBADF #

Total comments: 2

Patch Set 6 : Back out EBADF, allow syslog logging from sandbox. #

Total comments: 2

Patch Set 7 : Back out sandbox changes (and accidental safe-browsing changes), override _simple_asl_log(). #

Patch Set 8 : Encode errno into death_ptr. #

Patch Set 9 : Back out _simple_asl_log() override, use TLS to flag EBADF check. #

Total comments: 3

Patch Set 10 : Protect against crazy errno #

Patch Set 11 : Put EBADF/syslog commentary back. #

Created: 8 years, 3 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+101 lines, -82 lines)			Patch
M	base/process_util.h	View	1 2 3 4	2 chunks	+14 lines, -5 lines	0 comments	Download
M	base/process_util_mac.mm	View	1 2 3 4 5 6 7 8 9 10	5 chunks	+48 lines, -19 lines	0 comments	Download
M	base/process_util_unittest.cc	View	1 2	1 chunk	+33 lines, -58 lines	0 comments	Download
M	skia/ext/SkMemory_new_handler.cpp	View	1 2 3 4	2 chunks	+6 lines, -0 lines	0 comments	Download

Messages

Total messages: 43 (0 generated)

Expand Messages | Collapse Messages

Avi (use Gerrit)

The design works so far, but it doesn't handle cases like crbug.com/42342 where we'd need ...

8 years, 3 months ago (2012-09-13 19:45:54 UTC) #2

Scott Hess - ex-Googler

On 2012/09/13 19:45:54, Avi wrote: > The design works so far, but it doesn't handle ...

8 years, 3 months ago (2012-09-13 20:04:00 UTC) #3

Avi (use Gerrit)

The WebKit name for this is TryMalloc, though I'm not a fan of that. And ...

8 years, 3 months ago (2012-09-13 20:21:17 UTC) #4

Avi (use Gerrit)

https://codereview.chromium.org/10908245/diff/9001/base/process_util_unittest.cc File base/process_util_unittest.cc (right): https://codereview.chromium.org/10908245/diff/9001/base/process_util_unittest.cc#newcode1122 base/process_util_unittest.cc:1122: if (zone) Do we need these if calls? The ...

8 years, 3 months ago (2012-09-13 20:21:24 UTC) #5

Scott Hess - ex-Googler

On 2012/09/13 20:21:17, Avi wrote: > The WebKit name for this is TryMalloc, though I'm ...

8 years, 3 months ago (2012-09-13 21:15:47 UTC) #6

Avi (use Gerrit)

Scott, I won't LG this with the word "safe" in the name of the function. ...

8 years, 3 months ago (2012-09-13 21:30:48 UTC) #7

Scott Hess - ex-Googler

On 2012/09/13 21:15:47, shess wrote: > On 2012/09/13 20:21:17, Avi wrote: > > Do we ...

8 years, 3 months ago (2012-09-13 22:00:30 UTC) #8

Avi (use Gerrit)

OK, so that's not worth it. Never mind then about that.

8 years, 3 months ago (2012-09-13 22:02:51 UTC) #9

Scott Hess - ex-Googler

On 2012/09/13 21:30:48, Avi wrote: > Scott, I won't LG this with the word "safe" ...

8 years, 3 months ago (2012-09-13 22:06:55 UTC) #10

Avi (use Gerrit)

On 2012/09/13 22:06:55, shess wrote: > - unsafe_malloc() > - unchecked_malloc() I kinda like "unchecked_malloc", ...

8 years, 3 months ago (2012-09-13 22:09:17 UTC) #11

Scott Hess - ex-Googler

Looping in mark@ for OWNERS, and another shot at the bike shed. [man, the autocomplete ...

8 years, 3 months ago (2012-09-13 22:28:23 UTC) #12

Robert Sesek

LGTM https://codereview.chromium.org/10908245/diff/14001/base/process_util.h File base/process_util.h (right): https://codereview.chromium.org/10908245/diff/14001/base/process_util.h#newcode865 base/process_util.h:865: BASE_EXPORT void* unchecked_malloc(size_t size); I only wonder if ...

8 years, 3 months ago (2012-09-13 23:18:15 UTC) #14

Mark Mentovai

https://codereview.chromium.org/10908245/diff/14001/base/process_util.h File base/process_util.h (right): https://codereview.chromium.org/10908245/diff/14001/base/process_util.h#newcode865 base/process_util.h:865: BASE_EXPORT void* unchecked_malloc(size_t size); UncheckedMalloc. Or UnsafeUncheckedMalloc. https://codereview.chromium.org/10908245/diff/14001/base/process_util_mac.mm File ...

8 years, 3 months ago (2012-09-17 18:28:11 UTC) #16

Scott Hess - ex-Googler

Oooooohhhhhkay. I figured WTF, I should actually, you know, test to see if it works. ...

8 years, 3 months ago (2012-09-17 20:30:27 UTC) #17

Scott Hess - ex-Googler

Converted to UncheckedMalloc(). Adding EBADF check allows things to work, though it makes me ill. ...

8 years, 3 months ago (2012-09-17 23:49:56 UTC) #18

Mark Mentovai

https://codereview.chromium.org/10908245/diff/11002/base/process_util_mac.mm File base/process_util_mac.mm (right): https://codereview.chromium.org/10908245/diff/11002/base/process_util_mac.mm#newcode572 base/process_util_mac.mm:572: if (errno == ENOMEM || errno == EBADF) I ...

8 years, 3 months ago (2012-09-18 00:18:22 UTC) #19

Scott Hess - ex-Googler

8 years, 3 months ago (2012-09-18 01:55:26 UTC) #20

Robert Sesek

On 2012/09/18 01:55:26, shess wrote: > https://codereview.chromium.org/10908245/diff/11002/base/process_util_mac.mm > File base/process_util_mac.mm (right): > > https://codereview.chromium.org/10908245/diff/11002/base/process_util_mac.mm#newcode572 > ...

8 years, 3 months ago (2012-09-18 21:14:38 UTC) #21

Scott Hess - ex-Googler

On 2012/09/18 21:14:38, rsesek wrote: > On 2012/09/18 01:55:26, shess wrote: > > I'm waiting ...

8 years, 3 months ago (2012-09-18 23:31:16 UTC) #22

Scott Hess - ex-Googler

I think it's _probably_ reasonable to allow renderers to log to syslog. I think someone ...

8 years, 3 months ago (2012-09-18 23:33:25 UTC) #23

jeremy

nit: agin->against in CL description. https://codereview.chromium.org/10908245/diff/17002/DEPS File DEPS (right): https://codereview.chromium.org/10908245/diff/17002/DEPS#newcode90 DEPS:90: (Var("googlecode_url") % "google-safe-browsing") + ...

8 years, 3 months ago (2012-09-19 04:19:39 UTC) #24

Scott Hess - ex-Googler

On 2012/09/19 04:19:39, jeremy wrote: > nit: agin->against in CL description. "agin" is a perfectly ...

8 years, 3 months ago (2012-09-19 04:45:49 UTC) #25

playmobil

On Wed, Sep 19, 2012 at 7:45 AM, <shess@chromium.org> wrote: > On 2012/09/19 04:19:39, jeremy ...

8 years, 3 months ago (2012-09-19 04:55:55 UTC) #26

On Wed, Sep 19, 2012 at 7:45 AM, <shess@chromium.org> wrote:

> On 2012/09/19 04:19:39, jeremy wrote:
>
>> nit: agin->against in CL description.
>>
>
> "agin" is a perfectly cromulent word.

:)

https://codereview.chromium.**org/10908245/diff/17002/**
>>
content/common/common.sb<https://codereview.chromium.org/10908245/diff/17002/content/common/common.sb>
>> File content/common/common.sb (right):
>>
>
>
> https://codereview.chromium.**org/10908245/diff/17002/**
>
content/common/common.sb#**newcode41<https://codereview.chromium.org/10908245/diff/17002/content/common/common.sb#newcode41>
>
>> content/common/common.sb:41: (remote unix-socket (path-literal
>> "/private/var/run/syslog")))  ; 10.6
>> Is there a problem not adding this and letting the log messages fall on
>> the
>> floor?
>> Just trying to understand if it's absolutely necessary to poke a hole in
>> the
>> sandbox for this...
>>
>
> Justin suggests that we shouldn't allow renderers to write to syslog.
>  I'll go
> debate that with him tomorrow.
>
> The gist of this CL is allowing malloc() to return NULL in a special case.
>  In
> the course of returning NULL, the malloc library attempts to log to syslog.
> When it fails, errno changes from ENOMEM to EBADF, and our failure handler
> has
> to detect that case in order to allow it through.  Obviously, EBADF is a
> very
> weird thing to look for in a malloc library!  Fortunately for other apps,
> the
> actual malloc() implementation checks for NULL return from the lower-level
> code
> and resets errno to ENOMEM.
>
> So, we could:
>  - allow syslog in the renderer.
>  - check EBADF
>  - wire up an alternative strategy for the error handler.
>  - give up.
>
> We took a pass on the alternative strategy long ago, because it would slow
> things down.  But it may be reasonable to slow down this special case call
> to
> malloc().
>
>
https://codereview.chromium.**org/10908245/<https://codereview.chromium.org/1...
>

Is checking EBADF an acceptable solution? I'd personally prefer to keep the
sandbox definition as small as possible if it's ok.
Otherwise if you and Justin think that poking a hole in the sandbox for
this is the way to go then LGTM :)

Robert Sesek

On 2012/09/19 04:45:49, shess wrote: > So, we could: > - allow syslog in the ...

8 years, 3 months ago (2012-09-19 16:56:04 UTC) #27

Mark Mentovai

> - Do a quick stackwalk to see if we're coming up through UncheckedMalloc. That ...

8 years, 3 months ago (2012-09-19 16:59:39 UTC) #28

Scott Hess - ex-Googler

On 2012/09/19 16:56:04, rsesek wrote: > On 2012/09/19 04:45:49, shess wrote: > > So, we ...

8 years, 3 months ago (2012-09-19 18:09:12 UTC) #29

On 2012/09/19 16:56:04, rsesek wrote:
> On 2012/09/19 04:45:49, shess wrote:
> > So, we could:
> >  - allow syslog in the renderer.
> 
> -1 I'd prefer to not poke a hole in the sandbox.

I think I'm against this, too.  About the only thing I can think of along those
lines is to have the sandboxing code warm up asl_socket, and use nlist to test
for -1 and reset it to /dev/null.  But that's super AWESOME.

> >  - check EBADF
> 
> +0 Not really sure we can reason about all the failure cases this will/won't
> hit.

I'm actually not disliking this as much today.  We're in the corrupt-metadata
handler, after all.  There is no SLA for this, we shouldn't even be here.  All
we know is that something poor has happened, but there could have been any
number of operations executed between the point where the corruption happened
and where it was detected, so it's not a matter of preventing anything bad from
happening, it's a matter of how quickly we react.  Checking against EBADF might
mask some potential crash cases, but most likely we'll just get called again
real soon now.

We could add a tailored crash path for EBADF to test things in the wild.  Hmm,
or we could make death_ptr be NULL+errno and get that for free.  Anyhow, I think
we'd see every CrMallocErrorBreak on 10.7 in the sandbox with errno EBADF.

> >  - wire up an alternative strategy for the error handler.
> 
> ±1 We have two of those, AFAIK:
> - Use TLS and set a flag that will only be checked if errno!=ENOMEM. That
> probably won't hit performance too bad, especially because we will likely
crash
> the browser after that. But for large allocations, it will slow things down
even
> more :/.

Large allocations imply large operations.  So something like setting the TLS
only for sizes over 1M we add negligible cost.  For smaller sizes, all of the
arguments about "It just moves the OOM crash forward 50ms" apply.

Even with TLS, though then the condition looks like:

  if (errno == ENOMEM || (errno == EBADF && IsSetTLS())
    return;

because I doubt we'd want to allow other than EBADF through at that point.

> - Do a quick stackwalk to see if we're coming up through UncheckedMalloc. That
> may not be to bad, considering that'll only be like 3-5 frames away.

We should do a STACKWALK using a REGEX.  That would be AWESOME.

Robert Sesek

On 2012/09/19 18:09:12, shess wrote: > On 2012/09/19 16:56:04, rsesek wrote: > > On 2012/09/19 ...

8 years, 3 months ago (2012-09-19 18:20:10 UTC) #30

On 2012/09/19 18:09:12, shess wrote:
> On 2012/09/19 16:56:04, rsesek wrote:
> > On 2012/09/19 04:45:49, shess wrote:
> > > So, we could:
> > >  - allow syslog in the renderer.
> > 
> > -1 I'd prefer to not poke a hole in the sandbox.
> 
> I think I'm against this, too.  About the only thing I can think of along
those
> lines is to have the sandboxing code warm up asl_socket, and use nlist to test
> for -1 and reset it to /dev/null.  But that's super AWESOME.

We actually just talked about doing something like this at lunch :-). But -1.

> 
> > >  - check EBADF
> > 
> > +0 Not really sure we can reason about all the failure cases this will/won't
> > hit.
> 
> I'm actually not disliking this as much today.  We're in the corrupt-metadata
> handler, after all.  There is no SLA for this, we shouldn't even be here.  All
> we know is that something poor has happened, but there could have been any
> number of operations executed between the point where the corruption happened
> and where it was detected, so it's not a matter of preventing anything bad
from
> happening, it's a matter of how quickly we react.  Checking against EBADF
might
> mask some potential crash cases, but most likely we'll just get called again
> real soon now.
> 
> We could add a tailored crash path for EBADF to test things in the wild.  Hmm,
> or we could make death_ptr be NULL+errno and get that for free.  Anyhow, I
think
> we'd see every CrMallocErrorBreak on 10.7 in the sandbox with errno EBADF.

+0.5 If we do this, also do the death_ptr+errno because that's cute. I'm sure in
this state we'd crash again real soon, but it may not point directly at the
smoking gun because we deferred the crash until the next time.
CrMallocErrorBreak has been pretty good at pointing at the thing that was
wrongly freed/managed. If we deferred, though, we could get these crashes for
just some random thing because the heap would now be corrupt.

> > >  - wire up an alternative strategy for the error handler.
> > 
> > ±1 We have two of those, AFAIK:
> > - Use TLS and set a flag that will only be checked if errno!=ENOMEM. That
> > probably won't hit performance too bad, especially because we will likely
> crash
> > the browser after that. But for large allocations, it will slow things down
> even
> > more :/.
> 
> Large allocations imply large operations.  So something like setting the TLS
> only for sizes over 1M we add negligible cost.  For smaller sizes, all of the
> arguments about "It just moves the OOM crash forward 50ms" apply.
> 
> Even with TLS, though then the condition looks like:
> 
>   if (errno == ENOMEM || (errno == EBADF && IsSetTLS())
>     return;
> 
> because I doubt we'd want to allow other than EBADF through at that point.

I think this is the most technically correct solution. So +1 to this.

> > - Do a quick stackwalk to see if we're coming up through UncheckedMalloc.
That
> > may not be to bad, considering that'll only be like 3-5 frames away.
> 
> We should do a STACKWALK using a REGEX.  That would be AWESOME.

+∞ if you can do it with some extra dynamic memory allocations.

Scott Hess - ex-Googler

On 2012/09/19 18:20:10, rsesek wrote: > On 2012/09/19 18:09:12, shess wrote: > > On 2012/09/19 ...

8 years, 3 months ago (2012-09-19 19:30:03 UTC) #31

Scott Hess - ex-Googler

On 2012/09/19 19:31:45, Mark Mentovai wrote: > Oh no, no, I don’t think so, no, ...

8 years, 3 months ago (2012-09-19 19:35:42 UTC) #33

Scott Hess - ex-Googler

OK! TLS version. Argument for not gating the TLS on size is that tcmalloc platforms ...

8 years, 3 months ago (2012-09-19 20:04:34 UTC) #34

Mark Mentovai

https://codereview.chromium.org/10908245/diff/26001/base/process_util_mac.mm File base/process_util_mac.mm (right): https://codereview.chromium.org/10908245/diff/26001/base/process_util_mac.mm#newcode605 base/process_util_mac.mm:605: death_ptr += errno; Oh, so now you trust errno ...

8 years, 3 months ago (2012-09-19 20:12:33 UTC) #35

Scott Hess - ex-Googler

8 years, 3 months ago (2012-09-19 20:27:50 UTC) #36

Mark Mentovai

8 years, 3 months ago (2012-09-19 20:34:06 UTC) #37

Mark Mentovai

I consider this satisfactory. You can treat that as an LGTM if you like, or ...

8 years, 3 months ago (2012-09-19 20:42:26 UTC) #38

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/shess@chromium.org/10908245/16004

8 years, 3 months ago (2012-09-19 20:56:31 UTC) #40

Scott Hess - ex-Googler

What endorsements! Moving forward. Last patch was comment-only, I realized that I had removed the ...

8 years, 3 months ago (2012-09-19 20:57:02 UTC) #41

Change committed as 157646

Expand Messages | Collapse Messages