|
|
Created:
7 years, 9 months ago by Robert Sesek Modified:
7 years, 9 months ago CC:
chromium-reviews, erikwright+watch_chromium.org, sail+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
Description[Mac] Do not unload base::NativeLibrary-ies if they contain ObjC segments.
Once an ObjC image is loaded into a process, the runtime will place unloadable
data into its caches. If the bundle is unloaded, then ObjC calls could result
in dereferencing memory in an unloaded module.
CFBundle and NSBundle explicitly say to not unload executable bundles
containing ObjC, but we were doing so anyways (oops!).
BUG=172319
TEST=Verify that fixupSelectorsInMethodList crash does not happen anymore.
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=188356
Patch Set 1 #
Total comments: 31
Patch Set 2 : Address comments #
Total comments: 3
Patch Set 3 : Assume true #Patch Set 4 : Revert Scott #Patch Set 5 : Use _dyld functions #Patch Set 6 : dladdr #
Total comments: 6
Patch Set 7 : fin #
Total comments: 2
Patch Set 8 : The never ending review! #
Total comments: 3
Messages
Total messages: 43 (0 generated)
https://codereview.chromium.org/12793004/diff/1/base/native_library.h File base/native_library.h (right): https://codereview.chromium.org/12793004/diff/1/base/native_library.h#newcode1 base/native_library.h:1: // Copyright (c) 2011 The Chromium Authors. All rights reserved. Fix spelling in CL description: > [Mac] Do not unload base::NativeLibary-ies if they contain ObjC segments. to [Mac] Do not unload base::NativeLibrary-ies if they contain ObjC segments. https://codereview.chromium.org/12793004/diff/1/base/native_library.h#newcode2 base/native_library.h:2: // Use of this source code is governed by a BSD-style license that can be Also in the CL description: > TEST=Verify that fixupSelectorsInMethodList does not happen anymore. This doesn’t say that fixupSelectorsInMethodList is a crash signature, but it should. https://codereview.chromium.org/12793004/diff/1/base/native_library.h#newcode17 base/native_library.h:17: #include "base/files/file_path.h" Dunno if I’d have bothered making this OS-conditional, although I see why you did it. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm File base/native_library_mac.mm (right): https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:12: #include <Foundation/Foundation.h> #import me. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:24: // In 32-bit images, ObjC can be recognized in __OBJC.__image_info, whereas It’s usual to separate the segment and section names with commas, I think. (Like “man otool” does.) https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:26: static const char kObjCImageInfo32[] = "__image_info"; Why do you need to check for both flavors? If you’re a 32-bit process, only the 32-bit one is relevant. If you’re a 64-bit process, only the 64-bit one is relevant. You can have one constant here inside an #ifdef and then the code below becomes simpler. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:41: const char* const library_path_ascii = library_path.MaybeAsASCII().c_str(); This is bogus since you know what platform you’re on in this file. Just use library_path.value().c_str(). Less object code that way, too. I’d rename the variable to “library_path_c” in that case. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:42: VLOG(2) << "Looking up image data for " << library_path_ascii; Is this line even needed now? https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:57: if (image_info.imageLoadAddress->cputype == CPU_TYPE_X86_64) { Evaluating this conditional at runtime is stupid (see above). https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:61: SEG_DATA, kObjCImageInfo64); kObjCImageInfo64/32 didn’t need to be defined external to this function after all. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:111: base::FilePath([[executable_url_ns path] UTF8String]); Not UTF8String, but fileSystemRepresentation. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:120: << "an Objective-C segment."; Simplify the object code by coalescing these last two static const char[]s into one. (It’ll all fit on one line.) https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:124: delete library; Why don’t you share the “delete” with the delete below, and get rid of the early return, by flipping the logic? if (!ImageContainsObjectiveC(…)) { // CFBundleCloseBundleResourceMap/CFRelease or dlclose } delete library;
https://codereview.chromium.org/12793004/diff/1/base/native_library.h File base/native_library.h (right): https://codereview.chromium.org/12793004/diff/1/base/native_library.h#newcode1 base/native_library.h:1: // Copyright (c) 2011 The Chromium Authors. All rights reserved. On 2013/03/12 19:19:31, Mark Mentovai wrote: > Fix spelling in CL description: > > > [Mac] Do not unload base::NativeLibary-ies if they contain ObjC segments. > to > [Mac] Do not unload base::NativeLibrary-ies if they contain ObjC segments. heh libary https://codereview.chromium.org/12793004/diff/1/base/native_library.h#newcode2 base/native_library.h:2: // Use of this source code is governed by a BSD-style license that can be On 2013/03/12 19:19:31, Mark Mentovai wrote: > Also in the CL description: > > > TEST=Verify that fixupSelectorsInMethodList does not happen anymore. > > This doesn’t say that fixupSelectorsInMethodList is a crash signature, but it > should. Oops. Done. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm File base/native_library_mac.mm (right): https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:12: #include <Foundation/Foundation.h> On 2013/03/12 19:19:31, Mark Mentovai wrote: > #import me. Done. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:24: // In 32-bit images, ObjC can be recognized in __OBJC.__image_info, whereas On 2013/03/12 19:19:31, Mark Mentovai wrote: > It’s usual to separate the segment and section names with commas, I think. (Like > “man otool” does.) Done. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:26: static const char kObjCImageInfo32[] = "__image_info"; On 2013/03/12 19:19:31, Mark Mentovai wrote: > Why do you need to check for both flavors? If you’re a 32-bit process, only the > 32-bit one is relevant. If you’re a 64-bit process, only the 64-bit one is > relevant. You can have one constant here inside an #ifdef and then the code > below becomes simpler. Done. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:41: const char* const library_path_ascii = library_path.MaybeAsASCII().c_str(); On 2013/03/12 19:19:31, Mark Mentovai wrote: > This is bogus since you know what platform you’re on in this file. Just use > library_path.value().c_str(). Less object code that way, too. I’d rename the > variable to “library_path_c” in that case. Done. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:42: VLOG(2) << "Looking up image data for " << library_path_ascii; On 2013/03/12 19:19:31, Mark Mentovai wrote: > Is this line even needed now? Removed. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:57: if (image_info.imageLoadAddress->cputype == CPU_TYPE_X86_64) { On 2013/03/12 19:19:31, Mark Mentovai wrote: > Evaluating this conditional at runtime is stupid (see above). Done. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:61: SEG_DATA, kObjCImageInfo64); On 2013/03/12 19:19:31, Mark Mentovai wrote: > kObjCImageInfo64/32 didn’t need to be defined external to this function after > all. Done. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:111: base::FilePath([[executable_url_ns path] UTF8String]); On 2013/03/12 19:19:31, Mark Mentovai wrote: > Not UTF8String, but fileSystemRepresentation. Done. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:120: << "an Objective-C segment."; On 2013/03/12 19:19:31, Mark Mentovai wrote: > Simplify the object code by coalescing these last two static const char[]s into > one. (It’ll all fit on one line.) Done. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:124: delete library; On 2013/03/12 19:19:31, Mark Mentovai wrote: > Why don’t you share the “delete” with the delete below, and get rid of the early > return, by flipping the logic? > > if (!ImageContainsObjectiveC(…)) { > // CFBundleCloseBundleResourceMap/CFRelease or dlclose > } > delete library; Done.
LGTM
I agree with the notion of just having cleaner 32-bit and 64-bit versions rather than lots of dynamic fooforah. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm File base/native_library_mac.mm (right): https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:72: return false; AFAICT, in this case we couldn't find the information, so we don't know if it is safe or not. Seems like you need to say "Is definitely safe", "Is definitely unsafe" or "Hellifiknow". Or change to "ImageDoesNotContainObjectiveC()". Also, does this ban all code containing Objective-C, or just code containing class or category definitions? Admittedly, covering all Objective-C code handles the issue, while just hitting classes and categories sounds like enumerating badness (what about protocols? Or other things I'm missing). https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:117: if (ImageContainsObjectiveC(library->image_path)) { Is there any downside to looking this up at Load and adding a "NOUNLOAD" to the type enum?
https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm File base/native_library_mac.mm (right): https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:72: return false; On 2013/03/12 20:10:38, shess wrote: > AFAICT, in this case we couldn't find the information, so we don't know if it is > safe or not. Seems like you need to say "Is definitely safe", "Is definitely > unsafe" or "Hellifiknow". Or change to "ImageDoesNotContainObjectiveC()". I see your point, but what should happen in the HellIfIKnowCase? Is it any different than this? > Also, does this ban all code containing Objective-C, or just code containing > class or category definitions? Admittedly, covering all Objective-C code > handles the issue, while just hitting classes and categories sounds like > enumerating badness (what about protocols? Or other things I'm missing). The docs say any ObjC cannot be unloaded. This is also the check that CFBundle uses to see if an image contains ObjC, so I think this is fine. Better safe than sorry. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:117: if (ImageContainsObjectiveC(library->image_path)) { On 2013/03/12 20:10:38, shess wrote: > Is there any downside to looking this up at Load and adding a "NOUNLOAD" to the > type enum? We think alike ;). But that's not possible because we don't actually load the executable manually - we wait for GetFunctionPointerFromNativeLibrary and let CFBundle do it lazily. Therefore we have to do it at unload, otherwise it's not loaded and we can't get its info.
https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm File base/native_library_mac.mm (right): https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:72: return false; On 2013/03/12 20:19:43, rsesek wrote: > On 2013/03/12 20:10:38, shess wrote: > > AFAICT, in this case we couldn't find the information, so we don't know if it > is > > safe or not. Seems like you need to say "Is definitely safe", "Is definitely > > unsafe" or "Hellifiknow". Or change to "ImageDoesNotContainObjectiveC()". > > I see your point, but what should happen in the HellIfIKnowCase? Is it any > different than this? Am I reading it wrong? In the HellIfIKnow case, this code cannot conclusively determine if it is safe to unload the library, but since it returns false the code will optimistically attempt it. I think an argument could be made that in that case it should just be leaked. Well, either leaked, or it should be a CHECK() so that we can figure out WTF is happening to these users. > > Also, does this ban all code containing Objective-C, or just code containing > > class or category definitions? Admittedly, covering all Objective-C code > > handles the issue, while just hitting classes and categories sounds like > > enumerating badness (what about protocols? Or other things I'm missing). > > The docs say any ObjC cannot be unloaded. This is also the check that CFBundle > uses to see if an image contains ObjC, so I think this is fine. Better safe than > sorry. OK, that seems reasonable. I think it would be fine to allow Objective-C calls only, though obviously that is a fine line if someone leaps in to do crazy manual method overrides. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... base/native_library_mac.mm:117: if (ImageContainsObjectiveC(library->image_path)) { On 2013/03/12 20:19:43, rsesek wrote: > On 2013/03/12 20:10:38, shess wrote: > > Is there any downside to looking this up at Load and adding a "NOUNLOAD" to > the > > type enum? > > We think alike ;). But that's not possible because we don't actually load the > executable manually - we wait for GetFunctionPointerFromNativeLibrary and let > CFBundle do it lazily. Therefore we have to do it at unload, otherwise it's not > loaded and we can't get its info. Sigh. Is there any case which differs from: Line 1: Lazy-load the library. Line 2: Force the library to be loaded. ??? I hate lazy, it just means all of your errors are in the future, after all of your error-checking is already done. It should be non-lazy unless you explicitly ask for it. https://codereview.chromium.org/12793004/diff/4001/base/native_library_mac.mm File base/native_library_mac.mm (right): https://codereview.chromium.org/12793004/diff/4001/base/native_library_mac.mm... base/native_library_mac.mm:27: task_dyld_info_data_t task_dyld_info; I'm having a mental-model issue with this. Usually I'd expect to see a vm_deallocate() for large structures the kernel returns. Is all_image_info_addr some direct kernel structure? How the heck is it safe to iterate, given that task_t could be some other process?
https://codereview.chromium.org/12793004/diff/4001/base/native_library_mac.mm File base/native_library_mac.mm (right): https://codereview.chromium.org/12793004/diff/4001/base/native_library_mac.mm... base/native_library_mac.mm:27: task_dyld_info_data_t task_dyld_info; shess wrote: > I'm having a mental-model issue with this. Usually I'd expect to see a > vm_deallocate() for large structures the kernel returns. Is all_image_info_addr > some direct kernel structure? How the heck is it safe to iterate, given that > task_t could be some other process? task_dyld_info contains pointers into task’s address space. If you’re in a different task, you need to call mach_vm_read to get at its memory (and then obviously mach_vm_deallocate it when you’re done with it). If you’re the task, you can take advantage of the fact that a pointer in your address space is, um, a valid pointer to your own address space.
https://codereview.chromium.org/12793004/diff/4001/base/native_library_mac.mm File base/native_library_mac.mm (right): https://codereview.chromium.org/12793004/diff/4001/base/native_library_mac.mm... base/native_library_mac.mm:27: task_dyld_info_data_t task_dyld_info; On 2013/03/12 20:51:03, Mark Mentovai wrote: > shess wrote: > > I'm having a mental-model issue with this. Usually I'd expect to see a > > vm_deallocate() for large structures the kernel returns. Is > all_image_info_addr > > some direct kernel structure? How the heck is it safe to iterate, given that > > task_t could be some other process? > > task_dyld_info contains pointers into task’s address space. If you’re in a > different task, you need to call mach_vm_read to get at its memory (and then > obviously mach_vm_deallocate it when you’re done with it). If you’re the task, > you can take advantage of the fact that a pointer in your address space is, um, > a valid pointer to your own address space. So, is this thread safe? Perhaps implicitly, because load/unload can only happen on the main thread?
On 2013/03/12 20:43:53, shess wrote: > https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm > File base/native_library_mac.mm (right): > > https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... > base/native_library_mac.mm:72: return false; > On 2013/03/12 20:19:43, rsesek wrote: > > On 2013/03/12 20:10:38, shess wrote: > > > AFAICT, in this case we couldn't find the information, so we don't know if > it > > is > > > safe or not. Seems like you need to say "Is definitely safe", "Is > definitely > > > unsafe" or "Hellifiknow". Or change to "ImageDoesNotContainObjectiveC()". > > > > I see your point, but what should happen in the HellIfIKnowCase? Is it any > > different than this? > > Am I reading it wrong? In the HellIfIKnow case, this code cannot conclusively > determine if it is safe to unload the library, but since it returns false the > code will optimistically attempt it. I think an argument could be made that in > that case it should just be leaked. > > Well, either leaked, or it should be a CHECK() so that we can figure out WTF is > happening to these users. It does sound smart to assume ObjC in the unknown case. So done that. I did notice that CoreAudio in the browser process triggers this vlog, so that's something to look into. I don't think we should CHECK that, though. https://codereview.chromium.org/12793004/diff/1/base/native_library_mac.mm#ne... > base/native_library_mac.mm:117: if > (ImageContainsObjectiveC(library->image_path)) { > On 2013/03/12 20:19:43, rsesek wrote: > > On 2013/03/12 20:10:38, shess wrote: > > > Is there any downside to looking this up at Load and adding a "NOUNLOAD" to > > the > > > type enum? > > > > We think alike ;). But that's not possible because we don't actually load the > > executable manually - we wait for GetFunctionPointerFromNativeLibrary and let > > CFBundle do it lazily. Therefore we have to do it at unload, otherwise it's > not > > loaded and we can't get its info. > > Sigh. Is there any case which differs from: > Line 1: Lazy-load the library. > Line 2: Force the library to be loaded. > ??? > > I hate lazy, it just means all of your errors are in the future, after all of > your error-checking is already done. It should be non-lazy unless you > explicitly ask for it. This is the Apple-recommended way to load a CFBundle. They explicitly recommend against calling CFBundleLoadExecutable.
No, it’s not thread-safe at all.
+avi for content/ to revert Scott's change
lgtm okey dokey
On 2013/03/13 16:30:24, Avi wrote: > lgtm > > okey dokey Thanks. Scott: am I clear to land?
LGTM, since Mark seemed nonchalant about the thread-safety issue...
Well, the other thing you could do is find the mach_header directly from a void* (from dlopen) or a CFBundleRef. You don’t need to hold a dyld lock for either of those to work, as long as you’re sure someone’s holding a ref, which is safe because you are. But you’ll be attacking the problem by digging into more internals rather than using a sorta defined API.
On 2013/03/13 16:45:40, Mark Mentovai wrote: > Well, the other thing you could do is find the mach_header directly from a void* > (from dlopen) or a CFBundleRef. You don’t need to hold a dyld lock for either of > those to work, as long as you’re sure someone’s holding a ref, which is safe > because you are. But you’ll be attacking the problem by digging into more > internals rather than using a sorta defined API. Yeah, I was actually going to suggest that, but AFAICT it would mean separate code paths. Though you could dlopen() the image path in case of bundle, I guess. But then I realized that I couldn't figure out WTF symbols one would be looking for in that case.
Here’s what it’s gonna take. 1. Instead of asking the kernel for dyld info, you can use call _dyld_image_count, and loop calling _dyld_get_image_name, and use _dyld_get_image_header to find the mach_header. <mach-o/dyld.h>. This doesn’t make it thread safe, but it saves a system call each time you need to do this whole thing. 2. Based on that, it looks like the recommended (and thread-safe) interface to get dyld to do what you want is to call dladdr, asking it for an address that lies somewhere in the module you’re interested in. But you might not have that. 3. You can use the dyld notifier functions to specify callbacks that will be called when an image is loaded or unloaded: _dyld_register_func_for_add_image and _dyld_register_func_for_remove_image. The callbacks will receive mach_header* arguments. You can tie that back to the image’s filename using dladdr as in (2) above. When you add a notifier, it’s supposed to be called for all of the images that are already loaded, so there is a way to bootstrap your map properly even though it’s impossible for you to call _dyld_register_func_for_add_image before any modules have been loaded. This is thread-safe for notifications that arrive as images are loaded (the callback is called under dyld’s lock), but looks like it might not be thread-safe for the initial batch of calls to your “add” callback. 4. Using (3), you can build a map of image filenames to mach_header pointers. Or, relevant to the “why be lazy?” discussion from a few review e-mails ago, you can use the mach_header to look for Objective-C at image load time and do something intelligent with that, rather than deferring it to unload time. (But doing this for all images that have already loaded might suck.) 5. To answer Scott’s question about not knowing what to make of a dlopen void* or a CFBundleRef, you can get the void* from a CFBundleRef by poking into its internals: you usually want its _handleCookie field, but I guess in some cases it’d be _moduleCookie or _imageCookie. (CF-744.12/CFBundle.c). Once you have a void*, you clear the low two bits (which are used to stash options like RTLD_FIRST), treat it as an ImageLoaderMachO* (dyld-210.2.3/src/ImageLoaderMachO.h), and look at its fMachOData field. That’s the mach_header. But poking at structure internals like this, there’s no promise that any of it will be stable from CF version to CF version, or dyld version to dyld version. That sucks. What do you wanna do about this?
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/rsesek@chromium.org/12793004/16001
I don't know why the CQ waited so long to pick this up… On 2013/03/13 18:45:22, Mark Mentovai wrote: > Here’s what it’s gonna take. > > 1. Instead of asking the kernel for dyld info, you can use call > _dyld_image_count, and loop calling _dyld_get_image_name, and use > _dyld_get_image_header to find the mach_header. <mach-o/dyld.h>. This doesn’t > make it thread safe, but it saves a system call each time you need to do this > whole thing. This seems OKish. > 2. Based on that, it looks like the recommended (and thread-safe) interface to > get dyld to do what you want is to call dladdr, asking it for an address that > lies somewhere in the module you’re interested in. But you might not have that. We don't know the address of symbols in modules with this API, so that doesn't seem workable. > 3. You can use the dyld notifier functions to specify callbacks that will be > called when an image is loaded or unloaded: _dyld_register_func_for_add_image > and _dyld_register_func_for_remove_image. The callbacks will receive > mach_header* arguments. You can tie that back to the image’s filename using > dladdr as in (2) above. When you add a notifier, it’s supposed to be called for > all of the images that are already loaded, so there is a way to bootstrap your > map properly even though it’s impossible for you to call > _dyld_register_func_for_add_image before any modules have been loaded. This is > thread-safe for notifications that arrive as images are loaded (the callback is > called under dyld’s lock), but looks like it might not be thread-safe for the > initial batch of calls to your “add” callback. I don't know how I feel about this. > 4. Using (3), you can build a map of image filenames to mach_header pointers. > Or, relevant to the “why be lazy?” discussion from a few review e-mails ago, you > can use the mach_header to look for Objective-C at image load time and do > something intelligent with that, rather than deferring it to unload time. (But > doing this for all images that have already loaded might suck.) Is this worth it? How big of an issue is this thread safety going to be in practice? > 5. To answer Scott’s question about not knowing what to make of a dlopen void* > or a CFBundleRef, you can get the void* from a CFBundleRef by poking into its > internals: you usually want its _handleCookie field, but I guess in some cases > it’d be _moduleCookie or _imageCookie. (CF-744.12/CFBundle.c). Once you have a > void*, you clear the low two bits (which are used to stash options like > RTLD_FIRST), treat it as an ImageLoaderMachO* > (dyld-210.2.3/src/ImageLoaderMachO.h), and look at its fMachOData field. That’s > the mach_header. But poking at structure internals like this, there’s no promise > that any of it will be stable from CF version to CF version, or dyld version to > dyld version. That sucks. Yeah, I thought about poking this initially. But this is heinous. > What do you wanna do about this? I dunno. You tell me.
On 2013/03/13 18:45:22, Mark Mentovai wrote: > What do you wanna do about this? I'm already curled up under my desk crying. Who even calls this code? Can we assert that they only do it on the UI thread or something? Hmm, I guess we can't control random system stuff like qtkit. In the end I'm willing to just go for it, what could go wrong? Does this imply that using some other task_t has a race condition between when you get the result and when you map the buffer over to your address space? It seems possible/likely that this only represents a problem if other threads are unloading modules while this is inspecting the header. Loading probably adds to the end. Unless it moves the data structure around, in which case doom. But this gets back to "Who even calls this code?" If it only happens in plug-in startup or something, there's little to no scope for background threads to screw things up. WRT 5), dlopen() ref-counts, so just dlopen() the file again and dlclose() when done: "A second call to dlopen() with the same path will return the same handle, but the internal reference count for the handle will be incremented. Therefore all dlopen() calls should be balanced with a dlclose() call." --- Of course, there's no reason why we need to inspect runtime data structures. I can see this information using otool. I'd rather not write a Mach-O reader for this, but doing so presents no thread-safety issues at all, and does't involve insane dyld or CFBundle internals knowledge. I'm not suggesting that we do this, really. Like I said, it seems pretty unlikely to go wrong if we just leave it as-is.
shess@chromium.org wrote: > I'm already curled up under my desk crying. > That’s what I was going for. Who even calls this code? Can we assert that they only do it on the UI > thread > or something? Hmm, I guess we can't control random system stuff like > qtkit. Right. In the end I'm willing to just go for it, what could go wrong? > We’d decide that some module has Objective-C when it really doesn’t, and we’d avoid unloading it, and maybe it had finalizers that it really wanted to run. Not too compelling. OK, we’d decide that some module doesn’t have Objective-C when it really does, and we’d unload it, and then crash. Or we’d run off the end of a shrunken vector and start dereferencing deallocated memory and crash. Or the vector’s backing memory would get reallocated and its old memory would be reassigned and we’d start reading that and crash. In other words, not too bad, the usual thread safety concerns. Does this imply that using some other task_t has a race condition between > when > you get the result and when you map the buffer over to your address space? You’re supposed to either explicitly suspend the victim task before you ask the kernel for TASK_DYLD_INFO, or be working from a context where it’s already implicitly suspended (like in an EXC_CRASH handler). It seems possible/likely that this only represents a problem if other > threads > are unloading modules while this is inspecting the header. Loading > probably > adds to the end. Unless it moves the data structure around, in which case > doom. > But this gets back to "Who even calls this code?" If it only happens in > plug-in startup or something, there's little to no scope for background > threads > to screw things up. > System frameworks load stuff dynamically at runtime. Things happen on threads. “We don’t know but it could happen.” I’m sure it does happen. Sometimes. Maybe not enough to hurt us, but it happens. I LGTMed this already because it sucks but it’s hard to do properly. I’ve proposed a solution that handles this in a thread-safe way but it’s more work and the callbacks are a pain and it might just be plain hateful. I’ve also proposed something that just gets rid of the system call and deals with the entire problem in user space with exactly the same thread safety concerns. We should at least do that one. I also have another option, 💩, in which we wait for someone to call dlsym. Once they’ve done that successfully, we have an address to something within the module, and we can call dladdr to get its mach_header and use that to figure out Objective-C-ness. If nobody ever has a successful call to dlsym, we can just treat it as having had Objective-C, or treat it as not having had any Objective-C, or whatever. This is totally thread-safe, but risks calling it wrong when there’s no successful call to dlsym. For the purposes of this 💩 option, dlsym really means GetFunctionPointerFromNativeLibrary, which is either dlsym or CFBundleGetFunctionPointerForName. The final option, ♋, is to just allays refuse to dlclose things we’ve dlopened via this interface. (But what about those finalizers that really want to run? There’s probably some module that gives a 💩.)
Well, objc-load.mm says: * objc_loadModule. * * NOTE: Loading isn't really thread safe. awesome. I'm still LGTM, I don't think we're going to resolve it just now.
On 2013/03/13 20:05:51, shess wrote: > Well, objc-load.mm says: > * objc_loadModule. > * > * NOTE: Loading isn't really thread safe. > > awesome. > > I'm still LGTM, I don't think we're going to resolve it just now. Sometimes your brain just won't let go, and you end up with: https://codereview.chromium.org/12827003 it was really awesome until I realized I'd have to copy in the dyld_priv.h bit. Sigh.
I've switched to the _dyld functions to avoid the system call (your #1). Thinking about, though, I'm not concerned about the thread safety, and here's why (but tell me if I'm wrong): we're specifically concerned with libraries that have been loaded using this base::NativeLibrary API, which means that the library in question won't be unloaded outside this API. And since we're *only* looking for that one library, it doesn't matter if the list of other loaded modules changes. I'm also not sure what the thread safety considerations imply for how you should iterate: do you cache the count? reverse iterate?
I’m answering the thread-safety question that was asked. I’m not advocating any particular solution. Although you’re only concerned with finding modules that have been loaded through our interface and we can impose arbitrary limitations on the interface, you’re talking to dyld to look through its list of loaded modules. There’s no stopping someone on a different thread—code we may not even control—from asking dyld to load or unload a module. It doesn’t matter that we’re looking for a module that we loaded, we have to look through dyld’s process-global list of all modules to find it. Let’s say the module list initially has four modules in it: A, B, C, and D. You’re looking for C. You loop through and find A and B, neither of which are it. Then some other thread unloads B. The list now has three modules: A, C, and D. You look in the next slot and find D. Even if you consulted the count again and didn’t run off the end of the list, you missed finding your module, C. In practice, with these APIs, even consulting the count again doesn’t protect you from running off the end of the list because a module could be unloaded after you recheck the count. It doesn’t really help to walk the list backwards. Say you’re looking for B. You look at D, and it’s not a match. Then another thread unloads A and C. When you get scheduled again, you look at the third slot, which isn’t valid at all anymore. In reality, dyld is using a vector, and _dyld_get_image_header doesn’t do any sort of locking to pin the vector down, so it’s possible for _dyld_get_image_header to do the wrong thing if another thread winds up in a dyld operation that grows the vector. The “wrong thing” could be a crash in dyld (if, say, the vector’s old memory wound up unmapped) or returning totally bogus data to you that would likely cause you to crash (if, say, the vector’s old memory was allocated to something else that filled it with its own data).
Patch set 5 LGTM as a non-thread-safe solution. For a thread-safe solution, of all of the options we’ve discussed, I think that waiting for someone to call dlsym and using the result to get the image header via dladdr would be cleanest.
On 2013/03/14 14:41:33, Mark Mentovai wrote: > Patch set 5 LGTM as a non-thread-safe solution. Understood about the array stability of dyld's module list. And just confirming (based on what I read in xnu/osfmk/kern/task.c) that the task_info() version in patch set 4 has the exact same problem, since the kernel only locks the task for the duration of the system call. The dyld version is just slightly faster. > For a thread-safe solution, of all of the options we’ve discussed, I think that > waiting for someone to call dlsym and using the result to get the image header > via dladdr would be cleanest. Agreed it would be the cleanest, but I think it could still cause problems unless we assumed that all libraries had ObjC until we verified that they didn't. Otherwise, you could LoadNativeLibrary, instantiate ObjC classes without using GetFunctionPointerFromNativeLibrary, and then close the library. If we didn't assume the library had ObjC, we'd try to unload it and have this problem again.
rsesek@chromium.org wrote: > Understood about the array stability of dyld's module list. And just > confirming > (based on what I read in xnu/osfmk/kern/task.c) that the task_info() > version in > patch set 4 has the exact same problem, since the kernel only locks the > task for > the duration of the system call. The dyld version is just slightly faster. Patch sets 4 and 5 are exactly as thread-safe as one another, and behave identically. Avoiding system call overhead is faster. The task_lock you found in the kernel is an in-kernel lock only. The only thing it’s protecting is the integrity of the in-kernel task structures. It doesn’t have any impact on user space at all. When one thread in a process calls task_info, other threads in the process aren’t suspended while the kernel takes the task_lock in the task_info thread. unless we assumed that all libraries had ObjC until we verified that they > didn't. Otherwise, you could LoadNativeLibrary, instantiate ObjC classes > without > using GetFunctionPointerFromNativeLi**brary, and then close the library. > If we > didn't assume the library had ObjC, we'd try to unload it and have this > problem > again. > Of these two options, which is more likely, and which do we have better control over? Which can we audit and fix? Which can’t we do anything about? 1. dlsym/dladdr. For this to break, some code has to use the base::NativeLibrary interface to load a module, not look up any symbols in it, use Objective-C classes from it, and close the module. Since it’s using base::NativeLibrary, it has to be Chrome code. Why would it load a module but not look up any symbols? If it was a bundle, you said CFBundle loads the module lazily, so the Objective-C wouldn’t even be present until someone tried looking up a symbol. Even so, why would it use Objective-C and then try to close the module? If it’s using the library for Objective-C but not looking up symbols, why would it be using the base::NativeLibrary interface at all instead of using NSBundle or something else directly? There are some pretty weird assumptions that have to hold in order for this to break, and we can audit and fix them. 2. Walk the module list non-thread-safely. For this to break, some code has to use dyld on any thread at any time. This can be any code, it doesn’t have to be Chrome code. It can be system code or code from some other module we’ve loaded. These are perfectly normal and reasonable things to assume are happening, and we can’t do anything to audit or fix them. Taking the example of the problem module in this bug: who’s loading it? dlopen or CFBundle? Are we looking up any symbols? Where’s the Objective-C coming from?
On 2013/03/14 16:09:22, Mark Mentovai wrote: > unless we assumed that all libraries had ObjC until we verified that they > > didn't. Otherwise, you could LoadNativeLibrary, instantiate ObjC classes > > without > > using GetFunctionPointerFromNativeLi**brary, and then close the library. > > If we > > didn't assume the library had ObjC, we'd try to unload it and have this > > problem > > again. > > > > Of these two options, which is more likely, and which do we have better > control over? Which can we audit and fix? Which can’t we do anything about? > > 1. dlsym/dladdr. For this to break, some code has to use the > base::NativeLibrary interface to load a module, not look up any symbols in > it, use Objective-C classes from it, and close the module. Since it’s using > base::NativeLibrary, it has to be Chrome code. Why would it load a module > but not look up any symbols? If it was a bundle, you said CFBundle loads > the module lazily, so the Objective-C wouldn’t even be present until > someone tried looking up a symbol. Even so, why would it use Objective-C > and then try to close the module? If it’s using the library for Objective-C > but not looking up symbols, why would it be using the base::NativeLibrary > interface at all instead of using NSBundle or something else directly? > There are some pretty weird assumptions that have to hold in order for this > to break, and we can audit and fix them. The only time I could see this happening is for importing a ObjC FutureCat framework. But you're right that there are other system APIs to load those bundles, and we'd be able to audit them.. I just wanted to point out a possible sidestepping of this mechanism. > 2. Walk the module list non-thread-safely. For this to break, some code has > to use dyld on any thread at any time. This can be any code, it doesn’t > have to be Chrome code. It can be system code or code from some other > module we’ve loaded. These are perfectly normal and reasonable things to > assume are happening, and we can’t do anything to audit or fix them. > > Taking the example of the problem module in this bug: who’s loading it? > dlopen or CFBundle? Are we looking up any symbols? Where’s the Objective-C > coming from? I think (1) is a lot less likely than (2) and we'd be able to audit the cases in (1) more easily.
I don't like checking in known thread-unsafe things. Let's just do the dladdr thing now. PTAL.
LGTM https://codereview.chromium.org/12793004/diff/46001/base/native_library_mac.mm File base/native_library_mac.mm (right): https://codereview.chromium.org/12793004/diff/46001/base/native_library_mac.m... base/native_library_mac.mm:85: "Not unloading NativeLibrary because it contains an ObjC segment."; This message isn’t strictly correct if the status is still OBJC_UNKNOWN. https://codereview.chromium.org/12793004/diff/46001/base/native_library_mac.m... base/native_library_mac.mm:111: if (library->objc_status == OBJC_UNKNOWN) { && function_pointer
https://codereview.chromium.org/12793004/diff/46001/base/native_library_mac.mm File base/native_library_mac.mm (right): https://codereview.chromium.org/12793004/diff/46001/base/native_library_mac.m... base/native_library_mac.mm:85: "Not unloading NativeLibrary because it contains an ObjC segment."; On 2013/03/14 18:22:36, Mark Mentovai wrote: > This message isn’t strictly correct if the status is still OBJC_UNKNOWN. Done. https://codereview.chromium.org/12793004/diff/46001/base/native_library_mac.m... base/native_library_mac.mm:111: if (library->objc_status == OBJC_UNKNOWN) { On 2013/03/14 18:22:36, Mark Mentovai wrote: > && function_pointer Done.
I like this better, and it doesn't feel any more crazy. The other approach was nominally more supported, but it wasn't like it had good documentation or examples or anything to support using it without having read various internal stuff anyhow. https://codereview.chromium.org/12793004/diff/46001/base/native_library_mac.mm File base/native_library_mac.mm (right): https://codereview.chromium.org/12793004/diff/46001/base/native_library_mac.m... base/native_library_mac.mm:115: reinterpret_cast<const struct mach_header*>(info.dli_fbase); I think that you should pass in a void* (or whatever) and have the 32-bit and 64-bit versions each have a custom crazy undocumented cast, rather than having the cast in two places. Heck, pass function_pointer in and let the test code handle all the crazy.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/rsesek@chromium.org/12793004/50001
https://chromiumcodereview.appspot.com/12793004/diff/50001/base/native_librar... File base/native_library_mac.mm (right): https://chromiumcodereview.appspot.com/12793004/diff/50001/base/native_librar... base/native_library_mac.mm:85: << "segment. library->objc_status = " << library->objc_status; Remove the first <<. Let char[] constants do their job. Simplify the object code.
This review is the gift that keeps on giving. Speak now or forever hold your peace. https://chromiumcodereview.appspot.com/12793004/diff/46001/base/native_librar... File base/native_library_mac.mm (right): https://chromiumcodereview.appspot.com/12793004/diff/46001/base/native_librar... base/native_library_mac.mm:115: reinterpret_cast<const struct mach_header*>(info.dli_fbase); On 2013/03/14 18:30:39, shess wrote: > I think that you should pass in a void* (or whatever) and have the 32-bit and > 64-bit versions each have a custom crazy undocumented cast, rather than having > the cast in two places. > > Heck, pass function_pointer in and let the test code handle all the crazy. Done. https://chromiumcodereview.appspot.com/12793004/diff/50001/base/native_librar... File base/native_library_mac.mm (right): https://chromiumcodereview.appspot.com/12793004/diff/50001/base/native_librar... base/native_library_mac.mm:85: << "segment. library->objc_status = " << library->objc_status; On 2013/03/14 18:32:43, Mark Mentovai wrote: > Remove the first <<. Let char[] constants do their job. Simplify the object > code. Done.
rsesek@chromium.org wrote: > This review is the gift that keeps on giving. Speak now or forever hold > your > peace. > PHRASING! lgtm
lgtm Tests? :-). LGTM. https://chromiumcodereview.appspot.com/12793004/diff/64001/base/native_librar... File base/native_library_mac.mm (right): https://chromiumcodereview.appspot.com/12793004/diff/64001/base/native_librar... base/native_library_mac.mm:21: const void* function_pointer) { Is void* even a valid assignment for a function pointer? According to section 703.2 of the 1999 ... blah blah blah. https://chromiumcodereview.appspot.com/12793004/diff/64001/base/native_librar... base/native_library_mac.mm:38: reinterpret_cast<const struct mach_header*>(info.dli_fbase), Perfect! Symmetric grodiness.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/rsesek@chromium.org/12793004/64001
https://chromiumcodereview.appspot.com/12793004/diff/64001/base/native_librar... File base/native_library_mac.mm (right): https://chromiumcodereview.appspot.com/12793004/diff/64001/base/native_librar... base/native_library_mac.mm:21: const void* function_pointer) { shess wrote: > Is void* even a valid assignment for a function pointer? According to section > 703.2 of the 1999 ... blah blah blah. It’s not valid, but it’s probably only relevant on a strict Harvard architecture. On a modified Harvard architecture machine (as is the case here) or a von Neumann architecture, it’s generally fine. And in this case, we’re treating it as a data pointer anyway. Is that what you were looking for? ☻💩
On 2013/03/14 21:17:55, Mark Mentovai wrote: > https://chromiumcodereview.appspot.com/12793004/diff/64001/base/native_librar... > File base/native_library_mac.mm (right): > > https://chromiumcodereview.appspot.com/12793004/diff/64001/base/native_librar... > base/native_library_mac.mm:21: const void* function_pointer) { > shess wrote: > > Is void* even a valid assignment for a function pointer? According to section > > 703.2 of the 1999 ... blah blah blah. > > It’s not valid, but it’s probably only relevant on a strict Harvard > architecture. On a modified Harvard architecture machine (as is the case here) > or a von Neumann architecture, it’s generally fine. And in this case, we’re > treating it as a data pointer anyway. > > Is that what you were looking for? ☻💩 Well, and dlsym() and CFBundleGetFunctionPointerForName() are defined to return void* anyhow. So even if Chromium were perfectly pedantic on this with some sort of funcptr_t type, it wouldn't matter, the battle is already lost. Wouldn't it be great if template expansion gets to the point where it actually makes sense to have an architecture where Chromium's code is mapped int a space larger than the heap?
Message was sent while issue was closed.
Change committed as 188356 |