|
|
Created:
3 years, 10 months ago by Avi (use Gerrit) Modified:
3 years, 8 months ago CC:
chromium-reviews, mac-reviews_chromium.org, vmpstr+watch_chromium.org Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
DescriptionNativeStackSampler implementation for Mac.
Stack sampling on Mac implemented for StackSamplingProfiler to provide
Chrome stack trace data from Mac users to UMA.
BUG=531673
Review-Url: https://codereview.chromium.org/2702463003
Cr-Commit-Position: refs/heads/master@{#461145}
Committed: https://chromium.googlesource.com/chromium/src/+/965a47b7737c6f15238447d8d8fb7105183acc8a
Patch Set 1 #Patch Set 2 : rev #Patch Set 3 : 0u #
Total comments: 19
Patch Set 4 : Mike #Patch Set 5 : fix #
Total comments: 25
Patch Set 6 : mark #Patch Set 7 : config #Patch Set 8 : fix for test #Patch Set 9 : rev 2 #Patch Set 10 : ios fix? #Patch Set 11 : guess not re ios #
Total comments: 10
Patch Set 12 : mike #
Total comments: 2
Patch Set 13 : uintptr_t #Patch Set 14 : ios build fix; don't run tests on ios #
Total comments: 54
Patch Set 15 : mark/robert #Patch Set 16 : fix #Patch Set 17 : fix config #
Total comments: 6
Patch Set 18 : rev #
Total comments: 7
Patch Set 19 : rev #Patch Set 20 : . #
Total comments: 8
Patch Set 21 : rev #
Messages
Total messages: 181 (116 generated)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) chromeos_amd64-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) linux_chromium_tsan_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
avi@chromium.org changed reviewers: + mark@chromium.org, rsesek@chromium.org, wittman@chromium.org
Mike: For integration into the stack sampler world. There are a few cleanups to the Windows side that I might do, inspired by my CL here. Mark & Robert: Mac stuff, for your amusement and review. https://codereview.chromium.org/2702463003/diff/40001/base/BUILD.gn File base/BUILD.gn (right): https://codereview.chromium.org/2702463003/diff/40001/base/BUILD.gn#newcode2317 base/BUILD.gn:2317: data_deps += [ ":base_profiler_test_support_library" ] How does this work without this change? A normal "deps" means that you're hard-linking a requirement to this support library, which on the Mac at least means that you can't unload it. https://codereview.chromium.org/2702463003/diff/40001/base/profiler/stack_sam... File base/profiler/stack_sampling_profiler_unittest.cc (right): https://codereview.chromium.org/2702463003/diff/40001/base/profiler/stack_sam... base/profiler/stack_sampling_profiler_unittest.cc:554: // This is the same case as |wait_until_unloaded|. FYI this isn't the same as above. Above is: EXPECT_EQ(2, sample.frames.end() - end_frame) whereas this takes one off the end and still expects it to be 2.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
https://codereview.chromium.org/2702463003/diff/40001/base/BUILD.gn File base/BUILD.gn (right): https://codereview.chromium.org/2702463003/diff/40001/base/BUILD.gn#newcode2317 base/BUILD.gn:2317: data_deps += [ ":base_profiler_test_support_library" ] On 2017/02/16 06:11:15, Avi wrote: > How does this work without this change? A normal "deps" means that you're > hard-linking a requirement to this support library, which on the Mac at least > means that you can't unload it. Apparently there's no corresponding restriction on Windows. I just verified that the module gets unloaded in the test. https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:58: // Fills |state| with |target_thread|'s context. Should we have deadlock warnings analogous to the ones in the Windows code on this function, CopyStackAndRewritePointers, and SuspendThreadAndRecordStack? https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:156: step_result = unw_step(&unwind_cursor); General questions: How is unwinding of leaf functions handled on Mac; do leaf functions have unwind information? If they don't, does unw_step properly handle them? If the libunwind implementation assumes that the current thread is unwinding itself then I could see it never having to support the leaf function case. https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:479: auto profile_module_index = &profile_module_index_; Can we add a sample->frames.reserve() call here to minimize the number of memory allocations? (Similar to https://cs.chromium.org/chromium/src/base/profiler/native_stack_sampler_win.c...) https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:480: WalkStack(thread_context, [sample, current_modules, Nice, using lambdas makes for a much cleaner solution. https://codereview.chromium.org/2702463003/diff/40001/base/profiler/stack_sam... File base/profiler/stack_sampling_profiler_unittest.cc (right): https://codereview.chromium.org/2702463003/diff/40001/base/profiler/stack_sam... base/profiler/stack_sampling_profiler_unittest.cc:554: // This is the same case as |wait_until_unloaded|. On 2017/02/16 06:11:15, Avi wrote: > FYI this isn't the same as above. > > Above is: > > EXPECT_EQ(2, sample.frames.end() - end_frame) > > whereas this takes one off the end and still expects it to be 2. Good catch. This should be just "if (sample.frames.end() - end_frame == 2)". Can you make the change? Something must have changed in the test execution or the system libraries since this was implemented, because I remember being forced to address it due to flaky failures when it was originally implemented. I'm guessing the inconsistency was due to an update to the above condition that didn't get propagated here and was never noticed because the path isn't being triggered.
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-xco...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-xco...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-device-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-xcode-...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-xco...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-xcode-...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-xco...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-device-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-xcode-...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-xco...) mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Well, the trybots are pretty broken, but here you go for a new version. https://codereview.chromium.org/2702463003/diff/40001/base/BUILD.gn File base/BUILD.gn (right): https://codereview.chromium.org/2702463003/diff/40001/base/BUILD.gn#newcode2317 base/BUILD.gn:2317: data_deps += [ ":base_profiler_test_support_library" ] On 2017/02/16 21:51:34, Mike Wittman wrote: > On 2017/02/16 06:11:15, Avi wrote: > > How does this work without this change? A normal "deps" means that you're > > hard-linking a requirement to this support library, which on the Mac at least > > means that you can't unload it. > > Apparently there's no corresponding restriction on Windows. I just verified that > the module gets unloaded in the test. :\ This is technically more correct, as the test intends to have an independent library being loaded and unloaded, rather than unloading one that was automatically loaded. I'm planning on leaving this in, even if it works today as is. https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:58: // Fills |state| with |target_thread|'s context. On 2017/02/16 21:51:34, Mike Wittman wrote: > Should we have deadlock warnings analogous to the ones in the Windows code on > this function, CopyStackAndRewritePointers, and SuspendThreadAndRecordStack? Done. https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:156: step_result = unw_step(&unwind_cursor); On 2017/02/16 21:51:34, Mike Wittman wrote: > General questions: How is unwinding of leaf functions handled on Mac; do leaf > functions have unwind information? If they don't, does unw_step properly handle > them? > > If the libunwind implementation assumes that the current thread is unwinding > itself then I could see it never having to support the leaf function case. I believe the compiler puts in full debug info even for leaf functions, but I think Mark would be able to comment here better than I can. https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:479: auto profile_module_index = &profile_module_index_; On 2017/02/16 21:51:34, Mike Wittman wrote: > Can we add a sample->frames.reserve() call here to minimize the number of memory > allocations? (Similar to > https://cs.chromium.org/chromium/src/base/profiler/native_stack_sampler_win.c...) Ah! Done. https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:480: WalkStack(thread_context, [sample, current_modules, On 2017/02/16 21:51:34, Mike Wittman wrote: > Nice, using lambdas makes for a much cleaner solution. Acknowledged. https://codereview.chromium.org/2702463003/diff/40001/base/profiler/stack_sam... File base/profiler/stack_sampling_profiler_unittest.cc (right): https://codereview.chromium.org/2702463003/diff/40001/base/profiler/stack_sam... base/profiler/stack_sampling_profiler_unittest.cc:554: // This is the same case as |wait_until_unloaded|. On 2017/02/16 21:51:34, Mike Wittman wrote: > On 2017/02/16 06:11:15, Avi wrote: > > FYI this isn't the same as above. > > > > Above is: > > > > EXPECT_EQ(2, sample.frames.end() - end_frame) > > > > whereas this takes one off the end and still expects it to be 2. > > Good catch. This should be just "if (sample.frames.end() - end_frame == 2)". Can > you make the change? > > Something must have changed in the test execution or the system libraries since > this was implemented, because I remember being forced to address it due to flaky > failures when it was originally implemented. I'm guessing the inconsistency was > due to an update to the above condition that didn't get propagated here and was > never noticed because the path isn't being triggered. Spinning off into a separate review.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
This is only partial. https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:156: step_result = unw_step(&unwind_cursor); Avi wrote: > On 2017/02/16 21:51:34, Mike Wittman wrote: > > General questions: How is unwinding of leaf functions handled on Mac; do leaf > > functions have unwind information? If they don't, does unw_step properly > handle > > them? > > > > If the libunwind implementation assumes that the current thread is unwinding > > itself then I could see it never having to support the leaf function case. > > I believe the compiler puts in full debug info even for leaf functions, but I > think Mark would be able to comment here better than I can. It can always be suppressed, stripped, or never provided in the first place as in the case of custom asm without the provision of CFI directives like the syscall wrappers. But yes, normally, even leaf functions get unwind info. https://codereview.chromium.org/2702463003/diff/80001/base/BUILD.gn File base/BUILD.gn (right): https://codereview.chromium.org/2702463003/diff/80001/base/BUILD.gn#newcode1828 base/BUILD.gn:1828: shared_library("base_profiler_test_support_library") { I believe you want this to be a loadable_module and not a shared_library. Dependents don’t link against loadable modules (as opposed to shared libraries), so that you can be in charge of loading them yourself. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:33: // Darwin. ? It’s never going to not be the double-underscore versions in the 64-bit environment. Don’t define your own, just go with x86_thread_state64_t. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:200: // file followed by the instruction pointer. Coincidentally, the first 17 “register file plus instruction pointer” is weird because the instruction pointer is part of the register file. x86_thread_state64_t is actually a more complete (but still incomplete) snapshot of the register file than unw_context_t. Also, the ordering of fields is arbitrary in both unw_context_t and x86_thread_state64_t (the CPU doesn’t expose any native structure for this stuff unlike floating-point state, it’s all arbitrary in software), so this part of the comment isn’t helpful. You should just say that the first 17 fields of each structure are identical. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:225: uint32_t SwapIfBig32(uint32_t x, bool swap) { You don’t need any of this swapping stuff. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:234: off_t GetMach64HeaderOffset(const void* module_addr) { You don’t need any of this fat stuff either. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:277: const load_command* current_cmd = reinterpret_cast<const load_command*>( This loop needs to be cognizant of not exceeding header->sizeofcmds. It’s not enough to just loop up to header->ncmds. I just reviewed a patch for Erik this week that’s yet another Mach-O load command walker, you should take a look and perhaps share code but definitely share ideas. We’ve got more of these than we really should have in Chrome as it is. His was https://codereview.chromium.org/2674973004. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:281: const uuid_command* uuid_cmd = Also need to check that current_cmd->cmdsize is at least big enough for a uuid_cmd. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:285: memcpy(id, &uuid_cmd->uuid, sizeof(uuid_t)); (if you were swapping, and you’re not, this would need to be swapped too.) https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:296: bool GetUUID(const void* module_addr, unsigned char* id) { (this is the only caller of the swappy and fatty functions) https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:311: std::string GetUniqueId(const void* module_addr) { (this is the only caller of the caller of the swappy and fatty functions) https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:332: reinterpret_cast<uintptr_t>(inf.dli_fbase), GetUniqueId(inf.dli_fbase), (this is the only caller of the caller of the caller of the swappy and fatty functions). Here’s the deal. All of the modules you’re looking at have been loaded by dyld (or, in the case of the main executable, the kernel, but the kernel did the same thing and told dyld about it). A process has only one bitness, and dyld will never load a wrong-bitted module. dyld will certainly never load a wrong-endian module. Even if the file that hosted a module was fat and contained multiple architectures, dyld (or the kernel) will only load the single slice relevant to the process. It maps based on the instructions in the segment load commands, which aren’t permitted to exceed the boundaries of the containing slice. The fat header is never mapped. No part of any slice belonging to another architecture is ever mapped. If it’s loaded and dyld knows about it, it’s guaranteed to be MH_MAGIC or MH_MAGIC_64 (same-endian), and it’s guaranteed to always be one or the other for a specific bitness, and since Chrome only supports 64-bit x86 now, it’s actually guaranteed to always be MH_MAGIC_64. So if you want, you can check that dli_fbase casted to a mach_header_64* has a magic field that says MH_MAGIC_64, but you don’t need to account for swaps or other bitnesses or fat headers at all. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:520: #if defined(__i386__) You can just make this a compile-time #error. And you can be stricter: !defined(__x86_64__).
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
(my review, I mean.)
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2702463003/diff/40001/base/BUILD.gn File base/BUILD.gn (right): https://codereview.chromium.org/2702463003/diff/40001/base/BUILD.gn#newcode2317 base/BUILD.gn:2317: data_deps += [ ":base_profiler_test_support_library" ] On 2017/02/17 03:41:09, Avi wrote: > On 2017/02/16 21:51:34, Mike Wittman wrote: > > On 2017/02/16 06:11:15, Avi wrote: > > > How does this work without this change? A normal "deps" means that you're > > > hard-linking a requirement to this support library, which on the Mac at > least > > > means that you can't unload it. > > > > Apparently there's no corresponding restriction on Windows. I just verified > that > > the module gets unloaded in the test. > > :\ > > This is technically more correct, as the test intends to have an independent > library being loaded and unloaded, rather than unloading one that was > automatically loaded. I'm planning on leaving this in, even if it works today as > is. I'm fine with this; my comment was just intended to provide context for what's happening on Windows. https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:156: step_result = unw_step(&unwind_cursor); On 2017/02/17 05:21:05, Mark Mentovai wrote: > Avi wrote: > > On 2017/02/16 21:51:34, Mike Wittman wrote: > > > General questions: How is unwinding of leaf functions handled on Mac; do > leaf > > > functions have unwind information? If they don't, does unw_step properly > > handle > > > them? > > > > > > If the libunwind implementation assumes that the current thread is unwinding > > > itself then I could see it never having to support the leaf function case. > > > > I believe the compiler puts in full debug info even for leaf functions, but I > > think Mark would be able to comment here better than I can. > > It can always be suppressed, stripped, or never provided in the first place as > in the case of custom asm without the provision of CFI directives like the > syscall wrappers. But yes, normally, even leaf functions get unwind info. Can we verify the behavior by running the profiler locally within a Chrome release build and observing if we consistently get complete stacks? The Windows profiler sees about 5% of stacks terminating in leaf functions, so if there are any consistent issues they should be detectable. Startup sampling within Chrome can be enabled within StackSamplingConfiguration.
https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:156: step_result = unw_step(&unwind_cursor); On 2017/02/17 17:09:15, Mike Wittman wrote: > On 2017/02/17 05:21:05, Mark Mentovai wrote: > > Avi wrote: > > > On 2017/02/16 21:51:34, Mike Wittman wrote: > > > > General questions: How is unwinding of leaf functions handled on Mac; do > > leaf > > > > functions have unwind information? If they don't, does unw_step properly > > > handle > > > > them? > > > > > > > > If the libunwind implementation assumes that the current thread is > unwinding > > > > itself then I could see it never having to support the leaf function case. > > > > > > I believe the compiler puts in full debug info even for leaf functions, but > I > > > think Mark would be able to comment here better than I can. > > > > It can always be suppressed, stripped, or never provided in the first place as > > in the case of custom asm without the provision of CFI directives like the > > syscall wrappers. But yes, normally, even leaf functions get unwind info. > > Can we verify the behavior by running the profiler locally within a Chrome > release build and observing if we consistently get complete stacks? The Windows > profiler sees about 5% of stacks terminating in leaf functions, so if there are > any consistent issues they should be detectable. > > Startup sampling within Chrome can be enabled within StackSamplingConfiguration. a) Didn't catch that the stack_sampling_configuration.cc had another ifdef windows that I need to fix b) Is there a simple way to flip the switch on and look at the results on my local build? I haven't looked deeply at SSC. https://codereview.chromium.org/2702463003/diff/80001/base/BUILD.gn File base/BUILD.gn (right): https://codereview.chromium.org/2702463003/diff/80001/base/BUILD.gn#newcode1828 base/BUILD.gn:1828: shared_library("base_profiler_test_support_library") { On 2017/02/17 05:21:05, Mark Mentovai wrote: > I believe you want this to be a loadable_module and not a shared_library. > Dependents don’t link against loadable modules (as opposed to shared libraries), > so that you can be in charge of loading them yourself. Done. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:33: // Darwin. On 2017/02/17 05:21:05, Mark Mentovai wrote: > ? > > It’s never going to not be the double-underscore versions in the 64-bit > environment. Don’t define your own, just go with x86_thread_state64_t. Done. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:200: // file followed by the instruction pointer. Coincidentally, the first 17 On 2017/02/17 05:21:05, Mark Mentovai wrote: > “register file plus instruction pointer” is weird because the instruction > pointer is part of the register file. x86_thread_state64_t is actually a more > complete (but still incomplete) snapshot of the register file than > unw_context_t. Also, the ordering of fields is arbitrary in both unw_context_t > and x86_thread_state64_t (the CPU doesn’t expose any native structure for this > stuff unlike floating-point state, it’s all arbitrary in software), so this part > of the comment isn’t helpful. You should just say that the first 17 fields of > each structure are identical. Done. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:225: uint32_t SwapIfBig32(uint32_t x, bool swap) { On 2017/02/17 05:21:05, Mark Mentovai wrote: > You don’t need any of this swapping stuff. Acknowledged. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:234: off_t GetMach64HeaderOffset(const void* module_addr) { On 2017/02/17 05:21:05, Mark Mentovai wrote: > You don’t need any of this fat stuff either. Acknowledged. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:277: const load_command* current_cmd = reinterpret_cast<const load_command*>( On 2017/02/17 05:21:05, Mark Mentovai wrote: > This loop needs to be cognizant of not exceeding header->sizeofcmds. It’s not > enough to just loop up to header->ncmds. > > I just reviewed a patch for Erik this week that’s yet another Mach-O load > command walker, you should take a look and perhaps share code but definitely > share ideas. We’ve got more of these than we really should have in Chrome as it > is. His was https://codereview.chromium.org/2674973004. Acknowledged. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:281: const uuid_command* uuid_cmd = On 2017/02/17 05:21:05, Mark Mentovai wrote: > Also need to check that current_cmd->cmdsize is at least big enough for a > uuid_cmd. Done. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:285: memcpy(id, &uuid_cmd->uuid, sizeof(uuid_t)); On 2017/02/17 05:21:05, Mark Mentovai wrote: > (if you were swapping, and you’re not, this would need to be swapped too.) Acknowledged. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:296: bool GetUUID(const void* module_addr, unsigned char* id) { On 2017/02/17 05:21:05, Mark Mentovai wrote: > (this is the only caller of the swappy and fatty functions) Acknowledged. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:311: std::string GetUniqueId(const void* module_addr) { On 2017/02/17 05:21:05, Mark Mentovai wrote: > (this is the only caller of the caller of the swappy and fatty functions) Acknowledged. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:332: reinterpret_cast<uintptr_t>(inf.dli_fbase), GetUniqueId(inf.dli_fbase), On 2017/02/17 05:21:05, Mark Mentovai wrote: > (this is the only caller of the caller of the caller of the swappy and fatty > functions). > > Here’s the deal. All of the modules you’re looking at have been loaded by dyld > (or, in the case of the main executable, the kernel, but the kernel did the same > thing and told dyld about it). A process has only one bitness, and dyld will > never load a wrong-bitted module. dyld will certainly never load a wrong-endian > module. Even if the file that hosted a module was fat and contained multiple > architectures, dyld (or the kernel) will only load the single slice relevant to > the process. It maps based on the instructions in the segment load commands, > which aren’t permitted to exceed the boundaries of the containing slice. The fat > header is never mapped. No part of any slice belonging to another architecture > is ever mapped. > > If it’s loaded and dyld knows about it, it’s guaranteed to be MH_MAGIC or > MH_MAGIC_64 (same-endian), and it’s guaranteed to always be one or the other for > a specific bitness, and since Chrome only supports 64-bit x86 now, it’s actually > guaranteed to always be MH_MAGIC_64. So if you want, you can check that > dli_fbase casted to a mach_header_64* has a magic field that says MH_MAGIC_64, > but you don’t need to account for swaps or other bitnesses or fat headers at > all. Done. https://codereview.chromium.org/2702463003/diff/80001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:520: #if defined(__i386__) On 2017/02/17 05:21:05, Mark Mentovai wrote: > You can just make this a compile-time #error. And you can be stricter: > !defined(__x86_64__). Re compile error: the _posix version of this file returns a nullptr, and the interface seems to specify that, so I'm going in that direction.
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/40001/base/profiler/native_st... base/profiler/native_stack_sampler_mac.cc:156: step_result = unw_step(&unwind_cursor); On 2017/02/17 17:18:12, Avi wrote: > a) Didn't catch that the stack_sampling_configuration.cc had another ifdef > windows that I need to fix That's the only other thing necessary to enable for startup profiling. I'd enable at 100% for UNKNOWN channel in this CL to get coverage on developer and trybot builds. I recommend doing an incremental roll out for canary and dev in case there are any unexpected issues. (On Windows we ran into a bunch of problems due to third party injected code and other things like copying from guard pages. It sounds like the risks are lower on Mac, but better safe than sorry.) > b) Is there a simple way to flip the switch on and look at the results on my > local build? I haven't looked deeply at SSC. Changing the ifdef in IsProfilerSupported should enable execution at 10Hz for 30 seconds. Results are sent to CallStackProfileMetricsProvider::ReceiveCompletedProfiles.
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2702463003/diff/80001/base/BUILD.gn File base/BUILD.gn (right): https://codereview.chromium.org/2702463003/diff/80001/base/BUILD.gn#newcode1828 base/BUILD.gn:1828: shared_library("base_profiler_test_support_library") { On 2017/02/17 17:18:12, Avi wrote: > On 2017/02/17 05:21:05, Mark Mentovai wrote: > > I believe you want this to be a loadable_module and not a shared_library. > > Dependents don’t link against loadable modules (as opposed to shared > libraries), > > so that you can be in charge of loading them yourself. > > Done. This results in an .so file, which isn't dynamically openable by LoadLibrary. Reverting just this part.
Avi wrote: > https://codereview.chromium.org/2702463003/diff/80001/base/BUILD.gn > File base/BUILD.gn (right): > > https://codereview.chromium.org/2702463003/diff/80001/base/BUILD.gn#newcode1828 > base/BUILD.gn:1828: shared_library("base_profiler_test_support_library") { > On 2017/02/17 17:18:12, Avi wrote: > > On 2017/02/17 05:21:05, Mark Mentovai wrote: > > > I believe you want this to be a loadable_module and not a shared_library. > > > Dependents don’t link against loadable modules (as opposed to shared > > libraries), > > > so that you can be in charge of loading them yourself. > > > > Done. > > This results in an .so file, which isn't dynamically openable by LoadLibrary. > Reverting just this part. Really? It ought to be.
On 2017/02/17 18:10:20, Mark Mentovai wrote: > Avi wrote: > > https://codereview.chromium.org/2702463003/diff/80001/base/BUILD.gn > > File base/BUILD.gn (right): > > > > > https://codereview.chromium.org/2702463003/diff/80001/base/BUILD.gn#newcode1828 > > base/BUILD.gn:1828: shared_library("base_profiler_test_support_library") { > > On 2017/02/17 17:18:12, Avi wrote: > > > On 2017/02/17 05:21:05, Mark Mentovai wrote: > > > > I believe you want this to be a loadable_module and not a shared_library. > > > > Dependents don’t link against loadable modules (as opposed to shared > > > libraries), > > > > so that you can be in charge of loading them yourself. > > > > > > Done. > > > > This results in an .so file, which isn't dynamically openable by LoadLibrary. > > Reverting just this part. > > Really? It ought to be. Patch set 6 says otherwise. Would we have to adjust the name passed into the LoadLibrary?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
On 2017/02/17 18:19:56, Avi wrote: > On 2017/02/17 18:10:20, Mark Mentovai wrote: > > Avi wrote: > > > https://codereview.chromium.org/2702463003/diff/80001/base/BUILD.gn > > > File base/BUILD.gn (right): > > > > > > > > > https://codereview.chromium.org/2702463003/diff/80001/base/BUILD.gn#newcode1828 > > > base/BUILD.gn:1828: shared_library("base_profiler_test_support_library") { > > > On 2017/02/17 17:18:12, Avi wrote: > > > > On 2017/02/17 05:21:05, Mark Mentovai wrote: > > > > > I believe you want this to be a loadable_module and not a > shared_library. > > > > > Dependents don’t link against loadable modules (as opposed to shared > > > > libraries), > > > > > so that you can be in charge of loading them yourself. > > > > > > > > Done. > > > > > > This results in an .so file, which isn't dynamically openable by > LoadLibrary. > > > Reverting just this part. > > > > Really? It ought to be. > > Patch set 6 says otherwise. Would we have to adjust the name passed into the > LoadLibrary? What happens is that GetNativeLibraryName blindly adds ".dylib" onto the name, which doesn't match the ".so" ending of the actual file on disk.
Here is a sample run with logging in ReceiveCompletedProfilesImpl: ReceiveCompletedProfilesImpl > received 1 profiles >> profile with 300 samples >> stack frame with 1 frame(s) seen 4 times >> stack frame with 9 frame(s) seen 1 times >> stack frame with 25 frame(s) seen 1 times >> stack frame with 26 frame(s) seen 255 times >> stack frame with 28 frame(s) seen 8 times >> stack frame with 29 frame(s) seen 1 times [...] ReceiveCompletedProfilesImpl > received 1 profiles >> profile with 300 samples >> stack frame with 1 frame(s) seen 2 times >> stack frame with 13 frame(s) seen 1 times >> stack frame with 14 frame(s) seen 291 times >> stack frame with 15 frame(s) seen 1 times [...] Nine frames in a stack walk looks reasonable, but just one is wrong. Let me investigate those real-world stacks.
On 2017/02/17 21:33:41, Avi wrote: > Here is a sample run with logging in ReceiveCompletedProfilesImpl: > > ReceiveCompletedProfilesImpl > > received 1 profiles > >> profile with 300 samples > >> stack frame with 1 frame(s) seen 4 times > >> stack frame with 9 frame(s) seen 1 times > >> stack frame with 25 frame(s) seen 1 times > >> stack frame with 26 frame(s) seen 255 times > >> stack frame with 28 frame(s) seen 8 times > >> stack frame with 29 frame(s) seen 1 times > [...] > ReceiveCompletedProfilesImpl > > received 1 profiles > >> profile with 300 samples > >> stack frame with 1 frame(s) seen 2 times > >> stack frame with 13 frame(s) seen 1 times > >> stack frame with 14 frame(s) seen 291 times > >> stack frame with 15 frame(s) seen 1 times > [...] > > Nine frames in a stack walk looks reasonable, but just one is wrong. Let me > investigate those real-world stacks. Cool. Based on the stack sizes and frequencies I'm assuming the first set is from the browser process and the second set is from the gpu process. One thing to keep in mind is that most of the stacks recorded will be waiting in the message loop -- this probably accounts for almost all of the 26 and 14 frame stacks respectively in the two reports. These waiting stacks are mostly ignored when the data is presented for analysis, so the other stacks have outsized importance. :)
On 2017/02/17 21:57:42, Mike Wittman wrote: > On 2017/02/17 21:33:41, Avi wrote: > > Here is a sample run with logging in ReceiveCompletedProfilesImpl: > > > > ReceiveCompletedProfilesImpl > > > received 1 profiles > > >> profile with 300 samples > > >> stack frame with 1 frame(s) seen 4 times > > >> stack frame with 9 frame(s) seen 1 times > > >> stack frame with 25 frame(s) seen 1 times > > >> stack frame with 26 frame(s) seen 255 times > > >> stack frame with 28 frame(s) seen 8 times > > >> stack frame with 29 frame(s) seen 1 times > > [...] > > ReceiveCompletedProfilesImpl > > > received 1 profiles > > >> profile with 300 samples > > >> stack frame with 1 frame(s) seen 2 times > > >> stack frame with 13 frame(s) seen 1 times > > >> stack frame with 14 frame(s) seen 291 times > > >> stack frame with 15 frame(s) seen 1 times > > [...] > > > > Nine frames in a stack walk looks reasonable, but just one is wrong. Let me > > investigate those real-world stacks. > > Cool. Based on the stack sizes and frequencies I'm assuming the first set is > from the browser process and the second set is from the gpu process. > > One thing to keep in mind is that most of the stacks recorded will be waiting in > the message loop -- this probably accounts for almost all of the 26 and 14 frame > stacks respectively in the two reports. These waiting stacks are mostly ignored > when the data is presented for analysis, so the other stacks have outsized > importance. :) Thanks! So two things are happening here. First, there are more bridge libraries. Here are a few one-frame snapshots: ] > inf.dli_fname /usr/lib/system/libsystem_platform.dylib ] > inf.dli_sname _platform_memmove$VARIANT$Nehalem ] > inf.dli_fname /usr/lib/libobjc.A.dylib ] > inf.dli_sname +[NSObject alloc] ] > inf.dli_fname /System/Library/Frameworks/QuartzCore.framework/Versions/A/QuartzCore ] > inf.dli_sname _ZN1X11WeakDetails3Ptr7releaseEP11objc_object ] > last_function _ZN1X11WeakDetails3Ptr7releaseEP11objc_object ] > inf.dli_fname /usr/lib/system/libsystem_c.dylib ] > inf.dli_sname __vsnprintf_chk ] > last_function __vsnprintf_chk The approach that we have of trying again after libsystem_kernel.dylib is obviously not covering all the cases. I'm investigating the possibility for forcing a walk to continue for all one-frame traces. The other issue is that there are lots of one-frame traces that make no sense. ] > inf.dli_fname ] > inf.dli_sname ?z0[? ] > inf.dli_fname ] > inf.dli_sname ??R? I'm still digging there.
On 2017/02/17 22:04:12, Avi wrote: > On 2017/02/17 21:57:42, Mike Wittman wrote: > > On 2017/02/17 21:33:41, Avi wrote: > > > Here is a sample run with logging in ReceiveCompletedProfilesImpl: > > > > > > ReceiveCompletedProfilesImpl > > > > received 1 profiles > > > >> profile with 300 samples > > > >> stack frame with 1 frame(s) seen 4 times > > > >> stack frame with 9 frame(s) seen 1 times > > > >> stack frame with 25 frame(s) seen 1 times > > > >> stack frame with 26 frame(s) seen 255 times > > > >> stack frame with 28 frame(s) seen 8 times > > > >> stack frame with 29 frame(s) seen 1 times > > > [...] > > > ReceiveCompletedProfilesImpl > > > > received 1 profiles > > > >> profile with 300 samples > > > >> stack frame with 1 frame(s) seen 2 times > > > >> stack frame with 13 frame(s) seen 1 times > > > >> stack frame with 14 frame(s) seen 291 times > > > >> stack frame with 15 frame(s) seen 1 times > > > [...] > > > > > > Nine frames in a stack walk looks reasonable, but just one is wrong. Let me > > > investigate those real-world stacks. > > > > Cool. Based on the stack sizes and frequencies I'm assuming the first set is > > from the browser process and the second set is from the gpu process. > > > > One thing to keep in mind is that most of the stacks recorded will be waiting > in > > the message loop -- this probably accounts for almost all of the 26 and 14 > frame > > stacks respectively in the two reports. These waiting stacks are mostly > ignored > > when the data is presented for analysis, so the other stacks have outsized > > importance. :) > > Thanks! > > So two things are happening here. > > First, there are more bridge libraries. Here are a few one-frame snapshots: > > ] > inf.dli_fname /usr/lib/system/libsystem_platform.dylib > ] > inf.dli_sname _platform_memmove$VARIANT$Nehalem > > ] > inf.dli_fname /usr/lib/libobjc.A.dylib > ] > inf.dli_sname +[NSObject alloc] > > ] > inf.dli_fname > /System/Library/Frameworks/QuartzCore.framework/Versions/A/QuartzCore > ] > inf.dli_sname _ZN1X11WeakDetails3Ptr7releaseEP11objc_object > ] > last_function _ZN1X11WeakDetails3Ptr7releaseEP11objc_object > > ] > inf.dli_fname /usr/lib/system/libsystem_c.dylib > ] > inf.dli_sname __vsnprintf_chk > ] > last_function __vsnprintf_chk > > The approach that we have of trying again after libsystem_kernel.dylib is > obviously not covering all the cases. I'm investigating the possibility for > forcing a walk to continue for all one-frame traces. > > The other issue is that there are lots of one-frame traces that make no sense. > > ] > inf.dli_fname > ] > inf.dli_sname ?z0[? > > ] > inf.dli_fname > ] > inf.dli_sname ??R? > > I'm still digging there. One other thing to look out for is stacks that are truncated before reaching the terminal frame. You should be able to detect these by checking that the ip and module for the terminal frame is consistent across the collected stacks. The 9 frame stack in the browser process looks suspicious in that regard. In the Windows data only about 0.1% of stacks are of size 13 or smaller.
I'm not even to the 9 frame stacks. We're stuck with two issues. The first issue, is where we're jammed in a system library and could resume. We could probably figure out some decent heuristic on when to continue. But the second issue is where it seems like we have a bogus IP to start walking from. thread_get_state() is giving some IPs that aren't valid. 140733193388032 = 0x7FFF00000000 (is there actually a library here?) 0 = 0x0 2 = 0x2 13 = 0xD 140347242978768 = 0x7FA523901DD0 4777075360 = 0x11CBC62A0 (which is program-ish) 11873829058986341 = 0x2A2F2F432F4165 = "*//C/Ae" Mark, how reliable is suspending a thread and calling thread_get_state()? I'm starting to distrust this.
On 2017/02/17 22:46:57, Avi wrote: > I'm not even to the 9 frame stacks. > > We're stuck with two issues. > > The first issue, is where we're jammed in a system library and could resume. We > could probably figure out some decent heuristic on when to continue. > > But the second issue is where it seems like we have a bogus IP to start walking > from. > > thread_get_state() is giving some IPs that aren't valid. > > 140733193388032 = 0x7FFF00000000 (is there actually a library here?) > 0 = 0x0 > 2 = 0x2 > 13 = 0xD > 140347242978768 = 0x7FA523901DD0 > 4777075360 = 0x11CBC62A0 (which is program-ish) > 11873829058986341 = 0x2A2F2F432F4165 = "*//C/Ae" > > Mark, how reliable is suspending a thread and calling thread_get_state()? I'm > starting to distrust this. thread_get_state() is very reliable. Even if you don’t suspend the thread, it’ll do it for you (unless you call it on your own thread—but there are better ways to get your own thread’s context anyway). Of course, you want to inspect stack memory too, so you need to rely on your own thread_suspend(). thread_get_state() accesses the very same user register set that was saved when the thread stopped executing, and the same state that will be restored when it resumes. We use thread_get_state() in Crashpad and haven’t seen any problems at all. 2 and 13 are plausible recovered %rip values for the *end* of a stackwalk, when the rip recovered as start’s would-be caller is actually argc. So something about this says “stack bottoms” to me. But you’d need to run vmmap or an equivalent to tell whether those other addresses are validly mapped, and look at what you see there to determine whether they’re plausible program text. At what point in this patch are you looking at rip and getting questionable values?
You are correct; this is due to an error during hacking to figure out what's going on with other stuff. For now, let's assume thread_get_state() is fine. Here's the first problem that I was trying to work around. Sometimes, a thread is stuck in a library that has no unwind info. Mark, you saw this in my toy app, which was calling sleep() and ending up in __semwait_signal in /usr/lib/system/libsystem_kernel.dylib. The problem is that that library isn't the only library that I've seen this happen in. Here are a few samples: _platform_memmove$VARIANT$Nehalem in /usr/lib/system/libsystem_platform.dylib +[NSObject alloc] in /usr/lib/libobjc.A.dylib _ZN1X11WeakDetails3Ptr7releaseEP11objc_object in /System/Library/Frameworks/QuartzCore.framework/Versions/A/QuartzCore __vsnprintf_chk in /usr/lib/system/libsystem_c.dylib __block_literal_global67 in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation So we need a new criterion for determining if a single-frame unwind is due to being in a library with no unwind info. unw_proc_info_t has lots of info; maybe we can come up with something cool? I'm going to dig a bit with that approach.
On 2017/02/19 19:05:00, Avi wrote: > I'm going to dig a bit with that approach. Suppose we say, "if we get one frame of a backtrace, unw_proc_info_t says that the IP belongs to an image with a valid mach-o header, but there was no unwind info, it must be a bridge library, so force one unwind and keep going". That seems to catch most (though I'm not sure all) of the kernel shim functions. However, that is causing other issues. Here is a two frame stack crawl: ] WalkStack for IP 0x7FFF9D67F640 ] > calling WalkStackFromContext the first time ] frame _platform_memmove$VARIANT$Nehalem ] > ip 0x7FFF9D67F640 ] > proc_info.lsda 0 ] > proc_info.handler 0 ] > proc_info.format 0 ] > proc_info.unwind_info_size 0 ] > proc_info.unwind_info 0 ] > proc_info.extra 0x7FFF9D67E000 ] > got SYSCALL; calling WalkStackFromContext again ] frame ??o}? ] > ip 0x11CBC6070 ] > proc_info.lsda 0 ] > proc_info.handler 0 ] > proc_info.format 0 ] > proc_info.unwind_info_size 0 ] > proc_info.unwind_info 0 ] > proc_info.extra 0 It looks like that, for some bridge libraries, just walking up the stack manually one frame isn't going to work. :( Still playing with this.
On 2017/02/19 19:46:02, Avi wrote: > On 2017/02/19 19:05:00, Avi wrote: > > I'm going to dig a bit with that approach. > > Suppose we say, "if we get one frame of a backtrace, unw_proc_info_t says that > the IP belongs to an image with a valid mach-o header, but there was no unwind > info, it must be a bridge library, so force one unwind and keep going". That > seems to catch most (though I'm not sure all) of the kernel shim functions. > > However, that is causing other issues. Here is a two frame stack crawl: > > ] WalkStack for IP 0x7FFF9D67F640 > ] > calling WalkStackFromContext the first time > ] frame _platform_memmove$VARIANT$Nehalem > ] > ip 0x7FFF9D67F640 > ] > proc_info.lsda 0 > ] > proc_info.handler 0 > ] > proc_info.format 0 > ] > proc_info.unwind_info_size 0 > ] > proc_info.unwind_info 0 > ] > proc_info.extra 0x7FFF9D67E000 > ] > got SYSCALL; calling WalkStackFromContext again > ] frame ??o}? > ] > ip 0x11CBC6070 > ] > proc_info.lsda 0 > ] > proc_info.handler 0 > ] > proc_info.format 0 > ] > proc_info.unwind_info_size 0 > ] > proc_info.unwind_info 0 > ] > proc_info.extra 0 > > It looks like that, for some bridge libraries, just walking up the stack > manually one frame isn't going to work. :( > > Still playing with this. And there are still wild IPs, though they at least seem plausible: ] WalkStack for IP 0x7FFF61EF1642 ] > calling WalkStackFromContext the first time ] WalkStackFromContext ] > frame ????? ] >> ip 0x7FFF61EF1642 ] >> proc_info.lsda 0 ] >> proc_info.handler 0 ] >> proc_info.format 0 ] >> proc_info.unwind_info_size 0 ] >> proc_info.unwind_info 0 ] >> proc_info.extra 0 ] WalkStack for IP 0x7FFF68BCC627 ] > calling WalkStackFromContext the first time ] WalkStackFromContext ] > frame ??_?? ] >> ip 0x7FFF68BCC627 ] >> proc_info.lsda 0 ] >> proc_info.handler 0 ] >> proc_info.format 0 ] >> proc_info.unwind_info_size 0 ] >> proc_info.unwind_info 0 ] >> proc_info.extra 0
As a reply to two posts ago, I modified the stack unwinder to keep at it until it found something that libunwind could cope with. That yields things like this: ] WalkStack for IP 140735568314674 ] > calling WalkStackFromContext the first time ] WalkStackFromContext ] > frame strlen ] >> ip 140735568314674 ] >> proc_info.format 0 ] >> proc_info.extra 140735568310272 ] > got SYSCALL ] >> unwinding a frame ] >> unwinding a frame ] > calling WalkStackFromContext again ] WalkStackFromContext ] > frame _ZN4base12RecordActionERKNS_17UserMetricsActionE ] >> ip 4613410217 ] >> proc_info.format 17042193 ] >> proc_info.extra 4613021696 ] > frame _ZN4base12RecordActionERKNS_17UserMetricsActionE ] >> ip 4613410217 ] >> proc_info.format 17042193 ] >> proc_info.extra 4613021696 Now, base::RecordAction doesn't seem to directly call strlen, so this is some kind of weird inlining thing, but at least we're getting a stack. Though the previous message about the weird IPs that don't seem to point to an image seem to still be an issue.
First, lemme capture my previous e-mail to you here to give context to the review thread. Your code is fine, it’s libunwind that’s sucking. Watch this. | mark@mela zsh% diff -u unwindtoy.cpp.orig unwindtoy.cpp | --- unwindtoy.cpp.orig 2017-02-14 09:39:28.000000000 -0500 | +++ unwindtoy.cpp 2017-02-14 10:01:14.000000000 -0500 | @@ -125,8 +125,8 @@ | } | | void stacktop() { | - while (1) | - sleep(10); | + while (1) { | + } | } | | void c() { | mark@mela zsh% clang++ -std=c++11 -g unwindtoy.cpp -o unwindtoy | mark@mela zsh% ./unwindtoy | top of stack: 0x7fff5daaffff | | function _Z4dumpv + 37, ip = 1021568d5, sp = 70000af03340, fp = 70000af03e10 | function __main_block_invoke + 36, ip = 102156e04, sp = 70000af03e20, fp = 70000af03e40 | function _dispatch_call_block_and_release + 12, ip = 7fffea024ef7, sp = 70000af03e50, fp = 70000af03e60 | function _dispatch_client_callout + 8, ip = 7fffea01c0b8, sp = 70000af03e70, fp = 70000af03e80 | function _dispatch_root_queue_drain + 917, ip = 7fffea01e029, sp = 70000af03e90, fp = 70000af03ee0 | function _dispatch_worker_thread3 + 99, ip = 7fffea01dc47, sp = 70000af03ef0, fp = 70000af03ef0 | function _pthread_wqthread + 1299, ip = 7fffea269712, sp = 70000af03f00, fp = 70000af03f50 | stack dump complete | ----- | Dumping thread 775 | function _Z8stacktopv + 9, ip = 102156d19, sp = 7fff5daa9968, fp = 7fff5daa9960 | function _Z1cv + 9, ip = 102156d29, sp = 7fff5daa9970, fp = 7fff5daa9970 | function _Z1bv + 9, ip = 102156d39, sp = 7fff5daa9980, fp = 7fff5daa9980 | function _Z1av + 9, ip = 102156d49, sp = 7fff5daa9990, fp = 7fff5daa9990 | function main + 119, ip = 102156dd7, sp = 7fff5daa99a0, fp = 7fff5daa99d0 | stack dump complete | ^C It’s falling into the trap at libunwind-35.3/src/UnwindCursor.hpp:778, which is commented with | // if unwind table has entry, but entry says there is no unwind info, note that You’re looking at __semwait_signal + 10, which is libsystem_kernel.dylib + 0x19fda. Now… | mark@mela zsh% objdump -macho -unwind-info /usr/lib/system/libsystem_kernel.dylib | [blah blah blah] | [347]: function offset=0x00019094, encoding[0]=0x01010001 | [348]: function offset=0x000190b3, encoding[2]=0x01030161 | [349]: function offset=0x00019104, encoding[7]=0x00000000 and that’s it. The big fat zero applies to everything beyond offset 0x19104, including __semwait_signal. Why no unwind info? __semwait_signal is a raw syscall stub. Look at the source! | mark@mela zsh% cd …/10.12.3/xnu-3789.41.3 | mark@mela zsh% mkdir /tmp/syscalls_user | mark@mela zsh% ARCHS=x86_64 libsyscall/xcodescripts/create-syscalls.pl bsd/kern/syscalls.master libsyscall/custom libsyscall/Platforms MacOSX /tmp/syscalls_user | […] Now you’ve got a handy /tmp/syscalls_user/___semwait_signal.s that you can read, with reference to /tmp/syscalls_user/SYS.h. And you’ll see that it’s just a tiny assembly stub. There’s no real stack frame to speak of. And, most notably, there’s no unwind info. It’s possible to write unwind info into assembler source either by writing DWARF the hard way by hand or by using .cfi directives, but these stubs do neither. (I would have sent you to Hopper in the first place, but that wouldn’t have shown that there was no unwind info in the source.) libunwind, or at least Apple’s version of it, really wants to see unwind info to do anything at all. This kind of sucks, because in the real world, threads call into the kernel all the time. Well-behaved threads can even spend the vast majority of their (wall-clock) time blocked on the kernel in a syscall, which is exactly what __semwait_signal is doing when you call sleep(). (Pop on over to 10.12.2 Libc-1158.30.7/gen/nanosleep.c for the user space-side details.) Of course, your thread can’t be in the kernel if it’s running your code, which is why this is never a problem for the Apple-shipped libunwind that only wants to let you unwind your own thread: if you’re calling libunwind, you’re not in the kernel, and you never see the missing unwind info. I fixed your test case with a very cheesey approach: I detect a single-frame successful unwind with only a context frame in libsystem_kernel.dylib and manually recover its caller, priming another unwind attempt with the register set you’d expect in the caller. And now: | mark@mela zsh% clang++ -Wall -std=c++11 -Wall -g unwindtoy.cpp -o unwindtoy | mark@mela zsh% ./unwindtoy | top of stack: 0x7fff5afcffff | | function _Z4dumpv + 23, ip = 104c30a47, sp = 70000074da00, fp = 70000074de10 | function __main_block_invoke + 36, ip = 104c30db4, sp = 70000074de20, fp = 70000074de40 | function _dispatch_call_block_and_release + 12, ip = 7fffea024ef7, sp = 70000074de50, fp = 70000074de60 | function _dispatch_client_callout + 8, ip = 7fffea01c0b8, sp = 70000074de70, fp = 70000074de80 | function _dispatch_root_queue_drain + 917, ip = 7fffea01e029, sp = 70000074de90, fp = 70000074dee0 | function _dispatch_worker_thread3 + 99, ip = 7fffea01dc47, sp = 70000074def0, fp = 70000074def0 | function _pthread_wqthread + 1299, ip = 7fffea269712, sp = 70000074df00, fp = 70000074df50 | stack dump complete | ----- | Dumping thread 775 | function __semwait_signal + 10, ip = 7fffea180fda, sp = 7fff5afcf8c8, fp = 7fff5afcf900 | function nanosleep + 199, ip = 7fffea107b72, sp = 7fff5afcf8d0, fp = 7fff5afcf900 | function sleep + 42, ip = 7fffea1079d3, sp = 7fff5afcf910, fp = 7fff5afcf940 | function _Z8stacktopv + 23, ip = 104c30cc7, sp = 7fff5afcf950, fp = 7fff5afcf960 | function _Z1cv + 9, ip = 104c30cd9, sp = 7fff5afcf970, fp = 7fff5afcf970 | function _Z1bv + 9, ip = 104c30ce9, sp = 7fff5afcf980, fp = 7fff5afcf980 | function _Z1av + 9, ip = 104c30cf9, sp = 7fff5afcf990, fp = 7fff5afcf990 | function main + 119, ip = 104c30d87, sp = 7fff5afcf9a0, fp = 7fff5afcf9d0 | stack dump complete | ^C And there you have it.
The trick that I gave you previously was “rip = *rsp; rsp += 8;”. That’s valid to get yourself out of the system call stubs that don’t allocate their own stack frames. _platform_memmove$VARIANT$everything, like most of the other routines in libplatform hand-written in custom assembly source, are different. They do allocate stack frames and they do use rbp as a traditional frame pointer. The correct way to walk from them is “rsp = rbp + 16; rip = *(rbp + 8); rbp = *(rbp);”. This is tricky because the x86_64 ABI doesn’t mandate the use of frame pointers. I’d say that it’s a reasonably safe assumption, however, that in the absence of unwind data, this sequence will get you out of most hand-written assembly, because most hand-written code does allocate a traditional stack frame and does use rbp as a frame pointer. The system call stubs in libsystem_kernel are actually the exception. Note that 10.12 finally does restore access to the source code for libplatform (which becomes libsystem_platform) at https://opensource.apple.com/source/libplatform/libplatform-126.1.2/, but it’s missing most of the assembly routines which are actually shipping. We can only see the generic routine written in C which doesn’t ship and isn’t ever used. Bummer. So you’ll just have to trust based on the evidence that _platform_memmove$VARIANT$Nehalem that you’re seeing was written in x86 assembly language, and that if you look at that in a disassembler, you’re basically looking to something that’s quite close to what the source would be. On top of that, there’s this: | mark@mela zsh% nm -n /usr/lib/system/libsystem_platform.dylib | grep -F '_platform_memmove$VARIANT$' | 0000000000002020 t __platform_memmove$VARIANT$Ivybridge | 0000000000002060 t __platform_memmove$VARIANT$Nehalem | 0000000000005ea0 t __platform_memmove$VARIANT$Haswell | 00000000000061a0 t __platform_memmove$VARIANT$Base | mark@mela zsh% objdump -macho -unwind-info /usr/lib/system/libsystem_platform.dylib | […] | [14]: function offset=0x00002020, encoding[2]=0x00000000 | [15]: function offset=0x000022c4, encoding[74]=0x04000320 | […] | [141]: function offset=0x00005ea0, encoding[2]=0x00000000 | [142]: function offset=0x00006f73, encoding[6]=0x01020021 | […] confirming that there’s no unwind data that Apple’s libunwind can use to work its way out of this situation. Anyway, if you take the two “manual” unwind approaches I’ve given you so far and you reverse them, you’ll definitely wind up with incorrect frames, terminal situations, or skipped frames. I’d still say that these unwind info-less custom assembly routines are rare outside of call graph leaves. I’m also very disappointed that Apple didn’t take the time to annotate any of their custom assembly routines with assembler directives that would produce unwind information. It’s easy to do, like in https://chromium.googlesource.com/crashpad/crashpad/+/1678e1a3ac203a6d72424c7..., which is a custom assembly function (one for 32-bit x86 and one for x86_64, actually) that does allocate a stack frame and use a traditional ebp/rbp frame pointer, but also uses a couple of .cfi directives to produce unwind info. And that’s how I achieve: | mark@mela zsh% nm out/Debug/crashpad_client_test | grep CaptureContext | […] | 000000010009d300 t __ZN8crashpad14CaptureContextEP16x86_thread_state | […] | mark@mela zsh% objdump -macho -unwind-info out/Debug/crashpad_client_test | […] | [10]: function offset=0x00009510, encoding[0]=0x01000000 | [11]: function offset=0x00009fc0, encoding[2]=0x01010001 | […] which even Apple’s libunwind could deal with. Just a plug for how easy it is to do the right thing, and why it’s so frustrating when people don’t bother.
There are all kinds of leaf functions, as I ran into last time: +[NSObject alloc] /usr/lib/libobjc.A.dylib _ZN1X11WeakDetails3Ptr7releaseEP11objc_object /System/Library/Frameworks/QuartzCore.framework/Versions/A/QuartzCore __vsnprintf_chk /usr/lib/system/libsystem_c.dylib __block_literal_global67 /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation Is there a good way to look at rbp and decide if it's solid enough to unwind properly? Sigh. Any thoughts about the weird IPs for the one-frame unwinds? Also, you don't have to even check your email until Tuesday if you don't want to :)
Avi wrote: > There are all kinds of leaf functions, as I ran into last time: > > +[NSObject alloc] > /usr/lib/libobjc.A.dylib > _ZN1X11WeakDetails3Ptr7releaseEP11objc_object > /System/Library/Frameworks/QuartzCore.framework/Versions/A/QuartzCore > __vsnprintf_chk > /usr/lib/system/libsystem_c.dylib > __block_literal_global67 > /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +[NSObject alloc] and __vsnprintf_chk() have unwind info on my system. I don’t have those symbols in QuartzCore or CoreFoundation, so I think we might be looking at different OS versions. I don’t believe that any of these would be custom assembly source (and know it for a fact for two of them that I can see). Also, +[NSObject alloc] is a single instruction long, so it seems somewhat implausible that a thread would have landed somewhere within it. Symbol names alone aren’t really enough information to do the whole job, though: we’d ideally be looking at symbol names and offsets. If you said you were at “+[NSObject alloc] + anything_other_than_zero”, we’d know that your symbolizer was pulling your leg, and we’d be able to work out what code you were really about to run. +[NSObject alloc] is always and __vsnprintf_chk() is normally frameless and, in the frameless case, both make a further jmp. They’re not strictly leaves in the sense that I meant it, but Apple’s libunwind might not be able to process the unwind info produced for them because you’d never expect to see them as an intermediate frame in a backtrace. The jmp instead of call is tail-call optimization at work. By now you know how to get these: 000000000000a752 t +[NSObject alloc] [169]: function offset=0x0000a752, encoding[51]=0x04002c40 And now a new trick for you: https://opensource.apple.com/source/libunwind/libunwind-35.3/include/mach-o/c.... Take 0x04002c40 & UNWIND_X86_64_MODE_MASK and you get UNWIND_X86_64_MODE_DWARF, so take 0x04002c40 & 0x00FFFFFF and you get 0x2c40. Now you can run dwarfdump --eh-frame /usr/lib/libobjc.A.dylib and you find at offset 0x2c40: 0x00002c40: FDE length: 0x0000001c CIE_pointer: 0x00002368 start_addr: 0x000000000000a752 +[NSObject alloc] range_size: 0x0000000000000005 (end_addr = 0x000000000000a757) […] Instructions: 0x000000000000a752: CFA=rsp+8 rip=[rsp] So this is unwindable, (there’s more unwind info for this function than there is actual code because it’s not represented “compactly”,) and the compiler agrees with our analysis that you’d walk it with rip = *rsp; rsp += 8. Apple’s libunwind ought to be able to deal with this, but I haven’t read through its DWARF unwind code to check for bugs. (I also think that a compact encoding of 0x02000000 should have handled this case in fewer than 28 bytes.) On the other hand, I see encoding[0]=0x01000000 for __vsnprintf_chk(), where 0x01000000 is UNWIND_X86_64_MODE_RBP_FRAME, which won’t give a good stackwalk unless you disassembled to look for a push/mov prologue which you’ll find at offset +0x10, where you wouldn’t even expect to be executing in the normal case. But Apple’s libunwind doesn’t do this, so I wouldn’t expect it to be able to recover __vsnprintf_chk()’s caller if it hadn’t made it to either the expected vsnprintf() call or the unexpected __chk_fail_overflow(). This is a bad situation. I can see that the compiler generates correct CFI directives, but the linker turns them into junk, quite likely because it assumes that if any part of a function uses a traditional rbp frame pointer, the whole function must. Bug. I don’t think that the compact unwind format can legally encode this situation because it wants to work on functions as top-level entities that aren’t split up, so this one should have been relegated to DWARF in the __TEXT,__eh_frame section. As it stands now, this one’s hopeless with the information we’re given. > Is there a good way to look at rbp and decide if it's solid enough to unwind > properly? Sigh. Not really. If rbp points outside of the stack region, then it obviously isn’t being used as a frame pointer. If *(rbp + 8) doesn’t look like it’s mapped and executable, then it’s pointless to pretend that’s the address that you’ll return to. Other than that, there’s not much more you can check. You can’t do a sanity check on the recovered rbp, for example, because the caller might not have been using it as a frame pointer. A debugger would try to figure out whether there’s a stack frame and how it’s laid out by disassembling the code. I assume we’re not going to be doing that on our own. > Any thoughts about the weird IPs for the one-frame unwinds? You mean 0x7fff61ef1642 and 0x7fff68bcc627? Both of those plausibly look to be in your dyld shared cache, but I’m not looking at your cache file so I can’t tell you for sure. As in the symbol case, module + offset would be helpful. > Also, you don't have to even check your email until Tuesday if you don't want to > :) This coming week’s going to be a weird schedule, I might as well pack in what I can while I can.
Providing a little context here: it's OK from the analysis perspective if we can't recover every stack, as long as we recover "most" stacks. About 0.5% of stacks on Windows are incompletely unwound for various reasons; if we can achieve something in the same ballpark for Mac and avoid crashing on the unwindable cases, I'd consider that a success.
On 2017/02/24 16:01:13, Mike Wittman wrote: > Providing a little context here: it's OK from the analysis perspective if we > can't recover every stack, as long as we recover "most" stacks. About 0.5% of > stacks on Windows are incompletely unwound for various reasons; if we can > achieve something in the same ballpark for Mac and avoid crashing on the > unwindable cases, I'd consider that a success. If you'd be ok with not being able to unwind the weird IPs that don't match a loaded module, I'd be OK letting those slide for the first version of this.
On 2017/02/20 05:23:05, Mark Mentovai wrote: > By now you know how to get these: > > 000000000000a752 t +[NSObject alloc] > [169]: function offset=0x0000a752, encoding[51]=0x04002c40 Wait, what? unw_get_proc_name would get me the offset and function name, but I'm not sure what the "encoding" you're talking about here is. (I don't see anything in unw_proc_info_t like it.) > And now a new trick for you: > [...] > So this is unwindable I'm not sure how to do this programmatically. > > Is there a good way to look at rbp and decide if it's solid enough to unwind > > properly? Sigh. > > Not really. Is there a good general way to look at a one-frame unwind that's jammed in a system library and know how to unwind it just enough so that libunwind can finish the job? (This is the huge question that I feel blocked on.) > > Any thoughts about the weird IPs for the one-frame unwinds? > > You mean 0x7fff61ef1642 and 0x7fff68bcc627? Both of those plausibly look to be > in your dyld shared cache, but I’m not looking at your cache file so I can’t > tell you for sure. As in the symbol case, module + offset would be helpful. The point here is that those IPs, though plausible, *don't have* a module. That's my point. thread_get_state() returns that IP, and if you initialize a cursor with it and call unw_proc_info_t, you get a null "extra" field, which means it's not showing up as having a mach header. In any case, I'm willing to let those slide for the first version of this unwinder.
On 2017/02/24 18:43:03, Avi wrote: > On 2017/02/24 16:01:13, Mike Wittman wrote: > > Providing a little context here: it's OK from the analysis perspective if we > > can't recover every stack, as long as we recover "most" stacks. About 0.5% of > > stacks on Windows are incompletely unwound for various reasons; if we can > > achieve something in the same ballpark for Mac and avoid crashing on the > > unwindable cases, I'd consider that a success. > > If you'd be ok with not being able to unwind the weird IPs that don't match a > loaded module, I'd be OK letting those slide for the first version of this. Sounds reasonable for v1. If we can distinguish this case in the resulting data, then a small amount of analysis will tell us how big a problem this is in real world usage.
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-xcode-...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
Mike, Mark, Robert, please review. BTW, does anyone know what's up the with ios simulator trybot here? It's hanging on base_unittests but I can't figure out why.
On 2017/03/01 20:11:37, Avi wrote: > Mike, Mark, Robert, please review. How do the resulting stacks look in local testing with this change? The key metric is the percentage of truncated stacks (i.e. those that don't end at the expected top-level function), since those are unusable for analysis. > BTW, does anyone know what's up the with ios simulator trybot here? It's hanging > on base_unittests but I can't figure out why. I vaguely recall hitting something like this in the past and it being related to the test support library. https://codereview.chromium.org/2702463003/diff/200001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/200001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:175: do { How about using aliases for the register state, to make this code more readable? uint64_t& rip = unwind_context.data[16]; uint64_t& rsp = unwind_context.data[7]; https://codereview.chromium.org/2702463003/diff/200001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:183: ip_in_valid_image = proc_info.extra != 0; Move all these calls into an IsIPInValidImage(const unw_context_t& unwind_context) function? https://codereview.chromium.org/2702463003/diff/200001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:184: } while (!ip_in_valid_image && unwind_context.data[16] < stack_top); Shouldn't this be unwind_context.data[7] < stack_top? https://codereview.chromium.org/2702463003/diff/200001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:395: uint64_t new_stack_top; initialize to 0 https://codereview.chromium.org/2702463003/diff/200001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:418: } // ScopedSuspendThread nit: new_stack_top could be computed directly here: new_stack_top = reinterpret_cast<uint64_t>(stack_copy_buffer_.get()) + (stack_top - stack_bottom);
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2702463003/diff/200001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/200001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:175: do { On 2017/03/02 16:35:04, Mike Wittman wrote: > How about using aliases for the register state, to make this code more readable? > > uint64_t& rip = unwind_context.data[16]; > uint64_t& rsp = unwind_context.data[7]; Done. https://codereview.chromium.org/2702463003/diff/200001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:183: ip_in_valid_image = proc_info.extra != 0; On 2017/03/02 16:35:04, Mike Wittman wrote: > Move all these calls into an IsIPInValidImage(const unw_context_t& > unwind_context) function? Done. https://codereview.chromium.org/2702463003/diff/200001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:184: } while (!ip_in_valid_image && unwind_context.data[16] < stack_top); On 2017/03/02 16:35:04, Mike Wittman wrote: > Shouldn't this be unwind_context.data[7] < stack_top? Yes. Done. https://codereview.chromium.org/2702463003/diff/200001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:395: uint64_t new_stack_top; On 2017/03/02 16:35:03, Mike Wittman wrote: > initialize to 0 Done. https://codereview.chromium.org/2702463003/diff/200001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:418: } // ScopedSuspendThread On 2017/03/02 16:35:03, Mike Wittman wrote: > nit: new_stack_top could be computed directly here: > new_stack_top = reinterpret_cast<uint64_t>(stack_copy_buffer_.get()) + > (stack_top - stack_bottom); Done.
On 2017/03/02 16:35:04, Mike Wittman wrote: > How do the resulting stacks look in local testing with this change? The key > metric is the percentage of truncated stacks (i.e. those that don't end at the > expected top-level function), since those are unusable for analysis. I did a quick test run, and got two profiled threads. I'm not sure what the cutoff point is; you said 13 was rare but I see > net::URLRequest::SetDefaultCookiePolicyToBlock() + 0 > ChromeBrowserMainPartsPosix::ChromeBrowserMainPartsPosix(content::MainFunctionParams const&) + 14 > ChromeBrowserMainPartsMac::ChromeBrowserMainPartsMac(content::MainFunctionParams const&) + 15 > ChromeContentBrowserClient::CreateBrowserMainParts(content::MainFunctionParams const&) + 37 > content::BrowserMainLoop::Init() + 63 > content::BrowserMainRunnerImpl::Initialize(content::MainFunctionParams const&) + 602 > content::BrowserMain(content::MainFunctionParams const&) + 100 > content::ContentMainRunnerImpl::Run() + 368 > content::ContentMain(content::ContentMainParams const&) + 54 > ChromeMain + 111 > main + 522 which looks reasonable and is 11. Suppose we say 10. Then of the 600 samples there are 7 with fewer than that number of frames: > received 1 profiles >> profile with 300 samples >> stack frame with 2 frame(s) seen 4 times > received 1 profiles >> profile with 300 samples >> stack frame with 2 frame(s) seen 2 times >> stack frame with 8 frame(s) seen 1 times Those are (unmangled): > mach_msg_trap + 10 > mach_msg + 55 > base::MachPortBroker::ChildSendTaskPortToParent(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 333 > content::MachBroker::ChildSendTaskPortToParent() + 59 > content::ContentMainRunnerImpl::Initialize(content::ContentMainParams const&) + 728 > content::ContentMain(content::ContentMainParams const&) + 30 > ChromeMain + 111 > main + 522 which seems legit, > [ empty space ] + 3 > getsectiondata + 132 and > + 3 > tiny_malloc_from_free_list + 1443 which are bogus, and from the other thread: > + 3 > __sfvwrite + 464 > + 3 > _Z21dyldGlobalLockReleasev + 0 > + 3 > dlopen + 59 > + 3 > szone_malloc_should_clear + 1118 So roughly one percent are failing here. I think this is the code where if libunwind fails after just one frame, we try scanning the stack but only once, and libunwind jams again after just one more frame. On one hand we can keep scanning the stack and keep trying to force libunwind to unwind, but then we run the risk of having completely bogus data. (If you want a copy of this particular run's dumps, let me know.)
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
OK, I'm generally satisfied with this as an initial implementation if Mark and Robert are good with it. I think it's worth getting it out to real users to validate stability and performance, and get a baseline for real world stack recovery. For future changes, I think it does makes sense to try pretty aggressively to recover stacks, as long as at least a subset are valid and we're not compromising stability. I suspect we can filter out some bogus stacks in the processing, and if other ones have relatively distinct signatures they will effectively wash out with the aggregation. Given the legit shorter stacks you've seen it seems likely that the distribution of Mac stack lengths is somewhat different than Windows.
https://codereview.chromium.org/2702463003/diff/220001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/220001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:125: callback(static_cast<uintptr_t>(ip)); This and a few other places are using uintptr_t for pointer-size integer values, but other places are using uint64_t. Can we use a consistent type throughout, possibly uintptr_t to match the Windows implementation?
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
I'm in agreement. I don't like the 1% figure, but I'd like to run with this and see how useful it is, and work on fixing it later if we need to rather than wait until it's perfect. Mark, Robert, WDYT? Thoughts on the ios-simulator builder? I'm stumped. https://codereview.chromium.org/2702463003/diff/220001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/220001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:125: callback(static_cast<uintptr_t>(ip)); On 2017/03/03 20:22:22, Mike Wittman wrote: > This and a few other places are using uintptr_t for pointer-size integer values, > but other places are using uint64_t. Can we use a consistent type throughout, > possibly uintptr_t to match the Windows implementation? Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
iOS issue fixed. Mark, Robert, thoughts?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2017/03/06 20:34:52, Avi wrote: > iOS issue fixed. > > Mark, Robert, thoughts? Oh, and Mike, you're good?
On 2017/03/07 00:15:39, Avi wrote: > On 2017/03/06 20:34:52, Avi wrote: > > iOS issue fixed. > > > > Mark, Robert, thoughts? > > Oh, and Mike, you're good? Yep, lgtm.
Overall LG, just some minor comments. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:54: uintptr_t original_stack_bottom_int = (optional, here and throughout but only noted here): I like to use |auto| as the LHS type when casting, since the type is already on the RHS. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:176: // If so, cheat by scanning the stack and trying again. Only do this once, This isn't quite scanning and more of an unwind via assumed frame pointer and standard calling convention, right? https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:183: rsp += 8; // rsp++ 8 -> sizeof(uintptr_t) to make it a little more portable? https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:294: CHECK_EQ(KERN_SUCCESS, resume_result) << "thread_resume failed"; MACH_CHECK here would log the Mach error too. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:413: if (stack_size > kStackCopyBufferSize) We may want monitoring here in case the room-to-grow you left stops being enough. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:453: // No. NOTIMPLEMENTED() maybe?
https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:413: if (stack_size > kStackCopyBufferSize) On 2017/03/15 21:57:25, Robert Sesek wrote: > We may want monitoring here in case the room-to-grow you left stops being > enough. We should be able to detect this server-side if we can assume that the other two early returns are essentially never hit.
https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:54: uintptr_t original_stack_bottom_int = I see these casts here, there’s another set of casts in the caller (CopyStackAndRewritePointers()), and there’s another set of casts in the thing that calls that (NativeStackSamplerMac::SuspendThreadAndRecordStack()). That’s indicative of some kind of problem, y’know? https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:106: stack_copy_bottom, thread_state->__rsp); Unwind info may reference and recover rbx, r12, r13, r14, and r15, too. The two you already have plus these five makes seven, so it makes sense to loop over rewrite_registers[] = { &thread_state->__rbp, &thread_state->__rsp, &thread_state->rbx, … }. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:112: template <typename StackFrameCallback> Is this template thing just papering over the fact that it’s hard to figure out a type for the lambda? Same with WalkStack<>() on line 146. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:122: ++(*frame_count); It may be prudent to set a frame limit here to prevent deeply recursive stacks from causing this thing to spin for too long. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:176: // If so, cheat by scanning the stack and trying again. Only do this once, Robert Sesek wrote: > This isn't quite scanning and more of an unwind via assumed frame pointer and > standard calling convention, right? This is scanning. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:176: // If so, cheat by scanning the stack and trying again. Only do this once, “Only do this once” makes it sound like you might unwind a bunch of frames, stop, try this, and continue. That’s not true. You’ll only do this if libunwind can’t make it past the context frame. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:185: } while (!ip_in_valid_image && rsp < stack_top); You should set some upper bound for scanning, not just stack_top. #include <algorithm> and std::min(orig_rsp + n, stack_top). Pick a suitable n. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:188: WalkStackFromContext(&unwind_context, &frame_count, callback); If this looks fishy and you tried scanning, you might want to throw out the scan result and use the original single frame. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:197: bool GetUUID(const mach_header_64* mach_header, unsigned char* id) { Seems like id is really a uuid_t https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:232: std::string GetUniqueId(const void* module_addr) { I can’t tell the difference between GetUUID and GetUinqueId just from their names. One returns a bool and sets an out param, the other returns a string. The names should be more descriptive. Really, one just calls the other and they’re closely interrelated, so they don’t even need to be separate. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:285: ScopedSuspendThread::ScopedSuspendThread(mach_port_t thread_port) No reason to define these out-of-line, they’re short and right next to the class anyway. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:286: : thread_port_(thread_port), Conserve: : thread_port_(thread_suspend(thread_port) == KERN_SUCCESS ? thread_port : MACH_PORT_NULL) and remove was_successful_. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:313: enum { constexpr size_t https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:380: if (!stack_copy_buffer_) The OOM killer should take care of this, we shouldn’t need to null-check allocations. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:411: uintptr_t stack_size = stack_top - stack_bottom; Normally I’d say to DCHECK_LE(stack_bottom, stack_top) before you do this, but since you don’t want to DCHECK here, I’ll instead say to fail and return if that condition isn’t satisfied. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:453: // No. On 2017/03/15 21:57:25, Robert Sesek wrote: > NOTIMPLEMENTED() maybe? #error is best https://codereview.chromium.org/2702463003/diff/260001/chrome/common/stack_sa... File chrome/common/stack_sampling_configuration.cc (right): https://codereview.chromium.org/2702463003/diff/260001/chrome/common/stack_sa... chrome/common/stack_sampling_configuration.cc:8: #include "base/compiler_specific.h" I don’t know if you need this for anything, but "build/build_config.h" is needed for OS_MACOSX, and while "base/compiler_specific.h" does #include that, you ought to IWYU. https://codereview.chromium.org/2702463003/diff/260001/chrome/common/stack_sa... chrome/common/stack_sampling_configuration.cc:23: #if defined(_WIN64) This is multi-platform code, use OS_WIN now. https://codereview.chromium.org/2702463003/diff/260001/chrome/common/stack_sa... chrome/common/stack_sampling_configuration.cc:30: // This is experimental, so only run on trunk. “unknown” doesn’t really mean trunk, it just means non-Chrome-branded. And a more direct way to say that is #if defined(GOOGLE_CHROME_BUILD) return false; #else return true; #endif
https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:122: ++(*frame_count); On 2017/03/16 03:08:46, Mark Mentovai wrote: > It may be prudent to set a frame limit here to prevent deeply recursive stacks > from causing this thing to spin for too long. This is probably a good idea as a backstop. On Windows the max stack depths for the last month are all in the hundreds, with one unlucky case apparently caught in the middle of a generating a stack overflow exception within injected third party module code (at 16038 frames!). So a 1K frame limit is probably reasonable. To ensure consistent analysis I'd prefer to apply the limit to Windows at the same time, so this would be best done in a follow-on CL for both platforms.
https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:183: rsp += 8; // rsp++ One more thought for right here… Things are only valid if rsp % sizeof(rsp) == 0. If you find that it’s not, then there’s no reasonable way to recover, so you might as well just bail out. The ABI has a further decree regarding stack alignment. Refer to the System V ABI AMD64 Supplement, which macOS adheres to. §3.2.2 > The end of the input argument area shall be aligned on a 16 […] byte boundary. > In other words, the value (%rsp + 8) is always a multiple of 16 […] when > control is transferred to the function entry point. You can use this to improve your stack scanning to reduce false-positive recoveries in this section.
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-device-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-xcode-...) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-xco...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:54: uintptr_t original_stack_bottom_int = On 2017/03/16 03:08:46, Mark Mentovai wrote: > I see these casts here, there’s another set of casts in the caller > (CopyStackAndRewritePointers()), and there’s another set of casts in the thing > that calls that (NativeStackSamplerMac::SuspendThreadAndRecordStack()). That’s > indicative of some kind of problem, y’know? Boy am I aware of the zillion types here :( Sometimes I need to access these as pointers to 64-bit ints, so I need uint64_t*. The interface to the rest of the stack sampler code is uintptr_t, and sometimes I need the ints, uint64_t. I'm not sure what the best choice here is. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:106: stack_copy_bottom, thread_state->__rsp); On 2017/03/16 03:08:45, Mark Mentovai wrote: > Unwind info may reference and recover rbx, r12, r13, r14, and r15, too. > > The two you already have plus these five makes seven, so it makes sense to loop > over rewrite_registers[] = { &thread_state->__rbp, &thread_state->__rsp, > &thread_state->rbx, … }. Done. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:112: template <typename StackFrameCallback> On 2017/03/16 03:08:46, Mark Mentovai wrote: > Is this template thing just papering over the fact that it’s hard to figure out > a type for the lambda? Same with WalkStack<>() on line 146. Absolutely. Using a template like this is the most sane way to pass a lambda. The other way would be to use std::function, but that's not allowed in Chromium. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:122: ++(*frame_count); On 2017/03/16 16:42:24, Mike Wittman wrote: > On 2017/03/16 03:08:46, Mark Mentovai wrote: > > It may be prudent to set a frame limit here to prevent deeply recursive stacks > > from causing this thing to spin for too long. > > This is probably a good idea as a backstop. On Windows the max stack depths for > the last month are all in the hundreds, with one unlucky case apparently caught > in the middle of a generating a stack overflow exception within injected third > party module code (at 16038 frames!). > > So a 1K frame limit is probably reasonable. To ensure consistent analysis I'd > prefer to apply the limit to Windows at the same time, so this would be best > done in a follow-on CL for both platforms. Mark, would you be OK with this being in a followup CL? https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:176: // If so, cheat by scanning the stack and trying again. Only do this once, On 2017/03/16 03:08:46, Mark Mentovai wrote: > “Only do this once” makes it sound like you might unwind a bunch of frames, > stop, try this, and continue. That’s not true. You’ll only do this if libunwind > can’t make it past the context frame. Rewording. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:183: rsp += 8; // rsp++ On 2017/03/16 16:53:46, Mark Mentovai wrote: > One more thought for right here… > > Things are only valid if rsp % sizeof(rsp) == 0. If you find that it’s not, then > there’s no reasonable way to recover, so you might as well just bail out. > > The ABI has a further decree regarding stack alignment. Refer to the System V > ABI AMD64 Supplement, which macOS adheres to. §3.2.2 > > > The end of the input argument area shall be aligned on a 16 […] byte boundary. > > In other words, the value (%rsp + 8) is always a multiple of 16 […] when > > control is transferred to the function entry point. > > You can use this to improve your stack scanning to reduce false-positive > recoveries in this section. Done. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:185: } while (!ip_in_valid_image && rsp < stack_top); On 2017/03/16 03:08:45, Mark Mentovai wrote: > You should set some upper bound for scanning, not just stack_top. #include > <algorithm> and std::min(orig_rsp + n, stack_top). Pick a suitable n. Done. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:188: WalkStackFromContext(&unwind_context, &frame_count, callback); On 2017/03/16 03:08:46, Mark Mentovai wrote: > If this looks fishy and you tried scanning, you might want to throw out the scan > result and use the original single frame. I don't understand what you mean here. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:197: bool GetUUID(const mach_header_64* mach_header, unsigned char* id) { On 2017/03/16 03:08:46, Mark Mentovai wrote: > Seems like id is really a uuid_t Done. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:232: std::string GetUniqueId(const void* module_addr) { On 2017/03/16 03:08:45, Mark Mentovai wrote: > I can’t tell the difference between GetUUID and GetUinqueId just from their > names. One returns a bool and sets an out param, the other returns a string. The > names should be more descriptive. > > Really, one just calls the other and they’re closely interrelated, so they don’t > even need to be separate. Done. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:285: ScopedSuspendThread::ScopedSuspendThread(mach_port_t thread_port) On 2017/03/16 03:08:46, Mark Mentovai wrote: > No reason to define these out-of-line, they’re short and right next to the class > anyway. Done. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:286: : thread_port_(thread_port), On 2017/03/16 03:08:46, Mark Mentovai wrote: > Conserve: > > : thread_port_(thread_suspend(thread_port) == KERN_SUCCESS ? thread_port : > MACH_PORT_NULL) > > and remove was_successful_. Done. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:294: CHECK_EQ(KERN_SUCCESS, resume_result) << "thread_resume failed"; On 2017/03/15 21:57:25, Robert Sesek wrote: > MACH_CHECK here would log the Mach error too. Done. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:313: enum { On 2017/03/16 03:08:46, Mark Mentovai wrote: > constexpr size_t This is in parallel to the windows version. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:380: if (!stack_copy_buffer_) On 2017/03/16 03:08:45, Mark Mentovai wrote: > The OOM killer should take care of this, we shouldn’t need to null-check > allocations. parallel to windows version. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:411: uintptr_t stack_size = stack_top - stack_bottom; On 2017/03/16 03:08:46, Mark Mentovai wrote: > Normally I’d say to DCHECK_LE(stack_bottom, stack_top) before you do this, but > since you don’t want to DCHECK here, I’ll instead say to fail and return if that > condition isn’t satisfied. Done. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:413: if (stack_size > kStackCopyBufferSize) On 2017/03/15 22:09:56, Mike Wittman wrote: > On 2017/03/15 21:57:25, Robert Sesek wrote: > > We may want monitoring here in case the room-to-grow you left stops being > > enough. > > We should be able to detect this server-side if we can assume that the other two > early returns are essentially never hit. Acknowledged. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:453: // No. On 2017/03/16 03:08:45, Mark Mentovai wrote: > On 2017/03/15 21:57:25, Robert Sesek wrote: > > NOTIMPLEMENTED() maybe? > > #error is best Windows silently returns a nullptr. Mike, WDYT? https://codereview.chromium.org/2702463003/diff/260001/chrome/common/stack_sa... File chrome/common/stack_sampling_configuration.cc (right): https://codereview.chromium.org/2702463003/diff/260001/chrome/common/stack_sa... chrome/common/stack_sampling_configuration.cc:8: #include "base/compiler_specific.h" On 2017/03/16 03:08:46, Mark Mentovai wrote: > I don’t know if you need this for anything, but "build/build_config.h" is needed > for OS_MACOSX, and while "base/compiler_specific.h" does #include that, you > ought to IWYU. Yes, that's the one that I meant. https://codereview.chromium.org/2702463003/diff/260001/chrome/common/stack_sa... chrome/common/stack_sampling_configuration.cc:23: #if defined(_WIN64) On 2017/03/16 03:08:46, Mark Mentovai wrote: > This is multi-platform code, use OS_WIN now. Done. https://codereview.chromium.org/2702463003/diff/260001/chrome/common/stack_sa... chrome/common/stack_sampling_configuration.cc:30: // This is experimental, so only run on trunk. On 2017/03/16 03:08:46, Mark Mentovai wrote: > “unknown” doesn’t really mean trunk, it just means non-Chrome-branded. And a > more direct way to say that is > > #if defined(GOOGLE_CHROME_BUILD) > return false; > #else > return true; > #endif Done.
https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:54: uintptr_t original_stack_bottom_int = On 2017/03/18 03:09:27, Avi wrote: > On 2017/03/16 03:08:46, Mark Mentovai wrote: > > I see these casts here, there’s another set of casts in the caller > > (CopyStackAndRewritePointers()), and there’s another set of casts in the thing > > that calls that (NativeStackSamplerMac::SuspendThreadAndRecordStack()). That’s > > indicative of some kind of problem, y’know? > > Boy am I aware of the zillion types here :( > > Sometimes I need to access these as pointers to 64-bit ints, so I need > uint64_t*. The interface to the rest of the stack sampler code is uintptr_t, and > sometimes I need the ints, uint64_t. I'm not sure what the best choice here is. The representational rules I tried to follow in the Windows code were: - dereferenceable memory addresses as void*/const void* - non-dereferenceable memory addresses (e.g. original stack extents after resuming the thread) as uintptr_t - conversion to/from pointer-to-sized-type for deferencing/pointer arithmetic - conversion to/from OS/StackSamplingProfiler types as necessary I think this makes the code somewhat clearer since there are only two memory address types in the interfaces, and there's a clear (and relevant) distinction between them. https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:453: // No. On 2017/03/18 03:09:27, Avi wrote: > On 2017/03/16 03:08:45, Mark Mentovai wrote: > > On 2017/03/15 21:57:25, Robert Sesek wrote: > > > NOTIMPLEMENTED() maybe? > > > > #error is best > > Windows silently returns a nullptr. Mike, WDYT? The calling code expects NativeStackSampler::Create to return null on platforms/architectures that don't have a sampler implemented. The Windows code is built for both 32 and 64 bit and needs to return null since sampling is not supported on 32 bit. We don't support building for 32-bit OS X at all, right? In that case I think we should just remove the ifdef. https://codereview.chromium.org/2702463003/diff/320001/chrome/common/stack_sa... File chrome/common/stack_sampling_configuration.cc (right): https://codereview.chromium.org/2702463003/diff/320001/chrome/common/stack_sa... chrome/common/stack_sampling_configuration.cc:31: #if defined(GOOGLE_CHROME_BUILD) Can you apply the same change to the OS_WIN section, for consistency?
Mark, Robert: I addressed your comments a week ago. Ping. Please review. https://codereview.chromium.org/2702463003/diff/320001/chrome/common/stack_sa... File chrome/common/stack_sampling_configuration.cc (right): https://codereview.chromium.org/2702463003/diff/320001/chrome/common/stack_sa... chrome/common/stack_sampling_configuration.cc:31: #if defined(GOOGLE_CHROME_BUILD) On 2017/03/20 19:21:30, Mike Wittman wrote: > Can you apply the same change to the OS_WIN section, for consistency? Done.
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
LGTM https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:122: ++(*frame_count); On 2017/03/18 03:09:27, Avi wrote: > On 2017/03/16 16:42:24, Mike Wittman wrote: > > On 2017/03/16 03:08:46, Mark Mentovai wrote: > > > It may be prudent to set a frame limit here to prevent deeply recursive > stacks > > > from causing this thing to spin for too long. > > > > This is probably a good idea as a backstop. On Windows the max stack depths > for > > the last month are all in the hundreds, with one unlucky case apparently > caught > > in the middle of a generating a stack overflow exception within injected third > > party module code (at 16038 frames!). > > > > So a 1K frame limit is probably reasonable. To ensure consistent analysis I'd > > prefer to apply the limit to Windows at the same time, so this would be best > > done in a follow-on CL for both platforms. > > Mark, would you be OK with this being in a followup CL? Certainly. https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:308: static constexpr size_t kStackCopyBufferSize = 12 * 1024 * 1024; You could also getrlimit(RLIMIT_STACK) to compute a reasonable size that’s probably in use.
https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:54: uintptr_t original_stack_bottom_int = On 2017/03/20 19:21:30, Mike Wittman wrote: > On 2017/03/18 03:09:27, Avi wrote: > > On 2017/03/16 03:08:46, Mark Mentovai wrote: > > > I see these casts here, there’s another set of casts in the caller > > > (CopyStackAndRewritePointers()), and there’s another set of casts in the > thing > > > that calls that (NativeStackSamplerMac::SuspendThreadAndRecordStack()). > That’s > > > indicative of some kind of problem, y’know? > > > > Boy am I aware of the zillion types here :( > > > > Sometimes I need to access these as pointers to 64-bit ints, so I need > > uint64_t*. The interface to the rest of the stack sampler code is uintptr_t, > and > > sometimes I need the ints, uint64_t. I'm not sure what the best choice here > is. > > The representational rules I tried to follow in the Windows code were: > - dereferenceable memory addresses as void*/const void* > - non-dereferenceable memory addresses (e.g. original stack extents after > resuming the thread) as uintptr_t > - conversion to/from pointer-to-sized-type for deferencing/pointer arithmetic > - conversion to/from OS/StackSamplingProfiler types as necessary > > I think this makes the code somewhat clearer since there are only two memory > address types in the interfaces, and there's a clear (and relevant) distinction > between them. It's up to you if you think it's worth following the above. But if not, can you at least use const types for all pointers to the original stack? https://codereview.chromium.org/2702463003/diff/320001/chrome/common/stack_sa... File chrome/common/stack_sampling_configuration.cc (right): https://codereview.chromium.org/2702463003/diff/320001/chrome/common/stack_sa... chrome/common/stack_sampling_configuration.cc:23: #if defined(OS_WIN) #if defined(OS_WIN) && defined(ARCH_CPU_X86_64) https://codereview.chromium.org/2702463003/diff/320001/chrome/common/stack_sa... chrome/common/stack_sampling_configuration.cc:25: const version_info::Channel channel = chrome::GetChannel(); Can we do this so we have the same approach wrt the unknown channel: #if defined(GOOGLE_CHROME_BUILD) const version_info::Channel channel = chrome::GetChannel(); return channel == version_info::Channel::CANARY || channel == version_info::Channel::DEV; #else return true; #endif
The CQ bit was checked by avi@chromium.org to run a CQ dry run
avi@chromium.org changed reviewers: + thakis@chromium.org
Nico, the config file? https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/260001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:54: uintptr_t original_stack_bottom_int = On 2017/03/27 19:48:46, Mike Wittman wrote: > On 2017/03/20 19:21:30, Mike Wittman wrote: > > On 2017/03/18 03:09:27, Avi wrote: > > > On 2017/03/16 03:08:46, Mark Mentovai wrote: > > > > I see these casts here, there’s another set of casts in the caller > > > > (CopyStackAndRewritePointers()), and there’s another set of casts in the > > thing > > > > that calls that (NativeStackSamplerMac::SuspendThreadAndRecordStack()). > > That’s > > > > indicative of some kind of problem, y’know? > > > > > > Boy am I aware of the zillion types here :( > > > > > > Sometimes I need to access these as pointers to 64-bit ints, so I need > > > uint64_t*. The interface to the rest of the stack sampler code is uintptr_t, > > and > > > sometimes I need the ints, uint64_t. I'm not sure what the best choice here > > is. > > > > The representational rules I tried to follow in the Windows code were: > > - dereferenceable memory addresses as void*/const void* > > - non-dereferenceable memory addresses (e.g. original stack extents after > > resuming the thread) as uintptr_t > > - conversion to/from pointer-to-sized-type for deferencing/pointer arithmetic > > - conversion to/from OS/StackSamplingProfiler types as necessary > > > > I think this makes the code somewhat clearer since there are only two memory > > address types in the interfaces, and there's a clear (and relevant) > distinction > > between them. > > It's up to you if you think it's worth following the above. But if not, can you > at least use const types for all pointers to the original stack? I tried modifying the pointers in NativeStackSamplerMac::SuspendThreadAndRecordStack as a test. The problem with using void* as "dereferenceable" pointer types is that you can't do arithmetic with them, and I ended up converting from whatever pointer to void* and then immediately being forced to drop back to the original types I had. :( Lemme do a const conversion. https://codereview.chromium.org/2702463003/diff/320001/chrome/common/stack_sa... File chrome/common/stack_sampling_configuration.cc (right): https://codereview.chromium.org/2702463003/diff/320001/chrome/common/stack_sa... chrome/common/stack_sampling_configuration.cc:23: #if defined(OS_WIN) On 2017/03/27 19:48:46, Mike Wittman wrote: > #if defined(OS_WIN) && defined(ARCH_CPU_X86_64) Done. https://codereview.chromium.org/2702463003/diff/320001/chrome/common/stack_sa... chrome/common/stack_sampling_configuration.cc:25: const version_info::Channel channel = chrome::GetChannel(); On 2017/03/27 19:48:46, Mike Wittman wrote: > Can we do this so we have the same approach wrt the unknown channel: > > #if defined(GOOGLE_CHROME_BUILD) > const version_info::Channel channel = chrome::GetChannel(); > return channel == version_info::Channel::CANARY || > channel == version_info::Channel::DEV; > #else > return true; > #endif Done. https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:308: static constexpr size_t kStackCopyBufferSize = 12 * 1024 * 1024; On 2017/03/27 18:24:55, Mark Mentovai wrote: > You could also getrlimit(RLIMIT_STACK) to compute a reasonable size that’s > probably in use. Right, but that's for the main thread and this is designed to work for all threads. We could do a full "if main thread then getrlimit else pthread_get_stacksize_np" deal if you want.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:308: static constexpr size_t kStackCopyBufferSize = 12 * 1024 * 1024; On 2017/03/29 17:52:09, Avi wrote: > On 2017/03/27 18:24:55, Mark Mentovai wrote: > > You could also getrlimit(RLIMIT_STACK) to compute a reasonable size that’s > > probably in use. > > Right, but that's for the main thread and this is designed to work for all > threads. We could do a full "if main thread then getrlimit else > pthread_get_stacksize_np" deal if you want. I'd like to keep this as a single buffer size that is large enough for pretty much any thread. We will be supporting parallel profile collections across multiple threads (https://codereview.chromium.org/2554123002) and intend to reuse a single buffer across all of them so we're not allocating multiple large memory buffers (https://codereview.chromium.org/2601633002).
https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:308: static constexpr size_t kStackCopyBufferSize = 12 * 1024 * 1024; On 2017/03/29 18:08:09, Mike Wittman wrote: > On 2017/03/29 17:52:09, Avi wrote: > > On 2017/03/27 18:24:55, Mark Mentovai wrote: > > > You could also getrlimit(RLIMIT_STACK) to compute a reasonable size that’s > > > probably in use. > > > > Right, but that's for the main thread and this is designed to work for all > > threads. We could do a full "if main thread then getrlimit else > > pthread_get_stacksize_np" deal if you want. > > I'd like to keep this as a single buffer size that is large enough for pretty > much any thread. We will be supporting parallel profile collections across > multiple threads (https://codereview.chromium.org/2554123002) and intend to > reuse a single buffer across all of them so we're not allocating multiple large > memory buffers (https://codereview.chromium.org/2601633002). Acknowledged.
config file lgtm
https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:308: static constexpr size_t kStackCopyBufferSize = 12 * 1024 * 1024; On 2017/03/29 17:52:09, Avi wrote: > On 2017/03/27 18:24:55, Mark Mentovai wrote: > > You could also getrlimit(RLIMIT_STACK) to compute a reasonable size that’s > > probably in use. > > Right, but that's for the main thread and this is designed to work for all > threads. We could do a full "if main thread then getrlimit else > pthread_get_stacksize_np" deal if you want. I don't really think a non-main thread would use more than this. In fact, I recall that we have code to use this as the default for threads we create.
https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:308: static constexpr size_t kStackCopyBufferSize = 12 * 1024 * 1024; On 2017/03/29 19:23:49, Mark Mentovai wrote: > On 2017/03/29 17:52:09, Avi wrote: > > On 2017/03/27 18:24:55, Mark Mentovai wrote: > > > You could also getrlimit(RLIMIT_STACK) to compute a reasonable size that’s > > > probably in use. > > > > Right, but that's for the main thread and this is designed to work for all > > threads. We could do a full "if main thread then getrlimit else > > pthread_get_stacksize_np" deal if you want. > > I don't really think a non-main thread would use more than this. In fact, I > recall that we have code to use this as the default for threads we create. Yes, I see the use of getrlimit in platform_thread_mac's GetDefaultThreadStackSize() so that we use RLIMIT_STACK for new threads. Mike, would you object if I made this switch?
https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/340001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:308: static constexpr size_t kStackCopyBufferSize = 12 * 1024 * 1024; On 2017/03/29 19:59:25, Avi wrote: > On 2017/03/29 19:23:49, Mark Mentovai wrote: > > On 2017/03/29 17:52:09, Avi wrote: > > > On 2017/03/27 18:24:55, Mark Mentovai wrote: > > > > You could also getrlimit(RLIMIT_STACK) to compute a reasonable size that’s > > > > probably in use. > > > > > > Right, but that's for the main thread and this is designed to work for all > > > threads. We could do a full "if main thread then getrlimit else > > > pthread_get_stacksize_np" deal if you want. > > > > I don't really think a non-main thread would use more than this. In fact, I > > recall that we have code to use this as the default for threads we create. > > Yes, I see the use of getrlimit in platform_thread_mac's > GetDefaultThreadStackSize() so that we use RLIMIT_STACK for new threads. > > Mike, would you object if I made this switch? It's still a single buffer size across all threads, so that works for me.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Mark, Mike, PTAL
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_TIMED_OUT, build hasn't started yet, builder probably lacks capacity)
LGTM otherwise https://codereview.chromium.org/2702463003/diff/380001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/380001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:49: // of 8 MB with extra wiggle room. Provide a comment saying where the default comes from so that it can be validated in the future (<i386/vmparam.h>’s DFLSSIZ.) https://codereview.chromium.org/2702463003/diff/380001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:377: StackCopyBufferSize(); No need to prime this separately, since it was already done in the initializer list. But what you should do is save the buffer size in a different member variable to protect against it changing (growing) while you’re at work.
LGTM Can you retest this and validate that the recovered stack percentage is still in the same ballpark as before? https://codereview.chromium.org/2702463003/diff/380001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/380001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:204: const uintptr_t kMaxScanDepth = 50; nit: call out that this is in bytes https://codereview.chromium.org/2702463003/diff/380001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:377: StackCopyBufferSize(); On 2017/03/30 02:33:26, Mark Mentovai wrote: > No need to prime this separately, since it was already done in the initializer > list. But what you should do is save the buffer size in a different member > variable to protect against it changing (growing) while you’re at work. I also endorse storing the value in a member variable, so that we don't have to call the function within the thread suspend block and be concerned about it taking locks.
The CQ bit was checked by avi@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Nothing changes re completeness. I just ran a sampling. The GPU process is in better shape, with two bad samples in the 300. The main process tends worse; this one time it was three bad samples in the 300. I'd like to see how it falls out in the real world. https://codereview.chromium.org/2702463003/diff/380001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/380001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:49: // of 8 MB with extra wiggle room. On 2017/03/30 02:33:26, Mark Mentovai wrote: > Provide a comment saying where the default comes from so that it can be > validated in the future (<i386/vmparam.h>’s DFLSSIZ.) Should I just #include that file and use DFLSSIZ? https://codereview.chromium.org/2702463003/diff/380001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:204: const uintptr_t kMaxScanDepth = 50; On 2017/03/30 16:56:46, Mike Wittman wrote: > nit: call out that this is in bytes Done, though that's a really weird depth. Fifty? Perhaps 16 words? That should be enough. https://codereview.chromium.org/2702463003/diff/380001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:377: StackCopyBufferSize(); On 2017/03/30 16:56:46, Mike Wittman wrote: > On 2017/03/30 02:33:26, Mark Mentovai wrote: > > No need to prime this separately, since it was already done in the initializer > > list. But what you should do is save the buffer size in a different member > > variable to protect against it changing (growing) while you’re at work. > > I also endorse storing the value in a member variable, so that we don't have to > call the function within the thread suspend block and be concerned about it > taking locks. Done.
LGTM https://codereview.chromium.org/2702463003/diff/380001/base/profiler/native_s... File base/profiler/native_stack_sampler_mac.cc (right): https://codereview.chromium.org/2702463003/diff/380001/base/profiler/native_s... base/profiler/native_stack_sampler_mac.cc:49: // of 8 MB with extra wiggle room. On 2017/03/30 20:13:38, Avi wrote: > On 2017/03/30 02:33:26, Mark Mentovai wrote: > > Provide a comment saying where the default comes from so that it can be > > validated in the future (<i386/vmparam.h>’s DFLSSIZ.) > > Should I just #include that file and use DFLSSIZ? If you want. You don’t really have to because the user view of DFLSSIZ doesn’t tell you anything about what its value was when the kernel was built, which is what matters. If DFLSSIZ ever grows, it’ll change in a future SDK and kernel version, but while we’re building with an older SDK, we’ll be picking up the older, smaller DFLSSIZ.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by avi@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from thakis@chromium.org, wittman@chromium.org Link to the patchset: https://codereview.chromium.org/2702463003/#ps400001 (title: "rev")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 400001, "attempt_start_ts": 1490975846756840, "parent_rev": "227382eefa95c38e6dfd924204554c9a06f9666a", "commit_rev": "965a47b7737c6f15238447d8d8fb7105183acc8a"}
Message was sent while issue was closed.
Description was changed from ========== NativeStackSampler implementation for Mac. Stack sampling on Mac implemented for StackSamplingProfiler to provide Chrome stack trace data from Mac users to UMA. BUG=531673 ========== to ========== NativeStackSampler implementation for Mac. Stack sampling on Mac implemented for StackSamplingProfiler to provide Chrome stack trace data from Mac users to UMA. BUG=531673 Review-Url: https://codereview.chromium.org/2702463003 Cr-Commit-Position: refs/heads/master@{#461145} Committed: https://chromium.googlesource.com/chromium/src/+/965a47b7737c6f15238447d8d8fb... ==========
Message was sent while issue was closed.
Committed patchset #21 (id:400001) as https://chromium.googlesource.com/chromium/src/+/965a47b7737c6f15238447d8d8fb...
Message was sent while issue was closed.
A revert of this CL (patchset #21 id:400001) has been created in https://codereview.chromium.org/2797513003/ by avi@chromium.org. The reason for reverting is: WebGL crashes. http://crbug.com/707474. |