|
|
Chromium Code Reviews|
Created:
7 years, 6 months ago by digit1 Modified:
7 years, 6 months ago CC:
chromium-reviews, erikwright+watch_chromium.org, Lei Zhang Base URL:
https://chromium.googlesource.com/chromium/src.git@master Visibility:
Public. |
DescriptionImprove the implementation of atomic operations on Linux/ARM (including Android/ARM).
The previous patch at:
https://chromiumcodereview.appspot.com/10831358
actually regressed the performance of atomic operations on Linux/ARM systems, because
the GCC intrinsics (e.g. __sync_fetch_and_add) are very poorly implemented at the
moment (and also always provide a full barrier, even when the caller doesn't need it).
This replaces the implementation with a better version which:
- Uses inline assembly and LDRES/STREX instructions on ARMv6+, or the old
kernel helper cmpxchg implementation on ARMv5.
- Still uses the kernel helper memory barrier to optimize for single core
devices on ARMv6 and ARMv7, or ARMv5 binaries running on devices with
a higher architecture number.
- Provide truly barrier free compare-and-swap, swap and atomic-inc
operations.
On tested Android/ARM devices, this speeds up atomic increments by x2 to x3.
This indirectly speeds up other operations relying on it (e.g. scoped_refptr<>
or base::Bind()). For details, see:
https://docs.google.com/a/chromium.org/spreadsheet/ccc?key=0Arp73PHrzcIQdGNUd1NGYWlfY0dKWS1EZ2V6RThhZXc&usp=sharing
BUG=234215
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=205205
Patch Set 1 #
Total comments: 15
Patch Set 2 : fix NaCL builds and other issues. #Patch Set 3 : Add comment about Linux 2.6.24 #
Total comments: 11
Patch Set 4 : Really fix the NACL builds. #Patch Set 5 : Fix Acquire_CompareAndSwap() for ARMv5 #
Total comments: 23
Patch Set 6 : Address Mark's remarks #
Total comments: 1
Patch Set 7 : Fix typo #Patch Set 8 : Trivial rebase #Messages
Total messages: 26 (0 generated)
Here's a new implementation of atomic operations on ARM that should distinctly improve performance of these operations, and those that depend on them. I'd particularly appreciate your input on the implementation of Barrier_AtomicIncrement and the use of a single memory barrier operation. I'm convinced that it is correct, but I'm not the best expert on memory reordering issues, so would appreciate if someone could explain, if it is wrong, the exact reasons. Thanks.
https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... File base/atomicops_internals_arm_gcc.h (right): https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:164: " blx %5\n" yeah, this is very subtle the following does not mention it: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html I would probably ask over comp.programming.threads or something https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:168: : "cc"); add "memory" https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:199: typedef Atomic32 (*LinuxKernelCmpxchgFunc)(Atomic32 old_value, I would use int as the return type, it's 0/1 rather than atomic value. https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:199: typedef Atomic32 (*LinuxKernelCmpxchgFunc)(Atomic32 old_value, add a comment that this function lies with return value prior to 2.6.24: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b49... https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:214: prev_value = *ptr; it does not have dmb after the load (provided that we bail out of the loop), so it can not be used as acquire version
https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... File base/atomicops_internals_arm_gcc.h (right): https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:49: ((void (*)(void))LINUX_ARM_KERNEL_MEMORY_BARRIER)(); Shouldn't this have a compiler barrier to prevent code motion around it? e.g.: __asm__ __volatile__("" ::: "memory"); https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:71: #ifdef __thumb2__ I don't think this is needed, on ARM it blocks should just serve as an annotation. https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:264: inline void NoBarrier_Store(volatile Atomic32* ptr, Atomic32 value) { This is only atomic when aligned, right? A comment to that effect (if not an assert)? Same for below.
I defer to those who know more about ARM.
On Mon, Jun 3, 2013 at 10:29 PM, <jfb@chromium.org> wrote: > > https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... > File base/atomicops_internals_arm_gcc.h (right): > > https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... > base/atomicops_internals_arm_gcc.h:49: ((void > (*)(void))LINUX_ARM_KERNEL_MEMORY_BARRIER)(); > Shouldn't this have a compiler barrier to prevent code motion around it? > e.g.: > __asm__ __volatile__("" ::: "memory"); Opaque function call should act as a compiler barrier.
https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... File base/atomicops_internals_arm_gcc.h (right): https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:49: ((void (*)(void))LINUX_ARM_KERNEL_MEMORY_BARRIER)(); I believe that all function calls are implicit compiler barriers, hence this isn't needed. I've added a comment though to make it clearer. https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:71: #ifdef __thumb2__ As of today, it's definitely needed with version of gas that comes with the Android NDK being used to build Chromium on Android (NDK r8e and GCC 4.6), otherwise I get: {standard input}: Assembler messages: {standard input}:10708: Error: thumb conditional instruction should be in IT block -- `strexeq r0,r3,[r4]' {standard input}:10867: Error: thumb conditional instruction should be in IT block -- `strexeq r0,r3,[r4]' {standard input}:10990: Error: thumb conditional instruction should be in IT block -- `strexeq r1,r3,[r4]' I believe more recent versions of binutils/gas provide automatic insertion of IT instructions, but we're not using them yet. https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:164: " blx %5\n" On 2013/06/03 18:01:22, Dmitry Vyukov wrote: > yeah, this is very subtle > the following does not mention it: > http://www.cl.cam.ac.uk/%7Epes20/cpp/cpp0xmappings.html > I would probably ask over comp.programming.threads or something Good point, I've restored the two-memory-barriers implementation, and filed https://code.google.com/p/chromium/issues/detail?id=246514 to investigate this issue with lower priority. This will give ample time to determine what's the best option. I'll rerun my benchmarks and update the spreadsheet too. At first glance, we're still noticeably better than the two previous implementations anyway. https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:168: : "cc"); On 2013/06/03 18:01:22, Dmitry Vyukov wrote: > add "memory" Done, thanks. Also, I removed the superfluous "cc" annotations for the segments that don't need it (this code comes from a previous versions I wrote for a different project that used explicit labels / tests / branches, turns out it's generally better to let the compiler do this when using inline functions). https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:199: typedef Atomic32 (*LinuxKernelCmpxchgFunc)(Atomic32 old_value, On 2013/06/03 18:01:22, Dmitry Vyukov wrote: > I would use int as the return type, it's 0/1 rather than atomic value. Done. https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:214: prev_value = *ptr; On 2013/06/03 18:01:22, Dmitry Vyukov wrote: > it does not have dmb after the load (provided that we bail out of the loop), so > it can not be used as acquire version Fixed. Note that this part of the code comes from was in the Chromium source tree before the switch to gcc intrinsics. It's a bit scary this wasn't detected previously. It also seems the const_cast<> uses are obsolete, so I've removed them. https://chromiumcodereview.appspot.com/16335007/diff/1/base/atomicops_interna... base/atomicops_internals_arm_gcc.h:264: inline void NoBarrier_Store(volatile Atomic32* ptr, Atomic32 value) { On 2013/06/03 18:29:27, JF wrote: > This is only atomic when aligned, right? A comment to that effect (if not an > assert)? > > Same for below. Yes, I've added a comment to the source code. I don't think an assert is necessary given that the issue exists for other platforms too and isn't tested. (Besides, unaligned access results in a crash on certain Android devices, based on how the kernel is configured).
https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... File base/atomicops_internals_arm_gcc.h (right): https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... base/atomicops_internals_arm_gcc.h:142: : "memory"); I think that NoBarrier functions does not require "memory" clobber, because they properly enumerate all inputs/outputs. However, there is probably no code in Chromium for which is makes any performance difference (stats collection?), I would use "memory", "cc" everywhere just to be safe. https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... base/atomicops_internals_arm_gcc.h:159: // Available since Linux 2.6.24. Note that the first Android releases It was available before 2.6.24, but it was returning spurious failures (fails even if old_value==*ptr) https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... base/atomicops_internals_arm_gcc.h:176: return prev_value; This does not have acquire semantics either. We load *ptr, see that it's different from old_value and return. Caller can act based on the value returned from NoBarrier_CompareAndSwap(), e.g. if it's equal to 1, load some global var. This needs dmb in between.
https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... File base/atomicops_internals_arm_gcc.h (right): https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... base/atomicops_internals_arm_gcc.h:159: // Available since Linux 2.6.24. Note that the first Android releases Thanks. I'm changing this to "Available and reliable since..." https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... base/atomicops_internals_arm_gcc.h:176: return prev_value; I see. Is it correct that this remark should only apply to Acquire_CompareAndSwap()? I.e. that NoBarrier_CompareAndSwap() and Release_CompareAndSwap() don't need this extra dmb before the return?
https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... File base/atomicops_internals_arm_gcc.h (right): https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... base/atomicops_internals_arm_gcc.h:176: return prev_value; Also, please note that it looks like that the NoBarrier_CompareAndSwap() implementation in base/atomicops_internals_gcc.h (currently used on Android/ARM and NaCL) have the same "issue". A bit off-topic, but base/atomicops_internal_mips_gcc.h is even more interesting. The NoBarrier_CompareAndSwap() inline assembly also doesn't use any memory barrier, and Barrier_CompareAndSwap() places _compiler_ barriers before/after a call to NoBarrier_CompareAndSwap().
https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... File base/atomicops_internals_arm_gcc.h (right): https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... base/atomicops_internals_arm_gcc.h:176: return prev_value; On 2013/06/04 11:51:56, digit1 wrote: > I see. Is it correct that this remark should only apply to > Acquire_CompareAndSwap()? I.e. that NoBarrier_CompareAndSwap() and > Release_CompareAndSwap() don't need this extra dmb before the return? Yes, that's correct. But Acquire version does not need dbm, if pLinuxKernelCmpxchg() succeeds, because it includes the barrier. For this reason I prefer to do it the other way around, i.e. if the cheapest atomic RMW is Barrier, then implement the Barrier version and then forward Acquire/Release/NoBarrier to Barrier with a comment that there is no more efficient implementation. But if you forward Barrier into NoBarrier, that's kind of questionable. I know that it's implemented this way in atomicops for historical reasons. https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... base/atomicops_internals_arm_gcc.h:176: return prev_value; On 2013/06/04 12:00:37, digit1 wrote: > Also, please note that it looks like that the NoBarrier_CompareAndSwap() > implementation in base/atomicops_internals_gcc.h (currently used on Android/ARM > and NaCL) have the same "issue". > > A bit off-topic, but base/atomicops_internal_mips_gcc.h is even more > interesting. The NoBarrier_CompareAndSwap() inline assembly also doesn't use any > memory barrier, and Barrier_CompareAndSwap() places _compiler_ barriers > before/after a call to NoBarrier_CompareAndSwap(). I would not be surprised if there are bugs.
https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... File base/atomicops_internals_arm_gcc.h (right): https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... base/atomicops_internals_arm_gcc.h:142: : "memory"); Thanks for the details, I've put "cc" and "memory" to all fragments. https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... base/atomicops_internals_arm_gcc.h:159: // Available since Linux 2.6.24. Note that the first Android releases On 2013/06/04 11:51:56, digit1 wrote: > Thanks. I'm changing this to "Available and reliable since..." Done. https://chromiumcodereview.appspot.com/16335007/diff/13001/base/atomicops_int... base/atomicops_internals_arm_gcc.h:176: return prev_value; I've uploaded a patch where only Acquire_CompareAndSwap() performs the extra memory barrier before returning. Please let me know if this is still not correct.
I am not an expert in ARM, but LGTM FWIW
lgtm
Thank you, I've added a few owners and bbudge@ as reviewers.
And I've filed two issues to address the MIPS and gcc intrinsics issues discovered here. [1] https://code.google.com/p/chromium/issues/detail?id=246949 [2] https://code.google.com/p/chromium/issues/detail?id=246947
NaCl part lgtm. Thanks for fixing this.
https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... File base/atomicops_internals_arm_gcc.h (right): https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:15: // Memory barriers on ARM are funky, but the kernel is here to help: Thanks for this comment! https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:18: // all on this architecture, or when targetting its machine code. targeting https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:30: // single or multi-core. However, the kernel provide a useful helper provides https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:37: // See the comment in Barrier_AtomicIncrement() to see why it is There are two versions of Barrier_AtomicIncrement() and neither explains what this says will be explained. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:46: #define LINUX_ARM_KERNEL_MEMORY_BARRIER 0xffff0fa0 You can avoid the macro entirely (and thus the need for the #undef at the bottom). Just write the number inline on line 51, the same as you did on line 167. That should be fine, since you don’t use this value anywhere else in the file. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:48: inline void MemoryBarrier() { Make this static or put it in an unnamed namespace. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:51: ((void (*)(void))LINUX_ARM_KERNEL_MEMORY_BARRIER)(); This is C++, the second void is unnecessary, it can just be (). https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:147: #else Is there a C preprocessor macro you can use to indicate that you’re targeting ARMv5? If so, use it here #elif defined(__ARM_ARCH_5__) Then have an #else #error whatever #endif at the bottom of the sequence. That way, when ARMv15 or whatever comes out and someone tries to compile this code targeting that architecture, they won’t fall into the ARMv5 implementation that they probably don’t want to use. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:160: // releases used 2.6.29, and ChromeOS is currently at 3.4, iirc, so this “iirc” has no place in a comment. You have the luxury of being able to look this up or ask someone. “currently” also becomes really stale in comments. Future readers (or future digits) shouldn’t have to resort to version history to figure out when “currently” was written. If you write “currently,” also write a date or a Chrome milestone or something else to identify when the statement was true. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:162: // Linux/ARM distributions). You’re ending a parenthetical that you never started. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:166: LinuxKernelCmpxchgFunc pLinuxKernelCmpxchg __attribute__((weak)) = Isn’t this saying that base::subtle::pLinuxKernelCmpxchg is weak? You don’t care about that. You’re not expecting or wanting user code to override you. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:166: LinuxKernelCmpxchgFunc pLinuxKernelCmpxchg __attribute__((weak)) = The p prefix is uncommon in Chrome code. variables_are_normally_named_like_this. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:167: (LinuxKernelCmpxchgFunc)0xffff0fc0; The typedef and this pointer should both be in an unnamed namespace. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:167: (LinuxKernelCmpxchgFunc)0xffff0fc0; Make this const? You don’t want someone accidentally changing the value.
https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... File base/atomicops_internals_arm_gcc.h (right): https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:18: // all on this architecture, or when targetting its machine code. On 2013/06/05 13:34:31, Mark Mentovai wrote: > targeting Done. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:30: // single or multi-core. However, the kernel provide a useful helper On 2013/06/05 13:34:31, Mark Mentovai wrote: > provides Done. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:37: // See the comment in Barrier_AtomicIncrement() to see why it is This comment is obsolete (it was only useful for a previous patch that called the function through inlined assembly). I've removed it. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:46: #define LINUX_ARM_KERNEL_MEMORY_BARRIER 0xffff0fa0 Yes, again, this came from the previous patch were the macro was needed in the inline assembly. Removed. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:48: inline void MemoryBarrier() { Unfortunately, the function is declared in base/atomicops.h and thus must be in this namespace. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:51: ((void (*)(void))LINUX_ARM_KERNEL_MEMORY_BARRIER)(); On 2013/06/05 13:34:31, Mark Mentovai wrote: > This is C++, the second void is unnecessary, it can just be (). Done. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:147: #else I've done that. Note that the checks must be more complex because the toolchain actually doesn't define __ARM_ARCH_5__ for Android/ARMv5, but __ARM_ARCH_5TE__, I've updated it. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:166: LinuxKernelCmpxchgFunc pLinuxKernelCmpxchg __attribute__((weak)) = Yes, this is part of the original code that was recently removed, so I didn't touch it too much. I've changed all this to a more Chrome-y format. https://chromiumcodereview.appspot.com/16335007/diff/11003/base/atomicops_int... base/atomicops_internals_arm_gcc.h:167: (LinuxKernelCmpxchgFunc)0xffff0fc0; Same here.
LGTM I’m surprised by "cc", "memory" appearing everywhere they do, but it looks like you’ve already discussed that with dvyukov. https://chromiumcodereview.appspot.com/16335007/diff/31001/base/atomicops_int... File base/atomicops_internals_arm_gcc.h (right): https://chromiumcodereview.appspot.com/16335007/diff/31001/base/atomicops_int... base/atomicops_internals_arm_gcc.h:39: // when targetting ARMv5TE. targeting :)
On 2013/06/05 16:46:40, Mark Mentovai wrote: > https://chromiumcodereview.appspot.com/16335007/diff/31001/base/atomicops_int... > base/atomicops_internals_arm_gcc.h:39: // when targetting ARMv5TE. > targeting :) Argh! Fixed :)
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/digit@chromium.org/16335007/35001
Retried try job too often on mac_rel for step(s) content_browsertests http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=mac_rel&nu...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/digit@chromium.org/16335007/35001
Sorry for I got bad news for ya. Compile failed with a clobber build on win_rel. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_rel&nu... Your code is likely broken or HEAD is junk. Please ensure your code is not broken then alert the build sheriffs. Look at the try server FAQ for more details.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/digit@chromium.org/16335007/52001
Message was sent while issue was closed.
Change committed as 205205 |
