|
|
Created:
7 years, 10 months ago by jln (very slow on Chromium) Modified:
7 years, 10 months ago CC:
chromium-reviews, erikwright+watch_chromium.org, dmikurube+memory_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionLinux: grow a unique random mapping in ASLR
We loosen ASLR by only growing one random mapping. The previous version
had security benefits but had a negative performance impact.
This change aims to be performance neutral in respect to the pre-ASLR era.
At a later date, we may try to strike a good balance between performance and
security.
BUG=170133, 173371
NOTRY=true
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=180486
Patch Set 1 : #
Total comments: 8
Patch Set 2 : Address comments from Chris. #
Total comments: 12
Patch Set 3 : Address nits from Jim. #
Total comments: 3
Messages
Total messages: 14 (0 generated)
This CL restores page cycler performance to the pre-ASLR era by growing a unique mmap random area. Please add jar@ as a reviewer if you're happy with it. The hard part is the test. The previous test was really nice because we could iterate until we saw a difference. Now that it's not the case anymore, making a test that will not flake even after 1M runs is more difficult.
https://chromiumcodereview.appspot.com/12090112/diff/11001/third_party/tcmall... File third_party/tcmalloc/chromium/src/system-alloc.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/11001/third_party/tcmall... third_party/tcmalloc/chromium/src/system-alloc.cc:203: static void* address_hint = NULL; What's the threading story here? Is it possible for two threads to be in here at once? If not, maybe a comment declaring the lock that protects this? https://chromiumcodereview.appspot.com/12090112/diff/11001/third_party/tcmall... third_party/tcmalloc/chromium/src/system-alloc.cc:206: } Style: don't need braces for single-line if with single-line body. https://chromiumcodereview.appspot.com/12090112/diff/11001/third_party/tcmall... third_party/tcmalloc/chromium/src/system-alloc.cc:207: void* result = mmap(address_hint, length, PROT_READ|PROT_WRITE, Maybe some form of comment that the intent here is to map right at the end of the previous mapping, leading to the extension of the existing VMA instead of a new VMA? https://chromiumcodereview.appspot.com/12090112/diff/11001/third_party/tcmall... third_party/tcmalloc/chromium/src/system-alloc.cc:209: if (result != static_cast<void*>(MAP_FAILED) && is_aslr_enabled) { I think you can simplify this block to simply if (result != address_hint) address_hint = NULL; It'll work in the MAP_FAILED case and reset the hint for _any_ VMA collision.
Thanks, PTAL! https://chromiumcodereview.appspot.com/12090112/diff/11001/third_party/tcmall... File third_party/tcmalloc/chromium/src/system-alloc.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/11001/third_party/tcmall... third_party/tcmalloc/chromium/src/system-alloc.cc:203: static void* address_hint = NULL; On 2013/02/01 09:22:30, Chris Evans wrote: > What's the threading story here? Is it possible for two threads to be in here at > once? If not, maybe a comment declaring the lock that protects this? Done. https://chromiumcodereview.appspot.com/12090112/diff/11001/third_party/tcmall... third_party/tcmalloc/chromium/src/system-alloc.cc:206: } On 2013/02/01 09:22:30, Chris Evans wrote: > Style: don't need braces for single-line if with single-line body. The style guide is open on this. I usually always put them, I've been bitten by it before. But since I didn't do it below, you're right that I should be consistent. https://chromiumcodereview.appspot.com/12090112/diff/11001/third_party/tcmall... third_party/tcmalloc/chromium/src/system-alloc.cc:207: void* result = mmap(address_hint, length, PROT_READ|PROT_WRITE, On 2013/02/01 09:22:30, Chris Evans wrote: > Maybe some form of comment that the intent here is to map right at the end of > the previous mapping, leading to the extension of the existing VMA instead of a > new VMA? Isn't the comment below good enough ? Should I move the comment here? https://chromiumcodereview.appspot.com/12090112/diff/11001/third_party/tcmall... third_party/tcmalloc/chromium/src/system-alloc.cc:209: if (result != static_cast<void*>(MAP_FAILED) && is_aslr_enabled) { On 2013/02/01 09:22:30, Chris Evans wrote: > I think you can simplify this block to simply > > if (result != address_hint) > address_hint = NULL; > > It'll work in the MAP_FAILED case and reset the hint for _any_ VMA collision. But is that what we want ? If the goal is to grow existing areas as much as possible, this is not optimal. I was already tempted to do it, so you convinced me. It's closer to the balance we'll eventually get when we start skewing this more towards security later.
https://chromiumcodereview.appspot.com/12090112/diff/5011/base/security_unitt... File base/security_unittest.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/5011/base/security_unitt... base/security_unittest.cc:131: TEST(SecurityTest, ALLOC_TEST(RandomMemoryAllocations)) { I defer to @jar to check the test changes, since he developed a much better understanding of how it works during the first review. https://chromiumcodereview.appspot.com/12090112/diff/5011/third_party/tcmallo... File third_party/tcmalloc/chromium/src/system-alloc.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/5011/third_party/tcmallo... third_party/tcmalloc/chromium/src/system-alloc.cc:209: reinterpret_cast<uint64_t>(address_hint) & ~kRandomAddressMask)) { I'm not sure the extra complexity here is worth is. In the unlikely event we start allocating just before 0x40........ and then cross that line, do we really care? I like simplicity :)
https://chromiumcodereview.appspot.com/12090112/diff/5011/base/security_unitt... File base/security_unittest.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/5011/base/security_unitt... base/security_unittest.cc:131: TEST(SecurityTest, ALLOC_TEST(RandomMemoryAllocations)) { On 2013/02/01 19:03:23, Chris Evans wrote: > I defer to @jar to check the test changes, since he developed a much better > understanding of how it works during the first review. It's completely different though. The previous test was good because we could make many calls. In this case though, it's much much harder to get a decent test, because we would need to go through execve() to re-set the address. https://chromiumcodereview.appspot.com/12090112/diff/5011/third_party/tcmallo... File third_party/tcmalloc/chromium/src/system-alloc.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/5011/third_party/tcmallo... third_party/tcmalloc/chromium/src/system-alloc.cc:209: reinterpret_cast<uint64_t>(address_hint) & ~kRandomAddressMask)) { On 2013/02/01 19:03:23, Chris Evans wrote: > I'm not sure the extra complexity here is worth is. In the unlikely event we > start allocating just before 0x40........ and then cross that line, do we really > care? > > I like simplicity :) I think it's more correct. Our line happens to give us some margin, but that's an implementation detail. We certainly don't want to end up hitting the stack. Moreover, not having this makes it even harder to assess how reliable the test will be.
Jim, could you please take a look ?
Mostly comments on the test (which I'll waive if you agree to land better tests RSN). One question below. https://chromiumcodereview.appspot.com/12090112/diff/5011/base/security_unitt... File base/security_unittest.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/5011/base/security_unitt... base/security_unittest.cc:150: // the sophisticated allocators. Won't all allocations, even the early (small) classes be done via spans that came from your hinted mmap? As a result, you don't even need a big allocation. You should probably just ask for a real small allocation, to potentially get a sample from the first class allocated (in the first span??), and then do a largish allocation to get something more like the higher span. There are probably a lot of allocs and frees by now (during runs of unit tests), so I suspect that you can't get the top of the stack by any easy algorithm. Perhaps you want to add a testability hook, and call to get a peek at the current hint_address? https://chromiumcodereview.appspot.com/12090112/diff/5011/base/security_unitt... base/security_unittest.cc:160: // 2^15 to flake. nit: that's "only" 1 in 32K. If we run 1000 tests a day (a couple of linux and CrOS testers?), we'll flake once a month. Change radius to make it more like once in 10 years or more IMO. https://chromiumcodereview.appspot.com/12090112/diff/5011/third_party/tcmallo... File third_party/tcmalloc/chromium/src/system-alloc.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/5011/third_party/tcmallo... third_party/tcmalloc/chromium/src/system-alloc.cc:185: // The mmap top-down allocator will normally allocate below TASK_SIZE - gap, nit: This comment (lines 185-192) should be placed above line 140. https://chromiumcodereview.appspot.com/12090112/diff/5011/third_party/tcmallo... third_party/tcmalloc/chromium/src/system-alloc.cc:209: reinterpret_cast<uint64_t>(address_hint) & ~kRandomAddressMask)) { Do we have concerns about blocking stack growth? Isn't a stack allocated (anywhere), but not committed, and grown within that bound? If our hint is problematic, won't it just be ignored? The only thing we have to worry about (I think) is asking for an allocation of the bottom page... which can't happen because the hint will be NULL. Are there other real concerns on avoiding some addresses? +1 simplicity (delete line 209). On 2013/02/01 19:15:41, Julien Tinnes wrote: > On 2013/02/01 19:03:23, Chris Evans wrote: > > I'm not sure the extra complexity here is worth is. In the unlikely event we > > start allocating just before 0x40........ and then cross that line, do we > really > > care? > > > > I like simplicity :) > > I think it's more correct. Our line happens to give us some margin, but that's > an implementation detail. We certainly don't want to end up hitting the stack. > > Moreover, not having this makes it even harder to assess how reliable the test > will be.
Thanks PTAL! Re test: yes, let's introduce testing hooks in a next patch. It's the only sane way forward. A statistical test was very good before, because we could have large sample. With sample size 1, it turns out it's very hard :) https://chromiumcodereview.appspot.com/12090112/diff/5011/base/security_unitt... File base/security_unittest.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/5011/base/security_unitt... base/security_unittest.cc:150: // the sophisticated allocators. On 2013/02/01 22:31:27, jar wrote: > Won't all allocations, even the early (small) classes be done via spans that > came from your hinted mmap? As a result, you don't even need a big allocation. > > You should probably just ask for a real small allocation, to potentially get a > sample from the first class allocated (in the first span??), and then do a > largish allocation to get something more like the higher span. There are > probably a lot of allocs and frees by now (during runs of unit tests), so I > suspect that you can't get the top of the stack by any easy algorithm. I am not sure why yet, but that's not the case. Even 512K doesn't work. Note that I didn't touch the low level allocator yet (I'll probably make it use the same AllocWithMmap() function in a later revision). This particular version of this test has been reasonably well tested (both in positive and negative mode), in Debug and Release, and I think it's "ok-ish", but I do realize how clunky it looks. (Also I needed to test both with brk and mmap heap). > Perhaps you want to add a testability hook, and call to get a peek at the > current hint_address? Yes! I'm happy that you're suggesting it. I think a testing hook is the only sane way forward at this point. https://chromiumcodereview.appspot.com/12090112/diff/5011/base/security_unitt... base/security_unittest.cc:160: // 2^15 to flake. On 2013/02/01 22:31:27, jar wrote: > nit: that's "only" 1 in 32K. If we run 1000 tests a day (a couple of linux and > CrOS testers?), we'll flake once a month. > > Change radius to make it more like once in 10 years or more IMO. I would have loved too. But it doesn't work. I've spent a lot of time on this test trying to get it right. Since other tests can run beforehands (as you've noted before), we can end up surprisingly far from the "bottom of the mmap heap". https://chromiumcodereview.appspot.com/12090112/diff/5011/third_party/tcmallo... File third_party/tcmalloc/chromium/src/system-alloc.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/5011/third_party/tcmallo... third_party/tcmalloc/chromium/src/system-alloc.cc:185: // The mmap top-down allocator will normally allocate below TASK_SIZE - gap, On 2013/02/01 22:31:27, jar wrote: > nit: This comment (lines 185-192) should be placed above line 140. Done. https://chromiumcodereview.appspot.com/12090112/diff/5011/third_party/tcmallo... third_party/tcmalloc/chromium/src/system-alloc.cc:209: reinterpret_cast<uint64_t>(address_hint) & ~kRandomAddressMask)) { On 2013/02/01 22:31:27, jar wrote: > Do we have concerns about blocking stack growth? Isn't a stack allocated > (anywhere), but not committed, and grown within that bound? > > If our hint is problematic, won't it just be ignored? The only thing we have to > worry about (I think) is asking for an allocation of the bottom page... which > can't happen because the hint will be NULL. It's a little more complex than "ignored". It's "ignored" when it makes no sense (i.e. non x86_64 canonical address). Otherwise, the kernel tries to find something "close enough". "Collision with the stack" is not detected as problematic by the kernel afaik. > Are there other real concerns on avoiding some addresses? > > +1 simplicity (delete line 209). All concerns should be handled by this mask (and the comment explains the rationale). I would really love to keep this. In this particular version, it's not crucial, if this was changed by someone else and became more complex, it may become crucial and would be easy to overlook. This is my last attempt at keeping it. If you really don't like it, say so and I'll remove :)
https://chromiumcodereview.appspot.com/12090112/diff/15003/third_party/tcmall... File third_party/tcmalloc/chromium/src/system-alloc.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/15003/third_party/tcmall... third_party/tcmalloc/chromium/src/system-alloc.cc:209: reinterpret_cast<uint64_t>(address_hint) & ~kRandomAddressMask)) { Another way to look at this (which may help convince you) is: "do we already have a good hint ?" yes, do nothing : no, get a new hint. Should I create an intermediate bool to make that clear?
jim, any chance you could take a look today ? I really would like to address this perf regression.
I'd really like better testing, and as per agreement, the test will soon use some better testability hooks (e.g., ability to interrogate the hint? reset the hint?). LGTM to land current code, removing perf regression (hopefully). Please respond to the question below. https://chromiumcodereview.appspot.com/12090112/diff/15003/third_party/tcmall... File third_party/tcmalloc/chromium/src/system-alloc.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/15003/third_party/tcmall... third_party/tcmalloc/chromium/src/system-alloc.cc:146: // 0x7ffbf8000000. Question: In a 64 address space, why do they only use the lower 2^48 bits worth of virtual space (or at least, why are allocations all below the 2^47 mark per that constant?)
Thanks! I'll make sure I get this test in a decent shape ASAP. https://chromiumcodereview.appspot.com/12090112/diff/15003/third_party/tcmall... File third_party/tcmalloc/chromium/src/system-alloc.cc (right): https://chromiumcodereview.appspot.com/12090112/diff/15003/third_party/tcmall... third_party/tcmalloc/chromium/src/system-alloc.cc:146: // 0x7ffbf8000000. On 2013/02/04 18:34:08, jar wrote: > Question: In a 64 address space, why do they only use the lower 2^48 bits worth > of virtual space (or at least, why are allocations all below the 2^47 mark per > that constant?) It's an architectural limitation. It saves transistors in the processor. 64 bits addresses on x86_64 have to be "canonical", i.e. roughly a 64 bits sign-extended version of a 48 bits value. The high order bit set is reserved for the kernel (that's a Linux decision, but a very common one on this architecture).
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jln@chromium.org/12090112/15003
Message was sent while issue was closed.
Change committed as 180486 |