Created: 7 years, 5 months ago by scherkus (not reviewing)
Modified: 7 years, 3 months ago
CC: chromium-reviews, erikwright+watch_chromium.org
Base URL: svn://svn.chromium.org/chrome/trunk/src
Visibility: Public.
Description
Update ReadProcMaps() to reflect lack of atomicity when reading /proc/self/maps.
Reading from procfs returns at most a page-sized amount of data. If the process has a larger-than-page-sized /proc/self/maps, we cannot guarantee that the virtual memory table hasn't changed while reading the entire contents from procfs.
In addition, ReadProcMaps() now stops reading as soon as it finds a gate VMA entry, to work around a scenario where the kernel would return duplicate entries (it turns out ThreadSanitizer v2 was very good at triggering said scenario).
BUG=258451
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=221570
Patch Set 1 #
Total comments: 1
Patch Set 2 : Use std::string::resize() #
Total comments: 5
Patch Set 3 : check for gate vma #
Total comments: 2
Patch Set 4 : #
Total comments: 4
Patch Set 5 : fixes #
Total comments: 2
Patch Set 6 : for SPEED #
Total comments: 7
Patch Set 7 : nits #
Messages
Total messages: 22 (0 generated)
glider: here's my take at it
mark: base/ OWNERS + peanut gallery

I believe the real culprit was doing allocations while reading. Both fixes feel a bit too magical for my liking, but perhaps that's the nature of the game when dealing with procfs :(

PS#1 uses read() as suggested in the bug; however, I noticed we have to loop over read() until it returns zero. For example, on Android it would only return ~4k bytes at a time as opposed to the entire contents of /proc/self/maps. (FYI I'm a noob when it comes to the low-level family of functions (read() et al.), so it's possible I'm Doing It Wrong(tm))

PS#2 uses std::string::resize() to accomplish a similar goal by preallocating enough space in the std::string in an attempt to avoid reallocations. The downside is we might return bad data if the std::string ended up reallocating. We could inspect the resulting capacity of the string ... but that feels a bit gross.

Both methods passed under tsan v2 for 5000+ iterations:

./out/Debug/base_unittests --gtest_repeat=-1 --gtest_break_on_failure --gtest_filter=ProcMapsTest.ReadProcMaps
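For reference, the read()-until-zero loop from PS#1 looks roughly like this. This is a sketch, not the actual patch; the function name, buffer size, and error handling are assumptions. It is also exactly the non-atomic pattern under discussion, since the VMA tree can change between loop iterations:

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <string>

// Sketch of the PS#1 approach: loop over read() until it returns 0.
// On Android each read() of /proc/self/maps returned only ~4 KiB.
bool ReadProcMapsChunked(std::string* proc_maps) {
  const int fd = open("/proc/self/maps", O_RDONLY);
  if (fd < 0)
    return false;

  proc_maps->clear();
  char buffer[4096];
  for (;;) {
    ssize_t bytes_read = read(fd, buffer, sizeof(buffer));
    if (bytes_read < 0) {
      if (errno == EINTR)
        continue;  // Interrupted by a signal; retry this read.
      close(fd);
      return false;
    }
    if (bytes_read == 0)
      break;  // EOF.
    proc_maps->append(buffer, bytes_read);
  }
  close(fd);
  return true;
}
```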
https://codereview.chromium.org/18661009/diff/2001/base/debug/proc_maps_linux.cc
File base/debug/proc_maps_linux.cc (right):

https://codereview.chromium.org/18661009/diff/2001/base/debug/proc_maps_linux...
base/debug/proc_maps_linux.cc:28: enum { kLargeProcMapsSize = 256 * 1024 };
Why an enum? Why not a const? And why is it out here? You can put it in ReadProcMaps, since that’s the only place it’s used.

https://codereview.chromium.org/18661009/diff/2001/base/debug/proc_maps_linux...
base/debug/proc_maps_linux.cc:37: proc_maps->reserve(kLargeProcMapsSize);
How is this supposed to help…?

https://codereview.chromium.org/18661009/diff/2001/base/debug/proc_maps_linux...
base/debug/proc_maps_linux.cc:40: return file_util::ReadFileToString(proc_maps_path, proc_maps);
…file_util::ReadFileToString still reads chunks of at most 64kB at a time. It doesn’t care about the size of the “contents” buffer. All the “reserve” does is make sure that the std::string::append calls don’t result in internal reallocations until at least 256kB has been read. It’s not enough to avoid reallocations, you really need to gobble up the entire contents of the file in a single read operation.
https://codereview.chromium.org/18661009/diff/2001/base/debug/proc_maps_linux.cc File base/debug/proc_maps_linux.cc (right): https://codereview.chromium.org/18661009/diff/2001/base/debug/proc_maps_linux... base/debug/proc_maps_linux.cc:40: return file_util::ReadFileToString(proc_maps_path, proc_maps); Maybe base::MemoryMappedFile could be used instead? This is still non-atomic, because the size of /proc/self/maps may change between the fstat() and mmap() calls, so we may need to pass some default size to MapFileToMemoryInternal.
https://codereview.chromium.org/18661009/diff/1/base/debug/proc_maps_linux.cc File base/debug/proc_maps_linux.cc (right): https://codereview.chromium.org/18661009/diff/1/base/debug/proc_maps_linux.cc... base/debug/proc_maps_linux.cc:50: read(fd, buffer.get() + offset, kMaxProcMapsSize - offset)); There must be only one read() call, otherwise we're busted if someone fragments the memory while we're reading the file chunk by chunk.
Here’s the story…

Both patch set 1 and patch set 2 aren’t thread-safe. But they seem to fix the problem, right? Why?

Memory allocations and deallocations may wind up changing the set of memory regions mapped in a process. They don’t always, but they do sometimes. I bet that when running under the sanitizers, it’s much more likely that the mapped regions will change during any arbitrary memory allocation or deallocation. As scherkus found, minimizing the reallocations done while reading /proc/self/maps mitigates the problem.

That’s a start, but it’s not [multi-]thread-safe, it’s really only single-thread-safe. /proc/self/maps now probably won’t change while a single thread is trying to read it, assuming no other threads. Changing the allocation stuff around like this doesn’t have anything to do with avoiding the maybe-yield that could happen during an allocation. Anyway, that’s not the only opportunity that exists for a thread to yield, and it’s possible for multiple threads to execute concurrently, so none of that hand-wavey stuff is too comforting.

Since we need to fix this anyway, let’s just fix it the right way and make it thread-safe. The only real solution to thread safety here is to read the entire file in one fell swoop with a single read call. How do you know how big a buffer you need? As I pointed out previously, you can’t use fstat (and anyway, the answer might change between your fstat and your read). The only real solution is to retry your read within a loop—if you didn’t grab the whole file in one pass, discard the fragment that you did read, reallocate your buffer, lseek the maps file to rewind it back to 0, and try to read the whole thing again. But how do you know that you’ve read the entire file and that you don’t need to go back for another read? The easy answer is that if you read anything smaller than the size of the buffer, you’ve read the whole thing up to EOF. But is that signal-safe?
POSIX allows a read that has been interrupted after it has produced some data to return a short read even though the buffer hasn’t been filled completely and the file’s not at EOF. If you care about this, you need to detect whether a short read was short because it read the entire map by doing an additional 1-byte read from the file. If it returns 0, you’re at EOF, and the map you just read is good. If it returns data, you read a partial map and were interrupted by a signal, so rewind the file, go back, and try again.

But in reality, can signals interrupt a /proc read? Remember, it’s not a real file and it’s not a real filesystem. On a real file, a signal might occur during a long read while the kernel is blocked waiting for input from a disk. On /proc/self/maps, where’s the opportunity for a signal to interrupt the read? I don’t think there is any, and honestly, the whole thing seems kind of ridiculous to me. But I’m not positive, and I’m not ruling out the possibility that even though this might not be possible with current kernels, it might become possible in the future.

https://codereview.chromium.org/18661009/diff/2001/base/debug/proc_maps_linux.cc
File base/debug/proc_maps_linux.cc (right):

https://codereview.chromium.org/18661009/diff/2001/base/debug/proc_maps_linux...
base/debug/proc_maps_linux.cc:40: return file_util::ReadFileToString(proc_maps_path, proc_maps);
Alexander Potapenko wrote:
> Maybe base::MemoryMappedFile could be used instead?
> This is still non-atomic, because the size of /proc/self/maps may change between
> the fstat() and mmap() calls, so we may need to pass some default size to
> MapFileToMemoryInternal.

/proc is not a regular filesystem. There are two problems with this approach.

1. You can’t use stat (or fstat) to figure out how big the file is, because there is no file. The kernel won’t assemble any data until you try to read the file, so it doesn’t even know the answer. st_size will always be 0 for this file, and for most or all other files in /proc.

2. You can’t mmap the file, because it doesn’t implement mmap (fs/proc/task_mmu.c proc_tid_maps_operations). You’ll get ENODEV. I don’t know what semantics you would expect of mmap, anyway, since this isn’t a real file.
On 2013/07/11 14:20:40, Mark Mentovai wrote:
> Here’s the story…
> [rest of message snipped; quoted in full above]

tl;dr: read() will return at most getpagesize() bytes because that's how seq_file works. This explains the ~4K bytes I'm seeing on Android + desktop Linux.

seq_read() works something like this:
1) kmalloc(PAGE_SIZE) a buffer
2) Call seq_file->op->start()
3) While the buffer has capacity...
   a) Call seq_file->op->show()
   b) Call seq_file->op->next() to advance the iterator
4) Call seq_file->op->stop()
5) Copy the buffer into the usermode buffer
6) Return the number of bytes copied into the usermode buffer

fs/seq_file.c: seq_read() http://lxr.linux.no/linux+v2.6.37/fs/seq_file.c#L132

seq_file->op in this case is the seq_operations struct filled out by fs/proc/task_mmu.c. Each time show() is called, fs/proc/task_mmu.c records the starting address of the last memory-mapped region it printed out as a hint for the next time start() is called. What this means is that each time we're forced to call read(), fs/proc/task_mmu.c will re-look up the last address in the rb_tree of mapped regions. If the rb_tree changed at all between calls to read(), then it's possible that we'll print out duplicate/extra information.

fs/proc/task_mmu.c: m_start() http://lxr.linux.no/linux+v2.6.37/fs/proc/task_mmu.c#L109
fs/proc/task_mmu.c: show_map() http://lxr.linux.no/linux+v2.6.37/fs/proc/task_mmu.c#L278

So ... knowing what we know about seq_file, is there anything we can realistically do in the interim aside from read()'ing until EOF and hoping it doesn't change? Should this be reported upstream?
We can try to stitch together the information from several read() calls. E.g., if there's a common address range in the buffers returned by two consecutive read() calls, we can take the contents of the first buffer up to that address range and then append the contents of the second buffer from that point. When the buffer contains a "[vsyscall]" record (on x86_64) or "[stack]"/"ffffe000" (on x86), it's fine to stop. This way we may get an inconsistent mapping, but it'll only contain the mappings that existed at the time one of the read() calls occurred.

BTW, there's a limit of 32767-5 mappings in the kernel, so we could potentially preallocate a buffer that's enough to hold /proc/self/maps (although it's quite a big buffer to allocate for such a task).
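The stitching idea can be sketched with a hypothetical helper (not code from the CL): if the second chunk starts with an address range that already appears in the first chunk, cut the first chunk at that entry and append the second chunk, so the overlapping entries appear only once. This assumes each chunk begins at a line boundary, which seq_file guarantees per the whole-entry discussion later in the thread:

```cpp
#include <string>

// Hypothetical sketch of stitching two /proc/self/maps chunks at a
// common address range. The address range is everything up to the
// first space on a line, e.g. "00400000-00452000".
std::string StitchChunks(const std::string& first,
                         const std::string& second) {
  const std::string range = second.substr(0, second.find(' '));
  size_t pos;
  if (first.compare(0, range.size(), range) == 0) {
    pos = 0;  // The overlap starts at the very first entry.
  } else {
    pos = first.find("\n" + range);
    if (pos == std::string::npos)
      return first + second;  // No overlap: plain concatenation.
    ++pos;  // Skip past the '\n' we matched on.
  }
  return first.substr(0, pos) + second;
}
```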
This is starting to get gross. The best I’m able to come up with that’s thread-safe given that ridiculous seq_file limitation is to freeze the process state: fork a child and maintain a pipe between the two processes, have the child send the parent SIGSTOP, then read its parent’s /proc/<ppid>/maps in full, send the parent SIGCONT, and write the contents to its side of the pipe. See? Gross. Otherwise, I guess glider’s suggestion will at least eliminate duplicates.
On 2013/07/12 14:57:10, Mark Mentovai wrote:
> This is starting to get gross.
> [rest of message snipped; quoted in full above]

tl;dr: there's a bug on systems that use a gate VMA (e.g., [vsyscall] on x86-64, [vectors] on ARM). I'm inclined not to do anything overly fancy, but rather to stitch together what we get, stop short when we find a gate VMA, and document in ReadProcMaps() that it ain't perfect. WDYT?

LONG VERSION

The original bug was that we found multiple [stack] entries. So ... how is that possible?

I played around with glider@'s suggestion of looking for [vsyscall] as a clue that we're done. I believe there's a bug in fs/proc/task_mmu.c where it incorrectly sets its "last address" hint to 0 as opposed to -1 after it reaches the special [vsyscall] case. What this means is that if N VMA entries are added before we call read() again, we will *not* hit EOF; instead fs/proc/task_mmu.c m_start() falls into this sequential scan loop and ends up re-printing the last N entries, including [vsyscall] again: http://lxr.linux.no/linux+v2.6.37/fs/proc/task_mmu.c#L151

In fact, it's possible to make read() spit out the gate VMA every time and never hit EOF as long as you keep creating one new virtual memory region between each call to read() (!)

AFAICT, this isn't a problem on systems that don't use a gate VMA (e.g., 32-bit x86), as the last-address hint will work as intended and any new regions created with an address smaller than the last address will not show up.
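A gate VMA check in the spirit of this workaround looks roughly like the following sketch (the committed version is similar in intent; see the later review comments about matching the leading space and trailing newline to avoid false positives on pathnames):

```cpp
#include <string>

// Sketch: does proc_maps contain a gate VMA entry at or after pos?
// The gate VMA's name depends on the architecture.
bool ContainsGateVMA(const std::string& proc_maps, size_t pos) {
#if defined(__x86_64__)
  // x86-64 has a [vsyscall] gate VMA.
  return proc_maps.find(" [vsyscall]\n", pos) != std::string::npos;
#elif defined(__arm__)
  // ARM has a [vectors] gate VMA.
  return proc_maps.find(" [vectors]\n", pos) != std::string::npos;
#else
  // No gate VMA on other architectures (e.g. 32-bit x86).
  return false;
#endif
}
```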
https://codereview.chromium.org/18661009/diff/18001/base/debug/proc_maps_linu... File base/debug/proc_maps_linux.cc (right): https://codereview.chromium.org/18661009/diff/18001/base/debug/proc_maps_linu... base/debug/proc_maps_linux.cc:59: ssize_t bytes_read = HANDLE_EINTR(read(fd, buffer.get(), kBufferSize)); I wonder what we should do in the case where the last line is split into two halves. In general the subsequent read() call isn't guaranteed to contain the other half.
Also, if finding a gate VMA turns out to be a problem, maybe we could check that the address ranges grow monotonically?
https://codereview.chromium.org/18661009/diff/18001/base/debug/proc_maps_linu... File base/debug/proc_maps_linux.cc (right): https://codereview.chromium.org/18661009/diff/18001/base/debug/proc_maps_linu... base/debug/proc_maps_linux.cc:59: ssize_t bytes_read = HANDLE_EINTR(read(fd, buffer.get(), kBufferSize)); On 2013/07/15 08:17:19, Alexander Potapenko wrote: > I wonder what should we do in the case the last line is split into two halves. > In general the subsequent read() call isn't guaranteed to contain the other > half. seq_file protects against that by only allowing whole entries to be copied to usermode at a time: http://lxr.linux.no/linux+v2.6.37/fs/seq_file.c#L224 (check out lines 234-235 -- basically if the result of calling show() fills up the entire seq_file internal buffer, it resets m->count and breaks out of the loop)
Finally got back to updating this... As discussed offline and in the bug, here's the cleaned-up CL that contains all the disclaimers and gory details, and includes the gate VMA workaround to avoid the one scenario where we get duplicate entries at the end.
https://codereview.chromium.org/18661009/diff/26001/base/debug/proc_maps_linu...
File base/debug/proc_maps_linux.cc (right):

https://codereview.chromium.org/18661009/diff/26001/base/debug/proc_maps_linu...
base/debug/proc_maps_linux.cc:32: return proc_maps->find("[vectors]", pos) != std::string::npos;
Wanna look for the trailing \n too? And one leading space, since that’ll always be there also?

https://codereview.chromium.org/18661009/diff/26001/base/debug/proc_maps_linu...
base/debug/proc_maps_linux.cc:68: size_t pos = proc_maps->size();
You haven’t documented that proc_maps should start out as empty. And I don’t think you should have to, either. So I’d just stick a proc_maps->clear() above the loop.
https://codereview.chromium.org/18661009/diff/26001/base/debug/proc_maps_linu... File base/debug/proc_maps_linux.cc (right): https://codereview.chromium.org/18661009/diff/26001/base/debug/proc_maps_linu... base/debug/proc_maps_linux.cc:32: return proc_maps->find("[vectors]", pos) != std::string::npos; On 2013/09/05 16:31:17, Mark Mentovai wrote: > Wanna look for the trailing \n too? And one leading space, since that’ll always > be there also? Done. https://codereview.chromium.org/18661009/diff/26001/base/debug/proc_maps_linu... base/debug/proc_maps_linux.cc:68: size_t pos = proc_maps->size(); On 2013/09/05 16:31:17, Mark Mentovai wrote: > You haven’t documented that proc_maps should start out as empty. And I don’t > think you should have to, either. So I’d just stick a proc_maps->clear() above > the loop. Done and test added.
LGTM. The comment is optional. If you make the suggested change, I’ll do a quick spot check again. Otherwise, feel free to check in. https://codereview.chromium.org/18661009/diff/35001/base/debug/proc_maps_linu... File base/debug/proc_maps_linux.cc (right): https://codereview.chromium.org/18661009/diff/35001/base/debug/proc_maps_linu... base/debug/proc_maps_linux.cc:58: ssize_t bytes_read = HANDLE_EINTR(read(fd, buffer.get(), kBufferSize)); I suppose it doesn’t really matter for this non-performance-critical code, but you could save the copy (in vector<>::append) and get rid of the distinct buffer variable by using proc_maps here. You’d just have to extend it by kBufferSize prior to the read and then resize it to the actual size after the read.
https://codereview.chromium.org/18661009/diff/35001/base/debug/proc_maps_linu... File base/debug/proc_maps_linux.cc (right): https://codereview.chromium.org/18661009/diff/35001/base/debug/proc_maps_linu... base/debug/proc_maps_linux.cc:58: ssize_t bytes_read = HANDLE_EINTR(read(fd, buffer.get(), kBufferSize)); On 2013/09/05 17:52:43, Mark Mentovai wrote: > I suppose it doesn’t really matter for this non-performance-critical code, but > you could save the copy (in vector<>::append) and get rid of the distinct buffer > variable by using proc_maps here. You’d just have to extend it by kBufferSize > prior to the read and then resize it to the actual size after the read. I like it. Done. https://codereview.chromium.org/18661009/diff/48001/base/debug/proc_maps_linu... File base/debug/proc_maps_linux.cc (right): https://codereview.chromium.org/18661009/diff/48001/base/debug/proc_maps_linu... base/debug/proc_maps_linux.cc:66: proc_maps->resize(pos); I don't know if this is worth it.
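The read-into-the-string suggestion amounts to something like the following sketch (names invented): grow the output string by kBufferSize before each read(), read straight into the freshly added tail, then shrink back to the bytes actually received, eliminating the separate buffer and the append() copy.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <string>

// Sketch of reading /proc/self/maps directly into the output string.
// std::string storage is contiguous, so reading into &(*proc_maps)[pos]
// is well defined.
bool ReadProcMapsInPlace(std::string* proc_maps) {
  const size_t kBufferSize = 1024 * 1024;
  const int fd = open("/proc/self/maps", O_RDONLY);
  if (fd < 0)
    return false;

  proc_maps->clear();
  for (;;) {
    const size_t pos = proc_maps->size();
    proc_maps->resize(pos + kBufferSize);  // Extend before the read.
    ssize_t bytes_read = read(fd, &(*proc_maps)[pos], kBufferSize);
    if (bytes_read < 0) {
      proc_maps->resize(pos);  // Drop the unused extension.
      if (errno == EINTR)
        continue;  // Interrupted by a signal; retry this read.
      close(fd);
      proc_maps->clear();
      return false;
    }
    proc_maps->resize(pos + bytes_read);  // Shrink to the actual size.
    if (bytes_read == 0)
      break;  // EOF.
  }
  close(fd);
  return true;
}
```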
LGTM

https://codereview.chromium.org/18661009/diff/48001/base/debug/proc_maps_linu...
File base/debug/proc_maps_linux.cc (right):

https://codereview.chromium.org/18661009/diff/48001/base/debug/proc_maps_linu...
base/debug/proc_maps_linux.cc:61: void* buffer = &proc_maps->front() + pos;
Simplify to &proc_maps[pos]?

https://codereview.chromium.org/18661009/diff/48001/base/debug/proc_maps_linu...
base/debug/proc_maps_linux.cc:66: proc_maps->resize(pos);
scherkus wrote:
> I don't know if this is worth it.

I think it is. Why return garbage? Either this resize(pos) or a clear() work for me.

https://codereview.chromium.org/18661009/diff/48001/base/debug/proc_maps_linu...
File base/debug/proc_maps_linux_unittest.cc (right):

https://codereview.chromium.org/18661009/diff/48001/base/debug/proc_maps_linu...
base/debug/proc_maps_linux_unittest.cc:240:
It’d be good if there was a test that ensured that ReadProcMaps was able to properly read maps larger than a page, so that you know your loop gets exercised in testing and you can validate that things from the first [and middle] and last reads show up in the output. I can’t come up with a great way to write that test case, though. I guess you could mmap a few hundred files and make sure their paths all show up in ReadProcMaps’ output.

I’m not actually asking you to write this test if you don’t want to, it’s probably not that important and I’d like to see this code get checked in after what you’ve been through for it.
https://codereview.chromium.org/18661009/diff/48001/base/debug/proc_maps_linu... File base/debug/proc_maps_linux.cc (right): https://codereview.chromium.org/18661009/diff/48001/base/debug/proc_maps_linu... base/debug/proc_maps_linux.cc:61: void* buffer = &proc_maps->front() + pos; On 2013/09/05 19:25:50, Mark Mentovai wrote: > Simplify to &proc_maps[pos]? Done, but tweaked a bit as proc_maps is a pointer https://codereview.chromium.org/18661009/diff/48001/base/debug/proc_maps_linu... base/debug/proc_maps_linux.cc:66: proc_maps->resize(pos); On 2013/09/05 19:25:50, Mark Mentovai wrote: > scherkus wrote: > > I don't know if this is worth it. > > I think it is. Why return garbage? Either this resize(pos) or a clear() work for > me. I like clear(). https://codereview.chromium.org/18661009/diff/48001/base/debug/proc_maps_linu... File base/debug/proc_maps_linux_unittest.cc (right): https://codereview.chromium.org/18661009/diff/48001/base/debug/proc_maps_linu... base/debug/proc_maps_linux_unittest.cc:240: On 2013/09/05 19:25:50, Mark Mentovai wrote: > It’d be good if there was a test that ensured that ReadProcMaps was able to > properly read maps larger than a page so that you know your loop gets exercised > in testing and you can validate that things from the first [and middle] and last > reads show up in the output. I can’t come up with a great way to write that test > case, though. I guess you could mmap a few hundred files and make sure their > paths all show up in ReadProcMaps’ output. > > I’m not actually asking you to write this test if you don’t want to, it’s > probably not that important and I’d like to see this code get checked in after > what you’ve been through for it. Yeah ... I'm inclined to pass as the ReadProcMaps test above should be sufficient for now. Debug base_unittests has a ~35KB /proc/self/maps so the loop is being covered. 
Also while I was iterating on this CL, the ReadProcMaps test was catching instances where I screwed up the string sizes 'n offsets.
LGTM! All done!
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/scherkus@chromium.org/18661009/55001
Message was sent while issue was closed.
Change committed as 221570