|
|
Created:
6 years, 7 months ago by earthdok Modified:
6 years, 6 months ago Reviewers:
Ken Russell (switch to Gerrit), Alexander Potapenko, jln (very slow on Chromium), mdempsky CC:
chromium-reviews, darin-cc_chromium.org, jam, jln+watch_chromium.org, piman+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionAdd sandbox support for AsanCoverage.
Support dumping coverage data generated by AsanCoverage from sandboxed processes. For the GPU process, we simply pre-open a file before engaging the sandbox. For the renderers, we use a helper process which we fork from the zygote. The helper process collects coverage data from renderers over a socket and writes it to a file. This allows an arbitrary number of renderers to share one output file.
With this change, GPU and renderers will write coverage data to .sancov.packed files which may contain data from multiple modules/processes. Previously, we created one .sancov file per process per module (and still do so for other processes).
Note that the new behavior takes effect regardless of whether the sandbox is actually enabled.
BUG=336212
R=jln@chromium.org
TBR=kbr@chromium.org
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=274248
Patch Set 1 #
Total comments: 44
Patch Set 2 : address most of jln@'s comments #
Total comments: 16
Patch Set 3 : address remaining comments #
Total comments: 6
Patch Set 4 : helper exits on 0 bytes received, zygote waits on it #
Total comments: 19
Patch Set 5 : address comments #Patch Set 6 : change TODO owner #Patch Set 7 : fix build error #
Total comments: 2
Messages
Total messages: 39 (0 generated)
Julien, please take a look. Thanks. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:439: //TODO: filename must include real pid Does the Zygote know its real pid?
Sergey: the most important feedback is about the LinuxSandbox API (let me know what you think) and about the fact that the helper should be sandboxed in the setuid sandbox as well. Matthew: we don't know the "outside" pid of the Zygote, do we? https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... File content/common/sandbox_linux/sandbox_linux.cc (right): https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.cc:137: void LinuxSandbox::SetSanitizerArgs(void *sanitizer_args) { style: "void* blah". https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.cc:137: void LinuxSandbox::SetSanitizerArgs(void *sanitizer_args) { Let's respect the order of declarations in the header file and put this After LimitAddressSpace(). https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... File content/common/sandbox_linux/sandbox_linux.h (right): https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.h:87: void SetSanitizerArgs(void *sanitizer_args); style: "void* blah" in Chromium. https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.h:87: void SetSanitizerArgs(void *sanitizer_args); Note: it's ok to just call it set_sanitizer_args() and define it inline, since it's a trivial mutator. https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.h:126: defined(LEAK_SANITIZER)) && defined(OS_LINUX) You're in _linux, so you can remove the last defined() https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.h:127: void *sanitizer_args_; style: "void* blah" in chromium. https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.h:127: void *sanitizer_args_; How about doing the following instead: 1. scoped_ptr<struct sanitizer_args> sanitizer_args_; 2. Allocate it and zero it in the constructor. 3. SetSanitizerArgs takes three arguments instead and sets the structure members (therefore, only sandbox_linux.cc needs to know about them). 4. __sanitizer_sandbox_on_notify uses sanitizer_args.release() to pass ownership to *SAN. 3. (alternative): have a simple sanitizer_args() const { return sanitizer_args_; } method and continue to do this inline. https://codereview.chromium.org/280303002/diff/1/content/gpu/gpu_main.cc File content/gpu/gpu_main.cc (right): https://codereview.chromium.org/280303002/diff/1/content/gpu/gpu_main.cc#newc... content/gpu/gpu_main.cc:438: memset(sanitizer_args, 0, sizeof(*sanitizer_args)); *sanitizer_args = {0}; instead? https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:434: int received_size = recv(socket_fd, buffer, kSanitizerMaxMessageLength, 0); HANDLE_EINTR() https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:439: //TODO: filename must include real pid On 2014/05/12 15:48:20, earthdok wrote: > Does the Zygote know its real pid? I don't think so. Matthew? Do you need it? https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:440: char filename[64] = "zygote.sancov.packed"; const char kZygoteSanCovFileName[] = "zygote.sancov.packed"; https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:441: file_fd = creat(filename, 0660); HANDLE_EINTR. More importantly, I don't like creat(2). It doesn't set O_EXCL. Do you want to just open with O_CREAT | O_EXCL?. Or maybe we actually do not want O_EXCL? How to handle existing files? Is O_TRUNC (which creat(2) does) what we want? https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:445: int written_size = write(file_fd, buffer, received_size); HANDLE_EINTR() https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:445: int written_size = write(file_fd, buffer, received_size); ssize_t https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:448: fsync(file_fd); PCHECK() https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:455: PCHECK(0 == socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds)) You're only using the socketpair in one direction. Please use shutdown(2) to shutdown the directions you don't need. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:461: close(fds[1]); PCHECK(0 == IGNORE_EINTR(close())); https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:466: close(fds[0]); PCHECK(0 ==... https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:501: int sancov_helper_fd = ForkSanitizerCoverageHelper(); As described in the bug, the Coverage Helper should be (setuid) sandboxed. Let's move this whole block after EnterLayerOneSandbox(), except for a small open(2) (or creat) to open the file descriptor you need, which should be right before EnterLayerOneSandbox(). Put it in a base::ScopedFD(). ForkSanitizerCoverageHelper() should take this ScopedFD as an argument (.Pass()), and this will make sure that it gets closed in the parent.
https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:457: int pid = fork(); pid_t
https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:439: //TODO: filename must include real pid On 2014/05/13 01:17:35, jln wrote: > On 2014/05/12 15:48:20, earthdok wrote: > > Does the Zygote know its real pid? > > I don't think so. Matthew? Correct, currently the zygote never learns its own real PID, only non-NaCl child processes do.
Julien: I've addressed most of the comments but there are still unresolved issues with the helper process (see below). Also there's one more problem with this code: __sanitizer_sandbox_notify() is called regardless of whether the sandbox is actually going to be enabled. I.e. SanCov enters "sandboxed" mode even with --no-sandbox. This is not a huge deal breaker, but it would make more sense not to do it. Is it possible/would it make sense to not call __sanitizer_sandbox_notify() in this case? https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... File content/common/sandbox_linux/sandbox_linux.cc (right): https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.cc:137: void LinuxSandbox::SetSanitizerArgs(void *sanitizer_args) { On 2014/05/13 01:17:35, jln wrote: > style: "void* blah". Done. https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.cc:137: void LinuxSandbox::SetSanitizerArgs(void *sanitizer_args) { On 2014/05/13 01:17:35, jln wrote: > Let's respect the order of declarations in the header file and put this After > LimitAddressSpace(). Done. https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... File content/common/sandbox_linux/sandbox_linux.h (right): https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.h:87: void SetSanitizerArgs(void *sanitizer_args); On 2014/05/13 01:17:35, jln wrote: > style: "void* blah" in Chromium. Done. https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.h:126: defined(LEAK_SANITIZER)) && defined(OS_LINUX) On 2014/05/13 01:17:35, jln wrote: > You're in _linux, so you can remove the last defined() Done. https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.h:127: void *sanitizer_args_; On 2014/05/13 01:17:35, jln wrote: > style: "void* blah" in chromium. Done. https://codereview.chromium.org/280303002/diff/1/content/common/sandbox_linux... content/common/sandbox_linux/sandbox_linux.h:127: void *sanitizer_args_; On 2014/05/13 01:17:35, jln wrote: > How about doing the following instead: > > 1. scoped_ptr<struct sanitizer_args> sanitizer_args_; > 2. Allocate it and zero it in the constructor. > 3. SetSanitizerArgs takes three arguments instead and sets the structure members > (therefore, only sandbox_linux.cc needs to know about them). > 4. __sanitizer_sandbox_on_notify uses sanitizer_args.release() to pass ownership > to *SAN. Note that *SAN doesn't actually take ownership of this struct. The drawback of this approach is that the interface (SetSanitizerArgs) would have to be changed every time new structure members are added, even though some callers might not care about setting them. > > 3. (alternative): have a simple sanitizer_args() const { return sanitizer_args_; > } method and continue to do this inline. I've implemented this solution, although I'm not sure I understood you right. https://codereview.chromium.org/280303002/diff/1/content/gpu/gpu_main.cc File content/gpu/gpu_main.cc (right): https://codereview.chromium.org/280303002/diff/1/content/gpu/gpu_main.cc#newc... content/gpu/gpu_main.cc:438: memset(sanitizer_args, 0, sizeof(*sanitizer_args)); On 2014/05/13 01:17:35, jln wrote: > *sanitizer_args = {0}; instead? Done. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:434: int received_size = recv(socket_fd, buffer, kSanitizerMaxMessageLength, 0); On 2014/05/13 01:17:35, jln wrote: > HANDLE_EINTR() Done. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:439: //TODO: filename must include real pid On 2014/05/13 01:17:35, jln wrote: > On 2014/05/12 15:48:20, earthdok wrote: > > Does the Zygote know its real pid? > > I don't think so. Matthew? > > Do you need it? The intention is to make the filename unique. Can we have multiple zygotes in the course of a single browser session? Also please see the comment to creat() below. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:440: char filename[64] = "zygote.sancov.packed"; On 2014/05/13 01:17:35, jln wrote: > const char kZygoteSanCovFileName[] = "zygote.sancov.packed"; Done. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:441: file_fd = creat(filename, 0660); On 2014/05/13 01:17:35, jln wrote: > HANDLE_EINTR. > > More importantly, I don't like creat(2). It doesn't set O_EXCL. Do you want to > just open with O_CREAT | O_EXCL?. Or maybe we actually do not want O_EXCL? How > to handle existing files? Is O_TRUNC (which creat(2) does) what we want? I wanted to do this the same way it's done in sanitizer code, which I assumed truncated pre-existing files. Upon closer inspection, it turns out that it actually appends to them (O_CREAT | O_WRONLY). More often than not, it doesn't matter because the filename includes real PID, which makes it unique-ish. In the situation where the filename is always the same, appending is the lesser evil. However it's still possible to have contention over the file. Here's a better solution: coverage file names should be of the form <name>.<unique_id>.sancov[.packed] where <name> is either "zygote" or <module>.<pid>. The code that opens the file would live in the sanitizer runtime and the zygote would call an interface function such as int __sanitizer_open_file_for_coverage(const char* name); The added benefit here is that the zygote no longer needs to know where to put the coverage files. This allows us to support something like ASAN_OPTIONS=coverage_dir=... For the time being I'm changing it to open(O_WRONLY | O_CREAT) to match the sanitizer behavior. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:445: int written_size = write(file_fd, buffer, received_size); On 2014/05/13 01:17:35, jln wrote: > HANDLE_EINTR() Done. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:448: fsync(file_fd); On 2014/05/13 01:17:35, jln wrote: > PCHECK() Done. Also added HANDLE_EINTR. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:455: PCHECK(0 == socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds)) On 2014/05/13 01:17:35, jln wrote: > You're only using the socketpair in one direction. > > Please use shutdown(2) to shutdown the directions you don't need. Done. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:457: int pid = fork(); On 2014/05/13 01:19:35, jln wrote: > pid_t Done. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:461: close(fds[1]); On 2014/05/13 01:17:35, jln wrote: > PCHECK(0 == IGNORE_EINTR(close())); Done. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:466: close(fds[0]); On 2014/05/13 01:17:35, jln wrote: > PCHECK(0 ==... Done. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:501: int sancov_helper_fd = ForkSanitizerCoverageHelper(); On 2014/05/13 01:17:35, jln wrote: > As described in the bug, the Coverage Helper should be (setuid) sandboxed. > > Let's move this whole block after EnterLayerOneSandbox(), except for a small > open(2) (or creat) to open the file descriptor you need, which should be right > before EnterLayerOneSandbox(). Put it in a base::ScopedFD(). > > ForkSanitizerCoverageHelper() should take this ScopedFD as an argument > (.Pass()), and this will make sure that it gets closed in the parent. There's just one problem: that would cause any ASan build of Chrome to unconditionally create a "zygote.sancov.packed" file in the current directory, even if coverage dumping is not enabled. This is not just ugly, it'll probably break for a lot of people who run chrome from non-writeable directories. Sanitizer code knows whether coverage dumping is enabled or not, but Chrome doesn't. A possible solution is to use the __sanitizer_open_file_for_coverage() function described above, and have it not open the file if coverage dumping is disabled. This is a bit ugly, and would require that function to have two different failure modes (because it could also genuinely fail to open the file), but that's the best solution I can come up with.
https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:439: //TODO: filename must include real pid On 2014/05/14 17:00:26, earthdok wrote: > The intention is to make the filename unique. Can we have multiple zygotes in > the course of a single browser session? Right now, no. I have some plans in the future to change that so NaCl will run its own zygotes, but if/when that happens, we can arrange to plumb some unique value/string down to here so we can guarantee the filenames are still unique.
On 2014/05/14 17:11:28, mdempsky wrote: > https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... > File content/zygote/zygote_main_linux.cc (right): > > https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... > content/zygote/zygote_main_linux.cc:439: //TODO: filename must include real pid > On 2014/05/14 17:00:26, earthdok wrote: > > The intention is to make the filename unique. Can we have multiple zygotes in > > the course of a single browser session? > > Right now, no. Just to clarify, a zygote cannot die and be restarted, right? Or are we talking about a weaker guarantee of "never more than one zygote simultaneously, but possibly multiple zygotes in succession"? > I have some plans in the future to change that so NaCl will run > its own zygotes, but if/when that happens, we can arrange to plumb some unique > value/string down to here so we can guarantee the filenames are still unique. We probably don't care about NaCl at all in this context.
On 2014/05/14 17:19:11, earthdok wrote: > Just to clarify, a zygote cannot die and be restarted, right? Or are we talking > about a weaker guarantee of "never more than one zygote simultaneously, but > possibly multiple zygotes in succession"? Correct, the zygote is only ever spawned at browser startup, and if it crashes the browser will be unable to spawn new renderers, etc.
On Wed, May 14, 2014 at 10:19 AM, <earthdok@chromium.org> wrote: > >> Just to clarify, a zygote cannot die and be restarted, right? Or are we > talking > about a weaker guarantee of "never more than one zygote simultaneously, but > possibly multiple zygotes in succession"? Yeah, the Zygote cannot die and be restarted. There is only one Zygote that you can care about per browser instance. However, if you want this to be unique, I would simply generate a random ID. This is much safer (and easier) than using PIDs. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:501: int sancov_helper_fd = ForkSanitizerCoverageHelper(); On 2014/05/14 17:00:26, earthdok wrote: > On 2014/05/13 01:17:35, jln wrote: > > As described in the bug, the Coverage Helper should be (setuid) sandboxed. > > > > Let's move this whole block after EnterLayerOneSandbox(), except for a small > > open(2) (or creat) to open the file descriptor you need, which should be right > > before EnterLayerOneSandbox(). Put it in a base::ScopedFD(). > > > > ForkSanitizerCoverageHelper() should take this ScopedFD as an argument > > (.Pass()), and this will make sure that it gets closed in the parent. > > There's just one problem: that would cause any ASan build of Chrome to > unconditionally create a "zygote.sancov.packed" file in the current directory, > even if coverage dumping is not enabled. This is not just ugly, it'll probably > break for a lot of people who run chrome from non-writeable directories. > > Sanitizer code knows whether coverage dumping is enabled or not, but Chrome > doesn't. A possible solution is to use the __sanitizer_open_file_for_coverage() > function described above, and have it not open the file if coverage dumping is > disabled. This is a bit ugly, and would require that function to have two > different failure modes (because it could also genuinely fail to open the file), > but that's the best solution I can come up with. It could be a nice property to execute a similar codepath whether or not coverage dumping is enabled. I would simply try to open the file ahead of time. Not CHECK() on failures, the ScopedFD could hold "-1". The check can instead be in place for when we need to write to it. Also the current initialization is too early. We should open the coverage right before EnterLayerOneSandbox (or in ZygotePreSandboxInit()) and spawn the helper process after EnterLayerOneSandbox() (also ideally we should make sure the init process in EnterLayerOneSandbox does close the coverage file - we can pass it a callback, but I can take care of this later - ). I think it's ok to move PreinitializeSandbox() right before EnterLayerOneSandbox(). If you really don't want to do the same thing when the sandbox is off and when the sandbox is on, must_enable_setuid_sandbox tells you reliably what will happen. I think we would deeply regret if the helper process was not (setuid) sandboxed. There is some attack surface because of signal handlers, but also sandboxing is great tool to make sure things stay clean and properly engineered. https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... File content/common/sandbox_linux/sandbox_linux.cc (right): https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... content/common/sandbox_linux/sandbox_linux.cc:121: #endif Maybe just put it below in the body of the constructor rather than in the initializer list then? https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... content/common/sandbox_linux/sandbox_linux.cc:148: __sanitizer_sandbox_on_notify(sanitizer_args()); Before the process exits, when the LinuxSandbox singleton gets destroyed, what happens if we delete the sanitizer_args? I would think that this needs to survive for the lifetime of the process, no? In that case I would .release() here to pass ownership (and *SAN itself can "leak" it, i.e. pass ownership to the kernel to bind it to the process' life time). https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... File content/common/sandbox_linux/sandbox_linux.h (right): https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... content/common/sandbox_linux/sandbox_linux.h:92: __sanitizer_sandbox_arguments* sanitizer_args() const { Since this is a scoped_ptr and we're not doing a trivial accessor, could you make this GetSanitizerArgs() instead? But see the further discussion about ownership of the pointer first. https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... content/zygote/zygote_main_linux.cc:451: PCHECK(0 == HANDLE_EINTR(fsync(file_fd))) I don't think fsync can EINTR. https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... content/zygote/zygote_main_linux.cc:469: << "Sanitizer coverage helper: child failed to close socket";; General remark for all the PCHECKs. Feel free to not put any messages here. These doesn't really happen in practice and line numbers are enough if it does. Your choice.
https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:501: int sancov_helper_fd = ForkSanitizerCoverageHelper(); On 2014/05/14 17:00:26, earthdok wrote: > On 2014/05/13 01:17:35, jln wrote: > > As described in the bug, the Coverage Helper should be (setuid) sandboxed. > > > > Let's move this whole block after EnterLayerOneSandbox(), except for a small > > open(2) (or creat) to open the file descriptor you need, which should be right > > before EnterLayerOneSandbox(). Put it in a base::ScopedFD(). > > > > ForkSanitizerCoverageHelper() should take this ScopedFD as an argument > > (.Pass()), and this will make sure that it gets closed in the parent. > > There's just one problem: that would cause any ASan build of Chrome to > unconditionally create a "zygote.sancov.packed" file in the current directory, > even if coverage dumping is not enabled. This is not just ugly, it'll probably > break for a lot of people who run chrome from non-writeable directories. > > Sanitizer code knows whether coverage dumping is enabled or not, but Chrome > doesn't. A possible solution is to use the __sanitizer_open_file_for_coverage() > function described above, and have it not open the file if coverage dumping is > disabled. This is a bit ugly, and would require that function to have two > different failure modes (because it could also genuinely fail to open the file), > but that's the best solution I can come up with. It could be a nice property to execute a similar codepath whether or not coverage dumping is enabled. I would simply try to open the file ahead of time. Not CHECK() on failures, the ScopedFD could hold "-1". The check can instead be in place for when we need to write to it. Also the current initialization is too early. We should open the coverage right before EnterLayerOneSandbox (or in ZygotePreSandboxInit()) and spawn the helper process after EnterLayerOneSandbox() (also ideally we should make sure the init process in EnterLayerOneSandbox does close the coverage file - we can pass it a callback, but I can take care of this later - ). I think it's ok to move PreinitializeSandbox() right before EnterLayerOneSandbox(). If you really don't want to do the same thing when the sandbox is off and when the sandbox is on, must_enable_setuid_sandbox tells you reliably what will happen. I think we would deeply regret if the helper process was not (setuid) sandboxed. There is some attack surface because of signal handlers, but also sandboxing is great tool to make sure things stay clean and properly engineered. https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... File content/common/sandbox_linux/sandbox_linux.cc (right): https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... content/common/sandbox_linux/sandbox_linux.cc:121: #endif Maybe just put it below in the body of the constructor rather than in the initializer list then? https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... content/common/sandbox_linux/sandbox_linux.cc:148: __sanitizer_sandbox_on_notify(sanitizer_args()); Before the process exits, when the LinuxSandbox singleton gets destroyed, what happens if we delete the sanitizer_args? I would think that this needs to survive for the lifetime of the process, no? In that case I would .release() here to pass ownership (and *SAN itself can "leak" it, i.e. pass ownership to the kernel to bind it to the process' life time). https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... File content/common/sandbox_linux/sandbox_linux.h (right): https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... content/common/sandbox_linux/sandbox_linux.h:92: __sanitizer_sandbox_arguments* sanitizer_args() const { Since this is a scoped_ptr and we're not doing a trivial accessor, could you make this GetSanitizerArgs() instead? But see the further discussion about ownership of the pointer first. https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... content/zygote/zygote_main_linux.cc:451: PCHECK(0 == HANDLE_EINTR(fsync(file_fd))) I don't think fsync can EINTR. https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... content/zygote/zygote_main_linux.cc:469: << "Sanitizer coverage helper: child failed to close socket";; General remark for all the PCHECKs. Feel free to not put any messages here. These doesn't really happen in practice and line numbers are enough if it does. Your choice.
Some replies below, will follow up with changes tomorrow. https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:501: int sancov_helper_fd = ForkSanitizerCoverageHelper(); On 2014/05/14 17:49:40, jln wrote: > On 2014/05/14 17:00:26, earthdok wrote: > > On 2014/05/13 01:17:35, jln wrote: > > > As described in the bug, the Coverage Helper should be (setuid) sandboxed. > > > > > > Let's move this whole block after EnterLayerOneSandbox(), except for a small > > > open(2) (or creat) to open the file descriptor you need, which should be > right > > > before EnterLayerOneSandbox(). Put it in a base::ScopedFD(). > > > > > > ForkSanitizerCoverageHelper() should take this ScopedFD as an argument > > > (.Pass()), and this will make sure that it gets closed in the parent. > > > > There's just one problem: that would cause any ASan build of Chrome to > > unconditionally create a "zygote.sancov.packed" file in the current directory, > > even if coverage dumping is not enabled. This is not just ugly, it'll probably > > break for a lot of people who run chrome from non-writeable directories. > > > > Sanitizer code knows whether coverage dumping is enabled or not, but Chrome > > doesn't. A possible solution is to use the > __sanitizer_open_file_for_coverage() > > function described above, and have it not open the file if coverage dumping is > > disabled. This is a bit ugly, and would require that function to have two > > different failure modes (because it could also genuinely fail to open the > file), > > but that's the best solution I can come up with. > > It could be a nice property to execute a similar codepath whether or not > coverage dumping is enabled. > > I would simply try to open the file ahead of time. Not CHECK() on failures, the > ScopedFD could hold "-1". The check can instead be in place for when we need to > write to it. That's not a bad idea. The only downside is that we no longer have the errno from open() at the point of the check. But we could print an error message from __sanitizer_open_file_for_coverage(). > Also the current initialization is too early. We should open the coverage right > before EnterLayerOneSandbox (or in ZygotePreSandboxInit()) and spawn the helper > process after EnterLayerOneSandbox() (also ideally we should make sure the init > process in EnterLayerOneSandbox does close the coverage file - we can pass it a > callback, but I can take care of this later - ). > > I think it's ok to move PreinitializeSandbox() right before > EnterLayerOneSandbox(). > > If you really don't want to do the same thing when the sandbox is off and when > the sandbox is on, must_enable_setuid_sandbox tells you reliably what will > happen. > > I think we would deeply regret if the helper process was not (setuid) sandboxed. > There is some attack surface because of signal handlers, but also sandboxing is > great tool to make sure things stay clean and properly engineered. https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... File content/common/sandbox_linux/sandbox_linux.cc (right): https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... content/common/sandbox_linux/sandbox_linux.cc:148: __sanitizer_sandbox_on_notify(sanitizer_args()); On 2014/05/14 17:49:41, jln wrote: > Before the process exits, when the LinuxSandbox singleton gets destroyed, what > happens if we delete the sanitizer_args? > > I would think that this needs to survive for the lifetime of the process, no? In > that case I would .release() here to pass ownership (and *SAN itself can "leak" > it, i.e. pass ownership to the kernel to bind it to the process' life time). In the current implementation, the structure is only read once, in __sanitizer_sandbox_on_notify(). We don't care about it after we leave that function. https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... content/zygote/zygote_main_linux.cc:451: PCHECK(0 == HANDLE_EINTR(fsync(file_fd))) On 2014/05/14 17:49:41, jln wrote: > I don't think fsync can EINTR. The POSIX standard does mention it: http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html The Linux man page doesn't, so I suspect that the Linux implementation differs in this regard, but I'm not 100% sure.
https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... content/zygote/zygote_main_linux.cc:449: PCHECK(written_size == received_size) You may want to loop here. It can definitely happen in practice that writing to the FS can return with a partial write.
https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... File content/common/sandbox_linux/sandbox_linux.cc (right): https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... content/common/sandbox_linux/sandbox_linux.cc:148: __sanitizer_sandbox_on_notify(sanitizer_args()); On 2014/05/14 19:21:47, earthdok wrote: > On 2014/05/14 17:49:41, jln wrote: > > Before the process exits, when the LinuxSandbox singleton gets destroyed, what > > happens if we delete the sanitizer_args? > > > > I would think that this needs to survive for the lifetime of the process, no? > In > > that case I would .release() here to pass ownership (and *SAN itself can > "leak" > > it, i.e. pass ownership to the kernel to bind it to the process' life time). > > In the current implementation, the structure is only read once, in > __sanitizer_sandbox_on_notify(). We don't care about it after we leave that > function. Ok. Let's have a GetSanitizerArgs() that .Pass() the scoped ptr then. And let's .reset() it after it's passed to __sanitizer_sandbox_on_notify(). This will serve as documentation of how ownership of the pointer works and also guard against any future use of that pointer by mistake.
https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... File content/common/sandbox_linux/sandbox_linux.cc (right): https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... content/common/sandbox_linux/sandbox_linux.cc:148: __sanitizer_sandbox_on_notify(sanitizer_args()); On 2014/05/14 23:26:30, jln wrote: > Ok. Let's have a GetSanitizerArgs() that .Pass() the scoped ptr then. And let's > .reset() it after it's passed to __sanitizer_sandbox_on_notify(). err, no need to .reset() manually, it will happen automatically once this function returns.
https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... content/zygote/zygote_main_linux.cc:433: char buffer[kSanitizerMaxMessageLength]; This is very large, let's get it off the stack.
https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/1/content/zygote/zygote_main_l... content/zygote/zygote_main_linux.cc:501: int sancov_helper_fd = ForkSanitizerCoverageHelper(); On 2014/05/14 17:49:40, jln wrote: > On 2014/05/14 17:00:26, earthdok wrote: > > On 2014/05/13 01:17:35, jln wrote: > > > As described in the bug, the Coverage Helper should be (setuid) sandboxed. > > > > > > Let's move this whole block after EnterLayerOneSandbox(), except for a small > > > open(2) (or creat) to open the file descriptor you need, which should be > right > > > before EnterLayerOneSandbox(). Put it in a base::ScopedFD(). > > > > > > ForkSanitizerCoverageHelper() should take this ScopedFD as an argument > > > (.Pass()), and this will make sure that it gets closed in the parent. > > > > There's just one problem: that would cause any ASan build of Chrome to > > unconditionally create a "zygote.sancov.packed" file in the current directory, > > even if coverage dumping is not enabled. This is not just ugly, it'll probably > > break for a lot of people who run chrome from non-writeable directories. > > > > Sanitizer code knows whether coverage dumping is enabled or not, but Chrome > > doesn't. A possible solution is to use the > __sanitizer_open_file_for_coverage() > > function described above, and have it not open the file if coverage dumping is > > disabled. This is a bit ugly, and would require that function to have two > > different failure modes (because it could also genuinely fail to open the > file), > > but that's the best solution I can come up with. > > It could be a nice property to execute a similar codepath whether or not > coverage dumping is enabled. > > I would simply try to open the file ahead of time. Not CHECK() on failures, the > ScopedFD could hold "-1". The check can instead be in place for when we need to > write to it. > > Also the current initialization is too early. We should open the coverage right > before EnterLayerOneSandbox (or in ZygotePreSandboxInit()) and spawn the helper > process after EnterLayerOneSandbox() (also ideally we should make sure the init > process in EnterLayerOneSandbox does close the coverage file - we can pass it a > callback, but I can take care of this later - ). > > I think it's ok to move PreinitializeSandbox() right before > EnterLayerOneSandbox(). > > If you really don't want to do the same thing when the sandbox is off and when > the sandbox is on, must_enable_setuid_sandbox tells you reliably what will > happen. > > I think we would deeply regret if the helper process was not (setuid) sandboxed. > There is some attack surface because of signal handlers, but also sandboxing is > great tool to make sure things stay clean and properly engineered. Done. Note that sanitizer args still have to be filled out before the call to PreinitializeSandbox(), which means the socket pair has to be created early as well. I'm tempted to use ScopedFD everywhere for consistency (including the socket FDs), but it seems like that would make the code a bit more awkward without any apparent security benefit. https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... File content/common/sandbox_linux/sandbox_linux.cc (right): https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... content/common/sandbox_linux/sandbox_linux.cc:121: #endif On 2014/05/14 17:49:41, jln wrote: > Maybe just put it below in the body of the constructor rather than in the > initializer list then? Done. https://codereview.chromium.org/280303002/diff/20001/content/common/sandbox_l... content/common/sandbox_linux/sandbox_linux.cc:148: __sanitizer_sandbox_on_notify(sanitizer_args()); On 2014/05/14 23:26:30, jln wrote: > On 2014/05/14 19:21:47, earthdok wrote: > > On 2014/05/14 17:49:41, jln wrote: > > > Before the process exits, when the LinuxSandbox singleton gets destroyed, > what > > > happens if we delete the sanitizer_args? > > > > > > I would think that this needs to survive for the lifetime of the process, > no? > > In > > > that case I would .release() here to pass ownership (and *SAN itself can > > "leak" > > > it, i.e. pass ownership to the kernel to bind it to the process' life time). > > > > In the current implementation, the structure is only read once, in > > __sanitizer_sandbox_on_notify(). We don't care about it after we leave that > > function. > > Ok. Let's have a GetSanitizerArgs() that .Pass() the scoped ptr then. And let's > .reset() it after it's passed to __sanitizer_sandbox_on_notify(). > > This will serve as documentation of how ownership of the pointer works and also > guard against any future use of that pointer by mistake. Making this accessor .Pass() the pointer is incorrect if we want to keep using it to set structure members in ZygoteMain() and gpu::StartSandboxLinux(). I've added a reset() call. https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... content/zygote/zygote_main_linux.cc:433: char buffer[kSanitizerMaxMessageLength]; On 2014/05/14 23:42:24, jln wrote: > This is very large, let's get it off the stack. Done. https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... content/zygote/zygote_main_linux.cc:449: PCHECK(written_size == received_size) On 2014/05/14 21:21:04, jln wrote: > You may want to loop here. It can definitely happen in practice that writing to > the FS can return with a partial write. Done. https://codereview.chromium.org/280303002/diff/20001/content/zygote/zygote_ma... content/zygote/zygote_main_linux.cc:469: << "Sanitizer coverage helper: child failed to close socket";; On 2014/05/14 17:49:41, jln wrote: > General remark for all the PCHECKs. Feel free to not put any messages here. > These doesn't really happen in practice and line numbers are enough if it does. > > Your choice. Done.
This looks almost ready, outside of recv() returning EOF. We need to think carefully about process life-time: - In the current state of the Zygote, it's ok to "forget" about the new helper process. We can let init reap it. But it shouldn't spin() forever when the socketpair is closed everywhere. - Be careful on the other end (in *SAN) to not block on write() / send() forever if the socketpair is not being consumed by a helper. You could either do a non-blocking send() there, or to a poll() with a reasonable timeout before writing. https://chromiumcodereview.appspot.com/280303002/diff/40001/content/common/sa... File content/common/sandbox_linux/sandbox_linux.cc (right): https://chromiumcodereview.appspot.com/280303002/diff/40001/content/common/sa... content/common/sandbox_linux/sandbox_linux.cc:145: __sanitizer_sandbox_on_notify(sanitizer_args()); You can simply do sanitizer_args_.release() here. https://chromiumcodereview.appspot.com/280303002/diff/40001/content/zygote/zy... File content/zygote/zygote_main_linux.cc (right): https://chromiumcodereview.appspot.com/280303002/diff/40001/content/zygote/zy... content/zygote/zygote_main_linux.cc:415: recv(socket_fd, buffer.get(), kSanitizerMaxMessageLength, 0)); Once the all renderers have exited and the Zygote follows, recv() should return 0 since |socket_fd| will EOF. Currently this process will spin forever. We should simply _exit(0) on received_size == 0 I think. https://chromiumcodereview.appspot.com/280303002/diff/40001/content/zygote/zy... content/zygote/zygote_main_linux.cc:433: static void CreateSanitizerCoverageSocketPair(int fds[2]) { How are you handling this socketpair on the *SAN side? Are you performing a non-blocking send()? You could also fcntl O_NONBLOCK the writing end here for extra safety. I fear some kind of deadlock if, for instance write() on L22 fails and the helper process dies: the socketpair will become full and processes writing to it will block forever. https://chromiumcodereview.appspot.com/280303002/diff/40001/content/zygote/zy... content/zygote/zygote_main_linux.cc:449: // In the parent. Could you add a note along the lines of: - We will never wait for this child. If the child exits, it will be reparented to init(1) once the Zygote exits and reaped. (At some point we may want to account for this child in the Zygote, but for now, keeping all of this hidden is nice). https://chromiumcodereview.appspot.com/280303002/diff/40001/content/zygote/zy... content/zygote/zygote_main_linux.cc:453: #endif // defined(...) https://chromiumcodereview.appspot.com/280303002/diff/40001/content/zygote/zy... content/zygote/zygote_main_linux.cc:491: #endif // defined(ADDRESS_SANITIZER)
On 2014/05/21 01:07:37, jln wrote: > This looks almost ready, outside of recv() returning EOF. > > We need to think carefully about process life-time: > > - In the current state of the Zygote, it's ok to "forget" about the new helper > process. We can let init reap it. But it shouldn't spin() forever when the > socketpair is closed everywhere. It appears that, although non-sandboxed runs behave as you described, in sandboxed runs the helper *does not* spin and in fact terminates before ever receiving a 0 byte msg. I suspect that the SUID sandbox does something to forcefully terminate the process during browser shutdown. If that is the case, I'm concerned that the helper could die before writing the last chunk. That would be most unfortunate. > - Be careful on the other end (in *SAN) to not block on write() / send() forever > if the socketpair is not being consumed by a helper. You could either do a > non-blocking send() there, or to a poll() with a reasonable timeout before > writing. I do it with a simple write(). http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/sanitizer_common/sa... Introducing send() would be an undesirable complication. I'll take a look at poll() though. > https://chromiumcodereview.appspot.com/280303002/diff/40001/content/common/sa... > File content/common/sandbox_linux/sandbox_linux.cc (right): > > https://chromiumcodereview.appspot.com/280303002/diff/40001/content/common/sa... > content/common/sandbox_linux/sandbox_linux.cc:145: > __sanitizer_sandbox_on_notify(sanitizer_args()); > You can simply do sanitizer_args_.release() here. > > https://chromiumcodereview.appspot.com/280303002/diff/40001/content/zygote/zy... > File content/zygote/zygote_main_linux.cc (right): > > https://chromiumcodereview.appspot.com/280303002/diff/40001/content/zygote/zy... > content/zygote/zygote_main_linux.cc:415: recv(socket_fd, buffer.get(), > kSanitizerMaxMessageLength, 0)); > Once the all renderers have exited and the Zygote follows, recv() should return > 0 since |socket_fd| will EOF. > > Currently this process will spin forever. We should simply _exit(0) on > received_size == 0 I think. > > https://chromiumcodereview.appspot.com/280303002/diff/40001/content/zygote/zy... > content/zygote/zygote_main_linux.cc:433: static void > CreateSanitizerCoverageSocketPair(int fds[2]) { > How are you handling this socketpair on the *SAN side? > > Are you performing a non-blocking send()? You could also fcntl O_NONBLOCK the > writing end here for extra safety. > > I fear some kind of deadlock if, for instance write() on L22 fails and the > helper process dies: the socketpair will become full and processes writing to it > will block forever. > > https://chromiumcodereview.appspot.com/280303002/diff/40001/content/zygote/zy... > content/zygote/zygote_main_linux.cc:449: // In the parent. > Could you add a note along the lines of: > > - We will never wait for this child. If the child exits, it will be reparented > to init(1) once the Zygote exits and reaped. > > (At some point we may want to account for this child in the Zygote, but for now, > keeping all of this hidden is nice). > > https://chromiumcodereview.appspot.com/280303002/diff/40001/content/zygote/zy... > content/zygote/zygote_main_linux.cc:453: #endif > // defined(...) > > https://chromiumcodereview.appspot.com/280303002/diff/40001/content/zygote/zy... > content/zygote/zygote_main_linux.cc:491: #endif > // defined(ADDRESS_SANITIZER)
On 2014/05/21 14:38:38, earthdok wrote: > On 2014/05/21 01:07:37, jln wrote: > > This looks almost ready, outside of recv() returning EOF. > > > > We need to think carefully about process life-time: > > > > - In the current state of the Zygote, it's ok to "forget" about the new helper > > process. We can let init reap it. But it shouldn't spin() forever when the > > socketpair is closed everywhere. > It appears that, although non-sandboxed runs behave as you described, in > sandboxed runs the helper *does not* spin and in fact terminates before ever > receiving a 0 byte msg. > > I suspect that the SUID sandbox does something to forcefully terminate the > process during browser shutdown. If that is the case, I'm concerned that the > helper could die before writing the last chunk. That would be most unfortunate. Ahh, right. The reason is that the writing end is actually never closed everywhere: we forgot to close it in the init process spawned by the sandbox. The lifetime of the helper process is tied to that of the PID namespace (just like any renderer will be, so that's not a regression). Once the Zygote exits (after all renderers exit), the init process will exit() and the helper will be killed. You can pass a base::Closure() to the function that starts the init reaper to close both ends of the socketpair there. Another thing to pay attention to in the renderers: I mentioned a blocking write, but SIGPIPE could also happen (and terminate them) if the helper died early (there is a flag in send() to prevent SIGPIPE).
On 2014/05/21 14:50:00, jln wrote: > On 2014/05/21 14:38:38, earthdok wrote: > > On 2014/05/21 01:07:37, jln wrote: > > > This looks almost ready, outside of recv() returning EOF. > > > > > > We need to think carefully about process life-time: > > > > > > - In the current state of the Zygote, it's ok to "forget" about the new > helper > > > process. We can let init reap it. But it shouldn't spin() forever when the > > > socketpair is closed everywhere. > > It appears that, although non-sandboxed runs behave as you described, in > > sandboxed runs the helper *does not* spin and in fact terminates before ever > > receiving a 0 byte msg. > > > > I suspect that the SUID sandbox does something to forcefully terminate the > > process during browser shutdown. If that is the case, I'm concerned that the > > helper could die before writing the last chunk. That would be most > unfortunate. > > Ahh, right. The reason is that the writing end is actually never closed > everywhere: we forgot to close it in the init process spawned by the sandbox. > > The lifetime of the helper process is tied to that of the PID namespace (just > like any renderer will be, so that's not a regression). Once the Zygote exits > (after all renderers exit), the init process will exit() and the helper will be > killed. > > You can pass a base::Closure() to the function that starts the init reaper to > close both ends of the socketpair there. > > Another thing to pay attention to in the renderers: I mentioned a blocking > write, but SIGPIPE could also happen (and terminate them) if the helper died > early (there is a flag in send() to prevent SIGPIPE). To be more clear: if the helper died early and we closed the reading end in the init process properly. Also note that the time-window to have the helper process "spin" on recv is tiny: it will be after the last renderer terminates, but before the main Zygote process (which holds the writing end as well) is reaped by init (which will cause init to exit and the namespace to be destroyed). So even after closing the writing end properly in init(1), you may not be able to reproduce the issue. But it's still true that we should simply _exit() on EOF.
On 2014/05/21 14:50:00, jln wrote: > On 2014/05/21 14:38:38, earthdok wrote: > > On 2014/05/21 01:07:37, jln wrote: > > > This looks almost ready, outside of recv() returning EOF. > > > > > > We need to think carefully about process life-time: > > > > > > - In the current state of the Zygote, it's ok to "forget" about the new > helper > > > process. We can let init reap it. But it shouldn't spin() forever when the > > > socketpair is closed everywhere. > > It appears that, although non-sandboxed runs behave as you described, in > > sandboxed runs the helper *does not* spin and in fact terminates before ever > > receiving a 0 byte msg. > > > > I suspect that the SUID sandbox does something to forcefully terminate the > > process during browser shutdown. If that is the case, I'm concerned that the > > helper could die before writing the last chunk. That would be most > unfortunate. > > Ahh, right. The reason is that the writing end is actually never closed > everywhere: we forgot to close it in the init process spawned by the sandbox. > > The lifetime of the helper process is tied to that of the PID namespace (just > like any renderer will be, so that's not a regression). Once the Zygote exits > (after all renderers exit), the init process will exit() and the helper will be > killed. Right, so the helper could be killed at a point in time where all the renderers have exited but it has not finished processing their messages. That would cause us to lose some coverage data. Looks like we can't avoid having to reap the helper process. > Another thing to pay attention to in the renderers: I mentioned a blocking > write, but SIGPIPE could also happen (and terminate them) if the helper died > early (there is a flag in send() to prevent SIGPIPE). It probably doesn't matter whether coverage dumping errors are fatal or not, since anyway it happens during renderer termination.
On 2014/05/21 17:06:30, earthdok wrote: > Right, so the helper could be killed at a point in time where all the renderers > have exited but it has not finished processing their messages. That would cause > us to lose some coverage data. > Looks like we can't avoid having to reap the helper process. Ok. How about having the Zygote constructor take a std::vector of pid_t to wait for. Then in Zygote::HandleRequestFromBrowser(), we can wait on all of them before exiting. We may want to set-up a "watchdog" thread like I did in child/child_thread.cc (CreateWaitAndExitThread()) to add a reasonable timeout.
On 2014/05/22 20:51:34, jln wrote: > On 2014/05/21 17:06:30, earthdok wrote: > > Right, so the helper could be killed at a point in time where all the > renderers > > have exited but it has not finished processing their messages. That would > cause > > us to lose some coverage data. > > Looks like we can't avoid having to reap the helper process. > > Ok. How about having the Zygote constructor take a std::vector of pid_t to wait > for. > > Then in Zygote::HandleRequestFromBrowser(), we can wait on all of them before > exiting. > > We may want to set-up a "watchdog" thread like I did in child/child_thread.cc > (CreateWaitAndExitThread()) to add a reasonable timeout. PTAL. I've addressed most of the issues, except I don't spawn a watchdog thread. The code that does it needs to be factored out of child/child_thread.cc. Do you have any suggestions as to where to put it? Notes: - Rietveld is unable to recognize my rebase for some reason, so the patchset contains some minor unrelated changes. - the sanitizer-side change has been rolled into Chromium. I committed those changes before you suggested using poll() on the sanitizer side, so currently the code attempts to write to the socket unconditionally. I think we should still land this, without waiting for the next clang roll. Currently the only user of this is CF, which I think would kill any processes which are left hanging, anyway.
https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_li... File content/zygote/zygote_linux.cc (left): https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_li... content/zygote/zygote_linux.cc:149: // EOF from the browser. We should die. Note to self: call __asan_cov_dump() to dump coverage from the zygote itself at this point.
https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... File content/zygote/zygote_linux.cc (right): https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... content/zygote/zygote_linux.cc:153: for (std::vector<int>::iterator it = extra_fds_.begin(); Would you mind adding a TODO(earthdok) to replace the "note to self"? https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... content/zygote/zygote_linux.cc:154: it < extra_fds_.end(); it++) Style: always use {} for multi-line if/for https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... content/zygote/zygote_linux.cc:154: it < extra_fds_.end(); it++) Nit: ++it is more idiomatic than it++ as it never requires storing a temporary. https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... content/zygote/zygote_linux.cc:158: it < extra_children_.end(); it++) Style: {} https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... content/zygote/zygote_linux.cc:158: it < extra_children_.end(); it++) Nit: ++it https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... content/zygote/zygote_linux.cc:159: PCHECK(*it == HANDLE_EINTR(waitpid(*it, NULL, 0))); I'm a little scared of the implicit requirement that all renderers are exiting at this point. It should work, but it's something that could easily regress. I think we'll really need to add a watchdog thread here for production code. ASAN is the only user of this for now, but I fear something else will use it in the future. How about something such as: #if !defined(ADDRESS_SANITIZER) // TODO(jln): add watchdog thread before using this in non-ASAN builds. CHECK(extra_children.empty()); #endif https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... File content/zygote/zygote_linux.h (right): https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... content/zygote/zygote_linux.h:131: // before the Zygote exits. Let's add more documentation: "The Zygote will perform a blocking wait on these children, so they must be guaranteed to be exiting by the time the Zygote exits.". https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... File content/zygote/zygote_main_linux.cc (right): https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... content/zygote/zygote_main_linux.cc:549: // completes its I/O tasks. Let's add more information about the lifetime of sancov_helper_pid. Something along the lines of: "|sancov_helper_pid| will exit once the last renderer holding the write end of sancov_socket_fds closes it."
https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_li... File content/zygote/zygote_linux.cc (right): https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_li... content/zygote/zygote_linux.cc:153: for (std::vector<int>::iterator it = extra_fds_.begin(); On 2014/05/27 23:15:00, jln wrote: > Would you mind adding a TODO(earthdok) to replace the "note to self"? I was planning to add the call as part of this CL, I just didn't have the time to do it yesterday. I realize now that __sanitizer_cov_dump() actually closes the file descriptor before returning. Thus if we choose to call it here, we won't need extra_fds_ at all. I'm not sure I'm happy with introducing this sort of implicit dependency (i.e. if __sanitizer_cov_dump forgets to close the FD, we'll be stuck waiting on the helper) but it's probably ok as long as we document it as part of the interface. WDYT? (I'd still need to fix the sanitizer side to close the FD even if coverage is not enabled, so I cannot implement this just yet. Added a TODO) https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_li... content/zygote/zygote_linux.cc:154: it < extra_fds_.end(); it++) On 2014/05/27 23:15:00, jln wrote: > Style: always use {} for multi-line if/for Done. https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_li... content/zygote/zygote_linux.cc:154: it < extra_fds_.end(); it++) On 2014/05/27 23:15:00, jln wrote: > Nit: ++it is more idiomatic than it++ as it never requires storing a temporary. Done. https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_li... content/zygote/zygote_linux.cc:158: it < extra_children_.end(); it++) On 2014/05/27 23:15:00, jln wrote: > Style: {} Done. https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_li... content/zygote/zygote_linux.cc:158: it < extra_children_.end(); it++) On 2014/05/27 23:15:00, jln wrote: > Nit: ++it Done. https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_li... content/zygote/zygote_linux.cc:159: PCHECK(*it == HANDLE_EINTR(waitpid(*it, NULL, 0))); On 2014/05/27 23:15:00, jln wrote: > I'm a little scared of the implicit requirement that all renderers are exiting > at this point. It should work, but it's something that could easily regress. > > I think we'll really need to add a watchdog thread here for production code. > ASAN is the only user of this for now, but I fear something else will use it in > the future. > > How about something such as: > > #if !defined(ADDRESS_SANITIZER) > // TODO(jln): add watchdog thread before using this in non-ASAN builds. > CHECK(extra_children.empty()); > #endif Like I said, I'm willing to implement the watchdog thread if you can suggest a good home for the watchdog code (which will be shared between this class and SuicideOnChannelErrorFilter). I think it's best to do that in a separate CL, so I've added your suggested CHECK for now. https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_li... File content/zygote/zygote_linux.h (right): https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_li... content/zygote/zygote_linux.h:131: // before the Zygote exits. On 2014/05/27 23:15:00, jln wrote: > Let's add more documentation: > > "The Zygote will perform a blocking wait on these children, so they must be > guaranteed to be exiting by the time the Zygote exits.". Done. https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_ma... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/90001/content/zygote/zygote_ma... content/zygote/zygote_main_linux.cc:549: // completes its I/O tasks. On 2014/05/27 23:15:00, jln wrote: > Let's add more information about the lifetime of sancov_helper_pid. > > Something along the lines of: "|sancov_helper_pid| will exit once the last > renderer holding the write end of sancov_socket_fds closes it." Done.
lgtm https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... File content/zygote/zygote_linux.cc (right): https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... content/zygote/zygote_linux.cc:153: for (std::vector<int>::iterator it = extra_fds_.begin(); On 2014/05/28 16:44:51, earthdok wrote: > On 2014/05/27 23:15:00, jln wrote: > > Would you mind adding a TODO(earthdok) to replace the "note to self"? > > I was planning to add the call as part of this CL, I just didn't have the time > to do it yesterday. > > I realize now that __sanitizer_cov_dump() actually closes the file descriptor > before returning. Thus if we choose to call it here, we won't need extra_fds_ at > all. I'm not sure I'm happy with introducing this sort of implicit dependency > (i.e. if __sanitizer_cov_dump forgets to close the FD, we'll be stuck waiting on > the helper) but it's probably ok as long as we document it as part of the > interface. WDYT? Yeah, it would be nice to get rid of extra_fds_. If you can document the behavior very clearly as part of the interface, and make sure that it's unit tested (we could even unit test it in a Chrome unit test), it seems fine. https://chromiumcodereview.appspot.com/280303002/diff/90001/content/zygote/zy... content/zygote/zygote_linux.cc:159: PCHECK(*it == HANDLE_EINTR(waitpid(*it, NULL, 0))); On 2014/05/28 16:44:51, earthdok wrote: > I think it's best to do that in a separate CL, so I've added your suggested > CHECK for now. I wonder if we could simply use base::Watchdog for this. CreateWaitAndExitThread() creates a non-joinable thread, which base::Watchdog wouldn't do, but I don't think this is a big deal. I probably only did that due to an abundance of caution. We could simply leak the Watchdog object. If there is a case for not using base::Watchdog, content/common/wait_and_exit_thread_linux.cc could be a reasonable home in that case, or maybe base/threading/, depending on how generic that reason seems. It would be great if you could take care of it. Feel free to change TODO(jln) to TODO(earthdok) in that case :)
Don't forget to change your commit message please!
The CQ bit was checked by earthdok@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/earthdok@chromium.org/280303002/130001
The CQ bit was checked by earthdok@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/earthdok@chromium.org/280303002/150001
+kbr@ for content/gpu/ review
The CQ bit was unchecked by earthdok@chromium.org
Message was sent while issue was closed.
Committed patchset #7 manually as r274248 (presubmit successful).
Message was sent while issue was closed.
content/gpu lgtm
Message was sent while issue was closed.
https://codereview.chromium.org/280303002/diff/150001/content/zygote/zygote_m... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/150001/content/zygote/zygote_m... content/zygote/zygote_main_linux.cc:439: PCHECK(0 == socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds)); BTW have you considered using base::SyncSocket?
Message was sent while issue was closed.
https://codereview.chromium.org/280303002/diff/150001/content/zygote/zygote_m... File content/zygote/zygote_main_linux.cc (right): https://codereview.chromium.org/280303002/diff/150001/content/zygote/zygote_m... content/zygote/zygote_main_linux.cc:439: PCHECK(0 == socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds)); Or, better, base::UnixDomainSocket. |