|
|
Created:
8 years, 8 months ago by Chris Evans Modified:
8 years, 8 months ago Reviewers:
Ken Russell (switch to Gerrit), Will Drewry, Kees Cook, Markus (顧孟勤), cevans, jam, apatrick_chromium, Jorge Lucangeli Obes, jln (very slow on Chromium), apatrick, Chris Evans CC:
chromium-reviews, joi+watch-content_chromium.org, darin-cc_chromium.org, apatrick_chromium Base URL:
svn://svn.chromium.org/chrome/trunk/src/ Visibility:
Public. |
DescriptionAdd an initial Linux GPU sandbox using the seccomp filter framework.
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=132266
Patch Set 1 #Patch Set 2 : #Patch Set 3 : #Patch Set 4 : #Patch Set 5 : #Patch Set 6 : #
Total comments: 19
Patch Set 7 : #Patch Set 8 : #
Total comments: 11
Patch Set 9 : #
Total comments: 1
Patch Set 10 : #
Messages
Total messages: 43 (0 generated)
Julien, if you can concentrate on the large new file and get it to lgtm, I'll use jam@ later for an OWNERS lgtm and kbr@ / apatrick@ for the small gpu/ changes.
Ken / Al, can you have a look for the (pretty small!) change to gpu_main.cc ?
Awesome! https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... File content/common/sandbox_init_linux.cc (right): https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:54: unsigned int syscall = context->uc_mcontext.gregs[REG_RAX]; I guess this is easier than wrangling asm/siginfo? What other info will end up in the dump? Do you want to prime the crash dump with other data or are you hoping unwinding will cover all that? https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:133: static void EmitKill(std::vector<struct sock_filter>* program) { nit: Any reason not to call it EmitLoggedKill? Then if you add a SECCOMP_RET_KILL call, it'll just be EmitKill :) https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:143: static void ApplyGPUPolicy(std::vector<struct sock_filter>* program) { Is the plan to tighten this up further later? https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:199: } An authoritative check would be: ret = prctl(PR_SET_SECCOMP, 2, NULL); if (ret != 0 && errno == EFAULT) return true; return false; but no_new_privs is a good guard too since it is required. The only ugly bit is that now your CanUse function has side effects :) https://chromiumcodereview.appspot.com/10051022/diff/3009/content/gpu/gpu_mai... File content/gpu/gpu_main.cc (right): https://chromiumcodereview.appspot.com/10051022/diff/3009/content/gpu/gpu_mai... content/gpu/gpu_main.cc:111: if (gpu_info.vendor_id == 0x1002) { // ATI Any chance we can get intel in here?
I am working on a more generic sandbox that will allow us to protect the renderers. But the code isn't quite ready for review yet (most notably, we I don't have support for 32bit, yet). My main problem is that I need to support sendmsg() and SysV IPC. Both of them have interesting security problems. And they are unfortunately not easy to sandbox. Having said all of that, the basic code looks similar to what you are doing. I have two minor comments that might help you: https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... File content/common/sandbox_init_linux.cc (right): https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:143: static void ApplyGPUPolicy(std::vector<struct sock_filter>* program) { My ultimate plan for efficient syscall filtering is actually to build a tree that does a binary search. On the other hand, I am not sure if optimizing the way that we test for system calls is actually worth the trouble. Has anybody bothered benchmarking this code? Does the order of the tests show up in benchmarks? https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:199: } If you don't mind the minimal cost of doing an extra fork(), a more thorough test looks like this: int Sandbox::supportsSeccompSandbox(int proc_self) { pid_t pid = fork(); if (pid < 0) { die("Failed to check for sandbox support"); } if (!pid) { static const struct sock_filter filter[] = { // If the architecture doesn't match SECCOMP_ARCH, disallow the // system call. BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct arch_seccomp_data, arch)), BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, SECCOMP_ARCH, 0, 3), // Check the system call number. The only allowed call is exit_group() BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct arch_seccomp_data, nr)), BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 0, 1), BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL), }; // Try to install filter. If we succeed, return success. const struct sock_fprog prog = { ARRAYSIZE(filter), (struct sock_filter *)filter }; if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == 0 && prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0) { syscall(__NR_exit_group, (intptr_t)0); } _exit(1); } int status; TEMP_FAILURE_RETRY(waitpid(pid, &status, 0)); return WIFEXITED(status) && !WEXITSTATUS(status); }
https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... File content/common/sandbox_init_linux.cc (right): https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:37: static void CheckSingleThreaded() { This is racy if potentially created threads can create other threads. Could you check /proc/self/stat for the num_threads field instead? https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:144: // "Hot" syscalls go first. Looks like a very good start with a positive security impact. We can look into tightening it more later maybe.
LGTM for gpu_main.cc with one suggestion. https://chromiumcodereview.appspot.com/10051022/diff/3009/content/gpu/gpu_mai... File content/gpu/gpu_main.cc (right): https://chromiumcodereview.appspot.com/10051022/diff/3009/content/gpu/gpu_mai... content/gpu/gpu_main.cc:113: (void) base::RandInt(0, 1337); We warm up rand on windows as well. Maybe just do it on all platforms? https://chromiumcodereview.appspot.com/10051022/diff/3009/content/gpu/gpu_mai... content/gpu/gpu_main.cc:128: rand_s(&dummy_rand); ...here
https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... File content/common/sandbox_init_linux.cc (right): https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:54: unsigned int syscall = context->uc_mcontext.gregs[REG_RAX]; On 2012/04/12 01:08:56, Will Drewry wrote: > I guess this is easier than wrangling asm/siginfo? Yeah, and it's also the way the Will / Kees example code grabs it. > What other info will end up in the dump? Do you want to prime the crash dump > with other data or are you hoping unwinding will cover all that? Initial plan is that stack trace + syscall_nr should be sufficient to understand and add omitted syscalls. If we get into difficulty, we can actually encode tons more stuff in the deref ptr before we use another channel :) https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:133: static void EmitKill(std::vector<struct sock_filter>* program) { On 2012/04/12 01:08:56, Will Drewry wrote: > nit: Any reason not to call it EmitLoggedKill? Then if you add a > SECCOMP_RET_KILL call, it'll just be EmitKill :) Ah nice catch, I did previously used SECCOMP_RET_KILL but then changed to trap. Renamed to just EmitTrap(). https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:143: static void ApplyGPUPolicy(std::vector<struct sock_filter>* program) { On 2012/04/12 01:08:56, Will Drewry wrote: > Is the plan to tighten this up further later? Definitely. Once we've flushed out all the corner-case syscalls, we'll start locking down arguments. https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:199: } On 2012/04/12 01:08:56, Will Drewry wrote: > An authoritative check would be: > ret = prctl(PR_SET_SECCOMP, 2, NULL); > if (ret != 0 && errno == EFAULT) > return true; > return false; > > but no_new_privs is a good guard too since it is required. The only ugly bit is > that now your CanUse function has side effects :) Thanks! Much cleaner. I moved PR_SET_NEW_PRIVS to InstallFilter to complete the cleanup. https://chromiumcodereview.appspot.com/10051022/diff/3009/content/gpu/gpu_mai... File content/gpu/gpu_main.cc (right): https://chromiumcodereview.appspot.com/10051022/diff/3009/content/gpu/gpu_mai... content/gpu/gpu_main.cc:111: if (gpu_info.vendor_id == 0x1002) { // ATI On 2012/04/12 01:08:56, Will Drewry wrote: > Any chance we can get intel in here? Yes, and imminently, but not for this first patch. I'm going to test each card before throwing it in :) Ideally, someone with a CrOS device + patched kernel could test this. Happy to add a command line flag if it would help.
Very exciting! :) https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... File content/common/sandbox_init_linux.cc (right): https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:54: unsigned int syscall = context->uc_mcontext.gregs[REG_RAX]; On 2012/04/12 18:42:49, Chris Evans wrote: > On 2012/04/12 01:08:56, Will Drewry wrote: > > I guess this is easier than wrangling asm/siginfo? > > Yeah, and it's also the way the Will / Kees example code grabs it. Will found a way to include the kernel's siginfo.h sanely, but that'll make things depend on that specific siginfo.h file for builds. I think instead of using the arch-specific REG_RAX, it'd be better to include a copy of the kernel's siginfo structure instead, and just recast the glibc siginfo_t to the new structure. https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:92: filter.k = 0; I would use the actual SECCOMP_* defines here and later, instead of literals, with a section at the top with the values defined. That way, once #include <linux/seccomp.h> is possible, the defines can go away easily without needing to check all the code. https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... content/common/sandbox_init_linux.cc:199: } I wonder if the filter should test for failure instead of passing exit 0 cleanly. i.e. verify that the filter actually blocks things rather than just installs. So, reduce it to a single statement (RET_KILL), and check WIFSIGNALED(status) && WTERMSIG(status) == 31
On 2012/04/12 17:51:24, Markus (顧孟勤) wrote: > I am working on a more generic sandbox that will allow us to protect the > renderers. Awesome! The hardest work represented by this patch series is 1) Determining and then locking down a good policy (aka. syscall list). 2) "Warming up" various little subsystems etc. before entering sandbox lockdown, so that we can keep the policy tight. We can re-use all this when we switch over to your generic API. > But the code isn't quite ready for review yet (most notably, we I > don't have support for 32bit, yet). > > My main problem is that I need to support sendmsg() and SysV IPC. Both of them > have interesting security problems. And they are unfortunately not easy to > sandbox. Yeah, jln@, jorje@ and I were chatting about this the other day. Depending on how the renderer <-> browser share stuff, it may need to be rearranged. And we're hoping X won't through a wrench in :-/ > > Having said all of that, the basic code looks similar to what you are doing. I > have two minor comments that might help you: > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > File content/common/sandbox_init_linux.cc (right): > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > content/common/sandbox_init_linux.cc:143: static void > ApplyGPUPolicy(std::vector<struct sock_filter>* program) { > My ultimate plan for efficient syscall filtering is actually to build a tree > that does a binary search. That would be nice! wad@ and I have also chatted about a simple new BPF instruction that permits either a jump table or a data table solution. I think that's preferable as it would save many instructions over even a binary tree approach. > > On the other hand, I am not sure if optimizing the way that we test for system > calls is actually worth the trouble. Has anybody bothered benchmarking this > code? Does the order of the tests show up in benchmarks? > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > content/common/sandbox_init_linux.cc:199: } > If you don't mind the minimal cost of doing an extra fork(), a more thorough > test looks like this: Thanks. I've gone with Will's test for now as I have confidence in it, plus it's simple. > > > int Sandbox::supportsSeccompSandbox(int proc_self) { > pid_t pid = fork(); > if (pid < 0) { > die("Failed to check for sandbox support"); > } > if (!pid) { > static const struct sock_filter filter[] = { > // If the architecture doesn't match SECCOMP_ARCH, disallow the > // system call. > BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct arch_seccomp_data, arch)), > BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, SECCOMP_ARCH, 0, 3), > > // Check the system call number. The only allowed call is exit_group() > BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct arch_seccomp_data, nr)), > BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 0, 1), > BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW), > BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL), > }; > > // Try to install filter. If we succeed, return success. > const struct sock_fprog prog = { > ARRAYSIZE(filter), > (struct sock_filter *)filter > }; > if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == 0 && > prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0) { > syscall(__NR_exit_group, (intptr_t)0); > } > _exit(1); > } > int status; > TEMP_FAILURE_RETRY(waitpid(pid, &status, 0)); > return WIFEXITED(status) && !WEXITSTATUS(status); > }
On Thu, Apr 12, 2012 at 12:01, <cevans@chromium.org> wrote: > Awesome! It'll be a drop-in replacement for the existing Mode-1 seccomp sandbox. I still can't make up my mind whether I'll keep the old sandbox around, or whether I'll delete it and instead install the new one. > The hardest work represented by this patch series is > 1) Determining and then locking down a good policy (aka. syscall list). I obviously don't know about the GPU task, but for the renderer this work has already been done: http://code.google.com/p/seccompsandbox/source/browse/trunk/system_call_table.cc That list has been surprisingly stable. > 2) "Warming up" various little subsystems etc. before entering sandbox > lockdown, so that we can keep the policy tight. Fortunately, for the renderers this work has already been done a long time ago. I think, Windows and Mac had the same problem. > We can re-use all this when we switch over to your generic API. Definitely. One of the things that I need to make easier is a way for different users of the sandbox to select different policies. I have a couple of options already, but this probably needs to be refined a little more, as we get more users. > Yeah, jln@, jorje@ and I were chatting about this the other day. Depending > on > how the renderer <-> browser share stuff, it may need to be rearranged. And > we're hoping X won't through a wrench in :-/ This is tricky. I can change the renderer so that it greatly reduces the number of sendmsg() calls. That's pretty easy to do by falling back on recvmsg() instead. The preference for sendmsg() was only there, because that works a lot more efficiently with the old seccomp sandbox (ping me, if you want the gory details). But it is a single flag that can be tuned one way or the other. Unfortunately, besides our main IPC mechanism, we have a few minor ad-hoc IPC mechanisms that are hard to find/fix. I think, we have an IPC mechanism somewhere, that is a callback into fontconfig. I don't recall the details. AGL probably knows more. And we also had some issues at some point, where NaCl was using sendmsg() for each video frame that they were producing. No idea, whether this is still the case. The upshot is, the renderer is a complex piece of software, and it'll be hard to completely eradicate a system call; and to make sure it doesn't get new users. The good news is that I have solution for sendmsg() that I am reasonably happy with. It's not as good as having kernel support, and it does require a helper process. But other than that, it shouldn't hurt much. SysV IPC is a whole different issue. It is very complex to handle this API safely. It is just as horribly insecure and broken as it looks. Unfortunately, it is needed for MIT Shared Memory, if we want to get fast access to the X server. I should have a working solution in the next day or two. But I am not happy with it in general; very complicated code. If I had one wish for a new kernel API, then it would be the ability to send memory mappings as ancillary data in a call to sendmsg(). That would allow us to make SysV shared memory calls from a helper process, without adding much complexity. > That would be nice! > wad@ and I have also chatted about a simple new BPF instruction that permits > either a jump table or a data table solution. I think that's preferable as > it > would save many instructions over even a binary tree approach. It probably would produce about the same code in the JIT. So, I don't feel it is absolutely required to have this instruction. But it certainly would make user space much cleaner -- at the cost of some complexity in the kernel. I leave it up to Will to decide if he wants to fight this battle. Again, I'd love to see benchmark numbers in order to judge whether any of this effort is actually worth it. Markus
On 2012/04/12 18:56:39, Kees Cook wrote: > Very exciting! :) > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > File content/common/sandbox_init_linux.cc (right): > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > content/common/sandbox_init_linux.cc:54: unsigned int syscall = > context->uc_mcontext.gregs[REG_RAX]; > On 2012/04/12 18:42:49, Chris Evans wrote: > > On 2012/04/12 01:08:56, Will Drewry wrote: > > > I guess this is easier than wrangling asm/siginfo? > > > > Yeah, and it's also the way the Will / Kees example code grabs it. > > Will found a way to include the kernel's siginfo.h sanely, but that'll make > things depend on that specific siginfo.h file for builds. I think instead of > using the arch-specific REG_RAX, it'd be better to include a copy of the > kernel's siginfo structure instead, and just recast the glibc siginfo_t to the > new structure. Chatted with Will and for now we propose to leave it as-is, in case access to other registers becomes desirable (e.g. to work out _which_ fcntl() someone is doing). > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > content/common/sandbox_init_linux.cc:92: filter.k = 0; > I would use the actual SECCOMP_* defines here and later, instead of literals, Yeah, nice idea, done. I also started using SECCOMP_MODE_FILTER instead of the magic value "2". I included linux/audit.h for AUDIT_ARCH_X86_64 because that header file has been around for a while. I also added a define for eventfd2, so older machines can still build. > with a section at the top with the values defined. That way, once #include > <linux/seccomp.h> is possible, the defines can go away easily without needing to > check all the code. > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > content/common/sandbox_init_linux.cc:199: } > I wonder if the filter should test for failure instead of passing exit 0 > cleanly. i.e. verify that the filter actually blocks things rather than just > installs. So, reduce it to a single statement (RET_KILL), and check > WIFSIGNALED(status) && WTERMSIG(status) == 31
On 2012/04/12 19:42:44, Markus (顧孟勤) wrote: > On Thu, Apr 12, 2012 at 12:01, <mailto:cevans@chromium.org> wrote: > > Awesome! > > It'll be a drop-in replacement for the existing Mode-1 seccomp > sandbox. I still can't make up my mind whether I'll keep the old > sandbox around, or whether I'll delete it and instead install the new > one. > > > The hardest work represented by this patch series is > > 1) Determining and then locking down a good policy (aka. syscall list). > > I obviously don't know about the GPU task, but for the renderer this > work has already been done: > http://code.google.com/p/seccompsandbox/source/browse/trunk/system_call_table.cc > That list has been surprisingly stable. > > > 2) "Warming up" various little subsystems etc. before entering sandbox > > lockdown, so that we can keep the policy tight. > > Fortunately, for the renderers this work has already been done a long > time ago. I think, Windows and Mac had the same problem. > > > We can re-use all this when we switch over to your generic API. > > Definitely. One of the things that I need to make easier is a way for > different users of the sandbox to select different policies. I have a > couple of options already, but this probably needs to be refined a > little more, as we get more users. > > > Yeah, jln@, jorje@ and I were chatting about this the other day. Depending > > on > > how the renderer <-> browser share stuff, it may need to be rearranged. And > > we're hoping X won't through a wrench in :-/ > > This is tricky. I can change the renderer so that it greatly reduces > the number of sendmsg() calls. That's pretty easy to do by falling > back on recvmsg() instead. The preference for sendmsg() was only > there, because that works a lot more efficiently with the old seccomp > sandbox (ping me, if you want the gory details). > > But it is a single flag that can be tuned one way or the other. > > Unfortunately, besides our main IPC mechanism, we have a few minor > ad-hoc IPC mechanisms that are hard to find/fix. I think, we have an > IPC mechanism somewhere, that is a callback into fontconfig. I don't > recall the details. AGL probably knows more. Yeah, that's the "Sandbox IPC", which is used to get the renderers to talk to fontconfig. It uses sendmsg/recvmsg so it shouldn't be a problem. http://code.google.com/searchframe#OAMlx_jo-ck/src/content/browser/zygote_mai... > And we also had some issues at some point, where NaCl was using > sendmsg() for each video frame that they were producing. No idea, > whether this is still the case. > > The upshot is, the renderer is a complex piece of software, and it'll > be hard to completely eradicate a system call; and to make sure it > doesn't get new users. > > The good news is that I have solution for sendmsg() that I am > reasonably happy with. It's not as good as having kernel support, and > it does require a helper process. But other than that, it shouldn't > hurt much. > > SysV IPC is a whole different issue. It is very complex to handle this > API safely. It is just as horribly insecure and broken as it looks. > Unfortunately, it is needed for MIT Shared Memory, if we want to get > fast access to the X server. I should have a working solution in the > next day or two. But I am not happy with it in general; very > complicated code. > > If I had one wish for a new kernel API, then it would be the ability > to send memory mappings as ancillary data in a call to sendmsg(). That > would allow us to make SysV shared memory calls from a helper process, > without adding much complexity. > > > That would be nice! > > wad@ and I have also chatted about a simple new BPF instruction that permits > > either a jump table or a data table solution. I think that's preferable as > > it > > would save many instructions over even a binary tree approach. > > It probably would produce about the same code in the JIT. So, I don't > feel it is absolutely required to have this instruction. But it > certainly would make user space much cleaner -- at the cost of some > complexity in the kernel. I leave it up to Will to decide if he wants > to fight this battle. > > Again, I'd love to see benchmark numbers in order to judge whether any > of this effort is actually worth it. > > > Markus
On 2012/04/12 18:36:46, apatrick_chromium wrote: > LGTM for gpu_main.cc with one suggestion. > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/gpu/gpu_mai... > File content/gpu/gpu_main.cc (right): > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/gpu/gpu_mai... > content/gpu/gpu_main.cc:113: (void) base::RandInt(0, 1337); > We warm up rand on windows as well. Maybe just do it on all platforms? > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/gpu/gpu_mai... > content/gpu/gpu_main.cc:128: rand_s(&dummy_rand); > ...here Good call, the reason we warm up rand_s on Windows is because it's used by RandInt() in base/ (and RandInt() is used in turn to create IPC channel identifiers, across all platforms) I've unified the call to RandInt as suggested.
On 2012/04/12 17:54:14, jln wrote: > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > File content/common/sandbox_init_linux.cc (right): > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > content/common/sandbox_init_linux.cc:37: static void CheckSingleThreaded() { > This is racy if potentially created threads can create other threads. > > Could you check /proc/self/stat for the num_threads field instead? Using "self" is a great call and also simplifies the code a lot - yippee :) So, done. As discussed over chat, documenting the race via a comment since all we care about is "threads" vs. "no threads" and in addition, this is mainly to trap changes over the development cycles. > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > content/common/sandbox_init_linux.cc:144: // "Hot" syscalls go first. > Looks like a very good start with a positive security impact. > > We can look into tightening it more later maybe.
On Thu, Apr 12, 2012 at 12:52, <jorgelo@chromium.org> wrote: > Yeah, that's the "Sandbox IPC", which is used to get the renderers to talk > to fontconfig. It uses sendmsg/recvmsg so it shouldn't be a problem. sendmsg() is a bit of a problem. On x86_32, it is difficult, because it uses socketcall(), which introduces an extra indirection. On x86_64, we don't have this problem, but we still do have the problem of "struct msghdr" doing way too many different things at once. We want to make sure that we cannot send on an unconnected socket, as that could have all sorts of interesting security implications (e.g. it could probably be used to map your LAN). We also, if at all possible, want to filter the type of ancillary messages. Sending file descriptors is probably safe. But this API just looks like the type of thing that will get richer over time; so, we don't want to allow sending of arbitrary ancillary data which could have unknown security properties. All of this requires checking data that lives in memory rather than in registers. It is thus not accessible from BPF filters. If, on the other hand, we could change the protocol and use recvmsg() inside of the sandbox instead of using sendmsg(), things would be much easier. recvmsg() doesn't have these security problems and can be allowed (almost) unconditionally. Markus
On 2012/04/12 20:54:28, Markus (顧孟勤) wrote: > On Thu, Apr 12, 2012 at 12:52, <mailto:jorgelo@chromium.org> wrote: > > Yeah, that's the "Sandbox IPC", which is used to get the renderers to talk > > to fontconfig. It uses sendmsg/recvmsg so it shouldn't be a problem. > > sendmsg() is a bit of a problem. On x86_32, it is difficult, because > it uses socketcall(), which introduces an extra indirection. On > x86_64, we don't have this problem, but we still do have the problem > of "struct msghdr" doing way too many different things at once. > > We want to make sure that we cannot send on an unconnected socket, as > that could have all sorts of interesting security implications (e.g. > it could probably be used to map your LAN). We also, if at all > possible, want to filter the type of ancillary messages. Sending file > descriptors is probably safe. But this API just looks like the type of > thing that will get richer over time; so, we don't want to allow > sending of arbitrary ancillary data which could have unknown security > properties. > > All of this requires checking data that lives in memory rather than in > registers. It is thus not accessible from BPF filters. > > If, on the other hand, we could change the protocol and use recvmsg() > inside of the sandbox instead of using sendmsg(), things would be much > easier. recvmsg() doesn't have these security problems and can be > allowed (almost) unconditionally. I keep forgetting that we still need to support 32 bits for regular Linux. These are all very good points. I've been going over the code for the sandbox IPC and it does need to pass fd's around. Moreover, since all the renderers share the same socket to communicate with the Sandbox IPC host, each request includes the fd where the reply should be written to. Anyways, I did not mean to completely hijack the thread. I agree that this is a great way to start using seccomp filter + BPF.
On Thu, Apr 12, 2012 at 2:52 PM, <jorgelo@chromium.org> wrote: > On 2012/04/12 20:54:28, Markus (顧孟勤) wrote: > > On Thu, Apr 12, 2012 at 12:52, <mailto:jorgelo@chromium.org> wrote: >> > Yeah, that's the "Sandbox IPC", which is used to get the renderers to >> talk >> > to fontconfig. It uses sendmsg/recvmsg so it shouldn't be a problem. >> > > sendmsg() is a bit of a problem. On x86_32, it is difficult, because >> it uses socketcall(), which introduces an extra indirection. On >> x86_64, we don't have this problem, but we still do have the problem >> of "struct msghdr" doing way too many different things at once. >> > > We want to make sure that we cannot send on an unconnected socket, as >> that could have all sorts of interesting security implications (e.g. >> it could probably be used to map your LAN). We also, if at all >> possible, want to filter the type of ancillary messages. Sending file >> descriptors is probably safe. But this API just looks like the type of >> thing that will get richer over time; so, we don't want to allow >> sending of arbitrary ancillary data which could have unknown security >> properties. >> > > All of this requires checking data that lives in memory rather than in >> registers. It is thus not accessible from BPF filters. >> > > If, on the other hand, we could change the protocol and use recvmsg() >> inside of the sandbox instead of using sendmsg(), things would be much >> easier. recvmsg() doesn't have these security problems and can be >> allowed (almost) unconditionally. >> > > I keep forgetting that we still need to support 32 bits for regular Linux. > These > are all very good points. > > I've been going over the code for the sandbox IPC and it does need to pass > fd's > around. Moreover, since all the renderers share the same socket to > communicate > with the Sandbox IPC host, each request includes the fd where the reply > should > be written to. > > Anyways, I did not mean to completely hijack the thread. I agree that this > is a > great way to start using seccomp filter + BPF. > One of you Linux-y guys want to lgtm so jam@ isn't scared when I approach him for OWNERS lgtm? > https://chromiumcodereview.**appspot.com/10051022/<https://chromiumcodereview... >
lgtm
lgtm
lgtm
Hold on for a minute or two. I am about to give you a little more feedback. On Thu, Apr 12, 2012 at 15:00, Chris Evans <cevans@google.com> wrote: > On Thu, Apr 12, 2012 at 2:52 PM, <jorgelo@chromium.org> wrote: >> >> On 2012/04/12 20:54:28, Markus (顧孟勤) wrote: >> >>> On Thu, Apr 12, 2012 at 12:52, <mailto:jorgelo@chromium.org> wrote: >>> > Yeah, that's the "Sandbox IPC", which is used to get the renderers to >>> > talk >>> > to fontconfig. It uses sendmsg/recvmsg so it shouldn't be a problem. >> >> >>> sendmsg() is a bit of a problem. On x86_32, it is difficult, because >>> it uses socketcall(), which introduces an extra indirection. On >>> x86_64, we don't have this problem, but we still do have the problem >>> of "struct msghdr" doing way too many different things at once. >> >> >>> We want to make sure that we cannot send on an unconnected socket, as >>> that could have all sorts of interesting security implications (e.g. >>> it could probably be used to map your LAN). We also, if at all >>> possible, want to filter the type of ancillary messages. Sending file >>> descriptors is probably safe. But this API just looks like the type of >>> thing that will get richer over time; so, we don't want to allow >>> sending of arbitrary ancillary data which could have unknown security >>> properties. >> >> >>> All of this requires checking data that lives in memory rather than in >>> registers. It is thus not accessible from BPF filters. >> >> >>> If, on the other hand, we could change the protocol and use recvmsg() >>> inside of the sandbox instead of using sendmsg(), things would be much >>> easier. recvmsg() doesn't have these security problems and can be >>> allowed (almost) unconditionally. >> >> >> I keep forgetting that we still need to support 32 bits for regular Linux. >> These >> are all very good points. >> >> I've been going over the code for the sandbox IPC and it does need to pass >> fd's >> around. Moreover, since all the renderers share the same socket to >> communicate >> with the Sandbox IPC host, each request includes the fd where the reply >> should >> be written to. >> >> Anyways, I did not mean to completely hijack the thread. I agree that this >> is a >> great way to start using seccomp filter + BPF. > > > One of you Linux-y guys want to lgtm so jam@ isn't scared when I approach > him for OWNERS lgtm? > >> >> https://chromiumcodereview.appspot.com/10051022/ > >
lgtm Overall, this looks good. There are still a couple of noticeable gaps. But we can close those later. I am wondering whether the location for this code is correct though. Shouldn't it go into src/sandbox/linux/...? If so, you should probably pick some unique directory name, so that we don't run into problems when I am ready to check in the more generic code. That one will likely go into src/sandbox/linux/seccomp2 or possibly src/sandbox/linux/seccomp-bpf https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... File content/common/sandbox_init_linux.cc (right): https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... content/common/sandbox_init_linux.cc:93: filter.k = 4; Instead of hardcoding the offset, you might want to compute it with offsetof() https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... content/common/sandbox_init_linux.cc:112: filter.k = 0; Same here https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... content/common/sandbox_init_linux.cc:127: filter.k = SECCOMP_RET_ALLOW; This generates twice as many instructions as necessary. Not sure if you really worry about this. But you could have one comparison per system call and then one single SECCOMP_RET_ALLOW for the whole group of system calls. You just have to make sure you don't exceed 256 calls. https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... content/common/sandbox_init_linux.cc:132: std::vector<struct sock_filter>* program) { EmitAllowSyscall() is really the same as EmitFailSyscall() with the exception of the exit code. You might want to combine these two functions. https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... content/common/sandbox_init_linux.cc:159: EmitAllowSyscall(__NR_ioctl, program); That sound dangerous. There are way too many things that ioctl() can do. Can you look at the arguments and limit things to something a little saner. https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... content/common/sandbox_init_linux.cc:169: EmitAllowSyscall(__NR_madvise, program); The memory functions (madvise, mmap, munmap, mprotect, mremap, ...) are potentially tricky, if your code relies on the integrity of some of its data. I don't think this is a problem you are currently having. But just something to keep in mind, if you ever plan on extending the sandbox. https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... content/common/sandbox_init_linux.cc:170: EmitAllowSyscall(__NR_sendmsg, program); sendmsg() is potentially dangerous. At the very least, if could be used to map your LAN. But other types of attack are also conceivable. https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... content/common/sandbox_init_linux.cc:176: EmitAllowSyscall(__NR_clone, program); clone() can have surprising semantics. I'd suggest you only allow the specific subset of flags that you actually need to support. https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... content/common/sandbox_init_linux.cc:186: EmitAllowSyscall(__NR_prctl, program); I would strongly encourage you to limit prctl() to just those arguments that you care about. It's a grab-bag of different features that probably shouldn't all be allowed. https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... content/common/sandbox_init_linux.cc:189: EmitAllowSyscall(__NR_restart_syscall, program); Is this actually visible to seccomp/BPF? I thought it was an internal function, but I might very well be mistaken.
https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... File content/common/sandbox_init_linux.cc (right): https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... content/common/sandbox_init_linux.cc:189: EmitAllowSyscall(__NR_restart_syscall, program); On 2012/04/12 22:22:33, Markus (顧孟勤) wrote: > Is this actually visible to seccomp/BPF? I thought it was an internal function, > but I might very well be mistaken. Yes, the kernel puts them on the user stack when it decides to deliver a signal and restart the syscall after delivery for convenience. Like sigreturn, this is something we'll almost always have to include.
I wrote a simple little test program. And no matter what I throw at it, I don't see restart_syscall() from user space nor from the seccomp filters. I don't think it is supposed to be visible. This of course means, you should probably remove it from the list of system calls that you allow. I am a little scared that allowing restart_syscall() could give you some way to bypass the seccomp filters. Markus
On Thu, Apr 12, 2012 at 4:25 PM, Markus Gutschke <markus@chromium.org>wrote: > I wrote a simple little test program. And no matter what I throw at > it, I don't see restart_syscall() from user space nor from the seccomp > filters. I don't think it is supposed to be visible. I saw it when adding a policy to vsftpd. It surprised me too. I think it came from strace'ing a process sleeping in a syscall (perhaps sendfile ?) Cheers Chris > This of course > means, you should probably remove it from the list of system calls > that you allow. > > I am a little scared that allowing restart_syscall() could give you > some way to bypass the seccomp filters. > > > Markus >
strace() output is not necessarily to be trusted in this situation. For instance, strace can occasionally see ERESTART, which I believe is otherwise invisible to userspace. If you have a reproducible way to generate a restart_syscall(), I would love to know, so that I can experiment with it. Markus On Thu, Apr 12, 2012 at 16:28, Chris Evans <cevans@google.com> wrote: > On Thu, Apr 12, 2012 at 4:25 PM, Markus Gutschke <markus@chromium.org> > wrote: >> >> I wrote a simple little test program. And no matter what I throw at >> it, I don't see restart_syscall() from user space nor from the seccomp >> filters. I don't think it is supposed to be visible. > > > I saw it when adding a policy to vsftpd. It surprised me too. I think it > came from strace'ing a process sleeping in a syscall (perhaps sendfile ?) > > > Cheers > Chris > >> >> This of course >> means, you should probably remove it from the list of system calls >> that you allow. >> >> I am a little scared that allowing restart_syscall() could give you >> some way to bypass the seccomp filters. >> >> >> Markus > >
On Thu, Apr 12, 2012 at 4:34 PM, Markus Gutschke <markus@chromium.org>wrote: > strace() output is not necessarily to be trusted in this situation. > For instance, strace can occasionally see ERESTART, which I believe is > otherwise invisible to userspace. > My program died and the kernel logged a violation. > > If you have a reproducible way to generate a restart_syscall(), I > would love to know, so that I can experiment with it. > Try a interrupting a sleeping sendfile perhaps? > > > Markus > > On Thu, Apr 12, 2012 at 16:28, Chris Evans <cevans@google.com> wrote: > > On Thu, Apr 12, 2012 at 4:25 PM, Markus Gutschke <markus@chromium.org> > > wrote: > >> > >> I wrote a simple little test program. And no matter what I throw at > >> it, I don't see restart_syscall() from user space nor from the seccomp > >> filters. I don't think it is supposed to be visible. > > > > > > I saw it when adding a policy to vsftpd. It surprised me too. I think it > > came from strace'ing a process sleeping in a syscall (perhaps sendfile ?) > > > > > > Cheers > > Chris > > > >> > >> This of course > >> means, you should probably remove it from the list of system calls > >> that you allow. > >> > >> I am a little scared that allowing restart_syscall() could give you > >> some way to bypass the seccomp filters. > >> > >> > >> Markus > > > > >
On 2012/04/12 22:22:33, Markus (顧孟勤) wrote: > lgtm > > Overall, this looks good. There are still a couple of noticeable gaps. But we > can close those later. > > I am wondering whether the location for this code is correct though. Shouldn't > it go into src/sandbox/linux/...? If so, you should probably pick some unique > directory name, so that we don't run into problems when I am ready to check in > the more generic code. That one will likely go into src/sandbox/linux/seccomp2 > or possibly src/sandbox/linux/seccomp-bpf I think longer term, the policy should live with the code it applies to (e.g. content/ ) and the mechanism should live in sandbox. We'll get there when you flip over to using your generic API in the future. > > https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... > File content/common/sandbox_init_linux.cc (right): > > https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... > content/common/sandbox_init_linux.cc:93: filter.k = 4; > Instead of hardcoding the offset, you might want to compute it with offsetof() Do you mind if we defer for now? I'm trying to keep copies of structures out. I'll certainly > > https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... > content/common/sandbox_init_linux.cc:112: filter.k = 0; > Same here > > https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... > content/common/sandbox_init_linux.cc:127: filter.k = SECCOMP_RET_ALLOW; > This generates twice as many instructions as necessary. > > Not sure if you really worry about this. But you could have one comparison per > system call and then one single SECCOMP_RET_ALLOW for the whole group of system > calls. I prefer to leave it as-is now, for the sake of simplicity. Otherwise, I might need to know the length of the program when generating jumps, etc. > > You just have to make sure you don't exceed 256 calls. > > https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... > content/common/sandbox_init_linux.cc:132: std::vector<struct sock_filter>* > program) { > EmitAllowSyscall() is really the same as EmitFailSyscall() with the exception of > the exit code. You might want to combine these two functions. Yeah, I'll factor out this commonality (and some other commonality I see) and make it cleaner. Good idea. I'll do it before I land it. > > https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... > content/common/sandbox_init_linux.cc:159: EmitAllowSyscall(__NR_ioctl, program); > That sound dangerous. There are way too many things that ioctl() can do. Can you > look at the arguments and limit things to something a little saner. Julien and Will were raising questions about argument checking earlier. It's intended that we revisit this once we've got a stable policy that works. We will then lock it down to the extent possible. (Will also raised MAP_EXEC which unfortunately we need for the case LLVM kicks it when a card lacks a certain shader support :) I'm not too worried about the specific ioctl() case. There are very few generic file ioctls, and the policy doesn't permit opening of other special files. And fd 0/1/2 are never terminal fds so all the horrible terminal ioctl()s are in play. That said, we can try and lock it down in the future. > > https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... > content/common/sandbox_init_linux.cc:169: EmitAllowSyscall(__NR_madvise, > program); > The memory functions (madvise, mmap, munmap, mprotect, mremap, ...) are > potentially tricky, if your code relies on the integrity of some of its data. I > don't think this is a problem you are currently having. But just something to > keep in mind, if you ever plan on extending the sandbox. > > https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... > content/common/sandbox_init_linux.cc:170: EmitAllowSyscall(__NR_sendmsg, > program); > sendmsg() is potentially dangerous. At the very least, if could be used to map > your LAN. But other types of attack are also conceivable. I don't think the process has any open network fd's (other than local and connected socks from socketpairs). Neither does the policy permit socket(), thank goodness. So I doubt an attack can be mounted. > > https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... > content/common/sandbox_init_linux.cc:176: EmitAllowSyscall(__NR_clone, program); > clone() can have surprising semantics. I'd suggest you only allow the specific > subset of flags that you actually need to support. > > https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... > content/common/sandbox_init_linux.cc:186: EmitAllowSyscall(__NR_prctl, program); > I would strongly encourage you to limit prctl() to just those arguments that you > care about. It's a grab-bag of different features that probably shouldn't all be > allowed. Yeah, definitely on the TODO list. > > https://chromiumcodereview.appspot.com/10051022/diff/12003/content/common/san... > content/common/sandbox_init_linux.cc:189: EmitAllowSyscall(__NR_restart_syscall, > program); > Is this actually visible to seccomp/BPF? I thought it was an internal function, > but I might very well be mistaken.
On Thu, Apr 12, 2012 at 16:52, <cevans@chromium.org> wrote: > Do you mind if we defer for now? I'm trying to keep copies of structures > out. If you think this code is eventually going to go away anyway, I don't mind too much. I think, having hard-coded constants makes the code a little less readable. But that's your call. In my case, I opted to define "struct arch_seccomp_data", which looks exactly like the official "struct seccomp_data". But it doesn't cause compilation failures, independent of whether the system headers define the struct. > I prefer to leave it as-is now, for the sake of simplicity. Otherwise, I > might > need to know the length of the program when generating jumps, etc. It's not particularly complicated to automatically compute these jumps. But again, if you ultimately expect this code to go away in favor of the more generic sandbox, then removing these instructions sounds like pre-mature optimization. It probably is more important to have some code that is easy to understand and that allows you to start experimenting with. Maybe, just add a comment that this is known to be sub-optimal, but it doesn't matter at this time. > Julien and Will were raising questions about argument checking earlier. It's > intended that we revisit this once we've got a stable policy that works. We > will then lock it down to the extent possible. ioctl() has been the source of numerous kernel bugs, and I am worried about crazy super-permissive ioctl()'s in graphics drivers. But there is probably not a lot we can do there. And you are correct that there are not that many generic ioctl()s. > I don't think the process has any open network fd's (other than local and > connected socks from socketpairs). Neither does the policy permit socket(), > thank goodness. So I doubt an attack can be mounted. Socketpair is probably OK, as it creates a UNIX domain socket, which should be safe. But any INET socket would be dangerous.
On Thu, Apr 12, 2012 at 16:36, Chris Evans <cevans@google.com> wrote: > On Thu, Apr 12, 2012 at 4:34 PM, Markus Gutschke <markus@chromium.org> > wrote: >> >> strace() output is not necessarily to be trusted in this situation. >> For instance, strace can occasionally see ERESTART, which I believe is >> otherwise invisible to userspace. > > > My program died and the kernel logged a violation. Are you sure you didn't see the result of a call to sigreturn() or rt_sigreturn(). I don't think you are currently white-listing both of them. And on x86-32 you could definitely see sigreturn() instead of rt_sigreturn() (this depends on the sa_flags that you gave in sigaction()). Markus
On Thu, Apr 12, 2012 at 5:08 PM, Markus Gutschke <markus@chromium.org>wrote: > On Thu, Apr 12, 2012 at 16:36, Chris Evans <cevans@google.com> wrote: > > On Thu, Apr 12, 2012 at 4:34 PM, Markus Gutschke <markus@chromium.org> > > wrote: > >> > >> strace() output is not necessarily to be trusted in this situation. > >> For instance, strace can occasionally see ERESTART, which I believe is > >> otherwise invisible to userspace. > > > > > > My program died and the kernel logged a violation. > > Are you sure you didn't see the result of a call to sigreturn() or > rt_sigreturn(). I don't think you are currently white-listing both of > them. vsftpd-3.0.0, seccompsandbox.c /* Misc simple low-risk calls. */ allow_nr(__NR_rt_sigreturn); /* Used to handle SIGPIPE. */ allow_nr(__NR_restart_syscall); allow_nr(__NR_close); Do you want me to retry it? > And on x86-32 you could definitely see sigreturn() instead of > rt_sigreturn() (this depends on the sa_flags that you gave in > sigaction()). > > > Markus >
That would help. I am trying to reproduce, but I don't have any luck yet. Markus On Thu, Apr 12, 2012 at 17:11, Chris Evans <cevans@google.com> wrote: > On Thu, Apr 12, 2012 at 5:08 PM, Markus Gutschke <markus@chromium.org> > wrote: >> >> On Thu, Apr 12, 2012 at 16:36, Chris Evans <cevans@google.com> wrote: >> > On Thu, Apr 12, 2012 at 4:34 PM, Markus Gutschke <markus@chromium.org> >> > wrote: >> >> >> >> strace() output is not necessarily to be trusted in this situation. >> >> For instance, strace can occasionally see ERESTART, which I believe is >> >> otherwise invisible to userspace. >> > >> > >> > My program died and the kernel logged a violation. >> >> Are you sure you didn't see the result of a call to sigreturn() or >> rt_sigreturn(). I don't think you are currently white-listing both of >> them. > > > vsftpd-3.0.0, seccompsandbox.c > /* Misc simple low-risk calls. */ > allow_nr(__NR_rt_sigreturn); /* Used to handle SIGPIPE. */ > allow_nr(__NR_restart_syscall); > allow_nr(__NR_close); > > Do you want me to retry it? > >> >> And on x86-32 you could definitely see sigreturn() instead of >> rt_sigreturn() (this depends on the sa_flags that you gave in >> sigaction()). >> >> >> Markus > >
On Thu, Apr 12, 2012 at 5:12 PM, Markus Gutschke <markus@chromium.org>wrote: > That would help. I am trying to reproduce, but I don't have any luck yet. > [ 3899.008325] type=1701 audit(1334276389.054:3392): auid=4294967295 uid=115 gid=125 ses=4294967295 pid=5411 comm="vsftpd" reason="seccomp" sig=31 syscall=219 compat=0 ip=0x7f1d9bff7020 code=0x0 #define __NR_restart_syscall 219 - vsftpd-3.0.0 - comment out " allow_nr(__NR_restart_syscall);" - run a simple anonymous setup, but with: trans_chunk_size=4096 anon_max_rate=1000 ... in /etc/vsftpd.conf - start the transfer of a large (some MB) file - strace the process running as "ftp" (this will probably interrupt sendfile) - bang I also tried with "use_sendfile=NO" and still bang so it seems to work for many forms of blocking read. > > Markus > > On Thu, Apr 12, 2012 at 17:11, Chris Evans <cevans@google.com> wrote: > > On Thu, Apr 12, 2012 at 5:08 PM, Markus Gutschke <markus@chromium.org> > > wrote: > >> > >> On Thu, Apr 12, 2012 at 16:36, Chris Evans <cevans@google.com> wrote: > >> > On Thu, Apr 12, 2012 at 4:34 PM, Markus Gutschke <markus@chromium.org > > > >> > wrote: > >> >> > >> >> strace() output is not necessarily to be trusted in this situation. > >> >> For instance, strace can occasionally see ERESTART, which I believe > is > >> >> otherwise invisible to userspace. > >> > > >> > > >> > My program died and the kernel logged a violation. > >> > >> Are you sure you didn't see the result of a call to sigreturn() or > >> rt_sigreturn(). I don't think you are currently white-listing both of > >> them. > > > > > > vsftpd-3.0.0, seccompsandbox.c > > /* Misc simple low-risk calls. */ > > allow_nr(__NR_rt_sigreturn); /* Used to handle SIGPIPE. */ > > allow_nr(__NR_restart_syscall); > > allow_nr(__NR_close); > > > > Do you want me to retry it? > > > >> > >> And on x86-32 you could definitely see sigreturn() instead of > >> rt_sigreturn() (this depends on the sa_flags that you gave in > >> sigaction()). > >> > >> > >> Markus > > > > >
On Thu, Apr 12, 2012 at 5:24 PM, Chris Evans <cevans@google.com> wrote: > On Thu, Apr 12, 2012 at 5:12 PM, Markus Gutschke <markus@chromium.org>wrote: > >> That would help. I am trying to reproduce, but I don't have any luck yet. >> > > [ 3899.008325] type=1701 audit(1334276389.054:3392): auid=4294967295 > uid=115 gid=125 ses=4294967295 pid=5411 comm="vsftpd" reason="seccomp" > sig=31 syscall=219 compat=0 ip=0x7f1d9bff7020 code=0x0 > #define __NR_restart_syscall 219 > > - vsftpd-3.0.0 > - comment out " allow_nr(__NR_restart_syscall);" > - run a simple anonymous setup, but with: > trans_chunk_size=4096 > anon_max_rate=1000 > ... in /etc/vsftpd.conf > - start the transfer of a large (some MB) file > - strace the process running as "ftp" (this will probably interrupt > sendfile) > - bang > > I also tried with "use_sendfile=NO" and still bang so it seems to work for > many forms of blocking read. > Ah, Julien points out (over chat) that it could be nanosleep() that is interrupted. anon_max_rate causes vsftpd to sleep a lot :) > > >> >> Markus >> >> On Thu, Apr 12, 2012 at 17:11, Chris Evans <cevans@google.com> wrote: >> > On Thu, Apr 12, 2012 at 5:08 PM, Markus Gutschke <markus@chromium.org> >> > wrote: >> >> >> >> On Thu, Apr 12, 2012 at 16:36, Chris Evans <cevans@google.com> wrote: >> >> > On Thu, Apr 12, 2012 at 4:34 PM, Markus Gutschke < >> markus@chromium.org> >> >> > wrote: >> >> >> >> >> >> strace() output is not necessarily to be trusted in this situation. >> >> >> For instance, strace can occasionally see ERESTART, which I believe >> is >> >> >> otherwise invisible to userspace. >> >> > >> >> > >> >> > My program died and the kernel logged a violation. >> >> >> >> Are you sure you didn't see the result of a call to sigreturn() or >> >> rt_sigreturn(). I don't think you are currently white-listing both of >> >> them. >> > >> > >> > vsftpd-3.0.0, seccompsandbox.c >> > /* Misc simple low-risk calls. */ >> > allow_nr(__NR_rt_sigreturn); /* Used to handle SIGPIPE. */ >> > allow_nr(__NR_restart_syscall); >> > allow_nr(__NR_close); >> > >> > Do you want me to retry it? >> > >> >> >> >> And on x86-32 you could definitely see sigreturn() instead of >> >> rt_sigreturn() (this depends on the sa_flags that you gave in >> >> sigaction()). >> >> >> >> >> >> Markus >> > >> > >> > >
On Thu, Apr 12, 2012 at 5:25 PM, Chris Evans <cevans@google.com> wrote: > On Thu, Apr 12, 2012 at 5:24 PM, Chris Evans <cevans@google.com> wrote: >> >> On Thu, Apr 12, 2012 at 5:12 PM, Markus Gutschke <markus@chromium.org> >> wrote: >>> >>> That would help. I am trying to reproduce, but I don't have any luck yet. >> >> >> [ 3899.008325] type=1701 audit(1334276389.054:3392): auid=4294967295 >> uid=115 gid=125 ses=4294967295 pid=5411 comm="vsftpd" reason="seccomp" >> sig=31 syscall=219 compat=0 ip=0x7f1d9bff7020 code=0x0 >> #define __NR_restart_syscall 219 >> >> - vsftpd-3.0.0 >> - comment out " allow_nr(__NR_restart_syscall);" >> - run a simple anonymous setup, but with: >> trans_chunk_size=4096 >> anon_max_rate=1000 >> ... in /etc/vsftpd.conf >> - start the transfer of a large (some MB) file >> - strace the process running as "ftp" (this will probably interrupt >> sendfile) >> - bang >> >> I also tried with "use_sendfile=NO" and still bang so it seems to work for >> many forms of blocking read. > > > Ah, Julien points out (over chat) that it could be nanosleep() that is > interrupted. anon_max_rate causes vsftpd to sleep a lot :) I think the key is that no userland signal handler should be called for this mechanism to trigger (otherwise the proper mechanism is EINTR). I can't anything right now since I didn't have time to setup an environment with Will's patch yet. jln
Yes, Julien figured it out. If you are stuck in a call to nanosleep() and receive SIGCONT, this does in fact trigger restart_syscall(). Now, the big question is whether restart_syscall() is safe or whether it allows an attacker to do something they shouldn't be able to do. I leave that question up to the kernel experts. Markus
On 2012/04/13 00:35:33, Markus (顧孟勤) wrote: > Yes, Julien figured it out. If you are stuck in a call to nanosleep() > and receive SIGCONT, this does in fact trigger restart_syscall(). Now, > the big question is whether restart_syscall() is safe or whether it > allows an attacker to do something they shouldn't be able to do. I > leave that question up to the kernel experts. restart_syscall runs a restart_block function: http://lxr.linux.no/linux+v3.3.1/kernel/signal.c#L2479 http://lxr.linux.no/linux+v3.3.1/include/linux/thread_info.h#L18 Usually this is defined as "do_no_restart" which just returns -EINTR: http://lxr.linux.no/linux+v3.3.1/kernel/signal.c#L2485 By default, arches init the value to do_no_restart... during thread_info setup: http://lxr.linux.no/linux+v3.3.1/arch/x86/include/asm/thread_info.h#L56 Timers like nanosleep tend to want to be restartable: http://lxr.linux.no/linux+v3.3.1/kernel/posix-cpu-timers.c#L1490 cheers, will
http://codereview.chromium.org/10051022/diff/14004/content/public/common/sand... File content/public/common/sandbox_init.h (right): http://codereview.chromium.org/10051022/diff/14004/content/public/common/sand... content/public/common/sandbox_init.h:50: CONTENT_EXPORT void InitializeSandbox(); since this isn't called by outside content, it doesn't need to be exposed in the public api. can you just put this in a sandbox_init_linux.h beside the new cc file?
On Thu, Apr 12, 2012 at 10:57 PM, <jam@chromium.org> wrote: > > http://codereview.chromium.**org/10051022/diff/14004/** > content/public/common/sandbox_**init.h<http://codereview.chromium.org/10051022/diff/14004/content/public/common/sandbox_init.h> > File content/public/common/sandbox_**init.h (right): > > http://codereview.chromium.**org/10051022/diff/14004/** > content/public/common/sandbox_**init.h#newcode50<http://codereview.chromium.org/10051022/diff/14004/content/public/common/sandbox_init.h#newcode50> > content/public/common/sandbox_**init.h:50: CONTENT_EXPORT void > InitializeSandbox(); > since this isn't called by outside content, it doesn't need to be > exposed in the public api. can you just put this in a > sandbox_init_linux.h beside the new cc file? > Ok. Any idea why it's CONTENT_EXPORT for all the other platforms? > http://codereview.chromium.**org/10051022/<http://codereview.chromium.org/100... >
On 2012/04/13 06:34:18, cevans wrote: > On Thu, Apr 12, 2012 at 10:57 PM, <mailto:jam@chromium.org> wrote: > > > > > http://codereview.chromium.**org/10051022/diff/14004/** > > > content/public/common/sandbox_**init.h<http://codereview.chromium.org/10051022/diff/14004/content/public/common/sandbox_init.h> > > File content/public/common/sandbox_**init.h (right): > > > > http://codereview.chromium.**org/10051022/diff/14004/** > > > content/public/common/sandbox_**init.h#newcode50<http://codereview.chromium.org/10051022/diff/14004/content/public/common/sandbox_init.h#newcode50> > > content/public/common/sandbox_**init.h:50: CONTENT_EXPORT void > > InitializeSandbox(); > > since this isn't called by outside content, it doesn't need to be > > exposed in the public api. can you just put this in a > > sandbox_init_linux.h beside the new cc file? > > > > Ok. Any idea why it's CONTENT_EXPORT for all the other platforms? > the other ones are called from chrome now http://code.google.com/p/chromium/source/search?q=%22InitializeSandbox%28%22+...
On Thu, Apr 12, 2012 at 11:37 PM, <jam@chromium.org> wrote: > On 2012/04/13 06:34:18, cevans wrote: > >> On Thu, Apr 12, 2012 at 10:57 PM, <mailto:jam@chromium.org> wrote: >> > > > >> > http://codereview.chromium.****org/10051022/diff/14004/** >> > >> > > content/public/common/sandbox_****init.h<http://codereview.** > chromium.org/10051022/diff/**14004/content/public/common/**sandbox_init.h<http://codereview.chromium.org/10051022/diff/14004/content/public/common/sandbox_init.h> > > > >> > File content/public/common/sandbox_****init.h (right): >> > >> > http://codereview.chromium.****org/10051022/diff/14004/** >> > >> > > content/public/common/sandbox_****init.h#newcode50<http://** > codereview.chromium.org/**10051022/diff/14004/content/** > public/common/sandbox_init.h#**newcode50<http://codereview.chromium.org/10051022/diff/14004/content/public/common/sandbox_init.h#newcode50> > > > >> > content/public/common/sandbox_****init.h:50: CONTENT_EXPORT void >> >> > InitializeSandbox(); >> > since this isn't called by outside content, it doesn't need to be >> > exposed in the public api. can you just put this in a >> > sandbox_init_linux.h beside the new cc file? >> > >> > > Ok. Any idea why it's CONTENT_EXPORT for all the other platforms? >> > > the other ones are called from chrome now > http://code.google.com/p/**chromium/source/search?q=%** > 22InitializeSandbox%28%22+**file%3Achrome&origq=%** > 22InitializeSandbox%28%22+**file%3Achrome&btnG=Search+**Trunk<http://code.google.com/p/chromium/source/search?q=%22InitializeSandbox%28%22+file%3Achrome&origq=%22InitializeSandbox%28%22+file%3Achrome&btnG=Search+Trunk> > > http://codereview.chromium.**org/10051022/<http://codereview.chromium.org/100... > Yeah, I just found the reference from chrome/nacl -- for the Windows port. I think we might want to have a similar setup for Linux in the future? No reason to not lock down nacl further on all platforms :)
On 2012/04/13 06:47:12, cevans wrote: > On Thu, Apr 12, 2012 at 11:37 PM, <mailto:jam@chromium.org> wrote: > > > On 2012/04/13 06:34:18, cevans wrote: > > > >> On Thu, Apr 12, 2012 at 10:57 PM, <mailto:jam@chromium.org> wrote: > >> > > > > > > >> > http://codereview.chromium.****org/10051022/diff/14004/** > >> > > >> > > > > content/public/common/sandbox_****init.h<http://codereview.** > > > chromium.org/10051022/diff/**14004/content/public/common/**sandbox_init.h<http://codereview.chromium.org/10051022/diff/14004/content/public/common/sandbox_init.h> > > > > > > >> > File content/public/common/sandbox_****init.h (right): > >> > > >> > http://codereview.chromium.****org/10051022/diff/14004/** > >> > > >> > > > > content/public/common/sandbox_****init.h#newcode50<http://** > > codereview.chromium.org/**10051022/diff/14004/content/** > > > public/common/sandbox_init.h#**newcode50<http://codereview.chromium.org/10051022/diff/14004/content/public/common/sandbox_init.h#newcode50> > > > > > > >> > content/public/common/sandbox_****init.h:50: CONTENT_EXPORT void > >> > >> > InitializeSandbox(); > >> > since this isn't called by outside content, it doesn't need to be > >> > exposed in the public api. can you just put this in a > >> > sandbox_init_linux.h beside the new cc file? > >> > > >> > > > > Ok. Any idea why it's CONTENT_EXPORT for all the other platforms? > >> > > > > the other ones are called from chrome now > > http://code.google.com/p/**chromium/source/search?q=%25** > > 22InitializeSandbox%28%22+**file%3Achrome&origq=%** > > > 22InitializeSandbox%28%22+**file%3Achrome&btnG=Search+**Trunk<http://code.google.com/p/chromium/source/search?q=%22InitializeSandbox%28%22+file%3Achrome&origq=%22InitializeSandbox%28%22+file%3Achrome&btnG=Search+Trunk> > > > > > http://codereview.chromium.**org/10051022/%3Chttp://codereview.chromium.org/1...> > > > > Yeah, I just found the reference from chrome/nacl -- for the Windows port. > I think we might want to have a similar setup for Linux in the future? No > reason to not lock down nacl further on all platforms :) ok, if you're planning on using this for nacl, lgtm
On 2012/04/12 01:08:56, Will Drewry wrote: > Awesome! > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > File content/common/sandbox_init_linux.cc (right): > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > content/common/sandbox_init_linux.cc:54: unsigned int syscall = > context->uc_mcontext.gregs[REG_RAX]; > I guess this is easier than wrangling asm/siginfo? > What other info will end up in the dump? Do you want to prime the crash dump > with other data or are you hoping unwinding will cover all that? > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > content/common/sandbox_init_linux.cc:133: static void > EmitKill(std::vector<struct sock_filter>* program) { > nit: Any reason not to call it EmitLoggedKill? Then if you add a > SECCOMP_RET_KILL call, it'll just be EmitKill :) > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > content/common/sandbox_init_linux.cc:143: static void > ApplyGPUPolicy(std::vector<struct sock_filter>* program) { > Is the plan to tighten this up further later? > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/common/sand... > content/common/sandbox_init_linux.cc:199: } > An authoritative check would be: > ret = prctl(PR_SET_SECCOMP, 2, NULL); > if (ret != 0 && errno == EFAULT) > return true; > return false; > > but no_new_privs is a good guard too since it is required. The only ugly bit is > that now your CanUse function has side effects :) > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/gpu/gpu_mai... > File content/gpu/gpu_main.cc (right): > > https://chromiumcodereview.appspot.com/10051022/diff/3009/content/gpu/gpu_mai... > content/gpu/gpu_main.cc:111: if (gpu_info.vendor_id == 0x1002) { // ATI > Any chance we can get intel in here? Just an update for completeness: I'm going to land for all graphics cards. I got my hand on an nvidia card and all was fine. Intel is blacklisted for desktop Linux anyway, so we're all set for ChromeOS once the ChromeOS kernel patch lands :D |