|
|
Created:
8 years, 4 months ago by jln (very slow on Chromium) Modified:
8 years, 4 months ago CC:
chromium-reviews, joi+watch-content_chromium.org, darin-cc_chromium.org, jam Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionSeccomp-bpf: create system call sets.
We define system call sets to create a partition of all existing system
calls on the x86_64 architecture.
This grand classification will allow us to write policies easily and make
sure that we have thought about what to do for every system call.
This particular change does not affect the current semantic at all, this
is achieved by using comments.
BUG=
NOTRY=true
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=150624
Patch Set 1 : #
Total comments: 26
Patch Set 2 : Address Markus' concerns. #
Total comments: 2
Patch Set 3 : Re-order things alphabetically. #Messages
Total messages: 11 (0 generated)
I have been through every x86_64 system call to put it in a bucket. This particular CL doesn't change the semantic at all. However, it makes many inconsistencies of our current policies obvious. The next sets of CL will address those by removing comments and making small changes in the Gpu and Flash policies.
On 2012/08/08 00:19:39, Julien Tinnes wrote: > I have been through every x86_64 system call to put it in a bucket. > > This particular CL doesn't change the semantic at all. However, it makes many > inconsistencies of our current policies obvious. > > The next sets of CL will address those by removing comments and making small > changes in the Gpu and Flash policies. Note: I don't intend to remove all comments. Many should stay here because we want to reduce the attack surface.
https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... File content/common/sandbox_seccomp_bpf_linux.cc (right): https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:172: // case __NR_clock_nanosleep: We need to be careful when we deny system calls that are essentially a superset of an older system call. It is not uncommon that never versions of system libraries will at some point start using the newer system calls. We learned this with the old seccomp sandbox and we have generally always allowed a whole family of system calls -- and even then, we occasionally had bugs where we missed something and a while later some user noticed that Chrome started misbehaving for them. In this particular case that means, either a) we crash (which is clearly not very user friendly), or b) we return with an errno value. But the caller might not necessarily expect and check for errno, which would mean our delay just turned into a no-op. In some cases, this could in fact be noticeable and result in a bug being filed. In a lot of cases, this would probably just result in Chrome using 100% CPU where it shouldn't be doing so. I understand that you don't want to make any functional changes with this CL. But maybe you could add a comment explaining why this particular exception has the potential to lead to a hard-to-find bug. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:185: bool IsAmbiantFileSystemSyscall(int sysno) { Not quite sure if I understand the meaning of "ambient" in this particular situation -- apart from the fact that it also appears to be misspelled. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:186: switch (sysno) { I guess, you are not really soliciting comments on how arbitrary and somewhat wrong this policy looks right now. That's probably reserved for an upcoming changelist. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:200: // For many, EPERM is a valid errno, but not for all of them. I see a lot of potential for confusion down the line, if you name a boolean function IsXXXSyscall(), but then you start making policy decisions by commenting out some of the system calls. Maybe, instead of IsXXXSyscall() functions we should instead have IsAllowedXXXSyscall() functions for everything? Or you could make the policy decision in a separate function and then not comment anything out in the IsXXXSyscall() function? That makes things cleaner but is possibly too complex to be worthwhile. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:207: case __NR_execve: I probably would have moved execve() into a separate category. It is both a file system call and a process management call as it kills all other threads. If we rely on helper threads (and we might or might not do so in the future), things like execve() will need special attention. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:213: // case __NR_getdents64: In general, what is the plan for system calls that only exist on some architectures? I think, but I might be mistaken, getdents64() is one of those. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:249: bool IsAllowedFileSystemCapabilitySyscall(int sysno) { The list of system calls in this category seems somewhat random. For instance, why is ftruncate() here and not in the same group as truncate()? And what does fdatasync() do here? It is almost more like read()/write(). https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:253: case __NR_fstat: This would be an example where we also need fstat64(), if the platform supports it. On x86-32, all modern libraries will use fstat64() in preference of fstat(); but there could still be a few remnants of calls to the old system call. So, we need to support both. On x86-64, fstat() is actually fstat64(), and there is no old 32bit API. There are a whole bunch of other examples like this. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:268: bool IsGetProcessIdSyscall(int sysno) { Grouping getpid() et.al. together with getuid() et.al. feels a little random. Especially as you seem to split things for the system calls that set ids or related properties. Fundamentally, a pid_t is very different from a user or group id. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:370: // case __NR_set_tid_address: This is not quite intuitive, but this system call doesn't exist on all architectures. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:478: case __NR_sendmsg: // Could specify destination. Unless I am terribly mistaken, sendmsg() is a superset of sendto(). So it is pointless to disallow send() or sendto() if sendmsg() is allowed. And sendmmsg() is just a performance optimization. Also, if I interpret http://lwn.net/Articles/508865/ correctly, sendmsg() is now a viable substitute for connect() (at least in some scenarios). https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:680: // These give a lot of ambiant authority and bypass the setuid sandbox. s/ambiant/ambient/ ? https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:747: switch (sysno) { These should probably be in the roughly the same category as the various sleep() functions.
PTAL! As a general remark, this CL is: 1. dealing with x86_64 2. not doing any policy change (those are reserved for https://chromiumcodereview.appspot.com/10837156/ and others). https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... File content/common/sandbox_seccomp_bpf_linux.cc (right): https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:172: // case __NR_clock_nanosleep: On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > We need to be careful when we deny system calls that are essentially a superset > of an older system call. It is not uncommon that never versions of system > libraries will at some point start using the newer system calls. Agreed, there is no doubt in my mind that we should allow nanosleep. It's done in the other CL, where I make policy changes: https://chromiumcodereview.appspot.com/10837156/ https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:185: bool IsAmbiantFileSystemSyscall(int sysno) { On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > Not quite sure if I understand the meaning of "ambient" in this particular > situation -- apart from the fact that it also appears to be misspelled. Thanks! Anyone has any good suggestion for a name here ? https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:186: switch (sysno) { On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > I guess, you are not really soliciting comments on how arbitrary and somewhat > wrong this policy looks right now. That's probably reserved for an upcoming > changelist. Yes! https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:200: // For many, EPERM is a valid errno, but not for all of them. On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > I see a lot of potential for confusion down the line, if you name a boolean > function IsXXXSyscall(), but then you start making policy decisions by > commenting out some of the system calls. > > Maybe, instead of IsXXXSyscall() functions we should instead have > IsAllowedXXXSyscall() functions for everything? So far, I've done the following: IsXXX() is intended when the whole group will likely be denied or allowed. For sets where some system calls should be denied and some allowed, I have a IsAllowedXXX(). But in addition to that, some system calls are commented out for attack surface reduction. In this particular CL, many system calls are commented out for attack surface reduction because I didn't want to make any policy change. This changes with the upcoming CLs, including https://chromiumcodereview.appspot.com/10837156. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:207: case __NR_execve: On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > I probably would have moved execve() into a separate category. It is both a file > system call and a process management call as it kills all other threads. If we > rely on helper threads (and we might or might not do so in the future), things > like execve() will need special attention. I don't think execve will matter much, unless we make drastic changes, because I don't see any situation where we would allow it anyway. In practice, execve() gives "roughly" the ability to open() and mmap a file, as well as kill threads. Since mmap / killing of threads should generally be allowed, I thought it fit nicely in this group. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:213: // case __NR_getdents64: On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > In general, what is the plan for system calls that only exist on some > architectures? I think, but I might be mistaken, getdents64() is one of those. getdents64 exists on x86 32 and 64 bits. In general, this CL is only concerned with 64 bits. I plan to unify with 32 bits later, but it's difficult, and might need to wait until we get argument inspection working. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:249: bool IsAllowedFileSystemCapabilitySyscall(int sysno) { On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > The list of system calls in this category seems somewhat random. For instance, > why is ftruncate() here and not in the same group as truncate()? > > And what does fdatasync() do here? It is almost more like read()/write(). This group is for system calls that take a "fd" as a "capability". It's very similar to the previous group but is for the "real" f* function. Note that some names are misleading: for instance newfstatat should really be (new)statat, (see line 223). https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:253: case __NR_fstat: On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > This would be an example where we also need fstat64(), if the platform supports > it. On x86-32, all modern libraries will use fstat64() in preference of fstat(); > but there could still be a few remnants of calls to the old system call. So, we > need to support both. > > On x86-64, fstat() is actually fstat64(), and there is no old 32bit API. There is no fstat64() in x86-64. (All system calls x86-64 system calls are here). fstat64() is now fstat() and is implemented as sys_newfstat in the kernel. > There are a whole bunch of other examples like this. I am 100% sure that I didn't forget any x86_64 system call. I even wrote a script to check :) https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:268: bool IsGetProcessIdSyscall(int sysno) { On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > Grouping getpid() et.al. together with getuid() et.al. feels a little random. > Especially as you seem to split things for the system calls that set ids or > related properties. > > Fundamentally, a pid_t is very different from a user or group id. Yeah, good point. I didn't want to have two many sets either, but I think you're right on that one. I'll wait until Chris / Jorge comment as well. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:370: // case __NR_set_tid_address: On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > This is not quite intuitive, but this system call doesn't exist on all > architectures. Yeah, this CL is x86_64 only. Architecture unification is TODO. https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:478: case __NR_sendmsg: // Could specify destination. On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > Unless I am terribly mistaken, sendmsg() is a superset of sendto(). So it is > pointless to disallow send() or sendto() if sendmsg() is allowed. And sendmmsg() > is just a performance optimization. > > Also, if I interpret http://lwn.net/Articles/508865/ correctly, sendmsg() is now > a viable substitute for connect() (at least in some scenarios). Yes, exactly, I've noticed that. This is precisely why I treat them the same in the other CL that deals with policy changes: https://chromiumcodereview.appspot.com/10837156 https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:680: // These give a lot of ambiant authority and bypass the setuid sandbox. On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > s/ambiant/ambient/ ? Thanks! https://chromiumcodereview.appspot.com/10834219/diff/9001/content/common/sand... content/common/sandbox_seccomp_bpf_linux.cc:747: switch (sysno) { On 2012/08/08 09:59:45, Markus (顧孟勤) wrote: > These should probably be in the roughly the same category as the various sleep() > functions. At least I should localize them closer in the C file. However, they do reach significantly different attack surface. But I don't feel too strongly about it.
On 2012/08/08 17:55:07, Julien Tinnes wrote: > PTAL! > > As a general remark, this CL is: > > 1. dealing with x86_64 Also to add to that: https://chromiumcodereview.appspot.com/10837156/ gives a better view of how it's intended to look. Once we have a baseline policy model, we can have a CC file (seccomp_policy_x86_64.cc ?) that is architecture specific and implements the baseline. Then, the other policies can be expressed as variation of the baseline. Currently for instance, flash is roughly Baseline + SysV IPC and GPU is "Baseline + IOCTL and eventfd).
PTAL! I have renamed some of the IsXXXX to IsAllow* for the implementation to match the semantics. I didn't do this when obviously the system calls that are commented out won't be any more very soon (cf. https://chromiumcodereview.appspot.com/10837156/).
lgtm I think this is fine for now. Especially in light of the fact that you have another CL lined up, that will clean things up more (by also making some subtle policy changes)
lgtm with nits https://chromiumcodereview.appspot.com/10834219/diff/11001/content/common/san... File content/common/sandbox_seccomp_bpf_linux.cc (right): https://chromiumcodereview.appspot.com/10834219/diff/11001/content/common/san... content/common/sandbox_seccomp_bpf_linux.cc:291: case __NR_iopl: // Intel privilege. Wrong alphabetical order?
https://chromiumcodereview.appspot.com/10834219/diff/11001/content/common/san... File content/common/sandbox_seccomp_bpf_linux.cc (right): https://chromiumcodereview.appspot.com/10834219/diff/11001/content/common/san... content/common/sandbox_seccomp_bpf_linux.cc:291: case __NR_iopl: // Intel privilege. On 2012/08/08 21:56:31, Jorge Lucangeli Obes wrote: > Wrong alphabetical order? Thanks! You're a terminator for spotting this :)
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/jln@chromium.org/10834219/12002
Change committed as 150624 |