src/trusted/validator_ragel/unreviewed/validator_internals.html - Issue 10883051: Add documentation for the dynamic code modifications.

Unified Diff: src/trusted/validator_ragel/unreviewed/validator_internals.html

Issue 10883051: Add documentation for the dynamic code modifications. (Closed) Base URL: svn://svn.chromium.org/native_client/trunk/src/native_client/

Patch Set: Created 8 years, 4 months ago


Index: src/trusted/validator_ragel/unreviewed/validator_internals.html

===================================================================

--- src/trusted/validator_ragel/unreviewed/validator_internals.html (revision 9563)

+++ src/trusted/validator_ragel/unreviewed/validator_internals.html (working copy)

@@ -3,31 +3,63 @@

</head>

<body>

+<div>

+<div style="width:20%; float:left; padding-right:5%;"><a href="http://en.wikipedia.org/wiki/File:Duesenberg.jpg"><img border="0" src="http://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Duesenberg.jpg/800px-Duesenberg.jpg" width="100%" /></a><br /><center><span style="font-size:50%">Source: <a href="http://en.wikipedia.org/wiki/File:Duesenberg.jpg">http://en.wikipedia.org/wiki/File:Duesenberg.jpg</a></span></center></div><div style="width:33%; float:right; padding-left:5%;"><a href="http://en.wikipedia.org/wiki/File:Felipe_Massa_2011_Malaysia_FP1.jpg"><img border="0" src="http://upload.wikimedia.org/wikipedia/commons/thumb/5/58/Felipe_Massa_2011_Malaysia_FP1.jpg/800px-Felipe_Massa_2011_Malaysia_FP1.jpg" width="100%" /></a><center><span style="font-size:50%">Source: <a href="http://upload.wikimedia.org/wikipedia/commons/thumb/5/58/Felipe_Massa_2011_Malaysia_FP1.jpg/800px-Felipe_Massa_2011_Malaysia_FP1.jpg">http://upload.wikimedia.org/wikipedia/commons/thumb/5/58/Felipe_Massa_2011_Malaysia_FP1.jpg/800px-Felipe_Massa_2011_Malaysia_FP1.jpg</a></span></center></div>

+<h1>New, DFA-based validator with 5-10x speed of the original one, or…<br />

+<div style="text-align:right;">Luxury car to F1 car.</div></h1>

+<div style="position:relative; width:55%; left:10%;">Trust me: every problem in computer science may be solved by an indirection, but those indirections are <b>expensive</b>. Pointer chasing is just about the most expensive thing you can do on modern CPU's.<br /><a href="http://lwn.net/Articles/509416/"><i>—Linus Torvalds</i></a></div>

+<div>

+<ol style="clear:both;">

+<li><a href="#1">DFA, Ragel, macro and inline functions, oh my…</a></li>

+<li><a href="#2">What Ragel is and how it works.</a></li>

<ol>

-<li><a href="#1">DFA, Ragel, macroses and inline functions, oh my…</a></li>

-<li><a href="#2">“Special” instructions.</a></li>

-<li><a href="#3">“No so special” instructions.</a></li>

-<li><a href="#4">Features beyond minimal validation.</a></li>

+<li><a href="#2-1">Ragel actions.</a></li>

+</ol>

+<li><a href="#3">“Special” instructions.</a></li>

+<li><a href="#4">“Not so special” instructions.</a></li>

+<li><a href="#5">Features beyond minimal validation.</a></li>

<ol>

-<li><a href="#4-1"><code>CPUID</code> support.</a></li>

-<li><a href="#4-2">Dynamic code creation support.</a></li>

-<li><a href="#4-3">Dynamic code modification support.</a></li>

+<li><a href="#5-1"><code>CPUID</code> support.</a></li>

+<li><a href="#5-2">Dynamic code modification support.</a></li>

+<ol>

+<li><a href="#5-2-1">Replacement validation.</a></li>

+<li><a href="#5-2-2">Replacement copying.</a></li>

</ol>

-<li><a href="#5">Validation for x86-64 mode.</a></li>

+</ol>

+<li><a href="#6">Validation for x86-64 mode.</a></li>

<ol>

-<li><a href="#5-1">“Secondary” states.</a></li>

-<li><a href="#5-2">“Normal” instructions.</a></li>

-<li><a href="#5-3">Operands handling.</a></li>

+<li><a href="#6-1">“Secondary” states.</a></li>

+<li><a href="#6-2">“Normal” instructions.</a></li>

+<li><a href="#6-3">Operands handling.</a></li>

+<li><a href="#6-4">Dynamic code modification support.</a></li>

+<ol>

+<li><a href="#6-4-1">Replacement validation.</a></li>

+<li><a href="#6-4-2">Replacement copying.</a></li>

</ol>

-<li><a href="#6">Decoders.</a></li>

</ol>

-<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="1">1. DFA, Ragel, macroses and inline functions, oh my…</a></h2>

-<p>To understand how DFA-based validators work it's best to start from function <code>ValidateChunkIA32</code> in <code>validator_x86_32.rl</code>. Said function is very short and “simple”: it allocates couple of arrays (<code>valid_targets</code> and <code>jump_dests</code>), then cycles over code passed to it (processing it in bundle-sized chunks) and at the end it compares valid jump targets and collected jump destinations… that's it. Oh, and it also includes couple of cryptic lines right in the middle of innermost cycle:<hr />

+<li><a href="#7">Decoders.</a></li>

+</ol>

+<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="1">1. DFA, Ragel, macro and inline functions, oh my…</a></h2>

+<p>Contemporary computer systems are extremely powerful and most complex components and libraries are built like a <a href="http://en.wikipedia.org/wiki/Luxury_vehicle">luxury car</a>: they include a lot of comfort and safety technologies which are designed to improve the life of the user of said components. This also facilitates <a href="http://en.wikipedia.org/wiki/Code_reuse">code reuse</a> via <a href="http://en.wikipedia.org/wiki/Modular_programming">modular programming</a> and generally improves <a href="http://en.wikipedia.org/wiki/Maintainability">maintainability</a>.</p>

+<p>Unfortunately these complex structures, improved comfort for the library user and commendable flexibility have a flip side: they lead to a lot of additional work at runtime! You first fill and then parse complex data structures, and this takes time. You often produce a lot of information at the low levels which is simply not used at the higher levels, and this work is not free either.</p>

+<p>The new validator is built differently. It only keeps around the indispensable minimum of information needed to prove (or disprove) that code is safe. Similarly to how an <a href="http://en.wikipedia.org/wiki/Formula_One_car">F1 car</a> uses <a href="http://www.youtube.com/watch?v=NsvWnGgT7Ok">custom-designed car seats</a>, we use custom-designed data structures to push the data from one point of the validator to another. <span title="Actually we collect slightly more than the bare minimum to make testing possible.">We only collect the bare minimum of information</span>, and if the requirements change we often change all the pieces: from the <code>gen_dfa</code> input data format to the highest-level <code>dfa_validate_32.c</code>/<code>dfa_validate_64.c</code> external API adapters.</p>

+<p>This streamlining was one of the most important design goals of the new validator. And indeed the code which reaches the CPU is very simple: it does not contain complex data structures and multilayered functions, while all the previous validators had many layers and quite a few complex data structures. How can that be? Were all these structures superfluous and unnecessary? Well… not really. The new validator throws away all that complexity and trades it for a few comparisons and jumps. <b>Tens of thousands of comparisons and a similar number of jumps</b>, to be exact. In a <b>single flat function</b>. Basically we trade runtime complexity for build-time complexity. As you can guess, it's practically impossible to write such a function by hand, and even if someone were able to write tens of thousands of lines of code by hand, the result would be impossible to understand and review. People are not CPUs! They can keep track of millions of lines of code in complex projects if these are organized in modules and are nicely separated, but give them fifty thousand lines of homogeneous code and they'll be totally lost. Yet this is basically what we have here in the end product, because the CPU loves such code. To solve this dilemma we employ three levels of filters to create the final code.</p>

+<center><img src="files32.svg" height="90%"/><br />Gray elements are hand-written, white elements are generated and dark-gray are aforementioned mixers.</center><br />

+<p>To understand how the validator works it's best to start from the function <code>ValidateChunkIA32</code> in <code>validator_x86_32.rl</code>. Said function is very short and “simple”: it allocates a couple of arrays (<code>valid_targets</code> and <code>jump_dests</code>), then loops over the code passed to it (processing it in bundle-sized chunks) and at the end it compares valid jump targets and collected jump destinations… that's it. Oh, and it also includes a couple of cryptic lines right in the middle of the innermost loop:<hr />

    <code>%% write init;</code><br />

    <code>%% write exec;</code><hr />

Apparently collection of valid jump targets and actual target destinations happens here. How?</p>

-<a name="ragel"></a><blockquote style="background:lightgray; font-size:90%;">

+<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="2">2. What Ragel is and how it works.</a></h2>

+<blockquote style="background:lightgray; font-size:90%;">

<p>To understand that you need to know a little about DFA and Ragel. I'll not explain what a DFA is (it's explained in the CS course you took years back… or you can refresh your knowledge on <a href="http://en.wikipedia.org/wiki/Deterministic_finite_automaton">Wikipedia</a>). But I'll explain a little about Ragel. Extensive documentation with all the gory details is <a href="http://www.complang.org/ragel/">on Ragel's site</a>, but while it explains <b>how</b> to use Ragel it does not explain <b>what</b> it is and <b>why</b> you may want to use it.</p>

<p>Let's start with the first question: <b>what</b> it is. Ragel is a compiler of DFA machines… but with a twist. You describe the DFA structure using a simple <a href="http://en.wikipedia.org/wiki/Regular_expression">RE</a>-style format and Ragel generates the corresponding code in C (D/Go/Java/Ruby/etc: Ragel supports a lot of languages, but we are interested in C here). When you describe the DFA you just write acceptable bytes and then use the following operations: concatenation (“1 . 2” will accept “1” followed by “2”), union (“1 | 2” will accept either “1” or “2”), intersection (“('a'..'n') & ('m'..'z')” will accept either “m” or “n”), difference (“('a'..'n') - ('m'..'z')” will accept everything between “a” and “l”, but will not accept either “m” or “n”) and Kleene star (“(1 | 2)*” will accept any number of “1” or “2”).</p>

@@ -36,7 +68,9 @@

<p>If, instead of “("aa"+ | "aaa"+)” in the example above, you use something like “("a"{5}+ | "a"{7}+ | "a"{11}+)” then the resulting DFA will include almost four hundred nodes and over five hundred transitions! This limits the applicability of DFA technology: e.g. it's possible to describe a "valid code sequence" (including bundles, "restricted registers" and everything else) as a DFA, but… said DFA will include millions of nodes and billions of transitions!</p>

-<p><a name="actions">To overcome this problem Ragel offers so-called "actions": pieces of code which are called when certain pieces in DFA are reached. E.g. we can mark begin and end of “aa” (or “aaa”) in the example above—“("b" . (("aa" >begin @end)+ | ("aaa" >begin @end)+ ))*” produces the following DFA:</a></p>

+<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="2-1">2.1. Ragel actions.</a></h3>

+<p>To overcome this problem Ragel offers so-called "actions": pieces of code which are called when certain pieces in DFA are reached. E.g. we can mark begin and end of “aa” (or “aaa”) in the example above—“("b" . (("aa" >begin @end)+ | ("aaa" >begin @end)+ ))*” produces the following DFA:</p>

<p style="margin-bottom:0px;">Let's see what happens if we'll feed it with “baaaaaaaaa” sequence:</p>

@@ -72,15 +106,15 @@

<li>Actions are called in non-random order: take a look at <i>offset 4</i>, where <code>end2</code> is called before <code>begin3</code>. That's because <code>begin3</code> has lower priority than <code>end2</code>! Note that in the previous example this same effect was observed, but it was quite mysterious there. The closer the action is to the beginning of the source file, the higher its priority is.</li>

</ol>

</blockquote>

-<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="2">2. “Special” instructions.</a></h2>

+<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="3">3. “Special” instructions.</a></h2>

<p>Now we can go back to machine description. Our main DFA is the same in all cases, it's “<code>(one_instruction | special_instruction)*</code>”—i.e. it accepts sequence of “normal” instructions and “special” instructions.</p>

-<p>Also, just like in example above there are two actions: first one is triggered at the beginning of the <code>instruction</code> (“normal” or “special”)—it's used to remember the beginning of the instruction, to clear the list of <code>errors_detected</code>, and to mark the first byte of the instruction as valid target for the direct jump; second one is triggered at the final byte of the <code>instruction</code> (“normal” or “special”)—and is used to report errors. And there are also one additional action which is declared as “<code>$err</code>”. This is <i>error fallback action</i>: it's triggered whenever our machine rejects some byte (which means we've hit either forbidden instruction like <code>lgdt</code> or some undefined byte sequence… in both cases <code> UNRECOGNIZED_INSTRUCTION</code> error is reported and processing is stopped).</p>

+<p>Also, just like in the example above, there are two actions. The first one is triggered at the beginning of the <code>instruction</code> (“normal” or “special”): it's used to remember the beginning of the instruction, to clear <code>instruction_info_collected</code>, and to mark the first byte of the instruction as a valid target for a direct jump. The second one is triggered at the final byte of the <code>instruction</code> (“normal” or “special”) and is used to report errors. There is also one additional action, which is declared as “<code>$err</code>”. This is the <i>error fallback action</i>: it's triggered whenever our machine rejects some byte (which means we've hit either a forbidden instruction like <code>lgdt</code> or some undefined byte sequence… in both cases the <code>UNRECOGNIZED_INSTRUCTION</code> error is reported and processing is stopped).</p>

<p>There are three “special” instructions in the IA32 case: <code>naclcall</code>, <code>nacljmp</code> and <code title="mov %gs:0x0,%reg is part of public ABI, mov %gs:0x4,%reg is used in IRT">mov %gs:0x0/0x4,%reg</code>. The last one is declared as a “special” instruction to simplify the validation logic (and the DFA, too): instead of accepting all versions of the <code>mov %gs:<i>something</i>,%reg</code> instruction followed by additional logic which rejects most possibilities (only plain vanilla “zero” is allowed here as per ABI) we only describe this one version of the instruction and Ragel does the rest. <code>naclcall</code> and <code>nacljmp</code> include a special action which clears the “valid destination address” bit (remember the story with the <code>begin</code> and <code>end</code> actions above? When the first byte of the second half of <code>naclcall</code>/<code>nacljmp</code> is processed, it's processed <b>both</b> as part of the <code>naclcall</code>/<code>nacljmp</code> <b>and</b> as the start of a regular instruction, too).</p>

<p>This explains how <code>valid_targets</code> array is filled and invalid instructions are rejected.</p>

-<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="3">3. “Not so special” instructions.</a></h2>

+<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="4">4. “Not so special” instructions.</a></h2>

<p>But of course there is <code>jump_dests</code>, too. Special instructions don't touch it, but something obviously fills the array, doesn't it? This can only be the result of processing normal instructions, thus we need to go deeper. Where does it all come from? To understand that we need to look at the [autogenerated] <code>validator_x86_32_instruction.rl</code> file. The file looks like this:<hr />

    ⋮<br />

  <i>Semi-manual simple helper machines and actions</i><br />

@@ -100,7 +134,7 @@

    <code>int8_t offset = (uint8_t) (p[0]);</code><br />

    <code>size_t jump_dest = offset + (p - data) + 1;</code><br /><br />

    <code>if (!MarkJumpTarget(jump_dest, jump_dests, size)) {</code><br />

-      <code>errors_detected |= DIRECT_JUMP_OUT_OF_RANGE;</code><br />

+      <code>instruction_info_collected |= DIRECT_JUMP_OUT_OF_RANGE;</code><br />

  <code>action rel32_operand {</code><br />

@@ -108,82 +142,139 @@

    <code>size_t jump_dest = offset + (p - data) + 1;</code><br /><br />

    <code>if (!MarkJumpTarget(jump_dest, jump_dests, size)) {</code><br />

-      <code>errors_detected |= DIRECT_JUMP_OUT_OF_RANGE;</code><br />

+      <code>instruction_info_collected |= DIRECT_JUMP_OUT_OF_RANGE;</code><br />

We just check whether the jump target passes the preliminary check (a direct jump to the outside of the region is always invalid), and if it does not we report the <code>DIRECT_JUMP_OUT_OF_RANGE</code> error.</p>
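<p>The bookkeeping described above can be sketched in C as follows. This is only an illustration under stated assumptions: every <code>Sketch…</code> name is hypothetical, the one-bit-per-byte bitmap layout is assumed rather than taken from the validator sources, and <code>SketchMarkJumpTarget</code> merely stands in for the real <code>MarkJumpTarget</code>.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitmap helpers: one bit per byte of validated code. */
static void SketchBitmapSet(uint8_t *bitmap, size_t index) {
  bitmap[index / 8] |= (uint8_t)(1u << (index % 8));
}

static int SketchBitmapGet(const uint8_t *bitmap, size_t index) {
  return (bitmap[index / 8] >> (index % 8)) & 1;
}

/* Stand-in for MarkJumpTarget: records a direct jump destination and
 * returns 0 when the destination lies outside the validated region. */
static int SketchMarkJumpTarget(size_t jump_dest, uint8_t *jump_dests,
                                size_t size) {
  assert(jump_dests != NULL);
  if (jump_dest >= size) {
    return 0;  /* Also catches "negative" targets that wrapped around. */
  }
  SketchBitmapSet(jump_dests, jump_dest);
  return 1;
}

/* Final pass: every collected destination must be an instruction start. */
static int SketchJumpsAreValid(const uint8_t *valid_targets,
                               const uint8_t *jump_dests, size_t size) {
  size_t i;
  for (i = 0; i < size; i++) {
    if (SketchBitmapGet(jump_dests, i) && !SketchBitmapGet(valid_targets, i)) {
      return 0;
    }
  }
  return 1;
}
```

<p>Because the destination is computed in <code>size_t</code>, a backward jump that points before the start of the region wraps around to a huge value and is rejected by the same range comparison.</p>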

-<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="4">4. Features beyond minimal validation.</a></h2>

+<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="5">5. Features beyond minimal validation.</a></h2>

<p style="margin-bottom:0px;">This covers most of the functionality of the validator (we'll discuss the generation of <code>validator_x86_32_instruction.rl</code> file later), but there are still some details not covered here:</p>

-<li><a href="#4-1"><code>CPUID</code> support.</a></li>

-<li><a href="#4-2">Dynamic code creation support.</a></li>

-<li><a href="#4-3">Dynamic code modification support.</a></li>

+<li><a href="#5-1"><code>CPUID</code> support.</a></li>

+<li><a href="#5-2">Dynamic code modification support.</a></li>

+<ol>

+<li><a href="#5-2-1">Replacement validation.</a></li>

+<li><a href="#5-2-2">Replacement copying.</a></li>

</ol>

+</ol>

-<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="4-1">4.1. <code>CPUID</code> support.</a></h3>

+<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="5-1">5.1. <code>CPUID</code> support.</a></h3>

<p><code>CPUID</code> support is implemented using a large set of actions embedded in the definitions of instructions (see, e.g. <code>@CPUFeature_FXSR</code> in the line for instruction <code>0x0f 0x01 0xd0</code> AKA <code>xgetbv</code>). CPUID-related actions are triggered when we know the identity of the instruction (which happens at different times for different instructions: some instructions are detected when the opcode is read, some use an <i>opcode extension</i>, etc; the AMD/Intel manuals contain all the gory details), but the definitions of said actions in <code>validator_x86_32_instruction.rl</code> are very simple:<hr />

  <code>action CPUFeature_FXSR {</code><br />

    <code>SET_CPU_FEATURE(CPUFeature_FXSR);</code><br />

This time magic is in <code>validator_internal.h</code>. <code>SET_CPU_FEATURE</code> is defined as<hr />

-<code>#define SET_CPU_FEATURE(F) \</code><br />

-  <code>if (!(F)) { \</code><br />

-    <code>errors_detected |= CPUID_UNSUPPORTED_INSTRUCTION; \</code><br />

+  <code>if (!(F##_Allowed)) { \</code><br />

+    <code>instruction_info_collected |= UNRECOGNIZED_INSTRUCTION; \</code><br />

+  <code>} \</code><br />

+  <code>if (!(F)) { \</code><br />

+    <code>instruction_info_collected |= CPUID_UNSUPPORTED_INSTRUCTION; \</code><br />

-IOW: it's pretty straighforward and simple, but there are a twist: <code>CPUFeature_FXSR</code> is not the name of variable, but the name of macrodefinition. This is needed to handle special cases where <code>CPUFeature</code> does not correspond to a single <code>CPUID</code> bit. E.g. <code>prefetch</code> instruction is available when <b>any one</b> of three bits are set: <code>3DNnow!</code> bit, deficated <code>Prefetch instruction</code> bit or <code>LongMode</code> bit. On the other hand <code>vaesenc</code> is available when <b>both</b> <code>AES</code> and <code>AVX</code> bits are set. And our ABI <a href="http://code.google.com/p/nativeclient/issues/detail?id=2869">permits <code>lzcnt</code> and <code>tzcnt</code> uncoditionally</a> (thus <code>CPUFeature_LZCNT</code> does not check for anything but just returns <code>TRUE</code> in all cases).

-</p>

+IOW: it's pretty straightforward and simple, but there is a twist: <code>CPUFeature_FXSR</code> is not the name of a variable, but the name of a macro definition. This is needed to handle special cases where a <code>CPUFeature</code> does not correspond to a single <code>CPUID</code> bit. E.g. the <code>prefetch</code> instruction is available when <b>either</b> of two bits is set: <span title="AMD documentation also claims it's always available if LongMode bit is set but Intel documentation does not support this assertion.">the <code>3DNow!</code> bit or the dedicated <code>Prefetch instruction</code> bit</span>. On the other hand <code>vaesenc</code> is available only when <b>both</b> the <code>AES</code> and <code>AVX</code> bits are set. And our ABI <a href="http://code.google.com/p/nativeclient/issues/detail?id=2869">permits <code>lzcnt</code> and <code>tzcnt</code> unconditionally</a> (thus <code>CPUFeature_LZCNT</code> does not check for anything but just returns <code>TRUE</code> in all cases).</p>
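<p>A minimal sketch of why the features are macros rather than variables. All <code>Sketch…</code>/<code>SKETCH_…</code> names, the bit values and the <code>struct SketchCpuid</code> layout are assumptions made for illustration; the <code>_Allowed</code> constants only imitate a hardcoded mask and do not reflect which instructions are actually enabled.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit values only; the real constants live in the validator. */
#define SKETCH_UNRECOGNIZED_INSTRUCTION      0x1u
#define SKETCH_CPUID_UNSUPPORTED_INSTRUCTION 0x2u

/* Hypothetical runtime-supplied CPUID bits. */
struct SketchCpuid {
  int has_3dnow;
  int has_prefetch;
  int has_aes;
  int has_avx;
};

/* Each feature is a macro over the `cpu` pointer in scope, not a variable. */
#define SketchFeature_PREFETCH (cpu->has_3dnow || cpu->has_prefetch) /* any */
#define SketchFeature_VAES     (cpu->has_aes && cpu->has_avx)       /* both */
#define SketchFeature_LZCNT    1                                  /* always */

/* Hypothetical hardcoded mask: VAES pretends to await security review. */
#define SketchFeature_PREFETCH_Allowed 1
#define SketchFeature_VAES_Allowed     0
#define SketchFeature_LZCNT_Allowed    1

/* F##_Allowed is pasted before F is expanded, so each check sees its macro. */
#define SKETCH_SET_CPU_FEATURE(F)                   \
  do {                                              \
    if (!(F##_Allowed)) {                           \
      info |= SKETCH_UNRECOGNIZED_INSTRUCTION;      \
    }                                               \
    if (!(F)) {                                     \
      info |= SKETCH_CPUID_UNSUPPORTED_INSTRUCTION; \
    }                                               \
  } while (0)

static uint32_t SketchCheckPrefetch(const struct SketchCpuid *cpu) {
  uint32_t info = 0;
  assert(cpu != NULL);
  SKETCH_SET_CPU_FEATURE(SketchFeature_PREFETCH);
  return info;
}

static uint32_t SketchCheckVaes(const struct SketchCpuid *cpu) {
  uint32_t info = 0;
  assert(cpu != NULL);
  SKETCH_SET_CPU_FEATURE(SketchFeature_VAES);
  return info;
}

static uint32_t SketchCheckLzcnt(const struct SketchCpuid *cpu) {
  uint32_t info = 0;
  (void)cpu;  /* The unconditional feature never consults the CPUID bits. */
  SKETCH_SET_CPU_FEATURE(SketchFeature_LZCNT);
  return info;
}
```

<p>With this shape an "any of", "all of" or "always" rule is just a different macro body, and the action that uses it stays a one-liner.</p>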

-<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="4-2">4.2. Dynamic code creation support.</a></h3>

+<p>Note: there are two CPUID masks: a hardcoded one (it can be replaced if you link a different definition of the <code>validator_cpuid_features</code> global variable into your program) and a runtime-supplied one (usually obtained from an actual <code>CPUID</code> call in production, but hardcoded in tests). New instructions are first added in “production disabled” mode and must pass a security review before they can be used in Chrome.</p>

-<p>TBD</p>

+<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="5-2">5.2. Dynamic code modification support.</a></h3>

-<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="4-3">4.3. Dynamic code modification support.</a></h3>

+<p>Dynamic code modification support is implemented with the help of the <code>CALL_USER_CALLBACK_ON_EACH_INSTRUCTION</code> option. Normally the user callback is only invoked when some kind of error is detected, but if this option is used then the callback is called after <b>each</b> instruction. When that happens the callback has all the information needed to process the instruction: collected errors, information about immediates, etc.</p>
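<p>A per-instruction callback might look roughly like the sketch below. The callback signature, the <code>Sketch…</code>/<code>SKETCH_…</code> names and the statistics structure are hypothetical; the field positions (bits 3..0 for the cumulative size of immediates, bit 4 for "two immediates", bits 6..5 for the displacement size code) are read off the layout table in this section and should be checked against the validator headers before being relied upon.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field positions assumed from the layout table; names are hypothetical. */
#define SKETCH_IMMEDIATES_SIZE_MASK 0x0000000fu
#define SKETCH_TWO_IMMEDIATES       0x00000010u
#define SKETCH_DISPLACEMENT_SHIFT   5
#define SKETCH_DISPLACEMENT_MASK    0x3u

/* Hypothetical per-run statistics gathered by the callback. */
struct SketchStats {
  size_t instructions;
  size_t anyfield_bytes;
};

/* Decodes the two-bit displacement code: 00, 01, 10, 11 -> 0, 1, 2, 4. */
static unsigned SketchDisplacementBytes(uint32_t info) {
  static const unsigned kBytes[4] = {0, 1, 2, 4};
  return kBytes[(info >> SKETCH_DISPLACEMENT_SHIFT) & SKETCH_DISPLACEMENT_MASK];
}

static int SketchHasTwoImmediates(uint32_t info) {
  return (info & SKETCH_TWO_IMMEDIATES) != 0;
}

/* Hypothetical callback shape: invoked once per instruction with the
 * collected info word; returns nonzero to let validation continue. */
static int SketchCallback(const uint8_t *begin, const uint8_t *end,
                          uint32_t info, void *callback_data) {
  struct SketchStats *stats = callback_data;
  assert(begin <= end);
  stats->instructions++;
  stats->anyfield_bytes += info & SKETCH_IMMEDIATES_SIZE_MASK;
  return 1;
}
```

<p>A replacement validator could use the same decoded sizes to decide which trailing bytes of an instruction are immediates and therefore candidates for modification.</p>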

-<p>TBD</p>

+<p>All that information is packed into the <code>instruction_info_collected</code> variable. <span title="Note that half of the information does not make sense for ia32 mode and is not collected by ValidateChunkIA32. It's included for completeness.">It has the following format</span>:</p>

-<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="5">5. Validation for x86-64 mode.</a></h2>

+<table width="100%"><tr><td align="left">31</td><td align="left">30</td><td align="left">29</td><td align="left">28</td><td align="left">27</td><td align="left">26</td><td align="left">25</td><td align="left">24</td><td align="left">23</td><td align="left">22</td><td align="left">21</td><td align="left">20</td><td align="left">19</td><td align="left">18</td><td align="left">17</td><td align="left">16</td><td align="left">15</td><td align="left">14</td><td align="left">13</td><td align="left">12</td><td align="right">8</td><td align="left">7</td><td align="left">6</td><td align="right">5</td><td align="left">4</td><td align="left">3</td><td align="right">0</td></tr>

+<tr><td align="left"> </td><td align="left"> </td><td align="left"> </td><td align="left"> </td><td align="left"> </td><td align="left"> </td><td colspan="12" align="left" style="border: thin solid black;"><table width="100%"><tr><td align="left">⇤</td><td align="center"><code>VALIDATION_ERRORS_MASK</code></td><td align="right">⇥</td></table></td><td align="left"> </td><td colspan="2" align="left" width="1%" style="border: thin solid black; background:lightgray;"><table width="100%"><tr><td align="left">⇤</td><td width="1%" align="center"><code>RESTRICTED_REGISTER_MASK</code></td><td align="right">⇥</td></table></td><td align="left"> </td><td colspan="2" align="left" width="1%" style="border: thin solid black;"><table width="100%"><tr><td align="left">⇤</td><td width="1%" align="center"><code>RESTRICTED_REGISTER_MASK</code></td><td align="right">⇥</td></table></td><td align="left"> </td><td colspan="2" align="left" width="1%" style="border: thin solid black;"><table width="100%"><tr><td align="left">⇤</td><td width="1%" align="center"><code>IMMEDIATES_SIZE_MASK</code></td><td align="right">⇥</td></table></td></tr>

+<tr><td style="border: thin solid black; background: gray;" width="1%" align="center"> 0 </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td width="1%" style="border: thin solid black;" align="center">   </td><td style="border: thin solid black; background: lightgray;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black; background: lightgray;" width="1%" align="center">   </td><td style="border: thin solid black; background: lightgray;" width="1%" align="center">   </td><td style="border: thin solid black; background: lightgray;" width="1%" align="center">   </td><td style="border: thin solid black; background: lightgray;" width="1%" align="center">   </td><td style="border: thin solid black; background: lightgray;" width="1%" align="center">   </td><td style="border: thin solid black; background: lightgray;" width="1%" align="center">   </td><td style="border: thin solid black; background: lightgray;" width="1%" align="center">   </td><td style="border: thin solid black; background: lightgray;" width="1%" align="center">   </td><td style="border: thin solid black; background: lightgray;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black; background: lightgray;" width="1%" align="center">   </td><td colspan="2" style="border: thin solid black; background: lightgray;" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td colspan="2" style="border: thin solid black;" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td colspan="2" style="border: thin solid 
black;" align="center">   </td><td>   </td></tr>

+<tr><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left">↑</td><td align="left"> </td><td align="left">↑</td><td align="left">↑</td><td align="left"> </td><td align="left">↑</td><td align="left">↑</td><td align="left"> </td><td align="left"> </td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left"> </td><td align="left">┊</td><td align="left">┊</td><td align="left"> </td><td align="left">┊</td><td align="left" colspan="100" >└ Cumulative size of <i title="Immediates, displacements, relative offsets.">anyfields</i>.</td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left"> </td><td align="left">┊</td><td align="left">┊</td><td align="left"> </td><td align="left" colspan="100" >└ <span title="enter, extrq, insertq">Instruction has two immediates</span>.</td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left"> </td><td align="left">┊</td><td align="left" colspan="100" >└ <span title="00 = 0 bytes, 01 = 1 byte, 10 = 2 bytes, 11 = 4 bytes">Instruction displacement size</span>.</td></tr>

+

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100" >└ <span style="background: lightgray;">ia32 mode: reserved;</span> amd64 mode: <span title="NO_REG if the instruction does not zero-extend any register">register, zero-extended by the instruction.</span></td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100" >└ <span style="background: lightgray;">ia32 mode: reserved;</span> amd64 mode: <span title="This means that start of this instruction is not a valid jump target.">instruction is valid, but it accesses memory using register which is zero-extended by previous instruction.</span></td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100" >└ <span title="Note that all unsupported instructions trigger this error. This includes mov by absolute 64bit address, system instructions like lidt or even call and jmp used not as part of superinstruction. If combined with CPUID_UNSUPPORTED_INSTRUCTION it means that instruction is not yet enabled in validator.">DFA error: invalid instruction. Validation then resumes from the next bundle.</span></td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100" >└ <span style="background: lightgray;">ia32 mode: reserved;</span> amd64 mode: <code>%r15b</code>, <code>%r15w</code>, <code>%r15d</code>, or <code>%r15</code> is modified. <code>%r15</code> is untouchable in amd64 mode.</td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100" >└ <span style="background: lightgray;">ia32 mode: reserved;</span> amd64 mode: <span title="Note that %ebp is not mentioned. It can be modified by a regular instruction. But NEXT instruction must be special if that happened."><code>%bpl</code>, <code>%bp</code>, or <code>%rbp</code> is incorrectly modified. Only <code>%rbp</code> can be modified and then only by special instruction.</span></td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100" >└ <span style="background: lightgray;">ia32 mode: reserved;</span> amd64 mode: <span title="Note that %esp is not mentioned. It can be modified by a regular instruction. But NEXT instruction must be special if that happened."><code>%spl</code>, <code>%sp</code>, or <code>%rsp</code> is incorrectly modified. Only <code>%rsp</code> can be modified and then only by special instruction.</span></td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100">└ Bad <code>call</code> alignment: <code>call</code> must end at the end of the bundle, since <code>nacljmp</code> can only jump to an aligned address.</td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100">└ <span style="background: lightgray;">ia32 mode: reserved;</span> amd64 mode: <span title="Note: in ia32 mode all non-special instructions are modifiable.">instruction is modifiable.</span></td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100">└ Special instruction (uses different validation rules from the regular instruction). Can not be changed in ia32 mode.</td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100">└ Last byte is not immediate. It's either <span title="3DNow! instructions.">opcode</span>, <span title="Some AVX, FMA4, XOP instructions.">register number</span> or <span title="vpermil2pd and vpermil2ps">register number and two-bit immediate</span>.</td></tr>

+<tr><td align="left">┊</td><td align="left" colspan="100">└ Invalid jump target. When this flag is set <code>instruction_start</code> and <code>instruction_end</code> both point to the <b>jump target</b> instruction, not to the <b>jump</b> instruction itself.</td></tr>

+<tr><td align="left" colspan="100">└ Reserved.</td></tr>

+</table>

+<p>Using this information you can determine if the given instruction follows <span title="Only “naclcall” and “nacljmp” in ia32 mode.">special rules</span>, if it includes <span title="Commands like jcc, jmp, loopcc, or call.">relative offsets</span>, <span title="Most commands which access memory support displacements.">displacements</span>, or <span title="Immediates are supported by many different commands. They can be combined with displacement if command accesses memory.">immediates</span>. Tests may use the information collected to precisely separate different <i title="Immediates, displacements, relative offsets.">anyfields</i>, but in production only a few bits are used to determine if the instruction can be changed or not: in ia32 mode only <span title="naclcall and nacljmp">special instructions</span> can not be changed, while in amd64 mode the situation is the opposite: <span title="Only “call” and “mov” can be changed.">most instructions can not be changed</span>.</p>

+<h4><div style="float:right"><a href="#TOC">▲</a></div><a name="5-2-1">5.2.1. Replacement validation.</a></h4>

+<p>As was said <a href="#5-2">above</a>, code replacement is not supported by the <code>ValidateChunkIA32</code> function directly. Instead it's done by a higher-level function in <code>dfa_validate_32.c</code>.</p>

+<p>It uses the <code>CALL_USER_CALLBACK_ON_EACH_INSTRUCTION</code> option to compare the lengths of instructions in the two fragments in a callback, and the <code>SPECIAL_INSTRUCTION</code> flag passed to the callback to make sure special instructions are left unchanged.</p>

+<p>One tricky thing there is the handling of relative jumps and calls: if a relative jump (or call) triggers <code>DIRECT_JUMP_OUT_OF_RANGE</code> <b>but</b> is bit-for-bit identical to the original instruction it's accepted anyway: this means that this particular <code>jump</code> (or <code>call</code>) jumps to (or calls) some valid position outside of the given range. If it must be changed then you need to pass a bigger region to the <code>ValidatorCodeReplacement_x86_32</code> function: <span title="This is, of course, not needed if landing point is bundle-aligned.">this way the validator will have a chance to check the landing place for validity</span>.</p>
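
+<p>The per-instruction rule described above can be sketched roughly as follows. This is only an illustration, not the code from <code>dfa_validate_32.c</code>: the function name <code>replacement_instruction_ok</code> and the <code>SKETCH_*</code> flag values are hypothetical stand-ins for the real flags passed to the callback.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag values used only in this sketch; the real validator
 * defines its own SPECIAL_INSTRUCTION and DIRECT_JUMP_OUT_OF_RANGE bits. */
#define SKETCH_SPECIAL_INSTRUCTION      0x1u
#define SKETCH_DIRECT_JUMP_OUT_OF_RANGE 0x2u

/* Returns 1 if the new instruction may replace the old one, 0 otherwise.
 * `info` holds the flags reported for the new instruction. */
static int replacement_instruction_ok(const unsigned char *old_insn,
                                      size_t old_len,
                                      const unsigned char *new_insn,
                                      size_t new_len,
                                      unsigned info) {
  int identical;

  /* Instruction boundaries must stay where they were. */
  if (old_len != new_len) return 0;
  identical = memcmp(old_insn, new_insn, new_len) == 0;

  /* Special instructions must be left unchanged. */
  if ((info & SKETCH_SPECIAL_INSTRUCTION) && !identical) return 0;

  /* An out-of-range direct jump is tolerated only when it is the very
   * same jump that was already present in the original code. */
  if ((info & SKETCH_DIRECT_JUMP_OUT_OF_RANGE) && !identical) return 0;

  return 1;
}
```

+<p>In this sketch an unchanged out-of-range <code>call</code> is accepted, while the same <code>call</code> with a different offset is rejected until a bigger region is passed in.</p>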

+<h4><div style="float:right"><a href="#TOC">▲</a></div><a name="5-2-2">5.2.2. Replacement copying.</a></h4>

+<p>This is done by very simple function which uses <code>CALL_USER_CALLBACK_ON_EACH_INSTRUCTION</code> option to process instructions one-after-another.</p>

+<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="6">6. Validation for x86-64 mode.</a></h2>

<p>While validator for ia32 mode is very simple and short (it also produces pretty compact code) validator for x86-64 mode is different. It still has all the same properties validator for ia32 mode had (<code>valid_targets</code> and <code>jump_dests</code> arrays, “normal” and “special” instructions, bundles and <code>rel8_operand</code>/<code>rel32_operand</code> actions), but it adds quite a few additional twists to the whole scheme.</p>

-<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="5-1">5.1. “Secondary” states.</a></h3>

+<p>It's created in a process which is similar to the process which creates the ia32 validator.</p>

-<p>First of all: ia32 mode validator had one DFA in it and two arrays which kept track of the instruction boundaries but x86-64 has few more state variables. Most of them (<code>rex_prefix</code>, <code>vex_prefix2</code>, <code>vex_prefix3</code>, <code>operand_states</code>, <code>base</code>, and <code>index</code>) keep track of the instruction parts (and thus they are cleared before each instruction), but one variable called <code>restricted_register</code> is used to tie different instructions together. It keeps track of the <code>restricted_register</code> (if any). Note that not all restricted registers are born equal: most registers can be restricted and then forgotten (if you write to <code>%eax</code> and do nothing with the value before <code>call</code>), but <code>%esp</code> and <code>%ebp</code> are exceptions. If you write to the <code>%esp</code> then the very next instruction must be <code>add %r15,%rbp</code> or <code>lea (%r15,%rbp,1),%rbp</code>. This means that if at the end of a bundle restricted register is <code>%rsp</code> or <code>%rbp</code> then program is inavlid. For the same reason if then at beginning of a normal instruction (this includes first instruction in the “compound”) we see restricted <code>%rsp</code> or restricted <code>%rbp</code> then it's an error, too. On the other hand few rare special instructions which are used to restore the SFI invariant WRT <code>%rsp</code> or <code>%rbp</code> will only be accepted if restricted register is <code>%rsp</code> xor <code>%rbp</code> (depending on special instruction).</p>

+<center><img src="files64.svg" height="90%"/><br />Gray elements are hand-written, white elements are generated and dark-gray are mixers.</center><br />

-<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="5-2">5.2. “Normal” instructions.</a></h3>

+<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="6-1">6.1. “Secondary” states.</a></h3>

-<p>The hard part is, as before, in the DFA. First of all, main machine is similar to what we had in ia32 mode, but subtly different: it's “<code>(normal_instruction | special_instruction)*</code>” now. I.e.: <code>one_instruction</code> is replaced with <code>normal_instruction</code>. And what is <code>normal_instruction</code>? Why, it's “<code>one_instruction - special_instruction</code>”, of course! Well… this is unexpected: why will we want to remove <code>special_instruction</code>s from <code>normal_instruction</code>s only to add them back? The answer is related to actions: recall how <a href="#actions">actions</a> work. When we remove <code>special_instruction</code> from <code>one_instruction</code> we also remove the associated actions. This important in x86-64 case because some special instructions are just a normal instructions which are permitted to violate the usual rules! E.g. “special” instruction <code>and $~0x1f,%rsp</code> (which is used to align the stack pointer) changes the <code>%rsp</code> directly which is usually forbidden, but because of properties of <code>and $xxx,…</code> (for any <code>$xxx</code> < <code>0</code>) we know that invariants will not be violated.</p>

+<p>First of all: ia32 mode validator had one DFA in it and two arrays which kept track of the instruction boundaries but x86-64 has few more state variables. Most of them (<code>rex_prefix</code>, <code>vex_prefix2</code>, <code>vex_prefix3</code>, <code>operand_states</code>, <code>base</code>, and <code>index</code>) keep track of the instruction parts (and thus they are cleared before each instruction), but one variable called <code>restricted_register</code> is used to tie different instructions together. As the name implies it keeps track of the restricted register (if any). Note that not all restricted registers are born equal: most registers can be restricted and then forgotten (if you write to <code>%eax</code> and do nothing with the value before <code>call</code>), but <code>%esp</code> and <code>%ebp</code> are exceptions. If you write to the <code>%esp</code> then the very next instruction must be <code>add %r15,%rsp</code> or <code>lea (%r15,%rsp,1),%rsp</code>—and <code>%rbp</code> has similar requirements. This means that if at the end of a bundle restricted register is <code>%rsp</code> or <code>%rbp</code> then program is invalid. For the same reason if at beginning of a normal instruction (this includes first instruction in the “compound”) we see restricted <code>%rsp</code> or restricted <code>%rbp</code> then it's an error, too. On the other hand few rare special instructions which are used to restore the SFI invariant WRT <code>%rsp</code> or <code>%rbp</code> will only be accepted if restricted register is <code>%rsp</code> xor <code>%rbp</code> (depending on special instruction).</p>
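
+<p>The rules for <code>%rsp</code> and <code>%rbp</code> above can be summarized in a small sketch. It is a simplified model, not the validator's code: the helper names are hypothetical, the register numbers follow the hardware encoding, and the numeric value chosen for <code>NO_REG</code> is an assumption.</p>

```c
/* Register numbers follow the x86-64 hardware encoding; the value of
 * NO_REG is an assumption made for this sketch only. */
enum sketch_register {
  REG_RAX = 0,
  REG_RSP = 4,
  REG_RBP = 5,
  NO_REG = 32
};

/* A restricted %rsp or %rbp must be fixed up by the very next instruction,
 * so it may not be pending when a normal instruction starts... */
static int normal_instruction_may_start(int restricted_register) {
  return restricted_register != REG_RSP && restricted_register != REG_RBP;
}

/* ...and it may not be pending at the end of a bundle either. */
static int bundle_may_end(int restricted_register) {
  return restricted_register != REG_RSP && restricted_register != REG_RBP;
}

/* A special "restore" instruction such as add %r15,%rsp is accepted only
 * when exactly the register it restores is currently restricted. */
static int restore_is_accepted(int restricted_register, int restored) {
  return (restored == REG_RSP || restored == REG_RBP) &&
         restricted_register == restored;
}
```

+<p>Other registers, such as <code>%rax</code>, pass both checks: their restricted state can simply be forgotten.</p>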

+<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="6-2">6.2. “Normal” instructions.</a></h3>

+<p>The hard part is, as before, in the DFA. First of all, main machine is similar to what we had in ia32 mode, but subtly different: it's “<code>(normal_instruction | special_instruction)*</code>” now. I.e.: <code>one_instruction</code> is replaced with <code>normal_instruction</code>. And what is <code>normal_instruction</code>? Why, it's “<code>one_instruction - special_instruction</code>”, of course! Well… this is unexpected: why would we want to remove <code>special_instruction</code>s from <code>one_instruction</code>s only to add them back? The answer is related to actions: recall how <a href="#2-1">actions</a> work. When we remove <code>special_instruction</code> from <code>one_instruction</code> we also remove the associated actions. This is important in x86-64 case because some special instructions are just normal instructions which are permitted to violate the usual rules! E.g. “special” instruction <code>and $~0x1f,%rsp</code> (which is used to align the stack pointer) changes the <code>%rsp</code> directly which is usually forbidden, but because of properties of <code>and $xxx,…</code> (for any <code>$xxx</code> < <code>0</code>) we know that invariants will not be violated.</p>

<p>This approach works well, but only if violations are detected at the instruction end. E.g. the aforementioned <code>and $~0x1f,%rsp</code> instruction is encoded as <code>0x48 0x83 0xe4 0xe0</code> and after we've read <code>0x48 0x83 0xe4</code> we already know it's a normal instruction (opcode <code>0x83</code> means it's <code>and</code>) which writes to <code>%rsp</code> (<code>0x48 </code><i>opcode</i><code> 0xe4</code> means it's some instruction which accepts some kind of immediate and writes to <code>%rsp</code>). If we signal the error at this point then the fact that later we'll find out it's a <code>special_instruction</code> which is accepted anyway will not matter: the <code>SPL_MODIFIED</code> error will be triggered, which means that the code is rejected!</p>

<p>This means that we can not do the actual condition checking till the very end of a normal instruction (we could check some of the conditions early, but not all of them, and this approach would be quite complex and fragile: not something you want in the most critical security piece). But there is one exception: memory access. <b>This</b> one is checked inline: memory access outside of the “40GiB safe area” is strictly forbidden no matter how “special” the instruction is. That's why it's checked immediately after operands discovery. This is what the relevant fragment for the <code>and</code> instruction looks like:<hr />

-    <code>(0x83 (opcode_2 any* & any . any* & operand_disp @check_access) imm8 @process_0_operands) |</code><br />

-    <code>(0x83 (opcode_2 any* & any . any* & operand_rip @check_access) imm8 @process_0_operands) |</code><br />

-    <code>(REX_B? 0x83 (opcode_2 any* & any . any* & single_register_memory @check_access) imm8 @process_0_operands) |</code><br />

-    <code>(REX_X? 0x83 (opcode_2 any* & any . any* & operand_sib_pure_index @check_access) imm8 @process_0_operands) |</code><br />

-    <code>(REX_XB? 0x83 (opcode_2 any* & any . any* & operand_sib_base_index @check_access) imm8 @process_0_operands) |</code><br />

-    <code>(lock 0x83 (opcode_2 any* & any . any* & operand_disp @check_access) imm8 @process_0_operands) |</code><br />

-    <code>(lock 0x83 (opcode_2 any* & any . any* & operand_rip @check_access) imm8 @process_0_operands) |</code><br />

-    <code>(lock REX_B? 0x83 (opcode_2 any* & any . any* & single_register_memory @check_access) imm8 @process_0_operands) |</code><br />

-    <code>(lock REX_X? 0x83 (opcode_2 any* & any . any* & operand_sib_pure_index @check_access) imm8 @process_0_operands) |</code><br />

-    <code>(lock REX_XB? 0x83 (opcode_2 any* & any . any* & operand_sib_base_index @check_access) imm8 @process_0_operands) |</code><br />

-    <code>(REX_B? 0x83 (opcode_2 @operand0_32bit any* & modrm_registers @operand0_from_modrm_rm) imm8 @process_1_operands) |</code><hr />

-As you can see <code>check_access</code> is triggered after parsing ModRM/SIB bytes, but before parsing <code>imm<i>NN</i></code> field while <code>process_<i>N</i>_operands</code> action is triggered at the very end of the “normal” instruction. Even if instruction does not use <code>imm<i>NN</i></code> field <code>check_access</code> action is <b>still</b> triggerded before <code>process_<i>N</i>_operands</code> action. This is important because <code>check_access</code> action actually depends on <b>previous</b> state of “secondary” DFA while <code>process_<i>N</i>_operands</code> action does the transtions of “secondary” DFA. Note that it's only triggered for “normal” instructions—“special” instructions either do the work themselves (e.g. <code>add %r15,%rsp</code>—which is only valid if previous state of “secondary” DFA was <code>REG_RSP</code> and moves DFA to <code>kNoRestrictedReg</code> in case of succcess) or call the usual <code>process_<i>N</i>_operands</code> action (e.g. <code>mov %rsp,%rbp</code> calls <code>process_0_operands</code> which ensures that this operation is not called in <code>REG_RSP</code>/<code>REG_RBP</code> “secondary” DFA state and transtions it to <code>kNoRestrictedReg</code> state).</p>

+    <code>(0x83 (opcode_4 any* & any . any* & operand_disp @check_access) imm8 @process_0_operands) |</code><br />

+    <code>(0x83 (opcode_4 any* & any . any* & operand_rip @check_access) imm8 @process_0_operands) |</code><br />

+    <code>(REX_B? 0x83 (opcode_4 any* & any . any* & single_register_memory @check_access) imm8 @process_0_operands) |</code><br />

+    <code>(REX_X? 0x83 (opcode_4 any* & any . any* & operand_sib_pure_index @check_access) imm8 @process_0_operands) |</code><br />

+    <code>(REX_XB? 0x83 (opcode_4 any* & any . any* & operand_sib_base_index @check_access) imm8 @process_0_operands) |</code><br />

+    <code>(lock 0x83 (opcode_4 any* & any . any* & operand_disp @check_access) imm8 @process_0_operands) |</code><br />

+    <code>(lock 0x83 (opcode_4 any* & any . any* & operand_rip @check_access) imm8 @process_0_operands) |</code><br />

+    <code>(lock REX_B? 0x83 (opcode_4 any* & any . any* & single_register_memory @check_access) imm8 @process_0_operands) |</code><br />

+    <code>(lock REX_X? 0x83 (opcode_4 any* & any . any* & operand_sib_pure_index @check_access) imm8 @process_0_operands) |</code><br />

+    <code>(lock REX_XB? 0x83 (opcode_4 any* & any . any* & operand_sib_base_index @check_access) imm8 @process_0_operands) |</code><br />

+    <code>(REX_B? 0x83 (opcode_4 @operand0_32bit any* & modrm_registers @operand0_from_modrm_rm) imm8 @process_1_operand) |</code><hr />

+As you can see <code>check_access</code> is triggered after parsing ModRM/SIB bytes, but before parsing <code>imm<i>NN</i></code> field while <code>process_<i>N</i>_operands</code> action is triggered at the very end of the “normal” instruction. Even if instruction does not use <code>imm<i>NN</i></code> field <code>check_access</code> action is <b>still</b> triggered before <code>process_<i>N</i>_operands</code> action. This is important because <code>check_access</code> action actually depends on <b>previous</b> state of <code>restricted_register</code> variable while <code>process_<i>N</i>_operands</code> action changes <code>restricted_register</code> variable. Note that it's only triggered for “normal” instructions; “special” instructions either do the work themselves (e.g. <code>add %r15,%rsp</code>, which is only valid if previous state of <code>restricted_register</code> variable was <code>REG_RSP</code> and changes it to <code>NO_REG</code> in case of success) or call the usual <code>process_<i>N</i>_operands</code> action (e.g. <code>mov %rsp,%rbp</code> calls <code>process_0_operands</code> which ensures that this operation is not called when <code>restricted_register</code> is set to <code>REG_RSP</code>/<code>REG_RBP</code> state and transitions it to <code>NO_REG</code> state).</p>

-<p>You can find yet another suprising thing in the snipped above: <code>and</code> instruction is handled either as instruction with zero operands or as instruction with one operand… but of course in reality it always has two operands! Something is strange here… Well, sure: the decoder part of validator is as streamlined as possible. We just ignore all non-register arguments and arguments which are not written to (but we <b>don't</b> ignore memory accesses if they happen here, of course). That's why <code>and</code> has either one or zero operands as far as validator is concerned.</p>

+<p>You can find yet another surprising thing in the snippet above: <code>and</code> instruction is handled either as instruction with zero operands or as instruction with one operand… but of course in reality it always has two operands! Something is strange here… Well, sure: the decoder part of validator is as streamlined as possible. We just ignore all non-register arguments and arguments which are not written to (but we <b>don't</b> ignore memory accesses if they happen here, of course). That's why <code>and</code> has either one or zero operands as far as validator is concerned.</p>

-<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="5-3">5.3. Operands handling.</a></h3>

+<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="6-3">6.3. Operands handling.</a></h3>

-<p>Operands handling as, again, is not that complex… if you are familiar with bit operations. Initial version of the validator used simple array of records to store the information and everything worked well… with GCC, that is. MSVC produced awful code which was almost 30% slower and also needed twenty minutes to do so thus we replaced this simple version with current macro-based one.</p>

+<p>Operands handling, again, is not that complex… if you are familiar with bit operations. Initial version of the validator used simple array of records to store the information and everything worked well… with GCC, that is. MSVC produced awful code which was almost 30% slower and also needed twenty minutes to do so, thus we replaced this simple version with the current macro-based one.</p>

<p>All the information about encountered operands is collected in a single scalar variable <code>operand_states</code>. The layout of said variable looks like this:</p>

-<tr><td colspan="2" style="border: thin solid black" width="100%" align="center">padding</td><td colspan="2" style="border: thin solid black" align="center">operand4:<br />register_type</td><td colspan="2" style="border: thin solid black" align="center">operand4:<br />register_name</td><td style="border: thin solid black" align="center">padding</td><td colspan="2" style="border: thin solid black" align="center">operand3:<br />register_type</td><td colspan="2" style="border: thin solid black" align="center">operand3:<br />register_name</td><td style="border: thin solid black" align="center">padding</td><td colspan="2" style="border: thin solid black" align="center">operand2:<br />register_type</td><td colspan="2" style="border: thin solid black" align="center">operand2:<br />register_name</td><td style="border: thin solid black">padding</td><td colspan="2" style="border: thin solid black" align="center">operand1:<br />register_type</td><td colspan="2" style="border: thin solid black" align="center">operand1:<br />register_name</td><td style="border: thin solid black" align="center">padding</td><td colspan="2" style="border: thin solid black" align="center">operand0:<br />register_type</td><td colspan="2" style="border: thin solid black" align="center">operand0:<br />register_name</td></tr><tr><td></td><td></td><td></td><td></td><td colspan="2"> ↖<br />    0 if normal<br />    register</td><td></td><td></td><td></td><td colspan="2"> ↖<br />    0 if normal<br />    register</td><td></td><td></td><td></td><td colspan="2"> ↖<br />    0 if normal<br />    register</td><td></td><td></td><td></td><td colspan="2"> ↖<br />    0 if normal<br />    register</td><td></td><td></td><td></td><td colspan="2"> ↖<br />    0 if normal<br />    register</td></tr></table>

+<tr><td colspan="2" style="border: thin solid black;" width="100%" align="center">padding</td><td colspan="2" style="border: thin solid black;" align="center">operand4:<br />register_type</td><td colspan="2" style="border: thin solid black;" align="center">operand4:<br />register_name</td><td style="border: thin solid black;" align="center">padding</td><td colspan="2" style="border: thin solid black;" align="center">operand3:<br />register_type</td><td colspan="2" style="border: thin solid black;" align="center">operand3:<br />register_name</td><td style="border: thin solid black;" align="center">padding</td><td colspan="2" style="border: thin solid black;" align="center">operand2:<br />register_type</td><td colspan="2" style="border: thin solid black;" align="center">operand2:<br />register_name</td><td style="border: thin solid black;">padding</td><td colspan="2" style="border: thin solid black;" align="center">operand1:<br />register_type</td><td colspan="2" style="border: thin solid black;" align="center">operand1:<br />register_name</td><td style="border: thin solid black;" align="center">padding</td><td colspan="2" style="border: thin solid black;" align="center">operand0:<br />register_type</td><td colspan="2" style="border: thin solid black;" align="center">operand0:<br />register_name</td></tr>

+<tr><td></td><td></td><td></td><td></td><td colspan="2"> ↖<br />    0 if normal<br />    register</td><td></td><td></td><td></td><td colspan="2"> ↖<br />    0 if normal<br />    register</td><td></td><td></td><td></td><td colspan="2"> ↖<br />    0 if normal<br />    register</td><td></td><td></td><td></td><td colspan="2"> ↖<br />    0 if normal<br />    register</td><td></td><td></td><td></td><td colspan="2"> ↖<br />    0 if normal<br />    register</td></tr></table>

-<p>Register names are defined in <code>register_name</code> enum: first 16 are identical to the AMD/Intel names (from <code>REG_RAX</code> to <code>REG_R15</code>) while other 16 are used (partially) to describe non-register operands (memory operand, immediate operand, <code>REG_RIP</code> and <code>REG_RIZ</code>, etc). This means that if operand's name is >15 then it can be ignored. There are only four operand types: <code>OperandSandboxIrrelevant</code>, <code>OperandSandbox8bit</code>, <code>OperandSandboxRestricted</code>, and <code>OperandSandboxUnrestricted</code>. First type is something not related to general purpose register (x87, MMX, XMM, or YMM registers fall unto this category). We need to handle 8bit operands specially because they are finicky: if <code>REX</code> byte is used they access <code>%spl</code>, <code>%bps</code>, <code>%sil</code>, and <code>%dil</code>, but when <code>REX</code> byte is not used the same numbers are reused for <code>%ah</code>, <code>%ch</code>, <code>%dh</code>, and <code>%bh</code>! Last two types are the most important: these are 32bit operands (which will make the appropriate register “frestricted”) or 16bit/64bit operands (these may affect register in question negatively if that's <code>%rbp</code>, <code>%rsp</code>, or <code>%r15</code>, but for other registers these are just ignored). Note that if you assign <code>0</code> to this variable then all operands will be of <code>OperandSandboxIrrelevant</code> type.</p>

+<p>Register names are defined in <code>register_name</code> enum: first 16 are identical to the AMD/Intel names (from <code>REG_RAX</code> to <code>REG_R15</code>) while other 16 are used (partially) to describe non-register operands (memory operand, immediate operand, <code>REG_RIP</code> and <code>REG_RIZ</code>, etc). This means that if operand's name is >15 then it can be ignored. There are only four operand types: <code>OperandSandboxIrrelevant</code>, <code>OperandSandbox8bit</code>, <code>OperandSandboxRestricted</code>, and <code>OperandSandboxUnrestricted</code>. First type is something not related to general purpose register (x87, MMX, XMM, or YMM registers fall into this category). We need to handle 8bit operands specially because they are finicky: if <code>REX</code> byte is used they access <code>%spl</code>, <code>%bpl</code>, <code>%sil</code>, and <code>%dil</code>, but when <code>REX</code> byte is not used the same numbers are reused for <code>%ah</code>, <code>%ch</code>, <code>%dh</code>, and <code>%bh</code>! Last two types are the most important: these are 32bit operands (which will make the appropriate register “restricted”) or 16bit/64bit operands (these may affect register in question negatively if that's <code>%rbp</code>, <code>%rsp</code>, or <code>%r15</code>, but for other registers these are just ignored). Note that if you assign <code>0</code> to this variable then all operands will be of <code>OperandSandboxIrrelevant</code> type.</p>

-<p>Now the set of macroses used to work with operands should look less mysterious:<hr />

+<p>Now the set of macros used to work with operands should look less mysterious:<hr />

<code>#define SET_OPERAND_NAME(N, S) operand_states |= ((S) << ((N) << 3))</code><br />

<code>#define SET_OPERAND_TYPE(N, T) SET_OPERAND_TYPE_ ## T(N)</code><br />

<code>#define SET_OPERAND_TYPE_OperandSize8bit(N) operand_states |= OperandSandbox8bit << (5 + ((N) << 3))</code><br />

@@ -191,27 +282,98 @@

<code>#define SET_OPERAND_TYPE_OperandSize32bit(N) operand_states |= OperandSandboxRestricted << (5 + ((N) << 3))</code><br />

<code>#define SET_OPERAND_TYPE_OperandSize64bit(N) operand_states |= OperandSandboxUnrestricted << (5 + ((N) << 3))</code><br />

<code>#define CHECK_OPERAND(N, S, T) ((operand_states & (0xff << ((N) << 3))) == ((S | (T << 5)) << ((N) << 3)))</code><hr />

-Calls like <code>SET_OPERAND_NAME(0, REG_RAX)</code> are used by actions to set name of the operand (this particular one is used by <code>operand0_rax</code> action) while calls like <code>SET_OPERAND_TYPE(0, OperandSize2bit)</code> are used by actions to set the type of operand (this particular one is used by <code>operand0_2bit</code> action). Note that we <b>don't</b> handle 2bit operands in the set of macroses above. This is not a mistake: 2bit operands are only ever used as immediate operands (and then only in two instructions: <code>vpermil2pd</code> and <code>vpermil2ps</code>) and we don't process immediate operands here. If they will be by some reason left in the <codeo>validator_x86_64_instruction.rl</code> file this will lead to the compile-time error, not to some kind of weird overflow which may [potentially] produce security hole.</p>

+Calls like <code>SET_OPERAND_NAME(0, REG_RAX)</code> are used by actions to set the name of the operand (this particular one is used by the <code>operand0_rax</code> action) while calls like <code>SET_OPERAND_TYPE(0, OperandSize2bit)</code> are used by actions to set the type of the operand (this particular one is used by the <code>operand0_2bit</code> action). Note that we <b>don't</b> handle 2bit operands in the set of macros above. This is not a mistake: 2bit operands are only ever used as immediate operands (and then only in two instructions: <code>vpermil2pd</code> and <code>vpermil2ps</code>) and we don't process immediate operands here. If they are for some reason left in the <code>validator_x86_64_instruction.rl</code> file this will lead to a compile-time error, not to some kind of weird overflow which may [potentially] produce a security hole.</p>
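
+<p>To make the bit layout concrete, here is a small function-based sketch of the same packing. It is an illustration only: the helpers <code>pack_operand</code>, <code>check_operand</code> and <code>operand0_restricts_register</code> are hypothetical, and the numeric values of the operand types are assumed (the text only guarantees that <code>0</code> means <code>OperandSandboxIrrelevant</code>).</p>

```c
#include <stdint.h>

/* Assumed values: only "0 means irrelevant" is stated in the text. */
enum register_type {
  OperandSandboxIrrelevant = 0,
  OperandSandbox8bit = 1,
  OperandSandboxRestricted = 2,
  OperandSandboxUnrestricted = 3
};

/* A few names from the hardware register numbering. */
enum register_name { REG_RAX = 0, REG_RSP = 4, REG_RBP = 5, REG_R15 = 15 };

/* One byte per operand: bits 0-4 hold the name, bits 5-6 hold the type. */
static uint64_t pack_operand(uint64_t states, unsigned n,
                             unsigned name, unsigned type) {
  return states | ((uint64_t)name << (n << 3))
                | ((uint64_t)type << (5 + (n << 3)));
}

/* Function form of CHECK_OPERAND: compare the whole byte of operand n. */
static int check_operand(uint64_t states, unsigned n,
                         unsigned name, unsigned type) {
  return ((states >> (n << 3)) & 0xff) == (name | (type << 5));
}

/* Mask 0x70 covers the top bit of the name plus both type bits, so this is
 * true only for a real register (name <= 15) of the restricted type. */
static int operand0_restricts_register(uint64_t states) {
  return (states & 0x70) == ((uint64_t)OperandSandboxRestricted << 5);
}
```

+<p>For example, packing <code>REG_RAX</code> with the restricted type into operand 0 produces <code>0x40</code> under these assumed values, and only that byte of the variable is touched.</p>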

-<p>Almost all manipulations with <code>operand_states</code> are done using macroses described above, but there are one construct in <code>process_<i>N</i>_operands</code> function which accesses the <code>operand_states</code> direfctly:<hr />

+<p>Almost all manipulations with <code>operand_states</code> are done using the macros described above, but there is one construct in the <code>process_<i>N</i>_operands</code> function which accesses <code>operand_states</code> directly:<hr />

    <code>/* Take 2 bits of operand type from operand_states as *restricted_register,</code><br />

     <code>* make sure operand_states denotes a register (4th bit == 0). */</code><br />

    <code>} else if ((operand_states & 0x70) == (OperandSandboxRestricted << 5)) {</code><br />

      <code>*restricted_register = operand_states & 0x0f;</code><br />

-If you'll take a look on the layout of <code>operand_states</code> then it's pretty easy to understand what goes on here: <code>(operand_states & 0x70) == (OperandSandboxRestricted << 5)</code> yeilds <code>TRUE</code> if and only if zeroth operand is “normal” register <b>and</b> it's of type <code>OperandSandboxRestricted</code>. This is actually central piece of the “Secondary” DFA handling—most other pieces just return this “secondary” DFA back to <code>kNoRestrictedReg</code> state.</p>

+If you take a look at the layout of <code>operand_states</code> then it's pretty easy to understand what goes on here: <code>(operand_states & 0x70) == (OperandSandboxRestricted << 5)</code> yields <code>TRUE</code> if and only if the zeroth operand is a “normal” register <b>and</b> it's of type <code>OperandSandboxRestricted</code>. This is actually the central piece of the <code>restricted_register</code> handling: most other pieces just return it to the <code>NO_REG</code> state.</p>
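+<p>A minimal sketch of that check, assuming the layout implied by the masks in the snippet (bits 0–3 hold the register name, bit 4 is set when the operand is not a “normal” register, bits 5–6 hold the operand type). The numeric value of <code>OperandSandboxRestricted</code>, the <code>SKETCH_NO_REG</code> constant, and both helper functions are hypothetical and exist only for illustration; the real definitions live in the validator sources.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, for illustration only. */
enum { OperandSandboxRestricted = 1 }; /* assumed 2-bit operand type */
enum { SKETCH_NO_REG = -1 };           /* stand-in for "no restricted register" */

/* Hypothetical helper: packs one operand into the assumed layout
 * (type in bits 5-6, "not a normal register" in bit 4, name in bits 0-3). */
static uint8_t sketch_pack(unsigned type, unsigned not_register, unsigned name) {
  assert(type <= 3 && not_register <= 1 && name <= 15);
  return (uint8_t)((type << 5) | (not_register << 4) | name);
}

/* Mirrors the check quoted in the text: the mask 0x70 covers the two type
 * bits and the "not a normal register" bit, so the comparison succeeds only
 * for a normal register of type OperandSandboxRestricted. */
static int sketch_restricted_register(uint8_t operand_states) {
  if ((operand_states & 0x70) == (OperandSandboxRestricted << 5)) {
    return operand_states & 0x0f; /* low four bits: register name */
  }
  return SKETCH_NO_REG;
}
```

+<p>For example, <code>sketch_restricted_register(sketch_pack(OperandSandboxRestricted, 0, 6))</code> yields register number 6, while the same operand with bit 4 set yields <code>SKETCH_NO_REG</code>.</p>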

-<p>Well… most, but not all. One exception happens in <code>process_<i>N</i>_operands</code> functions: if “secondary” DFA is in <code>kSandboxedRsi</code> state and we restrict the <code>%rdi</code> register then we go to the <code>kSandboxedRsiRestrictedRdi</code> state, not to the usual <code>REG_RDI</code> state. Other exceptions are related to “special” instructions: <code>lea (%r15,%rsi,1),%rsi</code> may move us to <code>kSandboxedRsi</code> state and <code>lea (%r15,%rdi,1),%rdi</code> may move us to either <code>kSandboxedRdi</code> or <code>kSandboxedRsiSandboxedRdi</code> state.</p>

+<h3><div style="float:right"><a href="#TOC">▲</a></div><a name="6-4">6.4. Dynamic code modification support.</a></h3>

-<p>Yet another tricky piece of code can be found in <code>check_access</code> function. It's this piece of code:<hr />

-    <code>if (index == (restricted_register & 0x1f)) {</code><br />

-      <code>BitmapClearBit(valid_targets, instruction_start);</code><br />

-    <code>}</code><hr />

-This is where we use not the full state of the “secondary” DFA, but just low five bits (which describe if there are some restricted register and if it exist then what register is restricted currently). All other places just use full state of “secondary” DFA.</p>

+<p>Dynamic code modification support is implemented similarly to ia32 mode, with the help of the <code>CALL_USER_CALLBACK_ON_EACH_INSTRUCTION</code> option. When the callback is called it has all the information needed to process the instruction: collected errors, information about immediates, etc.</p>

-<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="6">6. Decoders.</a></h2>

+<p>All that information is squeezed into the <code>instruction_info_collected</code> variable. It has the following format:</p>

-<p>The only remaining issue (but a big one) is about generation of the actual decoders (<code>{decoder,validator}_x86_{32,64}_instruction.rl files)</code>. This is big part of the whole package, but, thankfully, it happens in significantly less hostily environment: decoder and validator must work even if they are processing specially-crafted file created by clever adversary while <code>gen_dfa.cc</code> processes data files created by us and should only correcly process certain “good” files.</p>

+<tr><td align="left"> </td><td align="left"> </td><td align="left"> </td><td align="left"> </td><td align="left"> </td><td align="left"> </td><td colspan="12" align="left" style="border: thin solid black;"><table width="100%"><tr><td align="left">⇤</td><td align="center"><code>VALIDATION_ERRORS_MASK</code></td><td align="right">⇥</td></table></td><td align="left"> </td><td colspan="2" align="left" width="1%" style="border: thin solid black;"><table width="100%"><tr><td align="left">⇤</td><td width="1%" align="center"><code>RESTRICTED_REGISTER_MASK</code></td><td align="right">⇥</td></table></td><td align="left"> </td><td colspan="2" align="left" width="1%" style="border: thin solid black;"><table width="100%"><tr><td align="left">⇤</td><td width="1%" align="center"><code>RESTRICTED_REGISTER_MASK</code></td><td align="right">⇥</td></table></td><td align="left"> </td><td colspan="2" align="left" width="1%" style="border: thin solid black;"><table width="100%"><tr><td align="left">⇤</td><td width="1%" align="center"><code>IMMEDIATES_SIZE_MASK</code></td><td align="right">⇥</td></table></td></tr>

+<tr><td style="border: thin solid black; background: gray;" width="1%" align="center"> 0 </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td width="1%" style="border: thin solid black;" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td colspan="2" style="border: thin solid black;" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td colspan="2" style="border: thin solid black;" align="center">   </td><td style="border: thin solid black;" width="1%" align="center">   </td><td colspan="2" style="border: thin solid black;" align="center">   </td><td>   </td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left"> </td><td align="left">┊</td><td align="left">┊</td><td align="left"> </td><td align="left" colspan="100" >└ <span title="enter, extrq, insertq">Instruction has two immediates</span>.</td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left"> </td><td align="left">┊</td><td align="left" colspan="100" >└ <span title="00 = 0 bytes, 01 = 1 byte, 10 = 2 bytes, 11 = 4 bytes">Instruction displacement size</span>.</td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100" >└ <span title="NO_REG if the instruction does not zero-extend any register">Register, zero-extended by the instruction.</span></td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100" >└ <span title="This means that start of this instruction is not a valid jump target.">Instruction is valid, but it accesses memory using a register which is zero-extended by the previous instruction.</span></td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100" >└ <span title="Note that all unsupported instructions trigger this error. This includes mov by absolute 64bit address, system instructions like lidt or even call and jmp used not as part of superinstruction. If combined with CPUID_UNSUPPORTED_INSTRUCTION it means that instruction is not yet enabled in validator.">DFA error: invalid instruction. Validation then resumes from the next bundle.</span></td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100" >└ <code>%r15b</code>, <code>%r15w</code>, <code>%r15d</code>, or <code>%r15</code> is modified. <code>%r15</code> is untouchable in amd64 mode.</td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100" >└ <span title="Note that %ebp is not mentioned. It can be modified by a regular instruction. But NEXT instruction must be special if that happened."><code>%bpl</code>, <code>%bp</code>, or <code>%rbp</code> is incorrectly modified. Only <code>%rbp</code> can be modified and then only by special instruction.</span></td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100" >└ <span title="Note that %esp is not mentioned. It can be modified by a regular instruction. But NEXT instruction must be special if that happened."><code>%spl</code>, <code>%sp</code>, or <code>%rsp</code> is incorrectly modified. Only <code>%rsp</code> can be modified and then only by special instruction.</span></td></tr>

+<tr><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left">┊</td><td align="left" colspan="100">└ <span title="amd64 mode: in ia32 mode all non-special instructions are modifiable">Instruction is modifiable.</span></td></tr>

+<tr><td align="left" colspan="100">└ Reserved.</td></tr>

+</table>

+<p>Using this information you can determine if the given instruction follows <span title="A lot of different commands in amd64 mode: %rbp/%rsp modifications, string instructions, “naclcall”, and “nacljmp”.">special rules</span>, if it includes <span title="Commands like “jcc”, “jmp”, “loopcc”, or “call”.">relative offsets</span>, <span title="Most commands which access memory support displacements.">displacements</span>, or <span title="Immediates are supported by many different commands. They can be combined with a displacement if the command accesses memory.">immediates</span>. Tests may use the information collected to precisely separate different <i title="Immediates, displacements, relative offsets.">anyfields</i>, but in production only a few bits are used to determine if the instruction can be changed or not: in amd64 mode <span title="Only “call” and “mov” can be changed.">most instructions cannot be changed</span>, and in those that can only the <i title="Immediates, displacements, relative offsets.">anyfields</i> may change.</p>
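+<p>As an illustration of how one field of <code>instruction_info_collected</code> could be unpacked, here is a small sketch that turns the two-bit displacement size code into a byte count (00 is 0 bytes, 01 is 1 byte, 10 is 2 bytes, 11 is 4 bytes, as noted in the table). The bit position used here (<code>SKETCH_DISPLACEMENT_SHIFT</code>) and the function name are assumptions made for the example and are not taken from the validator headers.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical position of the 2-bit displacement size field. */
#define SKETCH_DISPLACEMENT_SHIFT 2
#define SKETCH_DISPLACEMENT_MASK (0x3u << SKETCH_DISPLACEMENT_SHIFT)

/* Returns the displacement size in bytes encoded in `info`. */
static unsigned sketch_displacement_bytes(uint32_t info) {
  /* Size codes from the table: 00, 01, 10, 11. */
  static const unsigned kBytes[4] = {0, 1, 2, 4};
  unsigned code = (info & SKETCH_DISPLACEMENT_MASK) >> SKETCH_DISPLACEMENT_SHIFT;
  assert(code < sizeof(kBytes) / sizeof(kBytes[0]));
  return kBytes[code];
}
```

+<p>Bits outside the field are ignored, so the same helper can be applied to the whole word as passed to the callback once the real shift is substituted.</p>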

+<h4><div style="float:right"><a href="#TOC">▲</a></div><a name="6-4-1">6.4.1. Replacement validation.</a></h4>

+<p>As was said <a href="#6-4">above</a>, code replacement is not supported by the <code>ValidateChunkAMD64</code> function directly. Instead it's done by a higher-level function in <code>dfa_validate_64.c</code>.</p>

+<p>It uses the <code>CALL_USER_CALLBACK_ON_EACH_INSTRUCTION</code> option to compare the lengths of instructions in the two fragments in a callback, and the <code>MODIFIABLE_INSTRUCTION</code> flag passed to the callback to make sure that <span title="Currently only “call” and “mov” can be changed.">only a few hand-picked instructions can be changed</span>.</p>

+<p>One tricky thing there is the handling of relative jumps and calls: if a relative jump (or call) triggers <code>DIRECT_JUMP_OUT_OF_RANGE</code> <b>but</b> is bit-for-bit identical to the original instruction it's accepted anyway: this means that this particular <code>jump</code> (or <code>call</code>) jumps to (or calls) some valid position outside of the given range. If it must be changed then you need to pass a bigger region to the <code>ValidatorCodeReplacement_x86_64</code> function; <span title="This is, of course, not needed if landing point is bundle-aligned.">this way the validator will have a chance to check the landing place for validity</span>.</p>
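+<p>The per-instruction rules described in this section can be sketched roughly as follows. This is not the code from <code>dfa_validate_64.c</code>: the flag values, the function name, and its signature are hypothetical, and all other validation errors are deliberately left out to keep the example short.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values, for illustration only. */
#define SKETCH_DIRECT_JUMP_OUT_OF_RANGE 0x1u
#define SKETCH_MODIFIABLE_INSTRUCTION   0x2u

/* Returns 1 if the new instruction may replace the old one, 0 otherwise.
 * `info` stands for the flags the validator reported for the new instruction. */
static int sketch_replacement_ok(const uint8_t *old_insn, size_t old_len,
                                 const uint8_t *new_insn, size_t new_len,
                                 uint32_t info) {
  int identical;
  assert(old_insn != NULL && new_insn != NULL);
  /* Instruction boundaries must not move. */
  if (old_len != new_len) return 0;
  identical = memcmp(old_insn, new_insn, new_len) == 0;
  /* An out-of-range jump or call is tolerated only when it is unchanged. */
  if (info & SKETCH_DIRECT_JUMP_OUT_OF_RANGE) return identical;
  /* Unchanged instructions are always fine. */
  if (identical) return 1;
  /* Changed instructions must be marked as modifiable. */
  return (info & SKETCH_MODIFIABLE_INSTRUCTION) != 0;
}
```

+<p>In this sketch a changed jump that still points outside the region is rejected, which matches the advice to pass a bigger region when such a jump has to be modified.</p>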

+<p style="margin-bottom:0px;">Another tricky bit is related to detecting the position of <i title="Immediates, displacements, relative offsets.">anyfields</i>: most instructions put them at the end, but some instructions use the last byte for:</p>

+<ul style="margin-top:0px; margin-bottom:0px;">

+<li><i>opcode extension</i>: 3DNow! instructions, <code>cmp<i>cc</i>sd</code>/<code>vcmp<i>cc</i>sd</code> and <code>cmp<i>cc</i>ss</code>/<code>vcmp<i>cc</i>ss</code>, and <code>pclmulqdq</code>/<code>vpclmulqdq</code>.</li>

+<li><i>fourth register operand</i>: some AVX instructions (such as <code>vblendvpd</code>/<code>vblendvps</code>), some FMA4 instructions (such as <code>vfmaddsubpd</code>), and some XOP instructions (such as <code>vpperm</code>).</li>

+<li><i>fourth register operand</i> <b>and</b> <i>fifth 2-bit immediate operand</i>: <code>vpermil2pd</code>/<code>vpermil2ps</code>.</li>

+</ul>

+<p style="margin-top:0px;">All these instructions set the <code>LAST_BYTE_IS_NOT_IMMEDIATE</code> flag; the last form can be distinguished because it sets the <span title="Which actually includes the LAST_BYTE_IS_NOT_IMMEDIATE flag."><code>IMMEDIATE_2BIT</code> flag</span>.</p>

+<h4><div style="float:right"><a href="#TOC">▲</a></div><a name="6-4-2">6.4.2. Replacement copying.</a></h4>

+<p>This is done by a very simple function which uses the <code>CALL_USER_CALLBACK_ON_EACH_INSTRUCTION</code> option to process instructions one after another.</p>

+<h2><div style="float:right"><a href="#TOC">▲</a></div><a name="7">7. Decoders.</a></h2>

+<p>The only remaining issue (but a big one) is the generation of the actual decoders (<code>{decoder,validator}_x86_{32,64}_instruction.rl</code> files). This is a big part of the whole package but, thankfully, it happens in a significantly less hostile environment: the decoder and validator must work even if they are processing a specially-crafted file created by a clever adversary, while <code>gen_dfa</code> processes data files created by us and only has to correctly process certain “good” files.</p>

+<p>To understand how it works it's better to start with the decoders. Remember how we've talked about “streamlined data structures”, “indispensable minimum of the information”, etc.? This approach produces a fast and [relatively] simple validator, but it makes it hard to test and debug. To facilitate testing and debugging we create separate decoders: these return all the information about all the instructions they can parse and in fact can produce output identical to <a href="http://sourceware.org/binutils/docs/binutils/objdump.html#objdump">objdump</a>'s output.</p>

+<p>They are used to verify the descriptions of the instructions from <code>.def</code> files, with special attention to the lengths of said instructions.</p>

+<p>Decoders are created using a familiar process.</p>

+<center><img src="filesdecoder.svg" height="120%"/><br />Gray elements are hand-written, white elements are generated and dark-gray are mixers.</center><br />

+<p></p>

+<p style="margin-bottom:0px;">There are a few big differences between the standalone decoders and the simplified decoders embedded in <code>ValidateChunkIA32</code>/<code>ValidateChunkAMD64</code>:</p>

+<ul style="margin-top:0px;">

+<li>Standalone decoders are pretty close to each other (the only differences are CPU-dictated differences such as REX prefix handling)—simplified decoders are quite different (as dictated by appropriate SFI models).</li>

+<li>Standalone decoders don't have hand-encoded “special” instructions, all the instructions they can decode come from <code>.def</code> files.</li>

+<li>Standalone decoders don't squeeze extracted information into a few flat variables. Instead they use <code>struct instruction</code>, which is common to both decoders.</li>

+</ul>

+<p>All these facts mean that standalone decoders are significantly larger and slower, but also much easier to understand. And simplified decoders use <b>the exact same DFA</b> with only some actions changed or omitted.</p>

</body>