Chromium Code Reviews

Side by Side Diff: src/trusted/validator_ragel/compress_regular_instructions.py

Issue 49183002: Regular instructions golden file test. Base URL: svn://svn.chromium.org/native_client/trunk/src/native_client/
Patch Set: Created 7 years, 1 month ago
1 # Copyright (c) 2013 The Native Client Authors. All rights reserved.
2 # Use of this source code is governed by a BSD-style license that can be
3 # found in the LICENSE file.
4
5 """
6 Traverse the validator's DFA, collect all "normal" instructions and then
7 compress output. Note: "anybyte fields" (immediates and displacements)
8 are always filled with zeros. Otherwise processing of sextillions (sic!)
9 of possibilities will take too long.
10
11 The following compression rules are present:
halyavin 2013/11/06 12:11:51 Rules are applied only when all variants are accep
khim 2013/11/06 21:28:28 Done.
12
13 1. Compress ModR/M (+SIB & displacement).
14 Instruction: 00 00 add %al,(%rax)
15 ...
16 Instruction: 00 ff add %bh,%bh
17 becomes
18 Instruction: 00 XX add [%al..%bh],[%al..%bh or memory]
19
20 Only applies if all possibilities are accepted by validator.
21
22 1a. Compress ModR/M (+SIB & displacement) memory-only.
23 Instruction: f0 01 00 lock add %eax,(%eax)
24 ...
25 Instruction: f0 01 bf 00 00 00 00 lock add %edi,0x0(%edi)
26 becomes
27 Instruction: f0 01 XX lock add [%eax..%edi],[memory]
28
29 Only applies if all possible memory accesses are accepted by validator.
30
31 1b. Compress ModR/M (+SIB & displacement) register only
halyavin 2013/11/06 12:11:51 There is no SIB & displacement for registers.
khim 2013/11/06 21:28:28 Done.
32 Instruction: 66 0f 50 c0 movmskpd %xmm0,%eax
33 ...
34 Instruction: 66 0f 50 ff movmskpd %xmm7,%edi
35 becomes
36 Instruction: 66 0f 50 XX movmskpd [%xmm0..%xmm7],[%eax..%edi]
37
38 Only applies if all possible register accesses are accepted by validator.
39
40 2. Compress ModR/M (+SIB & displacement) with opcode extension.
41 Instruction: 0f 90 00 seto (%eax)
42 ...
43 Instruction: 0f 90 c7 seto %bh
44 becomes
45 Instruction: 0f 90 XX/0 seto [%al..%bh or memory]
46
47 Only applies if all possibilities are accepted by validator.
48
49 2a. Compress ModR/M (+SIB & displacement) memory-only with opcode extension.
50 Instruction: f0 ff 00 lock incl (%eax)
51 ...
52 Instruction: f0 ff 84 ff 00 00 00 00 lock incl 0x0(%edi,%edi,8)
53 becomes
54 Instruction: f0 ff XX/0 lock incl [memory]
55
56 Only applies if all possible memory accesses are accepted by validator.
57
58 2b. Compress ModR/M (+SIB & displacement) register-only with opcode extension.
halyavin 2013/11/06 12:11:51 There is no SIB & displacement for registers.
khim 2013/11/06 21:28:28 Done.
59 Instruction: 0f 71 d0 00 psrlw $0x0,%mm0
60 ...
61 Instruction: 0f 71 d7 00 psrlw $0x0,%mm7
62 becomes
63 Instruction: 0f 71 XX/2 00 psrlw $0x0,[%mm0..%mm7]
64
65 Only applies if all possible register accesses are accepted by validator.
66
67 3. Compress register-in-opcode.
68 Instruction: d9 c0 fld %st(0)
69 ...
70 Instruction: d9 c7 fld %st(7)
71 becomes
72 Instruction: d9 c[0..7] fld [%st(0)..%st(7)]
73
74 Only applies if all possible register accesses are accepted by validator.
75
76 4. Special compressor for "set" instruction.
77 Instruction: 0f 90 XX/0 seto [%al..%bh or memory]
78 ...
79 Instruction: 0f 90 XX/7 seto [%al..%bh or memory]
80 becomes
81 Instruction: 0f 90 XX seto [%al..%bh or memory]
82 """
83
84 import itertools
85 import multiprocessing
86 import optparse
87 import os
88 import re
89 import subprocess
90 import sys
91 import tempfile
92 import traceback
93
94 import dfa_parser
95 import dfa_traversal
96 import validator
97
98
99 # Register names in 'natural' order (as defined by IA32/x86-64 ABI)
100 #
101 # X86-64 ABI splits all registers in groups of 8 because it uses 3-bit field
102 # in opcode, ModR/M, and/or SIB bytes to encode them.
103 #
104 # In most cases there are 16 registers of a given kind and two such groups,
105 # but there are a couple of exceptions:
106 # 1. There are 20 8-bit registers and three groups (two of them overlap)
107 # 2. There are eight X87 and MMX registers thus two groups are identical
108 #
109 # We use typical register from a group to name the whole group. Most groups
110 # use first register, but 'spl' group uses fifth register because its first
111 # four registers are the same as 'al' group.
112 REGISTERS = {
113 'al': [ 'al', 'cl', 'dl', 'bl', 'ah', 'ch', 'dh', 'bh' ],
114 'spl': [ 'al', 'cl', 'dl', 'bl', 'spl', 'bpl', 'sil', 'dil' ],
115 'ax': [ 'ax', 'cx', 'dx', 'bx', 'sp', 'bp', 'si', 'di' ],
116 'eax': [ 'eax', 'ecx', 'edx', 'ebx', 'esp', 'ebp', 'esi', 'edi' ],
117 'rax': [ 'rax', 'rcx', 'rdx', 'rbx', 'rsp', 'rbp', 'rsi', 'rdi' ],
118 'r8b': [ 'r{}b'.format(N) for N in range(8,16) ],
119 'r8w': [ 'r{}w'.format(N) for N in range(8,16) ],
120 'r8d': [ 'r{}d'.format(N) for N in range(8,16) ],
121 'r8': [ 'r{}'.format(N) for N in range(8,16) ],
122 'mm0': [ 'mm{}'.format(N) for N in range(8) ],
123 'mm8': [ 'mm{}'.format(N) for N in range(8) ],
halyavin 2013/11/06 12:11:51 'mm_alt'
khim 2013/11/06 21:28:28 Done.
124 'st(0)': [ 'st({})'.format(N) for N in range(8) ],
125 'st(8)': [ 'st({})'.format(N) for N in range(8) ],
halyavin 2013/11/06 12:11:51 'st_alt(0)'
khim 2013/11/06 21:28:28 Removed instead. If we no longer have predictable
126 'xmm0': [ 'xmm{}'.format(N) for N in range(8) ],
127 'xmm8': [ 'xmm{}'.format(N) for N in range(8,16) ],
128 'ymm0': [ 'ymm{}'.format(N) for N in range(8) ],
129 'ymm8': [ 'ymm{}'.format(N) for N in range(8,16) ]
130 }
131
132
133 NOP = 0x90
134
135
136 def PadToBundleSize(bytes):
137 assert len(bytes) <= validator.BUNDLE_SIZE
138 return bytes + [NOP] * (validator.BUNDLE_SIZE - len(bytes))
139
140
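A minimal sketch of the padding step above, written for Python 3. The bundle size of 32 is an assumption made for illustration; the script itself reads the real value from `validator.BUNDLE_SIZE`, and `pad_to_bundle_size` is a hypothetical stand-in for `PadToBundleSize`.

```python
# Stand-in for validator.BUNDLE_SIZE; 32 is an assumption made for this sketch.
BUNDLE_SIZE = 32
NOP = 0x90


def pad_to_bundle_size(instruction_bytes):
    """Return the instruction followed by NOPs up to one full bundle."""
    assert len(instruction_bytes) <= BUNDLE_SIZE
    return instruction_bytes + [NOP] * (BUNDLE_SIZE - len(instruction_bytes))


# 00 c0 encodes "add %al,%al"; everything after it is filled with NOPs.
padded = pad_to_bundle_size([0x00, 0xc0])
```

Padding with NOPs lets a single instruction be handed to the validator as a complete bundle without the filler affecting the verdict.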
141 # In x86-64 mode we have so-called 'restricted register' which is used to
142 # tie two groups together. Some instruction require particular value to
halyavin 2013/11/06 12:11:51 Some instructions
khim 2013/11/06 21:28:28 Done.
143 # be stored in this variable, while some accept any non-special restricted
144 # register (%ebp and %esp are special because they can only be accepted by
145 # a few 'special' instructions).
146 #
147 # You can find more details in the "NaCl SFI model on x86-64 systems" manual.
148 #
149 # We try to feed all possible 'restricted register' into validator and then
halyavin 2013/11/06 12:11:51 restricted registers
khim 2013/11/06 21:28:28 Done.
150 # classify the instruction using this map. If set of acceptable 'restricted
151 # registers' is not here, then it's an error in validator.
152 ACCEPTABLE_X86_64_INPUTS = {
153 0x00001: 'input_rr=%eax',
154 0x00002: 'input_rr=%ecx',
155 0x00004: 'input_rr=%edx',
156 0x00008: 'input_rr=%ebx',
157 0x00010: 'input_rr=%esp',
158 0x00020: 'input_rr=%ebp',
159 0x00040: 'input_rr=%esi',
160 0x00080: 'input_rr=%edi',
161 0x00100: 'input_rr=%r8d',
162 0x00200: 'input_rr=%r9d',
163 0x00400: 'input_rr=%r10d',
164 0x00800: 'input_rr=%r11d',
165 0x01000: 'input_rr=%r12d',
166 0x02000: 'input_rr=%r13d',
167 0x04000: 'input_rr=%r14d',
168 0x08000: 'input_rr=%r15d',
169 0x1ffcf: 'input_rr=any_nonspecial'
170 }
171
172 # Any instruction must produce either None or one of fifteen registers as an
173 # output 'restricted register' value. 'r15d' is NOT acceptable as an output.
174 ACCEPTABLE_X86_64_OUTPUT_REGISTERS = tuple(
175 '%' + reg for reg in (REGISTERS['eax'] + REGISTERS['r8d'])[0:-1])
176
177
178 def ValidateInstruction(instruction, validator_inst):
179 bundle = ''.join(map(chr, PadToBundleSize(instruction)))
180 if options.bitness == 32:
181 result = validator_inst.ValidateChunk(bundle, bitness=32)
182 return result
183 else:
184 valid_inputs = 0
185 known_final_rr = None
186 output_rr = None
187 bit_position = 1
188 # Note that iteration order is aligned with CCEPTABLE_X86_64_INPUTS array
halyavin 2013/11/06 14:23:24 ACCEPTABLE_X86_64_INPUTS
khim 2013/11/06 21:28:28 Done.
189 # above.
190 for initial_rr in validator.ALL_REGISTERS + [None]:
191 valid, final_rr = validator_inst.ValidateAndGetFinalRestrictedRegister(
192 bundle, len(instruction), initial_rr)
193 if valid:
194 # final_rr should not depent on input_rr
halyavin 2013/11/06 14:23:24 depend
khim 2013/11/06 21:28:28 Done.
195 assert valid_inputs == 0 or known_final_rr == final_rr
196 valid_inputs |= bit_position
197 known_final_rr = final_rr
198 bit_position += bit_position
199 # If nothing is accepted then instruction is not valid. Easy and simple.
200 if valid_inputs == 0: return False
201 # If returned value is unacceptable we'll get IndexError here and this
202 # test will fail
203 if known_final_rr is not None:
204 output_rr = ACCEPTABLE_X86_64_OUTPUT_REGISTERS[known_final_rr]
205 # If collected valid_inputs are unacceptable we'll get KeyError here and
206 # this test will fail
207 return [ACCEPTABLE_X86_64_INPUTS[valid_inputs],
208 'output_rr={}'.format(output_rr)]
209
210
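The bitmask bookkeeping in ValidateInstruction can be sketched in isolation as follows (Python 3). The register list is an assumption that mirrors the order of ACCEPTABLE_X86_64_INPUTS; `collect_valid_inputs` and the `is_accepted` callback are hypothetical names that stand in for the loop over `validator.ALL_REGISTERS + [None]` and the call into the validator.

```python
# Register order is an assumption mirroring ACCEPTABLE_X86_64_INPUTS;
# the final None stands for "no restricted register".
INITIAL_RRS = ['%eax', '%ecx', '%edx', '%ebx', '%esp', '%ebp', '%esi', '%edi',
               '%r8d', '%r9d', '%r10d', '%r11d', '%r12d', '%r13d', '%r14d',
               '%r15d', None]


def collect_valid_inputs(is_accepted):
    """Build a bitmask with one bit per accepted initial restricted register."""
    valid_inputs = 0
    bit_position = 1
    for initial_rr in INITIAL_RRS:
        if is_accepted(initial_rr):
            valid_inputs |= bit_position
        bit_position += bit_position  # doubling moves on to the next bit
    return valid_inputs


# Hypothetical instruction that accepts anything except %esp and %ebp.
any_nonspecial = collect_valid_inputs(lambda rr: rr not in ('%esp', '%ebp'))
# Hypothetical instruction that requires %ecx as the restricted register.
only_ecx = collect_valid_inputs(lambda rr: rr == '%ecx')
```

Under these assumptions the first mask equals 0x1ffcf (all seventeen bits minus the two for %esp and %ebp), which is the 'any_nonspecial' key of the map, and the second equals 0x00002.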
211 class WorkerState(object):
212 def __init__(self, prefix, validator):
213 self.total_instructions = 0
214 self.num_valid = 0
215 self.validator = validator
216 self.output = set()
217
218 def ReceiveInstruction(self, bytes):
219 self.total_instructions += 1
220 result = ValidateInstruction(bytes, self.validator)
221 if result is not False:
222 self.num_valid += 1
223 dis = self.validator.DisassembleChunk(
224 ''.join(map(chr, bytes)),
225 bitness=options.bitness)
226 for line_nr in xrange(len(dis)):
227 dis[line_nr] = str(dis[line_nr])
228 assert dis[line_nr][0:17] == 'Instruction(0x' + str(line_nr) + ': '
229 assert dis[line_nr][-1:] == ')'
230 dis[line_nr] = dis[line_nr][17:-1]
231 # If %rip is involved then comment will be different depending on the
232 # instruction length. Eliminate it.
233 if '(%rip)' in dis[0]:
234 dis[0] = re.sub(' # 0x[ 0-9a-fA-F]*', '', dis[0])
halyavin 2013/11/06 14:23:24 0x[ ]*[0-9a-fA-F]*
khim 2013/11/06 21:28:28 Done.
235 # Absolute addressing is allowed in 32-bit mode and completely forbidden
halyavin 2013/11/06 14:23:24 Zero displacements are represented as 0x0 for all
khim 2013/11/06 21:28:28 Done.
236 # in 64-bit mode, but all instructions except jumps use truly absolute
237 # addressing with base address == 0x0, jump instructions use offset from
238 # the instruction pointer (%eip or %rip). And position of the instruction
239 # pointer depends on the instruction length. Eliminate it.
240 if ' 0x' in dis[0] and ' 0x0' not in dis[0]:
241 for bytes in xrange(1, 16):
242 dis[0] = re.sub(
243 '(' + '[0-9a-fA-F][0-9a-fA-F] ' * bytes + ' .* )'
244 '0x' + str(bytes) + '(.*)',
245 '\\1%eip\\2' if options.bitness == 32 else '\\1%rip\\2',
246 dis[0])
247 dis[0] = 'Instruction: ' + dis[0]
248 if result is not True:
249 dis += result
250 self.output.add('; '.join(dis))
251
252
253 # Compressor has three slots: regex (which picks apart given instruction),
254 # subst (which is used to denote compressed version) and replacements (which
255 # are used to generate set of instructions from a given code).
256 #
257 # Example compressor:
258 # regex = '.*?[0-9a-fA-F]([0-7]) \\w* (%e(?:[abcd]x|[sb]p|[sd]i)).*()'
259 # subst = ('[0-7]', '[%eax..%edi]', ' # register in opcode')
260 # replacements = ((0, '%eax'), (1, '%ecx'), (2, '%edx'), (3, '%ebx'),
261 # (4, '%esp'), (5, '%ebp'), (6, '%esi'), (7, '%edi'))
262 #
263 # When faced with instruction '40 inc %eax' it will capture the following
264 # pieces of said instruction: '4[0] inc [%eax]'.
265 #
266 # Then it will produce the following eight instructions:
267 # '40 inc %eax'
268 # '41 inc %ecx'
269 # '42 inc %edx'
270 # '43 inc %ebx'
271 # '44 inc %esp'
272 # '45 inc %ebp'
273 # '46 inc %esi'
274 # '47 inc %edi'
275 #
276 # If all these instructions can be found in a set of instructions then
277 # compressor will remove them from said set and will insert one replacement
278 # "compressed instruction" '4[0-7] inc [%eax..%edi] # register in opcode'.
279 #
280 # Note that last group is only used in the replacement. It's used to grab marks
281 # added by previous compressors and to replace them with a new mark.
282 class Compressor(object):
283 __slots__ = [
284 'regex',
285 'subst',
286 'replacements'
287 ]
288
289 def __init__(self, regex, subst, replacements=None):
290 self.regex = regex
291 self.subst = subst
292 self.replacements = [] if replacements is None else replacements
293
294
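The example compressor described in the comment above can be exercised on its own. This is a minimal Python 3 sketch, not the script's actual driver loop: `compress_once` is a hypothetical helper, and the regex, subst and replacements are copied from the comment for the 32-bit "register in opcode" case.

```python
import re

# Values taken from the example in the comment; names are hypothetical.
REGEX = re.compile(r'.*?[0-9a-fA-F]([0-7]) \w* (%e(?:[abcd]x|[sb]p|[sd]i)).*()')
SUBST = ('[0-7]', '[%eax..%edi]', ' # register in opcode')
REPLACEMENTS = ((0, '%eax'), (1, '%ecx'), (2, '%edx'), (3, '%ebx'),
                (4, '%esp'), (5, '%ebp'), (6, '%esi'), (7, '%edi'))


def compress_once(instructions, instruction):
    """Try to fold the family that 'instruction' belongs to; mutate the set."""
    match = REGEX.match(instruction)
    if not match:
        return False
    # Build a template with '{}' in place of every captured group but the last.
    last = len(match.groups())
    pos = 0
    format_str = ''
    for group in range(1, last):
        format_str += instruction[pos:match.start(group)] + '{}'
        pos = match.end(group)
    format_str += instruction[pos:match.start(last)]
    # Expand the template and require that every variant is present.
    expanded = set(format_str.format(*r) for r in REPLACEMENTS)
    if not expanded <= instructions:
        return False
    instructions.difference_update(expanded)
    # The extra '{}' receives the mark carried by the last subst element.
    instructions.add((format_str + '{}').format(*SUBST))
    return True


full = set('4{} inc {}'.format(*r) for r in REPLACEMENTS) | {'90 nop'}
full_ok = compress_once(full, '40 inc %eax')
partial = {'40 inc %eax', '41 inc %ecx'}
partial_ok = compress_once(partial, '40 inc %eax')
```

With all eight variants present the set collapses to one compressed line plus the unrelated `90 nop`; with only two variants present nothing changes, which is the "only applies if all possibilities are accepted" condition from the module docstring.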
295 def Compressed(instructions, show_progress=None):
halyavin 2013/11/06 14:23:24 We better to create method that collects trace.
khim 2013/11/06 21:28:28 Done.
296 split = None
halyavin 2013/11/06 14:23:24 Use empty string for initial value.
khim 2013/11/06 21:28:28 Not sure what's wrong with None (we don't yet HAVE
297 rule = True
298 if not show_progress:
299 trace = []
300 while rule is not False:
301 def CompressOneInstruction(instructions, split):
302 sorted_instructions = (sorted(i for i in instructions if i > split) +
303 sorted(i for i in instructions if i < split))
304 for instruction in sorted_instructions:
305 for compressor_nr, compressor in enumerate(compressors):
306 match = compressor.regex.match(instruction)
307 if match:
308 pos = 0
309 format_str = ''
310 for group in range(1, len(match.groups())):
311 format_str += instruction[pos:match.start(group)] + '{}'
312 pos = match.end(group)
313 format_str += instruction[pos:match.start(len(match.groups()))]
halyavin 2013/11/06 14:23:24 Extract these 5 lines to a function.
khim 2013/11/06 21:28:28 Done.
314 subset = set()
315 for replacement in compressor.replacements:
316 replacement_str = format_str.format(*replacement)
317 if not replacement_str in instructions: break
318 subset.add(replacement_str)
319 else:
320 instructions -= subset
321 instructions.add((format_str + '{}').format(*compressor.subst))
322 return (instructions, compressor_nr, instruction)
323 return (instructions, False, False)
324 instructions, rule, split = CompressOneInstruction(instructions, split)
325 if rule is not False:
326 if show_progress:
327 show_progress(rule, split)
328 else:
329 trace.append((rule, split))
330 if show_progress:
331 return instructions
332 else:
333 return instructions, trace
334
335
336 def Worker((prefix, state_index)):
337 worker_state = WorkerState(prefix, worker_validator)
338
339 try:
340 dfa_traversal.TraverseTree(
341 dfa.states[state_index],
342 final_callback=worker_state.ReceiveInstruction,
343 prefix=prefix,
344 anyfield=0)
345 if (prefix[0] != 0x0f or prefix[1] != 0x0f): # Skip 3DNow! instructions
346 output, trace = Compressed(set(worker_state.output))
347 else:
348 output = worker_state.output
349 trace = []
350 except Exception as e:
351 traceback.print_exc() # because multiprocessing imap swallows traceback
352 raise
353
354 return (
355 prefix,
356 worker_state.total_instructions,
357 worker_state.num_valid,
358 output,
359 trace)
360
361
362 def ParseOptions():
363 parser = optparse.OptionParser(usage='%prog [options] xmlfile')
364
365 parser.add_option('--bitness',
366 type=int,
367 help='The subarchitecture: 32 or 64')
368 parser.add_option('--validator_dll',
369 help='Path to librdfa_validator_dll')
370 parser.add_option('--decoder_dll',
371 help='Path to librdfa_decoder_dll')
372
373 options, args = parser.parse_args()
374
375 if options.bitness not in [32, 64]:
376 parser.error('specify --bitness 32 or --bitness 64')
377
378 if len(args) != 1:
379 parser.error('specify one xml file')
380
381 (xml_file, ) = args
382
383 return options, xml_file
384
385
386 # Version suitable for use in regular expressions
387 REGISTERS_RE = REGISTERS.copy()
388 REGISTERS_RE['st(0)'] = [ 'st\\({}\\)'.format(N) for N in range(8) ]
389 REGISTERS_RE['st(8)'] = REGISTERS_RE['st(0)']
halyavin 2013/11/06 14:23:24 st(8)->st_alt
khim 2013/11/06 21:28:28 Removed
390 REGISTERS_RE['st\\(0\\)'] = REGISTERS_RE['st(0)']
391 REGISTERS_RE['st\\(8\\)'] = REGISTERS_RE['st(0)']
392
393 # Index names in 'natural' order (as defined by IA32/x86-64 ABI)
394 INDEXES = {
395 'eax': [ 'eax', 'ecx', 'edx', 'ebx', 'eiz', 'ebp', 'esi', 'edi' ],
396 'rax': [ 'rax', 'rcx', 'rdx', 'rbx', 'riz', 'rbp', 'rsi', 'rdi' ],
397 'r8': [ 'r8', 'r9', 'r10', 'r11', 'r12', 'r13', 'r14', 'r15' ]
398 }
399 # Registers (in all their incarnations) which can be used as base in 64-bit mode
400 X86_64_BASE_REGISTERS = set([
401 '%spl', '%bpl', '%r15b', '%sp', '%bp', '%r15w',
402 '%esp', '%ebp', '%r15d', '%rsp', '%rbp', '%r15',
403 '%rip'
404 ])
405
406 def AddModRM_Compressor(regex, subst, subst_register, subst_memory,
407 reg=None, rm=None, rm_to_reg=False, start_byte=0,
408 index_r8=False, input_rr=True, output_rr=False):
409 """Adds three compressors to the list of compressors:
410 main_compressors (register <-> register or memory instructions)
411 register_compressors (register <-> register instructions)
412 memory_compressors (register <-> memory instructions)
413
414 Args:
415 regex: regular expressions for the compressor
416 subst: replacement for register <-> register or memory instructions
417 subst_register: replacement for register <-> register instructions
418 subst_memory: replacement for register <-> memory instructions
419 reg: reg operand kind (see REGISTERS array) or None
420 rm: rm operand kind (see REGISTERS array)
421 rm_to_reg: three-state selector
422 True - instruction uses rm as source, reg as destination
423 False - instruction uses reg as source, rm as destination
424 None - instruction uses both symmetrically (e.g. test or xchg)
425 start_byte: first valid ModR/M byte (used when reg is None)
426 input_rr: True if instruction accesses memory
427 output_rr: three-state selector
428 True - instruction can be used to produce "restricted register"
429 False - instruction does not affect its operands (e.g. test)
430 None - instruction can damage output but can not be used to restrict it
431 Internal:
432 index_r8: must be called in False position (used to create two compressors
433 in 64-bit mode with index == %rax..%rdi or index == %r8..%r14)
434 Returns:
435 None
436 """
437
438 if options.bitness == 32:
439 base = 'eax'
440 index = 'eax'
441 expanded_regex = re.sub('{RR_NOTES}', '', regex)
442 else:
443 base = 'r8' if rm[0:2] == 'r8' or rm[-1] == '8' else 'rax'
444 index = 'r8' if index_r8 else 'rax'
445 input = 'r8d' if index_r8 else 'eax'
446 if output_rr:
447 output_regs = reg if rm_to_reg else rm
448 assert output_regs in ('eax', 'r8d')
449 expanded_regex = re.sub('{RR_NOTES}', '; input_rr=((?:%{'+ input +
450 '}|any_nonspecial)); output_rr=(%{' + output_regs + '}|None)', regex)
451 else:
452 expanded_regex = re.sub('{RR_NOTES}', '; input_rr=((?:%{' + input +
453 '}|any_nonspecial)); output_rr=(None)', regex)
454 if 'RM_BYTE' in regex:
455 address_regex = '(?:0x0|(?:0x0)?\\((?:%{' + base + '})?\\))'
456 else:
457 address_regex = (
458 '(?:0x0|(?:0x0)?\\((?:%{' + base + '})?(?:,(?:%{' + index + '}))?'
459 '(?:,(?:1|2|4|8))?\\))')
460
461 # We need to process either modrm or reg
462 assert rm is not None or reg is not None
463 # start_byte is only meaningful when reg is None (opcode extension case)
464 assert reg is None or start_byte == 0
465 # Replace RM_BYTE placeholders.
466 # Handle only cases without displacement.
467 expanded_regex = re.sub('{RM_BYTE}', '[0-9a-fA-F][0-9a-fA-F]', expanded_regex)
468 expanded_regex = re.sub('{RM_BYTE/0}', '[048cC][0-7]', expanded_regex)
469 expanded_regex = re.sub('{RM_BYTE/1}', '[048cC][89a-fA-F]', expanded_regex)
470 expanded_regex = re.sub('{RM_BYTE/2}', '[159dD][0-7]', expanded_regex)
471 expanded_regex = re.sub('{RM_BYTE/3}', '[159dD][89a-fA-F]', expanded_regex)
472 expanded_regex = re.sub('{RM_BYTE/4}', '[26aAeE][0-7]', expanded_regex)
473 expanded_regex = re.sub('{RM_BYTE/5}', '[26aAeE][89a-fA-F]', expanded_regex)
474 expanded_regex = re.sub('{RM_BYTE/6}', '[37bBfF][0-7]', expanded_regex)
475 expanded_regex = re.sub('{RM_BYTE/7}', '[37bBfF][89a-fA-F]', expanded_regex)
476 register_regex = expanded_regex
477 # Replace RM_SIB_BYTES placeholders.
478 # Handle only cases without displacement.
479 expanded_regex = re.sub(
480 '{RM_SIB_BYTES}', '[0-b][4c] [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
481 expanded_regex = re.sub(
482 '{RM_SIB_BYTES/0}', '[048]4 [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
483 expanded_regex = re.sub(
484 '{RM_SIB_BYTES/1}', '[048][cC] [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
485 expanded_regex = re.sub(
486 '{RM_SIB_BYTES/2}', '[159]4 [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
487 expanded_regex = re.sub(
488 '{RM_SIB_BYTES/3}', '[159][cC] [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
489 expanded_regex = re.sub(
490 '{RM_SIB_BYTES/4}', '[26aA]4 [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
491 expanded_regex = re.sub(
492 '{RM_SIB_BYTES/5}', '[26aA][cC] [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
493 expanded_regex = re.sub(
494 '{RM_SIB_BYTES/6}', '[37bB]4 [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
495 expanded_regex = re.sub(
496 '{RM_SIB_BYTES/7}', '[37bB][cC] [0-9a-fA-F][0-9a-fA-F]', expanded_regex)
497 register_regex = re.sub(
498 '{RM_SIB_BYTES}', '[c-fC-F][0-9a-fA-F]', register_regex)
499 register_regex = re.sub('{RM_SIB_BYTES/0}', '[cC][0-7]', register_regex)
500 register_regex = re.sub('{RM_SIB_BYTES/1}', '[cC][8-9a-fA-F]', register_regex)
501 register_regex = re.sub('{RM_SIB_BYTES/2}', '[dD][0-7]', register_regex)
502 register_regex = re.sub('{RM_SIB_BYTES/3}', '[dD][8-9a-fA-F]', register_regex)
503 register_regex = re.sub('{RM_SIB_BYTES/4}', '[eE][0-7]', register_regex)
504 register_regex = re.sub('{RM_SIB_BYTES/5}', '[eE][8-9a-fA-F]', register_regex)
505 register_regex = re.sub('{RM_SIB_BYTES/6}', '[fF][0-7]', register_regex)
506 register_regex = re.sub('{RM_SIB_BYTES/7}', '[fF][8-9a-fA-F]', register_regex)
507 # Replace register placeholders
508 for register, value in REGISTERS_RE.iteritems():
509 expanded_regex = re.sub('{%' + register + '}',
510 '(?:%' + '|%'.join(value) + '|' + address_regex +')', expanded_regex)
511 register_regex = re.sub('{%' + register + '}',
512 '(?:%' + '|%'.join(value) +')', register_regex)
513 for register, value in REGISTERS_RE.iteritems():
514 expanded_regex = re.sub('{' + register + '}',
515 '(?:' + '|'.join(value) + ')', expanded_regex)
516 register_regex = re.sub('{' + register + '}',
517 '(?:' + '|'.join(value) + ')', register_regex)
518 expanded_regex = re.compile(expanded_regex)
519 register_regex = re.compile(register_regex)
520 # Add input_rr and output_rr fields if we are dealing with 64-bit case
521 if options.bitness == 32:
522 subst_fixed = subst
523 subst_register_fixed = subst_register
524 subst_memory_fixed = subst_memory
525 else:
526 if input_rr:
527 input_note = '[%eax..%edi]' if index == 'rax' else '[%r8d..%r15d]'
528 else:
529 input_note = 'any_nonspecial'
530 if output_rr:
531 output_note = '[%eax..%edi]' if output_regs == 'eax' else '[%r8d..%r14d]'
532 else:
533 output_note = None
534 subst_fixed = subst[0:-1] + (input_note, output_note) + subst[-1:]
535 subst_register_fixed = subst_register[0:-1] + (
536 'any_nonspecial', output_note) + subst_register[-1:]
537 subst_memory_fixed = subst_memory[0:-1] + (input_note,
538 output_note) + subst_memory[-1:]
539 # If we already have replacements in cache then we just reuse them.
540 output_key = (reg, rm, rm_to_reg, start_byte, index_r8, input_rr, output_rr)
541 if output_key in AddModRM_Compressor.replacements:
542 replacements = AddModRM_Compressor.replacements[output_key]
543 main_compressors.append(
544 Compressor(expanded_regex, subst_fixed, replacements[0]))
545 register_compressors.append(
546 Compressor(register_regex, subst_register_fixed, replacements[1]))
547 memory_compressors.append(
548 Compressor(expanded_regex, subst_memory_fixed, replacements[2]))
549 if options.bitness == 64 and not index_r8:
550 AddModRM_Compressor(
551 regex, subst, subst_register, subst_memory,
552 reg=reg, rm=rm, rm_to_reg=rm_to_reg, start_byte=start_byte,
553 index_r8=True, input_rr=input_rr, output_rr=output_rr)
554 return
555 # It can be memory only instruction, register only one or both
556 main_compressor = Compressor(expanded_regex, subst_fixed)
557 register_compressor = Compressor(register_regex, subst_register_fixed)
558 memory_compressor = Compressor(expanded_regex, subst_memory_fixed)
559
560 # Generation time! Use reversed ranges to check unlikely cases first.
561 if reg is None:
562 # reg field is used as opcode extension
563 byte_range = [byte
564 for byte in range(255, -1, -1)
565 if byte & 0x38 == start_byte]
566 else:
567 byte_range = range(255, -1, -1)
568
569 for modrm in byte_range:
570 # Parse ModRM
571 mod_field = (modrm & 0xc0) >> 6
572 reg_field = (modrm & 0x38) >> 3
573 rm_field = (modrm & 0x07)
574 if reg is not None:
575 reg_text = '%' + REGISTERS[reg][reg_field]
576 # If mod == 3 then it's register-to-register instruction
577 if mod_field == 3:
578 bytes = '{:02x}'.format(modrm)
579 rm_text = '%' + REGISTERS[rm][rm_field]
580 replacement = [bytes]
581 if reg is None:
582 replacement.append(rm_text)
583 else:
584 replacement.append(rm_text if rm_to_reg else reg_text)
585 replacement.append(reg_text if rm_to_reg else rm_text)
586 if options.bitness == 64:
587 replacement.append('any_nonspecial')
588 output = reg_text if rm_to_reg else rm_text
589 if output_rr:
590 replacement.append(output)
591 else:
592 replacement.append(None)
593 if output_rr is None and output in X86_64_BASE_REGISTERS: continue
594 if output_rr is True and output == '%r15d': continue
595 if rm_to_reg is None and reg_text in X86_64_BASE_REGISTERS: continue
596 replacement = tuple(replacement)
597 main_compressor.replacements.append(replacement)
598 register_compressor.replacements.append(replacement)
599 # If mod != 3 then it's register-to-memory instruction
600 else:
601 # If RM field != %rsp then there is no index
602 if rm_field != validator.REG_RSP:
603 base_text = '%' + REGISTERS[base][rm_field]
604 # If RM field == %rbp and MOD field is zero then it's absolute address
605 if mod_field == 0 and rm_field == validator.REG_RBP:
606 bytes = '{:02x} 00 00 00 00'.format(modrm)
607 rm_text = '0x0' if options.bitness == 32 else '0x0(%rip)'
608 base_text = '%rip'
609 # Memory access with just a base register
610 elif mod_field == 0:
611 bytes = '{:02x}'.format(modrm)
612 rm_text = '({})'.format(base_text)
613 # Memory access with base and 8bit offset
614 elif mod_field == 1:
615 bytes = '{:02x} 00'.format(modrm)
616 rm_text = '0x0({})'.format(base_text)
617 # Memory access with base and 32bit offset
618 else: # mod_field == 2
619 bytes = '{:02x} 00 00 00 00'.format(modrm)
620 rm_text = '0x0({})'.format(base_text)
621 replacement = [bytes]
622 if reg is None:
623 replacement.append(rm_text)
624 else:
625 replacement.append(rm_text if rm_to_reg else reg_text)
626 replacement.append(reg_text if rm_to_reg else rm_text)
627 if options.bitness == 64:
628 replacement.append('any_nonspecial')
629 output = reg_text if rm_to_reg else None
630 if output_rr:
631 replacement.append(output)
632 else:
633 replacement.append(None)
634 if input_rr and base_text not in X86_64_BASE_REGISTERS: continue
635 if output_rr is None and output in X86_64_BASE_REGISTERS: continue
636 if output_rr is True and output == '%r15d': continue
637 if rm_to_reg is None and reg_text in X86_64_BASE_REGISTERS: continue
638 replacement = tuple(replacement)
639 main_compressor.replacements.append(replacement)
640 memory_compressor.replacements.append(replacement)
641 else:
642 # If RM field == %rsp then we have SIB byte
643 for sib in xrange(256):
644 scale_field = (sib & 0xc0) >> 6
645 index_field = (sib & 0x38) >> 3
646 base_field = (sib & 0x07)
647 index_text = '%' + INDEXES[index][index_field]
648 base_text = '%' + REGISTERS[base][base_field]
649 scale_text = pow(2, scale_field)
650 # If BASE is %rbp and MOD == 0 then index with 32bit offset is used
651 if mod_field == 0 and base_field == validator.REG_RBP:
652 bytes = '{:02x} {:02x} 00 00 00 00'.format(modrm, sib)
653 if (options.bitness == 32 or
654 index_field != validator.REG_RSP or
655 scale_field != 0 or index[0:2] == 'r8'):
656 rm_text = '0x0(,{},{})'.format(index_text, scale_text)
657 else:
658 rm_text = '0x0'
659 base_text = ''
660 # Memory access with base and index (no offset)
661 elif mod_field == 0:
662 bytes = '{:02x} {:02x}'.format(modrm, sib)
663 rm_text = '({},{},{})'.format(base_text, index_text, scale_text)
664 # Memory access with base, index and 8bit offset
665 elif mod_field == 1:
666 bytes = '{:02x} {:02x} 00'.format(modrm, sib)
667 rm_text = '0x0({},{},{})'.format(base_text, index_text, scale_text)
668 # Memory access with base, index and 32bit offset
669 elif mod_field == 2:
670 bytes = '{:02x} {:02x} 00 00 00 00'.format(modrm, sib)
671 rm_text = '0x0({},{},{})'.format(base_text, index_text, scale_text)
672 # Pretty-printing of access via %rsp
673 if (scale_field == 0 and index != 'r8' and
674 base_field == validator.REG_RSP and
675 index_field == validator.REG_RSP):
676 #index_text = 'any_nonspecial'
677 rm_text = ('0x0({})' if mod_field else '({})').format(base_text)
678 if index_text == "%riz":
679 index_text = 'any_nonspecial'
680 replacement = [bytes]
681 if reg is None:
682 replacement.append(rm_text)
683 else:
684 replacement.append(rm_text if rm_to_reg else reg_text)
685 replacement.append(reg_text if rm_to_reg else rm_text)
686 if options.bitness == 64:
687 if not input_rr or index_text == 'any_nonspecial':
688 replacement.append('any_nonspecial')
689 else:
690 replacement.append('%' + REGISTERS[input][index_field])
691 output = reg_text if rm_to_reg else None
692 replacement.append(output if output_rr else None)
693 if input_rr:
694 if base_text not in X86_64_BASE_REGISTERS: continue
695 if index_text in X86_64_BASE_REGISTERS - set(['%r15']): continue
696 if output_rr is None and output in X86_64_BASE_REGISTERS: continue
697 if output_rr is True and output == '%r15d': continue
698 if rm_to_reg is None and reg_text in X86_64_BASE_REGISTERS: continue
699 replacement = tuple(replacement)
700 main_compressor.replacements.append(replacement)
701 memory_compressor.replacements.append(replacement)
702
703 assert len(main_compressor.replacements) > 1
704 assert len(register_compressor.replacements) > 1
705 assert len(memory_compressor.replacements) > 1
706 main_compressor.replacements = tuple(main_compressor.replacements)
707 register_compressor.replacements = tuple(register_compressor.replacements)
708 memory_compressor.replacements = tuple(memory_compressor.replacements)
709 main_compressors.append(main_compressor)
710 register_compressors.append(register_compressor)
711 memory_compressors.append(memory_compressor)
712 AddModRM_Compressor.replacements[output_key] = (
713 main_compressor.replacements,
714 register_compressor.replacements,
715 memory_compressor.replacements
716 )
717 if options.bitness == 64 and not index_r8:
718 AddModRM_Compressor(
719 regex, subst, subst_register, subst_memory,
720 reg=reg, rm=rm, rm_to_reg=rm_to_reg, start_byte=start_byte,
721 index_r8=True, input_rr=input_rr, output_rr=output_rr)
722 # Replacements cache.
723 AddModRM_Compressor.replacements = {}
724
725
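The relationship between the ModR/M field split and the `{RM_BYTE/N}` hex-digit patterns used in AddModRM_Compressor can be checked with a short Python 3 sketch. The patterns are copied from the substitutions above; `split_modrm` and `bytes_matching` are hypothetical helper names introduced only for this illustration.

```python
import re

# Patterns copied from the {RM_BYTE/N} substitutions, keyed by opcode extension.
RM_BYTE_PATTERNS = {
    0: '[048cC][0-7]', 1: '[048cC][89a-fA-F]',
    2: '[159dD][0-7]', 3: '[159dD][89a-fA-F]',
    4: '[26aAeE][0-7]', 5: '[26aAeE][89a-fA-F]',
    6: '[37bBfF][0-7]', 7: '[37bBfF][89a-fA-F]',
}


def split_modrm(modrm):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return (modrm & 0xc0) >> 6, (modrm & 0x38) >> 3, modrm & 0x07


def bytes_matching(extension):
    """Return every byte whose two-digit hex text matches the /extension pattern."""
    pattern = re.compile(RM_BYTE_PATTERNS[extension] + '$')
    return [b for b in range(256) if pattern.match('{:02x}'.format(b))]


# Every byte matched by pattern N should carry N in its reg field.
mismatches = [(n, b) for n in range(8)
              for b in bytes_matching(n) if split_modrm(b)[1] != n]
counts = [len(bytes_matching(n)) for n in range(8)]
```

Each pattern selects 32 of the 256 possible bytes, and the reg field of every selected byte equals the extension number, so the eight patterns together partition the ModR/M byte space by opcode extension.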
726 def PrepareCompressors():
727 global compressors
728 global main_compressors
729 global register_compressors
730 global memory_compressors
731
732 # "Larger" compressors should be tried first, then "smaller" ones.
733 main_compressors = []
734 register_compressors = []
735 memory_compressors = []
736 extra_compressors = []
737
738 if options.bitness == 32:
739 register_kinds = ('al', 'ax', 'eax', 'mm0', 'xmm0', 'ymm0')
740 register_kind_pairs = (
741 ( 'al', 'al'),
742 ( 'ax', 'al'),
743 ( 'ax', 'ax'),
744 ( 'eax', 'al'),
745 ( 'eax', 'ax'),
746 ( 'eax', 'eax'),
747 ( 'eax', 'mm0'),
748 ( 'mm0', 'eax'),
749 ( 'eax', 'xmm0'),
750 ('xmm0', 'eax'),
751 ( 'mm0', 'mm0'),
752 ( 'mm0', 'xmm0'),
753 ('xmm0', 'mm0'),
754 ('xmm0', 'xmm0'),
755 ('xmm0', 'ymm0'),
756 ('ymm0', 'xmm0'),
757 ('ymm0', 'ymm0')
758 )
759 else:
760 register_kinds = ('al', 'spl', 'ax', 'eax', 'rax', 'mm0', 'xmm0', 'ymm0',
761 'r8b', 'r8w', 'r8d', 'r8', 'mm8', 'xmm8', 'ymm8')
762 register_kind_pairs = (
763 ( 'al', 'al'),
764 ( 'spl', 'spl'), ( 'spl', 'r8b'), ( 'r8b', 'spl'), ( 'r8b', 'r8b'),
765 ( 'ax', 'al'),
766 ( 'ax', 'spl'), ( 'ax', 'r8b'), ( 'r8w', 'spl'), ( 'r8w', 'r8b'),
767 ( 'ax', 'ax'), ( 'ax', 'r8w'), ( 'r8w', 'ax'), ( 'r8w', 'r8w'),
768 ( 'eax', 'al'),
769 ( 'eax', 'spl'), ( 'eax', 'r8b'), ( 'r8d', 'spl'), ( 'r8d', 'r8b'),
770 ( 'eax', 'ax'), ( 'eax', 'r8w'), ( 'r8d', 'ax'), ( 'r8d', 'r8w'),
771 ( 'eax', 'eax'), ( 'eax', 'r8d'), ( 'r8d', 'eax'), ( 'r8d', 'r8d'),
772 ( 'rax', 'al'),
773 ( 'rax', 'spl'), ( 'rax', 'r8b'), ( 'r8', 'spl'), ( 'r8', 'r8b'),
774 ( 'rax', 'ax'), ( 'rax', 'r8w'), ( 'r8', 'ax'), ( 'r8', 'r8w'),
775 ( 'rax', 'eax'), ( 'rax', 'r8d'), ( 'r8', 'eax'), ( 'r8', 'r8d'),
776 ( 'rax', 'rax'), ( 'rax', 'r8'), ( 'r8', 'rax'), ( 'r8', 'r8'),
777 ( 'eax', 'mm0'), ( 'eax', 'mm8'), ( 'r8d', 'mm0'), ( 'r8d', 'mm8'),
778 ( 'rax', 'mm0'), ( 'rax', 'mm8'), ( 'r8', 'mm0'), ( 'r8', 'mm8'),
779 ( 'mm0', 'eax'), ( 'mm8', 'eax'), ( 'mm0', 'r8d'), ( 'mm8', 'r8d'),
780 ( 'mm0', 'rax'), ( 'mm8', 'rax'), ( 'mm0', 'r8'), ( 'mm8', 'r8'),
781 ( 'eax', 'xmm0'), ( 'eax', 'xmm8'), ( 'r8d', 'xmm0'), ( 'r8d', 'xmm8'),
782 ( 'rax', 'xmm0'), ( 'rax', 'xmm8'), ( 'r8', 'xmm0'), ( 'r8', 'xmm8'),
783 ('xmm0', 'eax'), ('xmm0', 'r8d'), ('xmm8', 'eax'), ('xmm8', 'r8d'),
784 ('xmm0', 'rax'), ('xmm0', 'r8'), ('xmm8', 'rax'), ('xmm8', 'r8'),
785 ( 'mm0', 'mm0'), ( 'mm8', 'mm0'), ( 'mm0', 'mm8'), ( 'mm8', 'mm8'),
786 ( 'mm0', 'xmm0'), ( 'mm8', 'xmm0'), ( 'mm0', 'xmm8'), ( 'mm8', 'xmm8'),
787 ('xmm0', 'mm0'), ('xmm8', 'mm0'), ('xmm0', 'mm8'), ('xmm8', 'mm8'),
788 ('xmm0', 'xmm0'), ('xmm0', 'xmm8'), ('xmm8', 'xmm0'), ('xmm8', 'xmm8'),
789 ('xmm0', 'ymm0'), ('xmm0', 'ymm8'), ('xmm8', 'ymm0'), ('xmm8', 'ymm8'),
790 ('ymm0', 'xmm0'), ('ymm0', 'xmm8'), ('ymm8', 'xmm0'), ('ymm8', 'xmm8'),
791 ('ymm0', 'ymm0'), ('ymm0', 'ymm8'), ('ymm8', 'ymm0'), ('ymm8', 'ymm8')
792 )
793
794 # Largest compressors: both reg and rm fields are used
795 for reg, rm in register_kind_pairs:
796 start_reg = REGISTERS[reg][0]
797 end_reg = REGISTERS[reg][-1 if reg[0:2] != 'r8' else -2]
798 start_rm = REGISTERS[rm][0]
799 end_rm = REGISTERS[rm][-1 if rm[0:2] != 'r8' else -2]
800 # First instruction uses just the ModR/M byte in 32-bit mode but both the
801 # ModR/M and SIB bytes in 64-bit mode. Both approaches work in both cases;
802 # this is just an optimization to avoid needless work.
803 if options.bitness == 32:
804 bytes = '({RM_BYTE})'
805 else:
806 bytes = '({RM_SIB_BYTES})'
807 for extra_bytes in ('', ' 00', ' 00 00', ' 00 00 00 00'):
808 # LEA in 64-bit mode is a truly unique instruction for now.
809 if options.bitness == 64 and reg in ('eax', 'r8d', 'rax', 'r8'):
810 AddModRM_Compressor(
811 '.*?' + bytes + extra_bytes +
812 ' (?:lock )?\\w* (?:\\$0x0,|\\$0x0,\\$0x0,|%cl,|%xmm0,)?'
813 '({%' + rm + '}),(%{' + reg + '}).*{RR_NOTES}()',
814 ('XX', '[%{}..%{} or memory]'.format(start_rm, end_rm),
815 '[%{}..%{}]'.format(start_reg, end_reg), ' # lea'),
816 ('XX', '[%{}..%{}]'.format(start_rm, end_rm),
817 '[%{}..%{}]'.format(start_reg, end_reg), ' # rm to reg; lea'),
818 ('XX', '[memory]', '[%{}..%{}]'.format(start_reg, end_reg), ' # lea'),
819 reg=reg, rm=rm, rm_to_reg=True, input_rr=False,
820 output_rr=True if reg in ('eax', 'r8d') else None)
821 # Normal instructions with two operands (rm to reg).
822 AddModRM_Compressor(
823 '.*?' + bytes + extra_bytes +
824 ' (?:lock )?\\w* (?:\\$0x0,|\\$0x0,\\$0x0,|%cl,|%xmm0,)?'
825 '({%' + rm + '}),(%{' + reg + '}).*{RR_NOTES}()',
826 ('XX', '[%{}..%{} or memory]'.format(start_rm, end_rm),
827 '[%{}..%{}]'.format(start_reg, end_reg), ''),
828 ('XX', '[%{}..%{}]'.format(start_rm, end_rm),
829 '[%{}..%{}]'.format(start_reg, end_reg), ' # rm to reg'),
830 ('XX', '[memory]', '[%{}..%{}]'.format(start_reg, end_reg), ''),
831 reg=reg, rm=rm, rm_to_reg=True)
832 # Normal instructions with two operands (reg to rm).
833 AddModRM_Compressor(
834 '.*?' + bytes + extra_bytes +
835 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
836 '(%{' + reg + '}),({%' + rm + '}).*{RR_NOTES}()',
837 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
838 '[%{}..%{} or memory]'.format(start_rm, end_rm), ''),
839 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
840 '[%{}..%{}]'.format(start_rm, end_rm), ' # reg to rm'),
841 ('XX', '[%{}..%{}]'.format(start_reg, end_reg), '[memory]', ''),
842 reg=reg, rm=rm, rm_to_reg=False)
843 # There are a few more forms in the 64-bit case (rm to reg).
844 if options.bitness == 64 and reg in ('eax', 'r8d'):
845 # Zero-extending version.
846 AddModRM_Compressor(
847 '.*?' + bytes + extra_bytes +
848 ' (?:lock )?\\w* (?:\\$0x0,|\\$0x0,\\$0x0,|%cl,|%xmm0,)?'
849 '({%' + rm + '}),(%{' + reg + '}).*{RR_NOTES}()',
850 ('XX', '[%{}..%{} or memory]'.format(start_rm, end_rm),
851 '[%{}..%{}]'.format(start_reg, end_reg), ''),
852 ('XX', '[%{}..%{}]'.format(start_rm, end_rm),
853 '[%{}..%{}]'.format(start_reg, end_reg), ' # rm to reg'),
854 ('XX', '[memory]', '[%{}..%{}]'.format(start_reg, end_reg), ''),
855 reg=reg, rm=rm, rm_to_reg=True, output_rr=True)
856 # More forms in 64 bit case (reg to rm).
857 if options.bitness == 64 and rm in ('eax', 'r8d'):
858 # Zero-extending version.
859 AddModRM_Compressor(
860 '.*?' + bytes + extra_bytes +
861 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
862 '(%{' + reg + '}),({%' + rm + '}).*{RR_NOTES}()',
863 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
864 '[%{}..%{} or memory]'.format(start_rm, end_rm), ''),
865 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
866 '[%{}..%{}]'.format(start_rm, end_rm), ' # reg to rm'),
867 ('XX', '[%{}..%{}]'.format(start_reg, end_reg), '[memory]', ''),
868 reg=reg, rm=rm, rm_to_reg=False, output_rr=True)
869 # Zero-extending xchg/xadd.
870 AddModRM_Compressor(
871 '.*?' + bytes + extra_bytes +
872 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
873 '(%{' + reg + '}),({%' + rm + '}).*{RR_NOTES}()',
874 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
875 '[%{}..%{} or memory]'.format(start_rm, end_rm),
876 ' # write to both'),
877 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
878 '[%{}..%{}]'.format(start_rm, end_rm),
879 ' # reg to rm; write to both'),
880 ('XX', '[%{}..%{}]'.format(start_reg, end_reg), '[memory]',
881 ' # write to both'),
882 reg=reg, rm=rm, rm_to_reg=None, output_rr=True)
883 # Still more forms for 64 bit case (rm to reg).
884 if options.bitness == 64 and reg in ('al', 'spl', 'ax', 'eax', 'rax',
885 'r8b', 'r8w', 'r8d', 'r8'):
886 # Dangerous instructions (rm to reg).
887 AddModRM_Compressor(
888 '.*?' + bytes + extra_bytes +
889 ' (?:lock )?\\w* (?:\\$0x0,|\\$0x0,\\$0x0,|%cl,|%xmm0,)?'
890 '({%' + rm + '}),(%{' + reg + '}).*{RR_NOTES}()',
891 ('XX', '[%{}..%{} or memory]'.format(start_rm, end_rm),
892 '[%{}..%{}]'.format(start_reg, end_reg), ''),
893 ('XX', '[%{}..%{}]'.format(start_rm, end_rm),
894 '[%{}..%{}]'.format(start_reg, end_reg), ' # rm to reg'),
895 ('XX', '[memory]', '[%{}..%{}]'.format(start_reg, end_reg), ''),
896 reg=reg, rm=rm, rm_to_reg=True, output_rr=None)
897 # Still more forms for 64 bit case (reg to rm).
898 if options.bitness == 64 and rm in ('al', 'spl', 'ax', 'eax', 'rax',
899 'r8b', 'r8w', 'r8d', 'r8'):
900 # Dangerous instructions (reg to rm).
901 AddModRM_Compressor(
902 '.*?' + bytes + extra_bytes +
903 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
904 '(%{' + reg + '}),({%' + rm + '}).*{RR_NOTES}()',
905 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
906 '[%{}..%{} or memory]'.format(start_rm, end_rm), ''),
907 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
908 '[%{}..%{}]'.format(start_rm, end_rm), ' # reg to rm'),
909 ('XX', '[%{}..%{}]'.format(start_reg, end_reg), '[memory]', ''),
910 reg=reg, rm=rm, rm_to_reg=False, output_rr=None)
911 # Dangerous xchg/xadd.
912 AddModRM_Compressor(
913 '.*?' + bytes + extra_bytes +
914 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
915 '(%{' + reg + '}),({%' + rm + '}).*{RR_NOTES}()',
916 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
917 '[%{}..%{} or memory]'.format(start_rm, end_rm),
918 ' # write to both'),
919 ('XX', '[%{}..%{}]'.format(start_reg, end_reg),
920 '[%{}..%{}]'.format(start_rm, end_rm),
921 ' # reg to rm; write to both'),
922 ('XX', '[%{}..%{}]'.format(start_reg, end_reg), '[memory]',
923 ' # write to both'),
924 reg=reg, rm=rm, rm_to_reg=None, output_rr=None)
925 # 3DNow! instructions. Additional byte is opcode extension.
926 AddModRM_Compressor(
927 '.*?' + bytes + ' [0-9a-fA-F][0-9a-fA-F] \\w* '
928 '({%' + rm + '}),(%{' + reg + '}).*{RR_NOTES}()',
929 ('XX', '[%{}..%{} or memory]'.format(start_rm, end_rm),
930 '[%{}..%{}]'.format(start_reg, end_reg), ''),
931 ('XX', '[%{}..%{}]'.format(start_rm, end_rm),
932 '[%{}..%{}]'.format(start_reg, end_reg), ' # rm to reg'),
933 ('XX', '[memory]', '[%{}..%{}]'.format(start_reg, end_reg), ''),
934 reg=reg, rm=rm, rm_to_reg=True)
935
936 # Smaller compressors: only rm field is used.
937 for rm in register_kinds:
938 start_rm = REGISTERS[rm][0]
939 end_rm = REGISTERS[rm][-1 if rm[0:2] != 'r8' else -2]
940 for opcode in xrange(8):
941 XX_byte_mark = 'XX/' + str(opcode)
942 start_byte = opcode * 8
943 # First instruction uses just the ModR/M byte in 32-bit mode but both the
944 # ModR/M and SIB bytes in 64-bit mode. Both approaches work in both cases;
945 # this is just an optimization to avoid needless work.
946 if options.bitness == 32:
947 bytes = '({RM_BYTE/' + str(opcode) + '})'
948 else:
949 bytes = '({RM_SIB_BYTES/' + str(opcode) + '})'
950 if options.bitness == 64:
951 # No memory access (e.g. prefetch)
952 AddModRM_Compressor(
953 '.*?' + bytes + ' ?\\w* (?:\\$0x0,|%cl,)?({%' + rm + '}).*'
954 '{RR_NOTES}()',
955 (XX_byte_mark, '[%{}..%{} or memory]'.format(start_rm, end_rm), ''),
956 (XX_byte_mark, '[%{}..%{}]'.format(start_rm, end_rm), ''),
957 (XX_byte_mark, '[memory]', ''),
958 reg=None, rm=rm, input_rr=False, start_byte=start_byte)
959 for extra_bytes in ('', ' 00', ' 00 00', ' 00 00 00 00'):
960 # Part of opcode is encoded in ModR/M
961 AddModRM_Compressor(
962 '.*?' + bytes + extra_bytes +
963 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
964 '({%' + rm + '}).*{RR_NOTES}()',
965 (XX_byte_mark, '[%{}..%{} or memory]'.format(start_rm, end_rm), ''),
966 (XX_byte_mark, '[%{}..%{}]'.format(start_rm, end_rm), ''),
967 (XX_byte_mark, '[memory]', ''),
968 reg=None, rm=rm, start_byte=start_byte)
969 # More forms in 64 bit case.
970 if options.bitness == 64 and rm in ('eax', 'r8d'):
971 # Zero-extending version.
972 AddModRM_Compressor(
973 '.*?' + bytes + extra_bytes +
974 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
975 '({%' + rm + '}).*{RR_NOTES}()',
976 (XX_byte_mark, '[%{}..%{} or memory]'.format(start_rm, end_rm), ''),
977 (XX_byte_mark, '[%{}..%{}]'.format(start_rm, end_rm), ''),
978 (XX_byte_mark, '[memory]', ''),
979 reg=None, rm=rm, start_byte=start_byte, output_rr=True)
980 # Still more forms for 64 bit case (reg to rm).
981 if options.bitness == 64 and rm in ('al', 'spl', 'ax', 'eax', 'rax',
982 'r8b', 'r8w', 'r8d', 'r8'):
983 # Dangerous instructions.
984 AddModRM_Compressor(
985 '.*?' + bytes + extra_bytes +
986 ' (?:lock )?\\w* (?:\\$0x0,|%cl,)?'
987 '({%' + rm + '}).*{RR_NOTES}()',
988 (XX_byte_mark, '[%{}..%{} or memory]'.format(start_rm, end_rm), ''),
989 (XX_byte_mark, '[%{}..%{}]'.format(start_rm, end_rm), ''),
990 (XX_byte_mark, '[memory]', ''),
991 reg=None, rm=rm, start_byte=start_byte, output_rr=None)
992
993 # Even smaller compressors: only low 3 bits of opcode are used.
994 for reg in register_kinds + ('st(0)',):
995 start_reg = REGISTERS[reg][0]
996 end_reg = REGISTERS[reg][-1 if reg[0:2] != 'r8' else -2]
997 for opcode in xrange(8):
998 for extra_bytes in ('', ' 00', ' 00 00', ' 00 00 00 00'):
999 for text1, text2, nibble in (
1000 ('[0..7]', '[8..f]', xrange(8)),
1001 ('[012367]', '[89abef]', (0, 1, 2, 3, 6, 7)),
1002 ('[0..6]', '[8..e]', xrange(7))
1003 ):
1004 # Operand is encoded in opcode
1005 extra_compressors.append(Compressor(re.compile(
1006 '.*?[0-9a-fA-F]([0-7])' + extra_bytes +
1007 ' \\w* (?:\\$0x0,|%ax,|%st,)?'
1008 '(%(?:' + '|'.join(REGISTERS_RE[reg]) + ')).*()'),
1009 (text1, '[%{}..%{}]'.format(start_reg, end_reg), ''),
1010 tuple(('{:x}'.format(n), '%' + REGISTERS[reg][n])
1011 for n in nibble)))
1012 extra_compressors.append(Compressor(re.compile(
1013 '.*?[0-9a-fA-F]([89a-fA-F])' + extra_bytes +
1014 ' \\w* (?:\\$0x0,|%ax,|%st,)?'
1015 '(%(?:' + '|'.join(REGISTERS_RE[reg]) + ')).*()'),
1016 (text2, '[%{}..%{}]'.format(start_reg, end_reg), ''),
1017 tuple(('{:x}'.format(n + 8), '%' + REGISTERS[reg][n])
1018 for n in nibble)))
1019 # Another version for 64 bit case
1020 if options.bitness == 64 and reg in ('eax', 'r8d'):
1021 # Operand is encoded in opcode and output
1022 extra_compressors.append(Compressor(re.compile(
1023 '.*?[0-9a-fA-F]([0-7])' + extra_bytes +
1024 ' \\w* (?:\\$0x0,|%ax,|%st,)?'
1025 '(%(?:' + '|'.join(REGISTERS_RE[reg]) + ')).*'
1026 'output_rr=(%(?:'+ '|'.join(REGISTERS_RE[reg]) + ')).*()'),
1027 tuple([text1] + ['[%{}..%{}]'.format(start_reg, end_reg)] * 2 +
1028 ['']),
1029 tuple(['{:x}'.format(n)] + ['%' + REGISTERS[reg][n]] * 2
1030 for n in nibble)))
1031 extra_compressors.append(Compressor(re.compile(
1032 '.*?[0-9a-fA-F]([89a-fA-F])' + extra_bytes +
1033 ' \\w* (?:\\$0x0,|%ax,|%st,)?'
1034 '(%(?:' + '|'.join(REGISTERS_RE[reg]) + ')).*'
1035 'output_rr=(%(?:'+ '|'.join(REGISTERS_RE[reg]) + ')).*()'),
1036 tuple([text2] + ['[%{}..%{}]'.format(start_reg, end_reg)] * 2 +
1037 ['']),
1038 tuple(['{:x}'.format(n + 8)] + ['%' + REGISTERS[reg][n]] * 2
1039 for n in nibble)))
1040 compressors = (main_compressors + memory_compressors + register_compressors +
1041 extra_compressors)
1042
1043 # Special compressors: will handle some cosmetic issues.
1044 #
1045 # SETxx ignores the reg field and thus is described as many separate
1045 # instructions.
1046 compressors.append(Compressor(
1047 re.compile('.*0f 9[0-9a-fA-F] XX(/[0-7]) set.*()'), ('', ''),
1048 [('/' + str(i), ) for i in range(8)]))
1049 # BSWAP is described with opcode "0f c8+r", not "0f /1" in manual
1050 if options.bitness == 32:
1051 compressors.append(Compressor(
1052 re.compile('.*(XX/1) bswap.*ax.*()'), ('c[8..f]', ''), [('XX/1', )]))
1053 else:
1054 compressors.append(Compressor(
1055 re.compile('.*(XX/1) bswap.*ax.*()'), ('c[89abef]', ''), [('XX/1', )]))
1056 compressors.append(Compressor(
1057 re.compile('.*(XX/1) bswap.*r8.*()'), ('c[8..e]', ''), [('XX/1', )]))
1058 # Add mark '# write to both' to certain versions of CMPXCHG, XADD, and XCHG
1059 if options.bitness == 64:
1060 compressors.append(Compressor(
1061 re.compile('.* (?:cmpxchg|xadd|xchg).*%al\\.\\.%bh[^#]*()$'),
1062 (' # write to both', ), ((), )))
1063 # "and $0xe0,[%eax..%edi]" is treated specially, which means that we list all
1064 # versions of "and [$0x1..$0xff],[%eax..%edi]" separately here.
1065 # Without this rule these ands comprise 2/3 of the whole output!
1066 if options.bitness == 32:
1067 compressors.append(Compressor(
1068 re.compile('.*83 (e0 01 and \\$0x1,%eax)()'),
1069 ('XX/4 00 and[l]? $0x0,[%eax..%edi or memory]', ' # special and'),
1070 [('e{} {:02x} and $0x{:x},%{}'.format(r, i, i, REGISTERS['eax'][r]), )
1071 for i in range(1, 256) for r in range(8)] +
1072 [('XX/4 00 and[l]? $0x0,[%eax..%edi or memory]', )]))
1073 else:
1074 for reg in ('eax', 'r8d'):
1075 start_reg = REGISTERS[reg][0]
1076 end_reg = REGISTERS[reg][-1 if reg[0:2] != 'r8' else -2]
1077 for index_reg in ('eax', 'r8d'):
1078 start_index = REGISTERS[index_reg][0]
1079 end_index = REGISTERS[index_reg][-1]
1080 compressors.append(Compressor(
1081 re.compile('.*83 (e0 01 and \\$0x1,%' + reg + ').*'
1082 'input_rr=(any_nonspecial); output_rr=(%' + reg + ')()'),
1083 ('XX/4 00 and[l]? $0x0,[%{}..%{} or memory]'.format(start_reg,
1084 end_reg), '[%{}..%{}]'.format(start_index, end_index),
1085 '[%{}..%{}]'.format(start_reg, end_reg),
1086 ' # special and'),
1087 [('e{} {:02x} and $0x{:x},%{}'.format(r, i, i, REGISTERS[reg][r]),
1088 'any_nonspecial', '%' + REGISTERS[reg][r])
1089 for i in range(1, 256) for r in range(7 + (reg == 'eax'))] +
1090 [('XX/4 00 and[l]? $0x0,[%{}..%{} or memory]'.format(start_reg,
1091 end_reg), '[%{}..%{}]'.format(start_index, end_index),
1092 '[%{}..%{}]'.format(start_reg, end_reg))]))
1093 # Merge memory and non-memory access
1094 if options.bitness == 32:
1095 letters_and_registers = (('b', 'al', ''), ('w', 'ax', ''), ('l', 'eax', ''))
1096 else:
1097 letters_and_registers = (
1098 ('b', 'al', 'eax'), ('b', 'spl', 'eax'), ('b', 'r8b', 'r8d'),
1099 ('w', 'ax', 'eax'), ('w', 'r8w', 'r8d'),
1100 ('l', 'eax', 'eax'), ('l', 'r8d', 'r8d'),
1101 ('q', 'rax', 'eax'), ('q', 'r8', 'r8d')
1102 )
1103 for letter, reg, out_reg in letters_and_registers:
1104 start_reg = REGISTERS[reg][0]
1105 end_reg = REGISTERS[reg][-1 if reg[0:2] != 'r8' else -2]
1106 all_regs = '[%{}..%{}]'.format(start_reg, end_reg)
1107 regs_mark = '[%{}..%{} or memory]'.format(start_reg, end_reg)
1108 if options.bitness == 64:
1109 start_out = REGISTERS[out_reg][0]
1110 end_out = REGISTERS[out_reg][-1 if out_reg[0:2] != 'r8' else -2]
1111 out_regs = '[%{}..%{}]'.format(start_out, end_out)
1112 for notes in ('', ' # rm to reg', ' # reg to rm'):
1113 compressors.append(Compressor(
1114 re.compile('.* \\w*(' + letter + ') .*(\\[memory]).*()()'),
1115 ('[{}]?'.format(letter), regs_mark, '', ''),
1116 ((letter, '[memory]', ''), ('', all_regs, notes))))
1117 if options.bitness == 64:
1118 for index_reg in ('eax', 'r8d'):
1119 start_index = REGISTERS[index_reg][0]
1120 end_index = REGISTERS[index_reg][-1]
1121 index_regs = '[%{}..%{}]'.format(start_index, end_index)
1122 for output_rrs in ((None, out_regs), (out_regs, None), (None, None)):
1123 compressors.append(Compressor(
1124 re.compile('.* \\w*(' + letter + ') .*(\\[memory]).*; '
1125 'input_rr=(\\[%[a-z0-9]*..%[a-z0-9]*\\]); '
1126 'output_rr=(\\[%[a-z0-9]*..%[a-z0-9]*\\]|None)()()'),
1127 ('[{}]?'.format(letter), regs_mark, index_regs,
1128 output_rrs[0] if output_rrs[0] is not None else output_rrs[1],
1129 '', ''),
1130 ((letter, '[memory]', index_regs, output_rrs[0], ''),
1131 ('', all_regs, 'any_nonspecial', output_rrs[1], notes))))
1132
1133 # REX compressors
1134 if options.bitness == 64:
1135 # First, a fairly complex set of compressors that combine versions of REX
1136 # with the three lowest bits in different states.
1137 register_kind_pairs = (
1138 ( None, None),
1139 ( 'al', 'al'), ( 'al', None), (None, 'al'),
1140 ( 'ax', 'al'), ( 'al', 'ax'),
1141 ( 'ax', 'ax'), ( 'ax', None), (None, 'ax'),
1142 ( 'eax', 'al'), ( 'al', 'eax'),
1143 ( 'eax', 'ax'), ( 'ax', 'eax'),
1144 ( 'eax', 'eax'), ( 'eax', None), (None, 'eax'),
1145 ( 'rax', 'al'), ( 'al', 'rax'),
1146 ( 'rax', 'ax'), ( 'ax', 'rax'),
1147 ( 'rax', 'eax'), ( 'eax', 'rax'),
1148 ( 'rax', 'rax'), ( 'rax', None), (None, 'rax'),
1149 ( 'eax', 'mm0'), ( 'mm0', 'eax'),
1150 ( 'rax', 'mm0'), ( 'mm0', 'rax'),
1151 ( 'mm0', 'eax'), ( 'eax', 'mm0'),
1152 ( 'mm0', 'rax'), ( 'rax', 'mm0'),
1153 ( 'eax', 'xmm0'),
1154 ( 'rax', 'xmm0'),
1155 ('xmm0', 'eax'),
1156 ('xmm0', 'rax'),
1157 ( 'mm0', 'mm0'), ( 'mm0', None), (None, 'mm0'),
1158 ( 'mm0', 'xmm0'),
1159 ('xmm0', 'mm0'),
1160 ('xmm0', 'xmm0'),
1161 ('xmm0', 'ymm0'), ('xmm0', None), (None, 'xmm0'),
1162 ('ymm0', 'xmm0'),
1163 ('ymm0', 'ymm0'), ('ymm0', None), (None, 'ymm0'),
1164 )
1165 r8 = {
1166 'al': 'r8b',
1167 'ax': 'r8w',
1168 'eax': 'r8d',
1169 'rax': 'r8',
1170 'mm0': 'mm8',
1171 'xmm0': 'xmm8',
1172 'ymm0': 'ymm8'
1173 }
1174 for reg, rm in register_kind_pairs:
1175 for last_reg, last_rm in ((-1, -1), (-1, -2), (-2, -1), (-2, -2)):
1176 if reg:
1177 start_reg = REGISTERS[reg][0]
1178 start_reg8 = REGISTERS[r8[reg]][0]
1179 end_reg = REGISTERS[reg][-1]
1180 end_reg0 = 'dil' if reg == 'al' else end_reg
1181 end_reg8 = REGISTERS[r8[reg]][last_reg]
1182 reg_regex = '\\[(%' + start_reg + '\\.\\.%' + end_reg + ')]'
1183 reg_regex0 = '\\[(%' + start_reg + '\\.\\.%' + end_reg0 + ')]'
1184 elif last_reg == -2:
1185 continue
1186 if rm:
1187 start_rm = REGISTERS[rm][0]
1188 start_rm8 = REGISTERS[r8[rm]][0]
1189 end_rm = REGISTERS[rm][-1]
1190 end_rm0 = 'dil' if rm == 'al' else end_rm
1191 end_rm8 = REGISTERS[r8[rm]][last_rm]
1192 rm_regex = ('\\[(%' + start_rm + '\\.\\.%' + end_rm + ')'
1193 '(?: or memory)?]')
1194 rm_regex0 = ('\\[(%' + start_rm + '\\.\\.%' + end_rm0 + ')'
1195 '(?: or memory)?]')
1196 elif last_rm == -2:
1197 continue
1198 for rexw in (True, False):
1199 for input_rr in (True, False):
1200 for output_rr in (True, False) if reg or rm else (None, ):
1201 for rm_to_reg in (True, False) if reg and rm else (None, ):
1202 # Legacy prefixes
1203 regex = '.*:(?: 26| 2e| 36| 3e| 64| 65| 66| 67| f0| f2| f3)*'
1204 # REX
1205 regex += '( 48).*' if rexw else '( 40|).*'
1206 # Replacement text
1207 replacement_tuple = (
1208 ' [REX:48..4f]' if rexw else ' [REX:40..47]?', )
1209 if reg:
1210 replacement_regs = '%{}..%{}'.format(start_reg, end_reg8)
1211 if rm:
1212 replacement_rms = '%{}..%{}'.format(start_rm, end_rm8)
1213 # Instruction arguments
1214 if not reg and not rm:
1215 pass
1216 elif not reg and rm:
1217 if rexw:
1218 regex += rm_regex0 + '.*'
1219 else:
1220 regex += rm_regex + '.*'
1221 replacement_tuple += (replacement_rms, )
1222 elif reg and not rm:
1223 if rexw:
1224 regex += reg_regex0 + '.*'
1225 else:
1226 regex += reg_regex + '.*'
1227 replacement_tuple += (replacement_regs, )
1228 elif rm_to_reg:
1229 if rexw:
1230 regex += rm_regex0 + ',' + reg_regex0 + '.*'
1231 else:
1232 regex += rm_regex + ',' + reg_regex + '.*'
1233 replacement_tuple += (replacement_rms, replacement_regs)
1234 else:
1235 if rexw:
1236 regex += reg_regex0 + ',' + rm_regex0 + '.*'
1237 else:
1238 regex += reg_regex + ',' + rm_regex + '.*'
1239 replacement_tuple += (replacement_regs, replacement_rms)
1240 # Input and output restricted registers
1241 if input_rr:
1242 regex += 'input_rr=\\[(%eax\\.\\.%edi)].*'
1243 replacement_tuple += ('%eax..%r15d', )
1244 if output_rr:
1245 regex += 'output_rr=\\[(%eax\\.\\.%edi)].*'
1246 replacement_tuple += ('%eax..%r14d', )
1247 regex += '()'
1248 replacement_tuple += ('', )
1249 # Replacement cases
1250 replacement_tuples = ()
1251 for byte in (range(0x48, 0x50)
1252 if rexw
1253 else range(0x40, 0x48) + ['']):
1254 replacement_case = (
1255 ' {:02x}'.format(byte) if byte else byte, )
1256 if byte:
1257 if rm:
1258 if byte & 0x1:
1259 replacement_rms = '%{}..%{}'.format(start_rm8, end_rm8)
1260 else:
1261 replacement_rms = '%{}..%{}'.format(start_rm, end_rm0)
1262 if byte & 0x2:
1263 replacement_index = '%r8d..%r15d'
1264 else:
1265 replacement_index = '%eax..%edi'
1266 if reg:
1267 if byte & 0x4:
1268 replacement_regs = '%{}..%{}'.format(start_reg8,
1269 end_reg8)
1270 else:
1271 replacement_regs = '%{}..%{}'.format(start_reg,
1272 end_reg0)
1273 else:
1274 if rm:
1275 replacement_rms = '%{}..%{}'.format(start_rm, end_rm)
1276 replacement_index = '%eax..%edi'
1277 if reg:
1278 replacement_regs = '%{}..%{}'.format(start_reg, end_reg)
1279 if not reg and not rm:
1280 pass
1281 elif not reg and rm:
1282 replacement_case += (replacement_rms, )
1283 if byte:
1284 final_rr = '%r8d..%r14d' if byte & 0x1 else '%eax..%edi'
1285 else:
1286 final_rr = '%eax..%edi'
1287 elif reg and not rm:
1288 replacement_case += (replacement_regs, )
1289 if byte:
1290 final_rr = '%r8d..%r14d' if byte & 0x4 else '%eax..%edi'
1291 else:
1292 final_rr = '%eax..%edi'
1293 elif rm_to_reg:
1294 replacement_case += (replacement_rms, replacement_regs)
1295 if byte:
1296 final_rr = '%r8d..%r14d' if byte & 0x4 else '%eax..%edi'
1297 else:
1298 final_rr = '%eax..%edi'
1299 else:
1300 replacement_case += (replacement_regs, replacement_rms)
1301 if byte:
1302 final_rr = '%r8d..%r14d' if byte & 0x1 else '%eax..%edi'
1303 else:
1304 final_rr = '%eax..%edi'
1305 if input_rr: replacement_case += (replacement_index, )
1306 if output_rr: replacement_case += (final_rr, )
1307 replacement_tuples += (replacement_case, )
1308 compressors.append(Compressor(
1309 re.compile(regex), replacement_tuple, replacement_tuples))
1310 # This is a fairly simple compressor that combines two lines with different
1311 # REX.W bits (only if they are otherwise identical).
1312 compressors.append(Compressor(
1313 re.compile('.*(\\[REX:40\\.\\.47]\\?).*()'), ('[REX:40..4f]?', ''),
1314 (('[REX:40..47]?', ), ('[REX:48..4f]', ))))
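
The REX compressors above test `byte & 0x1`, `byte & 0x2` and `byte & 0x4` to decide whether the rm, index and reg operands come from the low or the extended register bank, and split on REX.W (0x48..0x4f versus 0x40..0x47). A minimal sketch of that bit layout follows. `decode_rex` and `register_ranges` are hypothetical helpers, not functions from this patch, and the sketch ignores the `%r15`/`%r15d` exclusions that the real replacement tables apply.

```python
def decode_rex(byte):
    """Split a REX prefix byte (0x40..0x4f) into its W, R, X and B bits."""
    if not 0x40 <= byte <= 0x4f:
        raise ValueError('not a REX prefix: 0x{:02x}'.format(byte))
    return {
        'W': bool(byte & 0x8),  # 64-bit operand size
        'R': bool(byte & 0x4),  # extends the ModR/M reg field
        'X': bool(byte & 0x2),  # extends the SIB index field
        'B': bool(byte & 0x1),  # extends the ModR/M rm field (or SIB base)
    }


def register_ranges(byte):
    """Hypothetical helper: 32-bit register ranges selected by R, X and B."""
    bits = decode_rex(byte)
    low, high = '%eax..%edi', '%r8d..%r15d'
    return {
        'reg': high if bits['R'] else low,
        'index': high if bits['X'] else low,
        'rm': high if bits['B'] else low,
    }


# 0x45 is 0100 0101 in binary: R and B are set, W and X are clear.
ranges = register_ranges(0x45)
```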
1315
1316
1317 def ShowProgress(rule, instruction):
1318 if rule not in ShowProgress.rules_shown:
1319 first_print = True
1320 ShowProgress.rules_shown[rule] = len(ShowProgress.rules_shown)
1321 else:
1322 first_print = False
1323 print >> sys.stderr, '-------- Compressed --------'
1324 print >> sys.stderr, 'Rule:', ShowProgress.rules_shown[rule]
1325 print >> sys.stderr, '--------'
1326 compressor = compressors[rule]
1327 match = compressor.regex.match(instruction)
1328 assert match
1329 pos = 0
1330 format_str = ''
1331 for group in range(1, len(match.groups())):
1332 format_str += instruction[pos:match.start(group)] + '{{{}}}'
1333 pos = match.end(group)
1334 format_str += instruction[pos:match.start(len(match.groups()))]
1335 replacements = []
1336 for replacement in compressor.replacements:
1337 replacements.append(format_str.format(*replacement))
halyavin 2013/11/06 14:23:24 Replace with for expression.
khim 2013/11/06 21:28:28 Done.
1338 replacements = sorted(replacements)
1339 if len(compressor.replacements) <= 4 or first_print:
1340 for replacement in replacements:
1341 print >> sys.stderr, replacement
1342 else:
1343 print >> sys.stderr, replacements[0]
1344 print >> sys.stderr, "..."
1345 print >> sys.stderr, replacements[-1]
1346 print >> sys.stderr, '--------'
1347 print >> sys.stderr, 'Compressed', (
1348 format_str + '{}').format(*compressor.subst)
1349 ShowProgress.rules_shown = {}
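
ShowProgress turns the matched instruction into a `str.format` template: every capture group except the last becomes a `{{{}}}` slot, so each substituted value is printed wrapped in braces. The sketch below isolates that step and builds the sorted list with a generator expression, which is one reading of the reviewer's "Replace with for expression" comment. `build_format_str` and the sample regex are hypothetical, and the code is written to run under Python 3 while the file under review targets Python 2.

```python
import re


def build_format_str(regex, instruction):
    """Build a format template with one '{{{}}}' slot per capture group.

    The last group is expected to be empty; it only marks where the
    trailing part of the line starts and gets no slot of its own.
    """
    match = regex.match(instruction)
    assert match
    last_group = len(match.groups())
    pos = 0
    format_str = ''
    for group in range(1, last_group):
        format_str += instruction[pos:match.start(group)] + '{{{}}}'
        pos = match.end(group)
    format_str += instruction[pos:match.start(last_group)]
    return format_str


# Hypothetical rule: the low opcode nibble selects the register.
regex = re.compile(r'.*?[0-9a-f]([0-7]) \w* (%e[a-z]{2}).*()')
template = build_format_str(regex, '50 push %eax')

# Generator expression instead of an explicit append loop.
replacements = sorted(
    template.format(*replacement)
    for replacement in (('1', '%ecx'), ('0', '%eax')))
```

This only works as long as the instruction text contains no literal braces, which would be misread as format fields.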
1350
1351
1352 def main():
1353 # We keep these global to share the state graph and compressors between
1354 # workers spawned by multiprocessing. Passing them every time is slow.
1355 global options, xml_file
1356 global dfa
1357 global worker_validator
1358 options, xml_file = ParseOptions()
1359 dfa = dfa_parser.ParseXml(xml_file)
1360 worker_validator = validator.Validator(
1361 validator_dll=options.validator_dll,
1362 decoder_dll=options.decoder_dll)
1363 PrepareCompressors()
1364
1365 assert dfa.initial_state.is_accepting
1366 assert not dfa.initial_state.any_byte
1367
1368 print >> sys.stderr, len(dfa.states), 'states'
1369
1370 num_suffixes = dfa_traversal.GetNumSuffixes(dfa.initial_state)
1371
1372 # We can't just write 'num_suffixes[dfa.initial_state]' because
1373 # initial state is accepting.
1374 total_instructions = sum(
1375 num_suffixes[t.to_state]
1376 for t in dfa.initial_state.forward_transitions.values())
1377 print >> sys.stderr, total_instructions, 'regular instructions total'
1378
1379 tasks = dfa_traversal.CreateTraversalTasks(dfa.states, dfa.initial_state)
1380 print >> sys.stderr, len(tasks), 'tasks'
1381
1382 pool = multiprocessing.Pool()
1383
1384 results = pool.imap(Worker, tasks)
1385
1386 total = 0
1387 num_valid = 0
1388 full_output = set()
1389 for prefix, count, valid_count, output, trace in results:
1390 print >> sys.stderr, 'Prefix:', ', '.join(map(hex, prefix))
1391 total += count
1392 num_valid += valid_count
1393 full_output |= output
1394 for rule, instruction in trace:
1395 ShowProgress(rule, instruction)
1396 for instruction in sorted(Compressed(full_output, ShowProgress)):
1397 print instruction
1398
1399 print >> sys.stderr, total, 'instructions were processed'
1400 print >> sys.stderr, num_valid, 'valid instructions'
1401
1402
1403 if __name__ == '__main__':
1404 main()
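
The comment in main() says the total cannot be read from `num_suffixes[dfa.initial_state]` because the initial state is accepting. A toy sketch of why follows, under the assumption that an accepting state ends the current instruction and therefore counts as exactly one (empty) suffix. `count_suffixes` and the three-instruction DFA are hypothetical and only mimic what `dfa_traversal.GetNumSuffixes` is used for here.

```python
def count_suffixes(transitions, accepting):
    """Count byte paths from each state to the next accepting state.

    Assumption: an accepting state terminates the instruction, so it
    contributes one empty suffix and is not traversed any further.
    """
    counts = {}

    def visit(state):
        if state not in counts:
            if state in accepting:
                counts[state] = 1
            else:
                counts[state] = sum(
                    visit(target) for target in transitions[state].values())
        return counts[state]

    for state in transitions:
        visit(state)
    return counts


# Toy DFA: state 0 is both initial and accepting.
# It accepts three instructions: "90", "0f 0b" and "0f 1f".
transitions = {
    0: {0x90: 0, 0x0f: 1},
    1: {0x0b: 0, 0x1f: 0},
}
counts = count_suffixes(transitions, accepting={0})

# Summing over the outgoing transitions of the initial state gives the
# instruction count; counts[0] alone is just 1.
total = sum(counts[target] for target in transitions[0].values())
```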