Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(331)

Side by Side Diff: common/cmpbin/doc.go

Issue 1219323002: Rename funnybase and add more serializations. (Closed) Base URL: https://chromium.googlesource.com/external/github.com/luci/luci-go@master
Patch Set: don't need goconvey file Created 5 years, 5 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View unified diff | Download patch
OLDNEW
(Empty)
1 // Copyright 2015 The Chromium Authors. All rights reserved.
2 // Use of this source code is governed by a BSD-style license that can be
3 // found in the LICENSE file.
4
5 // Package cmpbin provides binary serialization routines which ensure that the
6 // serialized objects maintain the same sort order of the original inputs when
7 // sorted bytewise (i.e. with memcmp). Additionally, serialized objects are
8 // concatenatable.
9 //
10 // Notes on particular serialization schemes:
11 //
12 // * Numbers:
13 // The number encoding is less efficient on average than Varint
14 // ("encoding/binary") for small numbers (it has a minimum encoded size of
15 // 2 bytes), but is more efficient for large numbers (it has a maximum encoded
16 // size of 9 bytes for a 64 bit int, unlike the largest Varint which has a 10b
17 // representation).
18 //
19 // Both signed and unsigned numbers are encoded with the same scheme, and will
20 // sort together as signed numbers. Decoding with the incorrect routine will
21 // result in an ErrOverflow/ErrUnderflow error if the actual value is out of
22 // range.
23 //
24 // The scheme works like:
25 // * given an 2's compliment value V
M-A Ruel 2015/07/06 23:21:35 I prefer - to * for lists (not a big deal, since i
iannucci 2015/07/07 00:03:27 It's considered an indented block (so it's <pre>..
26 // * extract the sign (S) and magnitude (M) of V
27 // * Find the position of the highest bit (P), minus 1.
28 // * write (bits):
29 // * SPPPPPPP MMMMMMMM MM000000
30 // * S is 1
31 // * P's are the log2(M)-1
32 // * M's are the magnitude of V
33 // * 0's are padding
34 // * Additionally, if the number is negative, invert the bits of all the bytes
35 // (e.g. XOR 0xFF). This makes the sign bit S 0 for negative numbers, and
36 // makes the ordering of the numbers correct when compared bytewise.
37 //
38 // * Strings/[]byte
39 // Each byte in the encoded stream reserves the least significant bit as a stop
40 // bit (1 means that the string continues, 0 means that the string ends). The
41 // actual user data is shifted into the top 7 bits of every encoded byte. This
42 // results in a data inflation rate of 12.5%, but this overhead is constant
43 // (doesn't vary by the encoded content). Note that if space efficiency is very
44 // important and you are storing large strings on average, you could reduce the
45 // overhead by only placing the stop bit on every other byte or every 4th byte,
46 // etc. This would reduce the overhead to 6.25% or 3.125% accordingly (but would
47 // cause every string to round out to 2 or 4 byte chunks), and it would make
48 // the algorithm implementation more complex. The current implementation was
49 // chosen as good enough in light of the fact that pre-compressing regular data
50 // could save more than 12.5% overall, and that for incompressable data a
51 // commonly used encoding scheme (base64) has a full 25% overhead (and a
52 // generally more complex implementation).
53 //
54 // * Floats
55 // Floats are tricky (really tricky) because they have lots of weird
56 // non-sortable special cases (like NaN). That said, for the majority of
57 // non-weird cases, the implementation here will sort real numbers the way that
58 // you would expect.
59 //
60 // The implementation is derived from http://stereopsis.com/radix.html, and full
61 // credit for the original algorithm goes to Michael Herf. The algorithm is
62 // essentially:
63 //
64 // * if the number is positive, flip the top bit
65 // * if the number is negative, flip all the bits
66 //
67 // Floats are not varint encoded, you could varint encode the mantissa
68 // (significand). This is only a 52 bit section, meaning that it is normally
69 // encoded with 6.5 bytes (a nybble is stolen from the second exponent byte).
70 // Assuming you used the numerical encoding above, shifted left by 4 bits,
71 // discarding the sign bit (since its laready the MSb on the float, and then
72 // using 6 bits (instead of 7) to represent the number of significant bits in
73 // the mantissa (since there are only a maximum of 52), you could expect to see
74 // small-mantissa floats (of any characteristic) encoded in 3 bytes (this has
75 // 6 bits of mantissa), and the largest floats would have an encoded size of
76 // 9 bytes (with 2 wasted bits). However the implementation complexity would be
77 // higher.
78 //
79 // The actual encoded values for special cases are (sorted high to low):
80 // * QNaN - 0xFFF8000000000000
81 // // note that golang doesn't seem to actually have SNaN?
82 // * SNaN - 0xFFF0000000000001
83 // * +inf - 0xFFF0000000000000
84 // * MaxFloat64 - 0xFFEFFFFFFFFFFFFF
85 // * SmallestNonzeroFloat64 - 0x8000000000000001
86 // * 0 - 0x8000000000000000
87 // * -0 - 0x7FFFFFFFFFFFFFFF
88 // * -SmallestNonzeroFloat64 - 0x7FFFFFFFFFFFFFFE
89 // * -MaxFloat64 - 0x0010000000000000
90 // * -inf - 0x000FFFFFFFFFFFFF
91 package cmpbin
OLDNEW

Powered by Google App Engine
This is Rietveld 408576698