common/cmpbin/doc.go - Issue 1219323002: Rename funnybase and add more serializations.

Unified Diff: common/cmpbin/doc.go

Issue 1219323002: Rename funnybase and add more serializations. (Closed) Base URL: https://chromium.googlesource.com/external/github.com/luci/luci-go@master

Patch Set: don't need goconvey file Created 5 years, 6 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View side-by-side diff with in-line comments

Download patch

Index: common/cmpbin/doc.go

diff --git a/common/cmpbin/doc.go b/common/cmpbin/doc.go

new file mode 100644

index 0000000000000000000000000000000000000000..6785f8d1332cfa415df70bebc796ee8272f77fc9

--- /dev/null

+++ b/common/cmpbin/doc.go

@@ -0,0 +1,91 @@

+// Use of this source code is governed by a BSD-style license that can be

+// found in the LICENSE file.

+// Package cmpbin provides binary serialization routines which ensure that the

+// serialized objects maintain the same sort order of the original inputs when

+// sorted bytewise (i.e. with memcmp). Additionally, serialized objects are

+// concatenatable.

+//

+// Notes on particular serialization schemes:

+//

+// * Numbers:

+// The number encoding is less efficient on average than Varint

+// ("encoding/binary") for small numbers (it has a minimum encoded size of

+// 2 bytes), but is more efficient for large numbers (it has a maximum encoded

+// size of 9 bytes for a 64 bit int, unlike the largest Varint which has a 10b

+// representation).

+//

+// Both signed and unsigned numbers are encoded with the same scheme, and will

+// sort together as signed numbers. Decoding with the incorrect routine will

+// result in an ErrOverflow/ErrUnderflow error if the actual value is out of

+// range.

+//

+// The scheme works like:

+// * given an 2's compliment value V

M-A Ruel 2015/07/06 23:21:35 I prefer - to * for lists (not a big deal, since i

iannucci 2015/07/07 00:03:27 It's considered an indented block (so it's <pre>..

+// * extract the sign (S) and magnitude (M) of V

+// * Find the position of the highest bit (P), minus 1.

+// * write (bits):

+// * SPPPPPPP MMMMMMMM MM000000

+// * S is 1

+// * P's are the log2(M)-1

+// * M's are the magnitude of V

+// * 0's are padding

+// * Additionally, if the number is negative, invert the bits of all the bytes

+// (e.g. XOR 0xFF). This makes the sign bit S 0 for negative numbers, and

+// makes the ordering of the numbers correct when compared bytewise.

+//

+// * Strings/[]byte

+// Each byte in the encoded stream reserves the least significant bit as a stop

+// bit (1 means that the string continues, 0 means that the string ends). The

+// actual user data is shifted into the top 7 bits of every encoded byte. This

+// results in a data inflation rate of 12.5%, but this overhead is constant

+// (doesn't vary by the encoded content). Note that if space efficiency is very

+// important and you are storing large strings on average, you could reduce the

+// overhead by only placing the stop bit on every other byte or every 4th byte,

+// etc. This would reduce the overhead to 6.25% or 3.125% accordingly (but would

+// cause every string to round out to 2 or 4 byte chunks), and it would make

+// the algorithm implementation more complex. The current implementation was

+// chosen as good enough in light of the fact that pre-compressing regular data

+// could save more than 12.5% overall, and that for incompressable data a

+// commonly used encoding scheme (base64) has a full 25% overhead (and a

+// generally more complex implementation).

+//

+// * Floats

+// Floats are tricky (really tricky) because they have lots of weird

+// non-sortable special cases (like NaN). That said, for the majority of

+// non-weird cases, the implementation here will sort real numbers the way that

+// you would expect.

+//

+// The implementation is derived from http://stereopsis.com/radix.html, and full

+// credit for the original algorithm goes to Michael Herf. The algorithm is

+// essentially:

+//

+// * if the number is positive, flip the top bit

+// * if the number is negative, flip all the bits

+//

+// Floats are not varint encoded, you could varint encode the mantissa

+// (significand). This is only a 52 bit section, meaning that it is normally

+// encoded with 6.5 bytes (a nybble is stolen from the second exponent byte).

+// Assuming you used the numerical encoding above, shifted left by 4 bits,

+// discarding the sign bit (since its laready the MSb on the float, and then

+// using 6 bits (instead of 7) to represent the number of significant bits in

+// the mantissa (since there are only a maximum of 52), you could expect to see

+// small-mantissa floats (of any characteristic) encoded in 3 bytes (this has

+// 6 bits of mantissa), and the largest floats would have an encoded size of

+// 9 bytes (with 2 wasted bits). However the implementation complexity would be

+// higher.

+//

+// The actual encoded values for special cases are (sorted high to low):

+// * QNaN - 0xFFF8000000000000

+// // note that golang doesn't seem to actually have SNaN?

+// * SNaN - 0xFFF0000000000001

+// * +inf - 0xFFF0000000000000

+// * MaxFloat64 - 0xFFEFFFFFFFFFFFFF

+// * SmallestNonzeroFloat64 - 0x8000000000000001

+// * 0 - 0x8000000000000000

+// * -0 - 0x7FFFFFFFFFFFFFFF

+// * -SmallestNonzeroFloat64 - 0x7FFFFFFFFFFFFFFE

+// * -MaxFloat64 - 0x0010000000000000

+// * -inf - 0x000FFFFFFFFFFFFF

+package cmpbin

« common/cmpbin/common_test.go ('K') | « common/cmpbin/common_test.go ('k') | common/cmpbin/float.go » ('j') | common/cmpbin/string.go » ('J')