Thrift Binary protocol encoding

Integer encoding

In the binary protocol integers are encoded with the most significant byte first (big endian byte order, aka network order). An i8 needs 1 byte, an i16 2, an i32 4 and an i64 needs 8 bytes.
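As a minimal sketch of the byte order described above, the following Scala code relies on `java.nio.ByteBuffer`, which uses big endian order by default. The object and method names (`BigEndianInts`, `encodeI16`, and so on) are illustrative assumptions, not part of any Thrift library.

```scala
import java.nio.ByteBuffer

object BigEndianInts {
  // ByteBuffer defaults to big endian (network) byte order,
  // so the most significant byte is written first.
  def encodeI16(v: Short): Array[Byte] = ByteBuffer.allocate(2).putShort(v).array()
  def encodeI32(v: Int): Array[Byte] = ByteBuffer.allocate(4).putInt(v).array()
  def encodeI64(v: Long): Array[Byte] = ByteBuffer.allocate(8).putLong(v).array()

  // Decoding reads the bytes back in the same order.
  def decodeI32(bytes: Array[Byte]): Int = ByteBuffer.wrap(bytes).getInt()
}
```

For example, `BigEndianInts.encodeI32(258)` yields the four bytes `00 00 01 02`.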

The C++ implementation has an option to use the binary protocol with little endian byte order. Little endian gives a small but noticeable performance boost because most contemporary CPUs use little endian when storing integers to RAM.

Enum encoding

The generated code encodes enums by taking the ordinal value and then encoding that as an i32.

Binary encoding

Binary is sent as follows:

Binary protocol, binary data, 4+ bytes:
+--------+--------+--------+--------+--------+...+--------+
| byte length                       | bytes               |
+--------+--------+--------+--------+--------+...+--------+

Where:

  • byte length is the length of the byte array, a signed 32 bit integer encoded in network (big endian) order (must be >= 0).

  • bytes are the bytes of the byte array.

By default the length is limited to 2147483647; however, some implementations have an option to lower the limit.
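The layout above can be sketched as follows. This is an illustration only: `BinaryField`, `encodeBinary`, `decodeBinary` and the `maxLength` parameter are hypothetical names, and the way the limit is enforced here is an assumption rather than the behavior of a specific implementation.

```scala
import java.nio.ByteBuffer

object BinaryField {
  // 4-byte big endian length followed by the raw bytes.
  def encodeBinary(data: Array[Byte]): Array[Byte] =
    ByteBuffer.allocate(4 + data.length).putInt(data.length).put(data).array()

  // Reads the length prefix, rejects negative or oversized lengths,
  // then copies that many bytes out of the buffer.
  def decodeBinary(buf: ByteBuffer, maxLength: Int = Int.MaxValue): Array[Byte] = {
    val length = buf.getInt()
    require(length >= 0 && length <= maxLength, s"invalid binary length: $length")
    require(length <= buf.remaining(), "truncated binary value")
    val out = new Array[Byte](length)
    buf.get(out)
    out
  }
}
```

Checking the length before allocating keeps a malformed or hostile length prefix from triggering a very large allocation.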

String encoding

Strings are first encoded to UTF-8, and then sent as binary.
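A minimal sketch of this two-step encoding, assuming a hypothetical helper named `encodeString`. Note that the length prefix counts UTF-8 bytes, not characters.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object StringField {
  // Step 1: convert the string to UTF-8 bytes.
  // Step 2: send those bytes as binary (4-byte big endian length, then the bytes).
  def encodeString(s: String): Array[Byte] = {
    val utf8 = s.getBytes(StandardCharsets.UTF_8)
    ByteBuffer.allocate(4 + utf8.length).putInt(utf8.length).put(utf8).array()
  }
}
```

For example, `StringField.encodeString("hi")` yields `00 00 00 02 68 69`.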

Double encoding

Values of type double are first converted to an i64 according to the IEEE 754 floating-point "double format" bit layout. Most run-times provide primitives for the conversion. The i64 is encoded using 8 bytes in big endian order.

The following Scala code shows the JVM primitives that convert a double to an i64 and back:

def doubleToI64(d: Double): Long = java.lang.Double.doubleToLongBits(d)
def i64ToDouble(l: Long): Double = java.lang.Double.longBitsToDouble(l)
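Building on the primitives above, the full wire encoding of a double can be sketched like this. `DoubleField`, `encodeDouble` and `decodeDouble` are illustrative names assumed for this example.

```scala
import java.nio.ByteBuffer

object DoubleField {
  // Convert to the IEEE 754 bit pattern, then write it as a big endian i64.
  def encodeDouble(d: Double): Array[Byte] =
    ByteBuffer.allocate(8).putLong(java.lang.Double.doubleToLongBits(d)).array()

  // Read the big endian i64 and reinterpret its bits as a double.
  def decodeDouble(bytes: Array[Byte]): Double =
    java.lang.Double.longBitsToDouble(ByteBuffer.wrap(bytes).getLong())
}
```

For example, `DoubleField.encodeDouble(1.0)` yields `3F F0 00 00 00 00 00 00`.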

Boolean encoding

Values of bool type are first converted to an i8. True is converted to 1, false to 0.
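A short sketch of the conversion, with hypothetical names. Treating any non-zero byte as true on decoding is an assumption of this example; the text above only defines the values 1 and 0.

```scala
object BoolField {
  // True is sent as the i8 value 1, false as 0.
  def encodeBool(b: Boolean): Byte = if (b) 1.toByte else 0.toByte

  // Assumption: any non-zero byte is read back as true.
  def decodeBool(v: Byte): Boolean = v != 0
}
```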

Message encoding

A Message can be encoded in two different ways, the modern 'strict encoding', or the nameless old encoding.

Binary protocol Message, strict encoding, 12+ bytes:
+--------+--------+--------+--------+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+
|1vvvvvvv|vvvvvvvv|unused  |00000mmm| name length                       | name                | seq id                            |
+--------+--------+--------+--------+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+

Where:

  • vvvvvvvvvvvvvvv is the version, an unsigned 15 bit number fixed to 1 (in binary: 000 0000 0000 0001). The leading bit is 1.

  • unused is an ignored byte.

  • mmm is the message type, an unsigned 3 bit integer. The 5 leading bits must be 0 because some clients (checked for Java in 0.9.1) read the whole byte.

  • name length is the byte length of the name field, a signed 32 bit integer encoded in network (big endian) order (must be >= 0).

  • name is the method name, a UTF-8 encoded string.

  • seq id is the sequence id, a signed 32 bit integer encoded in network (big endian) order.

The second, older encoding (aka non-strict) is:

Binary protocol Message, old encoding, 9+ bytes:
+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+--------+
| name length                       | name                |00000mmm| seq id                            |
+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+--------+

Where name length, name, mmm, seq id are as above.

Because name length must be non-negative, the first bit of an old-format message is always 0, whereas a strict message always starts with a 1 bit. The first bit therefore tells the receiver whether the strict format or the old format is used, so a server and a client using different variants of the binary protocol can transparently talk with each other. However, when strict mode is enforced, the old format is rejected.
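The format detection can be sketched as follows. `MessageFormatProbe`, `isStrict` and `strictVersion` are hypothetical names; the sketch assumes the receiver already has at least the first four bytes of the message.

```scala
import java.nio.ByteBuffer

object MessageFormatProbe {
  // Reads the first four bytes as a big endian i32. In the strict format this
  // word has its top bit set, so it is negative as a signed integer. In the old
  // format it is the name length, which is never negative.
  def isStrict(message: Array[Byte]): Boolean = {
    require(message.length >= 4, "need at least 4 bytes")
    ByteBuffer.wrap(message).getInt() < 0
  }

  // Extracts the 15 bit version from the first word of a strict message.
  def strictVersion(message: Array[Byte]): Int = {
    require(message.length >= 4, "need at least 4 bytes")
    (ByteBuffer.wrap(message).getInt() >>> 16) & 0x7fff
  }
}
```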

Message types are encoded with the following values:

  • Call: 1

  • Reply: 2

  • Exception: 3

  • Oneway: 4
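To make the strict layout concrete, here is a minimal sketch of a header writer. The names `StrictMessageHeader` and `encode` are assumptions for illustration, and the unused byte is written as 0 here (the text above only says it is ignored).

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}
import java.nio.charset.StandardCharsets

object StrictMessageHeader {
  val Call = 1
  val Reply = 2
  val Exception = 3
  val Oneway = 4

  def encode(name: String, messageType: Int, seqId: Int): Array[Byte] = {
    require(messageType >= Call && messageType <= Oneway, "unknown message type")
    val nameBytes = name.getBytes(StandardCharsets.UTF_8)
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes) // DataOutputStream writes big endian
    // First word: leading 1 bit, version 1, unused byte 0, message type in the low 3 bits.
    out.writeInt(0x80010000 | messageType)
    out.writeInt(nameBytes.length) // name length
    out.write(nameBytes)           // name as UTF-8
    out.writeInt(seqId)            // sequence id
    out.flush()
    bytes.toByteArray()
  }
}
```

For example, a call to a method named `ping` with sequence id 7 starts with `80 01 00 01`, followed by `00 00 00 04`, the four name bytes, and `00 00 00 07`.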

Struct encoding

In the binary protocol field headers and the stop field are encoded as follows:

Binary protocol field header and field value:
+--------+--------+--------+--------+...+--------+
|tttttttt| field id        | field value         |
+--------+--------+--------+--------+...+--------+

Binary protocol stop field:
+--------+
|00000000|
+--------+

Where:

  • tttttttt is the field-type, a signed 8 bit integer.

  • field id is the field-id, a signed 16 bit integer in big endian order.

  • field value is the encoded field value.

The following field-types are used:

  • bool, encoded as 2

  • byte, encoded as 3

  • double, encoded as 4

  • i16, encoded as 6

  • i32, encoded as 8

  • i64, encoded as 10

  • string, used for binary and string fields, encoded as 11

  • struct, used for structs and union fields, encoded as 12

  • map, encoded as 13

  • set, encoded as 14

  • list, encoded as 15
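As an illustration of field headers and the stop field, the sketch below encodes a hypothetical struct with two fields, `1: i32 id` and `2: string name`. The struct, the object name `StructExample` and the constant names are assumptions made for this example.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}
import java.nio.charset.StandardCharsets

object StructExample {
  val TypeStop: Byte = 0
  val TypeI32: Byte = 8
  val TypeString: Byte = 11

  // Encodes the hypothetical struct { 1: i32 id, 2: string name }.
  def encode(id: Int, name: String): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes) // big endian writes

    // Field 1: type byte, 16 bit field id, then the i32 value.
    out.writeByte(TypeI32)
    out.writeShort(1)
    out.writeInt(id)

    // Field 2: type byte, 16 bit field id, then the string as length-prefixed UTF-8.
    val utf8 = name.getBytes(StandardCharsets.UTF_8)
    out.writeByte(TypeString)
    out.writeShort(2)
    out.writeInt(utf8.length)
    out.write(utf8)

    // A single zero byte terminates the struct.
    out.writeByte(TypeStop)
    out.flush()
    bytes.toByteArray()
  }
}
```

For example, `StructExample.encode(42, "a")` yields `08 00 01 00 00 00 2A`, then `0B 00 02 00 00 00 01 61`, then the stop byte `00`.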

List and Set

Lists and sets are encoded the same way: a header indicating the size and the element-type of the elements, followed by the encoded elements.

Binary protocol list (5+ bytes) and elements:
+--------+--------+--------+--------+--------+--------+...+--------+
|tttttttt| size                              | elements            |
+--------+--------+--------+--------+--------+--------+...+--------+

Where:

  • tttttttt is the element-type, encoded as an i8

  • size is the size, encoded as an i32, non-negative values only

  • elements are the encoded element values

The element-type values are the same as field-types. The full list is included in the struct section above.

The maximum list/set size is configurable. By default there is no limit (meaning the limit is the maximum i32 value: 2147483647).
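A minimal sketch of the list header and elements for a list of i32 values. The names `ListExample` and `encodeI32List` are hypothetical; a set of i32 values would produce the same bytes.

```scala
import java.nio.ByteBuffer

object ListExample {
  val TypeI32: Byte = 8

  def encodeI32List(values: Seq[Int]): Array[Byte] = {
    // 1 byte element-type + 4 byte size + 4 bytes per element.
    val buf = ByteBuffer.allocate(1 + 4 + 4 * values.size)
    buf.put(TypeI32).putInt(values.size)
    values.foreach(v => buf.putInt(v))
    buf.array()
  }
}
```

For example, `ListExample.encodeI32List(Seq(1, 2))` yields `08 00 00 00 02 00 00 00 01 00 00 00 02`, and an empty list yields the 5 byte header `08 00 00 00 00`.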

Map

Maps are encoded with a header indicating the element-type of the keys, the element-type of the values and the size, followed by the encoded key value pairs. The encoding follows this BNF:

map  ::=  key-element-type value-element-type size ( key value )*

Binary protocol map (6+ bytes) and key value pairs:
+--------+--------+--------+--------+--------+--------+--------+...+--------+
|kkkkkkkk|vvvvvvvv| size                              | key value pairs     |
+--------+--------+--------+--------+--------+--------+--------+...+--------+

Where:

  • kkkkkkkk is the key element-type, encoded as an i8

  • vvvvvvvv is the value element-type, encoded as an i8

  • size is the size of the map, encoded as an i32, non-negative values only

  • key value pairs are the encoded keys and values

The element-type values are the same as field-types. The full list is included in the struct section above.

The maximum map size is configurable. By default there is no limit (meaning the limit is the maximum i32 value: 2147483647).
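The map layout can be sketched for a map from string to i32. `MapExample` and `encode` are illustrative names; the entries are passed as an ordered sequence of pairs only so that the output of the example is deterministic.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}
import java.nio.charset.StandardCharsets

object MapExample {
  val TypeI32: Byte = 8
  val TypeString: Byte = 11

  def encode(entries: Seq[(String, Int)]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes) // big endian writes

    // Header: key element-type, value element-type, then the size as an i32.
    out.writeByte(TypeString)
    out.writeByte(TypeI32)
    out.writeInt(entries.size)

    // Each pair is the encoded key immediately followed by the encoded value.
    for ((key, value) <- entries) {
      val utf8 = key.getBytes(StandardCharsets.UTF_8)
      out.writeInt(utf8.length)
      out.write(utf8)
      out.writeInt(value)
    }
    out.flush()
    bytes.toByteArray()
  }
}
```

For example, `MapExample.encode(Seq("a" -> 1))` yields the header `0B 08 00 00 00 01`, followed by the key `00 00 00 01 61` and the value `00 00 00 01`.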