1.0.0

Specification for dr4 version 0.0.1

Notes

  • This is the first fully-fledged release version for dr4
  • This version expands the selection of types to include sequence types, time, and pairs.

Document

A dr4 document shall be defined as a sequence of bytes, that includes a document header, document body, and a termination sequence.

Document Header

A dr4 document header shall be defined as a sequence of eight unsigned 8-bit integers at the beginning of the document. These are designated as follows:

  • Bytes 1-3 are the magic sequence: 83, 94 and 121.
  • Bytes 4-6 are the version bytes.
  • Byte 7 is the sizer byte for varieties.
  • Byte 8 is reserved for future use.

Note: This version of the dr4 spec is non-standard. It does not enforce the presence of a version in the beginning of a document, nor a size variety.

Document Body

A dr4 document body shall be defined as a series of data rows, terminated by a 4-byte sequence, all of which have the value 0.

  • A document body is always terminated by 0 0 0 0, also read as \x00\x00\x00\x00

Rows

A dr4 row shall be defined as a sequence of bytes, which includes a header, body, and stop byte.

Row Header

A dr4 row header shall be defined as sequence of either unsigned 8-bit, 16-bit, or 32-bit integers, which represent the size, length, and offsets of the row. The header values are defined as follows:

  • Bytes 1-4 of the header are an unsigned (8, 16, or 32-bit) integer holding the row size.
  • Bytes 5-8 of the header are an unsigned (8, 16, or 32-bit) integer holding the row length.
  • The row length shall never be greater than the row size.
  • The row length shall never be zero.
  • The row size shall never be zero.
  • Bytes 9-f of the header are a sequence of unsigned (8, 16, or 32-bit) integers.

The offset array of the row header's end byte, f, is derived by the following formulas:

8-bit f = 2 + length - 1 16-bit f = 4 + (length * 2) - 1 32-bit f = 8 + (length * 4) - 1

The offset of a row's field is defined as the first byte of the field at an offset of n bytes from byte 0 of the row's body. The rules of the offsets held within a row header's offset array are defined as:

  • An offset value is never greater than the size of the row.
  • No two offset value's can be the same within the same row.
  • The length of the row and the size of the offset array must always be equal.
  • The first offset in the offset array is always 0.
  • Each row must always have at least one offset.

Row Body

A dr4 row body shall be defined as a sequence of bytes, that is terminated by a stopping byte. The stopping byte is an unsigned 8-bit integer with a value of 0. The contents within the row body correspond to the typed fields which hold the data of the dr4 format.

Types

A type shall be defined as a unique, 8-bit unsigned integer with indicates the type of data stored within a field. A type mark is defined as the first byte of each type, which indicates the type.


Notation

The binary notation used adheres to the format:

\x01 == 1 == 00000001
 |      |       |
 fmt    dec    binary 

The max value for a byte is \x255 and the minimum is \x00.

Type Listings

The types supported by this specification are as follows:


 STOP                     \x00

 . simple types .
......................................

 NONE                     \x01
 BOOL                     \x02
 UI08                     \x03
 UI16                     \x04
 UI32                     \x05
 UI64                     \x06
 SI08                     \x07
 SI16                     \x08
 SI32                     \x09
 SI64                     \x10
 SGFN                     \x11
 DBFN                     \x12

 . time type .
 ......................................
 UNXT (unix-time)         \x13

 . sequence types .
 ......................................

 CSTR (C-string)          \x14
 RAWB (raw-bytes)         \x15

 . structure types .
 ......................................
 PAIR                     \x16

Note : FN refers to floating-point number.

STOP

The stoppage byte. This byte is always present at the end of a row body, and thus, the entire row itself.

BOOL

The bool type is a data type with a size of 2 bytes. The first byte is the bool type byte, \x02, while the second byte holds the state. \x01 corresponds to the true state, and \x00 for the false state.

UI08

The UI08 type is an 8-bit unsigned integer. The first byte is the UI08 type byte, the second byte is the value between 0, 255.

UI16

The UI16 type is a 16-bit unsigned integer. The first byte is the UI16 type byte, the remaining bytes represent the value between 0, 65535

UI32

The UI32 type is a 32-bit unsigned integer. The first byte is the UI32 type byte, the remaining bytes represent the value between 0, 4294967295

UI64

The UI64 type is a 64-bit unsigned integer. The first byte is the UI64 type byte, the remaining bytes represent the value between 0, 18,446,744,073,709,551,615

SI08

The SI08 type is an 8-bit signed integer. The first byte is the SI08 type byte, the second byte is the value between -127 , 127.

SI16

The SI16 type is an 16-bit signed integer. The first byte is the SI16 type byte, the remaining bytes represent the value between -32767 , 32767.

SI32

The SI32 type is a 32-bit signed integer. The first byte is the SI32 type byte, the remaining bytes represent the value between 0, 4294967295

SI64

The SI64 type is a 64-bit signed integer. The first byte is the SI64 type byte, the remaining bytes represent the value between −9,223,372,036,854,775,807, +9,223,372,036,854,775,807

SGFN

Single precision floating point number type. Equivalent to a float data type in the C language. The first byte is the SGFN type byte, the remaining bytes represent the float itself.

DBFN

Double precision floating point number type. Equivalent to a double data type in the C language. The first byte is the DBFN type byte, the remaining bytes represent the float itself.

UNXT

A 64-bit signed integer representing a UTC unix time stamp, of the number of seconds since January 1st, 1970. The first byte is the UNXT type byte, the remaining bytes represent the 64-bit

CSTR

A type representing the equivalent of a const char * string of the C-language. Consists of a sequence of signed 8-bit integers to store the characters of the string, and then a '\0' terminating character.

Structure:

\x14    \x120\x101\x45\x43 \x00
  |             |           |
 type         text         null

RAWB

A type representing a sequence of raw binary data, 8-bit unsigned integers. Consists of a 32-bit unsigned integer to store the size, then a sequence of 8-bit unsigned, equivalent to unsigned char in the C-language.

Structure:

\x15 \x04\x00\x00\x00 \x203\x161\x45\x43 
 |           |                  |       
 type      size               bytes        

PAIR

A type representing a pairing of two other dr4 types. However,

  1. Either of a pair's values cannot be another pair.
  2. The values of a pair can never be empty.

Structure:

\x16 <type 1>     <type 2>
        |           |
     -------     ---------
    |       |    |        |
  \x15\x46\x00      \x01

Note: The types used inside the pair above are just examples, pairs can contain any type except for another pair.