Binary encoding of GeoGig objects

This is the format used for internal storage in the GeoGig object store.

Conventions

Formats are specified using a modified Backus-Naur notation. Definitions generally take the form:

<structure> := part1 part2 part3

This indicates that the structure has three parts. Each part can be:

  • Another structure, referenced by name.

  • One of these specially defined structures:

    NUL       := 0x00 (ASCII NUL character)
    SP        := 0x20 (ASCII space character)
    BR        := 0x0a (ASCII newline character)
    <rev>     := <byte>* (exactly 20 bytes)
    <utf8>    := <int16> <byte>* (two-byte count followed by the number
                                  of bytes indicated by the count. The
                                  count is read as an unsigned value.
                                  The bytes should then be decoded as
                                  modified UTF-8, as seen in the
                                  readUTF and writeUTF methods in the
                                  java.io.DataInputStream and
                                  java.io.DataOutputStream classes
                                  in the Java Standard Library.)
    <byte>    := (8 bit byte)
    <int16>   := (16 bit signed integer, "short" in Java)
    <int32>   := (32 bit signed integer, "int" in Java)
    <int64>   := (64 bit signed integer, "long" in Java)
    <float32> := (32 bit IEEE floating point value, "float" in Java)
    <float64> := (64 bit IEEE floating point value, "double" in Java)
    
  • A literal byte sequence. These are generally used as markers and are represented as text in double quotes (""). Markers are always limited to printable ASCII characters and are encoded as ASCII, one byte per character.

  • A literal byte, specified as a hexadecimal string (for example, 0xFF).

  • Any of the above, suffixed by a modifier:

    • An asterisk (*) to indicate 0 or more repetitions
    • A number in brackets ([]) to indicate a specific number of repetitions.
  • Comments sometimes appear to clarify the intent of certain structures. These will be enclosed in parentheses (()).
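
The primitive structures above map onto a handful of small reader functions. The following is a minimal sketch in Python, not GeoGig's own code. It assumes that multi-byte values are big-endian (the byte order java.io.DataOutputStream uses; the text above only names that class for <utf8>) and that the two-byte <utf8> count is read as unsigned, as Java's readUTF does. All function names are hypothetical.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_number(stream, code):
    """Read one big-endian value given a struct format character."""
    fmt = ">" + code  # assumption: big-endian, as Java's DataOutput writes
    return struct.unpack(fmt, read_exact(stream, struct.calcsize(fmt)))[0]


def read_int16(stream):
    return read_number(stream, "h")


def read_int32(stream):
    return read_number(stream, "i")


def read_int64(stream):
    return read_number(stream, "q")


def read_float64(stream):
    return read_number(stream, "d")


def read_rev(stream):
    """Read a <rev>: exactly 20 raw bytes."""
    return read_exact(stream, 20)


def read_utf8(stream):
    """Read a <utf8>: 2-byte count, then that many modified UTF-8 bytes."""
    count = read_number(stream, "H")
    # Modified UTF-8 encodes U+0000 as the two bytes C0 80.
    raw = read_exact(stream, count).replace(b"\xc0\x80", b"\x00")
    # Supplementary characters arrive as two 3-byte surrogates; decode them
    # individually, then let a UTF-16 round trip join each pair.
    chars = raw.decode("utf-8", "surrogatepass")
    return chars.encode("utf-16", "surrogatepass").decode("utf-16")


def read_header(stream, name):
    """Consume a literal ASCII marker followed by NUL, e.g. "tag" NUL."""
    expected = name.encode("ascii") + b"\x00"
    if read_exact(stream, len(expected)) != expected:
        raise ValueError(f"missing {name!r} header")


# Usage on hand-built bytes: a marker, an <int32>, and a <utf8>.
buf = io.BytesIO(b"tag\x00" + struct.pack(">i", -7) + b"\x00\x02hi")
read_header(buf, "tag")
number = read_int32(buf)
text = read_utf8(buf)
```

Reading from a stream rather than slicing a byte string keeps each reader independent of how much input the previous one consumed, which mirrors how the grammar composes structures.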

Commit

commit := commitHeader treeRef parent* authorLine committerLine message
commitHeader := "commit" NUL
treeRef := 0x01 <rev>
parent  := 0x02 <rev>
authorLine := 0x03 person
committerLine := 0x04 person
person := name email timestamp tzOffset
name := <utf8>
email := <utf8>
timestamp := <int64>
tzOffset := <int32>
message := <utf8>
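
Because parent* is the only repeated part of a commit, a reader can simply loop while the next marker byte is 0x02 and then require 0x03. The sketch below illustrates this in Python under the same assumptions as the rest of this format (big-endian values, unsigned <utf8> count); the function names and the sample values are hypothetical.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_utf8(stream):
    """Read a <utf8>: 2-byte count, then modified UTF-8 bytes."""
    (count,) = struct.unpack(">H", read_exact(stream, 2))
    raw = read_exact(stream, count).replace(b"\xc0\x80", b"\x00")
    chars = raw.decode("utf-8", "surrogatepass")
    return chars.encode("utf-16", "surrogatepass").decode("utf-16")


def read_person(stream):
    """person := name email timestamp tzOffset"""
    name = read_utf8(stream)
    email = read_utf8(stream)
    timestamp, tz_offset = struct.unpack(">qi", read_exact(stream, 12))
    return {"name": name, "email": email,
            "timestamp": timestamp, "tz_offset": tz_offset}


def expect(stream, expected, what):
    """Consume a literal byte sequence or fail."""
    if read_exact(stream, len(expected)) != expected:
        raise ValueError(f"expected {what}")


def read_commit(stream):
    expect(stream, b"commit\x00", "commit header")
    expect(stream, b"\x01", "treeRef marker 0x01")
    tree = read_exact(stream, 20)
    parents = []
    marker = read_exact(stream, 1)
    while marker == b"\x02":  # zero or more parents
        parents.append(read_exact(stream, 20))
        marker = read_exact(stream, 1)
    if marker != b"\x03":
        raise ValueError("expected authorLine marker 0x03")
    author = read_person(stream)
    expect(stream, b"\x04", "committerLine marker 0x04")
    committer = read_person(stream)
    message = read_utf8(stream)
    return {"tree": tree, "parents": parents, "author": author,
            "committer": committer, "message": message}


def pack_utf8(text):
    """Sample-data helper: ASCII text needs no modified UTF-8 handling."""
    data = text.encode("ascii")
    return struct.pack(">H", len(data)) + data


# Usage with made-up sample data: one parent, identical author and committer.
person = pack_utf8("Ann") + pack_utf8("ann@example.com") + struct.pack(">qi", 1000, 0)
sample = (b"commit\x00" + b"\x01" + bytes(20) + b"\x02" + b"\x11" * 20
          + b"\x03" + person + b"\x04" + person + pack_utf8("first"))
commit = read_commit(io.BytesIO(sample))
```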

Tree

Note

In representing trees we split the count of tree contents into three fields: features, trees, and buckets. Because of the way GeoGig builds trees, buckets must be zero when either of the other two fields is nonzero.

We should probably document how exactly GeoGig builds trees :)

tree := treeHeader size treeCount features trees buckets
treeHeader := "tree" NUL
size := <int64> (the total [recursive] count of features in this tree)
treeCount := <int32> (in a bucket tree: the number of trees that are
                      direct children of the bucket tree. In a node
                      tree: 0)
features := count node*
trees := count node*
buckets := count bucket*
count := <int32>
node := name objectId metadataId envelope nodeType
name := <utf8>
objectId := <byte>[20]
metadataId := <byte>[20]
envelope := <float64>[4] (minx, maxx, miny, maxy.  Note that this may be
                         (0, -1, 0, -1) as is traditional for indicating
                         NULL envelopes. Of course empty (zero-area)
                         envelopes are valid as well.)
nodeType := <byte> (0x01: Tree, 0x02: Feature)
bucket := index objectId envelope
index := <int32>
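
The three counted lists and the bucket constraint from the note above can be read as follows. This is a rough Python sketch, not GeoGig's implementation. It assumes big-endian values, an unsigned <utf8> count, and a treeHeader of "tree" NUL by analogy with the other object headers. The null-envelope test (max below min) generalizes the (0, -1, 0, -1) convention and is likewise an assumption; all names are hypothetical.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_utf8(stream):
    """Read a <utf8>: 2-byte count, then modified UTF-8 bytes."""
    (count,) = struct.unpack(">H", read_exact(stream, 2))
    raw = read_exact(stream, count).replace(b"\xc0\x80", b"\x00")
    chars = raw.decode("utf-8", "surrogatepass")
    return chars.encode("utf-16", "surrogatepass").decode("utf-16")


def read_envelope(stream):
    """envelope := <float64>[4], in the order minx, maxx, miny, maxy."""
    return struct.unpack(">4d", read_exact(stream, 32))


def is_null_envelope(envelope):
    minx, maxx, miny, maxy = envelope
    return maxx < minx or maxy < miny


def read_node(stream):
    name = read_utf8(stream)
    object_id = read_exact(stream, 20)
    metadata_id = read_exact(stream, 20)
    envelope = read_envelope(stream)
    node_type = read_exact(stream, 1)[0]
    if node_type not in (0x01, 0x02):
        raise ValueError(f"unknown nodeType {node_type:#04x}")
    return {"name": name, "object_id": object_id, "metadata_id": metadata_id,
            "envelope": envelope,
            "type": "tree" if node_type == 0x01 else "feature"}


def read_bucket(stream):
    (index,) = struct.unpack(">i", read_exact(stream, 4))
    return {"index": index, "object_id": read_exact(stream, 20),
            "envelope": read_envelope(stream)}


def read_counted(stream, read_item):
    """Read a count followed by that many items."""
    (count,) = struct.unpack(">i", read_exact(stream, 4))
    return [read_item(stream) for _ in range(count)]


def read_tree(stream):
    if read_exact(stream, 5) != b"tree\x00":  # assumed header, see lead-in
        raise ValueError("missing tree header")
    size, tree_count = struct.unpack(">qi", read_exact(stream, 12))
    features = read_counted(stream, read_node)
    trees = read_counted(stream, read_node)
    buckets = read_counted(stream, read_bucket)
    if buckets and (features or trees):
        raise ValueError("buckets must be empty when features or trees exist")
    return {"size": size, "tree_count": tree_count, "features": features,
            "trees": trees, "buckets": buckets}


# Usage with made-up sample data: a node tree holding one feature node.
node = (struct.pack(">H", 2) + b"f1" + b"\xaa" * 20 + b"\xbb" * 20
        + struct.pack(">4d", 0.0, -1.0, 0.0, -1.0) + b"\x02")
sample = (b"tree\x00" + struct.pack(">qi", 1, 0)
          + struct.pack(">i", 1) + node
          + struct.pack(">i", 0) + struct.pack(">i", 0))
tree = read_tree(io.BytesIO(sample))
```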

Feature

feature := featureHeader count fields
featureHeader := "feature" NUL
count := <int32>
fields := field*
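field := nullField |
         booleanField | byteField | shortField | intField | longField | floatField | doubleField | stringField |
         booleanArray | byteArray | shortArray | intArray | longArray | floatArray | doubleArray | stringArray |
         pointField | lineStringField | polygonField | multiPointField | multiLineStringField | multiPolygonField |
         geometryCollectionField | geometryField | uuidField | bigIntField | bigDecimalField |
         datetimeField | dateField | timeField | timestampField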
nullField               := 0x00
booleanField            := 0x01 <byte>
byteField               := 0x02 <byte>
shortField              := 0x03 <int16>
intField                := 0x04 <int32>
longField               := 0x05 <int64>
floatField              := 0x06 <float32>
doubleField             := 0x07 <float64>
stringField             := 0x08 <utf8>
booleanArray            := 0x09 <int32> <byte>* (the int is the number of boolean values. Booleans are packed one per bit to save space, so the number of bytes is the count divided by 8, rounded up)
byteArray               := 0x0A <int32> <byte>*
shortArray              := 0x0B <int32> <int16>*
intArray                := 0x0C <int32> <int32>*
longArray               := 0x0D <int32> <int64>*
floatArray              := 0x0E <int32> <float32>*
doubleArray             := 0x0F <int32> <float64>*
stringArray             := 0x10 <int32> <utf8>*
pointField              := 0x11 <int32> <byte>* (bytes represent the geometry encoded as Well-Known Binary)
lineStringField         := 0x12 <int32> <byte>* (same)
polygonField            := 0x13 <int32> <byte>* (same)
multiPointField         := 0x14 <int32> <byte>* (same)
multiLineStringField    := 0x15 <int32> <byte>* (same)
multiPolygonField       := 0x16 <int32> <byte>* (same)
geometryCollectionField := 0x17 <int32> <byte>* (same)
geometryField           := 0x18 <int32> <byte>* (same)
uuidField               := 0x19 <int64> <int64>
bigIntField             := 0x1A <int32> <byte>*
bigDecimalField         := 0x1B <int32> <int32> <byte>* (scale, length of byte array, byte array)
datetimeField           := 0x1C <int64> (milliseconds since unix epoch)
dateField               := 0x1D <int64> (datetime with hours, minutes, seconds, milliseconds all set to zero)
timeField               := 0x1E <int64> (datetime with years, months, days all set to zero, i.e., a time on Jan 1 1970)
timestampField          := 0x1F <int64> <int32> (datetime followed by a specifier of nanoseconds within the millisecond)
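
Since every field starts with a one-byte type tag, a reader can dispatch on that byte. The Python sketch below covers only a subset of the tags and makes several assumptions that the grammar does not state: values are big-endian, the feature's count is the number of fields, byteField is signed (as Java's byte is), and the <int32> in bigIntField is a byte length. Packed booleans are returned undecoded because the bit order is not specified here. All names are hypothetical.

```python
import io
import struct

# Fixed-width fields: tag -> struct format character.
SCALARS = {0x01: "?", 0x02: "b", 0x03: "h", 0x04: "i", 0x05: "q",
           0x06: "f", 0x07: "d", 0x1C: "q", 0x1D: "q", 0x1E: "q"}
# Counted arrays of fixed-width elements.
ARRAYS = {0x0B: "h", 0x0C: "i", 0x0D: "q", 0x0E: "f", 0x0F: "d"}
# Length-prefixed raw bytes: byteArray, geometries (WKB), bigIntField.
BLOBS = {0x0A, 0x1A} | set(range(0x11, 0x19))


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_utf8(stream):
    """Read a <utf8>: 2-byte count, then modified UTF-8 bytes."""
    (count,) = struct.unpack(">H", read_exact(stream, 2))
    raw = read_exact(stream, count).replace(b"\xc0\x80", b"\x00")
    chars = raw.decode("utf-8", "surrogatepass")
    return chars.encode("utf-16", "surrogatepass").decode("utf-16")


def read_field(stream):
    """Return (tag, value) for one field."""
    tag = read_exact(stream, 1)[0]
    if tag == 0x00:
        return tag, None
    if tag in SCALARS:
        fmt = ">" + SCALARS[tag]
        return tag, struct.unpack(fmt, read_exact(stream, struct.calcsize(fmt)))[0]
    if tag == 0x08:
        return tag, read_utf8(stream)
    if tag != 0x09 and tag not in ARRAYS and tag not in BLOBS:
        raise NotImplementedError(f"field tag {tag:#04x} not covered by this sketch")
    (count,) = struct.unpack(">i", read_exact(stream, 4))
    if tag == 0x09:
        # count booleans occupy ceil(count / 8) bytes; bits left packed.
        return tag, (count, read_exact(stream, (count + 7) // 8))
    if tag in ARRAYS:
        fmt = f">{count}{ARRAYS[tag]}"
        return tag, list(struct.unpack(fmt, read_exact(stream, struct.calcsize(fmt))))
    return tag, read_exact(stream, count)


def read_feature(stream):
    if read_exact(stream, 8) != b"feature\x00":
        raise ValueError("missing feature header")
    (count,) = struct.unpack(">i", read_exact(stream, 4))
    return [read_field(stream) for _ in range(count)]


# Usage with made-up sample data: int, string, null, 10 booleans, 2 doubles.
sample = (b"feature\x00" + struct.pack(">i", 5)
          + b"\x04" + struct.pack(">i", 42)
          + b"\x08" + struct.pack(">H", 3) + b"abc"
          + b"\x00"
          + b"\x09" + struct.pack(">i", 10) + b"\xff\x03"
          + b"\x0f" + struct.pack(">i", 2) + struct.pack(">2d", 1.5, 2.5))
fields = read_feature(io.BytesIO(sample))
```

Geometry values come back as raw Well-Known Binary, so any WKB parser can be applied afterwards without this reader needing to understand geometries.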

FeatureType

featureType := featureTypeHeader name properties
featureTypeHeader := "featuretype" NUL
name := namespace localPart
namespace := <utf8>
localPart := <utf8>
properties := <int32> property*
property := name nillability minOccurs maxOccurs type
nillability := <byte> (0: non-nillable, 1: nillable. other values unused.)
minOccurs := <int32>
maxOccurs := <int32>
type := spatialType | aspatialType
aspatialType := name typeTag (aspatial types are distinguished from
                              spatial ones by the value of the type tag)
typeTag := <byte> (as used in features)
spatialType := name typeTag crsTextInterpretation crsText
crsTextInterpretation := <byte> (0: crsText is WKT CRS definition,
                                 1: crsText references a well-known CRS by
                                 identifier. If it uses URI notation
                                 ("urn:...") then the axes should be
                                 forced to X=Easting, Y=Northing order.)
crsText := <utf8> (as determined by crsTextInterpretation)
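
The spatial and aspatial cases differ only in whether CRS information follows the type tag. A minimal Python sketch of that branch is shown below. It assumes that the spatial tags are exactly the geometry tags 0x11 through 0x18 from the feature encoding, which the grammar implies but does not state, along with big-endian values and an unsigned <utf8> count. Function names, type names, and sample values are all hypothetical.

```python
import io
import struct

GEOMETRY_TAGS = range(0x11, 0x19)  # assumption: these mark spatial types


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_utf8(stream):
    """Read a <utf8>: 2-byte count, then modified UTF-8 bytes."""
    (count,) = struct.unpack(">H", read_exact(stream, 2))
    raw = read_exact(stream, count).replace(b"\xc0\x80", b"\x00")
    chars = raw.decode("utf-8", "surrogatepass")
    return chars.encode("utf-16", "surrogatepass").decode("utf-16")


def read_name(stream):
    """name := namespace localPart"""
    namespace = read_utf8(stream)
    local_part = read_utf8(stream)
    return namespace, local_part


def read_property(stream):
    name = read_name(stream)
    nillable = read_exact(stream, 1) == b"\x01"
    min_occurs, max_occurs = struct.unpack(">ii", read_exact(stream, 8))
    type_name = read_name(stream)
    type_tag = read_exact(stream, 1)[0]
    prop = {"name": name, "nillable": nillable, "min_occurs": min_occurs,
            "max_occurs": max_occurs, "type_name": type_name,
            "type_tag": type_tag}
    if type_tag in GEOMETRY_TAGS:
        # 0: crsText is a WKT definition, 1: crsText is an identifier.
        prop["crs_is_identifier"] = read_exact(stream, 1) == b"\x01"
        prop["crs_text"] = read_utf8(stream)
    return prop


def read_feature_type(stream):
    if read_exact(stream, 12) != b"featuretype\x00":
        raise ValueError("missing featuretype header")
    name = read_name(stream)
    (count,) = struct.unpack(">i", read_exact(stream, 4))
    return {"name": name,
            "properties": [read_property(stream) for _ in range(count)]}


def pack_utf8(text):
    """Sample-data helper: ASCII text needs no modified UTF-8 handling."""
    data = text.encode("ascii")
    return struct.pack(">H", len(data)) + data


# Usage with made-up sample data: one aspatial and one spatial property.
prop_id = (pack_utf8("") + pack_utf8("id") + b"\x00" + struct.pack(">ii", 1, 1)
           + pack_utf8("") + pack_utf8("Integer") + b"\x04")
prop_geom = (pack_utf8("") + pack_utf8("geom") + b"\x01" + struct.pack(">ii", 0, 1)
             + pack_utf8("") + pack_utf8("LineString") + b"\x12"
             + b"\x01" + pack_utf8("EPSG:4326"))
sample = (b"featuretype\x00" + pack_utf8("http://example.com") + pack_utf8("roads")
          + struct.pack(">i", 2) + prop_id + prop_geom)
feature_type = read_feature_type(io.BytesIO(sample))
```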

Tag

tag := tagHeader objectId tagName message tagger
tagHeader := "tag" NUL
objectId := <byte>[20]
tagName := <utf8>
message := <utf8>
tagger := name email timestamp tzOffset
name := <utf8>
email := <utf8>
timestamp := <int64>
tzOffset := <int32>
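
A tag has no marker bytes between its parts, so it can be read strictly in sequence. The following Python sketch shows this under the same assumptions as elsewhere in this format (big-endian values, unsigned <utf8> count); the function names and sample values are hypothetical.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_utf8(stream):
    """Read a <utf8>: 2-byte count, then modified UTF-8 bytes."""
    (count,) = struct.unpack(">H", read_exact(stream, 2))
    raw = read_exact(stream, count).replace(b"\xc0\x80", b"\x00")
    chars = raw.decode("utf-8", "surrogatepass")
    return chars.encode("utf-16", "surrogatepass").decode("utf-16")


def read_tag(stream):
    """tag := tagHeader objectId tagName message tagger"""
    if read_exact(stream, 4) != b"tag\x00":
        raise ValueError("missing tag header")
    object_id = read_exact(stream, 20)
    tag_name = read_utf8(stream)
    message = read_utf8(stream)
    # tagger := name email timestamp tzOffset
    name = read_utf8(stream)
    email = read_utf8(stream)
    timestamp, tz_offset = struct.unpack(">qi", read_exact(stream, 12))
    return {"object_id": object_id, "tag_name": tag_name, "message": message,
            "tagger": {"name": name, "email": email,
                       "timestamp": timestamp, "tz_offset": tz_offset}}


def pack_utf8(text):
    """Sample-data helper: ASCII text needs no modified UTF-8 handling."""
    data = text.encode("ascii")
    return struct.pack(">H", len(data)) + data


# Usage with made-up sample data.
sample = (b"tag\x00" + b"\x22" * 20 + pack_utf8("v1.0") + pack_utf8("release")
          + pack_utf8("Ann") + pack_utf8("ann@example.com")
          + struct.pack(">qi", 2000, -60))
tag = read_tag(io.BytesIO(sample))
```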
