Binary encoding of GeoGig objects
=================================

This is the format used for internal storage in the GeoGig object store.

Conventions
-----------

Formats are specified using a modified Backus-Naur notation.
Definitions generally take the form::

    := part1 part2 part3

This indicates that the structure has three parts. The parts can be:

* Another structure, referenced by name.
* One of these specially defined structures:

.. code-block:: none

   NUL := 0x00 (ASCII NUL character)
   SP := 0x20 (ASCII space character)
   BR := 0x0a (ASCII newline character)
   := * (exactly 20 bytes)
   := * (two-byte count followed by the number of bytes indicated by
      the count. These should then be decoded as modified UTF-8, as seen
      in the readUTF and writeUTF methods in the java.io.DataInputStream
      and java.io.DataOutputStream classes in the Java Standard Library.)
   := (8 bit byte)
   := (16 bit signed integer, "short" in Java)
   := (32 bit signed integer, "int" in Java)
   := (64 bit signed integer, "long" in Java)
   := (32 bit IEEE floating point value, "float" in Java)
   := (64 bit IEEE floating point value, "double" in Java)

* A literal byte sequence. These are generally used as markers and are
  represented as text in double quotes (`"`). These markers will always be
  constrained to printable ASCII characters and should be encoded as ASCII,
  one byte per character.
* A literal byte, specified as a hexadecimal string (for example, 0xFF).
* Any of the above suffixed by a modifier:

  * An asterisk (`*`) to indicate 0 or more repetitions.
  * A number in brackets (`[]`) to indicate a specific number of repetitions.

* Comments sometimes appear to clarify the intent of certain structures.
  These will be enclosed in parentheses (`()`).

Commit
------

.. code-block:: none

   commit := commitHeader treeRef parent* authorLine committerLine message
   commitHeader := "commit" NUL
   treeRef := 0x01
   parent := 0x02
   authorLine := 0x03 person
   committerLine := 0x04 person
   person := name email timestamp tzOffset
   name :=
   email :=
   timestamp :=
   tzOffset :=
   message :=

Tree
----

.. note::

   In representing trees we split the count of tree contents into three
   fields: features, trees, and buckets. Because of the way GeoGig builds
   trees, buckets must be zero when either of the other two fields is
   nonzero. We should probably document how exactly GeoGig builds trees :)

.. code-block:: none

   tree := treeHeader size treeCount features trees buckets
   size := (the total [recursive] count of features in this tree)
   treeCount := (in a bucket tree: the number of trees that are direct
      children of the bucket tree. In a node tree: 0)
   features := count node*
   trees := count node*
   buckets := count bucket*
   count :=
   node := name objectId metadataId envelope nodeType
   name :=
   objectId := [20]
   metadataId := [20]
   envelope := [4] (minx, maxx, miny, maxy. Note that this may be
      (0, -1, 0, -1) as is traditional for indicating NULL envelopes.
      Of course empty (zero-area) envelopes are valid as well.)
   nodeType := (0x01: Tree, 0x02: Feature)
   bucket := index objectId envelope
   index :=

Feature
-------

.. code-block:: none

   feature := featureHeader count fields
   featureHeader := "feature" NUL
   count :=
   fields := field*
   field := nullField | booleanField | byteField | shortField | intField |
      longField | floatField | doubleField | stringField |
      booleanArray | byteArray | shortArray | intArray | longArray |
      floatArray | doubleArray | stringArray |
      geometryField | uuidField | bigIntField | bigDecimalField
   nullField := 0x00
   booleanField := 0x01
   byteField := 0x02
   shortField := 0x03
   intField := 0x04
   longField := 0x05
   floatField := 0x06
   doubleField := 0x07
   stringField := 0x08
   booleanArray := 0x09 * (note that the int is the number of boolean
      values and booleans are packed to save space, so the number of
      bytes is actually the count of bits divided by 8, rounded up)
   byteArray := 0x0A *
   shortArray := 0x0B *
   intArray := 0x0C *
   longArray := 0x0D *
   floatArray := 0x0E *
   doubleArray := 0x0F *
   stringArray := 0x10
   pointField := 0x11 * (bytes represent the geometry encoded as Well-Known Binary)
   lineStringField := 0x12 * (same)
   polygonField := 0x13 * (same)
   multiPointField := 0x14 * (same)
   multiLineStringField := 0x15 * (same)
   multiPolygonField := 0x16 * (same)
   geometryCollectionField := 0x17 * (same)
   geometryField := 0x18 * (same)
   uuidField := 0x19
   bigIntField := 0x1A *
   bigDecimalField := 0x1B * (scale, length of byte array, byte array)
   datetimeField := 0x1C (milliseconds since unix epoch)
   dateField := 0x1D (datetime with hours, minutes, seconds, milliseconds all set to 0)
   timeField := 0x1E (datetime with years, months, days all set to zero
      (i.e., a time on Jan 1 1970))
   timestampField := 0x1F (datetime followed by a specifier of nanoseconds
      within the millisecond)

FeatureType
-----------

.. code-block:: none

   featureType := featureTypeHeader name properties
   featureTypeHeader := "featuretype" NUL
   name := namespace localPart
   namespace :=
   localPart :=
   properties := property*
   property := name nillability minOccurs maxOccurs type
   nillability := (0: non-nillable, 1: nillable. Other values unused.)
   minOccurs :=
   maxOccurs :=
   type := spatialType | aspatialType
   aspatialType := name typeTag (aspatial types are distinguished from
      spatial ones by the value of the type tag)
   typeTag := (as used in features)
   spatialType := name typeTag crsTextInterpretation crsText
   crsTextInterpretation := (0: crsText is WKT CRS definition,
      1: crsText references a well-known CRS by identifier. If it uses
      URI notation ("urn:...") then the axes should be forced to
      X=Easting, Y=Northing order.)
   crsText := (as determined by crsTextInterpretation)

Tag
---

.. code-block:: none

   tag := tagHeader objectId tagName message tagger
   tagHeader := "tag" NUL
   objectId := [20]
   tagName :=
   message :=
   tagger := name email timestamp tzOffset
   name :=
   email :=
   timestamp :=
   tzOffset :=
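
As a rough illustration of two of the conventions used throughout this document, the following Python sketch reads a header marker (a quoted literal followed by NUL, such as ``"commit" NUL``) and a length-prefixed string. The function names ``read_header``, ``read_utf`` and ``write_utf`` are hypothetical and are not part of GeoGig. The sketch assumes the two-byte count is an unsigned big-endian value, as written by ``java.io.DataOutputStream.writeUTF``. It also decodes with standard UTF-8, which agrees with modified UTF-8 only for text that contains no NUL characters and no characters outside the Basic Multilingual Plane.

```python
import io
import struct


def read_header(stream, expected):
    """Consume an ASCII marker followed by a single NUL byte."""
    marker = stream.read(len(expected) + 1)
    if marker != expected + b"\x00":
        raise ValueError(f"expected {expected!r} header, got {marker!r}")


def read_utf(stream):
    """Read a two-byte count, then that many bytes, and decode them as text."""
    raw_count = stream.read(2)
    if len(raw_count) != 2:
        raise EOFError("truncated string length")
    # Assumption: unsigned, big-endian count (the java.io.DataOutput convention).
    (count,) = struct.unpack(">H", raw_count)
    payload = stream.read(count)
    if len(payload) != count:
        raise EOFError("truncated string payload")
    # Assumption: standard UTF-8 stands in for modified UTF-8 here, so NUL
    # characters and supplementary characters are not handled correctly.
    return payload.decode("utf-8")


def write_utf(text):
    """Encode text as a two-byte count followed by the encoded bytes."""
    payload = text.encode("utf-8")
    if len(payload) > 0xFFFF:
        raise ValueError("string too long for a two-byte count")
    return struct.pack(">H", len(payload)) + payload


if __name__ == "__main__":
    # Not a complete GeoGig object: just a header marker and one string.
    stream = io.BytesIO(b"commit\x00" + write_utf("initial import"))
    read_header(stream, b"commit")
    print(read_utf(stream))
```

A full reader would dispatch on the header marker and then follow the grammar of the matching section. Strings that may contain NUL or supplementary characters need a real modified UTF-8 decoder instead of the standard codec used in this sketch.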