Storage Back-ends

GeoGig can store repository elements in different storage backends or databases.

There are currently two storage backends: one default backend for repositories in the local computer’s file system, and the second is a PostgreSQL backend that stores all the repository data in a Postgres database.

Whether to use one or the other is a matter of choice and/or convenience. The default storage backend is recommended for single repositories, such as when you are working on a specific project in your local machine.

The PostgreSQL backend is recommended for server side deployments, such as a deployment where several repositories are served through GeoServer.

A common scenario is to have a remote GeoServer with several repositories that use the PostgreSQL backend to host them all, and a local clone of one of those repositories in your local machine, can take the local clone off-line or simply do work disconnected from the server, and then pull from and push changes to the remote repository.

Default storage backend

Up to 1.0-RC3, the default storage backend was a BerkeleyDB Java Edition key/value database for the main repository elements such as the revision and graph objects, and simple text files for configuration settings, refs, and merge conflicts.

As of 1.0-RC4, the BerkeleyDB database was replaced by a better performing database, RocksDB. Nonetheless, repositories created with the BerkeleyDB storage backend will still be supported through GeoGig’s plug-in mechanism.

When a repository is created in the local filesystem, it uses a directory structure like the following:

user@localhost:/data/myrepository$ ls .geogig/ -F1
config
graph/
HEAD
hooks/
log/
objects/
refs/
STAGE_HEAD
WORK_HEAD

In order to use the GeoGig command line for a specific local repository, you can either:

  • Work inside the repository directory (the .geogig directory’s parent), like in the example above;
  • Use the --repo <path/to/repository> argument to the geogig command. e.g.: user@localhost:/home/user$ geogig --repo /data/myrepository log to list the commits in the current branch of the /data/myrepository repo.

PostgreSQL storage backend

Whilst the default storage backend can hold only one single repository per directory, the PostgreSQL storage backend is designed to host several repositories in the same database.

It stores everything that’s needed in the database, making it “stateless” and hence the best choice for server deployments, since you can even have several GeoServer instances serving data from the same repositories, in a cluster-like deployment.

Note

The database DOES NOT need PostGIS. A plain PostgreSQL database is used as a pure key/value store for most needs. Beware that for performance reasons, PostgreSQL 9.4 or newer is recommended, since prior versions have a performance issue with hash indexing that has been fixed since.

PostgreSQL repository URI

Since a repository stored in PostgreSQL is stateless, as far as the local machine is concerned, you need to specify the repository location for every console command you want to execute against it.

This is done through a URI (Universal Resource Identifier) of the following form:

postgresql://<server>[:<port>]/<database>[/<schema>]/<reponame>?user=<username>&password=<pwd>

In the above URI scheme, parts enclosed between <> symbols are mandatory, and parts enclosed between [] symbols are optional. These symbols do not have to be written as part of the actual URI.

  • <server>: The PostgreSQL server host or IP address
  • [:port]: The TCP port number the server is listening to for connections. Defaults to 5432 if not provided.
  • <database>: The name of the database in the server where GeoGig will store the repositories
  • <schema>: The PostgreSQL database schema, defaults to public
  • <reponame: The name of the GeoGig repository to create
  • <username>: The PostgreSQL user name to connect as
  • <pwd>: The PostgreSQL user’s password to connect with

Examples:

The following commands create two repositories, gold and QA in the same geogig database of the dbserver PostgreSQL instance:

user@localhost:/home/user$ geogig --repo "postgresql://dbserver/geogig/master/gold?user=peter&password=secret" init

user@localhost:/home/user$ geogig --repo "postgresql://dbserver/geogig/master/QA?user=peter&password=secret" init

Tired already of typing the repository URI? A nice trick is to use environment variables instead:

user@localhost:/home/user$ export gold="postgresql://dbserver/geogig/master/gold?user=peter&password=secret"
user@localhost:/home/user$ export QA="postgresql://dbserver/geogig/master/QA?user=peter&password=secret"

user@localhost:/home/user$ geogig --repo $gold init
user@localhost:/home/user$ geogig --repo $QA init

Performance Tuning

There are several configuration options for tuning the cacheing and threading of the PostgreSQL backend. The following configuration options should be set in the global config of the database.

  • postgres.maxConnections: The maximum number of simultaneous idle and in-use connections to the database. Defaults to 10.
  • postgres.threadPoolSize: The number of threads that the object database should use. Defaults to the number of processors available to the virtual machine, or 2, whichever is higher.
  • Caching: refer to the Runtime configuration section for more information

Additionally, each repository can be tuned by adjusting the batch size of GET and PUT requests by adjusting the following configuration options.

  • postgres.getAllBatchSize: The maximum number of objects to retrieve from the database in a single request. If the number of objects that need to be fetched exceed the batch size, the job will be split into multiple tasks that can be run on multiple threads, if they are available.
  • postgres.putAllBatchSize: The maximum number of objects to insert into the database in a single request. If the number of objects that need to be inserted exceed the batch size, the job will be split into multiple tasks that can be run on multiple threads, if they are available.

Database set up

GeoGig will create the necessary tables the first time it’s used against a given database. For example, if you start with an empty databse and create a new repository on it, GeoGig will detect the missing tables and create them, like in the following example:

$ su - postgres -c "create database geogig_tests"
$ geogig init "postgresql://localhost/geogig_tests/myrepo?user=postgres&password=postgres"
Initialized empty Geogig repository in postgresql://localhost/geogig_tests/myrepo?user=postgres&password=postgres

However, the database and user/role must already exist in PostgreSQL. You can use a pre-existing PostgreSQL role with administrative access to an existing database or you can run the following steps and SQL script to create the GeoGig database and tables first.

geogig_postgres.sql

user@localhost:/home/user$ su - postgres
postgres@localhost: $ createdb geogig
postgres@localhost: $ psql -d geogig -f geogig_postgres.sql

Finally, refer to the PostgreSQL-backed GeoGig repository to learn how to configure a repository in GeoServer.

back to top