GeoGig tutorial¶
GeoGig is a Distributed Version Control System (DVCS) for geospatial data.
This document is a short introduction to the main ideas and elements of GeoGig. It describes how to set up and use GeoGig to version spatial data, introducing the following operations:
- Importing unversioned spatial data into GeoGig (“import”)
- Making changes and storing snapshots (“commit”)
- Maintaining independent lines of modifications (“branch”)
- Integrating changes from separate branches (“merge”)
- Resolving conflicting edits
- Synchronizing data across a network (“push” and “pull”)
- Marking specific versions of the data (“tag”)
- Exporting data from GeoGig to a shapefile
This tutorial assumes no prior experience with GeoGig. More details can be found in later sections.
Installation¶
Follow the instructions on the Installation page to install GeoGig.
Data¶
The data used in the following examples was supplied by the City of Raleigh, and is available for download from the City of Raleigh Open Data Portal.
For the purposes of this tutorial download the sample data HERE.
This zip file contains point locations of parks maintained by the City of Raleigh, named Raleigh_Park_Locations
. Each version of the data has already been slightly modified in order to simulate an example workflow.
Configuration¶
Before we start working with geospatial data in GeoGig, you must provide GeoGig with a name and email, using the config
command, substituting your name and email:
geogig config --global user.name "Author"
geogig config --global user.email "author@example.com"
Initialization¶
We must create a new repository. Create a new folder to contain the repository and extract the data zip into that directory. Initialize the GeoGig repository by typing:
geogig init
Now your GeoGig repository is ready to manage and version your geospatial data. Note that a .geogig
directory was created.
Importing data¶
To work with data in GeoGig, it has to first be imported into the repository working tree.
We will start by importing the Raleigh Park Locations
shapefile using the following command:
geogig shp import parks/Raleigh_Park_Locations.shp
The response will look like this:
Importing from shapefile parks/Raleigh_Park_Locations.shp
0%
Importing Raleigh_Park_... (1/1)...
0%
113 features inserted in 57.11 ms
Building final tree Raleigh_Park_Locations...
113 features tree built in 3.450 ms
100%
parks/Raleigh_Park_Locations.shp imported successfully.
The data from the shapefile is now in the working tree, but it is not yet versioned. However, the data is now in a format that GeoGig can understand, such that it may be aware of any changes to the data you might introduce.
Run the following command to verify that your data is in the working tree.
geogig ls -r
The response will look like this:
Root tree/
9
7
8
...
11
12
Features from the shapefile are added to the working tree under a tree named Root tree
. A tree in a GeoGig repository is analogous to a directory in a filesystem. Features are named as numbers, reflecting the order in which they are found in the source data. It is not necessarily the same order in which they are listed by the ls
command.
Running the status
command will give you information about the data you have that is not already versioned.
geogig status
# On branch master
# Changes not staged for commit:
# (use "geogig add <path/to/fid>..." to update what will be committed
# (use "geogig checkout -- <path/to/fid>..." to discard changes in working directory
#
# added Raleigh_Park_Locations
# added Raleigh_Park_Locations/9
# added Raleigh_Park_Locations/7
# added Raleigh_Park_Locations/8
...
# added Raleigh_Park_Locations/75
# added Raleigh_Park_Locations/70
# 114 total.
Adding data¶
To tell GeoGig that you want to version data in the working tree, you have to add it to the staging area. Do this by running the following command.
geogig add
The response will look like this:
Counting unstaged elements...114
Staging changes...
100%
113 features and 1 trees staged for commit
0 features and 0 trees not staged for commit
Now your data is ready to be used to create a snapshot (a commit in GeoGig terminology).
If you run the status
command again, you will see a different output, since your data has now been added and is ready to be versioned.
geogig status
The response will look like this:
# On branch master
# Changes to be committed:
# (use "geogig reset HEAD <path/to/fid>..." to unstage)
#
# added Raleigh_Park_Locations
# added Raleigh_Park_Locations/9
# added Raleigh_Park_Locations/7
# added Raleigh_Park_Locations/8
...
# added Raleigh_Park_Locations/75
# added Raleigh_Park_Locations/70
# 114 total.
The staging area is the last area before the data gets versioned in the repository database.
Committing¶
Committing means to create a new version of the data which is in the staging area.
Type the following command.
geogig commit -m "first version"
The response will look like this:
100%
[11b7058f4b8aaca98036f24c127e929281a01cce] first version
Committed, counting objects...113 features added, 0 changed, 0 deleted.
The text between quotes after the -m
option is the commit message, which should describe the snapshot in a human-readable format.
Making edits¶
We will now simulate making an edit to our parks layer. The parks_plus_1feature/Raleigh_Park_Locations.shp
file contains the same data as the original parks file, but with an added feature. Import this file.
To do this, follow the same procedure as before: import data, add, and then commit.
geogig shp import parks_plus_1feature/Raleigh_Park_Locations.shp
Note
All editing of data must be done externally to GeoGig. If you prefer to make your own edits, you can do so using QGIS or any other GIS software you prefer.
If you run the status
command after importing (and before adding), you will see elements which are not yet staged for commits. GeoGig will only report modifications to features that have been changed.
geogig status
The response will look like this:
# On branch master
# Changes not staged for commit:
# (use "geogig add <path/to/fid>..." to update what will be committed
# (use "geogig checkout -- <path/to/fid>..." to discard changes in working directory
#
# modified Raleigh_Park_Locations
# added Raleigh_Park_Locations/114
# 2 total.
Now add the new features:
geogig add
Counting unstaged elements...2
Staging changes...
100%
1 features and 1 trees staged for commit
0 features and 0 trees not staged for commit
Then commit to create a new version:
geogig commit -m "first modification"
100%
[bcafa36c5d6107e6bb95ba8a93fef48800762771] first modification
Committed, counting objects...1 features added, 0 changed, 0 deleted.
Viewing repository history¶
You can use the log
command to see the history of your repository. The history consists of the listing of commits, ordered in reverse chronological order (most recent first).
geogig log
Commit: bcafa36c5d6107e6bb95ba8a93fef48800762771
Author: Author <author@example.com>
Date: (2 minutes ago) 2016-12-17 11:40:04 -0800
Subject: first modification
Commit: 11b7058f4b8aaca98036f24c127e929281a01cce
Author: Author <author@example.com>
Date: (13 minutes ago) 2016-12-17 11:28:57 -0800
Subject: first version
Creating a branch¶
Data editing can be done on multiple history lines of the repository, so one line can be kept clean and stable while edits are performed on another line. These lines are known as branches. You can merge commits from one branch to another branch at any time.
To create a new branch named “myedits”, run the following command:
geogig branch myedits -c
The response will look like this:
Created branch refs/heads/myedits
The -c
option tells GeoGig to not only create the branch, but also, to switch to that branch. Everything done will now be added to this new history line.
Note
The default branch is named master
.
Now use the parks_plus_2features/Raleigh_Park_Locations.shp
file. Once again - import, add, and then commit. This shapefile contains the same data as the last version, with yet another feature added on.
geogig shp import parks_plus_2features/Raleigh_Park_Locations.shp
geogig add
geogig commit -m "added new feature"
The log
command will show a history like this:
Commit: 1466c1c75d51282093b9d85e96b14e9898b74d2f
Author: Author <author@example.com>
Date: (40 seconds ago) 2016-12-17 11:45:02 -0800
Subject: added a new feature
Commit: bcafa36c5d6107e6bb95ba8a93fef48800762771
Author: Author <author@example.com>
Date: (5 minutes ago) 2016-12-17 11:40:04 -0800
Subject: first modification
Commit: 11b7058f4b8aaca98036f24c127e929281a01cce
Author: Author <author@example.com>
Date: (16 minutes ago) 2016-12-17 11:28:57 -0800
Subject: first version
Merging commits from a branch¶
Our repository now has two branches: the one we just created (myedits
) and the default branch (master
). To see all the branches within a given repository, execute the geogig branch
command.
Let’s merge the changes we have just added from the myedits
branch into the master
branch.
First switch to the branch to which you would like to apply the changes, in this case it is master
. Execute the geogig checkout master
command to switch to the master
branch.
geogig checkout master
The response will look like this:
Switched to branch 'master'
The log
command will show the following history. Use the --oneline
option to compact the output:
geogig log --oneline
The response will look like this:
bcafa36c5d6107e6bb95ba8a93fef48800762771 first modification
11b7058f4b8aaca98036f24c127e929281a01cce first version
Notice that the most recent commit (with the message “added new feature”) is missing. This is because it was added to the myedits
branch, not the master
branch (the branch we are currently on).
To merge the work done in the myedits
branch into the current master
branch, enter the following command:
geogig merge myedits
The response will look like this:
Checking for possible conflicts...
1%
Merging commit 71217cac78d501e0dc120c596bb01a01a0a737d7
Conflicts: 0, merged: 0, unconflicted: 2
0%
[71217cac78d501e0dc120c596bb01a01a0a737d7] added new feature
Committed, counting objects...1 features added, 0 changed, 0 deleted.
Now we can see that the latest commit introduced into the myedits
branch is also present in master
.
geogig log --oneline
1466c1c75d51282093b9d85e96b14e9898b74d2f added a new feature
bcafa36c5d6107e6bb95ba8a93fef48800762771 first modification
11b7058f4b8aaca98036f24c127e929281a01cce first version
Handling merge conflicts¶
We just saw that the work done on one branch could be merged automatically to another branch without problems. This is not always possible, in which case it needs to be done manually.
To see this in action, create a new branch named conflict_res
, and create a commit based on the parks_1st_change/Raleigh_Park_Locations.shp
shapefile.
geogig branch conflict_res -c
geogig shp import parks_1st_change/Raleigh_Park_Locations.shp
geogig add
geogig commit -m "edits on the conflict_res branch"
This is the same data as parks_plus_2features/Raleigh_Park_Locations.shp
, however the new shapefile changes the name for ‘Walnut Terrace Park’ to ‘Walnut Terrace Field’.
Now go back to the master
branch and create a new commit with the data in parks_2nd_change/Raleigh_Park_Locations.shp
.
This is the same data as parks_plus_2features/Raleigh_Park_Locations.shp
, however the new shapefile changes the name for ‘Walnut Terrace Park’ to ‘Walnut Terrace Grove’.
geogig checkout master
geogig shp import parks_2nd_change/Raleigh_Park_Locations.shp
geogig add
geogig commit -m "edits on the master branch"
This is a conflict situation, as the same data has been changed in two different manners in the two branches. If you try to merge the fix
branch into master
, GeoGig cannot automatically resolve this situation and will fail.
geogig merge conflict_res
Checking for possible conflicts...
1%
Possible conflicts. Creating intermediate merge status...
0%
Saving 1 conflicts...
CONFLICT: Merge conflict in Raleigh_Park_Locations/1
Automatic merge failed. Fix conflicts and then commit the result.
You can see that there is a conflict by running the status
command:
geogig status
# On branch master
#
# Unmerged paths:
# (use "geogig add/rm <path/to/fid>..." as appropriate to mark resolution
#
# unmerged Raleigh_Park_Locations/1
# 1 total.
An unmerged path represents a element with a conflict.
You can get more details about the conflict by running the conflicts
command:
geogig conflicts --diff
The response will look like this:
---Raleigh_Park_Locations/1---
Ours
NAME: Walnut Terrace Park -> Walnut Terrace Grove
Theirs
NAME: Walnut Terrace Park -> Walnut Terrace Field
The output indicates that the value in the NAME
attribute of the Raleigh_Park_Locations/1
feature is causing the conflict.
The conflict has to be solved manually. You will have to merge both versions yourself, or just select one of the versions to be used.
Assume we want to use the changed feature in the conflict_res
branch. Since we are in the master
branch, the conflict_res
branch is considered “theirs.” Run the following command:
geogig checkout -p Raleigh_Park_Locations/1 --theirs
The response will look like this:
Objects in the working tree were updated to the specified version.
That puts the conflict_res
branch version in the working tree, overwriting what was there. This removes the conflict.
geogig add
Counting unstaged elements...2
Staging changes...
50%
Building final tree Raleigh_Park_Locations
Removing 1 merged conflicts...
Done. 0 unmerged conflicts.
100%
1 features and 1 trees staged for commit
0 features and 0 trees not staged for commit
Now that the conflict has been resolved, you may commit the change. There is no need to add a commit message, since that is created automatically during a merge operation.
geogig commit
Tagging a version¶
You can add a “tag” to a version to easily identify a snapshot with something more descriptive than the identifier associated with each commit.
To do so, use the tag
command:
geogig tag my_tag_name -m "First official version"
Now you can refer to a specific version of the data with a name.
Exporting from a GeoGig repository¶
Data can be exported from a GeoGig repository into several formats, ready to be used by external applications.
To export a given tree to a shapefile, use the shp export
command.
geogig shp export Raleigh_Park_Locations my_parks.shp
Exporting from Raleigh_Park_Locations to my_parks...
100%
Raleigh_Park_Locations exported successfully to my_parks.shp
This will create a file named my_parks.shp
in the current directory that contains the current state of the repository.
Past versions of the data can also be exported by prefixing the tree name with a commit ID and a colon, as in the following example:
geogig shp export 6bcd72b1a536aa6ec9a773a353f3e4e6f2ffa973:Raleigh_Park_Locations my_older_parks.shp
Use “HEAD” notation to export changes relative to the current working revision. For example, HEAD~1
refers to the second-most recent commit, HEAD~2
refers to the commit prior to that, etc.
geogig shp export HEAD~1:Raleigh_Park_Locations 2nd_last_version_parks.shp
Synchronizing repositories¶
A GeoGig repository can interact with other GeoGig repositories that are working with the same data. Other GeoGig repositories are known as remotes.
In our situation, we created a new repository from scratch using the init
command. But if we wanted to start with a copy of an existing repository (referred to as the origin
), use the clone
command.
Let’s clone the repository we have been working on. Create a new directory in your file system, move into it and run the following commands (replace ‘YOUR_FIRST_REPO’ with the actual name of the first directory created)
mkdir ../my_new_repo
cd ../my_new_repo
geogig clone ../YOUR_FIRST_REPO
The response will look like this:
Cloning into 'geogig_tutorial'...
Fetching objects from refs/heads/conflict_res
1%
Fetching objects from refs/heads/master
Fetching objects from refs/heads/myedits
Fetching objects from refs/tags/my_tag_name
100%
Done.
With the repository cloned, you can work here as you would normally and the changes will be placed on top of the changes that already exist from the original repository.
You can merge commits from the origin
repository to this new repository by using the pull
command. This will update the current branch with changes that have been made on that branch in the remote repository since the last time both repositories were synchronized.
geogig pull origin
To move your local changes from your repository into origin
, use the push
command:
geogig push origin
Tutorial complete¶
Congratulations! You now know the basics of managing data with GeoGig.
Check out the rest of the GeoGig manual in order to learn more!