GeoGig merge conflicts strategy

This document discusses how GeoGig behaves in the case of merge conflicts, and the different elements involved.

Possible merge scenarios

This is a comprehensive list of possible situations that are handled when merging changes from two actors.

Non-conflicting situations:

  • Both actors have added the same feature type.
  • Both actors have added the same feature.
  • Both actors have modified the same attribute of a feature, setting the same new value.
  • Both actors have removed the same feature.
  • Both actors have removed the same feature type.
  • Both actors have modified a feature, each of them modifying different attributes, so there is no overlap in changes.
  • Both actors have modified a feature geometry, each of them altering a different part of it, so the changes are compatible (for instance, in a multipolygon geometry, each one has edited a different polygon, or, in a single line, one actor has moved the starting point while the other has moved the ending point).

Conflicting changes

  • Actors have modified the same attribute in a feature, setting different new values.
  • Actors have edited the geometry of a feature, making incompatible changes (for instance, one of them has modified a subgeometry, while the other has deleted it). The user should select one of the changes, or the union of the resulting geometries, if possible.
  • One actor has modified a feature or feature type, while the other has deleted it.
  • One actor has modified a feature attribute, while the other has deleted it.
  • Both actors have added a feature or feature type with the same name, but different content. The user should select which feature to keep, or not to add either of them. This should consider features added under the path for that feature type, so that the corresponding metadata ID is set correctly for those features.
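For attribute-level changes, the situations listed above reduce to comparing each value in the ‘ours’ and ‘theirs’ versions against the common ancestor. The following Java sketch shows that rule for a single attribute. AttributeMerge and ThreeWayAttribute are hypothetical names, not GeoGig classes, and a null value is assumed to mean that the attribute was deleted on that side. Geometry edits that touch different parts of the same geometry would need a finer, geometry-aware comparison that this sketch does not attempt.

```java
import java.util.Objects;

/** Result of a three-way comparison of one attribute (hypothetical type). */
enum AttributeMerge { UNCHANGED, TAKE_OURS, TAKE_THEIRS, SAME_CHANGE, CONFLICT }

/** Sketch of the per-attribute merge rule; not GeoGig code. */
final class ThreeWayAttribute {
    private ThreeWayAttribute() {}

    /**
     * Compares the ancestor, "ours" and "theirs" values of one attribute.
     * Assumption: null means the attribute was deleted on that side.
     */
    static AttributeMerge classify(Object ancestor, Object ours, Object theirs) {
        boolean oursChanged = !Objects.equals(ancestor, ours);
        boolean theirsChanged = !Objects.equals(ancestor, theirs);
        if (!oursChanged && !theirsChanged) {
            return AttributeMerge.UNCHANGED;
        }
        if (!theirsChanged) {
            return AttributeMerge.TAKE_OURS; // only "ours" touched it
        }
        if (!oursChanged) {
            return AttributeMerge.TAKE_THEIRS; // only "theirs" touched it
        }
        // Both sides changed it: compatible only if they reached the same value
        // (this includes both sides deleting it).
        return Objects.equals(ours, theirs) ? AttributeMerge.SAME_CHANGE : AttributeMerge.CONFLICT;
    }
}
```

Applying classify to every attribute of a feature gives the feature-level result: the feature is conflicted only if some attribute returns CONFLICT, which is why two actors editing different attributes do not conflict.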

Conflict-solving strategy

The three basic parts of the strategy followed by GeoGig are the conflicts file (to flag conflicted elements and store the versions to merge), the working tree (to edit the conflicts and solve them) and the references (to keep track of the merged branches while the conflicts are being solved and then committed). They are explained next.

The conflict file during a conflicted merge operation

Unlike Git, GeoGig does not put conflict marks in the index, but in a conflicts file in the GeoGig repository. This text file contains one triplet of IDs per line, with the following structure:

<ancestor_ID>\t<ours_ID>\t<theirs_ID>

IDs refer to features in the case of a conflicted feature, and to feature types (not the tree ID) in the case of a conflict in a tree.

Ours is equivalent to the HEAD version, while theirs is the MERGE_HEAD version (these references are discussed later).
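As an illustration, a tool reading the conflicts file could parse each line as follows. This is a minimal sketch assuming the separators in the line structure above are tab characters and that each line holds exactly three IDs; ConflictEntry is a hypothetical class, not part of GeoGig.

```java
import java.util.ArrayList;
import java.util.List;

/** One line of the conflicts file (hypothetical representation). */
final class ConflictEntry {
    final String ancestorId;
    final String oursId;   // HEAD version
    final String theirsId; // MERGE_HEAD version

    ConflictEntry(String ancestorId, String oursId, String theirsId) {
        this.ancestorId = ancestorId;
        this.oursId = oursId;
        this.theirsId = theirsId;
    }

    /** Parses "ancestor<TAB>ours<TAB>theirs"; assumes exactly three fields. */
    static ConflictEntry parse(String line) {
        String[] fields = line.split("\t", -1); // -1 keeps empty trailing fields
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 tab-separated IDs: " + line);
        }
        return new ConflictEntry(fields[0], fields[1], fields[2]);
    }

    /** Parses all non-empty lines of the file. */
    static List<ConflictEntry> parseAll(List<String> lines) {
        List<ConflictEntry> entries = new ArrayList<>();
        for (String line : lines) {
            if (!line.isEmpty()) {
                entries.add(parse(line));
            }
        }
        return entries;
    }
}
```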

The working tree file during a conflicted merge operation

In Git, the working tree version contains the result of the “merge” program with conflict markers (<<< === >>>). Git alters the working tree version so that the merge tool, which is not a Git tool, can access the changes from both sides without having to read the index. The application does not have to be aware of Git at any point; it just handles text files and understands the syntax of the merge program.

This behaviour cannot be replicated in GeoGig, since the design of the working tree does not allow storing a single version that combines all changes. Instead, the working tree features remain unchanged, and should be modified by the merge tool when resolving the conflict. The three versions needed for a 3-way merge are all stored in the repository database, and can be accessed using the IDs referenced in the conflicts file.

References during a conflicted merge operation

When a merge operation cannot be completed because of existing conflicts, GeoGig creates a MERGE_HEAD reference that points to the branch to merge. It also keeps an ORIG_HEAD reference, which points to the head of the branch onto which the merge is to be performed. However, GeoGig does not use these references as a flag to detect conflicts: they can be removed, and GeoGig will still complain when a commit is attempted.

The conflicted state is kept only in the conflicts file. MERGE_HEAD is, however, used when performing the commit. If it is removed, the commit can still be performed once the conflicts are solved, but it will have only HEAD as its parent, like a normal commit.

ORIG_HEAD is useful for aborting the merge operation, for instance by running geogig reset --hard ORIG_HEAD.

The resulting commit will have both ORIG_HEAD and MERGE_HEAD as parents.
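The interplay between the conflicts file and the references can be summarised with a small in-memory model. This is a hypothetical sketch (MergeRefsModel is not a GeoGig class, and refs are plain strings): the list of conflicts alone decides whether a commit is allowed, while MERGE_HEAD only decides whether the commit gets a second parent. Since HEAD does not move during the conflicted merge, it stands in for ORIG_HEAD as the first parent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of refs and conflicts during a conflicted merge (not GeoGig code). */
final class MergeRefsModel {
    final Map<String, String> refs = new HashMap<>();
    final List<String> conflicts = new ArrayList<>();

    MergeRefsModel(String head) {
        refs.put("HEAD", head);
    }

    /** A merge that stops on conflicts records ORIG_HEAD and MERGE_HEAD. */
    void startConflictedMerge(String branchTip, List<String> conflictedPaths) {
        refs.put("ORIG_HEAD", refs.get("HEAD"));
        refs.put("MERGE_HEAD", branchTip);
        conflicts.addAll(conflictedPaths);
    }

    /** Marks one conflicted path as solved. */
    void markSolved(String path) {
        conflicts.remove(path);
    }

    /** Returns the parents of the new commit, or fails if conflicts remain. */
    List<String> commitParents() {
        // Only the conflicts list blocks the commit; the refs are not checked.
        if (!conflicts.isEmpty()) {
            throw new IllegalStateException("Unsolved conflicts: " + conflicts);
        }
        List<String> parents = new ArrayList<>();
        parents.add(refs.get("HEAD")); // equals ORIG_HEAD while merging
        String mergeHead = refs.remove("MERGE_HEAD");
        if (mergeHead != null) {
            parents.add(mergeHead); // merge commit gets a second parent
        }
        return parents;
    }
}
```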

As in Git, GeoGig keeps an automatically generated MERGE_MSG file, containing the names of the conflicted elements, which is used by the commit operation.

Some implementation details

The StagingArea class is responsible for marking conflicts as solved when an element is added, so classes staging to the index do not have to take care of that. It also resolves conflicts when the STAGE_HEAD reference is updated.

The heap-based staging database also has a heap-based conflicts storage.

An AOP scheme has been implemented to intercept calls to the ‘call’ method of operations that cannot run while a merge conflict is being solved. Operations in the ‘plumbing’ package, and those annotated with the @CanRunDuringConflict annotation, are not intercepted. The remaining ones are intercepted and aborted if conflicts exist.
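The rule applied by the interceptor can be expressed without AOP as a guard around call(). The sketch below is a plain-Java approximation under stated assumptions: Operation, StatusOp, CommitOp and ConflictGuard are hypothetical types, the plumbing check is a simple package-name test, and only the CanRunDuringConflict annotation name comes from the text.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.function.BooleanSupplier;

/** Marker annotation; the name is taken from the text, the definition is assumed. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface CanRunDuringConflict {}

/** Minimal operation abstraction (hypothetical). */
interface Operation<T> {
    T call();
}

/** Example operation allowed during a conflict (hypothetical). */
@CanRunDuringConflict
final class StatusOp implements Operation<String> {
    public String call() { return "status"; }
}

/** Example operation that must be blocked during a conflict (hypothetical). */
final class CommitOp implements Operation<String> {
    public String call() { return "committed"; }
}

/** Plain-Java stand-in for the interceptor around call(). */
final class ConflictGuard {
    private ConflictGuard() {}

    static boolean isExempt(Class<?> opClass) {
        // Plumbing operations and annotated operations are never intercepted.
        return opClass.getName().contains(".plumbing.")
                || opClass.isAnnotationPresent(CanRunDuringConflict.class);
    }

    static <T> T call(Operation<T> op, BooleanSupplier conflictsExist) {
        if (!isExempt(op.getClass()) && conflictsExist.getAsBoolean()) {
            throw new IllegalStateException(
                    op.getClass().getSimpleName() + " cannot run while merge conflicts exist");
        }
        return op.call();
    }
}
```

A real interceptor would apply the same check transparently instead of requiring callers to go through ConflictGuard, which is what makes the AOP approach attractive.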

This approach should not collide with the method interception used by the hooks functionality, although certain operations might be intercepted by both. It should be reviewed once both branches are merged, to ensure that the merge-conflict interceptor takes priority.

The available merge strategies are as follows:

  • Normal merge, with conflict marking, when merging two branches.
  • Octopus merge when merging more than two branches. The operation is canceled if conflicts exist.
  • “ours” and “theirs” strategies, available only if explicitly invoked, and only when merging just two branches.
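The selection rules in this list can be sketched as a small function. MergeStrategy and StrategySelector are hypothetical names, branchCount is assumed to count the current branch as well, and the behaviour attributed to ‘ours’/‘theirs’ on conflicts (keeping one side automatically) is an assumption rather than something stated above.

```java
/** Strategies named in the text; the enum itself is hypothetical. */
enum MergeStrategy { NORMAL, OCTOPUS, OURS, THEIRS }

final class StrategySelector {
    private StrategySelector() {}

    /**
     * @param branchCount number of branches involved, counting the current one (assumption)
     * @param ours        true if "ours" was explicitly requested
     * @param theirs      true if "theirs" was explicitly requested
     */
    static MergeStrategy select(int branchCount, boolean ours, boolean theirs) {
        if (branchCount < 2) {
            throw new IllegalArgumentException("Nothing to merge");
        }
        if (ours && theirs) {
            throw new IllegalArgumentException("Cannot use both 'ours' and 'theirs'");
        }
        if (ours || theirs) {
            if (branchCount > 2) {
                throw new IllegalArgumentException("'ours'/'theirs' only apply to two branches");
            }
            return ours ? MergeStrategy.OURS : MergeStrategy.THEIRS;
        }
        return branchCount == 2 ? MergeStrategy.NORMAL : MergeStrategy.OCTOPUS;
    }

    /** What each strategy does when conflicts are found. */
    static String onConflicts(MergeStrategy strategy) {
        switch (strategy) {
            case NORMAL:
                return "mark conflicts and stop";
            case OCTOPUS:
                return "cancel the merge";
            default:
                return "keep one side"; // OURS/THEIRS: assumed behaviour
        }
    }
}
```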

MergeTool

Although still unfinished, the above functionality can be used to discuss, implement and test a tool for solving merge conflicts.

Currently, a conflicts command is implemented, which returns a description of all conflicts, or only of those that match a given path, if one is specified. It basically calls the ‘cat’ command on the ancestor, ours and theirs versions of each conflict, and prints the output. This could be used by an external mergetool.
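A minimal sketch of such a report, assuming conflicts are available as a map from path to the three IDs (ancestor, ours, theirs) and that a cat function, hypothetical here, returns the text description of an object. The path filter is assumed to match either the exact path or anything below it; ConflictsReport is not a GeoGig class.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Builds a textual description of conflicts (hypothetical helper). */
final class ConflictsReport {
    private ConflictsReport() {}

    private static final String[] LABELS = {"Ancestor", "Ours", "Theirs"};

    static String describe(Map<String, String[]> conflicts, String pathFilter,
                           Function<String, String> cat) {
        StringBuilder out = new StringBuilder();
        Map<String, String[]> sorted = new TreeMap<>(conflicts); // stable output order
        for (Map.Entry<String, String[]> e : sorted.entrySet()) {
            String path = e.getKey();
            boolean matches = pathFilter == null
                    || path.equals(pathFilter) || path.startsWith(pathFilter + "/");
            if (!matches) {
                continue;
            }
            String[] ids = e.getValue(); // ancestor, ours, theirs
            out.append(path).append('\n');
            for (int i = 0; i < 3; i++) {
                // Stand-in for calling 'cat' on each version.
                out.append(LABELS[i]).append(": ").append(cat.apply(ids[i])).append('\n');
            }
        }
        return out.toString();
    }
}
```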

Import of solved elements

If external merge tools are planned, there should be a way for them to import solved elements into the working tree. Currently, the only way of importing is from shp/PostGIS/GeoPackage, but that might not be the best solution in this case. A simpler way, like importing from a text string, should be implemented in GeoGig. Normal users will not use those methods to import their data, but tools needing to alter the working tree could use them, which would make it much easier to implement a working connection between those tools and GeoGig.
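To illustrate how simple such a text-based import could be, the sketch below parses a hypothetical format with one attribute per line, written as the attribute name, a tab, and the value (a geometry could be given as WKT). Neither the format nor FeatureTextParser exists in GeoGig; they only show the kind of input a merge tool could hand over.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Parses "name<TAB>value" lines into attributes (hypothetical format). */
final class FeatureTextParser {
    private FeatureTextParser() {}

    static Map<String, String> parse(String text) {
        Map<String, String> attributes = new LinkedHashMap<>(); // keeps attribute order
        for (String line : text.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            int tab = line.indexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Missing tab in line: " + line);
            }
            attributes.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return attributes;
    }
}
```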

Tree/Feature type changes

The current implementation supports merging branches where features have been altered. If changes have affected feature types, different handling is required, and it differs considerably from the Git case.

This different approach is not only related to how conflicts are evaluated and merged, but also how differences are reported by operations like DiffTree. Some changes might not actually affect an object in the repo, but just the node pointing to it. This should be handled differently.

In the current implementation, the merge operation is capable of detecting conflicting changes in the default metadata associated with a tree, marking the tree as conflicted even if no change has been made to its content. The conflict is stored in the index as the ancestor, ‘ours’ and ‘theirs’ metadata IDs, instead of the element ID, which may or may not have changed (it remains unchanged if no nodes in the tree have been modified).

When a conflicted entry in the index resolves to a tree, it stores the metadata IDs, but not the tree IDs. The tree IDs are not needed, since the modification causing the conflict has to be a modification of the associated feature type, not of the features it contains.

How to solve this kind of conflict, caused by differing metadata rather than by differing features, should be discussed and taken into account when designing the mergetool.
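A minimal sketch of this rule, using hypothetical TreeNode and TreeMetadataConflict classes (not GeoGig types): only the metadata IDs are compared and stored, so a tree whose feature type changed differently on both sides is reported as conflicted, while a change that only affects the tree ID is not.

```java
import java.util.Objects;

/** Simplified tree node: its own ID plus its default metadata (feature type) ID. */
final class TreeNode {
    final String treeId;
    final String metadataId;

    TreeNode(String treeId, String metadataId) {
        this.treeId = treeId;
        this.metadataId = metadataId;
    }
}

/** Conflict entry for a tree: stores metadata IDs only, never tree IDs. */
final class TreeMetadataConflict {
    final String ancestorMetadataId;
    final String oursMetadataId;
    final String theirsMetadataId;

    private TreeMetadataConflict(String ancestor, String ours, String theirs) {
        this.ancestorMetadataId = ancestor;
        this.oursMetadataId = ours;
        this.theirsMetadataId = theirs;
    }

    /** Returns a conflict if both sides changed the metadata ID differently, else null. */
    static TreeMetadataConflict detect(TreeNode ancestor, TreeNode ours, TreeNode theirs) {
        String a = ancestor.metadataId;
        String o = ours.metadataId;
        String t = theirs.metadataId;
        boolean oursChanged = !Objects.equals(a, o);
        boolean theirsChanged = !Objects.equals(a, t);
        // Tree IDs are deliberately ignored: only the feature type matters here.
        if (oursChanged && theirsChanged && !Objects.equals(o, t)) {
            return new TreeMetadataConflict(a, o, t);
        }
        return null;
    }
}
```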
