Top mercurial Questions

List of Tags

I've been using git for some time now on Windows (with msysGit) and I like the idea of distributed source control. Just recently I've been looking at Mercurial (hg) and it looks interesting. However, I can't wrap my head around the differences between hg and git.

Has anyone made a side-by-side comparison between git and hg? I'm interested to know what differs hg and git without having to jump into a fanboy discussion.

Answered By: jfs ( 345)

These articles may help:

Edit: Comparing Git and Mercurial to celebrities seems to be a trend. Here's one more:

For a while now I've been using subversion for my personal projects.

More and more I keep hearing great things about Git and Mercurial, and DVCS in general.

I'd like to give the whole DVCS thing a whirl, but I'm not too familiar with either option.

What are some of the differences between Mercurial and Git?

Note that I'm not trying to find out which one is "best" or even which one I should start with. I'm mainly looking for key areas where they are similar and where they are different, because I am interested to know how they differ in terms of implementation and philosophy.

Answered By: Jakub Narębski ( 321)

Disclaimer: I use Git, follow Git development on git mailing list, and even contribute a bit to Git (gitweb mainly). I know Mercurial from documentation and some from discussion on #revctrl IRC channel on FreeNode.

Thanks to all people on on #mercurial IRC channel who provided help about Mercurial for this writeup



Summary

Here it would be nice to have some syntax for table, something like in PHPMarkdown / MultiMarkdown / Maruku extension of Markdown

  • Repository structure: Mercurial doesn't allow octopus merges (with more than two parents), nor tagging non-commit objects.
  • Tags: Mercurial uses versioned .hgtags file with special rules for per-repository tags, and has also support for local tags in .hg/localtags; in Git tags are refs residing in refs/tags/ namespace, and by default are autofollowed on fetching and require explicit pushing.
  • Branches: In Mercurial basic workflow is based on anonymous heads; Git uses lightweight named branches, and has special kind of branches (remote-tracking branches) that follow branches in remote repository.
  • Revision naming and ranges: Mercurial provides revision numbers, local to repository, and bases relative revisions (counting from tip, i.e. current branch) and revision ranges on this local numbering; Git provides a way to refer to revision relative to branch tip, and revision ranges are topological (based on graph of revisions)
  • Mercurial uses rename tracking, while Git uses rename detection to deal with file renames
  • Network: Mercurial supports SSH and HTTP "smart" protocols; modern Git supports SSH, HTTP and GIT "smart" protocols, and HTTP(S) "dumb" protocol. Both have support for bundles files for off-line transport.
  • Mercurial uses extensions (plugins) and established API; Git has scriptability and established formats.

There are a few things that differ Mercurial from Git, but there are other things that make them similar. Both projects borrow ideas from each other. For example hg bisect command in Mercurial (formerly bisect extension) was inspired by git bisect command in Git, while idea of git bundle was inspired by hg bundle.

Repository structure, storing revisions

In Git there are four types of objects in its object database: blob objects which contain contents of a file, hierarchical tree objects which store directory structure, including file names and relevant parts of file permissions (executable permission for files, being a symbolic link), commit object which contain authorship info, pointer to snapshot of state of repository at revision represented by a commit (via a tree object of top directory of project) and references to zero or more parent commits, and tag objects which reference other objects and can be signed using PGP / GPG.

Git uses two ways of storing objects: loose format, where each object is stored in a separate file (those files are written once, and never modified), and packed format where many objects are stored delta-compressed in a single file. Atomicity of operations is provided by the fact, that reference to a new object is written (atomically, using create + rename trick) after writing an object.

Git repositories require periodic maintenance using git gc (to reduce disk space and improve performance), although nowadays Git does that automatically. (This method provides better compression of repositories.)

Mercurial (as far as I understand it) stores history of a file in a filelog (together, I think, with extra metadata like rename tracking, and some helper information); it uses flat structure called manifest to store directory structure, and structure called changelog which store information about changesets (revisions), including commit message and zero, one or two parents.

Mercurial uses transaction journal to provide atomicity of operations, and relies on truncating files to clean-up after failed or interrupted operation. Revlogs are append-only.

Looking at repository structure in Git versus in Mercurial, one can see that Git is more like object database (or a content-addressed filesystem), and Mercurial more like traditional fixed-field relational database.

Differences:
In Git the tree objects form a hierarchical structure; in Mercurial manifest file is flat structure. In Git blob object store one version of a contents of a file; in Mercurial filelog stores whole history of a single file (if we do not take into account here any complications with renames). This means that there are different areas of operations where Git would be faster than Mercurial, all other things considered equal (like merges, or showing history of a project), and areas where Mercurial would be faster than Git (like applying patches, or showing history of a single file). This issue might be not important for end user.

Because of the fixed-record structure of Mercurial's changelog structure, commits in Mercurial can have only up to two parents; commits in Git can have more than two parents (so called "octopus merge"). While you can (in theory) replace octopus merge by a series of two-parent merges, this might cause complications when converting between Mercurial and Git repositories.

As far as I know Mercurial doesn't have equivalent of annotated tags (tag objects) from Git. A special case of annotated tags are signed tags (with PGP / GPG signature); equivalent in Mercurial can be done using GpgExtension, which extension is being distributed along with Mercurial. You can't tag non-commit object in Mercurial like you can in Git, but that is not very important, I think (some git repositories use tagged blob to distribute public PGP key to use to verify signed tags).

References: branches and tags

In Git references (branches, remote-tracking branches and tags) reside outside DAG of commits (as they should). References in refs/heads/ namespace (local branches) point to commits, and are usually updated by "git commit"; they point to the tip (head) of branch, that's why such name. References in refs/remotes/<remotename>/ namespace (remote-tracking branches) point to commit, follow branches in remote repository <remotename>, and are updated by "git fetch" or equivalent. References in refs/tags/ namespace (tags) point usually to commits (lightweight tags) or tag objects (annotated and signed tags), and are not meant to change.

Tags

In Mercurial you can give persistent name to revision using tag; tags are stored similarly to the ignore patterns. It means that globally visible tags are stored in revision-controlled .hgtags file in your repository. That has two consequences: first, Mercurial has to use special rules for this file to get current list of all tags and to update such file (e.g. it reads the most recently committed revision of the file, not currently checked out version); second, you have to commit changes to this file to have new tag visible to other users / other repositories (as far as I understand it).

Mercurial also supports local tags, stored in hg/localtags, which are not visible to others (and of course are not transferable)

In Git tags are fixed (constant) named references to other objects (usually tag objects, which in turn point to commits) stored in refs/tags/ namespace. By default when fetching or pushing a set of revision, git automatically fetches or pushes tags which point to revisions being fetched or pushed. Nevertheless you can control to some extent which tags are fetched or pushed.

Git treats lightweight tags (pointing directly to commits) and annotated tags (pointing to tag objects, which contain tag message which optionally includes PGP signature, which in turn point to commit) slightly differently, for example by default it considers only annotated tags when describing commits using "git describe".

Git doesn't have a strict equivalent of local tags in Mercurial. Nevertheless git best practices recommend to setup separate public bare repository, into which you push ready changes, and from which others clone and fetch. This means that tags (and branches) that you don't push, are private to your repository. On the other hand you can also use namespace other than heads, remotes or tags, for example local-tags for local tags.

Personal opinion: In my opinion tags should reside outside revision graph, as they are external to it (they are pointers into graph of revisions). Tags should be non-versioned, but transferable. Mercurial's choice of using a mechanism similar to the one for ignoring files, means that it either has to treat .hgtags specially (file in-tree is transferable, but ordinary it is versioned), or have tags which are local only (.hg/localtags is non-versioned, but untransferable).

Branches

In Git local branch (branch tip, or branch head) is a named reference to a commit, where one can grow new commits. Branch can also mean active line of development, i.e. all commits reachable from branch tip. Local branches reside in refs/heads/ namespace, so e.g. fully qualified name of 'master' branch is 'refs/heads/master'.

Current branch in Git (meaning checked out branch, and branch where new commit will go) is the branch which is referenced by the HEAD ref. One can have HEAD pointing directly to a commit, rather than being symbolic reference; this situation of being on an anonymous unnamed branch is called detached HEAD ("git branch" shows that you are on '(no branch)').

In Mercurial there are anonymous branches (branch heads), and one can use bookmarks (via bookmark extension). Such bookmark branches are purely local, and those names were (up to version 1.6) not transferable using Mercurial. You can use rsync or scp to copy the .hg/bookmarks file to a remote repository. You can also use hg id -r <bookmark> <url> to get the revision id of a current tip of a bookmark.

Since 1.6 bookmarks can be pushed/pulled. The BookmarksExtension page has a section on Working With Remote Repositories. There is a difference in that in Mercurial bookmark names are global, while definition of 'remote' in Git describes also mapping of branch names from the names in remote repository to the names of local remote-tracking branches; for example refs/heads/*:refs/remotes/origin/* mapping means that one can find state of 'master' branch ('refs/heads/master') in the remote repository in the 'origin/master' remote-tracking branch ('refs/remotes/origin/master').

Mercurial has also so called named branches, where the branch name is embedded in a commit (in a changeset). Such name is global (transferred on fetch). Those branch names are permanently recorded as part of the changeset\u2019s metadata. With modern Mercurial you can close "named branch" and stop recording branch name. In this mechanism tips of branches are calculated on the fly.

Mercurial's "named branches" should in my opinion be called commit labels instead, because it is what they are. There are situations where "named branch" can have multiple tips (multiple childless commits), and can also consist of several disjoint parts of graph of revisions.

There is no equivalent of those Mercurial "embedded branches" in Git; moreover Git's philosophy is that while one can say that branch includes some commit, it doesn't mean that a commit belongs to some branch.

Note that Mercurial documentation still proposes to use separate clones (separate repositories) at least for long-lived branches (single branch per repository workflow), aka branching by cloning.

Branches in pushing

Mercurial by default pushes all heads. If you want to push a single branch (single head), you have to specify tip revision of the branch you want to push. You can specify branch tip by its revision number (local to repository), by revision identifier, by bookmark name (local to repository, doesn't get transferred), or by embedded branch name (named branch).

As far as I understand it, if you push a range of revisions that contain commits marked as being on some "named branch" in Mercurial parlance, you will have this "named branch" in the repository you push to. This means that names of such embedded branches ("named branches") are global (with respect to clones of given repository / project).

By default (subject to push.default configuration variable) "git push" or "git push <remote>" Git would push matching branches, i.e. only those local branches that have their equivalent already present in remote repository you push into. You can use --all option to git-push ("git push --all") to push all branches, you can use "git push <remote> <branch>" to push a given single branch, and you can use "git push <remote> HEAD" to push current branch.

All of the above assumes that Git isn't configured which branches to push via remote.<remotename>.push configuration variables.

Branches in fetching

Note: here I use Git terminology where "fetch" means downloading changes from remote repository without integrating those changes with local work. This is what "git fetch" and "hg pull" does.

If I understand it correctly, by default Mercurial fetches all heads from remote repository, but you can specify branch to fetch via "hg pull --rev <rev> <url>" or "hg pull <url>#<rev>" to get single branch. You can specify <rev> using revision identifier, "named branch" name (branch embedded in changelog), or bookmark name. Bookmark name however (at least currently) doesn't get transferred. All "named branches" revisions you get belong to get transferred. "hg pull" stores tips of branches it fetched as anonymous, unnamed heads.

In Git by default (for 'origin' remote created by "git clone", and for remotes created using "git remote add") "git fetch" (or "git fetch <remote>") gets all branches from remote repository (from refs/heads/ namespace), and stores them in refs/remotes/ namespace. This means for example that branch named 'master' (full name: 'refs/heads/master') in remote 'origin' would get stored (saved) as 'origin/master' remote-tracking branch (full name: 'refs/remotes/origin/master').

You can fetch single branch in Git by using git fetch <remote> <branch> - Git would store requested branch(es) in FETCH_HEAD, which is something similar to Mercurial unnamed heads.

Those are but examples of default cases of powerful refspec Git syntax: with refspecs you can specify and/or configure which branches one want to fetch, and where to store them. For example default "fetch all branches" case is represented by '+refs/heads/*:refs/remotes/origin/*' wildcard refspec, and "fetch single branch" is shorthand for 'refs/heads/<branch>:'. Refspecs are used to map names of branches (refs) in remote repository to local refs names. But you don't need to know (much) about refspecs to be able to work effectively with Git (thanks mainly to "git remote" command).

Personal opinion: I personally think that "named branches" (with branch names embedded in changeset metadata) in Mercurial are misguided design with its global namespace, especially for a distributed version control system. For example let's take case where both Alice and Bob have "named branch" named 'for-joe' in their repositories, branches which have nothing in common. In Joe's repository however those two branches would be mistreated as a single branch. So you have somehow come up with convention protecting against branch name clashes. This is not problem with Git, where in Joe's repository 'for-joe' branch from Alice would be 'alice/for-joe', and from Bob it would be 'bob/for-joe'. See also Separating branch name from branch identity issue raised on Mercurial wiki.

Mercurial's "bookmark branches" currently lack in-core distribution mechanism.

Differences:
This area is one of the main differences between Mercurial and Git, as james woodyatt and Steve Losh said in their answers. Mercurial, by default, uses anonymous lightweight codelines, which in its terminology are called "heads". Git uses lightweight named branches, with injective mapping to map names of branches in remote repository to names of remote-tracking branches. Git "forces" you to name branches (well, with exception of single unnamed branch, situation called detached HEAD), but I think this works better with branch-heavy workflows such as topic branch workflow, meaning multiple branches in single repository paradigm.

Naming revisions

In Git there are many ways of naming revisions (described e.g. in git rev-parse manpage):

  • The full SHA1 object name (40-byte hexadecimal string), or a substring of such that is unique within the repository
  • A symbolic ref name, e.g. 'master' (referring to 'master' branch), or 'v1.5.0' (referring to tag), or 'origin/next' (referring to remote-tracking branch)
  • A suffix ^ to revision parameter means the first parent of a commit object, ^n means n-th parent of a merge commit. A suffix ~n to revision parameter means n-th ancestor of a commit in straight first-parent line. Those suffixes can be combined, to form revision specifier following path from a symbolic reference, e.g. 'pu~3^2~3'
  • Output of "git describe", i.e. a closest tag, optionally followed by a dash and a number of commits, followed by a dash, a 'g', and an abbreviated object name, for example 'v1.6.5.1-75-g5bf8097'.

There are also revision specifiers involving reflog, not mentioned here. In Git each object, be it commit, tag, tree or blob has its SHA-1 identifier; there is special syntax like e.g. 'next:Documentation' or 'next:README' to refer to tree (directory) or blob (file contents) at specified revision.

Mercurial also has many ways of naming changesets (described e.g. in hg manpage):

  • A plain integer is treated as a revision number. One need to remember that revision numbers are local to given repository; in other repository they can be different.
  • Negative integers are treated as sequential offsets from the tip, with -1 denoting the tip, -2 denoting the revision prior to the tip, and so forth. They are also local to repository.
  • An unique revision identifier (40-digit hexadecimal string) or its unique prefix.
  • A tag name (symbolic name associated with given revision), or a bookmark name (with extension: symbolic name associated with given head, local to repository), or a "named branch" (commit label; revision given by "named branch" is tip (childless commit) of all commits with given commit label, with largest revision number if there are more than one such tip)
  • The reserved name "tip" is a special tag that always identifies the most recent revision.
  • The reserved name "null" indicates the null revision.
  • The reserved name "." indicates the working directory parent.

Differences
As you can see comparing above lists Mercurial offers revision numbers, local to repository, while Git doesn't. On the other hand Mercurial offers relative offsets only from 'tip' (current branch), which are local to repository (at least without ParentrevspecExtension), while Git allows to specify any commit following from any tip.

The most recent revision is named HEAD in Git, and "tip" in Mercurial; there is no null revision in Git. Both Mercurial and Git can have many root (can have more than one parentless commits; this is usually result of formerly separate projects joining).

See also: Many different kinds of revision specifiers article on Elijah's Blog (newren's).

Personal opinion: I think that revision numbers are overrated (at least for distributed development and/or nonlinear / branchy history). First, for a distributed version control system they have to be either local to repository, or require treating some repository in a special way as a central numbering authority. Second, larger projects, with longer history, can have number of revisions in 5 digits range so they are offer only slight advantage over shortened to 6-7 character revision identifiers, and imply strict ordering while revisions are only partially ordered (I mean here that revisions n and n+1 doesn't need to be parent and child).

Revision ranges

In Git revision ranges are topological. Commonly seen A..B syntax, which for linear history means revision range starting at A (but excluding A), and ending at B (i.e. range is open from below), is shorthand ("syntactic sugar") for ^A B, which for history traversing commands mean all commits reachable from B, excluding those reachable from A. This means that the behavior of A..B range is entirely predictable (and quite useful) even if A is not ancestor of B: A..B means then range of revisions from common ancestor of A and B (merge base) to revision B.

In Mercurial revision ranges are based on range of revision numbers. Range is specified using A:B syntax, and contrary to Git range acts as a closed interval. Also range B:A is the range A:B in reverse order, which is not the case in Git (but see below note on A...B syntax). But such simplicity comes with a price: revision range A:B makes sense only if A is ancestor of B or vice versa, i.e. with linear history; otherwise (I guess that) the range is unpredictable, and the result is local to repository (because revision numbers are local to repository).

This is fixed with Mercurial 1.6, which has new topological revision range, where 'A..B' (or 'A::B') is understood as the set of changesets that are both descendants of X and ancestors of Y. This is, I guess, equivalent to '--ancestry-path A..B' in Git.

Git also has notation A...B for symmetric difference of revisions; it means A B --not $(git merge-base A B), which means all commits reachable from either A or B, but excluding all commits reachable from both of them (reachable from common ancestors).

Renames

Mercurial uses rename tracking to deal with file renames. This means that the information about the fact that a file was renamed is saved at the commit time; in Mercurial this information is saved in the "enhanced diff" form in filelog (file revlog) metadata. The consequence of this is that you have to use hg rename / hg mv... or you need to remember to run hg addremove to do similarity based rename detection.

Git is unique among version control systems in that it uses rename detection to deal with file renames. This means that the fact that file was renamed is detected at time it is needed: when doing a merge, or when showing a diff (if requested / configured). This has the advantage that rename detection algorithm can be improved, and is not frozen at time of commit.

Both Git and Mercurial require using --follow option to follow renames when showing history of a single file. Both can follow renames when showing line-wise history of a file in git blame / hg annotate.

In Git the git blame command is able to follow code movement, also moving (or copying) code from one file to the other, even if the code movement is not part of wholesome file rename. As far as I know this feature is unique to Git (at the time of writing, October 2009).

Network protocols

Both Mercurial and Git have support for fetching from and pushing to repositories on the same filesystem, where repository URL is just a filesystem path to repository. Both also have support for fetching from bundle files.

Mercurial support fetching and pushing via SSH and via HTTP protocols. For SSH one needs an accessible shell account on the destination machine and a copy of hg installed / available. For HTTP access the hg-serve or Mercurial CGI script running is required, and Mercurial needs to be installed on server machine.

Git supports two kinds of protocols used to access remote repository:

  • "smart" protocols, which include access via SSH and via custom git:// protocol (by git-daemon), require having git installed on server. The exchange in those protocols consist of client and server negotiating about what objects they have in common, and then generating and sending a packfile. Modern Git includes support for "smart" HTTP protocol.
  • "dumb" protocols, which include HTTP and FTP (only for fetching), and HTTPS (for pushing via WebDAV), do not require git installed on server, but they do require that repository contains extra information generated by git update-server-info (usually run from a hook). The exchange consist of client walking the commit chain and downloading loose objects and packfiles as needed. The downside is that it downloads more than strictly required (e.g. in corner case when there is only single packfile it would get downloaded whole even when fetching only a few revisions), and that it can require many connections to finish.

Extending: scriptability vs extensions (plugins)

Mercurial is implemented in Python, with some core code written in C for performance. It provides API for writing extensions (plugins) as a way of adding extra features. Some of functionality, like "bookmark branches" or signing revisions, is provided in extensions distributed with Mercurial and requires turning it on.

Git is implemented in C, Perl and shell scripts. Git provides many low level commands (plumbing) suitable to use in scripts. The usual way of introducing new feature is to write it as Perl or shell script, and when user interface stabilizes rewrite it in C for performance, portability, and in the case of shell script avoiding corner cases (this procedure is called builtinification).

Git relies and is built around [repository] formats and [network] protocols. Instead of language bindings there are (partial or complete) reimplementations of Git in other languages (some of those are partially reimplementations, and partially wrappers around git commands): JGit (Java, used by EGit, Eclipse Git Plugin), Grit (Ruby), Dulwich (Python), git# (C#).


TL;DR

Inspired by Git for beginners: The definitive practical guide.

This is a compilation of information on using Mercurial for beginners for practical use.

Beginner - a programmer who has touched source control without understanding it very well.

Practical - covering situations that the majority of users often encounter - creating a repository, branching, merging, pulling/pushing from/to a remote repository, etc.

Notes:

  • Explain how to get something done rather than how something is implemented.
  • Deal with one question per answer.
  • Answer clearly and as concisely as possible.
  • Edit/extend an existing answer rather than create a new answer on the same topic.
  • Please provide a link to the the Mercurial wiki or the HG Book for people who want to learn more.

Questions:

Installation/Setup

Working with the code

Tagging, branching, releases, baselines

Other

Other Mercurial references

Answered By: Joakim Lundborg ( 14)

How do you configure it to ignore files?

Ignore is configured in a normal text file called .hgignore in the root of your repository. Add it just like a normal file with:

hg add .hgignore

There are two syntax options available for file matching, glob and regexp. glob is unix-like filename expansion and regexp is regular expressions. You activate each by adding syntax: glob or syntax: regexp on a line by itself. All lines following that will use that syntax, until the next syntax marker. You can have as many syntax markers as you want. The default syntax is regexp, so if you only use regexp you don't need any syntax marker.

You can add comments with #

Example:

# python temporary files
syntax: glob
*.pyc

#editor autosaves
*~

# temporary data
syntax: regexp
temp

Ignore only applies to unmanaged files (i.e. files that are not already checked in). To ignore files that are under version control, you can use the switches -I and -X.

I've heard a few places that one of the main ways distributed version control systems shine, is much better merging than traditional tools like SVN. Is this actually due to inherent differences in how the two systems work, or do specific DVCS implementations like Git/Mercurial just have cleverer merging algorithms than SVN?

Answered By: Spoike ( 322)

The claim of why merging is better in a DVCS than in Subversion was largely based on how branching and merge worked in Subversion a while ago. Subversion prior to 1.5.0 didn't store any information about when branches were merged, thus when you wanted to merge you had to specify which range of revisions that had to be merged.

So why did Subversion merges suck?

Ponder this example:

      1   2   4     6     8
trunk o-->o-->o---->o---->o
       \
        \   3     5     7
b1       +->o---->o---->o

When we want to merge b1's changes into the trunk we'd issue the following command, while standing on a folder that has trunk checked out:

svn merge -r 3:7 {link to branch b1}

… which will attempt to merge the changes from b1 into your local working directory. And then you commit the changes after you resolve any conflicts and tested the result. When you commit the revision tree would look like this:

      1   2   4     6     8   9
trunk o-->o-->o---->o---->o-->o      "the merge commit is at r9"
       \
        \   3     5     7
b1       +->o---->o---->o

However this way of specifying ranges of revisions gets quickly out of hand when the version tree grows as subversion didn't have any meta data on when and what revisions got merged together. Ponder on what happens later:

           12        14
trunk  …-->o-------->o
                                     "Okay, so when did we merge last time?"
              13        15
b1     …----->o-------->o

This is largely an issue by the repository design that Subversion has, in order to create a branch you need to create a new virtual directory in the repository which will house a copy of the trunk but it doesn't store any information regarding when and what things got merged back in. That will lead to nasty merge conflicts at times. What was even worse is that Subversion used two-way merging by default, which has some crippling limitations in automatic merging when two branch heads are not compared with their common ancestor.

To mitigate this Subversion now stores meta data for branch and merge. That would solve all problems right?

And oh, by the way, Subversion still sucks…

On a centralized system, like subversion, virtual directories suck. Why? Because everyone has access to view them… even the garbage experimental ones. Branching is good if you want to experiment but you don't want to see everyones' and their aunts experimentation. This is serious cognitive noise. The more branches you add, the more crap you'll get to see.

The more public branches you have in a repository the harder it will be to keep track of all the different branches. So the question you'll have is if the branch is still in development or if it is really dead which is hard to tell in any centralized version control system.

Most of the time, from what I've seen, an organization will default to use one big branch anyway. Which is a shame because that in turn will be difficult to keep track of testing and release versions, and whatever else good comes from branching.

So why are DVCS, such as Git, Mercurial and Bazaar, better than Subversion at branching and merging?

There is a very simple reason why: branching is a first-class concept. There are no virtual directories by design and branches are hard objects in DVCS which it needs to be such in order to work simply with synchronization of repositories (i.e. push and pull).

The first thing you do when you work with a DVCS is to clone repositories (git's clone, hg's clone and bzr's branch). Cloning is conceptually the same thing as creating a branch in version control. Some call this forking or branching (although the latter is often also used to refer to co-located branches), but it's just the same thing. Every user runs their own repository which means you have a per-user branching going on.

The version structure is not a tree, but rather a graph instead. More specifically a directed acyclic graph (DAG, meaning a graph that doesn't have any cycles). You really don't need to dwell into the specifics of a DAG other than each commit has one or more parent references (which what the commit was based on). So the following graphs will show the arrows between revisions in reverse because of this.

A very simple example of merging would be this; imagine a central repository called origin and a user, Alice, cloning the repository to her machine.

         a…   b…   c…
origin   o<---o<---o
                   ^master
         |
         | clone
         v

         a…   b…   c…
alice    o<---o<---o
                   ^master
                   ^origin/master

What happens during a clone is that every revision is copied to Alice exactly as they were (which is validated by the uniquely identifiable hash-id's), and marks where the origin's branches are at.

Alice then works on her repo, committing in her own repository and decides to push her changes:

         a…   b…   c…
origin   o<---o<---o
                   ^ master

              "what'll happen after a push?"


         a…   b…   c…   d…   e…
alice    o<---o<---o<---o<---o
                             ^master
                   ^origin/master

The solution is rather simple, the only thing that the origin repository needs to do is to take in all the new revisions and move it's branch to the newest revision (which git calls "fast-forward"):

         a…   b…   c…   d…   e…
origin   o<---o<---o<---o<---o
                             ^ master

         a…   b…   c…   d…   e…
alice    o<---o<---o<---o<---o
                             ^master
                             ^origin/master

The use case, which I illustrated above, doesn't even need to merge anything. So the issue really isn't with merging algorithms since three-way merge algorithm is pretty much the same between all version control systems. The issue is more about structure than anything.

So how about you show me an example that has a real merge?

Admittedly the above example is a very simple use case, so lets do a much more twisted one albeit a more common one. Remember that origin started out with three revisions? Well, the guy who did them, lets call him Bob, has been working on his own and made a commit on his own repository:

         a…   b…   c…   f…
bob      o<---o<---o<---o
                        ^ master
                   ^ origin/master

                   "can Bob push his changes?" 

         a…   b…   c…   d…   e…
origin   o<---o<---o<---o<---o
                             ^ master

Now Bob can't push his changes directly to the origin repository. How the system detects this is by checking if Bob's revisions directly descents from origin's, which in this case doesn't. Any attempt to push will result into the system saying something akin to "Uh... I'm afraid can't let you do that Bob."

So Bob has to pull-in and then merge the changes (with git's pull; or hg's pull and merge; or bzr's merge). This is a two-step process. First Bob has to fetch the new revisions, which will copy them as they are from the origin repository. We can now see that the graph diverges:

                        v master
         a…   b…   c…   f…
bob      o<---o<---o<---o
                   ^
                   |    d…   e…
                   +----o<---o
                             ^ origin/master

         a…   b…   c…   d…   e…
origin   o<---o<---o<---o<---o
                             ^ master

The second step of the pull process is to merge the diverging tips and make a commit of the result:

                                 v master
         a…   b…   c…   f…       1…
bob      o<---o<---o<---o<-------o
                   ^             |
                   |    d…   e…  |
                   +----o<---o<--+
                             ^ origin/master

Hopefully the merge won't run into conflicts (if you anticipate them you can do the two steps manually in git with fetch and merge). What later needs to be done is to push in those changes again to origin, which will result into a fast-forward merge since the merge commit is a direct descendant of the latest in the origin repository:

                                 v origin/master
                                 v master
         a…   b…   c…   f…       1…
bob      o<---o<---o<---o<-------o
                   ^             |
                   |    d…   e…  |
                   +----o<---o<--+

                                 v master
         a…   b…   c…   f…       1…
origin   o<---o<---o<---o<-------o
                   ^             |
                   |    d…   e…  |
                   +----o<---o<--+

There is another option to merge in git and hg, called rebase, which'll move Bob's changes to after the newest changes. Since I don't want this answer to be any more verbose I'll let you read the git, mercurial or bazaar docs about that instead.

As an exercise for the reader, try drawing out how it'll work out with another user involved. It is similarly done as the example above with Bob. Merging between repositories is easier than what you'd think because all the revisions/commits are uniquely identifiable.

There is also the issue of sending patches between each developer, that was a huge problem in Subversion which is mitigated in git, hg and bzr by uniquely identifiable revisions. Once someone has merged his changes (i.e. made a merge commit) and sends it for everyone else in the team to consume by either pushing to a central repository or sending patches then they don't have to worry about the merge, because it already happened. Martin Fowler calls this way of working promiscuous integration.

Because the structure is different from Subversion, by instead employing a DAG, it enables branching and merging to be done in an easier manner not only for the system but for the user as well.

171
Paul Nathan

I want mercurial to remove several files from the current state of the repository. However, I want the files to exist in prior history.

How do forget and remove differ, and can they do what I want?

Answered By: Ry4an ( 182)

'hg forget' is just shorthand for 'hg remove -Af'. From the 'hg remove' help:

...and -Af can be used to remove files from the next revision without deleting them from the working directory.

Bottom line: 'remove' deletes the file from your working copy on disk (unless you uses -Af) and 'forget' doesn't.

I am currently using TortoiseHg (Mercurial) and accidentally committed an incorrect commit message. How do I go about editing this commit message in the repository?

Answered By: Thilo ( 110)

You can rollback the last commit (but only the last one) and then reapply it.

Important: this permanently removes the latest commit (or pull). So if you've done a hg update that commit is no longer in your working directory then it's gone forever. So make a copy first.

Other than that, you cannot change the repository's history (including commit messages), because everything in there is check-summed. The only thing you could do is prune the history after a given changeset, and then recreate it accordingly.

None of this will work if you have already published your changes (unless you can get hold of all copies), and you also cannot "rewrite history" that include GPG-signed commits (by other people).