Tuesday, September 23, 2014

Watch Your Back, Git

Changeset evolution is a big deal.  But nobody seems to be talking about it.  Well, except for this guy:


But even he says it's a small set of incremental improvements.  This is not small.  But it is all a little abstract right now.  Let's write a use case.

Suppose we have a small group of developers (say, Alice, Bob, Carol, and David).  They want to do some work on a repository, but don't want to share that work with the public (if you're a closed-source shop, substitute "rest of the team" for "public") until it's finished.  If it helps your imagination, you may assume the work is a fix for a security vulnerability that hasn't gone public yet.

The obvious first step is to clone the public ("company-wide") repo and set up a private repo.  Then they can all push to and pull from the private repo.  Individual team members may also want to set up their own clones to avoid accidentally pushing to the wrong repo, but that's simple enough.

Because this is a fairly serious security issue, our developers are working fast and pushing often.  The resulting history is a bit of a mess.  Alice wants to do an interactive rebase to tidy things up.  But it turns out Git actually makes this pretty difficult.  To be fair, so does Mercurial, in its stable incarnation.

She can do the rebase locally, but then the only way to get it onto the server is git push --force.  And that's rarely a good idea.  If she does so, Bob, Carol, and David will need to do their own history fixups.  That process can easily eat up a whole afternoon if a developer hasn't pushed in a while, the rebase is extensive, or both.  We did say they were pushing often, but maybe they don't all follow the same workflow.

More importantly, Bob, Carol, and David will need to know they need to do history fixups, or they'll just merge the old history back in and ruin all of Alice's careful work.  Of course, developers should be communicating with each other regularly.  Developers should also be writing unit tests and documentation.  Nothing is ever perfect.
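
To make that failure mode concrete, here is a small Python sketch.  It is a toy model of a commit graph, not anything Git actually runs, and the commit names (wip1, tidy, and so on) are made up for illustration.  It shows how one unwitting merge brings the rewritten commits back onto the shared branch.

```python
def ancestors(parents, node):
    """Return node plus everything reachable from it through parent links."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


# Hypothetical commit ids.  Each commit maps to a tuple of its parents.
parents = {
    "base": (),
    "wip1": ("base",),   # messy original history
    "wip2": ("wip1",),
    "tidy": ("base",),   # Alice's rebased replacement for wip1 + wip2
    "bobs": ("wip2",),   # Bob's work, still based on the old history
}

# After Alice's forced push, the shared branch holds only the tidy history.
server_before = ancestors(parents, "tidy")

# Bob pulls, merges, and pushes without realizing history was rewritten.
parents["merge"] = ("tidy", "bobs")
server_after = ancestors(parents, "merge")

# Commits that Alice removed and Bob's merge brought back.
resurrected = server_after - server_before - {"merge", "bobs"}
print(sorted(resurrected))
```

The merge commit has both the tidy history and Bob's old base as ancestors, so everything Alice squashed away is reachable again.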

Enter changeset evolution.  Under Mercurial, Alice can perform the equivalent of her rebase with a series of simple commands (like prune, fold, and reorder).  She can then push it normally.  The other three developers will get a somewhat messy history out of this, but they can just run hg evolve to clean it up (semi)automatically, and they can still fix it manually if they really want to.  In particular, Mercurial will prevent pushing from a messed-up history.

Historically, Mercurial has resisted history rewriting, preferring to mark things as unwanted and quietly forget about them (cf. hg ci --close-branch, which just marks a head as closed).  Then MQ became popular.  MQ is a built-in extension for managing a stack (or "queue," in the LIFO sense) of patches.  It can commit and uncommit them to local history, and thereby provides basic history editing.  Lately, however, the Mercurial developers have expressed dissatisfaction with MQ and now favor several newer tools.  In particular, we have hg histedit as a built-in extension.  That's roughly the equivalent of an interactive rebase.  We also have hg rebase for non-interactive rebase.  hg strip can be used to drop a changeset and its descendants with no further ado.  And of course, you can always do hg ci --amend to quickly fix the parent of the working directory (i.e. HEAD, in Git parlance, or . in hg revsets).

All this history modification is nice, but it's somewhat risky.  If you modify public history, it's easy to make a fine mess.  So Mercurial uses so-called "phases" to track whether a revision has been pushed to a public server yet.  Changesets in the "public" phase are immutable, though you can manually force them back into the "draft" phase if necessary.  But Mercurial also allows some servers to be flagged as non-publishing, which means pushing to them doesn't count.
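
If the phase rules feel abstract, the following Python sketch may help.  It is a toy model written for this post, not Mercurial's real data structures: the PhaseRepo class and its method names are invented, and it leaves out the third "secret" phase entirely.

```python
from enum import IntEnum


class Phase(IntEnum):
    PUBLIC = 0  # seen by a publishing server; treated as immutable
    DRAFT = 1   # local, or only on a non-publishing server; may be rewritten


class PhaseRepo:
    """Toy model of phase tracking (hypothetical, for illustration only)."""

    def __init__(self):
        self.phases = {}  # changeset id -> Phase

    def commit(self, node):
        # New changesets start out as drafts.
        self.phases[node] = Phase.DRAFT

    def push(self, nodes, publishing=True):
        # Pushing to a publishing server makes the changesets public.
        # Pushing to a non-publishing server leaves them as drafts.
        if publishing:
            for node in nodes:
                self.phases[node] = Phase.PUBLIC

    def rewrite(self, node):
        # History editing is refused for public changesets.
        if self.phases[node] == Phase.PUBLIC:
            raise PermissionError(f"{node} is public and therefore immutable")
        return f"{node}'"

    def force_draft(self, node):
        # The manual escape hatch: push a changeset back to draft.
        self.phases[node] = Phase.DRAFT


repo = PhaseRepo()
repo.commit("a")
repo.commit("b")
repo.push(["a"], publishing=True)    # the company-wide repo
repo.push(["b"], publishing=False)   # the team's private repo
```

Here a can no longer be rewritten unless someone calls force_draft first, while b is still fair game.  In real Mercurial, the escape hatch is hg phase --draft --force, and a server is made non-publishing by setting publish = False in the [phases] section of its configuration.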

Right now, working with non-publishing servers is unnecessarily cumbersome.  You see, Mercurial does not discard "unreachable" changesets the way Git does.  So an hg push --force just creates a new remote head.  It's basically a detached HEAD except that it isn't eligible for garbage collection (which I believe doesn't even exist under Mercurial).  On the other hand, Mercurial makes it easy to find these heads with the hg heads command.  To fix this issue, you need to manually strip the old head server-side, and then do so again on everyone else's local copy, possibly with additional history fixups.  Alternatively, you can manually close the head with hg ci --close-branch, but all that really does is hide the head from hg heads; the history still appears in hg log and the revision DAG.

Changeset evolution resolves this issue by returning to the append-only history model.  Pruning a changeset does not delete it; it simply flags it as obsolete.  If the pruned changeset has no non-obsolete descendants, it is hidden from the repository history.  The same obsolescence marker is used for all of the other new history rewrites.  These markers are pushable and pullable, though Mercurial will try to avoid pulling or pushing obsolete changesets unless absolutely necessary or the user manually requests it.  These markers are how hg evolve knows what to do.
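
Here is a rough Python sketch of that idea.  Again, this is a toy model under my own assumptions, not how Mercurial stores anything: the EvolveRepo class, its methods, and the changeset names are all hypothetical.  The point is only that rewrites append markers instead of deleting history, and that visibility can be computed from those markers.

```python
class EvolveRepo:
    """Toy append-only history with obsolescence markers (illustrative only)."""

    def __init__(self):
        self.parents = {}  # changeset id -> parent id (None for the root)
        self.markers = []  # (old id, tuple of successor ids); append-only

    def commit(self, node, parent=None):
        self.parents[node] = parent

    def prune(self, node):
        # Pruning records a marker with no successors; nothing is deleted.
        self.markers.append((node, ()))

    def amend(self, old, new):
        # A rewrite adds the new changeset and marks the old one as replaced.
        self.commit(new, self.parents[old])
        self.markers.append((old, (new,)))

    def obsolete(self):
        return {old for old, _ in self.markers}

    def descendants(self, node):
        found = set()
        frontier = [node]
        while frontier:
            current = frontier.pop()
            for child, parent in self.parents.items():
                if parent == current and child not in found:
                    found.add(child)
                    frontier.append(child)
        return found

    def visible(self):
        # An obsolete changeset stays visible while a non-obsolete
        # descendant still sits on top of it.
        obsolete = self.obsolete()
        hidden = {n for n in obsolete if self.descendants(n) <= obsolete}
        return set(self.parents) - hidden

    def needs_evolve(self):
        # Non-obsolete changesets that still have an obsolete ancestor.
        obsolete = self.obsolete()
        return {d for n in obsolete for d in self.descendants(n)} - obsolete


repo = EvolveRepo()
repo.commit("a")
repo.commit("b", "a")
repo.commit("c", "b")
repo.amend("b", "b2")  # Alice rewrites b; c still sits on the old b
```

After the amend, the old b is obsolete but stays visible because c still depends on it, and c is exactly the sort of changeset hg evolve would offer to move.  Since the marker list travels with pushes and pulls, everyone else's copy can work out the same thing.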

When changeset evolution hits stable, Mercurial will have a significant advantage over Git in terms of history rewriting.