Monday, February 23, 2009
On Git's lack of respect for immutability and the Best Practices for a DVCS
I learned something very important from the feedback after
my entry last week on Git's
index. Here's what I learned:
Suppose I wrote a 300 page book
describing all the great things about Git and why it is so awesome.
Further suppose that on page 295
near the bottom, I include a one-sentence mention of a way that I think Git
might change for the better.
Further suppose that I wrote that
sentence in Klingon. And then I encrypted it with Schneier's latest cipher,
wrapped it in base64 encoding, ran it through rot13 and then pasted it into the
book.
If I did this, the primary response
from the Git user community would be: "Eric's new book says that Git sucks.
He doesn't get it."
Trust me folks -- I get it. Commits to a DVCS are
different. When you commit to a private instance of the repository, you don't
"break the build". The rules and guidelines for a DVCS are different from the
ones for a centralized system.
Best Practices
But some of the best practices are the same. Here's my
off-the-cuff sloppy definition of a "best practice":
A best practice is a guideline that
can be followed lots of times by lots of different people in lots of different
situations with minimal likelihood of causing pain to the team.
Actually, I want to give TWO definitions. Here's another
one, speaking as a source control vendor:
A best practice is a guideline that
I can give to our customers to minimize the likelihood that they will need to
call our tech support staff.
A technique can be "really cool" or "very powerful" and
still not qualify for any reasonable person's definition of "best practice".
I stand by my original claims. I think "git add -p" is
"really cool", but it doesn't qualify as a "best practice". It allows the developer
to commit a combination of changes that never existed in the working copy, so
nobody has ever built or tested it. Yes, that commit happens in a private
instance of the repo, but that code is eligible to be pushed into another
instance.
Is there a good outcome here?
Suppose I use "git add -p" to stage and then commit some code that doesn't
even compile. What can happen?
- Maybe this changeset never escapes my private repository
instance. In that case, it has caused no harm. But it has also caused no
benefit.
- Maybe my next checkin fixes the build. So now the
offending changeset is less likely to cause problems, because the fix will
get pushed as well. But this scenario is equivalent to the centralized
case where I break the build but fix it before anybody finds out. It's
not very harmful, but it's not very helpful either.
- Maybe I later use Git's history rewriting features to
eliminate the offending changeset, replacing a chain of small changesets
with one larger one that has been well-tested. In this scenario, I have
eliminated all the potentially harmful effects, since the DAG will not
have any nodes that are "broken". But now I have other concerns.
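To make that last scenario concrete, here is a minimal Python sketch of a toy history. It is not Git's real object model: the `Changeset` class, the `builds` flag, and the `squash` helper are hypothetical names invented for illustration. The only assumption borrowed from Git is that a changeset's id is derived from its content and its parents, so a rewritten changeset is a new node rather than an edited old one.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    """Hypothetical toy changeset; not Git's real object format."""
    parents: tuple  # ids of the parent changesets
    diff: str       # stand-in for the change content
    builds: bool    # whether the tree compiles at this changeset

    @property
    def id(self) -> str:
        # The id depends on the parents and the content.
        # `builds` describes the resulting tree, so it stays out of the hash.
        payload = "|".join(self.parents) + "\n" + self.diff
        return hashlib.sha1(payload.encode()).hexdigest()[:8]


def squash(base, chain):
    """Replace a chain of changesets with one combined changeset on top of base."""
    combined = "\n".join(c.diff for c in chain)
    # The combined changeset builds if the last one in the chain did.
    return Changeset((base.id,), combined, chain[-1].builds)


root = Changeset((), "initial import", True)
broken = Changeset((root.id,), "half of a feature", False)
fixed = Changeset((broken.id,), "rest of the feature", True)

original_history = [root, broken, fixed]
squashed = squash(root, [broken, fixed])
rewritten_history = [root, squashed]

# The rewritten history has no broken node, but its tip is a different node.
print(all(c.builds for c in original_history))
print(all(c.builds for c in rewritten_history))
print(fixed.id, squashed.id)
```

In this toy model the rewritten history contains no "broken" node, but the squashed changeset has a different id from anything in the original chain. Anyone who already holds the old nodes now holds history that the rewritten DAG no longer contains.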
Immutability
The issue of rewriting history is perhaps my biggest
philosophical objection to the way Git works. Call me old fashioned if you
like, but I believe changesets and the history of the repository should be
immutable. Version control features that alter history make me squirm.
My own product
supports an "Obliterate" feature and I hate it. I understand why it's there,
but I still wish it wasn't. One thing I've learned from twelve years of
supporting version control products is that customers will find a way to misuse
things.
The purpose of Obliterate is to help with that once-a-year
situation where you really screwed up and checked in something that should
never have been in the repository and absolutely must be removed. But every
now and then we get a tech support call from somebody who is using Obliterate every
day. Those are the days when I want to ship the product with that feature
locked and only enable it for customers where every developer has passed a
written exam.
Think about it. Even if you love Git's ability to rewrite
history, does this sound to you like a "best practice"? Or does it
sound like a quick way to get a bunch of geeks addicted to recreational
pharmaceuticals?
Sandboxes
Like I said, I get it. A DVCS gives me a private sandbox,
so I can have more freedom while I play. It's "really cool" that I can kick
and throw sand without hurting the other kids. But that doesn't mean it's a
"best practice".
Conceptually, my private instance of the repository is still
part of a larger whole. The entire repository may not exist on any one
machine, but it exists in concept. It is one big Directed Acyclic Graph. When
I use "git add -p" and check in something that doesn't compile, my offending
commit is conceptually still a member of that DAG.
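A rough Python sketch of why that membership matters, under loudly stated assumptions: the `parents` map, the `ancestors` and `push` functions, and the node names A, B, and C are all hypothetical, and this models only the graph, not Git's actual transfer protocol. The point it illustrates is that pushing a changeset carries its ancestors along with it.

```python
def ancestors(tip, parents):
    """Return tip plus every changeset reachable from it through parent links."""
    seen, stack = set(), [tip]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def push(tip, parents, remote):
    """Copy tip and all of its ancestors into the remote instance (a set of ids)."""
    remote |= ancestors(tip, parents)


# Toy private history: B is the changeset that does not compile, C fixes it.
parents = {"A": [], "B": ["A"], "C": ["B"]}

shared = {"A"}              # what the other instance already has
push("C", parents, shared)  # pushing the fix also delivers the broken node
print(sorted(shared))
```

After the push, the shared instance holds B as well as C. The broken changeset never left the DAG; it was only waiting in my sandbox.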
The best practices for a DVCS are built around this
principle: The extra freedom provided by a private sandbox should be held in
the proper tension with a measure of respect for the entire DAG.