Why is Git so Fast?
In the DVCS world, Git has a reputation for being really
fast. I am curious about how Git got this way.
When I started thinking about this question, seven different
answers came to my mind. Some of those answers seem more interesting or
correct than others.
One: Maybe Git is fast simply because it's a DVCS.
There's probably some truth here. One of the main benefits
touted by the DVCS fanatics is the extra performance you get when everything is local.
But this answer isn't enough. Maybe it explains why Git is
faster than Subversion, but it doesn't explain why Git is so often described as
being faster than the other DVCSs.
Two: Maybe Git is fast because Linus Torvalds is so smart.
This might very well be correct. But it's not interesting.
Fine. So Linus is smarter than all of us. But how did he
use those smarts to make Git so fast? What are the details?
Three: Maybe Git is fast because it's written in C instead of one of those
newfangled higher-level languages.
Nah, probably not. Lots of people have written fast
software in C#, Java or Python.
And lots of people have written really slow software in
traditional native languages like C/C++. Adobe writes most of their stuff in
C++, and they don't have any trouble making sure that release N+1 is slower
than release N.
Four: Maybe Git is fast because being fast is the primary goal for Git.
This is another one of those high-level answers that is probably
correct but doesn't have the kind of details about which I am curious.
Still. Take some time to read through the archives of the Git developers
mailing list. These people spend a LOT of time talking about performance.
Five: Maybe Git is fast because it does less.
One of my favorite recent blog entries is this
piece which claims that the way to make code faster is to have it do less.
Predictably, people came out of the woodwork to say how
wrong this guy was. That's what happens to almost any blog entry about
performance tuning or optimization. Readers ignore anything correct in the
article and quibble about little stuff.
But this guy was essentially correct. One way to make
software faster is to make it do less.
For example, the way you get something in the Git index is
you use the "git add" command. Git doesn't scan your working copy for changed
files unless you explicitly tell it to. This can be a pretty big performance
win for huge trees. Even when you use the "remember the timestamp" trick,
detecting modified files in a really big tree can take a noticeable amount of time.
Or maybe Git's shortcut for handling renames is faster than
doing them more correctly, like Bazaar does.
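To make the "remember the timestamp" trick concrete, here is a minimal sketch of the general idea in Python. It is not Git's code, and the function names (snapshot, possibly_modified) are made up for illustration. The assumption is simply that a tool caches each file's modification time and size, then later treats any file whose stat data differs as possibly modified.

```python
import os


def snapshot(root):
    """Record (mtime_ns, size) for every file under root.

    Hypothetical helper: stands in for whatever cache a version
    control tool keeps about the working copy.
    """
    cache = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            cache[path] = (st.st_mtime_ns, st.st_size)
    return cache


def possibly_modified(root, cache):
    """Return paths that are new or whose stat data differs from the cache.

    No file contents are read, but every file still costs one stat call.
    """
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            if cache.get(path) != (st.st_mtime_ns, st.st_size):
                changed.append(path)
    return sorted(changed)
```

Notice that the trick avoids reading file contents, but it still has to walk the whole tree and stat every file. On a huge tree, that walk is where the time goes.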
Six: Maybe Git is fast because it doesn't use much external code.
Very often, when you are facing a decision to use somebody
else's code or write it yourself, there is a performance tradeoff. Not always,
but often. Maybe the third party code is just slower than the code you could
write yourself if you had time to do it. Or maybe there is an impedance
mismatch between the API of the external library and your own architecture.
This can happen even when the library is very high quality.
For example, consider libcurl. This
is a great library. Tons of people use it. But it does have one
trait that will cause performance problems for some users: When using
libcurl to fetch an object, it wants to own the buffer. In some situations,
this can end up forcing you to use extra memcpys or temporary files. Low-level
calls like send() and recv() let the caller own the loop and the buffer
because that is the best way to avoid making extra copies of the data on disk
or in memory.
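Here is a rough sketch of what "the caller owns the loop and the buffer" looks like, written in Python rather than C. This is not libcurl's API and not anything from Git. The helper name read_into is hypothetical. It uses Python's socket.recv_into, which writes received bytes directly into memory the caller already allocated, so no intermediate copy is needed.

```python
import socket


def read_into(sock, buf):
    """Fill a caller-owned buffer from sock and return the byte count.

    Hypothetical helper for illustration. The caller allocates buf and
    decides where the data lands; this function only drives the loop.
    """
    view = memoryview(buf)  # slicing a memoryview does not copy
    total = 0
    while total < len(buf):
        n = sock.recv_into(view[total:])
        if n == 0:  # peer closed the connection
            break
        total += n
    return total
```

A caller might allocate one bytearray up front and reuse it for every object it fetches. When a library insists on handing you its own buffer instead, you end up copying the data into the place you actually wanted it.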
People make fun of those with NIH Syndrome, but my
observation is that folks who suffer from this disorder tend to create faster
software, even if they also tend to ship everything late. :-)
Maybe Git is fast because every time they faced one of these
"buy vs. build" choices, they decided to just write it themselves.
Seven: Maybe Git isn't really that fast.
If there is one thing I've learned about version control
it's that everybody's situation is different. It is quite likely that Git is a
lot faster for some scenarios than it is for others.
How does Git handle really large trees? Git was designed
primarily to support the efforts of the Linux kernel developers. A lot of people
think the Linux kernel is a large tree, but it's really not. Many enterprise
configuration management repositories are FAR bigger than the Linux kernel.
This week's version control blog entry raises more questions
than answers. I'm not a Git user, nor have I looked much at its code, so I
don't really know why it's so fast. I'm just curious. If you have better
answers than mine (and I admit that's a low hurdle), feel free to send them to
me or post them in my comments.
But FWIW, I have decided it is time for me to become a Git
user. When I was writing about Git a few weeks ago, a lot of Git users kept
telling me I just don't get it. I've spent more time thinking about version
control implementation and design than most folks, so I tend to think I
actually do "get it". But my curiosity is piqued, and I hate to pass up an
opportunity to learn something, so I'm going to give it a try. I've got a
small project here at SourceGear that I work on part-time with a couple of other
people. We've decided to switch to Git and see how it goes. I'll let you know
what I find out.