Video of my presentation from OSCON 2010

For those who are interested, we've posted the video of my presentation at OSCON on YouTube.

I had a few problems when displaying my slide deck at the conference.  When I'm speaking at an event, I usually like to use whatever equipment is provided.  To be assured of compatibility between my MacBook Pro and the projector, I would need to bring [what seems like] 23 different video adapters.  It's easier to just bring my slide deck on a thumb drive.

The email from the conference organizers told us there would be "Dell laptops" in the room.  I remember thinking how boneheaded it was of them to be running Windows at the Open Source convention, but I complied and brought my slides as a PowerPoint file.

And then I got there and discovered that I was the one being a bonehead for assuming that "Dell laptop" == "Windows + Office".  Actually, those Dell laptops were running Linux with OpenOffice.org.  Anyway, OO.org imported my .pptx file, but it botched the formatting in some rather unexpected and entertaining ways.

Moving Forward

Since OSCON ended three weeks ago, folks on our team have been taking their summer vacations, but we've still made some good progress:

  • After hearing lots of (well deserved) complaints from people trying to build 64-bit Veracity, we expanded our continuous integration build farm to do both 32 and 64 bit builds, debug and release, on all our platforms.

  • We had just missed our goal of dogfooding Veracity's bug-tracking features before OSCON, but after another round of improvements to the Web UI stuff, now we're using Veracity not just for source control, but also for project tracking.

  • We implemented Mercurial-style version numbers.  They're specific to one instance of a repo, but still kind of handy.

  • We started work on letting Veracity run through mainstream web servers (instead of only using its embedded web server).

  • We did lots of bug fixes, including some deep polishing and testing work on patterns for include/exclude settings.

  • I've been working in a private branch, focused mostly on improving performance:

    • Every changeset record has a blob list which is used for making things like push/pull and incremental indexing efficient.  For changesets which are a DAG merge (more than one parent), we need to normalize that blob list to ensure that the exact same list is constructed on each side of the merge.  Our previous normalization code was additive.  It walked the DAG back to the lowest common ancestor and added any blob which wasn't present on both sides.  Gradually, this caused those blob lists to keep getting bigger and bigger, which turned out to be a nasty performance problem that gets worse as the repo grows.  So, I switched the normalization code to remove any blob which was present in the blob list of any ancestor.  This is a lot harder to calculate, but it results in a much tighter list.

    • The changeset record for a database DAG includes a delta.  When that changeset is a merge, the delta is calculated against the lowest common ancestor of the two parents.  However, when it comes time to store that delta for later use by the indexing code, it would be better to calculate an equivalent delta against one of the two parents.

    • In a Veracity database, every record has two fields:  recid and rectype.  However, some of our databases just don't need both of these fields.  For example, recid is really only useful if you plan to modify records, but the audit DBs are filled with records that never get modified.  Similarly, if a DB only has one record type, we don't need every single record to have a field reminding us what the name of that type is.  So, I made a bunch of changes to allow a Veracity DB to exclude one or both of these fields.  Eliminating the need to store, retrieve, index and obey these superfluous fields resulted in a nice perf increase.

    • I went through and made dozens of little optimizations in the indexer.  Remember to always use SQLite's prepared statements in loops.  Make sure every blob getting indexed only gets loaded once.  Tune the hash table which represents JSON objects.

    • I found and fixed a few GC rooting bugs in our SpiderMonkey code.  BTW, I can't wait until we can upgrade to a new and improved version of the JS engine.  I greatly dislike the fact that SpiderMonkey doesn't have a wider int.

    • Unfortunately, some of my changes break compatibility, so I've been writing a script to migrate all our data.  This week I'll merge with the trunk and we'll do what we call a "repository reboot".

    • This firehose of detail is mostly just the ramblings of yet another blogger who is under the delusion that anybody cares about the mundane elements of his day.  Which reminds me, Thursday morning for breakfast I had iced coffee with an omelet made of red peppers, Portobello mushrooms, and provolone.  Anyway, on the off chance that anything here wants to get discussed, meet me on the Veracity mailing list.
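
For anyone who wants the blob list change spelled out, here is a minimal sketch of the subtractive normalization idea in Python.  It is not Veracity's actual C code.  The function names, the `parents` map and the `blob_lists` map are all assumptions made up for illustration.  The idea is simply: collect every ancestor of the merge changeset, then drop any blob that already appears in an ancestor's blob list.

```python
def ancestors(parents, node):
    """Return every changeset reachable from `node` by following parent links."""
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def normalize_blob_list(parents, blob_lists, node, candidate_blobs):
    """Hypothetical subtractive normalization: keep only the blobs that no
    ancestor of `node` already lists.  The result is sorted so both sides
    of a merge build exactly the same list."""
    inherited = set()
    for ancestor in ancestors(parents, node):
        inherited |= blob_lists.get(ancestor, set())
    return sorted(set(candidate_blobs) - inherited)


# A tiny DAG: "m" merges "a" and "b", which both descend from "root".
parents = {"root": [], "a": ["root"], "b": ["root"], "m": ["a", "b"]}
blob_lists = {"root": {"b1"}, "a": {"b2"}, "b": {"b3"}}

print(normalize_blob_list(parents, blob_lists, "m", {"b1", "b2", "b3", "b4"}))
```

This naive version walks the whole ancestry on every call, which would be far too slow on a real repo.  It only shows why the resulting list stays tight instead of growing with every merge.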

After things settle down just a bit more, we'll be ready to start publishing nightly tarballs.

 

Slides from my presentation at OSCON 2010

Several folks have asked for a copy of the slides from my talk at OSCON last week, so here they are (PDF, 2 MB).  They might be a little hard to follow without the narrative that goes with them.  A videotape of the talk will be posted in a week or so.

Thanks to all who attended my presentation.  The turnout was great, and folks seemed quite enthusiastic about Veracity.

My apologies to the Prophet and SD developers (one of whom attended my talk) for neglecting to mention them.  A silly oversight on my part.

I was especially appreciative of the attendance and expressions of support from several members of the original Subversion development team.  Subversion is one of the most successful version control tools ever, and I watched its early development closely enough to develop an admiration for the folks who built it.  So it was a very pleasant surprise to find a few "celebrities" in attendance at my session.  :-)

 

Veracity Technology Overview

When I encounter a new piece of software, I usually ask, "What's in it?"

Tools and technologies we've been using to build Veracity

C

The core library and the command-line app are written entirely in C.  Some folks won't like our coding conventions.  I'll probably do a whole blog entry sometime to ((apologize for) && (defend)) the liberties we've taken with the C preprocessor.

JavaScript

jQuery

On the browser side of things, Veracity is a web app written in JavaScript using jQuery.

SVG

Burn down charts and other web graphics are done using SVG.

JSON

Veracity uses JSON all over the place.  All serialized structures in the repo are JSON.  Database records and templates are JSON.  We have a bunch of C code for parsing JSON, writing JSON, and dealing with JSON-like data in memory.

SQLite

We use SQLite in several places as a more scalable disk format, and also as an index.

Mongoose

The Veracity command-line app has an embedded web server for personal use.  It's based on Mongoose.

Curl

The client side of push/pull is done by calling libcurl.

UTF-8

ICU

Early on in the Veracity project, we did a lot of work to make sure that stuff was done right with respect to Unicode.  Our preferred encoding is UTF-8.  The ICU library from IBM has been helpful in a number of places.

REST

The Veracity web API is very RESTy.  And of course, everything serialized over the network is in JSON. 

CMake

Our build system is CMake, with which we have a love/hate relationship.  We love it because it generates makefiles, Xcode projects or Visual Studio solutions.  We hate it because its language makes Forth look sane.

CTest

We have a huge suite of automated tests.  CMake's integrated test stuff actually works pretty well.  Just run 'ctest' at the top level directory.

SpiderMonkey

A lot of our tests are written in JavaScript.  We have a command-line executable called 'vscript' which is basically the SpiderMonkey JavaScript engine glued to the main Veracity library.

Continuous Integration

Our CI system rebuilds from scratch and runs the main tests after every checkin, on Mac, Linux and Windows.  Results are published to an internal web page and sent to the team by email.

Scrum

The more we use Scrum, the more I like it.  We're patient with ourselves.  We just try to get a little better in our Scrum practices each iteration.

gcov

lcov

Nightly builds run the entire test suite with code coverage done by gcov.  Our current coverage level is 81%.

vcdiff

For binary deltas, Veracity uses the algorithm described in RFC 3284.  (Actually, the use of binary deltas is currently turned off by default, so if you notice that repositories seem big, that's why.  All the plumbing is done.  We're just not using it yet.)  Anyway, we've got our own implementation of vcdiff.  We may consider switching to Google's open-vcdiff at some point if its performance is better.

zlib

For simple non-deltified compression in repo implementations, Veracity uses zlib.
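
As a rough illustration of what "simple non-deltified compression" means, here is a sketch using Python's standard zlib binding.  Veracity itself calls zlib from C.  The `store_blob` and `load_blob` names are hypothetical and exist only for this example.

```python
import zlib


def store_blob(data: bytes) -> bytes:
    """Compress a whole blob on its own, with no delta against other blobs."""
    return zlib.compress(data, 9)


def load_blob(stored: bytes) -> bytes:
    """Recover the original bytes from a compressed blob."""
    return zlib.decompress(stored)


original = b"int main(void) { return 0; }\n" * 200
stored = store_blob(original)
print(len(original), len(stored))
```

Each blob is compressed independently, so nothing needs to know about any other version of the file.  That is the tradeoff against the vcdiff approach.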

Valgrind

When coding in C, valgrind is indispensable.

Shark

I do most of my coding on the Mac, so I use the Shark profiler.  Very cool.

emacs

vim

Eclipse

Visual Studio

bash

gdb

Every developer on our team chooses their own tools.  We have a good representation of most of the major religions.

Firefox

Safari

Chrome

Similarly, every developer chooses their own web browser.  I'm not sure what feelings to have upon noticing that nobody is using Internet Explorer.  It seems so wrong.  And yet, so right.

A few notable things we'll probably be using later

.NET

Even though Veracity was not built fundamentally on the .NET platform, we are committed to providing excellent support for Windows developers.  Visual Studio integration is a high priority.

Java

Similarly, we didn't use Java to build the core libraries for Veracity, but we plan to deliver excellent integration into the Java world, including an Eclipse plugin.

IIS

Apache

The embedded web server is fine for personal use on the desktop, but large teams will want to run a real web server for their central repository.  We designed for this case early, but have not yet implemented something like an IIS plugin.

A few notable things we are NOT using (and maybe never will)

C++

Here's another blog article I need to write.  Basically, we only considered two choices:  C, and the C-like subset of C++.  We chose plain C.  I just wish the Microsoft C compiler supported C99.

Flash

Silverlight

GWT

We seriously considered other ways of building our web apps.  We ended up choosing basic HTML/CSS/jQuery/Ajax.  No regrets, but I sometimes wonder how things are going for people using GWT.

NSPR

APR

Sorry folks, in an apparent fit of NIH syndrome, we wrote our own portability layer. And I am completely unrepentant.

XML

JSON won.  What can I say?  I just like curly braces a lot more than angle brackets.

 

 

Veracity screenshot: Burndown Chart

Yesterday I tried to describe Veracity in a thousand words.  Today, let's try a picture.

Veracity's distributed work item tracking feature is one of several things which is built on that "decentralized database" I mentioned.  This screenshot is Veracity displaying a burndown chart for a Scrum iteration.

The thing on the left is an activity stream.  It's a Twitter-like feature with other notifications mixed in, such as code checkins and comments on work items.

 

Veracity: The next step in DVCS

One week from today, at the O'Reilly Open Source Convention, SourceGear will be making a big announcement.  Today I'm giving you an early preview.  We've been building something new.  :-)

  • It's called Veracity.
  • It's a Distributed Version Control System (DVCS), somewhat like Mercurial or Git.
  • It has some cool new capabilities no other DVCS has.
  • It will be open source, released one week from today under the Apache License, Version 2.0.

This project has been consuming the bulk of my time, and I am glad to finally be able to write and speak about it.  I'll have a lot to say going forward, but for today I just want to answer some questions we anticipate folks will be asking.

Why build yet another DVCS?

At OSCON next week we will be referring to Veracity as "the next step in DVCS".  This description may sound a bit audacious, but it describes exactly what we have built Veracity to be.

Git, Mercurial and Bazaar are all great, but we don't think they are the last word. This model of distributed development is the future of our industry. Things are just getting started.  We're building Veracity to push forward.

So let me try to explain how our vision is different from what is available from the popular DVCS tools today.

Please understand that my intent here is not to criticize existing tools or start a war with their fans (especially because Veracity needs to simmer a bit longer before it's ready). I simply know that the easiest way to explain something new is to compare it to something well-known.

Decentralized Database

Veracity goes beyond versioning of directories and files to provide management of records and fields, with full support for pushing, pulling and merging database changesets, just like source tree changesets.

Veracity's decentralized, template-driven database is used for all kinds of administrative data, including user accounts, tags, commit messages, and history. This database is also the platform on which we are building features like work item tracking.

User accounts

Existing DVCS tools have no real concept of user accounts. Enterprise customers need robust administration features like auditing and permissions. Veracity supports these features with a user system built on its decentralized database engine.

Pluggable storage layers

Veracity wraps all the actual storage of a repository in an API. This allows different implementations to offer different tradeoffs. For example, an organization may want to use an enterprise SQL database to store repository data on a central server, while developer desktop machines may use a simpler filesystem-oriented storage engine.  You can push and pull changesets across different storage implementations seamlessly.
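
To make the idea concrete, here is a minimal sketch in Python of what a storage API with two interchangeable implementations might look like.  None of these names come from Veracity.  `BlobStore`, `SqlStore`, `DirectoryStore` and `push` are assumptions invented for this example, with SQLite standing in for an enterprise SQL database.

```python
import hashlib
import os
import sqlite3
from abc import ABC, abstractmethod


def blob_id_for(data: bytes) -> str:
    """Identify a blob by a hash of its contents (SHA-256 chosen arbitrarily)."""
    return hashlib.sha256(data).hexdigest()


class BlobStore(ABC):
    """Hypothetical storage interface; every backend implements these three calls."""

    @abstractmethod
    def put(self, data: bytes) -> str: ...

    @abstractmethod
    def get(self, blob_id: str) -> bytes: ...

    @abstractmethod
    def ids(self) -> set: ...


class SqlStore(BlobStore):
    """SQL-backed storage (SQLite here, as a stand-in for a server database)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS blobs (id TEXT PRIMARY KEY, data BLOB)")

    def put(self, data):
        blob_id = blob_id_for(data)
        self._db.execute(
            "INSERT OR IGNORE INTO blobs (id, data) VALUES (?, ?)", (blob_id, data))
        self._db.commit()
        return blob_id

    def get(self, blob_id):
        row = self._db.execute(
            "SELECT data FROM blobs WHERE id = ?", (blob_id,)).fetchone()
        if row is None:
            raise KeyError(blob_id)
        return bytes(row[0])

    def ids(self):
        return {row[0] for row in self._db.execute("SELECT id FROM blobs")}


class DirectoryStore(BlobStore):
    """Filesystem-oriented storage: one file per blob, named by its ID."""

    def __init__(self, root):
        self._root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data):
        blob_id = blob_id_for(data)
        with open(os.path.join(self._root, blob_id), "wb") as handle:
            handle.write(data)
        return blob_id

    def get(self, blob_id):
        with open(os.path.join(self._root, blob_id), "rb") as handle:
            return handle.read()

    def ids(self):
        return set(os.listdir(self._root))


def push(source: BlobStore, destination: BlobStore) -> int:
    """Copy every blob the destination lacks; return how many were sent."""
    missing = source.ids() - destination.ids()
    for blob_id in missing:
        destination.put(source.get(blob_id))
    return len(missing)
```

Because `push` only talks to the interface, it does not care which implementation sits on either end.  A desktop `DirectoryStore` can push to a central `SqlStore` and vice versa.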

Hash functions

Just as with Mercurial and Git, Veracity identifies all repository objects using a cryptographic hash of the contents. Veracity supports SHA1 like current tools, but is ready for the future with full support for SHA2 and Skein, at 256 or even 512 bits.

Veracity's default hash is SHA1.  Our dogfooding repo is SHA2/256.
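
Here is a small sketch of content-based identification with a selectable hash, using Python's hashlib.  The `object_id` function is a made-up name for illustration, not part of Veracity.  Skein is left out because hashlib does not ship it.

```python
import hashlib


def object_id(content: bytes, algorithm: str = "sha1") -> str:
    """Return a hex ID derived only from the content and the chosen hash."""
    return hashlib.new(algorithm, content).hexdigest()


content = b"hello, veracity\n"
for name in ("sha1", "sha256", "sha512"):
    digest = object_id(content, name)
    # Each hex character encodes 4 bits of the hash.
    print(name, len(digest) * 4, "bits", digest[:16] + "...")
```

Identical content always yields the same ID under a given hash, and a repo that picks a wider hash simply gets longer IDs.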

Robust tracking for renames and directories

Like Bazaar, Veracity assigns every repository object an ID which remains constant when the object is renamed or moved to a different path. This handles the situation where a developer changes both the contents of a file and its path in the same transaction, and is a critical feature for robust merge operations.

Veracity also tracks directories as first-class repository objects, just like files.
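
A toy example may help show why a stable ID matters for merging.  The sketch below is Python written purely for illustration.  The data layout and the `pick` and `merge_trees` names are assumptions, not Veracity's design.  Each tree maps an object ID to its current path and content, so a rename on one side and an edit on the other are recognized as changes to the same object.

```python
def pick(base, ours, theirs):
    """Three-way choice for a single field of one object."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs
    if theirs == base:
        return ours
    raise ValueError("conflict: both sides changed this field differently")


def merge_trees(base, ours, theirs):
    """Merge objects by stable ID.  For brevity this sketch only handles
    objects that exist in all three trees (no adds or deletes)."""
    merged = {}
    for object_id in base.keys() & ours.keys() & theirs.keys():
        merged[object_id] = {
            field: pick(base[object_id][field],
                        ours[object_id][field],
                        theirs[object_id][field])
            for field in ("path", "content")
        }
    return merged


base = {"obj1": {"path": "src/util.c", "content": "v1"}}
ours = {"obj1": {"path": "src/helpers.c", "content": "v1"}}   # renamed
theirs = {"obj1": {"path": "src/util.c", "content": "v2"}}    # edited

print(merge_trees(base, ours, theirs))
```

A tool that identifies files only by path would see a delete plus an add on one side and an edit to a missing file on the other.  Keyed by ID, the merge ends up with the new path and the new content.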

Cross-Platform C

From the beginning, we wanted to make it easy to integrate Veracity into all kinds of other systems on a wide variety of platforms. So we wrote everything in C, with Windows, Mac OS and Linux all on equal footing. We love Python too, but C is a lowest common denominator that can be ported or integrated everywhere we need to go.

Apache License Version 2.0

Current DVCS tools do not yet have much penetration with enterprise customers.  This is largely due to lack of features and company infrastructure.  But even if Git or Mercurial were enterprise-ready in every other way, many companies will hesitate because of the GPL.

We chose the Apache License Version 2.0 (instead of the GPL) because we wanted there to be no obstacles for Veracity to be adopted in commercial and enterprise scenarios.

Open Source? How are you guys gonna make money?

The core of Veracity will be open source, but we do plan to sell add-on products built on the core.

Does this news mean you are abandoning Vault?

Heck no. Vault is like, 100% of our revenue. And there are still thousands of teams on SourceSafe that need to be rescued from their plight.  :-)

We looked hard at the notion of morphing Vault into a DVCS and decided it just isn't feasible.  If we had forced the square peg into the round hole, the result would either have fallen short of being a true DVCS or it would have been an incredibly painful upgrade for Vault customers.

Vault will continue to be supported and improved for centuries.

Is Veracity ready for people to actually use?

Not yet.

We are dogfooding Veracity here at SourceGear, but if anybody else tries to use it, they'll be frustrated.  File formats, command syntax and APIs are all still in flux.  We have a lot of stuff to finish up before we give it a 1.0 version number.

In the meantime, if you need a DVCS that is ready to use now, Mercurial, Git and Bazaar offer you three great choices.

How can I give feedback?

My blog currently does not have a comments feature, but I would still welcome feedback from anyone who has something to say.  If you want to say something privately, feel free to email me directly (eric@sourcegear.com).  Or you can use Twitter (eric_sink).

We'll be hosting a project mailing list which will be opened next week when the source is released.  And we'll have a "modern" website for the Veracity community a bit later.

Coming Soon...

Remember, this is open source stuff, so it's not real until the source is actually available.  That'll happen a week from today with the "official" announcement.  For now, I just wanted to let you know what's coming.

 

Going to OSCON

Hey folks, I just wanted to let my readers know that I'll be at the O'Reilly Open Source Convention (OSCON) a couple weeks from now in Portland, Oregon.  SourceGear will have an exhibitor booth with, as usual, the very coolest free T-shirts.

Stop by and say hello!  :-)

 

Two Weeks with an iPad

I mostly bought an iPad just to see what the hype was all about.  I never expected to actually use it, for two reasons:

  • The iPad is a tweener, like a 6'5" guy who wants to play power forward in the NBA.  It's just the wrong size.  There isn't enough space between my phone and my laptop for this thing.
  • The iPad is just another character in the story Apple started telling back in 1985, "the computer for the rest of THEM".  I think it's kind of cute that people who don't know how to deal with non-blocking socket I/O can still use the Web.

So, even though this thing was not designed for me, after two weeks with my iPad, I am surprised to admit that I really like it.

Surprised and a little embarrassed.  This challenges my geekhood.  I mean really, what's wrong with me?  The next thing you know I'll be giving up my RPN calculator.  Or using Finder instead of bash.  How far is this spiral toward normalcy going to go?

But I can't help it.  The iPad is just really pleasant to use:

  • The email UI is nearly perfect.
  • Using the web is far nicer than my Nexus One or my iPod Touch.  The larger screen makes all the difference.
  • It's really fast.  Everything on my netbook feels like it moves at 1997 speeds.
  • Instant-on is a critical feature.  I hate waiting for my netbook to boot or unhibernate.
  • Physically the iPad is a lot more convenient to grab than my laptop or even my netbook.  It never feels unwieldy.
  • The battery life is surprisingly good.

Not that the iPad is perfect:

  • I really need an openvpn client.
  • Getting the mail client to talk with my IMAP server was harder than it should have been.
  • I still haven't figured out how to make the calendar app sync with Google Apps for my domain.
  • I want Flash.
  • Nobody has the Apple iPad Case in stock, and I really want one.  This isn't really a criticism of the device itself, but it's really exasperating and I want to vent.

Bottom line, for me the iPad is one of those anomalies of theory and practice.  In theory, I should hate this thing.  In practice, it competes with four other excellent devices for my attention, and it wins a lot.

 

Twitter! Curse your sudden but inevitable betrayal!

Last night I saw this article, which basically says that one day soon computers won't present the user with the ability to manipulate "files" anymore.

Suddenly, I had stuff to say.  But I've been using Twitter, which limits me to 140 characters.  So, I Tweeted this:

We're heading toward two classes of computers: one for people like me, and one for people like my Mom.

(And BTW, Mom, usually when I mention you on the Internet, I'm not really talking about you.  You're a metaphor for "normal people", those who use computers to get things done, as opposed to geeks like me, who use computers just because they are shiny.)

Anyway, I thought the tweet would be enough.  It wasn't.

In the beginning, we were the only ones here.  Normal people didn't use computers at all.  Only the geeks used computers, and we certainly weren't using them to get anything done.

I remember my Mom saying that she would never use a computer.  (Mom, this one is actually you.)  And I certainly can't blame her for thinking that at the time.  She had no reason to see computers as a way of getting things done.  All she knew is that I would periodically run into the family room to announce to my parents that I had just shaved three more instructions out of the main loop so now my graphics move faster.  And Mom just wanted me to at least stack my Byte magazines in the corner so she could get the vacuum cleaner through.

Fast forward to today.  Computers, by and large, are still designed for geeks.  This is why we all buy T-shirts that say "No, I will not fix your computer".  The genius of the iPad is that it cannot get things like viruses.  It is a closed platform.  You can't put apps on it.  You can't write and distribute software for it without Apple's permission.  This is why geeks hate it and normal people will love it.

Your Mom wants a computer she doesn't have to ask you to fix.  She is willing to trade power and flexibility to get simplicity.  The iPad is another major step.

I find this interesting because it raises all kinds of questions:

  • The industry is finally ready to sell things that make geeks feel frustrated instead of things that make normal people feel helpless.  What does this mean for geeks and our role in society?

  • How is the terminology going to shake out?  Surely we need two different names?  Things that geeks use should probably still be called "computers".  What should we call the class of devices that help normal people manage their Amazon wish list?

  • We geeks will become the minority market niche.  How will this affect the pricing of things?  Will there be sufficient economies of scale to sell computers to geeks at margins that are tolerable to both buyer and seller?

  • What kinds of computers/devices will get caught in the middle and suddenly have no place in the world?

  • What classes of users are going to be special cases?  I'm talking about folks that are not geeks but that for some other reason cannot accept the power/simplicity tradeoff of devices designed for normal people.

This is a major wave of change.  I don't know the answers to these questions.  The only thing that seems clear to me is that Microsoft will miss this wave just like they missed the last one.

 

Better late than never?

OK, so I finally created a Twitter account.

I might have done this earlier if there weren't some nitwit squatting the "ericsink" user name.  I considered registering as the National Waffle Institute and posting "French Toast Sucks" as my first tweet, but in the end I settled for "eric_sink".

I've been on Facebook for quite a while.  I apologize for turning down all the friend requests from blog readers, but I mostly just use Facebook for family stuff.  Unfortunately that means Facebook is a lousy place for me to make snarky comments about the technology world.  Most of my friends there either don't get it or don't care.

But Twitter should fill this hole in my life nicely.  Now, when it occurs to me that my German Shepherd is smarter, bigger and better-looking than Spolsky's husky will ever be, I can just let the world know immediately, and everyone will be better off.

At first I was worried about the length limit, but I've been practicing, and it is surprising how often 140 characters are enough.  For example, this one leaves plenty of room to spare:

Everything Borland ever created is now owned by someone who will destroy it.

But some of my practice tweets didn't go so well.  This one is way over the limit, but I could probably make the point without being so wordy:

Imagine what the software industry would be like if Bjarne Stroustrup had chosen a career with less potential for harm to the world, such as the intentional destruction of all tropical rainforests.

For me Twitter looks like a solution at the intersection of two problems.  With verbal remarks, it's easy to speak before thinking, but it just doesn't scale.  With blogging, I can reach lots of people, but I always end up thinking carefully before I post.  Twitter allows me to spew hasty, poorly-thought-out observations to a potentially worldwide audience.  I'm obviously a newbie, but that seems like a great feature.

 

Obstacles to an enterprise DVCS

On 26 January 2010 I gave a presentation to Software GR.  The talk was an overview of several trends that we have seen in the version control tools market over the last 40 years.  I often like to follow a talk like this by publishing the same content here on my blog in the form of a complete article.  This time I think I'll just eliminate a lot of the tldr problem and summarize the highlights:

  • The two big trends in version control today are Integration and Decentralization.
  • Integration is driven by ALM.  It is the desire to have all tools used by a development team fully integrated together.
  • Decentralization is driven by the recent wave of DVCS tools like Git and Mercurial.  They offer compelling benefits such as performance, a different kind of scalability, and more flexible workflows.
  • The ALM trend is happening in the enterprise market.  Enterprises want everything integrated with everything else, and they want everything to support their ability to enforce process.
  • The DVCS trend is happening in the open source community.  Born of the legendary cat fight between BitKeeper and the Linux kernel developers, Git and Mercurial are maturing and gathering momentum at a remarkable rate.
  • These two trends are going to clash in a big way.  SourceGear's graphic designer drew me a nice diagram to depict this.
  • The two trends cannot stay separate.  Each one has advantages which are too important for the other community to ignore.
  • But the two trends and their respective communities are a bit like oil and water.
  • Enterprises want tools that constrain.  The open source community wants tools that empower.
  • The benefits of a DVCS would be diluted by integrating it with a bunch of other tools that are highly centralized.
  • Enterprises need at least a little centralization for things like user administration.  In their eyes, complete decentralization without accountability and auditing features is a bug.
  • Even as enterprise attitudes about open source are changing, that change is happening slowly, and the GPL (used by both Git and Mercurial) is still considered the scariest license.
  • So Git and Mercurial are not even close to being enterprise-ready.  Similarly, none of the leading enterprise ALM tools are even close to being a DVCS.
  • I believe that the main enterprise ALM providers (IBM/Rational, Microsoft, Serena and Borland) will all attempt to add DVCS features to their products.  At least two of these companies (IBM/Rational, in a talk by Jean-Michel Lemieux at the Rational Conference in 2009, and Microsoft) have already made public remarks about a desire to move in that direction.
  • And I predict that they will all fail.  It is impossible to turn any of these systems into a true DVCS without a nearly complete rewrite.  The D in DVCS is not a feature which can be added.
  • But all of them will do it anyway, by making compromises.  They will try to add "just enough" Decentralization.  Some of their customers will find the results to be sufficient.
  • Meanwhile, the true DVCS tools will continue to move forward, but their progress toward credible ALM will be slow.  Enterprise-level integration is grunge work, not the kind of coding that hackers do as a labor of love.  Nobody does this stuff without getting paid.
  • So these two trends will continue to be distinct for a while, but the pressure and tension between them will remain, and the areas of overlap are going to continue getting messier.