Mercurial: Repository Structure

Revlogs

An important part of Mercurial’s design is the notion of a revlog, a file format which is designed to store all versions of a given file in an efficient manner. Mercurial uses the revlog format for basically everything it stores in the repository.

Each revision of a file is identified by a “NodeID”, which is a SHA-1/160 hash of its contents (combined with the position of that node in the history).

Each version of the file can be stored as either a complete snapshot of the file’s contents, or as a binary delta against the previous version. Mercurial stores a complete snapshot every so often to ensure that it is only necessary to walk back so far.

The revlog file is append-only. Each new version of an object is written to the end of the file without altering anything that was already there. This means that it uses forward deltas. Reverse deltas are a lot more typical today, because the most common operation is the retrieval of the most recent version. With reverse deltas the most recent version is always stored as a snapshot. In Mercurial, retrieving the most recent version might involve reconstructing it from an older snapshot with later deltas applied to it.

Reading a given version of the file from a revlog can be accomplished by a single contiguous read. No seeks are necessary. If that version is stored as a snapshot, just read it. If it is stored as a delta, read it and any deltas before it, back to the previous snapshot. This elegant aspect of the design is one of the reasons Mercurial is so fast.

A revlog is actually two files. The .d file contains the actual file data. The .i file is an index designed to make it easier to find things. When the revlog is small, these two files are combined into one, with the data stored in the .i file and no .d file.

As I said, Mercurial gets a lot of its efficiency from the careful design of this revlog file format, but there are some tradeoffs. Mercurial always assumes that the entire file (including the last snapshot and all deltas) will fit into RAM. This makes things much faster, but it makes Mercurial generally not effective for large files (over 10 MB).[48]

lottery harry$ hg debugindex .hg/store/data/src/pb.c.i 
   rev    offset  length   base linkrev nodeid       p1           p2
     0         0     467      0      10 a7bdd2379025 000000000000 000000000000
     1       467     168      0      12 692932a95c0d 000000000000 a7bdd2379025
     2       635     173      0      15 f1d9cb4201e4 692932a95c0d 000000000000
     3       808     476      0      17 d238a6113e4c f1d9cb4201e4 000000000000
     4      1284     491      0      18 b71d299270a5 f1d9cb4201e4 000000000000
     5      1775     470      0      19 4a7ebb32f962 b71d299270a5 d238a6113e4c
     6      2245      64      0      20 6b99ca4dde14 4a7ebb32f962 000000000000
     7      2309     177      0      21 33557d969679 d238a6113e4c 000000000000
     8      2486     213      0      22 e4d67566afd0 6b99ca4dde14 33557d969679
     9      2699     102      0      23 ab4bcfb966f8 33557d969679 000000000000
    10      2801     384      0      24 86d19e47e6d0 e4d67566afd0 000000000000
    11      3185      88      0      25 4969c00e0bc8 86d19e47e6d0 ab4bcfb966f8
lottery harry$ hg debugindex .hg/store/00manifest.i 
   rev    offset  length   base linkrev nodeid       p1           p2
     0         0      52      0       0 4bf51ef87fa1 000000000000 000000000000
     1        52      52      1       1 df9a6175c86f 4bf51ef87fa1 000000000000
     2       104      52      2       2 f282fd300cae 4bf51ef87fa1 000000000000
     3       156      52      3       3 2128ed694101 df9a6175c86f f282fd300cae
     4       208      52      4       4 cf6095e27d1b 2128ed694101 000000000000
     5       260      52      5       5 a3954dc14901 2128ed694101 000000000000
     6       312      52      6       6 84f3337a15c2 cf6095e27d1b a3954dc14901
     7       364      56      7       7 723f96182c10 84f3337a15c2 000000000000
     8       420      52      8       8 f81e41ac9f78 84f3337a15c2 000000000000
     9       472      56      9       9 43b4d425d11b f81e41ac9f78 723f96182c10
    10       528     100      9      10 db730b6b114f 43b4d425d11b 000000000000
    11       628      56     11      11 c0916422f5f9 43b4d425d11b 000000000000
    12       684      98     11      12 a0a068b209a9 c0916422f5f9 db730b6b114f
    13       782   12861     11      13 fa7d4fbf3283 a0a068b209a9 000000000000
    14     13643      91     14      14 847ed0078d54 fa7d4fbf3283 000000000000
    15     13734      62     14      15 26f762825d61 847ed0078d54 000000000000
    16     13796      61     14      16 fa14759e626d 26f762825d61 000000000000
    17     13857      62     14      17 65ed8051c722 fa14759e626d 000000000000
    18     13919     122     18      18 96c0a3cf81b1 fa14759e626d 000000000000
    19     14041      62     18      19 61aa1de12abe 96c0a3cf81b1 65ed8051c722
    20     14103      62     18      20 f68d6078c862 61aa1de12abe 000000000000
    21     14165     119     21      21 47f22792ec34 65ed8051c722 000000000000
    22     14284      62     21      22 1e7caebb4684 f68d6078c862 47f22792ec34
    23     14346      62     21      23 a30745ba5cae 47f22792ec34 000000000000
    24     14408     119     24      24 cbe36265b98c 1e7caebb4684 000000000000
    25     14527      62     24      25 f991d0456dd4 cbe36265b98c a30745ba5cae

Manifests

For every version of the tree, Mercurial stores a manifest, a complete list of all the files in the tree and their versions.

lottery harry$ hg debugdata .hg/store/00manifest.i 24 
.hgtagsc04bfcf9c20c06746293f5474da270d88501a9c1
Makefileb87f10c1ca797b426bc6ac4522aae0de1bf6902a
src/pb.c86d19e47e6d07cfddba6a4a7f6d7013dd782075a

The manifest is also stored in a revlog. The deltification here is critical because storing a full listing for every revision of the tree could become enormously large.

Note that a Mercurial manifest only contains files. Mercurial does not track information about the directories that contain those files. Consequently, it cannot store an empty directory.

Changesets

For each revision of the tree, Mercurial stores a changeset. A changeset is a record which lists all the changes to files, including who made the change, the log message, the date/time, and the name of the branch.

lottery harry$ hg debugdata .hg/store/00changelog.i 24 
cbe36265b98c1f656ad1f0c3546c458a68ee85eb
Harry <harry@futilisoft.com>
1305662021 18000
src/pb.c

fixed spelling error

A Mercurial changeset has zero, one, or two parents. If it is the root node of the DAG, it has zero parents. If it is a merge node, it has two parents. All the rest of the nodes have one parent.

The SHA-1/160 hash of the changeset record becomes the changeset ID.

All changesets are stored in the changelog, which is another revlog file.



[48] There is a Bigfiles extension which works around the problem by keeping the large file somewhere else and storing a reference to it.