Description of the pepper cache format
======================================

The repository cache is actually a directory, named with the repository's
UUID and containing two or more files:

	* index
	This is the index file, mapping revision identifiers to files
	and offsets.  It is a gzipped text file with the following format:

		$VERSION
		$REVISION_1 $FILEINDEX $OFFSET $CRC
		$REVISION_2 $FILEINDEX $OFFSET $CRC
		...

	$VERSION is a 32bit unsigned integer definining the format
	version, which is currently 5.	$REVISION_i is a null-terminated
	string, representing a revision ID. Immediately afer the null
	byte, a 4-byte unsigned integer specifying the index of the
	cache file. Finally, there's a nother 4-byte unsigned integer
	that defines the offset in the cache file.  The $CRC checksum
	is a simple CRC-32 checksum. of the compressed revision data.

	* cache.N
	where N is the file index. The cache file is just a series of
	gzipped revisions which can be identified using the current
	file offset and index and the table given in the index file.
	A revision contains the following data:

		$DATE $AUTHOR $MESSAGE $DIFFSTAT

	$DATE is a 64-bit integer, $AUTHOR and $MESSAGE are
	null-terminated strings. The diffstat data is made of the
	following components:

		$COUNT $FILE_ENTRY_1 $FILE_ENTRY_2 ...

	$COUNT is an unsigned 32-bit integer, representing the number
	of file entries that follow. Each one is given in the following
	format:

		$FILE $BYTES_ADDED $LINES_ADDED $BYTES_REMOVED $BYTES_ADDED
		
	$FILE is the name of the file, and the 4 unsigned 32-bit integers
	following describe the number of bytes or lines added and removed,
	respectively.

Please note that single revisions are gzipped (using zlib's compress()),
but not the whole cache file. This allows for faster seeking when selecting 
individual revisions.

Primitive data is stored using the following conventions:

	* String: UTF-8 encoding, null-terminated
	* Integer: Big endian


Description of the log interval cache for Subversion repositories
=================================================================

Since retrieving the revision log for large Subversion repositories
over the network is time-consuming, the Subversion backend stores
revision intervals for specific paths in the cache directory of the
respective repository. Here's a short description of the files
used and their format:

	* log_$PREFIX
	where $PREFIX is the prefix that was used for retrieving the
	log data. This can represent a module, a branch or simply a
	subdirectory. This file is a gzipped and contains the following
	data. Each variable is an unsigned 64bit integer (except the
	unsigned 32bit version number), stored in big endian.

		$VERSION
		$STARTREV_1 $ENDREV_2 $NUMREVS_1 $REVNUM_11 $REVNUM_12 ...
		$STARTREV_2 $ENDREV_2 $NUMREVS_2 $REVNUM_21 $REVNUM_22 ...

	The current $VERSION is 1.  Every file can store multiple
	intervals, each starting with a header of three integers defining
	the start and end revision used while fetching the log. The last
	integer defines the number of revisions that will follow. Each
	revision is single unsigned 64bit integer, too.
