Making large depots smaller

Info & Tags

Article #: 72
Created: 04/25/07
Modified: 02/26/08

Task

Decreasing the size of large Perforce depots

Solution

Perform the operations discussed herein with care. Failure to follow correct procedures can damage or destroy your data. As always, make a backup copy of everything before doing anything!

Perforce stores the history of your data (metadata) as well as the data itself (versioned files). Metadata and versioned files are stored separately. By default they are both located in what we call the P4ROOT directory, with the metadata stored in the db.* files and the versioned file trees stored in directories named after your depots. Over time, Perforce depots tend to grow. This happens as you add files and file revisions to your depot and as the history of those files changes.
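To see how much of the P4ROOT directory is metadata and how much is versioned files, you might total the two separately. The following Python sketch assumes the default layout described above: db.* files directly in P4ROOT, and every subdirectory of P4ROOT treated as a depot's versioned file tree. The function name split_usage is hypothetical, and other top-level files (journals, checkpoints, logs) are counted in neither total.

```python
from pathlib import Path


def split_usage(p4root):
    """Return (metadata_bytes, versioned_bytes) for a P4ROOT directory.

    Hypothetical helper. Assumes the default layout: db.* files directly
    in P4ROOT hold the metadata, and every subdirectory is treated as a
    depot's versioned file tree. Other top-level files (journals,
    checkpoints, logs) are counted in neither total.
    """
    root = Path(p4root)
    metadata = sum(p.stat().st_size for p in root.glob("db.*") if p.is_file())
    versioned = 0
    for depot_dir in root.iterdir():
        if depot_dir.is_dir():
            # Sum every file below the depot directory (,v and ,d trees alike).
            versioned += sum(
                f.stat().st_size for f in depot_dir.rglob("*") if f.is_file()
            )
    return metadata, versioned
```

For example, split_usage("/perforce") on the server root used throughout this article would return the two totals in bytes.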

There are several ways to make your depot smaller or to divide it up into smaller pieces. Among them are:

  • Rebuild your db.* files to reclaim space.
  • Remove large, obsolete or unwanted binary files from the versioned file tree.
  • Divide the depot into two or more pieces.
Each of these operations has its own implications with regard to time, effort, and results. They are discussed below.

Rebuilding your depot metadata to reclaim space

Perforce metadata is stored in the db.* files as B-trees. To keep the B-trees balanced and to reclaim unused space, rebuild them from time to time. The table that usually benefits most is db.have, which can often shrink by 20%. In addition to taking up less space, newly rebuilt db.* files are faster to access.

To rebuild your db.* files, take a checkpoint, move your db.* files out of the P4ROOT directory, and recover from the checkpoint. Details on the exact procedure can be found in Chapter 2, "Supporting Perforce: Backup and Recovery", of the Perforce System Administrator's Guide.
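The rebuild can be scripted. The sketch below only builds the command lines and runs nothing. It assumes p4d's -r (server root), -jc (take checkpoint) and -jr (recover from checkpoint) options; the helper name rebuild_plan, the checkpoint file name (checkpoint.42) and the backup directory are hypothetical placeholders. Use the checkpoint name p4d reports when it writes the checkpoint, and confirm each step against the System Administrator's Guide before running it.

```python
import shlex
from pathlib import Path


def rebuild_plan(p4root, checkpoint_name, backup_dir):
    """Return shell commands for rebuilding db.* files via checkpoint/recover.

    Hypothetical helper. checkpoint_name is the file written by 'p4d -jc'
    (for example 'checkpoint.42', a made-up number). Stop the server
    before moving the db.* files, and keep the old copies until the
    restored server has been checked.
    """
    root = shlex.quote(str(Path(p4root)))
    ckp = shlex.quote(str(Path(p4root, checkpoint_name)))
    backup = shlex.quote(str(backup_dir))
    return [
        # 1. Take a checkpoint of the current metadata.
        f"p4d -r {root} -jc",
        # 2. Move the old db.* files out of P4ROOT (the glob is left unquoted).
        f"mkdir -p {backup}",
        f"mv {root}/db.* {backup}/",
        # 3. Recreate fresh, compact db.* files from the checkpoint.
        f"p4d -r {root} -jr {ckp}",
    ]
```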

Removal of large, obsolete or unwanted binary files from the versioned file tree

Binary files can consume a lot of disk space because they are stored on a per-revision basis, unlike the reverse-delta storage used for text files. To minimize disk space consumption, store only required revisions of binary files. While the metadata for such files is quite small, the file revisions themselves can take up a lot of disk space. You can easily replace these with small stub files that indicate their removal.

If the file revision being replaced has been branched or integrated, other revisions in the depot may be lazy copies that share its versioned file, and those revisions will be replaced as well. You can undo lazy copies with the p4 obliterate -z command, but undoing lazy copies makes your versioned file tree use more disk space! Note that p4 obliterate runs in report mode unless you add the -y flag; place it on the command line before -z.
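If you drive p4 from a script, the flag order described above is easy to encode. This hypothetical helper only assembles the argument list for p4 obliterate -z; it leaves out -y (so the command stays in report mode) unless execute=True, and then places -y before -z as the text advises. run_report is likewise a hypothetical wrapper and requires the p4 client to be installed and configured.

```python
import subprocess


def obliterate_lazy_copy_args(file_spec, execute=False):
    """Build argv for 'p4 obliterate -z' (undo lazy copies).

    Hypothetical helper. Without execute=True the command runs in report
    mode because -y is omitted. When included, -y precedes -z.
    """
    args = ["p4", "obliterate"]
    if execute:
        args.append("-y")
    args += ["-z", file_spec]
    return args


def run_report(file_spec):
    """Hypothetical wrapper: run in report mode only and return its output."""
    result = subprocess.run(
        obliterate_lazy_copy_args(file_spec),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Reviewing the report first matters here because the real run increases disk usage.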

Replacing binary file revisions with a small stub file makes the versioned file tree smaller. Files stored as RCS reverse deltas, such as text files, cannot be modified using the procedure below.

To replace a revision of a file that is stored as an individual compressed revision in your versioned file tree, perform the following steps:

  1. Create a text file with descriptive text, for example:
    The original version of this file has been deleted to save space!
  2. Compress the text file with the gzip compression utility.
    • If the file is not stored compressed, as with the ubinary storage format, you should skip this step.
    • For binary files of type "apple", simple gzip compression alone will not work. Instead, use Perforce to store the stub as an "apple" type file somewhere in a Perforce depot. Then, use that stub file as it exists in the depot's versioned file tree as the source file in the steps below.
  3. Find the file representing the desired revision within the versioned file tree.
    • File revisions are stored on a per revision basis in a directory that has the full name of the file with ,d appended.
    • If the depot path of the file revision you are looking for is //depot/path/to/big.exe#12 and the server's P4ROOT directory is /perforce, the file for this revision would, by default, be located here:
            /perforce/depot/path/to/big.exe,d/1.12.gz
      If the file is not stored compressed, as with the ubinary storage format, the filename will not have .gz at the end.
  4. Replace the desired versioned file in your versioned file tree with the small stub file you prepared in steps 1 and 2.
  5. Run p4 verify -v on the file so that the server replaces the MD5 hash value of the old file with that of the new file. For example:
    p4 verify -v //depot/path/to/big.exe#12
  6. Run a test sync of the file to ensure you have performed all the steps properly. For example, sync //depot/path/to/big.exe#12 and verify that it is now the text file you created in step 1.
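Steps 1 through 4 can be scripted for a single revision. The Python sketch below assumes the default archive layout from step 3 (a ,d directory containing a file named 1.<rev>, with .gz when compressed); archive_path, replace_with_stub and the backup directory are hypothetical names. If your server stores the revision elsewhere, locate the file by hand and pass it to replace_with_stub directly. The sketch does not handle "apple" files, which need the depot-stored stub described in step 2, and it leaves steps 5 and 6 to you.

```python
import gzip
import shutil
from pathlib import Path

# Stub text from step 1.
STUB_TEXT = "The original version of this file has been deleted to save space!\n"


def archive_path(p4root, depot_file, rev, compressed=True):
    """Hypothetical helper: map a depot file and revision to its versioned file.

    //depot/path/to/big.exe, 12 -> <p4root>/depot/path/to/big.exe,d/1.12.gz
    Assumes the default layout; lazy copies and other storage differ.
    """
    if not depot_file.startswith("//"):
        raise ValueError("expected a depot path such as //depot/...")
    parts = depot_file[2:].split("/")
    rev_dir = Path(p4root, *parts[:-1], parts[-1] + ",d")
    return rev_dir / (f"1.{rev}" + (".gz" if compressed else ""))


def replace_with_stub(archive_file, backup_dir, compressed=True):
    """Hypothetical helper for steps 1, 2 and 4: overwrite with the stub.

    A copy of the original is saved under backup_dir first. The stub is
    gzip-compressed unless compressed=False (e.g. ubinary storage).
    Returns the path of the saved original.
    """
    archive_file = Path(archive_file)
    if not archive_file.is_file():
        raise FileNotFoundError(archive_file)
    saved = Path(backup_dir, archive_file.parent.name, archive_file.name)
    saved.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(archive_file, saved)
    data = STUB_TEXT.encode("ascii")
    if compressed:
        data = gzip.compress(data)
    archive_file.write_bytes(data)
    return saved
```

After the replacement, run p4 verify -v //depot/path/to/big.exe#12 and a test sync as described in steps 5 and 6.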
To automate the process, you can write a script that traverses your versioned file tree and performs this substitution on files meeting specified criteria, such as age, size, or number of revisions. Starting with Perforce Server release 2004.2, you can also create a commit trigger that performs the substitution. Please contact Perforce Technical Support for additional assistance with this approach.
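A traversal of that kind might look like the following Python sketch. It assumes the default layout (binary revisions stored as 1.<rev> or 1.<rev>.gz inside ,d directories) and uses file modification time as a rough proxy for revision age; find_candidates and its parameters are hypothetical. It only lists candidates: each one would still go through the replacement, p4 verify -v, and test-sync steps above.

```python
import os
import time
from pathlib import Path


def _rev_number(name):
    """'1.12.gz' or '1.12' -> 12; None for names that do not fit the pattern."""
    parts = name.split(".")
    if len(parts) >= 2 and parts[0] == "1" and parts[1].isdigit():
        return int(parts[1])
    return None


def find_candidates(depot_root, min_bytes, min_age_days, keep_latest=1, now=None):
    """Hypothetical helper: yield large, old archive revisions.

    Only directories ending in ',d' are examined, so RCS (,v) text files,
    which cannot be stubbed this way, are never returned. The newest
    keep_latest revisions of each file are always skipped.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(depot_root):
        if not dirpath.endswith(",d"):
            continue
        # Sort numerically so that 1.10 comes after 1.9.
        revs = sorted(
            (n for n in filenames if _rev_number(n) is not None), key=_rev_number
        )
        older = revs[: max(len(revs) - keep_latest, 0)]
        for name in older:
            path = Path(dirpath, name)
            st = path.stat()
            if st.st_size >= min_bytes and st.st_mtime <= cutoff:
                yield path
```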

Splitting your depot into depots on separate Perforce servers

Dividing a depot into more than one depot allows you to separate your data into logical pieces with each piece being smaller and more manageable than the whole. Dividing a depot will remove all metadata relationships between the resultant pieces. Any integration references between the pieces will be gone when you are finished, as if the files never had any relationship whatsoever.

Dividing a depot is easiest if done at the branch level. For example, consider a depot with the following two main branches:

    //depot/path_A/...
    //depot/path_B/...
and a server root of /perforce.

The steps required to split a depot into two depots at the branch level are as follows:

  1. Take a checkpoint on the source server, where the path_A data is to reside.
  2. Set up a server executable for the path_B server in another location (possibly on a different server machine) where the path_B data is to reside.
  3. Copy the versioned files for path_B onto the new server location (use of your favorite tar/zip utility may make this easier).
  4. Restore the checkpoint taken in step 1 in this new location.
  5. Remove the path_B data from the path_A server using a p4 obliterate command.
    (See the note on optimal obliteration at the end of this document before proceeding!)
  6. Remove the path_A data from the path_B server using a p4 obliterate command.
    (Again, see the note on optimal obliteration at the end of this document before proceeding!)
  7. Start the servers.
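Step 3 can be done with any archiving tool. As one example, this Python sketch uses the standard tarfile module to pack only <P4ROOT>/depot/path_B, storing member names relative to P4ROOT so the archive unpacks directly into the new server's root. The function names and the tar file location are hypothetical. It copies versioned files only; the checkpoint from step 1 still has to be copied and restored separately.

```python
import tarfile
from pathlib import Path


def pack_branch(p4root, depot, branch, tar_path):
    """Hypothetical helper: pack <p4root>/<depot>/<branch> into a gzipped tar.

    Member names start at <depot>/<branch>, i.e. relative to P4ROOT.
    """
    src = Path(p4root, depot, branch)
    if not src.is_dir():
        raise FileNotFoundError(src)
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(str(src), arcname=f"{depot}/{branch}")
    return Path(tar_path)


def unpack_branch(tar_path, new_p4root):
    """Hypothetical helper: unpack a pack_branch archive into the new root."""
    # Use the safer 'data' extraction filter where this Python provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(new_p4root, **kwargs)
```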

If you have any questions about the above, please contact Perforce Technical Support.

Notes on optimal obliteration

Obliterating using server releases prior to 2005.1 can take an extremely long time. Detailed instructions on the use of the obliterate command can be found in the Perforce Command Reference.

To optimize the operation and to ensure you obliterate only the desired files, use a client specification with a view that matches only the files you want to obliterate, like this:

   View:
       //depot/path_to_remove/... //client_name/path_to_remove/...
and a client-syntax p4 obliterate command as follows:
   p4 obliterate //client_name/...
In order to actually perform the requested action, you must specify the -y flag when you issue the p4 obliterate command. Otherwise, the command will only report on what would be done if the flag were included. For your safety, the flag is omitted in the above example.
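The client-syntax obliterate can also be prepared from a script. This Python sketch builds a minimal client specification (Client, Root and View fields only; your server may fill in or require other fields) plus the matching argument list, using the global -c option to select the client. The helper name and the client root are hypothetical. As in the text, -y is omitted unless execute=True, so by default the command only reports.

```python
def obliterate_setup(client_name, depot_dir, client_root, execute=False):
    """Hypothetical helper: return (spec_text, argv) for a client-syntax obliterate.

    depot_dir is the directory to remove, e.g. '//depot/path_to_remove'
    (without a trailing '/...'). The view maps only that directory, so
    '//client_name/...' matches nothing else.
    """
    depot_dir = depot_dir.rstrip("/")
    parts = depot_dir[2:].split("/", 1) if depot_dir.startswith("//") else []
    if len(parts) != 2 or not parts[1]:
        raise ValueError("expected a path like //depot/path_to_remove")
    subpath = parts[1]
    spec = (
        f"Client:\t{client_name}\n\n"
        f"Root:\t{client_root}\n\n"
        "View:\n"
        f"\t{depot_dir}/... //{client_name}/{subpath}/...\n"
    )
    argv = ["p4", "-c", client_name, "obliterate"]
    if execute:
        argv.append("-y")  # actually obliterate; omit to stay in report mode
    argv.append(f"//{client_name}/...")
    return spec, argv
```

One way to use it: load the spec with subprocess.run(["p4", "client", "-i"], input=spec, text=True, check=True), review the report from subprocess.run(argv, check=True), and only then build the command again with execute=True.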