Making Large Depots Smaller

Warning: Perform the operations discussed herein with care. Failure to follow correct procedures can damage or destroy your data. As always, make a backup copy of your data before proceeding!

TASK

Decreasing the size of large Perforce depots


SOLUTION

Perforce stores the history of your data (metadata) and the data itself (versioned files) separately. By default both live in the P4ROOT directory: the metadata in the db.* files and the versioned file trees in directories named after your depots. Over time, Perforce depots grow as you add files and file revisions and as the recorded history of those files accumulates.

There are several ways to make your depot smaller or to divide it up into smaller pieces. Among them are:

  • Rebuild your db.* files to reclaim space.
  • Remove large, obsolete, or unwanted binary files from the versioned file tree.
  • Divide the depot into two or more pieces.

Each of these operations has its own implications with regard to time, effort, and results. They are discussed below.

Rebuilding your depot metadata to reclaim space

Perforce metadata is stored in the db.* files as B-trees. To keep the B-trees balanced and to reclaim unused space, rebuild them from time to time. The table that usually benefits most is db.have, which can often shrink by 20%. Besides taking up less space, newly rebuilt db.* files are faster to access.

To rebuild your db.* files, take a checkpoint, move the db.* files out of the P4ROOT directory, and recover from the checkpoint. The exact procedure is described in Chapter 2, "Supporting Perforce: Backup and Recovery," of the Perforce System Administrator's Guide.
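The three rebuild steps can be scripted. The following is a minimal Python sketch, assuming the server is stopped, p4d is on your PATH, and checkpoints are written to P4ROOT under the default checkpoint.N name. The helper names latest_checkpoint, move_db_files, and rebuild_db_files are hypothetical, and the old db.* files are only moved aside, never deleted, so you can roll back.

```python
import shutil
import subprocess
from pathlib import Path


def latest_checkpoint(p4root: str) -> Path:
    """Return the checkpoint.N file in P4ROOT with the highest N."""
    found = []
    for path in Path(p4root).glob("checkpoint.*"):
        suffix = path.name.split(".", 1)[1]
        if suffix.isdigit():  # skips checkpoint.N.md5 and similar side files
            found.append((int(suffix), path))
    if not found:
        raise FileNotFoundError(f"no checkpoint.N file in {p4root}")
    return max(found)[1]


def move_db_files(p4root: str, dest: str) -> list[str]:
    """Move every db.* table out of P4ROOT into dest; return the moved names."""
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for table in sorted(Path(p4root).glob("db.*")):
        shutil.move(str(table), str(target / table.name))  # works across filesystems
        moved.append(table.name)
    return moved


def rebuild_db_files(p4root: str, dest: str) -> None:
    """Checkpoint, set the old db.* files aside, then recover from the checkpoint.

    Assumption: the server is stopped for the whole sequence.
    """
    subprocess.run(["p4d", "-r", p4root, "-jc"], check=True)  # take a checkpoint
    checkpoint = latest_checkpoint(p4root)
    move_db_files(p4root, dest)
    subprocess.run(["p4d", "-r", p4root, "-jr", str(checkpoint)], check=True)  # recover
```

For example, rebuild_db_files("/perforce", "/perforce_old_db") would leave the previous tables in /perforce_old_db (a hypothetical directory) until you are satisfied with the rebuilt ones.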

Removal of large, obsolete or unwanted binary files from the versioned file tree

Binary files can consume a lot of disk space because each revision is stored in full, unlike the reverse-delta storage used for text files. The metadata for such files is quite small, but the file revisions themselves can be large. To minimize disk space consumption, keep only the revisions of binary files that you need; you can replace the others with small stub files that indicate their removal.

If the file revision being replaced has been branched or integrated, and its archive file is being used as a lazy copy by another file revision in the depot, that integrated revision is replaced as well. You can undo the lazy copy with p4 obliterate -z. p4 obliterate runs in report mode unless you add the -y flag on the command line before -z.

Replacing binary file revisions with a small stub file makes the versioned file tree smaller. Files stored as RCS reverse deltas, such as text files, cannot be modified using the procedure below.

To replace a revision of a file that is stored as an individual full-file revision (usually compressed) in your versioned file tree, perform the following steps:

  1. Create a text file with descriptive text, for example:
    The original version of this file has been deleted to save space!
  2. Compress the text file with the gzip compression utility.
    • If the file is not stored compressed, as with the ubinary storage format, skip this step.
    • For binary files of type "apple", simple gzip compression alone does not work. Instead, use Perforce to store the stub as an "apple" type file somewhere in a Perforce depot. Then, use that stub file as it exists in the depot's versioned file tree as the source file in the steps below.
  3. Find the file representing the desired revision within the versioned file tree.
    • File revisions are stored on a per-revision basis in a directory whose name is the full name of the file with ,d appended.
    • For Perforce Servers prior to 2006.2, if the depot path of the file revision you are looking for is //depot/path/too/big.exe#12 and the server's P4ROOT directory is /perforce, the file for this revision is, by default, located at the following path:
      /perforce/depot/path/too/big.exe,d/1.12.gz
    • For Perforce Servers 2006.2 and later, archive files in the versioned file tree are keyed to the filename and pending change number, rather than the filename and revision number. You can find the pending change number with p4 fstat -Oc:
      # p4 fstat -Oc //depot/path/too/big.exe#12
      ... depotFile //depot/path/too/big.exe
      ... clientFile /Users/perforce/path/too/big.exe
      ... isMapped 
      ... headAction edit
      ... headType binary
      ... headTime 1239673838
      ... headRev 12
      ... headChange 713
      ... headModTime 1210827015
      ... haveRev 12
      ... lbrFile //depot/path/too/big.exe
      ... lbrRev 1.713
      ... lbrType binary
      The lbrRev field (here 1.713) identifies the archive revision; its second component is the pending change number under which the archive file is stored. If this server's P4ROOT is /perforce, the file for //depot/path/too/big.exe#12 is located at the following path:
      /perforce/depot/path/too/big.exe,d/1.713.gz
    • If the file is not stored compressed, as with the ubinary storage format, the filename does not have .gz at the end.

  4. Replace the desired file in your versioned file tree with the new, smaller file you created in steps 1 and 2, keeping the original file name (for example, 1.713.gz).
  5. Run p4 verify -v on the file to cause the server to replace the MD5 hash value of the old file with that of the new file. For example:
    p4 verify -v //depot/path/too/big.exe#12
  6. Run a test sync of the file to ensure you performed all the steps properly. For example, sync //depot/path/too/big.exe#12 and confirm that it is now the text file you created in step 1.
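Steps 1 through 3 can be sketched in Python. This assumes a 2006.2-or-later server whose p4 fstat -Oc output contains lbrFile and lbrRev, and the default archive layout in which the depot directory sits directly under P4ROOT. The function names and STUB_TEXT are hypothetical. Using lbrFile rather than depotFile matters for lazy copies: a branched revision's archive lives under the file it was copied from. The sketch only prepares the stub; copying it over the archive file, p4 verify -v, and the test sync remain manual.

```python
import gzip
from pathlib import Path

STUB_TEXT = "The original version of this file has been deleted to save space!\n"


def parse_fstat(output: str) -> dict[str, str]:
    """Turn '... field value' lines from p4 fstat into a dict."""
    fields = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("... "):
            key, _, value = line[4:].partition(" ")
            fields[key] = value.strip()
    return fields


def archive_path(p4root: str, lbr_file: str, lbr_rev: str, compressed: bool = True) -> Path:
    """Map lbrFile/lbrRev to the archive file, e.g. <P4ROOT>/depot/x.exe,d/1.713.gz."""
    relative = lbr_file.lstrip("/")  # //depot/x.exe -> depot/x.exe
    name = lbr_rev + (".gz" if compressed else "")  # no .gz for uncompressed (ubinary)
    return Path(p4root, relative + ",d", name)


def write_stub(dest: str, text: str = STUB_TEXT, compressed: bool = True) -> None:
    """Write the stub, gzip-compressed unless the revision is stored uncompressed."""
    data = text.encode("utf-8")
    if compressed:
        with gzip.open(dest, "wb") as fh:
            fh.write(data)
    else:
        Path(dest).write_bytes(data)
```

For the example above, archive_path("/perforce", "//depot/path/too/big.exe", "1.713") yields /perforce/depot/path/too/big.exe,d/1.713.gz. This does not apply to "apple" files, whose stub must come from the depot as described in step 2.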

To automate the process, you can write a script that traverses your versioned file tree and performs this substitution on files meeting specified criteria, such as age, size, or number of revisions. As of Perforce Server release 2004.2, you can also create a commit trigger that performs the substitution. Please contact Perforce Technical Support for additional assistance with this approach.
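As a starting point for such a script, here is a minimal Python sketch of the traversal only. It assumes the default ,d directory layout and uses file size and modification time as the criteria; the function name and thresholds are hypothetical, and an archive file's modification time is only an approximation of its submit time. It reports candidates rather than replacing them, so you can review the list before substituting stubs and running p4 verify -v.

```python
import time
from pathlib import Path


def find_candidates(p4root: str, min_bytes: int, min_age_days: float) -> list[Path]:
    """List per-revision archive files (inside *,d directories) that are large and old."""
    cutoff = time.time() - min_age_days * 86400
    hits = []
    for rev_dir in Path(p4root).rglob("*,d"):  # RCS text files (,v) are not matched
        if not rev_dir.is_dir():
            continue
        for rev_file in rev_dir.iterdir():
            if not rev_file.is_file():
                continue
            info = rev_file.stat()
            if info.st_size >= min_bytes and info.st_mtime <= cutoff:
                hits.append(rev_file)
    return sorted(hits)
```

For example, find_candidates("/perforce", min_bytes=50_000_000, min_age_days=365) would list archive files of at least 50 MB that have not changed in a year.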

Splitting your depot into depots on separate Perforce servers

Dividing a depot into more than one depot allows you to separate your data into logical pieces with each piece being smaller and more manageable than the whole. Dividing a depot removes all metadata relationships between the resultant pieces. Any integration references between the pieces are gone when you are finished, as if the files never had any relationship whatsoever.

Dividing a depot is easiest if done at the branch level. For example, consider a server with a server root of /perforce and a depot with the following two main branches:

//depot/path_A/...
//depot/path_B/...

The steps required to split a depot into two depots at the branch level are as follows:

  1. Take a checkpoint on the source server, where the path_A data is to reside.
  2. Set up a server executable for the path_B server in another location (possibly on a different server machine) where the path_B data is to reside.
  3. Copy the versioned files for path_B onto the new server location (use of your favorite tar/zip utility may make this easier).
  4. Restore the checkpoint taken in step 1 in this new location.
  5. Remove the path_B data from the path_A server using a p4 obliterate command.
    (See the note on optimal obliteration at the end of this document before proceeding!)
  6. Remove the path_A data from the path_B server using a p4 obliterate command.
    (Again, see the note on optimal obliteration at the end of this document before proceeding!)
  7. Start the servers.
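Step 3 amounts to copying one subtree of the versioned file tree while keeping its path relative to the server root, so that the restored checkpoint still finds each archive file. A minimal Python sketch with shutil, assuming the default depot mapping (the depot directory directly under P4ROOT); copy_branch_archives is a hypothetical name.

```python
import shutil
from pathlib import Path


def copy_branch_archives(old_root: str, new_root: str, depot_path: str) -> Path:
    """Copy the archive subtree for a depot path such as //depot/path_B/...
    from old_root to the same relative location under new_root."""
    relative = depot_path.lstrip("/").removesuffix("/...")  # -> depot/path_B
    src = Path(old_root, relative)
    dst = Path(new_root, relative)
    # copytree preserves timestamps (copy2) and fails if dst already exists.
    shutil.copytree(src, dst)
    return dst
```

For the example above, copy_branch_archives("/perforce", "/perforce_b", "//depot/path_B/...") would populate the hypothetical new root /perforce_b. A tar or zip archive achieves the same result; the point is only that the relative paths must be preserved.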

If you have any questions about the above, please contact Perforce Technical Support at support@perforce.com.

Notes on optimal obliteration

Obliterating using server releases prior to 2005.1 can take an extremely long time. Detailed instructions on the use of the obliterate command can be found in the Perforce Command Reference.

To optimize the operation and to ensure you obliterate only the desired files, use a client specification with a view that matches only the files you want to obliterate, like the following:

View:
     //depot/path_to_remove/...   //client_name/path_to_remove/...
and a client-syntax p4 obliterate command as follows:
p4 obliterate //client_name/...
To actually perform the requested action, you must specify the -y flag when you issue the p4 obliterate command. Otherwise, the command only reports what would be done if the flag were included. For your safety, the flag is omitted in the example above.
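The client-plus-obliterate approach can be scripted. This Python sketch builds a client spec whose single view line covers only the path to remove, loads it with p4 client -i, and runs the client-syntax obliterate under that client using the global -c option. It assumes p4 is on your PATH with P4PORT and P4USER already set; the Root value and the helper names client_spec, obliterate_args, and obliterate_path are hypothetical. Like the example above, it leaves out -y (report mode) unless you pass execute=True.

```python
import subprocess


def client_spec(name: str, depot_dir: str, root: str) -> str:
    """Client spec whose view covers only depot_dir, e.g. //depot/path_to_remove."""
    leaf = depot_dir.rstrip("/").rsplit("/", 1)[1]
    return (
        f"Client: {name}\n"
        f"Root: {root}\n"
        "View:\n"
        f"\t{depot_dir}/... //{name}/{leaf}/...\n"
    )


def obliterate_args(name: str, execute: bool = False) -> list[str]:
    """Client-syntax obliterate; without -y it only reports what would be removed."""
    args = ["p4", "-c", name, "obliterate"]
    if execute:
        args.append("-y")
    return args + [f"//{name}/..."]


def obliterate_path(name: str, depot_dir: str, root: str, execute: bool = False) -> str:
    """Create the narrow client, then run (or preview) the obliterate; return its output."""
    subprocess.run(["p4", "client", "-i"], input=client_spec(name, depot_dir, root),
                   text=True, check=True)
    result = subprocess.run(obliterate_args(name, execute),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Running obliterate_path once without execute and reviewing the report before repeating it with execute=True mirrors the manual practice described above.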