Most current personal computers store all programs and data ("information") on a storage device called a "hard disk drive" ("hard disk" or "hard drive" or simply "drive"), usually a single device installed semi-permanently inside the computer. Information can be recorded on (written to) hard disk; retained even when power is turned off; and retrieved (read) from hard disk as required. The typical capacity of hard disk drives used on personal computers has steadily increased with advances in technology, and is now in the range of 1 to 20 gigabytes (billions of characters or "GB"). In brief terms, hard disks are used in the following manner:
Control software on the computer (the "operating system") keeps track of what is being stored on the hard disk by means of a catalog ("directory") recorded on the hard disk. Catalog entries give the location and other characteristics (e.g., date and time of recording) of pieces of information ("files"). The catalog also keeps track of unused space on the hard disk ("free space"). Different operating systems use different hard disk structures and catalogs that can be incompatible without special translation software.
A given location on hard disk can be overwritten as desired any number of times. When it is overwritten, the prior contents are lost beyond normal recovery. (Special analysis procedures can sometimes recover data that has been overwritten, but such procedures are difficult, expensive, and uncertain.)
Documents such as memos, letters, and reports are typically stored as individual files, although they may sometimes be combined in "archives" for more efficient storage. Sometimes those archives are reduced in size ("compressed") and/or encoded for security ("encrypted").
When a given file is modified ("updated") it might be written back to the same location (overwriting the prior contents), or it might be written to (moved to) a new location. When it is written to a new location, the prior contents might or might not be retained. When the prior contents are retained, they may be referred to as a "backup" or prior "generation" copy.
When information located in a given location is changed, the catalog is updated. When information in a given location is deleted, the catalog is likewise updated to reflect that fact.
Locations no longer in use (e.g., due to file movement or deletion) are usually not erased (overwritten). Instead, these unused locations are merely returned to the "pool" of free space. Eventually they may be overwritten in whole or in part when the operating system needs to use them for other files, but until that happens the prior contents can still be retrieved using special techniques. Certain types of routine maintenance can also overwrite free space.
Faithfully reconstructing prior contents in free space is often difficult and uncertain, particularly when one or more parts of a given file have been overwritten.
As a computer is typically used, files are constantly being recorded, updated, and deleted. As a result, locations returned to the "pool" of free space may be overwritten relatively quickly, in days or even hours.
The only way to faithfully preserve a hard disk is to completely remove the hard disk (and hence the computer) from service as quickly as possible.
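The catalog and free-space behavior described above can be illustrated with a toy model. This is a minimal sketch, not how any real operating system lays out a disk: the class name ToyDisk, its methods, and the first-come reuse of free blocks are all assumptions made for illustration.

```python
class ToyDisk:
    """Hypothetical toy model: numbered storage locations plus a catalog."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks      # raw storage locations
        self.catalog = {}                     # file name -> list of block numbers
        self.free = list(range(num_blocks))   # the "pool" of free space

    def write_file(self, name, chunks):
        """Store each chunk in a free block and record where it went."""
        locations = []
        for chunk in chunks:
            block = self.free.pop(0)          # take the next free location
            self.blocks[block] = chunk        # prior contents are overwritten
            locations.append(block)
        self.catalog[name] = locations

    def delete_file(self, name):
        """Update the catalog only; the block contents are NOT erased."""
        self.free.extend(self.catalog.pop(name))

    def read_free_space(self):
        """Return whatever still sits in the unused locations."""
        return [self.blocks[b] for b in self.free]


disk = ToyDisk(4)
disk.write_file("memo.txt", [b"Dear ", b"Bob"])
disk.delete_file("memo.txt")
after_delete = disk.read_free_space()    # deleted text is still present

disk.write_file("new.txt", [b"X", b"Y", b"Z"])
after_reuse = disk.read_free_space()     # only a fragment survives
```

In this sketch the deleted memo remains fully readable in free space until a later file reuses one of its blocks, after which only the fragment b"Bob" is left. That is the partial-overwrite situation that makes reconstruction uncertain.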
The alternative to preserving a hard disk by completely removing it from service is to copy the contents of the hard disk to some other storage that can be preserved instead ("archival storage"). Typical archival storage devices include other hard disks (either fixed or in removable cartridges), magnetic tapes, and optical disks (e.g., CDs). Such copying for preservation is generally performed in one of two ways:
"Backup" is the practice of copying all files from hard disk to archival storage. Free space is not copied or otherwise preserved; hence, common backup would rule out later analysis of free space. For this reason, although backup is the most efficient and common means of preserving information, it is often considered unsuitable in a forensic context.
"Image copy" is the practice of faithfully preserving an exact image of the hard disk, not only locations in use, but also free space. This makes possible exact (or near exact) reconstruction of the original hard disk. The drawback to image copy as compared to backup is that it is usually more difficult and expensive than backup. Image copy is sometimes used for ordinary business purposes and is usually the preferred method for forensic work. The exacting standards of forensic work can greatly exacerbate the difficulty and cost issues of "image copy."
A hard disk drive can either be image copied in place, or temporarily removed and attached to another computer for imaging. Removing a hard disk for image copying can substantially increase risk of damage, cost, and inconvenience, so image copying in place is normally preferred. In general terms there are two critical requirements for making a faithful image copy of a hard disk in place:
A special software "program" is needed to perform the image copy. It must be loaded into the computer to be preserved with as little modification to the existing contents of the hard disk as possible, because loading software onto the hard disk would overwrite free space that might have had important contents. Also, it is difficult to image copy while a computer is running from its own hard disk because files may be in active use. Hence it is desirable to load the special software from a removable storage device like a diskette or CD. This can severely limit the available software, since as a practical matter many programs cannot be run that way.
A suitable storage device must be attached to the computer. Given the large sizes of typical hard disks, a fast device is important to both practicality and cost. Unfortunately, the easiest and most reliable methods of storage tend to be relatively slow. For example, image copy of a relatively common hard disk with a capacity of 9 gigabytes (billions of characters) over a slow parallel connection could take a minimum of about 50 hours (more than two 24-hour days, or more than a full standard work week). Faster storage could cut the time down to as little as 1 hour, but from a technical standpoint it can be more difficult to get faster storage working properly.
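The copy times quoted above follow from simple division of capacity by transfer rate. The sketch below works backward from the stated figures. The two rates (about 50 KB per second and about 2.5 MB per second) are assumptions derived from the 50-hour and 1-hour estimates, not specifications of any particular device.

```python
def transfer_hours(capacity_bytes, bytes_per_second):
    """Hours needed to copy capacity_bytes at a sustained rate."""
    return capacity_bytes / bytes_per_second / 3600


DISK_BYTES = 9_000_000_000   # the 9-gigabyte disk from the example
SLOW_RATE = 50_000           # assumed: about 50 KB/s, slow parallel connection
FAST_RATE = 2_500_000        # assumed: about 2.5 MB/s, faster storage

slow_hours = transfer_hours(DISK_BYTES, SLOW_RATE)   # 50.0
fast_hours = transfer_hours(DISK_BYTES, FAST_RATE)   # 1.0
```

These are minimums under the stated assumptions; any interruption or retry lengthens the actual time.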
A 9-gigabyte hard disk (9,000,000,000 bytes, or "9 GB"; a typical size as of this writing) will hold the equivalent of approximately 4.7 million pages of data when printed in a basic ASCII and hexadecimal format (24 characters per line, 80 lines per page of 8-point type). Such a format is appropriate for determining whether, when, and in what particulars specific material was created, modified, or deleted. A one-page letter will generally use 2,000 to 40,000 bytes of storage. Thus, a 9-gigabyte hard disk could hold the equivalent of 225,000 to 4,500,000 one-page letters.
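The page and letter equivalents can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the paragraph above; the variable names are illustrative.

```python
DISK_BYTES = 9_000_000_000

# One printed page: 24 characters per line, 80 lines per page.
BYTES_PER_PAGE = 24 * 80                     # 1,920 bytes shown per page
pages = DISK_BYTES // BYTES_PER_PAGE         # 4,687,500, about 4.7 million

# One-page letters at 2,000 to 40,000 bytes each.
fewest_letters = DISK_BYTES // 40_000        # 225,000
most_letters = DISK_BYTES // 2_000           # 4,500,000
```

The arithmetic gives a range of 225,000 to 4,500,000 one-page letters, depending on how much storage each letter uses.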
An image copy can be recorded on CD-R (Compact Disc Recordable) discs in one or more fragments as a means of permanently preserving in unalterable form the exact (or near exact) contents of a complete drive, including deleted files. A single standard CD-R will hold about 650 megabytes ("MB") of data. The "image copy" of a single 9 GB drive uses 14 CD-R discs in an uncompressed, immediately searchable format. Compression might reduce the total to 4-7 CD-R discs, but restoration to an equal or larger size hard disk would be necessary for efficient searches. Although each CD-R costs only about one dollar, preparation of each disc requires a significant amount of labor.
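The disc counts above are a matter of dividing the image size by the capacity of one disc and rounding up. In the sketch below, the function name discs_needed and the compression ratios of 2:1 and 3.5:1 are assumptions chosen to reproduce the 4-7 disc range; actual compression depends entirely on the contents of the drive. Disc capacity is taken as 650,000,000 bytes for simplicity.

```python
import math


def discs_needed(image_bytes, disc_bytes=650_000_000, compression_ratio=1.0):
    """Number of discs required, rounding any partial disc up to a whole one."""
    return math.ceil(image_bytes / compression_ratio / disc_bytes)


DISK_BYTES = 9_000_000_000

uncompressed = discs_needed(DISK_BYTES)                           # 14
modest = discs_needed(DISK_BYTES, compression_ratio=2.0)          # 7
strong = discs_needed(DISK_BYTES, compression_ratio=3.5)          # 4
```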
It is possible to locate information of interest in image copies, including both active and deleted files, through the use of software searching tools. The effectiveness of such a search is substantially affected by a number of factors:
Unless the search pattern or patterns (word or words) are unique to the specific information being sought, search results can be either substantially under-inclusive or substantially over-inclusive. If the search is substantially under-inclusive, that fact may not be apparent; thus, search terms are usually selected that will most probably find every possible relevant and discoverable piece of information. This often results in a search that is over-inclusive. Generally this means that after the computer search has been completed, the computer output must be reviewed by someone authorized to view any and all materials (e.g., subject to attorney-client privilege) that may have been "captured" by the search in order to determine which, if any, materials are properly discoverable. Additionally, it may be necessary, depending on the nature of the information deemed to be "relevant," to conduct multiple searches using different search terms. Each such search has the same problem of over- or under-inclusiveness. Finally, there may be an overlap between the searches so that the same materials may be identified multiple times in multiple searches.
Searching multiple CD-R discs on standard computer equipment (i.e., without an expensive "jukebox" library) is time-consuming and labor-intensive.
This kind of search, particularly given the vast amount of material, requires highly specialized software searching tools and expertise. (Standard software tools are generally unsuitable for this specialized purpose.)
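To make the searching problem concrete, the following is a minimal sketch of a raw pattern search over an image copy. It is not a substitute for the specialized tools described above. The function name find_pattern is hypothetical, and the approach (reading fixed-size chunks and carrying a short overlap so that a match straddling two chunks is not missed) is one simple technique among many. It looks at bytes only, with no knowledge of files, compression, or encryption.

```python
import io


def find_pattern(stream, pattern, chunk_size=1024 * 1024):
    """Return the byte offset of every occurrence of pattern in stream."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1    # bytes carried over between chunks
    offsets = []
    tail = b""                    # end of the previous window
    position = 0                  # stream offset of the start of the window
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = tail + chunk
        start = 0
        while True:
            hit = window.find(pattern, start)
            if hit == -1:
                break
            offsets.append(position + hit)
            start = hit + 1       # continue just past this match
        # Keep the last few bytes so a match split across chunks is found.
        tail = window[-overlap:] if overlap else b""
        position += len(window) - len(tail)
    return offsets


image = io.BytesIO(b"xxMEMOxxxxMEMO party art")
memo_hits = find_pattern(image, b"MEMO", chunk_size=3)   # [2, 10]

image.seek(0)
art_hits = find_pattern(image, b"art", chunk_size=3)     # [16, 21]
```

The second search shows over-inclusiveness in miniature: the term "art" matches both the word itself and the middle of "party", so every hit still has to be reviewed by a person.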