carrier at sleuthkit dot org
December 15, 2003
One of my recent areas of focus for The Sleuth Kit and Autopsy has been incident response. The last issue of the Sleuth Kit Informer was about using The Sleuth Kit to verify an incident, and Autopsy is undergoing some changes (see What's New) before it is modified to make it easier to use for incident response. When the transformation is complete, I will write an article on using Autopsy for IR.
This issue goes back to the basics and we look at using 'dd' to make an image of a disk or partition. There are other articles that cover bits and pieces of this process, but I have found that they are usually missing some part of the full story, such as using netcat or using 'dcfldd'. So, this article is my version of "dd 101". Hopefully, I will be done with the Autopsy redesign by the next issue and I will be able to discuss the new incident response features.
Autopsy is in the middle of getting a long awaited architecture update. Autopsy was started almost 3 years ago and it was originally designed to be a CGI script, so it was in one BIG Perl file that had over 10,000 lines of code and comments! When the redesign is complete, there will be over 15 files that are modular and it will be much easier to add and remove features. As a user, you will not see any differences, but for those that look at the source code it will be a welcome change.
After the architecture change is done, I will integrate the incident response features (which involves disabling some of the components that write to the local file system).
Robert Harwood had an excellent idea after last month's issue of The Informer, which discussed using The Sleuth Kit for incident response. Instead of modifying Autopsy for incident response so that it operates in a read-only environment on a CD (which I am soon to be working on), he suggested that an SMB or NFS drive be mounted and used as the evidence locker. A case can be created and symbolic links to the raw devices added as images.
This design allows the investigator to save output data, create strings files, and generate audit logs. For those that want to try this, you can specify the evidence locker directory on the command line with '-d'. I will still need to do some work on making it easier to place Autopsy on a CD because it normally needs hard coded references to the location of The Sleuth Kit and Perl.
This article shows how to acquire a disk image of a computer using 'dd'. While this process is not specific to The Sleuth Kit, it is an important function and essential if you want to use The Sleuth Kit. There are already articles that discuss using 'dd' and this one focuses on the functionality that is important to a forensic acquisition and incident response.
'dd' is available on many Unix and Windows platforms and I am going to focus on using it with Linux, because that is where it is most commonly used. I'll also mention its usage in Windows in case you need to perform a live acquisition of a Windows system.
The goal of an acquisition is to preserve the state of the system and make an exact copy of a target disk or other storage medium. If you had to do this by hand, you can imagine that you would read a little from the target disk and write it to the destination, then read a little more from the target and write that to the destination as well. It is like copying a document by hand, where you read a few words and then write them to a second document. If your goal is just to copy the document, you probably wouldn't even try to understand what the document is about because that would take longer. This is exactly what 'dd' does.
'dd' reads a block of data from an input file and writes it to an output file. This process repeats until the end of the input file is reached. The blocks are not processed or interpreted and therefore disks that are formatted as a Windows system, disks that are wiped with 0s, and disks that have some obscure Unix variant are processed in the same way.
You can specify where 'dd' should read data from using the 'if=' flag and you can specify the output using the 'of=' flag. If the 'of=' flag is not given, then the data is written to standard output (normally the screen), which means that we can send it through a pipe to another tool. Both Unix and Windows have files that correspond to the disks and to the partitions on a disk, and those can be used in the acquisition.
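The pipe behavior can be tried safely on a scratch file instead of a real device. The following is a minimal sketch: the temporary file stands in for a device such as '/dev/hda', and 'wc -c' stands in for whatever tool sits at the other end of the pipe. Both are assumptions made for illustration.

```shell
# Scratch file standing in for a device such as /dev/hda (illustrative only).
src=$(mktemp)
dd if=/dev/zero of="$src" bs=512 count=8 2>/dev/null

# With no 'of=' flag, dd writes the data to standard output, so it can be piped.
# Its record counts go to standard error, which is discarded here.
piped_bytes=$(dd if="$src" bs=2k 2>/dev/null | wc -c)
echo "bytes received through the pipe: $piped_bytes"
```

The byte count on the receiving side should equal the size of the scratch file, because 'dd' passes the data along without interpreting it.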
The default block size for 'dd' is 512 bytes, which means that it will read 512 bytes from the input file and write them to the output file. You can change the block size by specifying the 'bs=' flag. Block sizes in the range of 2k to 8k will usually give faster acquisitions than the default size. By default, 'dd' will not pad the final block to make it a multiple of the block size.
Let's give an example. To use 'dd' to copy the '/dev/hda' disk to a file named 'hda.dd' in blocks of 2k (2048 bytes), we would use the following:
# dd if=/dev/hda of=./hda.dd bs=2k
1421041+0 records in
1421041+0 records out
We see from the output that it copied 1,421,041 blocks of 2048 bytes each from the '/dev/hda' device to the 'hda.dd' file. If the final block was not full (i.e. the size of hda was not a multiple of 2k) then the records output would have a +1 instead of +0.
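To see the "+1" case without touching a real disk, the same command can be run on a small scratch file whose size is not a multiple of the block size. This is only a sketch, and the file names are made up for the example.

```shell
# Scratch directory; these file names are illustrative, not real devices.
work=$(mktemp -d)

# 5000-byte input: two full 2048-byte blocks plus a partial block of 904 bytes.
dd if=/dev/zero of="$work/source.bin" bs=1000 count=5 2>/dev/null

# dd reports its record counts on standard error, so capture that stream.
stats=$(dd if="$work/source.bin" of="$work/copy.dd" bs=2k 2>&1)
echo "$stats"

# The copy is not padded: it has the same size and content as the input.
cmp "$work/source.bin" "$work/copy.dd" && echo "copy is identical"
```

The first number in each record count is the number of full blocks and the number after the '+' is the number of partial blocks, so this run should report two full records and one partial record in each direction.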
The output of 'dd' is an exact copy of the input file. It is in a raw format, meaning that there are no embedded hash values or other meta data. This format can be used as input to most of the popular analysis tools. So, if you are interested in having flexibility with respect to the analysis tools that you can use, then a raw format may be the easiest for you to use.
In some cases, it is easier to use the network to acquire the data. This is useful when you can't remove the suspect disk, can't install a disk into the suspect system, or are acquiring a live system (which is less than ideal). A network-based acquisition uses a network transport, such as netcat or cryptcat (a version of netcat with encryption). We discussed netcat in Issue #10 of The Sleuth Kit Informer, so refer to that for usage details.
To send the data over the network, we will pipe the output data from 'dd' to netcat. 'dd' sends data to Standard Output (the screen) if the 'of=' flag is not given, so all we need to do to send the contents of the '/dev/hda' disk device to the server at IP 10.0.0.1 and port 9000 is:
# dd if=/dev/hda bs=2k | nc 10.0.0.1 9000
The server (at 10.0.0.1) would run something like the following:
# nc -l -p 9000 > hda.dd
When using the network, all of the same flags to 'dd' apply. The only difference is that we will not supply an output file and we will instead send it to netcat (or cryptcat).
This section has a brief overview of the naming conventions on Linux and Windows for the devices that are needed by 'dd'. 'dd' will use either a disk or a partition device file as input. The disk is typically easier to acquire because the process only has to be performed once. Some tools (such as the current version of The Sleuth Kit) require partition images as input, so you may need to break a disk image apart in the lab.
For Linux, the devices are in the '/dev/' directory. ATA/IDE disks have a name that starts with 'hd' and SCSI disk names start with 'sd'. The next letter corresponds to the disk number. The primary master ATA disk is 'hda', the primary slave is 'hdb', the secondary master is 'hdc', and the secondary slave is 'hdd'. The devices mentioned correspond to the entire disk, from sector 0 to the end. For each of the partitions on the disk (if there are any), there is a device file whose name is the same as the disk device name plus a number. For example, the first partition on the secondary slave is '/dev/hdd1' and the third partition on the primary slave is '/dev/hdb3'.
If you need to acquire a live Windows system, there is a device object that corresponds to the disks and partitions. The '\\.\PhysicalDrive0' object corresponds to the first disk and the final number can be increased to specify other disks. To acquire just one partition, you need its logical drive letter. The 'C:' drive's object is named '\\.\C:'.
If you acquire the full disk and need to later split it up, you can use either 'mmls' in The Sleuth Kit or 'fdisk'. Refer to Issue #2 of The Sleuth Kit Informer for details on splitting the disk up using 'fdisk'.
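Once 'mmls' or 'fdisk' has reported the starting sector and length of a partition, 'dd' itself can carve that range out of the disk image with its 'skip=' and 'count=' flags. The sketch below uses a scratch file rather than a real image, and the start sector (63) and length (100 sectors) are made-up values for the example, not output from either tool.

```shell
work=$(mktemp -d)

# Build a 200-sector scratch "disk image" (not a real disk layout):
# 63 zero sectors, 100 sectors filled with 'P', then 37 zero sectors.
{
  dd if=/dev/zero bs=512 count=63 2>/dev/null
  dd if=/dev/zero bs=512 count=100 2>/dev/null | tr '\0' 'P'
  dd if=/dev/zero bs=512 count=37 2>/dev/null
} > "$work/disk.dd"

# Hypothetical values that would normally come from 'mmls' or 'fdisk' output.
start=63      # first sector of the partition
length=100    # number of sectors in the partition

# skip= and count= are measured in units of bs, so bs=512 makes them sectors.
dd if="$work/disk.dd" of="$work/part1.dd" bs=512 skip="$start" count="$length" 2>/dev/null

wc -c < "$work/part1.dd"
```

Using 'bs=512' keeps the arithmetic simple because the partition tools report their values in 512-byte sectors. A larger block size can be used only if both the start and the length are multiples of it.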
I've included a few scenarios here to show how 'dd' can be used to acquire a disk. In all cases, the suspect system may be running any operating system on an x86 platform, including Microsoft Windows, Linux, Sun Solaris for Intel, FreeBSD, OpenBSD etc.
The first scenario is a suspect system that can be shut down and have its hard disk removed. The hard disk is placed in an acquisition system, which is running Linux, as the secondary slave (hdd) and saved to a file, '/mnt/disk.dd':
# dd if=/dev/hdd of=/mnt/disk.dd bs=4k
The second scenario is a suspect system that can be shut down, but its hard disk cannot be removed or installed in the acquisition system. Examples include a laptop, a system whose disk needs SCSI adaptors that are not available, or a case where it is simply easier to use the network. The suspect system and the acquisition system are connected with a cross-over network cable and the suspect system is booted with a bootable Linux CD (a list of bootable Linux CDs can be found on the Links page of www.sleuthkit.org).
First, the acquisition system (at IP 10.0.0.1) will run the following to start the netcat listener:
# nc -l -p 9000 > disk.dd
Next, the 'dd' command is run on the suspect system from a bootable CD and will take the primary master (hda) as the input file and pipe the data to netcat:
# dd if=/dev/hda bs=4k | nc -w 3 10.0.0.1 9000
The final scenario deals with a live acquisition. In this example, we will use a live Windows system as the suspect system. To acquire it, we will reference the \\.\PhysicalDrive0 device and send it to an acquisition system using netcat. This is not an ideal acquisition scenario because we are relying on an untrusted operating system and because the file system will be in an inconsistent state. The tools should be run from a trusted source, such as a CD:
E:\>dd if=\\.\PhysicalDrive0 bs=2k | nc -w 3 10.0.0.1 9000
If 'dd' comes across an error while reading a block from the input file, an error will be generated and the copying process will stop. You can cause 'dd' to keep on going when it encounters an error if you provide the 'conv=noerror' flag. Unfortunately, with just that flag, 'dd' will skip writing that block and the remaining data will be in the wrong location.
A more desirable behavior is to have 'dd' write 0's where the bad block was found, which can be done by adding the 'sync' flag to the 'conv=' option. The 'sync' flag will pad the blocks so that each block written is a full block size. The disadvantage of the 'sync' flag is that if the size of the original media is not a multiple of the block size you chose, then the final block will be padded with 0's. Therefore, this option should not be used unless there are known errors, and when you do use it you should choose a smaller block size. An example of using these flags is:
# dd if=/dev/hdd of=/mnt/disk.dd bs=1k conv=noerror,sync
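The padding side effect of 'sync' is easy to observe on a scratch file. The sketch below does not reproduce a read error, which is hard to trigger on demand. It only shows what happens to the short final block when the input size is not a multiple of the block size. The file names and sizes are made up for the example.

```shell
work=$(mktemp -d)

# 5000-byte input filled with 'A'; 5000 is not a multiple of the 2048-byte block size.
dd if=/dev/zero bs=1000 count=5 2>/dev/null | tr '\0' 'A' > "$work/source.bin"

# 'sync' pads every input block, including the short final one, to a full block.
dd if="$work/source.bin" of="$work/copy.dd" bs=2k conv=noerror,sync 2>/dev/null

src_size=$(wc -c < "$work/source.bin")
copy_size=$(wc -c < "$work/copy.dd")
echo "source: $src_size bytes, copy: $copy_size bytes"

# Count the bytes after the original data that are not 0; none are expected.
nonzero_tail=$(tail -c "$(( copy_size - src_size ))" "$work/copy.dd" | tr -d '\0' | wc -c)
echo "non-zero bytes in the padding: $nonzero_tail"
```

The copy ends up larger than the source, so its hash will no longer match a hash of the original media. That is why a smaller block size, such as the 1k used above, is preferable when 'sync' is needed.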
It is always desirable to have an MD5 or SHA-1 hash of data that is acquired. If you are using the standard version of 'dd', then you will need to execute separate procedures to obtain the hash values. For example:
# md5sum disk.dd > disk.md5
# dd if=/dev/hda bs=2k | md5sum
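The two hash values can then be compared to confirm that the image matches the source. The following is a sketch on a scratch file, assuming the GNU 'md5sum' tool is available. The file names are hypothetical and a real acquisition would read from the device instead.

```shell
work=$(mktemp -d)

# Scratch file standing in for the source device (illustrative only).
dd if=/dev/urandom of="$work/source.bin" bs=1k count=64 2>/dev/null

# Acquire the "device" into an image file.
dd if="$work/source.bin" of="$work/disk.dd" bs=2k 2>/dev/null

# Hash the source by reading it through dd, and hash the image directly.
src_md5=$(dd if="$work/source.bin" bs=2k 2>/dev/null | md5sum | cut -d' ' -f1)
img_md5=$(md5sum < "$work/disk.dd" | cut -d' ' -f1)

if [ "$src_md5" = "$img_md5" ]; then
  echo "MD5 values match: $img_md5"
else
  echo "MD5 MISMATCH: source $src_md5, image $img_md5"
fi
```

Note that hashing the source this way reads the entire disk a second time, which is the cost that an integrated tool avoids.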
The U.S. Department of Defense Computer Forensics Lab (DCFL) has released an updated version of 'dd' that includes the ability to calculate the MD5 while the data is being copied. The 'dcfldd' tool takes the same arguments as the standard 'dd', but it also takes the 'hashwindow=' option to specify how much data each hash should cover and the 'hashlog=' option to specify a file where the hashes should be saved.
To calculate one hash for the entire image, 'hashwindow=0' is used. If a hash for every 2MB is desired, then 'hashwindow=2M' is used. For example, to acquire the system and calculate a hash for every 2MB, the following would be used:
# dcfldd if=/dev/hdd of=/mnt/disk.dd bs=2k hashwindow=2M hashlog=/mnt/disk.md5
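To make the idea of a hash window concrete, the sketch below computes one MD5 per fixed-size window of an existing image using only the standard 'dd' and 'md5sum'. It is not how 'dcfldd' is implemented and its output does not follow the 'dcfldd' log format. The window size, file names, and log layout are assumptions chosen to keep the example small.

```shell
work=$(mktemp -d)

# 10000-byte scratch image standing in for a file such as /mnt/disk.dd.
dd if=/dev/urandom of="$work/disk.dd" bs=1000 count=10 2>/dev/null

window=4096                                   # bytes per hash window (illustrative)
size=$(wc -c < "$work/disk.dd")
windows=$(( (size + window - 1) / window ))   # round up to include the short last window

i=0
while [ "$i" -lt "$windows" ]; do
  # Read exactly one window, starting at window number i, and hash it.
  sum=$(dd if="$work/disk.dd" bs="$window" skip="$i" count=1 2>/dev/null | md5sum | cut -d' ' -f1)
  echo "$(( i * window )) $sum"               # byte offset, then the MD5 of that window
  i=$(( i + 1 ))
done > "$work/disk.md5"

cat "$work/disk.md5"
```

Per-window hashes make it possible to narrow a later mismatch down to a region of the image, whereas a single hash only says that something changed.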
Having a separate file of hashes makes the data more portable and easier to use in many applications. Remember that the final hash value for the file should be documented in a paper notebook. If someone can modify the acquired image, then they can also recalculate the hash values that are stored with the image. Even if the hashes are embedded in the image file, they can still be updated unless there is a cryptographic signature and trusted time stamp.
This article has focused on acquisitions using the Intel platform. For other platforms, such as Sparc and PowerPC, different bootable media will be needed. Linux will run on non-Intel platforms, so the above procedures can be applied. In other cases, the operating system installation CD can be used because it contains a copy of 'dd'. For example, the Sun Solaris install CD can be used to boot a Sparc system into a trusted shell, mount a target disk, and copy disk contents using 'dd'. If you want to use netcat, you'll have to make your own bootable CD or ftp a binary to the system.
The Computer Forensic Tool Testing group at NIST tested several disk imaging tools, including the 'dd' that ships with Red Hat Linux 7.1. The test results document a bug that exists in the Linux kernel, but not in 'dd'. In fact, the same tests were performed on a FreeBSD system with 'dd' and the bug was not observed.
The bug occurs when a disk or partition has an odd number of sectors. Linux will not be able to read the final sector because it reads the data from the disk in 1024-byte chunks and will not be able to read the final 512-byte chunk. To acquire such a disk, a FreeBSD or OpenBSD system can be used instead. It has been reported that future versions of Linux will fix this error.
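The arithmetic behind the missing sector can be worked through in a few lines. This is a simple model of the behavior described above, not a test of any kernel, and the sector count is a made-up value.

```shell
# Hypothetical disk with an odd number of 512-byte sectors.
sectors=4001
total_bytes=$(( sectors * 512 ))

# Model of reading only in whole 1024-byte chunks, as described above.
readable_bytes=$(( total_bytes / 1024 * 1024 ))
missed_bytes=$(( total_bytes - readable_bytes ))

echo "total: $total_bytes, readable: $readable_bytes, missed: $missed_bytes"
```

For any odd sector count the remainder is exactly one 512-byte sector, so comparing the size of the image against the sector count reported for the disk is a quick way to check whether an acquisition was affected.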
This article has shown the basics of using 'dd' to acquire a suspect system. 'dd' uses a basic design to copy data from one location to another and does not care about the file system or data type. The result of using 'dd' is a raw image format, which is flexible for using multiple analysis tools. A raw format file is sometimes called a "dd image" because that is the most common tool that makes the format, but there is nothing unique that 'dd' does to the image. The 'dcfldd' tool provides a fast method of calculating MD5 hashes of an image.
Bootable Linux CDs:
NIST CFTT Test Results for Disk Imaging Tools: dd
Splitting the Disk:
The Sleuth Kit Informer Issue #2
UNIX Incident Verification with The Sleuth Kit:
The Sleuth Kit Informer Issue #10