The Sleuth Kit Informer

http://www.sleuthkit.org/informer

Brian Carrier
carrier at sleuthkit dot org

Issue #23
May 17, 2006

Introduction
What's New?
Call For Papers
Expert Witness and AFF Support
An Introduction To The libewf Expert Witness Library

Introduction

In the 23rd issue of the Informer, we look at some of the new features in version 2.04 of The Sleuth Kit. Specifically, we focus on new image file formats. The first article is about how to use TSK with the formats and the second is about the library that was developed by Joachim Metz and Robert-Jan Mora to support the Expert Witness Format.

What's New?

New versions of The Sleuth Kit (TSK) and Autopsy were released on May 11, 2006. TSK 2.04 includes bug fixes and new features, including support for the EWF and AFF image file formats and the ISO 9660 file system, as well as an updated error handling system. There is also a new 'img_cat' tool that displays the raw contents of an image file. Autopsy 2.07 adds support for the new TSK image file formats and file systems, as well as a hex view for files.

The DFRWS 2006 Challenge has been released, and its topic is file carving. The data set contains files and fragments, and the challenge is to develop tools and techniques that recover as many files as possible with as few false positives as possible.

http://www.dfrws.org/2006/challenge/

Call For Papers

The Sleuth Kit Informer is looking for articles on open source tools and techniques for digital investigations (computer / digital forensics) and incident response. Articles that discuss The Sleuth Kit and Autopsy are appreciated, but not required. Example topics include (but are not limited to):

Tutorials on open source tools
User experiences and reviews of open source tools
New investigation techniques using open source tools
Open source tool testing results

More details can be found at:
http://www.sleuthkit.org/informer/cfp.html

Expert Witness and AFF Support

By: Brian Carrier

The 2.04 release of The Sleuth Kit (TSK) includes support for two new image file formats. The integration was relatively straightforward because of the design of the image layer in TSK [1] and of the libraries that were developed to support the formats. Articles in this and future issues of the Informer will describe the image formats and the libraries in detail. This article covers the basics and how to use the formats in TSK and Autopsy.

Expert Witness Support

The Expert Witness format (EWF) is the basis of the image file format created by EnCase and other tools. Support for it was added to TSK using libewf [2], a library written by Joachim Metz and Robert-Jan Mora of Hoffmann Investigations and described in the next article of this issue.

EWF files have a signature in their header that identifies them, so TSK can automatically detect an EWF file when no image file format type is given. To specify the format explicitly, supply the '-i ewf' flag on the command line. The normal '-o' flag can still be used to specify a sector offset into the image. For example, to specify the image file type and list the files in the partition that starts at sector 63, you would use:

    # fls -i ewf -o 63 disk1.E01

The libewf library converts the sector offset to the correct location in the compressed image file, so using EWF files with TSK is no different from using raw images.

The acquisition of large disks may cause multiple EWF files to be created if the acquisition tool had a maximum file size set. For example, it is common to use a maximum file size of 2 GB if the files are going to be stored on a FAT32 drive. TSK and libewf support segmented EWF files, but all of the files must be specified on the command line. For example, you could use either of the following to list the files in the file system that starts in sector 63:

    # fls -o 63 disk1.E01 disk1.E02

    # fls -o 63 disk1.E0?

The 'img_stat' tool for an EWF file currently displays the size of the file and its MD5 hash value. Future releases will likely display additional metadata that are stored in the file.

AFF Support

Support for the Advanced Forensic Format (AFF) is provided by AFFlib [3] from Simson Garfinkel and Basis Technology (disclaimer: I work for Basis Technology). A future issue of the Informer will cover AFF in more detail, and the AFFlib website describes the image format thoroughly.

AFF is an image file format that has been designed to be open and extensible. Its data structures are documented and the format consists of a collection of name and value pairs, which are called segments. Some segments contain the evidence data and other segments contain metadata. The value in each of the segments can be compressed.

There are three different types of AFF files: AFF, AFD, and AFM, all of which use the same data structures. The AFF format uses a single file to store all of the segments. The AFD format stores multiple files in a directory whose name has the extension '.afd'. It is used when a maximum file size is needed, and each of its files contains a collection of segments. The AFM format stores the data in one or more raw files and stores the metadata in a separate file that has the standard segment structure. This allows metadata to be stored about the data while still allowing the data to be imported into tools that support the raw format. With AFM, there is one file with a '.afm' extension, and the raw files share its base name but have numeric extensions.

TSK can automatically detect AFF, AFD, and AFM files, or you can specify the type on the command line with '-i aff', '-i afd', or '-i afm'. For all three types, only one name needs to be given on the command line: with AFF, the AFF file name; with AFD, the directory name (the tools will find the files in the directory); and with AFM, the AFM file name (the tools will locate the associated raw files). The 'img_stat' tool displays the MD5 and SHA-1 values, the image size, and many other segments that can be defined by the 'aimage' tool, which is the acquisition tool in AFFlib.

Conclusion

Support for AFF and EWF should be transparent to users because TSK can automatically detect the image format types. Thanks to the work of Joachim, Robert-Jan, Simson, and others, TSK can now be used in more situations. Future versions of TSK will include tools to verify the integrity of the image files, and the 'img_stat' tool will display more metadata from the files.

References

[1] The Sleuth Kit Informer Issue #19: http://www.sleuthkit.org/informer/sleuthkit-informer-19.html

[2] libewf: http://www.uitwisselplatform.nl/projects/libewf/

[3] AFFlib: http://www.afflib.org

An Introduction To The libewf Expert Witness Library

By: Joachim Metz, Robert-Jan Mora

Hoffmann Investigations <forensics at hoffmannbv.nl> [1]

Description

The Expert Witness Compression format (EWF) is used by EnCase (Guidance Software) and FTK (AccessData) to create bit-copies. EWF is currently the de facto evidence file standard within the forensic community. However, the file format is not fully documented and could not be used natively within The Sleuth Kit.

The Sleuth Kit now includes support for reading Expert Witness Format (EWF) image files. This was accomplished using libewf, which is an open source C library that we developed. In addition to the library, the libewf distribution also comes with tools to export data from EWF files (ewfexport), show the metadata stored in the EWF files (ewfinfo), and verify the integrity of the EWF files (ewfverify). This article describes the history of the library, its capabilities, and an overview of how to use it in a program.

A short history

In 2005, Michael Cohen used the Expert Witness format specification [4] to create a library called 'iowrapper' that allowed PyFlag [7] to read EnCase 4 evidence files. The 'iowrapper' tool uses library hooking techniques to allow programs to access data in image file formats [6]. It supports EnCase 5 evidence files only when the default chunk size is used. Although 'iowrapper' was a big breakthrough, we preferred native EWF support within The Sleuth Kit. Therefore, we created libewf for current and future open source tools.

Initially, libewf was developed in C for i386 Linux machines. We first used Michael Cohen's code as a reference, but early on decided to do a complete rewrite of his implementation.

It became apparent that both the Expert Witness specification and the code in 'iowrapper' were not complete for current versions of the format. For example, image files created by EnCase 5 contained new metadata that was not previously documented.

In our tests we analyzed different versions of EWF evidence files and found differences from the original specification by Andrew Rosen [4]. We therefore concluded that this specification was outdated.

The more noteworthy differences are described in this article. A document containing the detailed specification of the format has been released on the project site [3]. This document contains findings and assumptions about the EWF format. Please note that this is a working document and may contain errors. If you find errors or if you have additional information regarding the format, please send us an e-mail.

Specification

EWF can split the bit-copy over multiple files, called segment files (E01, E02, etc.). The data within the files is stored in little-endian byte order, and the format uses zlib for compression.

Each segment file starts with an 8-byte signature: 0x45 0x56 0x46 0x09 0x0d 0x0a 0xff 0x00. The first three bytes of this signature are "EVF" in ASCII. This signature is unique to EWF files.
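Checking these 8 bytes is all it takes to recognize an EWF segment file, which is how a tool like TSK can detect the format automatically. The C sketch below does that and also reads the segment number. The layout after the signature is an assumption of this sketch, not something stated in this article: a 1-byte marker, a 16-bit little-endian segment number at offset 9, and a 2-byte end marker, for 13 bytes in total. The function names are hypothetical and are not part of the libewf API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 8-byte EWF signature: "EVF" followed by 0x09 0x0d 0x0a 0xff 0x00. */
static const uint8_t EWF_SIGNATURE[8] = {
    0x45, 0x56, 0x46, 0x09, 0x0d, 0x0a, 0xff, 0x00
};

/* Assumed size of the file header: signature, 1-byte start marker,
 * 16-bit segment number, 2-byte end marker. */
#define EWF_FILE_HEADER_SIZE 13

/* Returns 1 if buf starts with the EWF signature, 0 otherwise. */
int ewf_has_signature(const uint8_t *buf, size_t len)
{
    return len >= sizeof EWF_SIGNATURE &&
           memcmp(buf, EWF_SIGNATURE, sizeof EWF_SIGNATURE) == 0;
}

/* Returns the segment number stored in the file header, or -1 if the
 * buffer is too short or lacks the signature. The number is assumed to
 * be a little-endian 16-bit value at offset 9. */
int ewf_segment_number(const uint8_t *buf, size_t len)
{
    if (len < EWF_FILE_HEADER_SIZE || !ewf_has_signature(buf, len))
        return -1;
    return (int)(buf[9] | ((unsigned)buf[10] << 8));
}
```

A detector would read the first 13 bytes of each file given on the command line and pass them to these functions; the segment number is what lets a reader order E01, E02, and so on, regardless of the order in which the files were named.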

After the signature there is a field part that contains the segment file number. The rest of the file is divided into sections. Each section starts with a header, which we refer to as the section start for clarity. The section start contains fields that represent:

the section type
the size of the section (64 bit)
the offset of the next section (64 bit)
a cyclic redundancy check (CRC) of the data within the section start
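A minimal C sketch of parsing a section start follows. It assumes a 76-byte layout that this article does not spell out: a 16-byte NUL-padded type string, the 64-bit next-section offset, the 64-bit section size (note that the next offset comes first in this assumed layout), 40 bytes of padding, and a 32-bit checksum over the preceding 72 bytes. It further assumes that the checksum, which this article calls a CRC, is zlib's Adler-32, as we understand later libewf documentation to describe it. Treat both as assumptions to check against the published specification; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EWF_SECTION_START_SIZE 76

struct ewf_section_start {
    char     type[17];    /* section type as a NUL-terminated string */
    uint64_t next_offset; /* offset of the next section */
    uint64_t size;        /* size of this section */
    uint32_t checksum;    /* stored checksum of bytes 0..71 */
};

/* EWF stores integers in little-endian byte order. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Adler-32 as defined by zlib (RFC 1950). */
static uint32_t adler32(const uint8_t *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* Parses a section start. Returns 1 if the stored checksum matches the
 * one computed over bytes 0..71, 0 if it does not, -1 if too short. */
int ewf_parse_section_start(const uint8_t *buf, size_t len,
                            struct ewf_section_start *out)
{
    if (len < EWF_SECTION_START_SIZE)
        return -1;
    memcpy(out->type, buf, 16);
    out->type[16] = '\0';
    out->next_offset = read_le64(buf + 16);
    out->size        = read_le64(buf + 24);
    out->checksum    = read_le32(buf + 72);
    return adler32(buf, 72) == out->checksum;
}
```

Reading the integers byte by byte, rather than casting the buffer to a struct, keeps the parser correct on big-endian machines and avoids alignment problems, which matters for a library meant to run on several platforms.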

The sections contain different types of information, and the section type in the section start identifies what each one holds. The sections are:

the “header” and “header2” sections, which contain the case data
the “volume” or “disk” and “data” sections, which contain information about the acquired media
the “sectors” section, which contains the actual bit-copied data
the “table” and “table2” sections, which contain information about the bit-copied data in the sectors section
the “next” and “done” sections, which are used to mark the end of a segment file
the “hash” section, which contains an MD5 hash of the bit-copied data
the “error2” section, which contains information about the read errors during acquisition
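Because every section start records where the next section begins, a reader can enumerate the sections of a segment file by following that chain. The C sketch below does this over a segment file held in memory. It assumes, as an illustration rather than a fact from this article, that sections begin right after a 13-byte file header, that the section type occupies the first 16 bytes of the 76-byte section start with the next-section offset stored little-endian at byte 16, and that the "done" and "next" sections end the chain. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EWF_FILE_HEADER_SIZE   13
#define EWF_SECTION_START_SIZE 76

static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Walks the section chain of one segment file held in memory and copies
 * up to max_types section type names into types. Returns the number of
 * sections visited, or -1 if an offset points outside the buffer or
 * backwards, both of which indicate corruption. */
int ewf_list_sections(const uint8_t *seg, size_t len,
                      char types[][17], int max_types)
{
    uint64_t off = EWF_FILE_HEADER_SIZE;
    int count = 0;

    while (count < max_types) {
        if (off > len || len - off < EWF_SECTION_START_SIZE)
            return -1;
        const uint8_t *s = seg + off;
        memcpy(types[count], s, 16);
        types[count][16] = '\0';
        uint64_t next = read_le64(s + 16);
        count++;

        /* "done" and "next" mark the end of a segment file. */
        if (strcmp(types[count - 1], "done") == 0 ||
            strcmp(types[count - 1], "next") == 0)
            break;
        /* Any other section must point forward, or the walk could loop. */
        if (next <= off)
            return -1;
        off = next;
    }
    return count;
}
```

Rejecting offsets that point backwards is a cheap way to turn a corrupted or manipulated offset into an error instead of an endless loop, which is the kind of structural check the library description later in this article refers to.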

The original specification only contained the “header”, “volume”, “table”, “table2”, “next” and “done” sections. A short overview of two important sections is provided below.

The header sections

Both the header and header2 sections contain information regarding the case, like the examiner name, case number, evidence number, and the password hash.

The header sections consist of a compressed text string.

In the header section the string contains ASCII values.
In the header2 section the string contains UTF-16 Unicode values.

These strings contain tab- and newline-separated values, each labeled by an identifier. For example, the header section in EnCase 4 and 5 contains the following values:

Case number (identified by c)
Evidence number (identified by n)
Unique description (identified by a)
Examiner name (identified by e)
Notes (identified by t)
The EnCase (software) version used to acquire the media (identified by av)
The platform/operating system used to acquire the media (identified by ov)
Acquired date (identified by m)
System date (identified by u)
Password Hash (identified by p)
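Once a header section has been decompressed (with zlib), looking up a value by its identifier is plain string handling. The C sketch below assumes, for illustration only, that the decompressed ASCII text contains one line of tab-separated identifiers followed by one line of values in the same order; the exact layout is described in the detailed specification, not here. The header2 section would first need its UTF-16 text converted. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copies the tab-separated field at position index of line into out.
 * A line ends at '\0', '\r' or '\n'. Returns 1 on success, 0 if the
 * line has fewer fields or the field does not fit into out. */
static int get_field(const char *line, int index, char *out, size_t outlen)
{
    const char *p = line;
    for (int i = 0; i < index; i++) {
        while (*p != '\t' && *p != '\n' && *p != '\r' && *p != '\0')
            p++;
        if (*p != '\t')
            return 0;
        p++; /* step over the tab */
    }
    size_t n = 0;
    while (p[n] != '\t' && p[n] != '\n' && p[n] != '\r' && p[n] != '\0')
        n++;
    if (n >= outlen)
        return 0;
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}

/* Looks up the value for identifier id (such as "c", "e" or "av"),
 * given the identifier line and the matching value line.
 * Returns 1 and fills out if the identifier is present, 0 otherwise. */
int ewf_header_value(const char *ids, const char *values, const char *id,
                     char *out, size_t outlen)
{
    char name[16];
    for (int i = 0; get_field(ids, i, name, sizeof name); i++) {
        if (strcmp(name, id) == 0)
            return get_field(values, i, out, outlen);
    }
    return 0;
}
```

Calling ewf_header_value(ids, values, "e", buf, sizeof buf) would copy the examiner name into buf. Empty fields, such as blank notes, are returned as empty strings instead of being skipped, which is why the sketch walks the tabs itself rather than using strtok.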

Only the first segment file contains the header section. The header section is the first section in the segment file after the signature and the field part. Different versions of EnCase use different header sections, and the data within the header sections also differs per version; the details can be found in the detailed specification.

The hash section

The hash section contains an MD5 hash of the acquired data. In EnCase 4 and 5, this section also contains 16 bytes of additional data whose meaning we are still uncertain of.

The hash section is found in the last segment file, before the done section. It is optional, so it can be left out. The hash only allows verification of the acquired data, not of the additional metadata such as the case information within the header. The hash could, for example, also have been calculated over the data in the header, media information, and table sections.
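Verifying the acquired data therefore comes down to computing an MD5 digest over the media data and comparing it with the stored value. The C sketch below covers only the comparison and the usual hex formatting; computing the digest itself is left to an MD5 implementation such as the one in OpenSSL's libcrypto. It assumes, as an illustration, that the section payload starts right after a 76-byte section start and begins with the 16-byte MD5 value. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define EWF_SECTION_START_SIZE 76

/* Formats a 16-byte MD5 digest as 32 lowercase hex characters. */
void md5_to_hex(const uint8_t digest[16], char out[33])
{
    static const char hex[] = "0123456789abcdef";
    for (int i = 0; i < 16; i++) {
        out[2 * i]     = hex[digest[i] >> 4];
        out[2 * i + 1] = hex[digest[i] & 0x0f];
    }
    out[32] = '\0';
}

/* Given a hash section (section start followed by its payload) and a
 * digest computed over the media data, returns 1 if they match. The
 * payload is assumed to begin with the 16-byte MD5 value. */
int ewf_hash_section_matches(const uint8_t *section, size_t len,
                             const uint8_t computed[16])
{
    if (len < EWF_SECTION_START_SIZE + 16)
        return 0;
    return memcmp(section + EWF_SECTION_START_SIZE, computed, 16) == 0;
}
```

Note that a match only says something about the media data: none of the header, volume, or table bytes feed into this comparison, which is the limitation described above.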

Evidence integrity?

Setting a password within EnCase does not necessarily protect the data in the segment files. The password is only used by the EnCase application, so only EnCase checks the password hash in the header sections. The libewf implementation in The Sleuth Kit ignores the password.

The EWF format does provide protection against several forms of corruption, but it does not enforce protection against manipulation. The most recent version of EnCase still treats the hash value as optional.

This leaves room for manipulating the evidence. The following are theoretical approaches to manipulating the data: we think they are possible, but we have not actually verified them. This requires further analysis, but people should be aware that one cannot rely entirely on the integrity features provided by the EWF format.

One could adjust the case information in the header sections, which contain several values that could be of interest for manipulation. The approach would be to:

extract the payload of a header section
decompress the payload
adjust the values
compress the modified data
write the compressed data back to the file

The amount of data that needs to be modified depends on the EWF version. For example, for EnCase 5 the header data must be adjusted in both the header and header2 sections.

If the new header is larger than the previous one, all section offsets in the first segment file need to be corrected. If the new header is smaller, simple padding would most likely suffice. In many cases, however, manipulating a single value could be enough. For example:

One could adjust the password hash, which would render the file unreadable in EnCase.
One could adjust the acquisition date, which is just a date string in the header section and a timestamp string in the header2 section.

This is not very serious, unless weight is given to the additional case data. But how would the evidence hold up in court if any of these values had been manipulated?

The data of specific chunks could also be manipulated. The data within a chunk can be changed as long as the CRC of that chunk is recalculated and updated as well. This is more difficult for compressed chunks, because the compressed size may change, so one would also need to adjust the offsets in the table sections and the offsets of the sections. To manipulate an uncompressed chunk, one would:

read a chunk from the file
adjust the data
calculate the new CRC
write the data and CRC back to the file
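The reason these steps work is that the per-chunk check is a keyless checksum: anyone who can write the file can also recompute it. The C sketch below verifies and sets such a checksum. It assumes, as an illustration, that an uncompressed chunk is stored as its data followed by a 4-byte little-endian Adler-32 value (the checksum this article calls a CRC). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32 as defined by zlib (RFC 1950). */
static uint32_t adler32(const uint8_t *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* Returns 1 if the checksum stored after data_len bytes of chunk data
 * matches the data, 0 otherwise. */
int ewf_chunk_ok(const uint8_t *chunk, size_t data_len)
{
    const uint8_t *c = chunk + data_len;
    uint32_t stored = (uint32_t)c[0] | ((uint32_t)c[1] << 8) |
                      ((uint32_t)c[2] << 16) | ((uint32_t)c[3] << 24);
    return adler32(chunk, data_len) == stored;
}

/* Writes the checksum of the chunk data right after the data, as a
 * writer of uncompressed chunks would. */
void ewf_chunk_set_checksum(uint8_t *chunk, size_t data_len)
{
    uint32_t sum = adler32(chunk, data_len);
    uint8_t *c = chunk + data_len;
    c[0] = (uint8_t)sum;
    c[1] = (uint8_t)(sum >> 8);
    c[2] = (uint8_t)(sum >> 16);
    c[3] = (uint8_t)(sum >> 24);
}
```

A changed byte makes ewf_chunk_ok fail, but calling ewf_chunk_set_checksum again makes it pass. The checksum therefore detects accidental corruption only; detecting deliberate changes requires a hash that is kept elsewhere.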

However, if a hash section is present, the manipulation can be detected. This is no obstacle, though, because the hash section is optional and can therefore be removed completely. This is very easy to do:

find the start offset of the hash section in the last segment file
remove it
adjust the next-section offset in the done section so that it points to the done section itself; the new value can be obtained from the section preceding the former hash section

Instead of removing the hash section, one could in theory overwrite the MD5 hash stored in it: after calculating the MD5 hash of the manipulated data, all one has to do is write it to the hash section.

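Another, more difficult, approach is to construct manipulated data that has the same MD5 hash as the original data. The only manipulation required would then be that of the chunks and their CRC values. Collision attacks on MD5 are possible, although finding data that matches the hash of existing data (a second preimage) is considerably harder than finding an arbitrary collision.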

To be thorough, you can also hash the individual segment (EWF) files with tools like md5sum and/or sha1sum so that the integrity of the segment files themselves can be verified.

The library

Because we found differences in the EWF image files created by different programs, we analyzed image files from EnCase versions 1.991, 2.17a, 3.21b, 4.22, and 5.04a, as well as from FTK Imager 2.3.

We also tested several different features for each program including different compression levels, password settings, hash settings, disk drive sizes, and both split and single evidence files. Extra attention was paid to EnCase 5 and additional test cases included different block sizes and error block sizes.

These test cases allowed us to determine the details of the different versions of the image file format. From this work, libewf can read and verify:

Single and multiple segment EWF files
Non compressed and compressed EWF files
EWF files with or without password
EWF files with or without a hash
EWF files with chunk sizes other than 32 KB
EWF files containing large images (tested up to 300 GB)

Libewf was developed to support multiple platforms so that it could be integrated into The Sleuth Kit. Therefore, it supports different byte orders (endianness), 64-bit systems, and platforms other than Linux.

Libewf supports the following platforms and compilers:

Linux Fedora Core 4, i386, gcc 4.0.2
Linux Fedora Core 5, x86_64, gcc 4.1.0
Linux Ubuntu 5.10, i386, gcc 4.0.2
Mac OS X Tiger, Motorola G4, gcc
Cygwin/Windows XP, i386, gcc
FreeBSD 6.0, i386, gcc 3.4.4
OpenBSD 3.8, i386, gcc 3.3.5
NetBSD 3.0, i386, gcc 3.3.3
SunOS 5.11 / Solaris 11, i386, gcc 3.3.2
Debian 3.1, i386, gcc 2.95 and gcc 3.3

Library Features

Libewf provides an easy-to-use interface that allows programs to open EWF files, read data from them, and verify the data integrity. When reading image files, libewf performs various integrity checks. It validates:

that the structure of the segment files is correct and that no offset or size values have been corrupted
that the CRC value for the sections is correct and that no file structure data was corrupted
that the CRC value for the media data is correct
that the offset values of the media data in the table and the table2 sections are equal and that the table sections are not corrupt

When a CRC error is encountered in one of the chunks, a warning message is displayed. If any other validation error occurs, however, the code terminates. One of our future goals is to implement more user-friendly error handling. In addition to providing access to the data stored in the image format, the library also provides access to the metadata, such as case information and MD5 values.

Source and dependencies

The libewf source code is organized into several files. The following naming convention is used:

Files that start with ewf_ contain the implementation of EWF-specific definitions, such as the layout of the different sections.
The remaining files contain code regarding the workings of the library. For example, the main file handling code is in file.c and the reading code is in file_read.c.

The “libewf.h” file should be included when using libewf with other tools.

Libewf depends on two libraries, which are installed by default on most systems:

zlib for compression and decompression support
OpenSSL libcrypto for MD5 calculation support

The clockwork

In this section we take a look at how to develop tools using libewf and describe some of its internal clockwork.

To read from an image file, the image file must first be opened using libewf_open, which dynamically creates an instance of the LIBEWF_HANDLE structure. This structure is used by other libewf functions to store and retrieve data. Some functions allocate memory in the structure, so the handle should be closed with libewf_close when it is no longer needed.

The libewf_open function builds a dynamic array of the segment files in a split image, referred to as the segment table, and verifies:

That each file contains the EWF file signature
That the segment number in the field part is specified only once
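Together with the later check that no segment files are missing, this amounts to a simple set check on the segment numbers read from the field parts. The C sketch below is one way to express it and is not libewf's code: with n files, each number from 1 to n must appear exactly once, so a duplicate or a number larger than n (which implies a gap) is reported. The enum and function names are hypothetical.

```c
#include <stdlib.h>

enum ewf_segment_status {
    EWF_SEGMENTS_OK,
    EWF_SEGMENT_DUPLICATE,    /* the same number appears twice */
    EWF_SEGMENT_OUT_OF_RANGE, /* 0, or larger than the file count: a gap */
    EWF_SEGMENT_ERROR         /* no files, or out of memory */
};

/* Checks the segment numbers read from count segment files. */
enum ewf_segment_status ewf_check_segments(const int *numbers, int count)
{
    if (count <= 0)
        return EWF_SEGMENT_ERROR;
    unsigned char *seen = calloc((size_t)count + 1, 1);
    if (seen == NULL)
        return EWF_SEGMENT_ERROR;

    enum ewf_segment_status status = EWF_SEGMENTS_OK;
    for (int i = 0; i < count; i++) {
        int n = numbers[i];
        if (n < 1 || n > count) {
            /* With no duplicates, a number above count means that
             * some lower segment number is missing. */
            status = EWF_SEGMENT_OUT_OF_RANGE;
            break;
        }
        if (seen[n]) {
            status = EWF_SEGMENT_DUPLICATE;
            break;
        }
        seen[n] = 1;
    }
    free(seen);
    return status;
}
```

For example, giving only disk1.E01 and disk1.E03 on the command line yields the numbers {1, 3} for two files, and the 3 is reported as out of range because segment 2 is missing.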

After each segment file has been added to the segment table the offset table is built. The offset table is a dynamic array containing the offset of every chunk within all the segment files. This process:

Verifies that no segment files are missing.
Builds a list of the sections in each segment file.
Verifies the CRC value of each section. Currently it warns you if a CRC is invalid but keeps processing.
Compares backup copies of data in the image format to ensure that they are consistent. However, the code does not currently correct errors in either copy.
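The offset table is built from the entries in the table sections, and the table2 section serves as the backup copy that can be compared against them. The C sketch below assumes, as an illustration not stated in this article, that each table entry is a 32-bit value (already converted from little-endian) whose most significant bit marks a compressed chunk and whose lower 31 bits hold the chunk's offset within its segment file. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct ewf_chunk_ref {
    uint32_t offset;     /* offset of the chunk in its segment file */
    int      compressed; /* 1 if the chunk is zlib-compressed */
};

/* Decodes raw table entries into chunk references. */
void ewf_decode_table(const uint32_t *entries, size_t n,
                      struct ewf_chunk_ref *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i].compressed = (entries[i] & 0x80000000u) != 0;
        out[i].offset     = entries[i] & 0x7fffffffu;
    }
}

/* Compares a table with its backup copy from table2. Returns the index
 * of the first differing entry, or -1 if the copies agree. */
long ewf_compare_tables(const uint32_t *table, const uint32_t *table2,
                        size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (table[i] != table2[i])
            return (long)i;
    return -1;
}

/* Stored size of chunk i: the distance to the next chunk's offset, or
 * to end_offset (supplied by the caller) for the last chunk. */
uint32_t ewf_stored_chunk_size(const struct ewf_chunk_ref *refs, size_t n,
                               size_t i, uint32_t end_offset)
{
    uint32_t next = (i + 1 < n) ? refs[i + 1].offset : end_offset;
    return next - refs[i].offset;
}
```

Like the behavior described above, this comparison only reports where the copies disagree; deciding which copy is correct would need additional information, such as the section checksums.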

To read the entire media data to a file descriptor, for example to export it, the function libewf_read_to_file_descriptor can be used. This function reads the data from the EWF files and writes it to a file descriptor. A callback function can be specified, which can be used to provide a progress indicator, as in ewfexport.

For tools like TSK, reading data from random locations is required. For this purpose, the libewf_read_random function can be used. The function converts an offset in the media data to the segment file in which that data is located.

The libewf_read_random function caches the data of a single chunk to improve the performance of many small reads. As mentioned before, libewf currently only warns about a CRC mismatch within a chunk, and it does so only when reading the chunk from the file, not when using the cached copy.
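The effect of a single-chunk cache can be shown independently of libewf. The C sketch below is not libewf's implementation: it assumes a fixed chunk size of 32 KB and a hypothetical read_chunk callback that returns one decompressed, verified chunk. A read is split at chunk boundaries, and the chunk is fetched only when it differs from the one already cached. The memory-backed chunk source is a hypothetical stand-in used to count fetches.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 32768 /* assumed default chunk size (32 KB) */

/* Hypothetical callback: fills buf with chunk 'index', 0 on success. */
typedef int (*read_chunk_fn)(void *ctx, uint64_t index, uint8_t *buf);

struct chunk_cache {
    read_chunk_fn read_chunk;
    void         *ctx;
    int           valid; /* 1 if data holds chunk 'index' */
    uint64_t      index;
    uint8_t       data[CHUNK_SIZE];
};

/* Copies len bytes starting at media offset into buf, fetching a chunk
 * only when it is not the cached one. Returns 0 on success, -1 on error. */
int cached_read_random(struct chunk_cache *c, uint64_t offset,
                       uint8_t *buf, size_t len)
{
    while (len > 0) {
        uint64_t index  = offset / CHUNK_SIZE;
        size_t   within = (size_t)(offset % CHUNK_SIZE);
        size_t   n      = CHUNK_SIZE - within;
        if (n > len)
            n = len;

        if (!c->valid || c->index != index) {
            if (c->read_chunk(c->ctx, index, c->data) != 0) {
                c->valid = 0;
                return -1;
            }
            c->index = index;
            c->valid = 1;
        }
        memcpy(buf, c->data + within, n);
        buf += n;
        offset += n;
        len -= n;
    }
    return 0;
}

/* Example chunk source backed by a raw image in memory; it counts how
 * often a chunk is actually fetched. */
struct memory_image {
    const uint8_t *data;
    uint64_t       size; /* assumed to be a multiple of CHUNK_SIZE */
    unsigned       fetches;
};

int memory_read_chunk(void *ctx, uint64_t index, uint8_t *buf)
{
    struct memory_image *img = ctx;
    if ((index + 1) * CHUNK_SIZE > img->size)
        return -1;
    memcpy(buf, img->data + index * CHUNK_SIZE, CHUNK_SIZE);
    img->fetches++;
    return 0;
}
```

Two small reads inside the same chunk cost one fetch, while a read that straddles a chunk boundary fetches the next chunk. This also shows why a checksum warning appears only on the fetch: the cached copy is served without being checked again.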

Libewf provides other functions to access the MD5 stored within the EWF files and to calculate the MD5 hash over the media data. The ewfverify tool uses this functionality.

To access the stored MD5 hash value, the function libewf_data_md5hash can be used. To calculate the MD5 hash value, the function libewf_calculate_md5hash can be used. Both return a string containing the MD5 hash value.

The future

The immediate future for libewf entails publishing it online. We have created a project page for it on the uitwisselplatform (Dutch for "exchange platform"). The link to the project page and working document can be found below in the references. The source code will be uploaded shortly.

The future goals regarding libewf are:

Implement write support
Implement friendlier error handling instead of immediate termination
Figure out more of the unknown values in the new versions of the format
Add more data verification to test that the data sections match the volume or disk section
Allow certain validation checks to be overridden so that corrupted or manipulated image files can be processed

We would appreciate feedback on the specification we published on EWF and libewf, both the successes and the failures. We are also interested in feedback and reports on the EWF files created by other tools that we have not been able to test and analyze.