The Sleuth Kit Informer

http://www.sleuthkit.org/informer
http://sleuthkit.sourceforge.net/informer

Brian Carrier
carrier at sleuthkit dot org

Issue #19
March 15, 2005

Introduction
What's New?
Call For Papers
New Image File Support
Hooking IO Calls for Multi-Format Image Support (By: Michael Cohen)

Introduction

This issue of the Informer is unique because it has articles that describe two different approaches to the same problem. The first main article describes the new image file features in The Sleuth Kit version 2, which supports disk and split image files. The second article is by Michael Cohen and it describes how PyFlag added support for different image file formats before TSK v2 existed. PyFlag uses TSK to analyze file system images.

In the last issue of the Informer, I mentioned that I was no longer going to make the text version because it took a lot of time to manually convert between the two. Alexander Ehlert e-mailed me to tell me that lynx can be used to dump an HTML page to text, so I will be using that for this and future issues (until I find a better document management system).

What's New?

New versions of TSK and Autopsy are being released soon. TSK v2 has many new features, including disk and split image support (as discussed later in this issue), autodetection of file system types, and a new internal design. Several new features were also added to existing tools. The new disk_sreset tool was added to remove an HPA from an ATA disk, and the diskstat tool was renamed to disk_stat to make the tool names clearer. Autopsy has been updated to version 2.04 and it supports the new disk and split images.

The 5th Annual Digital Forensic Research Workshop (DFRWS) announced its Call for Papers in January. One of the areas that we are interested in is general tool design or testing, so it could be a good place to publish papers on tools based on TSK/Autopsy or other open source tools.

www.dfrws.org

Call For Papers

The Sleuth Kit Informer is looking for articles on open source tools and techniques for digital investigations (computer / digital forensics) and incident response. Articles that discuss The Sleuth Kit and Autopsy are appreciated, but not required. Example topics include (but are not limited to):

Tutorials on open source tools
User experiences and reviews of open source tools
New investigation techniques using open source tools
Open source tool testing results

http://www.sleuthkit.org/informer/cfp.html

New Image File Support

Brian Carrier

Overview

Version 2 of The Sleuth Kit (TSK) has (finally) added support for image files other than raw partitions. TSK now supports raw disk and split images, and future versions will support non-raw and compressed formats. This article describes how to use the new features and gives a high-level description of how they were implemented.

Usage

There are two new things that you must consider with TSK. One is the image file format and the second is the offset location of a specific partition or file system in a disk image. Accordingly, there are also two new command line flags. The -i flag is optional and is used to specify the image file format. If it is not given, then the tool will try to detect the format. The second flag is -o and it is used to specify the offset where a specific partition starts.

When specified, the image type argument is a list of one or more format types separated by commas. Currently, the argument needs only one type, but future versions may require multiple types. The currently supported types are raw and split and a basic image would use the arguments -i raw or -i split. In the future, the tools may support ACME Company's image file format with embedded data and if the image file is split among several files you would use -i acme,split.

The offset argument is, by default, in sectors. For example, to specify that the partition starts at sector 63 you would use -o 63. If you want to specify the offset using a different block size, then the block size can be given with the format of offset@blocksize. For example, to specify that the partition starts at block 1000 and each block is 2,048 bytes then you would use -o 1000@2048.
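
As a rough illustration of how the offset argument is interpreted, the following Python sketch converts an -o value into a byte offset. It is not TSK's own parsing code: the function name offset_to_bytes and the 512-byte default sector size are assumptions made for this example.

```python
def offset_to_bytes(arg: str, default_block_size: int = 512) -> int:
    """Convert an offset such as '63' or '1000@2048' into a byte offset.

    Without '@', the value is counted in sectors (assumed to be 512 bytes).
    With '@', the number after it is the block size in bytes.
    """
    offset_text, sep, size_text = arg.partition("@")
    offset = int(offset_text)
    block_size = int(size_text) if sep else default_block_size
    if offset < 0 or block_size <= 0:
        raise ValueError(f"invalid offset argument: {arg!r}")
    return offset * block_size


print(offset_to_bytes("63"))         # 32256
print(offset_to_bytes("1000@2048"))  # 2048000
```

Both forms reduce to the same thing: a block count multiplied by a block size.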

The location of the image file names for each command has not changed. If split images are used, then the names must be given in the sorted order. For example: fls image.dd.01 image.dd.02 image.dd.03 image.dd.04 .... You can use the * wildcard to specify a large number of files: fls image.dd.*.

Here is an example with all possible arguments:

    # fls -i split -o 63 -f ntfs disk1.dd.*

Here is an example using the new autodetect features:

    # fls -o 63 disk1.dd.*

If you have a raw partition image, then you can skip the -o argument (and take advantage of the new file system type autodetect feature):

    # fls part1.dd

New Tool

There is a new tool to help with the image file formats. The img_stat tool displays details about the image file. For split images, this includes the sector range stored in each file; for future file formats, it will also show other embedded data. The -t flag can be used to determine the file format type.

Implementation Overview

For those interested in code-level information about the new image support, this section will fill you in. The new features were added by creating a new imgtools library. This library is used to read the data from the image. The file system code never knows which image format is used.

Before the file system is processed, the image file is opened using the img_open function. This function determines the format type and initializes an IMG_INFO data structure. That data structure is passed to the file system and media management code and is used to read all data. The imgtools library contains all of the code to read the image files.
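
To make the idea concrete, here is a minimal Python sketch of the translation that an image layer has to perform for split images: a logical byte offset in the whole image is mapped to a position in one of the segment files. This is only an illustration and not the imgtools code; the SplitImage class and its read method are hypothetical names, and the segments are assumed to be raw and given in order.

```python
import os
from bisect import bisect_right


class SplitImage:
    """Present several raw segment files as one contiguous image."""

    def __init__(self, paths):
        self.paths = list(paths)  # segment files, already in the right order
        self.starts = []          # logical start offset of each segment
        total = 0
        for path in self.paths:
            self.starts.append(total)
            total += os.path.getsize(path)
        self.size = total         # size of the reassembled image in bytes

    def read(self, offset, length):
        """Return up to `length` bytes starting at logical byte `offset`."""
        if offset < 0 or length < 0:
            raise ValueError("offset and length must not be negative")
        end = min(offset + length, self.size)
        chunks = []
        while offset < end:
            # Find the segment that holds this logical offset.
            index = bisect_right(self.starts, offset) - 1
            with open(self.paths[index], "rb") as segment:
                segment.seek(offset - self.starts[index])
                chunk = segment.read(end - offset)
            if not chunk:
                break
            chunks.append(chunk)
            offset += len(chunk)
        return b"".join(chunks)
```

A caller would build the object once, for example SplitImage(["image.dd.01", "image.dd.02"]), and then read from it without caring where one segment ends and the next begins, which is the same separation the file system code enjoys.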

Conclusion

Version 2 of TSK has finally introduced disk and split image support, which will make setting up a case much easier. This article has described the basics of using the new features and tools.

Hooking IO Calls for Multi-Format Image Support

Michael Cohen <scudette at users dot sourceforge dot net >

Overview

Often when analysing hard disk images, the image may be provided in a slightly different format to the expected partition dd image. This may happen because the image was split into multiple files, or it might be that the image was acquired using Encase (TM) which uses its own proprietary image file format.

Many forensic tools require the image to be in a specific format. For example, previous versions of the Sleuthkit required the image to be an uncompressed partition image, such as one obtained using the dd command::

  dd if=/dev/hda1 of=image.dd

If the raw disk was used, i.e. /dev/hda, the investigator was forced to use dd to "slice" the original image into partitions depending on the partition table (Note that the 63 sector skip is normally found from the partition table, using sfdisk, mmls or a similar tool)::

  dd if=disk_image.dd of=partition_image.dd bs=512 skip=63

If the original disk was very large to start with, this was a time-consuming operation. It would be nice to have an abstraction layer which converts between the different formats of images (a partition image vs. a disk image) on the fly, without having to copy the image again.

This functionality becomes even more desirable when considering the analysis of images which have been stored using compression. For example, the popular forensic package Encase(tm) stores images in a proprietary format called `The Expert Witness Compression Format`[1]. This format provides compression as well as splitting large images into manageable parts. By providing a transparent abstraction layer it is possible to enable any tool to automatically support the image format.

Hooking IO for fun and profit

The PyFlag[2] forensic package used to have an IO Subsystem patch for the Sleuthkit which enabled it to operate on a number of different file formats. Although the Sleuthkit is an excellent tool, it soon became obvious that the same functionality was also required of other tools, like strings, sfdisk etc.

Modifying the source code of an application increased the amount of code maintenance, because the IO subsystem patch had to be retrofitted each time a new version of the Sleuthkit was released. The developers of PyFlag had to find a better way. Ideally the solution would require no source code modification, and would allow arbitrary programs to handle the supported file formats transparently.

The obvious solution to this problem was an abstraction layer based on library hooking techniques.

When a program wishes to perform an IO operation on a file (for example open, read or write the file), it is very rare that the program issues the kernel system call directly. In fact, most programs will call the C library's open(), read() and write() functions as required. Since most programs are dynamically linked rather than statically linked, the linking of the C library code is done at run time by the dynamic linker.

Most dynamic linker implementations (and in particular the GNU libc dynamic loader) allow a library to be loaded first, before loading other system libraries. Also, if a library provides a required symbol, the linker will stop searching for that symbol in other libraries. This property allows a library to "hook" a library function by simply masking the library function with a locally defined function.

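An example serves to illustrate the technique. Assume we have the following program, written in C (error checking is omitted for brevity)::

 #include <fcntl.h>
 #include <unistd.h>

 int main(void) {
   char buffer[512];
   int fd = open("somefile", O_RDONLY);
   read(fd, buffer, sizeof(buffer));
   close(fd);
   return 0;
 }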

When this program is executed, it calls the C library's open function (which actually does the system call). The program then reads some data from the filehandle, by calling the C library's read function, and finally calls the library's close function to close the filehandle.

In the glibc implementation of the dynamic loader (the one used in most Linux systems), the environment variable LD_PRELOAD tells the linker that the named library should be loaded before any other libraries. If the desired symbol is present within the named library, it will mask other functions with the same name present in other libraries.

In our case, we wish to hook the open(), read() and close() functions, hence we need to create a shared object (a library - we shall call it the hooker object) with these functions defined. After setting LD_PRELOAD to the location of the hooker object we have created, our library will trap all calls to the specified functions::

  External program ---> Hooker object ---> real libc functions

The result of this is that as far as the external program is concerned, it is operating on a simple partition image as would have been obtained using dd. In practice however, the hooker object is able to read more complex images, emulating a simple partition image to the external program.
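
Leaving the hooking itself aside, the emulation amounts to translating the offsets that the external program asks for into offsets in the real image. The following Python sketch shows that translation for the simplest case, a partition that starts part way into a disk image. It does not hook any library calls and is not PyFlag code; the PartitionView class is a hypothetical name used only for illustration.

```python
import io


class PartitionView:
    """Read-only view that makes a region of a disk image look like a file."""

    def __init__(self, image, start):
        self._image = image  # any seekable binary file object
        self._start = start  # byte offset of the partition in the image
        self._pos = 0        # position as seen by the caller

    def seek(self, offset):
        if offset < 0:
            raise ValueError("negative seek position")
        self._pos = offset
        return self._pos

    def tell(self):
        return self._pos

    def read(self, size):
        # Translate the caller's position into a position in the real image.
        self._image.seek(self._start + self._pos)
        data = self._image.read(size)
        self._pos += len(data)
        return data


# A toy "disk": 4 bytes standing in for the partition table, then a partition.
disk = io.BytesIO(b"MBR-" + b"PARTITION-DATA")
view = PartitionView(disk, 4)
print(view.read(9))  # b'PARTITION'
```

The caller only ever sees offsets relative to the start of the partition, which is what a tool expecting a plain partition image requires.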

Implementation

The PyFlag iohooker tool implements this technique. Not only does it hook open, read, write etc, but also hooks the stream functions fopen, fread, fwrite etc. It currently supports many different external programs, such as dd, sfdisk, all Sleuthkit executables, strings and many more.

IOHooker is distributed in two components. The main component is a shared object called libio_hooker.so. In order to control this object, environment variables are set by a wrapper program: iowrapper.
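
The general pattern of a wrapper that configures a preloaded library through the environment can be sketched in a few lines of Python. This is an assumption-laden illustration and not how iowrapper is actually written: the variable names HOOK_SUBSYS and HOOK_OPTIONS are invented for the example, and only LD_PRELOAD is a real loader variable.

```python
import os
import subprocess


def hooked_env(library, subsys, options, base_env=None):
    """Return a copy of the environment with the hook library preloaded."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # Put our library first so that its symbols mask those loaded later.
    env["LD_PRELOAD"] = library if not existing else f"{library}:{existing}"
    env["HOOK_SUBSYS"] = subsys    # hypothetical variable read by the library
    env["HOOK_OPTIONS"] = options  # hypothetical variable read by the library
    return env


def run_wrapped(library, subsys, options, command):
    """Run `command` (a list of arguments) with the hook library preloaded."""
    return subprocess.call(command, env=hooked_env(library, subsys, options))
```

The wrapped program is started unmodified; everything the library needs to know travels in the environment.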

For the purposes of demonstration we download the `binary version of PyFlag`[3]. We untar the distribution in our home directory, and change directory into it.

The first step, before the iowrapper can be used, is to set the LD_LIBRARY_PATH environment variable. This is required to allow the dynamic linker to find libio_hooker.so. If we fail to set this properly, the linker cannot run the iowrapper::

  ~/pyflag$ ./bin/iowrapper -h
  ./bin/iowrapper: error while loading shared libraries: 
  libio_hooker.so: cannot open shared object file: No such 
  file or directory

After setting the LD_LIBRARY_PATH environment variable, we are able to run the iowrapper normally::

  ~/pyflag$ export LD_LIBRARY_PATH=`pwd`/libs/
  ~/pyflag$ ./bin/iowrapper

  This program wraps library calls to enable binaries to operate
  on images with various formats. NOTE: Ensure that libio_hooker.so
  is in your LD_LIBRARY_PATH before running this wrapper.

  Usage: ./bin/iowrapper -i subsys -o option prog arg1 arg2 arg3...
        -i subsys: The name of a subsystem to use (help for a list)
        -o optionstr: The option string for the subsystem (help for an example)
        -f wrapped filename: All wrapped filenames will start 
  with this string. This is useful for programs that need to 
  open other files as well as the target file (for example 
  /usr/bin/file needs to open magic files as well).
  Loading library now for hooking

The final message "Loading library now for hooking" confirms that the hooker object is properly initialised and ready. Let us first check to see what IO Subsystems are supported by the iowrapper::

  ~/pyflag$ ./bin/iowrapper -i help
  Loading library now for hooking
  Available Subsystems:

        standard - Standard Sleuthkit IO Subsystem
        advanced - Advanced Sleuthkit IO Subsystem
        sgzip - Seekable Gzip format
        ewf - Expert Witness Compression format
        raid - Raid 5 implementation
  Unhandled Exception(IO Error): No such IO subsystem: help

Each subsystem requires specific options that make sense for it. The advanced subsystem allows users to specify arbitrary offsets, as well as multiple split image sets. We can get a more detailed explanation of these options::

  ~/pyflag$ ./bin/iowrapper -i advanced -o help
  Loading library now for hooking
  Advanced io subsystem options

        offset=bytes            Number of bytes to seek to in 
  the image file. Useful if there is some extra data at the start
  of the dd image (e.g. partition table/other partitions)
        file=filename           Filename to use for split files.
  If your dd image is split across many files, specify this parameter
  in the order required as many times as needed for seamless 
  integration
        A single word without an = sign represents a filename 
  to use

For our first example, we use the Sleuthkit's fls tool to list the files present in partition 6 of a hard disk image. The fls tool does not provide the option of selecting an offset into the image for the start of the filesystem, hence we need to wrap it. First we calculate the offset where the partition starts::

  /pyflag# sfdisk -uS -l /tmp/test.dd
  Disk /tmp/test.dd: cannot get geometry

  Disk /tmp/test.dd: 0 cylinders, 0 heads, 0 sectors/track
  read: Inappropriate ioctl for device

  Warning: The partition table looks like it was made
    for C/H/S=*/255/63 (instead of 0/0/0).
  For this listing I'll assume that geometry.
  Units = sectors of 512 bytes, counting from 0

     Device Boot    Start       End   #sectors  Id  System
  /tmp/test.dd1            63     96389      96327  de  Dell Utility
  /tmp/test.dd2   *     96390  19647494   19551105   7  HPFS/NTFS
  /tmp/test.dd3      19647495  58733639   39086145   c  W95 FAT32 (LBA)
  /tmp/test.dd4      58733640 117210239   58476600   5  Extended
  /tmp/test.dd5      58733703  59328044     594342  82  Linux swap
  /tmp/test.dd6      59328108 117210239   57882132  83  Linux

The start of partition 6 is at 59328108 sectors * 512 bytes per sector = 30375991296 bytes. We can therefore use the wrapper to force fls to read the file system located at that offset::

  ~/pyflag$ ./bin/iowrapper -i advanced -o offset=30375991296,filename=/tmp/test.dd fls \
  -f linux-ext3 foobar
  Set file to read from as /tmp/test.dd
  d/d 11: lost+found
  d/d 32769:      etc
  l/l 12: cdrom
  d/d 131073:     var
  ...
  d/d 3211272:    opt
  d/d 3555336:    initrd
  l/l 16: vmlinuz

Note that as far as fls is concerned it is opening and reading the file foobar. It does not realise that foobar does not exist, since the wrapper provides it with valid data.
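
The byte offset given to the wrapper is simply the start sector multiplied by the sector size. As a rough sketch, the start sectors could be pulled out of an sfdisk -uS listing with a few lines of Python. The function name partition_offsets is made up for this example, and the parsing assumes the column layout shown in the listing for this image.

```python
SECTOR_SIZE = 512  # the listing reports "sectors of 512 bytes"


def partition_offsets(listing: str) -> dict:
    """Map each device name in an `sfdisk -uS -l` listing to a byte offset."""
    offsets = {}
    for line in listing.splitlines():
        fields = line.split()
        # Partition rows start with the device path; skip headers and warnings.
        if len(fields) < 4 or not fields[0].startswith("/"):
            continue
        if fields[1] == "*":  # drop the boot flag column when it is present
            del fields[1]
        if fields[1].isdigit():
            offsets[fields[0]] = int(fields[1]) * SECTOR_SIZE
    return offsets


listing = """\
   Device Boot    Start       End   #sectors  Id  System
/tmp/test.dd2   *     96390  19647494   19551105   7  HPFS/NTFS
/tmp/test.dd6      59328108 117210239   57882132  83  Linux
"""
print(partition_offsets(listing)["/tmp/test.dd6"])  # 30375991296
```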

For the next example, we used Encase(tm) to create an evidence file of a floppy disk. The file command is unable to determine what is stored inside the image, due to it being encoded in the proprietary EWF format::

  ~/pyflag$ file test.e01
  test.e01: data
  ~/pyflag$ hexdump -C test.e01 | head 
  00000000  45 56 46 09 0d 0a ff 00  01 01 00 00 00 68 65 61  |EVF..........hea|
  00000010  64 65 72 00 00 00 00 00  00 00 00 00 00 b2 00 00  |der.............|
  00000020  00 00 00 00 00 a5 00 00  00 00 00 00 00 80 00 10  |................|

Let's wrap the hexdump program to show the contents of the raw image::

  ~/pyflag$ ./bin/iowrapper -i ewf -o filename=test.e01 hexdump -C test.e01 | head
  00000000  eb 3c 90 4d 53 44 4f 53  35 2e 30 00 02 01 01 00  |.<.MSDOS5.0.....|
  00000010  02 e0 00 40 0b f0 09 00  12 00 02 00 00 00 00 00  |...@............|
  00000020  00 00 00 00 00 00 29 fc  02 29 08 4e 4f 20 4e 41  |......)..).NO NA|
  00000030  4d 45 20 20 20 20 46 41  54 31 32 20 20 20 33 c9  |ME    FAT12   3.|

From this hexdump it looks like the image is that of a FAT12 floppy disk. To confirm, we can run the file command over the image. Since file opens files other than the image (it needs to open its magic file), we need to prevent the hooker from hooking those other files (otherwise, when the file program tries to open its magic file, it will get the image instead). To this end we can use the -f flag to restrict hooking to files of a given name::

  ~/pyflag$ ./bin/iowrapper -i ewf -f test.e01 -o filename=test.e01 file test.e01
  test.e01: x86 boot sector, code offset 0x3c, OEM-ID "MSDOS5.0", root entries 224, 
  sectors 2880 (volumes <=32 MB) , sectors/FAT 9, serial number 0x82902fc, unlabeled, 
  FAT (12 bit)
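
The values that file reports can be checked by hand against the 64 bytes shown in the wrapped hexdump. The Python sketch below decodes them using the standard FAT12/FAT16 boot sector layout; the function name parse_fat_boot_sector is only a label for this example, and the offsets used do not apply to FAT32.

```python
import struct


def parse_fat_boot_sector(sector: bytes) -> dict:
    """Decode a few fields from the start of a FAT12/FAT16 boot sector."""
    if len(sector) < 62:
        raise ValueError("need at least the first 62 bytes of the boot sector")
    (_jump, oem, bytes_per_sector, _sectors_per_cluster, _reserved, _num_fats,
     root_entries, total_sectors, _media, sectors_per_fat) = struct.unpack_from(
        "<3s8sHBHBHHBH", sector, 0)
    (serial,) = struct.unpack_from("<I", sector, 39)  # volume serial number
    return {
        "oem_id": oem.decode("ascii", errors="replace"),
        "bytes_per_sector": bytes_per_sector,
        "root_entries": root_entries,
        "total_sectors": total_sectors,
        "sectors_per_fat": sectors_per_fat,
        "serial": serial,
        "fs_type": sector[54:62].decode("ascii", errors="replace").rstrip(),
    }


# The first 64 bytes of the wrapped image, copied from the hexdump output.
HEADER = bytes.fromhex(
    "eb3c904d53444f53352e300002010100"
    "02e000400bf009001200020000000000"
    "00000000000029fc0229084e4f204e41"
    "4d4520202020464154313220202033c9"
)
info = parse_fat_boot_sector(HEADER)
print(info["oem_id"], info["root_entries"], info["total_sectors"],
      info["sectors_per_fat"], hex(info["serial"]), info["fs_type"])
```

The decoded fields agree with the file output: 224 root entries, 2880 sectors, 9 sectors per FAT and serial number 0x82902fc.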

Sleuthkit's fls can be used on this Encase image::

  ~/pyflag# ./bin/iowrapper -i ewf -f test.e01 -o filename=test.e01 ./bin/fls -f fat12 test.e01
  r/r 9:  gunzip.exe
  r/r 11: Hiew.exe
  r/r 12: tar.exe
  r/r 22: cygwin1.dll
  ..

Finally we wish to extract the Encase image into a standard dd image. We wrap dd and redirect the output to a file::

  ~/pyflag$ ./bin/iowrapper -i ewf -f test.e01 -o filename=test.e01 dd if=test.e01 > /tmp/test.dd

Remote Access to live systems

Sometimes we wish to analyse a live unix system remotely. This may be so we can quickly see if the system is compromised, without having to acquire the entire image first. We can use our forensic tools to examine the remote raw device by using the remote IO subsystem.

.. note:: This type of analysis is quite fragile because the system is still live and using its file system. The forensic tools are accessing the raw device while it is being modified, which makes the analysis susceptible to race conditions. For example, if a file is removed just as the forensic utility is accessing its directory inode, inconsistent data may be obtained.

The ramification of this is that forensic tools may crash or provide inconsistent results. It is impossible, however, for the IO subsystem to alter the live system in any way (since the raw device is opened as read only).

One of the common problems with accessing a remote system is authentication and encryption. Access to the raw device over the network could easily lead to a root compromise by disclosing sensitive system information (e.g. the shadow file). The problem of authentication and encryption is best left to dedicated programs, such as Secure Shell (ssh). This is the approach taken by the remote access IO subsystem. The only requirements on the live system are an ssh server and the remote_server program (which may be compiled statically).

These are the steps required to access remote raw devices over the network:

Have a static version of remote_server (the remote server component) installed on the remote system.
Have an ssh server available with root logons allowed.
Use the local system to access the remote raw device by wrapping library calls through the wrapper.

The following is an example of a session which might be run against a remote target machine::

  ~/pyflag$ ./bin/iowrapper -i remote -o host=target,\
    server_path=/path/to/remote_server,device=/dev/hda \
    mmls -t dos foo

  DOS Partition Table
  Units are in 512-byte sectors
  
       Slot    Start        End          Length       Description
  00:  -----   0000000000   0000000000   0000000001   Primary Table (#0)
  01:  -----   0000000001   0000000062   0000000062   Unallocated
  02:  00:00   0000000063   0000096389   0000096327   Dell Utilities FAT (0xde)
  03:  00:01   0000096390   0019647494   0019551105   NTFS (0x07)
  04:  00:02   0019647495   0058733639   0039086145   Win95 FAT32 (0x0C)
  05:  00:03   0058733640   0117210239   0058476600   DOS Extended (0x05)
  06:  -----   0058733640   0058733640   0000000001   Extended Table (#1)
  07:  -----   0058733641   0058733702   0000000062   Unallocated
  08:  01:00   0058733703   0059328044   0000594342   Linux Swap / Solaris x86 (0x82)
  09:  01:01   0059328045   0117210239   0057882195   DOS Extended (0x05)
  10:  -----   0059328045   0059328045   0000000001   Extended Table (#2)
  11:  -----   0059328046   0059328107   0000000062   Unallocated
  12:  02:00   0059328108   0117210239   0057882132   Linux (0x83)

We can now list the contents of the windows partition::

  ~/pyflag$ ./bin/iowrapper -i remote -o host=target,\
    server_path=/path/to/remote_server,device=/dev/hda,\
    offset=0000096390s fls -f ntfs foo

  d/d 12763-144-4:        Documents and Settings
  d/d 6672-144-3: DRIVERS
  d/d 6941-144-6: I386
  r/r 6915-128-3: IO.SYS
  d/d 62628-144-5:        LDIR
  r/r 6916-128-3: MSDOS.SYS
  d/d 16844-144-1:        My Music
  r/r 6671-128-3: NTDETECT.COM
  r/r 6670-128-3: NTLDR
  d/d 13231-144-4:        Program Files
  ...

In the above analysis we use the following parameters:

host
 The host we should try to log on to.

server_path
 The path to the remote_server program. This program must reside on the remote machine.

device
 The raw device to export

offset
 An offset to use on the remote device. This can be specified in sectors (s), kilobytes (k) or megabytes (m) depending on the suffix.
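
For reference, the suffix convention can be expressed as a small Python function. This is a sketch under stated assumptions and not the remote subsystem's own parser: it assumes 512-byte sectors, 1024-byte kilobytes, 1024 * 1024-byte megabytes, and that a value without a suffix is a byte count.

```python
UNIT_SIZES = {
    "s": 512,          # sectors (assumed to be 512 bytes)
    "k": 1024,         # kilobytes (assumed to be 1024 bytes)
    "m": 1024 * 1024,  # megabytes (assumed to be 1024 * 1024 bytes)
}


def remote_offset_to_bytes(value: str) -> int:
    """Convert an offset such as '0000096390s' into a byte offset."""
    suffix = value[-1:].lower()
    if suffix in UNIT_SIZES:
        return int(value[:-1]) * UNIT_SIZES[suffix]
    return int(value)  # assumption: no suffix means the value is in bytes


print(remote_offset_to_bytes("0000096390s"))  # 49351680
```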

.. note:: This analysis would easily reveal to us if there are hidden files or directories, even in cases where kernel level rootkits are installed. This is because most kernel level rootkits trap system calls accessing files on the filesystem, but do not filter access to raw devices. Since fls is reading the filesystem structures on the raw device, it is independent of the kernel's filesystem driver or filesystem related system calls.

Although it is conceivable that rootkits can filter the raw device to hide files, this will dramatically increase the complexity of the rootkit.

Conclusions

Library hooking is a powerful technique which enables a wrapper to be inserted between an arbitrary executable, and the image. PyFlag has developed an image abstraction layer which allows arbitrary programs to automatically support a variety of forensic image formats transparently.

The remote IO subsystem allows for the remote access and analysis of raw devices by forensic tools, making it possible to detect some kernel level rootkits remotely.

[1] The Expert Witness Compression Format: http://www.asrdata.com/SMART/whitepaper.html
[2] PyFlag: http://pyflag.sourceforge.net/
[3] binary version of PyFlag: http://pyflag.sourceforge.net/Downloads/index.html