April 15, 2003
The third issue of The Sleuth Kit Informer contains the first article in a series on the 'sorter' tool. This article shows how to use the Sleuth Kit tool and what happens when it is run from Autopsy. Additional articles in the series will show how to create custom rules for 'sorter' and will describe the procedures used by 'sorter' so that they can be easily disclosed and discussed.
The domain sleuthkit.org was set up. This site and the sleuthkit.sourceforge.net site will be the primary websites for future versions of The Sleuth Kit and Autopsy. Additional documentation and references will be added shortly. The 'man' pages and other tool references will be online for easy reference.
TASK was officially renamed to The Sleuth Kit and is independent of commercial and academic organizations.
The Sleuth Kit 1.61 and Autopsy 1.71 were released with new features and bug fixes. Major changes for The Sleuth Kit include a serious NTFS bug fix and hash database support for NSRL 2 and Hash Keeper. Thumbnails were also added for graphic images in the 'sorter' tool. Autopsy had new navigation techniques added and an easier method for adding file system images. For a full list of all changes and updates, refer to the release notes:
Have you ever gotten a file system image and not really known where to start? Naming conventions for files and directories are unique to each user. What makes sense for one person may seem overly complex for another. This makes it difficult for an investigator to quickly identify where a user has saved information, especially if the user has taken additional steps to hide the data.
This article covers one of the newest Sleuth Kit tools on the block, 'sorter'. Instead of showing files in the normal directory hierarchy (which may not be intuitive), 'sorter' shows files based on their file type. All images are grouped together, and all executables are grouped together. If the investigation concerns theft of trade secrets, the investigator can identify the Office documents and text files and examine those first, regardless of where they were saved. In addition, hash databases are used to flag files that are known to be bad and to ignore files that are known to be good. As a result, documents that came with Windows will not be included in the documents category.
This article discusses how to use 'sorter' during a forensic analysis. Future articles in this series will show how to create custom rule sets and describe the architecture and procedures of the tool.
With 18 flag options, 'sorter' may seem complex. In practice, though, only a few are typically needed. At a high level, 'sorter' runs the 'file' command on each allocated and deleted file to identify its file type. Rules in the 'sorter' configuration files determine which category the file belongs to, and the file's extension is compared to the extensions that are valid for that type. Using hash databases, 'sorter' ignores files that are known to be good and trusted (e.g. operating system files) and flags files that are known to be bad (e.g. child pornography).
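The categorization and extension check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual 'sorter' code: the rule table, the patterns, and the function name 'categorize' are assumptions made for this example, and the real rules live in the 'sorter' configuration files.

```python
import re
from pathlib import PurePosixPath

# Hypothetical rules, NOT the rules shipped with 'sorter':
# (category, pattern matched against the 'file' output, valid extensions)
RULES = [
    ("images", re.compile(r"JPEG image data"), {"jpg", "jpeg"}),
    ("images", re.compile(r"GIF image data"), {"gif"}),
    ("text", re.compile(r"ASCII text"), {"txt"}),
]

def categorize(path, file_output):
    """Return (category, extension_mismatch) for one file.

    'file_output' is the description printed by the 'file' command.
    """
    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    for category, pattern, valid_exts in RULES:
        if pattern.search(file_output):
            # The type is recognized; a mismatch means the extension is
            # not one of the extensions registered for that type.
            return category, ext not in valid_exts
    # No rule matched, so the file lands in the unknown category.
    return "unknown", False

print(categorize("C:/My Documents/projects/resume.doc",
                 "JPEG image data, JFIF standard 1.02"))
# ('images', True)
```

A JPEG that carries a '.doc' extension is therefore placed in 'images' and flagged as a mismatch, while a file type with no matching rule falls through to 'unknown'.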
The most basic method to run 'sorter' in a post-mortem environment would be the following:
# sorter -f ntfs -d save_dir -m C:/ hda1.dd
Loading Allocated File Listing
Processing 8213 Allocated Files and Directories
Loading Unallocated File Listing
Processing 158 Unallocated meta-data structures
All files have been saved to: save_dir
This command will process 'hda1.dd' as an NTFS file system and save all output to the 'save_dir' directory. 'C:/' is prepended to every path that is printed. The 'sorter.sum' file in 'save_dir' will contain a summary of how many files were found in each category and how many files were ignored because of hash databases. The file system types specified by '-f' are the same as those used by The Sleuth Kit.
Each category will have a file named 'CAT.txt'. For example, the images category has a file named 'images.txt'. Every file that falls into a category has an entry in that category's file:
C:/My Documents/projects/resume.doc
JPEG image data, JFIF standard 1.02, resolution (DPI), 300 x 300
--- Extension Mismatch! ---
Image: hda1.dd Inode: 53798
The full path is 'C:/My Documents/projects/resume.doc', and the file has a JPEG structure. Because the extension of the file is '.doc' and JPEG images should have '.jpg' or '.jpeg', the entry is flagged as an extension mismatch. Lastly, the image and meta data address are given so that we can easily recover the file via 'icat' if needed. As this entry has an extension mismatch, it would also be noted in the 'mismatch.txt' file.
Any file type that does not have a rule set will be saved to the unknown category. Files typically reside in the unknown category because a rule set has not been submitted by a user. Therefore, please send entries from the unknown category to me, or develop rules and send them to me, so that everyone can benefit. The next article in the series shows how to develop rules. Files will not be saved to the unknown category if the '-U' flag is used when 'sorter' is run.
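For scripted follow-up, an entry can be parsed and turned into an 'icat' command line. The following Python sketch assumes each plain-text entry consists of the path, the 'file' output, an optional mismatch marker, and an 'Image: ... Inode: ...' line, as described above. The function names are illustrative only, and the 'icat' arguments follow the usual '-f fstype image inode' form.

```python
import re

ENTRY = """C:/My Documents/projects/resume.doc
JPEG image data, JFIF standard 1.02, resolution (DPI), 300 x 300
--- Extension Mismatch! ---
Image: hda1.dd Inode: 53798
"""

def parse_entry(text):
    """Parse one category-file entry into a dictionary (assumed layout)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    match = re.match(r"Image:\s*(\S+)\s+Inode:\s*(\S+)", lines[-1])
    if match is None:
        raise ValueError("entry does not end with an 'Image: ... Inode: ...' line")
    return {
        "path": lines[0],
        "file_type": lines[1],
        # Any line between the type and the image line may carry the marker.
        "mismatch": any("Extension Mismatch" in ln for ln in lines[2:-1]),
        "image": match.group(1),
        "inode": match.group(2),
    }

def icat_command(entry, fstype):
    """Build the argument list that would recover the file with 'icat'."""
    return ["icat", "-f", fstype, entry["image"], entry["inode"]]

entry = parse_entry(ENTRY)
print(entry["path"], entry["mismatch"])
print(" ".join(icat_command(entry, "ntfs")))
```

Because 'icat' writes the file contents to STDOUT, the resulting command would normally be run with its output redirected to a file.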
Throw in the '-h' flag and all category files will be in HTML instead of plain ASCII. Add the '-s' flag and a copy of each file will be saved in a category sub-directory. For example, a copy of all JPEGs will be saved to the 'images' directory. Note that this may consume lots of disk space. The files are renamed to have the image name, the meta data address, and the original extension. For example:
As a special bonus for images, if the '-s' save and '-h' HTML flags are given, thumbnails are created for the 'images' category. The 'images' directory will have 'thumbs-X.html' files that contain 100 thumbnails each. This is of course useful for quickly finding unauthorized images. The Sleuth Kit comes with a custom configuration file that will only save information on graphic images. The man page shows the syntax needed to use it (the '-C' flag).
Using the above flags, we have a tool that reorganizes data based on file type instead of directory and file structure. The next step is to reduce the number of files that we have to sort, using hash databases.
The NIST National Software Reference Library (NSRL) contains hashes of 'known good' files, such as operating systems and off-the-shelf software. The NSRL 'NSRLFile.txt' file is specified on the command line with '-n'. A custom 'known good' hash database is specified with '-x'. Any file whose hash is found in either database will be ignored and not added to the category files, although it will be recorded in the 'exclude.txt' file for future reference. Before a known good file is ignored, its extension is checked, and the file is added to the 'mismatch_exclude.txt' file if a mismatch is found.
An example 'known good' database would be that of a standard build. If all computers are created from the same image, then all of the original files can be ignored. Supplying 'known good' databases may remove the standard Microsoft graphics from the 'images' category and remove system binaries from the 'executable' category. Also, standard application documentation may not be in the 'text' category. This makes it easier to focus on the suspect material.
A hash database of 'known bad' files is specified with '-a'. Any file found in this database will be added to the 'alert.txt' file. This database should contain hashes of root kits, child pornography, and other files that one wants to identify.
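The effect of the 'known good' and 'known bad' databases can be illustrated with a small Python sketch. It is a simplification under stated assumptions: in-memory sets stand in for the hash databases, the function names are made up for this example, and checking the 'known bad' set first is an assumption, since the order is not described here.

```python
import hashlib

def md5_of(data):
    """Return the MD5 hash of a byte string as lowercase hex."""
    return hashlib.md5(data).hexdigest()

def triage(data, known_good, known_bad):
    """Decide what happens to a file based on its MD5 hash.

    Returns 'alert' for a known bad file, 'exclude' for a known good
    file, and 'categorize' for everything else.
    """
    digest = md5_of(data)
    if digest in known_bad:       # assumption: bad hashes are checked first
        return "alert"
    if digest in known_good:
        return "exclude"
    return "categorize"

# Stand-ins for hash databases (hypothetical content).
known_good = {md5_of(b"standard build file")}
known_bad = {md5_of(b"root kit binary")}

for content in (b"standard build file", b"root kit binary", b"user document"):
    print(triage(content, known_good, known_bad))
```

Only files that come back as 'categorize' would end up in the category files, which is where the data reduction comes from.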
An example of all of these flags is:
# sorter -f ntfs -d save_dir -h -s -n /usr/local/nsrl/NSRLFile.txt \
-x win2k.txt -a /usr/local/forensics/hash/porn.txt -m "C:/" hda1.dd
This processes hda1.dd as an NTFS file system (-f) and prepends C:/ (-m) to all paths. It saves all data to 'save_dir' (-d), and output files are in HTML format (-h). A copy of each file is saved (-s), and files in the NSRL (-n) and 'win2k.txt' (-x) are ignored to reduce data. Any file found in 'porn.txt' (-a) will be flagged.
Custom hash databases and the NIST NSRL must be indexed with the 'hfind' tool before they are used by 'sorter'. Currently, 'sorter' uses MD5 as the hash value and can use custom databases created by md5sum and Hash Keeper.
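A custom database in md5sum format is simply a text file with one 'hash, two spaces, file name' line per file. As a minimal sketch, the following Python function produces such lines for every file under a directory, for example the mounted image of a standard build. The function name is an assumption made for this example, and the resulting file would still need to be indexed with 'hfind' before 'sorter' can use it.

```python
import hashlib
from pathlib import Path

def md5sum_lines(root):
    """Yield 'hash  path' lines, as md5sum prints them, for each file under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Reads each file fully into memory; fine for a sketch, but
            # large files would be better hashed in chunks.
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            yield f"{digest}  {path}"
```

Writing the yielded lines to a text file, one per line, gives the same layout that running md5sum over the same files would produce.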
The final interesting flag is '-l'. It was designed to be used during incident response. This mode writes nothing to disk and outputs all findings to STDOUT. It can be used with 'netcat' to send file system information from a suspect system to a server for analysis.
# sorter -f linux-ext2 -l /dev/hda1 | nc -w 10 10.0.0.1 9000
The Sleuth Kit and netcat should be compiled and placed on a CD that is inserted into the suspect system.
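The receiving side is not shown above. In practice a netcat listener on the server is enough, but the idea can also be sketched in Python: accept one connection and write everything that arrives to a file. The function names and the output file name are assumptions made for this example, and port 9000 only mirrors the command above.

```python
import socket

def receive_to_file(conn, out_path, chunk_size=65536):
    """Copy everything received on a connected socket into out_path.

    Returns the number of bytes written.
    """
    total = 0
    with open(out_path, "wb") as out:
        while True:
            chunk = conn.recv(chunk_size)
            if not chunk:          # an empty read means the sender closed
                break
            out.write(chunk)
            total += len(chunk)
    return total

def serve_once(port, out_path, host="0.0.0.0"):
    """Listen on the given port, accept one connection, and save its data."""
    with socket.create_server((host, port)) as server:
        conn, _addr = server.accept()
        with conn:
            return receive_to_file(conn, out_path)
```

Calling serve_once(9000, "sorter-output.txt") on the analysis server before running 'sorter' on the suspect system would capture the output in 'sorter-output.txt' (a hypothetical file name).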
'sorter' comes with default configuration files and rule sets. These can be customized, but the following categories exist in the default setup:
The file types that belong to each category will vary depending on the file system used. The rule sets article will discuss this more.
Autopsy has a 'File Type' mode that uses the 'sorter' tool. The user can choose what actions 'sorter' performs. Major options are whether files are just sorted by file type, sorted by file type and have their extension validated, or just have their extension validated. It can use the 'known good' and 'known bad' hash databases that were specified when the host was created as well as the NIST NSRL. Other options include ignoring the unknown category, saving copies of the files, and only saving graphic images with thumbnails.
All output is saved to a directory in the 'output' directory of the host, named 'sorter-IMG'. If one chooses the 'save graphics only' option, the data is saved to 'sorter-graphics-IMG'. Any data in the directory will be overwritten (a warning is given). Currently, the output cannot be viewed within Autopsy, but can be viewed by opening the 'index.html' page in the above directory. Integration will occur in future versions.
Hard disks are getting much bigger, and the number of files that investigators encounter is also increasing. Data reduction techniques are needed to sort the useful data from the useless data. 'sorter' helps the investigator to focus on file types of interest. If an investigation revolves around sensitive documents, then the entries in the 'documents' category should be examined. If graphic pictures are the focus, then 'images' should be examined.
The rules that 'sorter' uses can be customized by the user, and the next article in the series should be read to learn how to create them. The number of rules in the default configuration file will only improve with support from users, so please send in rules that you have created, entries from the unknown category, and extensions that are not registered as valid (carrier at sleuthkit dot org).