The Sleuth Kit Informer

http://www.sleuthkit.org/informer
http://sleuthkit.sourceforge.net/informer

Brian Carrier

Issue #5
June 15, 2003

CONTENTS
--------------------------------------------------------------------
- Introduction
- What's New?
- Did You Know?
- Sorter Internals (Part 3 of 3)

INTRODUCTION
--------------------------------------------------------------------
This issue of The Sleuth Kit Informer is different from previous
issues, which focused on how to use tools. This issue dives into the
architectural details of one of The Sleuth Kit tools and documents
the procedures it uses. This allows a non-programmer to understand
the process and allows others to read the source code along with the
written description.

Two of the Daubert Guidelines for scientific evidence in the US
courts are that the theories and techniques used have published
procedures and that they be generally accepted. It is still under
debate whether digital evidence falls under scientific evidence, but
it is important to start documenting the procedures used in digital
forensics. This allows for a better general understanding and
consistency between different tools.

As always, comments and feedback are welcome (carrier at
sleuthkit.org).

WHAT'S NEW?
--------------------------------------------------------------------
The Sleuth Kit 1.62 and Autopsy 1.73 were released on June 10, 2003.
The Sleuth Kit had some bug fixes and 'mactime' can now export data
in a comma delimited format (see DID YOU KNOW?). Autopsy has a new
mode called "Event Sequencer" that allows one to sort events from an
incident to help identify what the attacker did. Autopsy also saves
the search results to a file so that they can be easily recalled.

DID YOU KNOW?
--------------------------------------------------------------------
Have you ever tried to attach a timeline from 'mactime' to an
incident report? Or make any graphs of file activity? It is much
easier now with the '-d' flag.
Using the '-d' flag forces 'mactime' to output the timeline with the
fields separated by commas instead of white space, and each line has
its own time entry. This makes it easy to import into a spreadsheet,
delete the columns that are not needed, and attach to a report.

    # mactime -b body -d > tl.txt

If you want to graph the activity on a daily or hourly basis to find
trends, then use the '-i' flag to get a summary index. With the '-d'
flag, the data can be imported into a spreadsheet and a bar graph
made.

    # mactime -b body -d -i day sum.day.txt > tl.txt

or

    # mactime -b body -d -i hour sum.hour.txt > tl.txt

SORTER INTERNALS (PART 3 OF 3)
--------------------------------------------------------------------
Introduction

In the past two issues of The Sleuth Kit Informer we have shown how
to use the 'sorter' tool from the command line and how to create
custom rule sets for new file types or to identify only specific file
types. The last part in the series on the 'sorter' tool focuses on
its internal design.

The goal of documenting the design in this level of detail is to
allow users who do not read Perl to learn exactly how the program
works. This will allow "acceptance" of the procedure (as noted in the
Daubert Guidelines for scientific evidence) and general knowledge
that can be applied to other analysis techniques.

For those who have forgotten what the 'sorter' tool actually does,
refer back to Issue #3 of The Sleuth Kit Informer. This description
applies to the 'sorter' that is distributed with The Sleuth Kit v1.62
(June 10, 2003).

Setup

The first steps involve setting up the environment for the tool. This
includes reading the arguments from the command line and verifying
that they are compatible. The second major step verifies that the
needed binaries from The Sleuth Kit can be found.

The configuration files are processed next. If the exclusive
configuration file flag, '-C', was given then only that file is
loaded.
Otherwise, the 'default.sort' file is loaded first, followed by the
operating system file (windows.sort for example), followed by a local
customization of the operating system file (windows.sort.lcl for
example). If a file was specified with the '-c' flag, it is loaded
last. As will be shown, when rules conflict, the configuration file
loaded last overrides the files loaded before it.

The 'read_config()' procedure loads the configuration files. For
categories, the category name is changed into lowercase and examined
to see if it is one of the reserved names. A hash variable is used to
store the category information (%file_to_cat). The key to the hash
variable is the regular expression for the 'file' output and the
value is the category name. If the regular expression already exists
in the hash variable, a warning is produced and the existing category
is replaced with the new category. For example:

    $file_to_cat{'image data'} = 'images';

For extension rules, the extensions are first changed into lowercase.
Similar to the category rules, the extension information is also
stored in a hash variable (%file_to_ext). The 'file' regular
expression is the key to the hash variable and the value is a list of
valid extensions separated by commas. If there are already extensions
for the regular expression, the new ones are added to the end of the
list. For example:

    $file_to_ext{'JPEG image data'} = 'jpg,jpeg,jpe';

Lastly, the category files are opened and initialized if the '-l'
listing flag was not specified. The file handles are stored in a hash
variable that has the category name as the key.

Identifying The Data

After the tool has been set up, it is time to process and sort the
data. A loop cycles through each image that is specified on the
command line, sets up some static values, and calls the
'analyze_img()' method.

The 'analyze_img()' method runs 'fls -rp' on the image to get a
listing of all allocated and deleted files with the full path and the
corresponding meta data address.
The file listings are processed and any entry that is not a regular
file or has a type of "-" is ignored (such as directories and
devices). The 'analyze_file()' method is then called for each file.

After the 'fls' output is processed, 'ils' is run to identify the
unused meta data structures. These are sometimes called "Lost Files".
The 'ils' tool is given the '-m' flag so that part of the name is
found for NTFS and FAT files. Any files that have a size of 0 or
missing definitions are ignored. Valid files are processed with the
same 'analyze_file()' method that was called for the 'fls' output.

Sorting The Data

The 'analyze_file()' method is called for each file and unallocated
meta data structure. It uses the 'icat' command from The Sleuth Kit
to extract the file contents from the file system image. The
following procedure is used for each file:

1. Data Collection. The file's contents are analyzed with the 'file'
command that is distributed with The Sleuth Kit and, if hash
databases are being used, with the MD5 and SHA-1 hash algorithms. If
local files can be created, a temporary copy of the file is
extracted. The temporary file is not deleted if a copy of the file
will later be saved in a category. Non-ASCII characters are removed
from the 'file' output and files with a type of 'empty' are skipped.

2. Alert Database. If an alert hash database was specified, a hash
lookup is performed with the file's hash using 'hfind'. If it is
found in the database, an alert variable is set and the file
information will later be saved to the 'alert.txt' file after more
information has been gathered.

3. Exclude Database. If the file was not found in the alert database
and an 'exclude' hash database was given, then a hash lookup is
performed. If the hash is found, an exclude variable is set and the
file information will later be saved to one of the exclude files
after additional information has been gathered.

4. NSRL Database.
If the file was not found in the alert or exclude databases and the
NIST NSRL database was given, then a hash lookup is performed with
the NSRL. If the hash is found, an exclude variable is set and the
file information will later be saved to one of the exclude files
after additional information has been gathered.

5. Extension Setup. The extension from the filename is extracted
next. The extension will be validated unless the '-i' flag was given,
there is no extension (including meta data structures with no name),
or the file type is 'data' (which means that 'file' did not know the
file type).

6. Extension Validation. To verify the extension, the regular
expressions that were loaded from the configuration files are used.
Recall that the regular expressions were stored as the key to a hash
variable and the value was the list of valid extensions. When one of
the regular expressions matches the 'file' output, the extensions are
examined to find a valid one. If a valid extension is found, the
search ends. Otherwise, the '$mismatch' variable is set to 1 and the
search continues. The '$mismatch' variable can only be cleared if a
valid regular expression and extension are found in the remaining
rules. If no regular expression was found for the file type then no
errors are generated.

7. Ignoring Known Files. If the file was found in the exclude or NSRL
hash database, then it will be ignored at this point. If there was an
extension mismatch, though, it will first be added to the
mismatch_exclude.txt file. This is to find 'known good' files whose
extensions may have been changed to hide them.

8. File Category. Unless the '-e' flag was given to only do extension
verification, the file category is found next. Recall that the
category rule sets were saved to a hash variable where the key was
the file type regular expression and the value was the category name.
Each key in the hash variable is applied to the 'file' output, case
insensitive.
When a regular expression matches, the category name is verified to
not be the 'ignore' category. If it is, then it is skipped. Files
that did not have a regular expression are placed into the 'unknown'
category. A count is saved for each category for the final summary.

9. File Saving. If the file category was found and the '-s' flag was
given to save files, the temporary file is moved. A category
directory is created, if needed, and the file is renamed to have the
original image name, the meta data address, and the original
extension of the file. The original file name is not used because of
naming collisions. The name can be translated to the original path
using the category file created later. If the category is 'images'
and the output is in HTML, then the image is added to the current
thumbnail file using print_thumb().

10. Final Output. If the '-l' flag was given, then a summary of the
file's MD5 hash, category, and extension is printed to STDOUT. If
'-l' was not given, then the details are printed (including the path
to where the file was saved if '-s' was given) to the corresponding
category file. The extension mismatch and hash database alerts are
also printed. They are printed at the end of the sorting process so
the category name can also be displayed.

Cleanup

After all of the files and unallocated meta data structures have been
sorted, the open file handles are closed. A summary of the number of
categories and the number of files in each category is also
displayed.

Conclusion

This has given a fairly low-level overview of the process that
'sorter' uses. The ordering of analysis steps is important because,
for example, if the exclude hashes were checked before the alert
hashes then a hash that happens to be in both would not be alerted.
This was likely not good reading for a summer day on the beach, but
was hopefully informative. Suggestions for how to improve this
process are welcome.
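
The ordering point above and the extension check in step 6 can be
sketched in a few lines. This is a Python illustration, not the Perl
that 'sorter' is written in. The function names ('hash_status' and
'extension_mismatch'), the use of plain sets in place of 'hfind'
lookups, the sample rule, and the lowercasing of the file's extension
are all assumptions made for the example.

```python
import re

# Hypothetical rule table keyed by a regular expression for the 'file'
# output, in the spirit of %file_to_ext. The single rule is a sample.
FILE_TO_EXT = {"JPEG image data": "jpg,jpeg,jpe"}


def hash_status(md5, alert_db, exclude_db, nsrl_db):
    """Classify a hash, checking the alert database first.

    Plain sets stand in for the hash databases (an assumption; the
    real tool performs the lookups with 'hfind').
    """
    if md5 in alert_db:
        return "alert"      # wins even if the hash is also excluded
    if md5 in exclude_db or md5 in nsrl_db:
        return "exclude"
    return "unknown"


def extension_mismatch(file_type, ext, rules=FILE_TO_EXT):
    """Return True if a rule matched file_type but none allowed ext."""
    # Step 5: nothing to validate without an extension or a known type.
    if not ext or file_type == "data":
        return False
    mismatch = False
    for pattern, valid in rules.items():
        if re.search(pattern, file_type):
            # Lowercasing the file's extension is an assumption.
            if ext.lower() in valid.split(","):
                return False    # valid extension found, search ends
            mismatch = True     # a later rule may still clear this
    return mismatch


alert, exclude = {"aaa"}, {"aaa", "bbb"}
print(hash_status("aaa", alert, exclude, set()))        # alert
print(hash_status("bbb", alert, exclude, set()))        # exclude
print(extension_mismatch("JPEG image data, JFIF", "txt"))   # True
print(extension_mismatch("ASCII text", "txt"))              # False
```

Swapping the first two tests in 'hash_status' would report "exclude"
for the hash "aaa", which is the situation the ordering of steps 2
through 4 is meant to avoid.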
References

The Sleuth Kit:
    http://www.sleuthkit.org/sleuthkit

The Sleuth Kit Informer Issue #3:
    http://www.sleuthkit.org/informer/sleuthkit-informer-3.html

The Sleuth Kit Informer Issue #4:
    http://www.sleuthkit.org/informer/sleuthkit-informer-4.html

--------------------------------------------------------------------
Copyright (c) 2003 by Brian Carrier. All Rights Reserved