carrier at sleuthkit dot org
September 15, 2003
The eighth issue of The Sleuth Kit Informer focuses on how Autopsy does keyword searching. I recently released a test image for keyword searching to the Computer Forensics Tools Testing (cftt) list on Yahoo Groups. It had two goals: to test tools and make sure they found certain strings and to get a survey of how different tools operated. Users of other tools sent the results to the list.
The tests were designed so that some strings would not be found by all tools. For example, some strings started inside of one file and ended in the next consecutive file and this tested if tools looked at each file individually or at the image as a whole (which Autopsy does). In the process, I found a bug in Autopsy and improved the documentation to ensure that the users knew when a keyword would not be found (fragmented data units). This article describes how Autopsy performs keyword searching and what its limitations are.
The test image can be found at:
Aug 28, 2003
The Sleuth Kit v1.65 and Autopsy v1.74 were released. The Sleuth Kit had minor updates and a minor HTML fix in 'sorter'. Autopsy had some keyword searching bug fixes and support for raw and swap partitions was added. NSRL is no longer used by 'sorter' until a solution to identifying the 'known good' and 'known bad' hashes is found - as reported in The Sleuth Kit Informer #7.
I was notified that The Sleuth Kit is included in the Local Area Security Linux bootable CD.
The 1.74 version of Autopsy has a new logging feature. Any command that is executed by Autopsy is logged to a file in the 'logs' directory of the host. The file is named with the investigator's name and ".exec.log". For example, mine could be "carrier.exec.log".
Each entry has a time stamp and all of the flags that were used to execute the command. The log provides an additional layer of audit material and also allows you to appreciate how much easier it is with Autopsy instead of having to type all of that stuff in every time!
Keyword searching for digital forensics sounds easy, doesn't it? You have a string and you want to find if there are any occurrences of it in the image. It's just like using the 'Find' option in a word processor, right? Well, the answer is yes and no. Keyword searching is similar to using 'Find', but there are some important differences and, depending on how the searching is implemented, different results will occur.
This article describes how I implemented keyword searching in Autopsy. The goal of the article, like previous Informer articles, is to educate you on what is going on behind the scenes and to show you what will NOT be found. Autopsy has a VERY basic implementation of keyword searching and it is important for you to know what will not be found.
The first section describes how Autopsy searches an image and the second section describes what Autopsy will and will not find. I'll spare you the details of the actual keyword searching algorithms. Those who want to learn about them should check out an algorithms book that covers string matching.
The main concept of keyword searching is to look at consecutive bytes and compare them to a target string. Sleuth Kit gurus may notice that the toolkit does not contain any keyword searching tools. Instead of re-inventing the wheel, I chose to use the 'grep' tool that is found on all current Unix flavors. In other words, the keyword search mode of Autopsy is just an interface to the 'grep' tool that shipped with your OS.
When keyword searching, Autopsy runs 'grep' on the actual file system image file. Therefore, during a normal search the entire file system is examined to find the keyword - including meta data structures, allocated space, unallocated space, and slack space. This technique creates a problem though, because most users want to see the name of the file (or at least the cluster) that the string was found in and do not want to see only the string that was found. This is solved by having Autopsy apply the needed flags so that the byte offset of the keyword is reported. With that value, Autopsy can calculate what cluster or fragment the string was found in. In the search results, Autopsy reports the data unit information and offset of each hit.
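The offset arithmetic described above can be sketched roughly as follows. This is only an illustration and not Autopsy's actual code: the helper name offset_to_unit, the 512-byte unit size, and the toy image are assumptions, and the -a, -b, and -o flags are those of GNU grep.

```shell
# Hypothetical helper: convert a byte offset in an image into a data unit
# address and the offset inside that unit. The unit size is an assumption;
# a real file system has its own cluster or fragment size.
offset_to_unit() {
    offset=$1
    unit_size=$2
    echo "unit=$(( offset / unit_size )) offset_in_unit=$(( offset % unit_size ))"
}

# Build a toy 2048-byte "image" with a keyword at byte offset 1030.
img=$(mktemp)
head -c 1030 /dev/zero > "$img"
printf 'evidence' >> "$img"
head -c 1010 /dev/zero >> "$img"

# GNU grep: -a treats binary data as text, -b prints a byte offset, and
# -o prints only the matching part, so the offset is that of the hit itself.
hit=$(grep -a -b -o 'evidence' "$img" | head -n 1)
byte_offset=${hit%%:*}

offset_to_unit "$byte_offset" 512
rm -f "$img"
```

With 512-byte units, a hit at byte 1030 falls in unit 2 at offset 6, because 1030 = 2 * 512 + 6.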
If needed, Autopsy can also call the 'ifind' tool from The Sleuth Kit to find a meta data structure that points to the cluster or fragment. If 'ifind' finds a meta data structure, then the 'ffind' tool from The Sleuth Kit is used to find a file name that points to the meta data structure.
When you ask Autopsy to search the unallocated space, the same process occurs. The byte offset is reported to Autopsy and it calculates what fragment in the unallocated image the string was found in. Autopsy will either show you the string in the unallocated image or calculate where the string exists in the original image by using the 'dcalc' tool from The Sleuth Kit.
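The idea behind mapping a hit in the unallocated image back to the original image can be sketched as a simple lookup. This is a conceptual illustration only and does not show how 'dcalc' is implemented or invoked: it assumes the unallocated image was built by copying unallocated units in address order, and the helper name and the address list file are hypothetical.

```shell
# Hypothetical helper: given a file that lists, one per line and in order,
# the original addresses of the units that were copied into the unallocated
# image, print the original address of unit N (counting from 0) of that image.
unalloc_to_original() {
    unit_in_unalloc=$1
    address_list=$2
    sed -n "$(( unit_in_unalloc + 1 ))p" "$address_list"
}

# Toy example: units 7, 8, 15, and 42 of the original image were unallocated.
list=$(mktemp)
printf '%s\n' 7 8 15 42 > "$list"

# Unit 2 of the unallocated image is the third unallocated unit.
unalloc_to_original 2 "$list"
rm -f "$list"
```

In this toy case, unit 2 of the unallocated image corresponds to unit 15 of the original image.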
A recent improvement to Autopsy was that the keyword search results are saved to a file. Autopsy does this by running 'grep' on the image, processing the output, and saving the output to a file. The output is saved with 5 characters before and 5 characters after the actual string, the data unit that the string was found in, and the byte offset in that data unit. The results file is saved to the 'output' directory of the host directory of the evidence locker.
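One way to get a few characters of context around each hit is sketched below. It is not necessarily how Autopsy produces its results file: the helper name is hypothetical, and the approach relies on the -a, -o, and -E flags of GNU grep.

```shell
# Hypothetical helper: print each hit with up to 5 characters of context on
# either side of the keyword.
hits_with_context() {
    keyword=$1
    file=$2
    # -a: treat binary data as text; -o: print only the matched part;
    # -E: extended regular expressions, needed for the {0,5} interval.
    # The keyword is used as a regular expression here, so it must not
    # contain unescaped special characters.
    grep -a -o -E ".{0,5}${keyword}.{0,5}" "$file"
}

sample=$(mktemp)
printf 'the secret password is hidden\n' > "$sample"
hits_with_context 'password' "$sample"
rm -f "$sample"
```

For the sample line, the hit is printed as "cret password is h", which is the keyword plus five characters on each side.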
For those that have used 'grep' before, you likely noticed that some characters need special attention. For example, if we were to search for the '[' value in a file, we would get the following:
# grep "[" test.txt
grep: Unmatched [ or [^
The '[' character is a special value in 'grep' and must be escaped. Therefore, the following would work:
# grep "\[" test.txt
When you enter a search string into Autopsy and the "Regular Expression" box is not checked, then Autopsy will escape the special characters for you. Version 1.74 of Autopsy escapes the following characters:
In addition, if the "'" character is in the keyword string then Autopsy will escape the value and surround the keyword string with "'" values. The 'grep' tool always is called with the keyword surrounded by "'". For example, if we search for the string "I'm" then the following would be run:
# grep ''I\'m'' image.dd
When you choose to do a case insensitive search, then Autopsy adds the '-i' flag to 'grep'. If you choose to do a regular expression search, then Autopsy uses the '-E' flag and does not escape the special values.
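The escaping and flag selection described above can be sketched in a few lines of shell. This is an illustration under assumptions, not Autopsy's code: the function names are hypothetical, and the set of escaped characters shown here (backslash, '.', '[', '*', '^', and '$') is simply the set that is special in a basic 'grep' pattern, not the list that Autopsy uses.

```shell
# Hypothetical helper: put a backslash in front of the characters that are
# special in a basic regular expression. The character set is an assumption.
escape_keyword() {
    printf '%s\n' "$1" | sed 's/[[\.*^$]/\\&/g'
}

# Hypothetical wrapper: choose the grep flags from the two search options.
search() {
    mode=$1      # "literal" or "regex"
    icase=$2     # "yes" or "no"
    keyword=$3
    file=$4
    flags=""
    if [ "$icase" = "yes" ]; then
        flags="$flags -i"        # case insensitive search
    fi
    if [ "$mode" = "regex" ]; then
        flags="$flags -E"        # regular expression: pass through untouched
    else
        keyword=$(escape_keyword "$keyword")
    fi
    # $flags is left unquoted on purpose so it splits into separate options.
    grep $flags -e "$keyword" "$file"
}

sample=$(mktemp)
printf 'rate 1.5\nrate 1x5\ntag [x]\n' > "$sample"
search literal no '1.5' "$sample"       # the '.' is escaped, so one line matches
search regex yes 'RATE 1.5' "$sample"   # the '.' matches any character, two lines
rm -f "$sample"
```

The literal search for "1.5" matches only the line that contains those exact characters, while the regular expression search also matches "1x5".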
A final note about how Autopsy performs keyword searches is in order. Autopsy does not run 'grep' directly on the image file. Autopsy actually runs the 'strings' tool first and then runs 'grep' on the output of the 'strings' tool. This is more efficient because 'grep' only has to look at ASCII strings and the strings are much shorter than if 'grep' had to read the entire file system image. The other benefit is that 'grep' will output the entire line that a string was found in. With a file system image, that could include many non-printable ASCII values and it is hard to process and read. By running it through strings, Autopsy can more easily process the hits from 'grep'.
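The 'strings' and 'grep' combination can be tried by hand with a pipeline like the one below. It is a sketch and not the exact command line that Autopsy builds: the helper name and the toy image are assumptions. The -a and -t d flags of 'strings' are standard and make it scan the whole file and print the decimal byte offset of each string.

```shell
# Hypothetical helper: extract the printable strings from an image, keep
# their byte offsets, and search only that output.
search_strings() {
    keyword=$1
    image=$2
    # -a: scan the whole file; -t d: prefix each string with its decimal
    # byte offset in the image, so the location of a hit is not lost.
    strings -a -t d "$image" | grep -e "$keyword"
}

# Toy image: 600 zero bytes, one ASCII string, then more zero bytes.
img=$(mktemp)
head -c 600 /dev/zero > "$img"
printf 'login: admin' >> "$img"
head -c 100 /dev/zero >> "$img"

search_strings 'admin' "$img"
rm -f "$img"
```

Note that the offset printed is where the printable string starts (600 here), not where the keyword starts inside it, so the keyword's position within the line has to be added to locate the hit exactly.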
I recently sent a test image of a FAT file system with several ASCII strings to the Computer Forensic Tool Testing (cftt) e-mail list on Yahoo Groups. The strings in the image were placed in ways that would test the keyword search abilities of forensic tools. The results were interesting and this section will explain the scenarios in which Autopsy will not find strings and scenarios in which Autopsy will find false positives.
I will begin with the false positives, which are irrelevant hits that Autopsy thinks are relevant. These increase the number of search hits and you will have to ignore the hits that are not relevant. As we just discussed, Autopsy simply uses 'grep' to search the image file and 'grep' knows nothing about file system structure. Therefore, strings that cross the "boundaries" in the file system will be identified by 'grep', even if the string only exists because of a coincidence. For example, a string that starts at the end of a file and extends past the end of the file will be found. Similarly, a string that starts before a file starts and ends in the beginning of the file will be found. A string that starts at the end of one file and ends at the beginning of another file will also be found.
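The boundary case is easy to reproduce with a toy example. The sketch below assumes two files stored back to back with no slack space between them; the file names and the helper name are made up, and -a is a GNU grep flag.

```shell
# Hypothetical helper: count the lines that contain the keyword.
count_hits() {
    grep -a -c -e "$1" "$2"
}

work=$(mktemp -d)
# Two unrelated files; neither one contains the keyword on its own.
printf 'notes end with pass' > "$work/file1"
printf 'word of the day' > "$work/file2"
# A toy "image" in which the two files are stored consecutively.
cat "$work/file1" "$work/file2" > "$work/image"

count_hits 'password' "$work/file1"   # searching each file individually
count_hits 'password' "$work/file2"
count_hits 'password' "$work/image"   # searching the image as a whole
rm -rf "$work"
```

Neither file produces a hit by itself, but the image does, because 'grep' sees only a stream of bytes and not the file boundary.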
Meta data is also searched, so Autopsy will include hits in the meta data structures in the search results. This includes file names, super block values, and the slack space of other meta data structures. With false positives, you will hopefully be able to quickly tell whether or not the string is part of an interesting file.
Autopsy has one scenario where it will not find a keyword string, and it warns you about it in the Keyword Search Mode. If a file has allocated fragments or clusters that are not consecutive (i.e. the file is fragmented), then Autopsy will not find a string that starts in one cluster and ends in the next. That is because there are hundreds of bytes between the beginning and end of the string and 'grep' does not know that the file jumps around.
The (future) solution to this problem is to run 'grep' on the contents of each file. This can be done by using 'fls' to find the meta data address of each file and then using 'icat' to extract the contents. 'strings' and 'grep' would be run on the 'icat' output. This feature will likely be added in the future, but the performance of this approach could be bad because it has a lot of overhead associated with calling 'icat', 'strings', and 'grep' on thousands of files in the image. Its performance would likely be similar to that of the 'sorter' tool in The Sleuth Kit.
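The fragmentation problem and the per-file approach can be illustrated with a toy image. This sketch does not use 'fls' or 'icat': the extract_units helper is a hypothetical stand-in for extracting a file's contents, and the 16-byte unit size and the unit layout are assumptions chosen to keep the example small.

```shell
# Hypothetical helper: count the lines that contain the keyword.
count_hits() {
    grep -a -c -e "$1" "$2"
}

# Hypothetical stand-in for per-file extraction: write out the listed data
# units of an image in the order given.
extract_units() {
    image=$1
    unit_size=$2
    shift 2
    for unit in "$@"; do
        dd if="$image" bs="$unit_size" skip="$unit" count=1 2>/dev/null
    done
}

work=$(mktemp -d)
# Toy image with 16-byte units. Units 0 and 2 belong to one fragmented file
# and unit 1 belongs to something else.
printf 'the key is: pass' >  "$work/image"   # unit 0
printf 'unrelated data..' >> "$work/image"   # unit 1
printf 'word and more...' >> "$work/image"   # unit 2

count_hits 'password' "$work/image"          # raw image: the keyword is split

extract_units "$work/image" 16 0 2 > "$work/file_contents"
count_hits 'password' "$work/file_contents"  # reassembled file: keyword found
rm -rf "$work"
```

Searching the raw image reports no hits, while searching the reassembled contents reports one, at the cost of extracting every file before searching it.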
Autopsy also will not find UNICODE strings. Recall that 'strings' was run before 'grep' was. 'strings' displays only ASCII strings, so the UNICODE strings would never get to the 'grep' process. So, even if you create a regular expression that should match the UNICODE string, it will not be found. Support for UNICODE will be added in the future.
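The effect is easy to see on a small file. The sketch below assumes the string is stored as 16-bit little-endian characters, which is only one possible UNICODE encoding, and the helper names are hypothetical. The '-e l' option is specific to the GNU version of 'strings' and is shown only to illustrate what a fix might involve, not as the way Autopsy will add support.

```shell
# Hypothetical helper: count hits the way an ASCII-only search would.
count_ascii() {
    strings -a "$2" | grep -c -e "$1"
}

# Hypothetical helper: count hits when GNU 'strings' is told to look for
# 16-bit little-endian characters.
count_utf16le() {
    strings -a -e l "$2" | grep -c -e "$1"
}

# "password" stored with a zero byte after every character.
f=$(mktemp)
printf 'p\0a\0s\0s\0w\0o\0r\0d\0' > "$f"

count_ascii 'password' "$f"     # the string never reaches 'grep'
count_utf16le 'password' "$f"   # found once the encoding is taken into account
rm -f "$f"
```

The zero bytes break the text into single printable characters, which is shorter than the minimum length 'strings' reports, so the ASCII search has nothing to match.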
In this article, I have described how Autopsy does keyword searching. It has (hopefully) shown that a keyword search is similar to using the 'Find' option in a word processor, but the difference is that there are different levels of abstraction that need to be searched in the file system. Autopsy currently searches only the lowest level of abstraction (i.e. the raw data).
Keyword searching has not been one of Autopsy's main features and therefore it is not ideal. It will miss keywords that exist in non-consecutive data units of a file and will find keywords that may not be appropriate (one that starts in one file and ends in another for example).
Work is being done in this area to improve Autopsy's search abilities. Paul Bakker has been working on making an indexed search capability to make searching faster and more powerful. If that is done on the contents of each file, then the fragmented keywords could be more quickly found.
We all know that no one tool is perfect and the most important thing to know is what a tool can and cannot do so that you can plan accordingly. This article has hopefully shown when keyword searching in the current version of Autopsy is appropriate and when it is not.
Computer Forensic Tool Testing List
Digital Forensics Tool Testing Images - FAT Keyword Search
The Sleuth Kit