The Sleuth Kit Informer

                    http://www.sleuthkit.org/informer
                http://sleuthkit.sourceforge.net/informer

                             Brian Carrier
                      carrier at sleuthkit dot org

                               Issue #8
                          September 15, 2003


CONTENTS
--------------------------------------------------------------------
- Introduction
- What's New?
- Did You Know?
- Locking In On Keywords


INTRODUCTION
--------------------------------------------------------------------
The eighth issue of The Sleuth Kit Informer focuses on how Autopsy
does keyword searching.  I recently released a test image for
keyword searching to the Computer Forensic Tool Testing (cftt)
list on Yahoo Groups.  It had two goals: to test whether tools
found certain strings and to survey how different tools operate.
Users of other tools sent their results to the list.

The tests were designed so that some strings would not be found by
all tools.  For example, some strings started inside one file and
ended in the next consecutive file, which tested whether a tool
looks at each file individually or at the image as a whole (as
Autopsy does).  In the process, I found a bug in Autopsy and improved
the documentation so that users know when a keyword will not be
found (fragmented data units).  This article describes
how Autopsy performs keyword searching and what its limitations
are.

The test image can be found at:
    http://dftt.sourceforge.net/test2/


WHAT'S NEW?
--------------------------------------------------------------------

Aug 28, 2003
The Sleuth Kit v1.65 and Autopsy v1.74 were released.  The Sleuth
Kit had minor updates and a minor HTML fix in 'sorter'.  Autopsy
had some keyword searching bug fixes and support for raw and swap
partitions was added.  NSRL is no longer used by 'sorter' until a
solution to identifying the 'known good' and 'known bad' hashes is
found - as reported in The Sleuth Kit Informer #7.  

    http://www.sleuthkit.org/sleuthkit
    http://www.sleuthkit.org/autopsy

I was notified that The Sleuth Kit is included in the Local Area
Security Linux bootable CD.

    http://localareasecurity.com


DID YOU KNOW?
--------------------------------------------------------------------
The 1.74 version of Autopsy has a new logging feature.  Any command
that is executed by Autopsy is logged to a file in the 'logs'
directory of the host.  The file is named with the investigator's
name and ".exec.log".  For example, mine could be "carrier.exec.log".

Each entry has a time stamp and all of the flags that were used to
execute the command.  The log provides an additional layer of audit
material and also allows you to appreciate how much easier it is
with Autopsy instead of having to type all of that stuff in every
time!


LOCKING IN ON KEYWORDS
--------------------------------------------------------------------

Introduction

Keyword searching for digital forensics sounds easy, doesn't it?
You have a string and you want to find out whether it occurs
anywhere in the image.  It's just like using the 'Find' option in a
word processor, right?  Well, the answer is yes and no.  Keyword
searching is similar to using 'Find', but there are some important
differences and, depending on how the searching is implemented,
different results will occur.

This article describes how I implemented keyword searching in
Autopsy.  The goal of the article, like previous Informer articles,
is to educate you on what is going on behind the scenes and to show
you what will NOT be found.  Autopsy has a VERY basic implementation
of keyword searching and it is important for you to know what will
not be found.

The first section describes how Autopsy searches an image and
the second section describes what Autopsy will and will not find.
I'll spare you the details of the actual keyword searching algorithms.
Those who want to learn about them should check out an algorithms
book that covers string matching.


Searching for Strings

The main concept of keyword searching is to look at consecutive
bytes and compare them to a target string.  Sleuth Kit gurus may
notice that the toolkit does not contain any keyword searching
tools.  Instead of re-inventing the wheel, I chose to use the 'grep'
tool that is found on all current Unix flavors.  In other words,
the keyword search mode of Autopsy is just an interface to the
'grep' tool that shipped with your OS.

When keyword searching, Autopsy runs 'grep' on the actual file
system image file.  Therefore, during a normal search the entire
file system is examined to find the keyword - including meta data
structures, allocated space, unallocated space, and slack space.
This technique creates a problem though, because most users want
to see the name of the file (or at least the cluster) that the
string was found in and do not want to see only the string that
was found.  This is solved by having Autopsy apply the needed flags
so that the byte offset of the keyword is reported.  With that
value, Autopsy can calculate what cluster or fragment the string
was found in.   In the search results, Autopsy reports the data
unit information and offset of each hit.
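
The calculation can be sketched in a few lines of Python.  This is
only an illustration of the arithmetic, not Autopsy's own code, and
it assumes that data units have a fixed size and are numbered from
byte 0 of the image; the function name and the example values are
made up.

```python
def locate_hit(byte_offset, unit_size):
    """Map a byte offset in the image to (data unit address, offset in unit)."""
    # Integer division gives the unit; the remainder is the position inside it.
    return divmod(byte_offset, unit_size)

# A hit at byte 10300 of an image with 4096-byte data units (example values).
unit, within = locate_hit(10300, 4096)
print(unit, within)   # 2 2108
```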

If needed, Autopsy can also call the 'ifind' tool from The Sleuth
Kit to find a meta data structure that points to the cluster or
fragment.  If 'ifind' finds a meta data structure, then the 'ffind'
tool from The Sleuth Kit is used to find a file name that points
to the meta data structure.

When you ask Autopsy to search the unallocated space, the same
process occurs.  The byte offset is reported to Autopsy and it
calculates what fragment in the unallocated image the string was
found in.  Autopsy will either show you the string in the unallocated
image or calculate where the string exists in the original image
by using the 'dcalc' tool from The Sleuth Kit.
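
The idea behind that mapping can be sketched as follows.  This is
not how 'dcalc' is implemented; it assumes that the unallocated image
is simply the unallocated data units of the original image,
concatenated in address order, and the unit list and sizes are
hypothetical.

```python
def original_unit(unalloc_index, unallocated_units):
    """Return the original-image address of the n-th unit of the unallocated image."""
    return unallocated_units[unalloc_index]

# Hypothetical image in which units 3, 4, 9 and 12 are unallocated.
unallocated_units = [3, 4, 9, 12]
unit_size = 1024
hit_offset = 2100   # byte offset of a hit in the unallocated image

index, within = divmod(hit_offset, unit_size)
print(original_unit(index, unallocated_units), within)   # 9 52
```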

A recent improvement to Autopsy was that the keyword search results
are saved to a file.  Autopsy does this by running 'grep' on the
image, processing the output, and saving the output to a file.
The output is saved with 5 characters before and 5 characters after
the actual string, the data unit that the string was found in, and
the byte offset in that data unit.  The results file is saved to
the 'output' directory of the host directory of the evidence locker.
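
As a rough sketch of the "5 characters before and after" idea (the
function name and sample text are invented for illustration, and the
layout of the real results file is not shown here):

```python
def hit_with_context(text, start, length, pad=5):
    """Return the hit plus up to `pad` characters on either side."""
    begin = max(0, start - pad)                  # do not run off the front
    end = min(len(text), start + length + pad)   # or off the end
    return text[begin:end]

line = "a note with the keyword inside of it"
start = line.find("keyword")
print(repr(hit_with_context(line, start, len("keyword"))))
# ' the keyword insi'
```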

Those who have used 'grep' before have likely noticed that some
characters need special attention.  For example, if we were to
search for the '[' character in a file, we would get the following:

    # grep "[" test.txt
    grep: Unmatched [ or [^

The '[' character is a special value in 'grep' and must be escaped.  
Therefore, the following would work:

    # grep "\[" test.txt

When you enter a search string into Autopsy and the "Regular
Expression" box is not checked, then Autopsy will escape the special
characters for you.  Version 1.74 of Autopsy escapes the following
characters:

  - \
  - .
  - [
  - ^
  - $
  - *
  - - (if it is the first character)
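
A minimal sketch of this escaping rule in Python is shown here.  It
is not Autopsy's code; it only applies the list above, and the
function name is hypothetical.

```python
SPECIAL = set("\\.[^$*")   # the characters from the list above

def escape_keyword(keyword):
    """Backslash-escape the listed characters so they are matched literally."""
    out = []
    for i, ch in enumerate(keyword):
        # '-' is only escaped when it is the first character.
        if ch in SPECIAL or (ch == "-" and i == 0):
            out.append("\\")
        out.append(ch)
    return "".join(out)

print(escape_keyword("a.b[c]"))   # a\.b\[c]
print(escape_keyword("-rf *"))    # \-rf \*
```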

In addition, if the "'" character is in the keyword string then
Autopsy will escape the value and surround the keyword string with
"'" values.  The 'grep' tool is always called with the keyword
surrounded by "'".  For example, if we search for the string "I'm"
then the following would be run:

    # grep ''I\'m'' image.dd

When you choose to do a case insensitive search, then Autopsy adds
the '-i' flag to 'grep'.  If you choose to do a regular expression
search, then Autopsy uses the '-E' flag and does not escape the
special values.
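
The flag selection can be sketched as below.  The function name is
hypothetical, and the sketch builds an argument list rather than a
quoted shell string, so it sidesteps the "'" quoting described above
instead of reproducing it.

```python
import shlex

def grep_command(pattern, case_insensitive=False, regex=False):
    """Build a grep argument list from the two search options."""
    args = ["grep"]
    if case_insensitive:
        args.append("-i")   # ignore case
    if regex:
        args.append("-E")   # extended regular expressions
    args.append(pattern)
    return args

cmd = grep_command("(jpg|gif)$", case_insensitive=True, regex=True)
# shlex.quote is used only to display the command safely.
print(" ".join(shlex.quote(arg) for arg in cmd))
```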

A final note about how Autopsy performs keyword searches is in
order.  Autopsy does not run 'grep' directly on the image file.
Autopsy actually runs the 'strings' tool first and then runs 'grep'
on the output of the 'strings' tool.  This is more efficient because
'grep' only has to look at ASCII strings and the strings are much
shorter than if 'grep' had to read the entire file system image.
The other benefit is that 'grep' will output the entire line that
a string was found in.  With a file system image, that line could
include many non-printable values, which are hard to process and read.
By running it through strings, Autopsy can more easily process the
hits from 'grep'.
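
The two-stage approach can be approximated in pure Python.  This is
a sketch only: it imitates the default behavior of 'strings' with a
regular expression (runs of four or more printable ASCII bytes) and
then searches those runs, and all of the names in it are invented.

```python
import re

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")   # 4+ printable ASCII bytes

def ascii_strings(data):
    """Yield (byte offset, text) for each printable run in the data."""
    for m in PRINTABLE_RUN.finditer(data):
        yield m.start(), m.group().decode("ascii")

def search(data, keyword, ignore_case=False):
    """Return the byte offsets of keyword hits found inside the ASCII runs."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(re.escape(keyword), flags)
    hits = []
    for offset, text in ascii_strings(data):
        for m in pattern.finditer(text):
            hits.append(offset + m.start())
    return hits

image = b"\x00\x01\x02first string\x00\xff\x10second STRING here\x00"
print(search(image, "string", ignore_case=True))   # [9, 25]
```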


Autopsy's Report Card

I recently sent a test image of a FAT file system with several
ASCII strings to the Computer Forensic Tool Testing (cftt) e-mail
list on Yahoo Groups.  The strings in the image were placed in ways
that would test the keyword search abilities of forensic tools.
The results were interesting and this section will explain the
scenarios in which Autopsy will not find strings and scenarios in
which Autopsy will find false positives.

I will begin with the false positives, which are irrelevant hits
that Autopsy thinks are relevant.  These increase the number of
search hits and you will have to ignore the hits that are not
relevant.  As we just discussed, Autopsy simply uses 'grep' to
search the image file and 'grep' knows nothing about file system
structure.  Therefore, strings that cross the "boundaries" in the
file system will be identified by 'grep', even if the string only
exists because of a coincidence.  For example, a string that starts
at the end of a file and extends past the end of the file will be
found.  Similarly, a string that starts before a file starts and
ends in the beginning of the file will be found.  A string that
starts at the end of one file and ends at the beginning of another
file will also be found.
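
A toy example makes the last case concrete.  The 8-byte "data units"
and file contents below are invented, and a plain byte search stands
in for 'grep'.

```python
# Two unrelated files stored in adjacent data units of a tiny image.
file_a = b"xxxxpass"   # occupies data unit 0
file_b = b"wordyyyy"   # occupies data unit 1
image = file_a + file_b

keyword = b"password"
print(image.find(keyword))    # 4: found in the raw image
print(file_a.find(keyword))   # -1: not in the first file
print(file_b.find(keyword))   # -1: not in the second file
```

The hit at offset 4 exists only because the two files happen to be
neighbors in the image.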

Meta data is also searched, so Autopsy will include hits in the
meta data structures in the search results.  This includes file
names, super block values, and the slack space of other meta data
structures.  With false positives, you will hopefully be able to
quickly tell whether the string is part of an interesting file or
not.

Autopsy has one scenario where it will not find a keyword string,
and it warns you about it in the Keyword Search Mode.  If a file
has allocated fragments or clusters that are not consecutive (i.e.
the file is fragmented), then Autopsy will not find a string that
starts in one cluster and ends in the file's next cluster.  That is
because, in the image, other data units sit between the beginning
and end of the string
and 'grep' does not know that the file jumps around.
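
The following sketch shows the miss on a toy image.  The layout, the
8-byte data units, and the helper names are all hypothetical, and a
plain byte search stands in for 'grep'.

```python
def build_image(units):
    """Concatenate data units in address order to form a raw image."""
    return b"".join(units[addr] for addr in sorted(units))

def read_file(units, unit_list):
    """Reassemble a file by following its list of data unit addresses."""
    return b"".join(units[addr] for addr in unit_list)

# The file uses units 0 and 2; unit 1 belongs to some other file.
units = {0: b"xxxxpass", 1: b"--------", 2: b"wordyyyy"}
image = build_image(units)
content = read_file(units, [0, 2])

print(image.find(b"password"))     # -1: the halves are not adjacent in the image
print(content.find(b"password"))   # 4: found once the file is reassembled
```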

The (future) solution to this problem is to run 'grep' on the
contents of each file.  This can be done by using 'fls' to find
the meta data address of each file and then using 'icat' to extract
the contents.  'strings' and 'grep' would be run on the 'icat'
output.  This feature will likely be added in the future, but the
performance of this approach could be bad because it has a lot of
overhead associated with calling 'icat', 'strings', and 'grep' on
thousands of files in the image.  Its performance would likely be
similar to that of the 'sorter' tool in The Sleuth Kit.

Autopsy also will not find UNICODE strings.  Recall that 'strings'
was run before 'grep' was.  'strings' displays only ASCII strings,
so the UNICODE strings would never get to the 'grep' process.  So,
even if you create a regular expression that should match the
UNICODE string, it will not be found.  Support for UNICODE will be
added in the future.
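
The effect is easy to reproduce.  The sketch below assumes the
UNICODE text is stored as UTF-16 (little endian) and imitates the
default 'strings' behavior with a regular expression for runs of
four or more printable ASCII bytes; the keyword is just an example.

```python
import re

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")   # 4+ printable ASCII bytes

keyword = "secret"
ascii_bytes = keyword.encode("ascii")
utf16_bytes = keyword.encode("utf-16-le")   # a zero byte follows every letter

print(PRINTABLE_RUN.findall(ascii_bytes))   # [b'secret']
print(PRINTABLE_RUN.findall(utf16_bytes))   # []
```

Because every letter is followed by a zero byte, no run of printable
bytes is long enough to be reported, so nothing reaches the search
stage.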


Conclusion

In this article, I have described how Autopsy does keyword searching.
It has (hopefully) shown that a keyword search is similar to using the
'Find' option in a word processor, but the difference is that there are
different levels of abstraction that need to be searched in the file
system.  Autopsy currently searches only the lowest level of abstraction
(i.e. the raw data).  

Keyword searching has not been one of Autopsy's main features and
therefore it is not ideal.  It will miss keywords that exist in
non-consecutive data units of a file and will find keywords that
may not be relevant (for example, one that starts in one file and
ends in another).

Work is being done in this area to improve Autopsy's search abilities.
Paul Bakker has been working on making an indexed search capability
to make searching faster and more powerful.  If that is done on
the contents of each file, then the fragmented keywords could be
more quickly found.

We all know that no one tool is perfect and the most important
thing to know is what a tool can and cannot do so that you can plan
accordingly.  This article has hopefully shown when keyword searching
in the current version of Autopsy is appropriate and when it is
not.

References:
Autopsy
    http://www.sleuthkit.org/autopsy/

Computer Forensic Tool Testing List
    http://groups.yahoo.com/group/cftt/

Digital Forensics Tool Testing Images - FAT Keyword Search
    http://dftt.sourceforge.net/test2/

The Sleuth Kit
    http://www.sleuthkit.org/sleuthkit/


--------------------------------------------------------------------
Copyright (c) 2003 by Brian Carrier.  All Rights Reserved