Overview

A module is always executed in the context of a pipeline. Please refer to File Analysis vs. Post-Processing Pipelines for details on the different types of pipelines and how to configure them.

A module should perform a single task so that it is usable in many differnt scenarios. For example, if you want to perform hash lookup in a file analysis pipeline then it would be best to have one module that calculates the hash value and another module to lookup the hash in a database. This separation of concerns allows the hash value to be used by other modules later in the pipeline and allows you to easily support interchangeable or multiple hash databases.

Note that pipelines can also include executable program modules. This section covers only how to develop a dynamic link library (DLL) module. Please refer to Plug-In vs. Executable Modules for details on executable modules.

Setup

It is assumed that you have compiled the framework or at least have access to the header files and library files from a compiled version. If not, refer to the BUILDING.txt file that comes with the source code.

Creating a Module the Easy Way

The framework source code includes several official modules that perform common tasks (such as hash calculation). One of these modules (the entropy module) is intended to be copied and renamed as the starting point for developing your own modules.

Start by copying the c_EntropyModule folder under TskModules to the location of your choice. Rename the folder. The framework will not allow multiple modules with the same name, so you may want to incorporate your project or company name into the module name to prevent name collisions.

You will find Microsoft Visual Studio 2010 solution and project files in the win32 folder within your new module folder. Delete everything in the win32 folder except for the .sln and .vcxproj files. Rename these files and open the solution in Visual Studio.

Visual Studio will report that it is unable to open the EntropyModule project. Remove this project from the solution and add the renamed project to the solution as an existing project. Rename EntropyModule.cpp in Visual Studio and open the renamed file. You are now ready to edit the code to create your own module.

Creating a Module from Scratch

If you want or need to create a DLL module from scratch instead of simply copying and renaming the entropy module, you'll need know how to configure your compiler and development environment to make a DLL. The remainder of this section assumes you know how to do so.

The framework will not allow multiple modules with the same name, so you may want to incorporate your project or company name into the module name to prevent name collisions.

It is highly recommended that you include the following framework header file in at least one of the source files for your DLL module:

#include "TskModuleDev.h"

This header file includes the header files for the framework services and utilities and adds "boiler plate" definitions of four version info functions (see /ref mod_setup_version)the framework requires your DLL to export. Your compiler should be able to find this header file if you ensure that the TSK_HOME environment variable is set to the root folder of TSK (in a Windows environment) and that $TSK_HOME and $TSK_HOME/framework are in the include search path.

Next, you need to define several functions that the framework will call when it uses the module. These functions are described in the following subsections.

Module Initialization Function

The initialize function is where you perform one-time module configuration tasks such as validating input or output folders or establishing a connection to an external system such as a database server. The module initialization function must be implemented and it must have the following signature:

TskModule::Status TSK_MODULE_EXPORT initialize(const char *args);

The initialize function takes a char* argument, which is not NULL if an arguments string for the module is specified in the pipeline configuration file. Please see Pipeline Configuration for details.

The initialize function returns a status which should be TskModule::OK or TskModule::FAIL. Returning TskModule::FAIL indicates a fatal problem and will disable the pipeline.

Note that if a module is designed to be included in both the file analysis and post-processing/reporting pipelines, it must support multiple calls to initialize.

Module Execution Function: File Analysis

If your module will be executing in a file analysis pipeline, then it must implement the run function. This function will be called for each file that is sent through the file analysis pipeline and should have the following signature:

TskModule::Status TSK_MODULE_EXPORT run(TskFile *pFile);

The run function is passed a pointer to a TskFile object that can be used to access both the metadata and the content associated with the file to analyze, as explained in Accessing File Content and Metadata. Note that the framework calls the open member function on the TskFile object for the module and the object addressed by the pointer is managed by the framework and therefore a module must not delete it.

The run function should return either TskModule::OK, TskModule::FAIL, or TskModule::STOP. TskModule::OK should be returned if the module successfully analyzed the file or if it decided that it did not need to analyze the file. TskModule::FAIL should be returned if the module tried to analyze the file, but was unable to do so due to an error condition. A TskModule::FAIL return value will be recorded, but the pipeline will still continue to run. TskModule::STOP should be returned if the module wants the pipeline to stop processing the file. This is useful if a module determines that further analysis of the file is not warranted (e.g., by identifying it as a known good file).

Module Execution Function: Post-processing/Reporting

If your module will be executing in a post-processing/reporting pipeline, then it must implement the report function. Unlike the module execution function for a file analysis pipeline, this function is not passed a pointer to a TsKFile object. The function signature is:

TskModule::Status TSK_MODULE_EXPORT report();

The report function does not have access to an individual file pointer as an argument, but may access files as described in the Accessing Other Files section below. Like the run function, the report() function can stop subsequent modules in the pipeline from executing by returning a TskModule::STOP status. This should not be done lightly. Returning TskModule::STOP from the run function terminates analysis of the current file, but returning TskModule::STOP from report terminates analysis of the current disk image.

Module Cleanup Function

When a pipeline is destroyed, the finalize function on each module in the pipeline will be called. The function must have the following signature:

TskModule::Status TSK_MODULE_EXPORT finalize();

The finalize function is where the module should free any resources that were allocated during module initialization or execution.

Note that if a module is designed to be included in both the file analysis and post-processing/reporting pipelines, it must support multiple calls to finalize.

Module Identification Functions

You should also implement functions to return the name, description, and description of the module with the following signatures:

    TSK_MODULE_EXPORT const char *name();
    TSK_MODULE_EXPORT const char *description();
    TSK_MODULE_EXPORT const char *version();

Module Version Info Functions

Including the TskModuleDev.h framework header file in at least one of the source files for your DLL module adds "boiler plate" definitions of these four version info functions the framework requires your DLL to export:

    TSK_MODULE_EXPORT TskVersionInfo::Compiler getCompiler();
    TSK_MODULE_EXPORT int getCompilerVersion();
    TSK_MODULE_EXPORT int getFrameWorkVersion();
    TSK_MODULE_EXPORT TskVersionInfo::BuildType getBuildType();

Other Functions

More complex modules will likely put functions and variables other than the module API functions in separate source files and/or may define various C++ classes to perform the work of the module. However, it is also possible to simply enclose such functions and variables in an anonymous namespace to give them file scope instead of global scope. This replaces the older practice of declaring file scope functions and variables using the "static" keyword. An anonymous namespace is a more flexible construct, since it is possible to define types within it. For example:

namespace
{
    struct Stats
    {
        long byteCounts[256];
        long totalBytes = 0;
    };

    const char *MODULE_NAME = "Entropy";
    const char *MODULE_DESCRIPTION = "Performs an entropy calculation for the contents of a given file";
    const char *MODULE_VERSION = "1.0.0";

    double calculateEntropy(TskFile *pFile)
    {
        // CODE HERE
    }
}

Linux/OS-X module developers should make sure module functions other than the module API functions are uniquely named or bound at module link time. Placing these functions in an anonymous namespace to give them static-linkage is one way to accomplish this.

CAVEAT: Static data can be incompatible with multithreading, since each thread will get its own copy of the data.

Linux/Os-X Modules and the Module API

The module identification, initialization, execution, clean up, and version info functions are intended to be called by TSK Framework only. Linux/OS-X modules must not call these functions within the module unless appropriate compiler/linker options are used to bind all library-internal symbols at link time.

Doing Stuff in a Module

Now that you know how to structure your module, we'll examine how your module can interact with the framework to do its job.

Framework Services

The framework provides a set of services to each module. These services allow the module to access common resources and information, and can be accessed through the singleton TskServices class. The following code snippet demonstrates how to use the TskServices class to get access to the Log service:

Log& tskLog = TskServices::Instance().getLog();

Other framework services can be accessed in a similar manner. Below is a list of the framework service classes and a brief description of each (please refer to the documentation of the service classes for more details). Many of these services return a pointer or reference to an interface and the implementation of the services is left up to the programs that integrate the framework. Because of this, some services may be unavailable in a given application:

TskSystemProperties provides an interface to system-wide configuration data such as data that could be read from a configuration file. System properties are stored as name/value pairs.
TskImgDB supplies an interface to the an image database. This interface can be used to run ad hoc queries against the database to identify subsets of files.
TskFileManager allows modules to save, copy, or delete file content.
Log lets modules write log messages to whatever logging mechanism the application using the framework has configured. The framework comes with a default logging infrastructure that logs messages to a single file. As an alternative to getting the Log service from TskServices and interacting with it, a module can use the LOGERROR(), LOGWARN(), and LOGINFO() macros to get the Log interface and log the message in a single statement.
TskBlackboard provides a mechanism for inter-module communication. Modules may post data to and retrieve data from the blackboard. Typically, the blackboard stores its data in the image database.
Scheduler provides a mechanism to schedule other types of analysis (especially useful for distributed implementations). For example, if a module opens up a ZIP file, it might schedule analysis of each of the extracted files.
ImageFile provides access to the disk image being analyzed. This allows a module to access a specific file or raw data directly from the image.

Accessing File Content and Metadata

When a module's run function is invoked, it will always be passed a pointer to a TskFile object for the file currently being analyzed. TskFile objects can represent all types of files, including allocated and deleted files, carved files, and derived files (e.g., a file extracted from a ZIP file). Your module can read file content using the TskFile::read() member function and can access file metadata using various other TskFile methods, e.g., TskFile::getSize(), TskFile::getCrtime(), and TskFile.getPath(). Please refer to the TskFile documentation for more comprehensive coverage of its member functions.

Accessing Other Files

Post-processing/reporting modules may want to access specific subsets of files. To gain access to these files, the module will need the unique ids assigned to the files in the image database. The framework supports ad hoc querying of the image database through the TskImgDB service. In particular, the TskImgDB::getFileIds() and TskImgDB::getFileCount() methods allow you to define a condition and get either the file ids of any files that satisfy the condition or the number of files that satisfy the condition.

The following code snippet demonstrates the use of the TskImgDB::getFileIds() method to retrieve identifiers for Windows "NTUSER.DAT" registry files:

    std::string condition("WHERE files.dir_type = TSK_FS_NAME_TYPE_REG AND UPPER(files.name) = 'NTUSER.DAT");
    TskImgDB &imgDB = TskServices::Instance().getImgDB();
    std::vector<uint64_t> fileIds = imgDB.getFileIds(condition);

Notice that the condition is simply a SQL "WHERE" clause. The "dir_type = TSK_FS_NAME_TYPE_REG" part of the condition limits the query to files rather than directories.

The condition can be any SQL clause that can be appended to a SQL SELECT statement. The next code snippet specifies a condition that includes a JOIN and an ORDER BY clause:

    std::ostringstream condition;
    condition << " JOIN file_hashes ON files.file_id = file_hashes.file_id WHERE file_hashes.known = " << TskImgDB::IMGDB_FILES_KNOWN_BAD << " ORDER BY files.file_id";
    std::vector<uint64_t> fileIds = TskServices::Instance().getImgDB().getFileIds(condition.str());

Please refer to SQLite Image Database Schema v1.5 for details on the tables and columns that can be queried.

Once you have a file id, you can get the corresponding TskFile object from the TskFileManager service using the TskFileManager::getFile() method.

Utilities

If you find yourself needing to do some routine data conversions, then check out the TskUtilities class. It may have methods that are relevant. If it doesn't, then you may also want to look at what the Poco library that the framework depends on provides. Poco may save you much effort writing commonly needed code from scratch.