Overview

A pipeline in the TSK framework is simply a set of modules that are run in a specific order. Modules are covered in more detail in Developing Modules, but for now all we need to know is that a module does a specific type of analysis. For example, one module could calculate the MD5 hash of a file and another module could do a hash database lookup to see if it the file is a known file.

Pipelines are configured using an XML file, which is described later.

File Analysis vs. Post-Processing Pipelines

The framework currently supports two types of pipelines: file analysis pipelines and post-processing pipelines. Each type of pipeline is used in a different context.

A file analysis pipeline allows you to perform tasks on every file in an image. Each module in a file analysis pipeline is passed a reference to a file object that can be used to access both the metadata and content of the file. The module can also access the analysis results of previously run modules using the blackboard (see The Blackboard for details). Examples of file analysis modules include modules to do hash calculation, hash lookup, archive file extraction, and text extraction.

A post-processing pipeline allows you to perform tasks after all of the file analysis modules have been run and all the files discovered in an image have been analyzed individually. There are two main uses for this type of pipeline. First, post-processing modules can compile the results from the file analysis modules into a single summary analysis, perhaps writing a report. Second, a post-processing module is a more efficient mechanism for analyzing a small subset of files. For example, if you need a Windows registry analysis module, it would be better to develop it as a post-processing module that simply locates the handful of registry hive files in an image and analyzes them, rather than as a file analysis module that would run for every file in the image, but would ignore most of them.

Plug-In vs. Executable Modules

There are two types of modules that can exist in either type of pipeline. One is a dynamic library or plug-in module and the other is an executable program module.

Plug-in modules are programmed specifically for inclusion into the framework. These modules can access all of the framework resources. What's required to create one of these modules is described in Developing Modules.

Executable modules are simply command line tools that a pipeline runs - if a tool can be run from the command line, then it can be run from within a pipeline. The pipeline configuration file allows you to specify a string of command line options and arguments to pass through to the executable. The command line arguments can be made into variables resolved at runtime using configuration file macros. However, executable modules do not have access to the image database, the blackboard, and other services that the framework provides. This means that if you want the results from an executable module to be available to other modules, you'll still need to make a companion plug-in module to parse the results and add them to the blackboard. Alternatively, you could write a plug-in module that spawns another process to run a command line tool and waits for the child process to complete.

Pipeline Configuration

Pipeline Configuration Files

Both file analysis and post-processing pipelines are configured using an XML file. A single XML file can store the configuration of both a file analysis and a post-processing pipeline. Take a look at Sample Pipeline Config File to see an example of a pipeline configuration file.

Note that each module to be included in a pipeline is represented by a MODULE element in the configuration file. MODULE elements can have the following attributes:

Attribute	Description	Required?
order	The position of the module within the pipeline.	Yes
type	Either "executable" or "plugin".	Yes
location	The path of the program to be run by an executable module or the dynamic library to be loaded for a plug-in module. This can either be a fully qualified path or a relative path. If the path is relative, the framework will look for the file in the current working directory, TskSystemProperties::MODULE_DIR, and TskSystemProperties::PROG_DIR.	Yes
arguments	The arguments to pass to the module. See Configuration File Macros to learn how arguments can incorporate information not available until runtime.	No
output	The path to a file to contain anything the module writes to `stdout`. This attribute applies only to executable modules. See Configuration File Macros to learn how output file paths can incorporate information not available until runtime.	No

When configuring a pipeline module pay particular attention to the following details:

Module ordering does not need to be sequential (i.e., there can be gaps), but you cannot have two modules with the same order value.
Redirected output on executable modules will be appended to the specified output file. Attempting to write output to a shared file may result in file access errors when multiple pipelines (in a multithreaded or distributed environment) attempt to write data to the same file. You can avoid this by using the TskSystemProperties::UNIQUE_ID macro (if the property is set) to construct the output file name for an executable module (see Configuration File Macros for more on configuration file macros).
You must escape the following characters if you wish to include them in the command line:

Character	Escaped Character
&	&
"	"
>	>
<	<
'	'

Configuration File Macros

The arguments and output attributes of a MODULE element in a pipeline configuration file allow for the substitution of runtime values into the associated strings. This is possible because there is a set of configuration file macros that the framework expands when it reads in a pipeline configuration file.

There are a number of TskSystemProperties::PredefinedProperty macros. To substitute the value of a TskSystemProperties::PredefinedProperty into an arguments or output string, surround the system property name with '#' marks. For example, use #SYSTEM_OUT_DIR#/file1.txt to refer to a file named file1.txt in the system output directory.

File analysis modules can also use the TskModule::CURRENT_FILE_MACRO macro. This macro expands to the path of the file the module is currently analyzing. Note that although this macro is not strictly necessary for plug-in modules because they have access to file metadata through the TskFile objects passed to them by the pipeline, executable modules can only obtain the path to the current file using this macro.

Validating Pipeline Configuration Files

The tsk_validatepipeline tool, which comes with the framework, can be used to verify that a pipeline configuration file is well-formed and all of the modules specified in the file can be found.