Chapter 1. Boost.Pipeline 1

Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

Introduction

The goal of this library is to allow parallel execution of operations on batch data. The design is based on the N3534 paper. This work is funded by Google through the Google Summer of Code 2014 program.

Motivation

The UNIX pipeline allows programs to build a multi-threaded chain of transformations in an easy and reliable way. The pipeline described by the following snippet is distributable across processors and free of deadlocks, data races and undefined behavior:

$ grep 'Error:' error.log | grep -v 'test-user@example.com' | sed 's/^User:.*Error: //' > output.txt

This reads a log file, selects the errors, filters out events generated by test users, then writes the remaining messages to the output file. Creating such a pipeline in C++ using synchronization primitives, queues and threads is possible, but not as intuitive. This library intends to make describing such chains easy.
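To give a sense of the boilerplate involved, here is a minimal sketch of the same three-stage chain written by hand with standard C++17 threads, a mutex and a condition variable. It does not use Boost.Pipeline at all: the `Channel` class and the `run_pipeline` function are hypothetical names introduced only for this illustration, and the input is taken from an in-memory vector rather than from error.log.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <regex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Boost.Pipeline): an unbounded,
// closable, thread-safe FIFO queue connecting two stages.
template <typename T>
class Channel {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Signals that no further items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item is available; returns std::nullopt once the
    // channel is closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Hand-written equivalent of:
//   grep 'Error:' | grep -v 'test-user@example.com' | sed 's/^User:.*Error: //'
std::vector<std::string> run_pipeline(const std::vector<std::string>& lines) {
    Channel<std::string> errors, real_users;
    std::vector<std::string> output;

    // Stage 1: keep only lines containing "Error:".
    std::thread select_errors([&] {
        for (const auto& line : lines)
            if (line.find("Error:") != std::string::npos) errors.push(line);
        errors.close();
    });

    // Stage 2: drop events generated by the test user.
    std::thread drop_test_users([&] {
        while (auto line = errors.pop())
            if (line->find("test-user@example.com") == std::string::npos)
                real_users.push(std::move(*line));
        real_users.close();
    });

    // Stage 3: strip the "User: ... Error: " prefix and collect the messages.
    std::thread strip_prefix([&] {
        const std::regex prefix("^User:.*Error: ");
        while (auto line = real_users.pop())
            output.push_back(std::regex_replace(*line, prefix, ""));
    });

    select_errors.join();
    drop_test_users.join();
    strip_prefix.join();
    return output;
}
```

Calling `run_pipeline` with the lines of a log file returns the stripped error messages in their original order. Even in this small sketch, most of the code deals with queue management, shutdown signalling and thread lifetime rather than with the three transformations themselves.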

Aside from simple pipelines, a common application design separates input processing into stand-alone modules that communicate through message passing and are executed by a thread pool. For example, an HTTP server might work this way: some threads read requests from sockets, others process them, and yet others send the responses.

The scheduling schemes currently used in the library do not support low-latency applications very well, but this is planned to change so that the library performs well in this area too.

	Important
	This is not an official Boost library and is under development. The interface is subject to change, and it is currently not recommended to build production applications on top of this library.