Intel Threading Building Blocks (Intel TBB)

Widely used C++ template library for task parallelism

  • Rich set of components to efficiently implement higher-level, task-based parallelism
  • Future-proof applications to tap multicore and many-core power
  • Compatible with multiple compilers and portable to various operating systems

Demo as part of Intel Parallel Studio XE


Simplify Parallelism with a Scalable Parallel Model

Intel Threading Building Blocks (Intel® TBB) 4.2 is a widely used, award-winning C++ template library for creating high-performance, scalable parallel applications.

  • Enhance Productivity and Reliability – Rich set of components to efficiently implement higher-level, task-based parallelism
  • Gain Performance Advantage Today and Tomorrow – Future-proof applications to tap multicore and many-core power
  • Fits Within Your Environment – Advanced threading library, compatible with multiple compilers and portable to various operating systems

“Intel® TBB provided us with optimized code that we did not have to develop or maintain for critical system services. I could assign my developers to code what we bring to the software table—crowd simulation software.”
Michaël Rouillé, CTO, Golaem





Flow Graph

The flow graph feature provides a flexible and convenient API for expressing static and dynamic dependencies between computations. It is customizable for a wide variety of problems. It also extends the applicability of Intel® TBB to event-driven/reactive programming models.

Intel® TBB delivers high performing and reliable code with less effort than hand-made threading. Pre-tested algorithms, concurrent containers, synchronization primitives, and a scalable memory allocator simplify parallel application development.


Dynamic Task Scheduler

Applications that express their work as abstract tasks can see performance improve automatically as processor core counts increase. The sophisticated Intel® TBB task scheduler dynamically maps tasks to threads to balance the load among available cores, preserve cache locality, and maximize parallel performance. The implementation supports C++ exceptions, task and task group priorities, and cancellation, all of which are essential for large and interactive parallel C++ applications.

The dynamic task scheduler and parallel algorithms support nested and recursive parallelism, as well as running parallel constructs side by side. This is useful for introducing parallelism gradually, and it lets different components of an application implement parallelism independently.


Cross Platform Support and Composability

Organizations that require cross-platform support today or anticipate needing it in the future should consider Intel® TBB. It is validated and commercially supported on Windows*, Linux*, and OS X* platforms, using multiple compilers. It is also available on FreeBSD*, IA-based Solaris*, and PowerPC*-based systems via the open source community. Intel® TBB is optimized for multicore architectures and the Intel® Xeon Phi™ coprocessor.

Intel® TBB is designed to co-exist with other threading packages and technologies. Different components of Intel® TBB can be used independently and mixed with other threading technologies.



Organizations can expand their customer base by using a production-ready, open solution for parallelism that is available on a broad range of platforms.

Top Community Support

Broad support from an involved community gives developers access to additional platforms and operating systems. Intel® Premier Support services and Intel® Support Forums provide confidential support, technical notes, application notes, and the latest documentation.

A complete documentation package and code samples are readily available both as part of the Intel® TBB installation and online. The User Guide provides an introduction to Intel® TBB. The Design Patterns chapter in the User Guide covers common parallel programming patterns and how to implement them using Intel® TBB. The Reference Manual contains formal descriptions of all classes and functions implemented in Intel® TBB.


Order the Intel® Threading Building Blocks book online at

What’s New

  • Support for Latest Intel Architectures – Take advantage of the newest features in Intel’s latest processors including Transactional Synchronization Extensions (TSX). Adds support for Intel® Xeon Phi™ coprocessor for Windows and Intel® Xeon™ Processor (Ivy Bridge-EP). Selecting the best models for your application today will set a path for you to take full advantage of multicore and many-core performance without re-writing your code. Start today by implementing parallelism for today’s architecture and be ready for future architectures.
  • Lower Memory Overhead – Improved heuristics in the memory allocator reduce memory overhead by intelligently releasing unused or stale memory.
  • Improved Handling of Large Memory Requests – Improved handling of large (>8K-128MB) memory requests results in better performance when using frequent large memory allocations. Use of big memory pages can now be explicitly enabled via a function call or environment variable.
  • Better Fork Support – Fork safety through a user-enabled API that ensures Intel® TBB worker threads are completed before executing a fork.
  • PPL* Compatibility – Improved compatibility with Parallel Patterns Library (PPL) by adding concurrent_unordered_multimap and concurrent_unordered_multiset APIs.
  • Windows* Store – Customers that use Intel® TBB in their applications can now submit and sell their app through the Windows Store.
  • Android* OS Support – The Android OS is now supported as a target operating system for improved application performance and power efficiency. See Beacon Mountain for more Android developer tool details.




Intel® TBB 4.2 Pre-Tested Capabilities

Parallel Algorithms
Generic implementation of common parallel performance patterns
Generic implementations of parallel patterns such as parallel loops, flow graphs, and pipelines can be an easy way to achieve a scalable parallel implementation without developing a custom solution from scratch.
Concurrent Containers
Generic implementation of common idioms for concurrent access
Intel® TBB 4.2 concurrent containers are a concurrency-friendly alternative to serial data containers. Serial data structures (such as C++ STL containers) often require a global lock to protect them from concurrent access and modification. Intel® TBB concurrent containers allow multiple threads to access and update items in the container at the same time, which increases the available concurrency and improves an application’s scalability.
Synchronization Primitives
Exception-safe locks, condition variables, and atomic operations
Intel® TBB 4.2 provides a comprehensive set of synchronization primitives with different qualities that are applicable to common synchronization strategies. The exception-safe implementation of locks helps to avoid deadlock in programs that use C++ exceptions. Using Intel® TBB atomic variables instead of the C-style atomic API minimizes potential data races.
Scalable Memory Allocators
Scalable memory manager and false-sharing free memory allocator
The scalable memory allocator avoids scalability bottlenecks by minimizing access to a shared memory heap via per-thread memory pool management. Special management of large (>=8KB) blocks allows more efficient resource usage, while still offering scalability and competitive performance. The cache-aligned memory allocator avoids false-sharing by not allowing allocated memory blocks to split a cache line.
Create Arbitrary Task Trees
When an algorithm cannot be expressed with high-level Intel® TBB 4.2 constructs, the user can choose to create arbitrary task trees. Tasks can be spawned for better locality and performance or en-queued to maintain FIFO-like order and ensure starvation-resistant execution.
Conditional Numerical Reproducibility
Ensure deterministic associativity for floating-point arithmetic results with the new Intel® TBB template function ‘parallel_deterministic_reduce’.
C++11 Support
Intel® TBB can be used with C++11 compilers and supports lambda expressions. For developers using parallel algorithms, lambda expressions reduce the time and code needed by removing the requirement for separate objects or classes.


Scalability with Future-proofing

  • Intel® TBB provides a simple and rapid way of developing robust parallel applications that abstracts platform details and threading mechanisms for performance that scales with increasing core counts
  • Intel® Threading Building Blocks yields linear scaling in these example applications

Select the right Intel® TBB license

  • Commercial Binary Distribution for customers who may require commercial support services. Attractive pricing available for academic, student and classroom usage.
  • Open Source Distribution can be used under GPLv2 with the runtime exception allowing usage in proprietary applications. Allows support for additional OSs and hardware platforms. Both source and binary forms are available for download from
  • Custom license available if you require the ability to modify or distribute the commercial source code of Intel® TBB. Contact your Intel representative for more information.


Available Commercially and as open source
