NUMA (Non-Uniform Memory Access) describes systems where memory latency/bandwidth depends on which CPU socket (node) a thread runs on. Keeping threads on CPUs near their memory (NUMA-local) reduces latency and cache/memory traffic.
This library provides helpers to discover the machine's basic CPU/NUMA topology and to construct NUMA-aware CPU affinity masks.
Concepts
- CPU: Logical processor index (0..N-1).
- NUMA node: Group of CPUs with local memory. Access to remote nodes has higher latency.
- Affinity: Bitmask of CPUs a thread is allowed to run on.
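
For orientation, the sketch below shows what an affinity bitmask looks like at the OS level on Linux (glibc); the library's ThreadAffinity plays the analogous role in a portable way. The CPU range 0-3 is an arbitrary choice for illustration, not something the library prescribes.

```cpp
#include <pthread.h>
#include <sched.h>

// Allow the calling thread to run only on CPUs 0-3 (standing in for one node's CPUs).
void pin_to_cpus_0_to_3() {
    cpu_set_t mask;
    CPU_ZERO(&mask);                   // start from an empty bitmask
    for (int cpu = 0; cpu < 4; ++cpu)
        CPU_SET(cpu, &mask);           // one bit per CPU the thread may run on
    pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
}
```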
API
Header: include/threadschedule/topology.hpp
Example (here `t` is the library's enhanced std::thread wrapper and `pool` its general-purpose thread pool):

    using namespace threadschedule;

    // Pin one thread to NUMA node 0.
    ThreadAffinity aff = affinity_for_node(/*node_index=*/0, /*thread_index=*/0);
    (void)t.set_affinity(aff);

    // Spread a pool's threads across NUMA nodes in round-robin order.
    auto affs = distribute_affinities_by_numa(pool.size());
    for (size_t i = 0; i < pool.size(); ++i) {
        (void)pool.set_affinity(affs[i]);
    }
- `auto affinity_for_node(int node_index, int thread_index, int threads_per_node = 1) -> ThreadAffinity`
  Build a ThreadAffinity for the given NUMA node.
- `auto distribute_affinities_by_numa(size_t num_threads) -> std::vector<ThreadAffinity>`
  Distribute thread affinities across NUMA nodes in round-robin order.
- `auto read_topology() -> CpuTopology`
  Discover basic topology and return it as a CpuTopology snapshot. Linux: reads /sys for NUMA nodes. Windows: single node, sequential CPU indices. (The sketch after this list shows what the Linux discovery looks like at the /sys level.)
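
To make the Linux path concrete, the sketch below walks the standard sysfs node directories and parses each cpulist into CPU indices. It illustrates the mechanism read_topology relies on; it is not the library's implementation, and the helper names are made up for this example.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Parse a sysfs cpulist such as "0-7,16-23" into individual CPU indices.
std::vector<int> parse_cpulist(const std::string& list) {
    std::vector<int> cpus;
    std::stringstream ss(list);
    std::string item;
    while (std::getline(ss, item, ',')) {
        const auto dash = item.find('-');
        const int lo = std::stoi(item.substr(0, dash));
        const int hi = (dash == std::string::npos) ? lo : std::stoi(item.substr(dash + 1));
        for (int cpu = lo; cpu <= hi; ++cpu)
            cpus.push_back(cpu);
    }
    return cpus;
}

int main() {
    // Walk node0, node1, ... until a node directory is missing.
    for (int node = 0;; ++node) {
        const std::string path =
            "/sys/devices/system/node/node" + std::to_string(node) + "/cpulist";
        std::ifstream in(path);
        if (!in)
            break;
        std::string list;
        std::getline(in, list);
        std::cout << "node " << node << ": " << list << " ("
                  << parse_cpulist(list).size() << " CPUs)\n";
    }
}
```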
When to use NUMA-aware affinity
- Memory-bound workloads with per-thread state/buffers (see the first-touch sketch after this list)
- Databases, caches, audio/video pipelines, networking stacks
- Latency-sensitive services that benefit from cache locality
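
One reason pinning pays off for the memory-bound case above is Linux's default first-touch page placement: a page is physically allocated on the NUMA node of the CPU that first writes it, so a thread that is pinned before it allocates and initializes its buffer ends up with node-local memory. The sketch below assumes the worker has already been pinned to one node, for example with set_affinity(affinity_for_node(node, i)) as in the API example; `worker` and `elems` are illustrative names, not library API.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Runs on a thread that has already been pinned to a single NUMA node.
double worker(std::size_t elems) {
    // First touch: zero-initializing the vector writes every page from this
    // (pinned) thread, so the pages land on the thread's local NUMA node.
    std::vector<double> buf(elems);

    // The hot loop then reads and writes node-local memory only.
    std::iota(buf.begin(), buf.end(), 0.0);
    return std::accumulate(buf.begin(), buf.end(), 0.0);
}
```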
Notes
- Linux: Topology is read from /sys/devices/system/node/node*/cpulist.
- Windows: Fallback is a single-node view; group-affinity is handled inside thread setters.
- Combine with profiles for best results, e.g. a low-latency profile plus node-local affinity.