
I have a multi-core CPU, but Ostinato is using only one core while the rest are idle
This is a question we get asked often.
This observation is not entirely true, though.
The Ostinato TX code is single-threaded per port, so a single transmitting port uses only one core.
If you have multiple ports and you start transmit on them in parallel, then Ostinato will use one core per transmitting port.
Why single-core TX per port?
So why does a single port TX not use more cores?
To start with, it’s historical - Ostinato was first written in 2007 and at that time multi-core CPUs were not yet commonplace.
Subsequently, though, we revisited this design and researched whether we could improve the performance of a single port TX using multiple cores.
Here’s what we found.
As a cross-platform tool, Ostinato uses libpcap for packet transmission. libpcap by itself is not multi-threaded. Also, the underlying OS mechanisms (different for each OS) used by libpcap are not generally designed for multi-threading.
We could still open the same port multiple times (once per thread) and transmit on all of them in parallel. But this would lead to contention in the kernel’s network stack and would likely not improve performance much.
Instead, we took a different approach - rather than attempting a 2x or 3x performance improvement, we did a 10x (and more!) performance improvement by using packet acceleration technology.
Turbo Transmit - the multi-core solution
Ostinato Turbo Transmit uses AF_XDP, the packet acceleration technology built into the Linux kernel.
You may be familiar with DPDK (Data Plane Development Kit) which is a kernel bypass library for accelerating packet processing. While we did a prototype using DPDK, we productized Turbo Transmit using AF_XDP instead.
Ostinato does not use DPDK for Turbo Transmit.
AF_XDP (and DPDK for that matter) does not have the contention issue because it targets NICs (typically 10G or higher) which have multiple hardware queues. By distributing the queues across multiple cores and using a Linux driver that supports AF_XDP, we can achieve higher throughput for both Tx and Rx - without any contention.
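To illustrate the idea of distributing hardware queues across cores, here is a minimal sketch in Python. It is not Ostinato's implementation; the function name `assign_queues_to_cores` and the round-robin policy are assumptions chosen only to show how each queue can get a single owning core so that no two cores compete for the same queue.

```python
from collections import defaultdict


def assign_queues_to_cores(num_queues: int, core_ids: list[int]) -> dict[int, list[int]]:
    """Map each hardware queue index to exactly one core, round-robin.

    Hypothetical helper for illustration only; a real AF_XDP application
    would also bind one socket per queue and pin its worker to the core.
    """
    if num_queues < 0:
        raise ValueError("num_queues must be non-negative")
    if not core_ids:
        raise ValueError("at least one core is required")

    assignment: dict[int, list[int]] = defaultdict(list)
    for queue in range(num_queues):
        # Queue N always lands on the same core, so cores never share a queue.
        core = core_ids[queue % len(core_ids)]
        assignment[core].append(queue)
    return dict(assignment)


if __name__ == "__main__":
    # Example: a NIC with 8 queues on a machine where cores 0-3 are available.
    for core, queues in assign_queues_to_cores(8, [0, 1, 2, 3]).items():
        print(f"core {core}: queues {queues}")
```

On Linux, a worker process could then restrict itself to its assigned core with `os.sched_setaffinity`, though whether that is appropriate depends on the application.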
Launched for 10G ports in 2021, Ostinato now uses the same Turbo Transmit technology to scale and support line rate on 100G and 400G ports.

You need an add-on Turbo license to use this feature. Also, this is a Linux-only feature.
You may not need the Turbo license though - the base license, which uses at most one core per port for TX, may actually be sufficient for your use case.
How to increase TX rate with single-core TX
The trick is to increase the packet size.
Like most networking devices and software, the Ostinato TX processing is primarily pps (packets per second) bound and largely independent of the packet size.
Since the achievable packet rate stays roughly constant, the larger the packet size, the higher the TX bitrate.
The default packet size for Ostinato streams is 64 bytes (the smallest possible Ethernet frame size). If packet size does not matter to you, increase it - up to the maximum standard Ethernet frame size (1518 bytes) or, if your network supports them, jumbo frames (9000 bytes). This will likely give you the higher TX throughput you are looking for.
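The arithmetic behind this is simple enough to sketch in Python. The 1,000,000 pps figure below is a hypothetical single-core packet rate picked for illustration, not a measured Ostinato number, and the function names are ours. The 20 bytes of per-frame wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap) is standard Ethernet.

```python
# Bytes each frame occupies on the wire beyond the frame itself:
# 7 (preamble) + 1 (start-of-frame delimiter) + 12 (inter-frame gap).
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_size: int) -> float:
    """Maximum packets per second the link can carry at this frame size."""
    return link_bps / ((frame_size + WIRE_OVERHEAD_BYTES) * 8)


def tx_bitrate_bps(pps_limit: float, frame_size: int, link_bps: float) -> float:
    """Frame-level bit rate when the sender is capped at pps_limit.

    The packet rate is bounded by whichever is lower: the sender's own
    pps limit or what the link can physically carry.
    """
    pps = min(pps_limit, line_rate_pps(link_bps, frame_size))
    return pps * frame_size * 8


if __name__ == "__main__":
    assumed_pps = 1_000_000  # hypothetical pps limit, for illustration only
    link_bps = 10e9          # a 10G port
    for size in (64, 512, 1518, 9000):
        gbps = tx_bitrate_bps(assumed_pps, size, link_bps) / 1e9
        print(f"{size:>5}-byte frames: {gbps:.2f} Gbps")
```

With the same assumed packet rate, going from 64-byte to 1518-byte frames raises the bitrate by a factor of about 24, until the link itself becomes the limit.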
See The cheat-code for high throughput for more details.
Operating System Note: Please use Ostinato on Linux or macOS for higher performance. Ostinato on Windows is not as performant due to limitations of the underlying Windows platform.
Conclusion
For higher throughput, use the largest packet size possible.
If you want higher throughput for smaller packets, use the Turbo Transmit feature.
You can also review other performance related guides.
