To reduce the time for which the barrier is held (and therefore reduce packet loss), VPP-967 changes vlib_worker_thread_node_runtime_update() so that it runs only once per barrier sync, just before the barrier is released.
This change extends that modification so that the per-thread data is rebuilt in parallel on the worker threads, rather than serially on the main thread. On systems with a large number of worker threads, this substantially reduces the refork time.
For various reasons, the patch for this issue includes the code that was originally submitted under VPP-967.
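
The idea of moving the rebuild from the main thread to the workers can be illustrated with a small stand-alone sketch. This is not the VPP code: the names main_nodes, worker_data_t, worker_rebuild() and parallel_rebuild() are hypothetical, the node data is reduced to an array of ints, and the barrier is modelled simply by creating and joining POSIX threads. It only shows each worker copying its own clone of the main-thread data concurrently, instead of the main thread looping over every worker.

```c
#include <pthread.h>
#include <string.h>

#define N_WORKERS 4 /* hypothetical number of worker threads */
#define N_NODES 8   /* hypothetical size of the node data */

/* Main-thread copy of the node data (stand-in for the real node runtimes). */
static int main_nodes[N_NODES];

/* Private per-worker clone; each worker rebuilds only its own entry. */
typedef struct
{
  int nodes[N_NODES];
  int rebuilds; /* how many times this worker has rebuilt its clone */
} worker_data_t;

static worker_data_t workers[N_WORKERS];

/* Runs on a worker thread: refresh this worker's clone from the main copy.
 * The main copy is only read here, so the workers need no locking. */
static void *
worker_rebuild (void *arg)
{
  worker_data_t *w = arg;
  memcpy (w->nodes, main_nodes, sizeof (main_nodes));
  w->rebuilds++;
  return NULL;
}

/* Runs on the main thread. Updates the main copy once, then lets all workers
 * rebuild concurrently. Returns the number of workers whose clone matches
 * the main copy, or -1 if a thread could not be started. */
int
parallel_rebuild (int new_value)
{
  pthread_t tids[N_WORKERS];
  int i, started = 0, ok = 0;

  /* Single update of the shared data, done before any worker runs. */
  for (i = 0; i < N_NODES; i++)
    main_nodes[i] = new_value + i;

  /* Fan out: one rebuild per worker, all running in parallel. */
  for (i = 0; i < N_WORKERS; i++)
    {
      if (pthread_create (&tids[i], NULL, worker_rebuild, &workers[i]) != 0)
        break;
      started++;
    }

  /* Wait until every started worker has finished its rebuild. */
  for (i = 0; i < started; i++)
    pthread_join (tids[i], NULL);
  if (started != N_WORKERS)
    return -1;

  for (i = 0; i < N_WORKERS; i++)
    if (memcmp (workers[i].nodes, main_nodes, sizeof (main_nodes)) == 0)
      ok++;
  return ok;
}
```

In this sketch the main thread's cost is one update plus the wait for the slowest worker, rather than the sum of all per-worker rebuilds, which is the effect described above for systems with many worker threads.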