Bug
Resolution: Unresolved
High
None
None
The RFC 2544 trial description [0] lists several phases that precede the main traffic. Their purpose is to ensure that the DUT has all the information needed for steady-state forwarding, instead of learning it from traffic that has already started.
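For orientation, the trial phases from RFC 2544 Section 23 can be sketched as a single function. This is a hypothetical outline only: the callables stand in for whatever the traffic generator and DUT control layers provide and are not CSIT keywords or TRex APIs. Phase (b) is the one this bug is about, since it is where the DUT is supposed to acquire its forwarding state before measurement.

```python
from typing import Callable, Optional


def run_rfc2544_trial(
    send_routing_update: Optional[Callable[[], None]],
    send_learning_frames: Callable[[], None],
    run_main_traffic: Callable[[], float],
    sleep: Callable[[float], None],
) -> float:
    """Run one trial following the RFC 2544 Section 23 phases.

    All callables are hypothetical placeholders; `sleep` is injected so the
    sketch can be exercised without real waiting (pass time.sleep in practice).
    Returns whatever the main traffic phase measures (e.g. a loss ratio).
    """
    # a) Routers only: send the routing update and let routing settle.
    if send_routing_update is not None:
        send_routing_update()
        sleep(2.0)
    # b) Send learning frames so the DUT primes its tables
    #    (bridge learning, address resolution, ...) before measuring.
    send_learning_frames()
    sleep(2.0)
    # c) The actual measured trial.
    result = run_main_traffic()
    # d) Wait for residual frames to be received.
    sleep(2.0)
    # e) Give the DUT at least five seconds to restabilize.
    sleep(5.0)
    return result
```

Injecting `sleep` keeps the phase order testable; a real runner would pass `time.sleep`.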
Most CSIT tests either pass the needed information via configuration (e.g. static ARP), or rely on various warmup-like measurements that do not affect the main results (e.g. l2bdmaclearn is warmed up by the "Clear and show runtime counters with running traffic" keyword in MRR tests).
But for some tests these methods are not sufficient, and the main results are affected by the DUT not having complete information when the main traffic starts. Hence this Bug.
A typical test that needs fixing is NAT. VPP NAT creates a translation only when processing an in2out packet; out2in packets arriving before their translation exists are dropped as unknown. For small scales, "Clear and show runtime counters with running traffic" can be enough in practice: pushing in2out packets at line rate leads to some packet loss on the VPP receive side, but the probability of losing all packets for a given translation is quite low. As the scale increases, there are fewer packets per translation, so this probability becomes high enough to affect the results considerably. For specific results, see [1], which attempts to reduce the impact by increasing trial duration and multiplicity.
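The scaling argument can be made concrete with a back-of-the-envelope model. It assumes, hypothetically, that every in2out packet is lost independently with the same probability and that packets are spread evenly over translations. Real VPP receive-side loss is burstier than this, so the numbers only illustrate the trend and are not measurements.

```python
def expected_missing_translations(
    n_translations: int, total_in2out_packets: int, loss_ratio: float
) -> float:
    """Expected number of translations that are never created.

    Hypothetical model: each in2out packet is lost independently with
    probability loss_ratio, and each translation receives
    total_in2out_packets / n_translations packets. A translation is missing
    only if all of its packets are lost, i.e. with probability loss_ratio**k.
    """
    packets_per_translation = total_in2out_packets / n_translations
    p_all_lost = loss_ratio ** packets_per_translation
    return n_translations * p_all_lost


# Same (hypothetical) warmup traffic and loss ratio, ten times the scale.
print(expected_missing_translations(1_000, 20_000, 0.5))
print(expected_missing_translations(10_000, 20_000, 0.5))
```

With 20 000 in2out packets and 50 % loss, 1 000 translations leave about 0.001 translations missing in expectation, while 10 000 translations leave 2 500, because the per-translation probability falls from 0.5^20 to 0.5^2.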
Increasing the number of trials reduces the impact of missing translations on the overall average result, but the standard deviation remains quite high. A better solution would be to add an explicit warmup trial (of configurable duration) even before "Clear and show runtime counters with running traffic". This would bring even large-scale NAT tests to the level of small-scale NAT tests, where the results are good enough in practice, but it would still not ensure that all translations exist (all in2out packets for some translation can still get lost).
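Under the same hypothetical independent-loss model, a configurable warmup length can be sized for a target number of expected missing translations by solving n · p^k ≤ target for k packets per translation. The function name and parameters are illustrative, not an existing CSIT setting. The sketch also shows the limitation above: no finite warmup drives the expected count to zero.

```python
import math


def min_warmup_duration(
    n_translations: int,
    rate_pps: float,
    loss_ratio: float,
    max_expected_missing: float,
) -> float:
    """Shortest warmup in seconds so that expected missing translations <= target.

    Hypothetical model: independent per-packet loss with probability
    loss_ratio, warmup packets spread evenly over n_translations at rate_pps.
    """
    if not 0.0 < loss_ratio < 1.0:
        raise ValueError("loss_ratio must be in (0, 1)")
    if max_expected_missing <= 0.0:
        # n * p**k > 0 for every finite k: a warmup alone cannot guarantee
        # that all translations exist.
        raise ValueError("no finite warmup guarantees zero missing translations")
    if max_expected_missing >= n_translations:
        return 0.0
    # Packets per translation needed: k >= ln(target / n) / ln(p).
    k = math.log(max_expected_missing / n_translations) / math.log(loss_ratio)
    return k * n_translations / rate_pps
```

For example, 1 024 translations warmed up at 1 024 pps with 50 % loss need about 10 seconds to bring the expected number of missing translations down to one. That is still not zero.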
If we want to follow the spirit of RFC 2544, we should introduce a dedicated procedure that makes sure each translation has been created, while taking no more time than needed. This could be tricky to create from scratch, but perhaps stateful TRex or some other utility can be used, as it boils down to a set of clients, each pinging its destination once.
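One possible shape for such a procedure: probe every flow once, then re-probe only the flows whose translation is not yet confirmed. This is a hypothetical sketch, not a TRex feature. `probe` stands for "send one in2out packet per listed flow and report which flows got a reply". A reply implies the translation exists because the out2in reply was not dropped as unknown. Here `probe` is simulated with random loss.

```python
import random
from typing import Callable, Iterable, List, Set


def ensure_all_translations(
    flows: Iterable[int],
    probe: Callable[[List[int]], Set[int]],
    max_rounds: int = 10,
) -> int:
    """Probe until every flow is confirmed; return the number of rounds used.

    `probe` is hypothetical: it sends one in2out packet per listed flow and
    returns the flows for which a reply came back.
    """
    pending = set(flows)
    for round_no in range(1, max_rounds + 1):
        # Only unconfirmed flows are probed again, so the procedure takes
        # roughly as long as the loss actually requires.
        pending -= probe(sorted(pending))
        if not pending:
            return round_no
    raise RuntimeError(
        f"{len(pending)} translations still unconfirmed after {max_rounds} rounds"
    )


def make_lossy_probe(loss_ratio: float, seed: int = 1) -> Callable[[List[int]], Set[int]]:
    """Simulated probe: each request/reply exchange fails with loss_ratio."""
    rng = random.Random(seed)

    def probe(flows: List[int]) -> Set[int]:
        return {flow for flow in flows if rng.random() >= loss_ratio}

    return probe


rounds = ensure_all_translations(range(1000), make_lossy_probe(0.5), max_rounds=50)
print(f"all translations confirmed after {rounds} rounds")
```

Re-probing a flow whose reply was lost but whose translation already exists should be harmless, because the in2out packet just reuses that translation. The procedure therefore errs on the side of an extra packet rather than a missing translation.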
[0] https://tools.ietf.org/html/rfc2544#section-23
[1] https://gerrit.fd.io/r/c/csit/+/27979