CSIT-1946

ipsec hwasync fails with large scale and multiple queues


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium

      On 3nb-spr, 1c tests pass even with 10k tunnels, but 2c and 4c tests almost always fail [0].
      In most cases there seems to be a crash [1]: the main thread gives up in vlib_worker_thread_barrier_sync_int (workers not syncing within 1 second) while a worker is seen doing something in /lib/x86_64-linux-gnu/libcrypto.so.3 (related to the dpdk plugin calling qat_sym_session_configure for each new tunnel?). As cryptodev_session_create uses a spinlock, that may explain why the failure is more frequent when more workers are used.
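
To make the spinlock hypothesis concrete, here is a toy model in Python (not VPP code). It assumes the worst case, where every worker requests a session at the same moment and one lock fully serializes creation, so the last worker is away from its main loop for `workers * create_cost_s`. The function names and the 0.6 s creation cost are hypothetical. Only the 1 second barrier timeout comes from the observation above.

```python
BARRIER_TIMEOUT_S = 1.0  # from the report: workers must sync within 1 second


def worst_case_stall(workers: int, create_cost_s: float) -> float:
    """Longest time a worker can be stuck outside its main loop.

    Assumption: all workers ask for a new session at once and a single
    lock serializes creation, so the last worker waits for all the others.
    """
    return workers * create_cost_s


def barrier_would_time_out(workers: int, create_cost_s: float,
                           timeout_s: float = BARRIER_TIMEOUT_S) -> bool:
    """True if the modeled stall exceeds the barrier sync timeout."""
    return worst_case_stall(workers, create_cost_s) > timeout_s


if __name__ == "__main__":
    hypothetical_cost_s = 0.6  # made-up per-session creation time
    for workers in (1, 2, 4):
        print(workers, barrier_would_time_out(workers, hypothetical_cost_s))
```

In this model the lock does not make a single creation slower. It only multiplies an already slow creation by the number of contending workers.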

      So far I have seen one run which avoided the crash, but its total_received values in the ndrpdr search [2] look very suspicious (including one trial with a large negative loss). I guess the pattern makes sense if we assume QAT takes a long time to create a new session (perhaps longer as the number of already registered sessions grows?), while a subsequent trial that hits an already created session is processed quickly.
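
The assumed pattern can be sketched as a small Python model (again not VPP or CSIT code). It assumes sessions are created lazily on the first packet of each tunnel and that a created session stays available to later trials. `run_trial` and both cost values are hypothetical and chosen only to show the shape of the effect.

```python
def run_trial(tunnel_ids, created, slow_create_s, fast_path_s):
    """Return the modeled processing time of one trial.

    `created` is the set of tunnels that already have a session. It is
    updated in place, so later trials see sessions made by earlier ones.
    """
    total = 0.0
    for tunnel in tunnel_ids:
        if tunnel not in created:
            created.add(tunnel)     # first packet on a tunnel pays for creation
            total += slow_create_s
        total += fast_path_s        # every packet pays the normal fast-path cost
    return total


if __name__ == "__main__":
    sessions = set()
    # Hypothetical costs: 1 ms to create a session, 1 us on the fast path.
    first = run_trial(range(1000), sessions, 1e-3, 1e-6)
    second = run_trial(range(1000), sessions, 1e-3, 1e-6)
    print(first, second)
```

Under these assumptions the first trial over a set of tunnels is dominated by session creation, and a repeat over the same tunnels costs only the fast path. That would make consecutive trials in the search look inconsistent with each other.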

      Maybe VPP can change some logic to handle slow session creation more gracefully, but the real bug is at some lower level (dpdk, firmware, or QAT itself) that scales badly with the number of sessions. (I do not believe the spinlock in cryptodev_session_create is able to slow things down this much by itself.)

      [0] https://csit.fd.io/trending/#eNrtmMFOhDAQQL8GL2YS2qWyFw-u_MemlHFpXGBsuyp-vYAby14kLgQT0wskzLQzmZeXTLCuMbi3eLyPxC5KdxFPddE9os3Dbfd6JYJNnYMlAyyOD8iJ4ZbF6gUKKp5BmZZcA0ywbQ5MAbpSU6LJourS49jVx_JN2rZW0H3PpUXQtQOJlou7g6qgMqavxh_7asXJXZT2ESpbH_mhIX9CGpT-yFefPurQjkr9tmt_z5ORFVr9gf6yYRY-Q3XzHQXVZQ-upVH0PIw0GzLWh0FFgPEdPQ9jERg8mDEFg69mBg9mzIOxpBlJMGMKRrKaGUkwYx6MZczQlX4Py9RwWT-Kv9ylrkbx_7S4HsWCVoRFagoFX8uKsEbNQ7GgFWGJmkKRrGVFWKHmofBWiOymbkw1_JcS2SdTsF-H
      [1] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/vpp-csit-verify-perf-master-ubuntu2204-x86_64-3n-snr/31/csit_parent/0/log.html.gz#s1-s1-s1-s1-s1-t3-k3-k4-k1
      [2] https://s3-logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-verify-master-3n-snr/48/log.html.gz#s1-s1-s1-s1-s1-t1-k2-k13-k14

            Assignee: Unassigned
            Reporter: Vratko Polak (vrpolak)
            Votes: 0
            Watchers: 2
