CSIT-1776

Mellanox NIC can enter a bad state


    • Type: Bug
    • Resolution: Unresolved
    • Priority: High

      As far as we know, the bad state goes away only when we reboot the DUT machine.

      The symptom [0] when the NIC is in the bad state:

      exec_command on ('10.32.8.18', 22) with timeout 600: sudo -E -S ethtool -A enp94s0f0 rx off tx off
      exec_command on ('10.32.8.18', 22) took 0.10094594955444336 seconds
      return RC 77
      return STDOUT
      return STDERR Cannot get device pause settings: No such device
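
      A minimal sketch of how this symptom could be recognised automatically, for example before a suite starts. It assumes that a nonzero return code together with "No such device" in STDERR is enough to identify the bad state. The names looks_like_bad_state and probe_interface are hypothetical helpers, not part of CSIT, and the probe uses the read-only query "ethtool -a" rather than the "-A" command from the log.

```python
import subprocess

# Marker taken from the STDERR line quoted in this report.
NO_DEVICE_MARKER = "No such device"


def looks_like_bad_state(return_code: int, stderr: str) -> bool:
    """Return True when an ethtool result matches the symptom in this report.

    Assumption: a nonzero return code plus "No such device" in STDERR is
    enough to recognise the bad state; other failures are not classified.
    """
    return return_code != 0 and NO_DEVICE_MARKER in stderr


def probe_interface(interface: str) -> bool:
    """Query pause settings with `ethtool -a <interface>` and classify the result.

    Hypothetical helper: it has to run on the DUT itself and needs ethtool
    installed. It only reads settings, it does not change them.
    """
    result = subprocess.run(
        ["ethtool", "-a", interface],
        capture_output=True,
        text=True,
        check=False,  # a nonzero exit code is the signal we want to inspect
    )
    return looks_like_bad_state(result.returncode, result.stderr)
```

      On the DUT, probe_interface("enp94s0f0") would then report whether the interface from the log above shows the symptom. The classification is kept separate from the subprocess call so it can also be applied to RC and STDERR values already captured over SSH.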

      There may be multiple ways to introduce the bad state. I found it is sufficient to run an RDMA suite with multiple frame sizes [1]. There, the 64B case works, the 1518B case passes initialization but reports 0 packets forwarded, and the 9000B case already shows the bad state, so I guess it is the 1518B case that breaks the NIC.

      [0] https://logs.fd.io/production/vex-yul-rot-jenkins-1/csit-vpp-perf-verify-master-2n-clx/575/archives/log.html.gz#s1-s1-s1-s1-s1-t1-k2-k4-k1-k1-k1-k1-k1
      [1] https://logs.fd.io/production/vex-yul-rot-jenkins-1/csit-vpp-perf-verify-master-2n-clx/574/console.log.gz

            Assignee: Unassigned
            Reporter: Vratko Polak (vrpolak)