vpp / VPP-974

Restarting VPP with unbound DPDK interfaces crashes the machine


    • Type: Bug
    • Resolution: Open
    • Priority: Medium
    • 17.01
    • None
    • VPPInfra
    • VPP 17.01 on RHEL 7.3.

      If the DPDK interfaces are not shut down, DPDK does not bind them, which is expected. I shut them down and unbound them using "/usr/local/bin/dpdk-devbind.py -u <PCI ID>". Restarting VPP after that crashes the machine, which must then be rebooted to recover. The issue reproduces every time with VPP 17.01 on RHEL 7.3.

      All the logs are attached as logs.tar.

      crash1.png and crash2.png are screenshots of the machine's console taken when it crashes on the VPP restart.

      eth2 and eth3 below are the DPDK interfaces, with PCI IDs 0000:00:14.0 and 0000:00:15.0 respectively. They are not yet shut down in this output:

      # ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
           link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
           inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
      2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
           link/ether 0e:69:82:08:e1:6c brd ff:ff:ff:ff:ff:ff
      3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
           link/ether 0e:69:82:08:e1:6c brd ff:ff:ff:ff:ff:ff
      4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
           link/ether 6a:f0:f1:a9:b5:c9 brd ff:ff:ff:ff:ff:ff
      5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
           link/ether 8e:c5:53:e3:5c:28 brd ff:ff:ff:ff:ff:ff
      6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
           link/ether 0e:69:82:08:e1:6c brd ff:ff:ff:ff:ff:ff
           inet 172.18.7.165/16 brd 172.18.255.255 scope global bond0
           valid_lft forever preferred_lft forever
      
      [root@mhv3 metacloud]# lspci | grep -i eth
      00:12.0 Ethernet controller: Red Hat, Inc Virtio network device
      00:13.0 Ethernet controller: Red Hat, Inc Virtio network device
      00:14.0 Ethernet controller: Red Hat, Inc Virtio network device
      00:15.0 Ethernet controller: Red Hat, Inc Virtio network device
      

      Status of DPDK devices:

      # /usr/local/bin/dpdk-devbind.py -s
      
      Network devices using DPDK-compatible driver
      ============================================
      <none>
      
      Network devices using kernel driver
      ===================================
      0000:00:12.0 'Virtio network device' if=eth0 drv=virtio-pci unused=virtio_pci,vfio-pci,uio_pci_generic 
      0000:00:13.0 'Virtio network device' if=eth1 drv=virtio-pci unused=virtio_pci,vfio-pci,uio_pci_generic 
      0000:00:14.0 'Virtio network device' if=eth2 drv=virtio-pci unused=virtio_pci,vfio-pci,uio_pci_generic 
      0000:00:15.0 'Virtio network device' if=eth3 drv=virtio-pci unused=virtio_pci,vfio-pci,uio_pci_generic 
      
      Other network devices
      =====================
      <none>
      
      Crypto devices using DPDK-compatible driver
      ===========================================
      <none>
      
      Crypto devices using kernel driver
      ==================================
      <none>
      
      Other crypto devices
      ====================
      <none>
      

      VPP configuration:

      # cat /etc/vpp/startup.conf 
      unix {
      nodaemon
      log /var/log/vpp.log
      cli-listen localhost:5002
      full-coredump
      }
      
      
      dpdk {
      uio-driver uio_pci_generic
      socket-mem 1024
      dev 0000:00:14.0
      dev 0000:00:15.0
      
      vdev eth_bond0,mode=1,slave=0000:00:14.0,slave=0000:00:15.0,primary=0000:00:14.0,xmit_policy=l34
      }
      
      api-trace {
      on
      }
      
      api-segment {
      gid vpp
      }
      
      vhost-user {
      coalesce-frames 32
      coalesce-time 2e-3
      dont-dump-memory
      }
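
      To cross-check the configuration against the device status, the PCI IDs named by "dev" lines in the dpdk stanza can be pulled out of startup.conf. This is a minimal sketch: the helper name dpdk_devs is hypothetical, and it assumes each stanza opens with "name {" on one line and closes with "}" on its own line, as in the file above.

```shell
# dpdk_devs FILE: print each PCI ID given on a "dev" line inside the dpdk stanza.
# Hypothetical helper; assumes "dpdk {" and the closing "}" sit on their own lines.
dpdk_devs() {
    awk '
        $1 == "dpdk" && $2 == "{" { in_dpdk = 1; next }   # dpdk stanza opens
        in_dpdk && $1 == "}"      { in_dpdk = 0 }          # dpdk stanza closes
        in_dpdk && $1 == "dev"    { print $2 }             # device listed for DPDK
    ' "$1"
}

# Example: dpdk_devs /etc/vpp/startup.conf
```

      For the configuration shown here, this should list 0000:00:14.0 and 0000:00:15.0, one per line. The "vdev" line is skipped because its first field is not "dev".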
      

      No VPP interfaces are seen, because DPDK does not bind eth2 (0000:00:14.0) and eth3 (0000:00:15.0) while they are not shut down:

      # vppctl show interfaces
      Name               Idx       State          Counter          Count
      local0              0        down
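
      Whether an interface is still up can be read from sysfs before starting VPP. The sketch below is illustrative only: iface_is_up is a hypothetical helper, and treating the IFF_UP flag as the condition that keeps the device from being bound is an assumption drawn from the behaviour described above, not from the VPP source.

```shell
# iface_is_up IFACE [SYSFS_NET_DIR]: succeed if IFACE has the IFF_UP flag set.
# Hypothetical helper; the second argument exists only so the check can be
# pointed at a different directory than /sys/class/net.
iface_is_up() {
    dir=${2:-/sys/class/net}
    flags=$(cat "$dir/$1/flags") || return 2   # hex value such as 0x1003
    [ $(( $flags & 0x1 )) -ne 0 ]              # IFF_UP is the lowest bit
}

# Example:
#   for i in eth2 eth3; do
#       iface_is_up "$i" && echo "$i is still up"
#   done
```

      The return status is 0 when the interface is up, 1 when it is down, and 2 when the flags file cannot be read (for example, once the device has been unbound and the interface no longer exists).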
      

      Shut down the DPDK interfaces eth2 and eth3:

      # ifdown eth2
      # ifdown eth3

      Unbind eth2 (0000:00:14.0) and eth3 (0000:00:15.0):

      # /usr/local/bin/dpdk-devbind.py -u 0000:00:14.0
      # 
      
      # /usr/local/bin/dpdk-devbind.py -u 0000:00:15.0
      # 
      

      Check the status of the DPDK devices. eth2 (0000:00:14.0) and eth3 (0000:00:15.0) now appear under "Other network devices", bound to neither a kernel driver nor a DPDK-compatible driver:

      # /usr/local/bin/dpdk-devbind.py -s
      
      Network devices using DPDK-compatible driver
      ============================================
      <none>
      
      Network devices using kernel driver
      ===================================
      0000:00:12.0 'Virtio network device' if=eth0 drv=virtio-pci unused=virtio_pci,vfio-pci,uio_pci_generic 
      0000:00:13.0 'Virtio network device' if=eth1 drv=virtio-pci unused=virtio_pci,vfio-pci,uio_pci_generic 
      
      Other network devices
      =====================
      0000:00:14.0 'Virtio network device' unused=virtio_pci,vfio-pci,uio_pci_generic
      0000:00:15.0 'Virtio network device' unused=virtio_pci,vfio-pci,uio_pci_generic
      
      Crypto devices using DPDK-compatible driver
      ===========================================
      <none>
      
      Crypto devices using kernel driver
      ==================================
      <none>
      
      Other crypto devices
      ====================
      <none>
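
      The same state can be confirmed without dpdk-devbind.py, since a bound PCI device has a "driver" symlink under /sys/bus/pci/devices/<PCI ID>/ and an unbound one does not. A minimal sketch, with pci_driver as a hypothetical helper name:

```shell
# pci_driver PCI_ID [SYSFS_PCI_DIR]: print the name of the driver bound to the
# device, or "none" if no driver symlink exists. Hypothetical helper; the
# second argument only overrides the default sysfs location.
pci_driver() {
    dir=${2:-/sys/bus/pci/devices}
    if [ -L "$dir/$1/driver" ]; then
        # The symlink target ends in the driver's directory name.
        basename "$(readlink "$dir/$1/driver")"
    else
        echo none
    fi
}

# Example:
#   for id in 0000:00:14.0 0000:00:15.0; do
#       echo "$id: $(pci_driver "$id")"
#   done
```

      After the unbind step above, both devices would be expected to report "none".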
      

      Confirm that eth2 and eth3 are no longer visible to the operating system:

      # ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
           link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
           inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
      2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
           link/ether 0e:69:82:08:e1:6c brd ff:ff:ff:ff:ff:ff
      3: eth1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP qlen 1000
           link/ether 0e:69:82:08:e1:6c brd ff:ff:ff:ff:ff:ff
      6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
           link/ether 0e:69:82:08:e1:6c brd ff:ff:ff:ff:ff:ff
           inet 172.18.7.165/16 brd 172.18.255.255 scope global bond0
           valid_lft forever preferred_lft forever
      
      # vppctl show interfaces
      Name               Idx       State          Counter          Count
      local0              0        down
      

      At this point, eth2 (0000:00:14.0) and eth3 (0000:00:15.0) are both shut down, unbound with dpdk-devbind.py, and no longer visible to the operating system.

      Restart VPP:

      # systemctl restart vpp.service

      At this point, the machine crashes and must be rebooted to recover.
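
      For anyone trying to reproduce this, the steps above can be collected into one sequence. The sketch below only prints the commands, since running them is expected to crash the machine. repro_commands is a hypothetical helper, and the interface names and PCI IDs are passed in rather than assumed.

```shell
# repro_commands IF1 IF2 PCI1 PCI2: print the reproduction steps in order.
# Hypothetical helper; it prints the commands and executes nothing.
repro_commands() {
    printf 'ifdown %s\n' "$1" "$2"                              # shut down both interfaces
    printf '/usr/local/bin/dpdk-devbind.py -u %s\n' "$3" "$4"   # unbind both devices
    printf 'systemctl restart vpp.service\n'                    # the step that crashes
}

# Example (prints five commands, one per line):
#   repro_commands eth2 eth3 0000:00:14.0 0000:00:15.0
```

      Printing rather than executing keeps the helper safe to run anywhere. The output can be reviewed and then run by hand on a disposable machine.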

      VPP's systemd service:

      # cat /usr/lib/systemd/system/vpp.service
      [Unit]
      Description=Vector Packet Processing Process
      After=syslog.target network.target auditd.service
      
      [Service]
      ExecStartPre=-/bin/rm -f /dev/shm/db /dev/shm/global_vm /dev/shm/vpe-api
      ExecStartPre=-/sbin/modprobe uio_pci_generic
      ExecStart=/usr/bin/vpp -c /etc/vpp/startup.conf
      Type=simple
      Restart=on-failure
      RestartSec=5s
      
      [Install]
      WantedBy=multi-user.target
      

      The following logs from /var/log are included in logs.tar:

      # tar -xvf logs.tar 
      x kern.log
      x vpp.log
      x messages
      x dmesg
      x dmesg.old
      x boot.log
      
      # ls -l
      total 4040
      -rw-r-----  1 vhosakot  staff        0 May 23 14:36 boot.log
      -rw-r--r--  1 vhosakot  staff    39167 Sep  6 11:57 dmesg
      -rw-r--r--  1 vhosakot  staff    38961 Sep  5 21:57 dmesg.old
      -rw-r-----  1 vhosakot  staff   171587 Sep  6 11:58 kern.log
      -rwxr-xr-x  1 vhosakot  staff  1034240 Sep  6 13:41 logs.tar
      -rw-r-----  1 vhosakot  staff   773215 Sep  6 12:57 messages
      -rw-r--r--  1 vhosakot  staff     2685 Sep  6 11:57 vpp.log
      

        1. crash1.png
          72 kB
        2. crash2.png
          73 kB
        3. logs.tar
          1010 kB

            Unassigned Unassigned
            vhosakot Vikram Hosakote