CSIT-1043

Guest OS becomes unresponsive during CSIT crypto suite execution


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium
    • 1807
    • None

      The guest OS (Fedora, Ubuntu) becomes unresponsive during CSIT crypto suite execution.
      When the issue occurs, the subsequent CSIT tests fail with an SSH timeout, and the target no longer responds over SSH or the virtual console.

      The VM can be recovered in two ways:

      • reboot the VM (virsh destroy & start);
      • connect with gdb to the QEMU built-in gdbserver, then stop and resume guest execution:
        host# gdb
        (gdb) target remote :1235
        (gdb) CTRL+C
        (gdb) info threads
          Id Target Id Frame
        * 1 Thread 1 (CPU#0 [running]) 0xffff285d47283cec in ?? ()
          2 Thread 2 (CPU#1 [halted ]) 0xffff285d47283cec in ?? ()
        (gdb) continue
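
The two recovery methods above can be scripted. The following is a minimal sketch, not a tested procedure: it only builds the command lines. It assumes that attaching gdb to the QEMU gdbstub halts the guest and that `detach` resumes it (standing in for the interactive CTRL+C and `continue` steps), which should be verified on the affected host. The function names and the domain name in the usage comment are hypothetical; the port 1235 is taken from the session above.

```python
from __future__ import annotations

import subprocess

GDB_PORT = 1235  # gdbserver port used in the gdb session above


def gdb_kick_command(port: int = GDB_PORT) -> list[str]:
    """Build a non-interactive gdb call that attaches, lists threads, and detaches.

    Assumption: attaching halts the guest vCPUs and detaching lets them run
    again, mirroring the manual stop/resume sequence.
    """
    return [
        "gdb", "-batch",
        "-ex", f"target remote :{port}",
        "-ex", "info threads",
        "-ex", "detach",
    ]


def reboot_commands(domain: str) -> list[list[str]]:
    """Build the 'virsh destroy' + 'virsh start' sequence for a libvirt domain."""
    return [["virsh", "destroy", domain], ["virsh", "start", domain]]


def run_all(commands: list[list[str]]) -> list[int]:
    """Run each command in order and return the exit codes."""
    return [subprocess.run(cmd, check=False).returncode for cmd in commands]


# Example (hypothetical domain name, run on the host):
#   run_all([gdb_kick_command()])
#   run_all(reboot_commands("csit-vm"))
```

Trying the gdb method first is the less disruptive choice, since the guest state is preserved; the reboot discards it.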

      A simplified test scenario based on CSIT can be used to reproduce an issue with similar behavior.

      The following steps shall be executed in parallel on the host:

      • Execute a PCI rescan in a loop on the VM over the SSH connection (see the attached script: ./qemu_kvm_guest_hang.sh 192.168.122.18)
      • Flood one VM network interface with ICMP packets (e.g. ping -s 8 -i 0.02 192.168.121.22)
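
The two parallel steps above can be driven from one host-side script. This is a sketch under stated assumptions: the attached qemu_kvm_guest_hang.sh is not reproduced here, so the remote loop writing to `/sys/bus/pci/rescan` is a guess at what it does, and the `root` login and the function names are hypothetical. The ping options and IP addresses are the ones quoted in the steps.

```python
from __future__ import annotations

import subprocess


def rescan_loop_command(vm_ip: str, user: str = "root") -> list[str]:
    """Build an ssh call that rescans the PCI bus on the guest in an endless loop.

    Assumption: the attached script does something equivalent to this loop.
    """
    remote = "while true; do echo 1 > /sys/bus/pci/rescan; done"
    return ["ssh", f"{user}@{vm_ip}", remote]


def ping_flood_command(target_ip: str) -> list[str]:
    """Build the ICMP flood from the report: 8-byte payload, 20 ms interval."""
    return ["ping", "-s", "8", "-i", "0.02", target_ip]


def start_parallel(commands: list[list[str]]) -> list[subprocess.Popen]:
    """Start all commands at once; the caller terminates them when done."""
    return [subprocess.Popen(cmd) for cmd in commands]


# Example (run on the host; stop with proc.terminate() once the hang appears):
#   procs = start_parallel([
#       rescan_loop_command("192.168.122.18"),
#       ping_flood_command("192.168.121.22"),
#   ])
```

A ping interval below 0.2 s may require root privileges on the host, depending on the ping implementation.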

      What to observe when the issue appears:

      • There is no ping reply and SSH on that IP address no longer works;
      • The other interfaces remain accessible, but they can also become unusable if traffic is generated on them during the rescan;
      • The non-responsive interface becomes usable again if the "QEMU" recovery method described above is used.

      The IP addresses mentioned above belong to 2 different virtual network interfaces of the VM.

      Notes:

      • The PCI device that is used intensively during the PCI rescan is the one affected;
      • Recovery by PCI device remove & rescan does not work;
      • The KVM trace shows that the interface IRQ is pending:
        kworker/5:1-7655 [005] .... 6224674.474316: vgic_update_irq_pending: VCPU: 0, IRQ 101, level: 0
        kworker/4:0-31773 [004] .... 6224674.504352: vgic_update_irq_pending: VCPU: 0, IRQ 101, level: 1
        kworker/4:0-31773 [004] .... 6224674.504353: vgic_update_irq_pending: VCPU: 0, IRQ 101, level: 0
        kworker/5:1-7655 [005] .... 6224674.534326: vgic_update_irq_pending: VCPU: 0, IRQ 101, level: 1
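
To follow one IRQ through a longer trace, the `vgic_update_irq_pending` lines can be parsed mechanically. The sketch below assumes the line layout shown above (task, CPU, flags, timestamp, event, fields); the function names are hypothetical. Reading a final level of 1 as "the line is still asserted" is an interpretation, not something the trace states by itself.

```python
from __future__ import annotations

import re

# Matches lines such as:
#   kworker/5:1-7655 [005] .... 6224674.474316: vgic_update_irq_pending: VCPU: 0, IRQ 101, level: 0
TRACE_RE = re.compile(
    r"^\s*(?P<task>\S+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>\d+\.\d+):\s+"
    r"vgic_update_irq_pending:\s+VCPU:\s*(?P<vcpu>\d+),\s*"
    r"IRQ\s+(?P<irq>\d+),\s*level:\s*(?P<level>[01])"
)


def parse_trace(lines: list[str]) -> list[dict]:
    """Extract timestamp, vCPU, IRQ number and level from matching trace lines."""
    events = []
    for line in lines:
        m = TRACE_RE.match(line)
        if m is None:
            continue  # skip unrelated trace events
        events.append({
            "ts": float(m.group("ts")),
            "vcpu": int(m.group("vcpu")),
            "irq": int(m.group("irq")),
            "level": int(m.group("level")),
        })
    return events


def last_levels(events: list[dict]) -> dict[int, int]:
    """Return the most recent level seen for each IRQ, in input order."""
    result: dict[int, int] = {}
    for ev in events:
        result[ev["irq"]] = ev["level"]
    return result
```

Feeding it the output of the KVM trace buffer and checking `last_levels(...)` for the interface IRQ gives a quick way to compare a healthy run with a hung one.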

            juraj.linkes Juraj Linkeš
            banulucian Lucian Banu