-
Bug
-
Resolution: Unresolved
-
Medium
-
None
-
HOST:
HW: hierofalcon2, ARM64
OS: Fedora 27
QEMU: 2.10.1-3.fc27
LIBVIRT: 3.7.0-4.fc27
Linux kernel: 4.14.14-300.fc27.aarch64 #1 SMP Fri Jan 19 12:52:12 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux
VM:
OS host: Fedora 27
Linux kernel: 4.15.15-300.fc27.aarch64 #1 SMP Mon Apr 2 23:00:39 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux
The guest OS (Fedora, Ubuntu) becomes unresponsive during CSIT crypto suite execution.
When the issue occurs, the subsequent CSIT tests fail due to SSH timeouts, and the target no longer responds over SSH or the virtual console.
The VM can be recovered in two ways:
- reboot the VM (virsh destroy & start);
- connect with gdb to the QEMU built-in gdbserver, then stop and resume guest execution:
host# gdb
(gdb) target remote :1235
(gdb) CTRL+C
(gdb) info threads
  Id   Target Id                  Frame
* 1    Thread 1 (CPU#0 [running]) 0xffff285d47283cec in ?? ()
  2    Thread 2 (CPU#1 [halted ]) 0xffff285d47283cec in ?? ()
(gdb) continue
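Both recovery paths can be scripted. The following is a minimal sketch, not a tested procedure: the VM name `guest-vm` is a placeholder, port 1235 is taken from the session above, and the gdb path assumes that attaching to the QEMU gdbserver pauses the guest and that `detach` lets it run again, which only approximates the manual CTRL+C / continue sequence. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: the two recovery paths from this report as shell functions.
# VM_NAME is a placeholder; GDB_PORT matches the "target remote :1235" session.
VM_NAME="${VM_NAME:-guest-vm}"
GDB_PORT="${GDB_PORT:-1235}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Recovery 1: hard reboot of the VM through libvirt.
recover_by_reboot() {
    run virsh destroy "$1" && run virsh start "$1"
}

# Recovery 2: attach to the QEMU gdbserver, list the vCPU threads, detach.
# Assumption: attaching stops the vCPUs and detaching resumes them.
recover_by_gdb() {
    run gdb -batch -ex "target remote :$1" -ex "info threads" -ex "detach"
}
```

For example, `recover_by_gdb "$GDB_PORT"` prints the gdb command line in dry-run mode, and `DRY_RUN=0 recover_by_reboot "$VM_NAME"` would actually restart the VM.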
A simplified test scenario based on CSIT can be used to reproduce an issue with similar behavior.
The following steps shall be executed in parallel on the host:
- Execute a PCI rescan in a loop on the VM over the SSH connection (see the attached script: ./qemu_kvm_guest_hang.sh 192.168.122.18)
- Flood one VM network interface with ICMP packets (e.g. ping -s 8 -i 0.02 192.168.121.22)
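The two parallel steps above could be sketched roughly as follows. This is not the attached qemu_kvm_guest_hang.sh, whose contents are not shown here; it is a hypothetical stand-in that assumes root SSH access to the guest and uses the standard sysfs trigger /sys/bus/pci/rescan. The IP addresses and ping options are the ones quoted above; a ping interval this short may require root on the host. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical reproduction sketch (NOT the script attached to this report).
MGMT_IP="${MGMT_IP:-192.168.122.18}"     # interface used for the SSH session
FLOOD_IP="${FLOOD_IP:-192.168.121.22}"   # interface flooded with ICMP
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Step 1: rescan the PCI bus in an endless loop inside the guest.
# Assumption: the guest accepts SSH logins as root.
rescan_loop() {
    run ssh "root@$1" 'while true; do echo 1 > /sys/bus/pci/rescan; done'
}

# Step 2: flood one guest interface with small ICMP packets.
flood() {
    run ping -s 8 -i 0.02 "$1"
}
```

To run both steps in parallel, one could start `rescan_loop "$MGMT_IP" &` and `flood "$FLOOD_IP" &` with DRY_RUN=0 and then `wait`, stopping both with CTRL+C once the flooded interface stops replying.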
What to observe when the issue appears:
- No ping reply, and SSH on that IP address is not working anymore;
- The other interfaces remain accessible, but they can also become unusable if traffic is generated on them during the rescan;
- The non-responsive interface becomes usable again if the QEMU gdbserver recovery method described above is used;
The IP addresses mentioned above belong to 2 different virtual network interfaces of the VM.
Notes:
- The PCI device that is used intensively during the PCI rescan is the one affected;
- Recovery by PCI device remove & rescan does not work;
- The KVM trace shows that the interface IRQ is pending:
kworker/5:1-7655 [005] .... 6224674.474316: vgic_update_irq_pending: VCPU: 0, IRQ 101, level: 0
kworker/4:0-31773 [004] .... 6224674.504352: vgic_update_irq_pending: VCPU: 0, IRQ 101, level: 1
kworker/4:0-31773 [004] .... 6224674.504353: vgic_update_irq_pending: VCPU: 0, IRQ 101, level: 0
kworker/5:1-7655 [005] .... 6224674.534326: vgic_update_irq_pending: VCPU: 0, IRQ 101, level: 1
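When a longer trace is saved to a file, the vgic_update_irq_pending events can be summarised per IRQ to see which line was last raised. The helper below is a small sketch that assumes the line format shown above; the function name and the file name in the usage note are illustrative only. A last level of 1 in an excerpt is only a hint, not proof that the interrupt is stuck.

```shell
#!/bin/sh
# Sketch: count vgic_update_irq_pending events per IRQ and report the last level seen.
# Reads the trace from the files given as arguments, or from stdin if none are given.
summarise_irq_levels() {
    awk '/vgic_update_irq_pending/ {
        irq = ""; level = ""
        for (i = 1; i <= NF; i++) {
            # "IRQ 101," -> take the next field and drop the trailing comma
            if ($i == "IRQ")    { irq = $(i + 1); sub(/,$/, "", irq) }
            # "level: 1" -> take the next field as the line level
            if ($i == "level:") { level = $(i + 1) }
        }
        if (irq != "") { count[irq]++; last[irq] = level }
    }
    END {
        for (irq in count)
            printf "IRQ %s: %d events, last level %s\n", irq, count[irq], last[irq]
    }' "$@"
}
```

For example, `summarise_irq_levels kvm_trace.txt` prints one line per IRQ found in a saved trace file.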