-
Bug
-
Resolution: Done
-
Medium
-
An AWS EC2 instance:
- Type: c5.2xlarge
- OS: Ubuntu 18.04, Kernel Version: 4.15.0-1021-aws
- Disk Size: 32GB
- Two network adapters (one for DPDK and one for SSH-ing to the machine)
VPP:
- Compiled VPP master, commit: 0f09a47f10e29cabeb98f571e5c4f0c18a54564d
- Bundled with DPDK: dpdk-19.05
- startup.conf attached below
- Interactive Commands:
  - set int state VirtualFunctionEthernet0/5/0 up
    - ens5 is the interface that was bound to igb_uio
  - set int ip address VirtualFunctionEthernet0/5/0 192.168.1.101/24
    - 192.168.1.101 is the original local IP of the bound interface
  - ip route add 192.168.1.0/24 via 192.168.1.1 VirtualFunctionEthernet0/5/0
  - ip route add 0.0.0.0/0 via 192.168.1.1
  - ping 1.1.1.1 source VirtualFunctionEthernet0/5/0 repeat 2
    - Sanity check
Linux:
- open file descriptor limit: 500000
- DPDK loaded with igb_uio.ko and 2048 * 2MB hugepage mappings (set up for NUMA systems).
VPP master, commit: 0f09a47f10e29cabeb98f571e5c4f0c18a54564d
[Reproducible] Running VPP in the background and wrapping a sample application with VCL produces many different errors; in many runs, all of the errors listed below appear.
Sample program: uses epoll to listen on a single socket, accepts all connections, and adds each one to the epoll set. It acts as an echo server (constant message size of 1024 bytes) over reused connections.
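For readers who want to reproduce the setup, the server described above can be approximated as follows. This is a minimal sketch and not the reporter's actual program, which is not attached here; the names `make_listener` and `serve`, the loopback address, and the `should_stop` hook are all assumptions made for illustration. It uses only the standard socket and epoll calls, so it runs on the plain POSIX stack as written.

```python
import select
import socket

MSG_SIZE = 1024  # constant message size stated in the report


def make_listener(host="127.0.0.1", port=0, backlog=1024):
    """Create a non-blocking listening TCP socket (host/port are placeholders)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(backlog)
    srv.setblocking(False)
    return srv


def serve(srv, should_stop=lambda: False, timeout=0.1):
    """Single-threaded epoll loop: accept every connection, echo what arrives."""
    ep = select.epoll()
    ep.register(srv.fileno(), select.EPOLLIN)
    conns = {}  # fd -> socket; connections stay registered and are reused
    try:
        while not should_stop():
            for fd, _events in ep.poll(timeout):
                if fd == srv.fileno():
                    # Drain the accept queue; each new socket joins the epoll set.
                    while True:
                        try:
                            conn, _addr = srv.accept()
                        except BlockingIOError:
                            break
                        conn.setblocking(False)
                        conns[conn.fileno()] = conn
                        ep.register(conn.fileno(), select.EPOLLIN)
                    continue
                conn = conns[fd]
                try:
                    data = conn.recv(MSG_SIZE)
                except BlockingIOError:
                    continue
                except ConnectionError:
                    data = b""
                if data:
                    # Simplification: a full send buffer would raise
                    # BlockingIOError here; a real server would queue the data
                    # and wait for EPOLLOUT.
                    conn.sendall(data)
                else:
                    # Peer closed (or reset) the connection.
                    ep.unregister(fd)
                    conn.close()
                    del conns[fd]
    finally:
        for conn in conns.values():
            conn.close()
        ep.close()
```

Calling `serve(make_listener(port=9000))` would run the loop until interrupted; port 9000 is an arbitrary choice, not a value from the report.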
Running with a small load (around 100 connections) works well.
Running with around 1000 or more connections shows some of the errors, but only a couple of times.
Running with 35K connections throws the following errors:
In VPP:
- app_send_io_evt_rx:587: evt q rings full
- mq_try_lock_and_alloc_msg:167: failed to alloc msg
- app_enqueue_evt:542: evt q full
- svm_msg_q_free_msg:184: message out of order
- Sometimes, once at the end of the run:
  - Aborted
  - and VPP crashes
- Sometimes, after a failed run:
  - Segmentation fault
  - and the only way I found to start the application again is a reboot
In the wrapped application:
- svm_msg_q_free_msg:184: message out of order
- ssvm_delete_shm:211: unlink segment '2562-55': No such file or directory (errno 2)
- vcl_session_accepted:552: vcl<2513:0>: session overlap handle 4294978441 state 4!
- Sometimes, once at the end of the run:
  - vl_client_disconnect:331: queue drain: 585
Notes:
- The sample application is run with a single thread (of a single process).
- All connections are made at the beginning (non-blocking, though one at a time) and are reused throughout the test.
- Running the program multi-process (bind->fork and also fork->bind) or multi-threaded (pthread_create) fails, whereas the same program on the plain POSIX API works well under very high loads.
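The client-side load pattern from the notes (all connections opened up front, one at a time, with non-blocking connects, then reused for 1024-byte echoes) might look like the sketch below. The report does not include the load generator, so every function name here (`wait_for`, `connect_nonblocking`, `open_connections`, `read_exact`, `echo_round`) and the 5-second timeouts are assumptions for illustration only.

```python
import errno
import os
import select
import socket

MSG_SIZE = 1024  # constant message size stated in the report


def wait_for(sock, event, timeout):
    """Block until `event` (select.POLLIN or select.POLLOUT) fires on `sock`."""
    poller = select.poll()  # poll() has no FD_SETSIZE cap, unlike select()
    poller.register(sock, event)
    if not poller.poll(int(timeout * 1000)):
        raise TimeoutError("socket not ready in time")


def connect_nonblocking(addr, timeout=5.0):
    """Start a non-blocking connect and wait for it to finish."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        rc = sock.connect_ex(addr)
        if rc not in (0, errno.EINPROGRESS):
            raise OSError(rc, os.strerror(rc))
        wait_for(sock, select.POLLOUT, timeout)
        # SO_ERROR reports whether the asynchronous connect succeeded.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err:
            raise OSError(err, os.strerror(err))
    except Exception:
        sock.close()
        raise
    return sock


def open_connections(addr, count):
    """Open `count` connections sequentially, before any traffic is sent."""
    return [connect_nonblocking(addr) for _ in range(count)]


def read_exact(sock, n, timeout=5.0):
    """Read exactly `n` bytes from a non-blocking socket."""
    buf = bytearray()
    while len(buf) < n:
        wait_for(sock, select.POLLIN, timeout)
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def echo_round(conns, payload, timeout=5.0):
    """Send `payload` on every open connection, then check each echo."""
    for sock in conns:
        # Simplification: assumes the small payload fits in the send buffer.
        sock.sendall(payload)
    for sock in conns:
        if read_exact(sock, len(payload), timeout) != payload:
            raise ValueError("echo mismatch")
```

A test run would call `open_connections(addr, n)` once and then `echo_round(conns, b"x" * MSG_SIZE)` repeatedly, raising `n` from roughly 100 toward 35K to match the load levels described above. The open file descriptor limit has to exceed `n`, which is consistent with the 500000 limit listed in the environment.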