Bug
Resolution: Done
Medium
In a Contiv-VPP deployment, a client busybox pod running on the master node communicates with a deployment of 100 nginx pods on a worker node. A service with a cluster IP of 10.96.1.1 is assigned to the deployment.
If the client communicates only directly with the nginx pod IPs, or only via the cluster IP, everything appears to work.
However, when the client uses the cluster IP and then reverts to using the pod IPs, we see wgets failing. A VPP packet trace shows a packet from the nginx pod incorrectly matching a dynamic NAT rule and being NATted so that it appears to come from the cluster IP. Since the source busybox pod issued its wget to the pod IP, a reply that appears to come from the cluster IP does not belong to its connection, and the wget fails.
Our working assumption is that the test reused a source pod that had previously been used for a wget via the NAT (and for which session state had been retained). We were doing 10K wgets: 100 to each nginx pod in the non-NAT case, and all 10K to the cluster IP in the NAT case, which should be load balanced as 100 wgets to each of the 100 pods.
In the snippet below, the busybox IP is 10.1.1.3, the pod IP is 10.1.2.98, the cluster IP is 10.96.1.1, and the overlay VXLAN tunnel runs between 192.168.16.1 (the master) and 192.168.16.2 (the worker).
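The working assumption above can be illustrated with a toy model. This is only a sketch of the hypothesis, not VPP's NAT44 code: the `ToyNat` class, its session key layout, and the reuse of client port 52958 for both requests are assumptions made for illustration.

```python
from typing import Dict, Tuple

CLUSTER_IP = "10.96.1.1"

# Reply-direction flow: (src_ip, src_port, dst_ip, dst_port)
Flow = Tuple[str, int, str, int]


class ToyNat:
    """Toy session table (hypothetical); NOT VPP's NAT44 implementation."""

    def __init__(self) -> None:
        # Maps a reply-direction flow to the address its source is rewritten to.
        self.sessions: Dict[Flow, str] = {}

    def request(self, client_ip: str, client_port: int,
                dst_ip: str, dst_port: int, backend_ip: str) -> None:
        """Record a client request; only cluster-IP requests create a session."""
        if dst_ip == CLUSTER_IP:
            key = (backend_ip, dst_port, client_ip, client_port)
            self.sessions[key] = CLUSTER_IP

    def reply_source(self, flow: Flow) -> str:
        """Return the source address the client sees on a backend reply."""
        return self.sessions.get(flow, flow[0])


nat = ToyNat()
# 1. wget via the cluster IP, load balanced to pod 10.1.2.98 (session retained).
nat.request("10.1.1.3", 52958, CLUSTER_IP, 80, backend_ip="10.1.2.98")
# 2. Later wget sent directly to the pod IP, assumed to reuse the same source port.
nat.request("10.1.1.3", 52958, "10.1.2.98", 80, backend_ip="10.1.2.98")
# 3. The pod's reply matches the retained session and is rewritten.
seen = nat.reply_source(("10.1.2.98", 80, "10.1.1.3", 52958))
print(seen)
```

In this model the client sent its second request to 10.1.2.98 but sees the reply arrive from the cluster IP, which is the symptom described above. Whether the real session lookup behaves this way is exactly what needs to be confirmed.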
00:10:32:437562: dpdk-input
  GigabitEthernet1/0/1 rx queue 0
  buffer 0x24394b: current data 14, length 110, free-list 0, clone-count 0, totlen-nifb 0, trace 0x0
    l4-cksum-computed l4-cksum-correct l2-hdr-offset 0 l3-hdr-offset 14
  PKT MBUF: port 0, nb_segs 1, pkt_len 124
    buf_len 2176, data_len 124, ol_flags 0x180, data_off 128, phys_addr 0x67ee5340
    packet_type 0x211 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
    Packet Offload Flags
      PKT_RX_IP_CKSUM_GOOD (0x0080) IP cksum of RX pkt. is valid
      PKT_RX_L4_CKSUM_GOOD (0x0100) L4 cksum of RX pkt. is valid
    Packet Types
      RTE_PTYPE_L2_ETHER (0x0001) Ethernet packet
      RTE_PTYPE_L3_IPV4 (0x0010) IPv4 packet without extension headers
      RTE_PTYPE_L4_UDP (0x0200) UDP packet
  IP4: c0:8c:60:8b:95:dd -> c0:8c:60:8b:c3:55
  UDP: 192.168.16.2 -> 192.168.16.1
    tos 0x00, ttl 253, length 110, checksum 0x1c2b
    fragment id 0x0000
  UDP: 2447 -> 4789
    length 90, checksum 0x0000
00:10:32:437581: ip4-input-no-checksum
  UDP: 192.168.16.2 -> 192.168.16.1
    tos 0x00, ttl 253, length 110, checksum 0x1c2b
    fragment id 0x0000
  UDP: 2447 -> 4789
    length 90, checksum 0x0000
00:10:32:437588: nat44-out2in
  NAT44_OUT2IN: sw_if_index 1, next index 1, session index -1
00:10:32:437589: ip4-lookup
  fib 0 dpo-idx 6 flow hash: 0x00000000
  UDP: 192.168.16.2 -> 192.168.16.1
    tos 0x00, ttl 253, length 110, checksum 0x1c2b
    fragment id 0x0000
  UDP: 2447 -> 4789
    length 90, checksum 0x0000
00:10:32:437593: ip4-local
  UDP: 192.168.16.2 -> 192.168.16.1
    tos 0x00, ttl 253, length 110, checksum 0x1c2b
    fragment id 0x0000
  UDP: 2447 -> 4789
    length 90, checksum 0x0000
00:10:32:437595: ip4-udp-lookup
  UDP: src-port 2447 dst-port 4789
00:10:32:437596: vxlan4-input
  VXLAN decap from vxlan_tunnel0 vni 10 next 1 error 0
00:10:32:437600: l2-input
  l2-input: sw_if_index 5 dst 1a:2b:3c:4d:5e:01 src 1a:2b:3c:4d:5e:02
00:10:32:437607: l2-fwd
  l2-fwd: sw_if_index 5 dst 1a:2b:3c:4d:5e:01 src 1a:2b:3c:4d:5e:02 bd_index 1
00:10:32:437609: ip4-input
  TCP: 10.1.2.98 -> 10.1.1.3
    tos 0x00, ttl 63, length 60, checksum 0x2456
    fragment id 0x0000, flags DONT_FRAGMENT
  TCP: 80 -> 52958
    seq. 0x5eba446b ack 0xb683b13e
    flags 0x12 SYN ACK, tcp header: 40 bytes
    window 28960, checksum 0x49c9
00:10:32:437611: nat44-in2out
  NAT44_IN2OUT_FAST_PATH: sw_if_index 3, next index 3, session -1
00:10:32:437613: nat44-in2out-slowpath
  NAT44_IN2OUT_SLOW_PATH: sw_if_index 3, next index 0, session 7099
00:10:32:437616: ip4-lookup
  fib 0 dpo-idx 7 flow hash: 0x00000000
  TCP: 10.96.1.1 -> 10.1.1.3
    tos 0x00, ttl 63, length 60, checksum 0x2558
    fragment id 0x0000, flags DONT_FRAGMENT
  TCP: 80 -> 52958
    seq. 0x5eba446b ack 0xb683b13e
    flags 0x12 SYN ACK, tcp header: 40 bytes
    window 28960, checksum 0x4acb
00:10:32:437616: ip4-rewrite
  tx_sw_if_index 6 dpo-idx 7 : ipv4 via 10.1.1.3 tap2: 00000000000202fe163f5f0f0800 flow hash: 0x00000000
  00000000: 00000000000202fe163f5f0f08004500003c000040003e0626580a6001010a01
  00000020: 01030050cede5eba446bb683b13ea01271204acb0000020405b40402
00:10:32:437617: tap2-output
  tap2
  IP4: 02:fe:16:3f:5f:0f -> 00:00:00:00:00:02
  TCP: 10.96.1.1 -> 10.1.1.3
    tos 0x00, ttl 62, length 60, checksum 0x2658
    fragment id 0x0000, flags DONT_FRAGMENT
  TCP: 80 -> 52958
    seq. 0x5eba446b ack 0xb683b13e
    flags 0x12 SYN ACK, tcp header: 40 bytes
    window 28960, checksum 0x4acb
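The key observation in the trace is that the inner TCP source address changes from 10.1.2.98 at ip4-input to 10.96.1.1 after nat44-in2out-slowpath. When checking many traces for the same symptom, a small helper along these lines may be useful. It is a minimal sketch: the function names are hypothetical, and it relies only on the `TCP: <src> -> <dst>` address lines as they appear in the trace above.

```python
import re
from typing import List, Tuple

# Matches address lines such as "TCP: 10.1.2.98 -> 10.1.1.3".
# Port lines such as "TCP: 80 -> 52958" contain no dots and are skipped.
ADDR_LINE = re.compile(r"TCP: (\d+\.\d+\.\d+\.\d+) -> (\d+\.\d+\.\d+\.\d+)")


def tcp_address_pairs(trace: str) -> List[Tuple[str, str]]:
    """Return (src, dst) for every TCP address line, in trace order."""
    return ADDR_LINE.findall(trace)


def rewritten_sources(trace: str) -> List[str]:
    """Return source addresses that differ from the first one seen."""
    pairs = tcp_address_pairs(trace)
    if not pairs:
        return []
    original_src = pairs[0][0]
    changed: List[str] = []
    for src, _dst in pairs[1:]:
        if src != original_src and src not in changed:
            changed.append(src)
    return changed


# Abbreviated excerpt of the trace above.
sample = (
    "00:10:32:437609: ip4-input TCP: 10.1.2.98 -> 10.1.1.3 tos 0x00 "
    "TCP: 80 -> 52958 flags 0x12 SYN ACK "
    "00:10:32:437616: ip4-lookup TCP: 10.96.1.1 -> 10.1.1.3 tos 0x00 "
    "00:10:32:437617: tap2-output tap2 TCP: 10.96.1.1 -> 10.1.1.3 tos 0x00"
)
print(rewritten_sources(sample))
```

A non-empty result for a packet that was sent in reply to a direct pod-IP request indicates the unwanted translation.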