VPP-1266

Incorrect source NAT in multi-node Contiv deployment

    • Type: Bug
    • Resolution: Done
    • Priority: Medium
    • 18.07
    • None
    • S-NAT
    • None

      In a Contiv-VPP deployment, a busybox client pod running on the master node communicates with a deployment of 100 nginx pods on a worker node. A service with cluster IP 10.96.1.1 is assigned to the deployment.

      If the client communicates only with the nginx instances' pod IPs directly, or only via the cluster IP, everything appears to work.

      However, when the client uses the cluster IP and then reverts to using the pod IPs, we see wgets failing. A VPP packet trace shows a packet from the nginx pod incorrectly matching a dynamic NAT rule and being NATted so that it appears to come from the cluster IP. Since the busybox pod issued its wget to the pod IP, a reply that appears to come from the cluster IP does not match its connection, and the wget fails.

      Our working assumption is that the test reused a source pod that had previously been used for a wget via the NAT, and for which session state had been retained. We ran 10K wgets in each case: 100 to each nginx pod in the non-NAT case, and all 10K to the cluster IP in the NAT case, which should be load-balanced as 100 wgets to each of the 100 pods.
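The suspected mechanism can be illustrated with a toy model. This is only a sketch under stated assumptions, not VPP's NAT44 code: the `ToyNat` class, its methods, and the choice of a session key made of backend address, backend port, client address and client port are all hypothetical. It shows how a session left over from a cluster-IP connection could rewrite the reply of a later direct connection that happens to reuse the same addresses and ports.

```python
# Toy model of the suspected failure. NOT VPP's NAT44 implementation:
# the class, method names and session key layout are hypothetical.
from typing import Dict, Tuple

CLUSTER_IP = "10.96.1.1"  # service address from the report
POD_IP = "10.1.2.98"      # one nginx backend from the report
CLIENT_IP = "10.1.1.3"    # busybox client from the report

# Assumed key: (backend ip, backend port, client ip, client port)
SessionKey = Tuple[str, int, str, int]


class ToyNat:
    def __init__(self) -> None:
        # Maps a session key to the address the client originally dialled.
        self.sessions: Dict[SessionKey, str] = {}

    def client_to_service(self, client_port: int, backend_ip: str,
                          backend_port: int) -> None:
        """Record a connection made via the cluster IP to a chosen backend."""
        key = (backend_ip, backend_port, CLIENT_IP, client_port)
        self.sessions[key] = CLUSTER_IP

    def expire(self, client_port: int, backend_ip: str,
               backend_port: int) -> None:
        """Drop the session state for a finished connection, if any."""
        self.sessions.pop((backend_ip, backend_port, CLIENT_IP, client_port), None)

    def reply_source(self, backend_ip: str, backend_port: int,
                     client_port: int) -> str:
        """Source address the client sees on a reply from the backend."""
        key = (backend_ip, backend_port, CLIENT_IP, client_port)
        # A matching session rewrites the source; otherwise it is untouched.
        return self.sessions.get(key, backend_ip)


if __name__ == "__main__":
    nat = ToyNat()
    # Phase 1: wget via the cluster IP, load-balanced to POD_IP.
    nat.client_to_service(52958, POD_IP, 80)
    # Phase 2: wget directly to POD_IP, reusing client port 52958 while
    # the old session is still retained. The reply is rewritten.
    print(nat.reply_source(POD_IP, 80, 52958))
    # Once the stale session is removed, the reply keeps the pod address.
    nat.expire(52958, POD_IP, 80)
    print(nat.reply_source(POD_IP, 80, 52958))
```

In this model the first lookup returns the cluster IP and the second returns the pod IP, which is the behaviour the working assumption predicts if stale session state is the cause.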

      In the trace below, the busybox IP is 10.1.1.3, the pod IP is 10.1.2.98, the cluster IP is 10.96.1.1, and the overlay VXLAN tunnel runs between 192.168.16.1 (the master) and 192.168.16.2 (the worker).

      00:10:32:437562: dpdk-input
        GigabitEthernet1/0/1 rx queue 0
        buffer 0x24394b: current data 14, length 110, free-list 0, clone-count 0, totlen-nifb 0, trace 0x0
                         l4-cksum-computed l4-cksum-correct l2-hdr-offset 0 l3-hdr-offset 14 
        PKT MBUF: port 0, nb_segs 1, pkt_len 124
          buf_len 2176, data_len 124, ol_flags 0x180, data_off 128, phys_addr 0x67ee5340
          packet_type 0x211 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
          Packet Offload Flags
            PKT_RX_IP_CKSUM_GOOD (0x0080) IP cksum of RX pkt. is valid
            PKT_RX_L4_CKSUM_GOOD (0x0100) L4 cksum of RX pkt. is valid
          Packet Types
            RTE_PTYPE_L2_ETHER (0x0001) Ethernet packet
            RTE_PTYPE_L3_IPV4 (0x0010) IPv4 packet without extension headers
            RTE_PTYPE_L4_UDP (0x0200) UDP packet
        IP4: c0:8c:60:8b:95:dd -> c0:8c:60:8b:c3:55
        UDP: 192.168.16.2 -> 192.168.16.1
          tos 0x00, ttl 253, length 110, checksum 0x1c2b
          fragment id 0x0000
        UDP: 2447 -> 4789
          length 90, checksum 0x0000
      00:10:32:437581: ip4-input-no-checksum
        UDP: 192.168.16.2 -> 192.168.16.1
          tos 0x00, ttl 253, length 110, checksum 0x1c2b
          fragment id 0x0000
        UDP: 2447 -> 4789
          length 90, checksum 0x0000
      00:10:32:437588: nat44-out2in
        NAT44_OUT2IN: sw_if_index 1, next index 1, session index -1
      00:10:32:437589: ip4-lookup
        fib 0 dpo-idx 6 flow hash: 0x00000000
        UDP: 192.168.16.2 -> 192.168.16.1
          tos 0x00, ttl 253, length 110, checksum 0x1c2b
          fragment id 0x0000
        UDP: 2447 -> 4789
          length 90, checksum 0x0000
      00:10:32:437593: ip4-local
          UDP: 192.168.16.2 -> 192.168.16.1
            tos 0x00, ttl 253, length 110, checksum 0x1c2b
            fragment id 0x0000
          UDP: 2447 -> 4789
            length 90, checksum 0x0000
      00:10:32:437595: ip4-udp-lookup
        UDP: src-port 2447 dst-port 4789
      00:10:32:437596: vxlan4-input
        VXLAN decap from vxlan_tunnel0 vni 10 next 1 error 0
      00:10:32:437600: l2-input
        l2-input: sw_if_index 5 dst 1a:2b:3c:4d:5e:01 src 1a:2b:3c:4d:5e:02
      00:10:32:437607: l2-fwd
        l2-fwd:   sw_if_index 5 dst 1a:2b:3c:4d:5e:01 src 1a:2b:3c:4d:5e:02 bd_index 1
      00:10:32:437609: ip4-input
        TCP: 10.1.2.98 -> 10.1.1.3
          tos 0x00, ttl 63, length 60, checksum 0x2456
          fragment id 0x0000, flags DONT_FRAGMENT
        TCP: 80 -> 52958
          seq. 0x5eba446b ack 0xb683b13e
          flags 0x12 SYN ACK, tcp header: 40 bytes
          window 28960, checksum 0x49c9
      00:10:32:437611: nat44-in2out
        NAT44_IN2OUT_FAST_PATH: sw_if_index 3, next index 3, session -1
      00:10:32:437613: nat44-in2out-slowpath
        NAT44_IN2OUT_SLOW_PATH: sw_if_index 3, next index 0, session 7099
      00:10:32:437616: ip4-lookup
        fib 0 dpo-idx 7 flow hash: 0x00000000
        TCP: 10.96.1.1 -> 10.1.1.3
          tos 0x00, ttl 63, length 60, checksum 0x2558
          fragment id 0x0000, flags DONT_FRAGMENT
        TCP: 80 -> 52958
          seq. 0x5eba446b ack 0xb683b13e
          flags 0x12 SYN ACK, tcp header: 40 bytes
          window 28960, checksum 0x4acb
      00:10:32:437616: ip4-rewrite
        tx_sw_if_index 6 dpo-idx 7 : ipv4 via 10.1.1.3 tap2: 00000000000202fe163f5f0f0800 flow hash: 0x00000000
        00000000: 00000000000202fe163f5f0f08004500003c000040003e0626580a6001010a01
        00000020: 01030050cede5eba446bb683b13ea01271204acb0000020405b40402
      00:10:32:437617: tap2-output
        tap2
        IP4: 02:fe:16:3f:5f:0f -> 00:00:00:00:00:02
        TCP: 10.96.1.1 -> 10.1.1.3
          tos 0x00, ttl 62, length 60, checksum 0x2658
          fragment id 0x0000, flags DONT_FRAGMENT
        TCP: 80 -> 52958
          seq. 0x5eba446b ack 0xb683b13e
          flags 0x12 SYN ACK, tcp header: 40 bytes
          window 28960, checksum 0x4acb
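To locate where the source address changes in a trace like the one above, a small script can pair each graph node with the inner TCP source it reports. The following is a minimal sketch: the helper names `tcp_sources_by_node` and `first_source_change` are hypothetical, the regular expressions assume the text layout shown in this trace, and `SAMPLE` is an abbreviated excerpt of it.

```python
# Sketch: find the first pair of trace nodes between which the TCP source
# address changes. Assumes the trace text layout shown in this report.
import re
from typing import List, Optional, Tuple

# Node header, e.g. "00:10:32:437609: ip4-input"
NODE_RE = re.compile(r"^\s*\d+:\d+:\d+:\d+:\s+(\S+)\s*$")
# Address line, e.g. "TCP: 10.1.2.98 -> 10.1.1.3" (port lines do not match)
TCP_RE = re.compile(r"^\s*TCP:\s+(\d+\.\d+\.\d+\.\d+)\s+->\s+(\d+\.\d+\.\d+\.\d+)")


def tcp_sources_by_node(trace: str) -> List[Tuple[str, str]]:
    """Return (node name, TCP source IP) pairs in trace order."""
    node: Optional[str] = None
    pairs: List[Tuple[str, str]] = []
    for line in trace.splitlines():
        m = NODE_RE.match(line)
        if m:
            node = m.group(1)
            continue
        m = TCP_RE.match(line)
        if m and node is not None:
            pairs.append((node, m.group(1)))
    return pairs


def first_source_change(trace: str) -> Optional[Tuple[str, str, str, str]]:
    """Return (node before, node after, old source, new source), or None."""
    pairs = tcp_sources_by_node(trace)
    for (prev_node, prev_src), (node, src) in zip(pairs, pairs[1:]):
        if src != prev_src:
            return prev_node, node, prev_src, src
    return None


# Abbreviated excerpt of the trace in this report.
SAMPLE = """\
00:10:32:437609: ip4-input
  TCP: 10.1.2.98 -> 10.1.1.3
  TCP: 80 -> 52958
00:10:32:437611: nat44-in2out
  NAT44_IN2OUT_FAST_PATH: sw_if_index 3, next index 3, session -1
00:10:32:437613: nat44-in2out-slowpath
  NAT44_IN2OUT_SLOW_PATH: sw_if_index 3, next index 0, session 7099
00:10:32:437616: ip4-lookup
  TCP: 10.96.1.1 -> 10.1.1.3
  TCP: 80 -> 52958
"""

if __name__ == "__main__":
    print(first_source_change(SAMPLE))
```

On the excerpt, the change is reported between `ip4-input` and `ip4-lookup`, which brackets the two NAT nodes and is consistent with the slow path attaching the packet to session 7099.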
      
      

        1. vpp-interface.txt
          3 kB
        2. vpp-interface-address.txt
          0.3 kB
        3. vpp-nat44-interfaces.txt
          0.2 kB
        4. vpp-nat44-sessions.txt
          2.24 MB
        5. vpp-nat44-static-mappings.txt
          4 kB

            matfabia Matus Fabian
            raszabo Rastislav Szabo