[vpp] VPP-1697

Endpoint dependent NAT does not work in multi-thread mode

Type: Bug
Resolution: Done
Priority: Medium
Component: nat
      In Contiv-VPP (VPP 19.04.1, but the same issue is present in 19.01), we use endpoint dependent NAT, as described below. It works fine in single-thread mode, but does not work in multi-thread mode.

      The issue can be easily reproduced by enabling multi-threading in Contiv-VPP, e.g. by adding this into the VPP startup config:

      cpu {
       main-core 0
       corelist-workers 1-3
      }

      With this config, CoreDNS pods (deployed automatically as part of the k8s control plane) cannot communicate with the k8s API (virtual IP 10.96.0.1, NATed on VPP). Moreover, the vswitch VPP crashes after enabling packet trace:

      $ kubectl get pods --all-namespaces
      NAMESPACE     NAME                              READY   STATUS             RESTARTS   AGE     IP          NODE      NOMINATED NODE   READINESS GATES
      kube-system   contiv-vswitch-n7b67              1/1     Running            2          4m1s    10.0.2.15   lubuntu   <none>           <none>
      kube-system   coredns-fb8b8dccf-pppkm           0/1     CrashLoopBackOff   4          4m1s    10.1.1.2    lubuntu   <none>           <none>
      kube-system   coredns-fb8b8dccf-wfjt6           0/1     CrashLoopBackOff   3          4m1s    10.1.1.3    lubuntu   <none>           <none>

      Error log from CoreDNS:

      E0605 08:46:10.086500       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout

      The same works fine in single-thread mode.
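      For reference, a minimal way to enable packet trace in the VPP CLI (the exact commands that triggered the crash are not in the report; the input node below is an assumption — for tapv2 interfaces it is typically virtio-input):

      ```shell
      # In the VPP CLI (vppctl); virtio-input is assumed for the tap interfaces
      vpp# trace add virtio-input 50
      # ... reproduce the CoreDNS -> 10.96.0.1 traffic ...
      vpp# show trace
      ```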

      NAT config on VPP:

      vpp# sh nat44 interfaces 
      NAT44 interfaces:
       tap0 in out
       loop2 in out
       tap1 in out
       tap2 in out
      
      vpp# sh nat44 static mappings 
      NAT44 static mappings:
       tcp local 10.0.2.15:6443 external 10.96.0.1:443 vrf 0 self-twice-nat out2in-only
       tcp local 10.0.2.15:12379 external 192.168.16.1:32379 vrf 0 self-twice-nat out2in-only
       tcp local 10.0.2.15:12379 external 10.0.2.15:32379 vrf 0 self-twice-nat out2in-only
       tcp local 10.0.2.15:12379 external 10.111.130.0:12379 vrf 0 self-twice-nat out2in-only
       tcp external 10.96.0.10:9153 self-twice-nat out2in-only
       local 10.1.1.2:9153 vrf 1 probability 1
       local 10.1.1.3:9153 vrf 1 probability 1
       tcp external 10.96.0.10:53 self-twice-nat out2in-only
       local 10.1.1.2:53 vrf 1 probability 1
       local 10.1.1.3:53 vrf 1 probability 1
       udp external 10.96.0.10:53 self-twice-nat out2in-only
       local 10.1.1.2:53 vrf 1 probability 1
       local 10.1.1.3:53 vrf 1 probability 1
      
      vpp# sh nat44 addresses 
      NAT44 pool addresses:
      NAT44 twice-nat pool addresses:
      10.1.1.254
       tenant VRF independent
       0 busy udp ports
       0 busy tcp ports
       0 busy icmp ports
      
      vpp# sh inter addr
      local0 (dn):
      loop0 (up):
        L3 192.168.16.1/24
      loop1 (up):
        L3 10.1.1.1/24 ip4 table-id 1 fib-idx 1
      loop2 (up):
        L2 bridge bd-id 1 idx 1 shg 1 bvi
        L3 192.168.30.1/24 ip4 table-id 1 fib-idx 1
      tap0 (up):
        L3 172.30.1.1/24
      tap1 (up): 
        unnumbered, use loop1
        L3 10.1.1.1/24 ip4 table-id 1 fib-idx 1
      tap2 (up): 
        unnumbered, use loop1
        L3 10.1.1.1/24 ip4 table-id 1 fib-idx 1
      

      Startup config:

      nat {
       endpoint-dependent
       translation hash buckets 1048576
       translation hash memory 268435456
       user hash buckets 1024
       max translations per user 10000
      }
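      Not part of the original report, but when diagnosing the multi-thread case it may help to check the worker thread layout and how NAT sessions are distributed; a possible set of standard VPP show commands:

      ```shell
      # In the VPP CLI (vppctl)
      vpp# show threads                 # confirm main core and worker cores match the cpu{} config
      vpp# show nat44 sessions detail   # per-thread session listing in endpoint-dependent mode
      ```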

       

Assignee: Filip Varga (fivarga89)
Reporter: Rastislav Szabo (raszabo)