Uploaded image for project: 'vpp'
  1. vpp
  2. VPP-1213

VPP crashes trying to access sm->worker which is NULL

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Highest Highest
    • 18.01, 18.04
    • None
    • S-NAT
    • None

      The following sequence of steps causes VPP to crash:

      1. Configure SNAT
      2. Ping from VM to outside so that there is dynamic NAT session
      3. Set static 1:1 mapping for the VM’s fixed IP addr
      4. Check for dynamic NAT sessions using nat44_user_dump followed by nat44_user_session_dump

       VPP crashes when nat44_user_session_dump is executed.

       

      Here is the backtrace:

       

      (gdb) bt

      #0  0x00007fffb286594e in snat_get_worker_in2out_cb (ip0=0x7fffb5c60d70, rx_fib_index0=1) at /home/heat-admin/vpp_stable_1801/build-data/../src/plugins/nat/nat.c:2758

      #1  0x00007fffb287a8e7 in vl_api_nat44_user_session_dump_t_handler (mp=0x300b7924) at /home/heat-admin/vpp_stable_1801/build-data/../src/plugins/nat/nat_api.c:1166

      #2  0x00007ffff7b9abd0 in vl_msg_api_handler_with_vm_node (am=0x7ffff7dda460 <api_main>, the_msg=0x300b7924, vm=0x7ffff7b8c480 <vlib_global_main>, node=0x7fffb5c58000)

          at /home/heat-admin/vpp_stable_1801/build-data/../src/vlibapi/api_shared.c:508

      #3  0x00007ffff7bb6d83 in memclnt_process (vm=0x7ffff7b8c480 <vlib_global_main>, node=0x7fffb5c58000, f=0x0)

          at /home/heat-admin/vpp_stable_1801/build-data/../src/vlibmemory/memory_vlib.c:970

      #4  0x00007ffff78e728e in vlib_process_bootstrap (_a=140736240516176) at /home/heat-admin/vpp_stable_1801/build-data/../src/vlib/main.c:1231

      #5  0x00007ffff5f506b0 in clib_calljmp () at /home/heat-admin/vpp_stable_1801/build-data/../src/vppinfra/longjmp.S:110

      #6  0x00007fffb59f7c20 in ?? ()

      #7  0x00007ffff78e73c3 in vlib_process_startup (vm=0x7fffb5c58000, p=0x1b, f=0x7ffff78e7455 <vlib_process_resume+111>)

          at /home/heat-admin/vpp_stable_1801/build-data/../src/vlib/main.c:1253

      Backtrace stopped: previous frame inner to this frame (corrupt stack?)

      (gdb)

       

      The crash happens because snat_get_worker_in2out_cb is trying to access the variable sm->worker which is NULL:

       

      (gdb) c

      Continuing.

       

      Program received signal SIGSEGV, Segmentation fault.

      0x00007fffb286594e in snat_get_worker_in2out_cb (ip0=0x7fffb5b01d70, rx_fib_index0=0) at /home/heat-admin/vpp_stable_1801/build-data/../src/plugins/nat/nat.c:2758

      2758      if (PREDICT_TRUE (is_pow2 (_vec_len (sm->workers))))

      (gdb)

       

      (gdb) p snat_main

      .

      .

      icmp_match_out2in_cb = 0x7fffb28a24ca <icmp_match_out2in_slow>, num_workers = 1, first_worker_index = 1, next_worker = 0, workers = 0x0,

       

      Here’s the culprit code in nat.c:

       

      2714   if (PREDICT_TRUE (is_pow2 (_vec_len (sm->workers))))

      2715     next_worker_index += sm->workers[hash & (_vec_len (sm->workers) - 1)];

      2716   else

      2717     next_worker_index += sm->workers[hash % _vec_len (sm->workers)];

       

      I set breakpoints and watched sm->worker from the time snat_init() initializes it to NULL and found that it stays NULL across the various function calls. Simply enclosing the above chunk of code within a NULL check prevents the crash:

       

      [root@overcloud-controller-0 bin]# git diff ../../../../src/plugins/nat/nat.c

      diff --git a/src/plugins/nat/nat.c b/src/plugins/nat/nat.c

      index acd1671..cbb261e 100644

      — a/src/plugins/nat/nat.c

      +++ b/src/plugins/nat/nat.c

      @@ -2741,10 +2741,13 @@ snat_get_worker_in2out_cb (ip4_header_t * ip0, u32 rx_fib_index0)

         hash = ip0->src_address.as_u32 + (ip0->src_address.as_u32 >> 8) +

                (ip0->src_address.as_u32 >> 16) + (ip0->src_address.as_u32 >>24);

       

        if (PREDICT_TRUE (is_pow2 (_vec_len (sm>workers))))

          next_worker_index += sm>workers[hash & (_vec_len (sm->workers) - 1)];

      -  else

          next_worker_index += sm>workers[hash % _vec_len (sm->workers)];

      +  if (sm->workers)

      { +    if (PREDICT_TRUE (is_pow2 (_vec_len (sm->workers)))) +      next_worker_index += sm->workers[hash & (_vec_len (sm->workers) - 1)]; +    else +      next_worker_index += sm->workers[hash % _vec_len (sm->workers)]; +  }

       

         return next_worker_index;

       }

       

      But I am not too sure if this is the only change required. Maybe the variable ought not to be NULL? Maybe something else is the matter.

            matfabia Matus Fabian
            ot Onong Tayeng
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: