Uploaded image for project: 'vpp'
  1. vpp
  2. VPP-1923

TCP timer race

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Unresolved
    • Icon: Medium Medium
    • None
    • None
    • TCP
    • None

      The issue was reproduced using a test for Travelping UPF VPP plugin https://github.com/travelping/vpp

      The test creates 40K TCP connections using wrk tool and also uses netem to simulate suboptimal network conditions.

      The work is presently based on commit afc233aa93c3f23b30b756cb4ae2967f968bbbb1 and from May 21 2020 and wan't rebased for a while, but as far as I can see there were no changes in the workings of TCP timers in master since that time.

      The stack trace looks like this:

      /src/vpp/src/vnet/tcp/tcp_output.c:1268 (tcp_prepare_retransmit_segment) assertion `max_deq_bytes != 0' fails

      Program received signal SIGABRT, Aborted.
      0x00007ffff4615f47 in raise () from /lib/x86_64-linux-gnu/libc.so.6
      #0 0x00007ffff4615f47 in raise () from /lib/x86_64-linux-gnu/libc.so.6
      #1 0x00007ffff46178b1 in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #2 0x00000000004071d3 in os_panic () at /src/vpp/src/vpp/vnet/main.c:366
      #3 0x00007ffff54f6609 in debugger () at /src/vpp/src/vppinfra/error.c:84
      #4 0x00007ffff54f6387 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7ffff76b8570 "%s:%d (%s) assertion `%s' fails") at /src/vpp/src/vppinfra/error.c:143
      #5 0x00007ffff7058893 in tcp_prepare_retransmit_segment (wrk=0x7fffb540c900, tc=0x7fffd86ccac0, offset=0, max_deq_bytes=0, b=0x7fffb4cd0190) at /src/vpp/src/vnet/tcp/tcp_output.c:1268
      #6 0x00007ffff7057a83 in tcp_timer_retransmit_handler (tc=0x7fffd86ccac0) at /src/vpp/src/vnet/tcp/tcp_output.c:1419
      #7 0x00007ffff70a0e5e in tcp_dispatch_pending_timers (wrk=0x7fffb540c900) at /src/vpp/src/vnet/tcp/tcp.c:1107
      #8 0x00007ffff709eded in tcp_update_time (now=50.494292161111133, thread_index=0 '\000') at /src/vpp/src/vnet/tcp/tcp.c:1143
      #9 0x00007ffff7418742 in transport_update_time (time_now=50.494292161111133, thread_index=0 '\000') at /src/vpp/src/vnet/session/transport.c:779
      #10 0x00007ffff73fb63e in session_queue_node_fn (vm=0x7ffff63849c0 <vlib_global_main>, node=0x7fffb48ac640, frame=0x0) at /src/vpp/src/vnet/session/session_node.c:1377
      #11 0x00007ffff60cc4d5 in dispatch_node (vm=0x7ffff63849c0 <vlib_global_main>, node=0x7fffb48ac640, type=VLIB_NODE_TYPE_INPUT, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, last_time_stamp=8435112566995202) at /src/vpp/src/vlib/main.c:1235
      #12 0x00007ffff60c6f75 in vlib_main_or_worker_loop (vm=0x7ffff63849c0 <vlib_global_main>, is_main=1) at /src/vpp/src/vlib/main.c:1815
      #13 0x00007ffff60c8f4a in vlib_main_loop (vm=0x7ffff63849c0 <vlib_global_main>) at /src/vpp/src/vlib/main.c:1990
      #14 0x00007ffff60c8d1f in vlib_main (vm=0x7ffff63849c0 <vlib_global_main>, input=0x7fffb4cd0fa8) at /src/vpp/src/vlib/main.c:2236
      #15 0x00007ffff614f425 in thread0 (arg=140737324272064) at /src/vpp/src/vlib/unix/main.c:658
      #16 0x00007ffff551a794 in clib_calljmp () at /src/vpp/src/vppinfra/longjmp.S:123
      #17 0x00007fffffffd6b0 in ?? ()
      #18 0x00007ffff614efb7 in vlib_unix_main (argc=61, argv=0x71d4d0) at /src/vpp/src/vlib/unix/main.c:730
      #19 0x00000000004068b8 in main (argc=61, argv=0x71d4d0) at /src/vpp/src/vpp/vnet/main.c:291

      The line numbers may be slightly off due to added debug print when this was captured, sorry about that, but I will describe the details of the issue in a moment.

      At the end of this report, there's elog captured for that connection using aforementioned debug print added at the point of assertion.

      What happens here:

      1. SYN is received for the connection which is created in SYN_RCVD state
      2. SYN-ACK is sent back to the client and retransmit timer is set / updated for the connection
      3. Client does not reply in time (due to netem dropping packets) and the retransmit timer is enqueued into wrk->pending_timers via tcp_expired_timers_dispatch(): https://github.com/FDio/vpp/blob/94dbf952204e811c5ec7a39600c25f7b40387f45/src/vnet/tcp/tcp.c#L1246
      4. The timer handler is called https://github.com/FDio/vpp/blob/94dbf952204e811c5ec7a39600c25f7b40387f45/src/vnet/tcp/tcp_output.c#L1308
      5. As the connection is in SYN_RCVD state, SYN+ACK is retransmitted https://github.com/FDio/vpp/blob/94dbf952204e811c5ec7a39600c25f7b40387f45/src/vnet/tcp/tcp_output.c#L1410-L1443
      6. The timer is rearmed
      7. For several times, the process repeats itself (go to p.3)

      Then, at some point, an ACK is received from the server after p.3 but before p.4. So the timer handler is already enqueued for retransmit, but the connection goes into ESTABLISHED state before this happens. This results in a retransmit attempt with no data enqueued here https://github.com/FDio/vpp/blob/94dbf952204e811c5ec7a39600c25f7b40387f45/src/vnet/tcp/tcp_output.c#L1381 , n_bytes being 0, after which an assertion failure happens and VPP dies.

      The problem here is that tcp_timer_reset() doesn't remove pending entries from the queue https://github.com/FDio/vpp/blob/94dbf952204e811c5ec7a39600c25f7b40387f45/src/vnet/tcp/tcp_timer.h#L31 and doing so is not actually easy. As a band-aid solution, I made the retransmit handler tolerate the zero-retransmit:

      diff --git a/src/vnet/tcp/tcp_output.c b/src/vnet/tcp/tcp_output.c
      index afc652029..5cbe36494 100644
      — a/src/vnet/tcp/tcp_output.c
      +++ b/src/vnet/tcp/tcp_output.c
      @@ -1383,6 +1383,11 @@ tcp_timer_retransmit_handler (tcp_connection_t * tc)
      /* Send the first unacked segment. If we're short on buffers, return

      • as soon as possible */
        n_bytes = clib_min (tc->snd_mss, tc->snd_nxt - tc->snd_una);
        + if (n_bytes == 0)
        + { + // late SYN+ACK retransmit timer firing + return; + }
        n_bytes = tcp_prepare_retransmit_segment (wrk, tc, 0, n_bytes, &b);
        if (!n_bytes)
        {

      but that doesn't seem to be a right solution, as there may be other issues lurking due to assumption that the timers which are reset don't fire after that.

      With this fix, TCP crashes for me later in this test with different diagnostics, which may or may not be related to timers. Maybe it's actually a bug in SVM FIFO, didn't investigate that enough yet (f->start_chunk being zero when f_update_ooo_deq()) is called:

      Program received signal SIGSEGV, Segmentation fault.
      0x00007ffff5e5d24f in f_update_ooo_deq (f=0x7fff0fb35880, start_pos=0,
      end_pos=45) at /src/vpp/src/svm/svm_fifo.c:660
      660 /src/vpp/src/svm/svm_fifo.c: No such file or directory.
      #0 0x00007ffff5e5d24f in f_update_ooo_deq (f=0x7fff0fb35880, start_pos=0,
      end_pos=45) at /src/vpp/src/svm/svm_fifo.c:660
      #1 0x00007ffff5e5d186 in svm_fifo_peek (f=0x7fff0fb35880, offset=0, len=45,
      Type <return> to continue, or q <return> to quit

      d: Wed, 15 Jul 2020 14:37:38 GMT\r\nConnection: keep-alive\r\nETag: \"5f0f14b"...)
      at /src/vpp/src/svm/svm_fifo.c:1054
      #2 0x00007ffff740605e in session_tx_fill_buffer (vm=0x7ffff63a59c0 <vlib_global_main>,
      ctx=0x7fffb4e1ebc0, b=0x1001d94000, n_bufs=0x7fffb56981ce, peek_data=1 '\001')
      at /src/vpp/src/vnet/session/session_node.c:667
      #3 0x00007ffff740156c in session_tx_fifo_read_and_snd_i (wrk=0x7fffb4e1eb80, node=0x7fffb4922080,
      elt=0x7fffd7650a1c, n_tx_packets=0x7fffb56987d8, peek_data=1 '\001')
      at /src/vpp/src/vnet/session/session_node.c:1064
      #4 0x00007ffff7400703 in session_tx_fifo_peek_and_snd (wrk=0x7fffb4e1eb80, node=0x7fffb4922080,
      e=0x7fffd7650a1c, n_tx_packets=0x7fffb56987d8) at /src/vpp/src/vnet/session/session_node.c:1112
      #5 0x00007ffff740a2ac in session_event_dispatch_io (wrk=0x7fffb4e1eb80, node=0x7fffb4922080,
      elt=0x7fffd7650a1c, thread_index=0, n_tx_packets=0x7fffb56987d8)
      at /src/vpp/src/vnet/session/session_node.c:1272
      #6 0x00007ffff7403fa2 in session_queue_node_fn (vm=0x7ffff63a59c0 <vlib_global_main>,
      node=0x7fffb4922080, frame=0x0) at /src/vpp/src/vnet/session/session_node.c:1432
      #7 0x00007ffff60ee175 in dispatch_node (vm=0x7ffff63a59c0 <vlib_global_main>, node=0x7fffb4922080,
      type=VLIB_NODE_TYPE_INPUT, dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0,
      last_time_stamp=8465767685985762) at /src/vpp/src/vlib/main.c:1235
      #8 0x00007ffff60e8c15 in vlib_main_or_worker_loop (vm=0x7ffff63a59c0 <vlib_global_main>, is_main=1)
      at /src/vpp/src/vlib/main.c:1815
      #9 0x00007ffff60eabea in vlib_main_loop (vm=0x7ffff63a59c0 <vlib_global_main>)
      at /src/vpp/src/vlib/main.c:1990
      #10 0x00007ffff60ea9bf in vlib_main (vm=0x7ffff63a59c0 <vlib_global_main>, input=0x7fffb5698fa8)
      at /src/vpp/src/vlib/main.c:2236
      #11 0x00007ffff6170fa5 in thread0 (arg=140737324407232) at /src/vpp/src/vlib/unix/main.c:658
      #12 0x00007ffff553d744 in clib_calljmp () at /src/vpp/src/vppinfra/longjmp.S:123
      #13 0x00007fffffffd6b0 in ?? ()
      #14 0x00007ffff6170b37 in vlib_unix_main (argc=61, argv=0x71d4d0) at /src/vpp/src/vlib/unix/main.c:730
      #15 0x00000000004068b8 in main (argc=61, argv=0x71d4d0) at /src/vpp/src/vpp/vnet/main.c:291

      Additional info 

      elog for the TCP connection:

      22.828418901: lcl: 198.19.0.2:80
      22.828419081: rmt: 198.18.0.2:52752
      22.828419141: syn-rx: cidx 9356 sidx 9358 irs 4158682863
      22.828419197: state: syn-rcvd cidx 9356 sidx 9358
      22.828437277: synack-tx: cidx 9356 sidx 9358 snd_una 0 snd_nxt 1 rcv_nxt 1
      22.857900159: out: flags 12, bytes 40
      22.957510075: timer-pop: retransmit cidx 9356 sidx 9358
      23.145276195: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
      23.145292312: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
      23.145445507: out: flags 12, bytes 40
      23.384408719: timer-pop: retransmit cidx 9356 sidx 9358
      23.607211673: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
      23.607237791: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
      23.608412308: out: flags 12, bytes 40
      23.851170613: timer-pop: retransmit cidx 9356 sidx 9358
      23.999213139: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
      23.999226549: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
      24.002257791: out: flags 12, bytes 40
      24.214047256: timer-pop: retransmit cidx 9356 sidx 9358
      24.371062133: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
      24.371074807: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
      24.371108788: out: flags 12, bytes 40
      24.864990968: timer-pop: retransmit cidx 9356 sidx 9358
      24.901257625: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
      24.901269878: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
      24.901411387: out: flags 12, bytes 40
      25.765769822: timer-pop: retransmit cidx 9356 sidx 9358
      25.882664424: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
      25.882679897: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
      25.882852569: out: flags 12, bytes 40
      27.529723082: timer-pop: retransmit cidx 9356 sidx 9358
      27.586602849: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
      27.586618753: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
      27.586855949: out: flags 12, bytes 40
      30.820255771: timer-pop: retransmit cidx 9356 sidx 9358
      30.941896684: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
      30.941908639: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
      30.941976769: out: flags 12, bytes 40
      37.476458664: timer-pop: retransmit cidx 9356 sidx 9358
      37.548429419: cc: rxt-timeout snd_space 4019 snd_una 0 out 0 flight 1
      37.548438776: synack-rxt: cidx 9356 sidx 9358 snd_nxt 1 rcv_nxt 1
      37.548603744: out: flags 12, bytes 40
      50.387905019: timer-pop: retransmit cidx 9356 sidx 9358
      50.469505878: state: established cidx 9356 sidx 9358
      50.469517778: in: order len 45 written 45 rcv_nxt 46 rcv_wnd(o) 4096
      50.472787714: enq: length 45
      50.476222077: ack-tx: acked 45 rcv_nxt 46 rcv_wnd 65536 snd_nxt 1 snd_wnd 62720
      50.476222412: huh?: rcv_wnd 65536 obsd 65536 av 65491 rcv_nxt 46 rcv_las 46
      50.476521576: out: flags 10, bytes 32
      50.494423659: cc_stat: cwnd 4020 flight 0 space 4014 ssthresh 2147483647 snd_wnd 62720
      50.494424496: rcv_stat: rto 1112 srtt 212 mrtt-us 1000000 rttvar 225
      50.494424548: snd_stat: cc_space 4020 sacked 0 lost 0 out 0 rxt 0
      50.494424771: cc: rxt-timeout snd_space 4020 snd_una 1 out 0 flight 0

            florin.coras Florin Coras
            ivan4th Ivan Shvedunov
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: