Improvement
Resolution: Cannot Reproduce
Medium
VPP version 18.07
If more than one client connects to the server (built on the VCL library) at the same time, the server hangs.
Debug details:
- Most of the time, the event thread gets stuck in an infinite loop (vce_event_thread_fn, vcl_event.c:205):
- With fifo_depth = 2, recycle_count = 1 and ev->recycle = 1, the following code loops endlessly:
while (1)
{
uword fifo_depth = clib_fifo_elts (evt->event_index_fifo);
while ((fifo_depth == 0) || (recycle_count == fifo_depth))
/* Remove event */
clib_spinlock_lock (&(evt->events_lockp));
clib_fifo_sub1 (evt->event_index_fifo, ev_idx);
ev = vce_get_event_from_index (evt, ev_idx);
ASSERT(ev);
if (recycle_count && ev->recycle)
clib_spinlock_lock (&evt->handlers_lockp);
...
}
- As a result, the vppcom_session_accept function (vppcom.c:2950) also runs into trouble:
- reg->ev_idx is never set, so "ev" is NULL
- reg->handler_cond is never signaled
- The program then just waits in pthread_cond_timedwait until it times out
/* Register handler for connect_request event on listen_session_index */
vce_event_key_t evk;
evk.session_index = listen_session_index;
evk.eid = VCL_EVENT_CONNECT_REQ_ACCEPTED;
reg = vce_register_handler (&vcm->event_thread, &evk,
vce_connect_request_handler_fn, 0);
clib_spinlock_lock (&(vcm->event_thread.events_lockp));
ev = vce_get_event_from_index (&vcm->event_thread, reg->ev_idx);
pthread_mutex_lock (&reg->handler_lock);
while (!ev)
{
clib_spinlock_unlock (&(vcm->event_thread.events_lockp));
rv = pthread_cond_timedwait (&reg->handler_cond,
&reg->handler_lock, &ts);
if (rv == ETIMEDOUT)
clib_spinlock_lock (&(vcm->event_thread.events_lockp));
ev = vce_get_event_from_index (&vcm->event_thread, reg->ev_idx);
}
result = vce_get_event_data (ev, sizeof (*result));
client_session_index = result->accepted_session_index;
clib_spinlock_unlock (&(vcm->event_thread.events_lockp));
Reproduce:
VPP's own vcl_test_server program shows the same behavior. On the client side, any TCP client (netcat, openssl s_client) will do.
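If starting several netcat instances by hand is too slow to hit the race, a small client that opens the connections back-to-back may help. The following is only a sketch using plain POSIX sockets; open_connections and close_connections are hypothetical helper names, and the address and port are whatever the server under test listens on.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open up to n TCP connections to ip:port back-to-back, without sending
 * data or pausing between them. Descriptors are stored at the start of
 * fds; the return value is how many connects succeeded. */
static int
open_connections (const char *ip, uint16_t port, int n, int *fds)
{
  struct sockaddr_in addr;
  int opened = 0;

  memset (&addr, 0, sizeof (addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons (port);
  if (inet_pton (AF_INET, ip, &addr.sin_addr) != 1)
    return 0;			/* not a valid IPv4 address */

  for (int i = 0; i < n; i++)
    {
      int fd = socket (AF_INET, SOCK_STREAM, 0);
      if (fd < 0)
	break;
      if (connect (fd, (struct sockaddr *) &addr, sizeof (addr)) < 0)
	{
	  close (fd);
	  break;
	}
      fds[opened++] = fd;
    }
  return opened;
}

/* Close the first n descriptors returned by open_connections. */
static void
close_connections (int *fds, int n)
{
  for (int i = 0; i < n; i++)
    close (fds[i]);
}
```

Calling open_connections with n = 2 against the server and holding the sockets open for a few seconds before close_connections should approximate the "two clients at once" case. The connects are issued sequentially, so this only approximates simultaneous arrival.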
VPP version 18.04
The event thread code is different in this version, so the infinite loop does not occur there. The program still gets stuck in vppcom_session_accept at the same place, though.
When two connections arrive at once, it enters the while loop and waits in pthread_cond_timedwait. When another connection arrives later, it serves the previous connections and leaves the new one waiting. I think some events might be overwritten or deleted.