Type: Bug
Resolution: Unresolved
Priority: Medium
With a small probability (roughly 1 in 200 runs), a performance testbed stays reserved because the Ansible cleanup fails to kill containers: the Docker daemon is not responding. This can happen either before or after the tests are run.
It should be possible to extend the Ansible cleanup so that it restarts the Docker daemon when this happens, allowing the cleanup to complete successfully.
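One possible shape for that extension is an Ansible `block`/`rescue` pair. If `docker ps -aq` (the command that fails in the symptom log) returns non-zero, the `rescue` section restarts the Docker daemon and queries the containers again. This is a minimal sketch, not the actual CSIT cleanup role. The task names, the `docker_containers` register variable and the `docker` systemd unit name are assumptions.

```yaml
# Hypothetical sketch: task names, the register variable and the systemd
# unit name "docker" are assumptions, not taken from the CSIT cleanup role.
- name: Kill container - Get Docker containers (restart daemon on failure)
  become: true
  block:
    - name: Kill container - Get Docker containers
      command: docker ps -aq
      register: docker_containers
      changed_when: false
  rescue:
    # The daemon did not answer; restart it and query once more.
    - name: Kill container - Restart Docker daemon
      systemd:
        name: docker
        state: restarted
    - name: Kill container - Get Docker containers (after restart)
      command: docker ps -aq
      register: docker_containers
      changed_when: false

# Force-remove (kill, then delete) every container found by either query.
- name: Kill container - Remove containers
  become: true
  command: "docker rm --force {{ item }}"
  loop: "{{ docker_containers.stdout_lines }}"
```

If the second `docker ps -aq` also fails, the play still fails as it does today, so a daemon that cannot be recovered is still reported rather than silently ignored.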
Symptom (from [0]):
TASK [cleanup : Kill container - Get running Docker containers] ****************
fatal: [10.30.51.44]: FAILED! => changed=false
cmd: docker ps -aq
delta: '0:00:00.045410'
end: '2020-11-04 02:10:42.822197'
msg: non-zero return code
rc: 1
start: '2020-11-04 02:10:42.776787'
stderr: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
stderr_lines:
- Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
stdout: ''
stdout_lines: <omitted>
TASK [cleanup : fail] **********************************************************
fatal: [10.30.51.44]: FAILED! => changed=false
msg: Kill containers failed!
to retry, use: --limit @/w/workspace/csit-vpp-perf-report-iterative-2009-2n-skx/resources/tools/testbed-setup/ansible/site.retry
PLAY RECAP *********************************************************************
10.30.51.44 : ok=9 changed=5 unreachable=0 failed=2
10.30.51.45 : ok=17 changed=6 unreachable=0 failed=0
++ die 'Failed to run ansible on host!'