Type: Bug
Resolution: Unresolved
Priority: Medium
Fix Version/s: bulk release
Affects Version/s: None
Labels: None
Epic Link: Framework

With some small probability (say 1 in 200 runs), the performance testbed stays reserved because the Ansible cleanup fails to kill containers when the Docker daemon is not responding. This can happen before or after the tests are run.

It should be possible to extend the Ansible cleanup so that it restarts the Docker daemon when this happens and can then complete successfully.
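
A minimal sketch of such an extension, not the actual CSIT cleanup role. It assumes the role lists containers with `docker ps -aq` (as the log in this report shows) and that the daemon runs as a service named `docker`. The inner task names, the retry count, and the delay are illustrative assumptions.

```yaml
# Hypothetical replacement for the "Get running Docker containers" step.
- name: Kill container - Get running Docker containers
  block:
    # First attempt: fails with rc=1 when the daemon is not responding.
    - name: List all containers
      ansible.builtin.command: docker ps -aq
      register: running_containers
      changed_when: false
  rescue:
    # Runs only if the listing above failed.
    - name: Restart the Docker daemon (assumed service name "docker")
      ansible.builtin.service:
        name: docker
        state: restarted
      become: true

    # Retry the listing until the daemon answers (assumed 5 tries, 3 s apart).
    - name: List all containers after the restart
      ansible.builtin.command: docker ps -aq
      register: running_containers
      changed_when: false
      retries: 5
      delay: 3
      until: running_containers.rc == 0
```

If the listing still fails after the retries, the play fails as it does today, so the testbed would stay reserved only when the restart itself does not help.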

Symptom (from [0]):

TASK [cleanup : Kill container - Get running Docker containers] ****************
fatal: [10.30.51.44]: FAILED! => changed=false
  cmd: docker ps -aq
  delta: '0:00:00.045410'
  end: '2020-11-04 02:10:42.822197'
  msg: non-zero return code
  rc: 1
  start: '2020-11-04 02:10:42.776787'
  stderr: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
  stderr_lines:
  - Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
  stdout: ''
  stdout_lines: <omitted>

TASK [cleanup : fail] **********************************************************
fatal: [10.30.51.44]: FAILED! => changed=false
  msg: Kill containers failed!
  to retry, use: --limit @/w/workspace/csit-vpp-perf-report-iterative-2009-2n-skx/resources/tools/testbed-setup/ansible/site.retry

PLAY RECAP *********************************************************************
10.30.51.44 : ok=9 changed=5 unreachable=0 failed=2
10.30.51.45 : ok=17 changed=6 unreachable=0 failed=0

++ die 'Failed to run ansible on host!'

[0] https://logs.fd.io/production/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2009-2n-skx/91/console.log.gz

Assignee: Unassigned
Reporter: Vratko Polak
Votes: 0
Watchers: 2
Created: 13/Nov/20 12:58 PM
Updated: 24/Mar/21 7:19 AM
