CSIT-1770

Ansible cleanup should handle unresponsive docker daemon


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium
    • bulk release
    • None
    • None

      With small probability (roughly 1 in 200 runs), a performance testbed stays reserved because the Ansible cleanup fails to kill containers when the Docker daemon is not responding. This can happen before or after the tests are run.

      It should be possible to extend the Ansible cleanup so that it restarts the Docker daemon when this happens and then completes the cleanup successfully.
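
      A minimal sketch of how such an extension might look, assuming the cleanup role lists containers with a plain `command` task and that the daemon runs as a service named `docker`. The task names, the `running_containers` variable, and the retry counts are hypothetical and not taken from the CSIT repository. Note that it only covers the case shown in the symptom below, where `docker ps` fails quickly; a daemon that hangs without returning would additionally need a timeout.

      ```yaml
      # Hypothetical sketch: fall back to a daemon restart when listing containers fails.
      - name: Kill container - Get running Docker containers (with recovery)
        block:
          - name: List Docker containers
            command: docker ps -aq
            register: running_containers
            changed_when: false
        rescue:
          # Reached only if the task in the block failed, e.g. "Cannot connect to the Docker daemon".
          - name: Restart the Docker daemon
            service:
              name: docker        # assumed service name
              state: restarted
            become: true

          - name: List Docker containers after restart
            command: docker ps -aq
            register: running_containers
            changed_when: false
            retries: 3            # assumed values; tune to how long the daemon takes to come up
            delay: 5
            until: running_containers.rc == 0
      ```

      With a `block`/`rescue` structure like this, the restart runs only when the first listing fails, so healthy hosts are unaffected; if the retry also fails, the play still fails and the problem stays visible in the log.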

      Symptom (from [0]):

      TASK [cleanup : Kill container - Get running Docker containers] ****************
      fatal: [10.30.51.44]: FAILED! => changed=false
      cmd: docker ps -aq
      delta: '0:00:00.045410'
      end: '2020-11-04 02:10:42.822197'
      msg: non-zero return code
      rc: 1
      start: '2020-11-04 02:10:42.776787'
      stderr: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
      stderr_lines:
      - Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
      stdout: ''
      stdout_lines: <omitted>

      TASK [cleanup : fail] **********************************************************
      fatal: [10.30.51.44]: FAILED! => changed=false
      msg: Kill containers failed!
      to retry, use: --limit @/w/workspace/csit-vpp-perf-report-iterative-2009-2n-skx/resources/tools/testbed-setup/ansible/site.retry

      PLAY RECAP *********************************************************************
      10.30.51.44 : ok=9 changed=5 unreachable=0 failed=2
      10.30.51.45 : ok=17 changed=6 unreachable=0 failed=0

      ++ die 'Failed to run ansible on host!'

      [0] https://logs.fd.io/production/vex-yul-rot-jenkins-1/csit-vpp-perf-report-iterative-2009-2n-skx/91/console.log.gz

      Assignee: Unassigned
      Reporter: Vratko Polak (vrpolak)