Environment
datadog-agent
# /opt/datadog-agent/bin/agent/agent version
Agent 6.9.0 - Commit: 4bbd2c9 - Serialization version: 4.7.1
datadog-agent was installed with Ansible using the following role:
$ ansible-galaxy info Datadog.datadog

Role: Datadog.datadog
        description: Install Datadog agent and configure checks
        active: True
        commit: 81e3921afa069678e60ece56ea6a164494d55b6c
        commit_message: Release 2.4.0
        commit_url: https://api.github.com/repos/DataDog/ansible-datadog/git/commits/81e3921afa069678e60e
        company:
        created: 2018-07-23T20:15:50.491026Z
        download_count: 399542
        forks_count: 125
        github_branch: master
        github_repo: ansible-datadog
        github_user: DataDog
        id: 27743
        imported: 2018-10-25T17:52:51.891854-04:00
        is_valid: True
        issue_tracker_url: https://github.com/DataDog/ansible-datadog/issues
        license: Apache2
        min_ansible_version: 2.2
        modified: 2018-10-25T21:52:51.892008Z
        open_issues_count: 19
        path: ['/Users/s04270/.ansible/roles', '/usr/share/ansible/roles', '/etc/ansible/roles']
        role_type: ANS
        stargazers_count: 136
        travis_status_url:
Symptom
Checking the docker status with
/opt/datadog-agent/bin/agent/agent status
shows the following error:
docker
------
  Instance ID: docker [ERROR]
  Total Runs: 95
  Metric Samples: Last Run: 0, Total: 0
  Events: Last Run: 0, Total: 0
  Service Checks: Last Run: 0, Total: 0
  Average Execution Time : 0s
  Error: permanent failure in dockerutil: retry number exceeded
  No traceback
  Warning: Error initialising check: permanent failure in dockerutil: retry number exceeded
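Cause
The dd-agent user is not a member of the docker group, so it has no permission to retrieve information from Docker.
An issue has been filed and a corresponding pull request exists, but it has not been merged yet, so upgrading the role does not help (in fact, 2.4.0 is the latest version at the time of writing).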
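To confirm this diagnosis on the affected host, the group membership can be checked directly. The following is a minimal sketch; the helper name in_group is hypothetical and only wraps the standard id command.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if user $1 is a member of group $2.
in_group() {
    # id -nG prints the user's group names separated by spaces;
    # split them onto separate lines and look for an exact match.
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Check the agent user on the host where the docker check fails.
if in_group dd-agent docker; then
    echo "dd-agent is in the docker group"
else
    echo "dd-agent is NOT in the docker group"
fi
```

If the second message is printed, the agent most likely cannot query Docker, which matches the error above.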
Fix
Add the dd-agent user to the docker group.
As also described in the issue, with Ansible this can be done by running the following task:
- name: ensure dd-agent is in docker group
  become: yes
  user:
    name: dd-agent
    groups: docker
    append: yes
  notify: restart datadog-agent
  tags: datadog
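On a host that is not managed by Ansible, roughly the same change could be made by hand. This is only a sketch under assumptions not stated in the original: a Linux host with usermod, a systemd service named datadog-agent, and a hypothetical run helper that prints commands by default so nothing changes until DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Append dd-agent to the docker group; -a keeps its existing
# supplementary groups, like "append: yes" in the Ansible task.
run usermod -aG docker dd-agent

# Restart the agent so the running process picks up the new group
# (assumes systemd and a service named datadog-agent).
run systemctl restart datadog-agent
```

Running the script as root with DRY_RUN=0 would apply the change. The default dry-run mode only prints the two commands.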
After running it, the status changes to OK:
docker
------
  Instance ID: docker [OK]
  Total Runs: 2
  Metric Samples: Last Run: 92, Total: 184
  Events: Last Run: 0, Total: 0
  Service Checks: Last Run: 1, Total: 2
  Average Execution Time : 23ms