nvidia-docker2 Upgrade Guide

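Note: all current application versions need to support CUDA 10, and in a Docker environment only nvidia-docker2 supports CUDA 10. The upgrade steps and installation packages are therefore given below.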

Docker environment requirements

  • **docker version**

    docker-ce 18.06+

  • **nvidia-docker version**

    nvidia-docker2+

Check the nvidia-docker version

$ nvidia-docker version

In the example output, nvidia-docker is at 1.0.1 and docker is at 18.03.1-ce. Both are below the required versions and need to be upgraded.
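
The comparison against the 18.06 minimum can also be scripted. The following is a minimal sketch, not part of the original procedure: the helper name `version_ge` is hypothetical, and it assumes GNU `sort -V` is available for version ordering. It queries the daemon with `docker version --format '{{.Server.Version}}'` and does nothing harmful if docker is absent.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A >= version B.
# Assumption: GNU sort with -V (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

MIN_DOCKER="18.06"   # minimum docker-ce version from the requirements

if command -v docker >/dev/null 2>&1; then
    # Ask the daemon for its version; empty if the daemon is unreachable.
    current="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
    if [ -n "$current" ] && version_ge "$current" "$MIN_DOCKER"; then
        echo "docker $current: OK"
    else
        echo "docker ${current:-unknown}: upgrade needed (minimum $MIN_DOCKER)"
    fi
else
    echo "docker not found on PATH"
fi
```

With the versions from the example, `version_ge 18.03.1-ce 18.06` fails, which matches the conclusion that the installed docker is too old.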

Upgrade docker and nvidia-docker

  • **Uninstall nvidia-docker**

    $ yum remove nvidia-docker
    
  • **Download the installation package**

  • **Copy the package nvidia-docker2.zip to /opt/, then unzip it**

    $ unzip nvidia-docker2.zip
    
  • **Install**

    # Enter the package directory
    $ cd nvidia-docker2
    $ rpm -i libnvidia-container1-1.0.5-1.x86_64.rpm
    $ rpm -i libnvidia-container-tools-1.0.5-1.x86_64.rpm
    $ rpm -i nvidia-container-runtime-3.1.4-1.x86_64.rpm
    $ rpm -i nvidia-container-toolkit-1.0.5-2.x86_64.rpm
    $ rpm -i nvidia-docker2-2.2.2-1.noarch.rpm
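
Before running the installs, it can help to confirm that all five package files were actually unpacked. The sketch below is an illustration only: the helper `missing_rpms` and the `PKG_DIR` variable are hypothetical names, and `/opt/nvidia-docker2` is assumed to be where the zip was extracted.

```shell
#!/bin/sh
# Package files, in the install order used in this guide.
RPMS="libnvidia-container1-1.0.5-1.x86_64.rpm
libnvidia-container-tools-1.0.5-1.x86_64.rpm
nvidia-container-runtime-3.1.4-1.x86_64.rpm
nvidia-container-toolkit-1.0.5-2.x86_64.rpm
nvidia-docker2-2.2.2-1.noarch.rpm"

# missing_rpms DIR: print each expected file that is absent from DIR.
missing_rpms() {
    for f in $RPMS; do
        [ -f "$1/$f" ] || echo "$f"
    done
}

# Assumption: the zip was unpacked under /opt/; override PKG_DIR otherwise.
PKG_DIR="${PKG_DIR:-/opt/nvidia-docker2}"
missing="$(missing_rpms "$PKG_DIR")"
if [ -n "$missing" ]; then
    echo "missing from $PKG_DIR:"
    echo "$missing"
else
    echo "all packages present in $PKG_DIR"
fi
```

After the installs, `rpm -q nvidia-docker2` is one way to confirm that the package is registered in the RPM database.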
    

Reference: internal material

Error when running an image

docker: Error response from daemon: Unknown runtime specified nvidia.
Solution
The cause is that the nvidia runtime has not been registered with Docker.
Specifically:
To register the nvidia runtime, use the method below that is best suited to your environment.
You might need to merge the new argument with your existing configuration.

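Before proceeding, check whether the corresponding configuration file already exists locally and inspect its values, so that existing settings are not overwritten by mistake.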
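
One way to do that check is to print each existing file and keep a copy before changing anything. This is a minimal sketch under stated assumptions: the helper `backup_if_exists` and the `BACKUP_DIR` variable are hypothetical, and the two paths are the Docker configuration locations used in this guide.

```shell
#!/bin/sh
# backup_if_exists FILE DIR: print FILE and copy it to DIR/<name>.bak.
# Does nothing except report when FILE is absent.
backup_if_exists() {
    [ -f "$1" ] || { echo "no existing $1"; return 0; }
    echo "--- existing $1 ---"
    cat "$1"
    cp -p "$1" "$2/$(basename "$1").bak"
}

# Assumption: copies go to the home directory; set BACKUP_DIR to change it.
BACKUP_DIR="${BACKUP_DIR:-$HOME}"
for f in /etc/docker/daemon.json \
         /etc/systemd/system/docker.service.d/override.conf; do
    backup_if_exists "$f" "$BACKUP_DIR"
done
```

If either file already holds settings, merge the nvidia runtime entry into it by hand rather than overwriting the file.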

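Use only one of the two methods below. If the runtime is registered both on the dockerd command line and in daemon.json, dockerd can refuse to start because the same option is set in two places.

Method 1: systemd drop-in file

sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker

Method 2: daemon configuration file (/etc/docker/daemon.json)

sudo tee /etc/docker/daemon.json <<EOF
{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
sudo pkill -SIGHUP dockerd
sudo systemctl restart docker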
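
After the restart, the registration can be checked from the output of `docker info`, which lists the known runtimes on a `Runtimes:` line. The sketch below is illustrative only: the helper name `has_nvidia_runtime` is hypothetical, and the exact layout of the `docker info` output may differ between Docker versions.

```shell
#!/bin/sh
# has_nvidia_runtime: reads `docker info` text on stdin and succeeds
# when the "Runtimes:" line lists nvidia as a whole word.
has_nvidia_runtime() {
    grep -i 'Runtimes:' | grep -qw 'nvidia'
}

if command -v docker >/dev/null 2>&1; then
    if docker info 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime registered"
    else
        echo "nvidia runtime NOT registered"
    fi
else
    echo "docker not found on PATH"
fi
```

If the runtime is listed, running `nvidia-smi` inside a CUDA image with `docker run --runtime=nvidia --rm <cuda-image> nvidia-smi` is a reasonable end-to-end test. The image name is left as a placeholder because the guide does not specify one.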

Reference: https://blog.csdn.net/weixin_32820767/article/details/80538510