import torch   # running this in an interactive Python session raises the following error
OSError: libcurand.so.10: cannot open shared object file: No such file or directory

Solution:
apt-get update && apt-get install -y --no-install-recommends gnupg2 ca-certificates

apt-key add /tmp/jetson.key   # jetson.key must be downloaded first

echo "deb https://repo.download.nvidia.com/jetson/common r32.4 main" >> /etc/apt/sources.list

apt-get update
apt-get install cuda-toolkit-10-2   # installing this package resolves the error

import torch  # running this again in an interactive session raises the following error
OSError: /usr/lib/aarch64-linux-gnu/libcudnn.so.8: file too short
Solution:

apt-get install libcudnn8

# Run the following code in an interactive session
import torch
torch.cuda.is_available()
>>True
torch.zeros((1,2,3,4), device=0)  # this line raises the following error
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.


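Root cause:

The NVIDIA L4T PyTorch and TensorFlow images are built for Jetson products, whose GPUs have compute capability 5.3, 6.2, and 7.2. The T4 GPU in my server has compute capability 7.5, so the prebuilt binaries contain no kernels for it. The fix is to build PyTorch from source with the target architecture set through an environment variable: export TORCH_CUDA_ARCH_LIST="7.5"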
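
Before rebuilding, the mismatch described above can be checked directly by comparing the GPU's compute capability with the architectures the installed PyTorch binary was compiled for. This is a minimal sketch and not part of the original setup: the helper names capability_to_arch and is_arch_compiled are hypothetical, and the exact-match rule is a deliberately conservative simplification that ignores PTX (compute_XY) entries. It relies on torch.cuda.get_device_capability and torch.cuda.get_arch_list.

```python
def capability_to_arch(capability):
    """Convert a (major, minor) compute capability into an 'sm_XY' tag."""
    major, minor = capability
    return f"sm_{major}{minor}"


def is_arch_compiled(capability, arch_list):
    """Return True if the exact 'sm_XY' tag for this GPU is in arch_list.

    Assumption: exact matching only. PTX entries such as 'compute_75'
    are ignored, so this check can be stricter than the CUDA runtime.
    """
    return capability_to_arch(capability) in arch_list


if __name__ == "__main__":
    try:
        import torch
    except (ImportError, OSError):
        # OSError covers missing shared libraries such as libcurand.so.10
        torch = None

    if torch is not None and torch.cuda.is_available():
        capability = torch.cuda.get_device_capability(0)
        arch_list = torch.cuda.get_arch_list()
        print("GPU compute capability:", capability)
        print("Architectures in this build:", arch_list)
        if not is_arch_compiled(capability, arch_list):
            print("No exact match; rebuild with TORCH_CUDA_ARCH_LIST set for this GPU.")
    else:
        print("torch with CUDA support is not available in this environment.")
```

On the setup described here, the expectation is that the T4 reports (7, 5) while the Jetson build lists only the Jetson architectures, which is consistent with the "no kernel image" error.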



# NVIDIA L4T PyTorch and TensorFlow images
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/l4t-pytorch

# PyTorch for Jetson - version 1.11 now available; wheel downloads
https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-11-now-available/72048

Dependencies required to build PyTorch from source:
apt-get install libcudnn8-dev
pip3 install pyyaml

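Download pytorch-1.10.0-github.zip    # download it from the official site yourself, or git clone https://gitee.com/EwenWan/pytorch.git
unzip pytorch-1.10.0-github.zip   # extract, then cd pytorch

# Build torch
export TORCH_CUDA_ARCH_LIST="7.5"   # target the T4 (compute capability 7.5), as explained in the root cause above
python3 setup.py install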

Verify that the GPU build of torch installed correctly:
import torch
torch.cuda.is_available()
torch.zeros((1,2,3,4), device=0)   # everything works now; ready to deploy the algorithm


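Error encountered during the build:
AttributeError: module 'distutils' has no attribute 'version'

Cause:
The installed setuptools version is too new.

Solution:
Install an older setuptools.
pip uninstall setuptools
pip install setuptools==59.5.0   # must be lower than the version you had installed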
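
To see whether the downgrade is needed (or took effect), the installed setuptools version can be compared with the pinned one. This is a rough sketch: parse_version and is_newer_than_pin are hypothetical helpers, and treating 59.5.0 as the cutoff is an assumption taken from the pin above, not a verified boundary.

```python
def parse_version(text):
    """Turn '59.5.0' into (59, 5, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_newer_than_pin(installed, pinned="59.5.0"):
    """Return True if the installed version is strictly newer than the pin."""
    return parse_version(installed) > parse_version(pinned)


if __name__ == "__main__":
    try:
        import setuptools
    except ImportError:
        setuptools = None

    if setuptools is None:
        print("setuptools is not installed.")
    else:
        version = setuptools.__version__
        print("Installed setuptools:", version)
        if is_newer_than_pin(version):
            print("Newer than 59.5.0; the build may hit the distutils error.")
```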

 

# Build torchvision from source
# Download link: https://github.com/pytorch/vision
# Version used here: v0.11.3

cd vision
python3 setup.py install
# Installing through the Douban pip mirror
pip3 install pefile -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com


######## Errors when installing dependencies for algorithm deployment ###########
# Error: ModuleNotFoundError: No module named 'setuptools_rust'
Solution:
pip3 install -U pip setuptools

########## Environment ##########
torch   1.10.0
torchvision 0.11.3
CUDA 10.2
Python 3.6
Kylin operating system
ARM processor
NVIDIA T4 GPU
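
The environment list above can also be gathered programmatically, which helps when comparing against another machine. This is only a sketch: collect_environment is a hypothetical helper, and it reports None for any package that fails to import.

```python
import platform


def collect_environment():
    """Collect Python, CPU architecture, and torch/torchvision versions."""
    info = {
        "python": platform.python_version(),
        "machine": platform.machine(),
        "torch": None,
        "cuda": None,
        "torchvision": None,
    }
    try:
        import torch
        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda  # None for CPU-only builds
    except (ImportError, OSError):
        # OSError covers missing shared libraries (e.g. libcurand, libcudnn)
        pass
    try:
        import torchvision
        info["torchvision"] = torchvision.__version__
    except (ImportError, OSError, RuntimeError):
        # RuntimeError can occur when torchvision does not match torch
        pass
    return info


if __name__ == "__main__":
    for name, value in collect_environment().items():
        print(f"{name}: {value}")
```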
