TensorFlow 安装笔记
Table of Contents
1 源码编译安装
1.1 安装环境和准备文件
安装环境为 Ubuntu 18.04 操作系统,通过编译安装 TensorFlow 1.12.0。准备文件如 下:
ubuntu-18.04.1-desktop-amd64.iso
cuda_10.0.130_410.48_linux.run
cudnn-10.0-linux-x64-v7.4.2.24.tgz
nccl_2.3.7-1+cuda10.0_x86_64.txz
bazel-0.18.0-installer-linux-x86_64.sh
1.2 安装操作系统
安装操作系统参考 这篇
1.3 更新系统的源
sudo apt-get update sudo apt-get upgrade
1.4 安装 Python 的环境
TensorFlow 12 依赖的 Python 环境是 Python 3.6,先安装 Python 的相关依赖
sudo apt install -y python-dev python-pip python3-dev python3-pip
1.5 安装 CUDA 编译工具
检验系统正确地识别到了 NVIDIA 显卡
lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1cbb (rev a1) 01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
查看系统和 gcc 的版本
uname -m && cat /etc/*release gcc --version
安装编译过程中的依赖
sudo apt-get install -y build-essential \ cmake git unzip zip \ python-dev python3-dev python-pip python3-pip
安装内核的头文件依赖
# 查看内核版本 uname -r # 安装内核的头文件 sudo apt-get install linux-headers-$(uname -r)
参考 CUDA Documentation 文档安装 CUDA 驱动,这里使用 run 文件来安装
首先禁用 Nouveau 驱动
sudo touch /etc/modprobe.d/blacklist-nouveau.conf sudo echo 'blacklist nouveau' > /etc/modprobe.d/blacklist-nouveau.conf sudo echo 'options nouveau modeset=0' >> /etc/modprobe.d/blacklist-nouveau.conf
更新内核 initramfs
sudo update-initramfs -u
删除之前安装的 CUDA
sudo apt-get purge nvidia* sudo apt-get autoremove sudo apt-get autoclean sudo rm -rf /usr/local/cuda*
通过 run 文件安装 CUDA 驱动
sudo sh cuda_10.0.130_410.48_linux.run
安装驱动后的操作
echo 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' >> ~/.bashrc echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc source ~/.bashrc sudo ldconfig # 查看 CUDA 是否安装成功 nvidia-smi
1.6 安装 cuDNN 和 NCCL
安装 cuDNN
tar xvf cudnn-10.0-linux-x64-v7.4.2.24.tgz
cuda ├── include │ └── cudnn.h ├── lib64 │ ├── libcudnn.so -> libcudnn.so.7 │ ├── libcudnn.so.7 -> libcudnn.so.7.4.2 │ ├── libcudnn.so.7.4.2 │ └── libcudnn_static.a └── NVIDIA_SLA_cuDNN_Support.txt
复制文件到 CUDA 中
sudo cp -R cuda/include/* /usr/local/cuda-10.0/include sudo cp -R cuda/lib64/* /usr/local/cuda-10.0/lib64
安装 NCCL
tar xvf nccl_2.3.7-1+cuda10.0_x86_64.txz
nccl_2.3.7-1+cuda10.0_x86_64 ├── include │ └── nccl.h ├── lib │ ├── libnccl.so -> libnccl.so.2 │ ├── libnccl.so.2 -> libnccl.so.2.3.7 │ ├── libnccl.so.2.3.7 │ └── libnccl_static.a └── LICENSE.txt
复制文件到 CUDA 目录中
cd nccl_2.3.7-1+cuda10.0_x86_64
sudo cp -R include/* /usr/local/cuda-10.0/include
sudo cp -R lib/* /usr/local/cuda-10.0/lib64
1.7 安装 Python 依赖包
最好启动一个 virtualenv
mkdir ~/.venv && cd ~/.venv # 安装 virtualenv 依赖 pip install virtualenv # 新建虚拟环境 virtualenv tfenv --python=python3 # 另外一种建虚拟环境的方式 python3 -m virtualenv tfenv # 启用虚拟环境 source ~/.venv/tfenv/bin/activate
然后安装相关依赖包
pip install pip six numpy==1.15.4 wheel setuptools mock future>=0.17.1 pip install keras_applications==1.0.6 --no-deps pip install keras_preprocessing==1.0.5 --no-deps
1.8 安装 bazel 编译工具
# bazel 依赖 sudo apt-get install -y build-essential pkg-config zip g++ zlib1g-dev unzip ./bazel-0.18.0-installer-linux-x86_64.sh --user echo 'export PATH=${HOME}/bin:${PATH:+:${PATH}}' >> ~/.bashrc source ~/.bashrc sudo ldconfig
1.9 使用 TensorFlow 源码编译和安装
git clone https://github.com/tensorflow/tensorflow.git git checkout r1.12
配置编译系统
./tensorflow-gpu$ ./configure WARNING: Processed legacy workspace file /home/ub64/Code/github/tensorflow-gpu/tools/bazel.rc. This file will not be processed in the next release of Bazel. Please read https://github.com/bazelbuild/bazel/issues/6319 for further information, including how to upgrade. WARNING: An illegal reflective access operation has occurred WARNING: Illegal reflective access by com.google.protobuf.UnsafeUtil (file:/home/ub64/.cache/bazel/_bazel_ub64/install/f1e11885a5cc7ba9947679cffb18bf94/_embedded_binaries/A-server.jar) to field java.lang.String.value WARNING: Please consider reporting this to the maintainers of com.google.protobuf.UnsafeUtil WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations WARNING: All illegal access operations will be denied in a future release WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown". You have bazel 0.18.0 installed. Please specify the location of python. [Default is /home/ub64/.venv/gpu/bin/python]: Found possible Python library paths: /home/ub64/.venv/gpu/lib/python3.6/site-packages . Please input the desired Python library path to use. Default is [/home/ub64/.venv/gpu/lib/python3.6/site-packages] Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: Y Apache Ignite support will be enabled for TensorFlow. Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y XLA JIT support will be enabled for TensorFlow. Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N No OpenCL SYCL support will be enabled for TensorFlow. Do you wish to build TensorFlow with ROCm support? [y/N]: N No ROCm support will be enabled for TensorFlow. Do you wish to build TensorFlow with CUDA support? [y/N]: y CUDA support will be enabled for TensorFlow. Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 10.0 Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.0 Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.4.2 Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]: /usr/local/cuda-10.0 Do you wish to build TensorFlow with TensorRT support? [y/N]: N No TensorRT support will be enabled for TensorFlow. Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 2.3.7 Please specify the location where NCCL 2 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]:/usr/local/cuda-10.0 NCCL found at /usr/local/cuda-10.0/lib64/libnccl.so.2 Assuming NCCL header path is /usr/local/cuda-10.0/lib64/../include/nccl.h Please specify a list of comma-separated Cuda compute capabilities you want to build with. You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]: 6.1 Do you want to use clang as CUDA compiler? [y/N]: N nvcc will be used as CUDA compiler. Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc Do you wish to build TensorFlow with MPI support? [y/N]: N No MPI support will be enabled for TensorFlow. Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N Not configuring the WORKSPACE for Android builds. Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details. --config=mkl # Build with MKL support. --config=monolithic # Config for mostly static monolithic build. --config=gdr # Build with GDR support. --config=verbs # Build with libverbs support. --config=ngraph # Build with Intel nGraph support. Configuration finished (gpu) ub64@hpz2:~/Code/github/tensorflow-gpu$
编译源代码
# CPU only bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package # GPU version with CUDA bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
打 wheel 包并安装
./bazel-bin/tensorflow/tools/pip_package/build_pip_package tensorflow_pkg pip install tensorflow_pkg/tensorflow-1.12.0-cp36-cp36m-linux_x86_64.whl
2 使用 pip 安装
使用 Python 自带的安装包工具 pip 安装比较方便的。根据目前官网的上的文档,目前 Python3 已经更新到了 Python3.7,但是 TensorFlow 目前只支持到 Python3.6,所以在 安装 Python 时安装 Python3.6。另外,pip 的安装和设置请参考 here 。
2.1 可安装的包
官网上给出以下可选安装包:
- tensorflow: Current release for CPU-only (recommended for beginners)
- tensorflow-gpu: Current release with GPU support (Ubuntu and Windows)
- tf-nightly: Nightly build for CPU-only (unstable)
- tf-nightly-gpu: Nightly build with GPU support (unstable, Ubuntu and Windows)
直接使用下面指令安装即可:
pip install tensorflow
2.2 更新 TensorFlow 包
pip install --upgrade tensorflow
3 测试程序
官网是给出的使用 keras 运行 mnist 测试样例。
# mnist.py import tensorflow as tf mnist = tf.keras.datasets.mnist (x_train, y_train), (x_test, y_test) = mnist.load_data() x_train, x_test = x_train / 255.0, x_test / 255.0 model = tf.keras.models.Sequential([ tf.keras.layers.Flatten(), tf.keras.layers.Dense(512, activation=tf.nn.relu), tf.keras.layers.Dropout(0.2), tf.keras.layers.Dense(10, activation=tf.nn.softmax) ]) model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) model.fit(x_train, y_train, epochs=5) model.evaluate(x_test, y_test)
运行上述文件
python mnist.py