TensorFlow Installation Notes

1. Building and installing from source
2. Installing with pip
- 2.1. Available packages
- 2.2. Upgrading the TensorFlow package
3. Test program
4. References

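1 Building and installing from source

1.1 Environment and required files

The target environment is Ubuntu 18.04, and TensorFlow 1.12.0 is built and installed from source. Prepare the following files: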

ubuntu-18.04.1-desktop-amd64.iso
cuda_10.0.130_410.48_linux.run
cudnn-10.0-linux-x64-v7.4.2.24.tgz
nccl_2.3.7-1+cuda10.0_x86_64.txz
bazel-0.18.0-installer-linux-x86_64.sh
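
The CUDA version has to agree across the toolkit, cuDNN, and NCCL archives. The following is a minimal Python sketch of that consistency check, reading the CUDA version out of each file name. The helper name `cuda_version_of` and the regular expressions are illustrative assumptions derived only from the file names listed above; they are not part of any NVIDIA tool.

```python
import re

# File names copied from the list above.
FILES = [
    "cuda_10.0.130_410.48_linux.run",
    "cudnn-10.0-linux-x64-v7.4.2.24.tgz",
    "nccl_2.3.7-1+cuda10.0_x86_64.txz",
]

# One pattern per naming scheme; each captures the CUDA major.minor version.
PATTERNS = [
    re.compile(r"^cuda_(\d+\.\d+)\."),   # cuda_10.0.130_...
    re.compile(r"^cudnn-(\d+\.\d+)-"),   # cudnn-10.0-...
    re.compile(r"\+cuda(\d+\.\d+)_"),    # nccl_...+cuda10.0_...
]


def cuda_version_of(filename):
    """Return the CUDA major.minor version embedded in a file name, or None."""
    for pattern in PATTERNS:
        match = pattern.search(filename)
        if match:
            return match.group(1)
    return None


versions = {name: cuda_version_of(name) for name in FILES}
# All three archives should report the same CUDA version.
consistent = len(set(versions.values())) == 1
print(versions, "consistent:", consistent)
```

A mismatch here (for example a cuDNN archive built for a different CUDA release) is worth catching before any files are copied into the CUDA directory.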

1.2 Installing the operating system

For the operating system installation, refer to this article.

1.3 Updating the package sources

sudo apt-get update
sudo apt-get upgrade

1.4 Setting up the Python environment

TensorFlow 1.12 is built here against Python 3.6, so first install the Python-related dependencies:

sudo apt install -y python-dev python-pip python3-dev python3-pip

1.5 Installing the CUDA toolkit

Verify that the system detects the NVIDIA GPU:

lspci | grep -i nvidia

01:00.0 VGA compatible controller: NVIDIA Corporation Device 1cbb (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
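
The same check can be scripted. The sketch below parses `lspci` output in Python and reports whether an NVIDIA display device is present. The function names are hypothetical, the sample text is the output shown above, and the list of display classes is an assumption that covers the common cases rather than every possible class name.

```python
import re

# Sample output copied from the lspci command above.
SAMPLE = """\
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1cbb (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
"""

# "<slot> <device class>: <device description>"
LINE = re.compile(r"^(?P<slot>\S+) (?P<kind>[^:]+): (?P<device>.+)$")


def nvidia_devices(lspci_output):
    """Return (slot, kind, device) tuples for lines that mention NVIDIA."""
    found = []
    for line in lspci_output.splitlines():
        match = LINE.match(line.strip())
        if match and "nvidia" in match.group("device").lower():
            found.append((match.group("slot"),
                          match.group("kind"),
                          match.group("device")))
    return found


def has_nvidia_gpu(lspci_output):
    """True if any NVIDIA entry is a display-class device."""
    display_kinds = ("VGA compatible controller", "3D controller")
    return any(kind in display_kinds
               for _, kind, _ in nvidia_devices(lspci_output))


print(nvidia_devices(SAMPLE))
print("NVIDIA GPU present:", has_nvidia_gpu(SAMPLE))
```

On a real machine the text would come from something like `subprocess.check_output(["lspci"])` instead of the embedded sample.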

Check the system and gcc versions:

uname -m && cat /etc/*release
gcc --version

Install the build dependencies:

sudo apt-get install -y build-essential \
     cmake git unzip zip \
     python-dev python3-dev python-pip python3-pip

Install the kernel headers:

# Show the kernel version
uname -r
# Install the headers that match the running kernel
sudo apt-get install linux-headers-$(uname -r)

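Install the CUDA driver following the CUDA Documentation; the run file is used here.

First, disable the Nouveau driver:

sudo touch /etc/modprobe.d/blacklist-nouveau.conf
# sudo does not apply to a shell redirection, so write the file through tee
echo 'blacklist nouveau' | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo 'options nouveau modeset=0' | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf

Regenerate the kernel initramfs: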

sudo update-initramfs -u

Remove any previously installed CUDA:

sudo apt-get purge 'nvidia*'
sudo apt-get autoremove
sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*

Install the CUDA driver from the run file:

sudo sh cuda_10.0.130_410.48_linux.run

Post-installation steps:

echo 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc

source ~/.bashrc
sudo ldconfig

# Check that CUDA was installed successfully
nvidia-smi
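
The two export lines above rely on the shell expansion `${VAR:+:${VAR}}`, which appends a colon and the old value only when the variable is already set and non-empty. That avoids a trailing colon, which many tools treat as "the current directory". A small Python sketch of the same logic, with a hypothetical helper name `prepend_path`:

```python
def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: add ':' + current only if current is non-empty."""
    return new_dir + (":" + current if current else "")


# Variable already set: the old value is kept after a single colon.
print(prepend_path("/usr/local/cuda-10.0/bin", "/usr/bin:/bin"))
# Variable unset or empty: no stray colon is produced.
print(prepend_path("/usr/local/cuda-10.0/lib64", ""))
```

`LD_LIBRARY_PATH` is often unset on a fresh system, which is the case the second call illustrates.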

1.6 Installing cuDNN and NCCL

Install cuDNN:

tar xvf cudnn-10.0-linux-x64-v7.4.2.24.tgz

cuda
├── include
│   └── cudnn.h
├── lib64
│   ├── libcudnn.so -> libcudnn.so.7
│   ├── libcudnn.so.7 -> libcudnn.so.7.4.2
│   ├── libcudnn.so.7.4.2
│   └── libcudnn_static.a
└── NVIDIA_SLA_cuDNN_Support.txt

Copy the files into the CUDA directory:

sudo cp -R cuda/include/* /usr/local/cuda-10.0/include
sudo cp -R cuda/lib64/* /usr/local/cuda-10.0/lib64

Install NCCL:

tar xvf nccl_2.3.7-1+cuda10.0_x86_64.txz

nccl_2.3.7-1+cuda10.0_x86_64
├── include
│   └── nccl.h
├── lib
│   ├── libnccl.so -> libnccl.so.2
│   ├── libnccl.so.2 -> libnccl.so.2.3.7
│   ├── libnccl.so.2.3.7
│   └── libnccl_static.a
└── LICENSE.txt

Copy the files into the CUDA directory:

cd nccl_2.3.7-1+cuda10.0_x86_64
sudo cp -R include/* /usr/local/cuda-10.0/include
sudo cp -R lib/* /usr/local/cuda-10.0/lib64
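
After both copies it can be useful to confirm that the headers and libraries actually landed under the CUDA directory. The sketch below is one way to do that in Python. The list of expected paths is taken from the two archive listings above, and the function name `missing_files` is a hypothetical helper.

```python
import os

# Relative paths taken from the cuDNN and NCCL archive listings above.
EXPECTED = [
    "include/cudnn.h",
    "lib64/libcudnn.so",
    "lib64/libcudnn.so.7",
    "include/nccl.h",
    "lib64/libnccl.so",
    "lib64/libnccl.so.2",
]


def missing_files(cuda_root, expected=EXPECTED):
    """Return the expected paths that do not exist under cuda_root.

    os.path.exists follows symlinks, so a dangling libcudnn.so link
    is reported as missing too.
    """
    return [rel for rel in expected
            if not os.path.exists(os.path.join(cuda_root, rel))]


if __name__ == "__main__":
    print(missing_files("/usr/local/cuda-10.0") or "all files present")
```

An empty result means every listed file resolves to a real file under `/usr/local/cuda-10.0`.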

1.7 Installing the Python dependencies

It is best to work inside a virtualenv:

mkdir ~/.venv && cd ~/.venv

# Install virtualenv
pip install virtualenv

# Create the virtual environment
virtualenv tfenv --python=python3
# An alternative way to create the virtual environment
python3 -m virtualenv tfenv

# Activate the virtual environment
source ~/.venv/tfenv/bin/activate

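Then install the required packages:

# Quote the version specifier so the shell does not treat ">" as a redirection
pip install pip six numpy==1.15.4 wheel setuptools mock 'future>=0.17.1'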
pip install keras_applications==1.0.6 --no-deps
pip install keras_preprocessing==1.0.5 --no-deps

1.8 Installing the bazel build tool

# bazel dependencies
sudo apt-get install -y build-essential pkg-config zip g++ zlib1g-dev unzip

chmod +x bazel-0.18.0-installer-linux-x86_64.sh
./bazel-0.18.0-installer-linux-x86_64.sh --user
echo 'export PATH=${HOME}/bin${PATH:+:${PATH}}' >> ~/.bashrc
source ~/.bashrc
sudo ldconfig

1.9 Building and installing TensorFlow from source

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout r1.12

Configure the build:

./tensorflow-gpu$ ./configure
WARNING: Processed legacy workspace file /home/ub64/Code/github/tensorflow-gpu/tools/bazel.rc. This file will not be processed in the next release of Bazel. Please read https://github.com/bazelbuild/bazel/issues/6319 for further information, including how to upgrade.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.protobuf.UnsafeUtil (file:/home/ub64/.cache/bazel/_bazel_ub64/install/f1e11885a5cc7ba9947679cffb18bf94/_embedded_binaries/A-server.jar) to field java.lang.String.value
WARNING: Please consider reporting this to the maintainers of com.google.protobuf.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.18.0 installed.
Please specify the location of python. [Default is /home/ub64/.venv/gpu/bin/python]:

Found possible Python library paths:
/home/ub64/.venv/gpu/lib/python3.6/site-packages
.
Please input the desired Python library path to use. Default is [/home/ub64/.venv/gpu/lib/python3.6/site-packages]

Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: Y
Apache Ignite support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: N
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 10.0

Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.0

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.4.2

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]: /usr/local/cuda-10.0

Do you wish to build TensorFlow with TensorRT support? [y/N]: N
No TensorRT support will be enabled for TensorFlow.

Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 2.3.7

Please specify the location where NCCL 2 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]:/usr/local/cuda-10.0

NCCL found at /usr/local/cuda-10.0/lib64/libnccl.so.2
Assuming NCCL header path is /usr/local/cuda-10.0/lib64/../include/nccl.h
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]: 6.1

Do you want to use clang as CUDA compiler? [y/N]: N
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc

Do you wish to build TensorFlow with MPI support? [y/N]: N
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
Configuration finished
(gpu) ub64@hpz2:~/Code/github/tensorflow-gpu$
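
The interactive answers above can in principle be supplied through environment variables so that `./configure` does not prompt. The sketch below collects them in Python. The variable names are recalled from TensorFlow's `configure.py` and are assumptions here; check them against the `r1.12` checkout before relying on them. The helper name `configure_env` is hypothetical.

```python
import os

# Answers taken from the interactive session above.
# Variable names are assumptions; verify them in configure.py of the checkout.
ANSWERS = {
    "TF_ENABLE_XLA": "1",
    "TF_NEED_CUDA": "1",
    "TF_CUDA_VERSION": "10.0",
    "CUDA_TOOLKIT_PATH": "/usr/local/cuda-10.0",
    "TF_CUDNN_VERSION": "7.4.2",
    "CUDNN_INSTALL_PATH": "/usr/local/cuda-10.0",
    "TF_NCCL_VERSION": "2.3.7",
    "NCCL_INSTALL_PATH": "/usr/local/cuda-10.0",
    "TF_CUDA_COMPUTE_CAPABILITIES": "6.1",
    "TF_NEED_TENSORRT": "0",
    "TF_NEED_MPI": "0",
}


def configure_env(base_env=None):
    """Return a copy of the environment with the configure answers applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(ANSWERS)
    return env


# From inside the source tree, one could then run (not executed here):
#   subprocess.run(["./configure"], env=configure_env(), check=True)
for key in sorted(ANSWERS):
    print(key, "=", configure_env({})[key])
```

Any question without a matching variable would still be asked interactively, so a first run at a terminal is the safer way to confirm the set is complete.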

Build the source:

# CPU only
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
# GPU version with CUDA
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
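
The two commands differ only in the `--config=cuda` flag. When the build is driven from a script, that choice can be made explicit, as in this minimal Python sketch (the function name `bazel_build_command` is hypothetical; the flags and target are the ones shown above):

```python
BUILD_TARGET = "//tensorflow/tools/pip_package:build_pip_package"


def bazel_build_command(with_cuda):
    """Return the bazel command line as an argument list."""
    cmd = ["bazel", "build", "--config=opt"]
    if with_cuda:
        cmd.append("--config=cuda")  # GPU build
    cmd.append(BUILD_TARGET)
    return cmd


print(" ".join(bazel_build_command(with_cuda=False)))
print(" ".join(bazel_build_command(with_cuda=True)))
```

The returned list can be passed directly to `subprocess.run` from inside the source tree.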

Build the wheel package and install it:

./bazel-bin/tensorflow/tools/pip_package/build_pip_package tensorflow_pkg
pip install tensorflow_pkg/tensorflow-1.12.0-cp36-cp36m-linux_x86_64.whl
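
The wheel file name encodes which interpreter it was built for: `cp36` means CPython 3.6. If pip rejects the wheel as unsupported on this platform, comparing those tags with the active interpreter is a quick first check. The following Python sketch assumes the standard five-part wheel name without a build tag; the helper names are hypothetical.

```python
import sys


def parse_wheel_name(filename):
    """Split a wheel file name into its components (build tags not handled)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("unexpected wheel name: " + filename)
    keys = ("distribution", "version", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, parts))


def interpreter_tag():
    """CPython tag of the running interpreter, e.g. 'cp36' on Python 3.6."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


info = parse_wheel_name("tensorflow-1.12.0-cp36-cp36m-linux_x86_64.whl")
print(info)
print("matches this interpreter:", info["python_tag"] == interpreter_tag())
```

Run it with the Python from the virtual environment that will receive the wheel, since that is the interpreter pip checks against.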

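2 Installing with pip

Installing with pip, the package installer that ships with Python, is the more convenient route. According to the official documentation at the time of writing, Python 3 has reached Python 3.7, but TensorFlow only supports up to Python 3.6, so install Python 3.6. For installing and configuring pip, refer to here.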
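
The version limit can be checked before running pip. The sketch below compares the running interpreter with the upper bound stated in these notes. Only the upper bound is checked, because it is the only limit the notes give; the constant and function names are hypothetical.

```python
import sys

# Upper bound stated in these notes for the TensorFlow release in question.
MAX_SUPPORTED = (3, 6)


def is_supported(version=None):
    """True if the (major, minor) version does not exceed MAX_SUPPORTED."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) <= MAX_SUPPORTED


print("Python %d.%d supported: %s"
      % (sys.version_info[0], sys.version_info[1], is_supported()))
```

On Python 3.7 or newer this prints `False`, which matches the situation described above.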

2.1 Available packages

The official site lists the following packages:

tensorflow: Current release for CPU-only (recommended for beginners)
tensorflow-gpu: Current release with GPU support (Ubuntu and Windows)
tf-nightly: Nightly build for CPU-only (unstable)
tf-nightly-gpu: Nightly build with GPU support (unstable, Ubuntu and Windows)

Install with the following command:

pip install tensorflow

2.2 Upgrading the TensorFlow package

pip install --upgrade tensorflow

3 Test program

The official site provides an MNIST example that uses Keras:

# mnist.py
import tensorflow as tf
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

Run the file:

python mnist.py
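
For a GPU build it is also worth confirming that TensorFlow sees the GPU, since the MNIST script runs on the CPU without complaint if it does not. The sketch below filters a device list for GPU entries. `local_gpus` imports TensorFlow lazily and uses `device_lib.list_local_devices()`; the helper names and the stand-in `FakeDevice` objects are hypothetical and exist only so the filter can be tried without TensorFlow installed.

```python
from collections import namedtuple


def gpu_device_names(devices):
    """Return the names of devices whose device_type is 'GPU'."""
    return [d.name for d in devices if d.device_type == "GPU"]


def local_gpus():
    """Ask TensorFlow for the local devices and keep only the GPUs."""
    # Imported here so the filter above can be used without TensorFlow.
    from tensorflow.python.client import device_lib
    return gpu_device_names(device_lib.list_local_devices())


# Stand-in objects with the two attributes used by the filter.
FakeDevice = namedtuple("FakeDevice", ["name", "device_type"])
sample = [FakeDevice("/device:CPU:0", "CPU"),
          FakeDevice("/device:GPU:0", "GPU")]
print(gpu_device_names(sample))
```

Inside the virtual environment with the wheel installed, `print(local_gpus())` should list at least one device for a working GPU build.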

4 References

TensorFlow zh-site