Common TensorFlow Modules

pepure · August 9, 2020, 7:56am

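"The number of elements in the dataset is the size of the tensor's 0th position. A concrete example follows:"

The sentence above contains a typo: "0th position" (第 0 位) should read "0th dimension" (第 0 维).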
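To illustrate the corrected sentence, here is a minimal sketch using tf.data.Dataset.from_tensor_slices. The arrays X and Y are made-up toy data, not taken from the handbook; the point is only that slicing along dimension 0 yields one dataset element per row.

```python
import numpy as np
import tensorflow as tf

# Toy data (illustrative only): dimension 0 has size 3 for both arrays.
X = np.arange(12).reshape(3, 4)
Y = np.array([0, 1, 0])

# from_tensor_slices splits the inputs along dimension 0,
# so each (row of X, entry of Y) pair becomes one dataset element.
dataset = tf.data.Dataset.from_tensor_slices((X, Y))

# Count the elements by iterating over the dataset.
num_elements = sum(1 for _ in dataset)
print(num_elements, X.shape[0])
```

Both printed numbers should agree, since the element count follows the size of dimension 0.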

snowkylin · August 10, 2020, 4:07pm

Please post the specific code, and note that graph mode requires @tf.function. For reference, see https://github.com/snowkylin/tensorflow-handbook/blob/master/source/_static/code/zh/tools/tensorboard/grad_v2.py

snowkylin · August 10, 2020, 4:07pm

Thanks, this has been corrected.

Yeguiiren · August 11, 2020, 9:25am

Source code:

import tensorflow as tf
from B_MLP_CNN import MLP
from B_MLP_CNN import MNISTLoader

# Allocate GPU memory only as it is needed
gpus = tf.config.list_physical_devices(device_type='GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(device=gpu, enable=True)

num_batches = 1000
batch_size = 50
learning_rate = 0.001
log_dir = 'tensorboard'

model = MLP()
data_loader = MNISTLoader()
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
summary_writer = tf.summary.create_file_writer(log_dir)     # instantiate the summary writer
tf.summary.trace_on(graph=True, profiler=True)              # start the trace
for batch_index in range(num_batches):
    X, y = data_loader.get_batch(batch_size)
    with tf.GradientTape() as tape:
        y_pred = model(X)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true=y, y_pred=y_pred)
        loss = tf.reduce_mean(loss)
        print("batch %d: loss %f" % (batch_index, loss.numpy()))
        with summary_writer.as_default():                           # select the summary writer
            tf.summary.scalar("loss", loss, step=batch_index)       # record the current loss value
    grads = tape.gradient(loss, model.variables)
    optimizer.apply_gradients(grads_and_vars=zip(grads, model.variables))
with summary_writer.as_default():
    tf.summary.trace_export(name="model_trace", step=0, profiler_outdir=log_dir)    # save the trace to file

Error message:

tensorflow-gpu: 2.3.0, tensorboard: 2.3.0

snowkylin · August 11, 2020, 11:21am

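A computation graph is displayed only when the code runs in graph execution mode via @tf.function; the default eager execution mode has no computation graph. The handbook text also mentions this:

"If tf.function was used to build a computation graph, you can also click "Graphs" to view the graph structure."

For how to use @tf.function, see https://tf.wiki/zh_hans/basic/tools.html#tf-function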
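A minimal sketch of what this reply describes, assuming a toy linear model in place of the MLP and MNISTLoader classes from the question. The names w and train_step and the temporary log directory are illustrative and do not come from the handbook. The traced step is wrapped in @tf.function so that a graph exists for tf.summary.trace_export to write.

```python
import tempfile
import tensorflow as tf

# Toy stand-in for the model (assumption): a single weight matrix.
w = tf.Variable(tf.zeros([4, 1]))

@tf.function  # graph execution, so there is a graph to trace
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
    grad = tape.gradient(loss, w)
    w.assign_sub(0.1 * grad)  # plain gradient descent update
    return loss

log_dir = tempfile.mkdtemp()  # illustrative log directory
summary_writer = tf.summary.create_file_writer(log_dir)

tf.summary.trace_on(graph=True)  # start recording the graph
loss = train_step(tf.ones([8, 4]), tf.ones([8, 1]))
with summary_writer.as_default():
    tf.summary.trace_export(name="model_trace", step=0)  # write the trace
summary_writer.flush()

print("first-step loss:", float(loss))
```

Pointing TensorBoard at the log directory should then show the traced function under "Graphs". Profiling is left out here on purpose so the sketch only covers the graph trace.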

Yeguiiren · August 11, 2020, 11:47am

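Understood, thanks for the explanation. I also noticed that the transfer learning example in the "简单粗暴 TensorFlow 2" handbook scales the pixels of the MobileNetV2 input images to (0, 1), whereas the documentation on the official site says MobileNetV2 input pixels should lie in (-1, 1). How does this difference affect model performance?
Official site:

Handbook: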

snowkylin · August 13, 2020, 1:13am

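The example is mainly meant to show how TensorFlow is used, and these preprocessing details were indeed not fully thought through. You could try rescaling the input to [-1, 1] and see whether the results improve.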
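For concreteness, the two scalings under discussion can be sketched with NumPy as below. The function names scale_to_unit and scale_to_symmetric are hypothetical helpers written for this illustration, and the sample pixel values are made up; whether the second scaling actually improves accuracy is something the thread leaves to experiment.

```python
import numpy as np

def scale_to_unit(pixels):
    """Map 8-bit pixel values from [0, 255] to [0, 1]."""
    return pixels.astype(np.float32) / 255.0

def scale_to_symmetric(pixels):
    """Map 8-bit pixel values from [0, 255] to [-1, 1]."""
    return pixels.astype(np.float32) / 127.5 - 1.0

# Illustrative pixel values only.
pixels = np.array([0, 51, 255], dtype=np.uint8)
unit = scale_to_unit(pixels)
symmetric = scale_to_symmetric(pixels)
print(unit)
print(symmetric)
```

The two results differ by a fixed affine map (symmetric = 2 * unit - 1), so switching between them only requires changing the preprocessing step, not the model.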

Yeguiiren · August 13, 2020, 1:29am

OK, thanks for the explanation.

zhukewen1998 · August 31, 2020, 10:57am

zhukewen1998 · August 31, 2020, 11:00am

Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.

Try the new cross-platform PowerShell https://aka.ms/pscore6

PS C:\Users\Steve> conda activate base

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
If using 'conda activate' from a batch script, change your
invocation to 'CALL conda.bat activate'.

To initialize your shell, run

$ conda init <SHELL_NAME>

Currently supported shells are:

bash
cmd.exe
fish
zsh
powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

PS C:\Users\Steve> & C:/Users/Steve/Anaconda3/python.exe c:/Users/Steve/PycharmProjects/tensorflow-handbook-master/source/_static/code/zh/model/linear/linear.py
Traceback (most recent call last):
  File "c:/Users/Steve/PycharmProjects/tensorflow-handbook-master/source/_static/code/zh/model/linear/linear.py", line 1, in <module>
    import tensorflow as tf
  File "C:\Users\Steve\AppData\Roaming\Python\Python37\site-packages\tensorflow\__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "C:\Users\Steve\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\__init__.py", line 40, in <module>
    from tensorflow.python.eager import context
  File "C:\Users\Steve\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\eager\context.py", line 28, in <module>
    from absl import logging
ModuleNotFoundError: No module named 'absl'
PS C:\Users\Steve>

Why does running the code from the repository produce the error above?

snowkylin · August 31, 2020, 11:20am

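It looks like your conda activate base command did not run successfully. On Windows, you need to open "Anaconda Prompt" from the Start menu to enter the Anaconda command-line environment. Please refer to the chapter "TensorFlow Installation and Environment Configuration" to make sure TensorFlow is installed correctly.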

zhukewen1998 · August 31, 2020, 12:00pm

It works fine in PyCharm; only VS Code fails, even though VS Code is running in the tf2 environment under Anaconda.

snowkylin · August 31, 2020, 12:20pm

In that case, follow the hint in the terminal: run conda init powershell, then restart VS Code.
This handbook recommends PyCharm; I have not written many Python programs in VS Code myself.
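One way to check which environment the editor is really using is the short standard-library script below. It is a diagnostic sketch, not part of the handbook; describe_environment is a hypothetical helper name. In the log above, the interpreter is C:/Users/Steve/Anaconda3/python.exe while TensorFlow is loaded from AppData\Roaming\Python\Python37, which suggests the script did not run inside the tf2 environment.

```python
import importlib.util
import sys

def describe_environment(module_name="absl"):
    """Report the running interpreter and whether a module can be found."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,  # path of the Python binary in use
        "prefix": sys.prefix,           # root of the active environment
        "module_found": spec is not None,
    }

info = describe_environment("absl")
for key, value in info.items():
    print(f"{key}: {value}")
```

Running it from both PyCharm and the VS Code terminal and comparing the "interpreter" and "prefix" lines should show whether the two editors point at the same environment.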

sc-learner · September 29, 2020, 9:12am

Has the way the profiler is used changed again? I ran the MLP program with TensorBoard profiling as is, got the warnings below, and then no profile content showed up in TensorBoard.

WARNING:tensorflow:From /mnt/sdb1/miniconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/ops/summary_ops_v2.py:1259: stop (from tensorflow.python.eager.profiler) is deprecated and will be removed after 2020-07-01.
Instructions for updating:
use tf.profiler.experimental.stop instead.
2020-09-29 17:07:57.968717: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:223] GpuTracer has collected 0 callback api events and 0 activity events.
WARNING:tensorflow:From /mnt/sdb1/miniconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/ops/summary_ops_v2.py:1259: save (from tensorflow.python.eager.profiler) is deprecated and will be removed after 2020-07-01.
Instructions for updating:
tf.python.eager.profiler has deprecated, use tf.profiler instead.
WARNING:tensorflow:From /mnt/sdb1/miniconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/eager/profiler.py:151: maybe_create_event_file (from tensorflow.python.eager.profiler) is deprecated and will be removed after 2020-07-01.
Instructions for updating:
tf.python.eager.profiler has deprecated, use tf.profiler instead.

sc-learner · September 30, 2020, 4:46am

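I also ran the tf.data cats-vs-dogs example and tried all four settings (with and without multithreading and prefetch), but the speed was about the same in every case, nowhere near the difference described above. What could be the reason?
The PC I tested on has 32 GB of RAM and 8 GB of GPU memory.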

小秀才 · December 4, 2020, 7:53am

Does the book "简单粗暴 TensorFlow 2" apply to every version from TensorFlow 2.0 onward? The book uses "tf.train.tensorflow", but in my GPU build of TensorFlow 2.0 there is no tensorflow under train.

snowkylin · December 6, 2020, 5:23pm

You can check whether you ran

pip install -U tensorboard-plugin-profile

to install the Profile plugin for TensorBoard.
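The deprecation warnings quoted earlier in the thread point to tf.profiler.experimental as the replacement API. A minimal sketch of that API, assuming a toy matmul workload and a temporary log directory (both illustrative, as is the function name step), could look like this:

```python
import tempfile
import tensorflow as tf

log_dir = tempfile.mkdtemp()  # illustrative log directory

@tf.function
def step(x):
    # Toy workload (assumption): a matrix product reduced to a scalar.
    return tf.reduce_sum(tf.matmul(x, x))

tf.profiler.experimental.start(log_dir)  # begin collecting profile data
for _ in range(3):
    result = step(tf.ones([64, 64]))
tf.profiler.experimental.stop()          # stop and write the profile

print(float(result))
```

Viewing the result in TensorBoard still depends on the Profile plugin mentioned above being installed.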

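As for the speedup from parallelization, it may vary across hardware configurations. I suggest checking that the GPU environment is configured correctly.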

snowkylin · December 6, 2020, 5:24pm

I could not find "tf.train.tensorflow" anywhere in the book. If it is there, please point out the section and paragraph, or post a photo.

jjl001 · December 7, 2020, 9:35am

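I ran into two questions while writing my code and would like to ask for your advice.
1. Can prefetch use multiple CPU cores? When training on 2 GPUs, I see GPU utilization spike very high, up to 100%, but only briefly. Sometimes it drops to 0, and sometimes one card is high while the other is low. Is this caused by prefetch? CPU utilization is low, so I am wondering whether prefetch can run on multiple cores.
2. Using multiple GPUs raises an error: No OpKernel was registered to support Op 'NcclAllReduce' used by {{node Adam/NcclAllReduce}} with these attrs:[reduction='sum', shared_name='c1', T=DT_FLOAT, num_devices=2],
I am currently modifying the code following https://www.zhihu.com/question/356838795/answer/905231600, but although it now runs, the training loss is nan.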
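For readers hitting the same NcclAllReduce error, one commonly used workaround is to pass a non-NCCL cross-device op to tf.distribute.MirroredStrategy. The sketch below assumes that approach; it is not confirmed to be what the linked answer does, and it says nothing about the nan loss. The variable w and the function replica_sum are illustrative only.

```python
import tensorflow as tf

# Use HierarchicalCopyAllReduce instead of the default NCCL all-reduce.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

with strategy.scope():
    w = tf.Variable(1.0)  # toy variable, mirrored on every replica

@tf.function
def replica_sum():
    # Each replica computes w * 2; the results are summed across replicas.
    per_replica = strategy.run(lambda: w * 2.0)
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)

total = replica_sum()
print(strategy.num_replicas_in_sync, float(total))
```

On a machine with N visible GPUs the printed total should be 2 * N; with no GPU the strategy falls back to a single replica.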

snowkylin · December 7, 2020, 3:27pm

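As I understand it, prefetch mainly reads data ahead of time, so the bottleneck is disk I/O speed rather than computation.
I don't have much experience with the second issue. In general, there are fewer pitfalls when working on Linux.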
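On the first question, the two tf.data knobs do different jobs: prefetch overlaps data preparation with consumption, while parallel use of CPU cores comes from map with num_parallel_calls. A minimal sketch on a toy integer dataset follows; the preprocess function and the sizes are made up for illustration.

```python
import tensorflow as tf

def preprocess(x):
    # Toy per-element work (assumption): cast and rescale.
    return tf.cast(x, tf.float32) / 255.0

# tf.data.AUTOTUNE is the name in TF 2.4+;
# TF 2.3 exposes it as tf.data.experimental.AUTOTUNE.
dataset = (
    tf.data.Dataset.range(256)
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)  # parallel across CPU cores
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # prepare upcoming batches while the current one is consumed
)

batches = list(dataset)
print(len(batches), batches[0].shape)
```

With per-element work this cheap, no speedup should be expected; the parallel map only pays off when preprocessing (for example image decoding) is the slow part of the pipeline.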