九幽:
I think I finally understand how to write create_inference_graph. Looking back, my earlier questions were all somewhat beside the point: I never really understood how the inference graph should be created or what should go inside it. Thinking it over again today, I realized the training graph and the inference graph should be the same network. As you said, they are both produced by create_model, so they share the same structure. It's not that create_inference_graph contains the layer definitions, as I asked before; those live in create_model. What I actually didn't understand were the statements before create_model, and it turns out those are just data preprocessing, nothing to do with the network structure.
So in my mnist code I should directly call the build_network used for training, since that's where the network is defined: call build_network again in the inference-graph part, and hand the same output used in training to create_eval_graph.
2018-11-30 16:01
九幽:
I tried it out on mnist to check whether the above is right, but no matter how I change it I get errors. I put all the steps into one file, because splitting them like the speech example reports the same errors.
I think the overall structure is fine and the problem is some detail of the syntax. Ever since I wrote it, it keeps raising: ValueError: Training op found in graph, exiting {'ApplyGradientDescent'}
My guess was that I hadn't actually created two graphs, but when I did create two graphs, it complained that the tensors must come from the same graph...
Could you find time to take a look at the code? I don't know how to fix it anymore and I'm not sure what I wrote is right, so I'll paste it here:
import tensorflow as tf
import os.path
from tensorflow.python.framework import graph_util
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder("float", shape=[None, 784], name='input')
y = tf.placeholder("float", shape=[None, 10], name='labels')
keep_prob = tf.placeholder("float", name='keep_prob')

def build_network():
    def weight_variable(shape):
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)

    def bias_variable(shape):
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)

    # convolution and pooling
    def conv2d(x, W):
        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='VALID')

    def max_pool_2x2(x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

    # convolution layer
    def lenet5_layer(layer, weight, bias):
        W_conv = weight_variable(weight)
        b_conv = bias_variable(bias)
        h_conv = conv2d(layer, W_conv) + b_conv
        return max_pool_2x2(h_conv)

    # connected layer
    def dense_layer(layer, weight, bias):
        W_fc = weight_variable(weight)
        b_fc = bias_variable(bias)
        return tf.matmul(layer, W_fc) + b_fc

    # first layer
    with tf.name_scope('first') as scope:
        x_image = tf.pad(tf.reshape(x, [-1, 28, 28, 1]), [[0, 0], [2, 2], [2, 2], [0, 0]])
        firstlayer = lenet5_layer(x_image, [5, 5, 1, 6], [6])

    # second layer
    with tf.name_scope('second') as scope:
        secondlayer = lenet5_layer(firstlayer, [5, 5, 6, 16], [16])

    # third layer
    with tf.name_scope('third') as scope:
        W_conv3 = weight_variable([5, 5, 16, 120])
        b_conv3 = bias_variable([120])
        thirdlayerconv = conv2d(secondlayer, W_conv3) + b_conv3
        thirdlayer = tf.reshape(thirdlayerconv, [-1, 120])

    # dense layer1
    with tf.name_scope('dense1') as scope:
        dense_layer1 = dense_layer(thirdlayer, [120, 84], [84])

    # dense layer2
    with tf.name_scope('dense2') as scope:
        dense_layer2 = dense_layer(dense_layer1, [84, 10], [10])

    finaloutput = tf.nn.softmax(tf.nn.dropout(dense_layer2, keep_prob), name="softmax")
    print('finaloutput:', finaloutput)
    return finaloutput

def create_training_graph():
    # g = tf.get_default_graph()
    logits = build_network()
    # Create the back propagation and training evaluation machinery in the graph.
    with tf.name_scope('cross_entropy'):
        cross_entropy_mean = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))
        print('cost:', cross_entropy_mean)
    # if FLAGS.quantize:
    tf.contrib.quantize.create_training_graph(quant_delay=0)  # input_graph=g,
    optimize = tf.train.GradientDescentOptimizer(1e-5).minimize(cross_entropy_mean)
    prediction_labels = tf.argmax(logits, axis=1, name="output")
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    with tf.get_default_graph().name_scope('eval'):
        tf.summary.scalar('cross_entropy', cross_entropy_mean)
        tf.summary.scalar('accuracy', accuracy)
    # if is_training:
    return dict(
        x=x,
        y=y,
        keep_prob=keep_prob,
        optimize=optimize,
        cost=cross_entropy_mean,
        correct_prediction=correct_prediction,
        accuracy=accuracy,
    )

def train_network(graph):
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(init)
        for i in range(200):
            batch = mnist.train.next_batch(50)
            if i % 100 == 0:
                train_accuracy = sess.run([graph['accuracy']], feed_dict={
                    graph['x']: batch[0],
                    graph['y']: batch[1],
                    graph['keep_prob']: 1.0})
                print("step %d, training accuracy %g" % (i, train_accuracy[0]))
            sess.run([graph['optimize']], feed_dict={
                graph['x']: batch[0],
                graph['y']: batch[1],
                graph['keep_prob']: 0.5})
        test_accuracy = sess.run([graph['accuracy']], feed_dict={
            graph['x']: mnist.test.images,
            graph['y']: mnist.test.labels,
            graph['keep_prob']: 1.0})
        print("Test accuracy %g" % test_accuracy[0])
        saver.save(sess, '/home/angela/tensorflow/tensorflow/Mnist_train/mnist_fakequantize.ckpt')
        tf.train.write_graph(sess.graph_def, '/home/angela/tensorflow/tensorflow/Mnist_train/', 'mnist_fakequantize.pbtxt', True)

def main():
    g1 = create_training_graph()
    train_network(g1)
    sess = tf.InteractiveSession()
    g2 = tf.Graph()
    with g2.as_default():
        build_network()  # is_training=False
        # if FLAGS.quantize:
        tf.contrib.quantize.create_eval_graph()
    # load_variables_from_checkpoint(sess, '/home/angela/tensorflow/tensorflow/Mnist_train/mnist_fakequantize.ckpt')
    # Turn all the variables into inline constants inside the graph and save it.
    frozen_graph_def = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ['softmax'])
    tf.train.write_graph(
        frozen_graph_def,
        os.path.dirname('/home/angela/tensorflow/tensorflow/Mnist_train/mnist_frozen_graph.pb'),
        os.path.basename('/home/angela/tensorflow/tensorflow/Mnist_train/mnist_frozen_graph.pb'),
        as_text=False)
    tf.logging.info('Saved frozen graph to %s', '/home/angela/tensorflow/tensorflow/Mnist_train/mnist_frozen_graph.pb')

main()
2018-11-30 19:13
Zongjun: Reply to 九幽:
Sounds like you've got it now. The inference graph is produced by the same create_model function; of course the structure used for inference has to match the structure used for training — if the two graphs differed, the training would be useless. It's just that create_training_graph and create_eval_graph apply different rewrites to their respective graphs, which is why you still must create two separate graphs.
I skimmed your code. The source of the error you're hitting is this file: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/quantize/python/quantize_graph.py
So the error clearly comes from create_training_graph, create_eval_graph, or both. My suggestion: split it into two files like the speech example, each creating its own graph, and see whether the error comes from train or from eval. That makes it much easier to debug.
2018-12-1 01:49
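The two-file split Zongjun recommends can be sketched roughly as below. This is a minimal illustration, not the thread's actual code: build_network here is a one-layer stand-in for the LeNet-5 model, the checkpoint path is made up, and the tf.contrib.quantize calls are left as comments to mark where they would go (the sketch uses tf.compat.v1 so it also runs where contrib is unavailable):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

def build_network(x):
    # Stand-in for the real model; the SAME function must build both graphs
    # so their structures (and variable names) match.
    w = tf.Variable(tf.zeros([784, 10]), name='w')
    return tf.matmul(x, w, name='logits')

# --- "train" script: graph 1, gets the training rewrite and the optimizer ---
train_graph = tf.Graph()
with train_graph.as_default():
    x = tf.placeholder(tf.float32, [None, 784], name='input')
    logits = build_network(x)
    # tf.contrib.quantize.create_training_graph(quant_delay=0)  # rewrites THIS graph
    saver = tf.train.Saver()
    with tf.Session(graph=train_graph) as sess:
        sess.run(tf.global_variables_initializer())
        saver.save(sess, '/tmp/demo.ckpt')  # illustrative path

# --- "freeze" script: graph 2, same structure, but NO training ops in it ---
eval_graph = tf.Graph()
with eval_graph.as_default():
    x2 = tf.placeholder(tf.float32, [None, 784], name='input')
    logits2 = build_network(x2)
    # tf.contrib.quantize.create_eval_graph()  # must see a graph without training ops
    saver2 = tf.train.Saver()
    with tf.Session(graph=eval_graph) as sess:
        saver2.restore(sess, '/tmp/demo.ckpt')  # weights cross over via the checkpoint
        print('restored eval graph output:', logits2.name)
```

The key point is that the trained weights reach the eval graph only through the checkpoint, never by sharing tensors between the two graphs; "Training op found in graph" means create_eval_graph was handed a graph that already contains optimizer nodes.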
九幽: Reply to Zongjun:
Right — unlike before, I'm no longer stuck with no idea where to start; at least I can write something now. I also wrote the split version, and it reports the error mentioned above, the one about a training op already existing in the graph.
The location looks like create_eval_graph, because that part written on its own has the same structure as freeze.py, yet calling the network there trains it again. The if is_training switch I wrote had no effect; I probably wrote it wrong... As I recall, the split version behaves the same. I'll keep trying — the problem should be in how the inference graph calls the network structure.
2018-12-1 10:54
九幽: Reply to Zongjun:
With the two files split, training never errors, but freeze fails with:
ValueError: Tensor("first/Variable:0", shape=(5, 5, 1, 6), dtype=float32_ref) must be from the same graph as Tensor("Pad:0", shape=(?, 32, 32, 1), dtype=float32).
I already hit this float32_ref error once before, when converting to a pb model with the freeze tool. Does it mean the data type used in training differs from the one freeze uses?
2018-12-3 13:01
Zongjun: Reply to 九幽:
I haven't hit this error myself; it looks related to your model. You need to make sure the two graphs are completely separate, and that their structures are identical (apart from training-only nodes such as dropout). A quick search turned up a similar question on Stack Overflow that might help: python 2.7 - ValueError: Tensor must be from the same graph as Tensor - Stack Overflow
2018-12-4 01:53
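A plausible cause of this error in the code pasted above: the placeholders x, y, keep_prob are created at module level, so they live in the default graph; when build_network() is later called inside `with g2.as_default():`, the new graph's variables get combined with the default graph's placeholders. A minimal reproduction of the error (shapes are illustrative; tf.compat.v1 is used so it also runs under TF 2.x):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# A module-level placeholder belongs to the *default* graph.
x = tf.placeholder(tf.float32, [None, 784], name='input')

g2 = tf.Graph()
msg = ''
with g2.as_default():
    # This variable belongs to g2, not to the default graph.
    w = tf.Variable(tf.zeros([784, 10]))
    try:
        tf.matmul(x, w)  # mixes tensors from two different graphs
    except ValueError as e:
        msg = str(e)

print(msg)  # "... must be from the same graph as ..."
```

The fix matching what the thread arrives at: create the placeholders inside build_network (or inside the same `with graph.as_default():` block), so that each graph owns every one of its own tensors.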
九幽: Reply to Zongjun:
Thanks — after that change the error is indeed gone. My split version follows the speech example exactly, same order and structure, copied over as-is. But now running freeze retrains the model; debugging shows it stops at create_eval_graph and never saves the final pb file...
In freeze I only changed create_inference_graph, to:
logits = mnist_fakequantize.build_network(is_training=False)
tf.nn.softmax(logits, name='output')
It only calls build_network, and build_network was changed to drop the optimize part at the end and stop at the finaloutput line. That's all, and I don't know why it won't save...
As for training again: maybe the training switch isn't set up right? But if it only calls build_network, why would it train at all? I can't figure it out.
Lastly, mnist originally used AdamOptimizer and reached 0.9+ accuracy within a few hundred steps, but after switching to GradientDescentOptimizer the accuracy is still very low after twenty thousand steps — that's the only change I made. I've seen people online switch optimizers and still get 0.9+, so I don't know why mine is so low and never converges.
2018-12-4 16:11
九幽: Reply to Zongjun:
The repeated-training problem is fixed — so happy~ It turned out build_network had to be pulled out on its own: I rewrote it as a separate file and called it from there, and now it works. The pb graph and the tflite-converted graph both look right now~
About AdamOptimizer vs GradientDescentOptimizer — is either fine, choosing whichever optimizer suits the situation?
But what explains the huge accuracy gap between the two in my case?
2018-12-4 17:44
Zongjun: Reply to 九幽:
Haha, congratulations on getting through the whole pipeline. I'd stick with mnist's original optimizer; for the details, see the Adam paper. Adam, if I remember right, keeps moving averages (per-parameter adaptive steps). With plain gradient descent you have to tune the learning rate yourself, and a learning rate that's too large won't converge. Since Adam gives you normal accuracy, just keep using it — it's mnist's default anyway.
2018-12-5 02:25
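Zongjun's point about the learning rate can be seen without TensorFlow at all. A toy sketch: plain gradient descent on f(w) = w², whose gradient is 2w, so the update is w ← w − lr·2w. The step sizes below are made up purely for illustration:

```python
# Gradient descent on f(w) = w**2. Each step does w <- (1 - 2*lr) * w,
# so the iterates shrink only when |1 - 2*lr| < 1, i.e. 0 < lr < 1.
def gradient_descent(lr, steps=50, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w  # w <- w - lr * f'(w)
    return w

print('lr=0.1:', gradient_descent(0.1))  # shrinks toward 0 (converges)
print('lr=1.1:', gradient_descent(1.1))  # magnitude grows each step (diverges)
```

With Adam the per-parameter adaptive step largely removes this tuning, which matches the advice above to keep mnist's default AdamOptimizer; with GradientDescentOptimizer at a fixed 1e-5, progress can simply be too slow to reach 0.9 in the given number of steps.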
Compiled from the comments on the original post.