inference error Assign requires shapes of both tensors to match. lhs shape= [5472,410] rhs shape= [84827,410] #15

Open
lk1983823 opened this issue Aug 27, 2019 · 7 comments

Comments

@lk1983823

I trained on the Poetry data. After training finished, running inference with the saved model produced the following error:

Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from EXP-poetry_mem50/model-2000.ckpt
Traceback (most recent call last):
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [5472,410] rhs shape= [84827,410]
	 [[{{node save/Assign}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1276, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [5472,410] rhs shape= [84827,410]
	 [[node save/Assign (defined at train_gpu.py:571) ]]

Caused by op 'save/Assign', defined at:
  File "train_gpu.py", line 740, in <module>
    tf.app.run()
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "train_gpu.py", line 501, in main
    inference(n_token, cutoffs, "/gpu:0")
  File "train_gpu.py", line 571, in inference
    saver = tf.train.Saver()
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 832, in __init__
    self.build()
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 844, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 881, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 513, in _build_internal
    restore_sequentially, reshape)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 354, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 73, in restore
    self.op.get_shape().is_fully_defined())
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 223, in assign
    validate_shape=validate_shape)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 64, in assign
    use_locking=use_locking, name=name)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [5472,410] rhs shape= [84827,410]
	 [[node save/Assign (defined at train_gpu.py:571) ]]


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_gpu.py", line 740, in <module>
    tf.app.run()
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "train_gpu.py", line 501, in main
    inference(n_token, cutoffs, "/gpu:0")
  File "train_gpu.py", line 583, in inference
    saver.restore(sess, eval_ckpt_path)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1312, in restore
    err, "a mismatch between the current graph and the graph")
tensorflow.python.framework.errors_impl.InvalidArgumentError: Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [5472,410] rhs shape= [84827,410]
	 [[node save/Assign (defined at train_gpu.py:571) ]]

Caused by op 'save/Assign', defined at:
  File "train_gpu.py", line 740, in <module>
    tf.app.run()
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "train_gpu.py", line 501, in main
    inference(n_token, cutoffs, "/gpu:0")
  File "train_gpu.py", line 571, in inference
    saver = tf.train.Saver()
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 832, in __init__
    self.build()
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 844, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 881, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 513, in _build_internal
    restore_sequentially, reshape)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 354, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 73, in restore
    self.op.get_shape().is_fully_defined())
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 223, in assign
    validate_shape=validate_shape)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 64, in assign
    use_locking=use_locking, name=name)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/lukuan/.pyenv/versions/anaconda3-5.0.1/envs/lk_TC/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [5472,410] rhs shape= [84827,410]
	 [[node save/Assign (defined at train_gpu.py:571) ]]

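During training, because GPU memory was limited, I changed the model parameters to the following:
N_LAYER=2 (fewer model layers)
BSZ=64, TGT_LEN=100 (so that /data/poetry/record_info-train.bsz-64.tlen-100.json could be found)
train_steps=1000 (to see validation results sooner)
save_steps=400
For inference, I changed line 504 of train_gpu.py, inside the main function, to dataset_name = "poetry",
but I still get the error above.
Could the parameter changes listed above be the cause? Thanks.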

@lk1983823
Author

I ran into the same problem when training on doupo.

@Yuanchenbo

To the two posters above: has this problem been resolved?

@976985088

I trained on doupo without changing any parameters, and inference fails with the same error.

@ZJiaBin

ZJiaBin commented Sep 26, 2019

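I ran into a similar problem and found that it was caused by a vocabulary mismatch. In train_gpu.py, check whether the dataset_name used in the inference function matches the one used in the train function.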
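
For anyone who wants to confirm a vocabulary mismatch before digging further, here is a minimal sketch that compares variable shapes in the current graph against those stored in the checkpoint. In the error above, the lhs shape [5472,410] should be the variable in the inference graph and the rhs shape [84827,410] the value stored in the checkpoint, so the first dimension (presumably the vocabulary size) is what differs. The function name and the variable name `embedding/lookup_table` are hypothetical. In TF 1.x the two dictionaries could likely be filled from `tf.train.list_variables(checkpoint_path)` and from `tf.global_variables()` via `v.shape.as_list()`, but the sketch itself uses plain Python so that it runs without TensorFlow.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    graph_shapes: Dict[str, Shape],
    ckpt_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, graph_shape, ckpt_shape) for variables whose shapes differ."""
    mismatches = []
    for name, graph_shape in sorted(graph_shapes.items()):
        ckpt_shape = ckpt_shapes.get(name)
        # A variable missing from the checkpoint is a different problem; skip it here.
        if ckpt_shape is not None and tuple(ckpt_shape) != tuple(graph_shape):
            mismatches.append((name, tuple(graph_shape), tuple(ckpt_shape)))
    return mismatches


# Shapes copied from the error message; the variable name is hypothetical.
graph_shapes = {"embedding/lookup_table": (5472, 410)}
ckpt_shapes = {"embedding/lookup_table": (84827, 410)}

for name, in_graph, in_ckpt in find_shape_mismatches(graph_shapes, ckpt_shapes):
    print(f"{name}: graph {in_graph} vs checkpoint {in_ckpt}")
```

If only the first dimension of an embedding or softmax variable differs, the graph was most likely built from a different vocabulary than the one used during training.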

@scorpio2017

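> I ran into a similar problem and found that it was caused by a vocabulary mismatch. In train_gpu.py, check whether the dataset_name used in the inference function matches the one used in the train function.

What do you mean? Match what in the train function?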

@GaoPeng97
Owner

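Sorry for the late reply, everyone. Please check whether you noticed this note in the README: "Note that for inference you need to modify line 504 of train_gpu.py and change it to the name of the dataset you want to run inference on."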

@GaoPeng97
Owner

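I'm not sure whether you are all still following this repo, and I apologize for replying so late. The problem has been fixed; thank you for the feedback. The cause was that data/doupo/ previously contained a JSON file recording corpus information and a cache.pkl. When you run bash base_doupo_gpu.sh train_data and cache.pkl is detected, the training data is not regenerated and the old, faulty cache is used directly. The simple fix is to delete everything in that directory except the *.txt files.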
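
The cleanup described above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name and its `dry_run` flag are hypothetical and not part of the repo, and it only touches regular files directly under the given directory, leaving subdirectories alone.

```python
from pathlib import Path
from typing import List


def remove_generated_files(
    data_dir: str, keep_suffix: str = ".txt", dry_run: bool = True
) -> List[Path]:
    """List (and optionally delete) files in data_dir that do not end in keep_suffix."""
    targets = sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and p.suffix != keep_suffix
    )
    if not dry_run:
        for path in targets:
            # Removes stale artifacts such as cache.pkl and the corpus-info JSON.
            path.unlink()
    return targets
```

Calling `remove_generated_files("data/doupo")` first returns the files that would be removed without deleting anything; passing `dry_run=False` performs the deletion. After that, rerunning bash base_doupo_gpu.sh train_data should regenerate the training data from the *.txt files.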
