How can i get entity vecs given specified entity set #23

Open
Cantoria opened this issue Mar 27, 2019 · 4 comments
Comments

@Cantoria

Hi, I read your code and I know I can get all entity vecs by changing the -entities flag of learn_a.lua. I don't need such a big set of vecs. How can I train entity vecs for a specified entity set only? Thanks.

@Cantoria
Author

By the way, when I run step 9 (I didn't run the earlier steps, but I've downloaded all the files from polybox), I get this error:

==> Loading entity wikiid - name map
  ---> t7 file NOT found. Loading from disk (slower). Out f = /home/xuhongbo/syh/syh/deep-ed/data/generated/ent_name_id_map.t7
==> Loading disambiguation index
    Done loading disambiguation index
    Still loading entity wikiid - name map ...
/home/xuhongbo/torch/install/bin/lua: ...me/xuhongbo/torch/install/share/lua/5.1/tds/hash.lua:108: bad argument #1 to 'pairs' (table expected, got userdata)
stack traceback:
    [C]: in function 'pairs'
    ...me/xuhongbo/torch/install/share/lua/5.1/tds/hash.lua:108: in function 'write'
    .../xuhongbo/torch/install/share/lua/5.1/torch/File.lua:210: in function 'writeObject'
    .../xuhongbo/torch/install/share/lua/5.1/torch/File.lua:388: in function 'save'
    entities/ent_name2id_freq/ent_name_id.lua:76: in main chunk
    [C]: in function 'dofile'
    entities/ent_name2id_freq/e_freq_gen.lua:16: in main chunk
    [C]: in function 'dofile'
    .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: ?

It seems that something went wrong while generating ent_name_id_map.t7: I ended up with an ent_name_id_map.t7 in the generated directory that is only 35 B. I really don't know Lua, so please tell me what's wrong. Thanks!

@octavian-ganea
Contributor

Hi. The set of entities for which the current code trains entity embeddings is defined here:
https://github.com/dalab/deep-ed/blob/master/entities/relatedness/relatedness.lua#L253-L328

You would have to modify this code to train with a different set of entities.

As for your error, I am not sure. Try deleting your ent_name_id_map.t7 and redoing that step. These t7 files are not rewritten when you change code or data, so they have to be deleted manually and then regenerated.
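
Since a stale or truncated t7 file is only regenerated once it is gone, the deletion step can be scripted. Below is a minimal sketch in plain Lua; `remove_stale_cache` is a hypothetical helper name, not something that exists in deep-ed, and the example path is the one that appears in the log above.

```lua
-- Hypothetical helper (not part of deep-ed): delete a cached .t7 file so that
-- the next run regenerates it instead of loading stale or truncated contents.
local function remove_stale_cache(path)
  local f = io.open(path, 'rb')
  if not f then
    return false  -- no such file, nothing to delete
  end
  f:close()
  local ok, err = os.remove(path)
  if not ok then
    error('could not remove ' .. path .. ': ' .. tostring(err))
  end
  return true
end

-- Example call; adjust the path to your own data directory.
-- remove_stale_cache('data/generated/ent_name_id_map.t7')
```

The helper returns `true` when a file was deleted and `false` when there was nothing to delete, so it is safe to call before every regeneration run.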

@Cantoria
Author

Cantoria commented Apr 13, 2019

https://github.com/dalab/deep-ed/blob/master/entities/relatedness/relatedness.lua#L253-L328

Hi, I've found out why the error happens. I was using Lua 5.1, and it doesn't support Torch. So I installed Lua 5.3, and now it works.
Besides, for the first question, I've modified the code so that I can train the ent vecs on a specific entity set. But I got some t7 files in the ./data/generated/ent_vecs path. Here is the modified code (in the main code, formerly lines 253-328):

if not paths.filep(rewtr_t7filename) then
  print('  ---> t7 file NOT found. Loading reltd_ents_wikiid_to_rltdid from txt file instead (slower).')

  -- Gather the restricted set of entities for which we train entity embeddings:
  local rltd_all_ent_wikiids = tds.Hash()

  -- 1) From the relatedness dataset
  for ent_wikiid,_ in pairs(reltd_ents_direct_validate) do
    rltd_all_ent_wikiids[ent_wikiid] = 1
  end
  for ent_wikiid,_ in pairs(reltd_ents_direct_test) do
    rltd_all_ent_wikiids[ent_wikiid] = 1
  end

  -- 1.1) From a small dataset (used for debugging / unit testing).
  for _,line in pairs(ent_lines_4EX) do
    local parts = split(line, '\t')
    assert(table_len(parts) == 3)
    ent_wikiid = tonumber(parts[1])
    assert(ent_wikiid)
    rltd_all_ent_wikiids[ent_wikiid] = 1
  end

  -- 2) From all ED datasets: (I've deleted this part)
  -- 3) From a specific entity set (added code); the file lists one entity name per line.
  local specific_entity_file = opt.root_data_dir .. 'basic_data/specific_entity_file'
  if not paths.filep(specific_entity_file) then
    print('No specific entity file!')
  else
    dofile 'entities/ent_name2id_freq/ent_name_id.lua'
    -- io.lines advances one line per iteration and closes the file at EOF
    -- (the earlier while loop never read the next line, so it never ended).
    for line in io.lines(specific_entity_file) do
      local ent_wikiid = e_id_name.ent_name2wikiid[line]
      -- Skip names that are missing from the name -> wikiid map.
      if ent_wikiid then
        rltd_all_ent_wikiids[ent_wikiid] = 1
      end
    end
  end
  -- The code below is unchanged.
  -- Insert unk_ent_wikiid
  local unk_ent_wikiid = 1
  rltd_all_ent_wikiids[unk_ent_wikiid] = 1
  
  -- Sort all wikiids
  local sorted_rltd_all_ent_wikiids = tds.Vec()
  for ent_wikiid,_ in pairs(rltd_all_ent_wikiids) do
    sorted_rltd_all_ent_wikiids:insert(ent_wikiid)
  end
  sorted_rltd_all_ent_wikiids:sort(function(a,b) return a < b end)
  
  local reltd_ents_wikiid_to_rltdid = tds.Hash()
  for rltd_id,wikiid in pairs(sorted_rltd_all_ent_wikiids) do
    reltd_ents_wikiid_to_rltdid[wikiid] = rltd_id
  end
  
  rewtr = tds.Hash()
  rewtr.reltd_ents_wikiid_to_rltdid = reltd_ents_wikiid_to_rltdid
  rewtr.reltd_ents_rltdid_to_wikiid = sorted_rltd_all_ent_wikiids
  rewtr.num_rltd_ents = #sorted_rltd_all_ent_wikiids

  print('Writing reltd_ents_wikiid_to_rltdid to t7 File for future usage.')
  torch.save(rewtr_t7filename, rewtr)
  print('    Done saving.')

Is that correct? (The specific entity file lists one entity name per line.)
Also, I noticed you added a small dataset in steps 1 and 1.1. Can I remove this step? If I can't, does the small dataset influence the final entity vecs?
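
For reference, the file-reading part of the change can be tried in isolation. The following is a minimal sketch in plain Lua, without Torch: `collect_wikiids` is a hypothetical helper, and its `name2wikiid` argument is an ordinary table standing in for `e_id_name.ent_name2wikiid`. It assumes one entity name per line and reports names that have no ID instead of failing on them.

```lua
-- Hypothetical helper: collect the wiki IDs of the entity names listed in a
-- file, one name per line. `name2wikiid` stands in for
-- e_id_name.ent_name2wikiid (an assumption for illustration).
local function collect_wikiids(path, name2wikiid)
  local wikiids, missing = {}, {}
  for line in io.lines(path) do
    -- Trim surrounding whitespace, including a trailing '\r' from Windows files.
    local name = line:match('^%s*(.-)%s*$')
    if name ~= '' then
      local wikiid = name2wikiid[name]
      if wikiid then
        wikiids[wikiid] = 1
      else
        missing[#missing + 1] = name  -- keep track of unknown names
      end
    end
  end
  return wikiids, missing
end
```

Returning the unknown names separately makes it easy to spot spelling or encoding mismatches between the entity file and the name map before training starts.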

@octavian-ganea
Contributor

Thanks for your input.

Yes, the small dataset in 1.1 can be removed. It was just for debugging (containing < 10 entities, if I recall correctly).

To access the specific entity vectors, you first have to load the t7 file via https://github.com/dalab/deep-ed/blob/master/entities/relatedness/relatedness.lua#L331 and then look the entities up using the dictionaries in the rewtr hashtable object:
https://github.com/dalab/deep-ed/blob/master/entities/relatedness/relatedness.lua#L321-L324
Given the wiki ID of an entity, you first find its rltdid using rewtr.reltd_ents_wikiid_to_rltdid[your_wiki_id], and then you access its embedding as row rltdid of the entity embedding tensor (from the t7 file). See an example here: https://github.com/dalab/deep-ed/blob/master/entities/pretrained_e2v/e2v.lua#L3-L28 . Sorry, this code could have been made easier ...
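
The two-step lookup described above can be sketched as follows. This is plain Lua with toy data, not the repository's code: `get_entity_vec` is a hypothetical helper, the IDs and vectors are made up, and ordinary tables stand in for the `rewtr` hashtable and the embedding tensor that would normally be loaded from the t7 files.

```lua
-- Hypothetical helper: map a wiki ID to its rltdid, then return that row of
-- the embedding matrix. Returns nil for entities outside the trained set.
local function get_entity_vec(rewtr, ent_vecs, wikiid)
  local rltdid = rewtr.reltd_ents_wikiid_to_rltdid[wikiid]
  if not rltdid then
    return nil
  end
  return ent_vecs[rltdid]
end

-- Toy stand-ins (made-up values): three entities with 2-dimensional vectors.
local rewtr = { reltd_ents_wikiid_to_rltdid = { [1] = 1, [12] = 2, [736] = 3 } }
local ent_vecs = { { 0.0, 0.0 }, { 0.1, 0.2 }, { 0.3, 0.4 } }

local vec = get_entity_vec(rewtr, ent_vecs, 736)
print(vec[1], vec[2])
```

With the real data, `ent_vecs` would be a Torch tensor, where indexing with `ent_vecs[rltdid]` likewise selects a single row.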
