Merge pull request #1178 from CDL-Dryad/ror-funder-test
For examining crossref funders vs ror mapping
sfisher authored Apr 24, 2023
2 parents 7e06f20 + 111c282 commit dde74e1
Showing 4 changed files with 87 additions and 0 deletions.
5 changes: 5 additions & 0 deletions app/models/stash_engine/xref_funder_to_ror.rb
@@ -0,0 +1,5 @@
module StashEngine
  class XrefFunderToRor < ApplicationRecord
    self.table_name = 'stash_engine_xref_funder_to_rors'
  end
end
8 changes: 8 additions & 0 deletions db/migrate/20230419184528_create_xref_funder_to_rors.rb
@@ -0,0 +1,8 @@
class CreateXrefFunderToRors < ActiveRecord::Migration[6.1]
  def change
    create_table :stash_engine_xref_funder_to_rors do |t|
      t.string :xref_id, index: true
      t.string :ror_id, index: true
    end
  end
end
35 changes: 35 additions & 0 deletions documentation/sql_queries/fundref_to_ror_comparisons.md
@@ -0,0 +1,35 @@
# Crossref Funder ID to ROR comparison

## Evaluating the mapping of Crossref Funder IDs to ROR IDs

To evaluate how well Crossref Funder IDs map to ROR IDs, it is easiest to load the mapping into
our database so we can join and compare the identifiers.

Run the rake task below to populate the `stash_engine_xref_funder_to_rors` table with the mapping (the task truncates the table before importing):
```bash
RAILS_ENV=<environment> bundle exec rails affiliation_import:populate_funder_ror_mapping <path/to/ror/dump.json>
```
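
For orientation, here is a minimal, self-contained sketch of the extraction the task performs: it reads each organization's FundRef IDs from `external_ids.FundRef.all` and stores them with the `http://dx.doi.org/10.13039/` prefix so they can be joined against `name_identifier_id`. The sample records, the constant names, and the `funder_ror_rows` helper are hypothetical and only illustrate the shape of the data; the sketch does not touch the database.

```ruby
require 'json'

# Hypothetical sample in the shape the task reads from the ROR dump.
# The ids and names are made up for illustration.
SAMPLE_DUMP = <<~JSON
  [
    { "id": "https://ror.org/00example1", "name": "Example Funder",
      "external_ids": { "FundRef": { "preferred": null, "all": ["100000001", "100000002"] } } },
    { "id": "https://ror.org/00example2", "name": "Org Without Funder ID",
      "external_ids": {} }
  ]
JSON

# Prefix the task prepends to every FundRef ID before inserting it.
XREF_PREFIX = 'http://dx.doi.org/10.13039/'

# Hypothetical helper: returns rows shaped like the mapping table's columns.
def funder_ror_rows(json_text)
  mapping = {}
  JSON.parse(json_text).each do |org|
    fundref_ids = org.dig('external_ids', 'FundRef', 'all')
    next if fundref_ids.nil? || fundref_ids.empty?

    # As in the task, a later organization overwrites an earlier one
    # when both list the same FundRef ID.
    fundref_ids.each { |fundref_id| mapping[fundref_id] = org['id'] }
  end
  mapping.map { |fundref_id, ror_id| { xref_id: "#{XREF_PREFIX}#{fundref_id}", ror_id: ror_id } }
end

rows = funder_ror_rows(SAMPLE_DUMP)
rows.each { |row| puts "#{row[:xref_id]} -> #{row[:ror_id]}" }
```

Organizations without FundRef IDs produce no rows, which is why they never appear on the ROR side of the joins in the queries.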

After the import, run the following query to see how the funders in the database map to ROR IDs and
to check that the names look sane wherever a mapping exists.

```sql
SELECT DISTINCT c.contributor_name AS xref_name, x.`xref_id`, x.ror_id, r.name AS ror_name
FROM dcs_contributors c
JOIN `stash_engine_xref_funder_to_rors` x
  ON c.name_identifier_id = x.`xref_id`
JOIN stash_engine_ror_orgs r
  ON x.ror_id = r.ror_id
WHERE c.identifier_type = 'crossref_funder_id' AND c.contributor_type = 'funder';
```
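
If the result set is too long to eyeball, a rough name comparison can narrow it down. The sketch below is one possible approach, not part of the commit: it assumes the query results have been exported as `[xref_name, ror_name]` pairs, and the `normalize_name` and `names_agree?` helpers and the sample rows are hypothetical.

```ruby
# Normalize a name for rough comparison: lowercase, drop punctuation,
# collapse runs of whitespace.
def normalize_name(name)
  name.to_s.downcase.gsub(/[^[:alnum:]\s]/, '').split.join(' ')
end

# True when the two names are identical after normalization.
def names_agree?(xref_name, ror_name)
  normalize_name(xref_name) == normalize_name(ror_name)
end

# Hypothetical [xref_name, ror_name] pairs, made up for illustration.
rows = [
  ['Example Science Foundation', 'example science foundation'],
  ['E.X. Research Council', 'EX Research Council'],
  ['Example Trust', 'Sample University']
]

# Pairs whose names differ and deserve a manual look.
suspicious = rows.reject { |xref_name, ror_name| names_agree?(xref_name, ror_name) }
suspicious.each { |xref_name, ror_name| puts "#{xref_name} <> #{ror_name}" }
```

An exact match after normalization is deliberately strict, so legitimate variants (abbreviations, translated names) will also be flagged and still need human review.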

To see the funder identifiers in the database that have no matching ROR ID:

```sql
SELECT DISTINCT c.contributor_name AS xref_name, c.name_identifier_id
FROM dcs_contributors c
LEFT JOIN stash_engine_xref_funder_to_rors x
  ON c.name_identifier_id = x.xref_id
WHERE c.identifier_type = 'crossref_funder_id' AND c.contributor_type = 'funder'
  AND c.name_identifier_id <> '' AND x.ror_id IS NULL;
```
39 changes: 39 additions & 0 deletions lib/tasks/affiliation_import.rake
@@ -95,6 +95,45 @@ namespace :affiliation_import do
    puts "DONE! Elapsed time: #{Time.at(Time.now - start_time).utc.strftime('%H:%M:%S')}"
  end

  desc 'Populate fundref_id to ror_id mapping table'
  task populate_funder_ror_mapping: :environment do
    $stdout.sync = true # keeps stdout from buffering which causes weird delays such as with tail -f

    if ARGV.length != 1
      puts 'Please enter the path to the ROR dump json file as an argument'
      exit
    end

    ror_dump_file = ARGV[0]
    exit unless File.exist?(ror_dump_file)

    ActiveRecord::Base.connection.truncate(StashEngine::XrefFunderToRor.table_name)
    fundref_ror_mapping = {}
    File.open(ror_dump_file, 'r') do |f|
      data = JSON.parse(f.read)
      data.each do |org|
        ror_id = org['id']
        fundref_ids = org.dig('external_ids', 'FundRef', 'all')
        next if fundref_ids.blank?

        fundref_ids.each do |fundref_id|
          fundref_ror_mapping[fundref_id] = ror_id
        end
      end
    end

    to_insert = []
    fundref_ror_mapping.each_with_index do |(fundref_id, ror_id), index|
      to_insert << { xref_id: "http://dx.doi.org/10.13039/#{fundref_id}", ror_id: ror_id }
      if index % 1000 == 0 && index > 0
        StashEngine::XrefFunderToRor.insert_all(to_insert)
        to_insert = []
      end
    end
    StashEngine::XrefFunderToRor.insert_all(to_insert) unless to_insert.empty?
    puts 'Done updating fundref to ror mapping table'
  end

  def do_author_merge(a1, a2)
    # keep the text of the name that is longest, but
    # keep the affiliation for the author that was updated most recently