feat(openalex): Sort affiliations by number of works
annelhote committed Jan 17, 2025
1 parent cee5e6d commit 7ad877b
Showing 5 changed files with 8 additions and 7 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1,4 +1,4 @@
-# Works magnet
+# Works-magnet

[![Discord Follow](https://dcbadge.vercel.app/api/server/TudsqDqTqb?style=flat)](https://discord.gg/TudsqDqTqb)
![license](https://img.shields.io/github/license/dataesr/works-magnet)
2 changes: 1 addition & 1 deletion client/index.html
@@ -7,7 +7,7 @@
<meta name="google-site-verification" content="_w-F9sijoMQg6zOyO8yiJOAm_ZYxQ720ysRRq9K2psM" />
<meta name="google-site-verification" content="6hRuX0N3vV6ahdIot4Od8sI5aABG9Q1yYN-R8FpSy5w" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
-<title>Works magnet</title>
+<title>Works-magnet</title>
</head>

<body>
8 changes: 4 additions & 4 deletions client/src/i18n/en.json
@@ -1,7 +1,7 @@
{
"corpus-title": "Build up a corpus of publications or datasets",
"feedback-description-1": "AI techniques are used on a large scale on the corpus of publications. In particular, OpenAlex associates affiliation signatures with ROR identifiers. Errors and omissions can occur, distorting the results obtained from OpenAlex data, such as the (open) Leiden ranking. Helping to correct these data improves the reliability and representativeness of these analyses.",
-"feedback-description-2": "Similarly, tools are used to detect mentions of software and data sets in the full text of publications. Here too, these detections may contain errors. The works-magnet can be used to explore all the mentions detected in the French corpus and to report any errors. These reports will be added to the learning database, helping to train new, more reliable detection models.",
+"feedback-description-2": "Similarly, tools are used to detect mentions of software and data sets in the full text of publications. Here too, these detections may contain errors. The Works-magnet can be used to explore all the mentions detected in the French corpus and to report any errors. These reports will be added to the learning database, helping to train new, more reliable detection models.",
"feedback-title": "Improve the automatic detections made by AI",
"datasets-tile-title": "🗃 Find the datasets affiliated to your institution",
"datasets-tile-detail-1": "🔎 Explore the most frequent raw affiliation strings retrieved in the French Open Science Monitor data and in OpenAlex for your query (datasets only).",
@@ -21,7 +21,7 @@
"publications-tile-detail-3": "💾 Save (export to a file) those decisions and the publications corpus you just built.",
"tagline": "Retrieve and promote the scholarly works of your institution.",
"about-title": "Works-magnet : a tool for metadata specialists",
-"about-1": "The works-magnet offers a number of curation functions. The tool is therefore aimed at specialists who can judge the quality of metadata. The tool interrogates various massive databases (OpenAlex, BSO, Datacite) and formats the information to facilitate exploration and correction where necessary. These massive databases each use large-scale automatic processing tools. An expert curatorial eye is needed to continue to improve the quality of the metadata. As far as possible, the corrections resulting from these curations are fed back upstream to ensure that the same errors are not repeated. The aim is to build up a set of high-quality metadata that can be re-used by anyone who wishes to do so. The aim is to propose a new curation paradigm: curation is not carried out in a two-way relationship between a producer and a consumer of data. Instead, a group of data users propose corrections that are visible to all, and that can benefit everyone. As well as correcting the cases reported, in some cases the data collected can also be re-used as learning data for algorithms, so that the models used in the future are more effective.",
-"about-2": "One of the major features of the works-magnet is the improvement of affiliation metadata. This is of vital importance in the bibliometric analyses carried out by institutions. They are also used to establish the Leiden ranking (open). Improving the quality of this data will enable us to benefit collectively from more reliable monitoring and analysis tools in the future.",
-"about-3": "The works-magnet aims to fill a gap: at a time when ‘open research information’ (to use the terms of the Barcelona Declaration) is progressing rapidly, there is a clear need to draw on the knowledge of experts to raise the level of quality of metadata. It is highly likely that in the months and years to come, more structured networks for collecting information will be set up, for example by OpenAlex or following the work of the COMET working group initiated by the California Digital Library. It may take a long time to set up this kind of structure, and the works-magnet is positioning itself to meet this need as of now."
+"about-1": "The Works-magnet offers a number of curation functions. The tool is therefore aimed at specialists who can judge the quality of metadata. The tool interrogates various massive databases (OpenAlex, BSO, Datacite) and formats the information to facilitate exploration and correction where necessary. These massive databases each use large-scale automatic processing tools. An expert curatorial eye is needed to continue to improve the quality of the metadata. As far as possible, the corrections resulting from these curations are fed back upstream to ensure that the same errors are not repeated. The aim is to build up a set of high-quality metadata that can be re-used by anyone who wishes to do so. The aim is to propose a new curation paradigm: curation is not carried out in a two-way relationship between a producer and a consumer of data. Instead, a group of data users propose corrections that are visible to all, and that can benefit everyone. As well as correcting the cases reported, in some cases the data collected can also be re-used as learning data for algorithms, so that the models used in the future are more effective.",
+"about-2": "One of the major features of the Works-magnet is the improvement of affiliation metadata. This is of vital importance in the bibliometric analyses carried out by institutions. They are also used to establish the Leiden ranking (open). Improving the quality of this data will enable us to benefit collectively from more reliable monitoring and analysis tools in the future.",
+"about-3": "The Works-magnet aims to fill a gap: at a time when ‘open research information’ (to use the terms of the Barcelona Declaration) is progressing rapidly, there is a clear need to draw on the knowledge of experts to raise the level of quality of metadata. It is highly likely that in the months and years to come, more structured networks for collecting information will be set up, for example by OpenAlex or following the work of the COMET working group initiated by the California Digital Library. It may take a long time to set up this kind of structure, and the Works-magnet is positioning itself to meet this need as of now."
}
2 changes: 1 addition & 1 deletion client/src/i18n/fr.json
@@ -1,7 +1,7 @@
{
"corpus-title": "Constituer un corpus de publications ou de jeux de données",
"feedback-description-1": "Des techniques d'IA sont utilisées à grande échelle sur le corpus des publications. En particulier, OpenAlex associe ainsi les signatures d'affiliation à des identifiants ROR. Des erreurs et oublis peuvent advenir, faussant les résultats obtenus à partir des données OpenAlex comme le classement (ouvert) de Leiden. Contribuer à corriger ces données améliore la fiabilité et la représentativité de ces analyses.",
-"feedback-description-2": "De même, des outils de détection des mentions de logiciels et de jeux de données sont utilisées sur le texte intégral des publications. Là aussi, ces détections peuvent contenir des erreurs. Le works-magnet permet d'explorer toutes les mentions détectées sur le corpus français et de signaler les erreurs. Ces signalements seront ajoutés à la base d'apprentissage, aidant ainsi à entraîner de nouveaux modèles de détection plus fiables.",
+"feedback-description-2": "De même, des outils de détection des mentions de logiciels et de jeux de données sont utilisées sur le texte intégral des publications. Là aussi, ces détections peuvent contenir des erreurs. Le Works-magnet permet d'explorer toutes les mentions détectées sur le corpus français et de signaler les erreurs. Ces signalements seront ajoutés à la base d'apprentissage, aidant ainsi à entraîner de nouveaux modèles de détection plus fiables.",
"feedback-title": "Améliorer les détections automatiques faites par IA",
"datasets-tile-title" : "🗃 Les jeux de données de mon institution",
"datasets-tile-detail-1" : "🔎 Explorer les signatures d'affiliation brutes les plus fréquentes récupérées dans les données françaises de l'Open Science Monitor et dans OpenAlex pour votre requête (jeux de données uniquement)",
1 change: 1 addition & 0 deletions server/src/routes/affiliations.routes.js
@@ -131,6 +131,7 @@ const getOpenAlexAffiliations = async ({ options, resetCache = false }) => {
console.time(
`4. Query ${queryId} | Serialization ${options.affiliationStrings}`,
);
+uniqueAffiliations.sort((a, b) => b.worksNumber - a.worksNumber);
const affiliations = await chunkAndCompress(uniqueAffiliations);
console.log(
'serialization',
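
The added line sorts `uniqueAffiliations` in place, largest `worksNumber` first, before the list is chunked and compressed. Below is a minimal sketch of that ordering, assuming each affiliation object carries a numeric `worksNumber` field. The sample data and the helper name `sortByWorksNumberDesc` are hypothetical and not part of the repository. Unlike the line in the diff, this variant copies the array before sorting and treats a missing count as 0, because the comparator would otherwise return `NaN` for such entries and their position would no longer be well defined.

```javascript
// Hypothetical sample data; only the `worksNumber` field name comes from the diff.
const sampleAffiliations = [
  { name: 'Affiliation A', worksNumber: 3 },
  { name: 'Affiliation B', worksNumber: 12 },
  { name: 'Affiliation C' }, // no count available, treated as 0 below
  { name: 'Affiliation D', worksNumber: 7 },
];

// Returns a new array ordered by descending works count.
// The input array is left untouched (Array.prototype.sort mutates in place,
// so we sort a shallow copy instead).
function sortByWorksNumberDesc(affiliations) {
  return [...affiliations].sort(
    (a, b) => (b.worksNumber ?? 0) - (a.worksNumber ?? 0),
  );
}

const sorted = sortByWorksNumberDesc(sampleAffiliations);
console.log(sorted.map((affiliation) => affiliation.name));
```

Whether a copy is worth the extra allocation depends on whether the caller still needs the original order; in the route above the array is serialized right after sorting, so sorting in place as the commit does is a reasonable choice.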