Unveiling the Spatiotemporal Dynamics of Global Brain Circulation: A Comprehensive Corpus (2000-2024)

Computational-social-science/GBCD

Global Brain Circulation Dynamics (GBCD) corpus

The global competition for human capital is fuelled by intricate brain circulation dynamics, where individuals with specialized skills traverse geographic, organizational, and national boundaries to address workforce demands. However, a comprehensive framework for integrating and interpreting heterogeneous data on global brain circulation remains elusive. Here we introduce the Global Brain Circulation Dynamics (GBCD) corpus, a longitudinally integrated repository of geo-information encompassing 223 countries/regions from 2000 to 2024.

Garnered from diachronic narrative texts, the GBCD corpus provides granular insights into transnational brain circulation patterns and their interconnections with sociocultural progress. Continuously updated to reflect spatiotemporal dynamics, the GBCD corpus serves as a definitive reference for real-time and ex-post analysis of global brain circulation. Our analysis reveals two pivotal findings:

  • narrative brain circulation closely mirrors physical brain mobility
  • geopolitical relations and spatiotemporal dynamics exhibit distinct patterns across countries/regions

The GBCD corpus establishes a novel benchmark for examining spatiotemporal brain circulation worldwide, empowering policymakers to develop evidence-based strategies for attracting and retaining human capital in a rapidly evolving global landscape.

Corpus

The GBCD corpus comprises 2,904,663,710 processed tokens, organized into two corpora: diachronic and synchronic. It contains 1,764,234 entries related to brain circulation features. The diachronic corpus accounts for 1,311,616 of these entries, spanning 2000 to 2024, and is continuously updated so that the data remain current for both real-time and ex-post analyses of brain circulation. The synchronic corpus contains 452,618 entries and deliberately excludes timestamp features to facilitate synchronic research.

| Version | Update Time | Corpus | Entry Count | Processed Token Count | Token Count | Sentence Count |
| --- | --- | --- | --- | --- | --- | --- |
| V1.0 | 2024-8-29 | Diachronic corpus | 623,072 | 1,134,253,949 | 422,954,074 | 16,914,973 |
| V1.0 | 2024-8-29 | Synchronic corpus | 348,508 | 606,015,828 | 158,891,392 | 11,250,558 |
| V2.0 | 2024-12-16 | Diachronic corpus | 1,111,644 | 2,087,930,788 | 707,785,647 | 38,900,418 |
| V2.0 | 2024-12-16 | Synchronic corpus | 452,618 | 816,732,922 | 328,842,410 | 19,253,646 |
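As a quick sanity check on the figures above, the sketch below transcribes the version table into a dictionary and sums one column per version. The dictionary layout and the names `CORPUS_STATS`, `version_total`, and `growth` are illustrative assumptions, not part of the released corpus. Summing the V2.0 processed token counts reproduces the headline figure of 2,904,663,710.

```python
# Counts transcribed from the version table; the key names are illustrative.
CORPUS_STATS = {
    "V1.0": {
        "diachronic": {"entries": 623_072, "processed_tokens": 1_134_253_949,
                       "tokens": 422_954_074, "sentences": 16_914_973},
        "synchronic": {"entries": 348_508, "processed_tokens": 606_015_828,
                       "tokens": 158_891_392, "sentences": 11_250_558},
    },
    "V2.0": {
        "diachronic": {"entries": 1_111_644, "processed_tokens": 2_087_930_788,
                       "tokens": 707_785_647, "sentences": 38_900_418},
        "synchronic": {"entries": 452_618, "processed_tokens": 816_732_922,
                       "tokens": 328_842_410, "sentences": 19_253_646},
    },
}


def version_total(version: str, field: str) -> int:
    """Sum one column over the diachronic and synchronic corpora of a version."""
    return sum(part[field] for part in CORPUS_STATS[version].values())


def growth(field: str, old: str = "V1.0", new: str = "V2.0") -> float:
    """Ratio of a column total in the newer version to the older version."""
    return version_total(new, field) / version_total(old, field)


if __name__ == "__main__":
    print(version_total("V2.0", "processed_tokens"))
    print(round(growth("sentences"), 2))
```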

Data record

The corpus captures key attributes relevant to brain circulation, including origin, destination, diachronic narrative text, URL, and timestamp. Geographic entities are mapped to the country or region level, facilitating the analysis of transnational brain circulation. Each country or region is accompanied by Countrycode, ISO2, and ISO3 identifiers, enabling multidimensional organization of brain circulation data. Furthermore, we distinguish between origin and destination in geographic entities related to circulation flow, allowing for the representation of brain gain and brain drain, and providing insights into bilateral brain circulation between countries/regions.

Summary information about the GBCD corpus

| Data Label | Data Description | Data Type |
| --- | --- | --- |
| circulation id | Unique circulation behaviour text identification | int |
| content | The narrative text content in the web address | long text |
| countrycode | ISO country code | string |
| URL | Source links to transfer narrative text, usually pointing to web pages and domain names | string |
| timestamp | Month and Year of transfer behaviour described in the text | date object |
| sampling | The collection timestamp of the text data in the source dataset | date object |
| iso2code | Country ISO 2 letter code | string |
| iso3code | Country ISO 3 letter code | string |
| origin | The origin of the circulation behaviour, expressed as geopolitical entity, including country or region | string |
| destination | The destination of circulation behaviour, expressed as geopolitical entity, including country or region | string |
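The field list above could be modelled as a typed record along the following lines. This is a minimal sketch, not the corpus's actual loader: the class name `CirculationRecord`, the snake_case attribute names, the `YYYY-MM` and `YYYY-MM-DD` date formats, and the sample row are all assumptions made for illustration. Both date fields are treated as optional, since the synchronic corpus excludes timestamp features.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class CirculationRecord:
    circulation_id: int          # "circulation id" in the table
    content: str                 # narrative text
    countrycode: str
    url: str                     # "URL" in the table
    timestamp: Optional[date]    # month and year of the described transfer
    sampling: Optional[date]     # when the text was collected
    iso2code: str
    iso3code: str
    origin: str
    destination: str


def parse_date(value: Optional[str], fmt: str) -> Optional[date]:
    """Parse a date string, returning None for missing or empty values."""
    if not value:
        return None
    return datetime.strptime(value, fmt).date()


def record_from_row(row: dict) -> CirculationRecord:
    """Build a record from a dict keyed by the table's data labels.

    The date formats used here are assumptions about the serialization.
    """
    return CirculationRecord(
        circulation_id=int(row["circulation id"]),
        content=row["content"],
        countrycode=row["countrycode"],
        url=row["URL"],
        timestamp=parse_date(row.get("timestamp"), "%Y-%m"),
        sampling=parse_date(row.get("sampling"), "%Y-%m-%d"),
        iso2code=row["iso2code"],
        iso3code=row["iso3code"],
        origin=row["origin"],
        destination=row["destination"],
    )


# Hypothetical row, invented purely to exercise the parser.
EXAMPLE_ROW = {
    "circulation id": "1", "content": "example narrative text",
    "countrycode": "276", "URL": "https://example.org/article",
    "timestamp": "2021-03", "sampling": "2021-04-15",
    "iso2code": "DE", "iso3code": "DEU",
    "origin": "India", "destination": "Germany",
}

if __name__ == "__main__":
    print(record_from_row(EXAMPLE_ROW))
```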

Regions without internationally recognized sovereignty:

These regions do not possess formal recognition or authority under international law, meaning they lack official ISO codes and CountryCodes. As a result, they are not represented in global standards used for identifying sovereign states.

Geospatial representation:

To delineate the geographic boundaries of such regions, we rely on Polygon-type geospatial data. This approach allows for the precise definition of the spatial extent of these areas, even in the absence of formal sovereignty. The polygon format enables the mapping of complex territorial claims or disputed regions, capturing their exact geographic features.
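To illustrate how Polygon-type data can stand in for a missing country code, the sketch below tests whether a coordinate falls inside a polygon using the standard ray-casting method. The GeoJSON-style layout, the function names, and the rectangular example region are assumptions for illustration. The corpus's actual geometry files may be structured differently, and real boundaries with holes or multiple parts would need a fuller geometry library.

```python
from typing import List, Tuple

Ring = List[Tuple[float, float]]  # sequence of (longitude, latitude) vertices


def point_in_ring(lon: float, lat: float, ring: Ring) -> bool:
    """Ray casting: count crossings of a horizontal ray cast from the point."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's latitude can be crossed.
        if (yi > lat) != (yj > lat):
            x_cross = (xj - xi) * (lat - yi) / (yj - yi) + xi
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_polygon(lon: float, lat: float, geometry: dict) -> bool:
    """Check the exterior ring of a GeoJSON-style Polygon (holes ignored)."""
    if geometry.get("type") != "Polygon":
        raise ValueError("expected a Polygon geometry")
    exterior = [tuple(vertex) for vertex in geometry["coordinates"][0]]
    return point_in_ring(lon, lat, exterior)


# Hypothetical rectangular region; not a real territory.
EXAMPLE_REGION = {
    "type": "Polygon",
    "coordinates": [[[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [0.0, 0.0]]],
}

if __name__ == "__main__":
    print(point_in_polygon(2.0, 1.0, EXAMPLE_REGION))
    print(point_in_polygon(5.0, 1.0, EXAMPLE_REGION))
```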

Structured information for countries/regions:

Detailed and structured data related to these regions, as well as fully recognized countries, can be accessed in the Supplementary information. This repository includes comprehensive information about their geographic, political, and other relevant attributes, offering an in-depth look at the regions' boundaries, history, and territorial disputes.

Geographic entity criteria

The GBCD corpus spans 223 countries/regions worldwide, encompassing 193 UN member states, one observer state, and 29 non-sovereign island territories. Our national geographic divisions adhere to methods endorsed by the United Nations Statistics Division for international statistical data collection, ensuring consistency and compatibility with global standards.

  • Member State of the United Nations: refers to a sovereign country that has been officially admitted to the United Nations (UN) and holds full membership status. Member States enjoy voting rights, participate in all UN activities, and are bound by the principles outlined in the UN Charter.
  • Non-Member Observer State of the United Nations: refers to an entity recognized by the United Nations General Assembly that has observer status, granting it certain privileges and participation rights in UN activities, but without full membership or voting rights in the General Assembly.
  • Territories and Islands without Internationally Recognized Sovereignty: refer to territories and islands that declare themselves as independent or autonomous but lack widespread recognition as sovereign states under international law or by the global community, including the United Nations.
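The three categories and their counts can be written down as a small lookup, which also makes the 223 total easy to verify. The enum and helper names below are illustrative assumptions. The rule that non-sovereign territories lack standard codes follows the description given earlier in this document.

```python
from enum import Enum


class EntityCategory(Enum):
    UN_MEMBER_STATE = "Member State of the United Nations"
    UN_OBSERVER_STATE = "Non-Member Observer State of the United Nations"
    NON_SOVEREIGN_TERRITORY = (
        "Territories and Islands without Internationally Recognized Sovereignty"
    )


# Counts as stated in the text: 193 + 1 + 29.
CATEGORY_COUNTS = {
    EntityCategory.UN_MEMBER_STATE: 193,
    EntityCategory.UN_OBSERVER_STATE: 1,
    EntityCategory.NON_SOVEREIGN_TERRITORY: 29,
}


def total_entities(counts: dict) -> int:
    """Total number of countries/regions across all categories."""
    return sum(counts.values())


def has_standard_codes(category: EntityCategory) -> bool:
    """Non-sovereign territories are described as lacking ISO codes and CountryCodes."""
    return category is not EntityCategory.NON_SOVEREIGN_TERRITORY


if __name__ == "__main__":
    print(total_entities(CATEGORY_COUNTS))
```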

Data mining

Leveraging data mining techniques on the GBCD corpus enables researchers to map and characterize the brain circulation patterns of skilled professionals across different countries. Furthermore, researchers can gain a deeper understanding of the complex dynamics underlying brain circulation and make informed decisions to address the challenges. This study highlights the potential of data-driven approaches to inform policy and promote more effective brain circulation strategies.
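One simple way to start mining the origin and destination fields is to count bilateral mentions and derive a net balance per country. The sketch below is an illustrative proxy under stated assumptions, not the indicator used by the authors: each record counts as one unit of flow, the function names are hypothetical, and the sample pairs are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (origin, destination)


def bilateral_flows(pairs: Iterable[Pair]) -> Counter:
    """Count records per directed (origin, destination) pair, skipping self-loops."""
    return Counter((o, d) for o, d in pairs if o != d)


def net_balance(flows: Counter) -> Dict[str, int]:
    """Inflow minus outflow per country: positive suggests gain, negative drain."""
    balance: Dict[str, int] = defaultdict(int)
    for (origin, destination), count in flows.items():
        balance[origin] -= count
        balance[destination] += count
    return dict(balance)


# Invented sample pairs, used only to show the calculation.
SAMPLE_PAIRS = [
    ("India", "United States"),
    ("India", "United States"),
    ("United States", "India"),
    ("China", "Germany"),
]

if __name__ == "__main__":
    flows = bilateral_flows(SAMPLE_PAIRS)
    for country, value in sorted(net_balance(flows).items()):
        print(country, value)
```

Grouping the same counts by the timestamp field would give a per-period version of this balance.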

Domain name distribution by continents and fields.

Geographical heterogeneity of national brain circulation frequency.

Geographical trajectory network of transnational brain circulation.

Dynamic indicators of national brain circulation flux.

Usage Notes

The GBCD corpus enables the comprehensive assessment and characterization of global brain circulation, facilitating planning and analysis at the national and geographic levels. To ensure high data quality and extensive geographic coverage, specific names, materials, and map layouts have been employed. It is essential to note that these choices do not imply any endorsement or stance by the authors or their respective countries regarding the legal status of any nation, territory, or region. Additionally, the depiction of borders and boundaries on the maps is purely indicative and does not signify formal recognition or acceptance by the publisher. The maps and database are intended to provide a neutral representation of geographic information, and any interpretation or inference of political boundaries or affiliations is explicitly excluded.

Citing this work

The relevant paper is currently under review, during which time this repository is private. Once it goes public, a BibTeX reference will be provided here.
