Merge pull request #138 from petrobras/documentation_improvements
Add resources about the 3W community
Showing 8 changed files with 154 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
# Table of Contents | ||
|
||
* [Introduction](#introduction) | ||
* [Citations](#citations) | ||
* [Main Institutions by Country](#main-institutions-by-country) | ||
* [All Institutions by Country](#all-institutions-by-country) | ||
* [Stars by Country](#stars-by-country) | ||
* [Forks by Country](#forks-by-country) | ||
|
||
# Introduction | ||
|
||
The 3W Community is gradually expanding and is made up of independent professionals and representatives of research institutions, startups, companies and oil operators from different countries. | ||
|
||
The following sections provide more information about this community. | ||
|
||
# Citations | ||
|
||
The 3W Dataset has been useful to, and is cited by, the works listed [here](../LIST_OF_CITATIONS.md). These are mainly papers, final graduation projects, master's dissertations, and doctoral theses. In general, these works are carried out by representatives of institutions. | ||
|
||
## Main Institutions by Country | ||
|
||
The panel below shows, for each country covered so far, the institution that gave rise to the largest number of citations (representatives x published works citing the 3W Dataset). In the event of a tie, all tied institutions are shown for that country. | ||
|
||
![Main Institutions by Country](../images/citations_main_institutions_by_country.png) | ||
|
||
## All Institutions by Country | ||
|
||
The following panel shows the geographical dispersion of all identified institutions that have published works citing the 3W Dataset. | ||
|
||
![All Institutions by Country](../images/citations_all_institutions_by_country.png) | ||
|
||
# Stars by Country | ||
|
||
The panel below shows the locations of the GitHub users who starred the 3W Project repository. Note that not all GitHub users make their locations publicly available. | ||
|
||
![Stars](../images/stars_by_country.png) | ||
|
||
# Forks by Country | ||
|
||
The panel below shows the locations of the GitHub users who forked the 3W Project repository. A fork is a copy of a repository that facilitates use, customization, and contributions. | ||
|
||
![Forks](../images/forks_by_country.png) |
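The tie rule described under "Main Institutions by Country" can be sketched in a few lines. This is only an illustration, not the procedure used to build the panel: the function name `main_institutions_by_country`, the input shape (one `(country, institution)` pair per counted citation), and the sample data are all assumptions.

```python
from collections import Counter, defaultdict


def main_institutions_by_country(records):
    """Return {country: sorted list of institutions with the highest count}.

    `records` is an iterable of (country, institution) pairs, one pair per
    counted citation. This input shape is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for country, institution in records:
        counts[country][institution] += 1

    result = {}
    for country, per_institution in counts.items():
        top = max(per_institution.values())
        # Keep every institution that reaches the top count, so ties survive.
        result[country] = sorted(
            name for name, n in per_institution.items() if n == top
        )
    return result


# Hypothetical sample data, not taken from the 3W citation records.
sample = [
    ("Country A", "Institution X"),
    ("Country A", "Institution X"),
    ("Country A", "Institution Y"),
    ("Country B", "Institution Z"),
    ("Country B", "Institution W"),
]
print(main_institutions_by_country(sample))
```

In this sample, "Country A" has a single leading institution, while "Country B" has two institutions tied at one citation each, so both are returned for that country.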
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
"""This script processes an Excel file named 'citations.xlsx' containing citations | ||
to the 3W Dataset and generates a Markdown file listing these citations. | ||
The citations include relevant details such as authors, titles, institutions, | ||
categories, years, and links, formatted in a consistent way. The resulting | ||
Markdown file is saved in the specified output directory. | ||
Note: | ||
- The file 'citations.xlsx' must be located in the directory 'C:\\Users\\Public'. | ||
- The sheet name within the Excel file must be 'citations1'. | ||
- The file must include the following columns: 'Author', 'Title', 'Institution/Event', | ||
'Category', 'Year', and 'Link'. | ||
""" | ||
|
||
import os | ||
import pandas as pd | ||
|
||
# Important paths | ||
# | ||
EXCEL_PATH = r"C:\Users\Public\citations.xlsx" | ||
SHEET_NAME = "citations1" | ||
OUTPUT_DIR = r"C:\Users\Public" | ||
MD_PATH = os.path.join(OUTPUT_DIR, "LIST_OF_CITATIONS.md") | ||
|
||
# Fixed header for the Markdown file | ||
# | ||
HEADER = """ | ||
As far as we know, the 3W Dataset was useful and cited by the works listed below. If you know of any other paper, final graduation project, master's degree dissertation or doctoral thesis that cites the 3W Dataset, we will be grateful if you let us know by commenting on [this](https://github.com/Petrobras/3W/discussions/3) discussion. If you use any resource published in this repository, we ask that it be properly cited in your work. Click on the ***Cite this repository*** link on this repository landing page to access different citation formats supported by the GitHub citation feature. | ||
This file (`LIST_OF_CITATIONS.md`) was generated automatically from records maintained in the `citations.xlsx` file. | ||
""" | ||
|
||
|
||
# Methods | ||
# | ||
def format_citation(row): | ||
    """Formats a citation using non-empty columns from the row. | ||
    Args: | ||
        row (pd.Series): A row from the DataFrame containing citation details. | ||
    Returns: | ||
        str: A formatted citation string. | ||
    """ | ||
    columns = ["Author", "Title", "Institution/Event", "Category", "Year", "Link"] | ||
    parts = [str(row[col]) for col in columns if pd.notna(row[col])] | ||
    return ". ".join(parts) + "." | ||
|
||
|
||
def process_excel_to_markdown(): | ||
    """Processes the Excel file to generate a Markdown file with formatted citations. | ||
    Raises: | ||
        FileNotFoundError: If the Excel file is not found in the specified path. | ||
        ValueError: If the required columns are not present in the Excel file. | ||
    """ | ||
    if not os.path.exists(EXCEL_PATH): | ||
        raise FileNotFoundError( | ||
            f"The file 'citations.xlsx' was not found in the directory " | ||
            f"C:\\Users\\Public. Please ensure the file is placed in this directory and run the script again." | ||
        ) | ||
|
||
    # Read the Excel file | ||
    df = pd.read_excel(EXCEL_PATH, sheet_name=SHEET_NAME) | ||
|
||
    # Check for required columns | ||
    required_columns = [ | ||
        "Author", | ||
        "Title", | ||
        "Institution/Event", | ||
        "Category", | ||
        "Year", | ||
        "Link", | ||
    ] | ||
    if not all(col in df.columns for col in required_columns): | ||
        raise ValueError( | ||
            f"The file 'citations.xlsx' must contain the following columns: " | ||
            f"{', '.join(required_columns)}." | ||
        ) | ||
|
||
    # Apply formatting to each row | ||
    df["Formatted"] = df.apply(format_citation, axis=1) | ||
|
||
    # Create a list of formatted citations | ||
    formatted_citations = "\n\n".join( | ||
        [f"1. {citation}" for citation in df["Formatted"]] | ||
    ) | ||
|
||
    # Combine header and citations | ||
    final_content = HEADER + formatted_citations | ||
|
||
    # Ensure the output directory exists and write the Markdown file | ||
    os.makedirs(OUTPUT_DIR, exist_ok=True) | ||
    with open(MD_PATH, "w", encoding="utf-8") as file: | ||
        file.write(final_content) | ||
|
||
    print(f"Updated Markdown file saved at: {MD_PATH}") | ||
|
||
|
||
# Main execution | ||
# | ||
if __name__ == "__main__": | ||
    process_excel_to_markdown() |
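To see what the script's formatting step produces without an Excel file at hand, the following is a minimal, pandas-free sketch of the same idea. It is not the script itself: plain dictionaries stand in for DataFrame rows, `None` stands in for an empty cell (the script uses `pd.notna`), and `build_markdown_list` and the sample rows are hypothetical.

```python
COLUMNS = ["Author", "Title", "Institution/Event", "Category", "Year", "Link"]


def format_citation(fields):
    """Join the non-empty citation fields with '. ' and end with a period."""
    # None stands in for an empty spreadsheet cell in this sketch.
    parts = [str(fields[col]) for col in COLUMNS if fields.get(col) is not None]
    return ". ".join(parts) + "."


def build_markdown_list(citations):
    """Prefix every citation with '1.' and separate items with a blank line."""
    # Markdown numbers an ordered list sequentially from its first item,
    # so repeating "1." still renders as 1, 2, 3, ...
    return "\n\n".join(f"1. {citation}" for citation in citations)


# Hypothetical rows, not taken from citations.xlsx.
rows = [
    {
        "Author": "A. Author",
        "Title": "A Hypothetical Title",
        "Institution/Event": "Some University",
        "Category": "Paper",
        "Year": 2024,
        "Link": "https://example.com/paper",
    },
    {
        "Author": "B. Author",
        "Title": "Another Hypothetical Title",
        "Institution/Event": None,
        "Category": "Doctoral thesis",
        "Year": 2023,
        "Link": None,
    },
]

markdown = build_markdown_list(format_citation(row) for row in rows)
print(markdown)
```

One thing to watch for in the real script: if the `Year` column contains an empty cell, pandas typically reads the whole column as floating point, so `str(row["Year"])` may render as `2024.0` rather than `2024`.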