Merge pull request #137 from tpsiqueira/documentation_improvements
Add resources about the 3W community
ricardoevvargas authored Jan 22, 2025
2 parents 86fe70d + 68e4d6b commit addfca2
Showing 8 changed files with 154 additions and 2 deletions.
11 changes: 9 additions & 2 deletions README.md
@@ -36,7 +36,8 @@
* [Structure](#structure-1)
* [Incorporated Problems](#incorporated-problems)
* [Examples of Use](#examples-of-use)
* [Reproducibility](#reproducibility)
* [Reproducibility](#reproducibility)
* [3W Community](#3w-community)

# Introduction

@@ -176,4 +177,10 @@ $ python
* To initialize a local Jupyter Notebook server:
```
$ jupyter notebook
```
```

# 3W Community

The 3W Community is gradually expanding and is made up of independent professionals and representatives of research institutions, startups, companies and oil operators from different countries.

More information about this community can be found [here](community/README.md).
42 changes: 42 additions & 0 deletions community/README.md
@@ -0,0 +1,42 @@
# Table of Contents

* [Introduction](#introduction)
* [Citations](#citations)
* [Main Institutions by Country](#main-institutions-by-country)
* [All Institutions by Country](#all-institutions-by-country)
* [Stars by Country](#stars-by-country)
* [Forks by Country](#forks-by-country)

# Introduction

The 3W Community is gradually expanding and is made up of independent professionals and representatives of research institutions, startups, companies and oil operators from different countries.

The following sections provide more information about this community.

# Citations

The 3W Dataset has proved useful and is cited by the works listed [here](../LIST_OF_CITATIONS.md). These are mainly papers, final graduation projects, master's degree dissertations, and doctoral theses. In general, these works are carried out by representatives of institutions.

## Main Institutions by Country

The panel below shows, for each country covered so far, the institution that gave rise to the largest number of citations (representatives × published works citing the 3W Dataset). In the event of a tie, all tied institutions are shown for that country.

![Main Institutions by Country](../images/citations_main_institutions_by_country.png)
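
The selection rule described above (top institution per country, keeping every institution in a tie) can be sketched as follows. This is only an illustration under stated assumptions: the `records` list, its placeholder values, and the function name `main_institutions_by_country` are hypothetical and do not come from `citations.xlsx` or from the script in this commit. Each record is assumed to stand for one representative of an institution on one published work.

```python
from collections import Counter, defaultdict

# Hypothetical records: one (country, institution) pair per representative
# per published work. Placeholder values, not data from citations.xlsx.
records = [
    ("Country A", "Institution 1"),
    ("Country A", "Institution 1"),
    ("Country A", "Institution 2"),
    ("Country B", "Institution 3"),
    ("Country B", "Institution 4"),
]


def main_institutions_by_country(records):
    """Return, per country, every institution tied for the highest count."""
    counts = defaultdict(Counter)
    for country, institution in records:
        counts[country][institution] += 1

    result = {}
    for country, counter in counts.items():
        top = max(counter.values())
        # Keep all institutions that reach the top count, so ties survive.
        result[country] = sorted(
            name for name, n in counter.items() if n == top
        )
    return result


main = main_institutions_by_country(records)
print(main)
```

With these placeholder records, "Country A" has a single leading institution, while "Country B" keeps both tied institutions.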

## All Institutions by Country

The following panel shows the geographical dispersion of all identified institutions that have published works citing the 3W Dataset.

![All Institutions by Country](../images/citations_all_institutions_by_country.png)

# Stars by Country

The panel below shows the locations of the GitHub users who starred the 3W Project repository. Note that not all GitHub users make their locations publicly available.

![Stars](../images/stars_by_country.png)

# Forks by Country

The panel below shows the locations of the GitHub users who forked the 3W Project repository. A fork is a kind of copy that facilitates use, customization and contributions in Git repositories.

![Forks](../images/forks_by_country.png)
Binary file added community/citations.xlsx
Binary file not shown.
103 changes: 103 additions & 0 deletions community/gen_list_of_citations.py
@@ -0,0 +1,103 @@
"""This script processes an Excel file named 'citations.xlsx' containing citations
to the 3W Dataset and generates a Markdown file listing these citations.
The citations include relevant details such as authors, titles, institutions,
categories, years, and links, formatted in a consistent way. The resulting
Markdown file is saved in the specified output directory.
Note:
- The file 'citations.xlsx' must be located in the directory 'C:\\Users\\Public'.
- The sheet name within the Excel file must be 'citations1'.
- The file must include the following columns: 'Author', 'Title', 'Institution/Event',
'Category', 'Year', and 'Link'.
"""

import os
import pandas as pd

# Important paths
#
EXCEL_PATH = r"C:\Users\Public\citations.xlsx"
SHEET_NAME = "citations1"
OUTPUT_DIR = r"C:\Users\Public"
MD_PATH = os.path.join(OUTPUT_DIR, "LIST_OF_CITATIONS.md")

# Fixed header for the Markdown file
#
HEADER = """
As far as we know, the 3W Dataset was useful and cited by the works listed below. If you know any other paper, final graduation project, master's degree dissertation or doctoral thesis that cites the 3W Dataset, we will be grateful if you let us know by commenting [this](https://github.com/Petrobras/3W/discussions/3) discussion. If you use any resource published in this repository, we ask that it be properly cited in your work. Click on the ***Cite this repository*** link on this repository landing page to access different citation formats supported by the GitHub citation feature.
This file (`LIST_OF_CITATIONS.md`) was generated automatically from records maintained in the `citations.xlsx` file.
"""


# Methods
#
def format_citation(row):
    """Formats a citation using non-empty columns from the row.
    Args:
        row (pd.Series): A row from the DataFrame containing citation details.
    Returns:
        str: A formatted citation string.
    """
    columns = ["Author", "Title", "Institution/Event", "Category", "Year", "Link"]
    parts = [str(row[col]) for col in columns if pd.notna(row[col])]
    return ". ".join(parts) + "."


def process_excel_to_markdown():
    """Processes the Excel file to generate a Markdown file with formatted citations.
    Raises:
        FileNotFoundError: If the Excel file is not found in the specified path.
        ValueError: If the required columns are not present in the Excel file.
    """
    if not os.path.exists(EXCEL_PATH):
        raise FileNotFoundError(
            f"The file 'citations.xlsx' was not found in the directory "
            f"C:\\Users\\Public. Please ensure the file is placed in this directory and run the script again."
        )

    # Read the Excel file
    df = pd.read_excel(EXCEL_PATH, sheet_name=SHEET_NAME)

    # Check for required columns
    required_columns = [
        "Author",
        "Title",
        "Institution/Event",
        "Category",
        "Year",
        "Link",
    ]
    if not all(col in df.columns for col in required_columns):
        raise ValueError(
            f"The file 'citations.xlsx' must contain the following columns: "
            f"{', '.join(required_columns)}."
        )

    # Apply formatting to each row
    df["Formatted"] = df.apply(format_citation, axis=1)

    # Create a list of formatted citations
    formatted_citations = "\n\n".join(
        [f"1. {citation}" for citation in df["Formatted"]]
    )

    # Combine header and citations
    final_content = HEADER + formatted_citations

    # Ensure the output directory exists and write the Markdown file
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    with open(MD_PATH, "w", encoding="utf-8") as file:
        file.write(final_content)

    print(f"Updated Markdown file saved at: {MD_PATH}")


# Main execution
#
if __name__ == "__main__":
    process_excel_to_markdown()
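
To illustrate what the script above produces for a single record, the following sketch mirrors the joining logic of `format_citation` using only the standard library. It is an assumption-laden stand-in, not the committed code: the helper name `format_citation_plain`, the dictionary-based row, the use of `None` for an empty cell (the script itself relies on `pd.notna`), and all sample values are hypothetical.

```python
# Column order used by the script when building a citation.
COLUMNS = ["Author", "Title", "Institution/Event", "Category", "Year", "Link"]


def format_citation_plain(row):
    """Join the non-empty fields of a row with '. ' and end with a period.

    Hypothetical stand-in for format_citation: `row` is a plain dict and
    None marks an empty spreadsheet cell.
    """
    parts = [str(row.get(col)) for col in COLUMNS if row.get(col) is not None]
    return ". ".join(parts) + "."


# Placeholder record, not an entry from citations.xlsx.
sample = {
    "Author": "A. Author",
    "Title": "An Example Title",
    "Institution/Event": None,  # empty cell: skipped in the output
    "Category": "Paper",
    "Year": 2024,
    "Link": "https://example.org/paper",
}

# Every item is prefixed with "1."; Markdown renderers number the
# ordered list sequentially when the file is displayed.
line = f"1. {format_citation_plain(sample)}"
print(line)  # 1. A. Author. An Example Title. Paper. 2024. https://example.org/paper.
```

Prefixing every entry with `1.` keeps the generated file stable when records are inserted or removed, since no item needs renumbering.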
Binary file added images/citations_all_institutions_by_country.png
Binary file added images/forks_by_country.png
Binary file added images/stars_by_country.png
