Merge pull request #137 from tpsiqueira/documentation_improvements

Add resources about the 3W community
petrobras · Jan 22, 2025 · addfca2
2 parents 86fe70d + 68e4d6b
commit addfca2
Showing 8 changed files with 154 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -36,7 +36,8 @@
   * [Structure](#structure-1)
   * [Incorporated Problems](#incorporated-problems)
   * [Examples of Use](#examples-of-use)
-  * [Reproducibility](#reproducibility)  
+  * [Reproducibility](#reproducibility)
+* [3W Community](#3w-community)
 
 # Introduction
 
@@ -176,4 +177,10 @@ $ python
 * To initialize a local Jupyter Notebook server:
 ```
 $ jupyter notebook
-```
+```
+
+# 3W Community
+
+The 3W Community is gradually expanding and is made up of independent professionals and representatives of research institutions, startups, companies and oil operators from different countries.
+
+More information about this community can be found [here](community/README.md).
diff --git a/community/README.md b/community/README.md
@@ -0,0 +1,42 @@
+# Table of Contents
+
+* [Introduction](#introduction)
+* [Citations](#citations)
+  * [Main Institutions by Country](#main-institutions-by-country)
+  * [All Institutions by Country](#all-institutions-by-country)
+* [Stars by Country](#stars-by-country)
+* [Forks by Country](#forks-by-country)
+
+# Introduction
+
+The 3W Community is gradually expanding and is made up of independent professionals and representatives of research institutions, startups, companies and oil operators from different countries.
+
+The following sections provide more information about this community.
+
+# Citations
+
+The 3W Dataset has proven useful and is cited by the works listed [here](../LIST_OF_CITATIONS.md). These are mainly papers, final graduation projects, master's degree dissertations, and doctoral theses. In general, these works are carried out by representatives of institutions.
+
+## Main Institutions by Country
+
+The panel below shows, for each country covered so far, the institution that accounts for the largest number of citations (representatives × published works citing the 3W Dataset). In the event of a tie, all tied institutions are shown for that country.
+
+![Main Institutions by Country](../images/citations_main_institutions_by_country.png)
+
+## All Institutions by Country
+
+The following panel shows the geographical dispersion of all identified institutions that have published works citing the 3W Dataset.
+
+![All Institutions by Country](../images/citations_all_institutions_by_country.png)
+
+# Stars by Country
+
+The panel below shows the locations of the GitHub users who starred the 3W Project repository. Note that not all GitHub users make their locations publicly available.
+
+![Stars](../images/stars_by_country.png)
+
+# Forks by Country
+
+The panel below shows the locations of the GitHub users who forked the 3W Project repository. A fork is a copy of a repository that facilitates use, customization, and contributions.
+
+![Forks](../images/forks_by_country.png)
diff --git a/community/citations.xlsx b/community/citations.xlsx
diff --git a/community/gen_list_of_citations.py b/community/gen_list_of_citations.py
@@ -0,0 +1,103 @@
+"""This script processes an Excel file named 'citations.xlsx' containing citations 
+to the 3W Dataset and generates a Markdown file listing these citations.
+
+The citations include relevant details such as authors, titles, institutions, 
+categories, years, and links, formatted in a consistent way. The resulting 
+Markdown file is saved in the specified output directory.
+
+Note: 
+- The file 'citations.xlsx' must be located in the directory 'C:\\Users\\Public'.
+- The sheet name within the Excel file must be 'citations1'.
+- The file must include the following columns: 'Author', 'Title', 'Institution/Event', 
+  'Category', 'Year', and 'Link'.
+"""
+
+import os
+import pandas as pd
+
+# Important paths
+#
+EXCEL_PATH = r"C:\Users\Public\citations.xlsx"
+SHEET_NAME = "citations1"
+OUTPUT_DIR = r"C:\Users\Public"
+MD_PATH = os.path.join(OUTPUT_DIR, "LIST_OF_CITATIONS.md")
+
+# Fixed header for the Markdown file
+#
+HEADER = """
+As far as we know, the 3W Dataset was useful and cited by the works listed below. If you know of any other paper, final graduation project, master's degree dissertation or doctoral thesis that cites the 3W Dataset, we will be grateful if you let us know by commenting on [this](https://github.com/Petrobras/3W/discussions/3) discussion. If you use any resource published in this repository, we ask that it be properly cited in your work. Click on the ***Cite this repository*** link on this repository landing page to access different citation formats supported by the GitHub citation feature.
+
+This file (`LIST_OF_CITATIONS.md`) was generated automatically from records maintained in the `citations.xlsx` file.
+"""
+
+
+# Methods
+#
+def format_citation(row):
+    """Formats a citation using non-empty columns from the row.
+
+    Args:
+        row (pd.Series): A row from the DataFrame containing citation details.
+
+    Returns:
+        str: A formatted citation string.
+    """
+    columns = ["Author", "Title", "Institution/Event", "Category", "Year", "Link"]
+    parts = [str(row[col]) for col in columns if pd.notna(row[col])]
+    return ". ".join(parts) + "."
+
+
+def process_excel_to_markdown():
+    """Processes the Excel file to generate a Markdown file with formatted citations.
+
+    Raises:
+        FileNotFoundError: If the Excel file is not found in the specified path.
+        ValueError: If the required columns are not present in the Excel file.
+    """
+    if not os.path.exists(EXCEL_PATH):
+        raise FileNotFoundError(
+            f"The file 'citations.xlsx' was not found at {EXCEL_PATH}. "
+            "Please place the file there and run the script again."
+        )
+
+    # Read the Excel file
+    df = pd.read_excel(EXCEL_PATH, sheet_name=SHEET_NAME)
+
+    # Check for required columns
+    required_columns = [
+        "Author",
+        "Title",
+        "Institution/Event",
+        "Category",
+        "Year",
+        "Link",
+    ]
+    if not all(col in df.columns for col in required_columns):
+        raise ValueError(
+            f"The file 'citations.xlsx' must contain the following columns: "
+            f"{', '.join(required_columns)}."
+        )
+
+    # Apply formatting to each row
+    df["Formatted"] = df.apply(format_citation, axis=1)
+
+    # Create a list of formatted citations
+    formatted_citations = "\n\n".join(
+        [f"1. {citation}" for citation in df["Formatted"]]
+    )
+
+    # Combine header and citations
+    final_content = HEADER + formatted_citations
+
+    # Ensure the output directory exists and write the Markdown file
+    os.makedirs(OUTPUT_DIR, exist_ok=True)
+    with open(MD_PATH, "w", encoding="utf-8") as file:
+        file.write(final_content)
+
+    print(f"Updated Markdown file saved at: {MD_PATH}")
+
+
+# Main execution
+#
+if __name__ == "__main__":
+    process_excel_to_markdown()
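
Review note on `format_citation`: if the `Year` column contains any empty cell, pandas would typically load the whole column as floating point, so `str(row["Year"])` could render as `2019.0` in the generated list. The sketch below shows one possible way to guard against that. It is not part of this commit: `is_missing` and `cell_to_text` are hypothetical helpers, a plain `dict` stands in for the `pd.Series` row so the example runs without the spreadsheet, and the row values are made up for illustration.

```python
import math

COLUMNS = ["Author", "Title", "Institution/Event", "Category", "Year", "Link"]


def is_missing(value):
    """Stand-in for `not pd.notna(value)` on scalar cells: None or float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def cell_to_text(value):
    """Render a cell as text, writing whole-number floats such as 2019.0 as '2019'."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value).strip()


def format_citation(row):
    """Join the non-empty cells of a row with '. ', as the script in the diff does."""
    parts = [cell_to_text(row[col]) for col in COLUMNS if not is_missing(row[col])]
    return ". ".join(parts) + "."


# Hypothetical row: an empty 'Institution/Event' cell and a float-typed year.
example_row = {
    "Author": "A. Author",
    "Title": "A hypothetical title",
    "Institution/Event": float("nan"),
    "Category": "Paper",
    "Year": 2019.0,
    "Link": "https://example.org/paper",
}

print(format_citation(example_row))
```

The empty cell is skipped and the year is written without a trailing `.0`. Because `numpy.float64` is a subclass of Python's `float`, the same `cell_to_text` check should also apply to values coming from a real DataFrame row.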
diff --git a/images/citations_all_institutions_by_country.png b/images/citations_all_institutions_by_country.png
diff --git a/images/citations_main_institutions_by_country.png b/images/citations_main_institutions_by_country.png
diff --git a/images/forks_by_country.png b/images/forks_by_country.png
diff --git a/images/stars_by_country.png b/images/stars_by_country.png
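
Review note on paths: `gen_list_of_citations.py` reads and writes under `C:\Users\Public`, while this commit places `citations.xlsx` in `community/` and `community/README.md` links to `../LIST_OF_CITATIONS.md`. A minimal sketch of resolving both locations relative to the script, assuming that layout is the intended one; `resolve_paths` is a hypothetical helper and the path in the demo call is made up.

```python
from pathlib import Path


def resolve_paths(script_file):
    """Return (excel_path, md_path) derived from the script's own location.

    Assumes the spreadsheet sits next to the script and the generated
    Markdown file belongs one directory up, at the repository root.
    """
    script_dir = Path(script_file).resolve().parent
    excel_path = script_dir / "citations.xlsx"
    md_path = script_dir.parent / "LIST_OF_CITATIONS.md"
    return excel_path, md_path


# Demo with a made-up checkout location; inside the script itself,
# `__file__` would be passed instead.
excel_path, md_path = resolve_paths("/repo/community/gen_list_of_citations.py")
print(excel_path)
print(md_path)
```

With paths derived this way, the script would no longer depend on a Windows-specific directory, and the `FileNotFoundError` message could report `excel_path` directly.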