Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add T011–T014 to clean 4 ADB ATO data sets #81

Conversation

Dima-Faraji
Copy link

No description provided.

@khaeru khaeru changed the title T011-14 datasets-py_file Add T011–T014 to clean 4 ADB ATO data sets Dec 11, 2023
@khaeru
Copy link
Contributor

khaeru commented Dec 11, 2023

Hi @Dima-Faraji —thank you very much for your contributions to the iTEM Open Data code! We appreciate that members of the community, such as yourself, make efforts to expand the scope and quality of the data included in this shared resource.

In this review, I will do a few things:

  1. Summarize what I think is being added by the commits on this branch. (This is information that should go in the description of the pull request. If I am guessing incorrectly, please correct me by providing the correct explanation.)
  2. Identify TODOs to merge this branch/PR.
  3. Identify TODOs for further clean-up (after the branch is merged).

To be clear, I understand from you and @soniayeh that the course in which you did this work is coming to an end, so you may not have dedicated time to do (2) or (3). So those TODOs are mentioned without saying who will do them or when—it could be you, me, or someone else; sooner or later.

What is added

  • Four data-cleaning modules, each cleaning one data flow from the Asian Development Bank (ADB) Asian Transport Outlook (ATO); I guess the 2023 edition:
    • item.historical.T011 → TAS-PAT-017(1)
    • item.historical.T012 → TAS-FRA-005(2)
    • item.historical.T013 → TAS-PAG-005(2)
    • item.historical.T014 → TAS-VEP-005(2)
  • Several files in dima/…, including:
    • regions.yaml —appears similar to this file.
    • sources.yaml —appears to contain entries corresponding to the 4 added modules, plus the original contents of this file.
    • Files like T011.py that appear similar to the above.
    • Jupyter notebook versions of the above files.

TODOs to merge the branch

  • Choose a new name for T012. Notice that this file already exists on the main branch here, and the changes on this branch delete its original contents.
  • Deduplicate the code in the 4 added modules. For instance, the files T011.py and T012.py are about 350 lines, but >250 of these lines are identical in both files; this makes it difficult to identify which steps are common versus specific to each input data flow. One way to achieve this would be to define the class AtoWorkbook in a single file, then import it in each of the 4 modules, setting some different parameters or creating a subclass.
  • Provide a single entry-point in each module. Existing data modules like T000 do not execute any code when they are imported; they instead provide a function like process() that triggers processing of the data. Looking at your current code, this could call the method atoWorkBook.process_input_data().
  • Create a pull request in transportenergy/metadata that makes the necessary additions to sources.yaml, currently contained on this branch, and any other metadata that is needed by your code. Remove all other files from the dima/ folder; you can provide them as attachments to this PR if they are records of the development process.
  • Add documentation, for instance in the docstring at the top of each module. Explain, at least, which specific data are used: at what URL can the original input data be found?
  • Ensure the tests pass. This can be started by rebasing the branch on transportenergy:main.

TODOs for follow-up

@khaeru khaeru added enh Enhancements & new features help wanted historical Historical transport statistics labels May 22, 2024
@mperezbravo mperezbravo changed the base branch from main to T011-T014-Datasets September 15, 2024 10:13
@mperezbravo mperezbravo changed the base branch from T011-T014-Datasets to main September 15, 2024 10:18
@mperezbravo mperezbravo changed the base branch from main to T011-T014-Datasets September 15, 2024 10:30
@mperezbravo mperezbravo changed the base branch from T011-T014-Datasets to main September 15, 2024 10:35
@mperezbravo mperezbravo changed the base branch from main to T011-T014-Datasets September 15, 2024 10:50
@mperezbravo mperezbravo reopened this Sep 15, 2024
@mperezbravo mperezbravo deleted the branch transportenergy:T011-T014-Datasets September 15, 2024 11:08
@mperezbravo mperezbravo mentioned this pull request Sep 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enh Enhancements & new features help wanted historical Historical transport statistics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants