✨ summarize best practices for data science session #2

Merged 3 commits on Sep 20, 2024
1 change: 1 addition & 0 deletions index.md
@@ -30,6 +30,7 @@ azure/running_nextflow_on_azure
:caption: Python

python/percent_notebooks
python/best_practices
```

```{toctree}
Binary file added python/assets/better_comments_todo_tree.png
Binary file added python/assets/lint_pylance_in_vscode.png
72 changes: 72 additions & 0 deletions python/best_practices.md
@@ -0,0 +1,72 @@
# Coding (best) practices for Data Science

> Author: Henry Webel

Being asked to show some coding best practices for an internal retreat, I assembled
some low-hanging fruit within everyone's reach and some practices I have learned to appreciate.

## Use a formatter

When you write code, you should at least use some sort of formatter. `black` is a common choice
as it formats code consistently to a user-defined line length. With its preview features
enabled it can even split overly long strings into parts, leaving only long comments and
docstrings for you to shorten by hand.

`black` or `autopep8` are also available as VSCode extensions, next to `isort` for sorting
imports, so your files are formatted every time you save them
([link](https://code.visualstudio.com/docs/python/formatting)).
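
To give an idea of what a formatter changes, here is a small sketch: the same function as one might type it by hand and as I would expect `black` to rewrite it with default settings. Both function names are made up for this illustration. Only the layout differs, not the behaviour.

```python
# Hypothetical example; both functions are made up for illustration.


# As typed by hand: inconsistent spacing and quoting.
def scale_by_hand(values,factor = 2 ,label = 'scaled'):
    return {label:[v*factor for v in values ]}


# As black (default settings) is expected to format the same code.
def scale_formatted(values, factor=2, label="scaled"):
    return {label: [v * factor for v in values]}


# Formatting changes the layout only, never the result.
print(scale_by_hand([1, 2, 3]) == scale_formatted([1, 2, 3]))  # True
```

Once the editor formats on save, diffs stay small because everybody's code ends up in the same layout.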

## Use a linter

Overly long lines, missing arguments or mutable objects as default function parameters can be
identified using a linter like `flake8` or `ruff`. Tools like
[`Pylint` in VSCode](https://code.visualstudio.com/docs/python/linting)
give you in-editor highlighting of code issues, with links to hints on how to fix them:

![screenshot with typehints](assets/lint_pylance_in_vscode.png)

Example: Using the linter you can, for example, spot that an argument was not passed to a function,
as was fixed in commit [18b675](https://github.com/Multiomics-Analytics-Group/acore/pull/2/commits/18b67516b25de30cf6fd4bb640422aa8e0642b08) in `run_umap` (you will need to unfold the first file to see the full picture).
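
The mutable default parameter mentioned above is easy to miss by eye, so here is a minimal sketch of why linters complain about it. The function names are hypothetical. As far as I know the warning is `W0102` (dangerous-default-value) in `pylint` and `B006` in `ruff`, if the bugbear rules are selected.

```python
# Hypothetical functions, made up to illustrate the linter warning.


def append_bad(item, bucket=[]):  # flagged: the default list is created only once
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Create a fresh list on every call unless the caller passes one.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_bad(1)
second = append_bad(2)
print(second)  # [1, 2] -> the default list is shared between calls
print(first is second)  # True

print(append_good(1), append_good(2))  # [1] [2]
```

The `None` default is the usual fix: the list is created inside the function, so calls no longer share state.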


## Better Comments and ToDo Trees

[Better Comments](https://marketplace.visualstudio.com/items?itemName=aaron-bond.better-comments)
allows you to highlight comments in code using different colors, e.g.
`# ! warning`, `# ? question` or `# TODO`. If you add `?` and `!` to the list of expressions
tracked by a
[ToDo Tree](https://marketplace.visualstudio.com/items?itemName=Gruntfuggly.todo-tree),
you can easily keep a list of todos in your code, allowing you to go
through them from time to time and prioritize.

![Highlighted Comments and ToDo Tree example](assets/better_comments_todo_tree.png)
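
In code, such tagged comments could look like the following sketch. The function and the comments are made up for illustration; the tags `!`, `?` and `TODO` are the ones discussed above.

```python
# Hypothetical function, made up to show the comment tags.


def mean(values):
    # ! raises ZeroDivisionError for an empty list
    # ? should missing values (None) be skipped instead of failing
    # TODO: accept any iterable, not only objects with a length
    return sum(values) / len(values)


print(mean([1, 2, 3]))  # 2.0
```

Each tag is highlighted in its own color, and the same three lines would show up as entries in the ToDo Tree once `!` and `?` are added to its tags.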

## Text based Notebook (percent format) with jupytext and papermill

[`jupytext`](https://jupytext.readthedocs.io/) is a lightweight tool to keep scripts either as notebooks (`.ipynb`) or in simpler text-based file formats, such as markdown files (`.md`), which are rendered on GitHub, or python files (`.py`), which can be executed in VSCode’s interactive shell and are better suited for version control. Some tools still need `.ipynb` files to work, e.g. `papermill`. Therefore it is handy to keep different versions of a script in sync. Alternatively, one can use only python files and open these as notebooks in e.g.
[jupyter lab](https://jupytext.readthedocs.io/en/latest/text-notebooks.html). This comes in handy especially if only the code is kept under version control, while executed versions are kept in a project folder by a workflow manager (such as `snakemake` or `nextflow`).

You can see an example of the percent notebook in the [percent_notebooks](project:percent_notebooks.py) section.

I showed how to convert a text-based percent notebook and execute it using `papermill`,
without specifying arguments on the command line:

```bash
jupytext --to ipynb -k - -o - example_nb.py | papermill - path/to/executed_example.ipynb
```
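
As a sketch of what such a percent notebook could look like: cells are delimited by `# %%`, markdown cells by `# %% [markdown]`, and a cell tagged `parameters` holds the defaults that `papermill` can override, e.g. with `-p n_samples 100` on the command line. The file name `example_nb.py` is taken from the command above, while the parameter names `n_samples` and `seed` are made up for this illustration.

```python
# %% [markdown]
# # Example notebook
# A minimal percent-format script (hypothetical content of example_nb.py).

# %% tags=["parameters"]
# Defaults; papermill injects overrides in a new cell right after this one.
n_samples = 5
seed = 42

# %%
import random

rng = random.Random(seed)
samples = [rng.random() for _ in range(n_samples)]

# %%
print(f"mean of {n_samples} samples: {sum(samples) / len(samples):.3f}")
```

Since this is plain Python, the file also runs as a normal script, and the cell markers are just comments to the interpreter.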

If you want to keep several formats in sync locally but only push one of them to git,
list the file types you want to keep local-only in e.g. a `.gitignore`.
Each folder can have a `.jupytext.toml` file to specify the formats you want to keep in sync
in that folder, e.g.:

```toml
# percent format and ipynb format in sync
formats = "ipynb,py:percent"
```

## Copilot in VSCode

Ghost text, chats and inline chats are great ways to get suggestions on the code you are
writing. You can apply for free access as a [(PhD) student](https://github.com/education/students)
or [instructor](https://github.com/education/teachers). Currently, alternatives with a free tier such as [Codeium](https://codeium.com/) are also available.
