Corrected spelling mistakes in M5, M6 and M8

SkafteNicki · Jan 7, 2025 · 55e5a20
1 parent eec418d

Showing 3 changed files with 32 additions and 32 deletions.
diff --git a/s2_organisation_and_version_control/code_structure.md b/s2_organisation_and_version_control/code_structure.md
@@ -27,15 +27,15 @@ or maintain
     (PLoP '97/EuroPLoP '97) Monticello, Illinois, September 1997
 
 We are here going to focus on the organization of data science projects and machine learning projects. The core
-difference this kind of projects introduces compared to more traditional systems is *data*. The key to modern machine
+difference this kind of project introduces compared to more traditional systems is *data*. The key to modern machine
 learning is without a doubt the vast amounts of data that we have access to today. It is therefore not unreasonable that
 data should influence our choice of code structure. If we had another kind of application, then the layout of our
 codebase should probably be different.
 
 ## Cookiecutter
 
 We are in this course going to use the tool [cookiecutter](https://cookiecutter.readthedocs.io/en/latest/README.html),
-which is tool for creating projects from *project templates*. A project template is in short just an overall structure
+which is a tool for creating projects from *project templates*. A project template is in short just an overall structure
 of how you want your folders, files etc. to be organized from the beginning. For this course we are going to be using a
 custom [MLOps template](https://github.com/SkafteNicki/mlops_template). The template is essentially a fork of the
 [cookiecutter data science template](https://github.com/drivendata/cookiecutter-data-science) that has been used for a
@@ -87,7 +87,7 @@ a lot of projects using `setup.py + setup.cfg`, so it is good to at least know a
 
 === "pyproject.toml"
 
-    `pyproject.toml` is the new standardized way of describing project metadata in a declaratively way, introduced in
+    `pyproject.toml` is the new standardized way of describing project metadata in a declarative way, introduced in
     [PEP 621](https://peps.python.org/pep-0621/). It is written in [toml format](https://toml.io/en/) which is easy to
     read. At the very least your `pyproject.toml` file should include the `[build-system]` and `[project]` sections:
 
@@ -159,8 +159,8 @@ a lot of projects using `setup.py + setup.cfg`, so it is good to at least know a
     )
     ```
 
-    Essentially, the it is the exact same meta information as in `pyproject.toml`, just written directly in Python
-    syntax instead of `toml`. Because there was a wish to deperate this meta information into a separate file, the
+    Essentially, it is the exact same meta information as in `pyproject.toml`, just written directly in Python
+    syntax instead of `toml`. Because there was a wish to separate this meta information into a separate file, the
     `setup.cfg` file was created which can contain the exact same information as `setup.py` just in a declarative
     config.
 
@@ -173,7 +173,7 @@ a lot of projects using `setup.py + setup.cfg`, so it is good to at least know a
     # ...
     ```
 
-    This non-standardized way of providing meta information regarding a package was essentially what lead to the
+    This non-standardized way of providing meta information regarding a package was essentially what led to the
     creation of `pyproject.toml`.
 
 Regardless of what way a project is configured, after creating the above files, the correct way to install them would be
@@ -188,7 +188,7 @@ pip install -e .
 !!! note "Developer mode in Python"
 
     The `-e` is short for `--editable` mode also called
-    [developer mode](https://setuptools.pypa.io/en/latest/userguide/development_mode.html). Since we will continuously
+    [developer mode](https://setuptools.pypa.io/en/latest/userguide/development_mode.html). Since we will be continuously
     iterating on our package this is the preferred way to install our package, because that means that we do not have
     to run `pip install` every time we make a change. Essentially, in developer mode changes in the Python source code
     can immediately take place without requiring a new installation.
@@ -236,20 +236,20 @@ your head around where files are located.
         When asked for a project name you should follow the
         [PEP8](https://peps.python.org/pep-0008/#package-and-module-names) guidelines for naming packages. This means
         that the name should be all lowercase and if you want to separate words, you should use underscores. For example
-        `my_project` is a valid name, while `MyProject` is not. Additionally, the packaage name cannot start with a
+        `my_project` is a valid name, while `MyProject` is not. Additionally, the package name cannot start with a
         number.
 
     ??? note "Flat-layout vs src-layout"
 
         There are two common choices on how layout your source directory. The first is called *src-layout*
-        where the source code is always place in a `src/<project_name>` folder and the second is called *flat-layout*
-        where the source code is place is just placed in a `<project_name>` folder. The template we are using in this
+        where the source code is always placed in a `src/<project_name>` folder and the second is called *flat-layout*
+        where the source code is just placed in a `<project_name>` folder. The template we are using in this
         course is using the src-layout, but there are
         [pros and cons](https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/) for both.
 
 3. After having created your new project, the first step is to also create a corresponding virtual environment and
-    install any needed requirements. If you have a virtual environment from yesterday feel free to use that else create
-    an new. Then install the project in that environment
+    install any needed requirements. If you have a virtual environment from yesterday feel free to use that, otherwise create
+    a new one. Then install the project in that environment
 
     ```bash
     pip install -e .
@@ -270,7 +270,7 @@ your head around where files are located.
 5. This template comes with a `tasks.py` which uses the [invoke](https://www.pyinvoke.org/) framework to define project
     tasks. You can learn more about the framework in the last optional [module](cli.md) in today's session. However, for
     now just know that `tasks.py` is a file that can be used to specify common tasks that you want to run in your
-    project. It is similar to `Markefile`s if you are familiar with them. Try out some of the pre-defined tasks:
+    project. It is similar to `Makefile`s if you are familiar with them. Try out some of the pre-defined tasks:
 
     ```bash
     # first install invoke
@@ -350,7 +350,7 @@ your head around where files are located.
 
 12. (Optional) Feel free to create more files/visualizations (what about investigating/exploring the data distribution?)
 
-13. (Optional) Lets say that you are not satisfied with the template I have recommended that you use, which is
+13. (Optional) Let's say that you are not satisfied with the template I have recommended that you use, which is
     completely fine. What should you then do? You should of course create your own template! This is actually not that
     hard to do.
 

diff --git a/s2_organisation_and_version_control/dvc.md b/s2_organisation_and_version_control/dvc.md
@@ -8,15 +8,15 @@
 
 !!! warning
 
-    Since August 2024, Google have changed their policy for the Google Drive API. This means that the proceduce for
-    setting up DVC with Google Drive has changed. The following exercises therefore needs extra authentication to work.
+    Since August 2024, Google has changed their policy for the Google Drive API. This means that the procedure for
+    setting up DVC with Google Drive has changed. The following exercises therefore need extra authentication to work.
     You therefore have two options:
 
     1. Skip these exercises for now. We are going to revisit DVC later in the course when we get access to a more
         permanent storage solution in this [module](../s6_the_cloud/using_the_cloud.md).
 
     2. Follow the instructions below to authenticate DVC with Google Drive. As a starting point read the following
-        [Github issue](https://github.com/iterative/dvc/issues/10516#issuecomment-2289652067) and then follow the
+        [GitHub issue](https://github.com/iterative/dvc/issues/10516#issuecomment-2289652067) and then follow the
         instructions
         [here](https://dvc.org/doc/user-guide/data-management/remote-storage/google-drive#using-a-custom-google-cloud-project-recommended).
         for setting up a custom Google Cloud project.
@@ -34,7 +34,7 @@ Because this is an important concept there exist a couple of frameworks that hav
 [DVC](https://dvc.org/), [DAGsHub](https://dagshub.com/), [Hub](https://www.activeloop.ai/),
 [Modelstore](https://modelstore.readthedocs.io/en/latest/) and [ModelDB](https://github.com/VertaAI/modeldb/).
 Regardless of what framework, they all implement somewhat the same concept: instead of storing the actual data files
-or in general storing any large *artifacts* files we instead store a pointer to these large flies. We then version
+or in general storing any large *artifacts* files we instead store a pointer to these large files. We then version
 control the point instead of the artifact.
 
 <figure markdown>
@@ -45,7 +45,7 @@ control the point instead of the artifact.
 </figure>
 
 We are in this course going to use `DVC` provided by [iterative.ai](https://iterative.ai/) as they also provide tools
-for automatizing machine learning, which we are going to focus on later.
+for automating machine learning, which we are going to focus on later.
 
 ## DVC: What is it?
 
@@ -147,7 +147,7 @@ it contains excellent tutorials.
     `dvc` converts the data into [content-addressable storage](https://en.wikipedia.org/wiki/Content-addressable_storage)
     which makes data much faster to get. Finally, make sure that your data is not stored in your GitHub repository.
 
-    After authenticating the first time, DVC should be setup without having to authenticate again. If you for some
+    After authenticating the first time, DVC should be set up without having to authenticate again. If you for some
     reason encounter that DVC fails to authenticate, you can try to reset the authentication. Locate the file
     `$CACHE_HOME/pydrive2fs/{gdrive_client_id}/default.json` where `$CACHE_HOME` depends on your operating system:
 
@@ -163,7 +163,7 @@ it contains excellent tutorials.
 
     Delete the complete `{gdrive_client_id}` folder and retry authenticating with `dvc push`.
 
-9. After completing the above steps, it is very easy for others (or yourself) to get setup with both
+9. After completing the above steps, it is very easy for others (or yourself) to get set up with both
     code and data by simply running
 
     ```bash
@@ -177,7 +177,7 @@ it contains excellent tutorials.
 
 10. Let's now look at the process of creating a new version of our data. We are going to add some new data to our
     dataset and version control this as well. The new data can be downloaded from this
-    [Google Driver folder](https://drive.google.com/drive/folders/1JTjbom7IrB41Chx6uxLCN16ZwIxHHVw1?usp=sharing)
+    [Google Drive folder](https://drive.google.com/drive/folders/1JTjbom7IrB41Chx6uxLCN16ZwIxHHVw1?usp=sharing)
     or by running these two commands:
 
     ```bash
@@ -186,7 +186,7 @@ it contains excellent tutorials.
     ```
 
     Copy the data to your `data/raw` folder and then rerun your data pipeline to incorporate the new data into the
-    files in your `processed` folder. The new data should are 4 files with train images and 4 files with train targets,
+    files in your `processed` folder. The new data should be 4 files with train images and 4 files with train targets,
     a total of 20000 additional observations.
 
 11. Redo the above steps, adding the new data using `dvc`, committing and tagging the metafiles e.g. the following
@@ -211,7 +211,7 @@ it contains excellent tutorials.
     your model checkpoints.
 
 In general `dvc` is a great framework for version-controlling data and models. However, it is important to note that it
-does have some performance issue when dealing with datasets that consist of many files. Therefore, if you are ever
+does have some performance issues when dealing with datasets that consist of many files. Therefore, if you are ever
 working with a dataset that consists of many small files, it can be a
 [good idea to](https://fizzylogic.nl/2023/01/13/did-you-know-dvc-doesn-t-handle-large-datasets-neither-did-we-and-here-s-how-we-fixed-it):
 
@@ -228,10 +228,10 @@ working with a dataset that consists of many small files, it can be a
     ??? success "Solution"
 
         Similar to a git repository having a `.git` directory, a repository using dvc needs to have a `.dvc` folder.
-        Alternatively you can you the `dvc status` command.
+        Alternatively you can use the `dvc status` command.
 
 2. Assume you just added a folder called `data/` that you want to track with `dvc`. What is the sequence of 5 commands
-    to successful version control the folder? (assuming you already setup a remote)
+    to successfully version control the folder? (assuming you already set up a remote)
 
     ??? success "Solution"
 
@@ -246,6 +246,6 @@ working with a dataset that consists of many small files, it can be a
 That's all for today. With the combined power of `git` and `dvc` we should be able to version control everything in
 our development pipeline such that no changes are lost (assuming we commit regularly). It should be noted that `dvc`
 offers more than just data version control, so if you want to deep dive into `dvc` we recommend their
-[pipeline](https://dvc.org/doc/user-guide/project-structure/pipelines-files) feature and how this can be used to setup
-version controlled [experiments](https://dvc.org/doc/command-reference/exp). Note that we are going to revisit `dvc`
+[pipeline](https://dvc.org/doc/user-guide/project-structure/pipelines-files) feature and how this can be used to set up
+version-controlled [experiments](https://dvc.org/doc/command-reference/exp). Note that we are going to revisit `dvc`
 later for a more permanent (and large-scale) storage solution.
diff --git a/s2_organisation_and_version_control/git.md b/s2_organisation_and_version_control/git.md
@@ -103,7 +103,7 @@ working together on the same project.
 
 ### ❔ Exercises
 
-1. In your GitHub account create an repository, where the intention is that you upload the code from the final
+1. In your GitHub account create a repository, where the intention is that you upload the code from the final
     exercise from yesterday
 
     1. After creating the repository, clone it to your computer
@@ -240,7 +240,7 @@ working together on the same project.
     4. Finally, commit the merge and try to push.
 
 8. (Optional) The above exercises have focused on how to use git from the terminal, which I highly recommend learning.
-    However, if you are using a proper editor they also have build in support for version control. We recommend getting
+    However, if you are using a proper editor they also have built-in support for version control. We recommend getting
     familiar with these features (here is a tutorial for
     [VS Code](https://code.visualstudio.com/docs/editor/versioncontrol))
 
@@ -250,7 +250,7 @@ working together on the same project.
 
     ??? success "Solution"
 
-        You can check if there is a ".git" directory. Alternative you can use the `git status` command.
+        You can check if there is a ".git" directory. Alternatively you can use the `git status` command.
 
 2. Explain what the file `gitignore` is used for?
 
@@ -288,7 +288,7 @@ That covers the basics of git to get you started. In the exercise folder you can
 with the most useful commands for future reference. Finally, we want to point out another awesome feature of GitHub:
 in browser editor. Sometimes you have a small edit that you want to make, but still would like to do this in a
 IDE/editor. Or you may be in the situation where you are working from another device than your usual developer machine.
-GitHub has an built-in editor that can simply be enabled by changing any URL from
+GitHub has a built-in editor that can simply be enabled by changing any URL from
 
 ```bash
 https://github.com/username/repository