diff --git a/README.md b/README.md index 8629f2ab..cdb623fb 100644 --- a/README.md +++ b/README.md @@ -56,7 +56,7 @@ label ![pilot](https://shields.io/badge/-pilot-31E930). We welcome all contributions to improve the lesson! Maintainers will do their best to help you if you have any questions, concerns, or experience any difficulties along the way. -We'd like to ask you to familiarise yourself with our [Contribution Guide](CONTRIBUTING.md) and have a look at +We would like to ask you to familiarise yourself with our [Contribution Guide](CONTRIBUTING.md) and have a look at the [more detailed guidelines][lesson-example] on proper formatting, instructions on compiling and rendering the lesson locally, and making any changes and adding new content or episodes. diff --git a/_episodes/00-setting-the-scene.md b/_episodes/00-setting-the-scene.md index bcedc240..3782ad2f 100644 --- a/_episodes/00-setting-the-scene.md +++ b/_episodes/00-setting-the-scene.md @@ -25,8 +25,8 @@ So, you have gained basic software development skills either by self-learning or e.g., a [novice Software Carpentry course][swc-lessons]. You have been applying those skills for a while by writing code to help with your work and you feel comfortable developing code and troubleshooting problems. -However, your software has now reached a point where there’s too much code to be kept in one script. -Perhaps it's involving more researchers (developers) and users, +However, your software has now reached a point where there is too much code to be kept in one script. +Perhaps it is involving more researchers (developers) and users, and more collaborative development effort is needed to add new functionality while ensuring previous development efforts remain functional and maintainable. 
diff --git a/_episodes/10-section1-intro.md b/_episodes/10-section1-intro.md index 08725410..e93ca10c 100644 --- a/_episodes/10-section1-intro.md +++ b/_episodes/10-section1-intro.md @@ -20,7 +20,7 @@ and introducing the project that we will be working on throughout the course. In order to build working (research) software efficiently and to do it in collaboration with others rather than in isolation, you will have to get comfortable with using a number of different tools interchangeably -as they’ll make your life a lot easier. +as they will make your life a lot easier. There are many options when it comes to deciding which software development tools to use for your daily tasks - we will use a few of them in this course that we believe make a difference. @@ -124,6 +124,6 @@ Therefore, one should be aware of these guidelines and adhere to whatever the project you are working on has specified. In Python, we will be looking at a convention called PEP8. -Let's get started with setting up our software development environment! +Let us get started with setting up our software development environment! {% include links.md %} diff --git a/_episodes/11-software-project.md b/_episodes/11-software-project.md index b7e85a02..409d2b13 100644 --- a/_episodes/11-software-project.md +++ b/_episodes/11-software-project.md @@ -246,7 +246,7 @@ A novice will often make up the structure of their code as they go along. However, for more advanced software development, we need to plan and design this structure - called a *software architecture* - beforehand. -Let's have a quick look into what a software architecture is +Let us have a quick look into what a software architecture is and which architecture is used by our software project before we start adding more code to it. 
diff --git a/_episodes/12-virtual-environments.md b/_episodes/12-virtual-environments.md index 77aa3a30..8d1ad5c5 100644 --- a/_episodes/12-virtual-environments.md +++ b/_episodes/12-virtual-environments.md @@ -25,7 +25,7 @@ the `requirements.txt` file." ## Introduction So far we have cloned our software project from GitHub and inspected its contents and architecture a bit. We now want to run our code to see what it does - -let's do that from the command line. +let us do that from the command line. For the most part of the course we will run our code and interact with Git from the command line. While we will develop and debug our code using the PyCharm IDE @@ -236,7 +236,7 @@ This will create the target directory for the virtual environment > and avoid issues that could prove difficult to trace and debug. {: .callout} -For our project let's create a virtual environment called "venv". +For our project let us create a virtual environment called "venv". First, ensure you are within the project root directory, then: ~~~ @@ -342,7 +342,7 @@ When you’re done working on your project, you can exit the environment with: ~~~ {: .language-bash} -If you've just done the `deactivate`, +If you have just done the `deactivate`, ensure you reactivate the environment ready for the next part: ~~~ @@ -565,7 +565,7 @@ In the above command, we tell the command line two things: As we can see, the Python interpreter ran our script, which threw an error - `inflammation-analysis.py: error: the following arguments are required: infiles`. It looks like the script expects a list of input files to process, -so this is expected behaviour since we don't supply any. +so this is expected behaviour since we do not supply any. We will fix this error in a moment. 
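The "required: infiles" message comes from the script's command-line argument parsing. As a rough illustration only - a minimal sketch using Python's `argparse`, not the project's actual code - a required positional argument produces exactly this kind of error when it is omitted:

```python
# Minimal sketch (not the project's actual parser) of an argparse setup that
# reports "the following arguments are required: infiles" when run with no
# arguments. The description text here is illustrative.
import argparse

parser = argparse.ArgumentParser(
    description='A basic patient inflammation data management system')

# A required positional argument accepting one or more input files.
parser.add_argument('infiles', nargs='+',
                    help='Input CSV(s) containing inflammation series for each patient')

args = parser.parse_args(['data/inflammation-01.csv'])
print(args.infiles)  # prints ['data/inflammation-01.csv']
```

Running such a script with no arguments makes `argparse` print a usage message and exit with an error, which matches the behaviour we saw above.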
## Optional exercises diff --git a/_episodes/13-ides.md b/_episodes/13-ides.md index e2f684cb..2f611ecb 100644 --- a/_episodes/13-ides.md +++ b/_episodes/13-ides.md @@ -80,10 +80,10 @@ and Microsoft's free [Visual Studio Code (VS Code)](https://code.visualstudio.co ## Using the PyCharm IDE -Let's open our project in PyCharm now and familiarise ourselves with some commonly used features. +Let us open our project in PyCharm now and familiarise ourselves with some commonly used features. ### Opening a Software Project -If you don't have PyCharm running yet, start it up now. +If you do not have PyCharm running yet, start it up now. You can skip the initial configuration steps which just go through selecting a theme and other aspects. You should be presented with a dialog box that asks you what you want to do, @@ -156,7 +156,7 @@ and PyCharm is clever enough to understand it. so we can also use this environment for other projects if we wish. 6. Select `OK` in the `Add Python Interpreter` window. Back in the `Preferences` window, you should select "Python 3.11 (python-intermediate-inflammation)" - or similar (that you've just added) from the `Project Interpreter` drop-down list. + or similar (that you have just added) from the `Project Interpreter` drop-down list. Note that a number of external libraries have magically appeared under the "Python 3.11 (python-intermediate-inflammation)" interpreter, @@ -170,7 +170,7 @@ and has added these libraries effectively replicating our virtual environment in Also note that, although the names are not the same - this is one and the same virtual environment and changes done to it in PyCharm will propagate to the command line and vice versa. -Let's see this in action through the following exercise. +Let us see this in action through the following exercise. 
> ## Exercise: Compare External Libraries in the Command Line and PyCharm > Can you recall two places where information about our project's dependencies @@ -314,15 +314,15 @@ You can also verify this from the command line by listing the `venv/lib/python3.11/site-packages` subdirectory. Note, however, that `requirements.txt` is not updated - as we mentioned earlier this is something you have to do manually. -Let's do this as an exercise. +Let us do this as an exercise. > ## Exercise: Update `requirements.txt` After Adding a New Dependency > Export the newly updated virtual environment into `requirements.txt` file. > > >> ## Solution ->> Let's verify first that the newly installed library `pytest` is appearing in our virtual environment ->> but not in `requirements.txt`. First, let's check the list of installed packages: +>> Let us verify first that the newly installed library `pytest` is appearing in our virtual environment +>> but not in `requirements.txt`. First, let us check the list of installed packages: >> ~~~ >> (venv) $ python3 -m pip list >> ~~~ @@ -412,7 +412,7 @@ and use on top of virtual environments. (i.e. the virtual environment and interpreter you configured earlier in this episode) in the `Python interpreter` field. 5. You can give this run configuration a name at the top of the window if you like - - e.g. let's name it `inflammation analysis`. + e.g. let us name it `inflammation analysis`. 6. You can optionally configure run parameters and environment variables in the same window - we do not need this at the moment. 7. Select `Apply` to confirm these settings. @@ -438,7 +438,7 @@ and use on top of virtual environments. Now you know how to configure and manipulate your environment in both tools (command line and PyCharm), which is a useful parallel to be aware of. -Let's have a look at some other features afforded to us by PyCharm. +Let us have a look at some other features afforded to us by PyCharm. 
### Syntax Highlighting The first thing you may notice is that code is displayed using different colours. @@ -446,7 +446,7 @@ Syntax highlighting is a feature that displays source code terms in different colours and fonts according to the syntax category the highlighted term belongs to. It also makes syntax errors visually distinct. Highlighting does not affect the meaning of the code itself - -it's intended only for humans to make reading code and finding errors easier. +it is intended only for humans to make reading code and finding errors easier. ![Syntax Highlighting Functionality in PyCharm](../fig/pycharm-syntax-highlighting.png){: .image-with-shadow width="1000px" } @@ -569,7 +569,7 @@ We will get back to this error shortly - for now, the good thing is that we managed to set up our project for development both from the command line and PyCharm and are getting the same outputs. Before we move on to fixing errors and writing more code, -let's have a look at the last set of tools for collaborative code development +let us have a look at the last set of tools for collaborative code development which we will be using in this course - Git and GitHub. diff --git a/_episodes/14-collaboration-using-git.md b/_episodes/14-collaboration-using-git.md index ed5e2897..9b119dfb 100644 --- a/_episodes/14-collaboration-using-git.md +++ b/_episodes/14-collaboration-using-git.md @@ -36,7 +36,7 @@ test it to make sure it works correctly and as expected, then record your changes using version control and share your work with others via a shared and centrally backed-up repository. -Firstly, let's remind ourselves how to work with Git from the command line. +Firstly, let us remind ourselves how to work with Git from the command line. ## Git Refresher Git is a version control system for tracking changes in computer files @@ -117,7 +117,7 @@ Software development lifecycle with Git
## Checking-in Changes to Our Project -Let's check-in the changes we have done to our project so far. +Let us check-in the changes we have done to our project so far. The first thing to do upon navigating into our software project's directory root is to check the current status of our local working directory and repository. @@ -162,7 +162,7 @@ and stop notifying us about it. Edit your `.gitignore` file in PyCharm and add a line containing "venv/" and another one containing ".venv/". It does not matter much in this case where within the file you add these lines, -so let's do it at the end. +so let us do it at the end. Your `.gitignore` should look something like this: ~~~ @@ -256,7 +256,7 @@ $ git pull ~~~ {: .language-bash} -Now we've ensured our repository is synchronised with the remote one, +Now we have ensured our repository is synchronised with the remote one, we can now push our changes: ~~~ @@ -324,11 +324,11 @@ $ git branch ~~~ {: .output} -At the moment, there's only one branch (`main`) +At the moment, there is only one branch (`main`) and hence only one version of the code available. When you create a Git repository for the first time, by default you only get one version (i.e. branch) - `main`. -Let's have a look at why having different branches might be useful. +Let us have a look at why having different branches might be useful. ### Feature Branch Software Development Workflow While it is technically OK to commit your changes directly to `main` branch, @@ -341,7 +341,7 @@ Each feature branch should have its own meaningful name - indicating its purpose (e.g. "issue23-fix"). If we keep making changes and pushing them directly to `main` branch on GitHub, then anyone who downloads our software from there will get all of our work in progress - -whether or not it's ready to use! +whether or not it is ready to use! 
So, working on a separate branch for each feature you are adding is good for several reasons: * it enables the main branch to remain stable @@ -382,7 +382,7 @@ Whichever is the case for you, a good rule of thumb is - nothing that is broken should be in `main`. ### Creating Branches -Let's create a `develop` branch to work on: +Let us create a `develop` branch to work on: ~~~ $ git branch develop @@ -435,7 +435,7 @@ the commits will happen on the `develop` branch and will not affect the version of the code in `main`. We add and commit things to `develop` branch in the same way as we do to `main`. -Let's make a small modification to `inflammation/models.py` in PyCharm, +Let us make a small modification to `inflammation/models.py` in PyCharm, and, say, change the spelling of "2d" to "2D" in docstrings for functions `daily_mean()`, `daily_max()` and @@ -505,7 +505,7 @@ $ git push -u origin develop > We still prefer to explicitly state this information in commands. {: .callout} -Let's confirm that the new branch `develop` now exist remotely on GitHub too. +Let us confirm that the new branch `develop` now exists remotely on GitHub too. From the `Code` tab in your repository in GitHub, click the branch dropdown menu (currently showing the default branch `main`). You should see your `develop` branch in the list too. @@ -526,7 +526,7 @@ $ git push origin develop {: .language-bash} > ## What is the Relationship Between Originating and New Branches? -> It's natural to think that new branches have a parent/child relationship +> It is natural to think that new branches have a parent/child relationship > with their originating branch, > but in actual Git terms, branches themselves do not have parents > but single commits do.
diff --git a/_episodes/15-coding-conventions.md b/_episodes/15-coding-conventions.md index dd69b3f1..277daf31 100644 --- a/_episodes/15-coding-conventions.md +++ b/_episodes/15-coding-conventions.md @@ -65,7 +65,7 @@ a description of a new feature in Python, etc. > However, know when to be inconsistent - > sometimes style guide recommendations are just not applicable. > When in doubt, use your best judgment. -> Look at other examples and decide what looks best. And don't hesitate to ask! +> Look at other examples and decide what looks best. And do not hesitate to ask! > {: .callout} @@ -279,7 +279,7 @@ Avoid extraneous whitespace in the following situations: augmented assignment (+=, -= etc.), comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not), booleans (and, or, not). -- Don't use spaces around the = sign +- Do not use spaces around the = sign when used to indicate a keyword argument assignment or to indicate a default value for an unannotated function parameter ~~~ @@ -319,7 +319,7 @@ e.g HTTPServerError) As with other style guide recommendations - consistency is key. Follow the one already established in the project, if there is one. -If there isn't, follow any standard language style (such as +If there is not, follow any standard language style (such as [PEP 8](https://www.python.org/dev/peps/pep-0008/) for Python). Failing that, just pick one, document it and stick to it. @@ -362,7 +362,7 @@ A good rule of thumb is to assume that someone will *always* read your code at a and this includes a future version of yourself. It can be easy to forget why you did something a particular way in six months' time. Write comments as complete sentences and in English -unless you are 100% sure the code will never be read by people who don't speak your language. +unless you are 100% sure the code will never be read by people who do not speak your language. 
> ## The Good, the Bad, and the Ugly Comments > As a side reading, check out the @@ -396,9 +396,9 @@ def fahr_to_cels(fahr): ~~~ {: .language-python} -Python doesn't have any multi-line comments, +Python does not have any multi-line comments, like you may have seen in other languages like C++ or Java. -However, there are ways to do it using *docstrings* as we'll see in a moment. +However, there are ways to do it using *docstrings* as we will see in a moment. The reader should be able to understand a single function or method from its code and its comments, @@ -416,7 +416,7 @@ and comments must be accurate and updated with the code, because an incorrect comment causes more confusion than no comment at all. > ## Exercise: Improve Code Style of Our Project -> Let's look at improving the coding style of our project. +> Let us look at improving the coding style of our project. > First, from the project root, use `git switch` to create a new feature branch called `style-fixes` > from our develop branch. > (Note that at this point `develop` and `main` branches @@ -494,7 +494,7 @@ because an incorrect comment causes more confusion than no comment at all. >> (and class) definitions with two blank lines). >> Note how PyCharm is warning us by underlining the whole line below. >> ->> Finally, let's add and commit our changes to the feature branch. +>> Finally, let us add and commit our changes to the feature branch. >> We will check the status of our working directory first. >> >> ~~~ @@ -515,7 +515,7 @@ because an incorrect comment causes more confusion than no comment at all. >> >> Git tells us we are on branch `style-fixes` >> and that we have unstaged and uncommited changes to `inflammation-analysis.py`. ->> Let's commit them to the local repository. +>> Let us commit them to the local repository.
>> >> ~~~ >> $ git add inflammation-analysis.py @@ -706,7 +706,7 @@ help(fibonacci) > > > > As expected, Git tells us we are on branch `style-fixes` > > and that we have unstaged and uncommited changes to `inflammation/models.py`. -> > Let's commit them to the local repository. +> > Let us commit them to the local repository. > > ~~~ > > $ git add inflammation/models.py > > $ git commit -m "Docstring improvements." @@ -719,7 +719,7 @@ In the previous exercises, we made some code improvements on feature branch `sty We have committed our changes locally but have not pushed this branch remotely for others to have a look at our code before we merge it onto the `develop` branch. -Let's do that now, namely: +Let us do that now, namely: - push `style-fixes` to GitHub - merge `style-fixes` into `develop` (once we are happy with the changes) @@ -742,7 +742,7 @@ $ git push origin main {: .language-bash} > ## Typical Code Development Cycle -> What you've done in the exercises in this episode mimics a typical software development workflow - +> What you have done in the exercises in this episode mimics a typical software development workflow - > you work locally on code on a feature branch, > test it to make sure it works correctly and as expected, > then record your changes using version control diff --git a/_episodes/16-verifying-code-style-linters.md b/_episodes/16-verifying-code-style-linters.md index f195cd46..e951092f 100644 --- a/_episodes/16-verifying-code-style-linters.md +++ b/_episodes/16-verifying-code-style-linters.md @@ -14,17 +14,17 @@ keypoints: ## Verifying Code Style Using Linters -We've seen how we can use PyCharm to help us format our Python code in a consistent style. +We have seen how we can use PyCharm to help us format our Python code in a consistent style. This aids reusability, since consistent-looking code is easier to modify -since it's easier to read and understand. +since it is easier to read and understand. 
We can also use tools, called [**code linters**](https://en.wikipedia.org/wiki/Lint_%28software%29), to identify consistency issues in a report-style. Linters analyse source code to identify and report on stylistic and even programming errors. -Let's look at a very well used one of these called `pylint`. +Let us look at a widely used one of these called `pylint`. -First, let's ensure we are on the `style-fixes` branch once again. +First, let us ensure we are on the `style-fixes` branch once again. ~~~ $ git switch style-fixes @@ -83,7 +83,7 @@ Pylint recommendations are given as warnings or errors, and Pylint also scores the code with an overall mark. We can look at a specific file (e.g. `inflammation-analysis.py`), or a package (e.g. `inflammation`). -Let's look at our `inflammation` package and code inside it (namely `models.py` and `views.py`). +Let us look at our `inflammation` package and code inside it (namely `models.py` and `views.py`). From the project root do: ~~~ @@ -140,7 +140,7 @@ see the "W0611: Unused numpy imported as np (unused-import)" warning. It is important to note that while tools such as Pylint are great at giving you a starting point to consider how to improve your code, -they won't find everything that may be wrong with it. +they will not find everything that may be wrong with it. > ## How Does Pylint Calculate the Score? > @@ -156,7 +156,7 @@ they won't find everything that may be wrong with it. > For example, with a total of 31 statements of models.py and views.py, > with a count of the errors shown above, we get a score of 8.00. > Note whilst there is a maximum score of 10, given the formula, -> there is no minimum score - it's quite possible to get a negative score! +> there is no minimum score - it is quite possible to get a negative score!
{: .callout} > ## Exercise: Further Improve Code Style of Our Project diff --git a/_episodes/20-section2-intro.md b/_episodes/20-section2-intro.md index 0fba72dc..d3833462 100644 --- a/_episodes/20-section2-intro.md +++ b/_episodes/20-section2-intro.md @@ -14,12 +14,12 @@ allowing us to more comprehensively and rapidly find faults in code, as well as - "The use of test techniques and infrastructures such as **parameterisation** and **Continuous Integration** can help scale and further automate our testing process." --- -We've just set up a suitable environment for the development of our software project +We have just set up a suitable environment for the development of our software project and are ready to start coding. However, we want to make sure that the new code we contribute to the project is actually correct and is not breaking any of the existing code. So, in this section, -we'll look at testing approaches that can help us ensure +we will look at testing approaches that can help us ensure that the software we write is behaving as intended, and how we can diagnose and fix issues once faults are found. Using such approaches requires us to change our practice of development. diff --git a/_episodes/21-automatically-testing-software.md b/_episodes/21-automatically-testing-software.md index b263054f..2a164cad 100644 --- a/_episodes/21-automatically-testing-software.md +++ b/_episodes/21-automatically-testing-software.md @@ -24,7 +24,7 @@ keypoints: Being able to demonstrate that a process generates the right results is important in any field of research, -whether it's software generating those results or not. +whether it is software generating those results or not. So when writing software we need to ask ourselves some key questions: - Does the code we develop work the way it should do? 
@@ -41,7 +41,7 @@ Automation can help, and automation where possible is a good thing - it enables us to define a potentially complex process in a repeatable way that is far less prone to error than manual approaches. Once defined, automation can also save us a lot of effort, particularly in the long run. -In this episode we'll look into techniques of automated testing to +In this episode we will look into techniques of automated testing to improve the predictability of a software change, make development more productive, and help us produce code that works as expected and produces desired results. @@ -77,16 +77,16 @@ There are three main types of automated tests: - **Regression tests** make sure that your program's output hasn't changed, for example after making changes your code to add new functionality or fix a bug. -For the purposes of this course, we'll focus on unit tests. -But the principles and practices we'll talk about can be built on +For the purposes of this course, we will focus on unit tests. +But the principles and practices we will talk about can be built on and applied to the other types of tests too. ## Set Up a New Feature Branch for Writing Tests -We're going to look at how to run some existing tests and also write some new ones, -so let's ensure we're initially on our `develop` branch. +We are going to look at how to run some existing tests and also write some new ones, +so let us ensure we are initially on our `develop` branch. We will create a new feature branch called `test-suite` off the `develop` branch - -a common term we use to refer to sets of tests - that we'll use for our test writing work: +a common term we use to refer to sets of tests - that we will use for our test writing work: ~~~ $ git switch develop $ git switch -c test-suite ~~~ {: .language-bash} Good practice is to write our tests around the same time we write our code on a feature branch.
-But since the code already exists, we're creating a feature branch for just these extra tests. +But since the code already exists, we are creating a feature branch for just these extra tests. Git branches are designed to be lightweight, and where necessary, transient, and use of branches for even small bits of work is encouraged. @@ -109,7 +109,7 @@ we will merge all of the work into `main`. ## Inflammation Data Analysis -Let's go back to our [patient inflammation software project](/11-software-project/index.html#patient-inflammation-study-project). +Let us go back to our [patient inflammation software project](/11-software-project/index.html#patient-inflammation-study-project). Recall that it is based on a clinical trial of inflammation in patients who have been given a new treatment for arthritis. There are a number of datasets in the `data` directory @@ -119,7 +119,7 @@ and are each stored in comma-separated values (CSV) format: each row holds information for a single patient, and the columns represent successive days when inflammation was measured in patients. -Let's take a quick look at the data now from within the Python command line console. +Let us take a quick look at the data now from within the Python command line console. Change directory to the repository root (which should be in your home directory `~/python-intermediate-inflammation`), ensure you have your virtual environment activated in your command line terminal @@ -224,7 +224,7 @@ but simplicity here allows us to reason about what's happening - and what we need to test - more easily. -Let's now look into how we can test each of our application's statistical functions +Let us now look into how we can test each of our application's statistical functions to ensure they are functioning correctly. @@ -253,8 +253,8 @@ part of NumPy's testing library - to test that our calculated result is the same as our expected result. 
This function explicitly checks the array's shape and elements are the same, and throws an `AssertionError` if they are not. -In particular, note that we can't just use `==` or other Python equality methods, -since these won't work properly with NumPy arrays in all cases. +In particular, note that we cannot just use `==` or other Python equality methods, +since these will not work properly with NumPy arrays in all cases. We could then add to this with other tests that use and test against other values, and end up with something like: @@ -274,7 +274,7 @@ npt.assert_array_equal(daily_mean(test_input), test_result) ~~~ {: .language-python} -However, if we were to enter these in this order, we'll find we get the following after the first test: +However, if we were to enter these in this order, we will find we get the following after the first test: ~~~ ... @@ -289,19 +289,19 @@ Max relative difference: 0.5 ~~~ {: .output} -This tells us that one element between our generated and expected arrays doesn't match, +This tells us that one element between our generated and expected arrays does not match, and shows us the different arrays. We could put these tests in a separate script to automate the running of these tests. But a Python script halts at the first failed assertion, so the second and third tests aren't run at all. -It would be more helpful if we could get data from all of our tests every time they're run, +It would be more helpful if we could get data from all of our tests every time they are run, since the more information we have, -the faster we're likely to be able to track down bugs. +the faster we are likely to be able to track down bugs. It would also be helpful to have some kind of summary report: if our set of tests - known as a **test suite** - includes thirty or forty tests (as it well might for a complex function or library that's widely used), -we'd like to know how many passed or failed. +we would like to know how many passed or failed. 
Going back to our failed first test, what was the issue? As it turns out, the test itself was incorrect, and should have read: @@ -325,7 +325,7 @@ Otherwise, our tests hold little value. ### Using a Testing Framework Keeping these things in mind, -here's a different approach that builds on the ideas we've seen so far +here's a different approach that builds on the ideas we have seen so far but uses a **unit testing framework**. In such a framework we define our tests we want to run as functions, and the framework automatically runs each of these functions in turn, @@ -333,7 +333,7 @@ summarising the outputs. And unlike our previous approach, it will run every test regardless of any encountered test failures. -Most people don't enjoy writing tests, +Most people do not enjoy writing tests, so if we want them to actually do it, it must be easy to: @@ -343,7 +343,7 @@ it must be easy to: - Understand those tests' results Test results must also be reliable. -If a testing tool says that code is working when it's not, +If a testing tool says that code is working when it is not, or reports problems when there actually aren't any, people will lose faith in it and stop using it. @@ -399,7 +399,7 @@ these are a specification of: and using `assert_array_equal()` to test its validity - Expected outputs, e.g. our `test_result` NumPy array that we test against -Also, we're defining each of these things for a test case we can run independently +Also, we are defining each of these things for a test case we can run independently that requires no manual intervention. Going back to our list of requirements, how easy is it to run these tests? @@ -409,14 +409,14 @@ You can use it to test things like Python functions, database operations, or even things like service APIs - essentially anything that has inputs and expected outputs. 
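To make the idea concrete, a pytest-style test is just a plain function whose name starts with `test_`. The sketch below is self-contained - it defines a stand-in `daily_mean()` (assumed here to be a NumPy column-wise mean; the project's actual implementation lives in `inflammation/models.py` and would normally be imported instead):

```python
# Sketch of a pytest-style unit test, with a stand-in daily_mean() defined
# inline so the example runs on its own.
import numpy as np
import numpy.testing as npt


def daily_mean(data):
    """Calculate the daily mean of a 2D inflammation data array."""
    return np.mean(data, axis=0)


def test_daily_mean_integers():
    """Test that the mean function works for an array of positive integers."""
    test_input = np.array([[1, 2], [3, 4], [5, 6]])
    test_result = np.array([3, 4])
    # assert_array_equal() raises AssertionError if shape or elements differ.
    npt.assert_array_equal(daily_mean(test_input), test_result)
```

Running `python3 -m pytest` from the project root collects every such `test_*` function and reports a pass/fail summary for the whole suite, regardless of how many individual tests fail.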
-We'll be using Pytest to write unit tests, +We will be using Pytest to write unit tests, but what you learn can scale to more complex functional testing for applications or libraries. > ## What About Unit Testing Frameworks in Python and Other Languages? > > Other unit testing frameworks exist for Python, > including Nose2 and Unittest, with Unittest supplied as part of the standard Python library. -> It's also worth noting that Pytest supports tests written for Unittest, +> It is also worth noting that Pytest supports tests written for Unittest, > a useful feature if you wish to prioritise use of the standard library initially, > but retain the option to move Pytest in the future. > @@ -442,7 +442,7 @@ but what you learn can scale to more complex functional testing for applications > - unittest-style unit tests can be run from pytest out of the box! > > A common challenge, particularly at the intermediate level, is the selection of a suitable tool from many alternatives -> for a given task. Once you've become accustomed to object-oriented programming you may find unittest a better fit +> for a given task. Once you have become accustomed to object-oriented programming you may find unittest a better fit > for a particular project or team, so you may want to revisit it at a later date. {: .callout} @@ -583,7 +583,7 @@ ensuring that ourselves (and others) always have a set of tests to verify our code at each step of development. This way, when we implement a new feature, we can check a) that the feature works using a test we write for it, and -b) that the development of the new feature doesn't break any existing functionality. +b) that the development of the new feature does not break any existing functionality. ### What About Testing for Errors? @@ -608,7 +608,7 @@ with `import pytest` so that we can use `pytest`'s `raises()` function. Run all your tests as before. 
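One possible shape for such an error-checking test - a sketch only, with a stand-in `daily_mean()` defined inline so it runs on its own; in the project you would import the real function from `inflammation/models.py` instead:

```python
# Sketch of testing for an expected error with pytest.raises().
import numpy as np
import pytest


def daily_mean(data):
    """Calculate the daily mean of a 2D inflammation data array."""
    return np.mean(data, axis=0)


def test_daily_mean_string():
    """Test for TypeError when passing strings instead of numbers."""
    # NumPy cannot reduce (sum) an array of strings, so mean raises TypeError.
    with pytest.raises(TypeError):
        daily_mean([['Hello', 'there'], ['General', 'Kenobi']])
```

`pytest.raises()` acts as a context manager: the test passes only if the code inside the `with` block raises the named exception, and fails if it raises nothing or something else.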
-Since we've installed `pytest` to our environment,
+Since we have installed `pytest` to our environment,
 we should also regenerate our `requirements.txt`:

~~~
@@ -616,7 +616,7 @@ $ python3 -m pip freeze > requirements.txt
~~~
{: .language-bash}

-Finally, let's commit our new `test_models.py` file,
+Finally, let us commit our new `test_models.py` file,
`requirements.txt` file, and test cases to our `test-suite` branch,
and push this new branch and all its commits to GitHub:

diff --git a/_episodes/22-scaling-up-unit-testing.md b/_episodes/22-scaling-up-unit-testing.md
index abbe9eb2..a52f48db 100644
--- a/_episodes/22-scaling-up-unit-testing.md
+++ b/_episodes/22-scaling-up-unit-testing.md
@@ -10,23 +10,23 @@ objectives:
- "Use code coverage to understand how much of our code is being tested using unit tests"
keypoints:
- "We can assign multiple inputs to tests using parametrisation."
-- "It's important to understand the **coverage** of our tests across our code."
+- "It is important to understand the **coverage** of our tests across our code."
- "Writing unit tests takes time, so apply them where it makes the most sense."
---

## Introduction

-We're starting to build up a number of tests that test the same function,
+We are starting to build up a number of tests that test the same function,
but just with different parameters.
However, continuing to write a new function for every single test case
-isn't likely to scale well as our development progresses.
+is not likely to scale well as our development progresses.
How can we make our job of writing tests more efficient?
And importantly, as the number of tests increases,
how can we determine how much of our code base is actually being tested?

## Parameterising Our Unit Tests

-So far, we've been writing a single function for every new test we need.
+So far, we have been writing a single function for every new test we need.
But when we simply want to use the same test code but with different data for another test, it would be great to be able to specify multiple sets of data to use with the same test code. Test **parameterisation** gives us this. @@ -52,7 +52,7 @@ def test_daily_mean(test, expected): {: .language-python} Here, we use Pytest's **mark** capability to add metadata to this specific test - -in this case, marking that it's a parameterised test. +in this case, marking that it is a parameterised test. `parameterize()` function is actually a [Python **decorator**](https://www.programiz.com/python-programming/decorator). A decorator, when applied to a function, @@ -73,7 +73,7 @@ and check to see if it equals `[0, 0]` (our `expected` argument). Similarly, our second test will run `daily_mean()` with `[ [1, 2], [3, 4], [5, 6] ]` and check it produces `[3, 4]`. -The big plus here is that we don't need to write separate functions for each of the tests - +The big plus here is that we do not need to write separate functions for each of the tests - our test code can remain compact and readable as we write more tests and adding more tests scales better as our code becomes more complex. @@ -119,8 +119,8 @@ and adding more tests scales better as our code becomes more complex. Try them out! -Let's commit our revised `test_models.py` file and test cases to our `test-suite` branch -(but don't push them to the remote repository just yet!): +Let us commit our revised `test_models.py` file and test cases to our `test-suite` branch +(but do not push them to the remote repository just yet!): ~~~ $ git add tests/test_models.py @@ -131,16 +131,16 @@ $ git commit -m "Add parameterisation mean, min, max test cases" ## Code Coverage - How Much of Our Code is Tested? -Pytest can't think of test cases for us. +Pytest cannot think of test cases for us. We still have to decide what to test and how many tests to run. 
Our best guide here is economics: -we want the tests that are most likely to give us useful information that we don't already have. +we want the tests that are most likely to give us useful information that we do not already have. For example, if `daily_mean(np.array([[2, 0], [4, 0]])))` works, -there's probably not much point testing `daily_mean(np.array([[3, 0], [4, 0]])))`, -since it's hard to think of a bug that would show up in one case but not in the other. +there is probably not much point testing `daily_mean(np.array([[3, 0], [4, 0]])))`, +since it is hard to think of a bug that would show up in one case but not in the other. Now, we should try to choose tests that are as different from each other as possible, -so that we force the code we're testing to execute in all the different ways it can - +so that we force the code we are testing to execute in all the different ways it can - to ensure our tests have a high degree of **code coverage**. A simple way to check the code coverage for a set of tests is @@ -197,14 +197,14 @@ TOTAL 9 1 89% ~~~ {: .output} -So there's still one statement not being tested at line 18, -and it turns out it's in the function `load_csv()`. +So there is still one statement not being tested at line 18, +and it turns out it is in the function `load_csv()`. Here we should consider whether or not to write a test for this function, and, in general, any other functions that may not be tested. Of course, if there are hundreds or thousands of lines that are not covered it may not be feasible to write tests for them all. But we should prioritise the ones for which we write tests, considering -how often they're used, +how often they are used, how complex they are, and importantly, the extent to which they affect our program's results. @@ -218,7 +218,7 @@ $ cat requirements.txt {: .language-bash} You'll notice `pytest-cov` and `coverage` have been added. 
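If we did decide to cover `load_csv()`, a test might look like the following sketch - the stand-in implementation and the use of a temporary file are assumptions for illustration:

```python
import os
import tempfile

import numpy as np


def load_csv(filename):
    """Stand-in for the project's load_csv: load a CSV file of numbers as a 2D array."""
    return np.loadtxt(fname=filename, delimiter=',')


def test_load_csv():
    """Test that a small CSV file is loaded with the expected shape and values."""
    with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as csv_file:
        csv_file.write('1,2\n3,4\n')
    try:
        data = load_csv(csv_file.name)
        assert data.shape == (2, 2)
        assert data[1, 1] == 4
    finally:
        os.remove(csv_file.name)
```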
-Let's commit this file and push our new branch to GitHub: +Let us commit this file and push our new branch to GitHub: ~~~ $ git add requirements.txt @@ -287,7 +287,7 @@ $ git push origin test-suite > you can test against that, > conducting multiple test runs that take advantage of the randomness > to fill the known "space" of expected results. -> Note that this isn't as precise or complete, +> Note that this is not as precise or complete, > and bear in mind this could mean you need to run *a lot* of tests > which may take considerable time. {: .callout} @@ -309,9 +309,9 @@ write the code. The main advantages are: - It forces us to think about how our code will be used before we write it -- It prevents us from doing work that we don't need to do, e.g. "I might need this later..." -- It forces us to test that the tests _fail_ before we've implemented the code, meaning we - don't inadvertently forget to add the correct asserts. +- It prevents us from doing work that we do not need to do, e.g. "I might need this later..." +- It forces us to test that the tests _fail_ before we have implemented the code, meaning we + do not inadvertently forget to add the correct asserts. You may also see this process called **Red, Green, Refactor**: 'Red' for the failing tests, @@ -329,7 +329,7 @@ a complex program requires a much higher investment in testing than a simple one Putting it another way, a small script that is only going to be used once, to produce one figure, -probably doesn't need separate testing: +probably does not need separate testing: its output is either correct or not. A linear algebra library that will be used by thousands of people in twice that number of applications over the course of a decade, @@ -337,7 +337,7 @@ on the other hand, definitely does. The key is identify and prioritise against what will most affect the code's ability to generate accurate results. 
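The 'Red, Green' part of this cycle can be sketched as follows, where `daily_max` is a hypothetical function written only after its test:

```python
# 'Red': the test is written first and fails while daily_max does not yet exist.
def test_daily_max():
    assert daily_max([[4, 2], [1, 6], [2, 3]]) == [4, 6]


# 'Green': just enough implementation is then added to make the test pass.
def daily_max(data):
    """Hypothetical column-wise maximum across all rows of data."""
    return [max(column) for column in zip(*data)]
```

'Refactor' would then tidy the implementation, re-running the test after every change to confirm it still passes.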
-It's also important to remember that unit testing cannot catch every bug in an application,
+It is also important to remember that unit testing cannot catch every bug in an application,
no matter how many tests you write.
To mitigate this manual testing is also important.
Also remember to test using as much input data as you can,
diff --git a/_episodes/23-continuous-integration-automated-testing.md b/_episodes/23-continuous-integration-automated-testing.md
index 2440ec8b..029a793d 100644
--- a/_episodes/23-continuous-integration-automated-testing.md
+++ b/_episodes/23-continuous-integration-automated-testing.md
@@ -21,20 +21,20 @@ keypoints:

## Introduction

-So far we've been manually running our tests as we require.
-Once we've made a change,
+So far we have been manually running our tests as we require.
+Once we have made a change,
or added a new feature with accompanying tests,
we can re-run our tests,
giving ourselves (and others who wish to run them)
increased confidence that everything is working as expected.
-Now we're going to take further advantage of automation
+Now we are going to take further advantage of automation
in a way that helps testing scale across a development team with very little overhead,
using **Continuous Integration**.

## What is Continuous Integration?

-The automated testing we've done so far only takes into account
+The automated testing we have done so far only takes into account
the state of the repository we have on our own machines.
In a software project involving multiple developers working and pushing changes on a repository,
it would be great to know holistically how all these changes are affecting our codebase
@@ -56,7 +56,7 @@ Once complete, it presents a report to let you see what happened.

There are many CI infrastructures and services,
free and paid for,
and subject to change as they evolve their features.
-We'll be looking at [GitHub Actions](https://github.com/features/actions) - +We will be looking at [GitHub Actions](https://github.com/features/actions) - which unsurprisingly is available as part of GitHub. @@ -66,7 +66,7 @@ which unsurprisingly is available as part of GitHub. YAML is a text format used by GitHub Action workflow files. It is also increasingly used for configuration files and storing other types of data, -so it's worth taking a bit of time looking into this file format. +so it is worth taking a bit of time looking into this file format. [YAML](https://www.commonwl.org/user_guide/yaml/) (a recursive acronym which stands for "YAML Ain't Markup Language") @@ -84,11 +84,11 @@ first_scaled_by: Hans Meyer ~~~ {: .language-yaml} -In general, you don't need quotes for strings, +In general, you do not need quotes for strings, but you can use them when you want to explicitly distinguish between numbers and strings, e.g. `height_metres: "5892"` would be a string, but in the above example it is an integer. -It turns out Hans Meyer isn't the only first ascender of Kilimanjaro, +It turns out Hans Meyer is not the only first ascender of Kilimanjaro, so one way to add this person as another value to this key is by using YAML **arrays**, like this: @@ -127,7 +127,7 @@ with the last of these being another nested key with the keys `year` and `by`. Note the convention of using two spaces for tabs, instead of Python's four. We can also combine maps and arrays to describe more complex data. -Let's say we want to add more detail to our list of initial ascenders: +Let us say we want to add more detail to our list of initial ascenders: ~~~ ... @@ -157,13 +157,13 @@ shakespeare_couplet: | They key `shakespeare_couplet` would hold the full two line string, preserving the new line after sorrow. -As we'll see shortly, GitHub Actions workflows will use all of these. +As we will see shortly, GitHub Actions workflows will use all of these. 
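Pulling these features together, here is a sketch of a single YAML document (the extra values are purely illustrative) combining a nested map, an array, and a multi-line string:

```yaml
kilimanjaro:                # a map nested under a key
  height_metres: 5892       # an integer value
  summit: "Kibo"            # quotes force a string
  first_scaled_by:          # an array of values
    - Hans Meyer
    - Ludwig Purtscheller
  notes: |                  # a multi-line string, newlines preserved
    First ascended in 1889.
    Now a popular trekking destination.
```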
### Defining Our Workflow -With a GitHub repository there's a way we can set up CI +With a GitHub repository there is a way we can set up CI to run our tests automatically when we commit changes. -Let's do this now by adding a new file to our repository whilst on the `test-suite` branch. +Let us do this now by adding a new file to our repository whilst on the `test-suite` branch. First, create the new directories `.github/workflows`: ~~~ @@ -174,7 +174,7 @@ $ mkdir -p .github/workflows This directory is used specifically for GitHub Actions, allowing us to specify any number of workflows that can be run under a variety of conditions, which is also written using YAML. -So let's add a new YAML file called `main.yml` +So let us add a new YAML file called `main.yml` (note its extension is `.yml` without the `a`) within the new `.github/workflows` directory: @@ -218,7 +218,7 @@ jobs: ***Note**: be sure to create this file as `main.yml` within the newly created `.github/workflows` directory, -or it won't work!* +or it will not work!* So as well as giving our workflow a name - CI - we indicate with `on` that we want this workflow to run when we `push` commits to our repository. @@ -229,7 +229,7 @@ and each one would run in parallel. Next, we define what our build job will do. With `runs-on` we first state which operating systems we want to use, in this case just Ubuntu for now. -We'll be looking at ways we can scale this up to testing on more systems later. +We will be looking at ways we can scale this up to testing on more systems later. Lastly, we define the `step`s that our job will undertake in turn, to set up the job's environment and run our tests. 
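For reference, a condensed sketch of what such a `main.yml` could contain is given below - the action versions and the Python version shown here are assumptions, so follow the full listing above when creating your own file:

```yaml
name: CI

# We want this workflow to run on every commit push
on: push

jobs:
  build:
    # Run the job on the latest Ubuntu virtual machine
    runs-on: ubuntu-latest

    steps:
      - name: Checkout our repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install dependencies and our package
        run: |
          python3 -m pip install --upgrade pip
          python3 -m pip install -r requirements.txt
          python3 -m pip install -e .

      - name: Test with pytest
        run: python3 -m pytest tests/
```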
@@ -247,11 +247,11 @@ Each of these steps are: Otherwise, if we wanted to test against for example Python 3.10, by specifying `3.10` without the quotes, it would be interpreted as the number `3.1` which - - although it's the same number as `3.10` - + although it is the same number as `3.10` - would be interpreted as the wrong version! - **Install latest version of pip, dependencies, and our inflammation package:** In order to locally install our `inflammation` package - it's good practice to upgrade the version of pip that is present first, + it is good practice to upgrade the version of pip that is present first, then we use pip to install our package dependencies. Once installed, we can use `python3 -m pip install -e .` as before to install our own package. We use `run` here to run theses commands in the CI shell environment @@ -287,13 +287,13 @@ $ git push origin test-suite Since we are only committing the GitHub Actions configuration file to the `test-suite` branch for the moment, only the contents of this branch will be used for CI. -We can pass this file upstream into other branches (i.e. via merges) when we're happy it works, +We can pass this file upstream into other branches (i.e. via merges) when we are happy it works, which will then allow the process to run automatically on these other branches. This again highlights the usefulness of the feature-branch model - -we can work in isolation on a feature until it's ready to be passed upstream +we can work in isolation on a feature until it is ready to be passed upstream without disrupting development on other branches, and in the case of CI, -we're starting to see its scaling benefits across a larger scale development team +we are starting to see its scaling benefits across a larger scale development team working across potentially many branches. 
### Checking Build Progress and Reports @@ -350,7 +350,7 @@ Using a build matrix we can specify testing environments and parameters (such as operating system, Python version, etc.) and new jobs will be created that run our tests for each permutation of these. -Let's see how this is done using GitHub Actions. +Let us see how this is done using GitHub Actions. To support this, we define a `strategy` as a `matrix` of operating systems and Python versions within `build`. We then use `matrix.os` and `matrix.python-version` to reference these configuration possibilities @@ -403,7 +403,7 @@ This way, every possible permutation of Python versions 3.10 and 3.11 with the latest versions of Ubuntu, Mac OS and Windows operating systems will be tested and we can expect 6 build jobs in total. -Let's commit and push this change and see what happens: +Let us commit and push this change and see what happens: ~~~ $ git add .github/workflows/main.yml diff --git a/_episodes/24-diagnosing-issues-improving-robustness.md b/_episodes/24-diagnosing-issues-improving-robustness.md index 2db5c82b..95847698 100644 --- a/_episodes/24-diagnosing-issues-improving-robustness.md +++ b/_episodes/24-diagnosing-issues-improving-robustness.md @@ -207,7 +207,7 @@ looking something like the following: ![Running pytest in PyCharm](../fig/pytest-pycharm-run-tests.png){: .image-with-shadow width="1000px"} We can also run our test functions individually. -First, let's check that our PyCharm running and testing configurations are correct. +First, let us check that our PyCharm running and testing configurations are correct. Select `Run` > `Edit Configurations...` from the PyCharm menu, and you should see something like the following: @@ -221,8 +221,8 @@ was configured when we set up how to run our script from within PyCharm. The second - `pytest in test_models.py` under `Python tests` - is our recent test configuration. -If you see just these, you're good to go. 
-We don't need any others, +If you see just these, you are good to go. +We do not need any others, so select any others you see and click the `-` button at the top to remove them. This will avoid any confusion when running our tests separately. Click `OK` when done. @@ -351,7 +351,7 @@ You should be rewarded with: {: .callout} > ## Debugging Outside of an IDE -> It is worth being aware of the fact that you don't need to use an IDE to debug code, +> It is worth being aware of the fact that you do not need to use an IDE to debug code, > although it does certainly make it easier! > The Python standard library comes with a command-line capable debugger built in, called [pdb](https://docs.python.org/3/library/pdb.html). > The easiest way to use it is to put one of these lines @@ -372,7 +372,7 @@ However, when writing your test cases, it is important to consider parameterising them by unusual or extreme values, in order to test all the edge or corner cases that your code could be exposed to in practice. Generally speaking, it is at these extreme cases that you will find your code failing, -so it's beneficial to test them beforehand. +so it is beneficial to test them beforehand. What is considered an "edge case" for a given component depends on what that component is meant to do. @@ -518,7 +518,7 @@ In the previous section, we made a few design choices for our `patient_normalise 1. We are implicitly converting any `NaN` and negative values to 0, 2. Normalising a constant 0 array of inflammation results in an identical array of 0s, -3. We don't warn the user of any of these situations. +3. We do not warn the user of any of these situations. This could have be handled differently. We might decide that we do not want to silently make these changes to the data, @@ -705,7 +705,7 @@ This approach is useful when explicitly checking the precondition is too costly. 
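As a sketch of this trade-off, here is a hypothetical plain-Python version of a `patient_normalise`-style function that checks its preconditions explicitly, since doing so is cheap here:

```python
def patient_normalise(data):
    """Normalise each patient's inflammation row by that row's maximum value.

    Preconditions: data is a non-empty list of non-empty rows
    of non-negative numbers.
    """
    # Cheap explicit precondition checks: fail early with a clear message
    if not data or any(len(row) == 0 for row in data):
        raise ValueError('inflammation data should be a non-empty 2D array')
    if any(value < 0 for row in data for value in row):
        raise ValueError('inflammation values should not be negative')
    # An all-zero row normalises to zeros rather than dividing by zero
    return [[value / max(row) if max(row) > 0 else 0 for value in row]
            for row in data]
```

If the checks had been too costly - say, scanning millions of values on every call - we could instead have documented the preconditions in the docstring and left them unchecked.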
## Improving Robustness with Automated Code Style Checks -Let's re-run Pylint over our project after having added some more code to it. +Let us re-run Pylint over our project after having added some more code to it. From the project root do: ~~~ @@ -726,7 +726,7 @@ inflammation/models.py:60:4: W0622: Redefining built-in 'max' (redefined-builtin The above output indicates that by using the local variable called `max` in the `patient_normalise` function, we have redefined a built-in Python function called `max`. -This isn't a good idea and may have some undesired effects +This is not a good idea and may have some undesired effects (e.g. if you redefine a built-in name in a global scope you may cause yourself some trouble which may be difficult to trace). @@ -742,8 +742,8 @@ you may cause yourself some trouble which may be difficult to trace). It may be hard to remember to run linter tools every now and then. Luckily, we can now add this Pylint execution to our continuous integration builds as one of the extra tasks. -Since we're adding an extra feature to our CI workflow, -let's start this from a new feature branch from the `develop` branch: +Since we are adding an extra feature to our CI workflow, +let us start this from a new feature branch from the `develop` branch: ~~~ $ git switch -c pylint-ci develop # note a shorthand for creating a branch from another and switching to it @@ -763,9 +763,9 @@ we can add the following step to our `steps` in `.github/workflows/main.yml`: {: .language-bash} Note we need to add `--fail-under=0` otherwise -the builds will fail if we don't get a 'perfect' score of 10! -This seems unlikely, so let's be more pessimistic. -We've also added `--reports=y` which will give us a more detailed report of the code analysis. +the builds will fail if we do not get a 'perfect' score of 10! +This seems unlikely, so let us be more pessimistic. +We have also added `--reports=y` which will give us a more detailed report of the code analysis. 
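Putting those flags together, the added step might look like this sketch (assuming Pylint has been installed by the earlier dependencies step):

```yaml
      - name: Check style with Pylint
        run: |
          python3 -m pylint --fail-under=0 --reports=y inflammation
```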
Then we can just add this to our repo and trigger a build:

@@ -798,13 +798,13 @@ $ pylint --generate-rcfile > .pylintrc
~~~
{: .language-bash}

-Looking at this file, you'll see it's already pre-populated.
+Looking at this file, you will see it is already pre-populated.
No behaviour is currently changed from the default by generating this file,
but we can amend it to suit our team's coding style.
For example, a typical rule to customise - favoured by many projects -
is the one involving line length.
-You'll see it's set to 100, so let's set that to a more reasonable 120.
-While we're at it, let's also set our `fail-under` in this file:
+You will see it is set to 100, so let us set that to a more reasonable 120.
+While we are at it, let us also set our `fail-under` in this file:

~~~
...
@@ -817,17 +817,17 @@ max-line-length=120
~~~
{: .language-bash}

-Don't forget to remove the `--fail-under` argument to Pytest
+Do not forget to remove the `--fail-under` argument to Pylint
 in our GitHub Actions configuration file too,
-since we don't need it anymore.
+since we do not need it anymore.

-Now when we run Pylint we won't be penalised for having a reasonable line length.
+Now when we run Pylint we will not be penalised for having a reasonable line length.
For some further hints and tips on how to approach using Pylint for a project,
see [this article](https://pythonspeed.com/articles/pylint/).

## Merging to `develop` Branch

-Now we're happy with our test suite, we can merge this work
+Now we are happy with our test suite, we can merge this work
(which currently only exist on our `test-suite` branch)
with our parent `develop` branch.
Again, this reflects us working with impunity on a logical unit of work, @@ -844,7 +844,7 @@ $ git merge test-suite {: .language-bash} Then, assuming there are no conflicts, -we can push these changes back to the remote repository as we've done before: +we can push these changes back to the remote repository as we have done before: ~~~ $ git push origin develop diff --git a/_episodes/30-section3-intro.md b/_episodes/30-section3-intro.md index 6c1048a9..fe58a9ff 100644 --- a/_episodes/30-section3-intro.md +++ b/_episodes/30-section3-intro.md @@ -79,7 +79,7 @@ Someone who is engineering software takes a wider view: - Who will (or may) be involved: software is written for *stakeholders*. This may only be the researcher initially, but there is an understanding that others may become involved later - (even if that isn't evident yet). + (even if that is not evident yet). A good rule of thumb is to always assume that code will be read and used by others later on, which includes yourself! - Software (or code) is an asset: software inherently contains value - @@ -125,7 +125,7 @@ each stage's outputs flow into the next stage sequentially. Whether projects or people that develop software are aware of them or not, these stages are followed implicitly or explicitly in every software project. What is required for a project (during requirements gathering) is always considered, for example, -even if it isn't explored sufficiently or well understood. +even if it is not explored sufficiently or well understood. Following a **process** of development offers some major benefits: @@ -134,7 +134,7 @@ Following a **process** of development offers some major benefits: if that stage has completed successfully before proceeding to the next one (and even if the next stage is not warranted at all - for example, it may be discovered during requirements of design - that development of the software isn't practical or even required). 
+ that development of the software is not practical or even required). - **Predictability:** each stage is given attention in a logical sequence; the next stage should not begin until prior stages have completed. Returning to a prior stage is possible and may be needed, but may prove expensive, diff --git a/_episodes/31-software-requirements.md b/_episodes/31-software-requirements.md index f8d9bcc3..4000fe3f 100644 --- a/_episodes/31-software-requirements.md +++ b/_episodes/31-software-requirements.md @@ -19,8 +19,8 @@ keypoints: --- The requirements of our software are the basis on which the whole project rests - -if we get the requirements wrong, we'll build the wrong software. -However, it's unlikely that we'll be able to determine all of the requirements upfront. +if we get the requirements wrong, we will build the wrong software. +However, it is unlikely that we will be able to determine all of the requirements upfront. Especially when working in a research context, requirements are flexible and may change as we develop our software. @@ -31,7 +31,7 @@ but at a high level a useful way to split them is into *business requirements*, *user requirements*, and *solution requirements*. -Let's take a look at these now. +Let us take a look at these now. ### Business Requirements @@ -136,7 +136,7 @@ They fall into two key categories: #### The Importance of Non-functional Requirements When considering software requirements, -it's *very* tempting to just think about the features users need. +it is *very* tempting to just think about the features users need. However, many design choices in a software project quite rightly depend on the users themselves and the environment in which the software is expected to run, and these aspects should be considered as part of the software's non-functional requirements. @@ -281,7 +281,7 @@ or does our design need to be revisited? 
It may not need any changes at all, but if it does not fit logically our design will need a bigger rethink so the new requirement can be implemented in a sensible way. -We'll look at this a bit later in this section, +We will look at this a bit later in this section, but simply adding new code without considering how the design and implementation need to change at a high level can make our software increasingly messy and difficult to change in the future. diff --git a/_episodes/32-software-architecture-design.md b/_episodes/32-software-architecture-design.md index 177393f8..bbf8733b 100644 --- a/_episodes/32-software-architecture-design.md +++ b/_episodes/32-software-architecture-design.md @@ -131,7 +131,7 @@ Now that we know what goals we should aspire to, let us take a critical look at software project and try to identify ways in which it can be improved. Our software project contains a pre-existing branch `full-data-analysis` which contains code for a new feature of our -inflammation analysis software, which we'll consider as a contribution by another developer. +inflammation analysis software, which we will consider as a contribution by another developer. Recall that you can see all your branches as follows: ~~~ @@ -180,7 +180,7 @@ calculates and compares standard deviation across all the data by day and finaly >> plotting the graph you would have to change the `analysis_data()` function. >> * **Hard to modify or test:** it only analyses a set of CSV data files >> matching a very particular hardcoded `inflammation*.csv` pattern, which seems an unreasonable assumption. ->> What if someone wanted to use files which don't match this naming convention? +>> What if someone wanted to use files which do not match this naming convention? >> * **Hard to modify:** it does not have any tests so we cannot be 100% confident the code does >> what it claims to do; any changes to the code may break something and it would be harder and >> more time-consuming to figure out what. 
diff --git a/_episodes/33-code-decoupling-abstractions.md b/_episodes/33-code-decoupling-abstractions.md index bbfbc22e..acc4c208 100644 --- a/_episodes/33-code-decoupling-abstractions.md +++ b/_episodes/33-code-decoupling-abstractions.md @@ -52,14 +52,14 @@ Benefits of using these techniques include having the codebase that is: * easier to maintain, as changes can be isolated from other parts of the code. -Let's start redesigning our code by introducing some of the abstraction techniques +Let us start redesigning our code by introducing some of the abstraction techniques to incrementally decouple it into smaller components to improve its overall design. In the code from our current branch `full-data-analysis`, you may have noticed that loading data from CSV files from a `data` directory is "hardcoded" into the `analyse_data()` function. Data loading is a functionality separate from data analysis, so firstly -let's decouple the data loading part into a separate component (function). +let us decouple the data loading part into a separate component (function). > ## Exercise: Decouple Data Loading from Data Analysis > @@ -303,7 +303,7 @@ type of polymorphism enabling methods and operators to take parameters of differ We will have a look at the *interface-based polymorphism*. In OOP, it is possible to have different object classes that conform to the same interface. 
-For example, let's have a look at the following class representing a `Rectangle`:
+For example, let us have a look at the following class representing a `Rectangle`:

```python
class Rectangle:
diff --git a/_episodes/35-software-architecture-revisited.md b/_episodes/35-software-architecture-revisited.md
index 9a50cb58..4fc2e2f8 100644
--- a/_episodes/35-software-architecture-revisited.md
+++ b/_episodes/35-software-architecture-revisited.md
@@ -3,7 +3,7 @@ title: "Software Architecture Revisited"
teaching: 15
exercises: 30
questions:
-- "How do we handle code contributions that don't fit within our existing architecture?"
+- "How do we handle code contributions that do not fit within our existing architecture?"
objectives:
- "Analyse new code to identify Model, View, Controller aspects."
- "Refactor new code to conform to an MVC architecture."
@@ -13,7 +13,7 @@ keypoints:
- "Try to leave the code in a better state that you found it."
---

-In the previous few episodes we've looked at the importance and principles of good software architecture and design,
+In the previous few episodes we have looked at the importance and principles of good software architecture and design,
and how techniques such as code abstraction and refactoring fulfil that design within an implementation,
and help us maintain and improve it as our code evolves.

@@ -105,7 +105,7 @@ $ git merge full-data-analysis
~~~
{: .language-bash}

-Let's now have a closer look at our Controller, and how can handling command line arguments in Python
+Let us now have a closer look at our Controller, and how we can handle command line arguments in Python
(which is something you may find yourself doing often if you need to run the code from a command line tool).

@@ -332,6 +332,6 @@ and maintained within a team by having multiple people have a look and comment
on key code changes to see how they fit within the codebase.
Such reviews check the correctness of the new code, test coverage, functionality changes, and confirm that they follow the coding guides and best practices. -Let's have a look at some code review techniques available to us. +Let us have a look at some code review techniques available to us. {% include links.md %} diff --git a/_episodes/40-section4-intro.md b/_episodes/40-section4-intro.md index 7b8b1a54..6d5eba08 100644 --- a/_episodes/40-section4-intro.md +++ b/_episodes/40-section4-intro.md @@ -26,18 +26,18 @@ This process of having multiple team members comment on key code changes is call this is one of the most important practices of collaborative software development that helps ensure the ‘good’ coding standards are achieved and maintained within a team, as well as increasing knowledge about the codebase across the team. -We'll thus look at the benefits of reviewing code, +We will thus look at the benefits of reviewing code, in particular, the value of this type of activity within a team, and how this can fit within various ways of team working. -We'll see how GitHub can support code review activities via pull requests, +We will see how GitHub can support code review activities via pull requests, and how we can do these ourselves making use of best practices. -After that, we'll look at some general principles of software maintainability +After that, we will look at some general principles of software maintainability and the benefits that writing maintainable code can give you. There will also be some practice at identifying problems with existing code, and some general, established practices you can apply -when writing new code or to the code you've already written. -We'll also look at how we can package software for release and distribution, +when writing new code or to the code you have already written. 
+We will also look at how we can package software for release and distribution, using **Poetry** to manage our Python dependencies and produce a code package we can use with a Python package indexing service to illustrate these principles. diff --git a/_episodes/41-code-review.md b/_episodes/41-code-review.md index 33709776..d90aca87 100644 --- a/_episodes/41-code-review.md +++ b/_episodes/41-code-review.md @@ -22,7 +22,7 @@ help the development of software in a team environment, but in an individual set Despite developing tests to check our code - no one else from the team had a look at our code before we merged it into the main development stream. Software is often designed and built as part of a team, -so in this episode we'll be looking at how to manage the process of team software development +so in this episode we will be looking at how to manage the process of team software development and improve our code by engaging in code review process with other team members. ## Collaborative Code Development Models @@ -112,7 +112,7 @@ and has gained popularity within the software development community in recent ye Pull requests are fundamental to how teams review and improve code on GitHub (and similar code sharing platforms) - -they let you tell others about changes you've pushed to a branch in a repository on GitHub +they let you tell others about changes you have pushed to a branch in a repository on GitHub and that your code is ready for review. Once a pull request is opened, you can discuss and review the potential changes with others on the team @@ -161,7 +161,7 @@ how you create the feature branch. In either model, once you are ready to merge your changes in - you will need to specify the base branch and the compare branch. -Let's see this in action - +Let us see this in action - you are going to act as a reviewer on a proposed change to the codebase contributed by a fictional colleague on one of your fellow learner's repository. 
One of your fellow learners will review the proposed changes on your repository.

@@ -241,8 +241,8 @@ Start by understanding what the code _should_ do, by reading the specification/user requirements,
 the pull request description or talking to the developer if need be.
 In this case, understand what [SR1.1.1](../31-software-requirements/index.html#solution-requirements) means.

-Once you're happy, start reading the code (skip the test code for now - we will come back to it later).
-You're going to be assessing the code in the following key areas.
+Once you are happy, start reading the code (skip the test code for now - we will come back to it later).
+You are going to be assessing the code in the following key areas.

 ##### Is the proposed code readable?

diff --git a/_episodes/42-software-reuse.md b/_episodes/42-software-reuse.md
index 77122ce2..0705aaf7 100644
--- a/_episodes/42-software-reuse.md
+++ b/_episodes/42-software-reuse.md
@@ -20,15 +20,15 @@ keypoints:
 ---

 ## Introduction

-In previous episodes we've looked at skills, practices, and tools to help us
+In previous episodes we have looked at skills, practices, and tools to help us
 design and develop software in a collaborative environment.
-In this lesson we'll be looking at
-a critical piece of the development puzzle that builds on what we've learnt so far -
+In this lesson we will be looking at
+a critical piece of the development puzzle that builds on what we have learnt so far -
 sharing our software with others.

 ## The Levels of Software Reusability - Good Practice Revisited

-Let's begin by taking a closer look at software reusability and what we want from it.
+Let us begin by taking a closer look at software reusability and what we want from it.

 Firstly, whilst we want to ensure our software is reusable by others, as well as ourselves,
 we should be clear what we mean by 'reusable'.

@@ -65,20 +65,20 @@ Where 'others', of course, can include a future version of ourselves.
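The readability question a reviewer asks is easiest to see with a concrete before and after. A small sketch - the functions and the threshold of 40 are invented for illustration, not taken from the project's code:

```python
# Harder to review: cryptic names and an unexplained magic number
def f(d, t=40):
    return [i for i, r in enumerate(d) if max(r) > t]

# Easier to review: intention-revealing names and a docstring
def patients_above_threshold(data, threshold=40):
    """Return indices of patients whose peak inflammation exceeds threshold."""
    return [index for index, row in enumerate(data)
            if max(row) > threshold]

# Both behave identically - only the readability differs
print(patients_above_threshold([[10, 50], [5, 20]]))  # [0]
```

A reviewer asking for the second form is not being pedantic: the next reader (possibly the author, months later) can check the intent against the implementation at a glance.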
Reproducibility is a cornerstone of science, and scientists who work in many disciplines are expected to document -the processes by which they've conducted their research so it can be reproduced by others. +the processes by which they have conducted their research so it can be reproduced by others. In medicinal, pharmacological, and similar research fields for example, researchers use logbooks which are then used to write up protocols and methods for publication. -Many things we've covered so far contribute directly to making our software +Many things we have covered so far contribute directly to making our software reproducible - and indeed reusable - by others. -A key part of this we'll cover now is software documentation, +A key part of this we will cover now is software documentation, which is ironically very often given short shrift in academia. This is often the case even in fields where the documentation and publication of research method is otherwise taken very seriously. A few reasons for this are that writing documentation is often considered: -- A low priority compared to actual research (if it's even considered at all) +- A low priority compared to actual research (if it is even considered at all) - Expensive in terms of effort, with little reward - Writing documentation is boring! @@ -87,8 +87,8 @@ and is most effective when used to explain complex interfaces or behaviour, or the reasoning behind why something is coded a certain way. But code comments only go so far. -Whilst it's certainly arguable that writing documentation isn't as exciting as writing code, -it doesn't have to be expensive and brings many benefits. +Whilst it is certainly arguable that writing documentation is not as exciting as writing code, +it does not have to be expensive and brings many benefits. In addition to enabling general reproducibility by others, documentation... 
- Helps bring new staff researchers and developers up to speed quickly with using the software @@ -103,8 +103,8 @@ In addition to enabling general reproducibility by others, documentation... - Importantly, it can enable others to understand the software sufficiently to *modify and reuse* it to do different things -In the next section we'll see that writing -a sensible minimum set of documentation in a single document doesn't have to be expensive, +In the next section we will see that writing +a sensible minimum set of documentation in a single document does not have to be expensive, and can greatly aid reproducibility. ### Writing a README @@ -112,7 +112,7 @@ and can greatly aid reproducibility. A README file is the first piece of documentation (perhaps other than publications that refer to it) that people should read to acquaint themselves with the software. -It concisely explains what the software is about and what it's for, +It concisely explains what the software is about and what it is for, and covers the steps necessary to obtain and install the software and use it to accomplish basic tasks. Think of it not as a comprehensive reference of all functionality, @@ -120,8 +120,8 @@ but more a short tutorial with links to further information - hence it should contain brief explanations and be focused on instructional steps. Our repository already has a README that describes the purpose of the repository for this workshop, -but let's replace it with a new one that describes the software itself. -First let's delete the old one: +but let us replace it with a new one that describes the software itself. +First let us delete the old one: ~~~ $ rm README.md @@ -137,16 +137,16 @@ or as source files for rendering them with formatting structures, and are very quick to write. GitHub provides a very useful [guide to writing Markdown][github-markdown] for its repositories. -Let's start writing `README.md` using a text editor of your choice and add the following line. 
+Let us start writing `README.md` using a text editor of your choice and add the following line. ~~~ # Inflam ~~~ {: .language-markdown} -So here, we're giving our software a name. +So here, we are giving our software a name. Ideally something unique, short, snappy, and perhaps to some degree an indicator of what it does. -We would ideally rename the repository to reflect the new name, but let's leave that for now. +We would ideally rename the repository to reflect the new name, but let us leave that for now. In Markdown, the `#` designates a heading, two `##` are used for a subheading, and so on. The Software Sustainability Institute's [guide on naming projects][ssi-choosing-name] @@ -160,7 +160,7 @@ Inflam is a data management system written in Python that manages trial data use ~~~ {: .language-markdown} -To give readers an idea of the software's capabilities, let's add some key features next: +To give readers an idea of the software's capabilities, let us add some key features next: ~~~ # Inflam @@ -177,7 +177,7 @@ Here are some key features of Inflam: {: .language-markdown} As well as knowing what the software aims to do and its key features, -it's very important to specify what other software and related dependencies +it is very important to specify what other software and related dependencies are needed to use the software (typically called `dependencies` or `prerequisites`): ~~~ @@ -205,7 +205,7 @@ The following optional packages are required to run Inflam's unit tests: ~~~ {: .language-markdown} -Here we're making use of Markdown links, +Here we are making use of Markdown links, with some text describing the link within `[]` followed by the link itself within `()`. 
One really neat feature - and a common practice - of using many CI infrastructures is that @@ -237,7 +237,7 @@ but there are other aspects we should also cover: - *Credits/acknowledgements:* where appropriate, be sure to credit those who have helped in the software's development or inspired it - *Citation:* particularly for academic software, - it's a very good idea to specify a reference to an appropriate academic publication + it is a very good idea to specify a reference to an appropriate academic publication so other academics can cite use of the software in their own publications and media. You can do this within a separate [CITATION text file](https://github.com/citation-file-format/citation-file-format) @@ -248,7 +248,7 @@ For more verbose sections, there are usually just highlights in the README with links to further information, which may be held within other Markdown files within the repository or elsewhere. -We'll finish these off later. +We will finish these off later. See [Matias Singer's curated list of awesome READMEs](https://github.com/matiassingers/awesome-readme) for inspiration. ### Other Documentation @@ -258,7 +258,7 @@ writing and making available that's beyond the scope of this course. The key is to consider which audiences you need to write for, e.g. end users, developers, maintainers, etc., and what they need from the documentation. -There's a Software Sustainability Institute +There is a Software Sustainability Institute [blog post on best practices for research software documentation](https://www.software.ac.uk/blog/2019-06-21-what-are-best-practices-research-software-documentation) that helpfully covers the kinds of documentation to consider and other effective ways to convey the same information. @@ -344,9 +344,9 @@ or [tl;dr Legal](https://tldrlegal.com/) sites can help. 
## Merging into `main` -Once you've done these updates, +Once you have done these updates, commit your changes, -and if you're doing this work on a feature branch also ensure you merge it into `develop`, +and if you are doing this work on a feature branch also ensure you merge it into `develop`, e.g.: ~~~ @@ -355,7 +355,7 @@ $ git merge my-feature-branch ~~~ {: .language-bash} -Finally, once we've fully tested our software +Finally, once we have fully tested our software and are confident it works as expected on `develop`, we can merge our `develop` branch into `main`: @@ -373,7 +373,7 @@ The software on your `main` branch is now ready for release. There are many ways in which Git and GitHub can help us make a software release from our code. One of these is via **tagging**, where we attach a human-readable label to a specific commit. -Let's see what tags we currently have in our repository: +Let us see what tags we currently have in our repository: ~~~ $ git tag @@ -441,7 +441,7 @@ index 4818abb..5b8e7fd 100644 + +## Contributing +- Create an issue [here](https://github.com/Onoddil/python-intermediate-inflammation/issues) -+ - What works, what doesn't? You tell me ++ - What works, what does not? You tell me +- Randomly edit some code and see if it improves things, then submit a [pull request](https://github.com/Onoddil/python-intermediate-inflammation/pulls) +- Just yell at me while I edit the code, pair programmer style! + @@ -490,7 +490,7 @@ $ git push origin v1.0.0 {: .callout} We can now use the more memorable tag to refer to this specific commit. -Plus, once we've pushed this back up to GitHub, +Plus, once we have pushed this back up to GitHub, it appears as a specific release within our code repository which can be downloaded in compressed `.zip` or `.tar.gz` formats. 
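One reason SemVer-style tags such as `v1.0.0` work well for releases is that their components compare numerically rather than as text. A quick sketch, assuming plain `vMAJOR.MINOR.PATCH` tags (`parse_semver` is a helper invented here; full SemVer also permits pre-release and build-metadata suffixes, which this does not handle):

```python
def parse_semver(tag):
    """Turn a 'vMAJOR.MINOR.PATCH' tag into a comparable tuple of ints."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))

tags = ["v1.2.0", "v1.10.0", "v1.9.1"]
# Plain string ordering gets this wrong ("v1.10.0" sorts before "v1.9.1"
# as text), whereas numeric tuples compare correctly
print(max(tags))                    # v1.9.1 - wrong!
print(max(tags, key=parse_semver))  # v1.10.0 - right
```

This is why tooling that understands versions (like `pip` and Poetry) parses them into components instead of comparing strings.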
Note that these downloads just contain the state of the repository at that commit,

@@ -517,7 +517,7 @@ which may include
 funding council, institutional, national,
 and even international policies and laws.

-Within Europe, for example, there's the need to conform to things like [GDPR][gdpr].
-It's a very good idea to make yourself aware of these aspects.
+Within Europe, for example, there is the need to conform to things like [GDPR][gdpr].
+It is a very good idea to make yourself aware of these aspects.

 {% include links.md %}

diff --git a/_episodes/43-software-release.md b/_episodes/43-software-release.md
index a0670dca..b918dd0f 100644
--- a/_episodes/43-software-release.md
+++ b/_episodes/43-software-release.md
@@ -1,4 +1,4 @@
 ---
 title: "Packaging Code for Release and Distribution"
 teaching: 0
 exercises: 20
@@ -17,7 +17,7 @@ keypoints:

 ## Why Package our Software?

-We've now got our software ready to release -
+We have now got our software ready to release -
 the last step is to package it up so that it can be distributed.

 For very small pieces of software,
@@ -30,12 +30,12 @@ e.g. a list of dependencies.
 By distributing our code as a package,
 we reduce the complexity of fetching, installing and integrating it for the end-users.

-In this session we'll introduce
+In this session we will introduce
 one widely used method for building an installable package from our code.
 There are a range of methods in common use,
-so it's likely you'll also encounter projects which take different approaches.
+so it is likely you will also encounter projects which take different approaches.

-There's some confusing terminology in this episode around the use of the term "package".
+There is some confusing terminology in this episode around the use of the term "package".
This term is used to refer to both: - A directory containing Python files / modules and an `__init__.py` - a "module package" - A way of structuring / bundling a project for easier distribution and installation - @@ -45,9 +46,9 @@ This term is used to refer to both: ### Installing Poetry -Because we've recommended GitBash if you're using Windows, -we're going to install Poetry using a different method to the officially recommended one. -If you're on MacOS or Linux, +Because we have recommended GitBash if you are using Windows, +we are going to install Poetry using a different method to the officially recommended one. +If you are on MacOS or Linux, are comfortable with installing software at the command line and want to use Poetry to manage multiple projects, you may instead prefer to follow the official @@ -73,12 +74,12 @@ $ which poetry ~~~ {: .output} -If you don't get similar output, -make sure you've got the correct virtual environment activated. +If you do not get similar output, +make sure you have got the correct virtual environment activated. Poetry can also handle virtual environments for us, so in order to behave similarly to how we used them previously, -let's change the Poetry config to put them in the same directory as our project: +let us change the Poetry config to put them in the same directory as our project: ~~~ bash $ poetry config virtualenvs.in-project true @@ -97,7 +98,7 @@ It is described in Make sure you are in the root directory of your software project and have activated your virtual environment, -then we're ready to begin. +then we are ready to begin. To create a `pyproject.toml` file for our code, we can use `poetry init`. This will guide us through the most important settings - @@ -181,10 +182,10 @@ When we add a dependency using Poetry, Poetry will add it to the list of dependencies in the `pyproject.toml` file, add a reference to it in a new `poetry.lock` file, and automatically install the package into our virtual environment. 
-If we don't yet have a virtual environment activated, +If we do not yet have a virtual environment activated, Poetry will create it for us - using the name `.venv`, so it appears hidden unless we do `ls -a`. -Because we've already activated a virtual environment, Poetry will use ours instead. +Because we have already activated a virtual environment, Poetry will use ours instead. The `pyproject.toml` file has two separate lists, allowing us to distinguish between runtime and development dependencies. @@ -199,7 +200,7 @@ These two sets of dependencies will be used in different circumstances. When we build our package and upload it to a package repository, Poetry will only include references to our runtime dependencies. This is because someone installing our software through a tool like `pip` is only using it, -but probably doesn't intend to contribute to the development of our software +but probably does not intend to contribute to the development of our software and does not require development dependencies. In contrast, if someone downloads our code from GitHub, @@ -208,7 +209,7 @@ and installs the project that way, they will get both our runtime and development dependencies. If someone is downloading our source code, that suggests that they intend to contribute to the development, -so they'll need all of our development tools. +so they will need all of our development tools. Have a look at the `pyproject.toml` file again to see what's changed. @@ -219,9 +220,9 @@ make sure that our code is organised in the recommended structure. This is the Python module structure - a directory containing an `__init__.py` and our Python source code files. Make sure that the name of this Python package -(`inflammation` - unless you've renamed it) +(`inflammation` - unless you have renamed it) matches the name of your distributable package in `pyproject.toml` -unless you've chosen to explicitly list the module packages. 
+unless you have chosen to explicitly list the module packages. By convention distributable package names use hyphens, whereas module package names use underscores. @@ -229,7 +230,7 @@ While we could choose to use underscores in a distributable package name, we cannot use hyphens in a module package name, as Python will interpret them as a minus sign in our code when we try to import them. -Once we've got our `pyproject.toml` configuration done and our project is in the right structure, +Once we have got our `pyproject.toml` configuration done and our project is in the right structure, we can go ahead and build a distributable version of our software: ~~~ @@ -240,12 +241,12 @@ $ poetry build This should produce two files for us in the `dist` directory. The one we care most about is the `.whl` or **wheel** file. This is the file that `pip` uses to distribute and install Python packages, -so this is the file we'd need to share with other people who want to install our software. +so this is the file we would need to share with other people who want to install our software. Now if we gave this wheel file to someone else, they could install it using `pip` - -you don't need to run this command yourself, -you've already installed it using `poetry install` above. +you do not need to run this command yourself, +you have already installed it using `poetry install` above. ~~~ $ python3 -m pip install dist/inflammation*.whl @@ -256,13 +257,13 @@ The star in the line above is a **wildcard**, that means Bash should use any filenames that match that pattern, with any number of characters in place for the star. We could also rely on Bash's autocomplete functionality and type `dist/inflammation`, -then hit the Tab key if we've only got one version built. +then hit the Tab key if we have only got one version built. 
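The naming rule above - hyphens allowed in distributable package names, underscores required in module package names - can be checked with Python itself, since a module package name must be a valid Python identifier:

```python
# Module package names must be valid identifiers, which rules out hyphens
print("inflammation".isidentifier())                      # True
print("python-intermediate-inflammation".isidentifier())  # False

# Importing a hyphenated name fails at compile time,
# because Python reads the hyphens as subtraction
try:
    compile("import python-intermediate-inflammation", "<example>", "exec")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```

So `pip install python-intermediate-inflammation` could be fine as a distributable name, but the directory you `import` must be called something like `inflammation`.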
-After we've been working on our code for a while and want to publish an update, +After we have been working on our code for a while and want to publish an update, we just need to update the version number in the `pyproject.toml` file (using [SemVer](https://semver.org/) perhaps), then use Poetry to build and publish the new version. -If we don't increment the version number, +If we do not increment the version number, people might end up using this version, even though they thought they were using the previous one. Any re-publishing of the package, no matter how small the changes, @@ -275,17 +276,17 @@ $ poetry build ~~~ {: .language-bash} -In addition to the commands we've already seen, +In addition to the commands we have already seen, Poetry contains a few more that can be useful for our development process. For the full list see the [Poetry CLI documentation](https://python-poetry.org/docs/cli/). The final step is to publish our package to a package repository. A package repository could be either public or private - while you may at times be working on public projects, -it's likely the majority of your work will be published internally +it is likely the majority of your work will be published internally using a private repository such as JFrog Artifactory. Every repository may be configured slightly differently, -so we'll leave that to you to investigate. +so we will leave that to you to investigate. ## What if We Need More Control? diff --git a/_episodes/51-managing-software.md b/_episodes/51-managing-software.md index 464cf7f3..9f729f54 100644 --- a/_episodes/51-managing-software.md +++ b/_episodes/51-managing-software.md @@ -55,7 +55,7 @@ go to the `Settings` tab, scroll down to the `Features` section and activate the ![List of project issues in GitHub](../fig/github-issue-list.png){: .image-with-shadow width="1000px"} -Let's go through the process of creating a new issue. +Let us go through the process of creating a new issue. 
Start by clicking the `New issue` button. ![Creating a new issue in GitHub](../fig/github-new-issue.png){: .image-with-shadow width="1000px"} @@ -80,7 +80,7 @@ The [default labels available in GitHub](https://docs.github.com/en/issues/using - `help wanted` - indicates that a maintainer wants help on an issue or pull request - `invalid` - indicates that an issue, pull request, or discussion is no longer relevant - `question` - indicates that an issue, pull request, or discussion needs more information -- `wontfix` - indicates that work won't continue on an issue, pull request, or discussion +- `wontfix` - indicates that work will not continue on an issue, pull request, or discussion You can also create your own custom labels to help with classifying issues. There are no rules really about naming the labels - @@ -104,7 +104,7 @@ They should also be clear on what the bug reporter considers factual and speculation ("I think it was caused by this"). If an error report was generated from the software itself, -it's a very good idea to include that in the issue. +it is a very good idea to include that in the issue. The `enhancement` label is a great way to communicate your future priorities to your collaborators but also to yourself - @@ -122,10 +122,10 @@ the easier the code is to use, the more widely it will be adopted and the greater impact it will have. One interesting label is `wontfix`, -which indicates that an issue simply won't be worked on for whatever reason. +which indicates that an issue simply will not be worked on for whatever reason. Maybe the bug it reports is outside of the use case of the software, -or the feature it requests simply isn't a priority. -This can make it clear you've thought about an issue and dismissed it. +or the feature it requests simply is not a priority. +This can make it clear you have thought about an issue and dismissed it. 
> ## Locking and Pinning Issues
> The **Lock conversation** and **Pin issue** buttons are both available

@@ -211,11 +211,11 @@ GitHub also lets you mention/reference one issue or pull request from another
 Whilst writing the description of an issue, or commenting on one,
 if you type # you should see a list of the issues and pull requests on the repository.
-They are coloured green if they're open, or white if they're closed.
+They are coloured green if they are open, or white if they are closed.
 Continue typing the issue number, and the list will narrow down,
 then you can hit Return to select the entry and link the two.
 For example, if you realise that several of your bugs have common roots,
-or that one enhancement can't be implemented before you've finished another,
+or that one enhancement cannot be implemented before you have finished another,
 you can use the mention system to indicate the depending issue(s).
 This is a simple way to add much more information to your issues.

@@ -253,11 +253,11 @@ and link to the commit in question.
> and document what conversations were held around particular issues.
> As a sole developer, and possibly also the only user of the code,
> you might be tempted to not bother with recording issues, comments and new features
-> as you don't need to communicate the information to anyone else.
+> as you do not need to communicate the information to anyone else.
>
-> Unfortunately, human memory isn't infallible!
+> Unfortunately, human memory is not infallible!
> After spending six months on a different topic,
-> it's inevitable you'll forget some of the plans you had and problems you faced.
+> it is inevitable you will forget some of the plans you had and problems you faced.
> Not documenting these things can lead to you having to
> re-learn things you already put the effort into discovering before.
> Also, if others are brought on to the project at a later date, @@ -317,7 +317,7 @@ and see if they are suitable for your software development workflow. > to help you plan and track your team's work effectively. {: .callout} -Let's have a quick look at how Projects are created in GitHub - we will not use them much in +Let us have a quick look at how Projects are created in GitHub - we will not use them much in this course but it is good to be aware of how to make use of them when suitable. 1. From your GitHub account's home page (not your repository's home page!), diff --git a/_episodes/52-assessing-software-suitability-improvement.md b/_episodes/52-assessing-software-suitability-improvement.md index 4e14c789..7f261183 100644 --- a/_episodes/52-assessing-software-suitability-improvement.md +++ b/_episodes/52-assessing-software-suitability-improvement.md @@ -10,23 +10,23 @@ objectives: - "Conduct an assessment of software against suitability criteria" - "Describe what should be included in software issue reports and register them" keypoints: -- "It's as important to have a critical attitude to adopting software as we do to developing it." +- "It is as important to have a critical attitude to adopting software as we do to developing it." - "As a team agree on who and to what extent you will support software you make available to others." --- ## Introduction -What we've been looking at so far enables us to adopt +What we have been looking at so far enables us to adopt a more proactive and managed attitude and approach to the software we develop. But we should also adopt this attitude when selecting and making use of third-party software we wish to use. -With pressing deadlines it's very easy to reach for +With pressing deadlines it is very easy to reach for a piece of software that appears to do what you want -without considering properly whether it's a good fit for your project first. +without considering properly whether it is a good fit for your project first. 
A chain is only as strong as its weakest link, and our software may inherit weaknesses in any dependent software or create other problems. -Overall, when adopting software to use it's important to consider +Overall, when adopting software to use it is important to consider not only whether it has the functionality you want, but a broader range of qualities that are important for your project. Adopting a critical mindset when assessing other software against suitability criteria @@ -89,10 +89,10 @@ will help you adopt the same attitude when assessing your own software for futur > the types of support (e.g. bug resolution, helping develop tailored solutions), > and expectations for support in the future (e.g. when project funding runs out) > -> All of this requires effort, and you can't do everything. -> It's therefore important to agree and be clear on +> All of this requires effort, and you cannot do everything. +> It is therefore important to agree and be clear on > how the software will be supported from the outset, -> whether it's within the context of a single laboratory, +> whether it is within the context of a single laboratory, > project, > or other collaboration, > or across an entire community. diff --git a/_episodes/53-improvement-through-feedback.md b/_episodes/53-improvement-through-feedback.md index 6cc931f7..bd4e830f 100644 --- a/_episodes/53-improvement-through-feedback.md +++ b/_episodes/53-improvement-through-feedback.md @@ -24,7 +24,7 @@ keypoints: When a software project has been around for even just a short amount of time, you'll likely discover many aspects that can be improved. These can come from issues that have been registered via collaborators or users, -but also those you're aware of internally, +but also those you are aware of internally, which should also be registered as issues. When starting a new software project, you'll also have to determine how you'll handle all the requirements. 
@@ -39,14 +39,14 @@ There are also many other draws on our time in addition to the research, development, and writing of publications that we do, which makes it all the more important to prioritise our time for development effectively. -In this lesson we'll be looking at prioritising work we need to do +In this lesson we will be looking at prioritising work we need to do and what we can use from the agile perspective of project management to help us do this in our software projects. ## Estimation as a Foundation for Prioritisation -For simplicity, we'll refer to our issues as *requirements*, +For simplicity, we will refer to our issues as *requirements*, since that's essentially what they are - new requirements for our software to fulfil. @@ -69,7 +69,7 @@ since we cannot meaningfully prioritise requirements without knowing what the effort tradeoffs will be. Even if we know how important each requirement is, how would we even know if completing the project is possible? -Or if we don't know how long it will take +Or if we do not know how long it will take to deliver those requirements we deem to be critical to the success of a project, how can we know if we can include other less important ones? @@ -187,7 +187,7 @@ you have still delivered it *successfully*. ### GitHub's Milestones -Once we've decided on those we'll work on (i.e. not Won't Haves), +Once we have decided on those we will work on (i.e. not Won't Haves), we can (optionally) use a GitHub's **Milestone** to organise them for a particular timebox. Remember, a milestone is a collection of issues to be worked on in a given period (or timebox). We can create a new one by selecting `Issues` on our repository, @@ -209,14 +209,14 @@ and assign them to our milestone from the `Issues` page or from an individual is ![Milestones in GitHub](../fig/github-assign-milestone.png){: .image-with-shadow width="1000px"} -Let's now use Milestones to plan and prioritise our team's next sprint. 
+Let us now use Milestones to plan and prioritise our team's next sprint. > ## Exercise: Prioritise! > > Put your stakeholder hats on, and as a team apply MoSCoW to the repository issues > to determine how you will prioritise effort to resolve them in the allotted time. > Try to stick to the 60/20/20 rule, -> and assign all issues you'll be working on (i.e. not `Won't Haves`) to a new milestone, +> and assign all issues you will be working on (i.e. not `Won't Haves`) to a new milestone, > e.g. "Tidy up documentation" or "version 0.1". > > @@ -257,7 +257,7 @@ and serves to highlight any blockers and challenges to meeting the sprint goal. {: .challenge} Depending on how many issues were registered on your repository, -it's likely you won't have resolved all the issues in this first milestone. +it is likely you will not have resolved all the issues in this first milestone. Of course, in reality, a sprint would be over a much longer period of time. In any event, as the development progresses into future sprints any unresolved issues can be reconsidered and prioritised for another milestone, diff --git a/_episodes/60-wrap-up.md b/_episodes/60-wrap-up.md index 8dcf1b27..2436d668 100644 --- a/_episodes/60-wrap-up.md +++ b/_episodes/60-wrap-up.md @@ -48,25 +48,25 @@ which relies on the same best practices taught in this course: - Collaborative techniques and tools play an important part of research software development in teams, but also have benefits in solo development. - We've looked at the benefits of a well-considered development environment, + We have looked at the benefits of a well-considered development environment, using practices, tools and infrastructure to help us write code more effectively in collaboration with others. 
-- We've looked at the importance of being able to
+- We have looked at the importance of being able to
   verify the correctness of software and automation,
   and how we can leverage techniques and infrastructure to
   automate and scale tasks such as testing to save us time -
   but automation has a role beyond simply testing:
   what else can you automate that would save you even more time?
-  Once found, we've also examined how to locate faults in our software.
-- We've gone beyond procedural programming and explored different software design paradigms,
+  Once faults are found, we have also examined how to locate them in our software.
+- We have gone beyond procedural programming and explored different software design paradigms,
   such as object-oriented and functional styles of programming.
-  We've contrasted their pros, cons, and the situations in which they work best,
+  We have contrasted their pros, cons, and the situations in which they work best,
   and how separation of concerns through modularity and architectural design
   can help shape good software.
 - As an intermediate developer, aspects other than technical skills become important,
   particularly in development teams.
-  We've looked at the importance of good,
+  We have looked at the importance of good,
   consistent practices for team working,
   and the importance of having a self-critical mindset when developing software,
   and ways to manage feedback effectively and efficiently.
diff --git a/_extras/common-issues.md b/_extras/common-issues.md
index dca82e16..3e87bd5f 100644
--- a/_extras/common-issues.md
+++ b/_extras/common-issues.md
@@ -229,7 +229,7 @@ This issue seems to only occur with older versions of PyCharm - recent versions
 
 ### Invalid YAML Issue
 
 If YAML is copy+pasted from the course material, it might not get pasted correctly in PyCharm and some extra indentation may occur.
-Annoyingly, PyCharm won't flag this up as invalid YAML
-and learners may get all sort of different issues and errors with these files -
+Annoyingly, PyCharm will not flag this up as invalid YAML
+and learners may get all sorts of different issues and errors with these files -
 e.g. ‘actions must start with run or uses’ with GitHub Actions workflows.
 
diff --git a/_extras/databases.md b/_extras/databases.md
index 7a2bd96a..023f8768 100644
--- a/_extras/databases.md
+++ b/_extras/databases.md
@@ -62,7 +62,7 @@ Foreign Keys
 i.e. this doctor *has a* patient.
 
 While relational databases are typically accessed using **SQL queries**,
-we're going to use a library to help us translate between Python and the database.
+we are going to use a library to help us translate between Python and the database.
 [SQLAlchemy](https://www.sqlalchemy.org/) is a popular Python library
 which contains an **Object Relational Mapping** (ORM) framework.
@@ -83,7 +83,7 @@ A mapping is the core component of an ORM -
 it describes how to convert between our Python classes and
 the contents of our database tables.
 Typically, we can take our existing classes and convert them into mappings
 with a little modification,
-so we don't have to start from scratch.
+so we do not have to start from scratch.
 
 ~~~
 # file: inflammation/models.py
@@ -109,15 +109,15 @@ class Patient(Base):
 ~~~
 {: .language-python}
 
-Now that we've defined how to translate between our Python class and a database table,
+Now that we have defined how to translate between our Python class and a database table,
 we need to hook our code up to an actual database.
-The library we're using, SQLAlchemy, does everything through a database **engine**.
+The library we are using, SQLAlchemy, does everything through a database **engine**.
 This is essentially a wrapper around the real database,
-so we don't have to worry about which particular database software is being used -
+so we do not have to worry about which particular database software is being used -
 we just need to write code for a generic relational database.
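The engine abstraction can be sketched as follows - the mapped classes and queries stay the same, and only the connection URL changes (the non-SQLite URLs below are hypothetical placeholders, not part of this lesson's setup):

```python
from sqlalchemy import create_engine

# The mapped classes and queries we write stay the same whichever engine we
# pick - only the connection URL passed to create_engine changes.
engine = create_engine('sqlite:///:memory:')  # in-memory SQLite, no server needed

# Hypothetical alternatives for server-based databases (not used here):
# engine = create_engine('postgresql://user:password@dbserver/inflammation')
# engine = create_engine('mysql://user:password@dbserver/inflammation')

print(engine.name)  # the dialect in use, e.g. 'sqlite'
```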
-For these lessions we're going to use the SQLite engine
+For these lessons we are going to use the SQLite engine
 as this requires almost no configuration and no external software.
 Most relational database software runs as a separate service which we can connect to from our code.
 This means that in a large scale environment,
@@ -128,13 +128,13 @@ Some examples of databases which are used like this are PostgreSQL, MySQL and MS
 On the other hand,
 SQLite runs entirely within our software and uses only a single file to hold its data.
-It won't give us
+It will not give us
 the extremely high performance or reliability of a properly configured PostgreSQL database,
-but it's good enough in many cases and much less work to get running.
+but it is good enough in many cases and much less work to get running.
 
-Let's write some test code to setup and connect to an SQLite database.
-For now we'll store the database in memory rather than an actual file -
-it won't actually allow us to store data after the program finishes,
+Let us write some test code to set up and connect to an SQLite database.
+For now we will store the database in memory rather than an actual file -
+it will not actually allow us to store data after the program finishes,
 but it allows us not to worry about **migrations**.
 
 > ## Migrations
@@ -143,9 +143,9 @@ but it allows us not to worry about **migrations**.
 > we need to get the database to update its tables to make sure they match the new format.
 > This is what the `Base.metadata.create_all` method does -
 > creates all of these tables from scratch
-> because we're using an in-memory database which we know will be removed between runs.
+> because we are using an in-memory database which we know will be removed between runs.
> -> If we're actually storing data persistently, +> If we are actually storing data persistently, > we need to make sure that when we change the mapping, > we update the database tables without damaging any of the data they currently contain. > We could do this manually, @@ -157,7 +157,7 @@ but it allows us not to worry about **migrations**. > will compare our mappings to the known state of the database > and generate a Python file which updates the database to the necessary state. > -> Migrations can be quite complex, so we won't be using them here - +> Migrations can be quite complex, so we will not be using them here - > but you may find it useful to read about them later. {: .callout} @@ -175,7 +175,7 @@ def test_sqlalchemy_patient_search(): """Test that we can save and retrieve patient data from a database.""" from inflammation.models import Base, Patient - # Setup a database connection - we're using a database stored in memory here + # Setup a database connection - we are using a database stored in memory here engine = create_engine('sqlite:///:memory:', echo=True) Session = sessionmaker(bind=engine) session = Session() @@ -195,27 +195,27 @@ def test_sqlalchemy_patient_search(): ~~~ {: .language-python} -For this test, we've imported our models inside the test function, +For this test, we have imported our models inside the test function, rather than at the top of the file like we normally would. This is not recommended in normal code, -as it means we're paying the performance cost of importing every time we run the function, +as it means we are paying the performance cost of importing every time we run the function, but can be useful in test code. Since each test function only runs once per test session, -this performance cost isn't as important as a function we were going to call many times. 
-Additionally, if we try to import something which doesn't exist, it will fail - +this performance cost is not as important as a function we were going to call many times. +Additionally, if we try to import something which does not exist, it will fail - by imporing inside the test function, we limit this to that specific test failing, rather than the whole file failing to run. ### Relationships -Relational databases don't typically have an 'array of numbers' column type, +Relational databases do not typically have an 'array of numbers' column type, so how are we going to represent our observations of our patients' inflammation? Well, our first step is to create a table of observations. We can then use a **foreign key** to point from the observation to a patient, so we know which patient the data belongs to. The table also needs a column for the actual measurement - -we'll call this `value` - +we will call this `value` - and a column for the day the measurement was taken on. We can also use the ORM's `relationship` helper function @@ -255,20 +255,20 @@ class Patient(Base): > ## Time is Hard > -> We're using an integer field to store the day on which a measurement was taken. +> We are using an integer field to store the day on which a measurement was taken. > This keeps us consistent with what we had previously -> as it's essentialy the position of the measurement in the Numpy array. +> as it is essentialy the position of the measurement in the Numpy array. > It also avoids us having to worry about managing actual date / times. > -> The Python `datetime` module we've used previously in the Academics example would be useful here, +> The Python `datetime` module we have used previously in the Academics example would be useful here, > and most databases have support for 'date' and 'time' columns, -> but to reduce the complexity, we'll just use integers here. +> but to reduce the complexity, we will just use integers here. 
 {: .callout}
 
 Our test code for this is going to look very similar to our previous test code,
 so we can copy-paste it and make a few changes.
 This time, after setting up the database, we need to add a patient and an observation.
-We then test that we can get the observations from a patient we've searched for.
+We then test that we can get the observations from a patient we have searched for.
 
 ~~~
 # file: tests/test_models.py
@@ -279,7 +279,7 @@ def test_sqlalchemy_observations():
     """Test that we can save and retrieve inflammation observations from a database."""
     from inflammation.models import Base, Observation, Patient
 
-    # Setup a database connection - we're using a database stored in memory here
+    # Setup a database connection - we are using a database stored in memory here
     engine = create_engine('sqlite:///:memory:', echo=True)
     Session = sessionmaker(bind=engine)
     session = Session()
@@ -303,9 +303,9 @@
 ~~~
 {: .language-python}
 
-Finally, let's put in a way to convert all of our observations into a Numpy array,
+Finally, let us put in a way to convert all of our observations into a Numpy array,
 so we can use our previous analysis code.
-We'll use the `property` decorator here again,
+We will use the `property` decorator here again,
 to create a method that we can use as if it was a normal data attribute.
 
 ~~~
@@ -336,7 +336,7 @@ class Patient(Base):
 ~~~
 {: .language-python}
 
-Once again we'll copy-paste the test code and make some changes.
+Once again we will copy-paste the test code and make some changes.
 This time we want to create a few observations for our patient
 and test that we can turn them into a Numpy array.
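Pulling the pieces above together - the foreign key, the `relationship` helper and the `values` property - a stripped-down sketch of the models might look like this (simplified from the lesson's `inflammation/models.py`; the column names `day` and `value` follow the text above, everything else is illustrative):

```python
import numpy as np
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Observation(Base):
    __tablename__ = 'observations'
    id = Column(Integer, primary_key=True)
    day = Column(Integer)    # position of the measurement in the original array
    value = Column(Integer)  # the inflammation measurement itself
    patient_id = Column(Integer, ForeignKey('patients.id'))  # links back to a patient

class Patient(Base):
    __tablename__ = 'patients'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    observations = relationship('Observation', order_by=Observation.day)

    @property
    def values(self):
        """Convert the patient's observations into a Numpy array."""
        return np.array([obs.value for obs in self.observations])

engine = create_engine('sqlite:///:memory:')
Session = sessionmaker(bind=engine)
session = Session()
Base.metadata.create_all(engine)

alice = Patient(name='Alice')
session.add(alice)
alice.observations.append(Observation(day=0, value=1))
alice.observations.append(Observation(day=1, value=4))
session.commit()

print(alice.values)  # a Numpy array of measurements, ordered by day
```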
@@ -347,7 +347,7 @@ def test_sqlalchemy_observations_to_array():
     """Test that we can save and retrieve inflammation observations from a database."""
     from inflammation.models import Base, Observation, Patient
 
-    # Setup a database connection - we're using a database stored in memory here
+    # Setup a database connection - we are using a database stored in memory here
     engine = create_engine('sqlite:///:memory:')
     Session = sessionmaker(bind=engine)
     session = Session()
@@ -371,7 +371,7 @@ def test_sqlalchemy_observations_to_array():
 
 > ## Further Array Testing
 >
-> There's an important feature of the behaviour of our `Patient.values` property
+> There is an important feature of the behaviour of our `Patient.values` property
 > that's not currently being tested.
 > What is this feature?
 > Write one or more extra tests to cover this feature.
@@ -385,7 +385,7 @@ def test_sqlalchemy_observations_to_array():
 > >
 > > If this is intended behaviour,
 > > it would be useful to write a test for it,
-> > to ensure that we don't break it in future.
+> > to ensure that we do not break it in future.
 > > Using tests in this way is known as **regression testing**.
 > >
 > {: .solution}
@@ -393,16 +393,16 @@ def test_sqlalchemy_observations_to_array():
 
 > ## Refactoring for Reduced Redundancy
 >
-> You've probably noticed that there's a lot of replicated code in our database tests.
-> It's fine if some code is replicated a bit,
+> You have probably noticed that there is a lot of replicated code in our database tests.
+> It is fine if some code is replicated a bit,
 > but if you keep needing to copy the same code,
 > that's a sign it should be refactored.
 >
 > Refactoring is the process of changing the structure of our code,
 > without changing its behaviour,
 > and one of the main benefits of good test coverage is that it makes refactoring easier.
-> If we've got a good set of tests,
-> it's much more likely that we'll detect any changes to behaviour -
+> If we have got a good set of tests,
+> it is much more likely that we will detect any changes to behaviour -
 > even when these changes might be in the tests themselves.
 >
 > Try refactoring the database tests to see if you can
@@ -413,9 +413,9 @@ def test_sqlalchemy_observations_to_array():
 
 > ## Advanced Challenge: Connecting More Views
 >
-> We've added the ability to store patient records in the database,
+> We have added the ability to store patient records in the database,
 > but not actually connected it to any useful views.
-> There's a common pattern in data management software
-> which is often refered to as **CRUD** - Create, Read, Update, Delete.
+> There is a common pattern in data management software
+> which is often referred to as **CRUD** - Create, Read, Update, Delete.
 > These are the four fundamental views that we need to provide
 > to allow people to manage their data effectively.
@@ -426,7 +426,7 @@ def test_sqlalchemy_observations_to_array():
 > show an existing record,
 > update an existing record
 > and delete an existing record.
-> It's also sometimes useful to provide a view which lists all existing records for each type -
+> It is also sometimes useful to provide a view which lists all existing records for each type -
 > for example, a list of all patients would probably be useful,
 > but a list of all observations might not be.
 >
diff --git a/_extras/functional-programming.md b/_extras/functional-programming.md
index 2d051cba..76808355 100644
--- a/_extras/functional-programming.md
+++ b/_extras/functional-programming.md
@@ -39,7 +39,7 @@ The key difference is that functional programming is focussed on
 **what** transformations are done to the data,
 rather than **how** these transformations are performed
 (i.e. a detailed sequence of steps which update the state of the code to reach a desired state).
-Let's compare and contrast examples of these two programming paradigms.
+Let us compare and contrast examples of these two programming paradigms. ## Functional vs Procedural Programming @@ -75,7 +75,7 @@ and how to change the state of the program and advance towards the result. They often use *iteration* to repeat a series of steps. Functional programming, on the other hand, typically uses *recursion* - an ability of a function to call/repeat itself until a particular condition is reached. -Let's see how it is used in the functional programming example below +Let us see how it is used in the functional programming example below to achieve a similar effect to that of iteration in procedural programming. ~~~ @@ -122,7 +122,7 @@ are called *pure functions*. > ## Exercise: Pure Functions > > Which of these functions are pure? -> If you're not sure, explain your reasoning to someone else, do they agree? +> If you are not sure, explain your reasoning to someone else, do they agree? > > ~~~ > def add_one(x): @@ -219,7 +219,7 @@ passed around or returned as results from other functions This is why functional programming is suitable for processing data efficiently - in particular in the world of Big Data, where code is much smaller than the data, sending the code to where data is located is cheaper and faster than the other way round. -Let's see how we can do data processing using functional programming. +Let us see how we can do data processing using functional programming. ## MapReduce Data Processing Approach @@ -286,7 +286,7 @@ print(list(squares)) > but at that point we should be using a ‘normal’ Python function instead. > > ~~~ -> # Don't do this +> # Do not do this > add_one = lambda x: x + 1 > > # Do this instead @@ -323,7 +323,7 @@ print(list(result)) > exceeded the given threshold. > > Ordinarily we would use Numpy's own `map` feature, -> but for this exercise, let's try a solution without it. +> but for this exercise, let us try a solution without it. 
> > > > ## Solution @@ -358,7 +358,7 @@ While not a pure functional concept, comprehensions provide data generation functionality and can be used to achieve the same effect as the built-in "pure functional" function `map()`. They are commonly used and actually recommended as a replacement of `map()` in modern Python. -Let's have a look at some examples. +Let us have a look at some examples. ~~~ integers = range(5) @@ -419,7 +419,7 @@ print(double_even_ints) > ~~~ > {: .output} > -> Finally, there’s one last ‘comprehension’ in Python - a *generator expression* - +> Finally, there is one last ‘comprehension’ in Python - a *generator expression* - > a type of an iterable object which we can take values from and loop over, > but does not actually compute any of the values until we need them. > Iterable is the generic term for anything we can loop or iterate over - @@ -454,7 +454,7 @@ print(double_even_ints) {: .callout} -Let's now have a look at reducing the elements of a data collection into a single result. +Let us now have a look at reducing the elements of a data collection into a single result. ### Reducing @@ -529,7 +529,7 @@ you need to import it from library `functools`. {: .challenge} ### Putting It All Together -Let's now put together what we have learned about map and reduce so far +Let us now put together what we have learned about map and reduce so far by writing a function that calculates the sum of the squares of the values in a list using the MapReduce approach. @@ -563,7 +563,7 @@ print(sum_of_squares([-1, -2, -3])) Now let’s assume we’re reading in these numbers from an input file, so they arrive as a list of strings. 
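The plain-numbers version from the previous section, built solely from `map` and `reduce`, can be sketched as follows (an illustrative implementation; the episode's own code may differ in detail):

```python
from functools import reduce

def sum_of_squares(sequence):
    # Map: square every element; Reduce: fold the squares into a single sum.
    squares = map(lambda x: x * x, sequence)
    return reduce(lambda a, b: a + b, squares, 0)

print(sum_of_squares([0]))           # 0
print(sum_of_squares([1, 2, 3]))     # 14
print(sum_of_squares([-1, -2, -3]))  # 14
```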
-We'll modify the function so that it passes the following tests: +We will modify the function so that it passes the following tests: ~~~ print(sum_of_squares(['1', '2', '3'])) @@ -590,7 +590,7 @@ def sum_of_squares(sequence): Finally, like comments in Python, we’d like it to be possible for users to comment out numbers in the input file they give to our program. -We'll finally extend our function so that the following tests pass: +We will finally extend our function so that the following tests pass: ~~~ print(sum_of_squares(['1', '2', '3'])) @@ -699,7 +699,7 @@ a decorator can take a function, modify/decorate it, then return the resulting f This is possible because Python treats functions as first-class objects that can be passed around as normal data. Here, we discuss decorators in more detail and learn how to write our own. -Let's look at the following code for ways on how to "decorate" functions. +Let us look at the following code for ways on how to "decorate" functions. ~~~ def with_logging(func): diff --git a/_extras/object-oriented-programming.md b/_extras/object-oriented-programming.md index 8155e89d..5b3a43a1 100644 --- a/_extras/object-oriented-programming.md +++ b/_extras/object-oriented-programming.md @@ -36,7 +36,7 @@ and code becomes a series of interactions between objects. One of the main difficulties we encounter when building more complex software is how to structure our data. -So far, we've been processing data from a single source and with a simple tabular structure, +So far, we have been processing data from a single source and with a simple tabular structure, but it would be useful to be able to combine data from a range of different sources and with more data than just an array of numbers. @@ -53,7 +53,7 @@ but often we need to have more structure than this. For example, we may need to attach more information about the patients and store this alongside our measurements of inflammation. 
-We can do this using the Python data structures we're already familiar with, +We can do this using the Python data structures we are already familiar with, dictionaries and lists. For instance, we could attach a name to each of our patients: @@ -133,22 +133,22 @@ patients = [ > > > > > > What would happen if the `data` and `names` inputs were different lengths? > > > -> > > If `names` is longer, we'll loop through, until we run out of rows in the `data` input, -> > > at which point we'll stop processing the last few names. -> > > If `data` is longer, we'll loop through, but at some point we'll run out of names - -> > > but this time we try to access part of the list that doesn't exist, -> > > so we'll get an exception. +> > > If `names` is longer, we will loop through, until we run out of rows in the `data` input, +> > > at which point we will stop processing the last few names. +> > > If `data` is longer, we will loop through, but at some point we will run out of names - +> > > but this time we try to access part of the list that does not exist, +> > > so we will get an exception. > > > > > > A better solution would be to use the `zip` function, > > > which allows us to iterate over multiple iterables without needing an index variable. > > > The `zip` function also limits the iteration to whichever of the iterables is smaller, -> > > so we won't raise an exception here, +> > > so we will not raise an exception here, > > > but this might not quite be the behaviour we want, -> > > so we'll also explicitly `assert` that the inputs should be the same length. +> > > so we will also explicitly `assert` that the inputs should be the same length. > > > Checking that our inputs are valid in this way is an example of a precondition, > > > which we introduced conceptually in an earlier episode. 
> > > -> > > If you've not previously come across the `zip` function, +> > > If you have not previously come across the `zip` function, > > > read [this section](https://docs.python.org/3/library/functions.html#zip) > > > of the Python documentation. > > > @@ -232,7 +232,7 @@ The behaviours we may have seen previously include: ## Encapsulating Data -Let's start with a minimal example of a class representing our patients. +Let us start with a minimal example of a class representing our patients. ~~~ # file: inflammation/models.py @@ -252,7 +252,7 @@ Alice ~~~ {: .output} -Here we've defined a class with one method: `__init__`. +Here we have defined a class with one method: `__init__`. This method is the **initialiser** method, which is responsible for setting up the initial values and structure of the data inside a new instance of the class - @@ -296,7 +296,7 @@ As we saw with the `__init__` method previously, we do not need to explicitly provide a value for the `self` argument, this is done for us by Python. -Let's add another method on our Patient class that adds a new observation to a Patient instance. +Let us add another method on our Patient class that adds a new observation to a Patient instance. ~~~ # file: inflammation/models.py @@ -341,14 +341,14 @@ print(alice.observations) Note also how we used `day=None` in the parameter list of the `add_observation` method, then initialise it if the value is indeed `None`. This is one of the common ways to handle an optional argument in Python, -so we'll see this pattern quite a lot in real projects. +so we will see this pattern quite a lot in real projects. > ## Class and Static Methods > -> Sometimes, the function we're writing doesn't need access to +> Sometimes, the function we are writing does not need access to > any data belonging to a particular object. > For these situations, we can instead use a **class method** or a **static method**. 
-> Class methods have access to the class that they're a part of, +> Class methods have access to the class that they are a part of, > and can access data on that class - > but do not belong to a specific instance of that class, > whereas static methods have access to neither the class nor its instances. @@ -515,7 +515,7 @@ parameterising unit tests and functional programming - In this case the `property` decorator is taking the `last_observation` function and modifying its behaviour, so it can be accessed as if it were a normal attribute. -It is also possible to make your own decorators, but we won't cover it here. +It is also possible to make your own decorators, but we will not cover it here. ## Relationships Between Classes @@ -543,9 +543,9 @@ for example in our inflammation project, we might want to say that a doctor *has* patients or that a patient *has* observations. -In the case of our example, we're already saying that patients have observations, -so we're already using composition here. -We're currently implementing an observation as a dictionary with a known set of keys though, +In the case of our example, we are already saying that patients have observations, +so we are already using composition here. +We are currently implementing an observation as a dictionary with a known set of keys though, so maybe we should make an `Observation` class as well. ~~~ @@ -594,8 +594,8 @@ print(obs) ~~~ {: .output} -Now we're using a composition of two custom classes to -describe the relationship between two types of entity in the system that we're modelling. +Now we are using a composition of two custom classes to +describe the relationship between two types of entity in the system that we are modelling. ### Inheritance @@ -684,16 +684,16 @@ who is a Person but not a Patient. We see in the example above that to say that a class inherits from another, we put the **parent class** (or **superclass**) in brackets after the name of the **subclass**. 
-There's something else we need to add as well - -Python doesn't automatically call the `__init__` method on the parent class +There is something else we need to add as well - +Python does not automatically call the `__init__` method on the parent class if we provide a new `__init__` for our subclass, -so we'll need to call it ourselves. +so we will need to call it ourselves. This makes sure that everything that needs to be initialised on the parent class has been, before we need to use it. -If we don't define a new `__init__` method for our subclass, +If we do not define a new `__init__` method for our subclass, Python will look for one on the parent class and use it automatically. This is true of all methods - -if we call a method which doesn't exist directly on our class, +if we call a method which does not exist directly on our class, Python will search for it among the parent classes. The order in which it does this search is known as the **method resolution order** - a little more on this in the Multiple Inheritance callout below. @@ -713,7 +713,7 @@ before we can properly initialise a `Patient` model with their inflammation data > When deciding how to implement a model of a particular system, > you often have a choice of either composition or inheritance, > where there is no obviously correct choice. -> For example, it's not obvious whether a photocopier *is a* printer and *is a* scanner, +> For example, it is not obvious whether a photocopier *is a* printer and *is a* scanner, > or *has a* printer and *has a* scanner. > > ~~~ @@ -751,11 +751,11 @@ before we can properly initialise a `Patient` model with their inflammation data > {: .language-python} > > Both of these would be perfectly valid models and would work for most purposes. 
-> However, unless there's something about how you need to use the model
+> However, unless there is something about how you need to use the model
 > which would benefit from using a model based on inheritance,
-> it's usually recommended to opt for **composition over inheritance**.
+> it is usually recommended to opt for **composition over inheritance**.
 > This is a common design principle in the object oriented paradigm and is worth remembering,
-> as it's very common for people to overuse inheritance once they've been introduced to it.
+> as it is very common for people to overuse inheritance once they have been introduced to it.
 >
 > For much more detail on this see the
 > [Python Design Patterns guide](https://python-patterns.guide/gang-of-four/composition-over-inheritance/).
@@ -766,7 +766,7 @@ before we can properly initialise a `Patient` model with their inflammation data
 
 > **Multiple Inheritance** is when a class inherits from more than one direct parent class.
 > It exists in Python, but is often not present in other Object Oriented languages.
 > Although this might seem useful, like in our inheritance-based model of the photocopier above,
-> it's best to avoid it unless you're sure it's the right thing to do,
-> due to the complexity of the inheritance heirarchy.
+> it is best to avoid it unless you are sure it is the right thing to do,
+> due to the complexity of the inheritance hierarchy.
 > Often using multiple inheritance is a sign you should instead be using composition -
 > again like the photocopier model above.
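The explicit call to the parent's initialiser described earlier can be sketched with a minimal example (names borrowed from the lesson's examples; the real classes carry more data):

```python
class Person:
    def __init__(self, name):
        self.name = name

class Patient(Person):
    def __init__(self, name):
        # Python will not call Person.__init__ for us once we define our own
        # __init__, so we call it explicitly before adding our own attributes.
        super().__init__(name)
        self.observations = []

alice = Patient('Alice')
print(alice.name)          # set by Person.__init__
print(alice.observations)  # set by Patient.__init__
```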
@@ -775,13 +775,13 @@ before we can properly initialise a `Patient` model with their inflammation data > ## Exercise: A Model Patient > -> Let's use what we have learnt in this episode and combine it with what we have learnt on +> Let us use what we have learnt in this episode and combine it with what we have learnt on > [software requirements](../31-software-requirements/index.html) > to formulate and implement a > [few new solution requirements](../31-software-requirements/index.html#exercise-new-solution-requirements) > to extend the model layer of our clinical trial system. > -> Let's start with extending the system such that there must be +> Let us start with extending the system such that there must be > a `Doctor` class to hold the data representing a single doctor, which: > > - must have a `name` attribute @@ -789,17 +789,17 @@ before we can properly initialise a `Patient` model with their inflammation data > > In addition to these, try to think of an extra feature you could add to the models > which would be useful for managing a dataset like this - -> imagine we're running a clinical trial, what else might we want to know? +> imagine we are running a clinical trial, what else might we want to know? > Try using Test Driven Development for any features you add: > write the tests first, then add the feature. > The tests have been started for you in `tests/test_patient.py`, > but you will probably want to add some more. > -> Once you've finished the initial implementation, do you have much duplicated code? +> Once you have finished the initial implementation, do you have much duplicated code? > Is there anywhere you could make better use of composition or inheritance > to improve your implementation? > -> For any extra features you've added, +> For any extra features you have added, > explain them and how you implemented them to your neighbour. > Would they have implemented that feature in the same way? 
> diff --git a/_extras/persistence.md b/_extras/persistence.md index 46614553..81c60b29 100644 --- a/_extras/persistence.md +++ b/_extras/persistence.md @@ -44,7 +44,7 @@ If we want to bring in this data, modify it somehow, and save it back to a file, all using our existing MVC architecture pattern, -we'll need to: +we will need to: - Write some code to perform data import / export (**persistence**) - Add some views we can use to modify the data @@ -58,10 +58,10 @@ and is handled by a **serialiser**. Serialisation is the process of exporting our structured data to a usually text-based format for easy storage or transfer, while deserialisation is the opposite. -We're going to be making a serialiser for our patient data, +We are going to be making a serialiser for our patient data, but since there are many different formats we might eventually want to use to store the data, -we'll also make sure it's possible to add alternative serialisers later and swap between them. -So let's start by creating a base class +we will also make sure it is possible to add alternative serialisers later and swap between them. +So let us start by creating a base class to represent the concept of a serialiser for our patient data - then we can specialise this to make serialisers for different formats by inheriting from this base class. @@ -71,7 +71,7 @@ If we create some alternative serialisers for different data formats, we know that we will be able to use them all in exactly the same way. This technique is part of an approach called **design by contract**. -We'll call our base class `PatientSerializer` and put it in file `inflammation/serializers.py`. +We will call our base class `PatientSerializer` and put it in file `inflammation/serializers.py`. 
~~~ # file: inflammation/serializers.py @@ -103,13 +103,13 @@ class PatientSerializer: Our serialiser base class has two pairs of class methods (denoted by the `@classmethod` decorators), one to serialise (save) the data and one to deserialise (load) it. -We're not actually going to implement any of them quite yet +We are not actually going to implement any of them quite yet as this is just a template for how our real serialisers should look, -so we'll raise `NotImplementedError` to make this clear +so we will raise `NotImplementedError` to make this clear if anyone tries to use this class directly. -The reason we've used class methods is that -we don't need to be able to pass any data in using the `__init__` method, -as we'll be passing the data to be serialised directly to the `save` function. +The reason we have used class methods is that +we do not need to be able to pass any data in using the `__init__` method, +as we will be passing the data to be serialised directly to the `save` function. There are many different formats we could use to store our data, but a good one is [**JSON** (JavaScript Object Notation)](https://en.wikipedia.org/wiki/JSON). @@ -121,7 +121,7 @@ used across most common programming languages. Data in JSON format is structured using nested **arrays** (very similar to Python lists) and **objects** (very similar to Python dictionaries). -For example, we're going to try to use this format to store data about our patients: +For example, we are going to try to use this format to store data about our patients: ~~~ [ @@ -157,7 +157,7 @@ If we wanted to represent this data in CSV format, the most natural way would be to have two separate files: one with each row representing a patient, the other with each row representing an observation. -We'd then need to use a unique identifier to link each observation record to the relevant patient. +We would then need to use a unique identifier to link each observation record to the relevant patient. 
This is how relational databases work, but it would be quite complicated to manage this ourselves with CSVs. @@ -187,7 +187,7 @@ def test_patients_json_serializer(): serializers.PatientJSONSerializer.save(patients, output_file) patients_new = serializers.PatientJSONSerializer.load(output_file) - # Check that we've got the same data back + # Check that we have got the same data back for patient_new, patient in zip(patients_new, patients): assert patient_new.name == patient.name @@ -200,14 +200,14 @@ def test_patients_json_serializer(): Here we set up some patient data, which we save to a file named `patients.json`. We then load the data from this file and check that the results match the input. -With our test, we know what the correct behaviour looks like - now it's time to implement it. -For this, we'll use one of Python's built-in libraries. +With our test, we know what the correct behaviour looks like - now it is time to implement it. +For this, we will use one of Python's built-in libraries. Among other more complex features, the `json` library provides functions for converting between Python data structures and JSON formatted text files. Our test also didn't specify what the structure of our output data should be, so we need to make that decision here - -we'll use the format we used as JSON example earlier. +we will use the format we used as JSON example earlier. ~~~ # file: inflammation/serializers.py @@ -275,8 +275,8 @@ but we need to write a serializer for our observation model as well! Since this new serializer is not a type of `PatientSerializer`, we need to inherit from a new base class which holds the design that is shared between `PatientSerializer` and `ObservationSerializer`. -Since we don't actually need to save the observation data to a file independently, -we won't worry about implementing the `save` and `load` methods for the `Observation` model. 
+Since we do not actually need to save the observation data to a file independently, +we will not worry about implementing the `save` and `load` methods for the `Observation` model. ~~~ # file: inflammation/serializers.py @@ -351,8 +351,8 @@ class PatientSerializer(Serializer): {: .language-python} > ## Linking it All Together -> We've now got some code which we can use to save and load our patient data, -> but we've not yet linked it up so people can use it. +> We have now got some code which we can use to save and load our patient data, +> but we have not yet linked it up so people can use it. > > Try adding some views to work with our patient data using the JSON serialiser. > When you do this, think about the design of the command line interface - @@ -369,7 +369,7 @@ class PatientSerializer(Serializer): > > The reason for this is that, > by default, `==` comparing two instances of a class -> tests whether they're stored at the same location in memory, +> tests whether they are stored at the same location in memory, > rather than just whether they contain the same data. > > Add some code to the `Patient` and `Observation` classes, @@ -396,11 +396,11 @@ class PatientSerializer(Serializer): > **Hint:** The only component that needs to be changed is `Serializer` - > this should not require any changes to the other classes. > -> **Hint:** The abc module documentation refers to metaclasses - don't worry about these. +> **Hint:** The abc module documentation refers to metaclasses - do not worry about these. > A metaclass is a template for creating a class (classes are instances of a metaclass), > just like a class is a template for creating objects (objects are instances of a class), -> but this isn't necessary to understand -> if you're just using them to create your own abstract base classes. +> but this is not necessary to understand +> if you are just using them to create your own abstract base classes. 
{: .challenge} > ## Advanced Challenge: CSV Serialization @@ -411,7 +411,7 @@ class PatientSerializer(Serializer): > see the documentation for the [csv module](https://docs.python.org/3/library/csv.html). > This module provides a CSV reader and writer which are a bit more flexible, > but slower for purely numeric data, -> than the ones we've seen previously as part of NumPy. +> than the ones we have seen previously as part of NumPy. > > Can you think of any cases when a CSV might not be a suitable format to hold our patient data? {: .challenge} diff --git a/_extras/programming-paradigms.md b/_extras/programming-paradigms.md index b22d8e26..9574ac51 100644 --- a/_extras/programming-paradigms.md +++ b/_extras/programming-paradigms.md @@ -44,7 +44,7 @@ from the imperative and declarative families that may be useful to you - **Procedural Programming**, **Functional Programming** and **Object-Oriented Programming**. Note, however, that most of the languages can be used with multiple paradigms, and it is common to see multiple paradigms within a single program - -so this classification of programming languages based on the paradigm they use isn't as strict. +so this classification of programming languages based on the paradigm they use is not as strict. ### Procedural Programming @@ -56,7 +56,7 @@ and the one we used up to this point, where we group code into *procedures performing a single task, with exactly one entry and one exit point*. In most modern languages we call these **functions**, instead of procedures - -so if you are grouping your code into functions, this might be the paradigm you're using. +so if you are grouping your code into functions, this might be the paradigm you are using. By grouping code like this, we make it easier to reason about the overall structure, since we should be able to tell roughly what a function does just by looking at its name. 
These functions are also much easier to reuse than code outside of functions, @@ -69,7 +69,7 @@ that we are writing just for a single use. Aside from smaller scripts, Procedural Programming is also commonly seen in code focused on high performance, with relatively simple data structures, such as in High Performance Computing (HPC). -These programs tend to be written in C (which doesn't support Object Oriented Programming) +These programs tend to be written in C (which does not support Object Oriented Programming) or Fortran (which did not until recently). HPC code is also often written in C++, but C++ code would more commonly follow an Object Oriented style, @@ -112,11 +112,11 @@ especially when handling **Big Data**. One popular definition of Big Data is data which is too large to fit in the memory of a single computer, with a single dataset sometimes being multiple terabytes or larger. -With datasets like this, we can't move the data around easily, +With datasets like this, we cannot move the data around easily, so we often want to send our code to where the data is instead. By writing our code in a functional style, we also gain the ability to run many operations in parallel -as it is guaranteed that each operation won't interact with any of the others - +as it is guaranteed that each operation will not interact with any of the others - this is essential if we want to process this much data in a reasonable amount of time. You can read more in an [extra episode on Functional Programming](/functional-programming/index.html). @@ -127,7 +127,7 @@ Object Oriented Programming focuses on the specific characteristics of each obje and what each object can do. An object has two fundamental parts - properties (characteristics) and behaviours. In Object Oriented Programming, -we first think about the data and the things that we're modelling - and represent these by objects. +we first think about the data and the things that we are modelling - and represent these by objects. 
For example, if we are writing a simulation for our chemistry research, we are probably going to need to represent atoms and molecules. @@ -135,7 +135,7 @@ Each of these has a set of properties which we need to know about in order for our code to perform the tasks we want - in this case, for example, we often need to know the mass and electric charge of each atom. So with Object Oriented Programming, -we'll have some **object** structure which represents an atom and all of its properties, +we will have some **object** structure which represents an atom and all of its properties, another structure to represent a molecule, and a relationship between the two (a molecule contains atoms). This structure also provides a way for us to associate code with an object, diff --git a/_extras/software-architecture-extra.md b/_extras/software-architecture-extra.md index 2a61f649..c0751d28 100644 --- a/_extras/software-architecture-extra.md +++ b/_extras/software-architecture-extra.md @@ -34,7 +34,7 @@ from a number of different external data sources. Using this pattern, we can create a component whose responsibility is transforming the calls for data to the expected format, -so the rest of our program doesn't have to worry about it. +so the rest of our program does not have to worry about it. Architecture patterns are similar, but larger scale templates which operate at the level of whole programs, @@ -88,7 +88,7 @@ Often, the software is split into three layers: - **Persistence Layer / Data Access Layer** - This layer handles data storage and provides data to the rest of the system - May include the **Model** components of an MVC pattern - if they're not in the application layer + if they are not in the application layer Although we have drawn similarities here between the layers of a system and the components of MVC, they are actually solutions to different scales of problem. 
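The chemistry example above (atoms with mass and charge, molecules containing atoms) can be sketched in a few lines; the class and attribute names are illustrative assumptions, not code from the lesson:

```python
# Sketch: objects with properties (mass, charge) and a containment relationship.
class Atom:
    def __init__(self, symbol, mass, charge=0):
        self.symbol = symbol  # e.g. "H"
        self.mass = mass      # atomic mass in unified atomic mass units
        self.charge = charge  # electric charge in units of e

class Molecule:
    def __init__(self, atoms):
        self.atoms = atoms    # a molecule *contains* atoms (composition)

    @property
    def mass(self):
        # behaviour associated directly with the data it operates on
        return sum(atom.mass for atom in self.atoms)

water = Molecule([Atom("H", 1.008), Atom("H", 1.008), Atom("O", 15.999)])
print(water.mass)
```

The key point is that the code (the `mass` property) lives alongside the data it describes, and the molecule-contains-atoms relationship is modelled explicitly.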
diff --git a/_extras/vscode.md b/_extras/vscode.md index 628df311..20178b43 100644 --- a/_extras/vscode.md +++ b/_extras/vscode.md @@ -26,7 +26,7 @@ to take effect. ## Using the VS Code IDE -Let's open our software project in VS Code and familiarise ourselves with some commonly used features needed for this course. +Let us open our software project in VS Code and familiarise ourselves with some commonly used features needed for this course. ### Opening a Software Project @@ -38,7 +38,7 @@ which we are using in this course. As in the episode on [virtual environments for software development]({{ page.root }}{% link _episodes/12-virtual-environments.md %}), -we'd want to create a virtual environment for our project to work in (unless you have already done so earlier in the course). +we would want to create a virtual environment for our project to work in (unless you have already done so earlier in the course). From the top menu, select `Terminal` > `New Terminal` to open a new terminal (command line) session within the project directory, and run the following command to create a new environment: diff --git a/bin/boilerplate/README.md b/bin/boilerplate/README.md index 060994ae..b5387446 100644 --- a/bin/boilerplate/README.md +++ b/bin/boilerplate/README.md @@ -9,7 +9,7 @@ This repository generates the corresponding lesson website from [The Carpentries We welcome all contributions to improve the lesson! Maintainers will do their best to help you if you have any questions, concerns, or experience any difficulties along the way. -We'd like to ask you to familiarize yourself with our [Contribution Guide](CONTRIBUTING.md) and have a look at +We would like to ask you to familiarize yourself with our [Contribution Guide](CONTRIBUTING.md) and have a look at the [more detailed guidelines][lesson-example] on proper formatting, ways to render the lesson locally, and even how to write new episodes. 
diff --git a/slides/README.md b/slides/README.md index 45a742a3..ae4deb5a 100644 --- a/slides/README.md +++ b/slides/README.md @@ -14,7 +14,7 @@ python3 -m venv .venv # it is important to use the dot prefix if you are creati . .venv/bin/activate pip install -r slides/requirements.txt # launch jupyter from the top level of this repo, **not** in the slide -# directory or else the relative figure links won't work +# directory or else the relative figure links will not work jupyter-notebook # navigate to the slide file and edit ``` @@ -28,7 +28,7 @@ Use spacebar to advance slides. Presenter view with `t`. Saving the slides from the Jupyter interface should only save to the markdown source file. If you find you have ended up with some `.ipynb` files in the `slides/` directory, -then you have done something wrong. Don't check those `.ipynb` files into version control. +then you have done something wrong. Do not check those `.ipynb` files into version control. ## Slide Export diff --git a/slides/section_1_setting_up_environment.md b/slides/section_1_setting_up_environment.md index d8f3fdac..4a8f9a82 100644 --- a/slides/section_1_setting_up_environment.md +++ b/slides/section_1_setting_up_environment.md @@ -90,8 +90,8 @@ What you will be able to do at the end that should help your work: - This course has necessarily made some decisions about the tools used to demonstrate the concepts being taught - - Python is used as a fairly ubiquitous and syntactically easy language; however, the point needs to be clear that this isn't a course about Python; the course is about software engineering, and it is using Python as the playground to demonstrate the skills and concepts that should be valuable independent of the domain and language - - to this end, I will be trying to draw connections with other languages and development scenarios when applicable since I know Python isn't necessarily the main development language for everyone at UKAEA + - Python is used as a fairly ubiquitous 
and syntactically easy language; however, the point needs to be clear that this is not a course about Python; the course is about software engineering, and it is using Python as the playground to demonstrate the skills and concepts that should be valuable independent of the domain and language + - to this end, I will be trying to draw connections with other languages and development scenarios when applicable since I know Python is not necessarily the main development language for everyone at UKAEA - Learners should have already been notified about the IDE selection and installation. If the instructor has decided to allow different editors, reiterate any caveats (e.g. happy for you to use these editors, but no guarantee that we can help you if you are stuck). At an intermediate level, it is likely learners already have exposure to a preferred IDE, so they can shoulder more of the responsibility for getting that to work. - GitHub is ubiquitous in software development, and a lot of research code ends up there. Other platforms are similar and so whatever is learnt here will be applicable. - in the long run, you will encounter many more tools than those shown here, and you will form your own preferences; that is fine and we are in no way suggesting these are the definitive tools that should be used by any researcher who codes @@ -114,11 +114,11 @@ What you will be able to do at the end that should help your work: - We want to know how you are doing, and the more information we have about your progress, the better we can tailor the course to you and make it more valuable. - There are two main ways to do this. - Self reporting: Please use the green check mark and red 'x' in Zoom (or stickies if in person) to indicate your status with lessons or the current content; this is a more subtle way of indicating that you need help without interrupting the instructor. The helpers will be keeping an eye on the list of participants and their statuses. 
Can everyone please check now that they can put the green check mark up. - - Polls within Zoom will also be used to check how you are getting on. Please fill these in and don't ignore them! In person, it is easier to see how people are getting on. + - Polls within Zoom will also be used to check how you are getting on. Please fill these in and do not ignore them! In person, it is easier to see how people are getting on. - Throughout the course, please feel free to interrupt at any point with a question (preferably by raising hand if in person or using the raise hand feature in Zoom or relevant analogue). - Many portions of the course will involve breaking into separate groups to do work. Most of this will be independent work, but there are a few group tasks. There will usually be a helper in your room if you need assistance, but again, they are not all-knowing, so please help other participants if you think you can help. - - There will no doubt be a range of experiences and people moving at different paces in these groups. Please be mindful of that. If you find there is too much chatter and you can't focus on getting things done, feel free to mute audio. - - If you fall behind on the independent exercises, don't worry and prioritise any group work or discussion at the end of a breakout session. You can catch up between sessions. + - There will no doubt be a range of experiences and people moving at different paces in these groups. Please be mindful of that. If you find there is too much chatter and you cannot focus on getting things done, feel free to mute audio. + - If you fall behind on the independent exercises, do not worry and prioritise any group work or discussion at the end of a breakout session. You can catch up between sessions. 
@@ -197,7 +197,7 @@ The "patient inflammation" example from the Novice Software Carpentry Python Lesson

-- Let's take a look at the project structure
+- Let us take a look at the project structure
- I like to use `tree` (on Ubuntu installable through apt-get, not sure if it comes with Git for Windows)
- With this we see:
- README file (that typically describes the project, its usage, installation, authors and how to contribute),
@@ -250,7 +250,7 @@ By Soroush Khanlou, https://khanlou.com/2014/03/model-view-whatever/

- Adapter manipulates both the Model and the View. Usually, it accepts input from the View and performs the corresponding action on the Model (changing the state of the model) and then updates the View accordingly. In our simple example project, the file `inflammation-analysis.py` is the Adapter, and it actually does handle user input so it is not quite fully abiding by MVA, and actually shares features with another architectural pattern called Model-View-Controller
- Some final words on architecture and these particular patterns:
- - don't get too caught up determining exactly what functionality should be the responsibility of each component
+ - do not get too caught up determining exactly what functionality should be the responsibility of each component
- the act of splitting things up and thinking about how they will interact through interfaces is where you get the most value
- it is likely you were already doing this in an informal fashion, but good to think about it more explicitly **and try to record your design in some appropriate format**
@@ -266,7 +266,7 @@ By Soroush Khanlou, https://khanlou.com/2014/03/model-view-whatever/

- Switch to terminal and the directory of the example project at its initial commit

- - Make sure you don't have a virtual environment activated, and preferably no numpy or matplotlib in your system python installation. If you do, create a fresh virtual environment that doesn't have these packages.
+ - Make sure you do not have a virtual environment activated, and preferably no numpy or matplotlib in your system python installation. If you do, create a fresh virtual environment that does not have these packages.
- Try to run the analysis script from the command line: `python3 inflammation-analysis.py`
- If you are in a clean Python installation, this should throw a `ModuleNotFoundError` which proves we have some external dependencies that are not installed and we need to get through a package manager
- Depending on what learners have in their `PYTHONPATH` and site packages for their current default environment, they may or may not have success with this command
@@ -274,7 +274,7 @@ By Soroush Khanlou, https://khanlou.com/2014/03/model-view-whatever/

- Before jumping to install matplotlib and numpy, it is worth a thought about other projects we might currently be working on or in the future - what if they have a requirement for a different version of numpy or matplotlib? or a different python version? how are you going to share your project with collaborators and make sure they have the correct dependencies?
- in general, each project is going to have its own unique configuration and set of dependencies
- - to solve this in python, we set up a virtual environment for each project, containing a set of libraries that won't interact with others on the system
+ - to solve this in python, we set up a virtual environment for each project, containing a set of libraries that will not interact with others on the system
- it can be thought of like an isolated partial installation of Python specifically for your project


@@ -287,7 +287,7 @@ By Soroush Khanlou, https://khanlou.com/2014/03/model-view-whatever/

- `venv` comes standard in `Python 3.3+` and is the main advantage for its use

- - however, important thing to note with `venv` is that you can only ever use the system version of python with it (e.g.
if you have Python 3.8 on your system, you can only ever create an virtual environment with Python 3.8). Most of the time this isn't a problem, but if you are in dire need of a particular Python version, then there are other tools that can do that job (next slide).
+ - however, an important thing to note with `venv` is that you can only ever use the system version of python with it (e.g. if you have Python 3.8 on your system, you can only ever create a virtual environment with Python 3.8). Most of the time this is not a problem, but if you are in dire need of a particular Python version, then there are other tools that can do that job (next slide).
- Another consequence is that if there is an update of your system installation then your virtual environment will stop working, and you will need to get rid of it and create a new one (more on that later)
- `pip` stands for "Pip Installs Packages" and it queries the Python Package Index (PyPI) to install dependencies
- it is ubiquitous and compatible with all Python distributions
@@ -364,7 +364,7 @@ pip install -r requirements.txt # great reason to have this file

- The _coverall_ option these days is to develop in a Docker container (or relevant analogue)
- The `Dockerfile` codifies the dependencies and setup for your project
- If you are on a cluster, then you might be familiar with the `module` command
- - This allows you to get different versions of libraries without installing them yourself (and indeed, because you don't have permission to install them)
+ - This allows you to get different versions of libraries without installing them yourself (and indeed, because you do not have permission to install them)
- Spack and Easy Build are also quite popular package management tools for HPC; Spack has virtual environments!
- C++
- CMake is a ubiquitous build tool and overlaps with dependency management
@@ -407,8 +407,8 @@ Start from this heading and continue to the end of the page.
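The `pip install -r requirements.txt` line mentioned above is one half of a round trip; a minimal sketch of the full per-project workflow (directory and file names are illustrative, Unix shells assumed):

```shell
# Sketch: a per-project virtual environment with pinned dependencies.
python3 -m venv venv                       # create an isolated environment
. venv/bin/activate                        # activate it (Unix shells)
python3 -m pip freeze > requirements.txt   # record exact installed versions
pip install -r requirements.txt            # restore them later / elsewhere
```

A collaborator who clones the project can recreate the same environment by running only the first, second, and fourth commands against the committed `requirements.txt`.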
- Send learners into breakout rooms to read through and try out content from "Using the PyCharm IDE" (~ 30mins, but could be less, so poll helpers after 20 minutes to get a status check from the rooms, or ask directly if in person)
- Remind to use status green check when done (or red x if having trouble)
- - Encourage learners to try out the features that are being discussed and don't worry about making modifications to their code since it is under version control it will be easy to reset any changes
- - Reinforce that we won't be using the version control interface of PyCharm, but it is a perfectly useable feature, and again this comes down to preference
+ - Encourage learners to try out the features that are being discussed and do not worry about making modifications to their code; since it is under version control, it will be easy to reset any changes
+ - Reinforce that we will not be using the version control interface of PyCharm, but it is a perfectly usable feature, and again this comes down to preference


@@ -456,8 +456,8 @@ Start from this heading and go until the "Git Branches" heading.
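The "Git Branches" material referenced above can be demonstrated live with a short throwaway repository; all repository, branch, and file names here are invented for the demonstration:

```shell
# Sketch: one feature branch per self-contained piece of work
# (run in a scratch directory; names are illustrative only).
git init demo && cd demo
git config user.email "you@example.com" && git config user.name "You"
git commit --allow-empty -m "initial commit"  # gives us a starting branch
git checkout -b issue23-fix    # create and switch to a feature branch
echo "fix" > patch.txt
git add patch.txt
git commit -m "Fix issue 23"   # this commit lands on issue23-fix only
git checkout -                 # back to the original branch; it is unchanged
git merge issue23-fix          # bring the finished feature in when ready
```

Until the final `merge`, the default branch never moves, which is the answer to the quiz question that follows: committing on the wrong branch does not affect the branch it was created from.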
- Git branches
- - branches are actually just a pointer to a commit, and that commit _can_ (but doesn't have to) define a distinct or divergent commit history of our main branch
- - this allows developers to take "copies" of the code and make their own modifications without making changes to original nor affecting the commit history of the main branch (so others won't see the changes there until they are merged)
+ - branches are actually just a pointer to a commit, and that commit _can_ (but does not have to) define a distinct or divergent commit history of our main branch
+ - this allows developers to take "copies" of the code and make their own modifications without making changes to the original or affecting the commit history of the main branch (so others will not see the changes there until they are merged)
- this is the main aspect of git that facilitates collaboration
- talk through the image
- the best practice is to use a new branch for each separate and self-contained unit/piece of work you want to add to the project. This unit of work is also often called a feature and the branch where you develop it is called a feature branch. Each feature branch should have its own meaningful name - indicating its purpose (e.g. “issue23-fix”). If we keep making changes and pushing them directly to main branch on GitHub, then anyone who downloads our software from there will get all of our work in progress - whether or not it’s ready to use! So, working on a separate branch for each feature you are adding is good for several reasons:
@@ -478,7 +478,7 @@ Continue from this heading to the end of the page.

- Get learners to go through the remainder of the content from "Creating Branches" onwards (~ 15 minutes)
- Once everyone is complete, consider running a quiz.
- - You are working on a software project that has a main and develop branch.
Feature branches are supposed to be created off of the develop branch, but you mistakenly create your feature branch off of the main branch. You don't realise this until you have already committed some changes, and now you are freaking out because you think you might have affected the code on the main branch. Is this worry valid?
+ - You are working on a software project that has a main and develop branch. Feature branches are supposed to be created off of the develop branch, but you mistakenly create your feature branch off of the main branch. You do not realise this until you have already committed some changes, and now you are freaking out because you think you might have affected the code on the main branch. Is this worry valid?
1. yes
1. no (correct answer)

@@ -522,18 +522,18 @@ Continue from this heading to the end of the page.


-- Unless you have particular requirements, it is best to go with a sytle guide that has the majority consensus for a particular language (albeit sometimes this won't exist, so choose what seems best)
+- Unless you have particular requirements, it is best to go with a style guide that has the majority consensus for a particular language (albeit sometimes this will not exist, so choose what seems best)
- In Python, this is PEP8
- In PyCharm, adherence to PEP8 will automatically be checked and violations flagged for fixing (demonstrate this live)
- VSCode can do the same thing with an extension. See the "Extras" section.
- It is worth mentioning that at a project level, not everyone will be using the same IDE, so it is better to use an independent tool called a linter that will enforce these style requirements - `black` is a popular but harsh and opinionated tool that can take some getting used to - `flake8` and `pylint` a bit more conventional -> PyCharm can be modified to use one of these directly (outside of the scope of this course) - - C++ doesn't have a language-wide convention for style + - C++ does not have a language-wide convention for style - [`clang-format`](https://clang.llvm.org/docs/ClangFormat.html) is widely used for enforcing formatting, and there are [built-in presets](https://clang.llvm.org/docs/ClangFormatStyleOptions.html#configurable-format-style-options) for existing conventions followed by Google, LLVM, etc. Project specific settings made in a `.clang-format` file. - our guide on C++ for VSCode recommends cpplint: https://intranet.ccfe.ac.uk/software/guides/vscode-cpp.html - Some other useful resources that cover a broader scope than just style and formatting are [Google's C++ Style Guide](https://google.github.io/styleguide/cppguide.html#Formatting) and the [C++ Core Guidelines by Bjarne Stroustrup (the creator of C++)](https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md) - - Fortran also doesn't have a language-wide convention + - Fortran also does not have a language-wide convention - we have a great guide on tooling in VSCode: https://intranet.ccfe.ac.uk/software/guides/vscode-fortran.html - this is a good online resource: https://fortran-lang.org/learn/best_practices @@ -546,7 +546,7 @@ Start from this section heading and go to the end of the page. 
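As a taste of what the linters above flag, here is a hedged before-and-after sketch; the function and variable names are invented, and the cited pycodestyle code (E225, missing whitespace around operator) is the kind of report `flake8` produces on the non-conformant line:

```python
# Sketch: the kind of issue PEP8 linters flag (names invented for illustration).

# Non-conformant: pycodestyle/flake8 would report E225
# (missing whitespace around operator) on this assignment.
total=1+2

# PEP8-conformant equivalent: spaces around operators, a docstring,
# and a properly laid out function body.
def mean(values):
    """Return the arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)

print(total, mean([1, 2, 3]))  # both styles run; only readability differs
```

Both versions execute identically, which is exactly why an automated linter, rather than the interpreter, is needed to enforce style.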
- Split learners into breakout rooms and get them to work through content starting from "Indentation" section (~ 30 minutes) going to the end of the page
-  - A lot of these checks for formatting can now be done automatically with your IDE or linters, so don't spend too long absorbing them. It is good to be aware why rules are being applied, but the details of implementation are less important.
+  - A lot of these checks for formatting can now be done automatically with your IDE or linters, so do not spend too long absorbing them. It is good to be aware of why rules are being applied, but the details of implementation are less important.
  - poll/status check at the end
- Some comments after the exercises
-  - There are many different docstring formats, and I tend to not like the Sphynx default very much. Google or Numpy style docstrings much more readable.
+  - There are many different docstring formats, and I tend to not like the Sphinx default very much. Google or NumPy style docstrings are much more readable.

diff --git a/slides/section_2_ensuring_correctness.md b/slides/section_2_ensuring_correctness.md
index 1a2e4c14..cc3296bd 100644
--- a/slides/section_2_ensuring_correctness.md
+++ b/slides/section_2_ensuring_correctness.md
@@ -66,10 +66,10 @@ jupyter:

- Automated testing: codify the expected behaviour of our software such that verification can happen repeatedly without user inspection
  - Unit tests: tests for small function units of our code (i.e. functions, class methods, class objects)
  - Functional or integration tests: work at a higher level, and test functional paths through your code, e.g. given some specific inputs, a set of interconnected functions across a number of modules (or the entire code) produce the expected result.
- - Regression tests: compare the current output of your code (usually an end-to-end result) to make sure it matches previous output that you don't want to change
+ - Regression tests: compare the current output of your code (usually an end-to-end result) to make sure it matches previous output that you do not want to change
    - there was a question that came in about drift in regression tests, and the short answer for how to deal with this is first determining whether the output you are tracking is actually an invariant (or something close to an invariant)
      - If not, then you will necessarily need to allow for relative proximity, but then you might question whether this is a good long-term output to base your regression test on.
-      - In our area and science broadly, invariants tend to be some observable or experimental physical results, so if you test isn't based on that, you are probably going to have a tough time.
+      - In our area and science broadly, invariants tend to be some observable or experimental physical results, so if your test is not based on that, you are probably going to have a tough time.

@@ -131,18 +131,18 @@ Please discuss with your peers. Record answers in the shared document if you can

- with about 5 minutes left, remind the groups to have a little discussion about their test data
- status check
  - check answers to question in shared document and briefly summarise
- - example answer: You are working on an old plasma magnetohydrodynamics code that has been extensively tested against experiments. You have been tasked with adding some functionality to that code, but you want to make sure that you don't change the key results of the code. You take some inputs for well known runs of the code that have been verified against experiment and save the outputs. You then use the outputs to compare against when you run the code in a test suite with those original inputs.
This is basically creating some regression tests for the code, using results that you know are correct because of extensive experimental validation of the code in the past.
+ - example answer: You are working on an old plasma magnetohydrodynamics code that has been extensively tested against experiments. You have been tasked with adding some functionality to that code, but you want to make sure that you do not change the key results of the code. You take some inputs for well known runs of the code that have been verified against experiment and save the outputs. You then use the outputs to compare against when you run the code in a test suite with those original inputs. This is basically creating some regression tests for the code, using results that you know are correct because of extensive experimental validation of the code in the past.
- comments about the limits of testing:
  - there are some good points there about getting value from testing
  - what most researchers think:
    - "Peer review of my paper will be the test"
    - "Looking at a graph is enough"
-    - "I don't have time to implement a clunky testing framework"
-    - it hints that there is a spectrum between throwaway code that doesn't need to be tested and library code used by hundreds in a community that requires extensive testing suites with more than just unit tests
+    - "I do not have time to implement a clunky testing framework"
+    - it hints that there is a spectrum between throwaway code that does not need to be tested and library code used by hundreds in a community that requires extensive testing suites with more than just unit tests
    - where your particular code lies is a tricky question to answer sometimes, but a good rule of thumb is that if there is a chance that someone else will be using it, then you should give some thought to tests
    - some further thoughts here: https://bielsnohr.github.io/2021/11/29/iccs-part2-and-testing.html
-  - testing has a demonstrably positive impact upon the design on your code
+  - testing has a demonstrably positive impact upon the design of your code
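To make the example answer above more tangible for learners, here is a minimal sketch of such a regression test. Everything in it is hypothetical (the function, the reference value, and the tolerance are made up stand-ins for the real validated code):

```python
import math

# Reference output captured from a well-known, experimentally validated run.
# In a real suite this would be loaded from a file checked into the repo.
REFERENCE = {"peak_pressure": 250.0}

def run_simulation(inputs):
    """Hypothetical stand-in for the real end-to-end code under test."""
    return {"peak_pressure": 2.5 * inputs["density"]}

def test_known_run_regression():
    result = run_simulation({"density": 100.0})
    # Compare with a relative tolerance to allow harmless numerical drift,
    # rather than demanding bit-for-bit identical output.
    assert math.isclose(result["peak_pressure"],
                        REFERENCE["peak_pressure"], rel_tol=1e-6)
```

The key point for learners is that the inputs and the stored reference come from runs already verified against experiment.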
- - it must of course also be acknowledged that testing is not the answer to everything, and that it can't substitute for good manual and acceptance testing
+ - it must of course also be acknowledged that testing is not the answer to everything, and that it cannot substitute for good manual and acceptance testing

@@ -179,7 +179,7 @@ Follow along from this section heading to the bottom of the page.

- comments
  - GitLab has very similar functionality and it is common for institutions to host their own GitLab instance internally. These instances will have their own documentation, and it is worthwhile to check if the RSE group or IT services have any guides to using these resources.
  - Because the supported Python versions are constantly changing, the numbers above might be a little out of date, or inconsistent.
-    - don't worry about this too much, but if you want to show the current supported Python versions, this site is very useful: https://devguide.python.org/versions/
+    - do not worry about this too much, but if you want to show the current supported Python versions, this site is very useful: https://devguide.python.org/versions/

diff --git a/slides/section_3_software_dev_process.md b/slides/section_3_software_dev_process.md
index d70b91e3..aea18409 100644
--- a/slides/section_3_software_dev_process.md
+++ b/slides/section_3_software_dev_process.md
@@ -200,7 +200,7 @@ Maybe ask for people to think of ideas of abstractions in the real world?

## Abstractions

-Help to make code easier - as don't have to understand details all at once.
+Help to make code easier - as we do not have to understand all the details at once.

Lowers cognitive load for each part.

@@ -266,7 +266,7 @@ The main body of it exists in inflammation/compute_data.py in a function called

-## Exercise: why isn't this code maintainable?
+## Exercise: why is this code not maintainable?

How is this code hard to maintain?

@@ -324,7 +324,7 @@ Hard to reuse as was very fixed in its behaviour.
## Test Before Refactoring

-* Write tests *before* refactoring to ensure we don't change behaviour.
+* Write tests *before* refactoring to ensure we do not change behaviour.

@@ -386,8 +386,8 @@ Side effects like modifying a global variable or writing a file

Pure functions have a number of advantages for maintainable code:

- * Easier to read as don't need to know calling context
- * Easier to reuse as don't need to worry about invisible dependencies
+ * Easier to read as we do not need to know the calling context
+ * Easier to reuse as we do not need to worry about invisible dependencies

@@ -412,8 +412,8 @@ Time: 10min

Pure functions are also easier to test

 * Easier to write as can create the input as we need it
- * Easier to read as don't need to read any external files
- * Easier to maintain - tests won't need to change if the file format changes
+ * Easier to read as we do not need to read any external files
+ * Easier to maintain - tests will not need to change if the file format changes

@@ -499,9 +499,9 @@ Abstractions allow decoupling code

-When we have a suitable abstraction, we don't need to worry about the inner workings of the other part
+When we have a suitable abstraction, we do not need to worry about the inner workings of the other part

-For example break of a car, the details of how to slow down are abstracted, so when we change how
-breaking works, we don't need to retrain the driver.
+For example the brake of a car: the details of how to slow down are abstracted, so when we change how
+braking works, we do not need to retrain the driver.

@@ -624,7 +624,7 @@ total_area = sum(shape.get_area() for shape in my_shapes)

Using an interface to call different methods is a technique known as **polymorphism**.

-A form of abstraction - we've abstracted what kind of shape we have.
+A form of abstraction - we have abstracted what kind of shape we have.

@@ -669,9 +669,9 @@ def test_sum_shapes():

-    assert total_area = 23
+    assert total_area == 23
```

-Easier to read this test as don't need to understand how
+Easier to read this test as we do not need to understand how
get_area might work for a real shape.
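To complement the slide's `test_sum_shapes` fragment, a self-contained sketch of the polymorphism-plus-test-double idea might look like this (the class names are assumed; only the `get_area` interface and `total_area` sum are taken from the slides):

```python
class Circle:
    """A real shape; how the area is computed is hidden behind get_area."""
    def __init__(self, radius):
        self.radius = radius

    def get_area(self):
        return 3.14159 * self.radius ** 2

class FakeShape:
    """Test double: returns a fixed area, so tests need no real geometry."""
    def __init__(self, area):
        self.area = area

    def get_area(self):
        return self.area

def total_area(shapes):
    # Polymorphism: any object providing get_area works here.
    return sum(shape.get_area() for shape in shapes)

def test_sum_shapes():
    # Easier to read: no need to understand how a real shape computes its area.
    assert total_area([FakeShape(3), FakeShape(20)]) == 23
```

The fake and the real shape are interchangeable inside `total_area`, which is exactly the decoupling the slides are driving at.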
-Focus on testing what we're testing +Focus on testing what we are testing @@ -690,7 +690,7 @@ Time: 15min These are techniques from **object oriented programming**. -There is a lot more that we won't go into: +There is a lot more that we will not go into: * Inheritance * Information hiding @@ -764,7 +764,7 @@ Time: 10min -Next slide if it feels like we've got loads of time. +Next slide if it feels like we have got loads of time. @@ -830,7 +830,7 @@ Practise makes perfect: * Spot signs things could be improved - like duplication * Think about why things are working or not working - * Don't design for an imagined future + * Do not design for an imagined future * Keep refactoring as you go diff --git a/slides/section_4_collaborative_soft_dev.md b/slides/section_4_collaborative_soft_dev.md index b2bf0abb..14150123 100644 --- a/slides/section_4_collaborative_soft_dev.md +++ b/slides/section_4_collaborative_soft_dev.md @@ -140,7 +140,7 @@ Go through the steps described under heading. Stop when you reach **Reviewing a Pair up with someone else in your group and exchange repository links. You will be taking on the role of _Reviewer_ on your partner's repository. Before leaving review comments, read the content under the heading **Reviewing a pull request**. Try to make a comment from each of the main areas identified. -**Don't submit your review just yet!!!** +**Do not submit your review just yet!!!** @@ -166,7 +166,7 @@ When done, select `Request changes` from the list of toggles, then `Submit revie Respond to the _Reviewers_ comments on the PR in _your_ repository. Use the information in **Responding to review comments** to guide your responses. And remember that you can talk to your _Reviewer_ for clarification, just make sure you record that in a comment on the PR. -Don't implement changes that will take more than 5 minutes. Instead, raise them as an issue on your repo for future work, and link to that issue in a comment on the PR. 
+Do not implement changes that will take more than 5 minutes. Instead, raise them as an issue on your repo for future work, and link to that issue in a comment on the PR. @@ -189,7 +189,7 @@ Don't implement changes that will take more than 5 minutes. Instead, raise them ### Empathy in review comments * Identify positives in code as and when you find them -* Remember different doesn't mean better +* Remember different does not mean better * Only provide a few non-critical suggestions - you are aiming for better rather than perfect * Ask questions to understand why something has been done a certain way rather than assuming you know a better way @@ -235,7 +235,7 @@ Start from the top of this episode page and go to the end. - Learners can skim the first two sections if you have talked about them in the previous slide - Split into breakout rooms for about 50 minutes -- A preface note: if you have been using codimd or hackmd for the shared document, then learners will have already been exposed to Markdown, so this section won't contain much new for them +- A preface note: if you have been using codimd or hackmd for the shared document, then learners will have already been exposed to Markdown, so this section will not contain much new for them - Post episode comments - A README is a great place to start your documentation, but at some point it will outgrow that, and you will need a bigger documentation system. The most popular in Python is Sphinx, which can be used with Markdown or another markup language called ReStructuredText (`.rst` files) - For writing documentation, this is another great link that can be added to the shared document: https://documentation.divio.com/ @@ -269,7 +269,7 @@ Start from the top of this episode page and go to the end. - 📌 Pinning dependencies in a `requirements.txt` has some serious limitations - - It doesn't differentiate between production dependencies (i.e. 
what our package needs to be used in a standalone manner) and library dependencies (i.e. what our package needs to be used as part of another application, which itself has dependencies) + - It does not differentiate between production dependencies (i.e. what our package needs to be used in a standalone manner) and library dependencies (i.e. what our package needs to be used as part of another application, which itself has dependencies) - It reduces the portability of our code across Python versions: some learners may have encountered this when setting up the CI matrices in GitHub Actions. e.g. Pinning dependencies at Python 3.10 could (and does!) cause issues if those same dependencies need to be installed in a Python 3.8 environment. - It is prone to error: what if we forget to add a dependency to requirements.txt? We could happily use pip to install something into our environment and the code will work, but when someone tries to use it themselves, they will be missing a dependency and the code will error out. In other words, we have two disconnected steps we need to perform when installing a dependency. - Distributing Python packages to popular repositories like PyPI requires more metadata than having a simple `requirements.txt` and we would need to manually create this @@ -298,7 +298,7 @@ poetry --version # check we have access to the poetry executable - Important warning: Poetry explicitly recommends that you shouldn't install Poetry within the virtual environment of a specific project. Rather, it should have its own isolated environment, which the official download script or `pipx` ensures. This is in direct contradiction to what the course material currently recommends. 
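For instructors who want to contrast this with the pinned `requirements.txt` approach, a minimal sketch of the kind of `pyproject.toml` Poetry generates may help. The package name, versions, and author below are purely illustrative:

```toml
# Illustrative sketch only -- along the lines of what `poetry init` produces;
# the names and version constraints here are made up.
[tool.poetry]
name = "inflammation-analysis"
version = "0.1.0"
description = "Example research package"
authors = ["A. Researcher <a.researcher@example.org>"]

[tool.poetry.dependencies]
# Caret constraints allow compatible upgrades rather than hard pins,
# which keeps the package portable across environments.
python = "^3.8"
numpy = "^1.21"

[tool.poetry.group.dev.dependencies]
pytest = "^7.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

Running `poetry add numpy` records the dependency here automatically, avoiding the "two disconnected steps" problem described above.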
-  - So, unless it really isn't possible, encourage learners to follow the link to Poetry's install website and follow instructions there
+  - So, unless it really is not possible, encourage learners to follow the link to Poetry's install website and follow instructions there
- Give learners about 5 mins to complete this and status check at the end

@@ -363,7 +363,7 @@ Inspect how `pyproject.toml` has changed. Look at what has gone into `poetry.loc

- Give learners a few minutes to do this
- Then, quickly run through it and look at the changes in your own repo
- Explain when `poetry.lock` is used: if it is present in a repo when `poetry install` is called, the _exact_ versions of dependencies in `poetry.lock` will be used. Again, useful if we are distributing a standalone application.
-  - Don't check in `poetry.lock` into version control if you want your package to be used as a library
+  - Do not check `poetry.lock` into version control if you want your package to be used as a library
- Note that the last command is quite important because it puts the current package we are developing into our environment
  - This means that some of those annoying `ModuleNotFoundError` errors will be eliminated
- Generally, we want to install the package we are working on in our environment

diff --git a/slides/section_5_managing_software.md b/slides/section_5_managing_software.md
index 45e216cc..5c10cee5 100644
--- a/slides/section_5_managing_software.md
+++ b/slides/section_5_managing_software.md
@@ -26,7 +26,7 @@ jupyter:

- In this section of the course we look at managing the development and evolution of software - how to keep track of the tasks the team has to do, how to improve the quality and reusability of our software for others as well as ourselves, and how to assess other people's software for reuse within our project.
-- We are therefore moving into the realm of software management, not just software development; don't be scared off!
+- We are therefore moving into the realm of software management, not just software development; do not be scared off! - We all need to do a bit of project management from time to time