
diseasy coding standard

Rasmus Skytte Randløv edited this page Jan 13, 2025 · 5 revisions

The diseasy coding standard largely follows the tidyverse style guide, with some clarifications and deviations.

Code should pass devtools::lint() with no issues.

We also like The Zen of Python.

Console width

The 80-character maximum line length is an old convention from a time when widescreen monitors were much less common. To keep up with the times, we accept lines of up to 120 characters.
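In practice the limit is enforced by lintr's line_length_linter(). As a base-R illustration of what the rule measures, the sketch below flags lines longer than 120 characters. find_long_lines() is a hypothetical helper, not part of diseasy or lintr, and the .lintr setting in the comment assumes lintr 3.0 or later.

```r
# Hypothetical helper (not part of diseasy or lintr): return the numbers of
# the lines that are longer than max_length characters.
# The actual check is done by lintr, e.g. via a .lintr file (lintr >= 3.0):
#   linters: linters_with_defaults(
#       line_length_linter(120)
#     )
find_long_lines <- function(lines, max_length = 120L) {
  # nchar() counts characters (type = "chars"), not bytes
  return(which(nchar(lines) > max_length))
}

code <- c(
  "short_line <- 1",
  strrep("x", 120),  # exactly 120 characters: accepted
  strrep("y", 121)   # 121 characters: too long
)

find_long_lines(code)  # 3
```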

Object names

We prefer variable names that are concise yet unambiguous.

Good:

start_date <- as.Date("2020-01-01")

for (row_index in seq_len(10)) {
    print(mtcars[row_index, ])
}

first_day_of_the_month   # Not ambiguous

Bad:

sd <- as.Date("2020-01-01")

for (i in seq(10)) {
    print(mtcars[i, ])
}

day_one    # Unclear: the first day of what?

Validate function input with checkmate

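We prefer to validate function inputs and give the user easily understood feedback. With checkmate, assertions can be gathered in a collection so that all problems are reported together rather than one at a time.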

Good:

do_something <- function(character_variable, character_vector_variable, date_variable) {
    coll <- checkmate::makeAssertCollection()
    checkmate::assert_character(character_variable, len = 1, add = coll)
    checkmate::assert_character(character_vector_variable, add = coll)
    checkmate::assert_date(date_variable, lower = as.Date("2020-01-01"), add = coll)
    checkmate::reportAssertions(coll)

    ...
}

Bad:

do_something <- function(character_variable, character_vector_variable, date_variable) {
    stopifnot(is.character(character_variable) && length(character_variable) == 1)
    stopifnot(is.character(character_vector_variable))
    stopifnot(inherits(date_variable, "Date"))
    stopifnot(date_variable >= as.Date("2020-01-01"))

    ...
}
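One practical reason to prefer the collection: checkmate::reportAssertions() raises a single error that lists every failed check, whereas stopifnot() stops at the first failure. The sketch below assumes the checkmate package is installed; validate_inputs() is a hypothetical, trimmed-down version of the function above.

```r
# Hypothetical example function; assumes the checkmate package is installed
validate_inputs <- function(character_variable, date_variable) {
  coll <- checkmate::makeAssertCollection()
  checkmate::assert_character(character_variable, len = 1, add = coll)
  checkmate::assert_date(date_variable, lower = as.Date("2020-01-01"), add = coll)
  checkmate::reportAssertions(coll)  # one error listing all failed assertions

  return(invisible(TRUE))
}

# Both arguments are invalid, so the single error message mentions both
message_text <- tryCatch(
  validate_inputs(c("a", "b"), as.Date("2019-06-01")),
  error = conditionMessage
)
cat(message_text, "\n")
```

With valid inputs, validate_inputs("a", as.Date("2021-01-01")) returns TRUE invisibly.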

Control flow

Nesting

Code should use the minimum nesting needed.

Utilise early returns and refactoring as needed to reduce nesting.

Good:

test_valid_date_range <- function(start_date, end_date) {
    if (start_date < as.Date("2020-01-01")) {
        return(FALSE)
    }
    if (end_date > as.Date("2023-12-31")) {
        return(FALSE)
    }
    if (end_date < start_date) {
        return(FALSE)
    }

    return(TRUE)
}

Bad:

test_valid_date_range <- function(start_date, end_date) {

    if (start_date >= as.Date("2020-01-01")) {
        if (end_date <= as.Date("2023-12-31")) {
            if (end_date >= start_date) {
                return(TRUE)
            } else {
                return(FALSE)
            }
        } else {
            return(FALSE)
        }
    } else {
        return(FALSE)
    }
}
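Refactoring to reduce nesting should not change behaviour. A quick sanity check is to run both versions over the same inputs and compare the results. The sketch below copies the two examples above under the hypothetical names valid_range_flat() and valid_range_nested() and compares them on a small grid of boundary dates.

```r
# Flat version with early returns (same logic as the "Good" example)
valid_range_flat <- function(start_date, end_date) {
    if (start_date < as.Date("2020-01-01")) {
        return(FALSE)
    }
    if (end_date > as.Date("2023-12-31")) {
        return(FALSE)
    }
    if (end_date < start_date) {
        return(FALSE)
    }

    return(TRUE)
}

# Nested version (same logic as the "Bad" example)
valid_range_nested <- function(start_date, end_date) {
    if (start_date >= as.Date("2020-01-01")) {
        if (end_date <= as.Date("2023-12-31")) {
            if (end_date >= start_date) {
                return(TRUE)
            } else {
                return(FALSE)
            }
        } else {
            return(FALSE)
        }
    } else {
        return(FALSE)
    }
}

# Every combination of a few dates around the boundaries
dates <- as.Date(c("2019-12-31", "2020-01-01", "2022-06-15", "2023-12-31", "2024-01-01"))
grid <- expand.grid(start = dates, end = dates)

flat <- mapply(valid_range_flat, grid$start, grid$end)
nested <- mapply(valid_range_nested, grid$start, grid$end)

identical(flat, nested)  # TRUE: the refactor preserves behaviour
```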

If statements

The tidyverse style guide says that "& and | should never be used inside of an if clause" because they can return vectors. We allow them, as long as the condition as a whole evaluates to a single TRUE or FALSE.
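Concretely, x > 0 & x < 10 is evaluated elementwise and returns one logical per element of x, so it has to be collapsed (for example with all() or any()) before it is used as an if condition. Since R 4.2.0, calling if() with a condition of length greater than one is an error. The vector x below is made up for illustration.

```r
x <- c(2, 5, 12)

# Elementwise: one logical per element of x, so not a valid if condition on its own
in_range <- x > 0 & x < 10

# Allowed: all() collapses the vector to a single TRUE or FALSE
if (all(x > 0 & x < 10)) {
    message("All values are in range")
} else {
    message("Some values are out of range")
}

# When both sides are already scalars, && (which short-circuits) also works
if (length(x) > 0 && x[1] > 0) {
    message("The first value is positive")
}
```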

We prefer the use of return

Functions should show explicitly what they return by using return().

In some cases, such as a function that consists of a single pipe chain, it may be acceptable to omit return().

Good:

do_something <- function(data) {
    
    out <- another_function(data)
    
    ...

    return(out)
}

Acceptable:

do_something <- function(data) {
    
    data |>
      another_function() |>
      ...
}

Bad:

do_something <- function(data) {
    
    out <- another_function(data)
    
    ...

    out
}

Explicit is better than implicit.

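Following the Zen of Python, we prefer to be explicit.

Many R code examples rely on implicit syntax, such as bare (unquoted) column names. We avoid this by referring to columns through the .data$ pronoun or as double-quoted strings, and by calling functions with their namespace (e.g. dplyr::arrange()).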

# Good
iris |>
  tidyr::gather("measure", "value", !"Species") |>
  dplyr::arrange(-.data$value)

# Better
iris |>
  tidyr::gather(
    key = "measure", 
    value = "value",
    !"Species" 
  ) |>
  dplyr::arrange(-.data$value)

# Bad
iris |>
  gather(measure, value, -Species) |>
  arrange(-value)
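To see why the pronouns matter, consider a data frame column and a local variable that share a name. With bare names, dplyr's data masking silently uses the column; .data$ and .env$ make the intent explicit. In package code, .data$ also avoids R CMD check notes about "no visible binding for global variable". The sketch assumes dplyr is installed and uses a made-up data frame.

```r
threshold <- 5
measurements <- data.frame(
  value = c(1, 7, 10),
  threshold = c(8, 8, 8)
)

# Implicit: `threshold` refers to the column (8), not the local variable (5)
implicit <- measurements |>
  dplyr::filter(value > threshold)

# Explicit: the column via .data$, the local variable via .env$
explicit <- measurements |>
  dplyr::filter(.data$value > .env$threshold)

nrow(implicit)  # 1 (only 10 exceeds 8)
nrow(explicit)  # 2 (7 and 10 exceed 5)
```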

Pipes

The magrittr pipe %>% is widely used and is allowed, but consider whether it offers any advantage over the native pipe |>, introduced in R 4.1.0.
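One practical difference is the placeholder. The native pipe's placeholder _ was added in R 4.2.0 and must be passed to a named argument, while magrittr's . can be used in any argument position. Where a named argument is available, the native pipe is usually enough. The example uses the built-in mtcars data set.

```r
# Native pipe (R >= 4.2.0 for the _ placeholder): _ must be a named argument
fit_native <- mtcars |>
  lm(mpg ~ wt, data = _)

# magrittr equivalent (requires the magrittr package):
# fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)

# The native pipe always needs a function call, including the parentheses
n_rows <- mtcars |> nrow()

coef(fit_native)
```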

Assignment with pipes

The tidyverse style guide mentions three forms of assignment.

We prefer variable name and assignment on the same line with <-:

# Good
iris_long <- iris |>
  tidyr::gather("measure", "value", !"Species") |>
  dplyr::arrange(-.data$value)

# Acceptable
accidentally_long_variable_name_i_made <-
  also_long_dataset_name |>
  dplyr::filter(.data$box_owner == "Pandora")

# Bad
iris |>
  tidyr::gather("measure", "value", !"Species") |>
  dplyr::arrange(-.data$value) ->
  iris_long

Piping into ggplot

When piping into ggplot, indenting the additional layers (added with +) by one extra level makes the code more readable:

# Allowed
iris |>
  dplyr::filter(.data$Species == "setosa") |>
  ggplot(aes(x = Sepal.Width, y = Sepal.Length)) +
    geom_point()

Tests should be specific

When writing tests, prefer several small test blocks over one large one.

Each test should test one piece of functionality.

This typically comes up when testing R6 classes.

Instead of one large test called "MyClass works", write smaller tests for the individual functionalities of MyClass.

Good:

test_that("MyClass initialize works", {

  # Test with defaults
  expect_no_condition(MyClass$new())

  # Test with non-defaults
  expect_no_condition(MyClass$new(argument = TRUE))

  # Test malformed inputs
  expect_error(MyClass$new(non_existent_argument = FALSE))
})

test_that("MyClass test_function works", {

  # Create new object for testing
  m <- MyClass$new()

  # Test function with defaults
  expect_no_condition(m$test_function())

  # Test malformed inputs
  expect_error(m$test_function(non_existent_argument = FALSE))
})

Bad:

test_that("MyClass works", {

  # Test with defaults
  expect_no_condition(MyClass$new())

  # Test with non-defaults
  expect_no_condition(MyClass$new(argument = TRUE))

  # Test malformed inputs
  expect_error(MyClass$new(non_existent_argument = FALSE))


  # Create new object for testing
  m <- MyClass$new()

  # Test function with defaults
  expect_no_condition(m$test_function())

  # Test malformed inputs
  expect_error(m$test_function(non_existent_argument = FALSE))
})
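For a self-contained version of the "Good" pattern, the sketch below defines a minimal, hypothetical MyClass with R6 and runs two focused tests against it. It assumes the R6 and checkmate packages and a version of testthat recent enough to provide expect_no_condition() are installed; the class, its argument field and its test_function() method are illustrative only.

```r
library(testthat)

# Hypothetical R6 class, for illustration only
MyClass <- R6::R6Class(
  classname = "MyClass",
  public = list(
    argument = NULL,

    initialize = function(argument = FALSE) {
      checkmate::assert_flag(argument)  # a single TRUE or FALSE
      self$argument <- argument
    },

    # Returns the negation of the stored flag
    test_function = function() {
      return(!self$argument)
    }
  )
)

# One test per functionality: a failure points directly at what broke
test_that("MyClass initialize works", {
  expect_no_condition(MyClass$new())
  expect_no_condition(MyClass$new(argument = TRUE))
  expect_error(MyClass$new(argument = "yes"))               # not a flag
  expect_error(MyClass$new(non_existent_argument = FALSE))  # unused argument
})

test_that("MyClass test_function works", {
  m <- MyClass$new()
  expect_true(m$test_function())
  expect_error(m$test_function(non_existent_argument = FALSE))
})
```

If, say, test_function() breaks, only "MyClass test_function works" fails, which is the point of keeping tests specific.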