3D Printed Heart Model Simulated Data and Analysis

About this Document

This is a short example I made for a reddit user who was asking about the appropriate statistical method to use for an experiment the user was planning.

Link: https://www.reddit.com/r/statistics/comments/hfzqh3/question_which_data_representation_is_most/

Original post:

"Hi I’m trying to do a research on whether a 3D printed heart model will improve my students’ learning in the medical field.

My plan is to have two groups of students to study for a test with and without the aid of the 3D printed heart model respectively. Their test score will be recorded and processed into a graph to be compared.

Another test will be held in six weeks on the same batch of students and scores will be recorded, processed and compared to their previous test scores. The goal is to check whether the 3d heart model will help students to remember what they have learnt.

My question is what type of graphs would be most suitable to analyse these results?"

Consider the problem

Assumptions

I’ll assume that we’re comparing 2 different classes of equal size for this problem. I’ll also assume that any other potentially impactful variables of each student (e.g. sex, aptitude, temperament, etc.) are equally distributed between each class. I’ll say that you’ve got 25 students per class.

To note, the suggestion by user standard_error to use block randomization is a very good suggestion. I.e., this prevents all students in class A having the model while all students in class B lack the model. This entire class separation introduces method bias into your set-up that could only be alleviated by repeating this experiment over multiple semesters/years and randomizing which class is A or B. For the purposes of this demonstration, I’m going to assume that block randomization is not feasible (i.e. students would notice if half of their classmates have a model that they don’t get to use). If this is feasible, feel free to reach back out to me, and I can rewrite this script to account for that

library(dplyr)
library(tidyr)
library(ggplot2)
library(ez)

Simulating the data

set.seed(12)

df <- data.frame(student_id = c(1:50), #An arbitrary id variable that uniquely identifies each student
                 
           heart_model = c(rep("yes", 25), rep("no", 25)), #This variable designates the heart model and essentially designates class A vs. class B
           
           test1_scores = c(rnorm(25, 85, 15), #I'm assuming that heart+ students will have a mean test score of 85 with standard deviation (sd) of 15
                            rnorm(25, 70, 15)) #I'm assuming that hear- students will have a mean test score of 70 with the same sd
           )

df_cor <- df %>% 
  
  mutate(test1_scores = sapply(df$test1_scores, function(x) 
    ifelse(x > 100, 100, x))) %>% #this will cap scores at 100 if the rnorm function pushes them above 100
  
  mutate(test2_scores = ifelse(heart_model == "yes", 
                               yes = test1_scores + rnorm(25, -10, 5), #Test scores will only drop by 10 points with the heart model
                               no = test1_scores + rnorm(25, -20, 5))) #Test scores will drop by 15 points for no heart model

head(df_cor)

##   student_id heart_model test1_scores test2_scores
## 1          1         yes     62.79149     52.57806
## 2          2         yes    100.00000     89.43665
## 3          3         yes     70.64883     62.93297
## 4          4         yes     71.19992     71.30160
## 5          5         yes     55.03537     39.78092
## 6          6         yes     80.91556     74.58882

Plotting the data

Let’s plot the test 1 scores as a function of the heart model variable

ggplot(df_cor, aes(heart_model, test1_scores))+
  geom_boxplot()+
  geom_jitter(height = 0,
              width = 0.1)+
  ggtitle("Boxplot (Median and IQRs) of Test Scores")+
  theme_classic()

Alternatively, we could look at the mean and standard deviation

ggplot(df_cor, aes(heart_model, test1_scores))+
  stat_summary(geom = "crossbar", fun.data = "mean_sdl",
               fun.args = list(mult = 1))+
  geom_jitter(height = 0,
              width = 0.1)+
  ggtitle("Mean and SD of Test Scores")+
  theme_classic()

Now let’s plot the change in scores over time

df_cor %>% 
  pivot_longer(cols = c(test1_scores, test2_scores),
               names_to = "test",
               values_to = "scores") %>% 
  ggplot(.)+
  aes(test, 
      scores, 
      color = heart_model
      )+
  stat_summary(geom = "crossbar", fun.data = "mean_sdl",
               fun.args = list(mult = 0),
               width = 0.5)+
  geom_line(aes(group = student_id), alpha = 0.2)+
  geom_point(aes(group = student_id), size = 3, alpha = 0.5)+
  ggtitle("Difference in Test Scores between Heart Model Groups")+
  theme_classic()+
  labs(caption = "Horizontal bars represent the group means")

Analyze the Data

Consider your question

Now, your question is: “Does the heart model improve student’s retention of the material?” You’re assessing whether there is an interaction between the use of the heart model and students’ abilities to perform on tests reviewing material they learned previously. We can test this using a two-way ANOVA mixed model (completely random on variable heart_model and repeated measures on variable test)

ANOVA

df_cor %>% 
  pivot_longer(cols = c(test1_scores, test2_scores),
               names_to = "test",
               values_to = "scores") %>% 
  mutate(student_id = as.factor(student_id),
         test = as.factor(test)) %>% 
  ezANOVA(dv = scores, 
          wid = student_id,
          between = heart_model,
          within = test)

## Warning: Converting "heart_model" to factor for ANOVA.

## $ANOVA
##             Effect DFn DFd         F            p p<.05        ges
## 2      heart_model   1  48  29.76708 1.679645e-06     * 0.37550009
## 3             test   1  48 592.02132 1.192361e-28     * 0.27284414
## 4 heart_model:test   1  48  95.70919 5.166872e-13     * 0.05719093

Consider the Results

Here, from this ANOVA test, we see that there is a significant interaction between the use of the heart model and students’ performances on the two tests. The hypothesis test portion of this is done. It is not appropriate to test the main effects of each variable given that there is a significant interaction at play. Instead, we can just collect the descriptive summary statistics (ie the difference in means, sd, etc.) in a summary table.

Summary Table

df_cor %>% 
  pivot_longer(cols = c(test1_scores, test2_scores),
               names_to = "test",
               values_to = "scores") %>% 
  group_by(heart_model, test) %>% 
  summarise(mean = mean(scores), sd = sd(scores), range = max(scores) - min(scores),
            best = max(scores)) %>% 
  kableExtra::kable()

## `summarise()` has grouped output by 'heart_model'. You can override using the `.groups` argument.

heart\_model	test	mean	sd	range	best
no	test1\_scores	68.49433	11.67543	53.93438	100.00000
no	test2\_scores	48.02817	11.40422	46.91777	73.62970
yes	test1\_scores	81.10342	12.27680	44.96463	100.00000
yes	test2\_scores	72.37550	13.20723	59.98961	99.77053

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README_files/figure-gfm		README_files/figure-gfm
.gitignore		.gitignore
README.Rmd		README.Rmd
README.md		README.md
Simulation_Student_Test_Scores_Reddit.Rproj		Simulation_Student_Test_Scores_Reddit.Rproj
Students_test_scores_Reddit_post.R		Students_test_scores_Reddit_post.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

3D Printed Heart Model Simulated Data and Analysis

About this Document

Consider the problem

Assumptions

Simulating the data

Plotting the data

Analyze the Data

Consider your question

ANOVA

Consider the Results

Summary Table

About

Releases

Packages

Languages

taustinn11/Simulating_student_test_data

Folders and files

Latest commit

History

Repository files navigation

3D Printed Heart Model Simulated Data and Analysis

About this Document

Consider the problem

Assumptions

Simulating the data

Plotting the data

Analyze the Data

Consider your question

ANOVA

Consider the Results

Summary Table

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages