-
Notifications
You must be signed in to change notification settings - Fork 7
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #225 from kgoldfeld/216-nonrandom-distribution-ret…
…urns-a-single-value-when-repeated-values-are-expected 216 nonrandom distribution returns a single value when repeated values are expected
- Loading branch information
Showing
11 changed files
with
254 additions
and
20 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,159 @@ | ||
--- | ||
title: "Customized Distributions" | ||
output: rmarkdown::html_vignette | ||
vignette: > | ||
%\VignetteIndexEntry{Customized Distributions} | ||
%\VignetteEngine{knitr::rmarkdown} | ||
\usepackage[utf8]{inputenc} | ||
--- | ||
|
||
```{r chunkname, echo=-1} | ||
data.table::setDTthreads(2) | ||
``` | ||
|
||
```{r, echo = FALSE, message = FALSE} | ||
library(simstudy) | ||
library(ggplot2) | ||
library(scales) | ||
library(grid) | ||
library(gridExtra) | ||
library(survival) | ||
library(gee) | ||
library(data.table) | ||
library(ordinal) | ||
odds <- function (p) p/(1 - p) # TODO temporary remove when added to package | ||
plotcolors <- c("#B84226", "#1B8445", "#1C5974") | ||
cbbPalette <- c("#B84226","#B88F26", "#A5B435", "#1B8446", | ||
"#B87326","#B8A526", "#6CA723", "#1C5974") | ||
ggtheme <- function(panelback = "white") { | ||
ggplot2::theme( | ||
panel.background = element_rect(fill = panelback), | ||
panel.grid = element_blank(), | ||
axis.ticks = element_line(colour = "black"), | ||
panel.spacing =unit(0.25, "lines"), # requires package grid | ||
panel.border = element_rect(fill = NA, colour="gray90"), | ||
plot.title = element_text(size = 8,vjust=.5,hjust=0), | ||
axis.text = element_text(size=8), | ||
axis.title = element_text(size = 8) | ||
) | ||
} | ||
``` | ||
|
||
Custom distributions can be specified in `defData` and `defDataAdd` by setting the argument *dist* to "custom". When defining a custom distribution, you provide the name of the user-defined function as a string in the *formula* argument. The arguments of the custom function are listed in the *variance* argument, separated by commas and formatted as "**arg_1 = val_form_1, arg_2 = val_form_2, $\dots$, arg_K = val_form_K**". | ||
|
||
Here, the *arg_k's* represent the names of the arguments passed to the customized function, where $k$ ranges from $1$ to $K$. You can use values or formulas for each *val_form_k*. If formulas are used, ensure that the variables have been previously generated. Double dot notation is available in specifying *value_formula_k*. One important requirement of the custom function is that the parameter list used to define the function must include an argument"**n = n**", but do not include $n$ in the definition as part of `defData` or `defDataAdd`. | ||
|
||
### Example 1 | ||
|
||
Here is an example where we would like to generate data from a zero-inflated beta distribution. In this case, there is a user-defined function `zeroBeta` that takes on shape parameters $a$ and $b$, as well as $p_0$, the proportion of the sample that is zero. Note that the function also takes an argument $n$ that will not to be be specified in the data definition; $n$ will represent the number of observations being generated: | ||
|
||
```{r} | ||
zeroBeta <- function(n, a, b, p0) { | ||
betas <- rbeta(n, a, b) | ||
is.zero <- rbinom(n, 1, p0) | ||
betas*!(is.zero) | ||
} | ||
``` | ||
|
||
The data definition specifies a new variable $zb$ that sets $a$ and $b$ to 0.75, and $p_0 = 0.02$: | ||
|
||
```{r} | ||
def <- defData( | ||
varname = "zb", | ||
formula = "zeroBeta", | ||
variance = "a = 0.75, b = 0.75, p0 = 0.02", | ||
dist = "custom" | ||
) | ||
``` | ||
|
||
The data are generated: | ||
|
||
```{r} | ||
set.seed(1234) | ||
dd <- genData(100000, def) | ||
``` | ||
|
||
```{r, echo = FALSE} | ||
dd | ||
``` | ||
|
||
A plot of the data reveals dis-proportion of zero's: | ||
|
||
```{r, fig.width = 6, fig.height = 3, echo = FALSE} | ||
ggplot(data = dd, aes(x = zb)) + | ||
geom_histogram(binwidth = 0.01, boundary = 0, fill = "grey60") + | ||
theme(panel.grid = element_blank()) | ||
``` | ||
|
||
### Example 2 | ||
|
||
In this second example, we are generating sets of truncated Gaussian distributions with means ranging from $-1$ to $1$. The limits of the truncation vary across three different groups. `rnormt` is a customized (user-defined) function that generates the truncated Gaussiandata. The function requires four arguments (the left truncation value, the right truncation value, the distribution average and the standard deviation). | ||
|
||
```{r} | ||
rnormt <- function(n, min, max, mu, s) { | ||
F.a <- pnorm(min, mean = mu, sd = s) | ||
F.b <- pnorm(max, mean = mu, sd = s) | ||
u <- runif(n, min = F.a, max = F.b) | ||
qnorm(u, mean = mu, sd = s) | ||
} | ||
``` | ||
|
||
|
||
In this example, truncation limits vary based on group membership. Initially, three groups are created, followed by the generation of truncated values. For Group 1, truncation occurs within the range of $-1$ to $1$, for Group 2, it's $-2$ to $2$ and for Group 3, it's $-3$ to $3$. We'll generate three data sets, each with a distinct mean denoted by M, using the double-dot notation to implement these different means. | ||
|
||
```{r} | ||
def <- | ||
defData( | ||
varname = "limit", | ||
formula = "1/4;1/2;1/4", | ||
dist = "categorical" | ||
) |> | ||
defData( | ||
varname = "tn", | ||
formula = "rnormt", | ||
variance = "min = -limit, max = limit, mu = ..M, s = 1.5", | ||
dist = "custom" | ||
) | ||
``` | ||
|
||
The data generation requires three calls to `genData`. The output is a list of three data sets: | ||
|
||
```{r} | ||
mus <- c(-1, 0, 1) | ||
dd <-lapply(mus, function(M) genData(100000, def)) | ||
``` | ||
|
||
Here are the first six observations from each of the three data sets: | ||
|
||
```{r, echo=FALSE} | ||
lapply(dd, function(D) head(D)) | ||
``` | ||
|
||
A plot highlights the group differences. | ||
|
||
```{r, fig.width = 8, fig.height = 6, echo = FALSE} | ||
pfunc <- function(dx, i) { | ||
ggplot(data = dx, aes(x = tn)) + | ||
geom_histogram(aes(fill = factor(limit)), binwidth = 0.05, boundary = 0, alpha = .8) + | ||
facet_grid( ~ limit) + | ||
theme(panel.grid = element_blank(), | ||
legend.position = "none") + | ||
scale_fill_manual(values = plotcolors) + | ||
scale_x_continuous(breaks = seq(-3, 3, by =1)) + | ||
scale_y_continuous(limits = c(0, 1000)) + | ||
ggtitle(paste("mu =", mus[i])) | ||
} | ||
plist <- lapply(seq_along(dd), function(a) pfunc(dd[[a]], a)) | ||
grid.arrange(grobs = plist, nrow = 3) | ||
``` | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters