---
output:
  pdf_document:
    toc: yes
    toc_depth: '4'
  html_document:
    toc: yes
    toc_float: yes
    toc_depth: 4
---

## Classical Test Theory 

Classical test theory (CTT), also known as true score theory, is mainly concerned with assessing and improving the reliability of psychological tests and assessments.

### Assumption

* The error and the true score are uncorrelated: $\rho(T,E)=0$
* If a person's psychological trait could be measured repeatedly with parallel tests, the average of the observed scores would approach the true score: $E(X)=T$
* The measurement error is random and normally distributed with mean zero
* Errors on different parallel tests are uncorrelated: $\rho(E_1,E_2)=0$

### CTT Model

The observed score is a linear function of the true score:
$$ X = T + E $$
where $X$ is the observed score, $T$ is the true score, and $E$ is the error score. Because $T$ and $E$ are uncorrelated, their variances add:

*Variance*:
$$ S^2_X= S^2_T+S^2_E $$
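
The decomposition can be checked numerically. The chunk below is a minimal simulation sketch, not part of the original analysis: the true-score distribution (mean 50, SD 10) and the error SD of 5 are values assumed purely for illustration. It also computes reliability in the usual CTT sense, the share of observed-score variance that comes from true scores.

```{r}
set.seed(1)                                   # reproducible illustration
n <- 1e5                                      # number of simulated examinees
true_score  <- rnorm(n, mean = 50, sd = 10)   # T: assumed true-score distribution
error_score <- rnorm(n, mean = 0,  sd = 5)    # E: random error, generated independently of T
observed    <- true_score + error_score       # X = T + E

# var(X) is close to var(T) + var(E) because the sample covariance of T and E is near 0
var_x   <- var(observed)
var_sum <- var(true_score) + var(error_score)
c(var_X = var_x, var_T_plus_var_E = var_sum)

# Reliability: proportion of observed variance due to true scores
reliability <- var(true_score) / var_x
reliability
```

With these assumed values the population reliability is $100/125=0.8$, so the simulated value should land near that.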

### Limitation

* Sample dependence of statistics. Estimates of validity, reliability, difficulty, discrimination, and other parameters depend heavily on the sample, so the sample must be representative of the population.
* Test dependence of scores. Because strictly parallel test forms are difficult to construct, scores on two different tests measuring the same ability are not directly comparable.
* Imprecise reliability estimation. CTT assumes the same measurement error for examinees at every ability level, but in practice a test measures with relatively high accuracy only when its difficulty matches the ability level of the examinees.

## Item Response Theory 

Item Response Theory (IRT), also known as latent trait theory or the latent trait model, is a modern psychometric theory developed in response to the limitations of CTT. It uses item characteristics to estimate latent traits.

### Assumption

* Unidimensionality: all items in a given test measure the same latent trait.
* Local independence: for a given subject (that is, at a fixed ability level), responses to different items are uncorrelated.
* Item characteristic curve: a specified function relates a subject's ability to the probability of a correct response to an item.


### Item Difficulty and Item Characteristic Curve

The item characteristic curve (ICC), here a logistic curve, is a function relating the probability of a correct response to the ability scale.

$$P_i(\theta)=P(U_i=1|\theta)$$

Adding the item difficulty as a parameter gives

$$P(U_i=1|\theta,b_i)=\frac{e^{\theta-b_i}}{1+e^{\theta-b_i}}$$

where $b_i$ is the item difficulty parameter.

Example: 

$$P_1(\theta)=\frac{e^{\theta+1}}{1+e^{\theta+1}}, \quad P_2(\theta)=\frac{e^{\theta-1}}{1+e^{\theta-1}}$$

Each logistic curve corresponds to one test item; here $b_1=-1$ and $b_2=1$.

When $\theta=b_i$, the ability matches the difficulty of the item and $P_i(\theta)=0.5$.

In practice, $b_i$ is taken to lie in $(-3,3)$. The larger $b_i$ is, the harder the item.
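
The two example curves can be drawn directly. This is a small sketch using only base R; the helper name `icc_1pl` is introduced here for illustration and does not come from any package.

```{r}
# One-parameter logistic ICC: P(theta) = exp(theta - b) / (1 + exp(theta - b))
icc_1pl <- function(theta, b) plogis(theta - b)

theta  <- seq(-4, 4, by = 0.1)     # ability grid
p_easy <- icc_1pl(theta, b = -1)   # item 1 from the example (b = -1)
p_hard <- icc_1pl(theta, b = 1)    # item 2 from the example (b = 1)

# At theta = b the probability of a correct response is 0.5
icc_1pl(-1, b = -1)
icc_1pl(1, b = 1)

# The harder item's curve is shifted to the right
plot(theta, p_easy, type = "l", ylim = c(0, 1),
     xlab = expression(theta), ylab = "P(correct)")
lines(theta, p_hard, lty = 2)
abline(h = 0.5, col = "grey")
legend("topleft", legend = c("b = -1", "b = 1"), lty = 1:2, bty = "n")
```

At every ability level the easier item has the higher probability of a correct response, which is what "the larger $b_i$, the harder the item" means on the plot.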


### Outfit

Does the model fit the data well?

*Residual-based fit statistics*  

$$P_i(\theta)=P(U_i=1|\theta), \quad \mathbb{E}(U_{ji})=P(U_i=1|\theta=\theta_j)$$


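*Variance* (Bernoulli distribution)

Suppose person A answers correctly with probability 0.05 each time. The variance is then small:

$$Var(U_{ji})=P(U_i=1|\theta_j)(1-P(U_i=1|\theta_j))=0.05\times 0.95=0.0475$$

If person B is more able and answers correctly with probability 0.43, the variance is

$$Var(U_{ji})=P(U_i=1|\theta_j)(1-P(U_i=1|\theta_j))=0.43\times 0.57=0.2451$$

If person C is a top student who answers correctly with probability 0.91, the variance is small again:

$$Var(U_{ji})=P(U_i=1|\theta_j)(1-P(U_i=1|\theta_j))=0.91\times 0.09=0.0819$$

The standard deviations can be computed in the same way. As ability ($\theta$) changes, the variance is small at both ends and large in the middle.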

*Error Bar*

When an item is 'off-target' (too easy or too hard for a person), the variance is relatively small.
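
The three variances for persons A, B, and C, and the "small at the ends, large in the middle" pattern, can be reproduced in a few lines. This is an illustrative sketch; `bernoulli_var` is a helper defined here, and the item with $b=0$ is hypothetical.

```{r}
# Bernoulli variance of a single 0/1 response as a function of P(correct)
bernoulli_var <- function(p) p * (1 - p)

p_correct <- c(A = 0.05, B = 0.43, C = 0.91)   # probabilities for persons A, B and C
bernoulli_var(p_correct)                       # variances
sqrt(bernoulli_var(p_correct))                 # standard deviations

# Under the one-parameter model the variance is largest where theta = b (P = 0.5)
theta    <- seq(-4, 4, by = 0.5)
item_var <- bernoulli_var(plogis(theta - 0))   # hypothetical item with b = 0
theta[which.max(item_var)]
```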
  
*Standardized Residual*

$$Z_{ji}=\frac{U_{ji}-\mathbb{E}(U_{ji})}{\sqrt{Var(U_{ji})}}$$

*Outfit Mean-square Value (MNSQ)*

$$outfit_i=\frac{1}{N}\sum_{j=1}^N Z^2_{ji}$$

Under-fit (bad questions)

Linacre (2006) gives the following guideline:

+ $outfit > 2$: bad question, remove
+ $1.5 - 2.0$: not too bad, acceptable
+ $0.5 - 1.5$: good question
+ $<0.5$: not very influential, acceptable
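
The standardized residual and the outfit formula can be traced by hand on simulated data. The sketch below makes two simplifying assumptions that should be kept in mind: the abilities and difficulties are made-up values, and the true parameters are plugged into the formulas, whereas a real analysis would use estimates from a fitted model.

```{r}
set.seed(42)
n_person <- 500
theta <- rnorm(n_person)             # hypothetical person abilities
b     <- c(-1, -0.5, 0, 0.5, 1)      # hypothetical item difficulties
n_item <- length(b)

# Model probabilities P(U_ji = 1): rows are persons, columns are items
p <- plogis(outer(theta, b, "-"))

# Simulate 0/1 responses from the one-parameter model
u <- matrix(rbinom(n_person * n_item, size = 1, prob = p), nrow = n_person)

# Standardized residuals Z_ji = (U_ji - E(U_ji)) / sqrt(Var(U_ji))
z <- (u - p) / sqrt(p * (1 - p))

# Outfit mean-square per item: average squared standardized residual over persons
outfit <- colMeans(z^2)
round(outfit, 2)
```

Because the responses are generated from the model itself, the outfit values should sit close to 1, inside the "good question" band.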


### Two-parameter logistic model

The one-parameter model

$$P_i(\theta)=\frac{e^{\theta-b_i}}{1+e^{\theta-b_i}}$$

is extended with an item-specific discrimination parameter $a_i$:

$$P_i(\theta)=\frac{e^{a_i(\theta-b_i)}}{1+e^{a_i(\theta-b_i)}}$$

+ $a_i$ is the discrimination parameter.
+ $b_i$ is the item difficulty parameter.
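
To see what the discrimination parameter does, the sketch below compares two hypothetical items with the same difficulty and different $a_i$. The helper `icc_2pl` and the parameter values are illustrative assumptions. Differentiating the formula at $\theta=b_i$ gives a slope of $a_i/4$, which the chunk checks numerically.

```{r}
# Two-parameter logistic ICC: a = discrimination, b = difficulty
icc_2pl <- function(theta, a, b) plogis(a * (theta - b))

theta  <- seq(-4, 4, by = 0.1)
p_low  <- icc_2pl(theta, a = 0.5, b = 0)   # hypothetical weakly discriminating item
p_high <- icc_2pl(theta, a = 2,   b = 0)   # hypothetical strongly discriminating item

# Both curves pass through 0.5 at theta = b; a only changes the steepness
icc_2pl(0, a = 0.5, b = 0)
icc_2pl(0, a = 2, b = 0)

# Central-difference slope at theta = b, to compare with a / 4
h <- 1e-6
slope_at_b <- function(a) (icc_2pl(h, a, 0) - icc_2pl(-h, a, 0)) / (2 * h)
c(a_0.5 = slope_at_b(0.5), a_2 = slope_at_b(2))

plot(theta, p_high, type = "l", ylim = c(0, 1),
     xlab = expression(theta), ylab = "P(correct)")
lines(theta, p_low, lty = 2)
legend("topleft", legend = c("a = 2", "a = 0.5"), lty = 1:2, bty = "n")
```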

### Rasch Model

In the Rasch model, the discrimination parameters of the two-parameter logistic model are all set to 1, which gives the one-parameter logistic model, or they are all set to the same value $a>0$:
$$P_i(\theta)=\frac{e^{a(\theta-b_i)}}{1+e^{a(\theta-b_i)}}$$


### A drawback to CTT/UIRT

Classical Test Theory and Item Response Theory

Remedial Course

In traditional methods such as classical test theory (CTT) and unidimensional item response theory (UIRT), teachers compare students' learning performance using only total scores or a single ability estimate.

However, students with the same total score may have mastered different skills.

In current remedial teaching, most teachers use the same learning materials for all students whose total score falls below a threshold. These are the drawbacks in summary; the open question is how to improve on them.
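
A toy example makes the point concrete. The two students, the six items, and the skill labels below are entirely hypothetical and are not taken from the midterm data.

```{r}
# Hypothetical scored responses (1 = correct) on six items covering two skills
skill     <- c("algebra", "algebra", "algebra", "geometry", "geometry", "geometry")
student_a <- c(1, 1, 1, 0, 0, 0)
student_b <- c(0, 0, 0, 1, 1, 1)

# The total scores are identical
totals <- c(A = sum(student_a), B = sum(student_b))
totals

# The skill profiles are opposite
profile <- rbind(A = tapply(student_a, skill, sum),
                 B = tapply(student_b, skill, sum))
profile
```

A rule based on total score alone would assign both students the same remedial material, although they need practice on different skills.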


### Three-parameter Logistic Model


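With a guessing parameter $c_i$, the range of the response probability shrinks from $[0,1]$ to $[c_i,1]$; for example, with $c_i=0.2$:

$$[0,1] \rightarrow [0.2,1]$$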

$$P_i(\theta)=c_i+(1-c_i)\frac{e^{a_i(\theta-b_i)}}{1+e^{a_i(\theta-b_i)}}$$

+ $a_i$ discrimination parameter
+ $b_i$ item difficulty parameter
+ $c_i$ guessing parameter
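
The effect of the guessing parameter can be checked numerically. In this sketch the helper `icc_3pl` and the values $a=1$, $b=0$, $c=0.2$ are assumptions chosen for illustration; the argument is named `guess` rather than `c` to avoid confusion with R's `c()` function.

```{r}
# Three-parameter logistic ICC: guess is the lower asymptote c_i
icc_3pl <- function(theta, a, b, guess) {
  guess + (1 - guess) * plogis(a * (theta - b))
}

theta <- seq(-6, 6, by = 0.1)
p_3pl <- icc_3pl(theta, a = 1, b = 0, guess = 0.2)   # hypothetical item

# The curve stays inside [0.2, 1]
range(p_3pl)

# At theta = b the probability is c + (1 - c) / 2 rather than 0.5
icc_3pl(0, a = 1, b = 0, guess = 0.2)

plot(theta, p_3pl, type = "l", ylim = c(0, 1),
     xlab = expression(theta), ylab = "P(correct)")
abline(h = 0.2, lty = 3)   # lower asymptote at the guessing level
```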

### MTH017 Midterm Example with R

```{r echo=TRUE,message=FALSE,eval=TRUE}
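library(CTT)        # score(): classical scoring against an answer key
library(TAM)        # IRT models fitted below
library(stringi)
library(tidyverse)  # select(), %>% and stringr::str_sub()
library(readxl)     # read_excel()

# Read the raw responses and keep columns 2 to 29 (the 28 items)
data1 <- read_excel('mth017_mid.xlsx')
data1 <- data1 %>% select(2:29)

# Keep only the last character of each entry, matching how the key is built below
data2 <- apply(data1, 2, function(x) str_sub(x, -1, -1))

# Answer key: drop the item numbers and keep the option letters
answer_key <- str_sub(c('1A','2B','3B','4A','5A','6B','7D','8A','9D','10B','11A','12B','13D','14A','15C','16B','17B','18C','19C','20E','21A','22B','23C','24C','25B','26C','27D','28C'), -1, -1)

# Score every response as 1 (correct) or 0 (incorrect)
CTT_score <- score(data2, answer_key, output.scored = TRUE)
#CTT_score$scored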
```

```{r echo=TRUE,message=FALSE,eval=TRUE,results='hide'}
library(TAM)
# Fit the one-parameter (Rasch) model to the 0/1 scored responses
logit1 <- tam(CTT_score$scored)
```

We can then extract the results from the fitted model.

Item difficulty estimates:

```{r echo=TRUE,message=FALSE,eval=TRUE}
logit1$xsi
```


Mean-square item fit statistics (outfit and infit):

```{r echo=TRUE,message=FALSE,eval=TRUE}
msq.itemfit(logit1)
```

```{r}
# Curve for item 1; the argument of plot.tam() is spelled `items`
plot(logit1, items = 1)
```

*Two Parameter IRT*

```{r echo=TRUE,message=FALSE,eval=TRUE,results='hide'}
logit2<-tam.mml.2pl(CTT_score$scored)
```

```{r echo=TRUE,message=FALSE,eval=TRUE}
logit2$item
```

+ `AXsi_.Cat1` is the difficulty level

+ `B.Cat1.Dim1` is the discrimination level

```{r,results='hide'}
par(mfrow = c(2, 2))
# Plot the curves for the first four items
for (i in 1:4) {
  plot(logit2, items = i)
}
```