Simple linear regression is a statistical method used to examine the relationship between two continuous variables: one independent variable ($X$) and one dependent variable ($Y$).

- Independent Variable ($X$): The predictor or explanatory variable.
- Dependent Variable ($Y$): The response or outcome variable.
- Regression Line: A line that best fits the data, showing the relationship between $X$ and $Y$.
- Equation of the Regression Line:

  $$Y = \beta_0 + \beta_1X + \epsilon$$

  Where:
  - $Y$: Value of the dependent variable (the fitted line $\beta_0 + \beta_1X$ gives its predicted value)
  - $\beta_0$: Intercept (value of $Y$ when $X = 0$)
  - $\beta_1$: Slope (change in $Y$ for a one-unit increase in $X$)
  - $\epsilon$: Error term
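
To make the roles of the terms in this equation concrete, the following minimal Python sketch generates values of $Y$ from chosen coefficients plus random noise. The function name `simulate`, the coefficient values, and the Gaussian form of the error term are illustrative assumptions, not part of the original example.

```python
import random


def simulate(beta0, beta1, xs, noise_sd, seed=0):
    """Return Y = beta0 + beta1 * X + epsilon for each X in xs.

    epsilon is drawn from a normal distribution with mean 0 and
    standard deviation noise_sd (an assumption made for illustration).
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same values
    return [beta0 + beta1 * x + rng.gauss(0.0, noise_sd) for x in xs]


xs = [1, 2, 3, 4, 5]

# With no noise, every point lies exactly on the line Y = 10 + 2X.
ys_exact = simulate(10.0, 2.0, xs, noise_sd=0.0)

# With noise, points scatter around the same line.
ys_noisy = simulate(10.0, 2.0, xs, noise_sd=1.0)

print(ys_exact)  # [12.0, 14.0, 16.0, 18.0, 20.0]
print(ys_noisy)
```

Setting `noise_sd` to zero removes the error term, which is a quick way to see that $\beta_0$ shifts the line up or down and $\beta_1$ controls its steepness.
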

Simple linear regression rests on four assumptions:

- Linearity: The relationship between $X$ and $Y$ is linear.
- Independence: Observations are independent of each other.
- Homoscedasticity: The variance of residuals (errors) is constant across all levels of $X$.
- Normality: Residuals follow a normal distribution.
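
One rough way to probe these assumptions is to fit the line and inspect the residuals. The sketch below uses hypothetical data and hypothetical helper names (`fit_line`, `residuals`, `spread_by_half`). It compares the residual spread in the lower and upper halves of $X$ as an informal homoscedasticity check. It is not a formal test, and it says nothing about independence or normality.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys):
    """Return observed minus fitted values, in the order of xs."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def spread_by_half(xs, ys):
    """Return residual standard deviations for the lower and upper halves of X."""
    pairs = sorted(zip(xs, residuals(xs, ys)))  # order residuals by X
    mid = len(pairs) // 2
    low = [r for _, r in pairs[:mid]]
    high = [r for _, r in pairs[mid:]]
    return pstdev(low), pstdev(high)


# Hypothetical data, made up purely for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8]

res = residuals(xs, ys)
low_sd, high_sd = spread_by_half(xs, ys)
print(low_sd, high_sd)  # similar values are consistent with constant variance
```

A large gap between the two spreads would suggest that the residual variance changes with $X$. With only a handful of points, such a comparison is indicative at best.
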
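
A typical analysis follows these steps:

- Define the dependent ($Y$) and independent ($X$) variables.
- Examine the dataset for missing values, outliers, and patterns.
- Estimate the coefficients ($\beta_0$ and $\beta_1$).
- Assess the fit of the regression line using metrics like $R^2$.
- Understand the relationship between $X$ and $Y$.
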
A researcher wants to study the relationship between study hours ($X$) and test scores ($Y$).

Study Hours ($X$) | Test Scores ($Y$) |
---|---|
2 | 50 |
4 | 55 |
6 | 60 |
8 | 70 |
10 | 85 |
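
- Equation of Regression Line:

  $$Y = \beta_0 + \beta_1X$$

- Calculate Coefficients: Using the least squares method:
  - Slope ($\beta_1$): $$\beta_1 = \frac{\text{Cov}(X, Y)}{\text{Var}(X)}$$
  - Intercept ($\beta_0$): $$\beta_0 = \bar{Y} - \beta_1\bar{X}$$

  For this data, $\bar{X} = 6$ and $\bar{Y} = 64$, with $\sum (X_i - \bar{X})(Y_i - \bar{Y}) = 170$ and $\sum (X_i - \bar{X})^2 = 40$. The common factor $1/(n-1)$ in the covariance and variance cancels, so:

  $$\beta_1 = \frac{170}{40} = 4.25, \quad \beta_0 = 64 - 4.25(6) = 38.5$$

- Regression Equation:

  $$Y = 38.5 + 4.25X$$

- Predicted Values: For $X = 6$:

  $$Y = 38.5 + 4.25(6) = 64$$

  This equals $\bar{Y}$, as expected: the least squares line always passes through $(\bar{X}, \bar{Y})$.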
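
The slope and intercept formulas can be checked against the study-hours data with a few lines of Python. This is a minimal sketch using only the standard library. The function name `least_squares` and the variable names are illustrative choices.

```python
from statistics import mean

# Data from the study-hours table.
hours = [2, 4, 6, 8, 10]
scores = [50, 55, 60, 70, 85]


def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line.

    The slope is Cov(X, Y) / Var(X). Both share the same 1 / (n - 1)
    factor, so the ratio of the raw sums gives the same result.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


beta0, beta1 = least_squares(hours, scores)
print(beta0, beta1)       # 38.5 4.25

# Predicted test score for 6 hours of study.
print(beta0 + beta1 * 6)  # 64.0
```
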

Several metrics help interpret the fitted model:

- R-Squared ($R^2$): Measures the proportion of variance in $Y$ explained by $X$.
  - Value ranges from 0 to 1.
  - Higher values indicate a better fit.
- Standard Error: Represents the typical distance of the observed values from the regression line.
- P-Value: Tests the null hypothesis that there is no linear relationship between $X$ and $Y$ (that is, $\beta_1 = 0$).
  - If $p \leq \alpha$ (e.g., 0.05), reject the null hypothesis.
  - If $p > \alpha$, fail to reject the null hypothesis.
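
For the study-hours data, these metrics can be computed directly from their definitions. The sketch below assumes the usual textbook formulas for simple linear regression with an intercept: a residual standard error with $n - 2$ degrees of freedom and a $t$ statistic for the slope. The function name `fit_metrics` is hypothetical. Converting the $t$ statistic into a p-value needs a $t$-distribution CDF (available, for example, as `scipy.stats.t.sf`), which the Python standard library does not provide, so that final step is left out.

```python
from math import sqrt
from statistics import mean


def fit_metrics(xs, ys):
    """Return R-squared, residual standard error, and the slope's t statistic."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual and total sums of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)

    r_squared = 1 - ss_res / ss_tot
    resid_se = sqrt(ss_res / (n - 2))  # two estimated coefficients
    slope_se = resid_se / sqrt(sxx)    # standard error of the slope
    t_stat = slope / slope_se          # tests the null hypothesis slope = 0
    return {"r_squared": r_squared, "resid_se": resid_se, "t_stat": t_stat}


hours = [2, 4, 6, 8, 10]
scores = [50, 55, 60, 70, 85]

metrics = fit_metrics(hours, scores)
print(round(metrics["r_squared"], 3))  # 0.938
print(round(metrics["resid_se"], 2))   # 3.98
print(round(metrics["t_stat"], 2))     # 6.76
```

The $t$ statistic is compared against a $t$ distribution with $n - 2 = 3$ degrees of freedom to obtain the p-value for the slope.
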
Business:
Predicting sales based on advertising expenses. -
Healthcare:
Estimating blood pressure changes based on age. -
Education:
Analyzing the impact of study time on academic performance.
When the model is visualized:

- A scatter plot shows individual data points.
- The regression line depicts the relationship between $X$ and $Y$.

Simple linear regression is a foundational statistical tool for analyzing relationships between two variables. By understanding its assumptions, calculations, and interpretations, you can derive meaningful insights and make data-driven decisions.
Next Steps: Multiple Regression