Descriptive Statistics

Title slide

(org-show-animate '("Quantitative Methods" "Descriptive Statistics" "Vikas Rawal" "Prachi Bansal" "" "" ""))

Descriptive Statistics

  • Frequency
  • Measures of central tendency
  • Measures of position
  • Measures of dispersion

Frequency

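library(data.table)

# One row per worker: name, salary and sex
data.table(names = c("Anil", "Neeraj", "Savita", "Srimati",
                     "Rekha", "Pooja", "Alex", "Shahina",
                     "Ghazal", "Lakshmi", "Rahul", "Shahrukh",
                     "Naman", "Deepak", "Shreya", "Rukhsana"),
           salary = c(71, 50, 65, 40,
                      45, 42, 46, 43,
                      45, 43, 45, 45,
                      850, 100, 46, 48) * 1000,
           sex = c("M", "M", "F", "F",
                   "F", "F", "M", "F",
                   "F", "F", "M", "M",
                   "M", "M", "F", "F")) -> workers

# Add a serial number 1, 2, ..., (number of rows) by reference
workers[, sno := seq_len(.N)]

# Display the columns in the order sno, names, sex, salary
workers[, .(sno, names, sex, salary)]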
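| sno | names    | sex | salary |
|-----+----------+-----+--------|
|   1 | Anil     | M   |  71000 |
|   2 | Neeraj   | M   |  50000 |
|   3 | Savita   | F   |  65000 |
|   4 | Srimati  | F   |  40000 |
|   5 | Rekha    | F   |  45000 |
|   6 | Pooja    | F   |  42000 |
|   7 | Alex     | M   |  46000 |
|   8 | Shahina  | F   |  43000 |
|   9 | Ghazal   | F   |  45000 |
|  10 | Lakshmi  | F   |  43000 |
|  11 | Rahul    | M   |  45000 |
|  12 | Shahrukh | M   |  45000 |
|  13 | Naman    | M   | 850000 |
|  14 | Deepak   | M   | 100000 |
|  15 | Shreya   | F   |  46000 |
|  16 | Rukhsana | F   |  48000 |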
# Number of workers of each sex (.N is the number of rows in each group)
workers[, .(frequency = .N), by = sex]
| sex | frequency |
|-----+-----------|
| M   |         7 |
| F   |         9 |
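The same count can be obtained in base R without data.table. A minimal sketch, assuming the sex column is typed in again as a plain character vector (the names `sex`, `freq` and `rel_freq` are ours); `prop.table()` then turns the counts into relative frequencies, i.e. shares of the total.

```r
# Sex of the 16 workers, in the same order as the workers table
sex <- c("M", "M", "F", "F", "F", "F", "M", "F",
         "F", "F", "M", "M", "M", "M", "F", "F")

# Absolute frequency of each category (levels are sorted alphabetically: F, M)
freq <- table(sex)
print(freq)

# Relative frequency: each count divided by the total number of workers
rel_freq <- prop.table(freq)
print(rel_freq)
```

Here the relative frequencies are 9/16 = 0.5625 for F and 7/16 = 0.4375 for M, and they always add up to 1.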

Measures of Central Tendency

# Mean (rounded to one decimal) and median (the 50th percentile) of all salaries
workers[, .(mean_salary = round(mean(salary), 1),
            median_salary = quantile(salary, probs = 0.5))]
| mean_salary | median_salary |
|-------------+---------------|
|      101500 |         45500 |
# The same two measures, computed separately for each sex
workers[, .(mean_salary = round(mean(salary), 1),
            median_salary = quantile(salary, probs = 0.5)),
        by = sex]
| sex | mean_salary | median_salary |
|-----+-------------+---------------|
| M   |    172428.6 |         50000 |
| F   |     46333.3 |         45000 |
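The gap between the mean (101500) and the median (45500) comes almost entirely from one extreme salary. A minimal base-R sketch (the vector names `salary` and `without_max` are ours) drops the largest value and recomputes both measures: the mean falls sharply while the median barely moves.

```r
# Salaries of the 16 workers, as in the workers table
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

print(mean(salary))    # 101500
print(median(salary))  # 45500

# Drop the single largest salary (850000) and recompute
without_max <- salary[salary != max(salary)]
print(mean(without_max))    # 51600
print(median(without_max))  # 45000
```

The same asymmetry explains why the male mean (172428.6) is so far above the male median (50000) in the table above.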

Measures of Position

  • First quartile
  • Second quartile (median)
  • Third quartile
  • Deciles
  • Quintiles
  • Percentiles
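All of these positions can be read off with R's `quantile()` by choosing the `probs` argument. A sketch on the salary vector, re-entered here as a plain vector (the object names are ours). Note that `quantile()` interpolates between observations by default (`type = 7`, one of nine definitions it supports), so other software may report slightly different values.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# Quartiles: Q1, Q2 (the median) and Q3
quartiles <- quantile(salary, probs = c(0.25, 0.5, 0.75))
print(quartiles)  # 44500 45500 53750

# Deciles: the 10th, 20th, ..., 90th percentiles
deciles <- quantile(salary, probs = seq(0.1, 0.9, by = 0.1))
print(deciles)

# Quintiles: the 20th, 40th, 60th and 80th percentiles
quintiles <- quantile(salary, probs = seq(0.2, 0.8, by = 0.2))
print(quintiles)

# Any single percentile, e.g. the 95th
print(quantile(salary, probs = 0.95))
```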

Measures of Dispersion

Range and other measures based on positions

$\text{range} = \max(x) - \min(x)$

workers[, .(min_salary = min(salary),
            max_salary = max(salary),
            range = max(salary) - min(salary))]
| min_salary | max_salary |  range |
|------------+------------+--------|
|      40000 |     850000 | 810000 |
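In base R, `range()` returns the minimum and maximum together, so `diff(range(x))` gives the range as defined above. This short sketch (salary vector re-entered by hand; `without_max` is our name) also shows why the range is fragile: removing the single largest salary shrinks it from 810000 to 60000.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# range() returns c(min, max); diff() subtracts the first from the second
print(range(salary))        # 40000 850000
print(diff(range(salary)))  # 810000

# The same measure without the largest salary
without_max <- salary[salary != max(salary)]
print(diff(range(without_max)))  # 60000
```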

Range and other measures based on positions

  • The distance between any two positions (deciles, quintiles, percentiles) can also be used as a measure of dispersion.

$\text{inter-quartile range} = Q_3 - Q_1$

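## summary(workers$salary)
# Q1 and Q3: their difference is the inter-quartile range
quantile(workers$salary, probs = c(0.25, 0.75))
# 10th and 90th percentiles (first and ninth deciles)
quantile(workers$salary, probs = c(0.1, 0.9))
# Other pairs of positions that can bound a spread
quantile(workers$salary, probs = c(0.1, 0.95))
quantile(workers$salary, probs = c(0.25, 0.95))
# Minimum (0th percentile) and Q3
quantile(workers$salary, probs = c(0, 0.75))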
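The inter-quartile range is the difference between the two quartiles, and base R's `IQR()` computes it directly using the same default quantile definition as `quantile()`. A sketch with the salary vector re-entered (the names `q` and `d` are ours); the gap between the 90th and 10th percentiles is shown as one example of a decile-based spread.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# Q3 - Q1 by hand, and with IQR()
q <- quantile(salary, probs = c(0.25, 0.75))
print(unname(q[2] - q[1]))  # 9250
print(IQR(salary))          # 9250

# Spread between the 10th and 90th percentiles
d <- quantile(salary, probs = c(0.1, 0.9))
print(unname(d[2] - d[1]))  # 43000
```

Unlike the range (810000), neither measure is pulled up by the single salary of 850000.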

Variance, Standard Deviation and Coefficient of Variation

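$\text{variance} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

R's var() divides by $n-1$ instead of $n$ (the sample variance), so its result is larger than this formula gives by a factor of $\frac{n}{n-1}$.

$\text{standard deviation} = \sqrt{\text{variance}}$

$\text{coefficient of variation} = \frac{\text{standard deviation}}{\text{mean}}$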

workers[, .(var_salary = round(var(salary), 1),
            sd_salary = round(sd(salary), 1),
            cov_salary = round(sd(salary) / mean(salary), 2))]
| var_salary  | sd_salary | cov_salary |
|-------------+-----------+------------|
| 40075200000 |  200187.9 |       1.97 |
workers[, .(var_salary = round(var(salary), 1),
            sd_salary = round(sd(salary), 1),
            cov_salary = round(sd(salary) / mean(salary), 2)),
        by = sex]
| sex | var_salary  | sd_salary | cov_salary |
|-----+-------------+-----------+------------|
| M   | 89680952381 |  299467.8 |       1.74 |
| F   |    54500000 |    7382.4 |       0.16 |
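The difference between the $\frac{1}{n}$ formula and R's var() is easy to see by computing both by hand. In this minimal sketch (salary vector re-entered; the intermediate names are ours), dividing the sum of squared deviations by $n$ gives the population variance, while dividing by $n-1$ reproduces var(), and hence the standard deviation and coefficient of variation reported above.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

n <- length(salary)
sq_dev <- (salary - mean(salary))^2  # squared deviations from the mean

pop_var  <- sum(sq_dev) / n        # the 1/n formula
samp_var <- sum(sq_dev) / (n - 1)  # what var() computes
print(c(pop_var = pop_var, samp_var = samp_var, var = var(salary)))

sd_salary <- sqrt(samp_var)            # same as sd(salary)
cv_salary <- sd_salary / mean(salary)  # coefficient of variation
print(round(c(sd = sd_salary, cv = cv_salary), 2))
```

With only 16 observations the two variances differ by the factor 16/15; the gap shrinks as $n$ grows.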

Graphical Displays of Quantitative Information: Dispersion

Histogram

Histogram with relative densities

productionhist2.png
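A histogram with relative densities rescales the bar heights so that the total area of the bars is 1: each bar's height is its count divided by (n × bin width). This sketch uses base R's `hist()` with `plot = FALSE`, so only the numbers are returned (`hist(x, freq = FALSE)` would draw the density version). For illustration it uses the salary vector from the earlier slides, not the data behind the figure.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

# Compute bins, counts and densities without drawing;
# the breaks come from Sturges' rule by default
h <- hist(salary, plot = FALSE)
print(h$breaks)
print(h$counts)

# Density of each bin = count / (n * bin width)
manual_density <- h$counts / (sum(h$counts) * diff(h$breaks))
print(all.equal(manual_density, h$density))  # TRUE

# The bar areas add up to 1
print(sum(h$density * diff(h$breaks)))
```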

Boxplot

  • Invented by John Tukey in 1970
  • Many variations have been proposed since then, though the essential form and idea have remained intact.

Boxplot of wheat yields

boxplotyield1.png

Violin plots

vioplotyield1.png

Boxplots: Useful to identify extreme values

boxplotyield2.png
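The points a boxplot flags as extreme follow a simple rule. By default, any value more than 1.5 times the length of the box beyond either end of the box is drawn as a separate point. Base R's `boxplot.stats()` returns the numbers behind the plot without drawing it. This sketch uses the salary vector from the earlier slides, not the wheat-yield data in the figure. The box ends here are Tukey's hinges from `fivenum()` (44000 and 57500), which can differ slightly from the quartiles given by `quantile()`.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000

bs <- boxplot.stats(salary)  # coef = 1.5 by default

# Lower whisker end, lower hinge, median, upper hinge, upper whisker end
print(bs$stats)  # 40000 44000 45500 57500 71000

# Values beyond the whiskers, which the boxplot draws as individual points
print(bs$out)    # 850000 100000
```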

Boxplots: Useful for comparisons across categories

boxplotyield3.png
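Comparing categories with boxplots amounts to computing the same summary separately for each group. `boxplot(salary ~ sex)` draws the groups side by side. This sketch uses `split()` and `fivenum()` on the salary and sex vectors, re-entered by hand, to show the numbers behind such a plot. Unlike the whisker ends in a drawn boxplot, `fivenum()` reports the raw minimum and maximum.

```r
salary <- c(71, 50, 65, 40, 45, 42, 46, 43,
            45, 43, 45, 45, 850, 100, 46, 48) * 1000
sex <- c("M", "M", "F", "F", "F", "F", "M", "F",
         "F", "F", "M", "M", "M", "M", "F", "F")

# One five-number summary per group:
# minimum, lower hinge, median, upper hinge, maximum
by_sex <- lapply(split(salary, sex), fivenum)
print(by_sex)
```

The male box is both higher and far more stretched (upper hinge 85500, maximum 850000) than the female one (upper hinge 46000, maximum 65000), which matches the two coefficients of variation computed earlier.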

Violin plots

vioplotyield3.png

Graphical Displays of Quantitative Information: Common Pitfalls

Common uses of statistical graphics

  • To show trends over time
  • To compare central values (such as means or medians) across categories
  • To show composition
  • (less commonly, though more usefully) to show/analyse dispersion

Mis-representation

graphics/tufte-insanity.png

Mis-representation

graphics/tufte-fuel.png

Mis-representation

graphics/tufte-fuel2.png

Mis-representation

graphics/nobel-wrong.png

Mis-representation

graphics/nobel-right.png

Mis-representation: illustrations from Thomas Piketty’s work (source Noah Wright)

graphics/piketty1_o.png

Mis-representation: illustrations from Thomas Piketty’s work (source Noah Wright)

graphics/piketty1_c.png

Mis-representation: illustrations from Thomas Piketty’s work (source Noah Wright)

graphics/piketty2_o.png

Mis-representation: illustrations from Thomas Piketty’s work (source Noah Wright)

graphics/piketty2_c.png

The problem multiplied with the arrival of spreadsheets

graphics/chart1.png

graphics/chart2.png

graphics/chart3.png

Paul Krugman on Fiscal Austerity

What does this graph show?

krugman1.png Source: https://www.nytimes.com/2018/11/02/opinion/the-perversion-of-fiscal-policy-slightly-wonkish.html

What did Paul Krugman say?

“Here’s what fiscal policy should do: it should support demand when the economy is weak, and it should pull that support back when the economy is strong. As John Maynard Keynes said, ‘The boom, not the slump, is the right time for austerity.’ And up until 2010 the U.S. more or less followed that prescription. Since then, however, fiscal policy has become perverse: first austerity despite high unemployment, now expansion despite low unemployment.”

How could we better show the relationship between unemployment and fiscal austerity?

krugman2.png