---
title: "Using APIs"
subtitle: "SICSS, 2022"
author: Christopher Barrie
format:
  revealjs:
    chalkboard: true
editor: visual
---
## Introduction
- Why get tweet*ers*?
    - Get network characteristics
    - Get user demographics
    - Linked survey designs
## Introduction
- Before that: getting data outside the API
    - Datasets shared online, e.g.:
        - [Harvard Dataverse](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/2FIFLH)
        - <https://catalog.docnow.io/>
        - [Zenodo](https://zenodo.org/record/4540820#.YmJTYcaEaLc)
## Hydrating
```{r, eval = T, echo = T}
tweet_IDs <- readRDS("data/wm_IDs_samp.rds")
head(tweet_IDs)
```
```{r, eval = F, echo = T}
library(academictwitteR)
hydrated_tweets <- hydrate_tweets(tweet_IDs, errors = TRUE,
                                  data_path = "data/hydrated_tweets/")
```

    Batch 1 out of 10 : ids 823360655835168640 to 822916951878103040
    Total 78 tweet(s) can't be retrieved.
    Total of 22 out of 1000 tweet(s) retrieved.
    Batch 2 out of 10 : ids 823501383282241408 to 822817264915378048
    Total 154 tweet(s) can't be retrieved.
    Total of 46 out of 1000 tweet(s) retrieved.
    Batch 3 out of 10 : ids 823087701117337600 to 823227815374061440
    Total 225 tweet(s) can't be retrieved.
    Total of 75 out of 1000 tweet(s) retrieved.
    Batch 4 out of 10 : ids 823045907264499712 to 825034383769767808
    Total 301 tweet(s) can't be retrieved.
    Total of 99 out of 1000 tweet(s) retrieved.
    Batch 5 out of 10 : ids 822984540494950400 to 822996642618101760
    Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, :
      something went wrong. Status code: 400

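
The log above reports 1000 IDs processed in 10 batches, which implies 100 IDs per request. A minimal Python sketch of that batching step, together with the share of tweets still retrievable after batch 4, might look like the following. `chunk_ids` and `batch_size` are hypothetical names used for illustration and are not part of academictwitteR.

```python
from typing import Iterator, List


def chunk_ids(ids: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each with at most `batch_size` items."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


# Placeholder IDs standing in for a real ID file (these are not tweet IDs).
example_ids = [str(n) for n in range(1000)]
batches = list(chunk_ids(example_ids))

# Cumulative counts copied from the log after batch 4.
retrieved, missing = 99, 301
share_retrieved = retrieved / (retrieved + missing)

print(len(batches), len(batches[0]))
print(f"{share_retrieved:.1%} of attempted tweets retrieved")
```

Roughly a quarter of the attempted tweets came back in this run, so a shared ID list should be treated as an upper bound on what can be recovered.
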
## Hydrating
Alternatively, use the [Hydrator](https://github.com/DocNow/hydrator#readme) tool from Documenting the Now (DocNow).
## Getting tweeters
```{r, eval = T, echo = T}
library(academictwitteR)
library(dplyr)
library(lubridate)
library(ggplot2)
cjb_ID <- get_user_id("cbarrie")
cjb_ID
```
## Getting whom I follow (friends)
::: panel-tabset
### IDs
```{r, eval = F, echo = T}
userfwing <- get_user_following(cjb_ID)
ids <- userfwing$id
head(ids)
```
```{r, eval = T, echo = F}
userfwing <- readRDS("data/cjbfwing.rds")
ids <- userfwing$id
head(ids)
```
### User data
```{r, eval = T, echo = F}
userfwingsamp <- userfwing %>%
  sample_n(20)
kableExtra::kable(userfwingsamp)
```
:::
## User-level inference
Example from the [m3inference](https://github.com/euagendas/m3inference) repository:
```{python, eval = F, echo = T}
from m3inference import M3Inference
import pprint
m3 = M3Inference() # see docstring for details
pred = m3.infer('./test/data_resized.jsonl') # also see docstring for details
pprint.pprint(pred)
```

    OrderedDict([('720389270335135745',
                  {'age': {'19-29': 0.1546,
                           '30-39': 0.114,
                           '<=18': 0.0481,
                           '>=40': 0.6833},
                   'gender': {'female': 0.0066, 'male': 0.9934},
                   'org': {'is-org': 0.7508, 'non-org': 0.2492}}),
                 ('21447363',
                  {'age': {'19-29': 0.0157,
                           '30-39': 0.9837,
                           '<=18': 0.0004,
                           '>=40': 0.0002},
                   'gender': {'female': 0.9866, 'male': 0.0134},
                   'org': {'is-org': 0.0002, 'non-org': 0.9998}}),
                 ...
                 ...

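
Each prediction maps an attribute to a probability distribution over labels. A minimal sketch for reducing that to the single most likely label per attribute, using the two accounts shown above, could look like this. `top_labels` is a hypothetical helper written for illustration, not part of m3inference.

```python
from typing import Dict

# Predictions copied from the output above (first two accounts only).
pred = {
    "720389270335135745": {
        "age": {"19-29": 0.1546, "30-39": 0.114, "<=18": 0.0481, ">=40": 0.6833},
        "gender": {"female": 0.0066, "male": 0.9934},
        "org": {"is-org": 0.7508, "non-org": 0.2492},
    },
    "21447363": {
        "age": {"19-29": 0.0157, "30-39": 0.9837, "<=18": 0.0004, ">=40": 0.0002},
        "gender": {"female": 0.9866, "male": 0.0134},
        "org": {"is-org": 0.0002, "non-org": 0.9998},
    },
}


def top_labels(user_pred: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Return the highest-probability label for each attribute."""
    return {attr: max(probs, key=probs.get) for attr, probs in user_pred.items()}


labels = {user_id: top_labels(p) for user_id, p in pred.items()}
print(labels["21447363"])
# {'age': '30-39', 'gender': 'female', 'org': 'non-org'}
```

Keeping the probabilities alongside the labels makes it possible to drop low-confidence predictions later.
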
## User-level inference
```{r, eval = T, echo = T}
devtools::install_github("pablobarbera/twitter_ideology/pkg/tweetscores")
library(tweetscores)
```
## User-level inference
```{r, eval=F, echo = T}
results <- estimateIdeology("cbarrie", ids)
plot(results)
```
```{r, eval=T, echo = F}
results <- readRDS("data/cjbtscore.rds")
plot(results)
```
## User-level geolocation
```{r, eval = T, echo = T}
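# Count followed accounts by self-reported location and keep the
# most frequent ones (the cutoff of 10 is an arbitrary choice)
userfwing %>%
  group_by(location) %>%
  summarise(count = n()) %>%
  slice_max(count, n = 10)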
```