---
title: "Philosophy"
output:
  html_document:
    fig_cap: yes
    highlight: tango
    smooth_scroll: no
    theme: flatly
    toc: yes
    toc_float: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
The idea of the [Typological Atlas of Daghestan](http://lingconlab.ru/dagatlas/) was to create a tool for the visualization of linguistic features typical of the languages of Daghestan.
Although the language sample underlying [TALD](http://lingconlab.ru/dagatlas/) has grown to include a number of neighboring languages (more on that in [Language sample](#the-language-sample) below), the core of the project remains Daghestan.
Our aim is to achieve maximum coverage of languages and dialects. Datasets on particular topics should be updatable, so that our map visualizations become incrementally more accurate. To keep the resource updatable, contributors can either retain the right to approve or reject proposed updates and corrections, or waive that right. Note that retaining this right also entails a responsibility to review and reply to proposals within a certain time frame. See [Step 5](steps.html) on how to propose changes to a dataset or chapter that is already published.
Data on linguistic features is collected primarily from descriptive literature; as a result, the Atlas can also be helpful in bibliographical research.
# Datapoints
The initial approach of [TALD](http://lingconlab.ru/dagatlas/) assigned one value for a linguistic feature to each language, based on a representative doculect.
This information could then be mapped either onto generalized language datapoints or onto all villages where the language in question is spoken.
Below are two possible visualizations for the same dummy feature: the initial consonant of various cognates meaning 'bridge'.
```{r, echo=FALSE,out.width="40%", out.height="20%",fig.cap="General language datapoints vs. village datapoints",fig.show='hold',fig.align='center'}
knitr::include_graphics(c("images/walsex.png","images/taldex.png"))
```
<center>
|language|feature |value|form |
|--------|-----------------------------|-----|-----|
|Avar |Initial consonant of 'bridge'|ƛ' |ƛ'o |
|Khwarshi|Initial consonant of 'bridge'|t' |t'eru|
|Karata |Initial consonant of 'bridge'|ƛ' |ƛ'eru|
</center>
A benefit of the visualization on the right is that it shows the distribution and size of language communities more accurately. A drawback is that it leads to gross overgeneralization and erases dialectal differences, because it is based on no more data than the visualization on the left.
All of the Avar villages, for example, are colored according to data from Standard Avar, while we know that 'bridge' in the Zaqatala dialect spoken in Northern Azerbaijan is pronounced *kːjo*. These villages should thus have a different value.
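The overgeneralization can be made concrete with a minimal sketch in R. It assumes a toy village list: the village names and the `dialect` column are hypothetical placeholders, and only the language-level values are taken from the table above.

```r
# Language-level values, taken from the table above
lang_values <- data.frame(
  language = c("Avar", "Khwarshi", "Karata"),
  value    = c("ƛ'", "t'", "ƛ'"),
  stringsAsFactors = FALSE
)

# Hypothetical village list: names and dialect labels are placeholders
villages <- data.frame(
  village  = c("village_1", "village_2", "village_3", "village_4"),
  language = c("Avar", "Avar", "Khwarshi", "Karata"),
  dialect  = c("Standard", "Zaqatala", NA, NA),
  stringsAsFactors = FALSE
)

# Project the single value per language onto every village of that language
village_values <- merge(villages, lang_values, by = "language")

# Both Avar villages receive the same value, including the Zaqatala one
village_values[village_values$language == "Avar", c("village", "dialect", "value")]
```

Because the join key is the language alone, no amount of village-level detail in the village list can change the value a village receives.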
# Current approach
To improve the accuracy of our visualizations, we currently collect all attested values for a given feature, taking into account any idiom we have data on, including standard languages, dialects spoken in multiple villages and single-village idioms.
**The visualization shows the most accurate level of granularity available for each village / point on the map.**
For example, the Andi language is spoken in 17 villages. There are 9 main villages, each of which has its own idiom. These can be divided into two main dialects: Upper Andi and Lower Andi. In addition, there are 8 villages for which we have no information on the variety spoken there.
Now let us look at a relatively straightforward linguistic feature like **Number of noun classes**, for which we have general data on both the Upper and Lower dialects, and more accurate information on several villages from the Upper group. One of these villages (Rikvani) even has a value that differs from the other varieties we have data for.
The table below summarizes the different values observed for the language, divided by type of idiom (the number of noun classes is given in parentheses and each value is color-coded).
|Language|Toplevel dialect|Village|
|--------|----------------|-------|
|Andi <span style="color:Firebrick">●</span> (5) |Upper Andi <span style="color:Firebrick">●</span> (5) |Rikvani <span style="color:Thistle">●</span> (6)|
| |Lower Andi <span style="color:MistyRose">●</span> (3) | |
The diagram below shows the dialect grouping of Andi villages and their values for the noun class feature. At the center is the language as a whole: it has the same value as the Upper group and the eponymous village dialect of Andi, which are most representative of the language as a whole. On the map, the unclassified Andi villages (colored grey in the scheme below) will be colored according to the general language information.
```{r, echo=FALSE}
library(DiagrammeR)
DiagrammeR::grViz("digraph {
graph[layout = neato, rankdir = LR]
node [fontsize = 14,
shape = oval,
style = filled,
fillcolor = Firebrick,
fontcolor = white,
color = Firebrick]
# language
Andi
node [fontsize = 10]
# dialects
Upper
node [shape = oval,
style = filled,
fillcolor = WhiteSmoke,
fontcolor = black,
color = WhiteSmoke]
Other
node [shape = oval,
style = filled,
fillcolor = MistyRose,
fontcolor = black,
color = MistyRose]
Lower
# Andi-Upper-Villages
node [fontsize = 8,
shape = oval,
style = filled,
fillcolor = Firebrick,
fontcolor = white,
color = Firebrick]
Chanko
Zilo
Ashali
Andiv [label = 'Andi']
Gunkha
Gagatli
# Rikvani
node [shape = oval,
style = filled,
fillcolor = Thistle,
fontcolor = black,
color = Thistle]
Rikvani
# Lower villages
node [shape = oval,
style = filled,
fillcolor = MistyRose,
fontcolor = black,
color = MistyRose]
Kvankhidatli
Muni
# Other villages
node [shape = oval,
style = filled,
fillcolor = WhiteSmoke,
fontcolor = black,
color = WhiteSmoke]
Mekheturi
Shivor
Khando
Rushukha
Novogagatli
Aytkhan
Dzhugut
Tsibilta
edge [color = black, arrowhead = none]
Andi -> {Other, Upper, Lower}
Other -> {Mekheturi, Shivor, Rushukha, Khando, Tsibilta, Novogagatli, Aytkhan, Dzhugut}
Upper -> {Andiv, Rikvani, Gagatli, Zilo, Ashali, Chanko, Gunkha}
Lower -> {Muni, Kvankhidatli}
}")
```
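The fallback rule illustrated by the diagram can be sketched in a few lines of R. This is only an illustration under stated assumptions: the column names `village_dialect`, `toplevel_dialect` and `language` and the three sample villages are chosen for the example, while the values are those from the table above.

```r
# Values per idiom for "Number of noun classes", as in the table above
values <- c("Andi" = 5, "Upper Andi" = 5, "Lower Andi" = 3, "Rikvani" = 6)

# Three Andi villages with different amounts of available information
villages <- data.frame(
  village          = c("Rikvani", "Muni", "Shivor"),
  village_dialect  = c("Rikvani", "Muni", NA),
  toplevel_dialect = c("Upper Andi", "Lower Andi", NA),
  language         = "Andi",
  stringsAsFactors = FALSE
)

# Look up each level; idioms without data (or NA) yield NA
v_val  <- unname(values[villages$village_dialect])
tl_val <- unname(values[villages$toplevel_dialect])
l_val  <- unname(values[villages$language])

# Take the most specific level that has a value, and record which one it was
villages$value <- ifelse(!is.na(v_val), v_val,
                  ifelse(!is.na(tl_val), tl_val, l_val))
villages$granularity <- ifelse(!is.na(v_val), "village dialect",
                        ifelse(!is.na(tl_val), "toplevel dialect", "language"))

villages[, c("village", "value", "granularity")]
```

Rikvani keeps its own value, Muni falls back to Lower Andi, and the unclassified village Shivor falls back to the general language value.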
The coverage of the Atlas is far from sufficient because we lack data for many dialects, and we do not know the dialect affiliation of a large number of villages in the area. We compensate for this shortcoming by encoding the level of accuracy/granularity for each datapoint, and allowing the user to toggle which levels to display (see [Map visualization](#map-visualization) below). Ideally, our datasets will be updated when new information becomes available.
# Map visualization{.tabset .tabset-fade .tabset-pills}
[TALD](http://lingconlab.ru/dagatlas/) currently offers three different map visualizations:
1. **Language and feature** shows the language affiliation (inner dot) and the value for the linguistic feature (outer dot) for each village.
2. **Data granularity** allows the user to show only certain levels of data accuracy for the feature. For example, you can uncheck "language" to remove all the dots that were colored according to general information about the language in the absence of more accurate data. In the Andi example, this removes all Andi villages that have no dialect classification.
3. **General datapoints** displays one datapoint for each language in the sample, showing the language affiliation (inner dot) and the value for the linguistic feature (outer dot) for each point.
You can click on a datapoint to view a pop-up window with the name of the language (with a link to the [Glottolog](https://glottolog.org) database), the village, the granularity of data used to color this datapoint, and the value.
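In terms of data, unchecking a level in the **Data granularity** view amounts to filtering rows on their granularity label. A minimal sketch, assuming a table with one row per village and a `granularity` column; the table `points` and the vector `shown_levels` are illustrative names, and the rows follow the Andi example:

```r
points <- data.frame(
  village     = c("Rikvani", "Muni", "Shivor"),
  value       = c(6, 3, 5),
  granularity = c("village dialect", "toplevel dialect", "language"),
  stringsAsFactors = FALSE
)

# Levels the user has left checked; "language" is unchecked here
shown_levels <- c("village dialect", "toplevel dialect")

# Keep only the villages whose value comes from one of the checked levels
visible <- points[points$granularity %in% shown_levels, ]
visible$village
```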
```{r, echo=FALSE, message=FALSE}
# packages
library(tidyverse)
library(lingtypology)
# load data
feature <- read_tsv("dummy_data/dummy_feature.csv")
villages <- read_tsv("dummy_data/dummy_villages.csv")
# remove data not for mapping
feature <- feature[(feature$map == "yes"),]
# split feature data into dialect levels
feature_group <- feature %>%
group_by(type) %>%
group_split()
feature_tl <- data.frame(feature_group[[1]])
feature_v <- data.frame(feature_group[[2]])
feature_tl$granularity <- "toplevel dialect"
feature_v$granularity <- "village dialect"
# merge feature data with villages dataset
## create matching columns
colnames(feature_tl)[colnames(feature_tl) == "idiom"] <- "toplevel_dialect"
colnames(feature_v)[colnames(feature_v) == "idiom"] <- "village_dialect"
## toplevel dialect data
tlevel_villages <- merge(villages, feature_tl, by = "toplevel_dialect")
v_villages <- merge(villages, feature_v, by = "village_dialect")
# [this generates a lot of garbage columns]
## combine dialect data of different granularity
dialect_villages <- full_join(v_villages, tlevel_villages, by = "village")
# [not very convenient, data from different datasets is in different columns]
## do a simple rbind instead
dialect_villages2 <- rbind(v_villages, tlevel_villages)
dialect_villages3 <- dialect_villages2[!duplicated(dialect_villages2$village),]
# [this gives the desired result, but it's cumbersome and will fail
# if we happen to have multiple villages with the same name in our set]
# IMPORTANT: SETS SHOULD BE MERGED IN THE RIGHT ORDER (HIGH GRAN - LOW GRAN)
# SO THAT THE DUPLICATES WITH THE HIGHEST GRANULARITY ARE KEPT
## and now for the villages for which we have no data
# [*adds column for villages that lack a dialect affiliation*]
### isolate general language data
feature_l <- feature %>%
filter(genlang_point == "yes") %>%
mutate(granularity = "language") %>%
mutate(default_level = lang) %>%
select(-idiom)
### create a set of unaffiliated villages
#lost_villages <- villages[villages$default_level == "yes",]
### make them match
### merge feature data and village set
lang_villages <- merge(villages, feature_l, by = "default_level")
### add to the refined set
alldata <- full_join(dialect_villages3, lang_villages, by = "village")
### and once more the magic rbind
alldata2 <- rbind(dialect_villages3, lang_villages)
alldata3 <- alldata2[!duplicated(alldata2$village),]
```
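The comments in the chunk above note that the `rbind()` plus `duplicated()` approach depends on the order in which the sets are combined, and that it breaks when two villages share a name. One possible way to make both points explicit is sketched below. It is not the code used for the maps: the data are toy rows with placeholder names, and `level_rank` is a helper introduced for this example.

```r
# Toy rows; "v2" stands for a village name shared by two languages
rows <- data.frame(
  lang        = c("lang_a", "lang_a", "lang_a", "lang_b"),
  village     = c("v1", "v1", "v2", "v2"),
  granularity = c("language", "village dialect", "toplevel dialect", "language"),
  value       = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Rank the levels explicitly: lower rank = more specific
level_rank <- c("village dialect" = 1, "toplevel dialect" = 2, "language" = 3)

# Sort most specific first, so the order of the input no longer matters
rows <- rows[order(level_rank[rows$granularity]), ]

# Deduplicate on language + village rather than on the village name alone
best <- rows[!duplicated(rows[, c("lang", "village")]), ]
best
```

With this key, the two villages called `v2` are both kept because they belong to different languages, while `v1` keeps only its village-level row.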
## 1. Language and feature
```{r, echo=FALSE, message=FALSE}
map.feature(alldata3$lang.x,
latitude = alldata3$lat,
longitude = alldata3$lon,
features = alldata3$lang.x,
color = "#003366",
title = "Language",
label = alldata3$village,
stroke.features = as.factor(alldata3$value),
stroke.color = c("MistyRose", "Firebrick", "Thistle"),
stroke.title = alldata3$feature[1],
popup = paste("<b>Village:</b>", alldata3$village, "<br>",
"<b>Data:</b>", alldata3$granularity, "<br>",
"<b>Value:</b>", alldata3$value),
zoom.control = TRUE)
```
## 2. Data granularity
```{r, echo=FALSE, message=FALSE}
# one more thing: I would like the popup to also show the name of the idiom next to its level, e.g. "Upper, toplevel dialect", but I do not know how to implement this
map.feature(alldata3$lang.x,
latitude = alldata3$lat,
longitude = alldata3$lon,
features = as.factor(alldata3$value),
color = c("MistyRose", "Firebrick", "Thistle"),
width = 10,
title = alldata3$feature[1],
legend.position = "bottomleft",
label = alldata3$village,
control = alldata3$granularity,
popup = paste("<b>Village:</b>", alldata3$village, "<br>",
"<b>Data:</b>", alldata3$granularity, "<br>",
"<b>Value:</b>", alldata3$value),
zoom.control = TRUE)
```
## 3. General datapoints
```{r, echo=FALSE, message=FALSE}
map.feature(feature_l$lang,
features = feature_l$lang,
color = "#003366",
title = "Language",
label = feature_l$lang,
stroke.features = as.factor(feature_l$value),
stroke.color = c("MistyRose", "Firebrick", "Thistle"),
stroke.title = feature_l$feature[1],
zoom.control = TRUE,
zoom.level = 7)
```
# The language sample
As mentioned earlier, [TALD](http://lingconlab.ru/dagatlas/) was originally conceived of as a resource about the languages of Daghestan.
[The East Caucasian villages dataset](https://github.com/sverhees/master_villages) -- a dataset that contains a list of villages in the eastern Caucasus, their coordinates, and the languages spoken there -- was created as a basis for visualizations in the Atlas. Initially it covered all villages of Daghestan and some East Caucasian speaking communities in Georgia and Northern Azerbaijan.
Villages of Chechnya and Ingushetia, and several more communities in Azerbaijan and Georgia, were added later. These extra datapoints outside of Daghestan were included because they are relevant for the development of areal hypotheses.
---
## Northeastern Daghestan
Villages located in the northeastern part of Daghestan are not displayed in the Atlas. This area was settled only relatively recently, and is much more ethnically mixed than the rest of the region, as you can see [here](https://sverhees.github.io/master_villages/maps_new.html). A large part of it remains virtually uncharted (though see Yuri Koryakov's efforts to close this gap [here](https://jirzik.livejournal.com/3002.html)), and the area is currently undergoing changes.
In addition, little to nothing is known about the varieties of the languages spoken there. In some cases we know the village of origin of most inhabitants of a settlement (e.g. Novogagatli was founded by settlers from the Andi village Gagatli), but we do not know how well their dialect is preserved or how ethnically and linguistically homogeneous the population is.
Therefore, we decided to exclude the newly settled region from our visualization (for now).
---
## List of languages
Below is a list of the languages included in our sample, grouped by language family and branch.
#### East Caucasian
* **Avar**
    * Avar
* **Andic**
    * Akhvakh
    * Andi
    * Bagvalal
    * Botlikh
    * Chamalal
    * Godoberi
    * Karata
    * Tindi
* **Tsezic**
    * Bezhta
    * Hinuq
    * Hunzib
    * Khwarshi
    * Tsez
* **Dargwa**
    * Standard Dargwa
    * *Chirag*
    * Itsari
    * Kaitag
    * Kubachi
    * Mehweb
    * Tanty
<font color = "dimgray">Dargwa is considered a single language with a number of highly divergent dialects by some, and a group of distinct but related languages by others. In our map visualizations with one dot per language, several Dargwa varieties are shown, because they are sufficiently divergent. However, since our data comes from descriptive literature, we are unfortunately bound to some extent to the post-1930s view of Dargwa as a single language. In our [villages dataset](https://sverhees.github.io/master_villages/maps_new.html#dialects), for example, a variety like Mehweb forms part of the Northern Dargwa dialect group. For practical reasons, we cannot rearrange this underlying tree structure too drastically.
Unfortunately, we do not have a full reference grammar for *Chirag* yet, so you can leave the row for Chirag empty for now.
</font>
* **Lak**
    * Lak
* **Lezgic**
    * Agul
    * Archi
    * Budukh
    * Kryz
    * Lezgian
    * Rutul
    * Tabasaran
    * Tsakhur
    * Udi
* **Khinalug**
    * Khinalug
* **Nakh**
    * Chechen
    * Ingush
    * Tsova-Tush (Batsbi)
#### Indo-European
* **Armenic**
    * Armenian
<font color = "dimgray">Armenian entered our sample because of a small village in northeastern Daghestan where the language is spoken. Although that village dropped out when we removed this area from our visualizations, we decided to keep Armenian as an important language of the region.</font>
* **Iranian**
    * Tat
#### Kartvelian
* Georgian
<font color = "dimgray">Georgian is spoken in the Qakh district of Azerbaijan, alongside Tsakhur and Azerbaijani. It is also an important neighboring language.</font>
#### Turkic
* **Kipchak**
    * Kumyk
    * Nogai
* **Oghuz**
    * Azerbaijani