This repository has been archived by the owner on Nov 2, 2020. It is now read-only.

Sinitic Diversity #1

Open
ztl8702 opened this issue Nov 4, 2017 · 9 comments

ztl8702 (Member) commented Nov 4, 2017

Overview

Build a website that gathers and displays information about the Sinitic languages, especially information about efforts to preserve them.

Goals

  • Help the public appreciate that Chinese is not just Mandarin, and not just "Nihao".
  • Showcase the work of language revitalisationists in each language community (e.g. the Speak Hokkien Campaign 講福建話運動, the Cantonese dictionary 粵典, and Hakka Wikipedia 客語維基)
  • Bring the Sinitic language revitalisation communities together so that they can learn from one another: each community can share resources, see what language preservers in other communities are doing, and follow their example
  • Provide a news feed of what is happening in each community

Components

  • A map of Sinitic language communities around the world;
  • A tree representation of Sinitic languages
  • For each language: a portal of all the news, resources and information about that language. This is much like the Bloomberg Terminal, with information such as:
    • the current status of the language
    • where it is spoken
    • who is running revitalisation projects for the language
    • news coverage related to the language (real-time feed)
      • what type of event is covered (education? volunteer work? a debate about a particular issue? a publication in the language?)
      • where the event took place (geolocation, to city level)
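The "tree representation" component could be backed by a simple recursive node type. Below is a minimal Python sketch under stated assumptions: `LanguageNode` is a hypothetical name, the grouping (Eastern and Southern Min under Min, the other varieties directly under Sinitic) is a simplified illustration that leaves out Mandarin and other branches, and the regions are taken only from the community list later in this thread. Per-language portal data (status, projects, news) could hang off each node in the same way.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LanguageNode:
    """One node of a hypothetical Sinitic family tree for the site."""
    name: str                                            # English name, e.g. "Southern Min"
    native_name: str = ""                                # e.g. "閩南語"
    regions: list[str] = field(default_factory=list)     # where it is spoken
    children: list[LanguageNode] = field(default_factory=list)

    def add(self, child: LanguageNode) -> LanguageNode:
        """Attach a child node and return it so sub-branches can be chained."""
        self.children.append(child)
        return child

    def find(self, name: str) -> LanguageNode | None:
        """Depth-first search for a node by English name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def render(self, depth: int = 0) -> str:
        """Indented text outline, standing in for the visual tree."""
        label = f"{self.name} ({self.native_name})" if self.native_name else self.name
        lines = ["  " * depth + label]
        lines.extend(child.render(depth + 1) for child in self.children)
        return "\n".join(lines)


# Illustrative, incomplete tree built from the varieties named in this thread.
sinitic = LanguageNode("Sinitic", "漢語")
min_branch = sinitic.add(LanguageNode("Min", "閩語"))
min_branch.add(LanguageNode("Eastern Min", "閩東語", regions=["Fuzhou", "Matsu", "Malaysia"]))
min_branch.add(LanguageNode("Southern Min", "閩南語", regions=["Taiwan", "Malaysia"]))
sinitic.add(LanguageNode("Hakka", "客家", regions=["Taiwan"]))
sinitic.add(LanguageNode("Cantonese", "粵語", regions=["Hong Kong"]))
sinitic.add(LanguageNode("Wu", "吳語"))

print(sinitic.render())
```

A recursive node keeps the tree trivially serialisable to JSON for a front-end tree widget; `find` is linear, which is fine at the size of a language family.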

[Image: Bloomberg Terminal Bio]

Related Projects

ztl8702 added the idea label Nov 4, 2017
ztl8702 (Member, Author) commented Nov 14, 2017

For the "news portal" part

It's something like Hacker News or Design News, except that it is aimed at language revitalisationists rather than programmers or designers. I also want this portal to carry more structured information than Hacker News or Design News.

Conceptually, this would be a data-processing pipeline that:

  • Gathers information about Sinitic language revitalisation
  • Transforms and enriches the data to give a "big picture" of what's happening in each community. This includes:
    • Determining which community an article is about (Eastern Min 閩東語? Southern Min 閩南語? Cantonese 粵語?) and where the event takes place
    • Determining the article type (a report on a language project / a report on an event / a profile of a person / an opinion piece)
    • Sentiment Analysis
    • Keyword Analysis
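The enrichment steps above can be modelled as a chain of stages, each taking an article and returning it with more fields filled in. This Python sketch is only illustrative: `Article`, the stage functions, and the tiny word lists are hypothetical, the "sentiment" is a toy lexicon count rather than a real model, and tokenisation only handles Latin letters (Chinese text would need a proper word segmenter). Community detection and article-type stages would slot into the same list.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable


@dataclass
class Article:
    """A gathered news item plus the fields the enrichment stages fill in."""
    title: str
    body: str
    sentiment: float = 0.0
    keywords: list[str] = field(default_factory=list)


Stage = Callable[[Article], Article]

# Toy word lists, assumed for illustration only; not a real sentiment model.
POSITIVE = {"launch", "success", "growing", "celebrate", "new"}
NEGATIVE = {"decline", "endangered", "closure", "loss", "dying"}
STOPWORDS = {"the", "a", "an", "of", "in", "for", "and", "to", "is", "on", "this"}


def tokenize(text: str) -> list[str]:
    # Lower-cased Latin-letter words only; Chinese text needs a segmenter instead.
    return re.findall(r"[a-z]+", text.lower())


def sentiment_stage(article: Article) -> Article:
    tokens = tokenize(f"{article.title} {article.body}")
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    # Score in [-1, 1]; 0.0 when no listed word appears.
    article.sentiment = (pos - neg) / (pos + neg) if pos + neg else 0.0
    return article


def keyword_stage(article: Article) -> Article:
    tokens = [t for t in tokenize(f"{article.title} {article.body}") if t not in STOPWORDS]
    # Three most frequent non-stopword terms; ties keep first-seen order.
    article.keywords = [word for word, _ in Counter(tokens).most_common(3)]
    return article


def run_pipeline(article: Article, stages: list[Stage]) -> Article:
    """Feed the article through each enrichment stage in order."""
    return reduce(lambda acc, stage: stage(acc), stages, article)


PIPELINE: list[Stage] = [sentiment_stage, keyword_stage]

sample = run_pipeline(
    Article("Hokkien classes launch in Penang",
            "New Hokkien classes for children launch in Penang this month."),
    PIPELINE,
)
print(sample.sentiment, sample.keywords)
```

Keeping every stage as a plain `Article -> Article` function means a stage can later be swapped for a trained model without touching the rest of the pipeline.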


Examples of communities to focus on

We will need to distinguish communities by both language variety and geolocation. For example, the Hokkien community in Taiwan should be treated as separate from the Hokkien community in Malaysia.

(incomplete list)

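  • Eastern Min (閩東語)
    • Fuzhounese (福州話) community in Fuzhou
    • Fuzhounese community in Matsu
    • Fuzhounese community in West Malaysia (Sitiawan)
    • Fuzhounese community in East Malaysia (Sibu, Kuching, Miri, Bintulu, etc.)
  • Southern Min (閩南語)
    • Taiwanese (臺語) community in Taiwan
    • Hokkien (福建話) community in Malaysia (especially Penang, which may need its own entry)
  • Hakka (客家)
    • Hakka community in Taiwan
  • Cantonese (粵語)
    • Cantonese community in Hong Kong
  • Wu (吳語)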

We will need to classify news articles into the above categories and locations.
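Because a community is defined by a variety *and* a place, one simple baseline is to assign an article to a community only when it mentions both. The Python sketch below assumes hypothetical cue lists (`VARIETY_CUES`, `COMMUNITY_REGIONS`) covering some of the communities listed above; real cue lists would need curating with each community. Fuzhou itself is left out on purpose: its name is a substring of 福州話, so naive substring matching cannot separate "about Fuzhou" from "about Fuzhounese".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Community:
    variety: str  # language variety, e.g. "Eastern Min"
    region: str   # where the community lives, e.g. "Matsu"


# Hypothetical cue words per variety (Chinese and English spellings).
VARIETY_CUES: dict[str, list[str]] = {
    "Eastern Min": ["福州話", "Foochow", "Fuzhounese"],
    "Southern Min": ["臺語", "台語", "福建話", "Hokkien"],
    "Hakka": ["客語", "客家話", "Hakka"],
    "Cantonese": ["粵語", "廣東話", "Cantonese"],
}

# Hypothetical place-name cues per community.
COMMUNITY_REGIONS: dict[Community, list[str]] = {
    Community("Eastern Min", "Matsu"): ["馬祖", "Matsu"],
    Community("Eastern Min", "West Malaysia"): ["實兆遠", "Sitiawan"],
    Community("Eastern Min", "East Malaysia"): [
        "詩巫", "古晉", "美里", "民都魯", "Sibu", "Kuching", "Miri", "Bintulu"],
    Community("Southern Min", "Taiwan"): ["臺灣", "台灣", "Taiwan"],
    Community("Southern Min", "Malaysia (Penang)"): ["檳城", "Penang"],
    Community("Hakka", "Taiwan"): ["臺灣", "台灣", "Taiwan"],
    Community("Cantonese", "Hong Kong"): ["香港", "Hong Kong"],
}


def mentions(text: str, cues: list[str]) -> bool:
    """Case-insensitive substring check for any cue."""
    lowered = text.lower()
    return any(cue.lower() in lowered for cue in cues)


def classify(text: str) -> list[Community]:
    """Return every listed community whose variety AND region are both mentioned."""
    return [
        community
        for community, region_cues in COMMUNITY_REGIONS.items()
        if mentions(text, VARIETY_CUES[community.variety]) and mentions(text, region_cues)
    ]


print(classify("A new Hokkien storytelling night in Penang"))
```

Returning a list rather than a single label lets one article (say, a joint event) land in several communities at once.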

Examples of article types

One article could belong to multiple categories. This is only a rough example.
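Since an article can carry several types at once, the types fit naturally into a bit-flag set rather than a single label. This Python sketch uses the standard `enum.Flag` for the four article types named in the pipeline description; the `tag` function and its single-word cue sets are hypothetical placeholders for whatever classifier the project would actually use.

```python
from __future__ import annotations

import re
from enum import Flag, auto


class ArticleType(Flag):
    """Article types from the pipeline description; members combine with |."""
    LANGUAGE_PROJECT = auto()  # report on a language project
    EVENT = auto()             # report on an event
    PERSON = auto()            # profile of a person
    OPINION = auto()           # opinion / commentary piece


# Hypothetical single-word cues; a real tagger would more likely be a trained model.
CUES: dict[ArticleType, set[str]] = {
    ArticleType.LANGUAGE_PROJECT: {"dictionary", "app", "wiki", "project"},
    ArticleType.EVENT: {"workshop", "festival", "class", "classes", "meetup"},
    ArticleType.PERSON: {"interview", "profile"},
    ArticleType.OPINION: {"opinion", "editorial", "should"},
}


def tag(text: str) -> ArticleType:
    """Multi-label tagging: OR together every type whose cue word appears."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    result = ArticleType(0)  # empty set: no type matched
    for article_type, cues in CUES.items():
        if words & cues:
            result |= article_type
    return result


print(tag("Interview: the volunteers behind a Cantonese dictionary project"))
```

Membership checks read naturally (`ArticleType.EVENT in tag(text)`), and an article with no match gets the falsy empty flag.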

Projects to learn from technically

ztl8702 (Member, Author) commented Nov 14, 2017

ztl8702 (Member, Author) commented Nov 20, 2017

https://lobste.rs/about

ztl8702 (Member, Author) commented Dec 11, 2017

laubonghaudoi commented

What is the current state of development of this website?

ztl8702 (Member, Author) commented Apr 12, 2019

It is only an idea so far. Nothing has been done yet.

laubonghaudoi commented

I also have plans to build a similar website. Is there another way to contact you so that we can discuss it?
