Keep raw data only #12

Open
kaiix opened this issue May 16, 2023 · 3 comments

@kaiix
Collaborator

kaiix commented May 16, 2023

For blogs, it is sufficient to keep only the raw data (blogs/rss/feed-*.xml and blogs/wp-content) and the data transformation program; the derived data can be distributed as release assets (e.g. blogs-md.zip).
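As a rough illustration of this split, here is a minimal Python sketch of a transformation program that turns the raw feeds into a `blogs-md.zip` release asset. It assumes the feeds are RSS 2.0 with `<title>`, `<link>` and an HTML body in `<content:encoded>` or `<description>`. The function names are hypothetical, the HTML body is embedded as-is (Markdown permits raw HTML) rather than converted, post titles are assumed unique, and `blogs/wp-content` is ignored.

```python
import glob
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# Clark-notation tag for <content:encoded> from the RSS "content" module.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}encoded"


def slugify(title: str) -> str:
    """Build a file name from a post title (Unicode word characters are kept)."""
    slug = re.sub(r"[^\w]+", "-", title.strip().lower()).strip("-")
    return slug or "untitled"


def rss_to_markdown(xml_text: str) -> dict[str, str]:
    """Map file names to Markdown documents, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    posts = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        # Prefer the full body; fall back to the summary in <description>.
        body = item.findtext(CONTENT_NS) or item.findtext("description", default="")
        posts[slugify(title) + ".md"] = f"# {title}\n\nSource: {link}\n\n{body.strip()}\n"
    return posts


def build_zip(feed_texts: list[str]) -> bytes:
    """Pack the Markdown from every feed into one in-memory zip archive.

    Assumes titles are unique; a repeated title would produce a duplicate entry.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for text in feed_texts:
            for name, markdown in rss_to_markdown(text).items():
                zf.writestr(name, markdown)
    return buf.getvalue()


if __name__ == "__main__":
    # Input pattern and output name follow the issue; nothing is written
    # unless at least one feed file is found.
    feeds = []
    for path in sorted(glob.glob("blogs/rss/feed-*.xml")):
        with open(path, encoding="utf-8") as f:
            feeds.append(f.read())
    if feeds:
        with open("blogs-md.zip", "wb") as out:
            out.write(build_zip(feeds))
```

Keeping only this script and the feeds under version control means the zip can always be regenerated, which is the point of shipping it as a release asset instead of committing it.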

@hongqn
Contributor

hongqn commented May 16, 2023

#19 submitted a new data conversion format. Let's use this issue to work out a concrete approach to handling format conversion.

@hongqn
Contributor

hongqn commented May 16, 2023

After the earlier discussion in #15, I now support @kaiix's idea.

Roughly, the plan is:

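  1. Keep the raw data in the repository.
  2. Create a scripts directory containing the various format-conversion scripts, and document their usage in README.md.
  3. Set up GitHub Actions to build a release automatically when changes are merged into the main branch, with each converted format offered as a packaged download via release assets.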

In the future, a pull request that adds a new format should only need to submit the conversion script and the corresponding change to the release workflow.
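To make the "new format = new script" rule concrete, here is a minimal Python sketch of a build step the release workflow could run. Everything in it is hypothetical rather than existing code in the repository: the `CONVERTERS` registry, the `register` decorator, `build_all`, the placeholder `rss` converter (which just ships the raw feeds), and the `dist/` output directory.

```python
import shutil
import sys
from pathlib import Path
from typing import Callable

# A converter reads the raw-data directory and writes its output into out_dir.
Converter = Callable[[Path, Path], None]

# Hypothetical registry mapping a format name to its converter.
CONVERTERS: dict[str, Converter] = {}


def register(fmt: str) -> Callable[[Converter], Converter]:
    """Decorator that records a converter under a format name."""
    def wrap(fn: Converter) -> Converter:
        if fmt in CONVERTERS:
            raise ValueError(f"duplicate format: {fmt}")
        CONVERTERS[fmt] = fn
        return fn
    return wrap


@register("rss")
def copy_feeds(raw: Path, out_dir: Path) -> None:
    """Placeholder converter: ship the raw feed files unchanged."""
    for src in sorted(raw.glob("rss/feed-*.xml")):
        shutil.copy2(src, out_dir / src.name)


def build_all(raw: Path, dist: Path) -> list[Path]:
    """Run every registered converter and zip each result as dist/<format>.zip."""
    assets = []
    for fmt, convert in sorted(CONVERTERS.items()):
        work = dist / fmt
        work.mkdir(parents=True, exist_ok=True)
        convert(raw, work)
        # make_archive returns the path of the archive it created.
        archive = shutil.make_archive(str(dist / fmt), "zip", root_dir=work)
        assets.append(Path(archive))
    return assets


if __name__ == "__main__":
    # Assumed invocation from a workflow step: python scripts/build_assets.py blogs dist
    if len(sys.argv) == 3:
        for asset in build_all(Path(sys.argv[1]), Path(sys.argv[2])):
            print(asset)
```

With this layout, a new format is one more decorated converter, and the workflow would upload every zip in `dist/` as a release asset.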

One point still needs detailed discussion: should a release be built on every merge, or triggered manually?
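The two options differ only in which GitHub Actions event is allowed to publish. Below is a small Python sketch of that decision. It assumes the workflow is triggered by both `push` and `workflow_dispatch` and passes in a hypothetical `RELEASE_POLICY` setting. `GITHUB_EVENT_NAME` and `GITHUB_REF` are default variables that Actions sets for every run.

```python
import os


def should_publish(policy: str, event: str, ref: str) -> bool:
    """Return True if this workflow run should build and publish a release.

    policy: "every_merge" or "manual" (hypothetical setting values).
    event, ref: the run's GITHUB_EVENT_NAME and GITHUB_REF values.
    """
    on_main = ref == "refs/heads/main"
    if policy == "every_merge":
        # A merged pull request arrives as a push event on main
        # (so would a direct push to main).
        return event == "push" and on_main
    if policy == "manual":
        # Only a manual "Run workflow" (workflow_dispatch) on main publishes.
        return event == "workflow_dispatch" and on_main
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    # RELEASE_POLICY is a hypothetical variable the workflow would set.
    decision = should_publish(
        os.environ.get("RELEASE_POLICY", "manual"),
        os.environ.get("GITHUB_EVENT_NAME", ""),
        os.environ.get("GITHUB_REF", ""),
    )
    print("publish" if decision else "skip")
```

In the workflow file itself, the same choice usually comes down to which triggers are listed under `on:`: `push` restricted to the main branch, `workflow_dispatch`, or both.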

@yzqzss
Collaborator

yzqzss commented May 16, 2023

Agreed!

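However, the blogs directory is a special case: several earlier PRs made manual changes such as HTML normalization, file renaming, and link localization.
If we later need to carefully proofread the Markdown, fix formatting, or repair broken links, that will still require manual editing. So the contents of blogs probably can't be generated by a script all the way from the source RSS to the final version.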
