New generate general conditions #1

Merged
merged 17 commits into from
Nov 21, 2024
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -28,7 +28,7 @@ jobs:
       - name: 🔧 Build
         run: |
           legal-text-processor reintegrate indexed-tariff-specific-conditions/??.yaml
-          legal-text-processor generate indexed-tariff-specific-conditions/
+          legal-text-processor generate

       - name: 📂 Generate Directory Listings
         uses: jayanta525/[email protected]
20 changes: 16 additions & 4 deletions TODO.md
@@ -1,8 +1,16 @@
 ## Technical debt

-- [ ] Put the assets in a subdirectory
+- [ ] The TOC of the webforms document has no title (Table of contents)
+- [ ] The TOC target must also be multilingual (or we insert the TOC target by hand)
+- [ ] Most of the lists in the general conditions are broken
+- [ ] Test the generated output in webforms
+- [ ] Links in a different window
+- [ ] import general-conditions in different languages
+- [ ] Create a weblate project for general-conditions
+- [ ] Combined webform output for general-conditions and indexed
+
+
 - [ ] Move the GHA operations to a Makefile
-- [ ] Use the chat notification action when it fails
 - [ ] Integrate the import script (bash) into the main Python script
 - [ ] Generalize reintegration (process all masters that have a template.md)
 - [ ] Generalize generation to declaratively define:
@@ -12,6 +20,10 @@
   - Parameters
 - [ ] During extraction, check that the numbering is consecutive

+- [x] TOC strings (title and link text) as translatable
+- [x] Reorganize for a generic generate
+- [x] Put the assets in a subdirectory
+- [x] Use the chat notification action when it fails

## Pending unknowns

@@ -20,8 +32,8 @@
 - [ ] Given a markdown how to properly format into html self-contained
 - [ ] Given a markdown how to properly format into html embedded
 - [ ] HTML: How to make links open in a different window in html
-- [ ] HTML: How to generate the TOC
-- [ ] HTML: Backlinks
+- [x] HTML: How to generate the TOC
+- [x] HTML: Backlinks
 - [x] PDF: CSS Styling
 - [x] PDF: how to generate TOC metadata
 - [x] PDF: Page header
462 changes: 462 additions & 0 deletions general-conditions/es.md

Large diffs are not rendered by default.

462 changes: 462 additions & 0 deletions general-conditions/es.yaml

Large diffs are not rendered by default.

104 changes: 104 additions & 0 deletions general-conditions/template.md
@@ -0,0 +1,104 @@
{PRE}
{CHAPTER_1_MARKDOWN}
{CLAUSE_1_1_MARKDOWN}
{CLAUSE_1_2_MARKDOWN}
{CLAUSE_1_3_MARKDOWN}
{CLAUSE_1_4_MARKDOWN}
{CLAUSE_1_5_MARKDOWN}
{CHAPTER_2_MARKDOWN}
{CLAUSE_2_1_MARKDOWN}
{CLAUSE_2_2_MARKDOWN}
{CLAUSE_2_3_MARKDOWN}
{CLAUSE_2_4_MARKDOWN}
{CHAPTER_3_MARKDOWN}
{CLAUSE_3_1_MARKDOWN}
{CLAUSE_3_2_MARKDOWN}
{CLAUSE_3_3_MARKDOWN}
{CLAUSE_3_4_MARKDOWN}
{CLAUSE_3_5_MARKDOWN}
{CLAUSE_3_6_MARKDOWN}
{CLAUSE_3_7_MARKDOWN}
{CLAUSE_3_8_MARKDOWN}
{CHAPTER_4_MARKDOWN}
{CLAUSE_4_1_MARKDOWN}
{CLAUSE_4_2_MARKDOWN}
{CLAUSE_4_3_MARKDOWN}
{CLAUSE_4_4_MARKDOWN}
{CLAUSE_4_5_MARKDOWN}
{CLAUSE_4_6_MARKDOWN}
{CHAPTER_5_MARKDOWN}
{CLAUSE_5_1_MARKDOWN}
{CLAUSE_5_2_MARKDOWN}
{CLAUSE_5_3_MARKDOWN}
{CLAUSE_5_4_MARKDOWN}
{CLAUSE_5_5_MARKDOWN}
{CHAPTER_6_MARKDOWN}
{CLAUSE_6_1_MARKDOWN}
{CLAUSE_6_2_MARKDOWN}
{CLAUSE_6_3_MARKDOWN}
{CLAUSE_6_4_MARKDOWN}
{CLAUSE_6_5_MARKDOWN}
{CHAPTER_7_MARKDOWN}
{CLAUSE_7_1_MARKDOWN}
{CLAUSE_7_2_MARKDOWN}
{CLAUSE_7_3_MARKDOWN}
{CLAUSE_7_4_MARKDOWN}
{CLAUSE_7_5_MARKDOWN}
{CLAUSE_7_6_MARKDOWN}
{CLAUSE_7_7_MARKDOWN}
{CLAUSE_7_8_MARKDOWN}
{CLAUSE_7_9_MARKDOWN}
{CHAPTER_8_MARKDOWN}
{CLAUSE_8_1_MARKDOWN}
{CLAUSE_8_2_MARKDOWN}
{CLAUSE_8_3_MARKDOWN}
{CLAUSE_8_4_MARKDOWN}
{CLAUSE_8_5_MARKDOWN}
{CLAUSE_8_6_MARKDOWN}
{CHAPTER_9_MARKDOWN}
{CLAUSE_9_1_MARKDOWN}
{CLAUSE_9_2_MARKDOWN}
{CLAUSE_9_3_MARKDOWN}
{CLAUSE_9_4_MARKDOWN}
{CLAUSE_9_5_MARKDOWN}
{CHAPTER_10_MARKDOWN}
{CLAUSE_10_1_MARKDOWN}
{CLAUSE_10_2_MARKDOWN}
{CHAPTER_11_MARKDOWN}
{CLAUSE_11_1_MARKDOWN}
{CLAUSE_11_2_MARKDOWN}
{CLAUSE_11_3_MARKDOWN}
{CLAUSE_11_4_MARKDOWN}
{CLAUSE_11_5_MARKDOWN}
{CLAUSE_11_6_MARKDOWN}
{CHAPTER_12_MARKDOWN}
{CLAUSE_12_1_MARKDOWN}
{CLAUSE_12_2_MARKDOWN}
{CLAUSE_12_3_MARKDOWN}
{CLAUSE_12_4_MARKDOWN}
{CHAPTER_13_MARKDOWN}
{CLAUSE_13_1_MARKDOWN}
{CLAUSE_13_2_MARKDOWN}
{CLAUSE_13_3_MARKDOWN}
{CLAUSE_13_4_MARKDOWN}
{CHAPTER_14_MARKDOWN}
{CLAUSE_14_1_MARKDOWN}
{CLAUSE_14_2_MARKDOWN}
{CLAUSE_14_3_MARKDOWN}
{CHAPTER_15_MARKDOWN}
{CLAUSE_15_1_MARKDOWN}
{CLAUSE_15_2_MARKDOWN}
{CLAUSE_15_3_MARKDOWN}
{CHAPTER_16_MARKDOWN}
{CLAUSE_16_1_MARKDOWN}
{CLAUSE_16_2_MARKDOWN}
{CLAUSE_16_3_MARKDOWN}
{CLAUSE_16_4_MARKDOWN}
{CHAPTER_17_MARKDOWN}
{CHAPTER_18_MARKDOWN}
{CLAUSE_18_1_MARKDOWN}
{CLAUSE_18_2_MARKDOWN}
{CLAUSE_18_3_MARKDOWN}
{CHAPTER_19_MARKDOWN}
{CLAUSE_19_1_MARKDOWN}
{CLAUSE_19_2_MARKDOWN}
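The placeholders in template.md look like Python `str.format` fields. The reintegration code is not part of this diff, so the following is only a minimal sketch of how such a template could be filled, assuming the per-language values arrive as a plain mapping. The helper names `placeholders` and `fill_template` and the sample values are hypothetical.

```python
from string import Formatter


def placeholders(template: str) -> list[str]:
    """List the field names used in a str.format style template, in order."""
    return [field for _, field, _, _ in Formatter().parse(template) if field]


def fill_template(template: str, values: dict[str, str]) -> str:
    """Fill the template, failing loudly when a placeholder has no value."""
    missing = [name for name in placeholders(template) if name not in values]
    if missing:
        raise KeyError(f"Missing placeholders: {', '.join(missing)}")
    return template.format(**values)


# Hypothetical three-line excerpt of a template and made-up values.
template = "{PRE}\n{CHAPTER_1_MARKDOWN}\n{CLAUSE_1_1_MARKDOWN}\n"
values = {
    "PRE": "# General conditions",
    "CHAPTER_1_MARKDOWN": "## 1. Object",
    "CLAUSE_1_1_MARKDOWN": "1.1. Example clause text.",
}
print(fill_template(template, values))
```

Checking for missing keys before formatting gives one error that names every absent clause, which may be easier to act on than the first `KeyError` raised by `str.format`.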
2 changes: 1 addition & 1 deletion import_docx.sh
@@ -18,7 +18,7 @@ process() {

     step "Processing language $lang"

-    pandoc "$1" -o $lang.md --columns 80000 -t gfm-raw_html
+    pandoc "$1" -o $lang.md --columns 80000 -t gfm-raw_html

     # Break paragraphs by phrases
     sed -i 's/\([^0-9]\.\) /\1\n/g' $lang.md
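The TODO list mentions folding this bash import script into the main Python script. As a rough sketch of what the sed step could look like there, the same substitution can be written with `re.sub`. The function name `break_paragraphs_by_sentence` is hypothetical and not part of the repository.

```python
import re


def break_paragraphs_by_sentence(text: str) -> str:
    """Put each sentence on its own line, mirroring the sed expression."""
    # A period that does not follow a digit, and is followed by one space,
    # keeps the period and swaps the space for a newline.
    # Periods after digits (clause numbers such as "1. ") are left alone.
    return re.sub(r"([^0-9]\.) ", r"\1\n", text)


sample = "First sentence. Second sentence. See clause 1. 2 for details."
print(break_paragraphs_by_sentence(sample))
```

One difference to keep in mind: sed works line by line, while this sketch processes the whole text at once, so `[^0-9]` can also match a newline character here.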
75 changes: 36 additions & 39 deletions indexed-tariff-specific-conditions/es.md

Large diffs are not rendered by default.

81 changes: 74 additions & 7 deletions legaltexts/cli.py
@@ -7,6 +7,9 @@
 import itertools
 from consolemsg import warn, step, error
 import difflib
+from .toc_generator import add_markdown_toc, add_links_to_toc
+from .translate import tr
+from typing_extensions import Annotated

 help="""\
 This CLI tool automates legaltext workflow
@@ -148,6 +151,25 @@ def generate_pdf(markdown_file: Path, css_file: Path = "pagedlegaltext.css", out
         #'--pdf-engine-opt=--pdf-variant=pdf/ua-1',
     ])

+def md_to_html_fragment(markdown: str)->str:
+    """
+    Generates an html fragment from markdown text
+    """
+    import subprocess
+    from somutils.testutils import temp_path
+    with temp_path() as tmp:
+        markdown_file = tmp/"input.md"
+        output_html = tmp/'output.html'
+        markdown_file.write_text(markdown)
+        subprocess.run([
+            'pandoc',
+            str(markdown_file),
+            '-t', 'html',
+            '-o', output_html,
+            '--wrap=preserve',
+        ])
+        return output_html.read_text()
+
 app = typer.Typer(
     help=help,
 )
@@ -200,20 +222,65 @@ def reintegrate(translation_yaml: list[Path]):
         markdown_file.write_text(content)

 @app.command()
-def generate(master_path: Path):
-    """Generates a set of deployable files"""
+def generate(target: Annotated[str, typer.Argument()]=''):
+    if not target or target=='web-pdf':
+        generate_web_pdf(
+            master_path=Path('indexed-tariff-specific-conditions'),
+            output_prefix='web-pdf'
+        )
+    if not target or target=='webforms':
+        generate_webforms_html(
+            master_path=Path('general-conditions'),
+            output_prefix='webforms'
+        )
+
+def generate_web_pdf(master_path: Path, output_prefix: str):
+    """Generates a pdf for the website"""
     document = master_path.name
     output_dir.mkdir(exist_ok=True)
-    output_template = 'web-pdf-{document}-{lang}.pdf'
     for markdown_file in master_path.glob('??.md'):
         lang = markdown_file.stem
-        target = output_dir / output_template.format(
-            document=document,
-            lang=lang,
-        )
+        output_template = f'{output_prefix}-{document}-{lang}.pdf'
+        target = output_dir / output_template
         step(f"Generating {target}...")
         generate_pdf(markdown_file, 'pagedlegaltext.css', target)

+def generate_webforms_html(master_path: Path, output_prefix: str):
+    """Generates an html fragment to be included in webforms LegalText view"""
+    document = master_path.name
+    output_dir.mkdir(exist_ok=True)
+    for markdown_file in master_path.glob('??.md'):
+        lang = markdown_file.stem
+        output_template = f'{output_prefix}-{document}-{lang}.html'
+        target = output_dir / output_template
+        step(f"Generating {target}")
+
+        step(f" Reading {markdown_file}...")
+        markdown_content = markdown_file.read_text()
+
+        step(f" Generating TOC")
+        markdown_with_toc = add_markdown_toc(
+            markdown_content,
+            place_holder='[TABLE]',
+            title=tr(lang, 'TOC_TITLE'),
+            top_level=2,
+        )
+
+        step(f" Generating html...")
+        html = md_to_html_fragment(markdown_with_toc)
+
+        step(f" Adding up-links...")
+        top="<span id='top'></span>\n\n"
+        final_content = top+add_links_to_toc(
+            html,
+            text=f"{tr(lang, 'TOC_GO_TO_TOC')} ↑",
+            target="#top",
+        )
+
+        step(f" Writing output")
+        target.write_text(final_content)
+
+

 if __name__ == "__main__":
     app()
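Both generators in this hunk pick their inputs with the `??.md` glob and name their outputs `{output_prefix}-{document}-{lang}`. The sketch below isolates just that naming rule so it can be checked without pandoc. `planned_outputs` is a hypothetical helper, and the `output` directory name is an assumption, since `output_dir` is defined outside the lines shown in this diff.

```python
import tempfile
from pathlib import Path


def planned_outputs(
    master_path: Path,
    output_prefix: str,
    extension: str,
    output_dir: Path,
) -> list[Path]:
    """List the files a generator would write for each two-letter language file."""
    document = master_path.name
    return sorted(
        output_dir / f"{output_prefix}-{document}-{markdown_file.stem}.{extension}"
        # '??.md' matches es.md or ca.md but not template.md
        for markdown_file in master_path.glob("??.md")
    )


with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "general-conditions"
    master.mkdir()
    for name in ("es.md", "ca.md", "template.md"):
        (master / name).write_text("")
    for target in planned_outputs(master, "webforms", "html", Path("output")):
        print(target)
```

Because the language code is taken from the file stem, any two-character markdown file in the master directory is treated as a translation, so stray files with short names would also be picked up.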
2 changes: 2 additions & 0 deletions legaltexts/i18n/ca.yaml
@@ -0,0 +1,2 @@
TOC_TITLE: Taula de continguts
TOC_GO_TO_TOC: Pujar a l'índex
3 changes: 3 additions & 0 deletions legaltexts/i18n/es.yaml
@@ -0,0 +1,3 @@

TOC_TITLE: Tabla de contenidos
TOC_GO_TO_TOC: Subir al índice
94 changes: 94 additions & 0 deletions legaltexts/toc_generator.py
@@ -0,0 +1,94 @@
import re
from bs4 import BeautifulSoup

def add_links_to_toc(html, text, target="#toc"):
    """
    >>> add_links_to_toc('<h2>Titol</h2>', text='Torna a dalt')
    '<h2>Titol<span class="pujar"> - <a href="#toc">Torna a dalt</a></span></h2>'

    >>> add_links_to_toc('<h3>Titol</h3>', text='Torna a dalt')
    '<h3>Titol<span class="pujar"> - <a href="#toc">Torna a dalt</a></span></h3>'

    >>> add_links_to_toc('<h3>Titol</h3>', text='Go up')
    '<h3>Titol<span class="pujar"> - <a href="#toc">Go up</a></span></h3>'

    >>> add_links_to_toc('<h3>Titol</h3>', text='Torna a dalt', target="#target")
    '<h3>Titol<span class="pujar"> - <a href="#target">Torna a dalt</a></span></h3>'
    """
    soup = BeautifulSoup(html, features="html.parser", preserve_whitespace_tags={'p', 'li'})
    headers = sum((
        soup.find_all(f'h{l}')
        for l in range(2,7)
    ), [])
    for header in headers:
        uplink = BeautifulSoup(f"<span class='pujar'> - <a href='{target}' /></span>", features="html.parser")
        uplink.find('a').string = text
        header.append(uplink)
    return soup.prettify(formatter=None)

def generate_toc(markdown_text, top_level=None, bottom_level=None, title=None):
    """
    >>> md = (
    ...     "Ignored\\n"
    ...     "# 1. level 1\\n"
    ...     "## 1.1. level 2\\n"
    ...     "### 1.1.1. level 3\\n"
    ... )

    >>> generate_toc(md)
    '- [1. level 1](#level-1)\\n - [1.1. level 2](#level-2)\\n  - [1.1.1. level 3](#level-3)'

    >>> generate_toc(md, top_level=2)
    '- [1.1. level 2](#level-2)\\n - [1.1.1. level 3](#level-3)'

    >>> generate_toc(md, bottom_level=2)
    '- [1. level 1](#level-1)\\n - [1.1. level 2](#level-2)'

    >>> generate_toc(md, title="Index")
    '# Index\\n\\n- [1. level 1](#level-1)\\n - [1.1. level 2](#level-2)\\n  - [1.1.1. level 3](#level-3)'

    """
    top_level = top_level or 1
    toc_title = f"# {title}\n\n" if title else ''
    toc = []
    for linia in markdown_text.splitlines():
        header = re.match(r"^(#{1,6})\s+((?:\d+[.])+)\s+(.*)", linia)
        if not header: continue
        level = len(header.group(1)) # Header level, from the number of '#'
        if top_level and level<top_level: continue
        if bottom_level and level>bottom_level: continue
        numbers = header.group(2)
        title = header.group(3).strip()

        # Build the link anchor from the title
        link = title.lower().replace(" ", "-").replace(".", "").replace(",", "")
        toc.append(f"{' ' * (level - top_level)}- [{numbers} {title}](#{link})")
    return toc_title + "\n".join(toc)

def add_markdown_toc(
    original_md: str,
    title: str|None=None,
    place_holder:str = '',
    top_level: int = 0,
):
    """
    >>> md = (
    ...     "[TOC]\\n"
    ...     "# 1. level 1\\n"
    ... )

    >>> add_markdown_toc(md)
    '- [1. level 1](#level-1)\\n\\n[TOC]\\n# 1. level 1\\n'
    >>> add_markdown_toc(md, place_holder='[TOC]')
    '- [1. level 1](#level-1)\\n# 1. level 1\\n'
    >>> add_markdown_toc(md, place_holder='[BAD]')
    '- [1. level 1](#level-1)\\n\\n[TOC]\\n# 1. level 1\\n'
    """
    toc = generate_toc(original_md, top_level = top_level, title=title)
    if place_holder and place_holder in original_md:
        return original_md.replace(place_holder, toc)
    return '\n\n'.join([toc, original_md])

if __name__ == "__main__":
    import doctest
    doctest.testmod() # `main` is not defined in this module; running the doctests is an assumed intent
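To make the header matching in `generate_toc` easier to review, here is a small standalone sketch that applies the same regular expression and anchor rule to single lines. `parse_header` is a hypothetical helper written for illustration; it is not part of toc_generator.py.

```python
import re

# Same pattern as generate_toc: hashes, a dotted number such as "1.1.", then the title.
HEADER = re.compile(r"^(#{1,6})\s+((?:\d+[.])+)\s+(.*)")


def parse_header(line: str):
    """Return (level, numbers, title, anchor), or None for lines the TOC skips."""
    match = HEADER.match(line)
    if not match:
        return None
    level = len(match.group(1))
    numbers = match.group(2)
    title = match.group(3).strip()
    # Anchor rule copied from generate_toc: lowercase, spaces to hyphens,
    # periods and commas removed.
    anchor = title.lower().replace(" ", "-").replace(".", "").replace(",", "")
    return level, numbers, title, anchor


for line in (
    "## 1.1. Object of the contract",
    "### 1.1.1. Prices, taxes and fees",
    "## Introduction",
):
    print(repr(line), "->", parse_header(line))
```

Two consequences are visible here: headers without a dotted number never reach the TOC, and the anchor is derived from the title alone, so two clauses with the same title would share a link target.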
19 changes: 19 additions & 0 deletions legaltexts/translate.py
@@ -0,0 +1,19 @@
from yamlns import ns
from pathlib import Path
from importlib.resources import files as package_files

def build_translations():
    if hasattr(build_translations, "translations"):
        return build_translations.translations
    translations = ns()
    for translation_file in package_files('legaltexts.i18n').iterdir():
        if translation_file.suffix != '.yaml': continue
        lang = translation_file.stem
        translations[lang] = ns.loads(translation_file.read_text())
    build_translations.translations = translations
    return build_translations.translations

def tr(lang, text, *args, **kwds):
    translations = build_translations()
    return translations[lang][text].format(*args, **kwds)
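As written, `tr` raises `KeyError` when a language has a markdown file but no matching file in `legaltexts/i18n`. A possible variation, sketched here with an in-memory catalog instead of the YAML files, falls back to another language. `TRANSLATIONS`, `tr_with_fallback` and the choice of `es` as the fallback are assumptions made for illustration only.

```python
# In-memory stand-in for the YAML catalogs; the strings are the ones added in this PR.
TRANSLATIONS = {
    "ca": {"TOC_TITLE": "Taula de continguts", "TOC_GO_TO_TOC": "Pujar a l'índex"},
    "es": {"TOC_TITLE": "Tabla de contenidos", "TOC_GO_TO_TOC": "Subir al índice"},
}


def tr_with_fallback(lang: str, key: str, *args, fallback_lang: str = "es", **kwds) -> str:
    """Look up key for lang, using fallback_lang when the language or key is missing."""
    catalog = TRANSLATIONS.get(lang, {})
    if key not in catalog:
        # Hypothetical policy: fall back instead of raising KeyError.
        catalog = TRANSLATIONS[fallback_lang]
    return catalog[key].format(*args, **kwds)


print(tr_with_fallback("ca", "TOC_TITLE"))
print(tr_with_fallback("eu", "TOC_TITLE"))
```

Whether a silent fallback is desirable for legal texts is a judgment call; failing loudly, as the PR code does, makes a missing translation harder to overlook.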
