
Stop registry: first draft #4393

Open
wants to merge 17 commits into
base: master

@ptitfred ptitfred commented Dec 17, 2024

See epic #4354.

Supported formats:

  • GTFS
  • NeTEx

Result: a CSV file with no deduplication (the IDs are those taken from the resources) and no geographic matching.

Unit tests are still missing.

@ptitfred ptitfred force-pushed the registre-arrets/premier-modele branch 5 times, most recently from 2acbef3 to 274dd93 Compare December 19, 2024 21:58
@ptitfred ptitfred force-pushed the registre-arrets/premier-modele branch from 274dd93 to 604202c Compare December 20, 2024 14:38

@thbar thbar left a comment


I did a quick first pass over the PR (it is in draft mode, but as discussed with @ptitfred it has not moved for a while).

Review notes

(this may be useful to the rest of @etalab/transport-tech, who will surely read this PR at some point).

main_id,display_name,data_source_id,data_source_format,parent_id,latitude,longitude,projection,stop_type
main:FR:52121:StopPlace:genARCOM@CHTDEG:CHT,De Gaulle,PAN:resource:80411,netex,,48.073291,5.1465855,utm_wgs84,stop
main:FR:52121:StopPlace:genARCOM@CHTOUDI:CHT,Oudinot,PAN:resource:80411,netex,,48.0784048,5.1455719,utm_wgs84,stop
main:FR:52121:StopPlace:genARCOM@CHTQUEL:CHT,Quellemele,PAN:resource:80411,netex,,48.07965605,5.14367025,utm_wgs84,stop
main:FR:52121:StopPlace:genARCOM@CHTBROT:CHT,Brottes,PAN:resource:80411,netex,,48.0821128,5.1281127,utm_wgs84,stop
main:FR:52121:StopPlace:genARCOM@CHTGROU:CHT,Groupama,PAN:resource:80411,netex,,48.0855051,5.13067385,utm_wgs84,stop
main:FR:52121:StopPlace:genARCOM@CHTTHBIS:CHT,THOMAS,PAN:resource:80411,netex,,48.0818889,5.135882,utm_wgs84,stop
main:FR:52121:StopPlace:genARCOM@CHTPAQU:CHT,Paquerettes,PAN:resource:80411,netex,,48.08639315,5.13853285,utm_wgs84,stop
main:FR:52121:StopPlace:genARCOM@CHTFARA:CHT,Faraday,PAN:resource:80411,netex,,48.08933075,5.13727315,utm_wgs84,stop
main:FR:52121:StopPlace:genARCOM@CHTROCH:CHT,Rochotte,PAN:resource:80411,netex,,48.0917009,5.1381297,utm_wgs84,stop

Review points

  • I see that the run ended with an error that stops processing; I wonder whether we can catch it.
[debug] extract_from_archive Elixir.Transport.Registry.GTFS PAN:resource:82277 /var/folders/tl/1p_8qmn13cbgh_539g1yxyyc0000gn/T/db8c162d-7fa4-4cc8-bffb-d791b569bf61.dat
[debug] Valid Zip archive
** (NimbleCSV.ParseError) unexpected escape character " in "\"MAISON-PONTHIEU - 40\",,50.204352,2.039905,,,,,,0\r\n"
    (nimble_csv 1.2.0) lib/nimble_csv.ex:583: NimbleCSV.RFC4180.escape/6
    (nimble_csv 1.2.0) lib/nimble_csv.ex:453: anonymous fn/4 in NimbleCSV.RFC4180.parse_stream/2
    (elixir 1.16.2) lib/stream.ex:990: Stream.do_transform_user/6
    (elixir 1.16.2) lib/stream.ex:943: Stream.do_transform/5
    (elixir 1.16.2) lib/enum.ex:4396: Enum.reverse/1
    (elixir 1.16.2) lib/enum.ex:3728: Enum.to_list/1
    (transport 0.0.1) lib/registry/gtfs.ex:30: Transport.Registry.GTFS.extract_from_archive/2
    (elixir 1.16.2) lib/stream.ex:613: anonymous fn/4 in Stream.map/2
    (elixir 1.16.2) lib/enum.ex:4839: Enumerable.List.reduce/3
    (elixir 1.16.2) lib/stream.ex:1027: Stream.do_transform_inner_list/7
    (elixir 1.16.2) lib/stream.ex:1828: Enumerable.Stream.do_each/4
    (elixir 1.16.2) lib/stream.ex:943: Stream.do_transform/5
    (elixir 1.16.2) lib/stream.ex:1828: Enumerable.Stream.do_each/4
    (elixir 1.16.2) lib/stream.ex:585: Stream.do_into/4
    (elixir 1.16.2) lib/stream.ex:690: Stream.run/1
    scripts/registre-arrets.exs:1: (file)
  • I deployed to prochainement to test it.
  • I would find it more practical to use the datagouv UUID as the resource identifier rather than our own "integer" identifier, @ptitfred. It has regularly saved me from going through a script or an indirection to find the UUID behind it.
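
On the first point, a minimal sketch of catching the error per resource, assuming the extraction for one resource can be wrapped in a zero-arity function. The names `SafeExtractSketch` and `safely/2` are hypothetical and not part of the PR. Since streams are lazy, the wrapped function has to force the enumeration itself (as the `Enum.to_list/1` call in the stack trace does) for the exception to be raised inside the `rescue`.

```elixir
defmodule SafeExtractSketch do
  require Logger

  # Hypothetical helper: runs `fun` and turns any raised exception
  # (for instance a NimbleCSV.ParseError) into an {:error, message} tuple
  # so that one broken resource does not stop the whole run.
  def safely(resource_id, fun) when is_function(fun, 0) do
    {:ok, fun.()}
  rescue
    exception ->
      message = Exception.message(exception)
      Logger.warning("Skipping #{resource_id}: #{message}")
      {:error, message}
  end
end
```

Whether a failed resource should be skipped silently, logged, or reported in the output is a design choice left open here.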

Nice work otherwise, let's talk about it this afternoon!


case value do
nil -> default_value
"" -> default_value
Contributor

Is there a trim upstream? I imagine that in some cases we could come across values such as " ".
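
A minimal sketch of what a whitespace-tolerant default could look like, assuming the values are either `nil` or binaries. The names `BlankDefaultSketch` and `with_default/2` are hypothetical; the PR's actual function is not shown in full here.

```elixir
defmodule BlankDefaultSketch do
  # Hypothetical variant of the `case` above: a value made only of
  # whitespace is treated like an empty string.
  def with_default(value, default_value) do
    case value do
      nil ->
        default_value

      v when is_binary(v) ->
        if String.trim(v) == "", do: default_value, else: v

      other ->
        other
    end
  end
end
```

Note that this sketch returns non-blank values untrimmed; trimming them as well would be a separate decision.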


Transform the stream output by Unzip to a stream of maps, each map
corresponding to a row from the CSV.
"""
def to_stream_of_maps(file_stream) do
Contributor

I wondered whether file_stream "is a" %File.Stream{}.

With the arrival of Elixir 1.18+ and typing, I have the impression that putting struct types on parameters is going to be a good idea.

Contributor Author

It is an enumerable of iodata (not sure of the type in Elixir).
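
One way to write that down as a typespec, sketched under the assumption that the input really is any enumerable of iodata and not a `%File.Stream{}`. `Enumerable.t/1` is available from Elixir 1.14; the module and function names are hypothetical.

```elixir
defmodule StreamTypeSketch do
  # The spec accepts any enumerable whose elements are iodata,
  # without requiring a specific struct such as %File.Stream{}.
  @spec total_bytes(Enumerable.t(iodata())) :: non_neg_integer()
  def total_bytes(chunks) do
    Enum.reduce(chunks, 0, fn chunk, acc -> acc + IO.iodata_length(chunk) end)
  end
end
```

The same spec shape would fit a function like `to_stream_of_maps/1`, if its argument is indeed an enumerable of iodata.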

# transform the stream to a stream of maps %{column_name1: value1, ...}
|> Stream.transform([], fn r, acc ->
if acc == [] do
{%{}, r |> Enum.map(fn h -> h |> String.replace_prefix("\uFEFF", "") end)}
Contributor

I don't know whether you saw that the BOM can be trimmed directly when opening the stream:

https://hexdocs.pm/elixir/1.18.1/File.html#stream!/3-byte-order-marks-and-read-offset

File.stream!("./test/test.txt", [:trim_bom, encoding: :utf8])

But maybe you have already seen it and it is not practical here, etc.
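
A small self-contained sketch of the `:trim_bom` option, using a temporary file written for the occasion (the file name and CSV content are made up). It only applies when the data is read through `File.stream!`, so it may not fit a stream coming out of a zip archive.

```elixir
defmodule BomSketch do
  # Reads a UTF-8 file line by line, dropping a leading byte order mark if present.
  def lines(path) do
    path
    |> File.stream!([:trim_bom, encoding: :utf8])
    |> Enum.to_list()
  end
end

# Illustrative data only: a CSV file starting with a UTF-8 BOM.
path = Path.join(System.tmp_dir!(), "bom_sketch.csv")
File.write!(path, "\uFEFFstop_id,stop_name\nA,De Gaulle\n")

# The first header no longer carries the "\uFEFF" prefix.
BomSketch.lines(path) |> IO.inspect()
```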

Contributor Author

No; I naively reused existing code.

def error(message), do: {:error, message}

@spec cat_results(Stream.t(t(term()))) :: Stream.t(term())
def cat_results(enumerable), do: Stream.flat_map(enumerable, &keep_ok/1)
Contributor

The naming caught me a bit off guard: seeing cat, I think I associated it with an operation without side effects (like cat in the shell), but in practice it reformats the data.

@ptitfred ptitfred Jan 13, 2025

I plead guilty to borrowing the nomenclature from another language:

  • Data.Maybe.catMaybes :: [Maybe a] -> [a]
  • Data.Maybe.mapMaybe :: (a -> Maybe b) -> [a] -> [b]

I admit I don't know why this "cat" prefix was used there. I'm open to suggestions.
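
For readers unfamiliar with `catMaybes`, a minimal sketch of the behaviour being discussed, assuming `keep_ok/1` (not shown in this hunk) maps `{:ok, value}` to a one-element list and anything else to an empty list. The module name is hypothetical.

```elixir
defmodule ResultSketch do
  # Assumed shape of keep_ok/1: successes are kept, everything else is dropped.
  def keep_ok({:ok, value}), do: [value]
  def keep_ok(_other), do: []

  # Same body as in the PR: lazily unwraps the {:ok, value} tuples
  # and discards the {:error, reason} ones.
  def cat_results(enumerable), do: Stream.flat_map(enumerable, &keep_ok/1)
end

[{:ok, 1}, {:error, "bad row"}, {:ok, 3}]
|> ResultSketch.cat_results()
|> Enum.to_list()
# => [1, 3]
```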

require Logger

@spec execute(output_file :: Path.t(), list()) :: :ok
def execute(output_file, opts \\ []) do
Contributor

Since this is the script's entry point, once it is completely finished I would like to see documentation added on the method.

@ptitfred ptitfred marked this pull request as ready for review January 14, 2025 11:02
@ptitfred ptitfred requested a review from a team as a code owner January 14, 2025 11:02
2 participants