Addressing review feedback: Implementing changes #6

munterfi · 2024-05-03T10:09:45Z

GtfsScheduleFile.java

How about adding a second boolean attribute required? Then one could decide wether to raise an exception and log an error while reading from direcotry / zip archive in GtfsScheduleReader, instead of just logging a warning.

GtfsScheduleParser.java

⁠parseRoute, how do you mean that route types are not standardized? you mean even the data type is not defined, i.e. one can define an integer or string or only that integer values do not always map correctly to the enum type in RouteType? In case of the latter, do you mean that 0 could be tram and rail at the same time? Or that just new arbitrary numbers appear when parsing (e.g. 13 - super random train type?). For the new random number problem, I would add a new type UNKNOWN which would apply for all values that not match one of the defined values. If same numbers have different meanings, there's probably no added value if we import route_type at all. And if there's strings and ints mixed, I would think we could try parsing them differently --> if number check enum directly, if string try checking the enum names in lowercase with whitespace/underscores removed if any matches are found. If non of that works, use UNKNOWN.
Maybe also interesting: https://opentransportdata.swiss/de/cookbook/gtfs/#routestxt
And: https://developers.google.com/transit/gtfs/reference/extended-route-types

⁠parseTrip, there is direction_id (typically 0 or 1) to indicate if the trip is going from a to b or from b to a. Maybe even different stop sequences are indicated with values 1+ (haven't checked). I think this would be the bare minimum we need for RAPTOR, because different direction_ids are by definition already a new route in RAPTOR. However, I think we still have to go further and split routes one by one after parsing them to identify all different stop sequences and create raptor compatible routes.

GtfsScheduleBuilder.java

⁠Why is this in the model directory/package?

As already mentioned in the comment for parser and parseTrip, I think we need to add more logic to the initialization of the Route Objects. Since they will typically contain at least two different stop sequences and should therefore be treated as separate routes in RAPTOR. Maybe instead of initialize, have a route splitter method which returns a collection of routes from one route which can then be concatenated to the clean routes list.
--> Some interesting links: https://developers.google.com/transit/gtfs/data-modeling/route-modeling-guide
--> Also see youtube video (3 min 18s): https://youtu.be/AdArDN4E6Hg?si=r0eKXZPpdZ27YwL-&t=198

StopTime.java

Minor comment and can be ignored since there is no logical error, why are we comparing with departure and not stop_sequence. I thought it's required and would be the exact thing it's meant for. However the gtfs page is down I right now and I couldn't look this up (same goes for direction_id).

- Introduce presence enum with levels: REQUIRED, OPTIONAL, CONDITIONALLY_REQUIRED, CONDITIONALLY_FORBIDDEN, RECOMMENDED. - Throw FileNotFoundException if a required file is not provided.

munterfi · 2024-05-03T10:12:32Z

⁠Why is this in the model directory/package?

Since the construtors of the model entities are package private, which restricts to use the builder to modify a GtfsSchedule. However, semantically the builer itself is not part of the model. But i dont know a different way to achieve this behavior this in Java.

munterfi · 2024-05-03T10:25:22Z

Minor comment and can be ignored since there is no logical error, why are we comparing with departure and not stop_sequence. I thought it's required and would be the exact thing it's meant for. However the gtfs page is down I right now and I couldn't look this up (same goes for direction_id).

The sequence information is redundant given the departure time; therefore, I omitted it in the StopTime class. The sequence is implicitly encoded in the sorted array after building the schedule. However, there could be issues down the road since departure_time is only conditionally required.

One important difference, if we base equality on the sequence information instead of departure time, is that we then cannot compare stop times of different trips anymore, which is done, for example, when initializing the stops in the schedule: https://github.com/naviqore/raptor/blob/0cc5f65cfced412642a53b7975b97952a632efb4/src/main/java/ch/naviqore/gtfs/schedule/model/Stop.java#L27

I suggest we keep it as it is but keep in mind the possibility of trouble with conditionally required departure times if we encounter null pointer exceptions someday.

munterfi · 2024-05-03T10:31:18Z

As already mentioned in the comment for parser and parseTrip, I think we need to add more logic to the initialization of the Route Objects. Since they will typically contain at least two different stop sequences and should therefore be treated as separate routes in RAPTOR. Maybe instead of initialize, have a route splitter method which returns a collection of routes from one route which can then be concatenated to the clean routes list.
--> Some interesting links: https://developers.google.com/transit/gtfs/data-modeling/route-modeling-guide
--> Also see youtube video (3 min 18s): https://youtu.be/AdArDN4E6Hg?si=r0eKXZPpdZ27YwL-&t=198

Good point! I think this functionality belongs to the GtfsToRaptorConverter:
https://github.com/naviqore/raptor/blob/0cc5f65cfced412642a53b7975b97952a632efb4/src/main/java/ch/naviqore/raptor/GtfsToRaptorConverter.java#L27

Perhaps we could inject the schedule into the Converter, allowing it to preprocess the schedule using a RouteSplitter class. This way, when extracting a Raptor for a given day, the preprocessing step wouldn't need to be repeated every time.

munterfi · 2024-05-03T14:07:44Z

⁠parseTrip, there is direction_id (typically 0 or 1) to indicate if the trip is going from a to b or from b to a. Maybe even different stop sequences are indicated with values 1+ (haven't checked). I think this would be the bare minimum we need for RAPTOR, because different direction_ids are by definition already a new route in RAPTOR. However, I think we still have to go further and split routes one by one after parsing them to identify all different stop sequences and create raptor compatible routes.

Since the direction_id is optional in the trips.txt, i would rather keep this via the implicit order of the stops via departure time and handle this case in the RouteSplitter.

- Introduce interface for route type. - Implement with two enums: DefaultRouteType and HierarchicalVehicleType - Parser decides depending on code which type to choose.

…te space

munterfi · 2024-05-03T14:17:54Z

⁠parseRoute, how do you mean that route types are not standardized? you mean even the data type is not defined, i.e. one can define an integer or string or only that integer values do not always map correctly to the enum type in RouteType? In case of the latter, do you mean that 0 could be tram and rail at the same time? Or that just new arbitrary numbers appear when parsing (e.g. 13 - super random train type?). For the new random number problem, I would add a new type UNKNOWN which would apply for all values that not match one of the defined values. If same numbers have different meanings, there's probably no added value if we import route_type at all. And if there's strings and ints mixed, I would think we could try parsing them differently --> if number check enum directly, if string try checking the enum names in lowercase with whitespace/underscores removed if any matches are found. If non of that works, use UNKNOWN.
Maybe also interesting: https://opentransportdata.swiss/de/cookbook/gtfs/#routestxt
And: https://developers.google.com/transit/gtfs/reference/extended-route-types

Solved by 6273663

- Add test extension for parameter injection of the builder. - Adjust existing test.

- Create larger example schedule with more calendar variations and both directions. - Increase test coverage concerning boundary cases in next departures (midnight), nearest stops (same location query) and active trips (dates outside validity or no service day).

- Add IntelliJ code style to project. - Reformat complete project.

…uence - Add GtfsRoutePartitioner and corresponding test. - Adjust GtfsToRaptorConverter to use the GtfsRoutePartitioner. - Fix bug in GtfsScheduleTestBuilder; reversed routes also have reversed stops sequences.

munterfi · 2024-05-04T16:10:08Z

As already mentioned in the comment for parser and parseTrip, I think we need to add more logic to the initialization of the Route Objects. Since they will typically contain at least two different stop sequences and should therefore be treated as separate routes in RAPTOR. Maybe instead of initialize, have a route splitter method which returns a collection of routes from one route which can then be concatenated to the clean routes list.
--> Some interesting links: https://developers.google.com/transit/gtfs/data-modeling/route-modeling-guide
--> Also see youtube video (3 min 18s): https://youtu.be/AdArDN4E6Hg?si=r0eKXZPpdZ27YwL-&t=198

Good point! I think this functionality belongs to the GtfsToRaptorConverter:

https://github.com/naviqore/raptor/blob/0cc5f65cfced412642a53b7975b97952a632efb4/src/main/java/ch/naviqore/raptor/GtfsToRaptorConverter.java#L27

Perhaps we could inject the schedule into the Converter, allowing it to preprocess the schedule using a RouteSplitter class. This way, when extracting a Raptor for a given day, the preprocessing step wouldn't need to be repeated every time.

Solved by f2479d3, introduces GtfsRoutePartitioner.

munterfi · 2024-05-04T16:11:33Z

Hi @clukas1 and @Brunner246, this PR should address all issues in the review.

clukas1 · 2024-05-04T19:26:20Z

⁠parseRoute, how do you mean that route types are not standardized? you mean even the data type is not defined, i.e. one can define an integer or string or only that integer values do not always map correctly to the enum type in RouteType? In case of the latter, do you mean that 0 could be tram and rail at the same time? Or that just new arbitrary numbers appear when parsing (e.g. 13 - super random train type?). For the new random number problem, I would add a new type UNKNOWN which would apply for all values that not match one of the defined values. If same numbers have different meanings, there's probably no added value if we import route_type at all. And if there's strings and ints mixed, I would think we could try parsing them differently --> if number check enum directly, if string try checking the enum names in lowercase with whitespace/underscores removed if any matches are found. If non of that works, use UNKNOWN.
Maybe also interesting: https://opentransportdata.swiss/de/cookbook/gtfs/#routestxt
And: https://developers.google.com/transit/gtfs/reference/extended-route-types

Solved by 6273663

Some minor comments:

What is the supported attribute used for in HierarchicalVerhicalType?
Why do we call it Hierarchical if it doesn't include any reference to it's parent DefaultRouteType? It seems the numbers make the parent type not derivable by simple division. How about adding a attribute parent of type DefaultRouteType? So all Railway related types would have parent DefaultRouteType.RAIL etc.
Same question as everywhere, should we really fail hard if the route type can not be matched? Or should we introduce an UNKNOWN enum value?

clukas1 · 2024-05-04T19:30:13Z

GtfsScheduleFile.java

How about adding a second boolean attribute required? Then one could decide wether to raise an exception and log an error while reading from direcotry / zip archive in GtfsScheduleReader, instead of just logging a warning.

Addressed by @munterfi in 050da21. Nice work! Looking at conditionally_required made me think if we should add a check to check for these conditions. For now either a calendar.txt or a calendar_dates.txt file has to be present at least.

munterfi · 2024-05-06T13:39:20Z

What is the supported attribute used for in HierarchicalVerhicalType?

That's just if Google Maps supports the transport mode. The information was in the table, and i thought maybe at some point its important, but i dont think we need it at the moment. So i will delete it (5958bad), but we still have it in the history.

munterfi · 2024-05-06T13:45:34Z

2. Why do we call it Hierarchical if it doesn't include any reference to it's parent DefaultRouteType? It seems the numbers make the parent type not derivable by simple division. How about adding a attribute parent of type DefaultRouteType? So all Railway related types would have parent DefaultRouteType.RAIL etc.

Hierarchical Vehicle Type (HVT) is the name from the codes of the European TPEG standard. So there is no connection between the original GTFS route types and the HVTs.

There are some mappings avaiblabe here, unfortunately they are not complete:

Existing Value - Meaning - Corresponding New Value
0 - Tram, Light Rail, Streetcar - 900
1 - Subway, Metro - 400
2 - Rail - 100
3 - Bus - 700
4 - Ferry - 1000
5 - Cable Car - 1701
6 - Gondola, Suspended cable car - 1300
7 - Funicular - 1400

Since we don't really need the relationship or mode in general for the Raptor at this point, I would suggest keeping these two types separate and ensure that we can access their values (code description) through the common interface "RouteType".

munterfi · 2024-05-06T13:59:18Z

3. Same question as everywhere, should we really fail hard if the route type can not be matched? Or should we introduce an UNKNOWN enum value?

I would adhere to the GTFS specification and throw an exception (at least in the builder) if a schedule is invalid. Currently, we only process required files and attributes, and we should treat them as guaranteed.

munterfi · 2024-05-06T14:00:08Z

In general, I would advocate for representing the GTFS as closely as possible to the specification and separating the logic needed for Raptor into separate classes that operate on the GTFS schedule (or interface, see below). Since the specification may change in the future, the more we decouple our logic from the GTFS formats, the easier it will be to adapt later.

Maybe it would make sense in future to introduce a more general Schedule interface, implemented by GtfsSchedule, this would also allow to create Raptor from different transit schedule / timetable data format (e.g. MATSim or HAFAS Rohdaten Format HRDF). But I think this is out of scope for this thesis.

munterfi · 2024-05-06T14:18:26Z

GtfsScheduleFile.java

How about adding a second boolean attribute required? Then one could decide wether to raise an exception and log an error while reading from direcotry / zip archive in GtfsScheduleReader, instead of just logging a warning.

Addressed by @munterfi in 050da21. Nice work! Looking at conditionally_required made me think if we should add a check to check for these conditions. For now either a calendar.txt or a calendar_dates.txt file has to be present at least.

I would move this into a new issue and merge this PR, since it is already quite extensive.

clukas1 · 2024-05-06T18:48:25Z

GtfsScheduleFile.java

How about adding a second boolean attribute required? Then one could decide wether to raise an exception and log an error while reading from direcotry / zip archive in GtfsScheduleReader, instead of just logging a warning.

Addressed by @munterfi in 050da21. Nice work! Looking at conditionally_required made me think if we should add a check to check for these conditions. For now either a calendar.txt or a calendar_dates.txt file has to be present at least.

I would move this into a new issue and merge this PR, since it is already quite extensive.

Added to NAV-26

clukas1

Some very minor ideas. But think this pull request is more than good enough to be merged.

clukas1 · 2024-05-06T19:40:47Z

src/test/java/ch/naviqore/raptor/GtfsRoutePartitionerTest.java

+    private GtfsSchedule schedule;
+    private GtfsRoutePartitioner partitioner;
+
+    @BeforeEach


Why not use @BeforeAll and make setUp, schedule, partitioner static. Since this data is not manipulated by any test, I don't see why whe should rebuild the GTFS Schedule instance everytime.

clukas1 · 2024-05-06T19:42:57Z

src/test/java/ch/naviqore/raptor/GtfsRoutePartitionerTest.java

+                        .isNotNull();
+            }
+        }
+    }


Maybe add test, to make sure that the expected number of trips are found in each (or specific cases) subroute instance.

ENH: NAV-14 - Add presence to GTFS file type

050da21

- Introduce presence enum with levels: REQUIRED, OPTIONAL, CONDITIONALLY_REQUIRED, CONDITIONALLY_FORBIDDEN, RECOMMENDED. - Throw FileNotFoundException if a required file is not provided.

munterfi self-assigned this May 3, 2024

munterfi added 2 commits May 3, 2024 16:15

ENH: NAV-14 - Add support for extended route types

6273663

- Introduce interface for route type. - Implement with two enums: DefaultRouteType and HierarchicalVehicleType - Parser decides depending on code which type to choose.

STYLE: NAV-14 - Rename test methods, adjust visibility and remove whi…

a3085ac

…te space

munterfi added 4 commits May 3, 2024 17:44

REFACTOR: NAV-14 - Introduce test builder pattern for GTFS module

45921e3

- Add test extension for parameter injection of the builder. - Adjust existing test.

ORG: NAV-6 - Set code style for project

eac4f5c

- Add IntelliJ code style to project. - Reformat complete project.

ENH: NAV-19 - Partition routes into sub-routes with the same stop seq…

f2479d3

…uence - Add GtfsRoutePartitioner and corresponding test. - Adjust GtfsToRaptorConverter to use the GtfsRoutePartitioner. - Fix bug in GtfsScheduleTestBuilder; reversed routes also have reversed stops sequences.

munterfi requested review from clukas1 and Brunner246 and removed request for clukas1 and Brunner246 May 4, 2024 16:11

ENH: NAV-14 - Remove supported field from HierarchicalVehicleType

5958bad

clukas1 approved these changes May 6, 2024

View reviewed changes

munterfi merged commit efb2890 into main May 6, 2024
1 check passed

munterfi deleted the review/gtfs-read-parse-builder branch May 6, 2024 20:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Addressing review feedback: Implementing changes #6

Addressing review feedback: Implementing changes #6

munterfi commented May 3, 2024

munterfi commented May 3, 2024

munterfi commented May 3, 2024

munterfi commented May 3, 2024 •

edited

Loading

munterfi commented May 3, 2024 •

edited

Loading

munterfi commented May 3, 2024

munterfi commented May 4, 2024

munterfi commented May 4, 2024

clukas1 commented May 4, 2024 •

edited

Loading

clukas1 commented May 4, 2024

GtfsScheduleFile.java

munterfi commented May 6, 2024 •

edited

Loading

munterfi commented May 6, 2024

munterfi commented May 6, 2024

munterfi commented May 6, 2024 •

edited

Loading

munterfi commented May 6, 2024

GtfsScheduleFile.java

clukas1 commented May 6, 2024

GtfsScheduleFile.java

clukas1 left a comment •

edited

Loading

clukas1 May 6, 2024

clukas1 May 6, 2024

Addressing review feedback: Implementing changes #6

Addressing review feedback: Implementing changes #6

Conversation

munterfi commented May 3, 2024

GtfsScheduleFile.java

GtfsScheduleParser.java

GtfsScheduleBuilder.java

StopTime.java

munterfi commented May 3, 2024

munterfi commented May 3, 2024

munterfi commented May 3, 2024 • edited Loading

munterfi commented May 3, 2024 • edited Loading

munterfi commented May 3, 2024

munterfi commented May 4, 2024

munterfi commented May 4, 2024

clukas1 commented May 4, 2024 • edited Loading

clukas1 commented May 4, 2024

GtfsScheduleFile.java

munterfi commented May 6, 2024 • edited Loading

munterfi commented May 6, 2024

munterfi commented May 6, 2024

munterfi commented May 6, 2024 • edited Loading

munterfi commented May 6, 2024

GtfsScheduleFile.java

clukas1 commented May 6, 2024

GtfsScheduleFile.java

clukas1 left a comment • edited Loading

Choose a reason for hiding this comment

clukas1 May 6, 2024

Choose a reason for hiding this comment

clukas1 May 6, 2024

Choose a reason for hiding this comment

munterfi commented May 3, 2024 •

edited

Loading

munterfi commented May 3, 2024 •

edited

Loading

clukas1 commented May 4, 2024 •

edited

Loading

munterfi commented May 6, 2024 •

edited

Loading

munterfi commented May 6, 2024 •

edited

Loading

clukas1 left a comment •

edited

Loading