Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addressing review feedback: Implementing changes #6

Merged
merged 8 commits into from
May 6, 2024

Conversation

munterfi
Copy link
Member

@munterfi munterfi commented May 3, 2024

Review by @clukas1:

GtfsScheduleFile.java

How about adding a second boolean attribute required? Then one could decide wether to raise an exception and log an error while reading from direcotry / zip archive in GtfsScheduleReader, instead of just logging a warning.

GtfsScheduleParser.java

  • ⁠parseRoute, how do you mean that route types are not standardized? you mean even the data type is not defined, i.e. one can define an integer or string or only that integer values do not always map correctly to the enum type in RouteType? In case of the latter, do you mean that 0 could be tram and rail at the same time? Or that just new arbitrary numbers appear when parsing (e.g. 13 - super random train type?). For the new random number problem, I would add a new type UNKNOWN which would apply for all values that not match one of the defined values. If same numbers have different meanings, there's probably no added value if we import route_type at all. And if there's strings and ints mixed, I would think we could try parsing them differently --> if number check enum directly, if string try checking the enum names in lowercase with whitespace/underscores removed if any matches are found. If non of that works, use UNKNOWN.
    Maybe also interesting: https://opentransportdata.swiss/de/cookbook/gtfs/#routestxt
    And: https://developers.google.com/transit/gtfs/reference/extended-route-types

  • ⁠parseTrip, there is direction_id (typically 0 or 1) to indicate if the trip is going from a to b or from b to a. Maybe even different stop sequences are indicated with values 1+ (haven't checked). I think this would be the bare minimum we need for RAPTOR, because different direction_ids are by definition already a new route in RAPTOR. However, I think we still have to go further and split routes one by one after parsing them to identify all different stop sequences and create raptor compatible routes.

GtfsScheduleBuilder.java

  • ⁠Why is this in the model directory/package?
  • As already mentioned in the comment for parser and parseTrip, I think we need to add more logic to the initialization of the Route Objects. Since they will typically contain at least two different stop sequences and should therefore be treated as separate routes in RAPTOR. Maybe instead of initialize, have a route splitter method which returns a collection of routes from one route which can then be concatenated to the clean routes list.
    --> Some interesting links: https://developers.google.com/transit/gtfs/data-modeling/route-modeling-guide
    --> Also see youtube video (3 min 18s): https://youtu.be/AdArDN4E6Hg?si=r0eKXZPpdZ27YwL-&t=198

StopTime.java

Minor comment and can be ignored since there is no logical error, why are we comparing with departure and not stop_sequence. I thought it's required and would be the exact thing it's meant for. However the gtfs page is down I right now and I couldn't look this up (same goes for direction_id).

- Introduce presence enum with levels: REQUIRED, OPTIONAL, CONDITIONALLY_REQUIRED, CONDITIONALLY_FORBIDDEN, RECOMMENDED.
- Throw FileNotFoundException if a required file is not provided.
@munterfi munterfi self-assigned this May 3, 2024
@munterfi
Copy link
Member Author

munterfi commented May 3, 2024

  • ⁠Why is this in the model directory/package?

Since the construtors of the model entities are package private, which restricts to use the builder to modify a GtfsSchedule. However, semantically the builer itself is not part of the model. But i dont know a different way to achieve this behavior this in Java.

@munterfi
Copy link
Member Author

munterfi commented May 3, 2024

Minor comment and can be ignored since there is no logical error, why are we comparing with departure and not stop_sequence. I thought it's required and would be the exact thing it's meant for. However the gtfs page is down I right now and I couldn't look this up (same goes for direction_id).

The sequence information is redundant given the departure time; therefore, I omitted it in the StopTime class. The sequence is implicitly encoded in the sorted array after building the schedule. However, there could be issues down the road since departure_time is only conditionally required.

One important difference, if we base equality on the sequence information instead of departure time, is that we then cannot compare stop times of different trips anymore, which is done, for example, when initializing the stops in the schedule: https://github.com/naviqore/raptor/blob/0cc5f65cfced412642a53b7975b97952a632efb4/src/main/java/ch/naviqore/gtfs/schedule/model/Stop.java#L27

I suggest we keep it as it is but keep in mind the possibility of trouble with conditionally required departure times if we encounter null pointer exceptions someday.

@munterfi
Copy link
Member Author

munterfi commented May 3, 2024

Good point! I think this functionality belongs to the GtfsToRaptorConverter:
https://github.com/naviqore/raptor/blob/0cc5f65cfced412642a53b7975b97952a632efb4/src/main/java/ch/naviqore/raptor/GtfsToRaptorConverter.java#L27

Perhaps we could inject the schedule into the Converter, allowing it to preprocess the schedule using a RouteSplitter class. This way, when extracting a Raptor for a given day, the preprocessing step wouldn't need to be repeated every time.

@munterfi
Copy link
Member Author

munterfi commented May 3, 2024

  • ⁠parseTrip, there is direction_id (typically 0 or 1) to indicate if the trip is going from a to b or from b to a. Maybe even different stop sequences are indicated with values 1+ (haven't checked). I think this would be the bare minimum we need for RAPTOR, because different direction_ids are by definition already a new route in RAPTOR. However, I think we still have to go further and split routes one by one after parsing them to identify all different stop sequences and create raptor compatible routes.

Since the direction_id is optional in the trips.txt, i would rather keep this via the implicit order of the stops via departure time and handle this case in the RouteSplitter.

munterfi added 2 commits May 3, 2024 16:15
- Introduce interface for route type.
- Implement with two enums: DefaultRouteType and HierarchicalVehicleType
- Parser decides depending on code which type to choose.
@munterfi
Copy link
Member Author

munterfi commented May 3, 2024

  • ⁠parseRoute, how do you mean that route types are not standardized? you mean even the data type is not defined, i.e. one can define an integer or string or only that integer values do not always map correctly to the enum type in RouteType? In case of the latter, do you mean that 0 could be tram and rail at the same time? Or that just new arbitrary numbers appear when parsing (e.g. 13 - super random train type?). For the new random number problem, I would add a new type UNKNOWN which would apply for all values that not match one of the defined values. If same numbers have different meanings, there's probably no added value if we import route_type at all. And if there's strings and ints mixed, I would think we could try parsing them differently --> if number check enum directly, if string try checking the enum names in lowercase with whitespace/underscores removed if any matches are found. If non of that works, use UNKNOWN.
    Maybe also interesting: https://opentransportdata.swiss/de/cookbook/gtfs/#routestxt
    And: https://developers.google.com/transit/gtfs/reference/extended-route-types

Solved by 6273663

munterfi added 4 commits May 3, 2024 17:44
- Add test extension for parameter injection of the builder.
- Adjust existing test.
- Create larger example schedule with more calendar variations and both directions.
- Increase test coverage concerning boundary cases in next departures (midnight), nearest stops (same location query) and active trips (dates outside validity or no service day).
- Add IntelliJ code style to project.
- Reformat complete project.
…uence

- Add GtfsRoutePartitioner and corresponding test.
- Adjust GtfsToRaptorConverter to use the GtfsRoutePartitioner.
- Fix bug in GtfsScheduleTestBuilder; reversed routes also have reversed stops sequences.
@munterfi
Copy link
Member Author

munterfi commented May 4, 2024

Good point! I think this functionality belongs to the GtfsToRaptorConverter:

https://github.com/naviqore/raptor/blob/0cc5f65cfced412642a53b7975b97952a632efb4/src/main/java/ch/naviqore/raptor/GtfsToRaptorConverter.java#L27

Perhaps we could inject the schedule into the Converter, allowing it to preprocess the schedule using a RouteSplitter class. This way, when extracting a Raptor for a given day, the preprocessing step wouldn't need to be repeated every time.

Solved by f2479d3, introduces GtfsRoutePartitioner.

@munterfi
Copy link
Member Author

munterfi commented May 4, 2024

Hi @clukas1 and @Brunner246, this PR should address all issues in the review.

@munterfi munterfi requested review from clukas1 and Brunner246 and removed request for clukas1 and Brunner246 May 4, 2024 16:11
@clukas1
Copy link
Member

clukas1 commented May 4, 2024

  • ⁠parseRoute, how do you mean that route types are not standardized? you mean even the data type is not defined, i.e. one can define an integer or string or only that integer values do not always map correctly to the enum type in RouteType? In case of the latter, do you mean that 0 could be tram and rail at the same time? Or that just new arbitrary numbers appear when parsing (e.g. 13 - super random train type?). For the new random number problem, I would add a new type UNKNOWN which would apply for all values that not match one of the defined values. If same numbers have different meanings, there's probably no added value if we import route_type at all. And if there's strings and ints mixed, I would think we could try parsing them differently --> if number check enum directly, if string try checking the enum names in lowercase with whitespace/underscores removed if any matches are found. If non of that works, use UNKNOWN.
    Maybe also interesting: https://opentransportdata.swiss/de/cookbook/gtfs/#routestxt
    And: https://developers.google.com/transit/gtfs/reference/extended-route-types

Solved by 6273663

Some minor comments:

  1. What is the supported attribute used for in HierarchicalVerhicalType?
  2. Why do we call it Hierarchical if it doesn't include any reference to it's parent DefaultRouteType? It seems the numbers make the parent type not derivable by simple division. How about adding a attribute parent of type DefaultRouteType? So all Railway related types would have parent DefaultRouteType.RAIL etc.
  3. Same question as everywhere, should we really fail hard if the route type can not be matched? Or should we introduce an UNKNOWN enum value?

@clukas1
Copy link
Member

clukas1 commented May 4, 2024

GtfsScheduleFile.java

How about adding a second boolean attribute required? Then one could decide wether to raise an exception and log an error while reading from direcotry / zip archive in GtfsScheduleReader, instead of just logging a warning.

Addressed by @munterfi in 050da21. Nice work! Looking at conditionally_required made me think if we should add a check to check for these conditions. For now either a calendar.txt or a calendar_dates.txt file has to be present at least.

@munterfi
Copy link
Member Author

munterfi commented May 6, 2024

  1. What is the supported attribute used for in HierarchicalVerhicalType?

That's just if Google Maps supports the transport mode. The information was in the table, and i thought maybe at some point its important, but i dont think we need it at the moment. So i will delete it (5958bad), but we still have it in the history.

@munterfi
Copy link
Member Author

munterfi commented May 6, 2024

2. Why do we call it Hierarchical if it doesn't include any reference to it's parent DefaultRouteType? It seems the numbers make the parent type not derivable by simple division. How about adding a attribute parent of type DefaultRouteType? So all Railway related types would have parent DefaultRouteType.RAIL etc.

Hierarchical Vehicle Type (HVT) is the name from the codes of the European TPEG standard. So there is no connection between the original GTFS route types and the HVTs.

There are some mappings avaiblabe here, unfortunately they are not complete:

Existing Value - Meaning - Corresponding New Value
0 - Tram, Light Rail, Streetcar - 900
1 - Subway, Metro - 400
2 - Rail - 100
3 - Bus - 700
4 - Ferry - 1000
5 - Cable Car - 1701
6 - Gondola, Suspended cable car - 1300
7 - Funicular - 1400

Since we don't really need the relationship or mode in general for the Raptor at this point, I would suggest keeping these two types separate and ensure that we can access their values (code description) through the common interface "RouteType".

@munterfi
Copy link
Member Author

munterfi commented May 6, 2024

3. Same question as everywhere, should we really fail hard if the route type can not be matched? Or should we introduce an UNKNOWN enum value?

I would adhere to the GTFS specification and throw an exception (at least in the builder) if a schedule is invalid. Currently, we only process required files and attributes, and we should treat them as guaranteed.

@munterfi
Copy link
Member Author

munterfi commented May 6, 2024

In general, I would advocate for representing the GTFS as closely as possible to the specification and separating the logic needed for Raptor into separate classes that operate on the GTFS schedule (or interface, see below). Since the specification may change in the future, the more we decouple our logic from the GTFS formats, the easier it will be to adapt later.

Maybe it would make sense in future to introduce a more general Schedule interface, implemented by GtfsSchedule, this would also allow to create Raptor from different transit schedule / timetable data format (e.g. MATSim or HAFAS Rohdaten Format HRDF). But I think this is out of scope for this thesis.

@munterfi
Copy link
Member Author

munterfi commented May 6, 2024

GtfsScheduleFile.java

How about adding a second boolean attribute required? Then one could decide wether to raise an exception and log an error while reading from direcotry / zip archive in GtfsScheduleReader, instead of just logging a warning.

Addressed by @munterfi in 050da21. Nice work! Looking at conditionally_required made me think if we should add a check to check for these conditions. For now either a calendar.txt or a calendar_dates.txt file has to be present at least.

I would move this into a new issue and merge this PR, since it is already quite extensive.

@clukas1
Copy link
Member

clukas1 commented May 6, 2024

GtfsScheduleFile.java

How about adding a second boolean attribute required? Then one could decide wether to raise an exception and log an error while reading from direcotry / zip archive in GtfsScheduleReader, instead of just logging a warning.

Addressed by @munterfi in 050da21. Nice work! Looking at conditionally_required made me think if we should add a check to check for these conditions. For now either a calendar.txt or a calendar_dates.txt file has to be present at least.

I would move this into a new issue and merge this PR, since it is already quite extensive.

Added to NAV-26

Copy link
Member

@clukas1 clukas1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some very minor ideas. But think this pull request is more than good enough to be merged.

private GtfsSchedule schedule;
private GtfsRoutePartitioner partitioner;

@BeforeEach
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use @BeforeAll and make setUp, schedule, partitioner static. Since this data is not manipulated by any test, I don't see why whe should rebuild the GTFS Schedule instance everytime.

.isNotNull();
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add test, to make sure that the expected number of trips are found in each (or specific cases) subroute instance.

@munterfi munterfi merged commit efb2890 into main May 6, 2024
1 check passed
@munterfi munterfi deleted the review/gtfs-read-parse-builder branch May 6, 2024 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants