Investigate how to integrate new FRA report type into data parsing pipeline #3400

Closed
11 tasks
lhuxraft opened this issue Jan 2, 2025 · 0 comments


@lhuxraft
Collaborator

lhuxraft commented Jan 2, 2025

Background

This spike aims to explore and determine the best approach for integrating the new FRA report type into the data parsing pipeline. The focus will be on understanding the requirements for handling the new report type and evaluating how to modify the current parsing logic to accommodate these changes. The investigation should clarify whether new parsing functions or updates to existing logic are necessary, and how to handle reparsing gracefully for FRA files.

Acceptance Criteria

  • A clear understanding of the changes required to the parsing pipeline to handle the new FRA report type.
  • A proposed approach for implementing the parsing logic for FRA files (e.g., adding a dedicated function vs. extending the current one).
  • A list of potential challenges or technical concerns, including how to handle reparsing.
  • Documentation of any recommended changes to the DataFile model or related components.
  • A decision on how a new FRA schema definition will be implemented for parsing.

Tasks

  • Review the current data parsing pipeline and identify areas that need to be modified for the new FRA report type.
  • Investigate the best approach for integrating the new FRA sections into the existing parsing functions (e.g., should parse_FRA() be created, or should we extend parse_datafile()?).
  • Determine if reparsing logic needs to be updated to handle FRA files differently from other data files.
  • Explore whether the existing RowSchema and Field classes can support creating FRA schema definitions, or whether modifications or new classes are needed.
  • Document any technical constraints, limitations, or challenges that may impact the implementation.
  • Explore creating a separate Python package for schema_defs to support versioning, related to Enhance seed_db to be schema-aware and support versioning #3168, and add versioning to records (T1 records, ParserError, and other objects).
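One way to picture versioned schema definitions is a registry keyed by program type, record type and schema version, where each parsed record is stamped with the version that produced it so a later reparse knows which rules applied. This is a minimal sketch under stated assumptions: SchemaRegistry, stamp_record, the FRA_SECTION record type and the case_number field are hypothetical names, not TANF-app code, and whether such a registry lives in a separate package is exactly what the task above should decide.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical registry key: (program_type, record_type, schema_version).
# These names are illustrative only and do not come from the TANF-app codebase.
SchemaKey = Tuple[str, str, int]


@dataclass
class SchemaRegistry:
    """Holds schema definitions so several versions can coexist side by side."""
    _schemas: Dict[SchemaKey, Any] = field(default_factory=dict)

    def register(self, program_type: str, record_type: str, version: int, schema: Any) -> None:
        key = (program_type, record_type, version)
        if key in self._schemas:
            raise ValueError(f"schema already registered for {key}")
        self._schemas[key] = schema

    def versions(self, program_type: str, record_type: str) -> List[int]:
        # All registered versions for one record type, oldest first.
        return sorted(v for (p, r, v) in self._schemas if (p, r) == (program_type, record_type))

    def get(self, program_type: str, record_type: str,
            version: Optional[int] = None) -> Tuple[int, Any]:
        # Without an explicit version, fall back to the newest registered one.
        available = self.versions(program_type, record_type)
        if not available:
            raise LookupError(f"no schema registered for {program_type}/{record_type}")
        chosen = available[-1] if version is None else version
        if chosen not in available:
            raise LookupError(f"{program_type}/{record_type} has no version {chosen}")
        return chosen, self._schemas[(program_type, record_type, chosen)]


def stamp_record(record: Dict[str, Any], version: int) -> Dict[str, Any]:
    # Attach the schema version so a later reparse knows which rules produced the record.
    return {**record, "schema_version": version}


registry = SchemaRegistry()
registry.register("TANF", "T1", 1, "t1 schema v1")
registry.register("TANF", "T1", 2, "t1 schema v2")
registry.register("FRA", "FRA_SECTION", 1, "fra schema v1")  # placeholder record type

version, schema = registry.get("TANF", "T1")
record = stamp_record({"case_number": "0001"}, version)
```

Keeping old versions registered, rather than overwriting them, is what would let records parsed under an older schema be reparsed or audited against the rules that originally produced them.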

Supporting Documentation

Error report mockup. Note that the validation rules section in the mockup is for reference only and is not part of the spec.
[Image: error report mockup (not reproduced)]

Frontend screenshots:

[Four frontend screenshots (not reproduced)]

Open Questions

  • Does creating a parse_FRA() function make more sense than trying to shoehorn FRA functionality into parse_datafile()?
    • Refactor parse_datafile() to be modular, with a specific parameter/identifier for FRA or TANF, and split off more functions underneath parse_datafile() to handle the differing types.
  • Is adding an FRA schema definition sufficient for parsing, given the three report types that are still undefined?
    • Creating specific subclasses of ParserError, Field, etc. appears to be necessary.
    • These subclasses would call upon a parent class for the file type and/or the schema_def itself.
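The modular option in the first open question can be sketched as a thin parse_datafile() that keeps its role as the single entry point but delegates to a per-program function, with parse_FRA() becoming one of those delegates. This is a minimal sketch, not the current implementation: ProgramType, DataFileStub, parse_tanf, parse_fra and the PARSERS table are hypothetical, and DataFileStub only stands in for the real DataFile model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class ProgramType(Enum):
    # Hypothetical identifier; the real DataFile model may encode this differently.
    TANF = "TANF"
    FRA = "FRA"


@dataclass
class DataFileStub:
    """Minimal stand-in for the DataFile model, with only what the dispatcher needs."""
    program_type: ProgramType
    lines: List[str]


def parse_tanf(datafile: DataFileStub) -> dict:
    # Placeholder standing in for the existing TANF parsing logic.
    return {"program": "TANF", "records": len(datafile.lines)}


def parse_fra(datafile: DataFileStub) -> dict:
    # Placeholder for a dedicated FRA flow with its own row layout and error report.
    return {"program": "FRA", "records": len(datafile.lines)}


# One table maps each program type to its parser; a new type means a new entry here.
PARSERS: Dict[ProgramType, Callable[[DataFileStub], dict]] = {
    ProgramType.TANF: parse_tanf,
    ProgramType.FRA: parse_fra,
}


def parse_datafile(datafile: DataFileStub) -> dict:
    """Stable entry point for callers; the actual work is delegated per program type."""
    try:
        parser = PARSERS[datafile.program_type]
    except KeyError:
        raise ValueError(f"unsupported program type: {datafile.program_type}") from None
    return parser(datafile)
```

The point of this shape is that callers such as a reparse command would not need to know which program a file belongs to; only the dispatch table grows when a new report type is added.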

Ticket/work pairing

  1. Tech memo on refactoring parse_datafile() to move away from the functional programming approach toward a class-based design, using a parent "Parser" class to handle data file types.
    • Cover the broad changes needed in unit testing.
    • Brief documentation of how error report generation will differ for FRA under Version A (inline errors on the existing upload): different rows/columns, etc.
  2. Tech memo on parser engine versioning (via a Python package or another mechanism).
    • Microservice app exploration? A separate git repo, decoupled from TANF-app, with versioning managed by DRF?
  3. Tech memo on new RowSchema and Field classes, along with other new subclasses and rework.
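The class-based direction in the first tech memo could look like a parent Parser class that owns the shared row loop and error collection, with TANF and FRA subclasses supplying only their validation rules, and an FRA-specific error subclass that also records a column for inline reporting. This is a hypothetical sketch: Parser, TanfParser, FraParser, ParserErrorStub, FraParserErrorStub, the column labels and the validation rules are all illustrative placeholders, not the real ParserError model or FRA layout.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParserErrorStub:
    """Hypothetical stand-in for the ParserError model; not its real fields."""
    row_number: int
    message: str


@dataclass
class FraParserErrorStub(ParserErrorStub):
    # FRA errors would be shown inline on the uploaded sheet, so keep a column label too.
    column: str = ""


class Parser(ABC):
    """Hypothetical parent class: owns the row loop; subclasses own file-type rules."""

    def __init__(self, rows: List[List[str]]):
        self.rows = rows
        self.errors: List[ParserErrorStub] = []

    def parse(self) -> int:
        # Template method: the loop is shared, row validation is delegated.
        accepted = 0
        for row_number, row in enumerate(self.rows, start=1):
            error = self.validate_row(row_number, row)
            if error is None:
                accepted += 1
            else:
                self.errors.append(error)
        return accepted

    @abstractmethod
    def validate_row(self, row_number: int, row: List[str]) -> Optional[ParserErrorStub]:
        """Return an error for a bad row, or None if the row is accepted."""


class TanfParser(Parser):
    def validate_row(self, row_number: int, row: List[str]) -> Optional[ParserErrorStub]:
        # Placeholder rule: each row holds one raw line, which must not be blank.
        if not row or not row[0].strip():
            return ParserErrorStub(row_number, "blank record")
        return None


class FraParser(Parser):
    # Placeholder column labels; the real FRA layout is not defined in this ticket.
    COLUMNS = ("A", "B")

    def validate_row(self, row_number: int, row: List[str]) -> Optional[ParserErrorStub]:
        # Pad short rows so a missing trailing cell is reported against its column.
        padded = row + [""] * len(self.COLUMNS)
        for label, value in zip(self.COLUMNS, padded):
            if not value.strip():
                return FraParserErrorStub(row_number, "required value missing", column=label)
        return None


fra = FraParser([["x", "y"], ["x", ""], ["x"]])
accepted = fra.parse()
```

Here FraParser accepts one row and records two FraParserErrorStub entries, both against column B. Because FraParserErrorStub subclasses ParserErrorStub, shared code can treat both error kinds uniformly while the FRA error report reads the extra column field.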
lhuxraft changed the title from "Investigate correct method for parsing FRA" to "Investigate how to integrate new FRA report type into data parsing pipeline" on Jan 2, 2025.
lhuxraft added the "office hours" and "Refined" (Ticket has been refined at the backlog refinement) labels on Jan 3, 2025.
andrew-jameson removed the "office hours" and "Refined" labels on Jan 8, 2025.