We have a ProcessingOutcome base class, but no JobResult base class. As a result, we reimplement very similar versions of the following code in every single job/task (specifically: the schedule download job, schedule validation, RT parsing, RT validation, schedule unzipping, and schedule conversion to JSON).
We should add a base class with this and then just subclass it in each instance to avoid copy/pasted code.
class JobResult(PartitionedGCSArtifact):
    bucket: ClassVar[str]
    table: ClassVar[str]
    partition_names: ClassVar[List[str]]
    outcomes: List[ProcessingOutcome]

    @property
    def successes(self) -> List[ProcessingOutcome]:
        return [outcome for outcome in self.outcomes if outcome.success]

    @property
    def failures(self) -> List[ProcessingOutcome]:
        return [outcome for outcome in self.outcomes if not outcome.success]

    def save(self, fs):
        self.save_content(
            fs=fs,
            content="\n".join(o.json() for o in self.outcomes).encode(),
            exclude={"outcomes"},
        )
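For illustration, here is a rough sketch of what a per-job subclass could shrink to once the base class exists. It is not the real implementation: `ProcessingOutcome` and `JobResult` below are minimal stdlib stand-ins so the snippet runs on its own (the real `JobResult` would extend `PartitionedGCSArtifact` and call `save_content`), and `ScheduleDownloadJobResult`, the bucket, the table name, and the partition names are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import ClassVar, List


@dataclass
class ProcessingOutcome:
    # Stand-in for the existing ProcessingOutcome base class (assumption).
    success: bool
    message: str = ""

    def json(self) -> str:
        return json.dumps(asdict(self))


@dataclass
class JobResult:
    # Stand-in for the proposed base class. Subclasses only fill in the
    # three class-level attributes; everything else is inherited.
    bucket: ClassVar[str]
    table: ClassVar[str]
    partition_names: ClassVar[List[str]]
    outcomes: List[ProcessingOutcome] = field(default_factory=list)

    @property
    def successes(self) -> List[ProcessingOutcome]:
        return [o for o in self.outcomes if o.success]

    @property
    def failures(self) -> List[ProcessingOutcome]:
        return [o for o in self.outcomes if not o.success]

    def content(self) -> bytes:
        # Newline-delimited JSON, the same bytes the proposed save() would
        # pass to save_content().
        return "\n".join(o.json() for o in self.outcomes).encode()


class ScheduleDownloadJobResult(JobResult):
    # Hypothetical values, for illustration only.
    bucket = "gs://example-bucket"
    table = "schedule_download_results"
    partition_names = ["dt", "ts"]


result = ScheduleDownloadJobResult(
    outcomes=[ProcessingOutcome(True), ProcessingOutcome(False, "boom")]
)
```

With this shape, each of the six jobs listed above would only declare its own `bucket`, `table`, and `partition_names`.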
So far we've been using .jsonl.gz for data and .jsonl for outcomes. We could continue this pattern and enforce it in the type; uncompressed outcomes are mildly easier to inspect, and they really shouldn't be that large anyway.
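One way "enforce it in the type" could look is a check at construction time that rejects anything but a plain .jsonl filename. This is only a sketch under assumptions: it uses a stdlib dataclass instead of the real base class, and the `filename` field and the `OutcomesFile` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OutcomesFile:
    # Hypothetical stand-in; assumes the artifact carries a filename field.
    filename: str

    def __post_init__(self) -> None:
        # Outcomes stay uncompressed, so "results.jsonl.gz" is rejected too.
        if not self.filename.endswith(".jsonl"):
            raise ValueError(
                f"outcomes must be saved as .jsonl, got {self.filename!r}"
            )


ok = OutcomesFile(filename="results.jsonl")

try:
    OutcomesFile(filename="results.jsonl.gz")
    rejected = False
except ValueError:
    rejected = True
```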