Schematic refactor testing #322

Closed
adamjtaylor opened this issue Nov 17, 2023 · 10 comments
@adamjtaylor
Contributor

adamjtaylor commented Nov 17, 2023

Confluence docs on testing

Timing for the Refactor testing and official launch:

  • Testing via CLI and API: Nov 16th - Dec 7th (Note: for those who haven't used the API before, please check it out and let me know if you have any questions)
  • Deploy to DCA Staging for final testing: Targeting Dec 7th (Will give a final call to do any additional testing on DCA)
  • Release to AWS Prod: Targeting Dec 14th

As always, reach out with any questions or concerns - feel free to use the #fair-data-tools slack channel or contact me directly.

@adamjtaylor
Contributor Author

Per whiteboarding -> "Testing via CLI and API: Nov 16th - Dec 7th (Note: for those who haven't used the API before, please check it out and let me know if you have any questions)" needs to be done by the 7th.

@aclayton555 aclayton555 added the effort-high This one's hard label Dec 1, 2023
@aclayton555
Contributor

aclayton555 commented Dec 1, 2023

Success criteria: Is everything working as it was before? Are manifests still being created as expected (e.g. columns, column order, etc).

Considerations:

  • Not trying to create any new type of output, just trying to improve processes.
  • This currently applies only to the schematic API and CLI, and has not been pushed to DCA.
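
One way to check the "columns, column order" criterion is to compare the header row of a manifest produced before the refactor with one produced after it. The sketch below assumes both manifests are available as CSV files. The function name `compare_headers` and the file names in the usage comment are hypothetical and not part of schematic.

```shell
#!/bin/sh
# Sketch: compare the header row of two manifest CSV files.
# Returns 0 when column names and column order are identical, 1 otherwise.
# Usage (hypothetical file names):
#   compare_headers before_refactor.csv after_refactor.csv
compare_headers() {
    # Take the first line of each file and strip any Windows carriage return
    _before=$(head -n 1 "$1" | tr -d '\r')
    _after=$(head -n 1 "$2" | tr -d '\r')

    if [ "$_before" = "$_after" ]; then
        echo "OK: headers match"
        return 0
    fi

    echo "MISMATCH between $1 and $2"
    echo "  before: $_before"
    echo "  after:  $_after"
    return 1
}
```

Because it compares the raw header line, this flags reordered columns as well as added or removed ones. It does not look at the data rows.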

@aclayton555
Contributor

Process: Develop script to test across all components. Heavier lift, but can be reused and add value for future, continued use.

Dedicate half day for this. Complete by Dec 15.

@adamjtaylor adamjtaylor mentioned this issue Dec 12, 2023
13 tasks
@adamjtaylor
Contributor Author

I've completed initial testing based on the following makefile.

These were all able to run so I don't see any immediate breaking errors.

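# fetch_manifests relies on bash-only quoting for the tab separator, so run recipes with bash
SHELL := /bin/bash

init: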
	schematic init --config config_example.yml

convert:
	schematic schema convert "data-models/HTAN.model.csv" > convert.log 2>&1

# JSON file containing the data
JSON_FILE := data-models/dca-template-config.json

# Extracting the schema_name values into a list
COMPONENTS := $(shell jq -r '.manifest_schemas[].schema_name' $(JSON_FILE))

# Target to run the command for each schema_name
get_templates:
	$(foreach comp, $(COMPONENTS), \
		schematic manifest -c config_example.yml get -dt $(comp) -s >$(comp)_stdout.log 2>$(comp)_stderr.log; \
	)

fetch_manifests:
	@echo "Fetching manifests..."
	@synapse query "SELECT * FROM syn20446927 WHERE name LIKE 'synapse_storage_manifest_%.csv'" > all_manifests.tsv

	@while IFS=$$'\t' read -r row_id row_version row_etag id name type currentVersion parentId benefactorId projectId createdBy createdOn modifiedOn modifiedBy dataFileHandleId etag allowedTeam fileFormat Component dataFileSizeBytes dataFileMD5Hex dataFileConcreteType dataFileBucket dataFileKey description dataFileName; do \
		if [ "$$row_id" != "ROW_ID" ]; then \
			echo "Processing ID: $$id"; \
			filepath=$$(synapse get "$$id" | grep 'Downloaded file:' | awk '{print $$NF}'); \
			echo "Downloaded File Path: $$filepath"; \
			if [ -f "$$filepath" ]; then \
				component=$$(awk -F, 'FNR == 2 {print $$1}' "$$filepath"); \
				echo "Validating against: $$component"; \
				schematic model --config config_example.yml validate -dt "$$component" -mp "$$filepath"; \
			else \
				echo "File $$filepath not found"; \
			fi; \
		fi; \
	done < all_manifests.tsv

@adamjtaylor
Contributor Author

Follow on tasks

  • Remove loops in the DAG (Model is not a DAG #332). This is not an immediate issue, but it will become an error rather than a warning in the future.
  • Consider adding this as a cron job to be run say weekly/monthly
  • Add error logging to the fetch_manifests step - this would warn us where there are manifests that are no longer compliant with the data model
  • Stretch: Consider if we can run this on all previous versions of the data model to get a global sense of rolling compliance
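
The error-logging idea could be sketched as a small wrapper that runs a command and records its output only when it fails. The helper name `log_failures` and the log file name are hypothetical. The sketch also assumes that the validation command exits non-zero when a manifest fails validation, which has not been confirmed for schematic.

```shell
#!/bin/sh
# Sketch: run a command and append its output to a log file only on failure.
# Usage: log_failures LOGFILE LABEL COMMAND [ARGS...]
# ASSUMPTION: the wrapped command signals failure with a non-zero exit status.
log_failures() {
    _logfile=$1
    _label=$2
    shift 2

    # Capture stdout and stderr together, then read the command's exit status
    _output=$("$@" 2>&1)
    _status=$?

    if [ "$_status" -ne 0 ]; then
        {
            echo "FAILED: $_label (exit $_status)"
            echo "$_output"
            echo "---"
        } >> "$_logfile"
    fi
    return "$_status"
}

# Example call inside the fetch_manifests loop (log file name is hypothetical):
#   log_failures validation_errors.log "$id" \
#       schematic model --config config_example.yml validate -dt "$component" -mp "$filepath"
```

For the scheduling idea, a crontab entry such as `0 6 * * 1 cd /path/to/checkout && make fetch_manifests` would run the check every Monday at 06:00. The path is a placeholder.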

@adamjtaylor
Contributor Author

@mialy-defelice FYI I think I am happy with the schematic refactor testing. It might be worth finding time for us to check in to discuss.

@mialy-defelice

@adamjtaylor Thanks so much for testing it out! Feel free to schedule some time on my calendar, it is up to date with my availability for at least the next month.

@aclayton555
Contributor

2024.01.04 mid sprint:
Keep open in Jan sprint. Adam will be meeting with Mialy in January to talk through this further.

Longer term, consider how all manifests can be validated against the data model with each update/release.

@aclayton555
Contributor

As we close out this sprint, outline additional actions (and issues) associated with required action communicated by Amy on 2024.01.22 (see Slack: https://sagebionetworks.slack.com/archives/C01ANC02U59/p1705959957507399)

All Schematic and DCA Users - This release includes the latest from the Schema Refactor (details in confluence here) and release notes here. Please pay special attention to the breaking change and required actions in the confluence page. Deployment of this Schematic version to DCA will be shared via the FAIR data release calendar invite.

@aclayton555
Contributor

  • v24.1.1 of schematic to be released to DCA in v24.2.1 on 2024.02.08
  • Adam has chatted with Mialy about the refactor. We are not anticipating any breaks in the latest version, but have picked up some warnings and looping structures that we should look at in an upcoming sprint. The model is currently not a DAG, so we will need to fix that in the future (Model is not a DAG #332)
