Commit
Add `model_embedding_dim` argument to Dataset constructors (#107)
This PR makes DXSM datasets specific to the embedding dimension of the model that the dataset is intended for. Minimal changes in dnsm-experiments-1 were required, and these are in matsengrp/dnsm-experiments-1#78.

* `load_pcp_df` and associated functions will load pcp_df without inserting any special token scaffolding into the sequences. We will always maintain separate heavy and light-chain columns wherever applicable.
* There will be a free function that takes perhaps pairs of heavy- and light-chain sequences and scaffolds them with special tokens so they can be presented to the model. This function will take a `known_token_count` so that it knows how to process the sequences.
* This free function will be called by the Dataset constructor, which will also still need to accept the `known_token_count` parameter.
* Calling the `model` forward/represent functions directly (or through `model.__call__`) will require the user to do any sequence token scaffolding on their own, perhaps using the free function mentioned above.
* Calling the `Crepe.__call__` function on sequences will do the required scaffolding automatically. Perhaps we'll have to strip out the model predictions for special token sites, so the outputs match the input sequence lengths? Erick suggests input format of something like `crepe([(heavy, None), (heavy1, light1), ...])` to allow heavy and light chain sequences to be passed to the crepe for proper scaffolding.
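To make the plan above concrete, here is a minimal sketch of what the scaffolding free function and the output-stripping step could look like. It is not the code from this PR: the names `scaffold_pairs` and `strip_special_sites`, the `^` separator, the token ordering, and the heavy-only fallback for models that do not know the separator are all assumptions made for illustration. Only the `known_token_count` parameter and the `[(heavy, None), (heavy1, light1), ...]` input shape come from the description.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical token alphabet: 20 amino acids, an ambiguity symbol, and then
# one special separator token. The ordering and the "^" symbol are assumptions.
AA_TOKENS = "ACDEFGHIKLMNPQRSTVWYX"
SEPARATOR = "^"
TOKEN_ORDER = AA_TOKENS + SEPARATOR

ChainPair = Tuple[str, Optional[str]]


def scaffold_pair(heavy: str, light: Optional[str], known_token_count: int) -> str:
    """Join one heavy/light pair into a single model-ready string.

    A model "knows" the separator only if its token count reaches past the
    separator's index in TOKEN_ORDER (an assumed convention).
    """
    knows_separator = known_token_count > TOKEN_ORDER.index(SEPARATOR)
    if not knows_separator:
        # Assumed fallback: a model without the separator sees heavy chain only.
        return heavy
    return heavy + SEPARATOR + (light or "")


def scaffold_pairs(pairs: Sequence[ChainPair], known_token_count: int) -> List[str]:
    """Scaffold a batch of (heavy, light-or-None) pairs."""
    return [scaffold_pair(h, l, known_token_count) for h, l in pairs]


def strip_special_sites(scaffolded: str, per_site_values: Sequence) -> list:
    """Drop per-site outputs at special-token positions so the result lines
    up with the original (unscaffolded) residues."""
    return [v for tok, v in zip(scaffolded, per_site_values) if tok != SEPARATOR]


pairs = [("QVQ", None), ("EVQ", "DIQ")]
scaffolded = scaffold_pairs(pairs, known_token_count=len(TOKEN_ORDER))
```

Under these assumptions, a Dataset constructor or a `Crepe.__call__`-style wrapper would call `scaffold_pairs` before encoding, and the wrapper would apply `strip_special_sites` to each per-site output so that its length matches the residues the caller passed in.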
Showing 14 changed files with 670 additions and 199 deletions.