Fix for frequency in multi-study view #11331

Open
wants to merge 9 commits into
base: master
from

eugeniomazzone

Fix #10967

Describe changes proposed in this pull request:

  • AlterationCountServiceUtil.setupAlterationGeneCountsMap now initializes the number of profiled cases (numberOfProfiledCases) to 0.
  • Added code in AlterationCountServiceImpl, run after the function above is called, that sets the correct number of profiled samples for each study and each gene.

After the fix, the frequency displayed in the multi-study view is correct: 23.8% as expected, instead of 100%.
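
The two steps described above can be illustrated with a minimal sketch. The class and method names here (FrequencySketch, StudyCounts, GeneTotal, merge) are hypothetical and are not part of cBioPortal; the sketch also assumes every sample in a study is profiled for every gene, which the real code, with its gene panels, does not assume. It only shows the idea of starting each gene's profiled count at 0 and then adding every study's profiled-sample count to every gene, including genes with no alteration in that study.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins; none of these types exist in cBioPortal.
class FrequencySketch {

    /** One study's input: its profiled-sample count and altered-sample counts per gene. */
    static final class StudyCounts {
        final int profiledSamples;
        final Map<String, Integer> alteredByGene;

        StudyCounts(int profiledSamples, Map<String, Integer> alteredByGene) {
            this.profiledSamples = profiledSamples;
            this.alteredByGene = alteredByGene;
        }
    }

    /** Merged per-gene totals across all queried studies. */
    static final class GeneTotal {
        int altered;
        int profiled; // starts at 0 and is filled in by the second pass of merge()

        double frequency() {
            return profiled == 0 ? 0.0 : (double) altered / profiled;
        }
    }

    static Map<String, GeneTotal> merge(List<StudyCounts> studies) {
        Map<String, GeneTotal> total = new LinkedHashMap<>();

        // Pass 1: sum the altered counts per gene; profiled counts stay at 0.
        for (StudyCounts study : studies) {
            study.alteredByGene.forEach((gene, altered) -> {
                total.computeIfAbsent(gene, g -> new GeneTotal()).altered += altered;
            });
        }

        // Pass 2: every study contributes its profiled samples to every gene,
        // whether or not the gene was altered in that study.
        for (StudyCounts study : studies) {
            for (GeneTotal geneTotal : total.values()) {
                geneTotal.profiled += study.profiledSamples;
            }
        }
        return total;
    }
}
```

With made-up inputs of 10 altered EGFR samples out of 40 in one study and none out of 20 in a second study, merge gives 10 / 60 (about 16.7%), whereas counting only the first study's samples in the denominator would give 10 / 40 (25%).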

Before fix:
Before1
Before2

After fix:
After1
After2

@zeynepkaragoz
Contributor

@inodb Eugenio worked on fixing one of the open issues, could you point us to some potential reviewers please?

@eugeniomazzone eugeniomazzone marked this pull request as ready for review January 13, 2025 16:20
@inodb inodb requested review from alisman and haynescd January 13, 2025 19:55
@inodb
Member

inodb commented Jan 13, 2025

@haynescd @alisman this is a fix in legacy code for determining alteration frequency in multi-study query. Thoughts?

Comment on lines 291 to 316
    List<S> studyAlterationCountByGenes = dataFetcher.apply(studyMolecularProfileCaseIdentifiers);
    if (includeFrequency) {
        Long studyProfiledCasesCount = includeFrequencyFunction.apply(studyMolecularProfileCaseIdentifiers, studyAlterationCountByGenes);
        profiledCasesCount.updateAndGet(v -> v + studyProfiledCasesCount);
    }
    Map<String, S> studyResult = new HashMap<>();
    studyAlterationCountByGenes.forEach(datum -> {
        String key = datum.getUniqueEventKey();
        studyResult.put(key, datum);
    });
    List<S> allGene = new ArrayList<>(totalResult.values());
    allGene.forEach(datum -> {
        String key = datum.getUniqueEventKey();
        S alterationCountByGene = totalResult.get(key);
        alterationCountByGene.setNumberOfProfiledCases(alterationCountByGene.getNumberOfProfiledCases() + studyMolecularProfileCaseIdentifiers.size());
        Set<String> matchingGenePanelIds = new HashSet<>();
        if (!alterationCountByGene.getMatchingGenePanelIds().isEmpty()) {
            matchingGenePanelIds.addAll(alterationCountByGene.getMatchingGenePanelIds());
        }
        if (!datum.getMatchingGenePanelIds().isEmpty()) {
            matchingGenePanelIds.addAll(datum.getMatchingGenePanelIds());
        }
        alterationCountByGene.setMatchingGenePanelIds(matchingGenePanelIds);
        totalResult.put(key, alterationCountByGene);
    });
});
Collaborator

Let's break this out into functions instead of one big anonymous function. Also, can we add some comments?

Author

@eugeniomazzone eugeniomazzone Jan 20, 2025

Moved the code to a separate function updateAlterationGeneCountsMap mimicking setupAlterationGeneCountsMap (see new commit at https://github.com/eugeniomazzone/cbioportal/tree/master)
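
For readers following the review, here is a rough sketch of what breaking the anonymous function out into a named, commented helper could look like. This is not the code from the commit linked above. Only the name updateAlterationGeneCountsMap comes from the comment; the signature, the GeneCount interface, the SimpleGeneCount class, and the helper indexByKey are all assumptions made for illustration, and the real AlterationCountBase has more fields than the interface shown here. The sketch also merges gene panel IDs from the study's own entry for a gene, which is a design choice of the sketch rather than confirmed behavior of the PR.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: every type here is a hypothetical stand-in.
class AlterationCountSketch {

    /** Hypothetical minimal view of a per-gene count object. */
    interface GeneCount {
        String getUniqueEventKey();
        int getNumberOfProfiledCases();
        void setNumberOfProfiledCases(int n);
        Set<String> getMatchingGenePanelIds();
        void setMatchingGenePanelIds(Set<String> ids);
    }

    /** Simple mutable implementation, used only to exercise the sketch. */
    static final class SimpleGeneCount implements GeneCount {
        private final String key;
        private int profiled;
        private Set<String> panelIds;

        SimpleGeneCount(String key, int profiled, Set<String> panelIds) {
            this.key = key;
            this.profiled = profiled;
            this.panelIds = new HashSet<>(panelIds);
        }

        public String getUniqueEventKey() { return key; }
        public int getNumberOfProfiledCases() { return profiled; }
        public void setNumberOfProfiledCases(int n) { profiled = n; }
        public Set<String> getMatchingGenePanelIds() { return panelIds; }
        public void setMatchingGenePanelIds(Set<String> ids) { panelIds = ids; }
    }

    /**
     * Adds one study's profiled-case count to every gene already present in
     * totalResult and merges gene panel IDs from the study's entry, if any.
     */
    static <S extends GeneCount> void updateAlterationGeneCountsMap(
            Map<String, S> totalResult, List<S> studyResults, int studyProfiledCases) {
        Map<String, S> studyByKey = indexByKey(studyResults);
        for (S total : totalResult.values()) {
            // Every gene gets the study's samples in its denominator.
            total.setNumberOfProfiledCases(total.getNumberOfProfiledCases() + studyProfiledCases);

            // Union the panel IDs only when this study reported the gene.
            S fromStudy = studyByKey.get(total.getUniqueEventKey());
            if (fromStudy != null) {
                Set<String> merged = new HashSet<>(total.getMatchingGenePanelIds());
                merged.addAll(fromStudy.getMatchingGenePanelIds());
                total.setMatchingGenePanelIds(merged);
            }
        }
    }

    /** Indexes a study's results by unique event key; the first entry wins on duplicates. */
    private static <S extends GeneCount> Map<String, S> indexByKey(List<S> results) {
        Map<String, S> byKey = new HashMap<>();
        for (S datum : results) {
            byKey.putIfAbsent(datum.getUniqueEventKey(), datum);
        }
        return byKey;
    }
}
```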

@@ -148,7 +148,8 @@ public static <S extends AlterationCountBase> void setupAlterationGeneCountsMap(
 S alterationCountByGene = totalResult.get(key);
 alterationCountByGene.setTotalCount(alterationCountByGene.getTotalCount() + datum.getTotalCount());
 alterationCountByGene.setNumberOfAlteredCases(alterationCountByGene.getNumberOfAlteredCases() + datum.getNumberOfAlteredCases());
-alterationCountByGene.setNumberOfProfiledCases(alterationCountByGene.getNumberOfProfiledCases() + datum.getNumberOfProfiledCases());
+alterationCountByGene.setNumberOfProfiledCases(0);
+//alterationCountByGene.setNumberOfProfiledCases(alterationCountByGene.getNumberOfProfiledCases() + datum.getNumberOfProfiledCases());
Collaborator

Why is it just commented out?

Author

@eugeniomazzone eugeniomazzone Jan 20, 2025

Old line is now removed (see new commit at https://github.com/eugeniomazzone/cbioportal/tree/master)

@fuzhaoyuan fuzhaoyuan self-requested a review January 22, 2025 15:37
@fuzhaoyuan
Contributor

fuzhaoyuan commented Jan 22, 2025

Tests are not passing. Could you run mvn clean package in your local environment? @eugeniomazzone

@alisman
Contributor

alisman commented Jan 24, 2025

It looks like legacy is not reporting "not_profiled" status correctly. Compare the following curl with /api/ vs /api/column-store/ and you'll see the difference. The legacy implementation in master does report the not-profiled samples.

curl 'http://localhost:8082/api/column-store/mutation-data-counts/fetch?projection=SUMMARY' \
  -H 'Accept: application/json' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: application/json' \
  -H 'Cookie: JSESSIONID=0B4EB4ED93E52DB8CA6991CBFAAA88CB' \
  -H 'Origin: http://localhost:8082' \
  -H 'Pragma: no-cache' \
  -H 'Referer: http://localhost:8082/study/summary?id=brca_tcga_pan_can_atlas_2018' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36' \
  -H 'sec-ch-ua: "Google Chrome";v="131", "Chromium";v="131", "Not_A Brand";v="24"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  --data-raw '{"genomicDataFilters":[{"hugoGeneSymbol":"EGFR","profileType":"mutations"}],"studyViewFilter":{"mutationDataFilters":[{"hugoGeneSymbol":"EGFR","profileType":"mutations","values":[[{"value":"NOT_MUTATED"}]],"categorization":"MUTATED"}],"studyIds":["brca_tcga_pan_can_atlas_2018"],"alterationFilter":{"copyNumberAlterationEventTypes":{"AMP":true,"HOMDEL":true},"mutationEventTypes":{"any":true},"structuralVariants":null,"includeDriver":true,"includeVUS":true,"includeUnknownOncogenicity":true,"includeUnknownTier":true,"includeGermline":true,"includeSomatic":true,"includeUnknownStatus":true,"tiersBooleanMap":{}}}}'
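
As a scripted alternative to editing the curl command by hand, the comparison could be sketched in Java as below. The class name EndpointCompareSketch and the method buildRequest are hypothetical. The legacy path is assumed to be the same path with the /api/ prefix in place of /api/column-store/, based only on the wording of the comment above; the browser-specific headers and the session cookie from the curl command are omitted, and a cookie may be needed on an authenticated instance.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper for building the same POST against both API prefixes.
class EndpointCompareSketch {

    // Path and query string taken from the curl command above.
    static final String FETCH_PATH = "mutation-data-counts/fetch?projection=SUMMARY";

    /**
     * Builds the POST request for either the column-store endpoint or the
     * legacy endpoint (assumed here to live directly under /api/).
     */
    static HttpRequest buildRequest(String baseUrl, boolean columnStore, String jsonBody) {
        String prefix = columnStore ? "/api/column-store/" : "/api/";
        return HttpRequest.newBuilder(URI.create(baseUrl + prefix + FETCH_PATH))
                .header("Accept", "application/json")
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

Each request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), passing the --data-raw payload above as jsonBody, and the two response bodies compared.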

@eugeniomazzone
Author

eugeniomazzone commented Feb 3, 2025

Today I looked at the tests, which I hadn't done before. I added the new function to the tests and initialized a MolecularProfileCaseIdentifier with generic names for each sample to make it work. Finally, the tests now specify that some functions are called twice.

Successfully merging this pull request may close these issues.

Mutation count in combined studies with genepanels seems to be incorrect
6 participants