fixup: document joint struct
rileyhgrant committed Aug 12, 2024
1 parent 7ed499e commit e202f3f
Showing 3 changed files with 22 additions and 6 deletions.
12 changes: 12 additions & 0 deletions browser/help/topics/v4-browser-hts.md
@@ -84,6 +84,18 @@ Row fields:
- `metric`: Metric name.
- `value`: Metric value.
- `genome`: Struct containing information about this variant from genome data. Contains all the same fields as the exome data, with the exception that the subsets are (`all`, `hgdp`, `tgp`) instead of (`all`, `non_ukb`).
- `joint`: Struct containing information about this variant for the joint exome and genome data.
- `freq`: A struct containing variant frequency information.
- `all`: Struct containing variant frequency information calculated across the combined (joint) gnomAD exomes and genomes. Contains the same fields as the exomes' `freq.all` struct.
- `faf`: Array of combined exomes and genomes filtering allele frequency information. See `faf` description on the v4 Hail Tables [help page](/v4-hts#joint-faf).
- `fafmax`: Struct containing information about the maximum FAF. Contains the same fields as the exomes' `fafmax.gnomad` struct.
- `grpmax`: Allele frequency information for the non-bottlenecked genetic ancestry group with the maximum allele frequency. See `grpmax` description on the v4 Hail Tables [help page](/v4-hts#joint-grpmax).
- `histograms`: Variant information histograms from the joint gnomAD exomes and genomes. See `histograms` description on the v4 Hail Tables [help page](v4-hts#joint-histograms).
- `qual_hists`: Genotype quality metric histograms for high quality genotypes. See the v4 Hail Tables [help page](v4-hts#joint-histograms).
- `raw_qual_hists`: Genotype quality metric histograms for all genotypes, as opposed to only high quality genotypes. See the v4 Hail Tables [help page](v4-hts#joint-histograms).
- `age_hists`: Histograms containing age information for release samples. See the v4 Hail Tables [help page](v4-hts#joint-age-histograms).
- `flags`: Set of flags describing the joint exome and genome data. Possible values are `discrepant_frequencies`, `not_called_in_exomes`, and `not_called_in_genomes`.
- `freq_comparison_stats`: Struct containing results from contingency table and Cochran-Mantel-Haenszel tests comparing allele frequencies between the gnomAD exomes and genomes. See `freq_comparison_stats` description on the v4 Hail Tables [help page](/v4-hts#joint-freq-comparison-stats).
- `rsids`: dbSNP reference SNP identification (rsID) numbers.
- `in_silico_predictors`: Variant prediction annotations. Struct contains prediction scores from multiple in silico predictors. See `in_silico_predictors` description on the v4 Hail Tables [help page](v4-hts#in-silico-predictors).
- `variant_id`: gnomAD variant ID.
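As a rough illustration of how the `joint.flags` set might be consumed downstream, the sketch below partitions variants by flag. Rows are modeled as plain Python dicts rather than Hail rows, the reading of each flag is inferred from its name, and the `partition_by_flags` helper and the `var1`/`var2`/`var3` IDs are hypothetical.

```python
from typing import Iterable

# Flag values documented for the `joint.flags` set.
KNOWN_FLAGS = {"discrepant_frequencies", "not_called_in_exomes", "not_called_in_genomes"}


def partition_by_flags(rows: Iterable[dict]) -> dict:
    """Group variant IDs by how the joint exome/genome data flagged them.

    Each row is a hypothetical dict shaped like
    {"variant_id": str, "joint": {"flags": set}}.
    """
    groups = {
        "both_data_types": [],   # no "not_called_in_*" flag set
        "single_data_type": [],  # not called in exomes or not called in genomes
        "discrepant": [],        # exome and genome frequencies flagged as discrepant
    }
    for row in rows:
        flags = row["joint"]["flags"]
        unknown = flags - KNOWN_FLAGS
        if unknown:
            raise ValueError(f"unexpected flags: {sorted(unknown)}")
        if flags & {"not_called_in_exomes", "not_called_in_genomes"}:
            groups["single_data_type"].append(row["variant_id"])
        else:
            groups["both_data_types"].append(row["variant_id"])
        if "discrepant_frequencies" in flags:
            groups["discrepant"].append(row["variant_id"])
    return groups


rows = [
    {"variant_id": "var1", "joint": {"flags": set()}},
    {"variant_id": "var2", "joint": {"flags": {"not_called_in_genomes"}}},
    {"variant_id": "var3", "joint": {"flags": {"discrepant_frequencies"}}},
]
print(partition_by_flags(rows))
```

Here `var2` lands in `single_data_type`, while `var3` is counted as called in both data types and is also listed under `discrepant`.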
10 changes: 5 additions & 5 deletions browser/help/topics/v4-hts.md
@@ -551,16 +551,16 @@ Row fields
- `AF`: Combined (exomes + genomes) alternate allele frequency, (AC/AN), in release.
- `AN`: Total number of alleles across exomes and genomes in release.
- `homozygote_count`: Count of homozygous alternate individuals across exomes and genomes in release.
- `grpmax`: Allele frequency information (AC, AN, AF, homozygote count) for the non-bottlenecked genetic ancestry group with maximum allele frequency across both exomes and genomes. Excludes Amish (`ami`), Ashkenazi Jewish (`asj`), European Finnish (`fin`), and "Remaining individuals" (`remaining`) groups.
- <a id="joint-grpmax"></a>`grpmax`: Allele frequency information (AC, AN, AF, homozygote count) for the non-bottlenecked genetic ancestry group with maximum allele frequency across both exomes and genomes. Excludes Amish (`ami`), Ashkenazi Jewish (`asj`), European Finnish (`fin`), and "Remaining individuals" (`remaining`) groups.
- `AC`: Alternate allele count in the group with the maximum allele frequency.
- `AF`: Maximum alternate allele frequency, (AC/AN), across groups in gnomAD.
- `AN`: Total number of alleles in the group with the maximum allele frequency.
- `homozygote_count`: Count of homozygous individuals in the group with the maximum allele frequency.
- `gen_anc`: Genetic ancestry group with maximum allele frequency.
- `faf`: Array of combined exomes and genomes filtering allele frequency information (AC, AN, AF, homozygote count). Note that the values in the array correspond to the joint (combined) value if the variant has a defined filtering allele frequency in both data types; otherwise, the array contains filtering allele frequencies only for the data type associated with the Hail Table (in this case, exomes).
- <a id="joint-faf"></a>`faf`: Array of combined exomes and genomes filtering allele frequency information (AC, AN, AF, homozygote count). Note that the values in the array correspond to the joint (combined) value if the variant has a defined filtering allele frequency in both data types; otherwise, the array contains filtering allele frequencies only for the data type associated with the Hail Table (in this case, exomes).
- `faf95`: Combined exomes and genomes filtering allele frequency (using Poisson 95% CI).
- `faf99`: Combined exomes and genomes filtering allele frequency (using Poisson 99% CI).
- `histograms`: Variant information histograms of the combined (joint) gnomAD exomes and genomes.
- <a id="joint-histograms"></a>`histograms`: Variant information histograms of the combined (joint) gnomAD exomes and genomes.
- `qual_hists`: Genotype quality metric histograms for high quality genotypes.
- `gq_hist_all`: Histogram for GQ calculated on high quality genotypes.
- `bin_edges`: Bin edges for the GQ histogram calculated on high quality genotypes are: 0|5|10|15|20|25|30|35|40|45|50|55|60|65|70|75|80|85|90|95|100.
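The combined `AF` and the `grpmax` selection described in this hunk can be sketched in plain Python. This assumes the joint counts are the sums of the exome and genome counts (consistent with the `AN` description) and applies the documented exclusion of `ami`, `asj`, `fin`, and `remaining`; the `GroupCounts` type, the input layout, and the example counts are hypothetical, and this is not the gnomAD pipeline's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Groups excluded from grpmax, per the `grpmax` field description.
EXCLUDED_GROUPS = {"ami", "asj", "fin", "remaining"}


@dataclass
class GroupCounts:
    ac: int  # alternate allele count
    an: int  # total number of alleles
    homozygote_count: int

    @property
    def af(self) -> float:
        # AF = AC / AN; treated as 0.0 when no alleles were observed.
        return self.ac / self.an if self.an > 0 else 0.0


def combine(exomes: GroupCounts, genomes: GroupCounts) -> GroupCounts:
    """Joint counts: sum exome and genome counts; AF then follows as AC / AN."""
    return GroupCounts(
        ac=exomes.ac + genomes.ac,
        an=exomes.an + genomes.an,
        homozygote_count=exomes.homozygote_count + genomes.homozygote_count,
    )


def grpmax(groups: Dict[str, GroupCounts]) -> Optional[dict]:
    """Return the non-bottlenecked group with the highest AF, or None."""
    candidates = {
        name: counts
        for name, counts in groups.items()
        # Assumption: groups with no alternate alleles are not candidates.
        if name not in EXCLUDED_GROUPS and counts.ac > 0
    }
    if not candidates:
        return None
    gen_anc, best = max(candidates.items(), key=lambda item: item[1].af)
    return {"AC": best.ac, "AN": best.an, "AF": best.af,
            "homozygote_count": best.homozygote_count, "gen_anc": gen_anc}


# Hypothetical per-group counts (exomes, genomes) for a single variant.
groups = {
    "afr": combine(GroupCounts(10, 1000, 0), GroupCounts(5, 500, 1)),
    "nfe": combine(GroupCounts(20, 4000, 0), GroupCounts(4, 1000, 0)),
    "fin": combine(GroupCounts(40, 800, 1), GroupCounts(10, 200, 1)),
}
print(grpmax(groups))
```

In this example `fin` has the highest combined AF (50/1000) but is excluded, so `afr` (15/1500) is reported.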
@@ -613,7 +613,7 @@ Row fields
- `bin_freq`: Bin frequencies for the histogram of AB in heterozygous individuals calculated on all genotypes. This is the number of records found in each bin.
- `n_smaller`: Count of AB values in heterozygous individuals falling below the lowest histogram bin edge, calculated on all genotypes.
- `n_larger`: Count of AB values in heterozygous individuals falling above the highest histogram bin edge, calculated on all genotypes.
- `age_hists`: Histograms containing age information for release samples.
- <a id="joint-age-histograms"></a>`age_hists`: Histograms containing age information for release samples.
- `age_hist_het`: Histogram for age in all heterozygous release samples calculated on high quality genotypes.
- `bin_edges`: Bin edges for the age histogram.
- `bin_freq`: Bin frequencies for the age histogram. This is the number of records found in each bin.
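The `bin_edges`, `bin_freq`, `n_smaller`, and `n_larger` fields fit together as in the sketch below, which bins values using the 0 to 100 edges listed for the GQ histogram. It assumes Hail's `hl.agg.hist` convention (bins are left-closed and right-open, except the last bin, which also includes the upper edge); the `histogram` function and the sample values are hypothetical.

```python
import bisect
from typing import Iterable


def histogram(values: Iterable[float], edges: list) -> dict:
    """Bin values using `edges`, tallying out-of-range values separately.

    Bins are [edges[i], edges[i + 1]) except the last, which also includes edges[-1].
    """
    bin_freq = [0] * (len(edges) - 1)
    n_smaller = n_larger = 0
    for v in values:
        if v < edges[0]:
            n_smaller += 1
        elif v > edges[-1]:
            n_larger += 1
        elif v == edges[-1]:
            bin_freq[-1] += 1  # the last bin is closed on the right
        else:
            # bisect_right returns the index of the first edge strictly greater than v.
            bin_freq[bisect.bisect_right(edges, v) - 1] += 1
    return {"bin_edges": edges, "bin_freq": bin_freq,
            "n_smaller": n_smaller, "n_larger": n_larger}


gq_edges = list(range(0, 101, 5))  # 0|5|10|...|100, i.e. 20 bins
result = histogram([-1, 0, 4.9, 5, 99, 100, 120], gq_edges)
print(result["bin_freq"], result["n_smaller"], result["n_larger"])
```

Every input value is accounted for exactly once: `sum(bin_freq) + n_smaller + n_larger` equals the number of values.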
@@ -624,7 +624,7 @@ Row fields
- `bin_freq`: Bin frequencies for the age histogram. This is the number of records found in each bin.
- `n_smaller`: Count of age values falling below the lowest histogram bin edge.
- `n_larger`: Count of age values falling above the highest histogram bin edge.
- `freq_comparison_stats`: Struct containing results from contingency table and Cochran-Mantel-Haenszel tests comparing allele frequencies between the gnomAD exomes and genomes.
- <a id="joint-freq-comparison-stats"></a>`freq_comparison_stats`: Struct containing results from contingency table and Cochran-Mantel-Haenszel tests comparing allele frequencies between the gnomAD exomes and genomes.
- `contingency_table_test`: Array of results from Hail's [`contingency_table_test`](https://hail.is/docs/0.2/functions/stats.html#hail.expr.functions.contingency_table_test) with `min_cell_count=100` comparing allele frequencies between exomes and genomes. Each element in the array corresponds to the comparison of a specific frequency aggregation group defined by the `joint.freq_meta` global field.
- `odds_ratio`: Odds ratio from the contingency table test.
- `p_value`: P-value from the contingency table test.
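The fallback rule in the `faf` description (joint values when the variant has a defined filtering allele frequency in both data types, otherwise values for the table's own data type) can be expressed as a small selection function. This is a sketch only: `None` stands in for an undefined FAF, and the `select_faf` name, the dict layout, and the placeholder numbers are hypothetical.

```python
from typing import Optional


def select_faf(exome_faf: Optional[dict], genome_faf: Optional[dict],
               joint_faf: Optional[dict]) -> Optional[dict]:
    """Pick the FAF entry reported on the exomes Hail Table.

    Use the joint (exomes + genomes) value only when FAF is defined in both
    data types; otherwise fall back to the exome value (which may be None).
    """
    if exome_faf is not None and genome_faf is not None:
        return joint_faf
    return exome_faf


# Placeholder values for illustration only.
exome = {"faf95": 1.2e-4, "faf99": 9.0e-5}
genome = {"faf95": 3.0e-4, "faf99": 2.1e-4}
joint = {"faf95": 1.5e-4, "faf99": 1.1e-4}

print(select_faf(exome, genome, joint))  # defined in both: joint values
print(select_faf(exome, None, joint))    # genome FAF undefined: exome values
```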
@@ -393,7 +393,11 @@ def freq_joint(ds, subset=None, pop=None, sex=None, raw=False):

def prepare_table_for_release(variants_table_path):
    ds = hl.read_table(variants_table_path)
    ds = ds.annotate(exomes=ds.exomes.drop("faf95", "faf99"), genomes=ds.genomes.drop("faf95", "faf99"))
    ds = ds.annotate(
        exomes=ds.exomes.drop("faf95", "faf99"),
        genomes=ds.genomes.drop("faf95", "faf99"),
        joint=ds.joint.drop("faf99_joint", "faf95_joint"),
    )
    ds = ds.select_globals(mane_select_version=ds.globals.mane_transcripts_version)
    return ds
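To make the effect of the updated `prepare_table_for_release` concrete without a Hail installation, the sketch below mirrors its field drops and global selection on plain nested dicts. The `drop`, `prepare_row_for_release`, and `prepare_globals_for_release` helpers and the example field layout are hypothetical stand-ins for Hail's `StructExpression.drop` and `Table.select_globals`, not their actual implementation; raising on unknown field names is a choice made here.

```python
def drop(struct: dict, *fields: str) -> dict:
    """Return a copy of `struct` without `fields` (raises on unknown names)."""
    missing = [f for f in fields if f not in struct]
    if missing:
        raise KeyError(f"cannot drop missing fields: {missing}")
    return {k: v for k, v in struct.items() if k not in fields}


def prepare_row_for_release(row: dict) -> dict:
    # Mirrors the annotate() call: drop the FAF fields from each per-data-type struct.
    return {
        **row,
        "exomes": drop(row["exomes"], "faf95", "faf99"),
        "genomes": drop(row["genomes"], "faf95", "faf99"),
        "joint": drop(row["joint"], "faf99_joint", "faf95_joint"),
    }


def prepare_globals_for_release(globals_: dict) -> dict:
    # Mirrors select_globals(): only the listed field is kept, under its new name.
    return {"mane_select_version": globals_["mane_transcripts_version"]}


# Hypothetical row with placeholder values.
row = {
    "variant_id": "var1",
    "exomes": {"ac": 3, "faf95": 0.1, "faf99": 0.05},
    "genomes": {"ac": 1, "faf95": 0.2, "faf99": 0.1},
    "joint": {"ac": 4, "faf95_joint": 0.15, "faf99_joint": 0.08},
}
print(prepare_row_for_release(row))
print(prepare_globals_for_release({"mane_transcripts_version": "v1", "other": 1}))
```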
