Toy Data (PEGASUS Framework applied) 🎠

Toy data

This toy data is an illustration of the type of data commonly found in PEG publications. It shows two loci identified by lead associations from a GWAS for the trait myocardial infarction. Each locus contains multiple nearby candidate effector genes (Gene 1–6).

The bottom table summarises the supporting evidence for each gene, including eQTLs, predicted functional impact, gene expression in aorta, gene prioritisation scores generated by PoPS, and the authors’ overall conclusion.

[Figure: Toy Data]

Importantly, in publications this type of evidence is often scattered across the main text and multiple supplementary tables, making it difficult to compare, integrate, or reproduce.

PEG Evidence Matrix

The PEG Evidence Matrix presents all evidence in a single structured table. The following table shows how the same information can be reformatted into a unified matrix.

| Primary Variant ID | rsID | GeneID | GeneSymbol | Locus Range | LocusID | GWAS_pvalue | FUNC_CADD | QTL_eQTL_aorta_pvalue | EXP_aorta_RPKM | PERTURB_mouse | INT_pops | INT_Combined prediction (author score) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chr1:100000:T:C | rs1234 | ENSG00000000001 | Gene 1 | chr1:99500-115000 | rs1234 | 4.00E-09 | 18.2 | 7.00E-07 | 8.7 | enlarged heart \| increased heart weight | 10 | STRONG |
| chr1:100000:T:C | rs1234 | ENSG00000000002 | Gene 2 | chr1:99500-115000 | rs1234 | 4.00E-09 | 3.45 | 0.01 | NA | NA | 3 | WEAK |
| chr1:100000:T:C | rs1234 | ENSG00000000003 | Gene 3 | chr1:99500-115000 | rs1234 | 4.00E-09 | 6.4 | 0.05 | NA | NA | 1 | WEAK |
| chr2:20000:A:G | rs5432 | ENSG00000000004 | Gene 4 | chr2:19000-21000 | rs5432 | 3.00E-08 | 15.62 | 8.00E-05 | 1.3 | NA | 7 | MODERATE |
| chr2:20000:A:G | rs5432 | ENSG00000000005 | Gene 5 | chr2:19000-21000 | rs5432 | 3.00E-08 | 2.13 | 0.2 | NA | NA | 5 | WEAK |
| chr2:20000:A:G | rs5432 | ENSG00000000006 | Gene 6 | chr2:19000-21000 | rs5432 | 3.00E-08 | 4.4 | 0.05 | NA | NA | 4 | WEAK |
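One practical benefit of a single structured table is that it can be loaded directly by a script. The following is a minimal sketch, assuming the matrix has been exported as CSV; the `load_matrix` helper, the `NUMERIC_COLUMNS` set, and the choice to map `NA` to `None` are illustrative assumptions, not part of the PEGASUS specification.

```python
import csv
import io

# Toy evidence matrix from this page, written as CSV (no cell contains a comma).
MATRIX_CSV = """\
Primary Variant ID,rsID,GeneID,GeneSymbol,Locus Range,LocusID,GWAS_pvalue,FUNC_CADD,QTL_eQTL_aorta_pvalue,EXP_aorta_RPKM,PERTURB_mouse,INT_pops,INT_Combined prediction (author score)
chr1:100000:T:C,rs1234,ENSG00000000001,Gene 1,chr1:99500-115000,rs1234,4.00E-09,18.2,7.00E-07,8.7,enlarged heart | increased heart weight,10,STRONG
chr1:100000:T:C,rs1234,ENSG00000000002,Gene 2,chr1:99500-115000,rs1234,4.00E-09,3.45,0.01,NA,NA,3,WEAK
chr1:100000:T:C,rs1234,ENSG00000000003,Gene 3,chr1:99500-115000,rs1234,4.00E-09,6.4,0.05,NA,NA,1,WEAK
chr2:20000:A:G,rs5432,ENSG00000000004,Gene 4,chr2:19000-21000,rs5432,3.00E-08,15.62,8.00E-05,1.3,NA,7,MODERATE
chr2:20000:A:G,rs5432,ENSG00000000005,Gene 5,chr2:19000-21000,rs5432,3.00E-08,2.13,0.2,NA,NA,5,WEAK
chr2:20000:A:G,rs5432,ENSG00000000006,Gene 6,chr2:19000-21000,rs5432,3.00E-08,4.4,0.05,NA,NA,4,WEAK
"""

# Columns parsed as numbers (an assumption made for this sketch).
NUMERIC_COLUMNS = {
    "GWAS_pvalue",
    "FUNC_CADD",
    "QTL_eQTL_aorta_pvalue",
    "EXP_aorta_RPKM",
    "INT_pops",
}


def load_matrix(text):
    """Parse the matrix: 'NA' becomes None, numeric columns become floats."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {}
        for column, value in raw.items():
            if value == "NA":
                row[column] = None
            elif column in NUMERIC_COLUMNS:
                row[column] = float(value)
            else:
                row[column] = value
        rows.append(row)
    return rows


matrix = load_matrix(MATRIX_CSV)

# Group candidate genes by locus to confirm the two-locus structure.
genes_by_locus = {}
for row in matrix:
    genes_by_locus.setdefault(row["LocusID"], []).append(row["GeneSymbol"])
print(genes_by_locus)
```

Keeping `NA` distinct from a numeric value matters here: a missing expression value means the evidence was not assessed, not that expression was zero.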

PEG List

The PEG List distils the matrix into a concise summary, highlighting the strongest candidate gene at each locus. The PEG List Foundational model records whether each evidence category was considered (tick = data present, blank = not assessed) and reflects the authors’ integrated conclusions for the top genes.

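| Primary Variant ID | GeneSymbol | GWAS (variant-centric) | FUNC (variant-centric) | QTL (variant-centric) | EXP (gene-centric) | PERTURB (gene-centric) | INT_Combined prediction (author score) |
|---|---|---|---|---|---|---|---|
| chr1:100000:T:C | Gene 1 | ✓ | ✓ | ✓ | ✓ | ✓ | STRONG |
| chr2:20000:A:G | Gene 4 | ✓ | ✓ | ✓ | ✓ | | MODERATE |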

Tick = data/value present. Blank = not assessed. Ticks do NOT imply supportive vs negative.

Author conclusions and provenance are summarised here; detailed information for each evidence category is available in the evidence matrix.
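The relationship between the evidence matrix and the PEG List can be sketched in a few lines. This is an illustration only: it assumes the top gene per locus is the one with the highest author score (with the ordering WEAK < MODERATE < STRONG), whereas in practice the conclusion is the authors' own judgement. The names `build_peg_list`, `SCORE_RANK`, `EVIDENCE_COLUMNS`, and the shortened `author_score` key are hypothetical.

```python
# Evidence category -> matrix column, following the toy matrix column names.
EVIDENCE_COLUMNS = {
    "GWAS": "GWAS_pvalue",
    "FUNC": "FUNC_CADD",
    "QTL": "QTL_eQTL_aorta_pvalue",
    "EXP": "EXP_aorta_RPKM",
    "PERTURB": "PERTURB_mouse",
}

# Assumed ordering of the author scores used in the toy data.
SCORE_RANK = {"WEAK": 0, "MODERATE": 1, "STRONG": 2}

COLUMNS = ("Primary Variant ID", "GeneSymbol", "LocusID", "GWAS_pvalue",
           "FUNC_CADD", "QTL_eQTL_aorta_pvalue", "EXP_aorta_RPKM",
           "PERTURB_mouse", "author_score")

# Subset of the toy matrix; None stands for "NA" (not assessed).
ROWS = [
    ("chr1:100000:T:C", "Gene 1", "rs1234", 4e-9, 18.2, 7e-7, 8.7,
     "enlarged heart | increased heart weight", "STRONG"),
    ("chr1:100000:T:C", "Gene 2", "rs1234", 4e-9, 3.45, 0.01, None, None, "WEAK"),
    ("chr1:100000:T:C", "Gene 3", "rs1234", 4e-9, 6.4, 0.05, None, None, "WEAK"),
    ("chr2:20000:A:G", "Gene 4", "rs5432", 3e-8, 15.62, 8e-5, 1.3, None, "MODERATE"),
    ("chr2:20000:A:G", "Gene 5", "rs5432", 3e-8, 2.13, 0.2, None, None, "WEAK"),
    ("chr2:20000:A:G", "Gene 6", "rs5432", 3e-8, 4.4, 0.05, None, None, "WEAK"),
]
matrix = [dict(zip(COLUMNS, row)) for row in ROWS]


def build_peg_list(rows):
    """Keep the top-scoring gene per locus and flag which evidence is present."""
    best = {}
    for row in rows:
        locus = row["LocusID"]
        current = best.get(locus)
        if current is None or (
            SCORE_RANK[row["author_score"]] > SCORE_RANK[current["author_score"]]
        ):
            best[locus] = row

    peg_list = []
    for row in best.values():
        entry = {
            "Primary Variant ID": row["Primary Variant ID"],
            "GeneSymbol": row["GeneSymbol"],
            "author_score": row["author_score"],
        }
        # A tick means a value is present; it says nothing about direction.
        for category, column in EVIDENCE_COLUMNS.items():
            entry[category] = row[column] is not None
        peg_list.append(entry)
    return peg_list


peg_list = build_peg_list(matrix)
for entry in peg_list:
    print(entry)
```

The presence flags are computed purely from whether a value exists, which mirrors the rule that ticks do not imply supportive versus negative evidence.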

PEG Metadata

PEG Metadata provides the detailed context behind the PEGASUS Matrix, recording column definitions, provenance, biosamples, and methods so that PEG evidence is fully interpretable and reproducible. Here, the metadata is presented in:

  • (i) a tabular format suitable for submission to a resource or presentation as a supplementary table in a publication, and
  • (ii) a machine-readable format suitable for download from a data resource and re-use in an automated pipeline.

PEG Metadata in tabular format (suitable for submission)

| peg_source | gwas_source | trait_description | trait_ontology_id | sample_description | sample_size | case_control_study | sample_ancestry | sample_ancestry_label |
|---|---|---|---|---|---|---|---|---|
| PMID:36357675 | PMID:36357675 | Ascorbic acid 3-sulfate levels | EFO_0800173 | 6,136 Finnish ancestry individuals | 6136 | False | Finland | European |

| source_tag | provenance | file_name | version | url | accesstion | doi | tissue | sample_origin | cell_type | cell_line | disease | life_stage | treatment | sex | age | species | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| source_cadd | CADD | All possible SNVs of GRCh38/hg38 incl. all annotations | v1.7 | link | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| source_gtex_eqtl | GTEx | GTEx_Analysis_v10_eQTL.tar | v10 | link | NA | NA | aorta | primary tissue | NA | NA | healthy | adult | None | mixed | mixed | Homo sapiens | Bulk aorta tissue samples from healthy adult human donors in GTEx v10. Used for eQTL discovery. Donors aged ~20–70 years, male and female. |
| source_gtex_aorta_RNA | GTEx | GTEx_Analysis_v10_RNASeQCv2.4.2_gene_tpm.gct.gz | v10 | link | NA | NA | aorta | primary tissue | NA | NA | healthy | adult | None | mixed | mixed | Homo sapiens | Bulk aorta tissue samples from healthy postmortem adult human donors in GTEx v10. Used for RNA expression profiling. Donors aged ~20–70 years, male and female. |
| source_impc | IMPC | IMPC_genotype_phenotype.csv.gz | 23 | link | NA | NA | multiple | IMPC mouse knockout models | NA | NA | NA | mixed | gene knockout | mixed | mixed | Mus musculus | Mice with single-gene knockouts generated by the IMPC project. |

PEG Metadata in YAML (suitable for reader)

Storing the metadata as YAML keeps all information on one page in a structured format that is both human- and machine-readable, so users can easily search and extract the details they need.
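To illustrate how the tabular metadata could map onto YAML, the sketch below serialises two source entries using only the Python standard library. The layout is hypothetical: the top-level `sources` key, the nesting by `source_tag`, and the `to_yaml` helper are assumptions made for illustration and do not reproduce the actual PEG metadata file. String values are written with `json.dumps`, relying on the fact that JSON strings are valid YAML double-quoted scalars.

```python
import json

# Two entries taken from the tabular metadata, keyed by source_tag.
# Only a few fields are shown; "NA" is kept verbatim.
sources = {
    "source_cadd": {
        "provenance": "CADD",
        "file_name": "All possible SNVs of GRCh38/hg38 incl. all annotations",
        "version": "v1.7",
        "tissue": "NA",
        "species": "NA",
    },
    "source_gtex_eqtl": {
        "provenance": "GTEx",
        "file_name": "GTEx_Analysis_v10_eQTL.tar",
        "version": "v10",
        "tissue": "aorta",
        "species": "Homo sapiens",
    },
}


def to_yaml(mapping, indent=0):
    """Return block-style YAML lines for a nested dict of strings."""
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            # Nested mapping: emit the key, then indent its children.
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 2))
        else:
            # JSON string quoting is also valid YAML double-quoted syntax.
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return lines


# "sources" is a hypothetical top-level key chosen for this sketch.
yaml_text = "\n".join(to_yaml({"sources": sources}))
print(yaml_text)
```

Quoting every scalar avoids ambiguity for values such as `NA` or `None`, which a YAML parser might otherwise interpret differently from a plain string.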
