PEG Example on Toy Data 🎠

Toy data description

Toy Data

This Toy Data shows two loci identified by lead associations from a GWAS for the trait myocardial infarction. Each locus contains multiple nearby candidate effector genes (Genes 1–6).

The bottom table summarises the supporting evidence for each gene, including eQTLs, predicted functional impact, gene expression in aorta, gene prioritisation scores generated by PoPS, and the authors’ overall conclusion.

Importantly, in publications this type of evidence is often scattered across the main text and multiple supplementary tables, making it difficult to compare, integrate, or reproduce.

PEG Evidence Matrix

In the PEG Matrix, we propose presenting all evidence in a single structured table. The following table illustrates how the same information can be reformatted into a unified matrix.

| Primary Variant ID | rsID | Gene ID | Gene symbol | Locus range | Locus ID | GWAS_pvalue | FUNC_CADD | QTL_eQTL_aorta_pvalue | EXP_aorta_RPKM | PERTURB_mouse | INT_pops | INT_Combined prediction (author score) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| chr1:100000:T:C | rs1234 | ENSG00000000001 | Gene 1 | chr1:99500-115000 | rs1234 | 4.00E-09 | 18.2 | 7.00E-07 | 8.7 | enlarged heart \| increased heart weight | 10 | STRONG |
| chr1:100000:T:C | rs1234 | ENSG00000000002 | Gene 2 | chr1:99500-115000 | rs1234 | 4.00E-09 | 3.45 | 0.01 | NA | NA | 3 | WEAK |
| chr1:100000:T:C | rs1234 | ENSG00000000003 | Gene 3 | chr1:99500-115000 | rs1234 | 4.00E-09 | 6.4 | 0.05 | NA | NA | 1 | WEAK |
| chr2:20000:A:G | rs5432 | ENSG00000000004 | Gene 4 | chr2:19000-21000 | rs5432 | 3.00E-08 | 15.62 | 8.00E-05 | 1.3 | NA | 7 | MODERATE |
| chr2:20000:A:G | rs5432 | ENSG00000000005 | Gene 5 | chr2:19000-21000 | rs5432 | 3.00E-08 | 2.13 | 0.2 | NA | NA | 5 | WEAK |
| chr2:20000:A:G | rs5432 | ENSG00000000006 | Gene 6 | chr2:19000-21000 | rs5432 | 3.00E-08 | 4.4 | 0.05 | NA | NA | 4 | WEAK |
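Because the matrix is one flat table with one row per variant-gene pair, it can be processed with ordinary tabular tooling. The sketch below is only an illustration: it copies a few columns of the toy matrix into a CSV string and picks, for each locus, the gene with the highest `INT_pops` value. Ranking by `INT_pops` alone is an assumption made for this example; the authors' combined prediction integrates several lines of evidence and need not follow a single column. The function name `top_gene_per_locus` is hypothetical.

```python
import csv
import io

# Subset of the toy PEG matrix columns, copied from the table above.
MATRIX_CSV = """\
rsID,Gene symbol,Locus ID,INT_pops,INT_Combined prediction (author score)
rs1234,Gene 1,rs1234,10,STRONG
rs1234,Gene 2,rs1234,3,WEAK
rs1234,Gene 3,rs1234,1,WEAK
rs5432,Gene 4,rs5432,7,MODERATE
rs5432,Gene 5,rs5432,5,WEAK
rs5432,Gene 6,rs5432,4,WEAK
"""


def top_gene_per_locus(csv_text):
    """Return {locus_id: row}, keeping the row with the highest INT_pops per locus.

    Assumption: INT_pops is numeric and present for every row, as in the toy data.
    """
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        locus = row["Locus ID"]
        score = float(row["INT_pops"])
        if locus not in best or score > float(best[locus]["INT_pops"]):
            best[locus] = row
    return best


top = top_gene_per_locus(MATRIX_CSV)
for locus, row in top.items():
    print(locus, row["Gene symbol"], row["INT_Combined prediction (author score)"])
```

On the toy data this selects Gene 1 for locus rs1234 and Gene 4 for locus rs5432.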

PEG List

The PEG List distils the matrix into a concise summary, highlighting the strongest candidate gene at each locus.

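The evidence columns (GWAS, FUNC, QTL, EXP, PERTURB) sit under two header groups, Variant-centric and Gene-centric.

| rsID | Gene symbol | GWAS | FUNC | QTL | EXP | PERTURB | INT_Combined prediction (author score) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| rs1234 | Gene 1 | ✓ | ✓ | ✓ | ✓ | ✓ | STRONG |
| rs5432 | Gene 4 | ✓ | ✓ | ✓ | ✓ |  | STRONG |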

Tick = data/value present (VAL). Blank = not assessed (NA). Ticks do not indicate whether the evidence is supportive or negative; see the author interpretation and provenance.

The PEGASUS List (Foundational model) records whether each type of evidence was considered (tick = data present, blank = not assessed) and reflects the authors’ integrated conclusions for the top genes.
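The tick rule can be expressed as a simple presence check on the matrix columns. The following is a minimal sketch, not part of the PEG tooling: the mapping from evidence category to matrix column and the function name `evidence_ticks` are assumptions for illustration, and the two rows are copied from the toy matrix.

```python
# Hypothetical mapping from PEG List category to the toy matrix column it summarises.
EVIDENCE_COLUMNS = {
    "GWAS": "GWAS_pvalue",
    "FUNC": "FUNC_CADD",
    "QTL": "QTL_eQTL_aorta_pvalue",
    "EXP": "EXP_aorta_RPKM",
    "PERTURB": "PERTURB_mouse",
}

# Evidence values for the two top genes, copied from the toy matrix.
gene1 = {
    "GWAS_pvalue": "4.00E-09",
    "FUNC_CADD": "18.2",
    "QTL_eQTL_aorta_pvalue": "7.00E-07",
    "EXP_aorta_RPKM": "8.7",
    "PERTURB_mouse": "enlarged heart | increased heart weight",
}
gene4 = {
    "GWAS_pvalue": "3.00E-08",
    "FUNC_CADD": "15.62",
    "QTL_eQTL_aorta_pvalue": "8.00E-05",
    "EXP_aorta_RPKM": "1.3",
    "PERTURB_mouse": "NA",
}


def evidence_ticks(row):
    """Return {category: True/False}; True means a value is present (not NA or empty).

    A tick only records that data exist, not whether they support the gene.
    """
    return {
        category: row.get(column, "NA").strip() not in ("", "NA")
        for category, column in EVIDENCE_COLUMNS.items()
    }


for name, row in (("Gene 1", gene1), ("Gene 4", gene4)):
    ticks = evidence_ticks(row)
    print(name, " ".join(cat for cat, present in ticks.items() if present))
```

Under this rule Gene 1 has a value in all five categories, while Gene 4 has no PERTURB value because `PERTURB_mouse` is NA.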

PEG Metadata

PEG Metadata provides the detailed context behind the PEG Matrix, recording column definitions, provenance, biosamples, and methods so that PEG evidence is fully interpretable and reproducible.

PEG Metadata in Excel (suitable for submission)

| peg_source | gwas_source | trait_description | trait_ontology_id | sample_description | sample_size | case_control_study | sample_ancestry | sample_ancestry_label |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PMID:36357675 | PMID:36357675 | Ascorbic acid 3-sulfate levels | EFO_0800173 | 6,136 Finnish ancestry individuals | 6136 | False | Finland | European |

| source_tag | provenance | file_name | version | url | accesstion | doi | tissue | sample_origin | cell_type | cell_line | disease | life_stage | treatment | sex | age | species | description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| source_cadd | CADD | All possible SNVs of GRCh38/hg38 incl. all annotations | v1.7 | link | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| source_gtex_eqtl | GTEx | GTEx_Analysis_v10_eQTL.tar | v10 | link | NA | NA | aorta | primary tissue | NA | NA | healthy | adult | None | mixed | mixed | Homo sapiens | Bulk aorta tissue. Samples from healthy adult human donors in GTEx v10. Used for eQTL discovery. Donors aged ~20–70 years, male and female. |
| source_gtex_aorta_RNA | GTEx | GTEx_Analysis_v10_RNASeQCv2.4.2_gene_tpm.gct.gz | v10 | link | NA | NA | aorta | primary tissue | NA | NA | healthy | adult | None | mixed | mixed | Homo sapiens | Bulk aorta tissue samples from healthy postmortem adult human donors in GTEx v10. Used for RNA expression profiling. Donors aged ~20–70 years, male and female. |
| source_impc | IMPC | IMPC_genotype_phenotype.csv.gz | 23 | link | NA | NA | multiple | IMPC mouse knockout models | NA | NA | NA | mixed | gene knockout | mixed | mixed | Mus musculus | Mice with single-gene knockouts generated by the IMPC project. |

PEG Metadata in YAML (suitable for reader)

Using YAML for metadata keeps all information on one page in a structured format, so users can easily search and extract the details they need.
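To illustrate the idea of a single structured page, the sketch below renders a few of the metadata fields from the tables above as YAML text. The layout is an assumption: the top-level keys `study` and `sources`, the choice of fields, and the helper `to_yaml_lines` are hypothetical and do not reproduce the actual PEG YAML file. The helper handles only nested mappings of scalar values and uses JSON scalar syntax for values, which YAML also accepts.

```python
import json


def to_yaml_lines(mapping, indent=0):
    """Render a nested dict of scalars as YAML lines (mappings only, no lists)."""
    pad = "  " * indent
    lines = []
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml_lines(value, indent + 1))
        else:
            # JSON scalars (quoted strings, numbers, true/false/null) are valid YAML.
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return lines


# Hypothetical layout; field values are copied from the metadata tables.
metadata = {
    "study": {
        "peg_source": "PMID:36357675",
        "trait_description": "Ascorbic acid 3-sulfate levels",
        "trait_ontology_id": "EFO_0800173",
        "sample_size": 6136,
        "case_control_study": False,
    },
    "sources": {
        "source_cadd": {"provenance": "CADD", "version": "v1.7"},
        "source_impc": {
            "provenance": "IMPC",
            "version": "23",
            "species": "Mus musculus",
        },
    },
}

yaml_text = "\n".join(to_yaml_lines(metadata))
print(yaml_text)
```

Keying each source by its `source_tag` keeps every record addressable by name, which is one way to make a specific provenance entry easy to find in a single file.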
