Version: 0.0.1

💡 Illustrative Examples of Evidence Matrix Columns

The tables below show examples of how PEG evidence matrix columns can be named and formatted.
These examples are not mandatory fields; they demonstrate recommended naming patterns, data formats, and reporting styles.

Projects may define additional or alternative columns, but we recommend following these general conventions.
The metadata should give enough information to understand the data type, provenance, and scale used for each column.

The three tables below give, in order, a variant-centric evidence example, a gene-centric evidence example, and an integration example.

| Evidence Category | Column header | Data Format | Description | Requirement | Example data |
|---|---|---|---|---|---|
| GWAS | `GWAS_pvalue` | exponent or −log10 | P-value of the primary variant in the source GWAS. Specify in the metadata file whether values use exponent notation (e.g. `4×10⁻⁹`) or the −log10 scale. | optional | 4×10⁻⁹ |
| Proximity | `PROX_nearest_gene` | boolean | Indicates whether the gene is the nearest gene to the variant. Details on how distance is derived (e.g. to TSS, to gene footprint) should be documented in the metadata. | optional | N |
| QTL | `QTL_eQTL_pancreas_pvalue` | exponent or −log10 | P-value of the eQTL association in pancreas tissue. | optional | 0.01 |
| QTL | `QTL_eQTL_pancreas_CI` | range | Confidence interval for the eQTL effect. Define the confidence level (e.g. 95%) in the metadata. | optional | [1.2, 2.5] |
| Functional | `FUNC_CADD` | float | CADD functional prediction score. Specify genome build and release in the metadata. | optional | 15.62 |
| Fine-mapping | `FM_credible_set_ID` | string | Identifier of the credible set variant from fine-mapping. | optional | chr10:114754071:T:C |
| Fine-mapping | `FM_PIP` | float | Posterior inclusion probability (PIP) from fine-mapping. | optional | 0.98 |
| Coloc | `COLOC_PPH4` | float | Colocalisation posterior probability that both traits share a causal variant (PPH4). | optional | 0.85 |
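
Because p-value columns may be reported either in exponent notation or on the −log10 scale, a consumer of the matrix usually needs to normalise them before comparing columns. The following is a minimal Python sketch of one way to do that. The function names (`parse_pvalue`, `to_neg_log10`, `from_neg_log10`) and the accepted spellings of the multiplication sign are assumptions made for illustration and are not part of the PEG conventions; which scale a column uses still has to be read from the metadata.

```python
import math
import re

# Translate Unicode superscripts (as in "10⁻⁹") to plain ASCII characters.
_SUPERSCRIPTS = str.maketrans("⁰¹²³⁴⁵⁶⁷⁸⁹⁻⁺", "0123456789-+")
# Matches "<mantissa> x 10 <exponent>", accepting "×", "x", "X" or "*" as the sign.
_EXPONENT_FORM = re.compile(r"([0-9]*\.?[0-9]+)\s*[×xX*]\s*10\^?([+-]?[0-9]+)")


def parse_pvalue(text: str) -> float:
    """Parse a p-value written as '4×10⁻⁹', '4x10^-9', '4e-9' or '0.01'."""
    cleaned = text.strip().translate(_SUPERSCRIPTS).replace("\u2212", "-")
    match = _EXPONENT_FORM.fullmatch(cleaned)
    if match:
        mantissa, exponent = match.groups()
        value = float(f"{mantissa}e{int(exponent)}")
    else:
        value = float(cleaned)  # plain decimals and 'e' notation
    if not 0.0 < value <= 1.0:
        raise ValueError(f"p-value out of range (0, 1]: {text!r}")
    return value


def to_neg_log10(pvalue: float) -> float:
    """Convert a p-value to the -log10 scale."""
    return -math.log10(pvalue)


def from_neg_log10(score: float) -> float:
    """Convert a -log10 score back to a p-value."""
    return 10.0 ** -score
```

For example, `to_neg_log10(parse_pvalue("4×10⁻⁹"))` gives roughly 8.4. This sketch rejects a p-value of exactly 0, so sources that report underflowed p-values as 0 would need separate handling.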

| Evidence Category | Column header | Data Format | Description | Requirement | Example data |
|---|---|---|---|---|---|
| TWAS | `TPWAS_TWAS_pvalue` | float | Transcriptome-wide association study (TWAS) p-value linking gene expression to trait. | optional | 1×10⁻⁷ |
| Expression | `EXP_Adipose_TPM` | float | Expression level of the gene in adipose tissue, reported as Reads Per Kilobase per Million mapped reads (RPKM) or Transcripts Per Million (TPM). | optional | 0.8 |
| Expression | `EXP_pancreas_TPM` | float | Expression level of the gene in pancreas tissue, reported as RPKM or TPM. | optional | — |
| Perturbation | `PERTURB_mouse` | Free text / ontology terms | Observed phenotype in mouse perturbation models (e.g., knockout, overexpression). Terms can be free text or ontology labels, defined in metadata. | optional | hypoglycemia \| increased insulin secretion \| impaired glucose tolerance |
| Knowledge | `KNOW` | Narrative text | Expert or knowledge-base curation describing gene function and its relationship to phenotype or disease. | optional | ANGPTL4 inhibits lipoprotein lipase (LPL), increasing circulating triglycerides and reducing fatty acid uptake. |
| Literature | `LIT` | Narrative text | Human-curated evidence from published studies linking the gene to relevant traits or disease mechanisms. | optional | Zebrafish Tcf7l2 mutant shows hyperglycemia, pancreatic and vascular defects, reduced regeneration. |
| Literature | `LIT_PMID` | PMID list | PubMed identifiers (PMIDs) supporting literature evidence for the gene–trait association. | optional | PMID_28851992 \| PMID_31829936 |
| Drug | `DRUG` | Drug name(s) | Drug(s) known to target or modulate the gene, separated by `\|` (pipe). Reference databases (e.g., DrugBank) can be cited in metadata. | optional | METFORMIN \| CYCLOSPORINE |
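
Several example cells above (`PERTURB_mouse`, `LIT_PMID`, `DRUG`) hold multiple values separated by a pipe. The sketch below shows one possible way to split such cells in Python (3.9 or later). The helper names `split_multivalue` and `pmid_numbers` are hypothetical, and the `PMID_` prefix handling simply mirrors the example data rather than any stated rule.

```python
def split_multivalue(cell: str) -> list[str]:
    """Split a pipe-delimited cell into trimmed, non-empty values."""
    # Markdown-escaped pipes ("\|") are treated the same as plain pipes.
    normalised = cell.replace("\\|", "|")
    return [part.strip() for part in normalised.split("|") if part.strip()]


def pmid_numbers(cell: str) -> list[int]:
    """Extract numeric PubMed IDs from values such as 'PMID_28851992'."""
    ids = []
    for value in split_multivalue(cell):
        digits = value.removeprefix("PMID_")
        if not digits.isdigit():
            raise ValueError(f"not a PMID: {value!r}")
        ids.append(int(digits))
    return ids
```

With the example data, `split_multivalue("METFORMIN | CYCLOSPORINE")` yields the two drug names as separate list items, and `pmid_numbers("PMID_28851992 | PMID_31829936")` yields the two identifiers as integers.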

| Column header | Data Format | Description | Requirement | Example data |
|---|---|---|---|---|
| `INT_pops` | float | Population count or weighted metric used in integration scoring. Define precise meaning and provenance in the metadata file. | optional | 9 |
| `INT_Combined_prediction_author_score` | any | Author-provided integrated prediction score. Units, scale, or categories (e.g. `STRONG`, `WEAK`) must be described in the metadata file. | optional | STRONG |
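
The example headers above all begin with a category prefix (`GWAS`, `FM`, `EXP`, `INT`, and so on), which makes it straightforward to group columns by evidence category. The Python sketch below illustrates the idea. The prefix list is taken only from the examples on this page and should not be read as a complete or authoritative list, and the function names and the `UNRECOGNISED` label are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Prefixes seen in the example column headers on this page (not exhaustive).
KNOWN_PREFIXES = (
    "GWAS", "PROX", "QTL", "FUNC", "FM", "COLOC", "TPWAS",
    "EXP", "PERTURB", "KNOW", "LIT", "DRUG", "INT",
)


def category_prefix(header: str) -> Optional[str]:
    """Return the category prefix of a column header, or None if unrecognised."""
    prefix = header.split("_", 1)[0]
    return prefix if prefix in KNOWN_PREFIXES else None


def group_by_category(headers: Iterable[str]) -> Dict[str, List[str]]:
    """Group column headers by prefix; unknown ones go under 'UNRECOGNISED'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for header in headers:
        groups[category_prefix(header) or "UNRECOGNISED"].append(header)
    return dict(groups)
```

A project that defines additional columns could extend `KNOWN_PREFIXES` or treat the `UNRECOGNISED` group as a prompt to document those columns in its metadata.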