📋 PEG Evidence Matrix Standard

Genomic Identifier

| Column header | Data format | Description | Requirement | Example data |
| --- | --- | --- | --- | --- |
| Primary Variant ID | chr:bp:ref:alt | The variant to which variant-centric evidence relates. Used as the primary row ID; may be a lead variant, a variant in LD, or a fine-mapped SNP (defined in metadata). | mandatory | chr10:114754071:T:C |
| rsID | rs[] | The rsID of the primary variant. | optional | rs1234 |
| VAR_[xyz] | bespoke | Additional variant ID columns. Custom names must follow VAR_[xyz] and be defined in the metadata file. | optional | bespoke |
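
As an illustration of the identifier formats above, the following Python sketch checks a Primary Variant ID and an rsID before a row is accepted. The regular expressions are assumptions inferred from the example values (chr10:114754071:T:C and rs1234); the standard itself does not spell out the allowed chromosome names or allele alphabet, and the function names are hypothetical.

```python
import re

# Assumed shape of chr:bp:ref:alt, inferred from the example chr10:114754071:T:C.
# The chromosome set (1-22, X, Y, M/MT) and the ACGT allele alphabet are
# assumptions made for this sketch, not rules stated by the standard.
VARIANT_ID = re.compile(
    r"chr(?:[1-9]|1[0-9]|2[0-2]|X|Y|MT?):[1-9][0-9]*:[ACGT]+:[ACGT]+"
)

# Assumed shape of an rsID: the prefix "rs" followed by digits.
RSID = re.compile(r"rs[0-9]+")


def is_valid_variant_id(value: str) -> bool:
    """Return True if value looks like chr:bp:ref:alt."""
    return VARIANT_ID.fullmatch(value) is not None


def is_valid_rsid(value: str) -> bool:
    """Return True for a well-formed rsID; the column is optional, so empty passes."""
    return value == "" or RSID.fullmatch(value) is not None
```

A stricter or looser check may be appropriate depending on how the metadata file defines the variant notation, for example if indels use a different allele encoding.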

Evidence — General Pattern

All variant-centric evidence columns are optional. However, we suggest including at least one variant-centric evidence column to support the variant-gene relationship.

We define a general reporting pattern:

| Column header | Data format | Description | Requirement | Example data |
| --- | --- | --- | --- | --- |
| Category_[xyz] | Bespoke | Most headers follow the format Category_(stream)_[xyz]. Category is mandatory; stream is used only if it differs from the category; [xyz] can be any user-defined label. E.g. GWAS_pvalue, EXP_AdiposeTissue_TPM, QTL_eQTL_pancreas, TPWAS_TWAS_pvalue. The category must be from the controlled list and defined in the metadata file. | optional | variant-centric evidence examples; gene-centric evidence examples |

These are not strict requirements. Different categories may call for different types of data, and users can adapt them as needed. For guidance, we provide reference guidelines for the general evidence categories. Both the variant-centric and the gene-centric guidelines come with suggested naming patterns and example formats.
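
The Category_(stream)_[xyz] pattern can be sketched as a small header parser. This is a minimal illustration, not part of the standard: `KNOWN_CATEGORIES` and `KNOWN_STREAMS` are hypothetical stand-ins populated only from the example headers in this section, whereas the real controlled list and stream definitions come from the metadata file. Because the stream is optional, a plain split cannot tell a stream from the first part of a user label, so the sketch treats the second token as a stream only when it is declared for that category.

```python
from typing import Dict, NamedTuple, Optional, Set

# Hypothetical subset of the controlled list, taken from the example headers only.
KNOWN_CATEGORIES: Set[str] = {"GWAS", "EXP", "QTL", "TPWAS"}

# Hypothetical streams per category, again taken from the example headers only.
KNOWN_STREAMS: Dict[str, Set[str]] = {"QTL": {"eQTL"}, "TPWAS": {"TWAS"}}


class EvidenceHeader(NamedTuple):
    category: str
    stream: Optional[str]  # None when the header carries no separate stream
    label: str             # the user-defined [xyz] part


def parse_evidence_header(
    header: str, streams: Dict[str, Set[str]] = KNOWN_STREAMS
) -> EvidenceHeader:
    """Split an evidence column header into category, optional stream, and label."""
    category, sep, rest = header.partition("_")
    if not sep or not rest:
        raise ValueError(f"{header!r} does not match Category_(stream)_[xyz]")
    if category not in KNOWN_CATEGORIES:
        raise ValueError(f"unknown category {category!r} in {header!r}")
    # Treat the second token as a stream only if it is declared for this category.
    stream, sep, label = rest.partition("_")
    if sep and label and stream in streams.get(category, set()):
        return EvidenceHeader(category, stream, label)
    return EvidenceHeader(category, None, rest)
```

With these assumptions, `parse_evidence_header("QTL_eQTL_pancreas")` yields the category `QTL`, the stream `eQTL`, and the label `pancreas`, while `EXP_AdiposeTissue_TPM` keeps `AdiposeTissue_TPM` as its label because no stream is declared for `EXP`.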

Integration Evidence — General Pattern

| Column header | Data format | Description | Requirement | Example data |
| --- | --- | --- | --- | --- |
| INT_[xyz] | Bespoke | Headers may follow the format INT_[xyz]. INT denotes integration evidence; [xyz] can be a user-defined label. Provenance and specifics may vary across rows; they should be specified in the metadata file and, if variable, also in the data file. | optional | Integration evidence example |
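
Since integration columns are expected to be described in the metadata file, one possible consistency check is to list the INT_[xyz] columns of a matrix and report those without a metadata entry. The sketch below assumes, hypothetically, that the metadata has been loaded into a dictionary keyed by column header; the standard does not prescribe that representation, and the column names in the usage note are invented for illustration.

```python
from typing import Dict, List

INT_PREFIX = "INT_"


def integration_columns(headers: List[str]) -> List[str]:
    """Return headers of the form INT_[xyz] with a non-empty label."""
    return [h for h in headers if h.startswith(INT_PREFIX) and len(h) > len(INT_PREFIX)]


def missing_metadata(headers: List[str], metadata: Dict[str, dict]) -> List[str]:
    """Return integration columns that have no entry in the (assumed) metadata dict."""
    return [h for h in integration_columns(headers) if h not in metadata]
```

For example, with the hypothetical headers `["Primary Variant ID", "GWAS_pvalue", "INT_score", "INT_rank"]` and a metadata dictionary that only describes `INT_score`, `missing_metadata` returns `["INT_rank"]`.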