📋 PEG Evidence Matrix Standard
Genomic Identifier
- Variant information
- Gene information
- Locus information
Column header | Data format | Description | Requirement | Example data |
---|---|---|---|---|
Primary Variant ID | chr:bp:ref:alt | The variant to which variant-centric evidence relates. Used as the primary row ID; may be a lead variant, a variant in LD, or a fine-mapped SNP (defined in metadata). | mandatory | chr10:114754071:T:C |
rsID | rs[] | The rsID of the primary variant. | optional | rs1234 |
VAR_[xyz] | bespoke | Additional variant ID columns. Custom names must follow VAR_[xyz] and be defined in the metadata file. | optional | bespoke |
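
As an illustration of the `chr:bp:ref:alt` and `rs[]` formats in the table above, the following is a minimal validation sketch. The function names, the accepted chromosome labels, and the restriction of alleles to A/C/G/T are assumptions made for this example and are not part of the standard.

```python
import re

# Assumed grammar: "chr" + chromosome label, base-pair position, ref allele, alt allele.
# The chromosome labels and the A/C/G/T allele alphabet are illustrative assumptions.
VARIANT_ID_RE = re.compile(r"chr([0-9]{1,2}|X|Y|MT?):([0-9]+):([ACGT]+):([ACGT]+)")
RSID_RE = re.compile(r"rs[0-9]+")


def parse_variant_id(value):
    """Split a Primary Variant ID into its four components (hypothetical helper)."""
    match = VARIANT_ID_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a chr:bp:ref:alt identifier: {value!r}")
    chrom, bp, ref, alt = match.groups()
    return {"chrom": chrom, "bp": int(bp), "ref": ref, "alt": alt}


def is_rsid(value):
    """Return True if the value looks like an rsID such as rs1234."""
    return RSID_RE.fullmatch(value) is not None
```

Calling `parse_variant_id("chr10:114754071:T:C")` on the example value from the table yields the chromosome, position, and both alleles as separate fields.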
Column header | Data format | Description | Requirement | Example data |
---|---|---|---|---|
Gene ID | ENSG[] | The gene under consideration in this row (gene-centric evidence). Primary identifier must be the Ensembl Gene ID. Other IDs can be added using GENE_[xyz] (e.g. GENE_EntrezID). | mandatory (or) | ENSG00000151532 |
Gene symbol | HGNC | The gene under consideration in this row. Primary symbol must be the HGNC-approved gene symbol. Alternative/legacy symbols may be provided via GENE_[xyz] (e.g. GENE_alias). | mandatory (or) | VTI1A |
GENE_[xyz] | bespoke | Additional gene-related columns (e.g. Entrez, aliases). Must be defined in metadata. | optional | bespoke |
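
The sketch below shows one possible reading of the "mandatory (or)" requirement, namely that each row needs at least one of Gene ID or Gene symbol. That interpretation, the helper name `check_gene_fields`, and the choice to reject versioned IDs (for example a trailing `.12`) are assumptions of this example.

```python
import re

# Ensembl human gene IDs are "ENSG" followed by 11 digits; version suffixes
# are not accepted in this sketch (an assumption, not a rule of the standard).
ENSEMBL_GENE_RE = re.compile(r"ENSG[0-9]{11}")


def check_gene_fields(row):
    """Return a list of problems found in the gene columns of one row (hypothetical helper)."""
    problems = []
    gene_id = (row.get("Gene ID") or "").strip()
    symbol = (row.get("Gene symbol") or "").strip()
    # Assumed meaning of "mandatory (or)": at least one of the two must be filled in.
    if not gene_id and not symbol:
        problems.append("either 'Gene ID' or 'Gene symbol' is required")
    if gene_id and ENSEMBL_GENE_RE.fullmatch(gene_id) is None:
        problems.append(f"'Gene ID' is not an Ensembl gene ID: {gene_id!r}")
    return problems
```

Checking that a symbol is HGNC-approved would need a lookup against an HGNC release, which this sketch leaves out.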
Column header | Data format | Description | Requirement | Example data |
---|---|---|---|---|
Locus range | chr:start-end | The genomic range around the primary variant considered in this analysis. | recommended | chr10:1000-2000 |
Locus ID | any | Internal or curated region ID. We recommend using the associated variant (chr:bp or rsID); internal IDs such as “Locus 1” or “Locus 2” may also be used. | optional | chr10:114754071:T:C |
LOCUS_[xyz] | bespoke | Additional locus-related columns. Must follow LOCUS_[xyz] and be defined in metadata. | optional | bespoke |
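
A short sketch of how a Locus range value could be parsed and compared with a Primary Variant ID, following the example value `chr10:1000-2000`. The helper names and the choice of inclusive start and end coordinates are assumptions of this example; the standard does not state whether the bounds are inclusive.

```python
import re

# Pattern modelled on the example value "chr10:1000-2000".
LOCUS_RANGE_RE = re.compile(r"(chr[0-9A-Za-z]+):([0-9]+)-([0-9]+)")


def parse_locus_range(value):
    """Return (chromosome, start, end) for a Locus range string (hypothetical helper)."""
    match = LOCUS_RANGE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a locus range: {value!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is greater than end in {value!r}")
    return chrom, start, end


def locus_contains(locus_range, variant_id):
    """Check whether a chr:bp:ref:alt variant lies inside the range (bounds assumed inclusive)."""
    chrom, start, end = parse_locus_range(locus_range)
    variant_chrom, variant_bp = variant_id.split(":")[:2]
    return variant_chrom == chrom and start <= int(variant_bp) <= end
```

A check like `locus_contains` can flag rows whose primary variant falls outside the stated locus range.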
Evidence — General Pattern
All variant-centric evidence columns are optional. However, we suggest including at least one variant-centric evidence column to support the variant-gene relationship.
We define a general reporting pattern:
Column header | Data format | Description | Requirement | Example data |
---|---|---|---|---|
Category_[xyz] | Bespoke | Most headers follow the format Category_(stream)_[xyz]. Category is mandatory; stream is used only if it differs from the category; [xyz] can be any user-defined label, e.g. GWAS_pvalue, EXP_AdiposeTissue_TPM, QTL_eQTL_pancreas, TPWAS_TWAS_pvalue. The category must be from the controlled list and defined in the metadata file. | optional | variant-centric evidence examples; gene-centric evidence examples |
These are not strict requirements. Different categories may call for different types of data, and users can adapt the pattern as needed. For guidance, we provide reference guidelines for the general evidence categories. Each category, whether variant-centric or gene-centric, comes with suggested naming patterns and example formats.
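
The naming pattern above can be checked mechanically. The sketch below splits an evidence header into its category and the remainder (stream and/or user-defined label). The set `EXAMPLE_CATEGORIES` contains only the categories that appear in the examples above and stands in for the controlled list, which is not reproduced here; the function name is also hypothetical. Because the stream is optional, the sketch does not try to separate it from the label.

```python
# Stand-in for the controlled category list: only the categories seen in the
# examples above. Replace with the real controlled list from the metadata.
EXAMPLE_CATEGORIES = frozenset({"GWAS", "EXP", "QTL", "TPWAS"})


def split_evidence_header(header, categories=EXAMPLE_CATEGORIES):
    """Split 'Category_(stream)_[xyz]' into (category, remainder) (hypothetical helper)."""
    category, separator, remainder = header.partition("_")
    if not separator or not remainder:
        raise ValueError(f"header has no label after the category: {header!r}")
    if category not in categories:
        raise ValueError(f"unknown category {category!r} in header {header!r}")
    return category, remainder


if __name__ == "__main__":
    for name in ["GWAS_pvalue", "EXP_AdiposeTissue_TPM", "QTL_eQTL_pancreas"]:
        print(name, "->", split_evidence_header(name))
```

Distinguishing a stream from a label (for example `eQTL` in `QTL_eQTL_pancreas`) would require the per-category stream definitions from the metadata file.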
Integration Evidence — General Pattern
Column header | Data format | Description | Requirement | Example data |
---|---|---|---|---|
INT_[xyz] | Bespoke | Headers may follow the format | optional | Integration evidence example |