📋 PEG Evidence Matrix Standard
Genomic Identifiers
- Variant information
- Gene information
- Locus information

Variant information

| Column header | Data format | Description | Requirement | Example data |
|---|---|---|---|---|
| PrimaryVariantID | chr:bp:ref:alt | | Mandatory | chr10:114754071:T:C |
| rsID | rs[] | The rsID of the primary variant. | Optional | rs1234 |
| Var_[xyz] | Bespoke (Any data type, as long as it is used consistently within the column.) | Other columns relating to variant identification may be added. PEGASUS recommend using the header format Var_[xyz]; such columns should be defined in the metadata file. | Optional | bespoke |
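
The chr:bp:ref:alt and rs[] formats above can be checked mechanically. The following is a minimal sketch, not part of the standard: the regular expressions, the accepted chromosome names, the allele alphabet, and the helper names `parse_primary_variant_id` and `is_rsid` are all assumptions made for illustration.

```python
import re
from typing import NamedTuple

# Assumed pattern for chr:bp:ref:alt. The chromosome names (1-2 digits, X, Y,
# M, MT) and the ACGT-only alleles are illustrative choices, not taken from
# the standard.
VARIANT_RE = re.compile(r"chr([0-9]{1,2}|X|Y|MT?):([0-9]+):([ACGT]+):([ACGT]+)")
# Assumed pattern for rs[]: the literal "rs" followed by digits.
RSID_RE = re.compile(r"rs[0-9]+")


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_primary_variant_id(value: str) -> Variant:
    """Split a PrimaryVariantID into its four parts or raise ValueError."""
    match = VARIANT_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not in chr:bp:ref:alt form: {value!r}")
    chrom, pos, ref, alt = match.groups()
    return Variant("chr" + chrom, int(pos), ref, alt)


def is_rsid(value: str) -> bool:
    """Return True when the value looks like an rsID such as rs1234."""
    return RSID_RE.fullmatch(value) is not None
```

For instance, `parse_primary_variant_id("chr10:114754071:T:C")` yields a `Variant` whose `pos` field is the integer 114754071, while a value such as `"10-114754071"` raises `ValueError`.
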

Gene information

| Column header | Data format | Description | Requirement | Example data |
|---|---|---|---|---|
| GeneID | ENSG[] | The gene under consideration in this row (gene-centric evidence). The Ensembl Gene ID is recommended as the primary identifier. Other IDs can be added using GeneID_[provider] (e.g. GeneID_EntrezID). | Mandatory | ENSG00000151532 |
| GeneSymbol | HGNC | The gene under consideration in this row, to which gene-centric evidence relates. The HGNC symbol is recommended as the primary gene symbol identifier. Alternative/legacy symbols may be provided via GeneSymbol_[provider] (e.g. GeneSymbol_alias). | Mandatory | VTI1A |
| Gene_[xyz] | Bespoke (Any data type, as long as it is used consistently within the column.) | Additional gene-related columns (e.g. Entrez, aliases). Must be defined in metadata. | Optional | Bespoke |
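
As an illustration of the GeneID conventions above, the sketch below checks an Ensembl-style gene ID and extracts the provider from a GeneID_[provider] header. It rests on an assumption: the table only states "ENSG[]", and the code takes that to mean "ENSG" plus 11 digits with an optional ".version" suffix. The function names are hypothetical.

```python
import re

# Assumption: "ENSG" followed by 11 digits, optionally with a ".version"
# suffix. The standard itself only gives the format as "ENSG[]".
ENSG_RE = re.compile(r"ENSG[0-9]{11}(\.[0-9]+)?")


def is_ensembl_gene_id(value: str) -> bool:
    """Return True when the value matches the assumed Ensembl gene ID shape."""
    return ENSG_RE.fullmatch(value) is not None


def extra_gene_id_provider(header: str):
    """Return the provider part of a GeneID_[provider] header, else None."""
    prefix = "GeneID_"
    if header.startswith(prefix) and len(header) > len(prefix):
        return header[len(prefix):]
    return None
```

With these helpers, the example value `ENSG00000151532` passes the check, and the header `GeneID_EntrezID` is recognised as an additional ID column from the provider `EntrezID`.
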

Locus information

| Column header | Data format | Description | Requirement | Example data |
|---|---|---|---|---|
| LocusRange | chr:pos:start-end | The range around the primary variant considered in this analysis. | Recommended | chr10:1000-2000 |
| LocusID | Bespoke | An internal or curated ID for the region considered. PEGASUS recommend using the associated variant (chr:bp or rsID); internal IDs such as 'Locus 1' or 'Locus 2' may also be used. | Optional | chr10:114754071:T:C |
| Locus_[xyz] | Bespoke (Any data type, as long as it is used consistently within the column.) | Other columns relating to the locus may be added. PEGASUS recommend using the header format Locus_[xyz]; such columns should be defined in the metadata file. | Optional | bespoke |
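
Across the three identifier tables, only PrimaryVariantID, GeneID and GeneSymbol are marked Mandatory. A minimal sketch of a header check built on that reading follows; the function name `missing_mandatory` is hypothetical, and the check looks only at column names, not at cell contents.

```python
# Columns marked "Mandatory" in the identifier tables above.
MANDATORY_COLUMNS = ("PrimaryVariantID", "GeneID", "GeneSymbol")


def missing_mandatory(headers):
    """Return the mandatory column names absent from a list of headers."""
    present = set(headers)
    return [name for name in MANDATORY_COLUMNS if name not in present]
```

A matrix whose header row is `["PrimaryVariantID", "rsID", "GeneSymbol"]` would be reported as missing `GeneID`; an empty result means all mandatory identifier columns are present.
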
Evidence — General Pattern
All variant-centric evidence columns are optional. However, PEGASUS suggest including at least TWO pieces of evidence to support the variant-gene-phenotype relationship.
PEGASUS define a general reporting pattern:
| Column header | Data Format | Description | Requirement | Example data |
|---|---|---|---|---|
| Category_(stream)_[details] | Bespoke (Any data type, as long as it is used consistently within the column.) | Headers follow the format Category_(stream)_[details]. Category: use the abbreviated category name from the evidence categories listed in the controlled list. (stream) is optional and is only required when multiple evidence streams are used within a single category (e.g. QTL_eqtl). [details] is a user-defined suffix that reflects the content of the data. For any field consisting of multiple words, please use CamelCase; for example, credible set id should be written as CredibleSetID. If no category in the list is applicable, please use Other_[CustomisedCategory]_(stream)_[details]. | Optional | variant-centric evidence examples; gene-centric evidence examples |
PEGASUS only define column name patterns and do not impose strict requirements on the data type. For guidance, PEGASUS provide reference guidelines for the general evidence categories. Each category, whether variant-centric or gene-centric, comes with suggested naming patterns and example formats.
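
The Category_(stream)_[details] pattern can be sketched as a small header builder. This is an illustration under stated assumptions, not a PEGASUS tool: the function names are hypothetical, and the default `known_categories` holds only "QTL" because that is the one abbreviation shown in this section; the real controlled list should be passed in.

```python
def camel_case(text: str) -> str:
    """Join multi-word text into CamelCase.

    Only the first letter of each word is upper-cased; the rest is kept as
    typed, so acronyms such as "ID" must be supplied in the desired case.
    Single words are returned unchanged.
    """
    words = text.split()
    if len(words) <= 1:
        return "".join(words)
    return "".join(word[0].upper() + word[1:] for word in words)


def build_header(category, details, stream=None,
                 known_categories=frozenset({"QTL"})):
    """Build a Category_(stream)_[details] column header.

    known_categories is a placeholder for the controlled list, which is not
    reproduced in this section.
    """
    if not details or not details.strip():
        raise ValueError("details must not be empty")
    if category in known_categories:
        parts = [category]
    else:
        # Fallback described above: Other_[CustomisedCategory]_(stream)_[details]
        parts = ["Other", camel_case(category)]
    if stream:
        parts.append(stream)
    parts.append(camel_case(details))
    return "_".join(parts)
```

For example, `build_header("QTL", "credible set ID", stream="eqtl")` gives `QTL_eqtl_CredibleSetID`, and a category outside the supplied list, such as `build_header("my assay", "score")`, falls back to `Other_MyAssay_score`.
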
Integration Evidence — General Pattern
| Column header | Data Format | Description | Requirement | Example data |
|---|---|---|---|---|
| INT_[tag]_[details] | Bespoke (Any data type, as long as it is used consistently within the column.) | Headers may follow the format INT_[tag]_[details]. | Optional | Integration evidence example |
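
Taken together, the naming patterns allow a column header to be sorted into a broad group by its name alone. The sketch below is one possible reading, with hypothetical names throughout: it treats the fixed identifier columns and the Var_, Gene_, GeneID_, GeneSymbol_ and Locus_ prefixes as identifiers, the INT_ prefix as integration evidence, and everything else as evidence without validating the category against the controlled list.

```python
# Fixed identifier column names from the identifier tables.
IDENTIFIER_COLUMNS = frozenset({
    "PrimaryVariantID", "rsID", "GeneID", "GeneSymbol", "LocusRange", "LocusID",
})
# Prefixes for bespoke or provider-specific identifier columns.
IDENTIFIER_PREFIXES = ("Var_", "GeneID_", "GeneSymbol_", "Gene_", "Locus_")


def classify_header(header: str) -> str:
    """Return "identifier", "integration" or "evidence" for a column header.

    "evidence" is a fallback label only; the category part of the header is
    not checked here.
    """
    if header in IDENTIFIER_COLUMNS or header.startswith(IDENTIFIER_PREFIXES):
        return "identifier"
    if header.startswith("INT_"):
        return "integration"
    return "evidence"
```

Under this reading, `rsID` and `GeneSymbol_alias` are identifiers, a header beginning with `INT_` is integration evidence, and `QTL_eqtl_CredibleSetID` is classed as evidence.
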