Version: 0.0.1

📋 PEGASUS Evidence Matrix Standard

Genomic Identifiers

Variant information
Gene information
Locus information

Column header	Data format	Description	Requirement	Example data
Primary Variant ID	chr:bp:ref:alt	The variant to which variant-centric evidence relates. Used as the primary row ID; may be a lead variant, a variant in LD, or a fine-mapped SNP (defined in metadata).	mandatory	chr10:114754071:T:C
rsID	rs[]	The rsID of the primary variant.	optional	rs1234
VAR_[xyz]	bespoke	Additional variant ID columns. Custom names must follow VAR_[xyz] and be defined in the metadata file.	optional	bespoke
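
The identifier formats in the table above could be checked with regular expressions. The following is a minimal sketch, not part of the standard: the function names are hypothetical, and the accepted chromosome labels (`chr1` to `chr22`, `chrX`, `chrY`, `chrM`/`chrMT`) and the restriction of alleles to A/C/G/T strings are assumptions inferred from the single example `chr10:114754071:T:C`.

```python
import re

# Assumed shape of chr:bp:ref:alt. The chromosome labels and the A/C/G/T
# allele alphabet are guesses; the standard only shows one example value.
VARIANT_ID_RE = re.compile(
    r"chr([1-9]|1\d|2[0-2]|X|Y|MT?):([1-9]\d*):([ACGT]+):([ACGT]+)"
)
# rsID: the literal prefix "rs" followed by one or more digits.
RSID_RE = re.compile(r"rs\d+")
# Custom variant columns must be named VAR_ plus a non-empty suffix.
VAR_COLUMN_RE = re.compile(r"VAR_\w+")


def is_valid_variant_id(value: str) -> bool:
    """Return True if value looks like chr:bp:ref:alt."""
    return VARIANT_ID_RE.fullmatch(value) is not None


def is_valid_rsid(value: str) -> bool:
    """Return True if value looks like an rsID such as rs1234."""
    return RSID_RE.fullmatch(value) is not None


def is_custom_variant_column(header: str) -> bool:
    """Return True if header follows the VAR_[xyz] naming rule."""
    return VAR_COLUMN_RE.fullmatch(header) is not None
```

A check like this only confirms the shape of an identifier; it does not confirm that the variant exists or that a `VAR_[xyz]` column is defined in the metadata file.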

Column header	Data format	Description	Requirement	Example data
Gene ID	ENSG[]	The gene under consideration in this row (gene-centric evidence). Primary identifier must be the Ensembl Gene ID. Other IDs can be added using GENE_[xyz] (e.g. GENE_EntrezID).	mandatory (or)	ENSG00000151532
Gene symbol	HGNC	The gene under consideration in this row. Primary symbol must be the HGNC-approved gene symbol. Alternative/legacy symbols may be provided via GENE_[xyz] (e.g. GENE_alias).	mandatory (or)	VTI1A
GENE_[xyz]	bespoke	Additional gene-related columns (e.g. Entrez, aliases). Must be defined in metadata.	optional	bespoke
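
The "mandatory (or)" requirement can be read as: each row must carry a Gene ID, a Gene symbol, or both. The sketch below encodes that reading, which is an interpretation rather than a rule stated in the table. The function name is hypothetical, and the pattern `ENSG` plus 11 digits is an assumption based on the example `ENSG00000151532` (unversioned human Ensembl gene IDs).

```python
import re

# Assumed shape of an unversioned human Ensembl gene ID.
ENSEMBL_GENE_RE = re.compile(r"ENSG\d{11}")


def gene_fields_ok(row: dict) -> bool:
    """Check the gene columns of one row, given as a header-to-value mapping.

    At least one of "Gene ID" and "Gene symbol" must be non-empty, and a
    Gene ID, when present, must look like an Ensembl gene ID.
    HGNC approval of the symbol is not checked here; that needs a lookup
    against HGNC data.
    """
    gene_id = (row.get("Gene ID") or "").strip()
    symbol = (row.get("Gene symbol") or "").strip()
    if not gene_id and not symbol:
        return False
    if gene_id and ENSEMBL_GENE_RE.fullmatch(gene_id) is None:
        return False
    return True
```

For example, `gene_fields_ok({"Gene symbol": "VTI1A"})` passes under this reading because the symbol alone satisfies the either-or requirement.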

Column header	Data format	Description	Requirement	Example data
Locus range	chr:start-end	The genomic range around the primary variant considered in this analysis.	recommended	chr10:1000-2000
Locus ID	any	Internal or curated region ID. We recommend using the associated variant (chr:bp or rsID); internal IDs such as “Locus 1” or “Locus 2” are also acceptable.	optional	chr10:114754071:T:C
LOCUS_[xyz]	bespoke	Additional locus-related columns. Must follow LOCUS_[xyz] and be defined in metadata.	optional	bespoke
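
A locus range can be parsed into its parts and compared against a primary variant. The sketch below is modelled on the example value `chr10:1000-2000`. The names `LocusRange`, `parse_locus_range` and `locus_contains` are hypothetical, and treating both ends of the range as inclusive is an assumption, since the table does not say.

```python
from typing import NamedTuple


class LocusRange(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_locus_range(value: str) -> LocusRange:
    """Parse a string shaped like 'chr10:1000-2000'."""
    chrom, _, span = value.partition(":")
    start_text, _, end_text = span.partition("-")
    if not chrom or not start_text.isdecimal() or not end_text.isdecimal():
        raise ValueError(f"not a locus range: {value!r}")
    start, end = int(start_text), int(end_text)
    if start > end:
        raise ValueError(f"start is after end: {value!r}")
    return LocusRange(chrom, start, end)


def locus_contains(locus: LocusRange, variant_id: str) -> bool:
    """Return True if a chr:bp:ref:alt variant lies inside the locus.

    Both ends are treated as inclusive (an assumption).
    """
    chrom, bp = variant_id.split(":")[:2]
    return chrom == locus.chrom and locus.start <= int(bp) <= locus.end
```

A containment check of this kind is one way to confirm that the Locus range in a row actually surrounds that row's Primary Variant ID.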

Evidence — General Pattern

All variant-centric evidence columns are optional. However, we suggest including at least one variant-centric evidence column to support the variant-gene relationship.

We define a general reporting pattern:

Column header	Data format	Description	Requirement	Example data
`Category_[xyz]`	Bespoke	Most headers follow the format `Category_(stream)_[xyz]`. Category is mandatory; stream is used only if it differs from the category; `[xyz]` can be any user-defined label, e.g. `GWAS_pvalue`, `EXP_AdiposeTissue_TPM`, `QTL_eQTL_pancreas`, `TPWAS_TWAS_pvalue`. The category must be from the controlled list and defined in the metadata file.	optional	variant-centric evidence examples; gene-centric evidence examples

These naming patterns are not strict requirements. Different categories may call for different types of data, and users can adapt the patterns as needed. For guidance, we provide reference guidelines for the general evidence categories. Each type of evidence (variant-centric and gene-centric) comes with suggested naming patterns and example formats.
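
The `Category_(stream)_[xyz]` pattern can be split into its parts programmatically. Because a stream and a user-defined label look alike, the sketch below only recognises a stream when the second token appears in a known set. The category and stream sets here are placeholders taken only from the example headers in this section; they are not the standard's controlled list, and the names `EvidenceHeader` and `parse_evidence_header` are hypothetical.

```python
from typing import NamedTuple, Optional

# Placeholder values taken from the example headers only. The real
# controlled list is defined by the standard and the metadata file.
EXAMPLE_CATEGORIES = frozenset({"GWAS", "EXP", "QTL", "TPWAS"})
EXAMPLE_STREAMS = frozenset({"eQTL", "TWAS"})


class EvidenceHeader(NamedTuple):
    category: str
    stream: Optional[str]
    label: str


def parse_evidence_header(
    header: str,
    categories: frozenset = EXAMPLE_CATEGORIES,
    streams: frozenset = EXAMPLE_STREAMS,
) -> EvidenceHeader:
    """Split a header into category, optional stream and free label."""
    parts = header.split("_")
    category, rest = parts[0], parts[1:]
    if category not in categories:
        raise ValueError(f"unknown category in header: {header!r}")
    stream = None
    # Treat the second token as a stream only if it is a known stream.
    if rest and rest[0] in streams:
        stream, rest = rest[0], rest[1:]
    return EvidenceHeader(category, stream, "_".join(rest))
```

With these placeholder sets, `QTL_eQTL_pancreas` yields the stream `eQTL` and the label `pancreas`, while `EXP_AdiposeTissue_TPM` has no stream and keeps `AdiposeTissue_TPM` as its label.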

Integration Evidence — General Pattern

Column header	Data format	Description	Requirement	Example data
`INT_[details]`	Bespoke	Headers may follow the format `INT_[details]` (or `INT` alone). INT indicates integration evidence; `[details]` is a user-defined suffix when multiple integrations are reported. For multi-word field names, use CamelCase (e.g., CredibleSetId). Provenance and integration specifics can differ by row; capture them in the metadata file and, if they vary within the dataset, also in the data file.	optional	Integration evidence example
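
The header rule for integration evidence and the CamelCase convention can be expressed in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, and splitting multi-word names on whitespace, hyphens and underscores is a choice made here, not something the standard prescribes.

```python
import re


def is_integration_header(header: str) -> bool:
    """Return True for 'INT' alone or 'INT_' plus a non-empty suffix."""
    return header == "INT" or (header.startswith("INT_") and len(header) > 4)


def to_camel_case(name: str) -> str:
    """Join a multi-word field name as CamelCase, e.g. 'credible set id'.

    Only the first letter of each word is upper-cased, so existing
    capitals inside a word (as in 'eQTL') are left untouched.
    """
    words = [w for w in re.split(r"[\s_\-]+", name) if w]
    return "".join(w[0].upper() + w[1:] for w in words)
```

For example, `"INT_" + to_camel_case("credible set id")` gives `INT_CredibleSetId`, which matches the CamelCase example in the table.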
