Version: next

📋 PEG Metadata Standard

The metadata consists of four primary components:

  • Dataset Description: descriptors for the whole PEG matrix (trait, source of the GWAS data and publication reference)
  • Genomic Identifiers: details about the variants, genes, or loci included in your dataset.
  • Evidence: explains the evidence columns and their associated categories, and links provenance and analysis methods via source_tag and method_tag.
  • Integration: information about which streams of evidence are combined and how.

In addition, there are two modular components:

  • Source: citation and provenance information for each evidence stream, including publications, databases, and biosample details.
  • Method: a description of the methodology, pipelines, or software used to generate the data.

These modular components can be referenced by multiple evidence entries.
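As an illustration of how one Source or Method record can be shared by several evidence entries, here is a minimal Python sketch. Only `source_tag` and `method_tag` come from the standard; the dictionary layout, the tag values, the other keys, and the helper names `resolve` and `dangling_tags` are hypothetical and chosen purely for demonstration.

```python
# Hypothetical in-memory layout. Only source_tag and method_tag are named in
# the standard; every other key and value here is a placeholder.
sources = {"source_1": {"citation": "placeholder citation"}}
methods = {"method_1": {"description": "placeholder method description"}}

# Two evidence entries that reuse the same Source and Method records.
evidence = [
    {"column": "evidence_a", "source_tag": "source_1", "method_tag": "method_1"},
    {"column": "evidence_b", "source_tag": "source_1", "method_tag": "method_1"},
]


def resolve(entry, sources, methods):
    """Return a copy of an evidence entry with its tags expanded to full records."""
    resolved = dict(entry)
    resolved["source"] = sources[entry["source_tag"]]
    resolved["method"] = methods[entry["method_tag"]]
    return resolved


def dangling_tags(evidence, sources, methods):
    """List tags used by evidence entries that no Source or Method record defines."""
    missing = []
    for entry in evidence:
        if entry["source_tag"] not in sources:
            missing.append(entry["source_tag"])
        if entry["method_tag"] not in methods:
            missing.append(entry["method_tag"])
    return missing
```

Keeping Source and Method records in separate lookups means that a citation or pipeline description is written once and every evidence entry that references it stays consistent.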

Detailed descriptions of each component are provided in the corresponding tabs below:


Standard Content

| Field | Description | Requirement | Data_format | Example |
|---|---|---|---|---|
| peg_source | Identifier of the origin of the PEG list (e.g., publication, DOI, preprint, URL). | Recommended | string (PMID, DOI, URL) | PMID:36357675 |
| trait_description | Free-text description of the phenotype under investigation. Should be concise but clear to a non-specialist. Avoid abbreviations. | Mandatory | string | Ascorbic acid 3-sulfate levels |
| trait_ontology_id | Standard ontology identifier mapped to the trait (e.g., EFO, MONDO, HPO, DOID). Use the most specific term available. | Optional | string | EFO_0800173 |
| gwas_source | Identifier of the GWAS source. Prefer GWAS Catalog accession (GCST); if not available, use PubMed ID or another recognised accession. | Recommended | string (GCST[0-9]+, PMID, other accession ID) | GCST000001 |
| gwas_sample_description | Only required if gwas_source is not a GWAS Catalog accession. Detailed description of the GWAS samples (e.g., cohort name, case/control numbers, ancestry). | Optional | string | 6,136 Finnish ancestry individuals |
| gwas_sample_size | Only required if gwas_source is not a GWAS Catalog accession. Total number of individuals included in the GWAS analysis. | Optional | integer | 6136 |
| gwas_case_control_study | Only required if gwas_source is not a GWAS Catalog accession. Indicator of whether the GWAS design is case–control (TRUE) or quantitative/other (FALSE). | Optional | boolean | FALSE |
| gwas_sample_ancestry | Only required if gwas_source is not a GWAS Catalog accession. Free-text description of participant ancestry, as reported in the original study. | Optional | string | Finnish |
| gwas_sample_ancestry_label | Harmonised ancestry label appropriate for the sample. For label definitions, see Morales et al., 2018 (Table 1). Only required if gwas_source is not a GWAS Catalog accession. | Optional | string (controlled vocabulary) | European |
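
The requirement rules in the table above can be checked programmatically. The following Python sketch is one possible interpretation, not an official validator: it assumes a Dataset Description is held as a plain dictionary keyed by field name, and the function name `check_dataset_description` and the wording of its messages are hypothetical. The example values are taken from the Example column.

```python
import re

# Pattern given in the Data_format column for GWAS Catalog accessions.
GCST_PATTERN = re.compile(r"GCST[0-9]+")

# Fields the table describes as "Only required if gwas_source is not a
# GWAS Catalog accession".
CONDITIONAL_FIELDS = (
    "gwas_sample_description",
    "gwas_sample_size",
    "gwas_case_control_study",
    "gwas_sample_ancestry",
    "gwas_sample_ancestry_label",
)


def check_dataset_description(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    # trait_description is the only field marked Mandatory.
    trait = record.get("trait_description")
    if not isinstance(trait, str) or not trait.strip():
        problems.append("trait_description is mandatory and must be a non-empty string")

    # Sample details are expected only when gwas_source is not a GCST accession.
    gwas_source = record.get("gwas_source")
    is_gcst = isinstance(gwas_source, str) and GCST_PATTERN.fullmatch(gwas_source) is not None
    if not is_gcst:
        for field in CONDITIONAL_FIELDS:
            if record.get(field) is None:
                problems.append(f"{field} is expected when gwas_source is not a GCST accession")

    # Type checks for the two non-string fields (bool is a subclass of int in Python).
    size = record.get("gwas_sample_size")
    if size is not None and (isinstance(size, bool) or not isinstance(size, int)):
        problems.append("gwas_sample_size must be an integer")
    case_control = record.get("gwas_case_control_study")
    if case_control is not None and not isinstance(case_control, bool):
        problems.append("gwas_case_control_study must be a boolean")

    return problems


example = {
    "peg_source": "PMID:36357675",
    "trait_description": "Ascorbic acid 3-sulfate levels",
    "trait_ontology_id": "EFO_0800173",
    "gwas_source": "GCST000001",
}
print(check_dataset_description(example))
```

Because `gwas_source` in this example is a GCST accession, the sample fields may be omitted; replacing it with a PubMed ID would cause each missing sample field to be reported.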