Version: next

📋 PEG Metadata Standard

The metadata consists of four primary components:

  • Dataset Description: descriptors for the whole PEG matrix (trait, source of the GWAS data and publication reference)
  • Genomic Identifiers: details about the variants, genes, or loci included in your dataset.
  • Evidence: explains the evidence columns and their associated categories, and links provenance and analysis methods via source_tag and method_tag.
  • Integration: describes which streams of evidence are combined and how they are combined.

In addition, there are two modular components:

  • Source: citation and provenance information for each evidence stream, including publications, databases, and biosample details.
  • Method: a description of the methodology, pipelines, or software used to generate the data.

These modular components can be referenced by multiple evidence entries.
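
As an illustration of this reuse, the following minimal Python sketch shows two evidence entries pointing at the same Source and Method records through source_tag and method_tag. Only those two tag names come from the standard as described above; the dictionary layout, the tag values, the `column` and `description` keys, and the `resolve` helper are assumptions made for this example, not part of the specification.

```python
# Hypothetical Source and Method records, keyed by tag (placeholder content).
sources = {"src_1": {"description": "placeholder source record"}}
methods = {"meth_1": {"description": "placeholder method record"}}

# Two evidence entries reuse the same Source and Method via their tags.
evidence = [
    {"column": "evidence_a", "source_tag": "src_1", "method_tag": "meth_1"},
    {"column": "evidence_b", "source_tag": "src_1", "method_tag": "meth_1"},
]


def resolve(entry, sources, methods):
    """Return the entry together with the Source and Method records it references."""
    if entry["source_tag"] not in sources:
        raise KeyError(f"unknown source_tag: {entry['source_tag']}")
    if entry["method_tag"] not in methods:
        raise KeyError(f"unknown method_tag: {entry['method_tag']}")
    return {
        "column": entry["column"],
        "source": sources[entry["source_tag"]],
        "method": methods[entry["method_tag"]],
    }


resolved = [resolve(entry, sources, methods) for entry in evidence]
```

Because both entries resolve to the same records, a correction to a Source or Method only has to be made once.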

Detailed descriptions of each component are provided in the corresponding tabs below:


Standard Content

| Field | Description | Requirement | Data_format | Example |
|---|---|---|---|---|
| trait_description | Free-text description of the phenotype under investigation. Should be concise but clear to a non-specialist. Avoid abbreviations. | Mandatory | string | Ascorbic acid 3-sulfate levels |
| trait_ontology_id | Standard ontology identifier mapped to the trait (e.g., EFO, MONDO, HPO, DOID). Use the most specific term available. | Optional | string | EFO_0800173 |
| peg_source | Identifier of the origin of the PEG list (e.g., PubMed ID, DOI, preprint, URL). Use "unpublished" if not publicly available. | Recommended | string (PMID, DOI, URL or "unpublished") | PMID:36357675 |
| gwas_source | Identifier of the GWAS source. Prefer GWAS Catalog accession (GCST); if not available, use PubMed ID, DOI, or another recognised accession. Use "unpublished" if not publicly available. | Mandatory | string (GCST[0-9]+, other accession ID, PMID, DOI, URL or "unpublished") | GCST000001 |
| gwas_sample_description | Detailed description of the GWAS samples (e.g., cohort name, case/control numbers, ancestry). | Mandatory if gwas_source is NOT a GWAS Catalog accession. | string | 6,136 Finnish ancestry individuals |
| gwas_sample_size | Total number of individuals included in the GWAS analysis. | Mandatory if gwas_source is NOT a GWAS Catalog accession. | integer | 6136 |
| gwas_case_control_study | Indicator of whether the GWAS design is case–control (TRUE) or quantitative/other (FALSE). | Mandatory if gwas_source is NOT a GWAS Catalog accession. | boolean | FALSE |
| gwas_sample_ancestry | Free-text description of participant ancestry, as reported in the original study. | Mandatory if gwas_source is NOT a GWAS Catalog accession. | string | Finnish |
| gwas_sample_ancestry_label | Harmonised ancestry label appropriate for the sample. For label definitions, see Morales et al., 2018 (Table 1). | Optional | string (controlled vocabulary) | European |
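
The requirement rules above can be checked mechanically. The sketch below is one possible reading of them in Python, not an official validator: it treats trait_description and gwas_source as always mandatory, and the four gwas_sample fields as mandatory only when gwas_source does not match the `GCST[0-9]+` pattern. The function name `missing_fields`, the plain-dictionary record layout, and the decision to treat empty strings as missing are assumptions made for this example.

```python
import re

# Pattern for a GWAS Catalog accession, taken from the Data_format column.
GCST_PATTERN = re.compile(r"GCST[0-9]+")

ALWAYS_MANDATORY = ("trait_description", "gwas_source")
MANDATORY_WITHOUT_GCST = (
    "gwas_sample_description",
    "gwas_sample_size",
    "gwas_case_control_study",
    "gwas_sample_ancestry",
)


def missing_fields(record):
    """Return the mandatory fields that are absent or empty in a record (a dict)."""

    def absent(field):
        value = record.get(field)
        # FALSE is a valid value for gwas_case_control_study, so only None and
        # the empty string count as missing (an assumption of this sketch).
        return value is None or value == ""

    required = list(ALWAYS_MANDATORY)
    source = str(record.get("gwas_source") or "")
    if not GCST_PATTERN.fullmatch(source):
        required.extend(MANDATORY_WITHOUT_GCST)
    return [field for field in required if absent(field)]


# Record built from the Example column: a GCST accession needs no sample fields.
with_accession = {
    "trait_description": "Ascorbic acid 3-sulfate levels",
    "gwas_source": "GCST000001",
}

# Hypothetical record whose GWAS source is not a GWAS Catalog accession.
without_accession = {
    "trait_description": "Ascorbic acid 3-sulfate levels",
    "gwas_source": "unpublished",
    "gwas_sample_size": 6136,
    "gwas_case_control_study": False,
}

print(missing_fields(with_accession))
print(missing_fields(without_accession))
```

For the second record, the check reports gwas_sample_description and gwas_sample_ancestry as missing, while the FALSE value for gwas_case_control_study is accepted as present.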