π PEG Metadata Standard Introduction
What is the PEG Metadata?β
Metadata is essential to help users understand, interpret, and reuse your data. It provides context for how the data was generated, processed, and interpreted, ensuring that your data is FAIR (Findable, Accessible, Interoperable, Reusable).
The PEG metadata provides a structured description of the PEG matrix. It:
- Defines the meaning of each column.
- Records the provenance of the underlying source data (including biosample information).
- Specifies the methods and pipelines used to generate each evidence type and the approaches used for integration.
Principles of the PEG Metadata Standard
The PEG metadata standard is guided by a set of high-level principles to ensure clarity, transparency, and interoperability:
- Provenance β metadata must include the GWAS from which a PEG list or matrix was derived.
- Consistency β evidence types should use standardised terminology across studies.
- Transparency β criteria for significance (e.g., thresholds for QTLs, fine-mapping, colocalisation) should be explicitly stated if it is applicable.
- Completeness β metadata should specify whether de novo wet-lab evidence is included alongside computational or re-analysed evidence.
- Methodology β details of prioritisation, scoring, or ranking approaches must be documented.
- Interoperability β metadata for individual publications and pipeline-generated data should follow the same structure to allow integration across sources.
Together, these principles ensure that PEG metadata not only documents the data itself, but also supports reproducibility, benchmarking, and community-wide use.
PEG Metadata Schemaβ
The metadata is the description on the data in the PEG Evidence Matrix and Listβ it explains what each column means, where the data comes from, how it was generated, and how it should be interpreted.
Because the same Source and Method details can apply to multiple evidence streams, we have modularised them in the schema. Each source or method is defined once with a unique identifier, which can then be referenced across the Evidence and Integration tabs (in a data entry form).
The following diagram shows how the main entitiesβ metadata are presented together as a group (as a single tab in the Excel file), while Method and Source are kept separate to avoid duplication and improve reusability.
PEG Metadata Standards Overviewβ
π‘ For easy-to-fill information, we also provide an Excel view with 6 tabs (one per entity type), designed to capture and present these fields clearly.
If you have questions about any attribute for each entity, we also provide detailed explanations here. We are more than happy to hear from you β please feel free to contact us if you have further questions.
The PEG metadata is currently organised into six tabs. The diagram below shows a simplified view of their relationships.
Each entity contains fields that capture a different aspect of the dataset:
- Dataset description - descriptors for the whole PEG matrix (trait, source of the matrix itself, publication reference, release date, creator)
- Genomic Identifier β details about the variants, genes, or locus included in your dataset.
- Evidence β supporting data types and experimental or computational evidence that link variants to genes or traits.
- Integration β information about how different streams of evidence are combined (e.g., scoring, weighting, prioritisation).
- Source β citation and provenance information for each evidence stream, including publications, databases, and biosample details.
- Method β a description of the methodology, pipelines, or softwares used to generate the data.