PEG List Preparation
This page is a practical guide to preparing a PEG List for submission or sharing. For the exact column standards, see the PEG List Standard and the toy example.
Start from your final PEG Evidence Matrix and the metadata integration tab. The PEG List is derived from the matrix using the integration column marked author_conclusion = True.
Before you start
Make sure you have:
- A finalised PEG Evidence Matrix (all evidence and integration columns complete)
- Only ONE integration column flagged as the author conclusion in metadata
- The evidence category abbreviations used in the matrix (from the controlled vocabulary)
- The top gene selection rule for each locus
Recommended file format
- Use a machine-readable table, saved as a TSV file
- Do not use merged cells or styled spreadsheets
- Evidence category column headers contain only the category abbreviations (for any other category, use Other_[CustomisedCategory])
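As a minimal sketch of these format rules, the snippet below checks a header row and writes rows as plain TSV using only the Python standard library. The category abbreviations (QTL, FM), the identifier header names (variant_id, gene_symbol) and the helper names are illustrative assumptions, not taken from the PEG List Standard; substitute the headers and controlled vocabulary you actually use.

```python
import csv
import io

# Hypothetical values for illustration only; replace with your own.
APPROVED_CATEGORIES = {"QTL", "FM"}
ID_COLUMNS = {"variant_id", "gene_symbol"}
CONCLUSION_COLUMN = "INT_CombinedPrediction"


def unexpected_headers(headers):
    """Return headers that are not identifiers, approved category
    abbreviations, Other_[...] columns, or the author conclusion column."""
    problems = []
    for header in headers:
        if header in ID_COLUMNS or header == CONCLUSION_COLUMN:
            continue
        if header in APPROVED_CATEGORIES:
            continue
        if header.startswith("Other_") and len(header) > len("Other_"):
            continue
        problems.append(header)
    return problems


def to_tsv(rows, headers):
    """Serialise a list of dict rows as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=headers, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Writing through the csv module rather than exporting from a spreadsheet avoids merged cells and styling by construction.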
List structure
A PEG List is a compact table where each row represents the top gene at a locus. It should include:
- A variant identifier (e.g., lead/index variant for the locus)
- The gene symbol for the author-prioritised gene
- One boolean column per evidence category considered in the conclusion, indicating whether evidence from that category was available
- The author conclusion integration value (copied from the matrix)
Step-by-step preparation
1) Start from the Evidence Matrix
- Identify the integration column with author_conclusion = True in metadata
- Use that integration column to determine the top gene per locus
- If a locus has multiple top genes, include one row per gene and keep the variant identifier
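The selection in this step could be sketched as follows. This is an illustration under stated assumptions, not the method prescribed by the standard: the metadata is assumed to be a list of rows with hypothetical column and author_conclusion fields, the matrix rows are assumed to carry variant_id and gene_symbol keys, and the top gene rule is assumed to be "highest integration value, ties kept". Replace the rule with whatever selection rule you defined for each locus.

```python
from collections import defaultdict


def author_conclusion_column(metadata_rows):
    """Return the single integration column flagged author_conclusion = True."""
    flagged = [
        row["column"]
        for row in metadata_rows
        if str(row["author_conclusion"]).strip().lower() == "true"
    ]
    if len(flagged) != 1:
        raise ValueError(
            f"expected exactly one author conclusion column, found {len(flagged)}"
        )
    return flagged[0]


def top_genes_per_locus(matrix_rows, int_column, locus_key="variant_id"):
    """Keep the row(s) with the highest integration value at each locus.

    Assumed rule: numeric score, higher is better, ties yield one row per gene.
    """
    by_locus = defaultdict(list)
    for row in matrix_rows:
        by_locus[row[locus_key]].append(row)

    selected = []
    for rows in by_locus.values():
        best = max(float(row[int_column]) for row in rows)
        selected.extend(row for row in rows if float(row[int_column]) == best)
    return selected
```

Raising an error when zero or several columns are flagged makes the "only one author conclusion column" requirement fail early rather than silently picking one.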
2) Build identifier columns
- Use a consistent variant identifier (variant_id or rsID)
- Use HGNC gene symbols for the Gene symbol column
- Ensure the genome build and variant format match the matrix
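One way to catch mixed identifier styles is a small pattern check, sketched below. The positional pattern (chromosome_position_ref_alt, with an optional chr prefix) is an assumption made for illustration; adapt it to the variant format used in your matrix. The check cannot verify the genome build or whether a symbol is HGNC-approved, so those still need to be confirmed against the matrix and HGNC separately.

```python
import re

RSID = re.compile(r"^rs\d+$")
# Assumed positional format, e.g. chr1_12345_A_G or 1_12345_A_G.
POSITIONAL = re.compile(r"^(?:chr)?(?:\d{1,2}|X|Y|MT?)_\d+_[ACGT]+_[ACGT]+$")


def id_style(variant_id):
    """Classify a variant identifier as 'rsid', 'positional', or 'unknown'."""
    if RSID.match(variant_id):
        return "rsid"
    if POSITIONAL.match(variant_id):
        return "positional"
    return "unknown"


def variant_ids_consistent(variant_ids):
    """True if every identifier is recognised and all share one style."""
    styles = {id_style(variant_id) for variant_id in variant_ids}
    return len(styles) == 1 and "unknown" not in styles
```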
3) Add evidence category columns
- Each evidence category abbreviation becomes a boolean column
- Use a consistent boolean encoding (e.g., TRUE/FALSE)
- Values indicate whether evidence from that category was available for the author conclusion
- Ticks do not imply supportive vs negative; they only mean "considered"
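A minimal sketch of deriving these boolean columns from one matrix row is shown below. It assumes, purely for illustration, that matrix evidence columns are named with the category abbreviation followed by an underscore (for example QTL_blood) and that empty strings, NA and NaN mark missing evidence; check both assumptions against your own matrix before reusing it.

```python
# Assumed markers for "no evidence available" in the matrix.
MISSING = {"", "NA", "NaN", "nan"}


def category_flags(matrix_row, categories):
    """Return {category: 'TRUE' | 'FALSE'} for one matrix row.

    A category is TRUE when at least one of its evidence columns holds a
    non-missing value, regardless of whether that value supports the gene.
    """
    flags = {}
    for category in categories:
        prefix = category + "_"
        values = [
            value for column, value in matrix_row.items()
            if column.startswith(prefix)
        ]
        available = any(
            value is not None and str(value).strip() not in MISSING
            for value in values
        )
        flags[category] = "TRUE" if available else "FALSE"
    return flags
```

Note that a value of 0 still yields TRUE here: the flag records that the evidence was considered, not that it was supportive.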
4) Add the author conclusion column
- Copy the author conclusion integration values from the matrix
- Use the same header format as the matrix, e.g., INT_CombinedPrediction
- Ensure the values are consistent with the matrix (same scale and encoding)
Final checks
- Variant identifiers and gene symbols are consistent with the matrix
- Evidence category columns use approved abbreviations
- The author conclusion column matches the matrix and metadata
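Part of this checklist can be automated by cross-checking the list against the matrix. The sketch below assumes both tables are loaded as lists of dict rows with hypothetical variant_id and gene_symbol keys, and that the author conclusion values were copied unchanged (compared here as strings). It is an illustrative helper, not part of the PEG tooling.

```python
def cross_check(list_rows, matrix_rows, int_column):
    """Return a list of human-readable problems; empty means no mismatch found."""
    matrix_values = {
        (row["variant_id"], row["gene_symbol"]): row[int_column]
        for row in matrix_rows
    }
    problems = []
    for row in list_rows:
        key = (row["variant_id"], row["gene_symbol"])
        if key not in matrix_values:
            problems.append(f"{key[0]} / {key[1]}: pair not found in matrix")
        elif row[int_column] != matrix_values[key]:
            problems.append(f"{key[0]} / {key[1]}: {int_column} differs from matrix")
    return problems
```

An empty result only shows that identifiers and conclusion values agree with the matrix; the metadata flag and the category abbreviations still need a manual look.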
If you are unsure about any field or naming pattern, refer back to the PEG List Standard or the PEG Evidence Matrix Preparation guide.