Ontologies at large - practical introduction to the EMBL-EBI ontology tools and services for data annotation

Abstract

EMBL-EBI has been developing ontologies for the annotation of life science data for the past decade. Over that time it has built a range of open-source applications that support the construction and management of ontologies, as well as the curation and querying of biomedical data with ontologies. In this workshop, members of the EMBL-EBI Samples, Phenotypes and Ontologies Team (SPOT) will present tools that support the construction of application ontologies for dataset-specific annotation, demonstrated through the development of the Experimental Factor Ontology (EFO). The tutorial will demonstrate the use of ontology-aided tools in a real-world scenario where data are to be annotated with ontologies. This will include an introduction to the full set of EBI tools and services: the Ontology Lookup Service (OLS) for term search, the Zooma ontology annotation tool, the Ontology-Cross-Ontology (OxO) cross-reference finder, and the Webulous ontology term builder. We will look at the accompanying APIs and the various third-party client libraries that have been developed to access the tools. The tutorial will walk the participants through a real biocuration workflow at the EBI. We aim to end with an introduction to the BioSolr project, which shows how data enriched with ontologies can be indexed into the popular Solr and Elasticsearch search engines to support enhanced semantic data search.

Expected outcomes and format

We will offer a half-day tutorial consisting of a series of short presentations by members of the SPOT team, one on each component, followed by live demos and hands-on examples that participants can follow. Participants are encouraged to bring their own computers so that they can follow along and, if they wish, try out the tools with their own data. Supplementary example data files will also be available to download on the day. After each session there will be an opportunity for the participants to discuss the tooling in the context of their own work. We expect this workshop to provide a platform for people to learn about the ontology tooling at EMBL-EBI and how these tools can be adopted within their own field of interest. The dialogue will also provide a channel for gathering user feedback and additional requirements, so that the ontology tools can be improved for a wider user community. It may also foster new collaborations to further develop these tools. The essence of this tutorial is the connectivity of the OLS-Zooma-OxO-Webulous workflow, and we will focus our presentations on the use of these tools in real-world applications. Depending on the extent of discussion and interaction at the tutorial, we aim to conclude with BioSolr, which demonstrates the motivation behind advanced ontology-powered search over curated data.

Rationale

Certain activities are common to all ontology-aided data curation processes, ranging from ontology term lookup, to annotating data with ontology terms, to creating a new ontology class where the need arises. This workflow is often seen across high-volume data repositories at EMBL-EBI and other institutions. Given this observation, a tutorial that introduces how the ontology tools are developed and how they are being used by the community will be useful to both software developers and ontology users at large.

Data annotation workflow

OLS, Zooma, OxO and Webulous in combination represent our suggested ‘data-ontology-annotation workflow’. A real-life example will demonstrate how to annotate your data and what role each of these tools plays in this workflow.

  1. Ontology Lookup Service:

    The Ontology Lookup Service (OLS) is a repository for biomedical ontologies that aims to provide a single point of access to the latest ontology versions. We present the UI and the RESTful API as well as how to integrate OLS widgets (e.g. search autocomplete box) in your own project.

  2. Zooma:

    Zooma is a tool for mapping text labels to ontology terms based on a curated repository of annotation knowledge. Zooma takes previous manually curated matches into account and therefore helps people to annotate their data with domain-specific terms. We present the UI as well as the API to demonstrate how people can use Zooma to annotate their data.

  3. OxO:

    Since many ontologies overlap, there is a need for an ontology cross-reference tool. OxO tries to fill this gap by analysing and displaying connections between ontologies over multiple ‘hops’.

  4. Webulous:

    If no suitable term for your data exists in an ontology, you have to create one. Webulous is a tool that supports ontology development from spreadsheets through a Google Sheets add-on.

  5. [Optional] BioSolr:

    To demonstrate the power of well-annotated data, we present the BioSolr Ontology Expansion plugin. This search-engine plugin improves search over data annotated with ontologies by exposing the structure and additional information that ontology terms provide (e.g. synonyms) to the search index.
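
As a rough illustration of the API access mentioned for OLS and Zooma above, the following Python sketch builds request URLs for a term search and a text annotation, and pulls the top hits out of a parsed OLS search response. The base URLs, the parameter names (`q`, `ontology`, `propertyValue`) and the Solr-style response layout are assumptions based on the publicly documented services at the time of writing and may differ in current deployments; the helper function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed service endpoints; check the current OLS and Zooma documentation.
OLS_SEARCH = "https://www.ebi.ac.uk/ols/api/search"
ZOOMA_ANNOTATE = "https://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate"


def ols_search_url(label, ontology=None):
    """Build an OLS term-search URL for a free-text label."""
    params = {"q": label}
    if ontology:
        # Assumed parameter: restricts the search to one ontology, e.g. "efo".
        params["ontology"] = ontology
    return OLS_SEARCH + "?" + urlencode(params)


def zooma_annotate_url(property_value):
    """Build a Zooma URL asking for ontology annotations of a text label."""
    return ZOOMA_ANNOTATE + "?" + urlencode({"propertyValue": property_value})


def top_ols_hits(payload, limit=3):
    """Return (label, iri) pairs from a parsed OLS search response.

    Assumes a Solr-style layout: payload["response"]["docs"] is a list of
    documents carrying "label" and "iri" keys.
    """
    docs = payload.get("response", {}).get("docs", [])
    return [(doc.get("label"), doc.get("iri")) for doc in docs[:limit]]
```

To run a live query, the URL returned by `ols_search_url("heart attack", "efo")` can be fetched with `urllib.request.urlopen`, and the body decoded with `json.load` before being passed to `top_ols_hits`. Keeping URL construction separate from the network call makes the sketch easy to try offline against saved responses.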

Proposers

Samples, Phenotypes, and Ontologies Team (SPOT)
European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
URL: http://www.ebi.ac.uk/about/spot-team
E-mail: ontology-tools-support@ebi.ac.uk

The Samples, Phenotypes and Ontologies team, led by Helen Parkinson, is organised into three themes: BioSamples and Semantic Data Integration, Mouse Informatics, and the Gene Ontology Editorial Office. The team is part of the Cross Domain Cluster and the Genes, Genomes and Variation Cluster. It provides ontologies, ontology tooling, and resources that give access to samples and ontologies, both for EBI resources and for external users. SPOT has been providing support in ontology building and ontology-aided tools and services for over a decade. SPOT has generated many well-used ontologies such as the Experimental Factor Ontology (EFO), Cellular Microscopy Phenotype Ontology (CMPO), and Ancestry Ontology (ANCESTRO). SPOT is also a key member of the Gene Ontology Consortium, as well as other large-scale collaborations including ENCODE, FANTOM5, BioSamples, GWAS Catalog, and International Mouse Phenotyping Consortium (IMPC). This experience has given SPOT an understanding of both the biology-driven requirements and the technical implementation needed to solve complex questions in the health informatics domain.

Presenters

Schedule

14:00 - 15:30

Introduction
  The big picture - from data to knowledge
Hands-on workshop - Part A - Annotating our data
  Data annotation using the EMBL-EBI Ontology Tools
    Zooma
    OLS
    OxO
    Webulous

15:30 - 16:00

Coffee break

16:00 - 17:30

Hands-on workshop - Part B - Exploiting our newly found knowledge
  Indexing our data using Solr
  Enabling BioSolr - a tool that enriches Solr or Elasticsearch indexes with ontology knowledge
  Re-indexing our data and performing ontology-powered searches

Prerequisites

Java 8 - we will be installing and running the Solr server and a sample web application, both of which rely on Java 8.

Excel - we will be opening and editing files in Excel.