Data Annotation Methods

Gene expression patterns in HUDSEN are described using three methods:

using words (i.e. a text annotation)
using space (i.e. a spatial annotation)
using words automatically generated from some of the spatial annotations

Text Annotation

In this process, HUDSEN curators use the anatomy ontology as a standardised vocabulary to denote the sites of gene expression, based on the description of the expression pattern supplied by the data submitter. The annotator finds and reads any text-based descriptions that the submitter has associated with the specimen, finds the equivalent terms within the anatomy ontology, and then annotates these terms according to the original description. This converts the submitter's unstructured free-text description into a standardised, structured description that is available for database storage and query. Each annotated term has an associated signal strength, pattern and any relevant notes.
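
As an illustration only, a text annotation can be pictured as a small structured record. The sketch below is not the HUDSEN schema: the class name, the field names, the strength vocabulary and the phrase-to-term lookup table are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical strength vocabulary; the values HUDSEN actually uses are not stated here.
STRENGTHS = {"strong", "moderate", "weak", "not detected"}


@dataclass
class TextAnnotation:
    term: str          # anatomy ontology term chosen by the curator
    strength: str      # signal strength recorded for that term
    pattern: str = ""  # description of the pattern within the structure
    notes: str = ""    # any relevant curator notes

    def __post_init__(self) -> None:
        if self.strength not in STRENGTHS:
            raise ValueError(f"unknown strength: {self.strength!r}")


def to_ontology_term(phrase: str, lookup: Dict[str, str]) -> Optional[str]:
    """Map a submitter's phrase to an ontology term, or None if there is no match."""
    return lookup.get(phrase.strip().lower())


def annotate(findings: List[Tuple[str, str, str]],
             lookup: Dict[str, str]) -> List[TextAnnotation]:
    """Turn (phrase, strength, pattern) findings into structured annotations.

    Phrases with no equivalent ontology term are skipped; in practice a
    curator would resolve these by hand.
    """
    records = []
    for phrase, strength, pattern in findings:
        term = to_ontology_term(phrase, lookup)
        if term is not None:
            records.append(TextAnnotation(term, strength, pattern))
    return records


# Hypothetical lookup from free-text phrases to ontology terms.
lookup = {"fore-brain": "forebrain", "heart tube": "heart"}
findings = [("Fore-brain ", "strong", "regional"),
            ("tail tip", "weak", "")]
records = annotate(findings, lookup)
```

Storing each annotation as a record of this kind is what makes the description queryable: a search for a term and a strength becomes a simple filter over the records.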

Spatial Annotation

In this process, HUDSEN curators use the virtual embryos as a 3D spatial framework in which to denote the sites of gene expression seen in the image, placing them at the anatomically equivalent positions. This converts the abstract and unstructured information captured in each image into a spatially standardised description that is available for database storage and query. The process involves the annotator using a bespoke program called MAPaint and:

Selecting a target HUDSEN embryo model that is of the same Carnegie Stage as the data embryo.
Adding anchor points in the two images at anatomically equivalent places - these direct a 'warp' of the data image onto the target HUDSEN model embryo.
Extracting signal from the data image according to various levels of signal intensity, and then transferring these to the model, using the warp parameters from the previous step as a guide to place these regions in the corresponding places on the target.
The colours in the spatial annotations represent apparent strength of signal.

During this process, HUDSEN curators also assign confidence scores for the clarity of the expression pattern seen and for the morphological match between each data embryo and the HUDSEN virtual model used as its spatial template.
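
The warp and signal-transfer steps described above can be sketched in miniature. This is not MAPaint's algorithm, which is not described here: the sketch works in 2D rather than 3D, substitutes a plain affine transform fitted to exactly three anchor pairs for the real warp, and uses made-up intensity thresholds. All function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("anchor points are collinear")
    solution = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution


def fit_affine(src, dst):
    """Fit u = a*x + b*y + c and v = d*x + e*y + f from three anchor pairs."""
    m = [[x, y, 1.0] for x, y in src]
    return (solve3(m, [u for u, _ in dst]),
            solve3(m, [v for _, v in dst]))


def warp_point(point, affine):
    """Map a data-image point into model coordinates."""
    (a, b, c), (d, e, f) = affine
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


def signal_level(intensity, thresholds=(0.25, 0.5, 0.75)):
    """Bin an intensity in [0, 1] into levels 0 (none) to 3 (strongest).

    The thresholds are assumptions chosen for the example.
    """
    return sum(intensity >= t for t in thresholds)


def transfer(pixels, affine):
    """Place each pixel with detectable signal at its warped model position."""
    mapped = {}
    for point, intensity in pixels.items():
        level = signal_level(intensity)
        if level > 0:
            u, v = warp_point(point, affine)
            mapped[(round(u), round(v))] = level
    return mapped


# Anchor points placed at matching positions in the data image and the model.
data_anchors = [(0, 0), (1, 0), (0, 1)]
model_anchors = [(10, 20), (12, 20), (10, 23)]
affine = fit_affine(data_anchors, model_anchors)

# Data-image pixels with normalised signal intensities.
pixels = {(1, 1): 0.8, (0, 0): 0.1, (1, 0): 0.3}
mapped = transfer(pixels, affine)
```

Each level in the result corresponds to one of the colours used to show apparent signal strength. A real warp driven by many anchor points would be non-linear, so the affine fit here only conveys the idea of using anchor correspondences to carry signal regions across to the model.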