Data Annotation Methods

Gene expression patterns in HUDSEN are described using these methods:

Text Annotation

In this process, HUDSEN curators use the anatomy ontology as a standardised vocabulary for denoting the sites of gene expression, based on the description of the expression pattern supplied by the data submitter. The annotator finds and reads any text-based descriptions supplied with the specimen by the originator of the data, identifies the equivalent terms in the anatomy ontology, and then annotates those terms according to the original description. This converts the submitter's unstructured free-text description into a standardised, structured description that can be stored and queried in the database. Each annotated term is recorded with a signal strength, a pattern, and any relevant notes.
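
As an illustration of what "structured description" means here, the sketch below turns one phrase from a submitter's free text into a record holding an ontology term, a signal strength, a pattern and notes. It is not the HUDSEN schema: the miniature ontology, its placeholder identifiers (`EX:0001` and so on), the strength and pattern vocabularies and the `annotate` helper are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical miniature anatomy ontology: term name -> placeholder identifier.
# The real annotations use HUDSEN's own anatomy ontology; these IDs are illustrative only.
ONTOLOGY = {
    "heart": "EX:0001",
    "neural tube": "EX:0002",
    "somite": "EX:0003",
}

# Assumed controlled vocabularies for signal strength and pattern.
STRENGTHS = {"strong", "moderate", "weak", "not detected"}
PATTERNS = {"homogeneous", "regional", "spotted", "graded"}


@dataclass(frozen=True)
class TextAnnotation:
    term_id: str
    term_name: str
    strength: str
    pattern: str
    notes: str = ""


def annotate(term_name: str, strength: str, pattern: str, notes: str = "") -> TextAnnotation:
    """Convert one phrase of a free-text description into a structured record.

    Raises ValueError if the term is not in the ontology or a value falls
    outside the assumed vocabularies.
    """
    key = term_name.strip().lower()
    if key not in ONTOLOGY:
        raise ValueError(f"no ontology term matches {term_name!r}")
    if strength not in STRENGTHS:
        raise ValueError(f"unknown signal strength {strength!r}")
    if pattern not in PATTERNS:
        raise ValueError(f"unknown pattern {pattern!r}")
    return TextAnnotation(ONTOLOGY[key], key, strength, pattern, notes)


# A submitter wrote "strong, uniform staining throughout the heart".
record = annotate("Heart", "strong", "homogeneous", "uniform staining")
print(record)
```

Validating against fixed vocabularies is what makes the records queryable: every annotation of the heart carries the same identifier, whatever wording the submitter used.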

Spatial Annotation

In this process, HUDSEN curators use the virtual embryos as a 3D spatial framework, marking the sites of gene expression seen in each image at the anatomically equivalent positions in the model. This converts the abstract, unstructured information captured in each image into a spatially standardised description that can be stored and queried in the database. The annotator carries out this process using a bespoke program called MAPaint.

During this process, HUDSEN curators also assign confidence scores reflecting the clarity of the observed expression pattern and the morphological match between each data embryo and the HUDSEN virtual model used as its spatial template.
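
To make the spatial record concrete, the sketch below represents a painted expression domain as a set of voxel coordinates in a virtual embryo and stores it alongside the two confidence scores described above. It is not MAPaint or the HUDSEN data model: the spherical `paint_sphere` brush, the `SpatialAnnotation` fields, the 1 to 5 score range, the `CS13` template label and the `overlap_fraction` query are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product

Voxel = tuple[int, int, int]


def paint_sphere(center: Voxel, radius: int, shape: Voxel) -> frozenset[Voxel]:
    """Mimic one spherical brush stroke: all voxels within `radius` of `center`,
    clipped to a grid of the given (x, y, z) shape."""
    cx, cy, cz = center
    ranges = [range(max(0, c - radius), min(n, c + radius + 1))
              for c, n in zip(center, shape)]
    return frozenset(
        (x, y, z) for x, y, z in product(*ranges)
        if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2
    )


@dataclass(frozen=True)
class SpatialAnnotation:
    model_stage: str          # hypothetical label for the virtual embryo used as template
    voxels: frozenset[Voxel]  # painted domain, in the model's voxel coordinates
    strength: str
    pattern_clarity: int      # assumed scale: 1 (unclear) to 5 (very clear)
    morphological_match: int  # assumed scale: 1 (poor) to 5 (close match)

    def __post_init__(self):
        for name in ("pattern_clarity", "morphological_match"):
            score = getattr(self, name)
            if not 1 <= score <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {score}")


def overlap_fraction(annotation: SpatialAnnotation, domain: frozenset[Voxel]) -> float:
    """Fraction of an anatomical domain covered by the painted expression region."""
    if not domain:
        return 0.0
    return len(annotation.voxels & domain) / len(domain)


shape = (20, 20, 20)
expression = paint_sphere((10, 10, 10), 3, shape)
domain = paint_sphere((10, 10, 10), 5, shape)  # stand-in for an anatomical domain
ann = SpatialAnnotation("CS13", expression, "strong", pattern_clarity=4, morphological_match=3)
print(f"{overlap_fraction(ann, domain):.2f}")
```

Storing the domain in the template's own coordinates is what allows annotations from different specimens of the same stage to be compared directly, while the confidence scores let a query down-weight poorly matched or unclear specimens.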