Difference between revisions of "InterMine Presentation"

Revision as of 20:05, 8 October 2012

This wiki page is an edited version of Gos Micklem's presentation.

Technical Overview

* The data model is used to generate the Java classes, the relational schema and the mappings between them through automatic code generation
* Custom Java object/relational system
** When we started, Hibernate could not select from multiple classes in a single query
* Optimised for read-only performance
* Designed for big, complex queries over bulk data
* Performance optimisation
** Transparent query rewriting
* Web application: Struts/JSP/Ajax
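The overview says the data model is turned into Java classes and a relational schema by automatic code generation. The sketch below illustrates only the idea: from one hypothetical class description it emits both a Java interface and a `CREATE TABLE` statement. `CodeGenSketch`, `ModelClass` and the Java-to-SQL type mapping are our own assumptions, not InterMine's generator or its actual schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: one model description drives two generated artefacts.
final class CodeGenSketch {
    // Hypothetical stand-in for one class of a data model:
    // a class name plus attribute name -> Java type, in declaration order.
    static final class ModelClass {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();

        ModelClass(String name) {
            this.name = name;
        }

        ModelClass attr(String attrName, String javaType) {
            attributes.put(attrName, javaType);
            return this;
        }
    }

    private static String capitalise(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // Generate a Java interface with a getter and a setter per attribute.
    static String javaSource(ModelClass c) {
        StringBuilder sb = new StringBuilder("public interface " + c.name + " {\n");
        for (Map.Entry<String, String> e : c.attributes.entrySet()) {
            String prop = capitalise(e.getKey());
            sb.append("    ").append(e.getValue()).append(" get").append(prop).append("();\n");
            sb.append("    void set").append(prop).append('(').append(e.getValue()).append(" value);\n");
        }
        return sb.append("}\n").toString();
    }

    // Generate a table definition from the same description.
    // The Java-to-SQL type mapping is an illustrative assumption.
    static String ddl(ModelClass c) {
        StringBuilder sb = new StringBuilder("CREATE TABLE " + c.name + " (id INTEGER PRIMARY KEY");
        for (Map.Entry<String, String> e : c.attributes.entrySet()) {
            String sqlType = e.getValue().equals("Integer") ? "INTEGER" : "TEXT";
            sb.append(", ").append(e.getKey()).append(' ').append(sqlType);
        }
        return sb.append(");").toString();
    }

    public static void main(String[] args) {
        ModelClass gene = new ModelClass("Gene")
                .attr("identifier", "String")
                .attr("description", "String");
        System.out.print(javaSource(gene));
        System.out.println(ddl(gene));
    }
}
```

Because both outputs come from a single description, a change such as adding `Gene.description` to the model only requires editing the model and regenerating.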

Loading Data

* Read-only in the production environment (so Problems 3 and 5 were skipped)
* Data is loaded from InterMine XML
* Parsers for standard formats
** e.g. UniProt, GFF3, PSI, FASTA
* Integration system: coarse- and fine-grained data source priorities make the integrated result independent of load order
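The load-order independence mentioned above can be illustrated with a small, self-contained sketch. It assumes a simplified picture: each source has a rank in a default (coarse-grained) ordering, individual fields may have their own (fine-grained) ordering, and a field's value comes from the highest-ranked source that supplied one. The class and source names are hypothetical; this is not InterMine's integration code or its priority configuration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of priority-based merging for one integrated object: the value kept
// for each field depends on source priority, never on the order of loading.
final class PriorityMerge {
    // Coarse-grained: default ranking of sources (earlier = higher priority).
    private final List<String> defaultOrder;
    // Fine-grained: optional per-field rankings that override the default.
    private final Map<String, List<String>> fieldOrder = new HashMap<>();
    // field -> (source -> value) as supplied by each loaded source
    private final Map<String, Map<String, String>> candidates = new HashMap<>();

    PriorityMerge(List<String> defaultOrder) {
        this.defaultOrder = defaultOrder;
    }

    void setFieldPriority(String field, List<String> order) {
        fieldOrder.put(field, order);
    }

    // Record a value; in this sketch nothing is decided at load time.
    void load(String source, String field, String value) {
        candidates.computeIfAbsent(field, f -> new HashMap<>()).put(source, value);
    }

    // Return the value from the highest-ranked source that supplied one
    // (sources missing from the ranking are ignored), or null if none did.
    String resolve(String field) {
        List<String> order = fieldOrder.getOrDefault(field, defaultOrder);
        Map<String, String> supplied = candidates.getOrDefault(field, Collections.emptyMap());
        for (String source : order) {
            if (supplied.containsKey(source)) {
                return supplied.get(source);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        PriorityMerge gene = new PriorityMerge(List.of("UniProt", "GFF3"));
        gene.setFieldPriority("description", List.of("GFF3", "UniProt"));
        // Deliberately interleaved load order; reordering these lines
        // does not change what resolve() returns.
        gene.load("GFF3", "symbol", "xfile");
        gene.load("UniProt", "description", "description from UniProt");
        gene.load("UniProt", "symbol", "XFILE");
        gene.load("GFF3", "description", "description from GFF3");
        System.out.println(gene.resolve("symbol"));      // default ranking: UniProt's value
        System.out.println(gene.resolve("description")); // per-field ranking: GFF3's value
    }
}
```

Loading the same values in the opposite order gives the same result, which is the property the bullet describes.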

Test problems

Used SOFA (the Sequence Ontology Feature Annotation subset) as the core data model, similar to Chado.
Added Gene.description (absent from the model), recompiled, loaded the data (here InterMine XML and FASTA) and released the webapp.

Example InterMine XML for Problem 1: load genes + annotation

<xml>
  <item id="0_3" class="" implements="http://www.flymine.org/model/genomic#Gene">
     <attribute name="identifier" value="xfile" />
     <attribute name="description" value="A test gene for GMOD meeting" />
     <reference name="organism" ref_id="0_1" />
     <collection name="transcripts">
        <reference ref_id="0_9" />
     </collection>
  </item>
  <item id="0_1" class="" implements="http://www.flymine.org/model/genomic#Organism">
     <attribute name="taxonId" value="7227" />
  </item>
  ...
</xml>
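Items refer to each other by `id`/`ref_id`, so an item may point to one defined later in the file (the gene above refers to organism `0_1`, which follows it). The sketch below uses only the JDK's DOM parser to index items by `id` and then follow a named reference. The `<items>` root element of the sample and the `ItemXmlSketch` name are assumptions; InterMine's own loaders are not shown here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Reads items shaped like the fragment above and follows a reference via ref_id.
final class ItemXmlSketch {
    // Assumed, trimmed sample; the <items> root element is an assumption.
    static final String SAMPLE = "<items>"
            + "<item id=\"0_3\" class=\"\" implements=\"http://www.flymine.org/model/genomic#Gene\">"
            + "<attribute name=\"identifier\" value=\"xfile\"/>"
            + "<reference name=\"organism\" ref_id=\"0_1\"/>"
            + "</item>"
            + "<item id=\"0_1\" class=\"\" implements=\"http://www.flymine.org/model/genomic#Organism\">"
            + "<attribute name=\"taxonId\" value=\"7227\"/>"
            + "</item>"
            + "</items>";

    // Parse the document and index every <item> by id, so a reference can
    // point to an item that appears later in the file.
    static Map<String, Element> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Element> byId = new HashMap<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                byId.put(item.getAttribute("id"), item);
            }
            return byId;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse item XML", e);
        }
    }

    // Direct child <tag name="..."> of an item, or null if there is none.
    static Element child(Element item, String tag, String name) {
        NodeList kids = item.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n instanceof Element) {
                Element e = (Element) n;
                if (e.getTagName().equals(tag) && e.getAttribute("name").equals(name)) {
                    return e;
                }
            }
        }
        return null;
    }

    // item -> named reference -> attribute of the referenced item
    static String follow(Map<String, Element> byId, String itemId, String ref, String attr) {
        Element reference = child(byId.get(itemId), "reference", ref);
        Element target = byId.get(reference.getAttribute("ref_id"));
        return child(target, "attribute", attr).getAttribute("value");
    }

    public static void main(String[] args) {
        Map<String, Element> byId = load(SAMPLE);
        // gene 0_3 -> organism -> taxonId
        System.out.println(follow(byId, "0_3", "organism", "taxonId"));
    }
}
```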

Resulting webapp object page

Code for Problem 2: Print gene annotation report

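<java>
import java.util.ArrayList;
import java.util.List;

import org.intermine.objectstore.ObjectStore;
import org.intermine.objectstore.ObjectStoreFactory;
import org.intermine.objectstore.query.ConstraintOp;
import org.intermine.objectstore.query.Query;
import org.intermine.objectstore.query.QueryClass;
import org.intermine.objectstore.query.QueryField;
import org.intermine.objectstore.query.QueryValue;
import org.intermine.objectstore.query.Results;
import org.intermine.objectstore.query.SimpleConstraint;

// FlyMine model classes (same package as in the IQL example)
import org.flymine.model.genomic.CDS;
import org.flymine.model.genomic.Exon;
import org.flymine.model.genomic.Gene;
import org.flymine.model.genomic.Synonym;

// FlyMine-to-BioJava adapter; this package name is an assumption
import org.flymine.biojava.FlyMineSequence;
import org.flymine.biojava.FlyMineSequenceFactory;

import org.biojava.bio.Annotation;
import org.biojava.bio.seq.io.FastaFormat;
import org.biojava.bio.seq.io.SeqIOTools;

public class BakeOff {

    public static void main(String[] args) throws Exception {
        // query for the Gene whose identifier is "xfile"
        ObjectStore os = ObjectStoreFactory.getObjectStore("os.production");
        Query q = new Query();
        QueryClass qcObj = new QueryClass(Gene.class);
        q.addFrom(qcObj);
        // select the Gene object itself: selecting only the identifier
        // field would return Strings, and the cast to Gene below would fail
        q.addToSelect(qcObj);
        QueryField qf = new QueryField(qcObj, "identifier");
        SimpleConstraint sc = new SimpleConstraint(qf, ConstraintOp.EQUALS, new QueryValue("xfile"));
        q.setConstraint(sc);
        System.err.println("query: " + q);
        Results res = os.execute(q);
        if (res.isEmpty()) {
            System.err.println("no gene with identifier xfile");
            return;
        }

        // a Results object is a List of Lists: one inner List per row
        List rr = (List) res.get(0);
        Gene gene = (Gene) rr.get(0);

        System.err.println("symbol: " + gene.getIdentifier());

        // a BioEntity in FlyMine has a collection of Synonym objects -
        // we need Synonym.value for each Synonym
        System.err.print("synonyms: ");
        for (Object o : gene.getSynonyms()) {
            Synonym syn = (Synonym) o;
            System.err.print(syn.getValue() + " ");
        }
        System.err.println();

        System.err.println("description: " + gene.getDescription());

        // get the class name, but we already know that the gene is a Gene
        System.err.println("type: " + gene.getClass().getName());

        // make a List from the Set of exons for this Gene; the Set has
        // no guaranteed order, so the exon numbering here is arbitrary
        List exons = new ArrayList(gene.getExons());
        for (int i = 0; i < exons.size(); i++) {
            Exon exon = (Exon) exons.get(i);
            // get the start and end via the Location object
            System.err.println("exon" + (i + 1) + " start: " + exon.getChromosomeLocation().getStart());
            System.err.println("exon" + (i + 1) + " end: " + exon.getChromosomeLocation().getEnd());
        }

        // write out the first CDS, if the gene has one
        List cdss = new ArrayList(gene.getCDSs());
        if (!cdss.isEmpty()) {
            FlyMineSequence flymineSequence = FlyMineSequenceFactory.make((CDS) cdss.get(0));

            // use BioJava to output the sequence in FASTA format
            Annotation annotation = flymineSequence.getAnnotation();
            annotation.setProperty(FastaFormat.PROPERTY_DESCRIPTIONLINE,
                                   gene.getIdentifier() + " cds");
            SeqIOTools.writeFasta(System.err, flymineSequence);
        }
    }
}
</java>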
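The report ends by handing the CDS to BioJava's `SeqIOTools.writeFasta`. For readers unfamiliar with the output format, here is a dependency-free sketch of a FASTA record: a `>` header line carrying the description, followed by the residues split into lines of a fixed width. `FastaSketch` and its width parameter are our own illustration; BioJava's actual line width and header layout may differ.

```java
// Minimal FASTA record writer: header line, then the sequence wrapped at a fixed width.
final class FastaSketch {
    static String record(String description, String residues, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        StringBuilder sb = new StringBuilder(">").append(description).append('\n');
        for (int i = 0; i < residues.length(); i += width) {
            // append residues[i, min(i + width, length)) as one line
            sb.append(residues, i, Math.min(i + width, residues.length())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical toy sequence; in the report above the residues come from the CDS.
        System.out.print(record("xfile cds", "ATGGCCTTAGGA", 4));
    }
}
```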

Quicksearch - Problem 4: find genes starting with x

Java API

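<java>

 // select whole Gene objects whose identifier matches the pattern 'x-%'
 Query q = new Query();
 QueryClass qcObj = new QueryClass(Gene.class);
 q.addFrom(qcObj);
 q.addToSelect(qcObj);

 QueryField qf = new QueryField(qcObj, "identifier");

 // ConstraintOp.MATCHES corresponds to SQL LIKE: % matches any run of characters
 SimpleConstraint sc = new SimpleConstraint(qf, ConstraintOp.MATCHES, new QueryValue("x-%"));
 q.setConstraint(sc);

 // run it against an ObjectStore obtained as in the Problem 2 code
 Results res = os.execute(q);
 for (Object row : res) {
     Gene gene = (Gene) ((List) row).get(0);
     System.out.println(gene.getIdentifier());
 }

</java>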

IQL

<sql>

 SELECT DISTINCT a1_.identifier AS a2_ FROM org.flymine.model.genomic.Gene AS a1_ WHERE a1_.identifier LIKE 'x-%'

</sql>

Perl API

<perl>
  my $genes = InterMine::Gene::Manager->get_genes(
      query => [ identifier => { like => 'x-%' } ],
  );
</perl>
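The Java, IQL and Perl versions all express the search as the SQL-style `LIKE` pattern `'x-%'`, where `%` matches any run of characters (and `_` matches exactly one). To make explicit which identifiers that pattern selects, the sketch below translates it into an anchored Java regular expression and filters a list. The translation (supporting only `%` and `_`, with no escape character) and the sample identifiers are our own illustration, not how InterMine or the database evaluates the constraint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class LikeSketch {
    // Translate a LIKE pattern into a regex: % -> .*, _ -> ., everything else
    // matched literally. No escape character is supported in this sketch.
    static Pattern toRegex(String like) {
        StringBuilder re = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%' || c == '_') {
                if (literal.length() > 0) {
                    re.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                re.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            re.append(Pattern.quote(literal.toString()));
        }
        return Pattern.compile(re.toString(), Pattern.DOTALL);
    }

    // Keep identifiers that match the whole pattern (LIKE is anchored at both ends).
    static List<String> filter(List<String> identifiers, String like) {
        Pattern p = toRegex(like);
        List<String> hits = new ArrayList<>();
        for (String id : identifiers) {
            if (p.matcher(id).matches()) {
                hits.add(id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical identifiers
        List<String> ids = List.of("x-ray", "x-file", "xfile", "ax-1");
        System.out.println(filter(ids, "x-%"));
    }
}
```

Note that a plain `xfile` identifier is not matched, because the pattern requires the hyphen straight after the `x`.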
 
====Larger Query====
 
Within FlyMine, for one or more genes, report:
* Gene, Transcripts, Exons, Chromosomal Locations, Lengths
 
* Query joins 7 classes
** all are on select list of query
** many more tables than classes are joined
 
* Performance:
** One gene:
*** 2 rows in ~2 seconds
** All genes, all organisms:
*** ~300,000 rows in 36 seconds without pre-computation
*** ~300,000 rows in ~1 second with pre-computation
 
====Implications of Query Optimisation====
 
* Performance optimisation is not tied to the schema design
* Performance optimisation can be adapted to how the live database is actually used
* Template queries are pre-computed
** ~40 template queries run per gene details page, which still renders in seconds
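A toy model of why pre-computing template queries pays off: a template has a fixed query shape plus one editable value (here a gene identifier), so the expensive multi-class join can be materialised once and each page view only filters the stored rows. This is only an analogy for InterMine's precomputed tables, which, as far as we know, are materialised inside the database; `TemplatePrecompute`, the shape key `"Gene->Exon"` and the rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Sketch: precompute the expensive part of a template once, then answer
// each request by filtering the stored rows on the editable value.
final class TemplatePrecompute {
    // Hypothetical stored results, keyed by query shape.
    private final Map<String, List<String[]>> precomputed = new HashMap<>();
    private int expensiveRuns = 0;

    void precompute(String shape, Supplier<List<String[]>> expensiveQuery) {
        expensiveRuns++;
        precomputed.put(shape, expensiveQuery.get());
    }

    // The caller asks for shape + value and does not need to know whether
    // stored rows or a fresh run of the expensive query answered it.
    List<String[]> run(String shape, String value, Supplier<List<String[]>> expensiveQuery) {
        List<String[]> rows = precomputed.get(shape);
        if (rows == null) {
            expensiveRuns++;
            rows = expensiveQuery.get();
        }
        List<String[]> hits = new ArrayList<>();
        for (String[] row : rows) {
            if (row[0].equals(value)) {
                hits.add(row);
            }
        }
        return hits;
    }

    int expensiveRuns() {
        return expensiveRuns;
    }

    public static void main(String[] args) {
        // Toy (gene, exon) rows standing in for a multi-class join.
        Supplier<List<String[]>> join = () -> List.of(
                new String[] {"xfile", "exon1"},
                new String[] {"xfile", "exon2"},
                new String[] {"other", "exon9"});
        TemplatePrecompute t = new TemplatePrecompute();
        t.precompute("Gene->Exon", join);
        for (String gene : List.of("xfile", "other")) {
            System.out.println(gene + ": " + t.run("Gene->Exon", gene, join).size() + " row(s)");
        }
        System.out.println("expensive joins run: " + t.expensiveRuns());
    }
}
```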
 
====Acknowledgements====
 
* Richard Smith
* Kim Rutherford
* Matthew Wakeling
* Xavier Watkins
* Julie Sullivan
* Rachel Lyne
* Hilde Janssens
* François Guillier
* Philip North
* Tom Riley 
* Peter Mclaren
* Mark Woodbridge
* Debashis Rana
* Wenyan Ji 
* Markus Brosch
* Florian Reising
* Andrew Varley
* Gos Micklem
 
InterMine/FlyMine are funded by the Wellcome Trust (grant no. 067205),
awarded to M. Ashburner, G. Micklem, S. Russell, K. Lilley
and K. Mizuguchi.
 
[[Category:InterMine]]
