Load GenBank into Chado

Revision as of 18:08, 15 April 2007 by Bosborne (Talk | contribs)

Abstract

This HOWTO describes how to load GenBank format files into Chado.

Authors


Copyright

This document is copyright Don Gilbert, 2007. For reproduction other than personal use, please contact <cain@cshl.edu>.


Revision History

Revision 1.0 2007-04-16 BIO First version



Creating GFF3 from GenBank Files

GFF3 can be generated from GenBank files using a script provided by Bioperl, scripts/Bio-DB-GFF/genbank2gff3.pl, which is run as bp_genbank2gff3.pl (this script is currently preferred over the script of the same name found in the GMOD package). If your working directory contains a GenBank file, you could use it like this:

  >bp_genbank2gff3.pl --dir . --outdir .

A recent update (April 2007) to bp_genbank2gff3.pl and gmod_bulk_load_gff3.pl should solve the first two errors described under Possible Errors below. Another addition to bp_genbank2gff3.pl is the --noCDS option, which produces GFF gene models suited to loading into Chado.

  >bp_genbank2gff3.pl --noCDS --in mygenome.gbk
  >gmod_bulk_load_gff3.pl --database mygenome --gff mygenome.gbk.gff
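The two commands above could be wrapped in a small shell function so that the load only runs if the conversion succeeds. This is a minimal sketch, not part of the GMOD or Bioperl distributions: the names run, load_genbank and DRY_RUN are hypothetical, and the output file name (input name plus ".gff") is assumed from the example above.

```shell
# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: convert a GenBank file to GFF3, then bulk load it.
# $1 = Chado database name, $2 = GenBank file.
# Assumes the converter writes "<input>.gff", as in the example above.
load_genbank() {
    db=$1
    gbk=$2
    run bp_genbank2gff3.pl --noCDS --in "$gbk" || return 1
    run gmod_bulk_load_gff3.pl --database "$db" --gff "$gbk.gff"
}
```

Setting DRY_RUN=1 before calling load_genbank mygenome mygenome.gbk prints both commands without executing them, which is a cheap way to check the arguments first.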


Possible Errors

This method for generating GFF3 files is not completely satisfactory and development is ongoing to provide better translation. However, by proceeding carefully you should be able to get it to produce GFF3 that can be loaded. Possible errors from running this script, and their fixes, are described below.

couldn't open /var/lib/gmod/conf directory for reading:No such file or directory

Make sure the environment variable GMOD_ROOT is set to the directory where GMOD was installed, for example:

 setenv GMOD_ROOT /usr/local/gmod/ # tcsh

or

 export GMOD_ROOT=/usr/local/gmod/ # bash
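Before rerunning the loader, it may help to confirm that the variable points somewhere sensible. The sketch below assumes the configuration directory named in the error message lives at $GMOD_ROOT/conf; the function name check_gmod_root is hypothetical.

```shell
# Hypothetical check: does the given directory (default: $GMOD_ROOT)
# contain a conf subdirectory? The conf location is an assumption.
check_gmod_root() {
    root=${1:-${GMOD_ROOT:-}}
    if [ -z "$root" ]; then
        echo "GMOD_ROOT is not set" >&2
        return 1
    fi
    if [ ! -d "$root/conf" ]; then
        echo "no conf directory under $root" >&2
        return 1
    fi
    echo "GMOD_ROOT looks usable: $root"
}
```

Calling check_gmod_root with no argument tests the current value of GMOD_ROOT; passing a path tests that path instead.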

Unable to find srcfeature <some feature> in the database

Solution: Edit the '##sequence-region' line, which is the second line of the GFF3 output. Changing it to '# sequence-region' is enough; alternatively, remove the line.
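The edit can also be done with sed rather than by hand. This is a minimal sketch with a hypothetical function name, fix_sequence_region. Note that it rewrites every '##sequence-region' line it finds, not only the second line of the file.

```shell
# Hypothetical filter: turn each "##sequence-region" directive into an
# ordinary "# sequence-region" comment. Reads stdin, writes stdout.
fix_sequence_region() {
    sed 's/^##sequence-region/# sequence-region/'
}
```

For example, fix_sequence_region < mygenome.gbk.gff > mygenome.fixed.gff writes a corrected copy and leaves the original file untouched (the output file name here is only an illustration).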


Your GFF3 file uses a tag called <term>, but this term is not already in the cvterm and dbxref tables so that its value can be inserted into the featureprop table

Solution: This error message is followed by SQL statements that insert the term correctly. Execute them against your Chado database. One explanation for this error is that the source sequence was curated, but not with terms from the Sequence Ontology.


DBD::Pg::db pg_endcopy failed: ERROR: duplicate key violates unique constraint "featureprop_c1"
CONTEXT: COPY featureprop, line ...

Solution: The CONTEXT line identifies the offending data. This error probably means that two features in the GFF file share the same name or ID and the same feature type. Correct these errors by hand and reload.
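To locate candidate duplicates before editing, a small awk scan of the GFF3 file may help. The sketch below is an assumption-laden heuristic, not the loader's own check: it looks only at the ID attribute in column 9 and the feature type in column 3, and the function name find_duplicate_features is hypothetical. GFF3 allows a single feature to span several lines with the same ID, so treat the output as a list of places to look rather than a list of errors.

```shell
# Hypothetical scan: print "count<TAB>type<TAB>ID" for every
# (feature type, ID) pair that appears on more than one GFF3 line.
find_duplicate_features() {
    awk -F'\t' '
        /^#/ || NF < 9 { next }              # skip directives, comments, short lines
        {
            id = ""
            n = split($9, attrs, ";")        # column 9 holds tag=value pairs
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/) { id = substr(attrs[i], 4) }
            }
            if (id != "") { count[$3 "\t" id]++ }
        }
        END {
            for (key in count) {
                if (count[key] > 1) { print count[key] "\t" key }
            }
        }
    ' "$1" | sort
}
```

Running find_duplicate_features mygenome.gbk.gff lists each repeated pair with the number of lines that carry it.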