ESTminer CHADO adapter development progress
From: A. Gingle
Center for Applied Genetic Technologies
Date: 2/6/2006
We are making progress on a GMOD/CHADO-compatible version of ESTminer, our tool for mining EST contig and cluster data. For testing and development, we generated an Oracle-compatible version of the relevant CHADO modules (cv, general, organism and sequence) and developed a controlled vocabulary (CHADO_CVTERM_DATA.html). The vocabulary and sorghum EST data have been loaded into our Oracle version of the CHADO schema (ESTminer_CHADO_schema.png).

To arrive at a design strategy that achieves both CHADO compatibility and optimal interface performance, we benchmarked typical queries that retrieve EST contig and cluster data. Queries against our more compact and strongly typed CSGR schema (ESTminer_schema.png) execute between 3 and 5 times faster than the same queries against CHADO. In light of this, we plan to develop a CHADO adapter: a set of scripts that query the CHADO schema and generate more compact, strongly typed tables or materialized views, to be used strictly for high-performance data presentation.

Future steps will include development of an EST data loader for the native PostgreSQL schema and an ESTminer-CHADO adapter. Our approach was presented at a previous GMOD meeting (Gingle and Huang, 2004), and the presentation is available for download (ESTminerChadoadaptor.ppt).
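As a rough illustration of the adapter idea (scripts that read the generic CHADO tables and write a compact, typed presentation table), the following Python sketch may help. It is not the ESTminer implementation. It uses an in-memory SQLite database in place of Oracle or PostgreSQL, a heavily reduced subset of the CHADO `cvterm`, `feature` and `feature_relationship` tables, and assumed term names (`EST`, `contig`, `part_of`) that may differ from the controlled vocabulary actually loaded. The output table `contig_est` and both function names are hypothetical.

```python
import sqlite3


def build_demo_chado(conn):
    """Create a reduced, CHADO-like schema with a few demo rows.

    Only a handful of columns from the real CHADO tables are kept;
    the term names are assumptions made for this sketch.
    """
    conn.executescript("""
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            uniquename TEXT NOT NULL,
            seqlen     INTEGER,
            type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
        );
        CREATE TABLE feature_relationship (
            feature_relationship_id INTEGER PRIMARY KEY,
            subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
            object_id  INTEGER NOT NULL REFERENCES feature(feature_id),
            type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
        );
    """)
    conn.executemany(
        "INSERT INTO cvterm VALUES (?, ?)",
        [(1, "EST"), (2, "contig"), (3, "part_of")],
    )
    conn.executemany(
        "INSERT INTO feature VALUES (?, ?, ?, ?)",
        [
            (10, "contig_A", 900, 2),
            (11, "est_1", 450, 1),
            (12, "est_2", 520, 1),
            (13, "est_3", 610, 1),  # an EST that belongs to no contig
        ],
    )
    conn.executemany(
        "INSERT INTO feature_relationship VALUES (?, ?, ?, ?)",
        [(100, 11, 10, 3), (101, 12, 10, 3)],
    )


def build_compact_table(conn):
    """Flatten the generic tables into one typed contig/EST table."""
    conn.executescript("""
        DROP TABLE IF EXISTS contig_est;
        CREATE TABLE contig_est (
            contig_name TEXT    NOT NULL,
            est_name    TEXT    NOT NULL,
            est_length  INTEGER NOT NULL
        );
        INSERT INTO contig_est (contig_name, est_name, est_length)
        SELECT c.uniquename, e.uniquename, e.seqlen
        FROM feature_relationship fr
        JOIN feature e ON e.feature_id = fr.subject_id
        JOIN feature c ON c.feature_id = fr.object_id
        JOIN cvterm et ON et.cvterm_id = e.type_id  AND et.name = 'EST'
        JOIN cvterm ct ON ct.cvterm_id = c.type_id  AND ct.name = 'contig'
        JOIN cvterm rt ON rt.cvterm_id = fr.type_id AND rt.name = 'part_of';
        CREATE INDEX contig_est_contig_idx ON contig_est (contig_name);
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_demo_chado(conn)
    build_compact_table(conn)
    # The presentation layer now needs a single-table lookup, no joins.
    for row in conn.execute(
        "SELECT est_name, est_length FROM contig_est "
        "WHERE contig_name = ? ORDER BY est_name",
        ("contig_A",),
    ):
        print(row)
```

In this sketch the five-way join over the generic tables runs once, when the compact table is built, so that interface queries reduce to an indexed single-table lookup. On a database with materialized view support, the same SELECT could presumably back a materialized view instead of a regenerated table.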