wellcoveted.com

Capturing the Data and Making It Useful

 
Author: Aaron Hall
 

Redesigning GDB and GSDB

The explosive growth of information and the challenges of acquiring, representing, and providing access to data pose new and monumental tasks for the large public databases. Ken Fasman [Genome Database (GDB)] and Gifford Keen [Genome Sequence Data Base (GSDB)] discussed the restructuring of GDB and GSDB to handle the flood of data and make it useful for downstream biology.

GDB

Observing that one can't scroll or BLAST through 3 billion base pairs in a meaningful way, Fasman defined GDB's future role as the coordination site for the complete electronic description of the human genome. The map, he asserted, provides an ideal framework for jumping into the sequence (http://www.gdb.org/).

Fasman described the extensive changes made to GDB over the last 2 years that have culminated in the enhanced representation of genomic maps and gene information in GDB V6.0, which was released early this year [HGN 7(3-4), 13-14 and 7(5), 15].

The redesigned database schema and front-end interfaces now provide true graphical representation of genetic and physical maps; direct community editing and curation, including third-party annotation; and an improved model for gene information that includes links to databases describing function, structure, products, expression, and associated phenotypes. A user can create a link from any GDB object to any other entity on the Internet. GDB plans to become the focal point for accessing information about the human genome.
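
The idea of linking any database object to any Internet resource can be pictured with a small Python sketch. This is only an illustration: the class names, fields, accession, and example.org URLs are hypothetical and are not taken from GDB's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalLink:
    """A pointer from a database object to any resource on the Internet."""
    url: str
    description: str


@dataclass
class GeneRecord:
    """Hypothetical stand-in for a gene object; not GDB's real schema."""
    accession: str
    symbol: str
    links: list[ExternalLink] = field(default_factory=list)

    def add_link(self, url: str, description: str) -> None:
        """Attach a link to an external resource."""
        self.links.append(ExternalLink(url, description))

    def links_about(self, keyword: str) -> list[ExternalLink]:
        """Return links whose description mentions the keyword."""
        return [link for link in self.links
                if keyword.lower() in link.description.lower()]


# Placeholder accession, symbol, and URLs, for illustration only.
gene = GeneRecord(accession="GDB:0000001", symbol="EXAMPLE1")
gene.add_link("http://example.org/function/EXAMPLE1", "Function summary")
gene.add_link("http://example.org/expression/EXAMPLE1", "Expression data")
```

Because the link is just a URL plus a description, the target can be a record in another database, a publication, or a laboratory's own page.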

Under the Hood

New technologies used in developing V6.0 include an object-oriented data model, object broker, data-driven WWW interface, and graphical interfaces for the most popular computer platforms. The new GDB architecture depends heavily on the Object-Protocol Model (OPM) developed by Victor Markowitz and colleagues at LBNL (see "GDB-LBNL"). The GDB V6.0 data representation is captured in a schema file that drives all other pieces of software. This new architecture will enable GDB to adapt more quickly to changes in biological knowledge and in the representation of maps, genes, and other structures.
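
A rough sense of what "a schema file that drives all other pieces of software" can mean is given by the following Python sketch. It assumes a toy schema held in a dictionary; the type names, field names, and helper functions are hypothetical and do not reproduce OPM or the real GDB schema. Both the table definitions and the object classes are generated from the same description, so a change to the schema propagates to both.

```python
# Hypothetical schema: object types mapped to field names and SQL types.
SCHEMA = {
    "Gene": {"accession": "TEXT", "symbol": "TEXT"},
    "Map": {"accession": "TEXT", "chromosome": "TEXT", "length_bp": "INTEGER"},
}


def create_table_sql(type_name, fields):
    """Generate a CREATE TABLE statement from one schema entry."""
    columns = ", ".join(f"{name} {sql_type}" for name, sql_type in fields.items())
    return f"CREATE TABLE {type_name} ({columns})"


def make_class(type_name, fields):
    """Generate a Python class whose attributes follow the schema entry."""
    def __init__(self, **values):
        unknown = set(values) - set(fields)
        if unknown:
            raise TypeError(f"unknown fields for {type_name}: {sorted(unknown)}")
        for name in fields:
            # Fields that were not supplied default to None.
            setattr(self, name, values.get(name))
    return type(type_name, (object,), {"__init__": __init__, "fields": tuple(fields)})


# Every downstream artifact is derived from SCHEMA alone.
classes = {name: make_class(name, fields) for name, fields in SCHEMA.items()}
ddl = [create_table_sql(name, fields) for name, fields in SCHEMA.items()]
```

Adding a field to an entry in SCHEMA changes the generated table and the generated class together, which is the kind of adaptability the paragraph above describes.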

At the heart of the system is a Sybase database server that communicates in SQL, the relational query language. Everything from that point forward deals in complex objects, rather than in the rows and tables of a relational database.
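
The step from relational rows to complex objects can be sketched as follows. This is a minimal illustration, not GDB's object broker: it uses Python's built-in sqlite3 module as a stand-in for the Sybase server, and the table layout, class names, and marker names are all assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Locus:
    name: str
    position_cm: float


@dataclass
class GeneticMap:
    map_id: int
    chromosome: str
    loci: list = field(default_factory=list)


def load_map(conn, map_id):
    """Assemble one GeneticMap object from rows in two related tables."""
    row = conn.execute(
        "SELECT id, chromosome FROM map WHERE id = ?", (map_id,)
    ).fetchone()
    if row is None:
        raise KeyError(map_id)
    genetic_map = GeneticMap(map_id=row[0], chromosome=row[1])
    rows = conn.execute(
        "SELECT name, position_cm FROM locus WHERE map_id = ? ORDER BY position_cm",
        (map_id,),
    )
    for name, position in rows:
        genetic_map.loci.append(Locus(name, position))
    return genetic_map


# In-memory database with hypothetical tables and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE map (id INTEGER PRIMARY KEY, chromosome TEXT);
CREATE TABLE locus (map_id INTEGER, name TEXT, position_cm REAL);
INSERT INTO map VALUES (1, '7');
INSERT INTO locus VALUES (1, 'MARKER_B', 12.5), (1, 'MARKER_A', 3.0);
""")
chr7_map = load_map(conn, 1)
```

Code that calls load_map works with a map and its ordered loci and never sees the underlying SQL.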

Goals

Future enhancements will include improved map editing, an integrated editing environment, improved polymorphism and mutation representation, and integration with the specialized GSDB Sequence Annotator and Mouse Genome Database interfaces. To tie GDB to the evolving sequence databases, an interface is being developed to represent gene structure maps (maps of introns, exons, and regulatory regions associated with genes).
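
A gene structure map of this kind can be thought of as an ordered list of features on a coordinate system. The Python sketch below is one possible representation under simple assumptions (1-based inclusive coordinates, non-overlapping exons); the class, function, and coordinates are hypothetical and do not describe the interface GDB is developing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    kind: str   # "exon", "intron", or "regulatory"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def introns_between(exons):
    """Derive introns as the stretches between consecutive exons."""
    ordered = sorted(exons, key=lambda f: f.start)
    introns = []
    for left, right in zip(ordered, ordered[1:]):
        if right.start > left.end + 1:
            introns.append(Feature("intron", left.end + 1, right.start - 1))
    return introns


# Made-up coordinates for a three-exon gene with an upstream regulatory region.
exons = [Feature("exon", 101, 250), Feature("exon", 401, 520), Feature("exon", 901, 1000)]
promoter = Feature("regulatory", 1, 100)
gene_structure_map = sorted(
    [promoter, *exons, *introns_between(exons)], key=lambda f: f.start
)
```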

GSDB

Keen identified data acquisition, representation, and access as major issues for sequence databases.

Capturing and Annotating Data

Data acquisition is a two-part challenge, he said. Vast quantities of sequence data will be captured with custom software for bulk-submission processes; future plans include database-to-database communication for downloading data directly from laboratories into GSDB. The more difficult task in data acquisition, he noted, is capturing the follow-on sequence annotation, which is usually published in print journals and subsequently "lost." This annotation will be crucial for studying gene expression, variation, and function. GSDB Annotator, a graphical browser and editor, is being developed to facilitate community annotation of the database. Researchers are also working to provide access to such common analysis algorithms as BLAST and GRAIL.

Data Representation: Building Whole Chromosomes

In addition to captured sequences and annotations, information needs to be generated about relationships between sequences. The data must be maintained in a form capable of supporting complex, ad hoc queries. In the near future, GSDB aims to model the human genome as 24 sequences, one for each chromosome. As data comes in, it will be aligned to the representative sequence, which initially will have many gaps. Keen likened GSDB to a community laboratory information-management system supporting what is essentially a multiyear, multilaboratory, multiorganism shotgun-assembly process. Feature accession numbers will enable separation of annotation from sequences.
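
The representative-sequence idea can be illustrated with a short Python sketch. It assumes that the alignment step has already produced an offset for each incoming fragment, uses "N" as a gap marker, and keeps annotation in a separate table keyed by a feature accession number. All names, accessions, and sequences are invented for the example and are not GSDB's actual design.

```python
GAP = "N"  # assumed marker for positions not yet sequenced


class RepresentativeSequence:
    """One sequence per chromosome, filled in as fragments arrive."""

    def __init__(self, chromosome, length):
        self.chromosome = chromosome
        self.bases = [GAP] * length

    def place(self, offset, fragment):
        """Write an already-aligned fragment at a 0-based offset."""
        if offset < 0 or offset + len(fragment) > len(self.bases):
            raise ValueError("fragment does not fit on the representative sequence")
        self.bases[offset:offset + len(fragment)] = list(fragment)

    def gaps(self):
        """Return half-open (start, end) intervals that are still unknown."""
        intervals, start = [], None
        for i, base in enumerate(self.bases):
            if base == GAP and start is None:
                start = i
            elif base != GAP and start is not None:
                intervals.append((start, i))
                start = None
        if start is not None:
            intervals.append((start, len(self.bases)))
        return intervals


# Annotation lives apart from the sequence, keyed by a feature accession number.
annotations = {
    "FEAT0001": {"chromosome": "21", "start": 4, "end": 8, "note": "hypothetical exon"},
}

chr21 = RepresentativeSequence("21", 20)
chr21.place(4, "ACGTAC")
chr21.place(14, "GGC")
```

Because an annotation refers to coordinates rather than to a stored copy of the bases, it can be added, corrected, or removed without touching the sequence itself.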

Data Access

Although GSDB has the tools and the structure (normalized and atomized data) to answer such robust queries as those about annotation relationships, problems with data quality and consistency prevent it from answering them well. GSDB is now mounting a major effort to develop software for rationalizing the data stream as it enters the database.
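
What "rationalizing the data stream" involves is not spelled out here. One plausible reading is that records are normalized and checked before they are stored. The Python sketch below illustrates that reading only; the field names, the rules, and the rationalize function are assumptions, not a description of GSDB's software.

```python
import re

# Assumed alphabet for nucleotide sequences in this example.
VALID_BASES = re.compile(r"^[ACGTN]+$")


def rationalize(record):
    """Normalize one incoming record and report consistency problems.

    Returns (cleaned_record, problems); an empty problems list means the
    record passed every check in this sketch.
    """
    problems = []
    cleaned = {
        "accession": record.get("accession", "").strip().upper(),
        # Collapse whitespace and use a single capitalization style.
        "organism": " ".join(record.get("organism", "").split()).capitalize(),
        # Remove whitespace and line breaks from the sequence text.
        "sequence": re.sub(r"\s+", "", record.get("sequence", "")).upper(),
    }
    if not cleaned["accession"]:
        problems.append("missing accession")
    if not VALID_BASES.match(cleaned["sequence"]):
        problems.append("sequence is empty or contains non-ACGTN characters")
    return cleaned, problems


good, good_problems = rationalize(
    {"accession": " ab123 ", "organism": "homo  SAPIENS", "sequence": "acg t\nn"}
)
bad, bad_problems = rationalize({"accession": "", "sequence": "ACGX"})
```

Records that come back with a non-empty problems list could be held for curation instead of being loaded, so that later queries run against consistent data.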

GSDB has also developed an object-oriented access library that sits on top of the database. Almost all GSDB applications, as well as the software that imports data from other databases, work through this object layer. GSDB will make the object libraries and an application programming interface available to the public. Programmatic access will be through assigned accounts, and the database can be accessed either through the object libraries or directly at the table, row, and column level.

Availability

The new GSDB schema is complete and should be operational later this year. After fairly extensive alpha and beta testing, GSDB Annotator should be released at the same time on Mac and Sun, with Windows to follow. Software will be available via FTP from NCGR's Web site.

Copyright © 2008 www.wellcoveted.com All Rights Reserved.