2) The HUGEMAP database: contents
The SE/IDB system was designed to store and facilitate access to the data on the
human genome produced at Généthon and at the Human Polymorphism Study Center
(CEPH). Généthon's large scale approach to physical mapping, genetic mapping and
cDNA sequencing required an effective database system, and no existing system was
judged satisfactory.
An integrated database of the human genome, HUGEMAP, was created using IDB. It
includes all of Généthon's and CEPH's physical mapping data (clone sizes and
fingerprints, Alu-PCR mediated hybridization results, STS screenings, ...), an
integrated map of the human genome, part of Généthon's genetic mapping data and a
cytogenetic description of the human genome (ISCN 850).
The scale of the human genome project has required enlarging this database. We are
currently integrating external physical mapping data and extending the meta-schema
to include genetic data and, in particular, Genbase (the CEPH database containing the
data of the collaborative world-wide research on the genetic map). We are also
investigating the integration of cDNA production and screening results, cytogenetic
translocation data, and sequence data. We plan to write a translator that
generates an IDB meta-schema from an ASN.1 description, which would allow us to
import GenBank data into an IDB database using the NCBI software development toolkit.
3) The HUGEMAP database: client programs
Several clients of HUGEMAP have been written to assist us in building physical
maps and exploiting the physical and genetic maps:
clone-, STS- or chromosome-oriented queries: present all the available
information on the specified clone, STS or chromosome (size, fingerprints, STS
screening results, Alu-PCR mediated clone-to-clone hybridization,
chromosomal assignment, FISH results, genetic map, ...).
clone overlap likelihood computations: calculates the most likely overlap of
two clones from their restriction fingerprints.
contig assembly: uses STS content, overlap likelihoods and Alu-PCR
mediated hybridizations to look for the connected parts of the map.
clone ordering: a first program finds the shortest clone paths between two
starting points in the genome (basically two adjacent STSs from the genetic
map), by performing a breadth-first search in the graph of clones (where
clones are linked based on their STS content, overlap likelihood or mutual
hybridizations). A second program uses a genetic algorithm to optimize the
map construction (clone and STS positioning), taking into account all available
information (genetic and physical mapping data).
a map viewer: provides a graphical representation of the integrated physical
and genetic maps. Using the viewer, you can select objects with the mouse and
apply other programs to this selection. We are implementing a front-end with
a workbench containing database objects and directories of objects, on which
different filters and programs can be run. This will offer a unified
view of the HUGEMAP facilities.
data servers: a mail-server and a WWW server can be used to query the
HUGEMAP database through some of its clients.
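
As an illustration of the graph view that underlies the contig assembly and clone
ordering clients described above, the following Python sketch finds the connected
parts of a clone graph and a shortest clone path by breadth-first search. It is
not the HUGEMAP implementation: the clone names, the list of links and the
function names are hypothetical, and a link here stands for any evidence that two
clones overlap (shared STS content, overlap likelihood or mutual hybridization),
without weighting the kinds of evidence.

```python
from collections import deque


def build_clone_graph(links):
    """Build an undirected adjacency map from (clone_a, clone_b) pairs."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def connected_components(graph):
    """Return the connected parts of the graph (candidate contigs)."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            clone = queue.popleft()
            members.append(clone)
            for neighbour in graph[clone]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(members))
    return components


def shortest_clone_path(graph, sources, targets):
    """Breadth-first search from any source clone to any target clone.

    Returns a path containing the fewest clones, or None if no path exists.
    """
    targets = set(targets)
    # parent doubles as the visited set; sources have no predecessor.
    parent = {s: None for s in sources if s in graph}
    queue = deque(sorted(parent))
    while queue:
        clone = queue.popleft()
        if clone in targets:
            path = []
            while clone is not None:
                path.append(clone)
                clone = parent[clone]
            return path[::-1]
        for neighbour in sorted(graph[clone]):
            if neighbour not in parent:
                parent[neighbour] = clone
                queue.append(neighbour)
    return None


# Hypothetical data: y1 carries the first STS, y4 the adjacent one.
links = [("y1", "y2"), ("y2", "y3"), ("y3", "y4"),
         ("y1", "y5"), ("y5", "y4"), ("y6", "y7")]
graph = build_clone_graph(links)
contigs = connected_components(graph)
path = shortest_clone_path(graph, ["y1"], ["y4"])
```

In this toy example the search is seeded with the clones that contain one STS and
stops at the first clone that contains the adjacent STS, so the path returned
uses the smallest number of clones. A real client would also have to rank paths
of equal length and take the strength of each link into account.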
We are working to simplify data import, so that users can add their own data
to the existing database.
We think that the HUGEMAP database, together with its clients and a triple interface
(C functions API, interface to an interpreted language, graphical interface), is a useful
tool for the human genome project. It can also be a basis for collaborative research
and a starting point towards a new family of molecular biology databases.