2) The HUGEMAP database: contents
The SE/IDB system was designed to store and facilitate access to the human genome data produced at Généthon and at the Human Polymorphism Study Center (CEPH). Généthon's large-scale approach to physical mapping, genetic mapping and cDNA sequencing required an effective database system, and no existing system was judged satisfactory. An integrated database of the human genome, HUGEMAP, was therefore created using IDB. It includes all of Généthon's and CEPH's physical mapping data (clone sizes and fingerprints, Alu-PCR mediated hybridization results, STS screenings, ...), an integrated map of the human genome, part of Généthon's genetic mapping data and a cytogenetic description of the human genome (ISCN 850). The scale of the human genome project has required enlarging this database. We are currently integrating external physical mapping data and extending the meta-schema to include genetic data and, in particular, Genbase (the CEPH database containing the data of the collaborative world-wide research on the genetic map). We are also investigating the integration of cDNA production and screening results, cytogenetic translocation data, and sequence data. For sequence data, we plan to write a translator that generates an IDB meta-schema from a description in ASN.1, which will allow us to import the GenBank data into an IDB database using the NCBI software development toolkit.
3) The HUGEMAP database: client programs
Several clients of HUGEMAP have been written to assist us in building physical maps and exploiting the physical and genetic maps:
- Clone, STS or chromosome-oriented queries: present all the available information on the specified clone, STS or chromosome (size, fingerprints, STS screening results, Alu-PCR mediated clone-to-clone hybridization, chromosomal assignment, FISH results, genetic map, ...).
- Clone overlap likelihood computations: calculate the most likely overlap of two clones from their restriction fingerprints.
- Contig assembly: uses STS content, overlap likelihoods and Alu-PCR mediated hybridizations to look for the connected parts of the map.
- Clone ordering: a first program finds the shortest clone paths between two starting points in the genome (basically two adjacent STSs from the genetic map), by performing a breadth-first search in the graph of clones (where clones are linked based on their STS content, overlap likelihood or mutual hybridizations). A second program uses a genetic algorithm to optimize the map construction (clone and STS positioning), taking into account all available information (genetic and physical mapping data).
- A map viewer: provides a graphical representation of the integrated physical and genetic maps. Using the viewer, you can select objects with the mouse and apply other programs to this selection. We are implementing a front-end with a workbench containing database objects and directories of objects, on which different filters and programs can be run. This will offer a unified view of the HUGEMAP facilities.
- Data servers: a mail server and a WWW server can be used to query the HUGEMAP database through some of its clients.
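The graph-based steps described for contig assembly and clone ordering can be illustrated with a minimal sketch. It is not the HUGEMAP implementation: the function names, the clone names and the list of links are all hypothetical, and every link is treated as equally reliable, whereas the real clients derive links from STS content, overlap likelihoods and hybridizations. The search also starts and ends at clones rather than at the STSs they contain, which is a simplification.

```python
from collections import deque


def build_clone_graph(links):
    """Build an undirected adjacency map from (clone_a, clone_b) pairs."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_clone_path(graph, start, goal):
    """Breadth-first search: return one path with the fewest clones, or None."""
    if start not in graph or goal not in graph:
        return None
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        clone = queue.popleft()
        if clone == goal:
            path = []
            while clone is not None:  # walk back to the start
                path.append(clone)
                clone = parent[clone]
            return path[::-1]
        for neighbour in sorted(graph[clone]):
            if neighbour not in parent:
                parent[neighbour] = clone
                queue.append(neighbour)
    return None


def connected_parts(graph):
    """Group clones into connected components (candidate contigs)."""
    seen = set()
    parts = []
    for clone in sorted(graph):
        if clone in seen:
            continue
        seen.add(clone)
        component = []
        queue = deque([clone])
        while queue:
            current = queue.popleft()
            component.append(current)
            for neighbour in sorted(graph[current]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        parts.append(sorted(component))
    return parts


# Hypothetical clone-to-clone links (assumed data, for illustration only).
links = [("Y1", "Y2"), ("Y2", "Y3"), ("Y3", "Y4"),
         ("Y1", "Y5"), ("Y5", "Y4"), ("Y6", "Y7")]
graph = build_clone_graph(links)
path = shortest_clone_path(graph, "Y1", "Y4")
parts = connected_parts(graph)
print(path)
print(parts)
```

Breadth-first search suits this step because, when links are unweighted, the first time the goal clone is reached the path found necessarily uses the fewest clones. Weighting links by overlap likelihood would call for a different search, which this sketch does not attempt.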
We are working on making data import easier, so that users can add their own data to the existing database. We think that the HUGEMAP database, together with its clients and a triple interface (C functions API, interface to an interpreted language, graphical interface), is a useful tool for the human genome project. It can also be a basis for collaborative research and a starting point towards a new family of molecular biology databases.