We aim to demonstrate the power of parallel calculation technology by efficiently solving computational biology problems. For this purpose, we have developed the PAPIA (PArallel Protein Information Analysis) system.
The PAPIA system consists of three major elements: the PAPIA cluster, the PAPIA library, and the PAPIA applications.
RWCP Tsukuba Research Center has been building workstation clusters and PC clusters since 1995 as environments for the research and development of parallel operating systems and parallel programming languages. This series of clusters was designed and implemented by Tezuka, Hori, and Ishikawa of the Parallel and Distributed System Software TRC Lab., RWCP.
The RWC PC Cluster II, developed in October 1997, is a highly efficient parallel computer with powerful network connections. The PAPIA system was first implemented on the RWC PC Cluster II and was demonstrated at the SC97 conference exhibition. In February 1998, we then built a new application-dedicated cluster, the RWC PC Cluster IIa (the 'a' stands for application), also called the "PAPIA cluster".
The PAPIA Cluster (RWC PC Cluster IIa) is a 64-node PC cluster based on industrial PCs connected by Myricom Myrinet. Each node consists of a 200MHz Intel Pentium Pro microprocessor, 256MB of memory, a 4.1GB hard disk, and a Myrinet gigabit network interface.
The NetBSD operating system runs on each node, and the SCore-D global operating system runs on top of NetBSD. All of the user's parallel processes are created and gang-scheduled by SCore-D. The MPC++ parallel programming language and the MPI (MPICH-PM) message passing library are available for efficient parallel programming.
For more details, see "Clustering Technologies" by the Parallel and Distributed System Software TRC Lab., RWCP.
PAPIA Cluster Specifications
|Number of Nodes||64 cell nodes + 2 internal monitor nodes|
|Processor||Intel Pentium Pro microprocessor (200MHz, 8KB L1 cache, 512KB L2 cache)|
|Memory||256MB EDO DRAM (with ECC) / node|
|Hard Disk||4.1GB EIDE Disk / node|
|Network Hardware||Myricom Myrinet (2.56Gbit/sec), 100Base-T Ethernet|
|Network Driver||PM (developed by RWCP)|
|Local Operating System||NetBSD 1.2|
|Global Operating System||SCore-D (developed by RWCP)|
|Programming Language||MPC++ (Parallel C++ developed by RWCP), C, C++|
|Size||W 80cm x D 80cm x H 160cm x 2 Chassis|
In order to rapidly and efficiently develop application software for protein analysis, it is important to develop a library of commonly used program modules. However, a common library for protein research did not exist. One reason might be the highly complex semantics and notoriously ill-defined format of the Protein Data Bank (PDB), the principal database of protein tertiary structures.
Onizuka et al. have developed a C++ class library, the "PAPIA library", for protein information analysis. In the PAPIA library, protein structures are described hierarchically with clear class definitions. Commonly required calculations, such as PDB parsing, geometric rotation, structure matching, sequence alignment, and multivariate analysis, are defined as methods of the corresponding classes. We use the PAPIA library for our original academic research in parallel protein information analysis.
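To make the idea of a hierarchical class description concrete, the following is a minimal sketch of how a protein might be modeled as chains, residues, and atoms, with PDB parsing and geometric rotation as methods. All type and method names here (`Protein`, `Chain`, `Residue`, `Atom`, `addPdbLine`, `rotateZ`) are hypothetical and chosen for illustration only; they are not the PAPIA library's actual interface. The column positions are those of the fixed-width PDB `ATOM` record.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration; not the PAPIA library's actual classes.
struct Atom {
    std::string name;   // atom name, e.g. "CA"
    double x, y, z;     // orthogonal coordinates in angstroms
};

struct Residue {
    std::string name;   // three-letter residue name, e.g. "ALA"
    int seq;            // residue sequence number
    std::vector<Atom> atoms;
};

struct Chain {
    char id;            // chain identifier, e.g. 'A'
    std::vector<Residue> residues;
};

class Protein {
public:
    // Parses one PDB line; returns false for non-ATOM or too-short lines.
    bool addPdbLine(const std::string& line);
    // Rotates every atom about the z axis by the given angle.
    void rotateZ(double radians);
    std::size_t atomCount() const;
    const std::vector<Chain>& chains() const { return chains_; }
private:
    std::vector<Chain> chains_;
};

// Strips leading and trailing spaces from a fixed-width field.
static std::string trim(const std::string& s) {
    const std::size_t b = s.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

bool Protein::addPdbLine(const std::string& line) {
    // Coordinates end at column 54, so shorter lines cannot be ATOM records.
    if (line.size() < 54 || line.compare(0, 6, "ATOM  ") != 0) return false;
    Atom atom{trim(line.substr(12, 4)),          // columns 13-16: atom name
              std::stod(line.substr(30, 8)),     // columns 31-38: x
              std::stod(line.substr(38, 8)),     // columns 39-46: y
              std::stod(line.substr(46, 8))};    // columns 47-54: z
    const std::string resName = trim(line.substr(17, 3));  // columns 18-20
    const char chainId = line[21];                          // column 22
    const int resSeq = std::stoi(line.substr(22, 4));       // columns 23-26

    // Open a new chain or residue whenever the identifier changes.
    if (chains_.empty() || chains_.back().id != chainId)
        chains_.push_back(Chain{chainId, {}});
    std::vector<Residue>& residues = chains_.back().residues;
    if (residues.empty() || residues.back().seq != resSeq ||
        residues.back().name != resName)
        residues.push_back(Residue{resName, resSeq, {}});
    residues.back().atoms.push_back(atom);
    return true;
}

void Protein::rotateZ(double radians) {
    const double c = std::cos(radians), s = std::sin(radians);
    for (auto& chain : chains_)
        for (auto& res : chain.residues)
            for (auto& a : res.atoms) {
                const double x = a.x * c - a.y * s;
                const double y = a.x * s + a.y * c;
                a.x = x;
                a.y = y;
            }
}

std::size_t Protein::atomCount() const {
    std::size_t n = 0;
    for (const auto& chain : chains_)
        for (const auto& res : chain.residues) n += res.atoms.size();
    return n;
}
```

Feeding each line of a PDB file to `addPdbLine` builds the hierarchy. This sketch ignores `HETATM` records, alternate locations, and insertion codes, and a malformed numeric field would make `std::stod` or `std::stoi` throw, all of which a real parser would need to handle.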
Using the PAPIA library, we have also been building an assorted collection of efficient parallel programs, "PAPIA applications", which can be used for typical protein analysis. We have ported this collection of practical applications to the PAPIA cluster.
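As an illustration of how such a program might divide its work over the cluster nodes, the sketch below computes a static block partition: each node receives a contiguous range of items (for example, database entries) of nearly equal size. This is an assumed scheme for illustration, not necessarily the one used by the PAPIA applications, and the function name `blockRange` is hypothetical.

```cpp
#include <cstddef>
#include <utility>

// Returns the half-open index range [begin, end) assigned to `rank` when
// `total` items are split as evenly as possible over `nodes` workers.
// Assumes nodes > 0 and rank < nodes.
std::pair<std::size_t, std::size_t> blockRange(std::size_t total,
                                               std::size_t nodes,
                                               std::size_t rank) {
    const std::size_t base = total / nodes;    // minimum items per node
    const std::size_t extra = total % nodes;   // first `extra` ranks get one more
    const std::size_t begin = rank * base + (rank < extra ? rank : extra);
    const std::size_t size = base + (rank < extra ? 1 : 0);
    return std::make_pair(begin, begin + size);
}
```

With 1000 items on 64 nodes, the first 40 nodes each take 16 items and the remaining 24 take 15. A static partition like this suits uniform work; when item costs vary widely, a dynamic scheme that hands out items on demand may balance better.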
The significant functions of the current PAPIA system are the following three parallel calculations.
We have also implemented a job submission mechanism that uses a WWW browser as the user interface for entering query sequences and several parameters. Through the WWW, any user can submit jobs from a remote site on the Internet. The mechanism consists of the following modules: a) HTML-based input forms as the user interface, b) CGI scripts for submitting and monitoring jobs on the PAPIA system, c) FIFO queues for each service, and d) Java and HTML-based graphic output forms as the user interface.
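The per-service FIFO queues in module c) can be sketched as a map from service name to a first-in, first-out queue of jobs. The `Job` fields, the `JobQueues` class, and the service names used in the usage note are hypothetical and serve only to illustrate the queueing discipline; they do not describe the actual PAPIA implementation.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <string>

// Hypothetical job record; the fields are illustrative only.
struct Job {
    int id;              // identifier handed back to the submitter
    std::string query;   // e.g. the submitted query sequence
};

class JobQueues {
public:
    // Appends a job to the named service's queue and returns its id.
    int submit(const std::string& service, const std::string& query) {
        const int id = nextId_++;
        queues_[service].push(Job{id, query});
        return id;
    }

    // Pops the oldest waiting job of a service into `out`.
    // Returns false if the service has no waiting job.
    bool next(const std::string& service, Job& out) {
        auto it = queues_.find(service);
        if (it == queues_.end() || it->second.empty()) return false;
        out = it->second.front();
        it->second.pop();
        return true;
    }

    // Number of jobs still waiting for a service.
    std::size_t waiting(const std::string& service) const {
        auto it = queues_.find(service);
        return it == queues_.end() ? 0 : it->second.size();
    }

private:
    std::map<std::string, std::queue<Job>> queues_;  // one FIFO per service
    int nextId_ = 1;
};
```

A submission script would call `submit("search", sequence)` and report the returned id, while a dispatcher would repeatedly call `next` for each service and start the job on the cluster. Keeping one queue per service means a long backlog in one service does not delay jobs submitted to another.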