Description
PDB-REPRDB : Representative protein chains from PDB


Method :

PDB-REPRDB is a reorganized database of protein chains from PDB. The protein chains are first sorted by the quality of their atomic coordinate data. The first chain in this order is taken as a representative and compared with every other chain. Chains similar to the representative in
  1. amino acid sequence or
  2. structure
are classified into the same group as the representative. The first chain among the rest (not yet classified) becomes the next representative.
Thus PDB-REPRDB supplies 'the list of the representative protein chains', which are mutually dissimilar in sequence and structure, and 'the list of protein chain groups'.
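
The grouping procedure described above can be sketched in a few lines of Python. This is only an illustration, not the PDB-REPRDB implementation: the function name select_representatives, the chain labels, and the toy similarity test are all assumptions made for the example. The single is_similar predicate stands in for the "sequence or structure" comparison.

```python
from typing import Callable, Dict, List, Sequence


def select_representatives(
    chains: Sequence[str],
    is_similar: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Greedy grouping. `chains` must already be sorted, best quality first.

    Returns a dict mapping each representative to the members of its group
    (the representative itself is the first member).
    """
    groups: Dict[str, List[str]] = {}
    remaining = list(chains)
    while remaining:
        rep = remaining.pop(0)  # first chain that is not yet classified
        groups[rep] = [rep]
        unclassified = []
        for chain in remaining:
            if is_similar(rep, chain):
                groups[rep].append(chain)   # joins the representative's group
            else:
                unclassified.append(chain)  # stays in the pool, order preserved
        remaining = unclassified
    return groups


def toy_similar(a: str, b: str) -> bool:
    # Hypothetical similarity test: labels sharing a first letter are "similar".
    return a[0] == b[0]


groups = select_representatives(["a1", "b1", "a2", "c1", "b2"], toy_similar)
representatives = list(groups)
print(representatives)
print(groups)
```

The keys of the returned dict correspond to the list of representative chains, and the values correspond to the list of chain groups.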

For more information:
Document of the PDB-REPRDB
About DSA system

References :

T. Noguchi, K. Onizuka, Y. Akiyama, and M. Saito:
"PDB-REPRDB: A Database of Representative Protein Chains in PDB (Protein Data Bank)".
Proc. of the Fifth International Conference on Intelligent Systems for Molecular Biology,
AAAI Press (1997).

Tamotsu Noguchi, Kentaro Onizuka, Makoto Ando, Hideo Matsuda and Yutaka Akiyama:
"Quick Selection of Representative Protein Chain Sets Based on Customizable Requirements",
Bioinformatics, Vol.16, No. 6, 520-526 (2000).

Tamotsu Noguchi, Hideo Matsuda and Yutaka Akiyama:
"PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB)",
Nucleic Acids Research, Vol.29, No.1, 219-220 (2001).

T. Noguchi and Y. Akiyama:
"PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003",
Nucleic Acids Research, Vol.31, No.1, 492-493 (2003).


How to use :

Page 1 : Eliminate and Sort Chains

  1. Select 'Apply constraints'.
    The factors in the first column of the table describe the quality of the atomic coordinates of protein chains in PDB.
    The elimination and sort options produce the data set that is classified into groups afterward.
    If you choose 'No' under 'Apply constraints' for a factor, all chains are used regardless of that factor. 'Yes' eliminates chains according to the threshold described below.
    The factors "MUTANT", "COMPLEX", "FRAGMENT", "NMR" and "Membrane Proteins" work as follows:
    • include MUTANT : If you choose 'Yes', mutant chains are used.
    • COMPLEX : You can choose 'only COMPLEX', 'exclude COMPLEX' or 'All' (both complex and non-complex chains).
    • FRAGMENT : You can choose 'only FRAGMENT', 'exclude FRAGMENT' or 'All' (both fragment and non-fragment chains).
    • include NMR : If you choose 'Yes', chains determined by NMR are used.
    • include Membrane Proteins : If you choose 'Yes', chains that include a membrane domain according to the SCOP database are used.
  2. Set 'threshold'.
    • resolution : eliminate chains whose value is greater than the threshold
    • r-factor : eliminate chains whose value is greater than the threshold
    • number of chain breaks : eliminate chains whose value is greater than the threshold
    • rate of non-standard amino acid residues : eliminate chains whose value is greater than the threshold
    • rate of residues with only CA coordinates : eliminate chains whose value is greater than the threshold
    • rate of residues with only backbone coordinates : eliminate chains whose value is greater than the threshold
    • number of residues : eliminate chains whose value is smaller than the threshold
  3. Set 'priority'.
    Independently of elimination, chains are sorted using the factors as sort keys.
    The factor given priority '1' is compared first. Later factors are compared only when all earlier factors are equal. Chains earlier in the sorted order have priority for selection as representatives.
  4. Push 'Reset this form' only if you want to reset the input form.
  5. Then push the 'Make List' button to extract and sort the chains.
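
The elimination and sorting steps on this page can be pictured with the following Python sketch. It is a simplified illustration under stated assumptions, not the server's code: the Chain record, its field names, the sample values and the chain labels are invented for the example, only three of the factors are modelled, and the sort direction (smaller resolution and r-factor first) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Chain:
    # Hypothetical record; field names are assumptions for this sketch.
    label: str
    resolution: float
    r_factor: float
    n_residues: int


def eliminate(chains: Sequence[Chain], max_resolution: float,
              max_r_factor: float, min_residues: int) -> List[Chain]:
    """Drop chains above the resolution or r-factor threshold,
    or below the number-of-residues threshold."""
    return [
        c for c in chains
        if c.resolution <= max_resolution
        and c.r_factor <= max_r_factor
        and c.n_residues >= min_residues
    ]


def sort_by_priority(chains: Sequence[Chain], keys: Sequence[str]) -> List[Chain]:
    """Sort by the named factors; keys[0] has priority 1.

    A later key only matters when all earlier keys are equal, which is
    exactly how Python compares tuples. Smaller values sort first
    (an assumption of this sketch).
    """
    return sorted(chains, key=lambda c: tuple(getattr(c, k) for k in keys))


# Made-up sample data.
chains = [
    Chain("c1", 2.0, 0.20, 150),
    Chain("c2", 1.5, 0.25, 90),
    Chain("c3", 2.0, 0.18, 30),    # too short: eliminated
    Chain("c4", 3.5, 0.22, 200),   # resolution too large: eliminated
    Chain("c5", 1.5, 0.19, 120),
]

kept = eliminate(chains, max_resolution=3.0, max_r_factor=0.3, min_residues=40)
ordered = sort_by_priority(kept, ["resolution", "r_factor"])
ordered_labels = [c.label for c in ordered]
print(ordered_labels)
```

Here c5 and c2 tie on resolution, so the second-priority key (r-factor) decides their order.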

Page 2 : Select representative chains

  1. See 'Service status' and check that the service is ON.
  2. Set the 'Parameters for classification'.
  3. Push 'Reset this form' only if you want to reset the input form.
  4. Push the 'Service status' button to confirm the service status for your query.
  5. Then push the 'Submit' button to submit your query to the server.

Page 3: Waiting for calculation

    While you wait for the results, the processing status and server information are displayed.
    The browser might time out. In that case, click the 'Reload' button of your browser.

Page 4: Result

    After the calculation ends successfully, the results are displayed.
    The data items are as follows.

  1. Hyperlink
    • PDB ID (and chain) : The group list to which the chain belongs
    • * (asterisk) : Three-dimensional structure. Viewing 3D graphics requires client software such as RasMol.
    • scop : SCOP data
    • EC number : Information on the ligand entry.
  2. To see more data
    Click the 'Full listing' button to see the EC number and Compound (based on PDB data).
  3. Download of the results
    Click the 'Result Download' button to save the results on your machine.
    The download file is compressed as "*.tar.gz" and includes the chain list and the chain group list.
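
A downloaded "*.tar.gz" file can be inspected with Python's standard tarfile module, as in the sketch below. The file names inside the real archive are not documented here, so the example builds a small stand-in archive with placeholder names (chain_list.txt, group_list.txt) and made-up contents; with a real download, pass its path to list_members and read_member instead.

```python
import io
import tarfile
import tempfile
from pathlib import Path
from typing import List


def list_members(archive_path: str) -> List[str]:
    """Return the names of the regular files inside a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def read_member(archive_path: str, name: str) -> str:
    """Return the text of one file inside the archive without unpacking it."""
    with tarfile.open(archive_path, "r:gz") as tar:
        handle = tar.extractfile(name)
        return handle.read().decode("utf-8")


def make_demo_archive(archive_path: str) -> None:
    # Stand-in for a real download; names and contents are placeholders.
    files = [
        ("chain_list.txt", "c1\nc2\n"),
        ("group_list.txt", "c1: c1 c3\nc2: c2\n"),
    ]
    with tarfile.open(archive_path, "w:gz") as tar:
        for name, text in files:
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


with tempfile.TemporaryDirectory() as tmp:
    demo_path = str(Path(tmp) / "result.tar.gz")
    make_demo_archive(demo_path)
    names = list_members(demo_path)
    first_text = read_member(demo_path, names[0])

print(names)
print(first_text)
```

Reading members directly avoids unpacking an archive from an external source onto disk before its contents have been checked.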