                           MATGEN3D documentation



CONTENTS

   1.0 SUMMARY
   2.0 INPUTS & OUTPUTS
   3.0 INPUT FILE FORMAT
   4.0 OUTPUT FILE FORMAT
   5.0 DATA FILES
   6.0 USAGE
   7.0 KNOWN BUGS & WARNINGS
   8.0 NOTES
   9.0 DESCRIPTION
   10.0 ALGORITHM
   11.0 RELATED APPLICATIONS
   12.0 DIAGNOSTIC ERROR MESSAGES
   13.0 AUTHORS
   14.0 REFERENCES

1.0 SUMMARY

   Generates a 3D-1D scoring matrix from CCF files (clean coordinate
   files).

2.0 INPUTS & OUTPUTS

   MATGEN3D reads a DCF file (domain classification file) and a directory
   of domain CCF files (clean coordinate files) that have been processed
   with PDBPLUS so that they contain solvent accessibility and secondary
   structure information. The directory must contain a CCF file for the
   first domain of each family represented in the DCF file. A matrix of
   3D:1D scores (environment:residue scoring matrix), reflecting the
   probability of each amino acid occurring in different tertiary
   environments, is calculated from the CCF files of the first domain of
   each family only. The path of the CCF files is specified by the user
   and the file extension is specified in the ACD file. Two log files of
   informative messages are also written (by default matgen3d.calc and
   matgen3d.log).

3.0 INPUT FILE FORMAT

   The format of domain CCF files is described in the DOMAINER
   documentation.
   The format of the DCF file (domain classification file) is described in
   the SCOPPARSE documentation and the CATHPARSE documentation.

  Input files for usage example

  File: ../scopparse-signature/all.scop

ID   D1CS4A_
XX
EN   1CS4
XX
TY   SCOP
XX
SI   53931 CL; 54861 FO; 55073 SF; 55074 FA; 55077 DO; 55078 SO; 39418 DD;
XX
CL   Alpha and beta proteins (a+b)
XX
FO   Ferredoxin-like
XX
SF   Adenylyl and guanylyl cyclase catalytic domain
XX
FA   Adenylyl and guanylyl cyclase catalytic domain
XX
DO   Adenylyl cyclase VC1, domain C1a
XX
OS   Dog (Canis familiaris)
XX
NC   1
XX
CN   [1]
XX
CH   A CHAIN; . START; . END;
//
ID   D1II7A_
XX
EN   1II7
XX
TY   SCOP
XX
SI   53931 CL; 56299 FO; 56300 SF; 64427 FA; 64428 DO; 64429 SO; 62415 DD;
XX
CL   Alpha and beta proteins (a+b)
XX
FO   Metallo-dependent phosphatases
XX
SF   Metallo-dependent phosphatases
XX
FA   DNA double-strand break repair nuclease
XX
DO   Mre11
XX
OS   Archaeon Pyrococcus furiosus
XX
NC   1
XX
CN   [1]
XX
CH   A CHAIN; . START; . END;
//

4.0 OUTPUT FILE FORMAT

   The matrix of 3D:1D scores (environment:residue scoring matrix, Figure
   1) follows the standard EMBOSS format. Single-letter amino acid codes
   are column labels and environments are row labels. The environment
   labels are strings of 2 characters, beginning with AA, AB, AC, through
   to AZ, BA, BB etc. The final row and column have the label '*' and give
   default substitution values (the minimum from the entire matrix). In
   the example shown (Figure 1), environments AA through AU are listed,
   but several of them (for example AA, AB and AC) contain only zero
   scores apart from the '*' column, owing to the sparse input data for
   this example (typically, all environments would receive scores).
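
   The two-character labelling scheme described above (AA, AB, ... AZ,
   BA, BB ...) behaves like a two-digit base-26 counter. The following is
   a minimal Python sketch of that scheme; the function name env_label is
   hypothetical and is not part of MATGEN3D, and the zero-based indexing
   is an assumption made for illustration.

```python
import string


def env_label(index: int) -> str:
    """Return the two-letter label for a zero-based environment index.

    Assumption: labels run AA, AB, ..., AZ, BA, BB, ... as described in
    the text, i.e. the second letter cycles fastest.
    """
    if not 0 <= index < 26 * 26:
        raise ValueError("index out of range for a two-letter label")
    first, second = divmod(index, 26)
    return string.ascii_uppercase[first] + string.ascii_uppercase[second]


if __name__ == "__main__":
    # Print the first 28 labels to show the roll-over from AZ to BA.
    print(" ".join(env_label(i) for i in range(28)))
```

   Under this assumption, index 0 corresponds to AA and index 26 to BA.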

  Output files for usage example

  File: matgen3d.calc


SUMiNijArr
A       0
B       0
C       0
D       10
E       0
F       4
G       5
H       0
I       5
J       6
K       0
L       6
M       12
N       0
O       6
P       14
Q       0
R       8
S       4
T       0
U       9

NiArr
A       10
B       0
C       3
D       6
E       11
F       7
G       4
H       3
I       7
J       0
K       4
L       11
M       1
N       4
O       0
P       1
Q       5
R       3
S       2
T       3
U       0
V       3
W       0
X       0
Y       1


  [Part of this file has been deleted for brevity]


B       0.000

C       0.034

D       0.067

E       0.124

F       0.079

G       0.045

H       0.034

I       0.079

J       0.000

K       0.045

L       0.124

M       0.011

N       0.045

O       0.000

P       0.011

Q       0.056

R       0.034

S       0.022

T       0.034

U       0.000

V       0.034

W       0.000

X       0.000

Y       0.011

Z       0.000
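
   The three-decimal values listed above appear to be the NiArr counts
   divided by their total (89 for this example). This relationship is
   inferred from the numbers shown rather than stated by the program, so
   the Python sketch below should be read as a consistency check, not as
   MATGEN3D's own code. The count for Z is not visible in the abridged
   listing and is assumed to be 0.

```python
# NiArr counts copied from the matgen3d.calc listing above.
# Assumption: Z is not shown in the abridged listing and is taken as 0.
ni_counts = {
    "A": 10, "B": 0, "C": 3, "D": 6, "E": 11, "F": 7, "G": 4,
    "H": 3, "I": 7, "J": 0, "K": 4, "L": 11, "M": 1, "N": 4,
    "O": 0, "P": 1, "Q": 5, "R": 3, "S": 2, "T": 3, "U": 0,
    "V": 3, "W": 0, "X": 0, "Y": 1, "Z": 0,
}

total = sum(ni_counts.values())

# Relative frequency of each letter code: count / total count.
frequencies = {code: count / total for code, count in ni_counts.items()}

if __name__ == "__main__":
    for code in ("C", "D", "E", "F"):
        print(f"{code}       {frequencies[code]:.3f}")
```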


  File: matgen3d.log

D1CS4A_

R:D-2 S:C A:63.80
        OEnv AO:D
R:I-3 S:T A:28.30
        OEnv AF:I
R:E-4 S:T A:113.00
        OEnv AU:E
R:G-5 S:T A:104.20
        OEnv AU:G
R:F-6 S:H A:23.70
        OEnv AD:F
R:T-7 S:H A:111.60
        OEnv AS:T
R:S-8 S:H A:84.60
        OEnv AP:S
R:L-9 S:H A:57.60
        OEnv AJ:L
R:A-10 S:H A:24.00
        OEnv AD:A
R:S-11 S:T A:78.20
        OEnv AR:S
R:Q-12 S:T A:78.00
        OEnv AR:Q
R:C-13 S:T A:28.80
        OEnv AF:C
R:T-14 S:C A:72.00
        OEnv AO:T
R:A-15 S:H A:103.00
        OEnv AS:A
R:Q-16 S:H A:88.10
        OEnv AP:Q
R:E-17 S:H A:66.30
        OEnv AM:E
R:L-18 S:H A:17.70
        OEnv AD:L
R:V-19 S:H A:79.50
        OEnv AP:V
R:M-20 S:H A:78.40
        OEnv AP:M
R:T-21 S:H A:74.90
        OEnv AM:T
R:L-22 S:H A:16.50
        OEnv AD:L
R:N-23 S:H A:94.00
        OEnv AS:N
R:E-24 S:H A:74.00
        OEnv AM:E
R:L-25 S:H A:40.60
        OEnv AG:L


  [Part of this file has been deleted for brevity]

        OEnv AD:F
R:A-26 S:H A:59.60
        OEnv AJ:A
R:E-27 S:H A:62.30
        OEnv AM:E
R:A-28 S:H A:76.30
        OEnv AP:A
R:F-29 S:H A:20.30
        OEnv AD:F
R:K-30 S:H A:64.60
        OEnv AM:K
R:N-31 S:H A:68.30
        OEnv AM:N
R:A-32 S:H A:71.80
        OEnv AM:A
R:L-33 S:H A:42.20
        OEnv AG:L
R:E-34 S:H A:56.50
        OEnv AJ:E
R:I-35 S:H A:70.20
        OEnv AM:I
R:A-36 S:H A:27.70
        OEnv AD:A
R:V-37 S:H A:89.80
        OEnv AP:V
R:Q-38 S:H A:80.20
        OEnv AP:Q
R:E-39 S:H A:73.50
        OEnv AM:E
R:N-40 S:C A:109.90
        OEnv AU:N
R:V-41 S:C A:44.50
        OEnv AI:V
R:D-42 S:C A:105.60
        OEnv AU:D
R:F-43 S:C A:83.50
        OEnv AR:F
R:I-44 S:C A:43.50
        OEnv AI:I
R:L-45 S:C A:87.00
        OEnv AR:L
R:I-46 S:C A:30.40
        OEnv AI:I
R:A-47 S:C A:108.40
        OEnv AU:A
R:G-48 S:C A:102.60
        OEnv AU:G
R:D-49 S:C A:72.50
        OEnv AO:D
R:L-50 S:C A:55.40
        OEnv AL:L
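
   Each residue in the log occupies two lines: an "R:... S:... A:..." line
   followed by an "OEnv" line. The Python sketch below pairs them up. The
   field meanings used here (R = residue and position, S = secondary
   structure code, A = solvent accessibility, OEnv = assigned environment
   label) are assumptions based on the inputs described in section 2.0;
   the names LogRecord and parse_log are hypothetical.

```python
import re
from typing import List, NamedTuple


class LogRecord(NamedTuple):
    residue: str          # one-letter amino acid code (assumed meaning of R)
    position: int         # residue number (assumed)
    sec_struct: str       # secondary structure code (assumed meaning of S)
    accessibility: float  # solvent accessibility (assumed meaning of A)
    environment: str      # two-letter environment label from the OEnv line


RESIDUE_RE = re.compile(r"R:(\w)-(-?\d+)\s+S:(\w)\s+A:([\d.]+)")
ENV_RE = re.compile(r"OEnv\s+(\w{2}):(\w)")


def parse_log(text: str) -> List[LogRecord]:
    """Pair each 'R:' line with the 'OEnv' line that follows it."""
    records = []
    pending = None
    for line in text.splitlines():
        match = RESIDUE_RE.search(line)
        if match:
            pending = match.groups()
            continue
        match = ENV_RE.search(line)
        if match and pending is not None:
            residue, position, sec_struct, access = pending
            records.append(LogRecord(residue, int(position), sec_struct,
                                     float(access), match.group(1)))
            pending = None
    return records


# First two residues of the D1CS4A_ listing above.
sample = """D1CS4A_

R:D-2 S:C A:63.80
        OEnv AO:D
R:I-3 S:T A:28.30
        OEnv AF:I
"""

if __name__ == "__main__":
    for record in parse_log(sample):
        print(record)
```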

  File: matgen3d.out

# 3D-1D Scoring matrix created by matgen3d
# ajResidueEnv1
# Total SCOP entries: 2
# No. of files opened: 2
# No. of files not opened: 0
         A     R     N     D     C     Q     E     G     H     I     L     K     M     F     P     S     T     W     Y     V     B     Z     X     *
AA    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AB    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AC    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AD    0.98  1.09  0.80  0.39  1.09  0.58 -0.21  0.80  1.09  0.24  0.48  0.80  2.19  1.63  2.19  1.49  1.09 -2.30  2.19  1.09 -2.30 -2.30 -2.30 -2.64
AE    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AF    0.80  2.00  1.72  1.31  2.00  1.49  0.70  1.72  2.00  1.85  0.70  1.72  3.10  1.16  3.10  2.41  2.00 -1.39  3.10  2.00 -1.39 -1.39 -1.39 -2.64
AG    0.58  1.78  1.49  1.09  1.78  1.27  0.48  1.49  1.78  0.93  1.17  1.49  2.88  0.93  2.88  2.19  1.78 -1.61  2.88  1.78 -1.61 -1.61 -1.61 -2.64
AH    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AI    0.58  1.78  1.49  1.09  1.78  1.27  0.48  1.49  1.78  2.03  0.48  1.49  2.88  0.93  2.88  2.19  1.78 -1.61  2.88  1.78 -1.61 -1.61 -1.61 -2.64
AJ    1.09  1.60  1.31  0.91  1.60  1.09  0.99  1.31  1.60  0.75  0.30  1.31  2.70  0.75  2.70  2.00  1.60 -1.79  2.70  1.60 -1.79 -1.79 -1.79 -2.64
AK    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AL    0.39  1.60  1.31  0.91  1.60  1.09  0.30  1.31  1.60  0.75  0.30  1.31  2.70  0.75  2.70  2.00  1.60 -1.79  2.70  1.60 -1.79 -1.79 -1.79 -2.64
AM   -0.30  0.91  0.62  0.21  0.91  0.39  0.99  0.62  0.91  0.06 -0.39  0.62  2.00  0.06  2.00  1.31  0.91 -2.48  2.00  0.91 -2.48 -2.48 -2.48 -2.64
AN    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AO    0.39  1.60  1.31  1.60  1.60  1.09  0.30  1.31  1.60  0.75  0.30  1.31  2.70  0.75  2.70  2.00  1.60 -1.79  2.70  1.60 -1.79 -1.79 -1.79 -2.64
AP    0.24  0.75  0.46  0.06  0.75  1.34  0.14  0.46  0.75 -0.10 -0.55  0.46  1.85 -0.10  1.85  1.16  0.75 -2.64  1.85  1.44 -2.64 -2.64 -2.64 -2.64
AQ    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AR    0.11  1.31  1.02  0.62  1.31  0.80  0.01  1.02  1.31  0.46  0.01  1.02  2.41  0.46  2.41  1.72  1.31 -2.08  2.41  1.31 -2.08 -2.08 -2.08 -2.64
AS    0.80  2.00  1.72  1.31  2.00  1.49  0.70  1.72  2.00  1.16  0.70  1.72  3.10  1.16  3.10  2.41  2.00 -1.39  3.10  2.00 -1.39 -1.39 -1.39 -2.64
AT    0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 -2.64
AU   -0.01  1.19  0.91  0.50  1.19  0.68 -0.11  2.00  1.19  0.35 -0.11  0.91  2.29  0.35  2.29  1.60  1.19 -2.20  2.29  1.19 -2.20 -2.20 -2.20 -2.64
*    -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64 -2.64
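
   A matrix file in this layout can be read with a few lines of Python.
   The sketch below is illustrative only: the function name parse_matrix
   is hypothetical, it assumes that each matrix row sits on one physical
   line of the file, and the sample text is an excerpt of the example
   above abridged to four columns.

```python
from typing import Dict, List, Tuple


def parse_matrix(text: str) -> Tuple[List[str], Dict[str, Dict[str, float]]]:
    """Parse a whitespace-delimited scoring matrix.

    Lines starting with '#' are comments, the first remaining line holds
    the column labels, and every later line is a row label followed by
    one score per column.
    """
    columns = None
    scores = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if columns is None:
            columns = fields
            continue
        label, values = fields[0], [float(v) for v in fields[1:]]
        if len(values) != len(columns):
            raise ValueError(f"row {label!r} has {len(values)} scores, "
                             f"expected {len(columns)}")
        scores[label] = dict(zip(columns, values))
    return columns or [], scores


# Abridged excerpt: columns A, R, N and '*' only.
sample = """# 3D-1D Scoring matrix created by matgen3d
         A     R     N     *
AD    0.98  1.09  0.80 -2.64
AM   -0.30  0.91  0.62 -2.64
*    -2.64 -2.64 -2.64 -2.64
"""

if __name__ == "__main__":
    columns, scores = parse_matrix(sample)
    print(columns)
    print(scores["AD"]["A"], scores["*"]["*"])
```

   Looking up scores[environment][residue] then gives the score for a
   residue in a given environment, with the '*' row and column available
   as the default values.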

5.0 DATA FILES

   MATGEN3D does not use any data files.

6.0 USAGE

  6.1 COMMAND LINE ARGUMENTS

Generate a 3D-1D scoring matrix from CCF files.
Version: EMBOSS:6.6.0.0

   Standard (Mandatory) qualifiers (* if not always prompted):
   -modeposition       menu       [1] This option specifies the amino acid
                                  residue positions to use in calculating the
                                  substitution data. (Values: 1 (All amino
                                  acid positions for domains in a DCF file); 2
                                  (Ligand-binding positions.))
*  -modeligand         menu       [1] This option specifies whether to use all
                                  ligands or a specific set of ligands
                                  present in a CON file when calculating the
                                  substitution data. (Values: 1 (All ligands
                                  within a CON file); 2 (Select ligands within
                                  a CON file.))
*  -dcfinfile          infile     This option specifies the name of DCF file
                                  (domain classification file) (input). A
                                  'domain classification file' contains
                                  classification and other data for domains
                                  from SCOP or CATH, in DCF format
                                  (EMBL-like). The files are generated by
                                  using SCOPPARSE and CATHPARSE. Domain
                                  sequence information can be added to the
                                  file by using DOMAINSEQS.
*  -coninfile          infile     This option specifies the location of CON
                                  files (contact files) (output). A 'contact
                                  file' contains contact data for a protein or
                                  a domain from SCOP or CATH, in the CON
                                  format (EMBL-like). The contacts may be
                                  intra-chain residue-residue, inter-chain
                                  residue-residue or residue-ligand. The files
                                  are generated by using CONTACTS, INTERFACE
                                  and SITES.
*  -liginfile          infile     This option specifies the location of the
                                  ligand list file. This file contains a list
                                  of ligand ('heterogen') 3-character
                                  identifier codes. One id code should be
                                  given per line.
  [-ccfddir]           directory  [./] This option specifies the location of
                                  CCF files (clean coordinate files) (input)
                                  which have been processed by using PDBPLUS.
                                  A 'clean coordinate file' contains protein
                                  coordinate and derived data for a single PDB
                                  file ('protein clean coordinate file') or a
                                  single domain from SCOP or CATH ('domain
                                  clean coordinate file'), in CCF format
                                  (EMBL-like). The files, generated by using
                                  PDBPARSE (PDB files) or DOMAINER (domains),
                                  contain 'cleaned-up' data that is
                                  self-consistent and error-corrected. Records
                                  for residue solvent accessibility and
                                  secondary structure are added to the file by
                                  using PDBPLUS.
*  -ccfpdir            directory  [./] This option specifies the location of
                                  CCF files (clean coordinate files) (input)
                                  which have been processed by using PDBPLUS.
                                  A 'clean coordinate file' contains protein
                                  coordinate and derived data for a single PDB
                                  file ('protein clean coordinate file') or a
                                  single domain from SCOP or CATH ('domain
                                  clean coordinate file'), in CCF format
                                  (EMBL-like). The files, generated by using
                                  PDBPARSE (PDB files) or DOMAINER (domains),
                                  contain 'cleaned-up' data that is
                                  self-consistent and error-corrected. Records
                                  for residue solvent accessibility and
                                  secondary structure are added to the file by
                                  using PDBPLUS.
   -modeenvir          menu       [1] This option specifies the environment
                                  definition. See matgen3d documentation for
                                  description of definitions. (Values: 1
                                  (Env1); 2 (Env2); 3 (Env3); 4 (Env4); 5
                                  (Env5); 6 (Env6); 7 (Env7); 8 (Env8); 9
                                  (Env9); 10 (Env10); 11 (Env11); 12 (Env12);
                                  13 (Env13); 14 (Env14); 15 (Env15); 16
                                  (Env16))
  [-scmatrixfile]      outfile    [matgen3d.out] Domainatrix substitution
                                  matrix output file
  [-calclogfile]       outfile    [matgen3d.calc] Domainatrix calculations log
                                  output file
  [-logfile]           outfile    [matgen3d.log] Domainatrix log output file

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-ccfddir" associated qualifiers
   -extension1         string     Default file extension

   "-ccfpdir" associated qualifiers
   -extension          string     Default file extension

   "-scmatrixfile" associated qualifiers
   -odirectory2        string     Output directory

   "-calclogfile" associated qualifiers
   -odirectory3        string     Output directory

   "-logfile" associated qualifiers
   -odirectory4        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit
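
   As an illustration of how the qualifiers listed above fit together, the
   Python sketch below assembles a non-interactive command line for the
   "all amino acid positions" mode (-modeposition 1). It only builds and
   prints the argument list; it does not run the program. The helper name
   build_matgen3d_command is hypothetical, and the choice of which
   qualifiers to pass in this mode is an assumption rather than something
   the qualifier listing states explicitly.

```python
import shlex
from typing import List


def build_matgen3d_command(dcf_file: str,
                           ccf_dir: str = "./",
                           envir: int = 1,
                           matrix_out: str = "matgen3d.out",
                           calc_log: str = "matgen3d.calc",
                           log: str = "matgen3d.log") -> List[str]:
    """Build an argument list using qualifiers from the listing above."""
    if not 1 <= envir <= 16:
        # The -modeenvir menu offers the values 1 (Env1) to 16 (Env16).
        raise ValueError("envir must be between 1 and 16")
    return [
        "matgen3d",
        "-modeposition", "1",      # 1 = all amino acid positions
        "-dcfinfile", dcf_file,    # DCF file from SCOPPARSE or CATHPARSE
        "-ccfddir", ccf_dir,       # directory of domain CCF files
        "-modeenvir", str(envir),  # environment definition
        "-scmatrixfile", matrix_out,
        "-calclogfile", calc_log,
        "-logfile", log,
        "-auto",                   # general qualifier: turn off prompts
    ]


if __name__ == "__main__":
    command = build_matgen3d_command("../scopparse-signature/all.scop")
    print(shlex.join(command))
```

   The resulting list could be handed to subprocess.run on a system where
   MATGEN3D is installed.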


