The FSSA software is used for function classification of a query structure. For example, suppose you just obtained a new structure with the TIM barrel fold (SCOP fold identifier c.1), and you want to know its exact biological function. The problem is that this new structure has low sequence similarity (less than 30%) to any other TIM barrel proteins with known function (based on sequence and structure comparisons). In such cases, FSSA has better predictive power than other methods that directly inherit functional annotation information from homology inference, such as Smith-Waterman, PSI-BLAST, HMMs, and structure comparison methods.
Function prediction consists of two steps. First, the FSSA script calls the MAMMOTH program to generate all-against-all pairwise structure alignments for all TIM barrel proteins with known function (we use the SCOP superfamily to define "same function"). FSSA uses these structure alignments to generate a signature for each protein and saves the signatures in a model file. Second, the FSSA script calls the MAMMOTH program to generate structure alignments between the new structure and all of the TIM barrel structures. Based on the model file generated in the first step and the alignment file generated in the second, FSSA assigns a functional category to the new structure.
Download or request the MAMMOTH structure comparison software, which
FSSA uses, from http://fulcrum.physbio.mssm.edu:8083/mammoth/. After
installing MAMMOTH, make sure that its executable is in your path.
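The path requirement can be checked before running FSSA. The following is a minimal Python sketch, not part of the FSSA distribution. The executable name `mammoth` is an assumption, since the command name is not spelled out here; substitute the name your MAMMOTH installation provides.

```python
import shutil


def find_on_path(executable):
    """Return the full path of `executable` if it is on PATH, else None."""
    return shutil.which(executable)


# "mammoth" is an assumed command name; adjust it to your installation.
location = find_on_path("mammoth")
if location is None:
    print("MAMMOTH executable not found on PATH")
else:
    print("MAMMOTH executable found at", location)
```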
Download the FSSA .tgz file and unpack it. It contains one program,
fssa.pl, in the src/scripts directory. One additional file,
FastaStream.pm, in the same directory is used by fssa.pl. The .tgz
file also contains an example in the lib/example directory that
demonstrates how to use the program.
Run the following commands:
    cd lib/example
    ../../src/scripts/fssa.pl train cross1.model --trainfile cross1.train --pdbloc idpdblist --submatfile blosum50.mat
    ../../src/scripts/fssa.pl test cross1.prediction --testfile cross1.test --modelfile cross1.model --pdbloc idpdblist --submatfile blosum50.mat
When the first fssa.pl command is issued, a model file named
cross1.model is generated. In the second fssa.pl command, the model
file is used to predict the function of the proteins in the
cross1.test file. The prediction is written to cross1.prediction.
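For scripted use, the train and test invocations can be assembled programmatically. The Python sketch below is only an illustration and assumes it is run from the top-level directory of the unpacked distribution. The helper names (`train_command`, `predict_command`, `run_example`) are hypothetical; the fssa.pl arguments are exactly those of the example commands.

```python
import subprocess

# Path to the script relative to lib/example, as in the example commands.
FSSA = "../../src/scripts/fssa.pl"


def train_command(model, trainfile, pdbloc, submat):
    """Argument list for the training step (writes the model file)."""
    return [FSSA, "train", model,
            "--trainfile", trainfile,
            "--pdbloc", pdbloc,
            "--submatfile", submat]


def predict_command(prediction, testfile, model, pdbloc, submat):
    """Argument list for the test step (writes the prediction file)."""
    return [FSSA, "test", prediction,
            "--testfile", testfile,
            "--modelfile", model,
            "--pdbloc", pdbloc,
            "--submatfile", submat]


def run_example(workdir="lib/example"):
    """Run both steps inside the example directory; stop on the first failure."""
    steps = [
        train_command("cross1.model", "cross1.train", "idpdblist", "blosum50.mat"),
        predict_command("cross1.prediction", "cross1.test", "cross1.model",
                        "idpdblist", "blosum50.mat"),
    ]
    for cmd in steps:
        subprocess.run(cmd, cwd=workdir, check=True)
```

Calling `run_example()` requires fssa.pl to be executable and MAMMOTH to be installed; the two builder functions can be inspected without running anything.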
The file formats are as follows:
cross1.train is a text file,
containing one record per line. Each record is composed of an ID and its
function, separated by a tab.
idpdblist has a similar format: one
record per line, and each record is composed of an ID and its PDB file location,
separated by a tab.
cross1.test has the same format as cross1.train. It can
have one or two columns, but only the first column (protein ID) is used.
cross1.prediction is a text file, containing one record per line. The
first tab-delimited column is the protein ID, and the second column is the
predicted function category (other columns are ignored).
blosum50.mat is a text file containing a substitution matrix. Substitution
matrix files can be copied from the FASTA package.
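The tab-delimited formats above are easy to read with a short script. The Python sketch below is an illustration, not part of FSSA: it parses a two-column file and, assuming the test file carries the known function in its optional second column, compares predictions against it. The function names and the sample IDs and categories are made up for the example.

```python
import io


def read_two_column(handle):
    """Map column 1 (ID) to column 2; extra columns are ignored.

    IDs on lines without a second column map to None.
    """
    records = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if not fields[0].strip():
            continue  # skip blank lines
        records[fields[0]] = fields[1] if len(fields) > 1 else None
    return records


def accuracy(predicted, known):
    """Fraction of IDs with a known function whose prediction matches it."""
    scored = [pid for pid, func in known.items()
              if func is not None and pid in predicted]
    if not scored:
        return None
    hits = sum(1 for pid in scored if predicted[pid] == known[pid])
    return hits / len(scored)


# Made-up records for illustration; real files live in lib/example.
prediction_text = "protA\tfamily1\t0.9\nprotB\tfamily2\nprotC\tfamily1\n"
test_text = "protA\tfamily1\nprotB\tfamily3\nprotC\n"

predicted = read_two_column(io.StringIO(prediction_text))
known = read_two_column(io.StringIO(test_text))
print(accuracy(predicted, known))  # protC has no known function and is not scored
```

With real data, the same functions can be applied to open file handles for cross1.prediction and cross1.test.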
Calling the fssa.pl program without any arguments displays brief
usage information. Calling the program with the help option
displays a help message that explains the usage of the different
arguments in detail. Calling it with the -m option prints
more thorough documentation.