The FSSA software classifies the function of a query protein structure. For example, suppose you have just obtained a new structure with the TIM barrel fold (SCOP fold identifier c.1) and want to know its exact biological function. The problem is that this new structure has low sequence similarity (less than 30%) to any TIM barrel protein of known function (based on sequence and structure comparisons). In such cases, FSSA has better predictive power than methods that inherit functional annotation directly from homology inference, such as Smith-Waterman, PSI-BLAST, HMMs, and structure comparison methods.
The function prediction itself has two steps. First, the FSSA script calls the MAMMOTH program to generate all-against-all pairwise structure alignments for all TIM barrel proteins with known function (we use the SCOP superfamily to define "same function"). FSSA uses these structure alignments to generate a signature for each protein and saves the signatures in a model file. Second, the FSSA script calls the MAMMOTH program to generate structure alignments between the new structure and all TIM barrel structures. Based on the model file generated in the first step and the alignment file generated in the second, FSSA assigns a functional category to the new structure.
Download or request the MAMMOTH structure comparison software from http://fulcrum.physbio.mssm.edu:8083/mammoth/. The FSSA software uses MAMMOTH for structure comparison. After installing MAMMOTH, make sure that the mammoth command is in your path.
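One way to confirm that the mammoth command is reachable before running FSSA is a small check like the sketch below. It is only an illustration: the helper name is_on_path is hypothetical, and the check relies on Python's standard shutil.which rather than on anything shipped with FSSA or MAMMOTH.

```python
import shutil


def is_on_path(command: str) -> bool:
    """Return True if `command` resolves to an executable on the current PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    if is_on_path("mammoth"):
        print("mammoth found at", shutil.which("mammoth"))
    else:
        print("mammoth not found; add its install directory to your PATH")
```

If the check fails, add the directory that holds the mammoth executable to your PATH and try again.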
Download http://software.compbio.washington.edu/fssa/fssa.tgz. Untar and unzip this file, which contains one program, fssa.pl, in the src/scripts directory. One additional file in the same directory, FastaStream.pm, is used by the fssa.pl program. The .tgz file also contains an example in the lib/example directory that demonstrates how to use the program.
Run the following commands:
cd lib/example
../../src/scripts/fssa.pl train cross1.model --trainfile cross1.train --pdbloc idpdblist --submatfile blosum50.mat
../../src/scripts/fssa.pl test cross1.prediction --testfile cross1.test --modelfile cross1.model --pdbloc idpdblist --submatfile blosum50.mat
The first command generates a model file named cross1.model. The second command uses that model file to predict the function of the proteins listed in the cross1.test file. The predictions are written to the cross1.prediction file.
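If you would rather drive the two commands above from a script, the following sketch builds the same argument lists in Python. The helper names (train_command, predict_command, run_example) are hypothetical; only the subcommands, flags, and file names are taken from the commands shown above, and the default example_dir assumes you start from the top of the unpacked archive.

```python
import subprocess

# Path to fssa.pl relative to lib/example, as in the commands above.
FSSA = "../../src/scripts/fssa.pl"


def train_command(model, trainfile, pdbloc, submatfile, fssa=FSSA):
    """Argument list for the training step, which writes `model`."""
    return [fssa, "train", model,
            "--trainfile", trainfile,
            "--pdbloc", pdbloc,
            "--submatfile", submatfile]


def predict_command(prediction, testfile, modelfile, pdbloc, submatfile, fssa=FSSA):
    """Argument list for the prediction step, which writes `prediction`."""
    return [fssa, "test", prediction,
            "--testfile", testfile,
            "--modelfile", modelfile,
            "--pdbloc", pdbloc,
            "--submatfile", submatfile]


def run_example(example_dir="lib/example"):
    """Run both steps of the bundled example; raises if either step fails."""
    common = {"pdbloc": "idpdblist", "submatfile": "blosum50.mat"}
    subprocess.run(train_command("cross1.model", "cross1.train", **common),
                   cwd=example_dir, check=True)
    subprocess.run(predict_command("cross1.prediction", "cross1.test",
                                   "cross1.model", **common),
                   cwd=example_dir, check=True)
```

Calling run_example() requires fssa.pl and mammoth to be installed. The two command builders can be used on their own to inspect or log the exact command line.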
The file formats are as follows. cross1.train is a text file containing one record per line. Each record is composed of an ID and its function, separated by a tab. idpdblist has a similar format: one record per line, each composed of an ID and its PDB file location, separated by a tab. cross1.test has the same format as cross1.train, except that the function column is optional: it can have one or two columns, but only the first column (protein ID) is used. cross1.prediction is a text file containing one record per line. The first tab-delimited column is the protein ID, and the second column is the predicted function category (other columns are ignored). blosum50.mat is a text file containing a substitution matrix. Substitution matrix files can be copied from the FASTA package.
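The tab-delimited formats described above are easy to produce and parse with a few lines of code. The sketch below is a minimal illustration, not part of FSSA: the helper names write_records and read_predictions are hypothetical, and the protein IDs, function labels, file names, and PDB paths in the usage section are made up.

```python
import os
import tempfile


def write_records(path, records):
    """Write an iterable of field tuples as one tab-separated record per line."""
    with open(path, "w") as handle:
        for fields in records:
            handle.write("\t".join(fields) + "\n")


def read_predictions(path):
    """Map protein ID (column 1) to predicted function category (column 2)."""
    predictions = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or single-column lines
            predictions[fields[0]] = fields[1]  # further columns are ignored
    return predictions


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    # Hypothetical IDs, function labels, and PDB locations, for illustration only.
    write_records(os.path.join(workdir, "my.train"),
                  [("protA", "c.1.8"), ("protB", "c.1.10")])
    write_records(os.path.join(workdir, "idpdblist"),
                  [("protA", "/data/pdb/protA.pdb"),
                   ("protB", "/data/pdb/protB.pdb"),
                   ("protC", "/data/pdb/protC.pdb")])
    # A test file needs only the ID column.
    write_records(os.path.join(workdir, "my.test"), [("protC",)])
    print("wrote input files to", workdir)
```

After running the prediction step, read_predictions can be pointed at the resulting prediction file to get a dictionary keyed by protein ID.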
Calling the fssa.pl program without any arguments displays brief usage information. Calling the program with the -h option displays a help message that explains the usage of the different arguments in detail. Calling it with the -m option prints more thorough documentation.