
2. Software

2.1 General issues

The FSSA software classifies the function of a query protein structure. For example, suppose you have just obtained a new structure with the TIM barrel fold (SCOP fold identifier c.1) and want to know its exact biological function. The problem is that, according to sequence and structure comparisons, this new structure has low sequence similarity (less than 30%) to every TIM barrel protein of known function. In such cases, FSSA has better predictive power than methods that transfer functional annotation directly through homology inference, such as Smith-Waterman, PSI-BLAST, HMMs, and structure comparison methods.

The actual function prediction consists of two steps. First, the FSSA script calls the MAMMOTH program to generate all-against-all pairwise structure alignments for all TIM barrel proteins with known function (we use the SCOP superfamily to define "same function"). FSSA uses these structure alignments to generate a signature for each protein and saves the signatures in a model file. Second, the FSSA script calls the MAMMOTH program to generate structure alignments between the new structure and all TIM barrel structures. Based on the model file generated in the first step and the alignment file generated in the second, FSSA assigns a functional category to the new structure.

2.2 Download and usage

Download the MAMMOTH software

Download or request the MAMMOTH structure comparison software from http://fulcrum.physbio.mssm.edu:8083/mammoth/. FSSA calls MAMMOTH to compute structure alignments. After installing MAMMOTH, make sure that the mammoth command is on your PATH.
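
A quick way to confirm that the shell can find MAMMOTH is sketched below. The helper name have_cmd is only illustrative; the check relies on the standard command -v builtin and assumes nothing about MAMMOTH beyond the executable name mammoth given above.

```shell
# Return 0 if the given command is found on PATH, non-zero otherwise.
# (have_cmd is an illustrative helper name, not part of FSSA or MAMMOTH.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd mammoth; then
    echo "mammoth found at: $(command -v mammoth)"
else
    echo "mammoth not found on PATH; add its install directory to PATH" >&2
fi
```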

Download the FSSA software

Download http://software.compbio.washington.edu/fssa/fssa.tgz and unpack it. The archive contains one program, fssa.pl, in the src/scripts directory. One additional file in the same directory, FastaStream.pm, is used by fssa.pl. The archive also contains an example in the lib/example directory that demonstrates how to use the program.
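
After unpacking (for example with tar -xzf fssa.tgz), a small check like the following can confirm that the files named above are where this section says they are. This is a sketch: the function name check_fssa_layout is hypothetical, and the directory that holds src/ and lib/ is passed as an argument because the top-level layout of the archive is not stated here.

```shell
# Verify that an unpacked FSSA tree contains the files named in this section.
# $1 is the directory that holds src/ and lib/ (defaults to the current one).
# check_fssa_layout is an illustrative helper, not part of the FSSA package.
check_fssa_layout() {
    root=${1:-.}
    status=0
    for f in src/scripts/fssa.pl src/scripts/FastaStream.pm; do
        if [ ! -f "$root/$f" ]; then
            echo "missing file: $root/$f" >&2
            status=1
        fi
    done
    if [ ! -d "$root/lib/example" ]; then
        echo "missing directory: $root/lib/example" >&2
        status=1
    fi
    return "$status"
}
```

Run it as check_fssa_layout followed by the path of the unpacked directory; a non-zero exit status means something is missing.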

Usage

Run the following commands:

cd lib/example
../../src/scripts/fssa.pl train cross1.model --trainfile cross1.train --pdbloc idpdblist --submatfile blosum50.mat
../../src/scripts/fssa.pl test cross1.prediction --testfile cross1.test --modelfile cross1.model --pdbloc idpdblist --submatfile blosum50.mat

The first command generates a model file named cross1.model. The second command uses that model file to predict the function of the proteins listed in cross1.test and writes the predictions to cross1.prediction.
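
If the test file also lists the known function of each protein in a second column, the predictions can be scored against it. The sketch below is not part of FSSA; score_predictions is a hypothetical helper, and it assumes both files are tab-delimited with the protein ID in column 1 and the function label in column 2, using identical label strings.

```shell
# Print "correct total" for predictions whose ID has a known function.
# $1: file of known functions (ID<TAB>function), e.g. a two-column test file
# $2: prediction file (ID<TAB>predicted function[<TAB>...])
# score_predictions is an illustrative helper, not part of the FSSA package.
score_predictions() {
    awk -F '\t' '
        # First file: remember the known function of each ID.
        NR == FNR { if (NF >= 2) truth[$1] = $2; next }
        # Second file: count predictions for IDs with a known function.
        $1 in truth { total++; if ($2 == truth[$1]) correct++ }
        END { printf "%d %d\n", correct, total }
    ' "$1" "$2"
}
```

For the example above this would be called as score_predictions cross1.test cross1.prediction, from inside lib/example.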

The file formats are as follows. cross1.train is a text file containing one record per line; each record consists of an ID and its function, separated by a tab. idpdblist has a similar format: one record per line, each consisting of an ID and the location of its PDB file, separated by a tab. cross1.test has the same format as cross1.train, except that it can have one or two columns; only the first column (protein ID) is used. cross1.prediction is a text file containing one record per line: the first tab-delimited column is the protein ID, and the second column is the predicted function category (other columns are ignored). blosum50.mat is a text file containing a substitution matrix. Substitution matrix files can be copied from the FASTA package.
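
Because every protein ID has to be mapped to a PDB file, it can help to check the input files against idpdblist before training or testing. The following is a minimal sketch under the formats described above; missing_pdb_ids is a hypothetical helper and not part of FSSA.

```shell
# List IDs from a train/test file that have no entry in the ID-to-PDB list.
# $1: train or test file (ID in the first tab-delimited column)
# $2: ID-to-PDB list (ID<TAB>PDB file location)
# missing_pdb_ids is an illustrative helper, not part of the FSSA package.
missing_pdb_ids() {
    awk -F '\t' '
        # First file read is the ID-to-PDB list: remember its IDs.
        NR == FNR { known[$1] = 1; next }
        # Second file read is the train/test file: report unknown IDs.
        !($1 in known) { print $1 }
    ' "$2" "$1"
}
```

For example, missing_pdb_ids cross1.train idpdblist prints nothing when every training ID has a PDB location.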

Calling the fssa.pl program without any arguments will display brief usage information. Calling the program with the -h option will display a help message that explains the usage of different arguments in detail. Calling with the -m option will print more thorough documentation.

