PROTINFO documentation


Calculation times

The methods are usually executed on a dedicated 64-processor cluster. Our goal is to keep the prediction time for each sequence under 24 hours (comparative modelling predictions will most likely take only a few hours), but this of course depends on how many people submit sequences. Please see the notes below on the methods and their length dependence to understand why a prediction can take a long time. You can also monitor the progress of your jobs using the PROTINFO monitor.


Submission tips


Troubleshooting tips


Methods

All papers published regarding the research are accessible from our ongoing areas of research page, and all or most of the software is accessible from our software distribution server.


General notes

Following the CASP convention, up to five models may be returned for each prediction method (in CASP format). Under certain conditions (for example, when no clear target-template relationship is discerned), the PROTINFO server may execute both modelling methods (comparative and de novo) regardless of the method requested.
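As a rough illustration of the five-model limit, the sketch below keeps the best-scoring candidates from a scored list. The function name, the (name, score) pairs, and the default assumption that a lower score is better are all hypothetical; they are not taken from the PROTINFO or RAMP code.

```python
# Hypothetical sketch: keep at most five models per method (CASP convention).
MAX_MODELS = 5


def select_models(scored_models, lower_is_better=True, limit=MAX_MODELS):
    """Return up to `limit` (name, score) pairs, best first.

    `lower_is_better` is an assumption; flip it if the scoring
    function in use rewards higher values.
    """
    ranked = sorted(scored_models, key=lambda m: m[1],
                    reverse=not lower_is_better)
    return ranked[:limit]


# Made-up candidate models and scores, for illustration only.
candidates = [("m1", -120.5), ("m2", -98.0), ("m3", -143.2),
              ("m4", -110.1), ("m5", -87.4), ("m6", -131.9)]
best = select_models(candidates)
```

If fewer than five candidates exist, the function simply returns all of them in ranked order.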


Comparative modelling using RAMP

If no template and alignment are specified, this method performs a sequence-only search using a variety of approaches and then uses the "hits" returned as seeds for a multiple sequence alignment. Initial models are then built for each alignment to a template and the resulting models are scored. Loops and side chains are built on the best scoring models using a frozen approximation. In some cases (when the templates all match well), a sophisticated graph-theory search is performed to mix and match between various main chain and side chain conformations.

During the searches, templates with >= 95% sequence identity to the target are usually ignored (since such a template could be the same structure already deposited in the PDB). If you really want a model where the target-template alignment has a sequence identity >= 95%, you should submit the alignment and template structures explicitly (it should be trivial to construct such an alignment by hand).

This approach is likely to produce the best models when the relationship between the target and template proteins is clearly discernible (>= 30% sequence identity). Even though models are built if the sequence identity is lower, they are likely to contain errors.
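The two identity thresholds above (>= 95% and >= 30%) can be sketched as follows. This is a minimal illustration, not the server's code: the function names and category labels are hypothetical, and the identity is computed here as identical residues divided by aligned (non-gap) columns, which is only one of several common definitions.

```python
def sequence_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over columns where neither sequence has a gap ('-')."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


def classify_template(identity: float) -> str:
    """Map a percent identity onto the thresholds quoted in the text."""
    if identity >= 95.0:
        return "ignored"          # possibly the same structure in the PDB
    if identity >= 30.0:
        return "reliable"         # relationship clearly discernible
    return "low-confidence"       # model is built but likely contains errors


# Toy alignment, for illustration only.
identity = sequence_identity("ACDE", "ACDF")
label = classify_template(identity)
```

A template classified as "ignored" here corresponds to the case where the alignment and template would need to be submitted explicitly.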


De novo prediction using RAMP

If no templates related to the target are found and/or the target sequence has an appropriate length (around 100 residues), it will be modelled using our de novo methods. This approach is likely to be most useful for short sequences.


Secondary structure assignment using PsiCSI

This method (published in Protein Science) uses neural networks to translate NMR chemical shifts into secondary structure information (somewhat similar to CSI) and combines it with sequence-based predictions (à la Psipred). It has a sustained three-state average accuracy of 89% on a rigorously jack-knifed test set of 92 proteins for which NMR chemical shift information was publicly available.
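For readers unfamiliar with three-state accuracy, the sketch below computes it as the percentage of residues whose predicted state matches the reference state, then averages over proteins. The H/E/C state letters, the function names, and the choice of a per-protein (rather than per-residue) average are assumptions made for illustration; the text above does not specify them.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percent of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(predicted)


def mean_q3(pairs) -> float:
    """Average per-protein accuracy over (predicted, observed) pairs."""
    scores = [q3_accuracy(p, o) for p, o in pairs]
    return sum(scores) / len(scores)


# Toy data, not taken from the PsiCSI test set.
single = q3_accuracy("HHHEEECCC", "HHHEEECCH")
average = mean_q3([("HHHH", "HHHH"), ("EECC", "EEHH")])
```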

Chemical shifts for PsiCSI must be supplied in NMR-STAR format (tools for converting to this format from other popular formats are available).

The output will include individual secondary structure assignments as well as the confidences for each of the three states.
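One way to relate per-state confidences to a single assignment is to take the state with the highest confidence at each residue. The sketch below assumes a hypothetical in-memory representation (one dictionary per residue, keyed by the state letters H, E and C); it does not parse the actual PsiCSI output format, which is not described here.

```python
def assign_states(confidences):
    """Pick the highest-confidence state for each residue.

    `confidences` is a list with one dict per residue, mapping a state
    letter to its confidence. The H/E/C keys are an assumption.
    Ties resolve to the first key in the dict's order.
    """
    states = []
    for residue_conf in confidences:
        best_state = max(residue_conf, key=residue_conf.get)
        states.append(best_state)
    return "".join(states)


# Made-up confidences for three residues.
example = [
    {"H": 0.8, "E": 0.1, "C": 0.1},
    {"H": 0.2, "E": 0.7, "C": 0.1},
    {"H": 0.1, "E": 0.2, "C": 0.7},
]
assignment = assign_states(example)
```

Keeping the full confidence values alongside the assignment is useful, since residues where two states have similar confidence deserve less trust.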


Version information

nov212012 PROTINFO-CM v0.2
nov212012 PROTINFO-AB v0.2
nov212012 PsiCSI v1.1
nov212012 RAMP v0.4


Protinfo || Bioverse || Samudrala Computational Biology Research Group || protinfo@compbio.org