CANDO preliminary predictions for COVID-19

Last update of predictions: May 8, 2020; CANDO version v2.


Interpreting this data without understanding the context and how the platform works could be hazardous to your health (literally and figuratively)! These are a set of initial and preliminary predictions of human approved drugs and compounds likely to inhibit the SARS coronavirus 2 (COVID-19/SARS-CoV-2). The predictions were made using the CANDO platform ( CANDO works well in expert hands, but these predictions are a first pass application optimizing across the SARS proteome first, then using the top scoring methods to create predictions against the COVID-19 proteome. Refinement of the predictions based on considering the protein targets carefully is still in progress, and attempts are being made to validate the predictions at the bench.

The accuracy of the CANDO platform is between 12%-35% depending on the number of compounds evaluated both in terms of retrospective benchmarking and prospective in vitro validation (see references and links below). Indeed, from twelve prospective in vitro validation studies across ten indications, 58/163 (35%) top ranking predictions validated using the CANDO platform (v1) or its components had comparable or better activity relative to a standard treatment (if available, or micromolar or better inhibitors of the pathogenic system, in this case COVID-19). The validated predictions represent potential novel repurposed therapies for indications such as dengue, dental caries, diabetes, herpes, lupus, malaria, and tuberculosis. The logic is that since these are compounds generally approved for human use, the predictions that are verified in vitro are good candidates for off-label use human studies to evaluate efficacy and ultimately, represent potential new therapies for a particular indication.


The manifests for each directory corresponding to a particular round of predictions are available above. Currently, four lists of predictions were generated using three approaches: 1) a homology based approach using known actives against the original SARS-CoV virus, 2) a de novo method involving the top predicted binding affinities of approved drugs against structures from the COVID-19 proteome and 3) a protein structure agnostic approach using only fingerprint similarities between the SARS actives. The SARS actives were extracted from two high-throughput screens from Shen et al. (2019), Dyall et al. (2014), and an additional broad-spectrum active from Cao et al. (2015).

For the homology based approach (canpredict-homology-human-all.txt), our canpredict module was utilized in which we tallied the number of times certain drugs/compounds were ranked highly in the ranked similarity lists to drugs known to have activity against SARS-CoV (38 compounds total, which themselves had a backrecovery rate aka accuracy of roughly 50%). The set of proteins used for this approach was 5,317 human protein structures from the PDB. This list includes experimental/investigational compounds selected from a total of 8,696, including 1,979 approved drugs (according to DrugBank), and only includes compounds with a consensus score of at least 2 (second column -> "score1"). This same method was used for the protein structure indepedent method in which structural fingerprints were used instead of protein- compound interaction scores (ecfp4-2048-bit_vect_tani-cp_human.txt)

For the de novo method (ecfp10-covid19_coach-int_vect-dice-pscoreXpercentile-CTD1860set-top225.txt), the predictions are based on generating interactions between known drugs and viral proteins in a holistic manner: the feature being predicted is really the binding or "stickiness" of small molecules to viral proteins. This in turn is expected to turn up some inhibitors (of both the proteins and the pathogen) which in turn would lead to clinical efficacy. Some of the predictions are obviously incorrect or inappropriate, but the unedited output of the software is what is provided; it may be possible to refine the predictions further manually. These predictions only include drugs with at least one associated indication from the Comparative Toxicogenomics Database, which served as a proxy for "approved" status (some unapproved drugs are still present, which are indicated by a 'false' in the 'approved' column). The March 16, 2020 list of predictions is largely the same as the one from March 5, 2020, but only includes approved drugs according to DrugBank's mapping (the scoring method is identical).

Some of the interesting predictions from the March 5, 2020 round include chloroquine and other antimalarials at rank 35-40 (approach 1), ACE inhibitors at rank 19-21 (approach 2), remdesivir at rank 46 (approach 2), and amprenavir and other HIV protease inhibitors at rank 48-50 (approach 2), all of which have either shown or believed to have efficacy against SARS-CoV-2 and are undergoing (multiple) clinical trials to demonstrate efficacy. Therefore, some of the other higher ranked drugs in our lists are also worth evaluating, with the potential payoff of choice, greater efficacy, and reduced cost if shown to inhibit SARS-CoV-2 in vitro.

Both prediction methods use the same output schema for listing candidates. Column 1 ('rank') is the rank of the prediction. Column 2 ('score1') is the primary score used to rank - in the case of the homology method, this would be the number of times the drug showed up in the topN most similar drugs to the SARS actives; for the de novo method, this is the number of protein interactions above the set threshold. Column 3 ('score2') is the tiebreaker score - for the homology method, this is average rank of each of the 'score1' votes (lower is better), while for the de novo method, this is the sum of each interaction above the threshold (higher is better). Column 4 ('id') is the in-house ID number assigned to CANDO drugs within the package. Column 5 ('approved') is whether or not the drug is FDA approved according to DrugBank; a '*' or '+' indicates that the compound is associated with the query indication according to the input drug-disease mapping (in our case, from the Comparative Toxicogenomics Database). The de novo method will include associated drugs for the indication in the output list even if they are unapproved as a reference. Column 6 ('name') is simply the generic name of the drug/compound.


The publications that describe the CANDO platform in further detail are given below. The platform is based on a large number of methods developed by us and by others for protein and proteome structure, function, and interaction prediction. In this specific case: For the homology based approach rankings, the CANDO platform was used as described in the publications except for the changes specified above. For the de novo method, eighteen proteins encoded by the SARS-CoV-2 genome were modelled to obtain tertiary structures using I-TASSER. Modelling was accomplished by inferring homology to structures in the PDB (typically SARS proteins) determined by x-ray diffraction. The CANDO pipeline was used to to obtain predicted interaction scores between these eighteen structures and a library of 8,696 small molecule compounds, which includes 1,860 human ingestible drugs. The compounds with the strongest predicted interaction scores (>0.9) and those that bound the most frequently to these 18 protein structures using various scoring protocols are ranked and reported.


The software implementation of the CANDO platform, which is the primary reference for the version of CANDO used to make these predictions:

CANDO platform development, recent significant publications, in order of citation importance for this work:

See also all our publications related to therapeutic discovery as well as a comprehensive list of all our publications.

CANDO || Protinfo || Bioverse || Samudrala Computational Biology Research Group ||