2008 | 55 | 2 | 261-267
Article title

Prediction of signal peptides in protein sequences by neural networks

Abstract
We present a neural network-based method for detecting signal peptides (SPs) in proteins. The method is trained on sequences of known signal peptides extracted from the Swiss-Prot protein database and can be applied separately to prokaryotic and eukaryotic proteins. A query protein is dissected into overlapping short sequence fragments, and each fragment is then scored for the probability that it is a signal peptide and contains a cleavage site. While the accuracy of the method is comparable to that of existing prediction tools, it offers significantly higher speed and portability. As a consequence, the method can easily be applied to genome-wide datasets. Cleavage site prediction reaches 73% accuracy on heterogeneous source data containing both prokaryotic and eukaryotic sequences, while the accuracy of discrimination between signal peptides and non-signal peptides is above 93% for any source dataset. The software can be downloaded freely from
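
The fragment-dissection step described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the window length, the one-hot encoding, and the names `overlapping_fragments`, `one_hot`, `toy_score`, and `best_fragment` are all assumptions made for illustration, and `toy_score` is a placeholder (the fraction of hydrophobic residues) standing in for the trained neural network, which the record does not specify.

```python
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
HYDROPHOBIC = set("AILMFVW")          # assumption: residue set used only by the toy scorer


def overlapping_fragments(sequence: str, window: int, step: int = 1) -> List[Tuple[int, str]]:
    """Return (start_index, fragment) pairs for every full-length window."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, step)
    ]


def one_hot(fragment: str) -> List[float]:
    """Encode a fragment as a flat vector with 20 indicator values per residue."""
    vector: List[float] = []
    for residue in fragment:
        # Non-standard residues (e.g. 'X') become an all-zero block.
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector


def toy_score(vector: List[float]) -> float:
    """Placeholder for a trained network: fraction of hydrophobic residues."""
    n_residues = len(vector) // len(AMINO_ACIDS)
    hits = sum(
        value
        for i, value in enumerate(vector)
        if AMINO_ACIDS[i % len(AMINO_ACIDS)] in HYDROPHOBIC
    )
    return hits / n_residues


def best_fragment(
    sequence: str, window: int, score: Callable[[List[float]], float]
) -> Tuple[int, float]:
    """Score every fragment and return (start_index, score) of the top one."""
    fragments = overlapping_fragments(sequence, window)
    if not fragments:
        raise ValueError("sequence is shorter than the window")
    scored = [(start, score(one_hot(frag))) for start, frag in fragments]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical query sequence, chosen only to exercise the functions.
    print(best_fragment("MKTLLV", window=4, score=toy_score))
```

In a real predictor the `score` argument would be the trained classifier, and separate models could be supplied for prokaryotic and eukaryotic sequences, as the abstract indicates.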

Affiliations
  • Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, Warszawa, Poland
  • BioInfoBank Institute, Poznań, Poland