

2012 | 7 | 1 | 38-44

Article title

Pattern recognition approach to classifying CYP 2C19 isoform





This paper presents a pattern recognition approach to classifying quantitative structure-property relationships (QSPR) of the CYP2C19 isoform. QSPR is the correlative computer modelling of the properties of chemical molecules and is widely used in cheminformatics and the pharmaceutical industry. Predicting whether or not a particular chemical will be metabolized by 2C19 is of primary importance to the pharmaceutical industry. This task poses certain challenges. First, the analyzed data are characterized by significant biological noise. Additionally, the training set is unbalanced, with objects from the negative class outnumbering the positives by a factor of four. The presented solution addresses these problems and additionally incorporates thorough feature selection to improve the stability of the results. Strong emphasis is placed on outlier detection and proper model validation to achieve the best predictive power.
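The abstract does not say how the class imbalance is handled. As an illustration only, and not the authors' method, one common option is to weight each class inversely to its frequency so that both classes contribute equally to a weighted loss. The function name `balanced_class_weights` and the toy labels below are assumptions made for this sketch.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return per-class weights inversely proportional to class frequency.

    weight(c) = n_samples / (n_classes * count(c)), so the total weight
    carried by each class is the same.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Hypothetical toy labels: negatives (0) outnumber positives (1) four to one,
# mirroring the ratio mentioned in the abstract. Not the paper's data.
labels = [0] * 400 + [1] * 100
weights = balanced_class_weights(labels)
print(weights)
```

With these weights a misclassified positive costs four times as much as a misclassified negative, which offsets the 4:1 ratio. Oversampling the minority class is another frequently used alternative.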










Physical description


1 - 2 - 2012
24 - 11 - 2011


  • Department of Systems and Computer Networks, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland


