Data Stream Classification Using Classifier Ensemble
For contemporary business, the crucial factor is making smart decisions on the basis of the knowledge hidden in stored data. Unfortunately, traditional simple methods of data analysis are not sufficient for the efficient management of modern enterprises, because they are not suited to the huge and growing amount of data that enterprises store. Additionally, data usually arrive continuously in the form of a so-called data stream. The great disadvantage of traditional classification methods is that they assume that the statistical properties of the discovered concept remain unchanged, while in real situations we can observe so-called concept drift, which may be caused by changes in the class probabilities and/or the conditional probability distributions of the classes. The ability to take new training data into account is an important feature of machine learning methods used in security applications (spam filtering or intrusion detection) or in decision support systems for marketing departments, which need to follow changing client behavior. Unfortunately, the occurrence of concept drift dramatically decreases classification accuracy. This work presents a comprehensive study of the classifier ensemble approach applied to the problem of drifting data streams. In particular, it reports research on modifications of the previously developed Weighted Aging Classifier Ensemble (WAE) algorithm, which is able to construct a valuable classifier ensemble for the classification of stream data with incremental drift. We generalize the WAE method and propose a general framework for this approach. Such a framework can prune a classifier ensemble before or after assigning weights to individual classifiers. Additionally, we propose new classifier pruning criteria, weight calculation methods, and aging operators.
We also propose a rejuvenating operator, which is able to soften the aging effect. This could be useful especially when quite "old" classifiers are high-quality models, i.e., their presence increases ensemble accuracy, which may happen, e.g., in the case of recurring concept drift. Selected characteristics of the proposed framework were evaluated on the basis of a wide range of computer experiments carried out on two benchmark data streams. The obtained results confirmed the usability of the proposed method for data stream classification in the presence of incremental concept drift.
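The interplay of weighting, aging, rejuvenation, and pruning described above can be illustrated with a minimal sketch. This is not the WAE algorithm itself: the abstract does not give its formulas, so the weight rule (chunk accuracy divided by the square root of one plus age), the rejuvenation threshold, and all names (`Member`, `update_ensemble`, `vote`, `max_size`, `rejuvenate_above`) are assumptions chosen only for illustration. Pruning is done here after weighting, which is one of the two orderings the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Member:
    """One ensemble member (hypothetical structure, not taken from the paper)."""
    predict: Callable[[Sequence[float]], int]  # stand-in for a trained base classifier
    age: int = 0
    weight: float = 0.0


def accuracy(member: Member, X: List[Sequence[float]], y: List[int]) -> float:
    """Fraction of the chunk that the member labels correctly."""
    hits = sum(1 for xi, yi in zip(X, y) if member.predict(xi) == yi)
    return hits / len(y)


def update_ensemble(pool: List[Member], X, y,
                    max_size: int = 3,
                    rejuvenate_above: float = 0.9) -> List[Member]:
    """Re-weight, age or rejuvenate, then prune the pool on a new labelled chunk."""
    for m in pool:
        acc = accuracy(m, X, y)
        if acc >= rejuvenate_above:
            m.age = max(0, m.age - 1)   # assumed rejuvenation: good models get younger
        else:
            m.age += 1                  # assumed aging: one step per chunk
        m.weight = acc / (1 + m.age) ** 0.5  # assumed weight rule
    # Pruning after weighting: keep only the max_size heaviest members.
    pool.sort(key=lambda m: m.weight, reverse=True)
    del pool[max_size:]
    return pool


def vote(pool: List[Member], x: Sequence[float]) -> int:
    """Weighted majority vote over the current members."""
    scores: Dict[int, float] = {}
    for m in pool:
        label = m.predict(x)
        scores[label] = scores.get(label, 0.0) + m.weight
    return max(scores, key=scores.get)
```

In a streaming setting, `update_ensemble` would be called once per incoming labelled chunk, after a newly trained member has been appended to `pool`; `vote` then classifies the examples of the next chunk.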
21 - 05 - 2015