
Key-Attributes-Based Ensemble Classifier for Customer Churn Prediction


1.Introduction

Data mining has become increasingly important in management activities, especially in the support of decision making, much of which can be attributed to the task of classification. Classification analysis has therefore been widely used in the study of management decision problems[1]-[4], for example, trend prediction and customer segmentation. Obviously, classification methods with high accuracy reduce the decision loss caused by misclassification. However, with the increasing complexity of modern management and the diversity of related data, the results provided by a single classifier often have poor semantics and are thus hard to understand in management practice, especially for prediction tasks with complex data and managerial scenarios[5].

In recent years, ensemble classifiers have been introduced to solve complicated classification problems[6], and they represent a new direction for improving classifier performance. These classifiers can be based on a variety of classification methodologies and achieve different rates of correctly classified individuals. The goal of the result-integration algorithms is to generate more certain, precise, and accurate results[7].

In the literature, numerous methods have been suggested for the creation of ensemble classifiers[7],[8]. Although ensemble classifiers constructed by any of the general methods have achieved a great number of applications in classification tasks[8], they face two challenges under some real managerial scenarios. The first is the expensive time cost of classifier training/learning, and the second is the poor semantic understanding (management insights) of the classification results.

In this research, we propose a method that builds an ensemble classifier based on the key attributes (values) filtered out from the initial data. Experimental results with real data show that the proposed method not only achieves high relative precision in classification, but also offers high comprehensibility of its results.


2.Related Work

2.1 Classification Models for Churn Prediction

In most real applications, studies have mainly focused on improving the performance of a single algorithm in prediction tasks, typically in predicting customer churn in the service industry.


In this stream, Hu et al. analyzed and evaluated three implementations of decision trees in a churn prediction system with big data[9]. Kim et al. used logistic regression to construct a customer churn prediction model[10]. Tian et al. adopted the Bayesian classifier to build a customer churn prediction model[11]. More complicatedly, the artificial neural network (ANN)[12] and random forest (RF)[13] have been adopted to build customer churn prediction models. Ultsch introduced the self-organizing map (SOM) for churn prediction[14]. Rodan et al.[15] used the support vector machine (SVM) to predict customer churn. Au et al. built a customer churn prediction model based on evolutionary learning algorithms[16].

2.2 Ensemble Classifier

The main idea of the ensemble classifier is to build multiple classifiers on the collected original data set, and then gather the results of these individual classifiers in the classification process. Here, the individual classifiers are called base/weak classifiers. During training, the base classifiers are trained separately on the data set. During prediction, each base classifier provides a decision on the test dataset. An ensemble method then combines the decisions produced by all the base classifiers into one final result. Accordingly, there are many fusion methods in the literature, including voting, the Borda count, algebraic combiners, and so on[7].
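The fusion idea above can be illustrated with a minimal majority-voting sketch in Python; the threshold rules below are toy stand-ins for trained base classifiers, not models from this paper:

```python
# Minimal sketch of decision fusion by plain majority voting.
from collections import Counter

def majority_vote(classifiers, x):
    """Collect each base classifier's decision and return the majority label."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical base classifiers: threshold rules on one numeric feature.
base = [
    lambda x: "churn" if x > 0.5 else "stay",
    lambda x: "churn" if x > 0.7 else "stay",
    lambda x: "churn" if x > 0.3 else "stay",
]

print(majority_vote(base, 0.6))  # two of the three rules vote "churn"
```

Richer fusion rules (the Borda count, algebraic combiners) replace only the aggregation step; the base classifiers are unchanged.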

Theory and practice in the literature have proved that an ensemble classifier can improve classification performance significantly, potentially beyond the performance of any single classifier[8]. Generally, there are two methods to construct an ensemble classifier[7],[8]: 1) the algorithm-oriented method, which implements different classification learning algorithms on the same data, for example, the neural network and the decision tree; 2) the data-oriented method, which separates the initial dataset into parts and uses different subsets of the training data with a single classification method.

Representing each user as an entity, the dataset composed of the user-attribute values can be treated as an initial matrix, as shown in Table 1. Here, xi is a vector (xi1, xi2, ···, xin) that denotes all the values belonging to Useri, and each column Aj holds discrete or real values for attribute Aj, such as age, income, and location. The attributes are called the features of xi.

With the intensification of market competition, maintaining existing customers has become the core marketing strategy for survival in the telecommunication industry. For better customer retention, it is necessary to predict who is about to churn in the future, so studying churn prediction is an important concern in the telecommunication industry. For instance, in the following experiments, the data are collected from China Mobile.

3.Research Method

3.1 Research Problem

In particular, against the decline of prediction precision caused by complex data structures, processing the training data is a feasible way to construct an ensemble classifier. Bagging and boosting are two typical ensemble methods of handling the datasets[17]-[19].

Table 1: Initial data matrix

        A1    ···   Aj    ···   An
User1   x11   ···   x1j   ···   x1n
···
Useri   xi1   ···   xij   ···   xin
···
Userm   xm1   ···   xmj   ···   xmn

Given a vector of data x = (x1, x2, ···, xn), a classifier learning program is given training examples of the form (x, y) for some unknown function y = f(x). An ensemble of classifiers, i.e., {CF1, CF2, ···, CFT}, is a set of classifiers whose individual decisions are combined in some way to classify new examples. For the algorithm-oriented method, one trains a set of classifiers by selecting several functions in the hypothesis space F and implementing them on the data set X. For the data-oriented method, one trains a set of classifiers by selecting one function in F and implementing it on each sub-dataset Xi.

Since the basic research context of this study involves complicated training data and complex decision scenarios, both the algorithm- and data-oriented methods are taken into consideration when constructing the ensemble classifier by training a group of classifiers.

3.2 Key Attribute Selection

To simplify the calculation, we introduce the following "clustering-annotation" process to select the key attributes. Firstly, we use a clustering method to cluster the attributes A1, A2, ···, An into l groups, i.e.,

π = {π1, π2, ···, πl},  (1)

according to the similarity of their values. In other words, if the values of several attributes are similar to each other, then these attributes are assigned to the same group πi.

Key attribute selection in this study means selecting a set of attributes from the data set such that the selection is consistent with the goal of prediction.

Both supervised and unsupervised methods can be used to select attributes. The supervised method is also called the management-oriented method: it determines whether an attribute A is a key attribute according to management needs and prior knowledge. The typical approach is to ask experts to label the key attributes. The advantage of this method is that its calculation process is simple and its results have high comprehensibility. To avoid selection bias on the experts' side, the unsupervised method is sometimes used for data preprocessing by introducing methods with the computational capacity for grouping or dimension reduction, for example, clustering or principal component analysis (PCA).

As the dimensionality of the data increases, many types of data classification problems become significantly harder, particularly in terms of computational cost and memory usage[20]. Consequently, reducing the attribute space may lead to a more understandable model and simplifies the use of different visualization techniques. Thus, a learning algorithm for inducing a classifier must address two issues: Selecting some key attributes (dimension reduction) and further splitting the dataset into parts according to the value distributions of these key attributes.

Next, we associate one representative attribute with each group πi in accordance with the management semantics of the attributes in πi. The basic rule for the association is that the selected attribute should have a strong potential correlation (management insight) with the decision-making problem.


3.3 Attribute Value Based Dataset Splitting

After the key attributes are selected, the data set X is split (clustered) into k parts by the value distributions of these key attributes.

The general method for such a task is the binning method, which distributes sorted values into a number of bins. Assume that the maximum value of attribute A is max and its minimum is min, and divide the original data set into k sub-datasets. The record x, whose value of attribute A satisfies the following condition, will be classified as a member of group Ci:

min + (i − 1)(max − min)/k ≤ x.A < min + i(max − min)/k,  (2)

where i = 1, 2, ···, k (for i = k, the right endpoint is included).
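The equal-width rule above can be sketched in a few lines of Python; the attribute (a bill amount) and its range are illustrative:

```python
# Equal-width binning: map a value of attribute A into one of k groups C_1..C_k.
def bin_index(value, vmin, vmax, k):
    """Return i such that the value falls into group C_i under rule (2)."""
    if value >= vmax:               # the maximum value closes the last bin
        return k
    width = (vmax - vmin) / k
    return int((value - vmin) // width) + 1

# Split bill amounts in [0, 300) into k = 3 consumption levels.
print([bin_index(v, 0, 300, 3) for v in (50, 150, 299, 300)])  # [1, 2, 3, 3]
```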

In the literature, researchers have introduced some efficient methods to split the initial dataset into sub-datasets automatically, for example, the maximum information gain method and the Gini method[8],[21]. The performance of such unsupervised methods is affected by the type, range, and distribution of the attribute values, and in particular, they may suffer from higher computational complexity.

Equation (2) works well for data splitting with one attribute A. Moreover, we can split data with a set of attributes as clustered in (1). To deal with a very large dataset, it has been argued that the singular value decomposition (SVD) of matrices provides an excellent tool[22].

Based on the values of the selected key attributes in πi, the dataset is split as follows:

1) Extracting a dataset from X to form an m×|πi| matrix X(πi) according to the key attributes of πi;



2) Computing the SVD of the matrix X(πi) such that

X(πi) = U S V^T,

where U and V are orthonormal and S is diagonal. The column vectors of U are taken from the orthonormal eigenvectors of X(πi)X(πi)^T, ordered left to right from the largest corresponding eigenvalue to the smallest.

3) The elements of S are nonzero only on the diagonal and are called the singular values. By convention, the singular values are sorted from high to low, so we can choose the top-k singular values of S and cluster the vectors x(πi) into k clusters C1, C2, ···, Ck. Finally, the cluster information is used to map each vector x in X into its group.
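Steps 1) to 3) can be sketched with numpy as follows. This is one plausible reading of the assignment step (projecting each row onto the top-k right singular vectors and taking the dominant direction), not necessarily the authors' exact procedure; `svd_split` and the sample matrix are illustrative:

```python
import numpy as np

def svd_split(X_pi, k):
    """Split the rows of the m x |pi_i| sub-matrix into k groups via the SVD."""
    # X_pi = U S V^T; numpy returns singular values in descending order.
    U, S, Vt = np.linalg.svd(X_pi, full_matrices=False)
    proj = X_pi @ Vt[:k].T                   # project rows onto top-k right singular vectors
    return np.argmax(np.abs(proj), axis=1)   # assign each row to its dominant direction

# Toy sub-matrix: the first two users resemble each other, as do the last two.
X_pi = np.array([[2.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
labels = svd_split(X_pi, 2)
print(labels)
```

Rows with similar value patterns project onto the same singular direction and therefore land in the same cluster.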

3.4 Ensemble Classifier

To keep more managerial information, we construct an ensemble classifier as follows:

Firstly, given a decision-making goal, we cluster all the attributes into l groups and associate each group with a representative feature. Then, we introduce the SVD to split the data matrix for the group πi, and the results are used to map all the vectors in X into k groups, each of which is a sub-dataset intended for better prediction of the targeted decision-making goal. Next, based on the newly generated sub-datasets, we introduce the general algorithm- or data-oriented method to train a set of approximate classifiers and use them to perform the classification tasks for the decision-making problem. At last, a fused result is reported as the prediction.

Another important task is to select appropriate classification algorithms for the aforementioned sub-datasets. Considering the cost of calculation and the precision of results, in this study we choose three typical classification algorithms, neural net, logistic, and C5.0[23], as the basic algorithms to build the hybrid model.

The classification of a new instance x is made by voting among all the base classifiers {CFt}, each with a weight αt, where t ∈ {neural net, logistic, C5.0}. The final prediction can be written as

CF(x) = argmax_y Σt αt · I(CFt(x) = y),  (3)

in which I(·) denotes the indicator function.


where αt is a value in [0, 1] set according to the performance of CFt. To simplify the calculation, αt can be set to 1 for the best classifier and 0 for the others.
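The weighted vote can be sketched directly; the classifier names, decisions, and weights below are illustrative, with the simplified weighting (1 for the best classifier, 0 for the others):

```python
# Weighted voting: each classifier CF_t casts its decision with weight alpha_t,
# and the class accumulating the largest total weight wins.
def weighted_vote(decisions, alphas):
    """decisions: {name: predicted class}; alphas: {name: weight in [0, 1]}."""
    totals = {}
    for name, label in decisions.items():
        totals[label] = totals.get(label, 0.0) + alphas[name]
    return max(totals, key=totals.get)

preds  = {"neural_net": "churn", "logistic": "stay", "c5": "stay"}
alphas = {"neural_net": 1.0, "logistic": 0.0, "c5": 0.0}   # best classifier gets weight 1
print(weighted_vote(preds, alphas))  # "churn"
```

With the simplified weights, the ensemble simply follows the single best classifier; fractional weights turn this into a genuine weighted majority.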


3.5 Evaluation Method

In this paper, the precision[21] and receiver operating characteristic (ROC)[24] are used to evaluate the results.

Given a set of prediction results made by a classifier, the confusion matrix of the two classes "true" and "false" is shown in Table 2. Here, the variables A, B, C, and D denote the numbers of true positive, true negative, false positive, and false negative results, respectively.
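From the counts A, B, C, and D, the evaluation measures follow directly; a minimal sketch with illustrative counts:

```python
# Precision and overall accuracy from the confusion counts of Table 2
# (A: true positive, B: true negative, C: false positive, D: false negative).
def precision(a, c):
    """Share of predicted positives that are truly positive: A / (A + C)."""
    return a / (a + c)

def accuracy(a, b, c, d):
    """Share of all predictions that are correct: (A + B) / (A + B + C + D)."""
    return (a + b) / (a + b + c + d)

print(precision(8, 2))  # 8 true positives, 2 false positives -> 0.8
```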


Table 2: Results matrix predicted by the classifier

                 Predicted true          Predicted false
Actual true      A (true positive)       D (false negative)
Actual false     C (false positive)      B (true negative)


4.Experiment Results

4.1 Data Set

As mentioned before, the focus of previous research has been on the prediction accuracy of each single model. However, we can also address the problem of constructing an ensemble classifier based on the data distribution for better prediction results.

Note that, due to the great uncertainty of consumer behavior and the scarcity of data recorded in companies' operation databases, the records generated by temporary customers, and by customers who buy a SIM card and discard it soon after short-term consumption, are removed. In total, 47735 customers were randomly selected from three main sub-branches located in three different cities. The observation period is from January 1st, 2008 to May 31st, 2008, and the extracted information concerns the users' activities in using telecommunication services, such as contract data, consumer behaviors, and billing data.

After data preprocessing such as data cleaning, integration, transformation, and discretization, 47365 valid customer records remain (99.2% of the total number of samples, denoted as dataset X), among which 3421 users are churn customers (a churn rate of 7.2%). In the experiments, the data set X is separated into two parts: The training data generated from January 1st to March 31st, 2008, denoted by X1, and the test data generated from April 1st to May 31st, 2008, denoted by X2.

The experiment platform is SPSS Clementine 12.0, which provides well-programmed software tools for the classification algorithms C5.0, logistic, and neural net.

The ROC is a graphical plot that illustrates the performance of a classifier system as its discrimination threshold is varied. ROC analysis originated from statistical decision theory in the 1950s and has been widely used in the performance assessment of classification[21]. The ROC curve is plotted by treating the true positive rate as the Y-axis and the false positive rate as the X-axis. The closer the ROC curve is to the upper left corner, the higher the accuracy of the model predictions. The area under the curve (AUC) can be used as a measure of the prediction effect. The value of AUC generally ranges between 0.500 and 1.000, and the prediction is better when the area value approaches 1.000.
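The AUC can be computed without drawing the curve at all, via the rank-sum (Mann-Whitney) formulation, which equals the area under the ROC curve: the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties counting one half). A small sketch with illustrative scores:

```python
# Rank-sum formulation of AUC: pairwise comparison of positive and negative scores.
def auc(scores_pos, scores_neg):
    """Probability that a positive outscores a negative; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

print(auc([0.9, 0.8, 0.6], [0.4, 0.3]))  # perfectly separated scores give 1.0
```

A classifier that ranks all churners above all non-churners reaches AUC 1.000; random scoring hovers around 0.500, matching the range described above.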


4.2 Attribute Selection and Dataset Splitting

In total, 116 (n=116) variables included in the customer relationship management (CRM) system are extracted as the initial data set X.

We implement the cosine-similarity-based k-means clustering method on the vectors in X. Inspired by customer segmentation in marketing (in conjunction with the necessary expert annotations), we cluster the common variables according to their relations in marketing practice. At last, the attributes are clustered into 4 (l=4) groups, and the 4 attributes brand, area, age, and bill (which have a strong correlation with customer churn in the telecommunication industry) are chosen as the key attributes, respectively.
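The similarity measure underlying this clustering step can be sketched in a few lines; the attribute vectors below are illustrative, not taken from the CRM data:

```python
import numpy as np

# Cosine similarity between two attribute value vectors, the distance
# notion used by the cosine-based k-means grouping.
def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

age    = np.array([1.0, 2.0, 3.0])
income = np.array([2.0, 4.0, 6.0])   # proportional to age -> similarity 1
print(cosine_sim(age, income))
```

Attributes whose value vectors point in similar directions score close to 1 and end up in the same group πi.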

Moreover, the values of these four attributes are each split into 3 (k=3) sub-datasets according to the SVD clustering results. The results are summarized in Table 3.

Table 3: Subdivision categories of each variable

?

4.3 Ensemble Model Construction

In the following, four ensemble classifiers are built according to the sub-datasets separated by the four attributes brand, area, age, and bill.

The classification algorithms C5.0, logistic, and neural net are implemented on each sub-dataset for a series of repeated prediction experiments. The logic view of the ensemble classifier model construction is shown in Fig. 1.

Fig. 1. Logic view of ensemble classifier models.

For the attribute brand, the training set X1 is first divided into three sub-datasets, namely GoTone, EasyOwn, and M-Zone, which account for 7.2%, 80.7%, and 12.1% of the customers, respectively. In the learning process, each subset is first separated into training and test sets at a ratio of 60.0% to 40.0%.


Among all the classification results reported by each algorithm on the test dataset, the result with the largest AUC is selected as the basic model for that sub-dataset. The AUC results reported by the three models on each brand are shown in Table 4.

The comparative results are shown in Fig. 2. The results in Table 4 show that the neural net algorithm works the best in the prediction of GoTone and EasyOwn sub-datasets, whereas the C5.0 works the best on the M-Zone sub-dataset.

Table 4: AUC of prediction on brand sub-datasets

Model        GoTone              EasyOwn             M-Zone
             Training   Test     Training   Test     Training   Test
C5.0         0.583      0.597    0.911      0.827    0.851      0.810
Neural net   0.796      0.803    0.851      0.852    0.830      0.807
Logistic     0.868      0.713    0.843      0.841    0.839      0.802

Similarly, the performances of classification (prediction) on sub-datasets split by attributes of area, age, and bill are reported in Tables 5 to 7, respectively. Accordingly, the visualized results are shown in Figs. 3 to 5.

4.4 Result Evaluation

The previous experiments on sub-datasets separated by different key attributes provide four hybrid models. Next, we use these four models to make predictions on dataset X2; the measurements of precision and the ROC curve are used to evaluate the performance of each model.

1) Comparison of precision

The average prediction precision of each of the four hybrid models on X2 is summarized in Table 8. The highest precision (86.1%) is achieved when the key attribute area is used for data segmentation to build a hybrid model, followed by the result obtained with the attribute bill (85.9%). The performance of the hybrid models constructed with the attributes brand and age for data segmentation is lower (81.2% and 76.2%, respectively).

Fig. 2. AUC of prediction on brand sub-datasets: (a) GoTone, (b) EasyOwn, and (c) M-Zone.

Fig. 3. AUC of prediction on area sub-datasets: (a) area A, (b) area B, and (c) area C.

Fig. 4. AUC of prediction on age sub-datasets: (a) net age low, (b) net age middle, and (c) net age high.

Fig. 5. AUC of prediction on bill sub-datasets: (a) low consumption level, (b) middle consumption level, and (c) high consumption level.

Table 5: AUC of prediction on area sub-datasets

Model        Area A              Area B              Area C
             Training   Test     Training   Test     Training   Test
C5.0         0.848      0.799    0.968      0.911    0.965      0.871
Neural net   0.820      0.834    0.928      0.924    0.919      0.921
Logistic     0.808      0.815    0.931      0.910    0.930      0.886

Table 6: AUC of prediction on age sub-datasets

Model        Low age             Middle age          High age
             Training   Test     Training   Test     Training   Test
C5.0         0.915      0.866    0.868      0.805    0.855      0.715
Neural net   0.882      0.886    0.784      0.800    0.783      0.773
Logistic     0.869      0.859    0.815      0.821    0.801      0.748

Table 7: AUC of prediction on bill sub-datasets

Model        Low level           Middle level        High level
             Training   Test     Training   Test     Training   Test
C5.0         0.923      0.870    0.829      0.729    0.894      0.690
Neural net   0.873      0.868    0.793      0.764    0.732      0.774
Logistic     0.872      0.856    0.802      0.772    0.761      0.732

Table 8: Prediction accuracy of the four hybrid models on test set X2

Key attribute    Brand   Area   Age    Bill
Precision (%)    81.2    86.1   76.2   85.9

2) Comparison of ROC

The ROC curves for the prediction results provided by the four hybrid models on the test set X2 are shown in Fig. 6. The area under the ROC curve of each hybrid model is given in Table 9.

Comparing the results in Fig. 6 and Table 9, the two hybrid models constructed on the attributes area and bill generate better AUC values (0.888 and 0.855) than those based on brand and age (0.828 and 0.845).

According to the experimental results, using the attribute area as the segmentation variable yields the best prediction results, followed by the attribute bill, whereas the key attributes age and brand perform relatively poorly. Therefore, in the practice of customer churn prediction, it is recommended that telecommunication companies use consumers' bill information as the key attribute to build a churn prediction hybrid model for each area separately. Moreover, it is necessary to strengthen brand management and to improve the customer segmentation effect of different brands.

Fig. 6. ROC curve of prediction accuracy of the four hybrid models on test set X2.

Table 9: AUC of prediction accuracy of the four hybrid models on test set X2

Key attribute    Brand   Area    Age     Bill
AUC              0.828   0.888   0.845   0.855

3) Limitations

The main idea of the method proposed in this work is to construct an ensemble classifier for higher precision and richer managerial insights. We should note some limitations of this work. First, there is a lack of criteria for how many base classifiers should be selected in the hybrid classifier. Second, the proposed method involves some time-consuming preprocessing steps in ensemble classifier construction, for example, the PCA and SVD methods, which increase the computational complexity.



5.Conclusions

Classification analysis has been widely used in the study of decision problems. However, with the increasing complexity of modern management and the diversity of related data, the results provided by a single classifier often have poor semantics and are thus hard to understand in management practice, especially for prediction tasks with very complex data and managerial scenarios.

Regarding the management issues of classification and prediction, an ensemble of single classifiers is an effective way to improve prediction results. To solve the problems of poor precision and weak management semantics caused by ordinary ensemble classifiers, this paper proposed an ensemble classifier construction method based on the key attributes in the data set. The experimental results on real data collected from China Mobile show that the key-attributes-based ensemble classifier has advantages in both prediction accuracy and result comprehensibility.

References

[1]M. J. Berry and G. S. Linoff, Data Mining Techniques: for Marketing, Sales, and Customer Support, New York: John Wiley & Sons, 1997, ch. 8.

[2]Y. K. Noh, F. C. Park, and D. D. Lee, “Diffusion decision making for adaptive k-nearest neighbor classification,”Advances in Neural Information Processing Systems, vol. 3,pp. 1934-1942, Jan. 2012.

[3]X.-L. Xia and Jan H.-H. Huang, “Robust texture classification via group-collaboratively representation-based strategy,” Journal of Electronic Science and Technology, vol.11, no. 4, pp. 412-416, Dec. 2013.

[4]S. Archana and D. K. Elangovan, “Survey of classification technique in data mining,” Intl. Journal of Computer Science and Mobile Applications, vol. 2, no. 2, pp. 65-71, 2014.

[5]H. Grimmett, R. Paul, R. Triebel, and I. Posner, “Knowing when we don’t know: Introspective classification for mission-critical decision making,” in Proc. of IEEE Intl. Conf. on Robotics and Automation, Karlsruhe, 2013, pp. 4531-4538.

[6]R. L. MacTavish, S. Ontanon, J. Radhakrishnan, et al., “An ensemble architecture for learning complex problem-solving techniques from demonstration,” ACM Trans. on Intelligent Systems and Technology, vol. 3, no. 4, pp. 1-38, 2012.

[7]T. G. Dietterich, Ensemble Methods in Machine Learning,Multiple Classifier Systems, Berlin: Springer, 2000, pp. 1-15.

[8]Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms, Boca Raton: Chapman and Hall/CRC, 2012.

[9]X.-Y. Hu, M.-X. Yuan, J.-G. Yao, et al., “Differential privacy in telco big data platform,” in Proc. of the 41st Intl.Conf. on Very Large Data Bases, Kohala Coast, 2015, pp.1692-1703.

[10]H. S. Kim and C. H. Yoon, “Determinants of subscriber churn and customer loyalty in the Korean mobile telephony market,” Telecommunications Policy, vol. 28, no. 9, pp. 751-765, 2004.

[11]L. Tian, K.-P. Zhang, and Z. Qin, “Application of a Bayesian network learning algorithm in telecom CRM,” Modern Electronics Technique, vol. 10, pp. 52-55, Oct. 2005.

[12]A. Sharma and P. K. Panigrahi, “A Neural network based approach for predicting customer churn in cellular network services,” Intl. Journal of Computer Applications, vol. 27,no. 11, pp. 26-31, Aug. 2011.

[13]Y.-Q. Huang, F.-Z. Zhu, M.-X. Yuan, et al., “Telco churn prediction with big data,” in Proc. of ACM SIGMOD Intl.Conf. on Management of Data, Melbourne, 2015, pp. 607-618.

[14]A. Ultsch, “Emergent self-organizing feature maps used for prediction and prevention of churn in mobile phone markets,” Journal of Targeting, Measurement and Analysis for Marketing, vol. 10, no. 4, pp. 314-324, 2002.

[15]A. Rodan, H. Faris, J. Alsakran, and O. Al-Kadi, “A support vector machine approach for churn prediction in telecom industry,” Intl. Journal on Information, vol. 17, pp. 3961-3970, Aug. 2014.

[16]W. H. Au, K. C. C. Chan, and X. Yao, “A novel evolutionary data mining algorithm with applications to churn prediction,” IEEE Trans. on Evolutionary Computation, vol. 7, no. 6, pp. 532-545, 2003.

[17]S. B. Kotsiantis and P. E. Pintelas, “Combining bagging and boosting,” Intl. Journal of Computational Intelligence, vol. 1,no. 4, pp. 324-333, 2004.

[18]N. C. Oza, “Online bagging and boosting,” in Proc. of IEEE Intl. Conf. on Systems, Man & Cybernetics, Tucson, 2001,pp. 2340-2345.

[19]C.-X. Zhang, J.-S. Zhang, and G.-W. Wang, “A novel bagging ensemble approach for variable ranking and selection for linear regression models,” in Multiple Classifier Systems, Friedhelm Schwenker, Ed. Switzerland: Springer,2015, pp. 3-14.

[20]W. Drira and F. Ghorbel, “Decision bayes criteria for optimal classifier based on probabilistic measures,” Journal of Electronic Science and Technology, vol. 12, no. 2, pp. 216-219, 2014.

[21]J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques: Concepts and Techniques, 3rd ed., San Francisco: Morgan Kaufmann, 2011.

[22]P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay,“Clustering large graphs via the singular value decomposition,” Machine Learning, vol. 56, no. 1, pp. 9-33,2004.

[23]T. Bujlow, T. Riaz, and J. M. Pedersen, “A method for classification of network traffic based on C5.0 machine learning algorithm,” in Proc. of IEEE Intl. Conf. on Computing, Networking and Communications, Okinawa,2012, pp. 237-241.

[24]A. P. Bradley, “The use of the area under the ROC curve in the evaluation of machine learning algorithms,” Pattern Recognition, vol. 30, pp. 1145-1159, Jul. 1997.

Yu Qian, Liang-Qiang Li, Jian-Rong Ran, and Pei-Ji Shao
Journal of Electronic Science and Technology, no. 1, 2018
