

Published: 30 June 2018
Published online: 2018-06
XUE ZHANG, ZHIGUO SHI, XUAN LIU. Multilayer neural network model for unbalanced data[J]. Chinese Journal on Internet of Things, 2018, 2(2): 65-72. DOI: 10.11959/j.issn.2096-3750.2018.00055.
Abstract: In conventional classification of unbalanced data, the imbalance between classes often degrades classifier performance. Using AUC (the area under the ROC curve) as the evaluation index, one-class F-score feature selection was combined with a genetic algorithm to select a feature subset better suited to unbalanced data classification, and a multilayer neural network model, a deeper model suited to such data, was built on that subset. The model was implemented in TensorFlow and tested on four different UCI datasets, and compared with traditional machine learning algorithms such as naive Bayes, K-nearest neighbors (KNN), and neural networks. The experiments show that the proposed model performs better on unbalanced data classification.
Keywords: unbalanced data; one-class F-score feature selection; genetic algorithm; multilayer neural network
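
The two quantitative ingredients named in the abstract, AUC as the evaluation index and an F-score used to rank features, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define its one-class F-score variant, so the code below substitutes the standard two-class F-score as a stand-in, and the function names `auc_score` and `f_score` and the toy data are assumptions made for illustration only.

```python
from statistics import mean, variance


def auc_score(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 1 for the minority (positive) class, 0 for the majority class.
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f_score(feature, labels):
    """Standard two-class F-score of a single feature.

    Stand-in only: the paper's one-class variant is not given in the abstract.
    Numerator: squared distance of each class mean from the overall mean.
    Denominator: sum of the two within-class sample variances.
    """
    pos = [x for x, y in zip(feature, labels) if y == 1]
    neg = [x for x, y in zip(feature, labels) if y == 0]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("each class needs at least two samples")
    overall = mean(feature)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)
    return between / within if within > 0 else float("inf")


# Toy unbalanced data (hypothetical): 2 minority samples, 6 majority samples.
labels = [1, 1, 0, 0, 0, 0, 0, 0]
informative = [5.0, 6.0, 1.0, 2.0, 1.5, 2.5, 1.0, 2.0]
noisy = [1.0, 3.0, 2.0, 1.0, 3.0, 2.0, 1.0, 3.0]

# A feature that separates the classes scores higher than one that does not.
ranked = sorted(
    [("informative", f_score(informative, labels)), ("noisy", f_score(noisy, labels))],
    key=lambda item: item[1],
    reverse=True,
)
print(ranked[0][0])

# AUC of a small scored example: 3 of the 4 positive/negative pairs are ordered correctly.
print(auc_score([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))  # 0.75
```

Unlike accuracy, AUC depends only on how positive and negative samples are ordered relative to each other, which is why it remains informative when one class dominates. In a pipeline like the one the abstract describes, a score of this kind could serve as the fitness value a genetic algorithm maximizes over candidate feature subsets, though the abstract does not give those details.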