
Outlier Detection based on Eigenspace Subtracting

Bo Jin, Zhongliang Jing, Haitao Zhao, Rongli Liu, Han Pan, Canlong Zhang

Abstract


Many existing outlier detection methods are ill-suited to high-dimensional, large-scale problems because they must build k-nearest-neighbor sets or assume a specific data distribution. Angle information and subspace projection are more reliable for high-dimensional data. We therefore propose an angle-based outlier detection algorithm built on eigenspace subtracting. Within a leave-one-out (LOO) procedure, an outlier is detected by a multiple principal component (PC) scheme, i.e., by observing how the sub-leading principal directions change when the candidate point is removed. Eigenspace subtracting is adopted as the optimal estimator of the PCs. We also establish a natural link between eigenspace subtracting and angle-based outlier detection. Extensive comparative experiments demonstrate that the proposed ES-OD algorithm significantly outperforms benchmark outlier detection methods while remaining efficient.
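
The leave-one-out idea in the abstract can be illustrated with a minimal sketch. This is not the ES-OD algorithm itself: it assumes a plain NumPy setting, removes each candidate point through an exact rank-one downdate of the mean-centered covariance, and then recomputes a full eigendecomposition instead of using an eigenspace-subtracting estimator. The function names (`top_directions`, `loo_covariance`, `loo_angle_scores`), the choice of `k`, and the score (one minus the mean absolute cosine between matching principal directions) are all assumptions made for illustration.

```python
import numpy as np


def top_directions(cov, k):
    """Return the k leading eigenvectors of a symmetric matrix as columns."""
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, ::-1][:, :k]


def loo_covariance(cov, mean, x, n):
    """Covariance (1/n normalization) of the data after removing sample x.

    cov and mean describe all n samples; the result describes the
    remaining n - 1 samples and is exact, not an approximation.
    """
    diff = x - mean
    return (n / (n - 1)) * (cov - np.outer(diff, diff) / (n - 1))


def loo_angle_scores(X, k=2):
    """Score each row of X by how far its removal rotates the top-k PCs.

    Hypothetical score: 1 - mean |cos(angle)| between the j-th principal
    direction with and without the point. Larger means more outlying.
    """
    n, _ = X.shape
    mean = X.mean(axis=0)
    centered = X - mean
    cov = centered.T @ centered / n
    U = top_directions(cov, k)

    scores = np.empty(n)
    for i in range(n):
        U_i = top_directions(loo_covariance(cov, mean, X[i], n), k)
        # Eigenvectors are defined up to sign, so compare absolute cosines.
        cosines = np.abs(np.sum(U * U_i, axis=0))
        scores[i] = 1.0 - cosines.mean()
    return scores


# Toy data: an elongated Gaussian cloud plus one point far off the main axes.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])
X = np.vstack([inliers, [0.0, 0.0, 40.0]])

scores = loo_angle_scores(X, k=2)
print("most outlying index:", int(np.argmax(scores)))
```

Recomputing the eigendecomposition for every candidate costs roughly O(n d^3), which is the expense that an incremental eigenspace update is meant to avoid; the sketch only shows what quantity is being observed, not how to obtain it efficiently.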



DOI: http://dx.doi.org/10.21535/ProICIUS.2014.v10.303
