SECURWARE 2011, The Fifth International Conference on Emerging Security Information, Systems and Technologies


Proposal of n-gram Based Algorithm for Malware Classification

Authors:
Abdurrahman Pektaş
Mehmet Eriş
Tankut Acarman

Keywords: malware; n-gram based; classification

Abstract:
Obfuscation techniques degrade the n-gram features of the binary form of malware. In this study, a methodology is presented that classifies malware instances by using the n-gram features of their disassembled code. The presented statistical method uses these n-gram features to assign each malware instance to its family. An n-gram is a fixed-size sliding window over a byte array, where n is the size of the window. The main contribution of the method is its ability to represent each malware subfamily with a single vector, called the subfamily centroid. Using only one vector per subfamily for classification reduces the dimension of the n-gram space. Experiments are performed on a fairly large data set, collected through Computer Emergency Response Team (CERT) activities at the National Research Institute of Electronics and Cryptology, to illustrate the effectiveness of the proposed malware classification methodology.
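The sliding-window n-gram and subfamily-centroid ideas described in the abstract can be illustrated with a minimal Python sketch. It assumes each sample has already been disassembled into a sequence of tokens (for example, opcode mnemonics), uses relative n-gram frequencies as features, and assigns a new sample to the subfamily whose centroid has the highest cosine similarity. The frequency weighting, the cosine measure, the helper names (`ngrams`, `normalize`, `centroid`, `classify`), and the toy opcode sequences are all hypothetical illustrations, not the authors' published algorithm or data.

```python
from collections import Counter, defaultdict
from math import sqrt
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature representation: n-gram -> relative frequency.
Vector = Dict[Tuple[str, ...], float]


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams produced by a fixed-size sliding window of length n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def normalize(counts: Counter) -> Vector:
    """Convert raw counts to relative frequencies so sample length does not dominate."""
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def centroid(vectors: List[Vector]) -> Vector:
    """Average the vectors of one subfamily into a single centroid vector."""
    acc: Dict[Tuple[str, ...], float] = defaultdict(float)
    for v in vectors:
        for g, w in v.items():
            acc[g] += w
    return {g: s / len(vectors) for g, s in acc.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors (assumed measure, not from the paper)."""
    dot = sum(w * b.get(g, 0.0) for g, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(tokens: Sequence[str], centroids: Dict[str, Vector], n: int) -> str:
    """Return the subfamily whose centroid is most similar to the sample."""
    v = normalize(ngrams(tokens, n))
    return max(centroids, key=lambda fam: cosine(v, centroids[fam]))


# Toy, made-up opcode sequences for two hypothetical subfamilies.
training = {
    "familyA": [["push", "mov", "call", "pop", "ret"], ["push", "mov", "call", "ret"]],
    "familyB": [["xor", "jmp", "xor", "jmp", "nop"], ["xor", "jmp", "nop", "nop"]],
}
centroids = {
    fam: centroid([normalize(ngrams(s, 2)) for s in samples])
    for fam, samples in training.items()
}
print(classify(["push", "mov", "call", "pop"], centroids, 2))  # familyA
```

Because each subfamily is reduced to one centroid, classifying a sample in this sketch costs one similarity computation per subfamily rather than one per training sample.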

Pages: 14 to 18

Copyright: Copyright (c) IARIA, 2011

Publication date: August 21, 2011

Published in: conference

ISSN: 2162-2116

ISBN: 978-1-61208-146-5

Location: Nice/Saint Laurent du Var, France

Dates: from August 21, 2011 to August 27, 2011