ICIMP 2016, The Eleventh International Conference on Internet Monitoring and Protection


Detecting Obfuscated JavaScripts Using Machine Learning

Authors:
Simon Aebersold
Krzysztof Kryszczuk
Sergio Paganoni
Bernhard Tellenbach
Timothy Trowbridge

Keywords: Computer security; Machine learning; Pattern analysis; Classification algorithms; JavaScript; Random Forest; Malicious

Abstract:
JavaScript is a common attack vector against browsers, browser plug-ins, email clients and other JavaScript-enabled applications. Malicious JavaScripts redirect victims to exploit kits, probe for known vulnerabilities to select a fitting exploit, or manipulate the Document Object Model (DOM) of a web page in a harmful way. Malicious JavaScript code is often obfuscated to make it hard to detect with signature-based approaches. Since the only other reason to use obfuscation is to protect intellectual property, the share of scripts that are both benign and obfuscated is quite low and could easily be captured with a whitelist. A detector that reliably identifies obfuscated JavaScripts would therefore be a valuable tool in fighting malicious JavaScripts. In this paper, we present a method for automatic detection of obfuscated JavaScript using a machine-learning approach. Using a dataset of regular, minified and obfuscated samples from a content delivery network and the Alexa top 500 websites, we show that it is possible to distinguish between obfuscated and non-obfuscated scripts with precision and recall around 99%. We also introduce a novel set of features that help detect obfuscation in JavaScripts. The results presented here shed additional light on the problem of distinguishing between malicious and benign scripts.
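
As a rough illustration of the kind of pipeline the abstract describes, the following sketch computes a few simple per-script features and the precision and recall metrics used to report results. The feature names (`entropy`, `avg_line_length`, `non_alnum_ratio`, `eval_calls`) and the helper functions are assumptions made for this example; they are not the novel feature set introduced in the paper, which the abstract does not enumerate.

```python
import math
from collections import Counter


def char_entropy(source: str) -> float:
    """Shannon entropy (bits per character) of the script text."""
    if not source:
        return 0.0
    counts = Counter(source)
    total = len(source)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_features(source: str) -> dict:
    """Hypothetical, illustrative features; NOT the paper's feature set."""
    lines = source.splitlines() or [""]
    total = max(len(source), 1)  # avoid division by zero on empty input
    non_alnum = sum(not (c.isalnum() or c.isspace()) for c in source)
    return {
        "entropy": char_entropy(source),
        "avg_line_length": sum(len(line) for line in lines) / len(lines),
        "non_alnum_ratio": non_alnum / total,
        "eval_calls": source.count("eval("),
    }


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (here: obfuscated) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In a setup like this, the feature dictionaries would be turned into vectors and passed to a classifier; the keywords mention Random Forest, so something like scikit-learn's `RandomForestClassifier` would be one plausible choice, though the abstract does not state which implementation the authors used.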

Pages: 11 to 16

Copyright: Copyright (c) IARIA, 2016

Publication date: May 22, 2016

Published in: conference

ISSN: 2308-3980

ISBN: 978-1-61208-475-6

Location: Valencia, Spain

Dates: from May 22, 2016 to May 26, 2016