Fighting malware with simple maths

We will go into greater detail on the science of collection: hundreds of millions of files gathered via feeds from industry sources, proprietary organisational repositories and live inputs from active computers running Dell agents. We will discuss how this scale gives us a statistically significant sample size and covers the broadest possible range of file types and file authors, such as Microsoft and Adobe.
We will detail how we extract features, leveraging the compute capacity of machines and data-mining techniques to identify the broadest possible set of characteristics of a file.
We will then explain how we classify, using our mathematical approach to build models that specifically determine whether a file is valid or malicious. In doing so, we gain a holistic perspective on the files running in the environment. This also eliminates the current industry bias in which threat researchers only determine whether a file is malicious and whitelist vendors only determine whether a file is good.
Finally, we will give a live demo of Threat Prevention against zero-day exploits, along with commodity (known) threats, to prove our effectiveness.
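As a rough illustration of what extracting characteristics from a file can mean, the sketch below computes a handful of static features from raw bytes. The function name `extract_features` and the specific features (size, byte entropy, printable-byte ratio, an `MZ` header check) are assumptions chosen for illustration; they are not the feature set described in this talk, which is said to be far broader.

```python
import math
from collections import Counter


def extract_features(data: bytes) -> dict:
    """Compute a few illustrative static features from raw file bytes.

    Hypothetical helper: the features here are examples only.
    """
    size = len(data)
    counts = Counter(data)

    # Shannon entropy in bits per byte (0.0 for empty or single-valued input).
    entropy = 0.0
    if size:
        entropy = -sum((c / size) * math.log2(c / size) for c in counts.values())

    # Share of bytes in the printable ASCII range.
    printable = sum(1 for b in data if 32 <= b < 127)

    return {
        "size": size,
        "entropy": entropy,
        "printable_ratio": printable / size if size else 0.0,
        # Windows executables begin with the two bytes "MZ".
        "starts_with_mz": data[:2] == b"MZ",
    }
```

Calling `extract_features(open(path, "rb").read())` on each collected file would yield one feature record per file, which is the kind of input a classification model can then consume.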
***
As the number of threats globally reaches 600 million [1], security teams are struggling to keep up and provide an effective defence. Dell questions how traditional, signature-based defences and human-led interventions can efficiently and effectively cope with the volume, sophistication and velocity of today’s threats. We also describe how Dell has invested in a new approach to threat prevention that utilises machine learning and artificial intelligence to stop the crime by identifying the attacker before the crime takes place.
Traditional, signature-based threat protections are widely recognised to be 45% effective [2] against modern threats, which are becoming increasingly disguised, malicious, targeted and commercialised. The failure of traditional solutions stems mainly from the fact that, as humans, we learn over time, through experience and understanding. The technology we have developed to date is accordingly designed to stop threats we can recognise and for which we can then develop an antidote. The problem with this approach is that, to stop the criminals, we must wait for a victim, understand how the attack happened, develop a solution and deploy it to all endpoints before tomorrow’s tidal wave of up to 1 million new threats [3].
Cyber-criminals know the limits of human potential: our physical, endurance and capacity limitations. They take advantage by launching attacks that vary in their features, origin and purpose, continually increasing their complexity, velocity and, inevitably, their success rate. Despite the advances we have made in understanding risks, vulnerabilities and behaviours, our solutions suffer from the same flaw – they all rely on a human to provide analysis, understanding and answers.
It’s time for machine-derived intervention techniques that leverage machine learning and artificial intelligence to expose the criminal’s intent before the crime. Machines, unlike humans, don’t suffer from the same physical limitations, flaws, bias or limits on their capacity to learn.
Machine learning as a science focuses on prediction and probability based on the understanding of patterns. Since it came to prominence in the 1990s, billions of files of all types have been created from a myriad of sources. Yet, despite the variety and volume, there is only one question a ‘threat-detective’ machine needs to answer: is this file malicious or harmless? Since we first started to machine-learn, irrespective of the number and sources of files, clear file types and patterns have emerged that describe and define how applications and files are constructed and, consequently, how they will behave.
Unlike humans, machines with access to multiple data sources and vast datasets become ever more predictive and precise. The more data they can interrogate, the more intelligent, accurate and fast they become in making definitive decisions. By systematically cross-referencing a file’s attributes across thousands of variables, machines can reach a decision in milliseconds, eliminating the need for human intervention, the corresponding decision delays and potential bias.
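To make the malicious-or-harmless decision concrete, here is a minimal sketch of a binary classifier: a logistic regression trained by gradient descent on a handful of feature vectors. Everything in it is an assumption for illustration. The two features (normalised byte entropy and printable-byte ratio), the toy training values and the names `train` and `predict` are hypothetical and are not Dell's model, which the talk describes as using thousands of variables.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights with per-sample gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log-loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x) -> float:
    """Return the estimated probability that a file is malicious."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy, made-up data: [entropy / 8, printable_ratio]; label 1 = malicious.
TOY_SAMPLES = [
    [0.95, 0.10], [0.90, 0.20], [0.85, 0.15],  # labelled malicious
    [0.50, 0.80], [0.40, 0.90], [0.60, 0.70],  # labelled harmless
]
TOY_LABELS = [1, 1, 1, 0, 0, 0]

weights, bias = train(TOY_SAMPLES, TOY_LABELS)
```

Once trained, scoring a new file is a single dot product and a sigmoid, for example `predict(weights, bias, [0.92, 0.12])`, which is why a decision of this kind can be made in milliseconds.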
It’s time for cyber-crime to be fought by the most effective means known to man. It’s time to use today’s technology, machines and artificial intelligence, to expose and exploit the most accurate identifier of a criminal – DNA.
[1] AV-TEST.org, https://www.av-test.org/en/statistics/malware/
[2] Wall Street Journal, http://www.wsj.com/news/article_email/SB10001424052702303417104579542140235850578-lMyQjAxMTA0MDAwNTEwNDUyWj
[3] CNN and Symantec, http://money.cnn.com/2015/04/14/technology/security/cyber-attack-hacks-security/