Detecting Spammers with SNARE: Spatio-temporal Network-level
Automatic Reputation Engine
Shuang Hao, Nick Feamster, Alexander Gray, Nadeem Syed, and Sven Krasser
USENIX Security Symposium 2009
We demonstrated that spam senders can be blacklisted automatically without examining email content at all, by looking instead at senders'
spatio-temporal activity.
[pdf]
Abstract:
Users and network administrators need ways to filter email messages based primarily on the
reputation of the sender. Unfortunately, conventional mechanisms for sender reputation -- notably, IP
blacklists -- are cumbersome to maintain and evadable. This paper investigates ways to infer the
reputation of an email sender based solely on network-level features, without looking at the contents of a
message. First, we study first-order properties of network-level features that may help distinguish
spammers from legitimate senders. We examine features that can be ascertained without ever looking at a
packet's contents, such as the distance in IP space to other email senders or the geographic distance
between sender and receiver. We derive features that are lightweight, since they do not require seeing
a large amount of email from a single IP address and can be gleaned without looking at an email's
contents -- many such features are apparent from even a single packet. Second, we incorporate these
features into a classification algorithm and evaluate the classifier's ability to automatically classify email
senders as spammers or legitimate senders. We build an automated reputation engine, SNARE, based on
these features using labeled data from a deployed commercial spam-filtering system. We demonstrate
that SNARE can achieve comparable accuracy to existing static IP blacklists: about a 70% detection
rate for less than a 0.3% false positive rate. Third, we show how SNARE can be integrated into existing
blacklists, essentially as a first-pass filter.
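As a rough illustration of the two example features named in the abstract (distance in IP space to other email senders, and geographic distance between sender and receiver), the sketch below computes simple versions of each. The function names, the choice of averaging over the k nearest senders, and the use of the haversine formula are assumptions made for illustration; the abstract does not specify how SNARE defines or computes these features.

```python
import ipaddress
import math


def ip_distance(a: str, b: str) -> int:
    """Numeric distance between two IPv4 addresses viewed as 32-bit integers."""
    return abs(int(ipaddress.IPv4Address(a)) - int(ipaddress.IPv4Address(b)))


def mean_nearest_sender_distance(sender: str, others: list[str], k: int = 3) -> float:
    """Average IP-space distance from `sender` to its k nearest other senders.

    Hypothetical feature definition: k and the averaging are assumptions.
    """
    dists = sorted(ip_distance(sender, o) for o in others if o != sender)
    nearest = dists[:k]
    return sum(nearest) / len(nearest) if nearest else float("inf")


def geo_distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers (haversine, spherical Earth)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))
```

Both features depend only on the sender's IP address and on coordinates that would have to come from an IP geolocation lookup (not shown here), so neither requires reading the message itself. Values like these could then be fed, alongside other features, into a classifier.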
@incollection{hao2009snare,
title = "{Detecting Spammers with SNARE: Spatio-temporal Network-level
Automatic Reputation Engine}",
author = "Shuang Hao and Nick Feamster and Alexander Gray and Nadeem Syed and Sven Krasser",
booktitle = "Proceedings of the Eighteenth USENIX Security Symposium",
year = "2009"
}
|
Machine Learning in Relational Databases
Most of the world's data is business data, which mostly lives in relational databases. We have developed the first scheme for
performing scalable machine learning analyses inside relational databases.
|
A Research Document Search Engine
We are developing new methods for text analysis, including topic modeling, in the context of a system for retrieval and visualization of
research papers.
|
Nonlinear Recommendation Systems
Recommendation systems are mostly based on linear dimension reduction methods. We are developing an approach to recommender systems
based on more powerful machine learning methods.
|
Data-Intensive Computing and Networking
We are developing approaches to data-intensive computing which account for the fact that massive datasets must be sent over the network
to clusters or clouds of computers, a current bottleneck.