Within the past few years, the Internet has, to a great extent, impacted every aspect of
our daily life. Such impact has played a major role in influencing the design, deployment
and functionality of enterprise, campus and even home computer networks. As we
increasingly depend on computer networks for communication, information access and
storage, entertainment and other activities, managing and securing such networks is
critical. Due to their scale and complexity, managing and securing today’s large campus or
enterprise networks is a challenging task. The scale and complexity come not only from
the number of heterogeneous hosts and devices on the network (e.g., various servers,
desktop office client machines, laptops, lab machines, wireless access points, routers and
so forth), but also from a wide range of diverse applications running on these machines.
In this thesis, we conduct a study for developing methodologies to profile and track
activities within networks by addressing two key problems: capturing the dynamic interaction
represented by Internet traffic between inside and outside hosts at the block
level; and synthesizing a static knowledge base on hosts and networks to map dynamic
interaction to interpretable profiles. We develop methodologies utilizing machine learning
techniques for capturing, characterizing and profiling activities within the network.
Next, we take these techniques one step further by proposing tools and systems that
address profiling and tracking as a utility in a large-scale distributed system.
More specifically, we propose a Hierarchical Extraction of Activity Patterns (HEAPs)
methodology to characterize and profile activity patterns within the subnet. We express
activities in a host-port association matrix and apply Probabilistic Latent Semantic
Analysis (pLSA) to co-cluster dominant and significant activities within the subnet.
We also propose a Block-wise (host) Port Activity Matrix (BPAM) to describe the
traffic within a block. We then apply Singular Value Decomposition (SVD) low-rank
approximation techniques to obtain the low-dimensional subspace representation which
captures the typical activities within the block and consequently assign a high-level descriptive
label summarizing the activities within the block. We also develop methods
to track and quantify changes in the activity within the subnet (or block) over time
and demonstrate how to utilize these methods to identify major changes and anomalies
within the network. We demonstrate the utility of a light-weight, self-contained tool for
multi-level analysis of activities within the network. While the tool does not solve a
specific security problem, it helps users and operators localize problems within a small
network or individual host.
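
As a rough illustration of the block-level idea described above, the following sketch builds a small host-by-port count matrix and takes its truncated SVD. It is not the thesis's implementation: the function names `build_bpam` and `low_rank`, the toy addresses, ports and flow counts are all hypothetical, and the choice of raw flow counts as matrix entries is an assumption.

```python
import numpy as np

def build_bpam(flows, hosts, ports):
    """Count flows per (host, port) pair; rows are hosts, columns are ports."""
    h_idx = {h: i for i, h in enumerate(hosts)}
    p_idx = {p: j for j, p in enumerate(ports)}
    m = np.zeros((len(hosts), len(ports)))
    for host, port in flows:
        m[h_idx[host], p_idx[port]] += 1
    return m

def low_rank(m, k):
    """Return the rank-k truncated-SVD approximation of m and all singular values."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :], s

# Hypothetical toy block: two web-like hosts and two mail-like hosts.
hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
ports = [25, 80, 443]
flows = ([("10.0.0.1", 80)] * 8 + [("10.0.0.1", 443)] * 4 +
         [("10.0.0.2", 80)] * 4 + [("10.0.0.2", 443)] * 2 +
         [("10.0.0.3", 25)] * 6 + [("10.0.0.4", 25)] * 3)

bpam = build_bpam(flows, hosts, ports)
approx, singular_values = low_rank(bpam, k=2)
print(singular_values)
```

In this toy block the hosts fall into two behavior groups, so two singular values carry essentially all of the activity and the rank-2 approximation reproduces the matrix; real traffic would be noisier, and the number of components to keep would need to be chosen from the data.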
While our methodologies capture the dynamic interaction within the network, they
lack additional information that helps validate the profiling results. Towards that end,
we develop a methodology to differentiate dynamic from static IP address blocks. More
specifically, we propose a scanning-based technique for identifying dynamic IP address
blocks within the network. We also include other static information by building a
system that maps dynamic interaction to static information in a datacenter-like environment.
Our system addresses key design issues for providing network management
and profiling services in a collaborative system with interpretable characterization and
The thesis serves 1) to propose various novel methodologies utilizing machine learning
techniques to extract and profile the behavior of hosts and blocks within the network;
2) to pinpoint design principles for building light-weight as well as large-scale systems
for profiling and tracking activities in the network; 3) to propose how to incorporate
static information readily available within on-line tools to provide interpretation and
mapping for network dynamic interaction.
University of Minnesota Ph.D. dissertation. May 2011. Major: Computer science. Advisor: Zhi-Li Zhang. 1 computer file (PDF); xi, 154 pages.
Sharafuddin, Esam Ahmed.
Providing network profiling and tracking utility in large distributed systems.
Retrieved from the University of Minnesota Digital Conservancy,