Proceedings Article

User traffic classification for proxy-server based internet access control

TL;DR: A proxy-based mechanism has been proposed for classification of users according to the share of their Internet access, where users sharing the same computer can be distinguished by the proxy server and appropriate control policies can be exercised.
Abstract: In a LAN, Internet access should be managed well for a better user experience. Those using a larger share of the bandwidth may be restricted during peak hours to enable others to use the Internet. This can be viewed as a problem of classifying the users based on their Internet usage into normal and high categories, following which control policies may be applied. For this purpose, a proxy-based mechanism has been proposed for classification of users according to the share of their Internet access. The advantage of this approach is that users sharing the same computer can be distinguished by the proxy server and appropriate control policies can be exercised. To understand user behaviour, data is collected at the proxy server in a campus LAN. Machine learning algorithms are then used to learn and characterise user behaviour. In particular, Naive Bayes and Gaussian Mixture Model (GMM) based classifiers are used. It is observed that the algorithms scale, in that users are clustered into two different groups. Performance evaluation on a held-out data set indicates that users can be accurately distinguished 94.96% of the time. The algorithm is also practical, since the time-consuming task of model building need be done only once a month offline, while the daily task of classification may be accomplished in 20 minutes for GMMs. It has also been shown how the user behaviour of the two groups of users may be characterised. This would be a useful aid in the design of policies and algorithms for Internet access control.
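As a rough illustration of the normal/high classification described in the abstract, the following is a minimal Gaussian Naive Bayes sketch in plain Python. It is not the authors' implementation: the two features (log10 of daily megabytes, and daily requests in hundreds), the toy training values, and the function names `fit_gaussian_nb` and `predict` are all assumptions made for the example.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor keeps every variance strictly positive.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(zip(*rows), means)
        ]
        model[y] = (n / len(samples), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return score
    return max(model, key=lambda y: log_posterior(model[y]))

# Hypothetical per-user features: [log10(daily MB), daily requests / 100].
train = [[1.0, 2.0], [1.2, 2.5], [0.8, 1.8], [3.0, 9.0], [3.2, 8.5], [2.9, 9.5]]
labels = ["normal", "normal", "normal", "high", "high", "high"]
model = fit_gaussian_nb(train, labels)

print(predict(model, [1.1, 2.2]))
print(predict(model, [3.1, 9.0]))
```

In a deployment along the lines the abstract describes, the fitting step would correspond to the monthly offline model building and `predict` to the daily classification pass.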
Citations
01 Jan 2010
TL;DR: In this article, the authors analyze traffic measurements from a Swedish municipal broadband access network and derive corresponding user behavior models, as well as user activity characteristics, such as session lengths and traffic rate distributions.
Abstract: Internet usage has changed, and the demands on broadband access networks have increased, regarding both bandwidth and QoS. Characterizing the traffic, as seen by a broadband access network, can help in understanding both the demands of today and the demands of tomorrow. In this paper we analyze traffic measurements from a Swedish municipal broadband access network and derive corresponding user behavior models. The paper focuses on Internet usage in terms of traffic patterns, volumes and applications. Also, user activity characteristics, such as session lengths and traffic rate distributions, are analyzed and modeled. Notably, the resulting models for user session lengths turn out to differ from what is traditionally assumed.

7 citations

Proceedings Article
08 May 2014
TL;DR: The objective of this paper is to proactively classify anomalous accesses so that campus ISPs can deny access to users who misuse the Internet.
Abstract: Internet availability on a campus is not metered. Internet link bandwidths are vulnerable as they can be misused. Moreover, websites blacklist campuses for misuse. In particular, blacklisting by academic websites like IEEE and ACM can lead to serious researchers being denied access to information. The objective of this paper is to proactively classify anomalous accesses. This will enable campus ISPs to deny access to users misusing the Internet. In particular, URLs are classified using the short snippets (meta-data) that are available. New features, namely random walk term weights and within-class popularity, in tandem with non-negative matrix factorization, show a lot of promise for classifying URLs. The classification accuracy is as high as 92.96% on 10 gigabytes of proxy data.

3 citations


Cites background from "User traffic classification for pro..."

  • ...Earlier efforts in controlling traffic primarily relied on classifying users based on their share of bandwidth [1]....

    [...]

Proceedings Article
01 Dec 2014
TL;DR: Empirical evaluations with features like AutoRegressive model roots and Line Spectral Pairs obtained from the request time series along with data description show a lot of promise for detecting systematic downloading with F-measure and accuracy as high as 0.90 and 99.5%, respectively.
Abstract: The Internet has become a vital source of information. This comes with the attendant problems, namely misuse. Systematic downloading of academic and digitized media has become commonplace. Academic institutions in particular get blacklisted owing to free availability of Internet across campus. The objective of this paper is to proactively detect systematic downloading. Time series of number of requests are analyzed with pattern analysis techniques. A characterization of the model in the Z-domain shows that the roots of the transfer function form separate clusters during normal and abnormal behavior of traffic. Stability of the system has been used as a cue to detect systematic downloading. Analyzing the trajectory and location of roots to detect systematic downloading involves complex decisions which may not be robust to evolving traffic. This issue is addressed by using Support Vector Data Description that learns a hypersphere enclosing normal traffic. Our empirical evaluations with features like AutoRegressive model roots and Line Spectral Pairs (LSP) obtained from the request time series along with data description show a lot of promise for detecting systematic downloading with F-measure and accuracy as high as 0.90 and 99.5%, respectively. This hybrid approach ensures low false alarms, misses and guarantees robustness of the system.
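To make the "AutoRegressive model roots" feature concrete, here is a small sketch that fits a second-order AR model to a request-count series with the Yule-Walker equations and returns the roots of its characteristic polynomial. The model order, the synthetic series, and the function names are assumptions for illustration; the abstract does not state which order or estimator the authors used.

```python
import cmath
import random

def autocovariance(x, lag):
    """Biased sample autocovariance of series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n

def ar2_roots(x):
    """Fit x[t] = a1*x[t-1] + a2*x[t-2] + e[t] by Yule-Walker and
    return the two roots of z**2 - a1*z - a2 = 0."""
    r0, r1, r2 = (autocovariance(x, k) for k in range(3))
    det = r0 * r0 - r1 * r1
    a1 = r1 * (r0 - r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    disc = cmath.sqrt(a1 * a1 + 4 * a2)
    return [(a1 + disc) / 2, (a1 - disc) / 2]

# Synthetic stand-in for a per-interval request-count series.
random.seed(0)
series = [0.0, 0.0]
for _ in range(500):
    series.append(1.2 * series[-1] - 0.6 * series[-2] + random.gauss(0, 1))

roots = ar2_roots(series)
for z in roots:
    print(z, abs(z))
```

The root locations (magnitude and angle) would then serve as the feature vector handed to a one-class learner such as the data description method named in the abstract.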

1 citation


Cites background from "User traffic classification for pro..."

  • ...It logs the details of all the requests from the client [19]....

    [...]

Patent
07 May 2019
TL;DR: A proxy server offline method and a control server are proposed. The method comprises the steps of obtaining all interfaces of the proxy server and identifying the type of each interface; continuously monitoring, for each type, whether each interface of that type has a call failure state; and, if an interface of the type has a call failure state and a preset condition is met, taking the interfaces of that type in the proxy server offline.
Abstract: The invention provides a proxy server offline method and a control server. The method comprises the steps of: obtaining all interfaces of a proxy server and identifying the type of each interface; for each type, continuously monitoring whether each interface of that type has a call failure state; and, if an interface of the type has a call failure state and a preset condition is met, taking the interfaces of that type in the proxy server offline. With this method, online and offline switching at proxy server interface granularity can be achieved, the offline real-time performance of the proxy server is improved, the response speed of the proxy server to encryption machine faults is increased, and the technical problem in the prior art that the offline real-time performance of the proxy server is poor is solved.
References
Journal Article
TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
Abstract: Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing of internet documents, gene expression array analysis, and combinatorial chemistry. The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data. The contributions of this special issue cover a wide range of aspects of such problems: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.

14,509 citations


"User traffic classification for pro..." refers background in this paper

  • ...The idea is to select a subset of features that are highly correlated with the category, yet they are uncorrelated amongst themselves [18]....

    [...]

Journal Article
TL;DR: In this article, the authors study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because the maximal dependency condition is difficult to implement directly, they derive an equivalent form, the minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection.
Abstract: Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement on feature selection and classification accuracy.
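The first-order incremental selection mentioned in the abstract can be sketched as a greedy loop: at each step, pick the feature whose mutual information with the target, minus its mean mutual information with the features already chosen, is largest. The sketch below assumes discrete features and uses the difference form of the criterion; the feature names and toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )

def mrmr(features, target, k):
    """Greedily select k feature names by relevance minus mean redundancy."""
    relevance = {f: mutual_information(v, target) for f, v in features.items()}
    selected = []
    while len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy
        best = max((f for f in features if f not in selected), key=score)
        selected.append(best)
    return selected

# Hypothetical discretised per-user features; target 1 marks a high-usage user.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "daily_bytes":      [0, 0, 0, 1, 1, 1, 1, 1],
    "daily_bytes_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # fully redundant duplicate
    "night_requests":   [0, 1, 0, 0, 1, 0, 1, 1],
}
chosen = mrmr(features, target, 2)
print(chosen)
```

On this toy data the duplicate column is as relevant as `daily_bytes` but is skipped in the second step because its redundancy with the already-selected feature outweighs its relevance.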

8,078 citations

05 Aug 2003
TL;DR: This work derives an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection, and presents a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers).

7,075 citations


"User traffic classification for pro..." refers methods in this paper

  • ...An example of this is the Max-Relevance Min-Redundancy (mRMR) approach for feature selection [19]....

    [...]

  • ...We make use of the mechanism in [19] which uses mutual information for feature selection....

    [...]

Journal Article
TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.

4,673 citations


"User traffic classification for pro..." refers methods in this paper

  • ...In this work the Universal Background Model (UBM) approach has been used and only the means μ_m are adapted [21]....

    [...]
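Mean-only adaptation from a UBM, as mentioned in the excerpt above, is commonly done with a relevance-factor MAP update: each component mean moves toward the data assigned to it, in proportion to how much data it saw. The one-dimensional sketch below follows that standard update; the relevance factor of 16, the toy UBM, and the function names are assumptions, not values from the paper.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def adapt_means(data, weights, means, variances, relevance=16.0):
    """MAP-adapt only the component means of a 1-D GMM to the given data."""
    # Posterior responsibility of each component for each sample.
    resp = []
    for x in data:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        resp.append([l / total for l in likes])

    adapted = []
    for idx, prior_mean in enumerate(means):
        n_m = sum(r[idx] for r in resp)        # soft count for this component
        if n_m == 0.0:
            adapted.append(prior_mean)         # no evidence: keep the UBM mean
            continue
        e_m = sum(r[idx] * x for r, x in zip(resp, data)) / n_m
        alpha = n_m / (n_m + relevance)        # data-dependent mixing weight
        adapted.append(alpha * e_m + (1 - alpha) * prior_mean)
    return adapted

# Toy two-component UBM and sixteen samples from one user near 1.0.
ubm_weights, ubm_means, ubm_vars = [0.5, 0.5], [0.0, 10.0], [1.0, 1.0]
user_data = [1.0] * 16
adapted = adapt_means(user_data, ubm_weights, ubm_means, ubm_vars)
print(adapted)
```

With sixteen samples and a relevance factor of 16, the first mean moves halfway from the UBM value toward the data, while the second component, which explains almost none of the samples, stays essentially where it was.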

Journal Article
01 Jan 1997
TL;DR: This paper addresses the issues of charging, rate control and routing for a communication network carrying elastic traffic, such as an ATM network offering an available bit rate service, from which max-min fairness of rates emerges as a limiting special case.
Abstract: This paper addresses the issues of charging, rate control and routing for a communication network carrying elastic traffic, such as an ATM network offering an available bit rate service. A model is described from which max-min fairness of rates emerges as a limiting special case; more generally, the charges users are prepared to pay influence their allocated rates. In the preferred version of the model, a user chooses the charge per unit time that the user will pay; thereafter the user's rate is determined by the network according to a proportional fairness criterion applied to the rate per unit charge. A system optimum is achieved when users' choices of charges and the network's choice of allocated rates are in equilibrium.
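For intuition about the proportional fairness criterion in the abstract, consider the simplest case of a single shared link. Maximising the sum of charge-weighted logarithms of the rates subject to the capacity constraint gives each user a rate proportional to the charge that user pays. The sketch below covers only this single-link special case; the function name and the example charges are illustrative assumptions.

```python
def proportional_fair_rates(charges, capacity):
    """Single-link weighted proportional fairness.

    Maximises sum(w_i * log(x_i)) subject to sum(x_i) <= capacity,
    whose solution is x_i = capacity * w_i / sum(w).
    """
    total = sum(charges.values())
    return {user: capacity * w / total for user, w in charges.items()}

# Two hypothetical users paying 1 and 3 units per unit time on a 100 Mbit/s link.
rates = proportional_fair_rates({"a": 1.0, "b": 3.0}, 100.0)
print(rates)  # {'a': 25.0, 'b': 75.0}
```

In a network with several links and routes, the allocation no longer has this closed form and must be found by solving the full optimisation problem.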

3,067 citations


"User traffic classification for pro..." refers background in this paper

  • ...Many concepts have been proposed for fairness; examples are [5], [6], [7], [8]....

    [...]