Topic

Semantic URL

About: Semantic URL is a research topic. Over its lifetime, 667 publications have been published within this topic, receiving 21,387 citations.


Papers
Patent
18 Nov 2002
TL;DR: This patent describes a system for integrating video programming with the vast information resources of the Internet, in which Web pages are synchronized to the video content for display in conjunction with a television program being broadcast to the user at that time.
Abstract: A system for integrating video programming with the vast information resources of the Internet. A computer-based system receives a video program with embedded uniform resource locators (URLs). The URLs, the effective addresses of locations or Web sites on the Internet, are interpreted by the system and direct the system to the Web site locations to retrieve related Web pages. Upon receipt of the Web pages by the system, the Web pages are synchronized to the video content for display. The video program signal can be displayed on a video window on a conventional personal computer screen. The actual retrieved Web pages are time stamped to also be displayed, on another portion of the display screen, when predetermined related video content is displayed in the video window. As an alternative, the computer-based system receives the URLs directly through an Internet connection, at times specified by TV broadcasters in advance. The system interprets the URLs and retrieves the appropriate Web pages. The Web pages are synchronized to the video content for display in conjunction with a television program being broadcast to the user at that time. This alternative system allows the URLs to be entered for live transmission to the user.
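The time-stamp idea in this abstract can be illustrated with a minimal sketch, assuming each URL carries an offset in seconds from the start of the program and that the page shown at any moment is the one with the latest time stamp already reached. The names TimedUrl and page_for_time, the offsets, and the example.com URLs are hypothetical and do not come from the patent.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedUrl:
    # Assumed representation: seconds from program start at which the page appears.
    offset_s: float
    url: str


def page_for_time(schedule, t):
    """Return the URL with the latest time stamp at or before t, or None.

    `schedule` must be sorted by offset_s.
    """
    offsets = [entry.offset_s for entry in schedule]
    i = bisect.bisect_right(offsets, t)
    return schedule[i - 1].url if i > 0 else None


# Hypothetical schedule for one program, sorted by time stamp.
schedule = sorted(
    [
        TimedUrl(300.0, "http://example.com/credits"),
        TimedUrl(0.0, "http://example.com/intro"),
        TimedUrl(90.0, "http://example.com/recipe"),
    ],
    key=lambda entry: entry.offset_s,
)

current = page_for_time(schedule, 120.0)  # playback position in seconds
```

A binary search over sorted offsets keeps the lookup cheap even if the player polls the playback position many times per second.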

1,504 citations

Proceedings ArticleDOI
28 Jun 2009
TL;DR: This paper describes an approach to this problem based on automated URL classification, using statistical methods to discover the tell-tale lexical and host-based properties of malicious Web site URLs.
Abstract: Malicious Web sites are a cornerstone of Internet criminal activities. As a result, there has been broad interest in developing systems to prevent the end user from visiting such sites. In this paper, we describe an approach to this problem based on automated URL classification, using statistical methods to discover the tell-tale lexical and host-based properties of malicious Web site URLs. These methods are able to learn highly predictive models by extracting and automatically analyzing tens of thousands of features potentially indicative of suspicious URLs. The resulting classifiers obtain 95-99% accuracy, detecting large numbers of malicious Web sites from their URLs, with only modest false positives.
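As a rough illustration of what lexical properties of a URL might look like as classifier input, the sketch below turns a URL string into a sparse feature dictionary. The specific features (lengths, dot count, an IPv4-host flag, host and path tokens) and the function name lexical_features are assumptions chosen for illustration, not the feature set of the paper; host-based features such as registration or DNS data would need external lookups and are left out.

```python
import re
from urllib.parse import urlsplit

IPV4_RE = re.compile(r"(\d{1,3}\.){3}\d{1,3}")
TOKEN_SPLIT_RE = re.compile(r"[^A-Za-z0-9]+")


def lexical_features(url):
    """Map a URL to a sparse dict of illustrative lexical features."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    feats = {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "host_is_ipv4": int(bool(IPV4_RE.fullmatch(host))),
    }
    # Bag-of-words style binary features over host and path tokens.
    for tok in TOKEN_SPLIT_RE.split(host):
        if tok:
            feats["host_tok=" + tok] = 1
    for tok in TOKEN_SPLIT_RE.split(parts.path):
        if tok:
            feats["path_tok=" + tok.lower()] = 1
    return feats


features = lexical_features("http://192.168.0.1/login/update.php")
```

Each distinct token becomes its own feature, which is one way a feature space can grow to tens of thousands of dimensions on a large URL corpus.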

806 citations

Proceedings ArticleDOI
14 Jun 2009
TL;DR: It is demonstrated that recently-developed online algorithms can be as accurate as batch techniques, achieving classification accuracies up to 99% over a balanced data set.
Abstract: This paper explores online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. We show that this application is particularly appropriate for online algorithms as the size of the training data is larger than can be efficiently processed in batch and because the distribution of features that typify malicious URLs is changing continuously. Using a real-time system we developed for gathering URL features, combined with a real-time source of labeled URLs from a large Web mail provider, we demonstrate that recently-developed online algorithms can be as accurate as batch techniques, achieving classification accuracies up to 99% over a balanced data set.
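The contrast with batch training can be made concrete with a small sketch of the predict-then-update loop that online learners share. The classic perceptron below is a stand-in chosen for brevity, not one of the recently developed algorithms the abstract refers to; the class name, feature names, and the tiny labeled stream are all hypothetical.

```python
class OnlinePerceptron:
    """Mistake-driven linear classifier over sparse dict features."""

    def __init__(self):
        self.w = {}        # feature name -> weight
        self.mistakes = 0  # running count of misclassified examples

    def score(self, feats):
        return sum(self.w.get(k, 0.0) * v for k, v in feats.items())

    def predict(self, feats):
        # +1 means malicious, -1 means benign.
        return 1 if self.score(feats) > 0 else -1

    def update(self, feats, label):
        """Predict first, then adjust weights only if the prediction was wrong."""
        pred = self.predict(feats)
        if pred != label:
            self.mistakes += 1
            for k, v in feats.items():
                self.w[k] = self.w.get(k, 0.0) + label * v
        return pred


# Hypothetical labeled stream; each example is seen once and then discarded.
stream = [
    ({"tok=login": 1, "host_is_ip": 1}, 1),
    ({"tok=news": 1}, -1),
    ({"tok=login": 1}, 1),
    ({"tok=news": 1, "tok=sports": 1}, -1),
]

model = OnlinePerceptron()
for feats, label in stream:
    model.update(feats, label)
```

Because each example is processed once and the weights change immediately, the model never needs the whole training set in memory and can follow a feature distribution that drifts over time.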

567 citations

Proceedings Article
28 Jul 2008
TL;DR: The relationship between the user browsing habits and exposure to malware, the techniques used to lure the user into the malware distribution networks, and the different properties of these networks are studied.
Abstract: As the web continues to play an ever-increasing role in information exchange, so too is it becoming the prevailing platform for infecting vulnerable hosts. In this paper, we provide a detailed study of the pervasiveness of so-called drive-by downloads on the Internet. Drive-by downloads are caused by URLs that attempt to exploit their visitors and cause malware to be installed and run automatically. Over a period of 10 months we processed billions of URLs, and our results show that a non-trivial number, over 3 million malicious URLs, initiate drive-by downloads. An even more troubling finding is that approximately 1.3% of the incoming search queries to Google's search engine returned at least one URL labeled as malicious in the results page. We also explore several aspects of the drive-by downloads problem. Specifically, we study the relationship between user browsing habits and exposure to malware, the techniques used to lure the user into the malware distribution networks, and the different properties of these networks.

563 citations

Patent
04 Apr 1995
TL;DR: This patent describes a method and system for sending and receiving Uniform Resource Locators (URLs) in electronic mail over the Internet, in which the user can click on a received URL to look up the information corresponding to it.
Abstract: A method and system for sending and receiving Uniform Resource Locators (URLs) in electronic mail over the Internet. An electronic mail document containing a URL may have several different types. If the message type indicates a URL, when the received URL type document is read or browsed using a multimedia Internet browser, the URL is looked up so that the information corresponding to the URL is displayed without necessarily displaying any portion of the received message. If the received document is of the Hypertext Markup Language (HTML) type, the document may be displayed and a user may "click" on the URL to look up the information corresponding to the URL. If the received document is of the text type, the text may be converted to the HTML format and the HTML format document displayed so that a user may "click" on the URL in order to look up the information corresponding to the URL without the need to type in the URL address.
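The last case in this abstract, converting a plain-text message to HTML so that its URLs become clickable, can be sketched roughly as follows. The regular expression, the function name text_to_html, and the choice to wrap the result in a pre element are assumptions for illustration; the pattern is deliberately simple and would, for example, include a trailing period in the link.

```python
import html
import re

# Simplified pattern: http(s) URLs up to the next whitespace, quote, or angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def text_to_html(text):
    """Escape plain text for HTML and wrap each URL in an anchor element."""
    out = []
    pos = 0
    for match in URL_RE.finditer(text):
        # Escape the ordinary text between the previous URL and this one.
        out.append(html.escape(text[pos:match.start()]))
        url = match.group(0)
        out.append('<a href="{0}">{0}</a>'.format(html.escape(url)))
        pos = match.end()
    out.append(html.escape(text[pos:]))
    return "<pre>" + "".join(out) + "</pre>"


page = text_to_html("See http://example.com/a now")
```

Escaping the surrounding text before inserting anchors keeps characters such as < and & in the message from being interpreted as markup.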

374 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
77% related
Web service
57.6K papers, 989K citations
76% related
Server
79.5K papers, 1.4M citations
74% related
Ontology (information science)
57K papers, 869.1K citations
72% related
Scalability
50.9K papers, 931.6K citations
72% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2017  12
2016  37
2015  54
2014  42
2013  47
2012  41