
Showing papers on "Web page" published in 2004


Book ChapterDOI
Xin Dong, Alon Halevy, Jayant Madhavan, Ema Nemes, Jun Zhang
31 Aug 2004
TL;DR: Woogle supports similarity search for web services, such as finding similar web-service operations and finding operations that compose with a given one, and novel techniques to support these types of searches are described.
Abstract: Web services are loosely coupled software components, published, located, and invoked across the web. The growing number of web services available within an organization and on the Web raises a new and challenging search problem: locating desired web services. Traditional keyword search is insufficient in this context: the specific types of queries users require are not captured, the very small text fragments in web services are unsuitable for keyword search, and the underlying structure and semantics of the web services are not exploited. We describe the algorithms underlying the Woogle search engine for web services. Woogle supports similarity search for web services, such as finding similar web-service operations and finding operations that compose with a given one. We describe novel techniques to support these types of searches, and an experimental study on a collection of over 1500 web-service operations that shows the high recall and precision of our algorithms.

828 citations


Proceedings ArticleDOI
17 May 2004
TL;DR: Experimental results show that search systems that adapt to each user's preferences can be achieved by constructing user profiles based on modified collaborative filtering with detailed analysis of user's browsing history in one day.
Abstract: Web search engines help users find useful information on the World Wide Web (WWW). However, when the same query is submitted by different users, typical search engines return the same result regardless of who submitted the query. Generally, each user has different information needs for his/her query. Therefore, the search result should be adapted to users with different information needs. In this paper, we first propose several approaches to adapting search results according to each user's need for relevant information without any user effort, and then verify the effectiveness of our proposed approaches. Experimental results show that search systems that adapt to each user's preferences can be achieved by constructing user profiles based on modified collaborative filtering with detailed analysis of user's browsing history in one day.
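As a rough illustration of the profile-based adaptation described above, the sketch below builds a term-frequency profile from one day's browsing history and re-ranks candidate results against it. The paper's approach additionally uses a modified collaborative filtering step across users, which is not shown here; all page texts, queries, and function names are made-up examples.

```python
# Minimal sketch: build a user profile from browsed pages and re-rank results.
# The collaborative-filtering component of the paper is omitted; data is toy data.
from collections import Counter

def build_profile(browsed_pages):
    """Aggregate term frequencies over the pages a user browsed in one day."""
    profile = Counter()
    for text in browsed_pages:
        profile.update(text.lower().split())
    return profile

def rerank(results, profile):
    """Order search results by how strongly their text overlaps the profile."""
    def score(text):
        return sum(profile[t] for t in text.lower().split())
    return sorted(results, key=score, reverse=True)

history = ["python pandas dataframe tutorial", "python machine learning guide"]
results = ["jaguar car dealership prices", "python jaguar library for queues"]
print(rerank(results, build_profile(history)))  # the Python-related result comes first
```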

782 citations


Proceedings ArticleDOI
25 Jul 2004
TL;DR: Web-a-Where, a system for associating geography with Web pages that locates mentions of places and determines the place each name refers to, is described, along with an implementation of the tagger within the framework of the WebFountain data mining system.
Abstract: We describe Web-a-Where, a system for associating geography with Web pages. Web-a-Where locates mentions of places and determines the place each name refers to. In addition, it assigns to each page a geographic focus --- a locality that the page discusses as a whole. The tagging process is simple and fast, aimed to be applied to large collections of Web pages and to facilitate a variety of location-based applications and data analyses. Geotagging involves arbitrating two types of ambiguities: geo/non-geo and geo/geo. A geo/non-geo ambiguity occurs when a place name also has a non-geographic meaning, such as a person name (e.g., Berlin) or a common word (Turkey). Geo/geo ambiguity arises when distinct places have the same name, as in London, England vs. London, Ontario. An implementation of the tagger within the framework of the WebFountain data mining system is described, and evaluated on several corpora of real Web pages. Precision of up to 82% on individual geotags is achieved. We also evaluate the relative contribution of various heuristics the tagger employs, and evaluate the focus-finding algorithm using a corpus pretagged with localities, showing that as many as 91% of the foci reported are correct up to the country level.
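To make the two ambiguity types concrete, here is a toy Python sketch of a gazetteer-based tagger: ambiguous surface forms require a nearby place trigger (geo/non-geo), and among candidate places the most prominent one wins (geo/geo). The gazetteer, trigger words, and population figures are invented for illustration and are not the heuristics evaluated in the paper.

```python
# Toy gazetteer-based geotagger illustrating geo/non-geo and geo/geo arbitration.
# All entries below are illustrative, not Web-a-Where's actual gazetteer or rules.
GAZETTEER = {
    "london": [("London", "England", 8_900_000), ("London", "Ontario", 400_000)],
    "berlin": [("Berlin", "Germany", 3_600_000)],
    "turkey": [("Turkey", None, 84_000_000)],
}
NON_GEO_SENSES = {"turkey", "berlin"}   # names that also have non-place senses

def geotag(tokens):
    tags = []
    for i, tok in enumerate(tokens):
        candidates = GAZETTEER.get(tok.lower())
        if not candidates or not tok[0].isupper():
            continue
        # geo/non-geo: ambiguous names need a nearby place trigger (e.g. "in", "near")
        if tok.lower() in NON_GEO_SENSES and (
            i == 0 or tokens[i - 1].lower() not in {"in", "near", "from"}
        ):
            continue
        # geo/geo: prefer the most prominent candidate place (largest population here)
        tags.append(max(candidates, key=lambda c: c[2]))
    return tags

print(geotag("We met Berlin in London".split()))
# only ('London', 'England', ...) is tagged; "Berlin" (a surname here) is skipped
```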

603 citations


01 Jan 2004
TL;DR: The Web Services Choreography Description Language (WS-CDL) as mentioned in this paper is an XML-based language that describes peer-to-peer collaborations of parties by defining, from a global viewpoint, their common and complementary observable behavior; where ordered message exchanges result in accomplishing a common business goal.
Abstract: The Web Services Choreography Description Language (WS-CDL) is an XML-based language that describes peer-to-peer collaborations of parties by defining, from a global viewpoint, their common and complementary observable behavior; where ordered message exchanges result in accomplishing a common business goal. The Web Services specifications offer a communication bridge between the heterogeneous computational environments used to develop and host applications. The future of E-Business applications requires the ability to perform long-lived, peer-to-peer collaborations between the participating services, within or across the trusted domains of an organization. The Web Services Choreography specification is targeted for composing interoperable, peer-to-peer collaborations between any type of party regardless of the supporting platform or programming model used by the implementation of the hosting environment.

602 citations


Book ChapterDOI
01 Jan 2004
TL;DR: This chapter introduces web services and explains their role in Microsoft’s vision of the programmable web and removes some of the confusion surrounding technical terms like WSDL, SOAP, and UDDI.
Abstract: Microsoft has promoted ASP.NET’s new web services more than almost any other part of the .NET Framework. But despite these efforts, confusion is still widespread about what a web service is and, more importantly, what it’s meant to accomplish. This chapter introduces web services and explains their role in Microsoft’s vision of the programmable web. Along the way, you’ll learn about the open standards plumbing that allows web services to work and removes some of the confusion surrounding technical terms like WSDL (Web Service Description Language), SOAP, and UDDI (universal description, discovery, and integration).

546 citations


Proceedings ArticleDOI
19 May 2004
TL;DR: The weighted PageRank algorithm (WPR), an extension to the standard PageRank algorithm, is introduced, which takes into account the importance of both the inlinks and the outlinks of the pages and distributes rank scores based on the popularity of the pages.
Abstract: With the rapid growth of the Web, users easily get lost in the rich hyper structure. Providing the relevant information to users to cater to their needs is the primary goal of Website owners. Therefore, finding the content of the Web and retrieving the users' interests and needs from their behavior have become increasingly important. Web mining is used to categorize users and pages by analyzing user behavior, the content of the pages, and the order of the URLs that tend to be accessed. Web structure mining plays an important role in this approach. Two page ranking algorithms, HITS and PageRank, are commonly used in Web structure mining. Both algorithms treat all links equally when distributing rank scores. Several algorithms have been developed to improve the performance of these methods. The weighted PageRank algorithm (WPR), an extension to the standard PageRank algorithm, is introduced. WPR takes into account the importance of both the inlinks and the outlinks of the pages and distributes rank scores based on the popularity of the pages. The results of our simulation studies show that WPR performs better than the conventional PageRank algorithm in terms of returning a larger number of relevant pages to a given query.
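For readers who want to see the weighting idea spelled out, the following is a minimal Python sketch of a WPR-style iteration: the rank a page u receives from an in-neighbor v is scaled by u's share of inlinks and outlinks among v's link targets, rather than being split evenly as in standard PageRank. The toy graph, damping factor, and iteration count are illustrative assumptions, not the paper's experimental setup.

```python
# Hedged sketch of a Weighted PageRank (WPR) style iteration on a toy link graph.
def weighted_pagerank(graph, d=0.85, iterations=50):
    """graph: dict mapping page -> list of pages it links to."""
    pages = list(graph)
    in_links = {p: [q for q in pages if p in graph[q]] for p in pages}
    I = {p: len(in_links[p]) for p in pages}          # inlink counts
    O = {p: len(graph[p]) for p in pages}             # outlink counts

    wpr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new = {}
        for u in pages:
            s = 0.0
            for v in in_links[u]:
                targets = graph[v]
                # u's share of the inlinks / outlinks among v's link targets
                w_in = I[u] / max(sum(I[t] for t in targets), 1)
                w_out = O[u] / max(sum(O[t] for t in targets), 1)
                s += wpr[v] * w_in * w_out
            new[u] = (1 - d) + d * s
        wpr = new
    return wpr

if __name__ == "__main__":
    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    print(weighted_pagerank(toy_web))
```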

535 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present different techniques for intelligently selecting parts of different-order Markov models so that the resulting model has a reduced state complexity while maintaining a high predictive accuracy.
Abstract: The problem of predicting a user's behavior on a Web site has gained importance due to the rapid growth of the World Wide Web and the need to personalize and influence a user's browsing experience. Markov models and their variations have been found to be well suited for addressing this problem. Of the different variations of Markov models, it is generally found that higher-order Markov models display high predictive accuracies on Web sessions that they can predict. However, higher-order models are also extremely complex due to their large number of states, which increases their space and run-time requirements. In this article, we present different techniques for intelligently selecting parts of different order Markov models so that the resulting model has a reduced state complexity, while maintaining a high predictive accuracy.
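A small sketch of the underlying idea, assuming made-up Web sessions: build all k-th order Markov states from the sessions, prune states below a frequency/support threshold to reduce state complexity (the paper also describes confidence- and error-based pruning, not shown), and predict with the longest surviving state. Function names and data are invented for illustration.

```python
# Toy sketch of selectively pruning higher-order Markov states for next-page prediction.
from collections import Counter, defaultdict

def build_models(sessions, max_order=3):
    """Count next-page transitions for every state (tuple of the last k pages)."""
    counts = defaultdict(Counter)
    for s in sessions:
        for k in range(1, max_order + 1):
            for i in range(len(s) - k):
                counts[tuple(s[i:i + k])][s[i + k]] += 1
    return counts

def prune_by_support(counts, min_support=2):
    """Drop states seen fewer than min_support times (reduces state complexity)."""
    return {st: c for st, c in counts.items() if sum(c.values()) >= min_support}

def predict(models, history):
    """Use the longest matching surviving state to predict the next page."""
    for k in range(len(history), 0, -1):
        state = tuple(history[-k:])
        if state in models:
            return models[state].most_common(1)[0][0]
    return None

sessions = [["home", "news", "sports"], ["home", "news", "sports"],
            ["home", "shop", "cart"], ["news", "sports", "scores"]]
models = prune_by_support(build_models(sessions), min_support=2)
print(predict(models, ["home", "news"]))   # -> "sports"
```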

532 citations


Proceedings ArticleDOI
17 May 2004
TL;DR: The authors' findings indicate a rapid turnover rate of Web pages, i.e., high rates of birth and death, coupled with an even higher rate of turnover in the hyperlinks that connect them; for pages that persist, the rate of content change tends to remain consistent over time.
Abstract: We seek to gain improved insight into how Web search engines should cope with the evolving Web, in an attempt to provide users with the most up-to-date results possible. For this purpose we collected weekly snapshots of some 150 Web sites over the course of one year, and measured the evolution of content and link structure. Our measurements focus on aspects of potential interest to search engine designers: the evolution of link structure over time, the rate of creation of new pages and new distinct content on the Web, and the rate of change of the content of existing pages under search-centric measures of degree of change. Our findings indicate a rapid turnover rate of Web pages, i.e., high rates of birth and death, coupled with an even higher rate of turnover in the hyperlinks that connect them. For pages that persist over time we found that, perhaps surprisingly, the degree of content shift as measured using TF.IDF cosine distance does not appear to be consistently correlated with the frequency of content updating. Despite this apparent non-correlation, the rate of content shift of a given page is likely to remain consistent over time. That is, pages that change a great deal in one week will likely change by a similarly large degree in the following week. Conversely, pages that experience little change will continue to experience little change. We conclude the paper with a discussion of the potential implications of our results for the design of effective Web search engines.
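As an illustration of the kind of search-centric change measure mentioned above, the sketch below computes a TF.IDF cosine distance between two snapshots of the same page. The tokenization and IDF weighting are simplified assumptions, not the authors' exact scheme.

```python
# Hedged sketch: cosine distance between TF.IDF vectors of two page snapshots.
import math
from collections import Counter

def tfidf_vectors(doc_a, doc_b):
    """Build TF.IDF vectors over the two-snapshot 'corpus' (toy IDF)."""
    tf_a, tf_b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    vocab = set(tf_a) | set(tf_b)
    def idf(t):  # document frequency computed over just the two snapshots
        df = (t in tf_a) + (t in tf_b)
        return math.log(2 / df) + 1.0
    va = {t: tf_a[t] * idf(t) for t in vocab}
    vb = {t: tf_b[t] * idf(t) for t in vocab}
    return va, vb

def cosine_distance(va, vb):
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    return 1.0 - dot / (na * nb) if na and nb else 1.0

week1 = "breaking news about the election results and analysis"
week2 = "breaking news about sports scores and match analysis"
print(round(cosine_distance(*tfidf_vectors(week1, week2)), 3))
```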

511 citations


Proceedings Article
01 Jan 2004
TL;DR: A framework for client-side defense is proposed: a browser plug-in that examines web pages and warns the user when requests for data may be part of a spoof attack.
Abstract: Web spoofing is a significant problem involving fraudulent email and web sites that trick unsuspecting users into revealing private information. We discuss some aspects of common attacks and propose a framework for client-side defense: a browser plug-in that examines web pages and warns the user when requests for data may be part of a spoof attack. While the plug-in, SpoofGuard, has been tested using actual sites obtained through government agencies concerned about the problem, we expect that web spoofing and other forms of identity theft will be continuing problems.

487 citations


Journal ArticleDOI
TL;DR: In this article, the authors reviewed the literature on computer response time and users' waiting time for download of Web pages, and assessed Web users' tolerable waiting time in information retrieval.
Abstract: Web users often face a long waiting time for downloading Web pages. Although various technologies and techniques have been implemented to alleviate the situation and to comfort the impatient users, little research has been done to assess what constitutes an acceptable and tolerable waiting time for Web users. This research reviews the literature on computer response time and users' waiting time for download of Web pages, and assesses Web users' tolerable waiting time in information retrieval. It addresses the following questions through an experimental study: What is the effect of feedback on users' tolerable waiting time? How long are users willing to wait for a Web page to be downloaded before abandoning it? The results from this study suggest that the presence of feedback prolongs Web users' tolerable waiting time and the tolerable waiting time for information retrieval is approximately 2 s.

480 citations


Journal ArticleDOI
TL;DR: An ontology of time is being developed for describing the temporal content of Web pages and the temporal properties of Web services, which covers topological properties of instants and intervals, measures of duration, and the meanings of clock and calendar terms.
Abstract: In connection with the DAML project for bringing about the Semantic Web, an ontology of time is being developed for describing the temporal content of Web pages and the temporal properties of Web services. This ontology covers topological properties of instants and intervals, measures of duration, and the meanings of clock and calendar terms.

Patent
22 Jan 2004
TL;DR: In this paper, a mobile deixis device includes a camera to capture an image and a wireless handheld device coupled to the camera and to a wireless network to communicate the image with existing databases to find similar images.
Abstract: A mobile deixis device includes a camera to capture an image and a wireless handheld device, coupled to the camera and to a wireless network, to communicate the image with existing databases to find similar images. The mobile deixis device further includes a processor, coupled to the device, to process found database records related to similar images and a display to view found database records that include web pages including images. With such an arrangement, users can specify a location of interest by simply pointing a camera-equipped cellular phone at the object of interest and, by searching an image database or relevant web resources, can quickly identify good matches from several close ones to find an object of interest.

Proceedings ArticleDOI
Deng Cai, Xiaofei He, Zhiwei Li, Wei-Ying Ma, Ji-Rong Wen
10 Oct 2004
TL;DR: A hierarchical clustering method using visual, textual and link analysis is proposed to organize Web image search results into different semantic clusters and facilitate users' browsing.
Abstract: We consider the problem of clustering Web image search results. Generally, the image search results returned by an image search engine contain multiple topics. Organizing the results into different semantic clusters facilitates users' browsing. In this paper, we propose a hierarchical clustering method using visual, textual and link analysis. By using a vision-based page segmentation algorithm, a web page is partitioned into blocks, and the textual and link information of an image can be accurately extracted from the block containing that image. By using block-level link analysis techniques, an image graph can be constructed. We then apply spectral techniques to find a Euclidean embedding of the images which respects the graph structure. Thus for each image, we have three kinds of representations, i.e. visual feature based representation, textual feature based representation and graph based representation. Using spectral clustering techniques, we can cluster the search results into different semantic clusters. An image search example illustrates the potential of these techniques.
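A minimal sketch of the spectral step, assuming a precomputed image-to-image affinity matrix (which in the paper would combine the visual, textual, and block-level link representations): spectral clustering of that matrix yields the semantic groups. The random affinity matrix and the scikit-learn call below are stand-ins for illustration, not the authors' implementation.

```python
# Minimal sketch: spectrally cluster images given a precomputed affinity matrix.
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
n_images = 12
# Toy symmetric affinity matrix standing in for the combined similarity graph.
A = rng.random((n_images, n_images))
affinity = (A + A.T) / 2
np.fill_diagonal(affinity, 1.0)

labels = SpectralClustering(
    n_clusters=3, affinity="precomputed", random_state=0
).fit_predict(affinity)
print(labels)   # cluster id per image, i.e. one semantic cluster per group
```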

Journal ArticleDOI
TL;DR: The article concludes that organizations willing to embrace the “Wiki way” with collaborative, conversational knowledge management systems, may enjoy better than linear knowledge growth while being able to satisfy ad-hoc, distributed knowledge needs.
Abstract: Wikis (from wikiwiki, meaning “fast” in Hawaiian) are a promising new technology that supports “conversational” knowledge creation and sharing. A Wiki is a collaboratively created and iteratively improved set of web pages, together with the software that manages the web pages. Because of their unique way of creating and managing knowledge, Wikis combine the best elements of earlier conversational knowledge management technologies, while avoiding many of their disadvantages. This article introduces Wiki technology, the behavioral and organizational implications of Wiki use, and Wiki applicability as groupware and help system software. The article concludes that organizations willing to embrace the “Wiki way” with collaborative, conversational knowledge management systems, may enjoy better than linear knowledge growth while being able to satisfy ad-hoc, distributed knowledge needs.

Proceedings ArticleDOI
17 May 2004
TL;DR: PANKOW (Pattern-based Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based approach to categorize instances with regard to an ontology, is proposed.
Abstract: The success of the Semantic Web depends on the availability of ontologies as well as on the proliferation of web pages annotated with metadata conforming to these ontologies. Thus, a crucial question is where to acquire these metadata from. In this paper we propose PANKOW (Pattern-based Annotation through Knowledge on the Web), a method which employs an unsupervised, pattern-based approach to categorize instances with regard to an ontology. The approach is evaluated against the manual annotations of two human subjects. The approach is implemented in OntoMat, an annotation tool for the Semantic Web, and shows very promising results.
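The following toy sketch illustrates the pattern-based categorization idea: count occurrences of Hearst-style patterns pairing a candidate instance with each ontology concept and pick the best-supported concept. PANKOW obtains such counts from Web search hit counts; here a small in-memory text, a naive pluralization, and invented pattern strings stand in for that.

```python
# Toy pattern-based categorization: evidence counts decide the concept for an instance.
from collections import Counter

PATTERNS = ["{instance} is a {concept}",
            "{concept}s such as {instance}",          # naive pluralization, for illustration
            "{instance} and other {concept}s"]

def categorize(instance, concepts, corpus):
    scores = Counter()
    text = corpus.lower()
    for concept in concepts:
        for p in PATTERNS:
            phrase = p.format(instance=instance, concept=concept).lower()
            scores[concept] += text.count(phrase)
    best, count = scores.most_common(1)[0]
    return best if count > 0 else None

corpus = ("Paris is a city with many museums. Cities such as Paris attract "
          "tourists. Paris and other cities host conferences.")
print(categorize("Paris", ["city", "river", "person"], corpus))  # -> "city"
```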

Proceedings ArticleDOI
01 May 2004
TL;DR: The functionality of MEAD is described, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has been thus far downloaded by more than 500 organizations.
Abstract: This paper describes the functionality of MEAD, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has been thus far downloaded by more than 500 organizations. MEAD has been used in a variety of summarization applications ranging from summarization for mobile devices to Web page summarization within a search engine and to novelty detection.


Proceedings ArticleDOI
22 Mar 2004
TL;DR: The results indicate that the gender of subjects, the viewing order of a web page, and the interaction between page order and site type influence online ocular behavior.
Abstract: The World Wide Web has become a ubiquitous information source and communication channel. With such an extensive user population, it is imperative to understand how web users view different web pages. Based on an eye tracking study of 30 subjects on 22 web pages from 11 popular web sites, this research intends to explore the determinants of ocular behavior on a single web page: whether it is determined by individual differences of the subjects, different types of web sites, the order of web pages being viewed, or the task at hand. The results indicate that the gender of subjects, the viewing order of a web page, and the interaction between page order and site type influence online ocular behavior. Task instruction did not significantly affect web viewing behavior. Scanpath analysis revealed that the complexity of web page design influences the degree of scanpath variation among different subjects on the same web page. The contributions and limitations of this research, and future research directions are discussed.

Patent
19 Oct 2004
TL;DR: In this paper, a system, method and computer program product that combines techniques in the fields of search, data mining, collaborative filtering, user ratings and referral mappings into a system for intelligent web-based help for task or transaction oriented web based systems.
Abstract: A system, method and computer program product that combines techniques in the fields of search, data mining, collaborative filtering, user ratings and referral mappings into a system for intelligent web-based help for task or transaction oriented web based systems. The system makes use of a service oriented architecture based on metadata and web services to locate, categorize and provide relevant context sensitive help, including found help not available when the web based system or application was first developed. As part of the inventive system, there is additionally provided a system for providing an integrated information taxonomy which combines automatically, semi-automatically, and manually generated taxonomies and applies them to help systems. This aspect of the invention is applicable to the fields of online self-help systems for web sites and software applications as well as to customer, supplier and employee help desks.

Proceedings ArticleDOI
17 May 2004
TL;DR: This paper uses a vision-based page segmentation algorithm to partition a web page into semantic blocks with a hierarchical structure, then spatial features and content features are extracted and used to construct a feature vector for each block.
Abstract: Previous work shows that a web page can be partitioned into multiple segments or blocks, and often the importance of those blocks in a page is not equivalent. Also, it has been proven that differentiating noisy or unimportant blocks from pages can facilitate web mining, search and accessibility. However, no uniform approach and model has been presented to measure the importance of different segments in web pages. Through a user study, we found that people do have a consistent view about the importance of blocks in web pages. In this paper, we investigate how to find a model to automatically assign importance values to blocks in a web page. We define the block importance estimation as a learning problem. First, we use a vision-based page segmentation algorithm to partition a web page into semantic blocks with a hierarchical structure. Then spatial features (such as position and size) and content features (such as the number of images and links) are extracted to construct a feature vector for each block. Based on these features, learning algorithms are used to train a model to assign importance to different segments in the web page. In our experiments, the best model achieves a Micro-F1 of 79% and a Micro-Accuracy of 85.9%, which is quite close to a person's view.
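A hedged sketch of the learning setup, assuming hand-made block features and labels: each block is represented by spatial and content features, and a classifier predicts its importance level. The feature list, labels, and the choice of a random forest are illustrative only; they are not the paper's segmentation output or its learning algorithm.

```python
# Toy block-importance classifier: feature vectors per block -> importance level.
from sklearn.ensemble import RandomForestClassifier

# [x, y, width, height, num_images, num_links, text_length]  (made-up values)
blocks = [
    [0.30, 0.10, 0.60, 0.50, 2, 10, 800],   # large central block
    [0.00, 0.00, 1.00, 0.08, 0,  8,  40],   # top navigation bar
    [0.75, 0.20, 0.25, 0.60, 4, 20, 100],   # advertisement column
    [0.30, 0.65, 0.60, 0.30, 1,  5, 600],   # secondary article block
]
importance = [4, 2, 1, 3]   # higher = more important (toy labels)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(blocks, importance)

new_block = [[0.25, 0.12, 0.65, 0.55, 1, 12, 750]]
print(model.predict(new_block))   # predicted importance level for the new block
```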

Proceedings ArticleDOI
17 Jun 2004
TL;DR: This paper proposes that some spam web pages can be identified through statistical analysis, and examines a variety of properties, including linkage structure, page content, and page evolution, and finds that outliers in the statistical distribution of these properties are highly likely to be caused by web spam.
Abstract: The increasing importance of search engines to commercial web sites has given rise to a phenomenon we call "web spam", that is, web pages that exist only to mislead search engines into (mis)leading users to certain web sites. Web spam is a nuisance to users as well as search engines: users have a harder time finding the information they need, and search engines have to cope with an inflated corpus, which in turn causes their cost per query to increase. Therefore, search engines have a strong incentive to weed out spam web pages from their index.We propose that some spam web pages can be identified through statistical analysis: Certain classes of spam pages, in particular those that are machine-generated, diverge in some of their properties from the properties of web pages at large. We have examined a variety of such properties, including linkage structure, page content, and page evolution, and have found that outliers in the statistical distribution of these properties are highly likely to be caused by web spam.This paper describes the properties we have examined, gives the statistical distributions we have observed, and shows which kinds of outliers are highly correlated with web spam.
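As a toy illustration of the outlier idea, the sketch below computes z-scores for a few per-page properties and flags pages that deviate strongly from the corpus-wide distribution. The properties, values, and threshold are invented; the paper examines richer properties such as linkage structure, page content, and page evolution.

```python
# Toy outlier detection: flag pages whose property values sit far from the mean.
import statistics

pages = {
    "a.example.com": {"words": 900,   "out_links": 40,   "dash_ratio": 0.01},
    "b.example.com": {"words": 1200,  "out_links": 55,   "dash_ratio": 0.02},
    "c.example.com": {"words": 800,   "out_links": 35,   "dash_ratio": 0.01},
    "spam.example":  {"words": 90000, "out_links": 4000, "dash_ratio": 0.35},
}

def outliers(pages, z_threshold=1.4):
    flagged = set()
    props = next(iter(pages.values())).keys()
    for prop in props:
        values = [p[prop] for p in pages.values()]
        mean, stdev = statistics.mean(values), statistics.pstdev(values)
        if stdev == 0:
            continue
        for url, feats in pages.items():
            if abs(feats[prop] - mean) / stdev > z_threshold:
                flagged.add(url)
    return flagged

print(outliers(pages))   # -> {'spam.example'}
```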


Journal ArticleDOI
TL;DR: The ELODIE archive contains the complete collection of high‐resolution echelle spectra accumulated over the last decade using the ELODIE spectrograph at the Observatoire de Haute‐Provence 1.93 m telescope.
Abstract: The ELODIE archive contains the complete collection of high‐resolution echelle spectra accumulated over the last decade using the ELODIE spectrograph at the Observatoire de Haute‐Provence 1.93 m telescope. This article presents the different data products and the facilities available on the World Wide Web to reprocess these data on‐the‐fly. Users can retrieve the data in FITS format from the archive Web page (http://atlas.obs‐hp.fr/elodie) and apply to them different functions, wavelength resampling and flux calibration in particular.

Journal ArticleDOI
TL;DR: This article applies a latent class modeling approach to segment web shoppers, based on their purchase behavior across several product categories, and then profiles the segments along the twin dimensions of demographics and benefits sought.

Patent
16 Sep 2004
TL;DR: In this article, a basic architecture for managing digital identity information in a network such as the World Wide Web is provided, where a user can organize his or her information into one or more profiles which reflect the nature of different relationships between the user and other entities, and grant or deny each entity access to a given profile.
Abstract: A basic architecture for managing digital identity information in a network such as the World Wide Web is provided. A user of the architecture can organize his or her information into one or more profiles which reflect the nature of different relationships between the user and other entities, and grant or deny each entity access to a given profile. Various enhancements which may be provided through the architecture are also described, including tools for filtering email, controlling access to user web pages, locating other users and making one's own location known, browsing or mailing anonymously, filling in web forms automatically with information already provided once by hand, logging in automatically, securely logging in to multiple sites with a single password and doing so from any machine on the network, and other enhancements.

Proceedings ArticleDOI
17 May 2004
TL;DR: This paper analytically estimates how much longer it takes for a new page to attract a large number of Web users when search engines return only popular pages at the top of search results and shows that search engines can have an immensely worrisome impact on the discovery of new Web pages.
Abstract: Recent studies show that a majority of Web page accesses are referred by search engines. In this paper we study the widespread use of Web search engines and its impact on the ecology of the Web. In particular, we study how much impact search engines have on the popularity evolution of Web pages. For example, given that search engines return currently "popular" pages at the top of search results, are we somehow penalizing newly created pages that are not very well known yet? Are popular pages getting even more popular and new pages completely ignored? We first show that this unfortunate trend indeed exists on the Web through an experimental study based on real Web data. We then analytically estimate how much longer it takes for a new page to attract a large number of Web users when search engines return only popular pages at the top of search results. Our result shows that search engines can have an immensely worrisome impact on the discovery of new Web pages.
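A toy simulation, not the paper's analytical model, of the dynamic being described: each visit either follows the search engine (choosing among pages in proportion to current popularity) or lands on a random page, and the time until a brand-new page receives its first visit grows sharply as search-driven traffic dominates. The page count, step count, and probabilities below are arbitrary.

```python
# Toy rich-get-richer simulation of new-page discovery under search-driven traffic.
import random

def simulate(n_pages=50, steps=20_000, p_search=0.9, seed=1):
    random.seed(seed)
    visits = [1] * n_pages          # pages 0..n-2 are established
    visits[-1] = 0                  # the newly created page starts unknown
    first_hit = None
    for step in range(steps):
        if random.random() < p_search:
            # search engine: pick pages in proportion to their current popularity
            page = random.choices(range(n_pages), weights=[v + 1e-9 for v in visits])[0]
        else:
            # random surfing: any page, known or not
            page = random.randrange(n_pages)
        visits[page] += 1
        if page == n_pages - 1 and first_hit is None:
            first_hit = step
    return first_hit, visits[-1]

for p in (0.5, 0.9, 0.99):
    print(p, simulate(p_search=p))   # (step of first visit to the new page, its final visits)
```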

Proceedings ArticleDOI
13 Nov 2004
TL;DR: A novel iterative reinforced algorithm is proposed that utilizes user click-through data to improve search performance; it effectively finds "virtual queries" for web pages and overcomes the challenges of noise and incompleteness, sparseness, and the volatility of web pages and queries.
Abstract: The performance of web search engines may often deteriorate due to the diversity and noisy information contained within web pages. User click-through data can be used to introduce more accurate description (metadata) for web pages, and to improve the search performance. However, noise and incompleteness, sparseness, and the volatility of web pages and queries are three major challenges for research work on user click-through log mining. In this paper, we propose a novel iterative reinforced algorithm to utilize the user click-through data to improve search performance. The algorithm fully explores the interrelations between queries and web pages, and effectively finds "virtual queries" for web pages and overcomes the challenges discussed above. Experimental results on a large set of MSN click-through log data show a significant improvement on search performance over the naive query log mining algorithm as well as the baseline search engine.
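A generic sketch of iterative reinforcement over the query-page click graph, assuming a toy click log: query terms are propagated to clicked pages and page terms flow back to co-clicking queries, so each page accumulates "virtual query" terms. The update rule, decay factor, and data are simplifications, not the paper's exact algorithm.

```python
# Toy iterative reinforcement over a query-page click graph to derive "virtual queries".
from collections import defaultdict

clicks = [  # (query, clicked page) pairs from a made-up click-through log
    ("cheap flights", "travel.example/deals"),
    ("airline tickets", "travel.example/deals"),
    ("cheap flights", "flights.example"),
]

def virtual_queries(clicks, iterations=3, decay=0.5):
    query_terms = {q: {t: 1.0 for t in q.split()} for q, _ in clicks}
    page_terms = {}
    for _ in range(iterations):
        page_terms = defaultdict(lambda: defaultdict(float))
        # forward step: query terms flow to the pages clicked for those queries
        for q, p in clicks:
            for t, w in query_terms[q].items():
                page_terms[p][t] += w
        # backward step: accumulated page terms flow back to co-clicking queries
        for q, p in clicks:
            for t, w in page_terms[p].items():
                query_terms[q][t] = max(query_terms[q].get(t, 0.0), decay * w)
    return {p: sorted(ts, key=ts.get, reverse=True)[:3] for p, ts in page_terms.items()}

print(virtual_queries(clicks))   # top "virtual query" terms per page
```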

Journal ArticleDOI
TL;DR: Teaching on the Web involves more than putting together a colorful webpage; by consistently employing principles of effective learning, educators will unlock the full potential of Web-based medical education.
Abstract: OBJECTIVE: Online learning has changed medical education, but many “educational” websites do not employ principles of effective learning. This article will assist readers in developing effective educational websites by integrating principles of active learning with the unique features of the Web. DESIGN: Narrative review. RESULTS: The key steps in developing an effective educational website are: Perform a needs analysis and specify goals and objectives; determine technical resources and needs; evaluate preexisting software and use it if it fully meets your needs; secure commitment from all participants and identify and address potential barriers to implementation; develop content in close coordination with website design (appropriately use multimedia, hyperlinks, and online communication) and follow a timeline; encourage active learning (self-assessment, reflection, self-directed learning, problem-based learning, learner interaction, and feedback); facilitate and plan to encourage use by the learner (make website accessible and user-friendly, provide time for learning, and motivate learners); evaluate learners and course; pilot the website before full implementation; and plan to monitor online communication and maintain the site by resolving technical problems, periodically verifying hyperlinks, and regularly updating content. CONCLUSION: Teaching on the Web involves more than putting together a colorful webpage. By consistently employing principles of effective learning, educators will unlock the full potential of Web-based medical education.

Patent
01 Jul 2004
TL;DR: A web server computer system, as discussed by the authors, includes a virus checker and mechanisms for checking e-mails and their attachments, downloaded files, and web sites for possible viruses, allowing the web server to perform virus checking of different types of information in real time as the information is requested by a web client.
Abstract: A web server computer system includes a virus checker and mechanisms for checking e-mails and their attachments, downloaded files, and web sites for possible viruses. The virus checker allows a web server to perform virus checking of different types of information in real time as the information is requested by a web client. In addition, a web client may also request that the server perform virus checking on a particular drive on the web client. In this case, the web server may receive information from the web client drive, scan the information for viruses, and inform the web client whether any viruses were found. In the alternative, the web server may download a client virus checker to the web client and cause the client virus checker to be run on the web client. The preferred embodiments thus eliminate the need for virus checking software to be installed on each web client.

Journal ArticleDOI
TL;DR: The paper presents a framework, called positive example based learning (PEBL), for Web page classification which eliminates the need for manually collecting negative training examples in preprocessing and applies an algorithm, called mapping-convergence (M-C), to achieve classification accuracy as high as that of a traditional SVM.
Abstract: Web page classification is one of the essential techniques for Web mining because classifying Web pages of an interesting class is often the first step of mining the Web. However, constructing a classifier for an interesting class requires laborious preprocessing such as collecting positive and negative training examples. For instance, in order to construct a "homepage" classifier, one needs to collect a sample of homepages (positive examples) and a sample of nonhomepages (negative examples). In particular, collecting negative training examples requires arduous work and caution to avoid bias. The paper presents a framework, called positive example based learning (PEBL), for Web page classification which eliminates the need for manually collecting negative training examples in preprocessing. The PEBL framework applies an algorithm, called mapping-convergence (M-C), to achieve classification accuracy (with positive and unlabeled data) as high as that of a traditional SVM (with positive and negative data). M-C runs in two stages: the mapping stage and convergence stage. In the mapping stage, the algorithm uses a weak classifier that draws an initial approximation of "strong" negative data. Based on the initial approximation, the convergence stage iteratively runs an internal classifier (e.g., SVM) which maximizes margins to progressively improve the approximation of negative data. Thus, the class boundary eventually converges to the true boundary of the positive class in the feature space. We present the M-C algorithm with supporting theoretical and experimental justifications. Our experiments show that, given the same set of positive examples, the M-C algorithm outperforms one-class SVMs, and it is almost as accurate as the traditional SVMs.
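A hedged scikit-learn sketch of the two-stage idea, assuming toy bag-of-words vectors: the mapping stage takes unlabeled documents that share no frequent positive-class features as initial strong negatives (a simple stand-in for the paper's weak classifier), and the convergence stage retrains an SVM, each round moving newly predicted negatives into the negative set. Function names and data are invented for illustration.

```python
# Toy sketch of a mapping-convergence style loop for positive-and-unlabeled learning.
import numpy as np
from sklearn.svm import LinearSVC

def mapping_convergence(positive, unlabeled, rounds=5):
    # Mapping stage: features present in the positives; unlabeled docs containing
    # none of them become the initial "strong" negatives (assumes at least one exists).
    pos_features = np.where(positive.sum(axis=0) > 0)[0]
    strength = unlabeled[:, pos_features].sum(axis=1)
    negatives = unlabeled[strength == 0]
    pool = unlabeled[strength > 0]

    # Convergence stage: retrain an SVM each round, growing the negative set with
    # the pool documents the current classifier labels as negative.
    clf = LinearSVC()
    for _ in range(rounds):
        X = np.vstack([positive, negatives])
        y = np.array([1] * len(positive) + [0] * len(negatives))
        clf.fit(X, y)
        if len(pool) == 0:
            break
        pred = clf.predict(pool)
        new_neg, pool = pool[pred == 0], pool[pred == 1]
        if len(new_neg) == 0:
            break
        negatives = np.vstack([negatives, new_neg])
    return clf

# Toy bag-of-words style data: 6 features, positives use only the first three.
positive = np.array([[3, 1, 2, 0, 0, 0], [2, 2, 1, 0, 0, 0], [1, 3, 0, 0, 0, 0]])
unlabeled = np.array([[2, 1, 1, 0, 0, 0], [0, 0, 0, 3, 2, 1],
                      [0, 0, 0, 1, 4, 2], [1, 0, 0, 2, 2, 0]])
model = mapping_convergence(positive, unlabeled)
print(model.predict(unlabeled))   # 1 = predicted member of the positive class
```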