
Showing papers on "Hyperlink" published in 2004


Proceedings ArticleDOI
17 May 2004
TL;DR: The authors' findings indicate a rapid turnover rate of Web pages, i.e., high rates of birth and death, coupled with an even higher rate of turnover in the hyperlinks that connect them; for pages that persist, the rate of content change is likely to remain consistent over time.
Abstract: We seek to gain improved insight into how Web search engines should cope with the evolving Web, in an attempt to provide users with the most up-to-date results possible. For this purpose we collected weekly snapshots of some 150 Web sites over the course of one year, and measured the evolution of content and link structure. Our measurements focus on aspects of potential interest to search engine designers: the evolution of link structure over time, the rate of creation of new pages and new distinct content on the Web, and the rate of change of the content of existing pages under search-centric measures of degree of change. Our findings indicate a rapid turnover rate of Web pages, i.e., high rates of birth and death, coupled with an even higher rate of turnover in the hyperlinks that connect them. For pages that persist over time we found that, perhaps surprisingly, the degree of content shift as measured using TF.IDF cosine distance does not appear to be consistently correlated with the frequency of content updating. Despite this apparent non-correlation, the rate of content shift of a given page is likely to remain consistent over time. That is, pages that change a great deal in one week will likely change by a similarly large degree in the following week. Conversely, pages that experience little change will continue to experience little change. We conclude the paper with a discussion of the potential implications of our results for the design of effective Web search engines.

511 citations
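Where the abstract leaves the content-shift measure abstract, a small sketch can make it concrete. The following Python is illustrative only, not the authors' code: it compares two weekly snapshots of a page by TF.IDF cosine distance, assuming whitespace tokenization and a smoothed IDF (the paper does not specify its exact weighting).

```python
import math
from collections import Counter

def tfidf_vector(text, df, n_docs):
    """TF.IDF weights for one page, using a smoothed IDF (an assumption;
    the paper does not spell out its exact weighting)."""
    tf = Counter(text.lower().split())
    return {t: f * (1 + math.log(n_docs / df[t])) for t, f in tf.items() if t in df}

def cosine_distance(u, v):
    """1 minus the cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (sum(w * w for w in u.values()) ** 0.5) * (sum(w * w for w in v.values()) ** 0.5)
    return 1.0 - (dot / norm if norm else 0.0)

# Two weekly snapshots of the same page; df counts how many pages contain each term.
snapshots = ["breaking news about the web today",
             "archived news about the web yesterday"]
df = Counter(t for page in snapshots for t in set(page.split()))
week1, week2 = (tfidf_vector(s, df, len(snapshots)) for s in snapshots)
print(f"content shift: {cosine_distance(week1, week2):.3f}")
```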


Journal ArticleDOI
TL;DR: Teaching on the Web involves more than putting together a colorful webpage; by consistently employing principles of effective learning, educators will unlock the full potential of Web-based medical education.
Abstract: OBJECTIVE: Online learning has changed medical education, but many “educational” websites do not employ principles of effective learning. This article will assist readers in developing effective educational websites by integrating principles of active learning with the unique features of the Web. DESIGN: Narrative review. RESULTS: The key steps in developing an effective educational website are: Perform a needs analysis and specify goals and objectives; determine technical resources and needs; evaluate preexisting software and use it if it fully meets your needs; secure commitment from all participants and identify and address potential barriers to implementation; develop content in close coordination with website design (appropriately use multimedia, hyperlinks, and online communication) and follow a timeline; encourage active learning (self-assessment, reflection, self-directed learning, problem-based learning, learner interaction, and feedback); facilitate and plan to encourage use by the learner (make website accessible and user-friendly, provide time for learning, and motivate learners); evaluate learners and course; pilot the website before full implementation; and plan to monitor online communication and maintain the site by resolving technical problems, periodically verifying hyperlinks, and regularly updating content. CONCLUSION: Teaching on the Web involves more than putting together a colorful webpage. By consistently employing principles of effective learning, educators will unlock the full potential of Web-based medical education.

292 citations


Patent
26 May 2004
TL;DR: Contextual information concerning linked documents is promoted to display pages that contain hyperlinks to those documents. The contextual information can include a variety of information about the linked document, including whether it has been modified within a predefined time period, a comment by the author concerning recent changes, and the size of the document.
Abstract: Contextual information concerning linked documents is promoted to display pages that contain hyperlinks to those documents. The contextual information can be immediately displayed, or it can be selectively displayed in response to a user selecting a text hyperlink anchor or a picture icon hyperlink anchor. Furthermore, the contextual information can include a variety of information about the linked document, including whether it has been modified within a predefined time period, such as the last 24 hours, a comment by the author concerning recent changes, and the size of the document. Preferably, the contextual information is automatically generated by a data promotion engine based on meta-data that is associated with the document and stored on a web site for the document. The contextual information may be either added to the document that corresponds to a display page at the time the document page is saved, or it may be dynamically uploaded to a browser when the display page is rendered by a browser. One of the types of contextual information stored in the meta-data is a manually defined category for a hyperlink in the display page.

184 citations


Patent
01 Jul 2004
TL;DR: A system for augmenting data from a source data file with data from a reference database to generate an augmented data file is presented; the system includes a reference database containing at least one reference datum.
Abstract: A system for augmenting data from a source data file with data from a reference database to generate an augmented data file is provided. The system includes a reference database containing at least one reference datum. A handler component is configured to retrieve a source data file including a structured datum. A locator component is configured to locate the structured datum in the source data file; an analyzer component is configured to associate the identified structured datum with a reference datum, creating an association according to an analyzing strategy. A generating component is configured to generate a hyperlink based upon the association and to embed the generated hyperlink in the source file to create an augmented data file.

181 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: The usefulness of common image search metrics applied on images captured with a camera-equipped mobile device to find matching images on the World Wide Web or other general-purpose databases is demonstrated.
Abstract: We describe an approach to recognizing location from mobile devices using image-based Web search. We demonstrate the usefulness of common image search metrics applied on images captured with a camera-equipped mobile device to find matching images on the World Wide Web or other general-purpose databases. Searching the entire Web can be computationally overwhelming, so we devise a hybrid image-and-keyword searching technique. First, image-search is performed over images and links to their source Web pages in a database that indexes only a small fraction of the Web. Then, relevant keywords on these Web pages are automatically identified and submitted to an existing text-based search engine (e.g. Google) that indexes a much larger portion of the Web. Finally, the resulting image set is filtered to retain images close to the original query. It is thus possible to efficiently search hundreds of millions of images that are not only textually related but also visually relevant. We demonstrate our approach on an application allowing users to browse Web pages matching the image of a nearby location.

166 citations


Patent
14 Jan 2004
TL;DR: In this article, a method is proposed to generate a minimum set of simplified and navigable web contents from a single web document that is oversized for targeted smaller devices, while preserving text, image, transactional and embedded presentation constraint information.
Abstract: A method is disclosed to generate, while preserving text, image, transactional and embedded presentation constraint information, a minimum set of simplified and navigable web contents from a single web document that is oversized for targeted smaller devices. The method includes a parser, a content tree builder, a document tree builder, a document simplifier, a virtual layout engine, a document partitioner, a content scalar and a markup generator. The parser generates markup and data tags from an HTML source document. The builder constructs a content tree. The simplifier transforms the document tree into an intermediate one defined by a subset of XHTML tags and attributes. Layout constraints, including size, area, placement order, and column/row relationships, are calculated for partitioning and scaling the document tree into sub document trees with assigned navigation order and hierarchical hyperlinks. A simplified HTML document is then generated with the markup generator.

155 citations


Journal ArticleDOI
TL;DR: This paper gives a detailed analysis of the HITS algorithm through a unique combination of probabilistic analysis and matrix algebra and shows that, to first-order approximation, the ranking given by the HITS algorithm is the same as the ranking obtained by counting inbound and outbound hyperlinks.
Abstract: Ranking the tens of thousands of retrieved webpages for a user query on a Web search engine, such that the most informative webpages are at the top, is a key information retrieval technology. A popular ranking algorithm is the HITS algorithm of Kleinberg. It explores the reinforcing interplay between authority and hub webpages on a particular topic by taking into account the structure of the Web graphs formed by the hyperlinks between the webpages. In this paper, we give a detailed analysis of the HITS algorithm through a unique combination of probabilistic analysis and matrix algebra. In particular, we show that to first-order approximation, the ranking given by the HITS algorithm is the same as the ranking obtained by counting inbound and outbound hyperlinks. Using Web graphs of different sizes, we also provide experimental results to illustrate the analysis.

131 citations
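The first-order claim is easy to probe on a toy graph. The sketch below (illustrative graph and iteration count, not the paper's experiments) runs Kleinberg's HITS by power iteration and prints the authority ranking next to the plain in-degree ranking; on this graph both put the same page on top.

```python
def hits(graph, iters=50):
    """Kleinberg's HITS by power iteration on a {page: [outlinks]} graph."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    hub = {n: 1.0 for n in nodes}
    auth = dict(hub)
    for _ in range(iters):
        # Good authorities are pointed to by good hubs, and vice versa.
        auth = {n: sum(hub[u] for u in nodes if n in graph.get(u, ())) for n in nodes}
        hub = {n: sum(auth[v] for v in graph.get(n, ())) for n in nodes}
        for vec in (auth, hub):  # L2-normalise to keep scores bounded
            norm = sum(x * x for x in vec.values()) ** 0.5 or 1.0
            for n in vec:
                vec[n] /= norm
    return auth, hub

# Toy graph: each page maps to the pages it links to.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
auth, _ = hits(graph)
in_deg = {n: sum(n in outs for outs in graph.values()) for n in auth}
print("HITS authority ranking:", sorted(auth, key=auth.get, reverse=True))
print("in-degree ranking:     ", sorted(in_deg, key=in_deg.get, reverse=True))
```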


Patent
01 Oct 2004
TL;DR: In this article, a novel electronic information transport component can be incorporated in a wide range of electronic information products, for example magazine collections, to automate the mass distribution of updates, such as current issues, from a remote server to a wide user base having a diversity of computer stations.
Abstract: A novel electronic information transport component can be incorporated in a wide range of electronic information products, for example magazine collections, to automate the mass distribution of updates, such as current issues, from a remote server to a wide user base having a diversity of computer stations. Advantages of economy, immediacy and ease of use are provided. Extensions of the invention permit automated electronic catalog shopping with order placement and, optionally, order confirmation. A server-based update distribution service is also provided. In addition, an offline web browser system, with hyperlink redirection capabilities, a novel recorded music product with automated update capabilities and an Internet charging mechanism are provided.

123 citations


Patent
Fernando Incertis Carro
06 Apr 2004
TL;DR: In this paper, a transparent electroluminescent tablet or other touch-sensitive plate is coupled to a workstation, and the workstation directs the tablet or plate to display the active region over the physical document page.
Abstract: A system, method and program product for presenting and selecting an active region of a physical document page so that a user can access corresponding information via a workstation. A transparent electroluminescent tablet or other touch-sensitive plate is positioned over the physical document page. The tablet or plate is coupled to the workstation. The physical document page is identified to the workstation. The workstation stores information defining an active region for the physical document page and a hyperlink to a web page or web file containing information related to content of the active region. The workstation directs the tablet or plate to display the active region over the physical document page. A user touches a point within the active region. In response, the tablet or plate conveys the touch point to the workstation, and the workstation displays the hyperlink on a computer screen. The active region can be identified by an outline that encompasses it. One such active region can encompass another, so that touching a point within the inner active region elicits display of hyperlinks or documents related to both active regions.

111 citations


Proceedings ArticleDOI
13 Nov 2004
TL;DR: A reasonable estimate of the PageRank value of a node is possible in most cases by retrieving only a moderate number of nodes in the local neighborhood of the node; the model assumes that the graph is accessible remotely via a link database or is stored in a relational database that performs lookups on disks to retrieve node and connectivity information.
Abstract: The Google search engine uses a method called PageRank, together with term-based and other ranking techniques, to order search results returned to the user. PageRank uses link analysis to assign a global importance score to each web page. The PageRank scores of all the pages are usually determined off-line in a large-scale computation on the entire hyperlink graph of the web, and several recent studies have focused on improving the efficiency of this computation, which may require multiple hours on a workstation. However, in some scenarios, such as online analysis of link evolution and mining of large web archives such as the Internet Archive, it may be desirable to quickly approximate or update the PageRanks of individual nodes without performing a large-scale computation on the entire graph. We address this problem by studying several methods for efficiently estimating the PageRank score of a particular web page using only a small subgraph of the entire web. In our model, we assume that the graph is accessible remotely via a link database (such as the AltaVista Connectivity Server) or is stored in a relational database that performs lookups on disks to retrieve node and connectivity information. We show that a reasonable estimate of the PageRank value of a node is possible in most cases by retrieving only a moderate number of nodes in the local neighborhood of the node.

108 citations
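As a hedged sketch of the estimation idea (the paper evaluates several, more refined estimators), one can expand a bounded in-link neighborhood of the target and give every page beyond the horizon the uniform baseline (1 - d) / N. The LINK_DB table below is a toy stand-in for the remote link database named in the abstract.

```python
DAMPING = 0.85
N_PAGES = 4  # stands in for the size of the full web graph

# Toy stand-in for a remote connectivity server: in-links and out-degree per page.
LINK_DB = {
    "target": {"in": ["a", "b"], "out_deg": 2},
    "a":      {"in": ["c"],      "out_deg": 1},
    "b":      {"in": [],         "out_deg": 3},
    "c":      {"in": [],         "out_deg": 2},
}

def estimate_pagerank(page, depth):
    """Approximate PageRank from a bounded in-link neighborhood.

    Pages beyond the expansion horizon get the uniform baseline
    (1 - d) / N, so only a moderate number of lookups is needed.
    """
    base = (1 - DAMPING) / N_PAGES
    if depth == 0:
        return base
    in_links = LINK_DB.get(page, {"in": []})["in"]
    return base + DAMPING * sum(
        estimate_pagerank(u, depth - 1) / LINK_DB[u]["out_deg"] for u in in_links
    )

print(f"PageRank estimate for 'target': {estimate_pagerank('target', depth=2):.4f}")
```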


Journal ArticleDOI
TL;DR: This work proposes an entropy-based analysis mechanism (LAMIS) that analyzes the entropy of anchor texts and links to eliminate redundancy in the hyperlinked structure, so that the complex structure of a Web site can be distilled.
Abstract: We study the problem of mining the informative structure of a news Web site that consists of thousands of hyperlinked documents. We define the informative structure of a news Web site as a set of index pages (referred to as TOC, i.e., table of contents, pages) and a set of article pages linked by these TOC pages. Based on the Hyperlink-Induced Topic Search (HITS) algorithm, we propose an entropy-based analysis mechanism (LAMIS) for analyzing the entropy of anchor texts and links to eliminate the redundancy of the hyperlinked structure so that the complex structure of a Web site can be distilled. However, to increase the value and the accessibility of pages, most content sites tend to publish their pages with intrasite redundant information, such as navigation panels, advertisements, and copyright announcements. To further eliminate such redundancy, we propose another mechanism, called InfoDiscoverer, which applies the distilled structure to identify sets of article pages. InfoDiscoverer also employs the entropy information to analyze the information measures of article sets and to extract informative content blocks from these sets. Our result is useful for search engines, information agents, and crawlers to index, extract, and navigate significant information from a Web site. Experiments on several real news Web sites show that the precision and the recall of our approaches are much superior to those obtained by conventional methods in mining the informative structures of news Web sites. On average, the augmented LAMIS leads to prominent performance improvement and increases the precision by a factor ranging from 122 to 257 percent when the desired recall falls between 0.5 and 1. In comparison with manual heuristics, the precision and the recall of InfoDiscoverer are greater than 0.956.
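As a rough illustration of the entropy idea (the paper's exact measures differ), the Shannon entropy of the anchor-text distribution over a page's in-links separates navigational targets, reached through one repeated anchor, from informative ones reached through varied anchors.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

# Anchor texts observed on in-links of two pages of a hypothetical news site.
anchors = {
    "/toc/politics": Counter({"politics": 40, "election coverage": 5, "vote 2004": 3}),
    "/nav/home":     Counter({"home": 50}),  # one repeated navigational anchor
}
for page, dist in anchors.items():
    print(f"{page}: anchor entropy = {entropy(dist):.2f} bits")
```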

Proceedings ArticleDOI
17 May 2004
TL;DR: This paper proposes a unified link analysis framework, called "link fusion", which considers both the inter- and intra-type link structure among multiple types of inter-related data objects and brings order to the objects of each data type at the same time.
Abstract: Web link analysis has proven to be a significant enhancement for quality-based web search. Most existing links can be classified into two categories: intra-type links (e.g., web hyperlinks), which represent the relationship of data objects within a homogeneous data type (web pages), and inter-type links (e.g., user browsing logs), which represent the relationship of data objects across different data types (users and web pages). Unfortunately, most link analysis research only considers one type of link. In this paper, we propose a unified link analysis framework, called "link fusion", which considers both the inter- and intra-type link structure among multiple types of inter-related data objects and brings order to the objects of each data type at the same time. The PageRank and HITS algorithms are shown to be special cases of our unified link analysis framework. Experiments on an instantiation of the framework that makes use of the user data and web pages extracted from a proxy log show that our proposed algorithm could improve search effectiveness over the HITS and DirectHit algorithms by 24.6% and 38.2%, respectively.
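A toy sketch of the fusion idea, under assumed link matrices and an assumed mixing weight alpha (the paper's framework is more general): page scores are propagated both over hyperlinks (intra-type) and from user visits (inter-type), while user scores derive from the pages they visit.

```python
import numpy as np

# Toy data for 2 users and 3 pages. A holds intra-type links (page hyperlinks);
# B holds inter-type links (user -> page visits from a browsing log).
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
B = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=float)

page = np.ones(3) / 3   # page importance scores
user = np.ones(2) / 2   # user importance scores
alpha = 0.5             # assumed mixing weight between intra- and inter-type evidence

for _ in range(50):
    # Pages gain score from pages linking to them (intra) and users visiting them (inter).
    page = alpha * A.T @ page + (1 - alpha) * B.T @ user
    user = B @ page     # users gain score from the pages they visit
    page /= np.linalg.norm(page)
    user /= np.linalg.norm(user)

print("page scores:", page.round(3), " user scores:", user.round(3))
```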

Patent
10 Nov 2004
TL;DR: In this article, a method and apparatus for prefetching electronic data (e.g., a web page, HTML, a document, an image) viewable in a browser is provided.
Abstract: A method and apparatus are provided for prefetching electronic data (e.g., a web page, HTML, a document, an image) viewable in a browser. When a browser is opened to a web page (or other form of electronic data) that contains links (e.g., hyperlinks) to other content, content described by one or more of the links is prefetched. In particular, the content is retrieved before a user operating the browser selects any of the links. As a result, an enhanced browsing window can be very rapidly displayed when the user does select one of the prefetched links. Links on the browser page may be selected and/or prioritized for prefetching in several ways—by automatically selecting some or all links, by using a template customized for the page, by applying heuristics to identify links meeting certain criteria, etc.
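A minimal sketch of this prefetching flow, assuming the simplest selection policy of taking the first few absolute links (the patent describes richer template- and heuristic-based selection):

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collect absolute href targets of anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href", "")
        if tag == "a" and href.startswith("http"):
            self.links.append(href)

def prefetch(page_url, limit=5):
    """Fetch a page, then retrieve its first few links before any click."""
    collector = LinkCollector()
    collector.feed(urlopen(page_url).read().decode("utf-8", "replace"))
    cache = {}
    with ThreadPoolExecutor(max_workers=limit) as pool:
        futures = {url: pool.submit(urlopen, url)
                   for url in collector.links[:limit]}
    for url, future in futures.items():
        try:
            cache[url] = future.result().read()  # content ready before any click
        except OSError:
            pass  # skip links that fail to load
    return cache

# cache = prefetch("https://example.com/")  # usage; requires network access
```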

Patent
Vibhu Mittal
30 Dec 2004
TL;DR: In this article, the authors present a system for the generation of hyperlinks and anchor text from data such as reference text in HTML and in non-HTML documents, which is based on a respective statistical model of text formatting and lexical cues.
Abstract: Systems and methods for generation of hyperlinks and anchor text from data such as reference text in HTML and in non-HTML documents are disclosed. The method generally includes locating a text reference in a source document, searching using a search engine for a target document relating to the text reference, computing anchor text from the text reference, generating a hyperlink to the target document, and associating the hyperlink with the computed anchor text. The locating and/or computing may be based on a respective statistical model of text formatting and/or lexical cues. The text reference may be parsed into pieces such that the searching, computing, generating, and associating are performed for each piece of text. The source document may be an HTML or non-HTML document. The text reference may be a reference to, for example, a paper, article, company, institution, product, search engine, image, object, and geographical location.

Journal ArticleDOI
TL;DR: The authors characterize links in the academic environment in order to gain a better understanding of why links are created, taking the interlinkage between the eight Israeli universities as a case study.
Abstract: Link analysis has proved to be very fruitful on the Web; Google's very successful ranking algorithm is based on it. However, only a few studies have analyzed links qualitatively; most studies are quantitative. Our purpose was to characterize links in order to gain a better understanding of why they are created. We limited the study to the academic environment, and as a specific case we chose to characterize the interlinkage between the eight Israeli universities.

Patent
30 Nov 2004
TL;DR: In this article, a method and system for cross-marketing products and services to customers of specific merchant web sites on the Internet by a merchant loyalty service provider web site on the internet, provides a hyperlink to a merchant LSP web site from a merchant web site whereby customers can access the merchant loyalty web site directly from the merchant Web site.
Abstract: A method and system for cross-marketing products and services to customers of specific merchant web sites on the Internet by a merchant loyalty service provider web site on the Internet, provides a hyperlink to a merchant loyalty service provider web site from a merchant web site whereby customers can access the merchant loyalty service provider web site directly from the merchant web site. The merchant loyalty service provider web site makes to customers of the merchant customer loyalty benefit offers related to the business of said merchant web site, to build customer loyalty to the merchant web site.

Proceedings ArticleDOI
19 May 2004
TL;DR: It is argued that PageRank and HITS algorithms miss an important dimension of the Web, the temporal dimension, and a number of methods are proposed to deal with the problem.
Abstract: Web search is probably the single most important application on the Internet. The most famous search techniques are perhaps the PageRank and HITS algorithms. These algorithms are motivated by the observation that a hyperlink from a page to another is an implicit conveyance of authority to the target page. They exploit this social phenomenon to identify quality pages, e.g., "authority" pages and "hub" pages. In this paper we argue that these algorithms miss an important dimension of the Web, the temporal dimension. The Web is not a static environment. It changes constantly. Quality pages in the past may not be quality pages now or in the future. These techniques favor older pages because these pages have many in-links accumulated over time. New pages, which may be of high quality, have few or no in-links and are left behind. Bringing new and quality pages to users is important because most users want the latest information. Research publication search has exactly the same problem. This paper studies the temporal dimension of search in the context of research publication search. We propose a number of methods to deal with the problem. Our experimental results show that these methods are highly effective.
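One simple instance of a temporal correction, offered as an illustration rather than as one of the paper's specific methods, is to decay each in-link or citation by its age so that recent attention outweighs long-accumulated counts:

```python
import math

def time_weighted_score(inlink_years, now, half_life=2.0):
    """Score in-links with exponential decay by age (half-life in years),
    so recent attention outweighs long-accumulated link counts."""
    decay = math.log(2) / half_life
    return sum(math.exp(-decay * (now - year)) for year in inlink_years)

old_page = [1998, 1999, 2000, 2000, 2001]  # many old in-links
new_page = [2003, 2004, 2004]              # fewer, but recent
print(f"old: {time_weighted_score(old_page, 2004):.2f}, "
      f"new: {time_weighted_score(new_page, 2004):.2f}")  # the new page wins
```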

Book ChapterDOI
31 Aug 2004
TL;DR: The preliminary experiments on a real data set demonstrate that the proposed distributed search engine framework achieves accuracy on PageRank vectors comparable to Google's well-known PageRank algorithm and, therefore, high quality of query results.
Abstract: Existing Internet search engines use web crawlers to download data from the Web. Page quality is measured on central servers, where user queries are also processed. This paper argues that using crawlers has a number of disadvantages. Most importantly, crawlers do not scale. Even Google, the leading search engine, indexes less than 1% of the entire Web. This paper proposes a distributed search engine framework, in which every web server answers queries over its own data. Results from multiple web servers are merged to generate a ranked hyperlink list on the submitting server. This paper presents a series of algorithms that compute PageRank in such a framework. The preliminary experiments on a real data set demonstrate that the system achieves accuracy on PageRank vectors comparable to Google's well-known PageRank algorithm and, therefore, high quality of query results.

Patent
22 Nov 2004
TL;DR: In this paper, a method and system for integrating a digital map system with a source document is disclosed including detecting a location description in the source document, and replacing the detected location description with a hyperlink linking to a depiction of the location description.
Abstract: A method and system for integrating a digital map system with a source document is disclosed including detecting a location description (110) in the source document, and replacing the detected location description (110) with a hyperlink linking to a depiction of the location description (110). Another embodiment may include a method and system for integrating a digital map system with a source document including detecting a location description (110) in a source document, verifying that the location description (110) describes an actual location, and integrating a hyperlink linking a depiction of the location description (110) into the source document.
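A hedged sketch of the detect-and-replace step, with a deliberately narrow regular expression and an assumed map endpoint; real location detection and the verification step would be far broader:

```python
import re
from urllib.parse import quote_plus

# Deliberately narrow "street, city, state" pattern for the demo.
ADDRESS = re.compile(r"\d+\s+[A-Z][a-z]+\s+(?:St|Ave|Rd)\.?,\s*[A-Z][a-z]+,\s*[A-Z]{2}")

def link_locations(html, map_base="https://maps.example/?q="):  # assumed map endpoint
    """Replace each detected location description with a hyperlink to a map view."""
    return ADDRESS.sub(
        lambda m: f'<a href="{map_base}{quote_plus(m.group(0))}">{m.group(0)}</a>', html)

doc = "Visit us at 1600 Main St., Springfield, IL for the open house."
print(link_locations(doc))
```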

Journal Article
TL;DR: In this article, the authors focus on whether there is an international bias in the Internet Archive's coverage of the Web and find that there are indeed large national differences in the archive's coverage.

Book ChapterDOI
14 Mar 2004
TL;DR: An overview of the most popular methodologies and implementations for clustering either Web users or Web sources is presented, together with a survey of the current status and future trends in clustering employed over the Web.
Abstract: Clustering is a challenging topic in the area of Web data management. Various forms of clustering are required in a wide range of applications, including finding mirrored Web pages, detecting copyright violations, and reporting search results in a structured way. Clustering can either be performed once offline (independently of search queries), or online (on the results of search queries). Important efforts have focused on mining Web access logs and on clustering search engine results on the fly. Online methods based on link structure and text have been applied successfully to finding pages on related topics. This paper presents an overview of the most popular methodologies and implementations in terms of clustering either Web users or Web sources and presents a survey about current status and future trends in clustering employed over the Web.

Patent
15 Dec 2004
TL;DR: A display parameter that denotes a level of prominence usable to affect the display of a hyperlink is defined; the prominence score used to determine the display parameter may be calculated by dividing a count of click-throughs by a count of page views of the page containing the hyperlink.
Abstract: Prominence data resulting from user interaction with a hyperlink on a page may be stored in one or more repositories. Prominence data may be collected automatically when the user views a page or clicks on a hyperlink without requiring separate action from the user to accomplish storage or retrieval. The prominence data may further include a display parameter that denotes a level of prominence that is usable to affect the display of a hyperlink. A prominence score used to determine the display parameter may be calculated by dividing a count of click-throughs by a count of page views of the page containing the hyperlink. Other score calculations may be used. User feedback may be incorporated into the score calculation and may be collected through a toolbar, popup window, or other feedback-collecting mechanism.
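In its simplest form the score is just clicks divided by views. The sketch below adds small pseudo-counts so that links on rarely viewed pages do not get extreme scores; the smoothing priors are an assumption, not part of the patent.

```python
def prominence_score(clicks, views, prior_clicks=1, prior_views=20):
    """Click-throughs divided by page views, with small pseudo-counts so links
    on rarely viewed pages do not get extreme scores (smoothing is assumed)."""
    return (clicks + prior_clicks) / (views + prior_views)

# A hyperlink clicked 30 times over 200 views of its containing page:
print(f"prominence: {prominence_score(30, 200):.3f}")  # ~0.141
```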

Journal ArticleDOI
TL;DR: There are indeed large national differences in the archive's coverage of the Web, and the bias is unintentional, so researchers using the archive in the future need to be aware of this problem.

Book ChapterDOI
22 Aug 2004
TL;DR: In this article, the authors proposed to personalize PageRank vectors by weighting links based on the match between hyperlinks and user profiles, where each feature corresponds to a set of one or more DNS tree nodes.
Abstract: Personalized search has gained great popularity to improve search effectiveness in recent years. The objective of personalized search is to provide users with information tailored to their individual contexts. We propose to personalize Web search based on features extracted from hyperlinks, such as anchor terms or URL tokens. Our methodology personalizes PageRank vectors by weighting links based on the match between hyperlinks and user profiles. In particular, here we describe a profile representation using Internet domain features extracted from URLs. Users specify interest profiles as binary vectors where each feature corresponds to a set of one or more DNS tree nodes. Given a profile vector, a weighted PageRank is computed assigning a weight to each URL based on the match between the URL and the profile. We present promising results from an experiment in which users were allowed to select among nine URL features combining the top two levels of the DNS tree, leading to 2^9 pre-computed PageRank vectors from a Yahoo crawl. Personalized PageRank performed favorably compared to pure similarity-based ranking and traditional PageRank.
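A simplified sketch of profile-weighted PageRank (toy graph, a single crude domain feature, and an assumed weight function, none of which match the paper's exact setup):

```python
def weighted_pagerank(out_links, url_weight, d=0.85, iters=50):
    """PageRank where each link's transition probability is scaled by how well
    its target URL matches the user's profile."""
    nodes = list(out_links)
    pr = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - d) / len(nodes) for n in nodes}
        for u, targets in out_links.items():
            weights = {v: url_weight(v) for v in targets}
            total = sum(weights.values())
            for v, w in weights.items():
                if total:
                    nxt[v] += d * pr[u] * w / total
        pr = nxt
    return pr

# Profile: the user prefers .edu domains (a single, crude feature for the demo).
profile_weight = lambda url: 2.0 if url.endswith(".edu") else 1.0
graph = {"a.com": ["b.edu", "c.com"], "b.edu": ["c.com"], "c.com": ["a.com", "b.edu"]}
ranks = weighted_pagerank(graph, profile_weight)
print({page: round(score, 3) for page, score in ranks.items()})
```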

Journal ArticleDOI
Einat Amitay, David Carmel, Michael Herscovici, Ronny Lempel, Aya Soffer
TL;DR: It is predicted that by using more robust methods for tracking modifications in the content of pages, search engines will be able to provide results that are more timely and better reflect current real-life trends than those they provide today.
Abstract: Although time has been recognized as an important dimension in the co-citation literature, to date it has not been incorporated into the analogous process of link analysis on the Web. In this paper, we discuss several aspects and uses of the time dimension in the context of Web information retrieval. We describe the ideal case, where search engines track and store temporal data for each of the pages in their repository, assigning timestamps to the hyperlinks embedded within the pages. We introduce several applications which benefit from the availability of such timestamps. To demonstrate our claims, we use a somewhat simplistic approach, which dates links by approximating the age of the page's content. We show that by using this crude measure alone it is possible to detect and expose significant events and trends. We predict that by using more robust methods for tracking modifications in the content of pages, search engines will be able to provide results that are more timely and better reflect current real-life trends than those they provide today.

Journal ArticleDOI
TL;DR: In this paper, the authors apply emerging network theory to the use of hyperlinks in journalism stories on the Web and examine a five-year data set, including almost 1,500 Web news stories.
Abstract: This study applies emerging network theory to the use of hyperlinks in journalism stories on the Web. A five-year data set, including almost 1,500 Web news stories, is examined. The study concludes...

Patent
21 Sep 2004
TL;DR: In this paper, a system for assisting a user to determine whether a hyperlink to a target uniform resource locator (URL) is spoofed is presented, where a computerized system having a display unit is provided and logic ( 158 ) therein listens for activation of the hyperlink ( 152 ) in a message.
Abstract: A system ( 50, 150 ) for assisting a user ( 14 ) to determine whether a hyperlink ( 152 ) to a target uniform resource locator (URL) is spoofed. A computerized system having a display unit is provided and logic ( 158 ) therein listens for activation of the hyperlink ( 152 ) in a message ( 154 ). The logic ( 158 ) extracts an originator identifier ( 102 ) and encrypted data from the hyperlink ( 152 ), and decrypts the encrypted data into decrypted data based on the originator identifier ( 102 ). The logic ( 158 ) determines whether the hyperlink ( 152 ) includes the originator identifier ( 102 ) and the encrypted data decrypts successfully. Responsive to this it then presents a confirmation of authentication conveying the name of the owner and the domain name of the target URL on the display unit, and it redirects the user ( 14 ) to the target URL. Otherwise, it presents a warning dialog to the user ( 14 ) on the display unit.
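The patent verifies a link by decrypting data embedded in it; as a stand-in with the same verify-before-redirect flow, the sketch below signs the target URL with a per-originator HMAC key (the key table and URL format are assumptions, not the patent's scheme):

```python
import hashlib
import hmac

SECRET_KEYS = {"acme-mail": b"shared-secret-for-acme"}  # keyed by originator id

def sign_link(originator, target_url):
    """Embed the originator id and a MAC of the target URL in the hyperlink."""
    mac = hmac.new(SECRET_KEYS[originator], target_url.encode(), hashlib.sha256)
    return f"{target_url}?orig={originator}&sig={mac.hexdigest()}"

def verify_link(originator, target_url, sig):
    """Check the MAC before redirecting; a mismatch suggests a spoofed target."""
    mac = hmac.new(SECRET_KEYS[originator], target_url.encode(), hashlib.sha256)
    return hmac.compare_digest(mac.hexdigest(), sig)

link = sign_link("acme-mail", "https://acme.example/login")
sig = link.split("sig=")[1]
print(verify_link("acme-mail", "https://acme.example/login", sig))  # True
print(verify_link("acme-mail", "https://evil.example/login", sig))  # False: spoofed
```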

Book ChapterDOI
16 May 2004
TL;DR: A ranking algorithm that uses the logs of search engines to boost their retrieval quality is proposed; it is based on a clustering process in which groups of semantically similar queries are identified.
Abstract: Over the past few years, there has been a great deal of research on the use of content and links of Web pages to improve the quality of Web page rankings returned by search engines. However, few formal approaches have considered the use of search engine logs to improve the rankings. In this paper we propose a ranking algorithm that uses the logs of search engines to boost their retrieval quality. The relevance of Web pages is estimated using the historical preferences of users that appear in the logs. The algorithm is based on a clustering process in which groups of semantically similar queries are identified. The method proposed is simple, has low computational cost, and we show with experiments that it achieves good results.
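As an illustration of grouping semantically similar queries from logs (a simplification; the paper's clustering differs in detail), queries can be merged when their clicked-URL sets overlap:

```python
from itertools import combinations

# Search-log records: query -> set of URLs users clicked for that query.
log = {
    "laptop reviews":   {"shop.example/laptops", "reviews.example/laptops"},
    "notebook ratings": {"reviews.example/laptops"},
    "apple pie recipe": {"cooking.example/pie"},
}

def cluster_queries(log, threshold=0.3):
    """Greedy single-link clustering: merge queries whose clicked-URL sets
    have Jaccard similarity above the threshold."""
    clusters = [{q} for q in log]
    for q1, q2 in combinations(log, 2):
        jaccard = len(log[q1] & log[q2]) / len(log[q1] | log[q2])
        if jaccard >= threshold:
            c1 = next(c for c in clusters if q1 in c)
            c2 = next(c for c in clusters if q2 in c)
            if c1 is not c2:
                c1 |= c2
                clusters.remove(c2)
    return clusters

print(cluster_queries(log))
```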

Journal ArticleDOI
TL;DR: It is argued that interpretations of web science maps covering multiple disciplines will need to be sensitive to the contexts of the links mapped, and links within a discipline were found to be different in character to links between pages in different disciplines.
Abstract: Hyperlinks between academic web sites, like citations, can potentially be used to map disciplinary structures and identify evidence of connections between disciplines. In this paper we classified a sample of links originating in three different disciplines: maths, physics and sociology. Links within a discipline were found to be different in character to links between pages in different disciplines. There were also disciplinary differences in both types of link. As a consequence, we argue that interpretations of web science maps covering multiple disciplines will need to be sensitive to the contexts of the links mapped.