
Showing papers on "Hyperlink published in 2010"


Proceedings ArticleDOI
26 Oct 2010
TL;DR: The authors designed and implemented TAGME, a system that efficiently and judiciously augments plain text with pertinent hyperlinks to Wikipedia pages. The annotation is extremely informative, so any task currently addressed using the bag-of-words paradigm could benefit from it by drawing upon Wikipedia pages and their interrelations.
Abstract: We designed and implemented TAGME, a system that is able to efficiently and judiciously augment plain text with pertinent hyperlinks to Wikipedia pages. The specialty of TAGME with respect to known systems [5,8] is that it may annotate texts which are short and poorly composed, such as snippets of search-engine results, tweets, news, etc. This annotation is extremely informative, so any task that is currently addressed using the bag-of-words paradigm could benefit from using this annotation to draw upon (the millions of) Wikipedia pages and their inter-relations.
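
As a rough illustration of the anchor-dictionary approach on which annotators like TAGME are built, the Python sketch below spots known anchor strings in a short text and resolves each to its most probable Wikipedia page. The dictionary and its probabilities are invented stand-ins for the link statistics TAGME mines from Wikipedia itself, and the commonness-only disambiguation omits the system's collective-agreement voting.

# Toy wikification in the spirit of TAGME: spot known anchor strings in a
# short text and map each to its most probable Wikipedia page ("commonness").
# The anchor dictionary is a made-up stand-in for statistics that would be
# mined from Wikipedia's own hyperlinks.

ANCHORS = {
    # anchor text -> {candidate page: P(page | anchor)}
    "jaguar": {"Jaguar_(animal)": 0.6, "Jaguar_Cars": 0.4},
    "amazon": {"Amazon_rainforest": 0.65, "Amazon_(company)": 0.35},
}

def annotate(text, min_commonness=0.3):
    """Return (anchor, page) pairs for every dictionary anchor found."""
    annotations = []
    for token in text.lower().split():
        candidates = ANCHORS.get(token.strip(".,"))
        if not candidates:
            continue
        page, commonness = max(candidates.items(), key=lambda kv: kv[1])
        if commonness >= min_commonness:  # prune unlikely senses
            annotations.append((token, page))
    return annotations

print(annotate("The jaguar runs through the amazon"))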

795 citations


Journal ArticleDOI
TL;DR: A web-based server, called Metabolite Set Enrichment Analysis (MSEA), is introduced to help researchers identify and interpret patterns of human or mammalian metabolite concentration changes in a biologically meaningful context.
Abstract: Gene set enrichment analysis (GSEA) is a widely used technique in transcriptomic data analysis that uses a database of predefined gene sets to rank lists of genes from microarray studies to identify significant and coordinated changes in gene expression data. While GSEA has been playing a significant role in understanding transcriptomic data, no similar tools are currently available for understanding metabolomic data. Here, we introduce a web-based server, called Metabolite Set Enrichment Analysis (MSEA), to help researchers identify and interpret patterns of human or mammalian metabolite concentration changes in a biologically meaningful context. Key to the development of MSEA has been the creation of a library of approximately 1000 predefined metabolite sets covering various metabolic pathways, disease states, biofluids, and tissue locations. MSEA also supports user-defined or custom metabolite sets for more specialized analysis. MSEA offers three different enrichment analyses for metabolomic studies including overrepresentation analysis (ORA), single sample profiling (SSP) and quantitative enrichment analysis (QEA). ORA requires only a list of compound names, while SSP and QEA require both compound names and compound concentrations. MSEA generates easily understood graphs or tables embedded with hyperlinks to relevant pathway images and disease descriptors. For non-mammalian or more specialized metabolomic studies, MSEA allows users to provide their own metabolite sets for enrichment analysis. The MSEA server also supports conversion between metabolite common names, synonyms, and major database identifiers. MSEA has the potential to help users identify obvious as well as 'subtle but coordinated' changes among a group of related metabolites that may go undetected with conventional approaches. MSEA is freely available at http://www.msea.ca.
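
Over-representation analysis of this kind is conventionally scored with a hypergeometric test. The sketch below shows that standard formulation on invented compound names; it is not MSEA's actual implementation.

# Over-representation analysis (ORA) for one metabolite set, scored with the
# standard hypergeometric test: how surprising is the overlap between the
# user's compound list and a predefined set, given the reference library?
from scipy.stats import hypergeom

def ora_pvalue(user_compounds, metabolite_set, universe_size):
    hits = len(set(user_compounds) & set(metabolite_set))
    # P(X >= hits) with M = universe size, n = set size, N = list size
    return hypergeom.sf(hits - 1, universe_size,
                        len(metabolite_set), len(user_compounds))

p = ora_pvalue(["glucose", "lactate", "pyruvate"],
               ["glucose", "pyruvate", "citrate", "fumarate"],
               universe_size=1000)
print(f"enrichment p-value: {p:.3g}")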

556 citations


Proceedings ArticleDOI
25 Jul 2010
TL;DR: This paper proposes an efficient solution by modeling networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers, and applies the model on both synthetic data and DBLP data sets to demonstrate the importance of this concept, as well as the effectiveness and efficiency of the proposed approach.
Abstract: Linked or networked data are ubiquitous in many applications. Examples include web data or hypertext documents connected via hyperlinks, social networks or user profiles connected via friend links, co-authorship and citation information, blog data, movie reviews and so on. In these datasets (called "information networks"), closely related objects that share the same properties or interests form a community. For example, a community in the blogosphere could be users mostly interested in cell phone reviews and news. Outlier detection in information networks can reveal important anomalous and interesting behaviors that are not obvious if community information is ignored. An example could be a low-income person being friends with many rich people even though his income is not anomalously low when considered over the entire population. This paper first introduces the concept of community outliers (interesting points or rising stars for a more positive sense), and then shows that well-known baseline approaches without considering links or community information cannot find these community outliers. We propose an efficient solution by modeling networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers. The probabilistic model characterizes both data and links simultaneously by defining their joint distribution based on hidden Markov random fields (HMRF). Maximizing the data likelihood and the posterior of the model gives the solution to the outlier inference problem. We apply the model on both synthetic data and DBLP data sets, and the results demonstrate the importance of this concept, as well as the effectiveness and efficiency of the proposed approach.
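
A heavily simplified sketch of the paper's intuition: model a single numeric attribute with k unit-variance Gaussian communities plus a uniform outlier component, and let each node's label also reward agreement with its neighbors (the HMRF ingredient), updated ICM-style. The data, the lambda weight, and the outlier density are illustrative assumptions, not the paper's model.

import math, random

def community_outliers(x, edges, k=2, lam=0.5, iters=10, outlier_density=0.01):
    n = len(x)
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    z = [random.randrange(k) for _ in range(n)]  # 0..k-1 = community, k = outlier
    for _ in range(iters):
        # M-step: re-estimate each community's mean from its current members
        mu = []
        for c in range(k):
            members = [x[i] for i in range(n) if z[i] == c]
            mu.append(sum(members) / len(members) if members else random.choice(x))
        # ICM step: pick the label maximizing data fit plus neighbor agreement
        for i in range(n):
            best, best_score = k, math.log(outlier_density)  # outlier baseline
            for c in range(k):
                fit = -0.5 * (x[i] - mu[c]) ** 2   # log-Gaussian up to a constant
                link = lam * sum(1 for j in nbrs[i] if z[j] == c)
                if fit + link > best_score:
                    best, best_score = c, fit + link
            z[i] = best
    return z

random.seed(0)
values = [1.0, 1.2, 0.9, 5.0, 5.1, 9.9]   # node 5 fits neither community
edges = [(0, 1), (1, 2), (3, 4), (2, 5), (4, 5)]
print(community_outliers(values, edges))   # node 5 should come out labelled 2 (outlier)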

260 citations


Proceedings ArticleDOI
24 Jul 2010
TL;DR: Three design patterns are presented that address shortcomings in existing MapReduce graph-processing practice and can be used to accelerate a large class of graph algorithms based on message passing, exemplified by PageRank; they are shown to reduce the running time of PageRank on a web graph with 1.4 billion edges by 69%.
Abstract: Graphs are analyzed in many important contexts, including ranking search results based on the hyperlink structure of the world wide web, module detection of protein-protein interaction networks, and privacy analysis of social networks. Many graphs of interest are difficult to analyze because of their large size, often spanning millions of vertices and billions of edges. As such, researchers have increasingly turned to distributed solutions. In particular, MapReduce has emerged as an enabling technology for large-scale graph processing. However, existing best practices for MapReduce graph algorithms have significant shortcomings that limit performance, especially with respect to partitioning, serializing, and distributing the graph. In this paper, we present three design patterns that address these issues and can be used to accelerate a large class of graph algorithms based on message passing, exemplified by PageRank. Experiments show that the application of our design patterns reduces the running time of PageRank on a web graph with 1.4 billion edges by 69%.
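
The message-passing computation those design patterns target looks like the following: each iteration performs the map phase (send rank mass along out-links) and the reduce phase (sum incoming mass per vertex) of one MapReduce job, simulated here in memory on a toy graph without the paper's partitioning optimizations.

# Message-passing PageRank on a toy graph; each loop iteration corresponds
# to one MapReduce job over the graph.

def pagerank(adj, d=0.85, iters=50):
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        incoming = {v: 0.0 for v in adj}
        for v, outs in adj.items():               # "map": emit rank messages
            if outs:
                share = rank[v] / len(outs)
                for w in outs:
                    incoming[w] += share
        rank = {v: (1 - d) / n + d * m            # "reduce": combine messages
                for v, m in incoming.items()}
    return rank

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(graph))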

188 citations


01 Jan 2010
TL;DR: Top Leaders is a new community mining approach that regards a community as a set of followers congregating around a potential leader, and starts by identifying promising leaders in a given network then iteratively assembles followers to their closest leaders to form communities.
Abstract: Much of the data of scientific interest, particularly when independence of data is not assumed, can be represented in the form of information networks where data nodes are joined together to form edges corresponding to some kind of associations or relationships. Such information networks abound, like protein interactions in biology, web page hyperlink connections in information retrieval on the Web, cellphone call graphs in telecommunication, co-authorships in bibliometrics, crime event connections in criminology, etc. All these networks, also known as social networks, share a common property, the formation of connected groups of information nodes, called community structures. These groups are densely connected nodes with sparse connections outside the group. Finding these communities is an important task for the discovery of underlying structures in social networks, and has recently attracted much attention in data mining research. In this paper, we present Top Leaders, a new community mining approach that, simply put, regards a community as a set of followers congregating around a potential leader. Our algorithm starts by identifying promising leaders in a given network then iteratively assembles followers to their closest leaders to form communities, and subsequently finds new leaders in each group around which to gather followers again until convergence. Our intuitions are based on proven observations in social networks and the results are very promising. Experimental results on benchmark networks verify the feasibility and effectiveness of our new community mining approach.
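
A rough sketch of the leader/follower loop described above: seed k leaders by degree, attach every node to its nearest leader by shortest-path hops, promote the most locally connected member of each group, and repeat until the leaders stabilize. The leader-promotion rule here is a simple stand-in for the paper's centrality measure.

from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def top_leaders(adj, k=2, iters=10):
    leaders = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
    for _ in range(iters):
        dists = {l: bfs_dist(adj, l) for l in leaders}
        groups = {l: [] for l in leaders}
        for v in adj:  # assign each node to its closest leader
            best = min(leaders, key=lambda l: dists[l].get(v, float("inf")))
            groups[best].append(v)
        # new leader of each group: member with the most in-group neighbours
        new = [max(g, key=lambda v: sum(1 for u in adj[v] if u in g))
               for g in groups.values() if g]
        if set(new) == set(leaders):
            break
        leaders = new
    return groups

g = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5],
     5: [4, 6, 7], 6: [5, 7], 7: [5, 6]}
print(top_leaders(g))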

122 citations


Proceedings ArticleDOI
26 Apr 2010
TL;DR: Facetedpedia is a faceted retrieval system for information discovery and exploration in Wikipedia that builds upon the collaborative vocabulary in Wikipedia, more specifically the intensive internal structures (hyperlinks) and folksonomy (category system).
Abstract: This paper proposes Facetedpedia, a faceted retrieval system for information discovery and exploration in Wikipedia. Given the set of Wikipedia articles resulting from a keyword query, Facetedpedia generates a faceted interface for navigating the result articles. Compared with other faceted retrieval systems, Facetedpedia is fully automatic and dynamic in both facet generation and hierarchy construction, and the facets are based on the rich semantic information from Wikipedia. The essence of our approach is to build upon the collaborative vocabulary in Wikipedia, more specifically the intensive internal structures (hyperlinks) and folksonomy (category system). Given the sheer size and complexity of this corpus, the space of possible choices of faceted interfaces is prohibitively large. We propose metrics for ranking individual facet hierarchies by the user's navigational cost, and metrics for ranking interfaces (each with k facets) by both their average pairwise similarities and average navigational costs. We thus develop faceted interface discovery algorithms that optimize the ranking metrics. Our experimental evaluation and user study verify the effectiveness of the system.

113 citations


Patent
22 Dec 2010
TL;DR: In this paper, the authors propose a method to identify one or more sponsored web pages in response to a search query, where each sponsored web page is associated with a hyperlink and the response further includes a visual tag or a reference to the visual tag for the hyperlink if the web page has been accessed by at least one user connected to the searcher within a threshold degree of separation.
Abstract: Particular embodiments access a search query submitted by a first user; identify one or more sponsored web pages in response to the search query, wherein each sponsored web page is associated with a hyperlink; determine whether one or more of the sponsored web pages has been accessed by one or more second users, wherein the one or more second users are connected in a graph structure to the first user within a threshold degree of separation; and send a response comprising a hyperlink for at least one of the sponsored web pages in response to the search query, wherein the response further includes a visual tag or a reference to the visual tag for the hyperlink if the sponsored web page has been accessed by at least one of the one or more second users.
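
The claim hinges on testing whether a sponsored page was visited by someone within a threshold degree of separation of the searcher; a breadth-first search over the friend graph is the direct way to evaluate that predicate. The toy graph and names below are illustrative.

from collections import deque

def within_degree(friends, start, targets, max_degree=2):
    """True if any user in `targets` is reachable from `start` in <= max_degree hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        user, depth = frontier.popleft()
        if user in targets and user != start:
            return True
        if depth < max_degree:
            for friend in friends.get(user, ()):
                if friend not in seen:
                    seen.add(friend)
                    frontier.append((friend, depth + 1))
    return False

graph = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"]}
print(within_degree(graph, "alice", {"carol"}))   # True: carol is 2 hops away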

104 citations


Proceedings ArticleDOI
19 Jul 2010
TL;DR: A temporal web link-based ranking scheme that incorporates features from historical author activities, built on a temporal web graph composed of multiple web snapshots at different time points; it improves upon PageRank in both relevance and freshness of the search results.
Abstract: The collective contributions of billions of users across the globe each day result in an ever-changing web. In verticals like news and real-time search, recency is an obvious significant factor for ranking. However, traditional link-based web ranking algorithms typically run on a single web snapshot without concern for user activities associated with the dynamics of web pages and links. Therefore, a stale page popular many years ago may still achieve a high authority score due to its accumulated in-links. To remedy this situation, we propose a temporal web link-based ranking scheme, which incorporates features from historical author activities. We quantify web page freshness over time from page and in-link activity, and design a web surfer model that incorporates web freshness, based on a temporal web graph composed of multiple web snapshots at different time points. It includes authority propagation among snapshots, enabling link structures at distinct time points to influence each other when estimating web page authority. Experiments on a real-world archival web corpus show our approach improves upon PageRank in both relevance and freshness of the search results.
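
One minimal way to realize a freshness-aware surfer, under assumptions simpler than the paper's full multi-snapshot model: bias the teleportation vector of PageRank toward recently updated pages, so a stale page with many old in-links loses authority. The half-life constant and timestamps below are invented.

# Freshness-biased random surfer: teleportation prefers recently updated pages.

def fresh_pagerank(adj, last_update, now, d=0.85, half_life=180.0, iters=50):
    # freshness in (0, 1], halving every `half_life` days since last update
    fresh = {v: 0.5 ** ((now - last_update[v]) / half_life) for v in adj}
    total = sum(fresh.values())
    tele = {v: fresh[v] / total for v in adj}   # freshness-weighted teleport
    rank = dict(tele)
    for _ in range(iters):
        incoming = {v: 0.0 for v in adj}
        for v, outs in adj.items():
            for w in outs:
                incoming[w] += rank[v] / len(outs)
        rank = {v: (1 - d) * tele[v] + d * incoming[v] for v in adj}
    return rank

adj = {"old": ["fresh"], "fresh": ["old"], "new": ["fresh"]}
days = {"old": 0, "fresh": 300, "new": 360}     # day of last update
print(fresh_pagerank(adj, days, now=365))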

69 citations


Patent
13 Dec 2010
TL;DR: In this article, a method of conducting operations for a social network application comprises: generating a notification list of recent activities of users of the social network application, wherein the notification list includes (1) at least one activity within the social network application of a first user and (2) a hyperlink to an offer involving an activity that is directly related to at least one activity of the first user.
Abstract: In one embodiment, a method of conducting operations for a social network application, comprises: generating a notification list of recent activities of users of the social network application, wherein the notification list includes (1) at least one activity within the social network application of a first user and (2) at least one hyperlink to an offer involving an activity that is directly related to at least one activity of the first user, wherein an account of the first user defines at least one notification rule for controlling visibility of the at least one activity to other users of the social network application; and providing the notification list to a second user, that is a friend of the first user within the social network application, according to the at least one notification rule of the first user.

62 citations


Patent
08 Jun 2010
TL;DR: In this paper, the authors detect a request made by a mobile access device to access a web page that includes a display element, determine a geographic location of the mobile access device in response to the request, select enhanced content to be associated with the display element in accordance with the geographic location, and dynamically hyperlink the display element to the enhanced content.
Abstract: An exemplary method includes detecting a request made by a mobile access device to access a web page that includes a display element, determining a geographic location of the mobile access device in response to the request, selecting enhanced content to be associated with the display element in accordance with the geographic location, and dynamically hyperlinking the display element to the enhanced content. Corresponding methods and systems are also described.

56 citations


Patent
09 Apr 2010
TL;DR: In this paper, a document is received via a communications interface and an entity pair is determined by a processor, the entity pair includes a concept included in a concept taxonomy and a textual representation included in the document.
Abstract: Techniques for including a hyperlink in a document are disclosed. A document is received via a communications interface. An entity pair is determined by a processor. The entity pair includes a concept included in a concept taxonomy and a textual representation included in the document. As output, a hyperlink is provided.

Journal ArticleDOI
TL;DR: This paper examined the structure of the international network created by news media and found that information continues to flow from a handful of countries to the rest of the world, with news media preferring to link to established information sources, typically in core countries.
Abstract: This study takes a network approach to examining international communication. Building upon the world system theory and the preferential attachment network theorem, the structure of the international network created by news media is examined. The use of external hyperlinks in 6,298 foreign stories in 20 languages from 223 news Web sites in 73 countries was examined. Findings revealed that information continues to flow from a handful of countries to the rest of the world. News media preferred linking to established information sources, typically in core countries. This study concludes that news media use new technology to replicate old practices.

Journal ArticleDOI
TL;DR: This paper presents an efficient spam detection system based on a classifier that combines new link-based features with language-model (LM)-based ones, and applies an LM approach to different sources of information from a Web page that belongs to the context of a link, in order to provide high-quality indicators of Web spam.
Abstract: Web spam is a serious problem for search engines because the quality of their results can be severely degraded by the presence of this kind of page. In this paper, we present an efficient spam detection system based on a classifier that combines new link-based features with language-model (LM)-based ones. These features are not only related to quantitative data extracted from the Web pages, but also to qualitative properties, mainly of the page links. We consider, for instance, the ability of a search engine to find, using information provided by the page for a given link, the page that the link actually points at. This can be regarded as indicative of the link reliability. We also check the coherence between a page and another one pointed at by any of its links. Two pages linked by a hyperlink should be semantically related, by at least a weak contextual relation. Thus, we apply an LM approach to different sources of information from a Web page that belongs to the context of a link, in order to provide high-quality indicators of Web spam. We have specifically applied the Kullback-Leibler divergence on different combinations of these sources of information in order to characterize the relationship between two linked pages. The result is a system that significantly improves the detection of Web spam using fewer features, on two large and public datasets such as WEBSPAM-UK2006 and WEBSPAM-UK2007.
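
The divergence feature can be sketched as a smoothed unigram KL divergence between the language model of a link's context and that of the target page; large values flag semantically unrelated (potentially spammy) link pairs. The Laplace smoothing below is a common choice, not necessarily the authors' exact scheme.

import math
from collections import Counter

def kl_divergence(source_text, target_text, alpha=0.1):
    """KL(P_source || P_target) over Laplace-smoothed unigram models."""
    src, tgt = Counter(source_text.split()), Counter(target_text.split())
    vocab = set(src) | set(tgt)
    n_src, n_tgt, v = sum(src.values()), sum(tgt.values()), len(vocab)
    kl = 0.0
    for w in vocab:
        p = (src[w] + alpha) / (n_src + alpha * v)
        q = (tgt[w] + alpha) / (n_tgt + alpha * v)
        kl += p * math.log(p / q)
    return kl

# An off-topic link context diverges far more than an on-topic one.
print(kl_divergence("cheap pills buy now", "conference on information retrieval"))
print(kl_divergence("information retrieval papers", "conference on information retrieval"))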

Journal ArticleDOI
TL;DR: The role of self-regulation in strategies that readers use to decide the order in which to read the different sections of a hypertext is explored, to try to explain why some readers select hyperlinks based on strategies that lead to lower levels of comprehension.
Abstract: This article explores the role of self-regulation in strategies that readers use to decide the order in which to read the different sections of a hypertext. This study explored 3 main strategies for link selection based on (a) link screen position, (b) link interest, and (c) the semantic relation of a link with the section just read. This study followed Winne's (1995, 2001) model of self-regulated learning to try to explain why some readers select hyperlinks based on strategies that lead to lower levels of comprehension (i.e., screen position and personal interest). Results from 2 studies revealed that readers with low prior knowledge base their decisions on what to read next on a default screen position or on link interest more often if they are instructed to set a low learning goal, if they regularly use shallow learning strategies (e.g., memorizing), or if they are poor at calibrating their comprehension. Readers' link selection strategies mediated the effect of the self-regulation variables studied on...

Patent
13 Apr 2010
TL;DR: In this article, a computer-implemented method for survey management is disclosed, where purchase data is received via a computer network from a first digital device coupled to the computer network and a targeted survey is generated based on the purchase data.
Abstract: A computer-implemented method for survey management is disclosed. Purchase data is received via a computer network from a first digital device coupled to the computer network. A targeted survey is generated based on the purchase data. A web link associated with the targeted survey is transmitted to a second digital device coupled to the computer network. A survey response is received via the web link from the second digital device. A weight is assigned to the survey response. The weighted survey response is transmitted for display on a third device coupled to the network.

Journal Article
TL;DR: The Hyperlinked Society: Questioning Connections in the Digital Age, as discussed by the authors, is an attempt at understanding the importance of links in the digital age; it is built on the idea that links do not just connect information but are becoming something in and of themselves.
Abstract: * The Hyperlinked Society: Questioning Connections in the Digital Age. Joseph Turow and Lokman Tsui, eds. Ann Arbor, MI: University of Michigan Press, 2008. 319 pp. $80 hbk. $26.95 pbk. It was in 2006 when a New York Times Magazine piece suggested that the hyperlink may be one of the most important inventions of the past fifty years. Yes, that humble little link that helps people move around the Internet at lightning speed was just as important as the Internet itself. Maybe even more important. After all, what good would the Internet be if people could not move around it? If people had to put in computer codes every time they wanted to go somewhere, the whole thing would slow to a crawl. In fact, links are such an important part of the Internet that most people would scarcely recognize that a link is something different from the Internet. Links are something different. Also, they are relatively new. That is despite the fact that people for centuries have been seeking ways to connect information. Indexes have been around as long as humans have been writing books. Footnotes and endnotes have existed for centuries. Libraries would essentially be useless without a system for finding all the stuff on the shelves. So links may be new, but their ancestors are not. In this information age when people are bombarded with data on a minute-byminute basis, it is the links that help us navigate through the flood out there in the world. It is no wonder, then, that the humble link has become the subject of scholarly research. The Hyperlinked Society: Questioning Connections in the Digital Age is one attempt at understanding the importance of links. The book is derived from papers presented at a conference of the same name conducted in June 2006 at the Annenberg School for Communications at the University of Pennsylvania, where the co-editors are an associate dean and a doctoral student, respectively. The goal of the conference was to bring together different kinds of scholars to consider the role that links play in people's lives. These were not just experts in computer science, but rather people engaged in everything from cartography to entertainment blogs. What brought them all together was the humble little hyperlink. The most interesting part of the book is the vast array of things that most people take for granted. The link is not just connecting information; it is also becoming something in and of itself. When a link is created, there is an understanding of acceptance. A Web page is relevant and important when someone links to it. The more people who link to a Web page, the more legitimacy it has. After all, most search engines such as Google use links to rank Web sites. …

Proceedings Article
11 Jul 2010
TL;DR: This paper proposes a model for online thread retrieval based on inference networks that utilizes the structural properties of forum threads and empirically shows the effectiveness of the proposed model using real-world data.
Abstract: Online forums contain valuable human-generated information. End-users looking for information would like to find only those threads in forums where relevant information is present. Because forum pages differ in distinctive ways from generic web pages, special techniques are required to organize and search for information in these forums. Threads and pages in forums are different from other webpages in their hyperlinking patterns. Forum posts also have associated social and non-textual metadata. In this paper, we propose a model for online thread retrieval based on inference networks that utilizes the structural properties of forum threads. We also investigate the effects of incorporating various relevance indicators in our model. We empirically show the effectiveness of our proposed model using real-world data.

Journal ArticleDOI
TL;DR: In this paper, a study on hyperlink analysis and the algorithms used for link analysis in the Web Information retrieval was done and the convergence of the PageRank values are shown in a chart form.
Abstract: Problem statement: A study on hyperlink analysis and the algorithms used for link analysis in the Web Information retrieval was done. Approach: This research was initiated because of the dependability of search engines for information retrieval in the web. Understand the web structure mining and determine the importance of hyperlink in web information retrieval particularly using the Google Search engine. Hyperlink analysis was important methodology used by famous search engine Google to rank the pages. Results: The different algorithms used for link analysis like PageRank (PR), Weighted PageRank (WPR) and Hyperlink-Induced Topic Search (HITS) algorithms are discussed and compared. PageRank algorithm was implemented using a Java program and the convergence of the PageRank values are shown in a chart form. Conclusion: This study was done basically to explore the link structure algorithms for ranking and compare those algorithms. The further research on this area will be problems facing PageRank algorithm and how to handle those problems.
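
A Python equivalent of the study's exercise (theirs was a Java program): power-iteration PageRank that prints the per-iteration L1 change, which is the quantity whose decay the convergence chart would plot.

# Power-iteration PageRank with its convergence trace printed.

def pagerank(adj, d=0.85, tol=1e-6):
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    while True:
        new = {}
        for v in adj:
            mass = sum(rank[u] / len(adj[u]) for u in adj if v in adj[u])
            new[v] = (1 - d) / n + d * mass
        delta = sum(abs(new[v] - rank[v]) for v in adj)
        rank = new
        print(f"L1 change: {delta:.2e}")          # decays geometrically
        if delta < tol:
            return rank

print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))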

Journal ArticleDOI
TL;DR: A structural reranking approach to ad-hoc retrieval that applies to settings with no hyperlink information is proposed and the merits of the language-model-based method for inducing interdocument links are demonstrated by comparing it to previously suggested notions of interdocument similarities.
Abstract: The ad hoc retrieval task is to find documents in a corpus that are relevant to a query. Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural reranking approach to ad-hoc retrieval that applies to settings with no hyperlink information. We reorder the documents in an initially retrieved set by exploiting implicit asymmetric relationships among them. We consider generation links, which indicate that the language model induced from one document assigns high probability to the text of another. We study a number of reranking criteria based on measures of centrality in the graphs formed by generation links, and show that integrating centrality into standard language-model-based retrieval is quite effective at improving precision at top ranks; the best resultant performance is comparable, and often superior, to that of a state-of-the-art pseudo-feedback-based retrieval approach. In addition, we demonstrate the merits of our language-model-based method for inducing interdocument links by comparing it to previously suggested notions of interdocument similarities (e.g., cosines within the vector-space model). We also show that our methods for inducing centrality are substantially more effective than approaches based on document-specific characteristics, several of which are novel to this study.
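
A compact sketch of the generation-link idea: each retrieved document "votes" for the few other documents whose smoothed language models best generate its text, and documents are reranked by the votes they collect (a simple in-degree centrality; the paper studies several criteria). The scoring details below are simplified assumptions.

import math
from collections import Counter

def log_likelihood(model_doc, text_doc, alpha=0.5):
    """Log-probability of text_doc under a smoothed unigram LM of model_doc."""
    model = Counter(model_doc.split())
    n, v = sum(model.values()), len(model) + 1   # +1 as an unseen-word proxy
    return sum(math.log((model[w] + alpha) / (n + alpha * v))
               for w in text_doc.split())

def rerank(docs, top_m=2):
    centrality = Counter()
    for i, d_i in enumerate(docs):
        # d_i adds a generation link to the top_m docs whose LMs generate it best
        scores = [(log_likelihood(d_j, d_i), j)
                  for j, d_j in enumerate(docs) if j != i]
        for _, j in sorted(scores, reverse=True)[:top_m]:
            centrality[j] += 1
    return sorted(range(len(docs)), key=lambda j: -centrality[j])

docs = ["web search ranking", "ranking web pages by links",
        "cooking pasta at home", "link analysis for web search"]
print(rerank(docs))   # the off-topic document should sink to the bottom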

Journal ArticleDOI
TL;DR: It is proposed that word co-occurrences on webpages can be a measure of the relatedness of organizations; the measure is tested on a group of companies in the LTE and WiMax sectors of the telecommunications industry.

Patent
20 Aug 2010
TL;DR: In this paper, a method and apparatus for controlling access to network resources referenced in electronic mail messages comprises the computer-implemented steps of receiving an electronic mail message that comprises one or more hyperlinks.
Abstract: A method and apparatus for controlling access to network resources referenced in electronic mail messages comprises the computer-implemented steps of receiving an electronic mail message that comprises one or more hyperlinks; determining sender information that identifies a sender of the electronic mail message; creating and storing a record that associates the sender information with each of the one or more hyperlinks; receiving a request to access a specified hyperlink among the one or more hyperlinks; retrieving, based on the specified hyperlink, the record; retrieving, based on the sender information associated with the specified hyperlink, sender reputation information associated with the sender; determining, based on the sender reputation information, a particular action among a plurality of allowed actions; and issuing a network request to access the specified hyperlink only when the particular action is allowing user access to the specified hyperlink.
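
The claimed flow reduces to two handlers, sketched below with an invented in-memory store and threshold: record sender-to-hyperlink associations when a message arrives, then gate each click on the sender's reputation.

# Sketch of the patent's flow; the reputation store and threshold are
# illustrative assumptions, not part of the claims.
link_sender = {}                 # hyperlink -> sender address
reputation = {"alice@example.com": 0.9, "spammer@evil.test": 0.1}

def on_message(sender, hyperlinks):
    for url in hyperlinks:       # associate each link with its sender
        link_sender[url] = sender

def on_click(url, threshold=0.5):
    sender = link_sender.get(url)
    score = reputation.get(sender, 0.0)   # unknown senders get no trust
    return "allow" if score >= threshold else "block"

on_message("alice@example.com", ["https://example.com/report"])
on_message("spammer@evil.test", ["https://evil.test/phish"])
print(on_click("https://example.com/report"))  # allow
print(on_click("https://evil.test/phish"))     # block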

Journal ArticleDOI
TL;DR: This study proposes an ant colony system to reorganize website structures and the proposed algorithm is tested extensively with numerical examples to verify the algorithm applicability.
Abstract: The growth of the Internet has led to many studies on adaptive websites based on web usage mining. Most studies focus on providing assistance to users rather than optimizing the website structure itself. A recent work pioneered the use of 0-1 programming models to optimally reorganize websites based on the cohesion among web pages obtained by web usage mining. The proposed models reduce the information overload and search depth for users surfing the web. A heuristic approach has also been proposed to reduce the required computation time. However, the heuristic approach involving two successive 0-1 programming models still requires a very long computation time to find the optimal solution, especially when the website contains many hyperlinks. To resolve the efficiency problem, this study proposes an ant colony system to reorganize website structures. The proposed algorithm is tested extensively with numerical examples. Additionally, an empirical study with a real-world website is conducted to verify the algorithm applicability.
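
To give the flavor of an ant colony system on a toy problem (not the paper's website-reorganization model): ants repeatedly construct candidate solutions, and pheromone deposits proportional to solution quality bias later ants toward better structures. Here the toy objective is a cheapest path; all parameters are illustrative.

import random

def aco_shortest_path(adj, cost, src, dst, ants=20, iters=30, rho=0.1, q=1.0):
    tau = {(u, v): 1.0 for u in adj for v in adj[u]}   # uniform pheromone
    best, best_cost = None, float("inf")
    for _ in range(iters):
        paths = []
        for _ in range(ants):
            node, path, seen = src, [src], {src}
            while node != dst:
                choices = [v for v in adj[node] if v not in seen]
                if not choices:
                    break
                # prefer edges with high pheromone and low cost
                weights = [tau[(node, v)] / cost[(node, v)] for v in choices]
                node = random.choices(choices, weights)[0]
                path.append(node)
                seen.add(node)
            if node == dst:
                c = sum(cost[(path[i], path[i + 1])] for i in range(len(path) - 1))
                paths.append((c, path))
                if c < best_cost:
                    best, best_cost = path, c
        for e in tau:                       # evaporation
            tau[e] *= 1 - rho
        for c, path in paths:               # reinforcement: cheaper paths deposit more
            for i in range(len(path) - 1):
                tau[(path[i], path[i + 1])] += q / c
    return best, best_cost

adj = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cost = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "D"): 1.0, ("C", "D"): 1.0}
print(aco_shortest_path(adj, cost, "A", "D"))   # should converge on A-B-D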

Proceedings Article
01 Aug 2010
TL;DR: Novel methods for computing semantic relatedness by spreading activation energy over the hyperlink structure of Wikipedia are proposed and it is demonstrated that these techniques can approach state-of-the-art performance, while requiring only a fraction of the background data.
Abstract: Keyword-matching systems based on simple models of semantic relatedness are inadequate at modelling the ambiguities in natural language text, and cannot reliably address the increasingly complex information needs of users. In this paper we propose novel methods for computing semantic relatedness by spreading activation energy over the hyperlink structure of Wikipedia. We demonstrate that our techniques can approach state-of-the-art performance, while requiring only a fraction of the background data.
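
A minimal spreading-activation sketch: energy starts at a term's page, fans out over out-links with decay for a fixed number of hops, and relatedness is read off the overlap of two activation vectors. The decay constant, hop limit, and overlap measure follow common practice rather than the paper's exact scheme.

def spread(adj, source, decay=0.5, hops=2):
    """Activation of every node reachable from source within `hops`."""
    act = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(hops):
        nxt = {}
        for node, energy in frontier.items():
            outs = adj.get(node, [])
            if not outs:
                continue
            share = decay * energy / len(outs)   # decayed fan-out
            for nb in outs:
                nxt[nb] = nxt.get(nb, 0.0) + share
                act[nb] = act.get(nb, 0.0) + share
        frontier = nxt
    return act

def relatedness(adj, a, b):
    va, vb = spread(adj, a), spread(adj, b)
    return sum(min(va.get(k, 0.0), vb.get(k, 0.0)) for k in set(va) | set(vb))

wiki = {"Cat": ["Mammal", "Pet"], "Dog": ["Mammal", "Pet"], "Car": ["Engine"]}
print(relatedness(wiki, "Cat", "Dog"), relatedness(wiki, "Cat", "Car"))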

Journal ArticleDOI
TL;DR: A framework for the association of semantic data to webpage links based on a specific domain ontology is proposed, additionally permitting users to express their emotions about the content of the link.

Journal ArticleDOI
01 Apr 2010
TL;DR: It was found that in webpages with high scent, users were significantly more focused, confident of their choices, efficient, and effective compared to webpages with ambiguous scent; this comparison provided support for the effectiveness of ISEtool in indicating potential scent-related navigability problems.
Abstract: Following information scent has been established as a metaphor to describe a user's behaviour while navigating an information space by successively selecting hyperlinks. This metaphor suggests that users assess the profitability of following a particular hyperlink based on its perceived semantic association with their goal. The purpose of this paper is to study how information scent, this important attribute of hypermedia navigability, influences concurrently four aspects of users' behaviour while exploring a website: (1) distribution of attention; (2) confidence in choice of link; (3) efficiency; and (4) effectiveness. It was found that in webpages with high scent, users were significantly more focused, confident of their choices, efficient and effective compared to webpages with ambiguous scent. The findings of the study are discussed in comparison with results obtained from a previously conducted analysis using InfoScent Evaluator (ISEtool), a tool that has been proposed to facilitate scent evaluation of websites. This comparison provided support for the effectiveness of ISEtool in indicating potential scent-related navigability problems. We argue that such a tool-based approach can facilitate hypermedia design by reducing the resources and expertise required, and by providing the necessary flexibility for practitioners.

Patent
08 Jan 2010
TL;DR: In this paper, the authors identify, from within a corpus of documents, a subject (e.g., person, location, date, etc.) that is relevant to a topic and that is usable to enhance a topic-describing document.
Abstract: The present technology is related to identifying, from within a corpus of documents, a subject (e.g., person, location, date, etc.) that is relevant to a topic and that is usable to enhance a topic-describing document. Documents within the corpus of documents share a link structure, such that some documents include hyperlinks that enable navigation to the topic-describing document, and the topic-describing document includes hyperlinks that enable navigation to other documents. Text of documents within the corpus is parsed to identify the subject, and a context of the subject suggests a degree of relevance of the subject to the topic. An enhancement type of the subject is determined, and a version of the topic-describing document is enhanced to include a presentation of the subject.

Journal Article
TL;DR: The authors investigated how online reading affected EFL students' reading comprehension and reported the difficulties eighty-eight Taiwanese students enrolled in the first-year Freshman English course at a comprehensive university in northern Taiwan encountered during the process of online reading.
Abstract: When reading traditional texts printed on paper, students start reading from the top left-hand corner and finish at the bottom right-hand corner. Their eyes move in a straight line, which is a linear activity. However, when reading hypertexts, students can click a hyperlink to find out certain information. As they click the various hyperlinks, they are often taken to a different web page. Online reading is thus not a linear activity anymore. This study investigated how online reading affected EFL students’ reading comprehension and reported the difficulties eighty-eight Taiwanese EFL students enrolled in the first-year Freshman English course at a comprehensive university in northern Taiwan encountered during the process of online reading. The results show that students disliked reading from computer screens. The factors that affected students when reading hypertext were font size and background color of web pages. The major difficulties included eyestrain, inability to take notes or underline text, and skipping lines when reading hypertext on computer screens. Results also support the claim that students found hypertext reading to be more difficult than linear reading.

Journal ArticleDOI
Yuting Liu, Tie-Yan Liu, Bin Gao, Zhi-Ming Ma, Hang Li
TL;DR: Experimental results have shown that the proposed algorithms can outperform the baseline methods such as PageRank and TrustRank in several tasks, demonstrating the advantage of using the proposed framework.
Abstract: This paper is concerned with a framework to compute the importance of webpages by using real browsing behaviors of Web users. In contrast, many previous approaches like PageRank compute page importance through the use of the hyperlink graph of the Web. Recently, people have realized that the hyperlink graph is incomplete and inaccurate as a data source for determining page importance, and proposed using the real behaviors of Web users instead. In this paper, we propose a formal framework to compute page importance from user behavior data (which covers some previous works as special cases). First, we use a stochastic process to model the browsing behaviors of Web users. According to the analysis on hundreds of millions of real records of user behaviors, we justify that the process is actually a continuous-time time-homogeneous Markov process, and its stationary probability distribution can be used as the measure of page importance. Second, we propose a number of ways to estimate parameters of the stochastic process from real data, which result in a group of algorithms for page importance computation (all referred to as BrowseRank). Our experimental results have shown that the proposed algorithms can outperform the baseline methods such as PageRank and TrustRank in several tasks, demonstrating the advantage of using our proposed framework.
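
Given the continuous-time Markov model, page importance is the stationary distribution pi solving pi Q = 0 with the entries summing to 1, where Q is the transition-rate matrix estimated from observed transitions and dwell times. The tiny Q below is an invented estimate, not real browsing data.

# Stationary distribution of a continuous-time Markov chain: solve pi Q = 0
# subject to sum(pi) = 1. Each diagonal entry is the negative leave-rate of a
# page (longer average dwell time -> smaller leave-rate -> more importance).
import numpy as np

Q = np.array([[-0.2, 0.1, 0.1],
              [0.05, -0.1, 0.05],
              [0.02, 0.08, -0.1]])   # rows sum to zero, as a rate matrix must

A = np.vstack([Q.T, np.ones(3)])     # stack the normalization constraint
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print("page importance:", pi.round(3))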

Journal ArticleDOI
TL;DR: This study employed network analysis, a set of research procedures for identifying structures in social systems, as the basis of the relations among the system's components rather than the attributes of individuals, to classify the role of political Web sites into relational (hyperlinking) and topical (shared-issues) aspects.
Abstract: Politicians' Web sites have been considered a medium for organizing, mobilizing, and agenda-setting, but extant literature lacks a systematic approach to interpret the Web sites of senators—a new medium for political communication. This study classifies the role of political Web sites into relational (hyperlinking) and topical (shared-issues) aspects. The two aspects may be viewed from a social embeddedness perspective and three facets, as K. Foot and S. Schneider (2002) suggested. This study employed network analysis, a set of research procedures for identifying structures in social systems based on the relations among the system's components rather than the attributes of individuals. Hyperlink and issue data were gathered from the United States Senate Web site and Yahoo. Major findings include: (a) The hyperlinks are more targeted at Democratic senators than at Republicans and are a means of communication for senators and users; (b) the issue network found from the Web is used for discussing public agendas and is more highly utilized by Republican senators; (c) the hyperlink and issue networks are correlated; and (d) social relationships and issue ecologies can be effectively detected by these two networks. The need for further research is addressed. © 2010 Wiley Periodicals, Inc.

Patent
Casey Ho, Joanne Mckinley
21 Jul 2010