Showing papers in "Journal of the Association for Information Science and Technology in 2010"


Journal IssueDOI
TL;DR: SentiStrength, as discussed by the authors, is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1–5.
Abstract: A huge number of informal messages are posted every day in social network sites, blogs, and discussion forums. Emotions seem to be frequently important in these texts for expressing friendship, showing social support or as part of online arguments. Algorithms to identify sentiment and sentiment strength are needed to help understand the role of emotion in this informal communication and also to identify inappropriate or anomalous affective utterances, potentially associated with threatening behavior to the self or others. Nevertheless, existing sentiment detection algorithms tend to be commercially oriented, designed to identify opinions about products rather than user behaviors. This article partly fills this gap with a new algorithm, SentiStrength, to extract sentiment strength from informal English text, using new methods to exploit the de facto grammars and spelling styles of cyberspace. Applied to MySpace comments and with a lookup table of term sentiment strengths optimized by machine learning, SentiStrength is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1–5. The former, but not the latter, is better than baseline and a wide range of general machine learning approaches. © 2010 Wiley Periodicals, Inc.

1,371 citations
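
The lookup-table idea in the SentiStrength abstract above can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration: the table entries, the tokenization, and the rule of reporting the strongest positive and strongest negative term found in a message. It is not the authors' implementation, and it omits the machine-learning optimization of the table and the handling of informal spelling that the abstract mentions.

```python
# Hypothetical lookup table: term -> (positive strength, negative strength),
# each on the 1-5 scale mentioned in the abstract. Entries are invented.
TERM_STRENGTHS = {
    "love": (4, 1),
    "nice": (2, 1),
    "hate": (1, 4),
    "awful": (1, 3),
}


def sentiment_strength(message):
    """Return (positive, negative) strengths for a message, each from 1 to 5.

    Assumed rule: a message is as positive as its most positive term and as
    negative as its most negative term; 1 means no emotion detected.
    """
    positive, negative = 1, 1
    for token in message.lower().split():
        word = token.strip(".,!?;:")  # crude punctuation stripping
        term_pos, term_neg = TERM_STRENGTHS.get(word, (1, 1))
        positive = max(positive, term_pos)
        negative = max(negative, term_neg)
    return positive, negative
```

Keeping the two scores separate, rather than netting them into one number, lets a message register as both positive and negative at once, which matches the abstract's separate accuracy figures for the two scales.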


Journal IssueDOI
TL;DR: This chapter discusses Information Retrieval, the science and technology behind information retrieval and retrieval, and some of the techniques used in the retrieval of information.
Abstract: Introduction To Information Retrieval Overdrive Digital. Introduction To Information Retrieval. Introduction To Information Retrieval Putao Ufcg. Introduction To Information Retrieval Arbeitsbereiche. Introduction To Information Retrieval. Introduction To Information Retrieval Stanford Nlp Group. Introduction To Information Retrieval Cs Ucr Edu. Introduction To Information Retrieval By Christopher D. Introduction To Information Retrieval Book. Information Retrieval The Mit Press. Introduction Information Retrieval Uvm. Information Retrieval Lmu Munich. Introduction To Information Retrieval Stanford University. Introduction To Information Retrieval. Introduction To Information Retrieval Amp Models Slideshare. Introduction To Information Retrieval Kangwon Ac Kr. Information Retrieval. Introduction To Information Retrieval Assets. Introduction To Information Retrieval. Introduction To Information Retrieval

885 citations


Journal ArticleDOI
TL;DR: In this article, a multiple-perspective co-citation analysis method is introduced for characterizing and interpreting the structure and dynamics of co-citation clusters, which facilitates analytic and sense-making tasks by integrating network visualization, spectral clustering, automatic cluster labeling, and text summarization.
Abstract: A multiple-perspective co-citation analysis method is introduced for characterizing and interpreting the structure and dynamics of co-citation clusters. The method facilitates analytic and sense-making tasks by integrating network visualization, spectral clustering, automatic cluster labeling, and text summarization. Co-citation networks are decomposed into co-citation clusters. The interpretation of these clusters is augmented by automatic cluster labeling and summarization. The method focuses on the interrelations between a co-citation cluster's members and their citers. The generic method is applied to a three-part analysis of the field of Information Science as defined by 12 journals published between 1996 and 2008: 1) a comparative author co-citation analysis (ACA), 2) a progressive ACA of a time series of co-citation networks, and 3) a progressive document co-citation analysis (DCA). Results show that the multiple-perspective method increases the interpretability and accountability of both ACA and DCA networks.

866 citations


Journal IssueDOI
TL;DR: Of the three pure citation-based approaches, bibliographic coupling slightly outperforms co-citation analysis using both accuracy measures; direct citation is the least accurate mapping approach by far.
Abstract: In the past several years studies have started to appear comparing the accuracies of various science mapping approaches. These studies primarily compare the cluster solutions resulting from different similarity approaches, and give varying results. In this study we compare the accuracies of cluster solutions of a large corpus of 2,153,769 recent articles from the biomedical literature (2004–2008) using four similarity approaches: co-citation analysis, bibliographic coupling, direct citation, and a bibliographic coupling-based citation-text hybrid approach. Each of the four approaches can be considered a way to represent the research front in biomedicine, and each is able to successfully cluster over 92% of the corpus. Accuracies are compared using two metrics: within-cluster textual coherence as defined by the Jensen-Shannon divergence, and a concentration measure based on the grant-to-article linkages indexed in MEDLINE. Of the three pure citation-based approaches, bibliographic coupling slightly outperforms co-citation analysis using both accuracy measures; direct citation is the least accurate mapping approach by far. The hybrid approach improves upon the bibliographic coupling results in all respects. We consider the results of this study to be robust given the very large size of the corpus, and the specificity of the accuracy measures used. © 2010 Wiley Periodicals, Inc.

761 citations
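
The abstract above measures textual coherence with the Jensen-Shannon divergence. As a rough illustration of the quantity itself, the sketch below computes the divergence between two word distributions. The whitespace tokenization and the function names are assumptions for this example; how the study builds its distributions and aggregates them per cluster is not stated in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def word_distribution(text):
    """Map each word in a whitespace-tokenized text to its relative frequency."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must cover p's support."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two word distributions, in bits.

    With base-2 logarithms the value lies between 0 (identical distributions)
    and 1 (distributions with no words in common).
    """
    vocab = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the joint vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

A lower divergence between documents and the cluster they are assigned to would indicate a more textually coherent cluster.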


Journal IssueDOI
TL;DR: In this paper, a multidimensional scale to measure user engagement in online shopping environments is presented, based on a theory of engagement and a core set of attributes that operationalized this construct.
Abstract: Facilitating engaging user experiences is essential in the design of interactive systems. To accomplish this, it is necessary to understand the composition of this construct and how to evaluate it. Building on previous work that posited a theory of engagement and identified a core set of attributes that operationalized this construct, we constructed and evaluated a multidimensional scale to measure user engagement. In this paper we describe the development of the scale, as well as two large-scale studies (N=440 and N=802) that were undertaken to assess its reliability and validity in online shopping environments. In the first we used Reliability Analysis and Exploratory Factor Analysis to identify six attributes of engagement: Perceived Usability, Aesthetics, Focused Attention, Felt Involvement, Novelty, and Endurability. In the second we tested the validity of and relationships among those attributes using Structural Equation Modeling. The result of this research is a multidimensional scale that may be used to test the engagement of software applications. In addition, findings indicate that attributes of engagement are highly intertwined, a complex interplay of user-system interaction variables. Notably, Perceived Usability played a mediating role in the relationship between Endurability and Novelty, Aesthetics, Felt Involvement, and Focused Attention. © 2010 Wiley Periodicals, Inc.

541 citations


Journal ArticleDOI
TL;DR: A huge number of informal messages are posted every day in social network sites, blogs, and discussion forums, and emotions seem to be frequently important in these texts for expressing friendship, love and affection.
Abstract: A huge number of informal messages are posted every day in social network sites, blogs, and discussion forums. Emotions seem to be frequently important in these texts for expressing friendship, sho...

532 citations


Journal ArticleDOI
TL;DR: Analysis of the type of content that legislators are posting to Twitter shows that Congresspeople are primarily using Twitter to disperse information, particularly links to news articles about themselves and to their blog posts, and to report on their daily activities.
Abstract: Twitter is a microblogging and social networking service with millions of members and growing at a tremendous rate. With the buzz surrounding the service have come claims of its ability to transform the way people interact and share information and calls for public figures to start using the service. In this study, we are interested in the type of content that legislators are posting to the service, particularly by members of the United States Congress. We read and analyzed the content of over 6,000 posts from all members of Congress using the site. Our analysis shows that Congresspeople are primarily using Twitter to disperse information, particularly links to news articles about themselves and to their blog posts, and to report on their daily activities. These tend not to provide new insights into government or the legislative process or to improve transparency; rather, they are vehicles for self-promotion. However, Twitter is also facilitating direct communication between Congresspeople and citizens, though this is a less popular activity. We report on our findings and analysis and discuss other uses of Twitter for legislators. © 2010 Wiley Periodicals, Inc.

459 citations


Journal IssueDOI
TL;DR: Results show that the multiple-perspective cocitation analysis method increases the interpretability and accountability of both ACA and DCA networks.
Abstract: A multiple-perspective cocitation analysis method is introduced for characterizing and interpreting the structure and dynamics of cocitation clusters. The method facilitates analytic and sense making tasks by integrating network visualization, spectral clustering, automatic cluster labeling, and text summarization. Cocitation networks are decomposed into cocitation clusters. The interpretation of these clusters is augmented by automatic cluster labeling and summarization. The method focuses on the interrelations between a cocitation cluster's members and their citers. The generic method is applied to a three-part analysis of the field of information science as defined by 12 journals published between 1996 and 2008: (a) a comparative author cocitation analysis (ACA), (b) a progressive ACA of a time series of cocitation networks, and (c) a progressive document cocitation analysis (DCA). Results show that the multiple-perspective method increases the interpretability and accountability of both ACA and DCA networks. © 2010 Wiley Periodicals, Inc.

380 citations


Journal IssueDOI
TL;DR: This article describes how this approach to visually locate bodies of research within the sciences fits with other efforts to locally and globally map scientific outputs, and shows how these science overlay maps help benchmarking, explore collaborations, and track temporal changes.
Abstract: We present a novel approach to visually locate bodies of research within the sciences, both at each moment of time and dynamically. This article describes how this approach fits with other efforts to locally and globally map scientific outputs. We then show how these science overlay maps help benchmarking, explore collaborations, and track temporal changes, using examples of universities, corporations, funding agencies, and research topics. We address their conditions of application and discuss advantages, downsides, and limitations. Overlay maps especially help investigate the increasing number of scientific developments and organizations that do not fit within traditional disciplinary categories. We make these tools available online to enable researchers to explore the ongoing sociocognitive transformations of science and technology systems. © 2010 Wiley Periodicals, Inc.

339 citations


Journal IssueDOI
TL;DR: The authors examined the extent to which emotion is present in MySpace comments, using a combination of data mining and content analysis, and exploring age and gender, concluding that females are more successful social network site users partly because of their greater ability to textually harness positive affect.
Abstract: Despite the rapid growth in social network sites and in data mining for emotion (sentiment analysis), little research has tied the two together, and none has had social science goals. This article examines the extent to which emotion is present in MySpace comments, using a combination of data mining and content analysis, and exploring age and gender. A random sample of 819 public comments to or from U.S. users was manually classified for strength of positive and negative emotion. Two thirds of the comments expressed positive emotion, but a minority (20%) contained negative emotion, confirming that MySpace is an extraordinarily emotion-rich environment. Females are likely to give and receive more positive comments than are males, but there is no difference for negative comments. It is thus possible that females are more successful social network site users partly because of their greater ability to textually harness positive affect. © 2010 Wiley Periodicals, Inc.

289 citations


Journal IssueDOI
TL;DR: It is concluded that in general maps constructed using VOS provide a more satisfactory representation of a dataset than maps constructed using well-known MDS approaches.
Abstract: VOS is a new mapping technique that can serve as an alternative to the well-known technique of multidimensional scaling (MDS). We present an extensive comparison between the use of MDS and the use of VOS for constructing bibliometric maps. In our theoretical analysis, we show the mathematical relation between the two techniques. In our empirical analysis, we use the techniques for constructing maps of authors, journals, and keywords. Two commonly used approaches to bibliometric mapping, both based on MDS, turn out to produce maps that suffer from artifacts. Maps constructed using VOS turn out not to have this problem. We conclude that in general maps constructed using VOS provide a more satisfactory representation of a dataset than maps constructed using well-known MDS approaches. © 2010 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: It is shown that tenure in the community does affect participation, but that this effect depends on the type of participation activity, and a weak moderating effect on a number of motivations with regard to their effect on participation is demonstrated.
Abstract: In recent years we have witnessed a significant growth of social-computing communities—online services in which users share information in various forms. As content contributions from participants are critical to the viability of these communities, it is important to understand what drives users to participate and share information with others in such settings. We extend previous literature on user contribution by studying the factors that are associated with various forms of participation in a large online photo-sharing community. Using survey and system data, we examine four different forms of participation and consider the differences between these forms. We build on theories of motivation to examine the relationship between users' participation and their motivations with respect to their tenure in the community. Amongst our findings, we identify individual motivations (both extrinsic and intrinsic) that underpin user participation, and their effects on different forms of information sharing; we show that tenure in the community does affect participation, but that this effect depends on the type of participation activity. Finally, we demonstrate that tenure in the community has a weak moderating effect on a number of motivations with regard to their effect on participation. Directions for future research, as well as implications for theory and practice, are discussed. © 2010 Wiley Periodicals, Inc.


Journal IssueDOI
TL;DR: Examination of articles in biogeography found that most of the influence is not cited, specific types of articles that are influential are cited while other types that are also influential are not cited, and work that is “uncited” and “seldom cited” is used extensively.
Abstract: To determine influences on the production of a scientific article, the content of the article must be studied. We examined articles in biogeography and found that most of the influence is not cited, specific types of articles that are influential are cited while other types that are also influential are not cited, and work that is “uncited” and “seldom cited” is used extensively. As a result, evaluative citation analysis should take uncited work into account. © 2010 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: In this paper, the authors examined the relationship between motivations, internal cognitive beliefs, social-relational factors, and knowledge-sharing intentions and found that attitudes, knowledge self-efficacy, and a basic norm of generalized reciprocity have significant and direct relationships with knowledge sharing intentions.
Abstract: This study explores how and why people participate in collaborative knowledge-building practices in the context of Wikipedia. Based on a survey of 223 Wikipedians, this study examines the relationship between motivations, internal cognitive beliefs, social-relational factors, and knowledge-sharing intentions. Results from structural equation modeling (SEM) analysis reveal that attitudes, knowledge self-efficacy, and a basic norm of generalized reciprocity have significant and direct relationships with knowledge-sharing intentions. Altruism (an intrinsic motivator) is positively related to attitudes toward knowledge sharing, whereas reputation (an extrinsic motivator) is not a significant predictor of attitude. The study also reveals that a social-relational factor, namely, a sense of belonging, is related to knowledge-sharing intentions indirectly through different motivational and social factors such as altruism, subjective norms, knowledge self-efficacy, and generalized reciprocity. Implications for future research and practice are discussed. © 2010 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: There might be an optimum of interdisciplinarity beyond which the research is too dispersed to find its niche and under which it is too mainstream to have high impact, according to this analysis of individual articles published in Web of Science in 2000.
Abstract: This article analyzes the effect of interdisciplinarity on the scientific impact of individual articles. Using all the articles published in Web of Science in 2000, we define the degree of interdisciplinarity of a given article as the percentage of its cited references made to journals of other disciplines. We show that although for all disciplines combined there is no clear correlation between the level of interdisciplinarity of articles and their citation rates, there are nonetheless some disciplines in which a higher level of interdisciplinarity is related to higher citation rates. For other disciplines, citations decline as interdisciplinarity grows. One characteristic is visible in all disciplines: Highly disciplinary and highly interdisciplinary articles have a low scientific impact. This suggests that there might be an optimum of interdisciplinarity beyond which the research is too dispersed to find its niche and under which it is too mainstream to have high impact. Finally, the relationship between interdisciplinarity and scientific impact is highly determined by the citation characteristics of the disciplines involved: Articles citing citation-intensive disciplines are more likely to be cited by those disciplines and, hence, obtain higher citation scores than would articles citing non-citation-intensive disciplines. © 2010 Wiley Periodicals, Inc.
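
The interdisciplinarity measure defined in the abstract above (the percentage of an article's cited references made to journals of other disciplines) can be written out directly. In this minimal sketch the function name, the string discipline labels, and the choice to return `None` for an article with no references are assumptions; the abstract does not say how journals are assigned to disciplines or how reference-free articles are handled.

```python
def interdisciplinarity(article_discipline, cited_journal_disciplines):
    """Percentage of cited references made to journals of other disciplines.

    article_discipline: discipline label of the citing article's journal.
    cited_journal_disciplines: one discipline label per cited reference.
    Returns None when the article has no cited references (assumed handling).
    """
    if not cited_journal_disciplines:
        return None
    other = sum(1 for d in cited_journal_disciplines if d != article_discipline)
    return 100.0 * other / len(cited_journal_disciplines)


# Example: a physics article with four references, two of them outside physics.
score = interdisciplinarity("physics", ["physics", "chemistry", "biology", "physics"])
```

Under this definition a score near 0 corresponds to the "highly disciplinary" articles and a score near 100 to the "highly interdisciplinary" ones that the abstract reports as having low impact.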

Journal IssueDOI
TL;DR: This article showed that the journal in which papers are published has a strong influence on their citation rates, as duplicate papers published in high-impact journals obtain, on average, twice as many citations as their identical counterparts published in journals with lower impact factors.
Abstract: Since the publication of Robert K. Merton's theory of cumulative advantage in science (Matthew Effect), several empirical studies have tried to measure its presence at the level of papers, individual researchers, institutions, or countries. However, these studies seldom control for the intrinsic “quality” of papers or of researchers: “better” (however defined) papers or researchers could receive higher citation rates because they are indeed of better quality. Using an original method for controlling the intrinsic value of papers (identical duplicate papers published in different journals with different impact factors), this paper shows that the journal in which papers are published has a strong influence on their citation rates, as duplicate papers published in high-impact journals obtain, on average, twice as many citations as their identical counterparts published in journals with lower impact factors. The intrinsic value of a paper is thus not the only reason a given paper gets cited or not; there is a specific Matthew Effect attached to journals, and this gives to papers published there an added value over and above their intrinsic quality. © 2010 Wiley Periodicals, Inc.

Journal ArticleDOI
TL;DR: In this paper, a general methodology for conducting bibliometric analyses at the micro level is presented, which combines several indicators grouped into three factors or dimensions, which characterize different aspects of scientific performance.
Abstract: The authors set forth a general methodology for conducting bibliometric analyses at the micro level. It combines several indicators grouped into three factors or dimensions, which characterize different aspects of scientific performance. Different profiles or “classes” of scientists are described according to their research performance in each dimension. A series of results based on the findings from the application of this methodology to the study of Spanish National Research Council scientists in Spain in three thematic areas are presented. Special emphasis is made on the identification and description of top scientists from structural and bibliometric perspectives. The effects of age on the productivity and impact of the different classes of scientists are analyzed. The classificatory approach proposed herein may prove a useful tool in support of research assessment at the individual level and for exploring potential determinants of research success. © 2010 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: It is as urgent now as it was in 1975 seriously to consider “the subject knowledge view” of relevance (which may also be termed ‘the epistemological view’), because basic theoretical assumptions seem not to have been properly addressed.
Abstract: In 1975 Tefko Saracevic declared “the subject knowledge view” to be the most fundamental perspective of relevance. This paper examines the assumptions in different views of relevance, including “the system's view” and “the user's view” and offers a reinterpretation of these views. The paper finds that what was regarded as the most fundamental view by Saracevic in 1975 has not since been considered (with very few exceptions). Other views, which are based on less fruitful assumptions, have dominated the discourse on relevance in information retrieval and information science. Many authors have reexamined the concept of relevance in information science, but have neglected the subject knowledge view, hence basic theoretical assumptions seem not to have been properly addressed. It is as urgent now as it was in 1975 seriously to consider “the subject knowledge view” of relevance (which may also be termed “the epistemological view”). The concept of relevance, like other basic concepts, is influenced by overall approaches to information science, such as the cognitive view and the domain-analytic view. There is today a trend toward a social paradigm for information science. This paper offers an understanding of relevance from such a social point of view. © 2010 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: It is found that shapers are not more likely to be managers or members of a community's core group who might typically serve in an administrator role, contrary to prior expectations.
Abstract: New Web 2.0 technologies such as wikis permit any organizational member of a virtual community of practice (CoP) to dynamically edit, integrate, and rewrite content (what we call knowledge shaping) as well as contribute personal knowledge. Previous research on factors that motivate contribution in virtual CoPs has focused exclusively on factors explaining why people contribute their personal knowledge, with no research focused on why people make the knowledge-shaping contributions (rewriting, integrating, and restructuring pages) which are possible with wikis. We hypothesize that factors that explain frequency of contribution will be different for those who shape from those who contribute only their personal knowledge. The results support our hypotheses. In addition, we find that shapers are not more likely to be managers or members of a community's core group who might typically serve in an administrator role, contrary to prior expectations. The implications of using Web 2.0 tools to encourage this shaping behavior are discussed. © 2010 Wiley Periodicals, Inc.


Journal IssueDOI
TL;DR: Fractional counting of citations can be contextualized at the paper level and aggregated impacts of sets can be tested for their significance, and it can be shown that the weighted impact of Annals of Mathematics is not so much lower than that of Molecular Cell despite a fivefold difference between their impact factors.
Abstract: Impact factors (and similar measures such as the Scimago Journal Rankings) suffer from two problems: (a) citation behavior varies among fields of science and, therefore, leads to systematic differences, and (b) there are no statistics to inform us whether differences are significant. The recently introduced “source normalized impact per paper” indicator of Scopus tries to remedy the first of these two problems, but a number of normalization decisions are involved, which makes it impossible to test for significance. Using fractional counting of citations (based on the assumption that impact is proportionate to the number of references in the citing documents), citations can be contextualized at the paper level and aggregated impacts of sets can be tested for their significance. It can be shown that the weighted impact of Annals of Mathematics (0.247) is not so much lower than that of Molecular Cell (0.386) despite a fivefold difference between their impact factors (2.793 and 13.156, respectively). © 2010 Wiley Periodicals, Inc.
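
Fractional counting, as described in the abstract above, can be illustrated with a short sketch. It assumes the common reading in which each citation is weighted by 1/n, where n is the number of references in the citing document; the function name and the input format are hypothetical, and the numbers in the example are invented rather than taken from the study.

```python
def fractional_citation_score(citing_reference_counts):
    """Sum of 1/n over citing documents, n being each one's reference count.

    citing_reference_counts: for every document citing the target paper, the
    total number of references in that citing document. Entries with zero
    references are skipped, since such a document cannot cite anything.
    """
    return sum(1.0 / n for n in citing_reference_counts if n > 0)


# Invented example: two citations from short reference lists can outweigh
# four citations from long ones.
short_lists = fractional_citation_score([10, 10])         # two citing documents
long_lists = fractional_citation_score([40, 40, 40, 40])  # four citing documents
```

Because fields with long reference lists dilute each individual citation, this weighting narrows the gap between citation-sparse and citation-dense fields, which is the effect the abstract reports for Annals of Mathematics and Molecular Cell.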

Journal ArticleDOI
TL;DR: The authors discuss the pros and cons of various options, and provide software (freeware) for bridging existing gaps between the Science Citation Indices (Thomson Reuters) and Scopus (Elsevier), on the one hand, and these various visualization tools on the other.
Abstract: Using Google Earth, Google Maps, and/or network visualization programs such as Pajek, one can overlay the network of relations among addresses in scientific publications onto the geographic map. The authors discuss the pros and cons of various options, and provide software (freeware) for bridging existing gaps between the Science Citation Indices (Thomson Reuters) and Scopus (Elsevier), on the one hand, and these various visualization tools on the other. At the level of city names, the global map can be drawn reliably on the basis of the available address information. At the level of the names of organizations and institutes, there are problems of unification both in the ISI databases and with Scopus. Pajek enables a combination of visualization and statistical analysis, whereas Google Maps and its derivatives provide superior tools on the Internet. © 2010 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: Using the Scopus dataset (1996–2007) a grand matrix of aggregated journal-journal citations was constructed, which can be compared in terms of the network structures with the matrix contained in the Journal Citation Reports of the Institute of Scientific Information (ISI).
Abstract: Using the Scopus dataset (1996–2007) a grand matrix of aggregated journal-journal citations was constructed. This matrix can be compared in terms of the network structures with the matrix contained in the Journal Citation Reports (JCR) of the Institute of Scientific Information (ISI). Because the Scopus database contains a larger number of journals and covers the humanities, one would expect richer maps. However, the matrix is in this case sparser than in the case of the ISI data. This is because of (a) the larger number of journals covered by Scopus and (b) the historical record of citations older than 10 years contained in the ISI database. When the data is highly structured, as in the case of large journals, the maps are comparable, although one may have to vary a threshold (because of the differences in densities). In the case of interdisciplinary journals and journals in the social sciences and humanities, the new database does not add a lot to what is possible with the ISI databases. © 2010 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: In this article, the authors surveyed 2,063 academic researchers in natural science, engineering, and medical science from five research universities in the United States to understand different aspects of researchers' information-seeking behavior.
Abstract: As new technologies and information delivery systems emerge, the way in which individuals search for information to support research, teaching, and creative activities is changing. To understand different aspects of researchers' information-seeking behavior, this article surveyed 2,063 academic researchers in natural science, engineering, and medical science from five research universities in the United States. A Web-based, in-depth questionnaire was designed to quantify researchers' information searching, information use, and information storage behaviors. Descriptive statistics are reported. Additionally, analysis of results is broken out by institutions to compare differences among universities. Significant findings are reported, with the biggest changes because of increased utilization of electronic methods for searching, sharing, and storing scholarly content, as well as for utilizing library services. Generally speaking, researchers in the five universities had similar information-seeking behavior, with small differences because of varying academic unit structures and myriad library services provided at the individual institutions. © 2010 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: Perceived gratification factors, demographic variables such as basic familiarity with features of mobile communication devices, and IT-related backgrounds were significant in predicting intention to use mobile sharing and gaming applications such as Indagator; however, age, gender, and the personal status gratification factor were nonsignificant predictors.
Abstract: The confluence of mobile content sharing and pervasive gaming yields new opportunities for developing novel applications on mobile devices. Yet, studies on users' attitudes and behaviors related to mobile gaming, content-sharing, and retrieval activities (referred to simply as content sharing and gaming) have been lacking. For this reason, the objectives of this article are three-fold. One, it introduces Indagator, an application that incorporates multiplayer, pervasive gaming elements into mobile content-sharing activities. Two, it seeks to uncover the motivations for content sharing within a game-based environment. Three, it aims to identify types of users who are motivated to use Indagator for content sharing. Informed by the uses and gratifications paradigm, a survey was designed and administered to 203 undergraduate and graduate students from two large universities. The findings revealed that perceived gratification factors, such as information discovery, entertainment, information quality, socialization, and relationship maintenance, demographic variables, such as basic familiarity with features of mobile communication devices, and IT-related backgrounds were significant in predicting intention to use mobile sharing and gaming applications such as Indagator. However, age, gender, and the personal status gratification factor were nonsignificant predictors. This article concludes by presenting the implications, limitations, and future research directions. © 2010 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: This paper presents a system, known as Concept-Relation-Concept Tuple-based Ontology Learning (CRCTOL), for mining ontologies automatically from domain-specific documents and presents two case studies where CRCTOL is used to build a terrorism domain ontology and a sport event domain ontology.
Abstract: Domain ontologies play an important role in supporting knowledge-based applications in the Semantic Web. To facilitate the building of ontologies, text mining techniques have been used to perform ontology learning from texts. However, traditional systems employ shallow natural language processing techniques and focus only on concept and taxonomic relation extraction. In this paper we present a system, known as Concept-Relation-Concept Tuple-based Ontology Learning (CRCTOL), for mining ontologies automatically from domain-specific documents. Specifically, CRCTOL adopts a full text parsing technique and employs a combination of statistical and lexico-syntactic methods, including a statistical algorithm that extracts key concepts from a document collection, a word sense disambiguation algorithm that disambiguates words in the key concepts, a rule-based algorithm that extracts relations between the key concepts, and a modified generalized association rule mining algorithm that prunes unimportant relations for ontology learning. As a result, the ontologies learned by CRCTOL are more concise and contain a richer semantics in terms of the range and number of semantic relations compared with alternative systems. We present two case studies where CRCTOL is used to build a terrorism domain ontology and a sport event domain ontology. At the component level, quantitative evaluation by comparing with Text-To-Onto and its successor Text2Onto has shown that CRCTOL is able to extract concepts and semantic relations with a significantly higher level of accuracy. At the ontology level, the quality of the learned ontologies is evaluated by either employing a set of quantitative and qualitative methods including analyzing the graph structural property, comparison to WordNet, and expert rating, or directly comparing with a human-edited benchmark ontology, demonstrating the high quality of the ontologies learned. © 2010 Wiley Periodicals, Inc.


Journal ArticleDOI
TL;DR: This article identifies 17 significant constructs in the fields of information searching and information retrieval, contrasting their differences and comparing their similarities, and provides a framework to compare and contrast the theoretical constructs using intellectual perspective and theoretical orientation.
Abstract: In this article, we identify, compare, and contrast theoretical constructs for the fields of information searching and information retrieval to emphasize the uniqueness of and synergy between the fields. Theoretical constructs are the foundational elements that underpin a field's core theories, models, assumptions, methodologies, and evaluation metrics. We provide a framework to compare and contrast the theoretical constructs in the fields of information searching and information retrieval using intellectual perspective and theoretical orientation. The intellectual perspectives are information searching, information retrieval, and cross-cutting; and the theoretical orientations are information, people, and technology. Using this framework, we identify 17 significant constructs in these fields contrasting the differences and comparing the similarities. We discuss the impact of the interplay among these constructs for moving research forward within both fields. Although there is tension between the fields due to contradictory constructs, an examination shows a trend toward convergence. We discuss the implications for future research within the information searching and information retrieval fields. © 2010 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: The results show that the unsupervised method, when using all attributes, performs competitively against all other methods, under both metrics, losing only in one case against a supervised method, whose result was very close to the authors'.
Abstract: Name ambiguity in the context of bibliographic citations is a difficult problem which, despite the many efforts from the research community, still has a lot of room for improvement. In this article, we present a heuristic-based hierarchical clustering method to deal with this problem. The method successively fuses clusters of citations of similar author names based on several heuristics and similarity measures on the components of the citations (e.g., coauthor names, work title, and publication venue title). During the disambiguation task, the information about fused clusters is aggregated, providing more information for the next round of fusion. In order to demonstrate the effectiveness of our method, we ran a series of experiments in two different collections extracted from real-world digital libraries and compared it, under two metrics, with four representative methods described in the literature. We present comparisons of results using each considered attribute separately (i.e., coauthor names, work title, and publication venue title) with the author name attribute and using all attributes together. These results show that our unsupervised method, when using all attributes, performs competitively against all other methods, under both metrics, losing only in one case against a supervised method, whose result was very close to ours. Moreover, such results are achieved without the burden of any training and without using any privileged information such as knowing a priori the correct number of clusters. © 2010 Wiley Periodicals, Inc.
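
The cluster-fusion idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the actual heuristics or similarity measures, so the name-compatibility rule (`name_key`), the "share at least one coauthor" fusion test, and all identifiers below are assumptions made purely for illustration. The sketch shows only the general pattern of fusing clusters with similar author names and aggregating their information so that later rounds have more evidence.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A group of citations assumed to belong to one real-world author."""
    author_names: set[str]
    coauthors: set[str] = field(default_factory=set)
    citation_ids: list[int] = field(default_factory=list)


def name_key(name: str) -> tuple[str, str]:
    """Hypothetical similarity rule: compare (last name, first initial)."""
    parts = name.lower().split()
    return parts[-1], parts[0][0]


def compatible(a: Cluster, b: Cluster) -> bool:
    """True when the two clusters contain at least one similar author name."""
    keys_a = {name_key(n) for n in a.author_names}
    keys_b = {name_key(n) for n in b.author_names}
    return bool(keys_a & keys_b)


def fuse_clusters(citations: list[dict]) -> list[Cluster]:
    """Start with one cluster per citation and fuse until nothing changes."""
    clusters = [
        Cluster({c["author"]}, set(c["coauthors"]), [i])
        for i, c in enumerate(citations)
    ]
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Assumed heuristic: similar names plus a shared coauthor.
                if compatible(a, b) and a.coauthors & b.coauthors:
                    # Aggregate the fused cluster's information so the
                    # next round can match on the combined coauthor set.
                    a.author_names |= b.author_names
                    a.coauthors |= b.coauthors
                    a.citation_ids += b.citation_ids
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan over the updated cluster list
    return clusters
```

In this sketch, two citations with no coauthor in common can still end up in the same cluster if a third citation links them, which is the effect of aggregating information between rounds. No training data and no preset number of clusters are needed, but the fusion rule here is far simpler than anything evaluated in the paper.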