Showing papers in "Journal of the Association for Information Science and Technology in 2009"


Journal IssueDOI
TL;DR: It is found that microblogging is an online tool for customer word-of-mouth communication, and the implications for corporations using microblogging as part of their overall marketing strategy are discussed.
Abstract: In this paper we report research results investigating microblogging as a form of electronic word-of-mouth for sharing consumer opinions concerning brands. We analyzed more than 150,000 microblog postings containing branding comments, sentiments, and opinions. We investigated the overall structure of these microblog postings, the types of expressions, and the movement in positive or negative sentiment. We compared automated methods of classifying sentiment in these microblogs with manual coding. Using a case study approach, we analyzed the range, frequency, timing, and content of tweets in a corporate account. Our research findings show that 19% of microblogs contain mention of a brand. Of the branding microblogs, nearly 20% contained some expression of brand sentiments. Of these, more than 50% were positive and 33% were critical of the company or product. Our comparison of automated and manual coding showed no significant differences between the two approaches. In analyzing microblogs for structure and composition, the linguistic structure of tweets approximates the linguistic patterns of natural language expressions. We find that microblogging is an online tool for customer word-of-mouth communications and discuss the implications for corporations using microblogging as part of their overall marketing strategy. © 2009 Wiley Periodicals, Inc.

1,753 citations


Journal IssueDOI
TL;DR: A survey of recent advances of the automated approaches to attributing authorship is presented, examining their characteristics for both text representation and text classification.
Abstract: Authorship attribution supported by statistical or computational methods has a long history starting from the 19th century and is marked by the seminal study of Mosteller and Wallace (1964) on the authorship of the disputed “Federalist Papers.” During the last decade, this scientific field has been developed substantially, taking advantage of research advances in areas such as machine learning, information retrieval, and natural language processing. The plethora of available electronic texts (e.g., e-mail messages, online forum messages, blogs, source code, etc.) indicates a wide variety of applications of this technology, provided it is able to handle short and noisy text from multiple candidate authors. In this article, a survey of recent advances of the automated approaches to attributing authorship is presented, examining their characteristics for both text representation and text classification. The focus of this survey is on computational requirements and settings rather than on linguistic or literary issues. We also discuss evaluation methodologies and criteria for authorship attribution studies and list open questions that will attract future work in this area. © 2009 Wiley Periodicals, Inc.

1,186 citations


Journal IssueDOI
TL;DR: Three scenarios are considered for which solutions to the basic attribution problem are inadequate; for each variant, it is shown how machine learning methods can be adapted to handle its special challenges.
Abstract: Statistical authorship attribution has a long history, culminating in the use of modern machine learning classification methods. Nevertheless, most of this work suffers from the limitation of assuming a small closed set of candidate authors and essentially unlimited training text for each. Real-life authorship attribution problems, however, typically fall short of this ideal. Thus, following detailed discussion of previous work, three scenarios are considered here for which solutions to the basic attribution problem are inadequate. In the first variant, the profiling problem, there is no candidate set at all; in this case, the challenge is to provide as much demographic or psychological information as possible about the author. In the second variant, the needle-in-a-haystack problem, there are many thousands of candidates for each of whom we might have a very limited writing sample. In the third variant, the verification problem, there is no closed candidate set but there is one suspect; in this case, the challenge is to determine if the suspect is or is not the author. For each variant, it is shown how machine learning methods can be adapted to handle the special challenges of that variant. © 2009 Wiley Periodicals, Inc.

523 citations


Journal IssueDOI
TL;DR: Exploratory factor analysis of the aggregated ISI subject-category citation matrix suggests a 14-factor solution that can be interpreted as the disciplinary structure of science; the resulting nested maps of science are available online.
Abstract: The decomposition of scientific literature into disciplinary and subdisciplinary structures is one of the core goals of scientometrics. How can we achieve a good decomposition? The ISI subject categories classify journals included in the Science Citation Index (SCI). The aggregated journal-journal citation matrix contained in the Journal Citation Reports can be aggregated on the basis of these categories. This leads to an asymmetrical matrix (citing versus cited) that is much more densely populated than the underlying matrix at the journal level. Exploratory factor analysis of the matrix of subject categories suggests a 14-factor solution. This solution could be interpreted as the disciplinary structure of science. The nested maps of science (corresponding to 14 factors, 172 categories, and 6,164 journals) are online at . Presumably, inaccuracies in the attribution of journals to the ISI subject categories average out so that the factor analysis reveals the main structures. The mapping of science could, therefore, be comprehensive and reliable on a large scale albeit imprecise in terms of the attribution of journals to the ISI subject categories. © 2009 Wiley Periodicals, Inc.

427 citations
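
As an illustration of the kind of decomposition described above, the following sketch runs an exploratory factor analysis on a toy citing-versus-cited matrix using scikit-learn. The matrix, the number of factors, and the library choice are illustrative assumptions only; the study itself analyzed the full JCR subject-category matrix (with rotation) and reported a 14-factor solution.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# toy stand-in for the aggregated category-by-category citation matrix
# (rows = citing categories, columns = cited categories); the real matrix
# covers the ISI subject categories and yields a 14-factor solution
rng = np.random.default_rng(0)
citing_cited = rng.poisson(lam=5, size=(40, 40)).astype(float)

fa = FactorAnalysis(n_components=3, random_state=0)
scores = fa.fit_transform(citing_cited)   # factor scores per citing category
loadings = fa.components_                 # factors x cited categories
print(scores.shape, loadings.shape)       # (40, 3) (3, 40)
```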


Journal IssueDOI
TL;DR: This article theoretically analyze the properties of similarity measures for cooccurrence data, focusing in particular on four well-known measures: the association strength, the cosine, the inclusion index, and the Jaccard index.
Abstract: In scientometric research, the use of cooccurrence data is very common. In many cases, a similarity measure is employed to normalize the data. However, there is no consensus among researchers on which similarity measure is most appropriate for normalization purposes. In this article, we theoretically analyze the properties of similarity measures for cooccurrence data, focusing in particular on four well-known measures: the association strength, the cosine, the inclusion index, and the Jaccard index. We also study the behavior of these measures empirically. Our analysis reveals that there exist two fundamentally different types of similarity measures, namely, set-theoretic measures and probabilistic measures. The association strength is a probabilistic measure, while the cosine, the inclusion index, and the Jaccard index are set-theoretic measures. Both our theoretical and our empirical results indicate that cooccurrence data can best be normalized using a probabilistic measure. This provides strong support for the use of the association strength in scientometric research. © 2009 Wiley Periodicals, Inc.

417 citations
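
For reference, a minimal sketch of the four similarity measures discussed in the article, written in terms of the co-occurrence count c_ij of items i and j and their total occurrence counts s_i and s_j (these are the commonly used definitions; the exact normalization constants in the article may differ):

```python
import math

def association_strength(c_ij, s_i, s_j):
    # probabilistic measure: observed co-occurrences relative to expected
    return c_ij / (s_i * s_j)

def cosine(c_ij, s_i, s_j):
    return c_ij / math.sqrt(s_i * s_j)

def inclusion(c_ij, s_i, s_j):
    return c_ij / min(s_i, s_j)

def jaccard(c_ij, s_i, s_j):
    return c_ij / (s_i + s_j - c_ij)

# toy example: two keywords co-occurring 30 times,
# with 100 and 300 total occurrences respectively
print(association_strength(30, 100, 300))  # 0.001
print(cosine(30, 100, 300))                # ~0.173
print(inclusion(30, 100, 300))             # 0.3
print(jaccard(30, 100, 300))               # ~0.081
```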


Journal IssueDOI
TL;DR: This research draws on longitudinal network data from an online community to examine patterns of users' behavior and social interaction and to infer the processes underpinning the dynamics of system use, with implications for research on evolving social networks and for applications including information diffusion, communities of practice, and the security and robustness of information systems.
Abstract: This research draws on longitudinal network data from an online community to examine patterns of users' behavior and social interaction, and infer the processes underpinning dynamics of system use. The online community represents a prototypical example of a complex evolving social network in which connections between users are established over time by online messages. We study the evolution of a variety of properties since the inception of the system, including how users create, reciprocate, and deepen relationships with one another, variations in users' gregariousness and popularity, reachability and typical distances among users, and the degree of local redundancy in the system. Results indicate that the system is a “small world” characterized by the emergence, in its early stages, of a hub-dominated structure with heterogeneity in users' behavior. We investigate whether hubs are responsible for holding the system together and facilitating information flow, examine first-mover advantages underpinning users' ability to rise to system prominence, and uncover gender differences in users' gregariousness, popularity, and local redundancy. We discuss the implications of the results for research on system use and evolving social networks, and for a host of applications, including information diffusion, communities of practice, and the security and robustness of information systems. © 2009 Wiley Periodicals, Inc.

318 citations
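
A small, hypothetical illustration of checking the "small world" and hub-dominated properties mentioned above, using networkx on a synthetic preferential-attachment graph rather than the authors' message data:

```python
import networkx as nx

# toy stand-in for a message network among community members
G = nx.barabasi_albert_graph(n=200, m=2, seed=42)  # hub-dominated growth model

avg_path = nx.average_shortest_path_length(G)   # typical distance among users
clustering = nx.average_clustering(G)           # local redundancy
top_hubs = sorted(G.degree, key=lambda x: x[1], reverse=True)[:5]

print(f"average distance: {avg_path:.2f}")
print(f"average clustering: {clustering:.3f}")
print(f"largest hubs (node, degree): {top_hubs}")
```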


Journal IssueDOI
TL;DR: A systematic comparison between the Google Scholar h-index and the ISI Journal Impact Factor for a sample of 838 journals in economics and business shows that the former provides a more accurate and comprehensive measure of journal impact.
Abstract: We propose a new data source (Google Scholar) and metric (Hirsch's h-index) to assess journal impact in the field of economics and business. A systematic comparison between the Google Scholar h-index and the ISI Journal Impact Factor for a sample of 838 journals in economics and business shows that the former provides a more accurate and comprehensive measure of journal impact. © 2009 Wiley Periodicals, Inc.

317 citations
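
A minimal sketch of Hirsch's h-index as applied to a journal: given the citation counts of a journal's articles (e.g., as retrieved from Google Scholar), h is the largest number such that h articles each have at least h citations.

```python
def h_index(citation_counts):
    """h = largest h such that h papers have at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

# e.g., a journal whose articles were cited 10, 8, 5, 4, 3, 0 times has h = 4
print(h_index([10, 8, 5, 4, 3, 0]))  # 4
```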


Journal IssueDOI
TL;DR: Using macrolevel bibliometric indicators to compare results obtained from the WoS and Scopus provides evidence that indicators of scientific production and citations at the country level are stable and largely independent of the database.
Abstract: For more than 40 years, the Institute for Scientific Information (ISI, now part of Thomson Reuters) produced the only available bibliographic databases from which bibliometricians could compile large-scale bibliometric indicators. ISI's citation indexes, now regrouped under the Web of Science (WoS), were the major sources of bibliometric data until 2004, when Scopus was launched by the publisher Reed Elsevier. For those who perform bibliometric analyses and comparisons of countries or institutions, the existence of these two major databases raises the important question of the comparability and stability of statistics obtained from different data sources. This paper uses macrolevel bibliometric indicators to compare results obtained from the WoS and Scopus. It shows that the correlations between the measures obtained with both databases for the number of papers and the number of citations received by countries, as well as for their ranks, are extremely high (R2 > .99). There is also a very high correlation when countries' papers are broken down by field. The paper thus provides evidence that indicators of scientific production and citations at the country level are stable and largely independent of the database. © 2009 Wiley Periodicals, Inc.

303 citations


Journal IssueDOI
TL;DR: It is found that in the author co-citation network, citation rank is highly correlated with PageRank with different damping factors and also with different weighted PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index rank does not significantly correlate with centrality measures but does significantly correlate with the other measures.
Abstract: This paper studies how varied damping factors in the PageRank algorithm influence the ranking of authors and proposes weighted PageRank algorithms. We selected the 108 most highly cited authors in the information retrieval (IR) area from the 1970s to 2008 to form the author co-citation network. We calculated the ranks of these 108 authors based on PageRank with the damping factor ranging from 0.05 to 0.95. In order to test the relationship between different measures, we compared PageRank and weighted PageRank results with the citation ranking, h-index, and centrality measures. We found that in our author co-citation network, citation rank is highly correlated with PageRank with different damping factors and also with different weighted PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index rank does not significantly correlate with centrality measures but does significantly correlate with the other measures. The key factor influencing an author's PageRank in the author co-citation network is being co-cited with important authors. © 2009 Wiley Periodicals, Inc.

301 citations
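
The following sketch illustrates the basic experiment of varying the damping factor, using networkx on a toy weighted author co-citation graph; the authors, weights, and damping values shown are illustrative assumptions, not the study's data.

```python
import networkx as nx

# toy weighted author co-citation network (edge weight = co-citation count)
G = nx.Graph()
G.add_weighted_edges_from([
    ("Salton", "vanRijsbergen", 25),
    ("Salton", "Robertson", 18),
    ("Robertson", "SparckJones", 30),
    ("vanRijsbergen", "SparckJones", 12),
    ("Salton", "Croft", 9),
])

# rank authors with PageRank for several damping factors
for d in (0.05, 0.5, 0.85, 0.95):
    pr = nx.pagerank(G, alpha=d, weight="weight")
    ranking = sorted(pr, key=pr.get, reverse=True)
    print(f"damping={d}: {ranking}")
```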


Journal IssueDOI
Erjia Yan, Ying Ding
TL;DR: It is found that the four centrality measures are significantly correlated with citation counts, and it is suggested that centrality measures can be useful indicators for impact analysis.
Abstract: Many studies on coauthorship networks focus on network topology and network statistical mechanics. This article takes a different approach by studying micro-level network properties with the aim of applying centrality measures to impact analysis. Using coauthorship data from 16 journals in the field of library and information science (LIS) with a time span of 20 years (1988–2007), we construct an evolving coauthorship network and calculate four centrality measures (closeness centrality, betweenness centrality, degree centrality, and PageRank) for authors in this network. We find that the four centrality measures are significantly correlated with citation counts. We also discuss the usability of centrality measures in author ranking and suggest that centrality measures can be useful indicators for impact analysis. © 2009 Wiley Periodicals, Inc.

294 citations
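
As a hypothetical illustration of relating the four centrality measures to citation counts, the sketch below computes them with networkx on a toy coauthorship graph and reports Spearman rank correlations; the graph and citation figures are invented for demonstration.

```python
import networkx as nx
from scipy.stats import spearmanr

# toy coauthorship network and (hypothetical) citation counts per author
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
citations = {"A": 120, "B": 95, "C": 210, "D": 60, "E": 15}

measures = {
    "degree": nx.degree_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "pagerank": nx.pagerank(G),
}

authors = sorted(G.nodes())
for name, scores in measures.items():
    rho, p = spearmanr([scores[a] for a in authors],
                       [citations[a] for a in authors])
    print(f"{name}: Spearman rho = {rho:.2f} (p = {p:.3f})")
```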


Journal IssueDOI
TL;DR: The average number of citations in reference lists has increased gradually, and this is the predominant factor responsible for the inflation of impact factor scores over time; moreover, impact factors vary widely across academic disciplines.
Abstract: The bibliometric measure impact factor is a leading indicator of journal influence, and impact factors are routinely used in making decisions ranging from selecting journal subscriptions to allocating research funding to deciding tenure cases. Yet journal impact factors have increased gradually over time, and moreover impact factors vary widely across academic disciplines. Here we quantify inflation over time and differences across fields in impact factor scores and determine the sources of these differences. We find that the average number of citations in reference lists has increased gradually, and this is the predominant factor responsible for the inflation of impact factor scores over time. Field-specific variation in the fraction of citations to literature indexed by Thomson Scientific's Journal Citation Reports is the single greatest contributor to differences among the impact factors of journals in different fields. The growth rate of the scientific literature as a whole, and cross-field differences in net size and growth rate of individual fields, have had very little influence on impact factor inflation or on cross-field differences in impact factor. © 2009 Wiley Periodicals, Inc.
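
For reference, the 2-year journal impact factor whose inflation the article analyzes is conventionally computed as:

```latex
\mathrm{IF}_{y} \;=\;
\frac{\text{citations received in year } y \text{ to items published in years } y-1 \text{ and } y-2}
     {\text{number of citable items published in years } y-1 \text{ and } y-2}
```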

Journal IssueDOI
TL;DR: This study tests the results of two recently available algorithms for the decomposition of large matrices against two content-based classifications of journals: the ISI Subject Categories and the field-subfield classification of Glanzel and Schubert (2003).
Abstract: The aggregated journal-journal citation matrix—based on the Journal Citation Reports (JCR) of the Science Citation Index—can be decomposed by indexers or algorithmically. In this study, we test the results of two recently available algorithms for the decomposition of large matrices against two content-based classifications of journals: the ISI Subject Categories and the field-subfield classification of Glanzel and Schubert (2003). The content-based schemes allow for the attribution of more than a single category to a journal, whereas the algorithms maximize the ratio of within-category citations over between-category citations in the aggregated category-category citation matrix. By adding categories, indexers generate between-category citations, which may enrich the database, for example, in the case of inter-disciplinary developments. Algorithmic decompositions, on the other hand, are more heavily skewed towards a relatively small number of categories, while this is deliberately counter-acted upon in the case of content-based classifications. Because of the indexer effects, science policy studies and the sociology of science should be careful when using content-based classifications, which are made for bibliographic disclosure, and not for the purpose of analyzing latent structures in scientific communications. Despite the large differences among them, the four classification schemes enable us to generate surprisingly similar maps of science at the global level. Erroneous classifications are cancelled as noise at the aggregate level, but may disturb the evaluation locally. © 2009 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: The circular map of science is found to have a high level of correspondence with the 20 existing maps, and has a variety of advantages over hierarchical and centric forms.
Abstract: A consensus map of science is generated from an analysis of 20 existing maps of science. These 20 maps occur in three basic forms: hierarchical, centric, and noncentric (or circular). The consensus map, generated from consensus edges that occur in at least half of the input maps, emerges in a circular form. The ordering of areas is as follows: mathematics is (arbitrarily) placed at the top of the circle, and is followed clockwise by physics, physical chemistry, engineering, chemistry, earth sciences, biology, biochemistry, infectious diseases, medicine, health services, brain research, psychology, humanities, social sciences, and computer science. The link between computer science and mathematics completes the circle. If the lowest weighted edges are pruned from this consensus circular map, a hierarchical map stretching from mathematics to social sciences results. The circular map of science is found to have a high level of correspondence with the 20 existing maps, and has a variety of advantages over hierarchical and centric forms. A one-dimensional Riemannian version of the consensus map is also proposed. © 2009 Wiley Periodicals, Inc.
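
A minimal sketch of the consensus-edge rule described above (keep an edge if it occurs in at least half of the input maps), using three invented toy maps instead of the 20 analyzed in the article:

```python
from collections import Counter
from itertools import chain

# each input map of science is represented as a set of undirected edges
maps = [
    {("mathematics", "physics"), ("physics", "chemistry"), ("biology", "medicine")},
    {("mathematics", "physics"), ("chemistry", "biology"), ("biology", "medicine")},
    {("mathematics", "computer science"), ("physics", "chemistry"), ("biology", "medicine")},
]

# count in how many maps each (order-independent) edge occurs
edge_counts = Counter(tuple(sorted(e)) for e in chain.from_iterable(maps))

# consensus edges: present in at least half of the input maps
threshold = len(maps) / 2
consensus = [e for e, n in edge_counts.items() if n >= threshold]
print(consensus)
```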

Journal IssueDOI
TL;DR: A rule-based approach including two phases: determining each sentence's sentiment based on word dependency, and aggregating sentences to predict the document sentiment is proposed to address the unique challenges posed by Chinese sentiment analysis.
Abstract: User-generated content on the Web has become an extremely valuable source for mining and analyzing user opinions on any topic. Recent years have seen an increasing body of work investigating methods to recognize favorable and unfavorable sentiments toward specific subjects from online text. However, most of these efforts focus on English and there have been very few studies on sentiment analysis of Chinese content. This paper aims to address the unique challenges posed by Chinese sentiment analysis. We propose a rule-based approach including two phases: (1) determining each sentence's sentiment based on word dependency, and (2) aggregating sentences to predict the document sentiment. We report the results of an experimental study comparing our approach with three machine learning-based approaches using two sets of Chinese articles. These results illustrate the effectiveness of our proposed method and its advantages against learning-based approaches. © 2009 Wiley Periodicals, Inc.
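
A much-simplified sketch of the two-phase idea (score each sentence, then aggregate sentence scores into a document label). The article's first phase relies on word-dependency rules for Chinese; here a plain sentiment lexicon with negation handling stands in for those rules purely for illustration:

```python
# Phase 1: score each sentence; Phase 2: aggregate into a document label.
POSITIVE = {"good", "excellent", "satisfied"}
NEGATIVE = {"bad", "poor", "disappointed"}
NEGATORS = {"not", "never"}

def sentence_sentiment(tokens):
    score, negate = 0, False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        if tok in POSITIVE:
            score += -1 if negate else 1
        elif tok in NEGATIVE:
            score += 1 if negate else -1
        negate = False
    return score

def document_sentiment(sentences):
    total = sum(sentence_sentiment(s) for s in sentences)
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"

doc = [["the", "screen", "is", "excellent"],
       ["battery", "life", "is", "not", "good"]]
print(document_sentiment(doc))  # neutral (one positive, one negative sentence)
```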

Journal IssueDOI
TL;DR: The authors report an exploratory study of the similarity between the reported attributes of pairs of active MySpace Friends, based on a systematic sample of 2,567 members who joined on June 18, 2007, and the Friends who commented on their profiles.
Abstract: Social network sites like MySpace are increasingly important environments for expressing and maintaining interpersonal connections, but does online communication exacerbate or ameliorate the known tendency for offline friendships to form between similar people (homophily)? This article reports an exploratory study of the similarity between the reported attributes of pairs of active MySpace Friends based upon a systematic sample of 2,567 members joining on June 18, 2007 and Friends who commented on their profile. The results showed no evidence of gender homophily but significant evidence of homophily for ethnicity, religion, age, country, marital status, attitude towards children, sexual orientation, and reason for joining MySpace. There were also some imbalances: women and the young were disproportionately commenters, and commenters tended to have more Friends than commentees. Overall, it seems that although traditional sources of homophily are thriving in MySpace networks of active public connections, gender homophily has completely disappeared. Finally, the method used has wide potential for investigating and partially tracking homophily in society, providing early warning of socially divisive trends. © 2009 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: This article explores college students' perceptions, uses of, and motivations for using Wikipedia, and seeks to understand their information behavior concerning Wikipedia based on social cognitive theory (SCT); approximately one-third of the students reported using Wikipedia for academic purposes.
Abstract: The purposes of this study were to explore college students' perceptions, uses of, and motivations for using Wikipedia, and to understand their information behavior concerning Wikipedia based on social cognitive theory (SCT). A Web survey was used to collect data in the spring of 2008. The study sample consisted of students from an introductory undergraduate course at a large public university in the midwestern United States. A total of 134 students participated in the study, resulting in a 32.8% response rate. The major findings of the study include the following: Approximately one-third of the students reported using Wikipedia for academic purposes. The students tended to use Wikipedia for quickly checking facts and finding background information. They had positive past experiences with Wikipedia; however, interestingly, their perceptions of its information quality were not correspondingly high. The level of their confidence in evaluating Wikipedia's information quality was, at most, moderate. Respondents' past experience with Wikipedia, their positive emotional state, their disposition to believe information in Wikipedia, and information utility were positively related to their outcome expectations of Wikipedia. However, among the factors affecting outcome expectations, only information utility and respondents' positive emotions toward Wikipedia were related to their use of it. Further, when all of the independent variables, including the mediator, outcome expectations, were considered, only the variable information utility was related to Wikipedia use, which may imply a limited applicability of SCT to understanding Wikipedia use. However, more empirical evidence is needed to determine the applicability of this theory to Wikipedia use. Finally, this study supports the knowledge value of Wikipedia (Fallis, 2008), despite students' cautious attitudes toward Wikipedia. The study suggests that educators and librarians need to provide better guidelines for using Wikipedia, rather than prohibiting Wikipedia use altogether. © 2009 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: The launching of Scopus and Google Scholar and methodological developments in social-network analysis have made many more indicators for evaluating journals available than the ISI's traditional impact factor, cited half-life, and immediacy index; in this study, the new indicators are compared with one another and with the older ones.
Abstract: The launching of Scopus and Google Scholar, and methodological developments in social-network analysis have made many more indicators for evaluating journals available than the traditional impact factor, cited half-life, and immediacy index of the ISI. In this study, these new indicators are compared with one another and with the older ones. Do the various indicators measure new dimensions of the citation networks, or are they highly correlated among themselves? Are they robust and relatively stable over time? Two main dimensions are distinguished—size and impact—which together shape influence. The h-index combines the two dimensions and can also be considered as an indicator of reach (like Indegree). PageRank is mainly an indicator of size, but has important interactions with centrality measures. The Scimago Journal Ranking (SJR) indicator provides an alternative to the journal impact factor, but the computation is less easy. © 2009 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: The relation between Pearson's correlation coefficient and Salton's cosine measure is revealed based on the different possible values of the division of the L1-norm and the L2-norm of a vector; these values yield a sheaf of increasingly straight lines which together form a cloud of points that constitutes the investigated relation.
Abstract: The relation between Pearson's correlation coefficient and Salton's cosine measure is revealed based on the different possible values of the division of the L1-norm and the L2-norm of a vector. These different values yield a sheaf of increasingly straight lines which together form a cloud of points, being the investigated relation. The theoretical results are tested against the author co-citation relations among 24 informetricians for whom two matrices can be constructed, based on co-citations: the asymmetric occurrence matrix and the symmetric co-citation matrix. Both examples completely confirm the theoretical results. The results enable us to specify an algorithm that provides a threshold value for the cosine above which none of the corresponding Pearson correlations would be negative. Using this threshold value can be expected to optimize the visualization of the vector space. © 2009 Wiley Periodicals, Inc.
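
A small illustration of computing both measures on hypothetical author co-citation vectors; the vectors are invented, and the threshold derivation itself is in the article:

```python
import numpy as np
from scipy.stats import pearsonr

# two (hypothetical) author co-citation profiles over five cited authors
x = np.array([12, 5, 0, 3, 8], dtype=float)
y = np.array([10, 4, 1, 0, 7], dtype=float)

cosine = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
r, _ = pearsonr(x, y)

print(f"Salton's cosine: {cosine:.3f}")
print(f"Pearson's r:     {r:.3f}")
# The article derives how the gap between the two depends on the ratio of the
# L1-norm to the L2-norm of each vector, and from that a cosine threshold
# above which the corresponding Pearson correlations cannot be negative.
```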

Journal IssueDOI
TL;DR: Direct citation, which could detect large and young emerging clusters earlier, shows the best performance in detecting a research front, while co-citation shows the worst; the large clustering coefficient of direct citation networks suggests that the content similarity of papers connected by direct citations is the greatest and that direct citation networks have the least risk of missing emerging research domains.
Abstract: In this article, we performed a comparative study to investigate the performance of methods for detecting emerging research fronts. Three types of citation network, co-citation, bibliographic coupling, and direct citation, were tested in three research domains, gallium nitride (GaN), complex network (CNW), and carbon nanotube (CNT). Three types of citation network were constructed for each research domain, and the papers in those domains were divided into clusters to detect the research front. We evaluated the performance of each type of citation network in detecting a research front by using the following measures of papers in the cluster: visibility, measured by normalized cluster size, speed, measured by average publication year, and topological relevance, measured by density. Direct citation, which could detect large and young emerging clusters earlier, shows the best performance in detecting a research front, and co-citation shows the worst. Additionally, in direct citation networks, the clustering coefficient was the largest, which suggests that the content similarity of papers connected by direct citations is the greatest and that direct citation networks have the least risk of missing emerging research domains because core papers are included in the largest component. © 2009 Wiley Periodicals, Inc.
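
A toy sketch of the three cluster-level measures used for evaluation (normalized cluster size for visibility, average publication year for speed, and density for topological relevance), computed with networkx on an invented two-cluster citation graph:

```python
import networkx as nx

# toy citation network with publication years and two clusters
G = nx.Graph()
papers = {  # node: (year, cluster)
    1: (2005, "A"), 2: (2006, "A"), 3: (2007, "A"),
    4: (2003, "B"), 5: (2004, "B"),
}
G.add_nodes_from(papers)
G.add_edges_from([(1, 2), (2, 3), (1, 3), (4, 5)])

for cluster in ("A", "B"):
    nodes = [n for n, (y, c) in papers.items() if c == cluster]
    sub = G.subgraph(nodes)
    size = len(nodes) / len(papers)                            # normalized cluster size
    avg_year = sum(papers[n][0] for n in nodes) / len(nodes)   # average publication year
    density = nx.density(sub)                                  # within-cluster density
    print(cluster, f"size={size:.2f}", f"year={avg_year:.1f}", f"density={density:.2f}")
```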

Journal IssueDOI
TL;DR: In this article, the authors used Japanese publication data for the 1981-2004 period to study international coauthorship relations and university-industry-government (Triple Helix) relations and showed that the Japanese Triple Helix system has been continuously eroded at the national level.
Abstract: International co-authorship relations and university–industry–government (Triple Helix) relations have hitherto been studied separately. Using Japanese publication data for the 1981–2004 period, we were able to study both kinds of relations in a single design. In the Japanese file, 1,277,030 articles with at least one Japanese address were attributed to the three sectors, and we know additionally whether these papers were coauthored internationally. Using the mutual information in three and four dimensions, respectively, we show that the Japanese Triple-Helix system has been continuously eroded at the national level. However, since the mid-1990s, international coauthorship relations have contributed to a reduction of the uncertainty at the national level. In other words, the national publication system of Japan has developed a capacity to retain surplus value generated internationally. In a final section, we compare these results with an analysis based on similar data for Canada. A relative uncoupling of national university–industry–government relations because of international collaborations is indicated in both countries. © 2009 Wiley Periodicals, Inc.
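
For reference, the mutual information in three dimensions used in Triple Helix studies of this kind is commonly written as the following configurational measure, with H denoting Shannon entropies of the (combined) sector distributions; negative values indicate a reduction of uncertainty at the system level:

```latex
T_{uig} = H_{u} + H_{i} + H_{g} - H_{ui} - H_{ug} - H_{ig} + H_{uig},
\qquad
H_{x} = -\sum_{x} p_{x} \log_{2} p_{x}
```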

Journal IssueDOI
TL;DR: Results show that Reformulation and Assistance account for approximately 45% of all query reformulations; furthermore, the results demonstrate that the first- and second-order models provide the best predictability, between 28% and 40% overall and higher than 70% for some patterns.
Abstract: Query reformulation is a key user behavior during Web search. Our research goal is to develop predictive models of query reformulation during Web searching. This article reports results from a study in which we automatically classified the query-reformulation patterns for 964,780 Web searching sessions, composed of 1,523,072 queries, to predict the next query reformulation. We employed an n-gram modeling approach to describe the probability of users transitioning from one query-reformulation state to another to predict their next state. We developed first-, second-, third-, and fourth-order models and evaluated each model for accuracy of prediction, coverage of the dataset, and complexity of the possible pattern set. The results show that Reformulation and Assistance account for approximately 45% of all query reformulations; furthermore, the results demonstrate that the first- and second-order models provide the best predictability, between 28% and 40% overall and higher than 70% for some patterns. Implications are that the n-gram approach can be used for improving searching systems and searching assistance. © 2009 Wiley Periodicals, Inc.
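
A minimal sketch of the n-gram (Markov) modeling approach: estimate the probability of the next reformulation state from the previous one(s) using observed session logs. The state names and sessions below are invented for illustration:

```python
from collections import Counter, defaultdict

# hypothetical per-session sequences of query-reformulation states
sessions = [
    ["New", "Reformulation", "Specialization", "Reformulation"],
    ["New", "Assistance", "Reformulation", "Reformulation"],
    ["New", "Reformulation", "Assistance"],
]

def transition_model(sessions, order=1):
    """Estimate P(next state | previous `order` states) from session logs."""
    counts = defaultdict(Counter)
    for seq in sessions:
        for i in range(len(seq) - order):
            context = tuple(seq[i:i + order])
            counts[context][seq[i + order]] += 1
    return {ctx: {s: n / sum(c.values()) for s, n in c.items()}
            for ctx, c in counts.items()}

first_order = transition_model(sessions, order=1)
print(first_order[("Reformulation",)])  # probabilities of the next state
```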

Journal IssueDOI
TL;DR: In this article, the authors examined the determinants of service quality and continuance intention of online services and empirically tested a model with both service and technology characteristics as the main drivers for service quality.
Abstract: This article examines the determinants of service quality and continuance intention of online services. We proposed and empirically tested a model with both service and technology characteristics as the main drivers of service quality and subsequent continuance intention of eTax, an electronic government (eGovernment) service that enables citizens to file their taxes online. Our data were collected via a two-stage longitudinal online survey of 518 participants before and after they made use of the eTax service in Hong Kong. The results showed that both service characteristics (i.e., security and convenience) and one of the technology characteristics (i.e., perceived usefulness, but not perceived ease of use) were the key determinants of service quality. Another interesting and important finding that runs counter to the vast body of empirical evidence on predicting intention is that perceived usefulness was not the strongest predictor of continuance intention but rather service quality was. To provide a richer picture of these relationships, we also conducted a post-hoc analysis of the effects of service and technology characteristics on the individual dimensions of service quality and their subsequent impact on continuance intention and found assurance and reliability to be the only significant predictors of continuance intention. We present implications for research and practice related to online services. © 2009 Wiley Periodicals, Inc.


Journal IssueDOI
TL;DR: The authors examine the criteria questioners use to select the best answers in a social Q&A site (Yahoo! Answers) within the theoretical framework of relevance research and find that socio-emotional criteria are popular in discussion-oriented categories, content-oriented criteria in topic-oriented categories, and utility criteria in self-help categories.
Abstract: This study examines the criteria questioners use to select the best answers in a social Q&A site (Yahoo! Answers) within the theoretical framework of relevance research. In a social Q&A site, the questioner selects the answer that best satisfies his or her question and leaves comments on it. Under the assumption that the comments reflect the reasons why questioners select particular answers as the best, this study analyzed 2,140 comments collected from Yahoo! Answers during December 2007. The content analysis identified 23 individual relevance criteria in six classes: Content, Cognitive, Utility, Information Sources, Extrinsic, and Socioemotional. A major finding is that the selection criteria used in a social Q&A site have considerable overlap with many relevance criteria uncovered in previous relevance studies, but that the scope of socio-emotional criteria has been expanded to include the social aspect of this environment. Another significant finding is that the relative importance of individual criteria varies according to topic categories. Socioemotional criteria are popular in discussion-oriented categories, content-oriented criteria in topic-oriented categories, and utility criteria in self-help categories. This study generalizes previous relevance studies to a new environment by going beyond an academic setting. © 2009 Wiley Periodicals, Inc. The authors contributed equally to this work.

Journal IssueDOI
TL;DR: The Blogging Privacy Management Measure (BPMM) is a multidimensional, valid, and reliable measure; future research could explore the influence of family values about privacy on blogging privacy rule management.
Abstract: This study applied Communication Privacy Management (CPM) theory to the context of blogging and developed a validated, theory-based measure of blogging privacy management. Across three studies, 823 college student bloggers completed an online survey. In study one (n = 176), exploratory and confirmatory factor analysis techniques tested four potential models. Study two (n = 291) cross-validated the final factor structure obtained in the fourth model with a separate sample. Study three (n = 356) tested the discriminant and predictive validity of the measure by comparing it to the self-consciousness scale. The Blogging Privacy Management Measure (BPMM) is a multidimensional, valid, and reliable construct. Future research could explore the influence of family values about privacy on blogging privacy rule management. © 2009 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: It is argued that the best understanding and classification of theories of concepts is to view and classify them in accordance with epistemological theories (empiricism, rationalism, historicism, and pragmatism) and that the historicist and pragmatic understandings of concepts are the most fruitful views.
Abstract: “It cannot be overemphasized that changes in concepts have far more impact than new discoveries” (Mayr, 1997, p. 98). Concept theory is an extremely broad, interdisciplinary, and complex field of research related to many deep fields with very long historical traditions and without much consensus. However, information science and knowledge organization cannot avoid relating to theories of concepts. Knowledge organizing systems (e.g., classification systems, thesauri, and ontologies) should be understood as systems basically organizing concepts and their semantic relations. The same is the case with information retrieval systems. Different theories of concepts have different implications for how to construe, evaluate, and use such systems. Based on “a post-Kuhnian view” of paradigms, this article puts forward arguments that the best understanding and classification of theories of concepts is to view and classify them in accordance with epistemological theories (empiricism, rationalism, historicism, and pragmatism). It is also argued that the historicist and pragmatist understandings of concepts are the most fruitful views and that this understanding may be part of a broader paradigm shift that is also beginning to take place in information science. The importance of historicist and pragmatic theories of concepts for information science is outlined. © 2009 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: The libcitation count, a count of the libraries holding a given book as reported in a national or international union catalog, is presented as a measure for book-oriented fields; a match-up is imagined between the departments of history, philosophy, and political science at the University of New South Wales and the University of Sydney in Australia.
Abstract: Bibliometric measures for evaluating research units in the book-oriented humanities and social sciences are underdeveloped relative to those available for journal-oriented science and technology. We therefore present a new measure designed for book-oriented fields: the “libcitation count.” This is a count of the libraries holding a given book, as reported in a national or international union catalog. As librarians decide what to acquire for the audiences they serve, they jointly constitute an instrument for gauging the cultural impact of books. Their decisions are informed by knowledge not only of audiences but also of the book world (e.g., the reputations of authors and the prestige of publishers). From libcitation counts, measures can be derived for comparing research units. Here, we imagine a match-up between the departments of history, philosophy, and political science at the University of New South Wales and the University of Sydney in Australia. We chose the 12 books from each department that had the highest libcitation counts in the Libraries Australia union catalog during 2000 to 2006. We present each book's raw libcitation count, its rank within its Library of Congress (LC) class, and its LC-class normalized libcitation score. The latter is patterned on the item-oriented field normalized citation score used in evaluative bibliometrics. Summary statistics based on these measures allow the departments to be compared for cultural impact. Our work has implications for programs such as Excellence in Research for Australia and the Research Assessment Exercise in the United Kingdom. It also has implications for data mining in OCLC's WorldCat. © 2009 Wiley Periodicals, Inc.
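
One plausible reading of the LC-class normalized libcitation score, patterned on item-oriented field normalization, is a book's libcitation count divided by the mean count of books in its Library of Congress class; the sketch below uses invented data and this assumed definition, which may differ in detail from the article's:

```python
from collections import defaultdict
from statistics import mean

# (hypothetical) books: (title, LC class, libcitation count)
books = [
    ("Book A", "DU", 220), ("Book B", "DU", 140), ("Book C", "DU", 60),
    ("Book D", "JA", 310), ("Book E", "JA", 90),
]

# mean libcitation count per LC class
by_class = defaultdict(list)
for _, lc, n in books:
    by_class[lc].append(n)
class_mean = {lc: mean(ns) for lc, ns in by_class.items()}

# LC-class normalized libcitation score: raw count / class mean
for title, lc, n in books:
    print(f"{title} ({lc}): raw={n}, normalized={n / class_mean[lc]:.2f}")
```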

Journal IssueDOI
TL;DR: Analysis of changes in the concentration of citations received by papers published between 1900 and 2005 shows that, contrary to what was reported by Evans, the dispersion of citations is actually increasing.
Abstract: This article challenges recent research (Evans, 2008) reporting that the concentration of cited scientific literature increases with the online availability of articles and journals. Using Thomson Reuters' Web of Science, the present article analyses changes in the concentration of citations received (2- and 5-year citation windows) by papers published between 1900 and 2005. Three measures of concentration are used: the percentage of papers that received at least one citation (cited papers); the percentage of papers needed to account for 20%, 50%, and 80% of the citations; and the Herfindahl-Hirschman index (HHI). These measures are used for four broad disciplines: natural sciences and engineering, medical fields, social sciences, and the humanities. All these measures converge and show that, contrary to what was reported by Evans, the dispersion of citations is actually increasing. © 2009 Wiley Periodicals, Inc.
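
A small sketch of the three concentration measures on an invented set of citation counts: the percentage of papers cited at least once, the Herfindahl-Hirschman index of citation shares, and the smallest percentage of papers needed to account for 80% of citations:

```python
import numpy as np

def concentration_measures(citations):
    """citations: array of citation counts, one per paper."""
    c = np.asarray(citations, dtype=float)
    total = c.sum()
    shares = c / total

    pct_cited = (c > 0).mean() * 100        # % of papers cited at least once
    hhi = (shares ** 2).sum()               # Herfindahl-Hirschman index

    # smallest % of papers accounting for 80% of all citations
    ordered = np.sort(c)[::-1]
    k = np.searchsorted(np.cumsum(ordered), 0.8 * total) + 1
    pct_for_80 = k / len(c) * 100
    return pct_cited, hhi, pct_for_80

print(concentration_measures([50, 20, 10, 5, 5, 5, 3, 1, 1, 0]))
```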

Journal IssueDOI
TL;DR: The study shows that consumers may lack the motivation or literacy skills to evaluate the information quality of health Web pages, which suggests the need to develop accessible automatic information quality evaluation tools and ontologies.
Abstract: This article describes a model for online consumer health information consisting of five quality criteria constructs. These constructs are grounded in empirical data from the perspectives of the three main sources in the communication process: health information providers, consumers, and intermediaries, such as Web directory creators and librarians, who assist consumers in finding healthcare information. The article also defines five constructs of Web page structural markers that could be used in information quality evaluation and maps these markers to the quality criteria. Findings from correlation analysis and multinomial logistic tests indicate that use of the structural markers depended significantly on the type of Web page and type of information provider. The findings suggest the need to define genre-specific templates for quality evaluation and the need to develop models for an automatic genre-based classification of health information Web pages. In addition, the study showed that consumers may lack the motivation or literacy skills to evaluate the information quality of health Web pages, which suggests the need to develop accessible automatic information quality evaluation tools and ontologies. © 2009 Wiley Periodicals, Inc.

Journal IssueDOI
TL;DR: This article presents the results of the authors' attempt at the recognition and extraction of the 10 most important categories of named entities in Arabic script: person name, location, company, date, time, price, measurement, phone number, ISBN, and file name.
Abstract: Name identification has been worked on quite intensively for the past few years, and has been incorporated into several products revolving around natural language processing tasks. Many researchers have attacked the name identification problem in a variety of languages, but only a few limited research efforts have focused on named entity recognition for Arabic script. This is due to the lack of resources for Arabic named entities and the limited amount of progress made in Arabic natural language processing in general. In this article, we present the results of our attempt at the recognition and extraction of the 10 most important categories of named entities in Arabic script: the person name, location, company, date, time, price, measurement, phone number, ISBN, and file name. We developed the system Named Entity Recognition for Arabic (NERA) using a rule-based approach. The resources created are: a Whitelist representing a dictionary of names, and a grammar, in the form of regular expressions, which are responsible for recognizing the named entities. A filtration mechanism is used that serves two different purposes: (a) revision of the results from a named entity extractor by using metadata, in terms of a Blacklist or rejecter, about ill-formed named entities and (b) disambiguation of identical or overlapping textual matches returned by different name entity extractors to get the correct choice. In NERA, we addressed major challenges posed by NER in the Arabic language arising due to the complexity of the language, peculiarities in the Arabic orthographic system, nonstandardization of the written text, ambiguity, and lack of resources. NERA has been effectively evaluated using our own tagged corpus; it achieved satisfactory results in terms of precision, recall, and F-measure. © 2009 Wiley Periodicals, Inc.
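
A deliberately tiny sketch of the rule-based idea (whitelist lookup plus a regular-expression grammar plus a blacklist rejecter); the name list, pattern, and blacklist entry are invented stand-ins for NERA's much larger resources:

```python
import re

# toy whitelist (gazetteer) of person names and a simple regex "grammar" for dates;
# the real NERA system uses far richer resources and rules
PERSON_WHITELIST = {"محمد", "فاطمة", "أحمد"}
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")
BLACKLIST = {"1/1/0000"}  # rejecter for ill-formed matches

def extract_entities(text):
    entities = []
    for token in text.split():
        if token in PERSON_WHITELIST:
            entities.append(("PERSON", token))
    for match in DATE_PATTERN.findall(text):
        if match not in BLACKLIST:
            entities.append(("DATE", match))
    return entities

print(extract_entities("سافر محمد إلى دبي في 12/5/2008"))
# [('PERSON', 'محمد'), ('DATE', '12/5/2008')]
```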