
Showing papers presented at "International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management in 2014"


Book ChapterDOI
21 Oct 2014
TL;DR: This work proposes a more straightforward approach based on nearest centroid classification: profiles of topic categories are extracted from the source domain and then adapted by iterative refining steps using the most similar documents in the target domain, achieving accuracy better than or comparable to that of other methods.
Abstract: In cross-domain text classification, topic labels for documents of a target domain are predicted by leveraging knowledge of labeled documents of a source domain, having equal or similar topics with possibly different words. Existing methods either adapt documents of the source domain to the target or represent both domains in a common space. These methods are mostly based on advanced statistical techniques and often require tuning of parameters to obtain optimal performance. We propose a more straightforward approach based on nearest centroid classification: profiles of topic categories are extracted from the source domain and are then adapted by iterative refining steps using the most similar documents in the target domain. Experiments on common benchmark datasets show that this approach, despite its simplicity, achieves accuracy better than or comparable to that of other methods, using fixed empirical values for its few parameters.
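
The refinement loop described above lends itself to a compact implementation. Below is a minimal sketch, assuming tf-idf document vectors, cosine similarity as the relatedness measure, and illustrative values for the number of refinement rounds and of target documents used per category; none of these names come from the paper.

```python
import numpy as np

def refine_centroids(source_X, source_y, target_X, n_classes, rounds=5, k=50):
    """Build category profiles from labeled source docs, then adapt them
    using the most similar unlabeled target docs."""
    # initial profiles: mean tf-idf vector of source documents per class
    centroids = np.vstack([source_X[source_y == c].mean(axis=0)
                           for c in range(n_classes)])
    for _ in range(rounds):
        # cosine similarity between every target doc and every centroid
        sims = target_X @ centroids.T
        sims /= (np.linalg.norm(target_X, axis=1, keepdims=True)
                 * np.linalg.norm(centroids, axis=1) + 1e-12)
        # rebuild each profile from its k most similar target documents
        centroids = np.vstack([target_X[np.argsort(sims[:, c])[-k:]].mean(axis=0)
                               for c in range(n_classes)])
    return centroids  # classify a target doc by its nearest centroid
```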

25 citations


Proceedings ArticleDOI
21 Oct 2014
TL;DR: This research concerns the development of web content detection systems that will be able to automatically classify any web page into pre-defined content categories, and makes use of tf-idf weighted n-grams in building the content classification models.
Abstract: This research concerns the development of web content detection systems that will be able to automatically classify any web page into pre-defined content categories. Our work is motivated by practical experience and observations that certain categories of web pages, such as those that contain hatred and violence, are much harder to classify with good accuracy even when both content and structural features are taken into account. To further improve the performance of detection systems, we bring web sentiment features into the classification models. In addition, we incorporate n-gram representation into our classification approach, based on the assumption that n-grams can capture more local context information in text and thus help enhance topic similarity analysis. Unlike most studies, which only consider the presence or frequency count of n-grams, we make use of tf-idf weighted n-grams in building the content classification models. Our results show that unigram-based models, though a much simpler approach, have unique value and effectiveness in web content classification. Higher-order n-gram approaches, especially 5-gram models that combine topic similarity features with sentiment features, bring significant improvements in precision for the Violence and the two racism-related web categories.
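
As a concrete reading of "tf-idf weighted n-grams", here is a minimal sketch using scikit-learn; the vectorizer and classifier choices are illustrative assumptions, not the authors' exact setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def build_ngram_model(n):
    # tf-idf weights of word n-grams instead of raw presence/counts
    return make_pipeline(
        TfidfVectorizer(ngram_range=(n, n), sublinear_tf=True),
        LinearSVC(),
    )

model = build_ngram_model(5)            # e.g., a 5-gram model
# model.fit(train_pages, train_labels)  # train_pages: list of page texts
```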

24 citations


Proceedings ArticleDOI
21 Oct 2014
TL;DR: A random perturbation method of the training set is proposed, which creates a new annotation matrix used to train the model to recognize new annotations; the technique is shown to improve novel annotation predictions with respect to state-of-the-art unsupervised methods.
Abstract: Genomic annotations describing functional features of genes and proteins through controlled terminologies and ontologies are extremely valuable, especially for computational analyses aimed at inferring new biomedical knowledge. Thanks to the biology revolution led by the introduction of novel DNA sequencing technologies, several repositories of such annotations have become available in the last decade; among them, those including Gene Ontology annotations are the most relevant. Nevertheless, the available set of genomic annotations is incomplete, and only some of the available annotations represent highly reliable human-curated information. In this paper we propose a novel representation of the annotation discovery problem, so as to enable applying supervised algorithms to predict Gene Ontology annotations of different organism genes. To use supervised algorithms even though labeled data to train the prediction model are not available, we propose a random perturbation method of the training set, which creates a new annotation matrix to be used to train the model to recognize new annotations. We tested the effectiveness of our approach on nine Gene Ontology annotation datasets. The obtained results demonstrate that our technique improves novel annotation predictions with respect to state-of-the-art unsupervised methods.
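
One plausible reading of the perturbation step is sketched below: known annotations in a binary gene-by-term matrix are randomly removed, and the removed entries become supervision targets for a model trained to recover them. The matrix layout and removal rate are illustrative assumptions.

```python
import numpy as np

def perturb_annotations(A, removal_rate=0.1, seed=0):
    """A: binary gene-by-term annotation matrix. Randomly flip a fraction
    of the known annotations (1 -> 0); the removed entries serve as
    positive labels a supervised model can learn to recover."""
    rng = np.random.default_rng(seed)
    ones = np.argwhere(A == 1)
    n_remove = int(removal_rate * len(ones))
    removed = ones[rng.choice(len(ones), size=n_remove, replace=False)]
    A_train = A.copy()
    A_train[removed[:, 0], removed[:, 1]] = 0
    return A_train, removed
```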

21 citations


Proceedings ArticleDOI
21 Oct 2014
TL;DR: This paper shows how cubes can be represented as graphs and how these structures can be used for graph OLAP queries to support decision making, providing an original model for managing cubes in NoSQL graphs.
Abstract: OLAP is a leading technology for analysing data and decision making. It helps users to discover relevant information in large databases. Graph OLAP has been studied for several years in the OLAP framework. In existing work, the authors study how to import graph data into OLAP cube models, but no work has yet explored the feasibility of exploiting graph structures to store analytical data. As graph databases are more and more used through NoSQL implementations (e.g., social and biological networks), in this paper we aim at providing an original model for managing cubes in NoSQL graphs. We show how cubes can be represented as graphs and how these structures can then be used for graph OLAP queries to support decision making.
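
To make the cube-as-graph idea concrete, here is an illustrative sketch (not the paper's exact model) that stores a fact cell and its dimension members as nodes of a property graph, using networkx; the node names, properties, and toy query are assumptions.

```python
import networkx as nx

g = nx.DiGraph()
# dimension member nodes
g.add_node("city:Paris", kind="member", dimension="Location")
g.add_node("month:2014-10", kind="member", dimension="Time")
# a fact/cell node carrying the measure
g.add_node("cell:42", kind="cell", sales=1250.0)
# edges link the cell to the dimension members identifying it
g.add_edge("cell:42", "city:Paris", role="Location")
g.add_edge("cell:42", "month:2014-10", role="Time")

# a simple graph-OLAP style aggregation: total sales for Paris
total = sum(d["sales"] for n, d in g.nodes(data=True)
            if d.get("kind") == "cell" and g.has_edge(n, "city:Paris"))
```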

19 citations


Proceedings ArticleDOI
21 Oct 2014
TL;DR: A novel dynamic coherence-based approach that analyzes the information stored in user profiles based on its coherence, identifying and removing previously evaluated items that do not adhere to the average preferences, in order to make a user profile as close as possible to the user's real tastes.
Abstract: Recommender systems usually produce their results based on the interpretation of a user's whole interaction history. This canonical approach can sometimes lead to wrong results due to several factors, such as changes in user taste over time or the use of an account by third parties. This work proposes a novel dynamic coherence-based approach that analyzes the information stored in user profiles based on its coherence. The main aim is to identify and remove, from the previously evaluated items, those not adherent to the average preferences, in order to make a user profile as close as possible to the user's real tastes. The conducted experiments show the effectiveness of our approach in removing incoherent items from a user profile, increasing recommendation accuracy.
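
A minimal sketch of the pruning idea follows, assuming each rated item is represented as a feature vector and that coherence is measured as cosine similarity to the profile average; the threshold value is an illustrative assumption.

```python
import numpy as np

def prune_profile(item_vectors, threshold=0.3):
    """Drop rated items whose similarity to the profile average falls
    below a threshold, i.e., items not adherent to the user's
    average preferences."""
    profile = item_vectors.mean(axis=0)  # the user's "average taste"
    norms = np.linalg.norm(item_vectors, axis=1) * np.linalg.norm(profile)
    sims = item_vectors @ profile / (norms + 1e-12)
    return item_vectors[sims >= threshold]  # keep coherent items only
```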

18 citations


Proceedings ArticleDOI
21 Oct 2014
TL;DR: This work presents a simpler method based on creating explicit representations of topic categories, which can be compared for similarity to those of documents, obtaining results better than or comparable to other methods with fixed empirical values for its few parameters.
Abstract: Cross-domain text classification deals with predicting topic labels for documents in a target domain by leveraging knowledge from pre-labeled documents in a source domain, with different terms or different distributions thereof. Methods exist to address this problem by re-weighting documents from the source domain to transfer them to the target one, or by finding a common feature space for documents of both domains; they often require the combination of complex techniques, leading to a number of parameters which must be tuned for each dataset to yield optimal performance. We present a simpler method based on creating explicit representations of topic categories, which can be compared for similarity to those of documents. Category representations are initially built from relevant source documents, then iteratively refined by considering the most similar target documents, with relatedness being measured by a simple regression model based on cosine similarity, built once at the beginning. This is expected to yield accurate representations of the categories in the target domain, which are used to classify documents therein. Experiments on common benchmark text collections show that this approach obtains results better than or comparable to other methods, with fixed empirical values for its few parameters.
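
The "simple regression model based on cosine similarity" could look like the sketch below: a one-feature logistic regression fitted once on (document, category) pairs with known relatedness. This is an illustrative reconstruction, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def fit_relatedness(doc_vecs, cat_vecs, labels):
    """labels: 1 if the document belongs to the category, else 0."""
    sims = np.array([[cosine(d, c)] for d, c in zip(doc_vecs, cat_vecs)])
    return LogisticRegression().fit(sims, labels)  # single-feature model
```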

18 citations


Book ChapterDOI
21 Oct 2014
TL;DR: Methods proposed for this task, for example, the all-grams approach which extracts all possible sub-strings as features, provide reasonable accuracy but do not scale well to large datasets.
Abstract: There are situations in which it is important to have an efficient and reliable classification of a web page from the information contained in its Uniform Resource Locator (URL) only, without the need to visit the page itself. For example, a social media website may need to quickly identify status updates linking to malicious websites in order to block them. The URL is very concise and may be composed of concatenated words, so classification with only this information is a very challenging task. Methods proposed for this task, for example the all-grams approach, which extracts all possible substrings as features, provide reasonable accuracy but do not scale well to large datasets.
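
For reference, the all-grams feature extraction mentioned above can be sketched in a few lines; the length bounds are illustrative assumptions.

```python
def all_grams(url, min_len=3, max_len=8):
    """Every character substring of the URL within a length range
    becomes a feature; the feature count grows quickly with URL
    length, which is the scalability issue noted above."""
    s = url.lower()
    return {s[i:i + n]
            for n in range(min_len, max_len + 1)
            for i in range(len(s) - n + 1)}

# all_grams("example.com/news") -> {'exa', 'xam', ..., 'news', ...}
```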

15 citations


Proceedings ArticleDOI
21 Oct 2014
TL;DR: A metaphor of diffusion, borrowed from genetics, and a four-phase model are proposed, aiming to be as simple as the SECI model proposed within the OKCT, but also more comprehensive and sociologically informed.
Abstract: In this paper we present an original position on how knowledge is created and shared in organizational domains. We propose a metaphor of diffusion, borrowed from genetics, and a four-phase model, which aims to be as simple as the SECI model proposed within the OKCT, but also more comprehensive and sociologically informed. Our model takes into account the individual, social and cultural dimensions of knowledge (what we denote as co-knowledge) to account for the various ways knowledge is “circulated” among people (i.e., members of any social structure); we also propose ancillary concepts like that of “Knowing Community” and “Knowledge Artifact”, as analytical constructs to represent, respectively, the environment hosting such circulation and the technological driver that either enables or supports it.

14 citations


Proceedings ArticleDOI
21 Oct 2014
TL;DR: This study presents two data mining approaches to predict LOS in an ICU using admission data and supplementary clinical data of the patient collected in real-time to consider the most recent patient condition when the model is induced.
Abstract: Nowadays, the efficiency of cost and resource planning in hospitals plays a critical role in the management of these units. Length Of Stay (LOS) is a good metric when the goal is to decrease costs and optimize resources. In Intensive Care Units (ICUs), optimization assumes even greater importance, given the high costs associated with inpatients. This study presents two data mining approaches to predict LOS in an ICU. The first approach considered the admission variables and some other physiologic variables collected during the first 24 hours of the inpatient stay. The second approach considered admission data and supplementary clinical data of the patient (vital signs and laboratory results) collected in real-time. The results achieved with the first approach are very poor (accuracy of 73%). However, when the prediction is made using the data collected in real-time, the results are very interesting (sensitivity of 96.104%). The models induced in the second experiment are sensitive to the patient's clinical situation and can predict LOS according to the monitored variables. Models for predicting LOS at admission are not suited to the ICU's particularities. Alternatively, they should be induced in real-time, using online learning and considering the most recent patient condition when the model is induced.
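
The real-time induction the authors recommend maps naturally onto incremental learning. A minimal sketch with scikit-learn's partial_fit follows; the classifier choice and the two-class LOS encoding are illustrative assumptions.

```python
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="log_loss")
CLASSES = [0, 1]  # e.g., 0 = short stay, 1 = long stay (assumed encoding)

def update_with_new_readings(X_batch, y_batch):
    """Incrementally update the LOS model as fresh vital signs and
    laboratory results arrive, so predictions reflect the most
    recent patient condition."""
    clf.partial_fit(X_batch, y_batch, classes=CLASSES)
```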

13 citations


Proceedings ArticleDOI
21 Oct 2014
TL;DR: Light is shed on the multiple ways these ideas can inform the reification of knowledge into particular IT artifacts, which the paper calls IT Knowledge Artifacts (ITKA), and on how seemingly irreconcilable positions can contribute to the design of these computational artifacts supporting knowledge work in organizations.
Abstract: Knowledge Artifact (KA) is an analytical construct by which analysts, researchers and designers from different disciplines usually denote those material objects that, in organizations, regard the creation, use, sharing and representation of knowledge. This paper aims to fill a gap in the existing literature by providing a conceptual framework for the interpretation of the heterogeneous contributions on this concept in the specialist literature. From our survey of the main contributions to the definition of this concept, we outline a spectrum of stances lying between two theoretical extremes: we denote one pole “representational”, as it is grounded in the idea that knowledge can be an “object per se”; and the other pole “socially situated”, as it builds on the viewpoint that sees knowledge as a social practice, that is, an epiphenomenon of a situated, context-dependent and performative interaction of human actors through and with “objects of knowing”. In proposing a unifying model to gather complementary dimensions of knowledge together, our aim is to shed light on the multiple ways these ideas can inform the “reification” of knowledge into particular IT artifacts, which we call IT Knowledge Artifacts (ITKA), and on how seemingly irreconcilable positions can contribute to the design of these computational artifacts supporting knowledge work in organizations.

12 citations


Proceedings ArticleDOI
21 Oct 2014
TL;DR: This paper presents the architecture of a BI platform for a local government organization, geared towards improving the quality of services offered to citizens and maximizing efficiency, thus contributing to cost reduction for the taxpayer.
Abstract: The use of business intelligence (BI) systems by organizations is increasingly considered an asset, whose goal is to provide access to information in a timely manner in order to support the decision-making process. However, specific cases such as local government organizations pose very specific challenges. Some of them, like privacy rights and compliance with applicable law, must be carefully observed, making the necessary adaptations of these BI solutions. The developed solution brings some important contributions and represents an advance in the eGovernment context applied to local governments, where the information used and stored is normally not normalized or pre-defined. As this is a big barrier to the development of this type of solution, the developed architecture is prepared to improve data quality and avoid this type of mistake. This paper presents the architecture of a BI platform for a local government organization, geared towards improving the quality of services offered to citizens and maximizing efficiency, thus contributing to cost reduction for the taxpayer.

Proceedings ArticleDOI
21 Oct 2014
TL;DR: This work investigates the most popular document ("tweet") representation methods which feed sentiment evaluation mechanisms and demonstrates the superiority of learning-based methods, and in particular of n-gram graph approaches, for predicting the sentiment of tweets.
Abstract: This work extends the set of works dealing with the popular problem of sentiment analysis in Twitter. It investigates the most popular document ("tweet") representation methods which feed sentiment evaluation mechanisms. In particular, we study the bag-of-words, n-gram and n-gram graph approaches and, for each of them, we evaluate the performance of a lexicon-based and 7 learning-based classification algorithms (namely SVM, Naive Bayesian Networks, Logistic Regression, Multilayer Perceptrons, Best-First Trees, Functional Trees and C4.5) as well as their combinations, using a set of 4451 manually annotated tweets. The results demonstrate the superiority of learning-based methods, and in particular of n-gram graph approaches, for predicting the sentiment of tweets. They also show that the combinatory approach has impressive effects on n-grams, raising the confidence up to 83.15% on the 5-grams, using majority vote and a balanced dataset (equal numbers of positive, negative and neutral tweets for training). In the n-gram graph cases the improvement was small to none, reaching 94.52% on the 4-gram graphs, using Orthodromic distance and a threshold of 0.001.
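
Since n-gram graphs are less widely known than bag-of-words, here is a minimal sketch of the representation: nodes are character n-grams and weighted edges record co-occurrence within a sliding window (after Giannakopoulos et al.); the n and window values are illustrative.

```python
from collections import defaultdict

def ngram_graph(text, n=3, window=4):
    """Build a weighted edge map: nodes are character n-grams, an edge
    (a, b) counts how often b occurs within `window` n-grams after a."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    edges = defaultdict(int)
    for i, a in enumerate(grams):
        for b in grams[i + 1:i + 1 + window]:
            edges[(a, b)] += 1
    return edges  # graphs of two tweets can then be compared for similarity
```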

Proceedings ArticleDOI
21 Oct 2014
TL;DR: A semantic model of traces is proposed and classified traces are analyzed by means of TF-IDF to offer users recommendations and decision aid to improve collaboration.
Abstract: Collaboration allows integrating intellectual resources and knowledge from all participants in order to achieve individual or collective goals. With the help of informational environments, we can better organize, carry out and record collaboration. During interactions among users in such environments, each activity produces a set of traces. Such traces are recorded and classified based on a trace model, and can be exploited to improve collaboration. In this article, we propose a semantic model of traces and analyze the classified traces by means of TF-IDF. We exploit the results to offer users recommendations and decision aid.

Book ChapterDOI
21 Oct 2014
TL;DR: An approach that enhances an existing forum by introducing a navigation structure that enables searching and navigating the forum content by topics of discussion, and an implementation of the topic-driven content search and navigation and assisted posting forum enhancement approaches for the Moodle learning management system are presented.
Abstract: Online forums nowadays represent one of the most popular and rich repositories of user-generated information on the Internet. Searching for information of interest in an online forum may be substantially improved by a proper organization of the forum content. With this aim, in this paper we propose an approach that enhances an existing forum by introducing a navigation structure that enables searching and navigating the forum content by topics of discussion. Topics and hierarchical relations between them are semi-automatically extracted from the forum content by applying Information Retrieval techniques, specifically Topic Models and Formal Concept Analysis. Then, forum posts and discussion threads are associated with discussion topics on a similarity score basis. Moreover, to support automatic moderation in websites that host several forums, we propose a strategy to assist a user writing a new post in choosing the most appropriate forum into which it should be added. An implementation of the topic-driven content search and navigation and assisted posting forum enhancement approaches for the Moodle learning management system is also presented in the paper, opening the application of these approaches to several real distance learning contexts. Finally, we also report on two case studies that we have conducted to validate the two approaches and evaluate their benefits.
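
As a rough illustration of the topic-extraction step (the paper additionally applies Formal Concept Analysis to organize topics hierarchically), a scikit-learn LDA sketch follows; all parameter values are illustrative assumptions.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def extract_topics(posts, n_topics=10, top_words=8):
    """Return, for each latent topic, its most probable words."""
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(posts)
    lda = LatentDirichletAllocation(n_components=n_topics,
                                    random_state=0).fit(X)
    words = vec.get_feature_names_out()
    return [[words[i] for i in comp.argsort()[-top_words:][::-1]]
            for comp in lda.components_]
```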

Proceedings ArticleDOI
21 Oct 2014
TL;DR: The proposed approach TAD consists in disambiguating web queries to build an adaptive semantic for diversity-based image retrieval and demonstrates promising results for the majority of the twelve ambiguous queries.
Abstract: In recent years, the explosive growth of multimedia databases and digital libraries has revealed crucial problems in indexing and retrieving images, which led us to develop our own approach. Our proposed approach, TAD, consists of disambiguating web queries to build an adaptive semantic for diversity-based image retrieval. In fact, the TAD approach is a puzzle made up of three main components: the TAWQU (Thesaurus-Based Ambiguous Web Query Understanding) process, the ASC (Adaptive Semantic Construction) process and the DR (Diversity-based Retrieval) process. Wikipedia pages are our main source of information, and the NUS-WIDE dataset is the bedrock of our adaptive semantic, permitting a proper evaluation. The experiments demonstrate promising results for the majority of the twelve ambiguous queries.

Proceedings ArticleDOI
21 Oct 2014
TL;DR: An original semantic model (MEMORAe-core 2) for collaboration and information sharing is presented, it is shown how the annotation is modeled as an information resource, and a web platform that uses this semantic model is presented.
Abstract: Many research studies have shown that organizational learning is a key factor contributing to the well-being of the organization. The process of organizational learning is affected by collaborative annotation, which plays an important role in it. However, current collaborative annotation platforms share a common limitation: a restricted ability to share, index and retrieve annotations like any other information resource (e.g., a document). In this paper, we define the annotation and indicate how it becomes collaborative. We present an original semantic model (MEMORAe-core 2) for collaboration and information sharing, and we show how the annotation is modeled as an information resource. We present a web platform (MEMORAe) that uses this semantic model. A use case for the use of this platform within a small or medium-sized enterprise is also detailed. With this work, our objective is to support organizational learning by concentrating on the exchange of ideas by means of collaborative annotation.

Proceedings ArticleDOI
21 Oct 2014
TL;DR: An unsupervised, domain-independent approach, for sentiment classification on Twitter that is able to differentiate between positive, negative, and objective (neutral) polarities for every word, given the context in which it occurs is proposed.
Abstract: Sentiment classification is not a new topic, but data sources having different characteristics require customized methods to exploit the existing hidden semantics while minimizing noise and irrelevant information. Twitter represents a huge pool of data with specific features. We therefore propose an unsupervised, domain-independent approach for sentiment classification on Twitter. The proposed approach integrates NLP techniques, Word Sense Disambiguation and unsupervised rule-based classification. The method is able to differentiate between positive, negative, and objective (neutral) polarities for every word, given the context in which it occurs. Finally, the overall tweet polarity decision is taken by our proposed rule-based classifier. We performed a comparative evaluation of our method on four public datasets specialized for this task, and the experimental results obtained are very good compared to other state-of-the-art methods, considering that our classifier does not use any training corpus.
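
The final rule-based decision could be as simple as the sketch below, which aggregates the word-level polarities produced by the WSD step; the majority rule shown is an illustrative assumption, not the paper's exact rule set.

```python
def tweet_polarity(word_polarities):
    """word_polarities: list of 'pos' / 'neg' / 'obj' labels, one per
    word, as produced by the word-level disambiguation step."""
    pos = word_polarities.count("pos")
    neg = word_polarities.count("neg")
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # all objective, or a tie

# tweet_polarity(["pos", "obj", "pos", "neg"]) -> "positive"
```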

Book ChapterDOI
21 Oct 2014
TL;DR: The proposed algorithm has a complexity lower than that of state-of-the-art algorithms, as it is independent of the gap between the antecedent and the consequent, and it is used to mine rules with a consequent that occurs mainly at a given distance.
Abstract: This paper focuses on event prediction in an event sequence, particularly on distant event prediction. We aim at mining episode rules with a consequent temporally distant from the antecedent and with a minimal antecedent. To reach this goal, we propose an algorithm that determines the consequent of an episode rule at an early stage in the mining process, and that applies a span constraint on the antecedent and a gap constraint between the antecedent and the consequent. This algorithm has a lower complexity than state-of-the-art algorithms, as it is independent of the gap between the antecedent and the consequent. In addition, determining the consequent at an early stage allows filtering out many non-relevant rules early in the process, which results in an additional significant decrease of the running time. A new confidence measure is proposed, the temporal confidence, which evaluates the confidence of a rule in relation to the predefined gap. The temporal confidence is used to mine rules with a consequent that occurs mainly at a given distance. The algorithm is evaluated on an event sequence of social network messages. We show that our algorithm mines minimal rules with a distant consequent while requiring a small computation time. We also show that these rules can be used to accurately predict distant events.
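
A simplified reading of the temporal confidence measure is sketched below for a single-event antecedent (the paper's antecedents are episodes); the tolerance around the predefined gap is an illustrative assumption.

```python
def temporal_confidence(events, antecedent, consequent, gap, tol=2):
    """events: list of (timestamp, event_type), sorted by timestamp.
    Fraction of antecedent occurrences followed by the consequent at
    roughly `gap` time units (within +/- tol)."""
    a_times = [t for t, e in events if e == antecedent]
    c_times = [t for t, e in events if e == consequent]
    if not a_times:
        return 0.0
    hits = sum(any(abs((t + gap) - tc) <= tol for tc in c_times)
               for t in a_times)
    return hits / len(a_times)
```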

Book ChapterDOI
21 Oct 2014
TL;DR: This work proposes a new learning method that allows applying supervised algorithms to unsupervised problems, achieving much better annotation predictions; it shows good effectiveness in novel annotation prediction, outperforming state-of-the-art unsupervised methods.
Abstract: Computational analyses for biomedical knowledge discovery greatly benefit from the availability of descriptions of gene and protein functional features expressed through controlled terminologies and ontologies, i.e., of their controlled annotations. In recent years, several databases of such annotations have become available; yet, these annotations are incomplete, and only some of them represent highly reliable human-curated information. To predict and discover unknown or missing annotations, existing approaches use unsupervised learning algorithms. We propose a new learning method that allows applying supervised algorithms to unsupervised problems, achieving much better annotation predictions. This method, which we also extend from our preceding work with data weighting techniques, is based on the generation of artificial labeled training sets through random perturbations of the original data. We tested it on nine Gene Ontology annotation datasets; the obtained results demonstrate that our approach achieves good effectiveness in novel annotation prediction, outperforming state-of-the-art unsupervised methods.

Proceedings ArticleDOI
21 Oct 2014
TL;DR: A methodology is developed for applying latent trait analysis to all combinations of questions of a certain test, in order to define competencies by sets of existing questions that test congruent abilities.
Abstract: In preparation for large-scale surveys on computer science competencies, we are developing proper competency models and evaluation methodologies, aiming to define competencies by sets of existing questions that test congruent abilities. For this purpose, we have to look for sets of test questions that measure joint psychometric constructs (competencies) according to the responses of the test persons. We have developed a methodology for this goal by applying latent trait analysis to all combinations of questions of a certain test. After identifying suitable sets of questions, we test the fit of the mono-parametric Rasch model and evaluate the distribution of person parameters. As a test bed for first feasibility studies, we have utilized the large-scale Bebras Contest in Germany 2009. The results show that this methodology works and may one day result in a set of empirically founded competencies in the field of Computational Thinking.
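
For reference, the mono-parametric Rasch model mentioned above gives the probability that person p with ability \theta_p answers item i of difficulty b_i correctly as a logistic function of their difference:

```latex
P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{e^{\theta_p - b_i}}{1 + e^{\theta_p - b_i}}
```

Fitting this model to a candidate set of questions and checking its fit is what identifies those questions as measuring a joint construct.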

Proceedings ArticleDOI
21 Oct 2014
TL;DR: This work explores a variety of techniques with the aim of better classifying a large Twitter data set according to a user's goal, and proposes a methodology where an unsupervised technique is cascaded with a supervised one.
Abstract: The classification problem has gained a new importance dimension with the growing aggregated value given to social media such as Twitter. The huge number of small documents to be organized into subjects challenges the resources and techniques that have been used so far. Furthermore, today more than ever, personalization is the most important feature a system needs to exhibit. The goal of many online systems, available in many areas, is to address the needs or desires of each individual user. To achieve this goal, these systems need to be more flexible and faster in order to adapt to the user's needs. In this work, we explore a variety of techniques with the aim of better classifying a large Twitter data set according to a user's goal. We propose a methodology where we cascade an unsupervised technique followed by a supervised one. For the unsupervised technique we use standard clustering algorithms, and for the supervised technique we propose the use of a kNN algorithm and a Centroid Based Classifier to perform the experiments. The results are promising: we reduced the amount of work to be done by the specialists and, in addition, we were able to mimic the human assessment decisions with an F1-measure of 0.7907.
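
A plausible shape for this cascade is sketched below with scikit-learn: cluster a sample, have specialists label the clusters, then train a centroid-based classifier on the labeled sample to classify the rest. The workflow details and parameter values are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestCentroid

def cascade(sample_X, full_X, n_clusters=20):
    # unsupervised stage: group a manageable sample into clusters
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(sample_X)
    # specialists assign a subject label per cluster (stubbed here)
    cluster_labels = {c: f"topic_{c}" for c in range(n_clusters)}
    y_sample = [cluster_labels[c] for c in km.labels_]
    # supervised stage: centroid-based classifier over the labeled sample
    clf = NearestCentroid().fit(sample_X, y_sample)
    return clf.predict(full_X)
```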

Proceedings ArticleDOI
21 Oct 2014
TL;DR: The results indicated that the information flows are an important diagnostic tool for the evaluation of the organizational information management status and the basis for the proposition of other models such as knowledge management.
Abstract: This paper discusses the importance of information flows in the information management context and presents methodological aspects and a summary of information flow modeling results, part of the Knowledge and Information Management Model (KIMM), a proposal developed and applied in the regulatory agency for the land transport sector in Brazil. To achieve the proposed objectives, this paper presents a brief background of the KIMM project and formalizes the concepts and definitions used, such as information assets, information flows and the information life cycle. The concepts and definitions section also considers aspects such as the characteristics of good information aligned with the organizational characterization, and how these were applied in the information flow modeling methodology. The results indicated that information flows are an important diagnostic tool for evaluating the status of organizational information management, and a basis for proposing other models, such as knowledge management.

Proceedings ArticleDOI
01 Jan 2014
TL;DR: The author considers the impact and potential of a novel meme-based approach aiming to aid individuals throughout their academic and professional life, and as contributors to and beneficiaries of organizational performance, in light of the anticipated next generation of KM systems.
Abstract: Just like the personal computer revolution, it is possible that Knowledge Management (KM) will in the 21st century experience a decentralizing revolution that gives more power and autonomy to individuals and self-organized groups. Seven decades after Vannevar Bush's still unfulfilled vision of the Memex, Levy's scenario stresses the dire need to provide overdue support tools for knowledge workers in our Knowledge Societies, not at the expense of Organizational KM Systems, but rather as the means to foster a fruitful co-evolution. With a prototype system addressing these issues about to be converted into a viable Personal KM System (PKMS), the author follows up on recent publications and considers the impact and potential of a novel meme-based approach aiming to aid individuals throughout their academic and professional life, and as contributors to and beneficiaries of organizational performance, in light of the anticipated next generation of KM systems.

1 A NEXT GENERATION OF KM

The first generation of Knowledge Management (KM) has been described as the capturing, storing, and reusing of existing knowledge, including “systems of managing knowledge like company yellow pages, experts outlining processes they are involved in, creating learning communities where employees/customers share their knowledge, creating information systems for documenting and storing knowledge, and so on. These first-generation KM initiatives were about viewing knowledge as the foremost strategic asset, measuring it, capturing it, storing it, and protecting it. They were about treating knowledge as an asset, recognizing how it influences strategy, and wanting to make the most of it by managing it properly” (Pasher & Ronen 2011). The second KM generation needs to focus on creating new knowledge and innovation, a process which starts with the “reuse or new use of existing knowledge, adding an invention, and then creating a new product or service that exploits this invention.” This process requires creativity and the awareness that old knowledge becomes obsolete. For reaping the appropriate rewards, it is essential to systematically exploit the knowledge captured and created (Pasher & Ronen 2011). In reviewing a wider range of features for the ‘Next KM Generation’ (suggested by Sveiby, Wiig, Snowdon, McElroy, Ponzi, Miles, St Onge, Allee), the seven key themes identified confirm this need, but also strongly emphasize the personal and social nature of knowledge (Grant 2008).

1.1 The Personal KM Agenda

Despite these accentuations, and in contrast to Organizational KM (OKM), Personal KM (PKM) has historically been placed in a narrow individualistic confinement (Cheong & Tsui 2011b). In limiting its scope, PKM has been labelled as sophisticated career and life management with a core focus on personal enquiry (Pauleen & Gorman 2011), or as a means to improve some skills or capabilities of individuals (Davenport 2011), negating its importance relating to group member performance, business processes, or new technologies. This state surprises in light of prominent past suggestions to develop a ‘Memex’ for making one's “intellectual excursions more enjoyable” (Bush 1945), to offer assistance in allocating one's “attention efficiently among the [emerging] overabundance of information sources” (Simon 1971), or to provide adequate organizational leadership for building, connecting, and energizing dynamic knowledge-creating environments and their expansion (Nonaka 2000).

As a result, PKM remains “a real and pressing problem”, and Bush's dream of a ‘Memex’ has yet to be fully realized on a wide scale (Davies 2011). Appropriately, Wiig argues for shifting the focus of KM toward strengthening the ability of people, enabling them to act in the best interest of their enterprise and its desired strategies and performance (Wiig 2004). In this context, PKM more appropriately needs to be regarded as a bottom-up approach to KM (Pollard 2008), as opposed to the more traditional, top-down Organizational KM. As such, PKM also “goes beyond Personal Information Management” (PIM), which focuses predominantly on information processing without the emphasis on creating new knowledge (Cheong & Tsui 2011a). But, although there are many PKM tools available, “they are not integrated with each other”, and the “currently available PKM systems can provide only a partial support to knowledge workers” (Osis 2011). “While today we have many powerful applications for locating vast amounts of digital information, we lack effective tools for selecting, structuring, personalizing, and making sense of the digital resources available to us” (Kahle 2009).

Meanwhile, the organizational, commercial, social and legal innovations driven by technological progress and economic pressures continue to have profound impacts. Work has suffered from a process of fragmentation which will continue to accelerate. Its implications include one's slipping control over constant interruptions, the loss of time for real concentration, and less learning by observation and reflection (Gratton 2011). With specializations and domain-specific knowledge on the rise, people's identification has shifted from their company to their profession, and vertical hierarchies and traditional career ladders have been replaced by sideways career moves between companies and a horizontal labour market (Florida 2012). The ‘Future of Employment’ study estimates 47% of current US employment to be at risk due to recent technological breakthroughs able to turn previously non-routine tasks into well-defined problems susceptible to computerization (Frey & Osborne 2013). What is overdue, in the opinion of the author, is an innovative PKM technology to provide the means for life-long learning, resourcefulness, creative authorship and teamwork throughout an individual's academic and professional life and for his/her role as contributor to and beneficiary of organizational and societal performance. It needs to support the notion that the knowledge and skills of knowledge workers are portable and mobile, and with them their options on where, how, and for whom they will put their knowledge to work.

1.2 Intents and Barriers to Overcome

In Wiig's opinion (2011), individuals need to be highly knowledgeable not only to function competently as part of the workforce, but also in their daily lives and as public citizens: “In a society with broad personal competences, decision-making everywhere will maximize personal goals, provide effective public agencies and governance, make commerce and industry competitive, and ensure that personal and family decisions and actions will improve societal functions and Quality of Life.” But, “for better performance, people must [also] be provided with resources and opportunities to do their best. They need knowledge and understanding as well as motivation and supportive attitudes.” Such resources may well be autonomous PKM capacities, networked in continuous feedback loops, where individuals are able to determine how their expertise will be used or exchanged with people, communities, or organizations close to them. Levy (2011) expects such systems, nourished by the creative conversation of the many networked PKM devices, to assume an elementary role that enables the emergence of the distributed processes of collective intelligence, which in turn feed them. Accordingly, he sketches a scenario which stresses the vital role of future education “to encourage in students the sustainable growth of autonomous capacities in PKM” and which envisages KM to experience “a decentralizing revolution that gives more power and autonomy to individuals and self-organized groups” (Levy 2011). Although the author concurs that PKM Systems (PKMS) are destined to become potent drivers of human development (Schmitt 2014b), an enabling technological environment benefitting such a novel solution presently faces severe barriers wasting individuals' time and efforts (Schmitt 2014f). The remedies have been summed up into five provisions:

1. Digital personal and personalized knowledge is always in the possession and at the personal disposal of its owner or eligible co-worker, residing in personal hardware or personalized cloud-databases.

2.

Proceedings ArticleDOI
01 Jan 2014
TL;DR: Frail elderly people in Japanese daycare centers and rehabilitation facilities were given a 4-wheel, power-assisted bicycle, called a “Life Walker” (LW), to ride, and technology evaluations were carried out based on functionality, usability, and experience as perceived by the frail elderly riders.
Abstract: As societies age, it is anticipated that we will see a sudden increase in the number of frail elderly persons. New assistive-technology (AT) devices to facilitate the activities of daily life (ADL), especially walking, are essential for the healthy life of these people. However, frail elderly people suffer a variety of physical and mental weaknesses that tend to hinder their ability to make use of AT devices in the intended manner. Because of this, it is important that new AT devices undergo technology evaluation within the context in which they are to be used, but there is very little research in this area. In this study, frail elderly people in Japanese daycare centers and rehabilitation facilities were given a 4-wheel, power-assisted bicycle, called a “Life Walker” (LW), to ride, and technology evaluations were carried out based on functionality, usability, and experience as perceived by the frail elderly riders. The LW is considered best suited for those aged 75 and older assessed at levels 1 to 3 under Japan's long-term care insurance program, but the data for the 61 people at the rehabilitation facility who tried out the bicycle under the supervision of a resident physical therapist (PT) indicated considerable individual deviation in the continued use of the AT device. The LW is also meant to enable frail elderly users who have difficulty walking to go outside and enjoy themselves more. It was found, however, that this effect was achieved only when the physical therapist intervened, gave encouragement, adjusted the bicycle settings as needed for the user, and otherwise created new knowledge. It was also found that for this kind of knowledge creation to take place, the bicycle must be used in an appropriate setting, the user needs to have a proactive attitude, and organizational support to ensure that therapists are appropriately assigned is necessary.

Book ChapterDOI
21 Oct 2014
TL;DR: Theoretically and empirically, this chapter suggests two ontology evaluation metrics, temporal bias and category bias, as well as an evaluation approach, geared towards accounting for bias in data-driven ontology evaluation.
Abstract: This chapter explores the multiple dimensions of data-driven ontology evaluation. Theoretically and empirically, it suggests two ontology evaluation metrics, temporal bias and category bias, as well as an evaluation approach, geared towards accounting for bias in data-driven ontology evaluation. Ontologies are a very important technology in the semantic web. They are an approximate representation and formalization of a domain of discourse in a manner that is both machine and human interpretable. Ontology evaluation, therefore, concerns itself with measuring the degree to which the ontology approximates the domain. In data-driven ontology evaluation, the correctness of an ontology is measured against a corpus of documents about the domain. This domain knowledge is dynamic and evolves over several dimensions, such as the temporal and the categorical. Current research makes an assumption contrary to this notion and hence does not account for the existence of bias in ontology evaluation. This chapter addresses this gap through experimentation and statistical evaluation.

Proceedings ArticleDOI
21 Oct 2014
TL;DR: A new approach for automatically classifying web pages into pre-defined topic categories is presented, applying text summarization and sentiment analysis techniques to extract topic and sentiment indicators of web pages, and suggesting that incorporating the sentiment dimension can indeed bring much added value to web content classification.
Abstract: Automatic classification of web content has been studied extensively, using different learning methods and tools and investigating different datasets to serve different purposes. Most studies have made use of the content and structural features of web pages. However, previous experience has shown that certain groups of web pages, such as those that contain hatred and violence, are much harder to classify with good accuracy even when both content and structural features are taken into consideration. In this study we present a new approach for automatically classifying web pages into pre-defined topic categories. We apply text summarization and sentiment analysis techniques to extract topic and sentiment indicators of web pages. We then build classifiers based on combined topic and sentiment features. A large number of experiments was carried out. Our results suggest that incorporating the sentiment dimension can indeed bring much added value to web content classification. Classifiers based on topic similarity alone did not perform well, but when topic similarity and sentiment features are combined, the classification model performance is significantly improved for many web categories. Our study offers valuable insights and inputs for the development of web detection systems and Internet safety solutions.

Proceedings ArticleDOI
21 Oct 2014
TL;DR: In this article, the authors present a five-step redesigning approach which takes into account different factors to increase the use of corporate knowledge management systems, and they use as an example the knowledge sharing platform implemented for the employees of Societe du Canal de Provence (SCP).
Abstract: The work presented in this paper focuses on the improvement of corporate knowledge management systems. For the implementation of such systems, companies can deploy considerable means for small gains. Indeed, management services often observe very limited use compared to what they actually expect. We present a five-step redesign approach which takes into account different factors to increase the use of these systems. We use as an example the knowledge-sharing platform implemented for the employees of Societe du Canal de Provence (SCP). This system was put into production but only occasionally used. We describe the reasons for this limited use and we propose a design methodology adapted to the context. To promote effective use of the system, our approach was tested and evaluated with a panel of users working at SCP.

Proceedings ArticleDOI
21 Oct 2014
TL;DR: This paper presents a method that allows the extraction of a semantic, automatic description of content features such as genre; cinematic audiovisual documents are described based on documentation prepared in the pre-production phase of films, namely the synopsis.
Abstract: Audiovisual documents are among the fastest-proliferating resources. Faced with the huge quantities produced every day, the challenge arises of providing significant descriptions without missing the important content. The extraction of these descriptions requires an analysis of the audiovisual document's content. Automating the process of describing audiovisual documents is essential because of the richness and the diversity of the available analytical criteria. In this paper, we present a method that allows the automatic extraction of a semantic description of content features such as genre. We chose to describe cinematic audiovisual documents based on documentation prepared in the pre-production phase of films, namely the synopsis. The experimental results on IMDb (Internet Movie Database) and the Wikipedia encyclopedia indicate that our genre detection method performs better than the reference results on these corpora.

Proceedings ArticleDOI
21 Oct 2014
TL;DR: An algorithm is proposed that mines episode rules which are minimal and have a consequent temporally distant from the antecedent, together with a new confidence measure, the temporal confidence, which evaluates the confidence of a rule in relation to the predefined gap.
Abstract: This paper focuses on event prediction in an event sequence, where we aim at predicting distant events. We propose an algorithm that mines episode rules which are minimal and have a consequent temporally distant from the antecedent. As traditional algorithms are not able to directly mine rules with such characteristics, we propose an original way to mine them. Our algorithm, which has a complexity similar to that of state-of-the-art algorithms, determines the consequent of an episode rule at an early stage in the mining process, and applies a span constraint on the antecedent and a gap constraint between the antecedent and the consequent. A new confidence measure, the temporal confidence, is proposed, which evaluates the confidence of a rule in relation to the predefined gap. The algorithm is validated on an event sequence of social network messages. We show that minimal rules with a distant consequent are actually formed and that they can be used to accurately predict distant events.

Proceedings ArticleDOI
21 Oct 2014
TL;DR: This paper introduces a personalized image retrieval system based on user profile modeling depending on the user's context, adopting fuzzy logic-based user profile modeling for its flexibility in decision making, since user preferences are always imprecise.
Abstract: Given the continued growth in the number of documents available on the social Web, it becomes increasingly difficult for a user to find relevant resources satisfying his information need. Personalization seems an effective way to improve retrieval engine effectiveness. In this paper we introduce a personalized image retrieval system based on user profile modeling depending on the user's context. The context includes user comments, ratings, tags and preferences extracted from a social network. We adopt fuzzy logic-based user profile modeling due to its flexibility in decision making, since user preferences are always imprecise. The user specifies his initial need description by rating the concepts and contexts he is interested in. Concepts and contexts are weighted by the user with scores, and these scores are fed into our fuzzy model to predict the preference degree related to each concept in a given context. Relying on the score assigned to each concept and context, we deduce its importance and then apply the appropriate fuzzy rule. As for the experiments, user profile modeling with fuzzy logic shows more flexibility in the interpretation of the query.
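
To illustrate the fuzzy-rule step, here is a minimal sketch; the triangular membership functions, the single rule shown, and the min t-norm for AND are illustrative assumptions, not the paper's actual rule base.

```python
def triangular(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def preference_degree(concept_score, context_score):
    # rule: IF concept interest is high AND context relevance is high
    # THEN preference is high (min used as the AND t-norm)
    high_concept = triangular(concept_score, 0.5, 1.0, 1.5)
    high_context = triangular(context_score, 0.5, 1.0, 1.5)
    return min(high_concept, high_context)

# preference_degree(0.9, 0.8) -> 0.6 (both memberships, combined via min)
```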