
Showing papers on "Domain knowledge published in 2017"


Journal ArticleDOI
TL;DR: In this article, an end-to-end reconstruction task was proposed to jointly optimize transmitter and receiver components in a single process, which can be extended to networks of multiple transmitters and receivers.
Abstract: We present and discuss several novel applications of deep learning for the physical layer. By interpreting a communications system as an autoencoder, we develop a fundamental new way to think about communications system design as an end-to-end reconstruction task that seeks to jointly optimize transmitter and receiver components in a single process. We show how this idea can be extended to networks of multiple transmitters and receivers and present the concept of radio transformer networks as a means to incorporate expert domain knowledge in the machine learning model. Lastly, we demonstrate the application of convolutional neural networks on raw IQ samples for modulation classification which achieves competitive accuracy with respect to traditional schemes relying on expert features. This paper is concluded with a discussion of open challenges and areas for future investigation.
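
The end-to-end idea above can be summarized in a few lines. The following is a minimal sketch, not the authors' implementation: it assumes one-hot messages, a dense encoder/decoder pair, a fixed-variance AWGN channel layer, and PyTorch as the framework; all dimensions and the SNR are illustrative.

# Minimal sketch (not the paper's code) of a communications autoencoder: an encoder
# maps one of M messages to n channel uses, an AWGN layer models the channel, and
# a decoder recovers the message. Dimensions, SNR and power constraint are assumed.
import torch
import torch.nn as nn

M, n = 16, 7                                    # message set size, channel uses
snr_db = 7.0
noise_std = 10 ** (-snr_db / 20)

class ChannelAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(M, 32), nn.ReLU(), nn.Linear(32, n))
        self.decoder = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, M))

    def forward(self, one_hot):
        x = self.encoder(one_hot)
        x = x / x.norm(dim=1, keepdim=True)       # simple power normalization
        y = x + noise_std * torch.randn_like(x)   # AWGN channel
        return self.decoder(y)                    # logits over the M messages

model = ChannelAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for step in range(2000):
    msgs = torch.randint(0, M, (256,))
    logits = model(torch.eye(M)[msgs])
    loss = loss_fn(logits, msgs)
    opt.zero_grad()
    loss.backward()
    opt.step()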

1,879 citations


Posted Content
TL;DR: A fundamental new way to think about communications system design as an end-to-end reconstruction task that seeks to jointly optimize transmitter and receiver components in a single process is developed.
Abstract: We present and discuss several novel applications of deep learning for the physical layer. By interpreting a communications system as an autoencoder, we develop a fundamental new way to think about communications system design as an end-to-end reconstruction task that seeks to jointly optimize transmitter and receiver components in a single process. We show how this idea can be extended to networks of multiple transmitters and receivers and present the concept of radio transformer networks as a means to incorporate expert domain knowledge in the machine learning model. Lastly, we demonstrate the application of convolutional neural networks on raw IQ samples for modulation classification which achieves competitive accuracy with respect to traditional schemes relying on expert features. The paper is concluded with a discussion of open challenges and areas for future investigation.

509 citations


Proceedings ArticleDOI
20 May 2017
TL;DR: A tracing network architecture that utilizes Word Embedding and Recurrent Neural Network models to generate trace links; it significantly outperformed state-of-the-art tracing methods, including the Vector Space Model and Latent Semantic Indexing.
Abstract: In most safety-critical domains the need for traceability is prescribed by certifying bodies. Trace links are generally created among requirements, design, source code, test cases and other artifacts; however, creating such links manually is time consuming and error prone. Automated solutions use information retrieval and machine learning techniques to generate trace links; however, current techniques fail to understand semantics of the software artifacts or to integrate domain knowledge into the tracing process and therefore tend to deliver imprecise and inaccurate results. In this paper, we present a solution that uses deep learning to incorporate requirements artifact semantics and domain knowledge into the tracing solution. We propose a tracing network architecture that utilizes Word Embedding and Recurrent Neural Network (RNN) models to generate trace links. Word embedding learns word vectors that represent knowledge of the domain corpus and RNN uses these word vectors to learn the sentence semantics of requirements artifacts. We trained 360 different configurations of the tracing network using existing trace links in the Positive Train Control domain and identified the Bidirectional Gated Recurrent Unit (BI-GRU) as the best model for the tracing task. BI-GRU significantly outperformed state-of-the-art tracing methods including the Vector Space Model and Latent Semantic Indexing.
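
As a rough illustration of the kind of architecture described (a sketch under assumptions, not the paper's released model), the snippet below encodes a pair of artifacts with a shared bidirectional GRU over word embeddings and classifies the pair as linked or not; the vocabulary size and hidden dimensions are invented.

# Illustrative trace-link scorer: a shared bidirectional GRU encodes a source and a
# target artifact from token ids, and a small classifier scores the pair.
import torch
import torch.nn as nn

class TraceLinkModel(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=100, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # would be initialized from domain word vectors
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(4 * hidden, 64), nn.ReLU(), nn.Linear(64, 2))

    def encode(self, token_ids):
        _, h = self.encoder(self.embed(token_ids))       # h: (2, batch, hidden)
        return torch.cat([h[0], h[1]], dim=1)            # concatenate both directions

    def forward(self, source_ids, target_ids):
        s, t = self.encode(source_ids), self.encode(target_ids)
        return self.classifier(torch.cat([s, t], dim=1)) # logits: link / no link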

224 citations


Proceedings Article
13 Feb 2017
TL;DR: In this paper, the authors introduce a new approach to supervising neural networks by specifying constraints that should hold over the output space, rather than direct examples of input-output pairs.
Abstract: In many machine learning applications, labeled data is scarce and obtaining more labels is expensive. We introduce a new approach to supervising neural networks by specifying constraints that should hold over the output space, rather than direct examples of input-output pairs. These constraints are derived from prior domain knowledge, e.g., from known laws of physics. We demonstrate the effectiveness of this approach on real world and simulated computer vision tasks. We are able to train a convolutional neural network to detect and track objects without any labeled examples. Our approach can significantly reduce the need for labeled training data, but introduces new challenges for encoding prior knowledge into appropriate loss functions.
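
A hedged sketch of what such constraint-based supervision can look like for a free-fall example: instead of labeled object heights, the loss only requires the predicted height trajectory of a clip to be consistent with a gravity-governed parabola. The frame interval, gravity constant, and least-squares fitting step are illustrative choices, not the paper's exact loss.

# Physics-constraint loss sketch: penalize deviation of predicted heights from the
# best-fitting free-fall parabola y(t) = y0 + v0*t - 0.5*g*t^2 (no labels needed).
import torch

def physics_constraint_loss(pred_heights, dt=0.1, g=9.8):
    """pred_heights: (batch, T) network outputs for T consecutive frames."""
    T = pred_heights.shape[1]
    t = torch.arange(T, dtype=pred_heights.dtype) * dt
    # Removing the known gravity term leaves a function that should be linear in t.
    target = pred_heights + 0.5 * g * t ** 2          # (batch, T)
    A = torch.stack([torch.ones_like(t), t], dim=1)   # (T, 2) design matrix
    sol = torch.linalg.solve(A.T @ A, A.T @ target.T) # (2, batch): y0, v0 per clip
    fitted = (A @ sol).T - 0.5 * g * t ** 2           # physically consistent trajectory
    return ((pred_heights - fitted) ** 2).mean()      # zero when predictions obey physics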

211 citations


Journal ArticleDOI
TL;DR: The proposed hybrid approach can alleviate both the cold-start and data-sparsity problems by making use of ontological domain knowledge and the learner's sequential access patterns, respectively, before the initial data to work on is available in the recommender system.

195 citations


Proceedings ArticleDOI
09 Sep 2017
TL;DR: A deep neural network is developed that learns heuristics over raw code, entirely without using code features, and it is shown that the neural nets can transfer learning from one optimization problem to another, improving the accuracy of new models, without the help of human experts.
Abstract: Accurate automatic optimization heuristics are necessary for dealing with the complexity and diversity of modern hardware and software. Machine learning is a proven technique for learning such heuristics, but its success is bound by the quality of the features used. These features must be hand crafted by developers through a combination of expert domain knowledge and trial and error. This makes the quality of the final model directly dependent on the skill and available time of the system architect. Our work introduces a better way for building heuristics. We develop a deep neural network that learns heuristics over raw code, entirely without using code features. The neural network simultaneously constructs appropriate representations of the code and learns how best to optimize, removing the need for manual feature creation. Further, we show that our neural nets can transfer learning from one optimization problem to another, improving the accuracy of new models, without the help of human experts. We compare the effectiveness of our automatically generated heuristics against ones with features hand-picked by experts. We examine two challenging tasks: predicting optimal mapping for heterogeneous parallelism and GPU thread coarsening factors. In 89% of the cases, the quality of our fully automatic heuristics matches or surpasses that of state-of-the-art predictive models using hand-crafted features, providing on average 14% and 12% more performance with no human effort expended on designing features.
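
A minimal sketch of this kind of model, with an assumed architecture rather than the authors' exact one: kernel source is tokenized into integer ids, embedded, summarized by an LSTM, and a small dense head predicts the heuristic decision such as a CPU-vs-GPU mapping.

# Heuristic-from-raw-code sketch: token ids -> embeddings -> LSTM -> decision logits,
# with no hand-crafted code features. Vocabulary size and layer sizes are assumptions.
import torch
import torch.nn as nn

class CodeHeuristicNet(nn.Module):
    def __init__(self, vocab_size=128, emb_dim=64, hidden=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 32), nn.ReLU(),
                                  nn.Linear(32, n_classes))

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        _, (h, _) = self.lstm(self.embed(token_ids))
        return self.head(h[-1])                    # logits over heuristic choices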

168 citations


Journal ArticleDOI
TL;DR: The paper suggests that a strategy for implementing successful organizational KM initiatives requires precise understanding and effective management of the core knowledge infrastructures and processes, and it removes the conceptual ambiguity resulting from the inconsistent use of different terms for the same knowledge process.
Abstract: Purpose This study aims to identify the main knowledge processes associated with organizational knowledge culture. A diverse range of knowledge processes have been referred to in the extant literature, but little agreement exists on which knowledge processes are critical and should be supported by organizational culture. Design/methodology/approach Using a systematic literature review methodology, this study examined the primary literature – peer-reviewed and scholarly articles published in the top seven knowledge management and intellectual capital (KM/IC)-related journals. Findings The core knowledge processes have been identified – knowledge sharing, knowledge creation and knowledge implementation. The paper suggests that a strategy for implementing successful organizational KM initiatives requires precise understanding and effective management of the core knowledge infrastructures and processes. Although technology infrastructure is an important aspect of any KM initiative, the integration of knowledge into management decisions and practices relies on the extent to which the organizational culture supports or hinders knowledge processes. Research limitations/implications The focus of the study was on the articles published in the top seven KM/IC journals; important contributions in relevant publications in other KM journals, conference papers, books and professional reports may have been excluded. Practical implications Practitioners will benefit from a better understanding of knowledge processes involved in KM initiatives and investments. From a managerial perspective, the study offers an overview of the state of organizational knowledge culture research and suggests that for KM initiatives to be successful, the organization requires an integrated culture that is concerned with knowledge processes as a set of inextricably inter-related processes. Originality/value For the first time, a comprehensive list of diverse terms used in describing knowledge processes has been identified. The findings remove the conceptual ambiguity resulting from the inconsistent use of different terms for the same knowledge process by identifying the three major and overarching knowledge processes. Moreover, this study points to the need to attend to the inextricably interrelated nature of these three knowledge processes. Finally, this is the first time that a study provides evidence showing that KM studies appear to be biased towards knowledge sharing.

131 citations


Proceedings ArticleDOI
Fei Liu1, Julien Perez1
01 Apr 2017
TL;DR: Gated End-to-End trainable Memory Networks (GMemN2N) are introduced to handle tasks that require more complex interactions between the memory and controller modules composing this family of models.
Abstract: Machine reading using differentiable reasoning models has recently shown remarkable progress. In this context, End-to-End trainable Memory Networks (MemN2N) have demonstrated promising performance on simple natural language based reasoning tasks such as factual reasoning and basic deduction. However, other tasks, namely multi-fact question-answering, positional reasoning or dialog related tasks, remain challenging particularly due to the necessity of more complex interactions between the memory and controller modules composing this family of models. In this paper, we introduce a novel end-to-end memory access regulation mechanism inspired by the current progress on the connection short-cutting principle in the field of computer vision. Concretely, we develop a Gated End-to-End trainable Memory Network architecture (GMemN2N). From the machine learning perspective, this new capability is learned in an end-to-end fashion without the use of any additional supervision signal which is, as far as our knowledge goes, the first of its kind. Our experiments show significant improvements on the most challenging tasks in the 20 bAbI dataset, without the use of any domain knowledge. Then, we show improvements on the Dialog bAbI tasks including the real human-bot conversation-based Dialog State Tracking Challenge (DSTC-2) dataset. On these two datasets, our model sets the new state of the art.
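
The gating idea can be illustrated with a single memory hop. The sketch below is one interpretation of the mechanism with assumed tensor shapes, not the authors' code: the controller state attends over memory, and a learned sigmoid gate decides how much of the retrieved summary replaces the current state.

# One gated memory hop: attention over memory slots, then a per-dimension gate
# mixes the retrieved summary o into the next controller state.
import torch
import torch.nn as nn

class GatedMemoryHop(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim, dim)

    def forward(self, u, memory_keys, memory_values):
        # u: (batch, dim); memory_keys / memory_values: (batch, n_slots, dim)
        attn = torch.softmax(torch.bmm(memory_keys, u.unsqueeze(2)).squeeze(2), dim=1)
        o = torch.bmm(attn.unsqueeze(1), memory_values).squeeze(1)  # memory summary
        t = torch.sigmoid(self.gate(u))                             # learned gate
        return o * t + u * (1 - t)                                  # gated controller update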

112 citations


Posted Content
TL;DR: This work presents a new framework to automate feature engineering, based on performance driven exploration of a transformation graph, which systematically and compactly enumerates the space of given options.
Abstract: Feature engineering is a crucial step in the process of predictive modeling. It involves the transformation of given feature space, typically using mathematical functions, with the objective of reducing the modeling error for a given target. However, there is no well-defined basis for performing effective feature engineering. It involves domain knowledge, intuition, and most of all, a lengthy process of trial and error. The human attention involved in overseeing this process significantly influences the cost of model generation. We present a new framework to automate feature engineering. It is based on performance driven exploration of a transformation graph, which systematically and compactly enumerates the space of given options. A highly efficient exploration strategy is derived through reinforcement learning on past examples.
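
To make the transformation-graph idea concrete, here is a deliberately simplified sketch that explores the graph greedily with cross-validated scores; the paper instead learns the exploration policy with reinforcement learning, and the transformation set, depth, and scoring model here are arbitrary choices for illustration.

# Greedy walk over a transformation graph: each node is a transformed copy of the
# dataset, each edge applies one transformation to all numeric columns.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

TRANSFORMS = {
    "log":    lambda s: np.log1p(s - s.min()),
    "square": lambda s: s ** 2,
    "zscore": lambda s: (s - s.mean()) / (s.std() + 1e-9),
}

def cv_score(X, y):
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()

def explore(X, y, depth=2):
    best_X, best = X, cv_score(X, y)
    frontier = [X]
    for _ in range(depth):
        children = []
        for node in frontier:
            for name, fn in TRANSFORMS.items():
                new_cols = node.select_dtypes("number").apply(fn).add_suffix("_" + name)
                new_cols = new_cols[[c for c in new_cols.columns if c not in node.columns]]
                child = pd.concat([node, new_cols], axis=1)
                children.append(child)
                s = cv_score(child, y)
                if s > best:                      # keep the best node seen so far
                    best_X, best = child, s
        frontier = children
    return best_X, best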

109 citations


Journal ArticleDOI
TL;DR: A large-scale knowledge graph is constructed, which integrates terms, documents, databases and other knowledge resources and can facilitate various knowledge services such as knowledge visualization, knowledge retrieval, and knowledge recommendation, and helps the sharing, interpretation, and utilization of TCM health care knowledge.

108 citations


Journal ArticleDOI
TL;DR: It is demonstrated how deep learning can outperform shallow networks in this example and one novelty is to demonstrate how a parsimonious deep representation can be constructed using domain knowledge.
Abstract: We consider the question of 30-minute prediction of blood glucose levels measured by continuous glucose monitoring devices, using clinical data. While most studies of this nature deal with one patient at a time, we take a certain percentage of patients in the data set as training data, and test on the remainder of the patients; i.e., the machine need not re-calibrate on the new patients in the data set. We demonstrate how deep learning can outperform shallow networks in this example. One novelty is to demonstrate how a parsimonious deep representation can be constructed using domain knowledge.
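
A hedged sketch of the evaluation protocol described above, not the authors' network: patients are split into disjoint training and test groups, each sample is a window of past CGM readings, and the target is the reading 30 minutes ahead. The 5-minute sampling interval, window length, and regressor are assumptions for illustration.

# Patient-level split for 30-minute-ahead glucose prediction from CGM windows.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.neural_network import MLPRegressor

def make_windows(series, patient_id, history=12, horizon=6):   # 5-minute samples assumed
    X, y, groups = [], [], []
    for i in range(len(series) - history - horizon):
        X.append(series[i:i + history])
        y.append(series[i + history + horizon - 1])             # value 30 minutes ahead
        groups.append(patient_id)
    return X, y, groups

def train_eval(cgm):
    """cgm: dict mapping patient id -> 1-D array of glucose readings."""
    X, y, g = [], [], []
    for pid, s in cgm.items():
        xs, ys, gs = make_windows(np.asarray(s, dtype=float), pid)
        X += xs; y += ys; g += gs
    X, y, g = np.array(X), np.array(y), np.array(g)
    train, test = next(GroupShuffleSplit(test_size=0.3, random_state=0).split(X, y, g))
    model = MLPRegressor(hidden_layer_sizes=(64, 64, 32), max_iter=500)
    model.fit(X[train], y[train])                               # test patients never seen in training
    return np.sqrt(np.mean((model.predict(X[test]) - y[test]) ** 2))  # RMSE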

Proceedings ArticleDOI
01 Jul 2017
TL;DR: In this paper, a deep convolutional neural network (CNN) architecture is proposed to localize semantic parts in 2D image and 3D space while inferring their visibility states, given a single RGB image.
Abstract: Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in 2D image and 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deeply supervising its hidden layers, in order to sequentially infer intermediate concepts associated with the final task. To acquire training data in desired quantities with ground truth 3D shape and relevant concepts, we render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. We train the network only on synthetic data and demonstrate state-of-the-art performances on real image benchmarks including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA for 2D and 3D keypoint localization and instance segmentation. The empirical results substantiate the utility of our deep supervision scheme by demonstrating effective transfer of knowledge from synthetic data to real images, resulting in less overfitting compared to standard end-to-end training.
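
The deep-supervision idea can be sketched as auxiliary heads attached to intermediate layers and trained on intermediate concepts (for example 2D keypoints and visibility) while the final head predicts the 3D output; the tiny network below is purely illustrative and not the architecture from the paper.

# Deep supervision sketch: intermediate heads receive their own losses so the network
# infers intermediate concepts on the way to the final 3D prediction.
import torch
import torch.nn as nn

class DeeplySupervisedNet(nn.Module):
    def __init__(self, n_kpts=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head_2d = nn.Linear(32, n_kpts * 2)    # intermediate concept: 2D keypoints
        self.head_vis = nn.Linear(64, n_kpts)       # intermediate concept: visibility
        self.head_3d = nn.Linear(64, n_kpts * 3)    # final task: 3D keypoints

    def forward(self, img):
        f1 = self.block1(img)
        f2 = self.block2(f1)
        p1, p2 = self.pool(f1).flatten(1), self.pool(f2).flatten(1)
        return self.head_2d(p1), self.head_vis(p2), self.head_3d(p2)

# Training would sum one loss per head, supervising each depth with its own concept.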

Proceedings ArticleDOI
01 Nov 2017
TL;DR: This work presents AutoLearn, a regression-based feature learning algorithm that requires no domain knowledge and is hence generic; it can improve the overall prediction accuracy by 13.28% compared to the original feature space and by 5.87% over other top-performing models.
Abstract: In recent years, the importance of feature engineering has been confirmed by the exceptional performance of deep learning techniques, that automate this task for some applications. For others, feature engineering requires substantial manual effort in designing and selecting features and is often tedious and non-scalable. We present AutoLearn, a regression-based feature learning algorithm. Being data-driven, it requires no domain knowledge and is hence generic. Such a representation is learnt by mining pairwise feature associations, identifying the linear or non-linear relationship between each pair, applying regression and selecting those relationships that are stable and improve the prediction performance. Our experimental evaluation on 18 UC Irvine and 7 Gene expression datasets, across different domains, provides evidence that the features learnt through our model can improve the overall prediction accuracy by 13.28%, compared to original feature space and 5.87% over other top performing models, across 8 different classifiers without using any domain knowledge.
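
A rough sketch of the core step, under assumptions rather than the released AutoLearn code: for each ordered feature pair, a regressor is fitted from one feature to the other, and the prediction plus its residual are kept as new features when the learned relationship is strong enough (a simple correlation filter stands in for the paper's stability and information-gain criteria).

# Pairwise regression features: learn feature_j from feature_i, keep prediction and residual.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def pairwise_regression_features(X, min_corr=0.3):
    n, d = X.shape
    new_feats = []
    for i in range(d):
        for j in range(d):
            if i == j:
                continue
            reg = KernelRidge(kernel="rbf").fit(X[:, [i]], X[:, j])
            pred = reg.predict(X[:, [i]])
            if abs(np.corrcoef(pred, X[:, j])[0, 1]) >= min_corr:   # crude relationship filter
                new_feats.append(pred)                # the learned relationship
                new_feats.append(X[:, j] - pred)      # what feature i cannot explain about j
    return np.column_stack([X] + new_feats) if new_feats else X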

Posted Content
TL;DR: A dynamic knowledge enquirer is designed that selects different answer entities at different positions in a single response according to the local context, enabling the model to deal with out-of-vocabulary entities.
Abstract: In knowledge grounded conversation, domain knowledge plays an important role in a special domain such as Music. The response of knowledge grounded conversation might contain multiple answer entities or no entity at all. Although existing generative question answering (QA) systems can be applied to knowledge grounded conversation, they either have at most one entity in a response or cannot deal with out-of-vocabulary entities. We propose a fully data-driven generative dialogue system GenDS that is capable of generating responses based on input message and related knowledge base (KB). To generate an arbitrary number of answer entities even when these entities never appear in the training set, we design a dynamic knowledge enquirer which selects different answer entities at different positions in a single response, according to different local context. It does not rely on the representations of entities, enabling our model to deal with out-of-vocabulary entities. We collect a human-human conversation dataset (ConversMusic) with knowledge annotations. The proposed method is evaluated on ConversMusic and a public question answering dataset. Our proposed GenDS system outperforms baseline methods significantly in terms of BLEU, entity accuracy, entity recall and human evaluation. Moreover, the experiments also demonstrate that GenDS works better even on small datasets.

Journal ArticleDOI
TL;DR: Three major concepts are discussed: lack of a clear, explanatory framework, problems of measurement, and a failure to link learning styles to achievement.


Journal ArticleDOI
TL;DR: Correlation Explanation is introduced, an alternative approach to topic modeling that does not assume an underlying generative model, and instead learns maximally informative topics through an information-theoretic framework that generalizes to hierarchical and semi-supervised extensions with no additional modeling assumptions.
Abstract: While generative models such as Latent Dirichlet Allocation (LDA) have proven fruitful in topic modeling, they often require detailed assumptions and careful specification of hyperparameters. Such model complexity issues only compound when trying to generalize generative models to incorporate human input. We introduce Correlation Explanation (CorEx), an alternative approach to topic modeling that does not assume an underlying generative model, and instead learns maximally informative topics through an information-theoretic framework. This framework naturally generalizes to hierarchical and semi-supervised extensions with no additional modeling assumptions. In particular, word-level domain knowledge can be flexibly incorporated within CorEx through anchor words, allowing topic separability and representation to be promoted with minimal human intervention. Across a variety of datasets, metrics, and experiments, we demonstrate that CorEx produces topics that are comparable in quality to those produced by unsupervised and semi-supervised variants of LDA.
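
A small usage sketch with the open-source CorEx topic implementation (github.com/gregversteeg/corex_topic); the exact function signatures are assumed from that package, and the documents and anchor words are invented, but it shows how word-level domain knowledge enters through anchors.

# Anchored CorEx topics from a binary document-word matrix; the corextopic API is assumed.
import scipy.sparse as ss
from sklearn.feature_extraction.text import CountVectorizer
from corextopic import corextopic as ct

docs = ["patient glucose insulin dose response",
        "malware binary header bytes detection",
        "requirements trace links design artifacts"]
vec = CountVectorizer(binary=True)
X = ss.csr_matrix(vec.fit_transform(docs))
vocab = list(vec.get_feature_names_out())

model = ct.Corex(n_hidden=3, seed=0)
model.fit(X, words=vocab,
          anchors=[["glucose", "insulin"], ["malware", "binary"]],  # domain knowledge as anchors
          anchor_strength=3)
for i, topic in enumerate(model.get_topics()):
    print(i, [w for w, *_ in topic])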

Proceedings ArticleDOI
19 Aug 2017
TL;DR: This work proposes a novel approach to producing justifications that is geared towards users without machine learning expertise, focusing on domain knowledge and on human reasoning, and utilizing natural language generation.
Abstract: Human decision makers in many domains can make use of predictions made by machine learning models in their decision making process, but the usability of these predictions is limited if the human is unable to justify his or her trust in the prediction. We propose a novel approach to producing justifications that is geared towards users without machine learning expertise, focusing on domain knowledge and on human reasoning, and utilizing natural language generation. Through a task-based experiment, we show that our approach significantly helps humans to correctly decide whether or not predictions are accurate, and significantly increases their satisfaction with the justification.

Proceedings ArticleDOI
07 Aug 2017
TL;DR: This work introduces a novel cross-network collaborative recommendation framework C3R, which utilizes both individual and group knowledge, while being trained on data from multiple social media sources, and suggests a new approach for automatic construction of inter-network relationship graph based on the data, which eliminates the necessity of having pre-defined domain knowledge.
Abstract: Venue category recommendation is an essential application for the tourism and advertisement industries, wherein it may suggest attractive localities within close proximity to users' current location. Considering that many adults use more than three social networks simultaneously, it is reasonable to leverage this rapidly growing multi-source social media data to boost venue recommendation performance. Another approach to achieve higher recommendation results is to utilize group knowledge, which is able to diversify recommendation output. Taking into account these two aspects, we introduce a novel cross-network collaborative recommendation framework C3R, which utilizes both individual and group knowledge, while being trained on data from multiple social media sources. Group knowledge is derived based on a new cross-source user community detection approach, which utilizes both inter-source relationship and the ability of sources to complement each other. To fully utilize multi-source multi-view data, we process user-generated content by employing state-of-the-art text, image, and location processing techniques. Our experimental results demonstrate the superiority of our multi-source framework over state-of-the-art baselines and different data source combinations. In addition, we suggest a new approach for automatic construction of inter-network relationship graph based on the data, which eliminates the necessity of having pre-defined domain knowledge.

Journal ArticleDOI
TL;DR: This paper discusses the need to advance the field through the development of data collection and interpretation tools that creatively engage knowledge users in the research process, and explores consensus-based approaches and networks as alternate sites of knowledge co-production.
Abstract: Background Integrated knowledge translation has risen in popularity as a solution to the underuse of research in policy and practice settings. It engages knowledge users—policymakers, practitioners, patients/consumers or their advocates, and members of the wider public—in mutually beneficial research that can involve the joint development of research questions, data collection, analysis and dissemination of findings. Knowledge that is co-produced has a better chance of being implemented. Discussion The purpose of this paper is to update developments in the field of integrated knowledge translation through a deeper analysis of the approach in practice-oriented and policy-oriented health research. We present collaborative models that fall outside the scope of integrated knowledge translation, but then explore consensus-based approaches and networks as alternate sites of knowledge co-production. We discuss the need to advance the field through the development, or use, of data collection and interpretation tools that creatively engage knowledge users in the research process. Most importantly, conceptually relevant outcomes need to be identified, including ones that focus on team transformation through the co-production of knowledge. Conclusions We explore some of these challenges and benefits in detail to help researchers understand what integrated knowledge translation means, and whether the approach's potential added value is worth the investment of time, energy and other resources.

Journal ArticleDOI
TL;DR: Novel representations of temporal data in electronic health records are explored based on symbolic sequence representations of time series data that better account for the temporality of clinical events, which is often key to prediction tasks in the biomedical domain.

Posted Content
TL;DR: This paper introduces a system called One Button Machine, or OneBM for short, which automates feature discovery in relational databases, which automatically performs a key activity of data scientists, namely, joining of database tables and applying advanced data transformations to extract useful features from data.
Abstract: Feature engineering is one of the most important and time consuming tasks in predictive analytics projects. It involves understanding domain knowledge and data exploration to discover relevant hand-crafted features from raw data. In this paper, we introduce a system called One Button Machine, or OneBM for short, which automates feature discovery in relational databases. OneBM automatically performs a key activity of data scientists, namely, joining of database tables and applying advanced data transformations to extract useful features from data. We validated OneBM in Kaggle competitions in which OneBM achieved performance as good as top 16% to 24% data scientists in three Kaggle competitions. More importantly, OneBM outperformed the state-of-the-art system in a Kaggle competition in terms of prediction accuracy and ranking on Kaggle leaderboard. The results show that OneBM can be useful for both data scientists and non-experts. It helps data scientists reduce data exploration time, allowing them to try and error many ideas in short time. On the other hand, it enables non-experts, who are not familiar with data science, to quickly extract value from their data with a little effort, time and cost.
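
The kind of transformation such a system automates can be illustrated with an ordinary pandas join-and-aggregate; the tables and column names below are invented for the example and this is not OneBM itself.

# Join a transactions table onto a main table and derive per-entity aggregate features.
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "age": [34, 51, 29]})
orders = pd.DataFrame({"customer_id": [1, 1, 2, 3, 3, 3],
                       "amount": [20.0, 35.5, 12.0, 7.5, 40.0, 18.25]})

aggregates = (orders.groupby("customer_id")["amount"]
              .agg(order_count="count", total_spent="sum",
                   avg_order="mean", max_order="max")
              .reset_index())

# Each row of the main table gains automatically generated aggregate features.
features = customers.merge(aggregates, on="customer_id", how="left").fillna(0)
print(features)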

Journal ArticleDOI
TL;DR: This paper introduces a technique to interpret a user's drawings with an interactive, nonlinear axis mapping approach called AxiSketcher, which enables users to impose their domain knowledge on a visualization by allowing interaction with data entries rather than with data attributes.
Abstract: Visual analytics techniques help users explore high-dimensional data. However, it is often challenging for users to express their domain knowledge in order to steer the underlying data model, especially when they have little attribute-level knowledge. Furthermore, users' complex, high-level domain knowledge, compared to low-level attributes, posits even greater challenges. To overcome these challenges, we introduce a technique to interpret a user's drawings with an interactive, nonlinear axis mapping approach called AxiSketcher. This technique enables users to impose their domain knowledge on a visualization by allowing interaction with data entries rather than with data attributes. The proposed interaction is performed through directly sketching lines over the visualization. Using this technique, users can draw lines over selected data points, and the system forms the axes that represent a nonlinear, weighted combination of multidimensional attributes. In this paper, we describe our techniques in three areas: 1) the design space of sketching methods for eliciting users' nonlinear domain knowledge; 2) the underlying model that translates users' input, extracts patterns behind the selected data points, and results in nonlinear axes reflecting users' complex intent; and 3) the interactive visualization for viewing, assessing, and reconstructing the newly formed, nonlinear axes.

Journal ArticleDOI
TL;DR: Investigation of how novices in the domain of psychology evaluate Internet sources as compared to domain experts revealed that domain expertise has an impact on individuals' evaluation behaviour during Web search, such that domain experts showed a more sophisticated use of evaluation criteria to judge the reliability of sources and information and selected more reliable information than domain novices.
Abstract: Nowadays, almost everyone uses the World Wide Web (WWW) to search for information of any kind. In education, students frequently use the WWW for selecting information to accomplish assignments such as writing an essay or preparing a presentation. The evaluation of sources and information is an important sub-skill in this process. But many students have not yet optimally developed this skill. On the basis of verbal reports, eye-tracking data and navigation logs, this study investigated how novices in the domain of psychology evaluate Internet sources as compared to domain experts. In addition, two different verbal reporting techniques, namely thinking aloud and cued retrospective reporting, were compared in order to examine students' evaluation behaviour. Results revealed that domain expertise has an impact on individuals' evaluation behaviour during Web search, such that domain experts showed a more sophisticated use of evaluation criteria to judge the reliability of sources and information and selected more reliable information than domain novices. Furthermore, the different verbal reporting techniques did not lead to different conclusions on criteria use in relation to domain expertise, although in general more utterances concerning evaluation of sources and information were expressed during cued retrospective reporting.
Lay Description:
What is already known about this topic: When searching the Web, students assess the reliability of information and sources in a spontaneous manner. Students tend to use superficial criteria when judging the trustworthiness of web-based information and sources. Little is known about the differences between domain experts and novices in terms of the ways they spontaneously evaluate web-based information and sources.
What this paper adds: Domain knowledge influences evaluation behaviour, with greater domain knowledge leading to more reliable selection of sources. A comparison of techniques for measuring evaluation behaviour when searching the Web for information reveals that different techniques, such as thinking aloud and cued retrospective reporting, have different pros and cons whose implications depend on the research questions being addressed.
Implications for practice and/or policy: In formal education, more attention should be paid to equipping students with the requisite skills in searching for and evaluating web-based information. Embedded process-oriented instructional designs are needed that foster the appropriate evaluation behaviours.

Journal ArticleDOI
TL;DR: A graph-based approach to knowledge reuse for supporting knowledge-driven decision-making in new product development and the feasibility and effectiveness of the proposed approach are demonstrated.
Abstract: Pre-existing knowledge buried in manufacturing enterprises can be reused to help decision-makers develop good judgements to make decisions about the problems in new product development, which in turn speeds up and improves the quality of product innovation. This paper presents a graph-based approach to knowledge reuse for supporting knowledge-driven decision-making in new product development. The paper first illustrates the iterative process of knowledge-driven decision-making in new product development. Then, a novel framework is proposed to facilitate this process, where knowledge maps and knowledge navigation are involved. Here, OWL ontologies are employed to construct knowledge maps, which appropriately capture and organise knowledge resources generated at various stages of product lifecycle; the Personalised PageRank algorithm is used to perform knowledge navigation, which finds the most relevant knowledge in knowledge maps for a given problem in new product development. Finally, the feasibility and effectiveness of the proposed approach are demonstrated.
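
The knowledge-navigation step can be sketched with networkx's personalized PageRank, where the problem description is mapped to seed nodes of the knowledge map and the restart distribution concentrates on them; the toy graph and node names below are invented for illustration.

# Personalized PageRank over a tiny knowledge map: restart mass on the seed nodes
# pulls relevance scores toward knowledge resources connected to the problem at hand.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("requirement:noise", "design:gearbox"),
    ("design:gearbox", "test:vibration"),
    ("design:gearbox", "doc:material-spec"),
    ("test:vibration", "doc:failure-report"),
])

seeds = {"requirement:noise": 1.0}                     # nodes matched to the problem description
relevance = nx.pagerank(G, alpha=0.85, personalization=seeds)
for node, score in sorted(relevance.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {node}")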

Proceedings ArticleDOI
03 Nov 2017
TL;DR: In this paper, the authors explore the feasibility of applying neural networks to malware detection and feature learning by restricting themselves to a minimal amount of domain knowledge in order to extract a portion of the Portable Executable (PE) header.
Abstract: Many efforts have been made to use various forms of domain knowledge in malware detection. Currently there exist two common approaches to malware detection without domain knowledge, namely byte n-grams and strings. In this work we explore the feasibility of applying neural networks to malware detection and feature learning. We do this by restricting ourselves to a minimal amount of domain knowledge in order to extract a portion of the Portable Executable (PE) header. By doing this we show that neural networks can learn from raw bytes without explicit feature construction, and perform even better than a domain knowledge approach that parses the PE header into explicit features.
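
A hedged sketch of the approach as described, with an assumed header length and architecture: the first bytes of a file are fed as scaled raw values to a small fully connected network that outputs a benign-vs-malware decision, with no explicit feature construction.

# Raw-byte malware classifier sketch: a fixed-length slice of the file header becomes
# the input vector of a small fully connected network.
import numpy as np
import torch
import torch.nn as nn

HEADER_BYTES = 1024   # assumed fixed-length slice covering the PE header region

def load_header(path):
    raw = np.fromfile(path, dtype=np.uint8, count=HEADER_BYTES)
    raw = np.pad(raw, (0, HEADER_BYTES - len(raw)))         # zero-pad short files
    return torch.tensor(raw, dtype=torch.float32) / 255.0   # scale bytes to [0, 1]

model = nn.Sequential(
    nn.Linear(HEADER_BYTES, 256), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 2),            # logits: benign vs. malware
)
# Training would pair batches of load_header(file) with labels under cross-entropy loss.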


Journal ArticleDOI
TL;DR: Prior domain knowledge improves older adults' query and navigation strategies and helps compensate for the age-related decline of cognitive flexibility, yet older adults were still outperformed by young adults in open-ended information problems.
Abstract: Prior domain knowledge improves older adults' query and navigation strategies and helps them cope with the age-related decline of cognitive flexibility. Unlike prior results, older adults were outperformed by young ones in open-ended information problems. In open-ended information problems, older adults did not benefit from their prior knowledge and produced semantically less relevant queries as compared to fact-finding problems. This study focuses on the impact of age, prior domain knowledge and cognitive abilities on performance, query production and navigation strategies during information searching. Twenty older adults and nineteen young adults had to answer 12 information search problems of varying nature within two knowledge domains: health and manga. In each domain, participants had to perform two simple fact-finding problems (keywords provided and answer directly accessible on the search engine results page), two difficult fact-finding problems (keywords had to be inferred) and two open-ended information search problems (multiple answers possible and navigation necessary). Results showed that prior domain knowledge helped older adults improve navigation (i.e. reduced the number of webpages visited and thus decreased the feeling of disorientation), query production and reformulation (i.e. they formulated semantically more specific queries, and they inferred a greater number of new keywords).

Journal ArticleDOI
TL;DR: The future evolution of the application of NLP technologies in RE can be viewed from four dimensions: discipline, dynamism, domain knowledge, and datasets.
Abstract: Natural language processing (NLP) and requirements engineering (RE) have had a long relationship, yet their combined use isn’t well established in industrial practice. This situation should soon change. The future evolution of the application of NLP technologies in RE can be viewed from four dimensions: discipline, dynamism, domain knowledge, and datasets.

Journal ArticleDOI
TL;DR: Evaluating this software-literature context method on real-world bug reports produces useful results that indicate this semi-automated method has the potential to substantially decrease the manual effort used in contextual bug deduplication while suffering only a minor loss in accuracy.
Abstract: Bug deduplication, i.e., recognizing bug reports that refer to the same problem, is a challenging task in the software-engineering life cycle. Researchers have proposed several methods primarily relying on information-retrieval techniques. Our work, motivated by the intuition that domain knowledge can provide the relevant context to enhance effectiveness, attempts to improve the use of information retrieval by augmenting it with software-engineering knowledge. In our previous work, we proposed the software-literature-context method for using software-engineering literature as a source of contextual information to detect duplicates. If bug reports relate to similar subjects, they have a better chance of being duplicates. Our method, being largely automated, has the potential to substantially decrease the level of manual effort involved in conventional techniques with a minor trade-off in accuracy. In this study, we extend our work by demonstrating that domain-specific features can be applied across projects better than the project-specific features demonstrated previously, while still maintaining performance. We also introduce a hierarchy-of-context to capture software-engineering knowledge in the realms of contextual space to produce performance gains. We also highlight the importance of domain-specific contextual features through cross-domain contexts: adding context improved accuracy; Kappa scores improved by at least 38% to 108% per project.
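
As a toy illustration of context-augmented duplicate detection (not the authors' pipeline), the sketch below appends terms from the best-matching software-engineering "context" document to each bug report and then scores report pairs with TF-IDF cosine similarity; the context documents and the scoring setup are invented for the example.

# Context-augmented duplicate scoring: enrich each report with its closest context,
# then compare the enriched reports with TF-IDF cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

context_docs = {                       # stand-ins for software-literature contexts
    "concurrency": "thread lock race deadlock synchronization",
    "memory": "leak allocation heap garbage collector out-of-memory",
}

def add_context(report):
    vec = TfidfVectorizer().fit(list(context_docs.values()) + [report])
    r = vec.transform([report])
    best = max(context_docs,
               key=lambda k: cosine_similarity(r, vec.transform([context_docs[k]]))[0, 0])
    return report + " " + context_docs[best]       # append best-matching context terms

def duplicate_score(report_a, report_b):
    a, b = add_context(report_a), add_context(report_b)
    vec = TfidfVectorizer().fit([a, b])
    return cosine_similarity(vec.transform([a]), vec.transform([b]))[0, 0]

print(duplicate_score("app freezes due to deadlock between UI thread and worker",
                      "UI hangs, two threads waiting on each other's locks"))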