Showing papers by "Hiroyuki Shindo" published in 2019


Posted Content
TL;DR: This work proposes a new global entity disambiguation (ED) model based on a bidirectional transformer encoder that produces contextualized embeddings for words and entities in the input text; the model achieves new state-of-the-art results on all but one of six standard ED datasets.
Abstract: We propose a new global entity disambiguation (ED) model based on contextualized embeddings of words and entities. Our model is based on a bidirectional transformer encoder (i.e., BERT) and produces contextualized embeddings for words and entities in the input text. The model is trained using a new masked entity prediction task that aims to train the model by predicting randomly masked entities in entity-annotated texts obtained from Wikipedia. We further extend the model by solving ED as a sequential decision task to capture global contextual information. We evaluate our model using six standard ED datasets and achieve new state-of-the-art results on all but one dataset.

31 citations
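
The masked entity prediction task described in the abstract above can be illustrated with a minimal sketch. It only shows how training pairs could be built by randomly masking entities in an annotated text; the `[MASK]` placeholder, the function name `mask_entities`, and the masking probability of 0.3 are assumptions made for illustration and are not taken from the paper.

```python
import random

MASK = "[MASK]"  # placeholder token name, assumed for this sketch


def mask_entities(entity_ids, mask_prob, rng):
    """Randomly hide entities and remember what was hidden.

    Returns (inputs, targets): `inputs` is the entity sequence with some
    entries replaced by MASK, and `targets` maps each masked position to
    the original entity the model would be trained to predict.
    """
    inputs, targets = [], {}
    for pos, ent in enumerate(entity_ids):
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets[pos] = ent
        else:
            inputs.append(ent)
    return inputs, targets


# Hypothetical entity annotations for one Wikipedia passage.
entities = ["Tokyo", "Japan", "Mount_Fuji"]
inputs, targets = mask_entities(entities, mask_prob=0.3, rng=random.Random(0))
```

A model trained on such pairs would receive `inputs` together with the surrounding words and be asked to recover each entity in `targets`.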


Proceedings ArticleDOI
03 Sep 2019
TL;DR: This study proposes the Neural Attentive Bag-of-Entities model, a neural network that performs text classification using entities in a knowledge base. It combines simple, high-recall, dictionary-based entity detection with a novel neural attention mechanism that lets the model focus on a small number of unambiguous and relevant entities.
Abstract: This study proposes a Neural Attentive Bag-of-Entities model, which is a neural network model that performs text classification using entities in a knowledge base. Entities provide unambiguous and relevant semantic signals that are beneficial for text classification. We combine simple high-recall entity detection based on a dictionary, to detect entities in a document, with a novel neural attention mechanism that enables the model to focus on a small number of unambiguous and relevant entities. We tested the effectiveness of our model using two standard text classification datasets (i.e., the 20 Newsgroups and R8 datasets) and a popular factoid question answering dataset based on a trivia quiz game. As a result, our model achieved state-of-the-art results on all datasets. The source code of the proposed model is available online at https://github.com/wikipedia2vec/wikipedia2vec.

27 citations
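
The idea of attending over detected entities, as described in the abstract above, can be sketched as a softmax-weighted sum of entity embeddings. This is a generic attention sketch, not the authors' implementation: the relevance scores are taken as given inputs here, whereas the abstract does not specify how the model computes them, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_bag_of_entities(embeddings, scores):
    """Combine entity embeddings into one document vector.

    `embeddings` is a list of equal-length vectors, one per detected
    entity; `scores` holds one (assumed, precomputed) relevance score per
    entity. Entities with higher scores contribute more to the result.
    """
    weights = softmax(scores)
    dim = len(embeddings[0])
    doc = [sum(w * emb[d] for w, emb in zip(weights, embeddings))
           for d in range(dim)]
    return doc, weights


# Two hypothetical candidate entities with 2-dimensional embeddings.
doc_vec, attn = attentive_bag_of_entities([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
```

With a high-recall dictionary detector, many candidates are noisy, so a weighting of this kind lets low-scoring candidates contribute little to the document vector.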


Proceedings ArticleDOI
01 Jun 2019
TL;DR: Text2Quest is an approach to action-graph extraction from materials science papers in which procedural text is interpreted as instructions for an interactive game, and a learning agent completes the game by executing the procedure correctly in a text-based simulated lab environment.
Abstract: Understanding procedural text requires tracking entities, actions and effects as the narrative unfolds. We focus on the challenging real-world problem of action-graph extraction from materials science papers, where language is highly specialized and data annotation is expensive and scarce. We propose a novel approach, Text2Quest, where procedural text is interpreted as instructions for an interactive game. A learning agent completes the game by executing the procedure correctly in a text-based simulated lab environment. The framework can complement existing approaches and enables richer forms of learning compared to static texts. We discuss potential limitations and advantages of the approach, and release a prototype proof-of-concept, hoping to encourage research in this direction.

20 citations


Proceedings ArticleDOI
01 Jul 2019
TL;DR: The proposed model incorporates a language model for unsupervised tokenization into a text classifier and trains both simultaneously, achieving better performance than previous methods on sentiment analysis.
Abstract: For unsegmented languages such as Japanese and Chinese, tokenization of a sentence has a significant impact on the performance of text classification. Sentences are usually segmented with words or subwords by a morphological analyzer or byte pair encoding and then encoded with word (or subword) representations for neural networks. However, segmentation is potentially ambiguous, and it is unclear whether the segmented tokens achieve the best performance for the target task. In this paper, we propose a method to simultaneously learn tokenization and text classification to address these problems. Our model incorporates a language model for unsupervised tokenization into a text classifier and then trains both models simultaneously. To make the model robust against infrequent tokens, we sampled segmentation for each sentence stochastically during training, which resulted in improved performance of text classification. We conducted experiments on sentiment analysis as a text classification task and show that our method achieves better performance than previous methods.

19 citations
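
Stochastic sampling of a segmentation, as mentioned in the abstract above, can be sketched with forward filtering and backward sampling. The sketch assumes a unigram language model over tokens, which the abstract does not state; the toy vocabulary, its log-probabilities, the unknown-token penalty, and all function names are hypothetical.

```python
import math
import random

# Toy unigram log-probabilities (hypothetical values for illustration).
VOCAB_LOGPROB = {"東京": -2.0, "京都": -2.5, "東京都": -3.5,
                 "東": -4.0, "京": -4.0, "都": -3.0}


def toy_logprob(token):
    """Log-probability of a token; unknown tokens get a finite penalty."""
    return VOCAB_LOGPROB.get(token, -10.0 * len(token))


def sample_segmentation(text, token_logprob, max_len, rng):
    """Sample one segmentation in proportion to its unigram probability."""
    n = len(text)
    # alpha[i]: log of the summed probability of all segmentations of text[:i]
    alpha = [0.0] + [float("-inf")] * n
    for i in range(1, n + 1):
        terms = [alpha[j] + token_logprob(text[j:i])
                 for j in range(max(0, i - max_len), i)]
        m = max(terms)
        alpha[i] = m + math.log(sum(math.exp(t - m) for t in terms))
    # Walk backwards, sampling where the last token of each prefix starts.
    tokens, i = [], n
    while i > 0:
        starts = list(range(max(0, i - max_len), i))
        weights = [math.exp(alpha[j] + token_logprob(text[j:i]) - alpha[i])
                   for j in starts]
        j = rng.choices(starts, weights=weights, k=1)[0]
        tokens.append(text[j:i])
        i = j
    tokens.reverse()
    return tokens


tokens = sample_segmentation("東京都", toy_logprob, max_len=3, rng=random.Random(0))
```

Calling the function repeatedly with different random states yields different segmentations of the same sentence, which is the property the abstract relies on to expose the classifier to infrequent tokens during training.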


Journal ArticleDOI
TL;DR: This paper proposes a novel neural relation extraction (RE) model that combines a bidirectional gated recurrent unit with a form of hierarchical attention better suited to RE, together with a contextual inference method that infers the most likely positive examples of an entity pair in bags with very limited contextual information.
Abstract: Distant supervision (DS) has become an efficient approach for relation extraction (RE) to alleviate the lack of labeled examples in supervised learning. In this paper, we propose a novel neural RE model that combines a bidirectional gated recurrent unit model with a form of hierarchical attention that is better suited to RE. We demonstrate that an additional attention mechanism called piecewise attention, which builds itself upon segment level representations, significantly enhances the performance of the distantly supervised relation extraction task. Our piecewise attention mechanism not only captures crucial segments in each sentence but also reflects the direction of relations between two entities. Furthermore, we propose a contextual inference method that can infer the most likely positive examples of an entity pair in bags with very limited contextual information. In addition, we provide an annotated dataset without false positive examples based on the Riedel testing dataset, and report on the actual performance of several RE models. The experimental results show that our proposed methods outperform the previous state-of-the-art baselines on both original and annotated datasets for the distantly supervised RE task.

13 citations


Posted Content
TL;DR: This work proposes a simple and powerful graph neural network for molecular property prediction that models a molecule as a directed complete graph in which each atom has a spatial position, and introduces a recursive neural network with a simple gating function.
Abstract: Molecular property prediction is a fundamental problem for computer-aided drug discovery and materials science. Quantum-chemical simulations such as density functional theory (DFT) have been widely used for calculating molecular properties; however, because of the heavy computational cost, it is difficult to search a huge number of potential chemical compounds. Machine learning methods for molecular modeling are attractive alternatives; however, the development of expressive, accurate, and scalable graph neural networks for learning molecular representations is still challenging. In this work, we propose a simple and powerful graph neural network for molecular property prediction. We model a molecule as a directed complete graph in which each atom has a spatial position, and introduce a recursive neural network with a simple gating function. We also feed input embeddings to every layer as skip connections to accelerate the training. Experimental results show that our model achieves state-of-the-art performance on the standard benchmark dataset for molecular property prediction.

12 citations
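
The graph construction named in the abstract above, a directed complete graph over atoms with spatial positions, can be sketched as follows. Only the graph is built; the recursive network and its gating function are not specified in the abstract and are left out. The function name, the edge representation as `(source, target, distance)` tuples, and the coordinates are assumptions for illustration.

```python
import math
from itertools import permutations


def directed_complete_graph(positions):
    """Build every ordered atom pair (i, j), i != j, with its distance.

    `positions` is a list of 3D coordinates, one per atom. The result has
    n * (n - 1) directed edges for n atoms.
    """
    return [(i, j, math.dist(positions[i], positions[j]))
            for i, j in permutations(range(len(positions)), 2)]


# Three atoms at hypothetical coordinates (arbitrary units).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
edges = directed_complete_graph(atoms)
```

Because the graph is complete, no bond information is needed to define the edges; interatomic distance is the only edge feature in this sketch.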


Proceedings ArticleDOI
26 Mar 2019
TL;DR: Contour extraction using deep learning offers high noise immunity and excellent pattern recognition ability, and performs well on low-SN SEM images and on images of multi-layer patterns.
Abstract: With the miniaturization of devices, hot spots caused by wafer topology are becoming a problem in addition to hot spots resulting from design, mask and wafer process, and hot spot evaluation of a wide area in a chip is becoming required. Although DBM (Design Based Metrology) is an effective method for evaluating systematic defects of EUV lithography and multi-patterning, it requires a long time to evaluate because it is necessary to acquire high-SN SEM images captured under low-speed SEM scanning conditions. We therefore developed a contour extraction method for DBM that can handle low-SN SEM images captured under high-speed SEM scanning conditions. Contour extraction using deep learning possesses high noise immunity and excellent pattern recognition ability, and demonstrates high performance in contour extraction from low-SN SEM images and from images of multi-layer patterns. The proposed method is composed of an annotation operation on SEM image samples, a training process using the annotation data and SEM image samples, and a contour extraction process using the trained outcome. In the evaluation experiment, we confirmed that satisfactory contours are extracted from low-SN SEM images and from images of multi-layer patterns.

11 citations


Proceedings ArticleDOI
01 Jun 2019
TL;DR: This work proposes a simple and accurate model for coordination boundary identification that makes use of probabilities of coordinators and conjuncts in the CKY parsing to find the optimal combination of coordinate structures.
Abstract: We propose a simple and accurate model for coordination boundary identification. Our model decomposes the task into three sub-tasks during training: finding a coordinator, identifying the inside boundaries of a pair of conjuncts, and selecting the outside boundaries of the pair. For inference, we make use of the probabilities of coordinators and conjuncts in CKY parsing to find the optimal combination of coordinate structures. Experimental results demonstrate that our model achieves state-of-the-art results, ensuring that the global structure of coordinations is consistent.

10 citations


Posted Content
TL;DR: This paper proposes FROBES, a new segment representation (SR) model that extends the IOBES model to improve BioNER performance; it outperforms other SR models for multi-word entities with length greater than two.
Abstract: Biomedical Named Entity Recognition (BioNER) is a crucial step for analyzing biomedical texts, which aims at extracting biomedical named entities from a given text. Different supervised machine learning algorithms have been applied for BioNER by various researchers. The main requirement of these approaches is an annotated dataset used for learning the parameters of machine learning algorithms. Segment Representation (SR) models comprise different tag sets used for representing the annotated data, such as IOB2, IOE2 and IOBES. In this paper, we propose an extension of the IOBES model to improve the performance of BioNER. The proposed SR model, FROBES, improves the representation of multi-word entities. We used a Bidirectional Long Short-Term Memory (BiLSTM) network, an instance of Recurrent Neural Networks (RNN), to design a baseline system for BioNER and evaluated the new SR model on two datasets: the i2b2/VA 2010 challenge dataset and the JNLPBA 2004 shared task dataset. The proposed SR model outperforms other models for multi-word entities with length greater than two. Further, the outputs of different SR models have been combined using a majority voting ensemble method, which outperforms the baseline models' performance.

9 citations
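
As background for the segment representation models named in the abstract above, the following sketch converts entity spans into IOBES tags, the scheme that FROBES extends. The FROBES tag set itself is not defined in the abstract, so it is not reproduced here. The function name, the span format `(start, end, label)` with an exclusive end, and the example labels are assumptions.

```python
def spans_to_iobes(n_tokens, spans):
    """Tag a sentence of `n_tokens` tokens with IOBES labels.

    Each span is (start, end, label) with `end` exclusive. Single-token
    entities get S-, multi-token entities get B- ... I- ... E-, and all
    remaining tokens get O.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for k in range(start + 1, end - 1):
                tags[k] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


# Six tokens: one single-token entity and one three-token entity.
tags = spans_to_iobes(6, [(0, 1, "problem"), (2, 5, "treatment")])
# tags == ["S-problem", "O", "B-treatment", "I-treatment", "E-treatment", "O"]
```

In IOBES, every interior token of a long entity receives the same I- tag regardless of its position, which is the part of the representation relevant to multi-word entities longer than two tokens.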


Proceedings ArticleDOI
01 Jun 2019
TL;DR: This paper proposes a new model combining Segment-level Attention-based Convolutional Neural Networks (SACNNs) and Dependency-based Recurrent Neural Networks (DepRNNs); the DepRNNs handle long-distance relations using the shortest dependency path between the relation entities.
Abstract: Recently, relation classification has gained much success by exploiting deep neural networks. In this paper, we propose a new model effectively combining Segment-level Attention-based Convolutional Neural Networks (SACNNs) and Dependency-based Recurrent Neural Networks (DepRNNs). While SACNNs allow the model to selectively focus on the important information segment from the raw sequence, DepRNNs help to handle the long-distance relations from the shortest dependency path of relation entities. Experiments on the SemEval-2010 Task 8 dataset show that our model is comparable to the state-of-the-art without using any external lexical features.

8 citations
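
The shortest dependency path between two entities, which the abstract above uses as input to the DepRNNs, can be found with a breadth-first search over the dependency tree treated as an undirected graph. This is a generic sketch, not the authors' code; the head-index encoding, the function name, and the hand-written parse are assumptions.

```python
from collections import deque


def shortest_dependency_path(heads, source, target):
    """Return token indices on the shortest path from source to target.

    `heads[i]` is the index of token i's syntactic head, or -1 for the
    root. Edges are followed in both directions.
    """
    adjacency = [[] for _ in heads]
    for child, head in enumerate(heads):
        if head >= 0:
            adjacency[child].append(head)
            adjacency[head].append(child)
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor in adjacency[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    if target not in previous:
        return []
    path, node = [], target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hand-written (hypothetical) parse of: burst was caused by pressure
tokens = ["burst", "was", "caused", "by", "pressure"]
heads = [2, 2, -1, 4, 2]
path = shortest_dependency_path(heads, source=0, target=4)
# path == [0, 2, 4], i.e. burst -> caused -> pressure
```

The path skips function words such as "was" and "by", which is why it is a compact input for modeling the relation between two distant entities.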


Proceedings ArticleDOI
26 Mar 2019
TL;DR: Experimental results showed that the proposed method could estimate the design layout from the low-SN SEM image and improve the pattern matching success rate, and it is expected that this method will be advantageous for evaluating mass systematic defects during the process development.
Abstract: With the miniaturization of devices, hot spot evaluation of a wide area of a wafer for small change points such as wafer topology is required. DBM (Design Based Metrology) is an effective method for evaluating systematic defects of multiple patterning and EUV lithography. However, it takes a long time to evaluate because it is necessary to acquire a high-SN SEM image captured by low-speed SEM scanning conditions. Therefore, we developed a new pattern matching method of DBM by utilizing deep learning technology. Our proposed method can handle low-SN SEM images captured under high-speed SEM scanning conditions. In the proposed method, we use deep learning to estimate design layout from SEM image, and then perform pattern matching between this estimated design layout and the true design layout. The proposed method is particularly effective for pattern matching of low-SN SEM images and circuit pattern distorted during manufacturing process. It is expected that this method will be advantageous for evaluating mass systematic defects during the process development. Experimental results showed that the proposed method could estimate the design layout from the low-SN SEM image and improve the pattern matching success rate.

Posted Content
TL;DR: The authors propose the Neural Attentive Bag-of-Entities model, which combines simple, high-recall, dictionary-based entity detection in a document with a novel neural attention mechanism that enables the model to focus on a small number of unambiguous and relevant entities.
Abstract: This study proposes a Neural Attentive Bag-of-Entities model, which is a neural network model that performs text classification using entities in a knowledge base. Entities provide unambiguous and relevant semantic signals that are beneficial for capturing semantics in texts. We combine simple high-recall entity detection based on a dictionary, to detect entities in a document, with a novel neural attention mechanism that enables the model to focus on a small number of unambiguous and relevant entities. We tested the effectiveness of our model using two standard text classification datasets (i.e., the 20 Newsgroups and R8 datasets) and a popular factoid question answering dataset based on a trivia quiz game. As a result, our model achieved state-of-the-art results on all datasets. The source code of the proposed model is available online.

Patent
Shinichi Shinoda, Masayoshi Ishikawa, Yasutaka Toyoda, Yuichi Abe, Hiroyuki Shindo
18 Jan 2019
TL;DR: A machine learning model is trained to generate a design data image from an inspection target image, using the corresponding design data image as the teacher; comparing the predicted design data image with the actual design data image makes it possible to detect systematic defects without using defect images.
Abstract: The image evaluation device includes a design data image generation unit that images design data; a machine learning unit that creates a model for generating a design data image from an inspection target image, using the design data image as a teacher and using the inspection target image corresponding to the design data image; a design data prediction image generation unit that predicts the design data image from the inspection target image, using the model created by the machine learning unit; a design data image generation unit that images the design data corresponding to the inspection target image; and a comparison unit that compares a design data prediction image generated by the design data prediction image generation unit and the design data image. As a result, it is possible to detect a systematic defect without using a defect image and without frequently generating misinformation.

Journal ArticleDOI
TL;DR: This paper presents Jastudy, a computer-assisted language learning (CALL) system designed specifically for Chinese-speaking learners studying Japanese functional expressions; when selecting example sentences, the system uses a ranking that gives easier sentences a higher rank.
Abstract: Because a large number of Chinese characters are commonly used in both Japanese and Chinese, Chinese-speaking learners of Japanese as a second language (JSL) find it more challenging to learn Japanese functional expressions than to learn other Japanese vocabulary. To address this challenge, we have developed Jastudy, a computer-assisted language learning (CALL) system designed specifically for Chinese-speaking learners studying Japanese functional expressions. Given a Japanese sentence as an input, the system automatically detects Japanese functional expressions using a character-based bidirectional long short-term memory with a conditional random field (BiLSTM-CRF) model. The sentence is then segmented and the parts of speech (POS) are tagged (word segmentation and POS tagging) by a Japanese morphological analyzer, MeCab ( http://taku910.github.io/mecab/ ), trained using a CRF model. In addition, the system provides JSL learners with appropriate example sentences that illustrate Japanese functional expressions. The system uses a ranking system, which gives easier sentences a higher rank, when selecting example sentences. A support vector machine for ranking (SVMRank) algorithm estimates the readability of example sentences, using Japanese-Chinese common words as an important feature. A k-means clustering algorithm is used to cluster example sentences that contain functional expressions with the same meanings, based on part-of-speech, conjugation form, and semantic attributes. Finally, to evaluate the usefulness of the system, we have conducted experiments and reported on a preliminary user study involving Chinese-speaking JSL learners.

Journal ArticleDOI
17 Jul 2019
TL;DR: A browser-based scientific article search system based on triples of distributed representations of articles, each triple representing a scientific discourse facet (Objective, Method, or Result) using both text and citation information is presented.
Abstract: We present a browser-based scientific article search system with graphical visualization. This system is based on triples of distributed representations of articles, each triple representing a scientific discourse facet (Objective, Method, or Result) using both text and citation information. Because each facet of an article is encoded as a separate vector, the similarity between articles can be measured by considering the articles not only in their entirety but also on a facet-by-facet basis. Our system provides three search options: a similarity ranking search, a citation graph with facet-labeled edges, and a scatter plot visualization with facets as the axes.
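
The facet-by-facet comparison described in the abstract above can be sketched with cosine similarity computed separately for each facet vector. The facet names come from the abstract; the use of cosine similarity, the averaging of facets into an overall score, the function names, and the example vectors are assumptions for illustration.

```python
import math

FACETS = ("objective", "method", "result")


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def facet_similarities(article_a, article_b):
    """Compare two articles facet by facet and overall.

    Each article is a dict mapping a facet name to its vector. The
    "overall" score is the mean of the three facet scores (an assumption).
    """
    sims = {f: cosine(article_a[f], article_b[f]) for f in FACETS}
    sims["overall"] = sum(sims[f] for f in FACETS) / len(FACETS)
    return sims


# Two hypothetical articles: same method, unrelated objective and result.
a = {"objective": [1.0, 0.0], "method": [0.0, 2.0], "result": [1.0, 0.0]}
b = {"objective": [0.0, 1.0], "method": [0.0, 1.0], "result": [0.0, 3.0]}
sims = facet_similarities(a, b)
```

In this example the two articles match only on the method facet, a distinction that a single whole-article vector could not express.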