
Showing papers by "Hiroyuki Shindo" published in 2020


Posted Content
TL;DR: New pretrained contextualized representations of words and entities based on the bidirectional transformer, and an entity-aware self-attention mechanism that considers the types of tokens (words or entities) when computing attention scores are proposed.
Abstract: Entity representations are useful in natural language tasks involving entities. In this paper, we propose new pretrained contextualized representations of words and entities based on the bidirectional transformer. The proposed model treats words and entities in a given text as independent tokens, and outputs contextualized representations of them. Our model is trained using a new pretraining task based on the masked language model of BERT. The task involves predicting randomly masked words and entities in a large entity-annotated corpus retrieved from Wikipedia. We also propose an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the transformer, and considers the types of tokens (words or entities) when computing attention scores. The proposed model achieves impressive empirical performance on a wide range of entity-related tasks. In particular, it obtains state-of-the-art results on five well-known datasets: Open Entity (entity typing), TACRED (relation classification), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), and SQuAD 1.1 (extractive question answering). Our source code and pretrained representations are available at https://github.com/studio-ousia/luke.
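The entity-aware self-attention described above can be pictured with a minimal single-head sketch in which the query projection depends on the (query type, key type) pair; the matrix names, shapes, and per-pair parameterization below are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn.functional as F

def entity_aware_attention(x, token_is_entity, W_q, W_k, W_v, d_head):
    """Single-head sketch of attention whose scores depend on token types.

    x:               (seq_len, d_model) hidden states of word and entity tokens
    token_is_entity: list of bools, True where the token is an entity
    W_q:             dict mapping (query_type, key_type) -> (d_model, d_head) matrix
    W_k, W_v:        shared key/value projections of shape (d_model, d_head)
    """
    keys, values = x @ W_k, x @ W_v
    types = ["entity" if is_ent else "word" for is_ent in token_is_entity]
    scores = torch.empty(len(types), len(types))
    for i, q_type in enumerate(types):
        for j, k_type in enumerate(types):
            q = x[i] @ W_q[(q_type, k_type)]      # query depends on both token types
            scores[i, j] = q @ keys[j] / d_head ** 0.5
    return F.softmax(scores, dim=-1) @ values

# Toy usage with random parameters: four word tokens followed by one entity token.
d_model = d_head = 8
W_q = {(q, k): torch.randn(d_model, d_head)
       for q in ("word", "entity") for k in ("word", "entity")}
W_k, W_v = torch.randn(d_model, d_head), torch.randn(d_model, d_head)
x = torch.randn(5, d_model)
out = entity_aware_attention(x, [False, False, False, False, True], W_q, W_k, W_v, d_head)
```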

288 citations


Proceedings ArticleDOI
02 Oct 2020
TL;DR: This article proposes new pretrained contextualized representations of words and entities based on the bidirectional transformer, which treats words and entities in a given text as independent tokens and outputs contextualized representations of them.
Abstract: Entity representations are useful in natural language tasks involving entities. In this paper, we propose new pretrained contextualized representations of words and entities based on the bidirectional transformer. The proposed model treats words and entities in a given text as independent tokens, and outputs contextualized representations of them. Our model is trained using a new pretraining task based on the masked language model of BERT. The task involves predicting randomly masked words and entities in a large entity-annotated corpus retrieved from Wikipedia. We also propose an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the transformer, and considers the types of tokens (words or entities) when computing attention scores. The proposed model achieves impressive empirical performance on a wide range of entity-related tasks. In particular, it obtains state-of-the-art results on five well-known datasets: Open Entity (entity typing), TACRED (relation classification), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), and SQuAD 1.1 (extractive question answering). Our source code and pretrained representations are available at https://github.com/studio-ousia/luke.
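A short usage sketch of the released pretrained representations, assuming the studio-ousia/luke-base checkpoint is loaded through the Hugging Face Transformers integration; the example text and entity spans are arbitrary.

```python
from transformers import LukeModel, LukeTokenizer

# Assumption: the checkpoint from the linked repository is available through
# the Hugging Face Transformers integration under this name.
tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-base")
model = LukeModel.from_pretrained("studio-ousia/luke-base")

text = "Beyoncé lives in Los Angeles."
entity_spans = [(0, 7), (17, 28)]  # character spans of "Beyoncé" and "Los Angeles"

inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
outputs = model(**inputs)

word_repr = outputs.last_hidden_state            # contextualized word representations
entity_repr = outputs.entity_last_hidden_state   # contextualized entity representations
```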

142 citations


Proceedings ArticleDOI
01 Oct 2020
TL;DR: Wikipedia2Vec, a Python-based open-source tool for learning the embeddings of words and entities from Wikipedia, is presented and achieves a state-of-the-art result on the KORE entity relatedness dataset, and competitive results on various standard benchmark datasets.
Abstract: The embeddings of entities in a large knowledge base (e.g., Wikipedia) are highly beneficial for solving various natural language tasks that involve real world knowledge. In this paper, we present Wikipedia2Vec, a Python-based open-source tool for learning the embeddings of words and entities from Wikipedia. The proposed tool enables users to learn the embeddings efficiently by issuing a single command with a Wikipedia dump file as an argument. We also introduce a web-based demonstration of our tool that allows users to visualize and explore the learned embeddings. In our experiments, our tool achieved a state-of-the-art result on the KORE entity relatedness dataset, and competitive results on various standard benchmark datasets. Furthermore, our tool has been used as a key component in various recent studies. We publicize the source code, demonstration, and the pretrained embeddings for 12 languages at https://wikipedia2vec.github.io/.
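A minimal usage sketch following the interfaces documented in the linked repository; the dump and model file names are placeholders.

```python
# Training (a single shell command, with a Wikipedia dump file as the argument):
#   wikipedia2vec train enwiki-latest-pages-articles.xml.bz2 enwiki_embeddings.bin

# Querying the learned embeddings from Python:
from wikipedia2vec import Wikipedia2Vec

model = Wikipedia2Vec.load("enwiki_embeddings.bin")        # placeholder file name
word_vec = model.get_word_vector("language")               # embedding of a word
entity_vec = model.get_entity_vector("Natural language processing")  # embedding of an entity
neighbors = model.most_similar(model.get_entity("Natural language processing"), 5)
```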

140 citations


Posted Content
TL;DR: A new length-controllable abstractive summarization model that incorporates a word-level extractive module in the encoder-decoder model instead of length embeddings to generate an informative and length-controlled summary.
Abstract: We propose a new length-controllable abstractive summarization model. Recent state-of-the-art abstractive summarization models based on encoder-decoder models generate only one summary per source text. However, controllable summarization, especially of the length, is an important aspect for practical applications. Previous studies on length-controllable abstractive summarization incorporate length embeddings in the decoder module for controlling the summary length. Although the length embeddings can control where to stop decoding, they do not decide which information should be included in the summary within the length constraint. Unlike the previous models, our length-controllable abstractive summarization model incorporates a word-level extractive module in the encoder-decoder model instead of length embeddings. Our model generates a summary in two steps. First, our word-level extractor extracts a sequence of important words (we call it the "prototype text") from the source text according to the word-level importance scores and the length constraint. Second, the prototype text is used as additional input to the encoder-decoder model, which generates a summary by jointly encoding and copying words from both the prototype text and source text. Since the prototype text is a guide to both the content and length of the summary, our model can generate an informative and length-controlled summary. Experiments with the CNN/Daily Mail dataset and the NEWSROOM dataset show that our model outperformed previous models in length-controlled settings.
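The first step, extracting the prototype text under a length constraint, can be sketched as follows; the word-level scoring model is abstracted away and the function is a simplified illustration, not the paper's extractor.

```python
def extract_prototype(source_tokens, importance_scores, length_budget):
    """Keep the highest-scoring words up to the length budget,
    then restore source order so the prototype text stays readable.

    source_tokens:     tokens of the source text
    importance_scores: one word-level importance score per token
                       (e.g. produced by a learned tagger; assumed given here)
    length_budget:     desired prototype length in tokens
    """
    ranked = sorted(range(len(source_tokens)),
                    key=lambda i: importance_scores[i], reverse=True)
    kept = sorted(ranked[:length_budget])          # original word order
    return [source_tokens[i] for i in kept]

source = "the quick brown fox jumps over the lazy dog near the river".split()
scores = [0.1, 0.7, 0.6, 0.9, 0.8, 0.2, 0.1, 0.3, 0.9, 0.2, 0.1, 0.5]
print(extract_prototype(source, scores, length_budget=5))
# -> ['quick', 'brown', 'fox', 'jumps', 'dog']
```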

21 citations


Proceedings ArticleDOI
20 Mar 2020
TL;DR: A deep-learning-based D2DB inspection that can distinguish a defect deformation from a normal deformation by learning the luminosity distribution in normal images is proposed, and it is shown that this inspection can detect unseen defects.
Abstract: In the drive toward sub-10-nm semiconductor devices, manufacturers have been developing advanced lithography technologies such as extreme ultraviolet lithography and multiple patterning. However, these technologies can cause unexpected defects, and a high-speed inspection is thus required to cover the entire surface of a wafer. A Die-to-Database (D2DB) inspection is commonly known as a high-speed inspection. The D2DB inspection compares an inspection image with a design layout, so it does not require a reference image for comparing with the inspection image, unlike a die-to-die inspection, thereby achieving a high-speed inspection. However, conventional D2DB inspections suffer from erroneous detection because the manufacturing processes deform the circuit pattern from the design layout, and such deformations will be detected as defects. To resolve this issue, we propose a deep-learning-based D2DB inspection that can distinguish a defect deformation from a normal deformation by learning the luminosity distribution in normal images. Our inspection detects outliers of the learned luminosity distribution as defects. Because our inspection requires only normal images, we can train the model without defect images, which are difficult to obtain with enough variety. In this way, our inspection can detect unseen defects. Through experiments, we show that our inspection can detect only the defect region on an inspection image.
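One simple way to realize "learn the luminosity distribution of normal images and detect outliers" is a convolutional autoencoder trained only on defect-free images, flagging pixels with large reconstruction error; the architecture and threshold below are illustrative, not the network used in the paper.

```python
import torch
import torch.nn as nn

class LuminosityAutoencoder(nn.Module):
    """Toy autoencoder trained on normal (defect-free) grayscale inspection images."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def defect_map(model, image, threshold=0.1):
    """Per-pixel anomaly map: large reconstruction error suggests a defect, while
    normal process deformations reconstruct well and stay below the threshold."""
    with torch.no_grad():
        reconstruction = model(image)
    return (image - reconstruction).abs() > threshold
```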

10 citations


Patent
05 Mar 2020
TL;DR: A pattern inspection system is proposed that inspects an image of an inspection target pattern of an electronic device using an identifier built by machine learning, based on the pattern image and the design data used to manufacture the pattern.
Abstract: A pattern inspection system inspects an image of an inspection target pattern of an electronic device using an identifier constituted by machine learning, based on the image of the inspection target pattern of the electronic device and data used to manufacture the inspection target pattern. The system includes a storage unit which stores a plurality of pattern images of the electronic device and pattern data used to manufacture a pattern of the electronic device, and an image selection unit which selects a learning pattern image used in the machine learning from the plurality of pattern images, based on the pattern data and the pattern image stored in the storage unit.
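The image selection step can be pictured with a deliberately simple sketch; grouping images by the design data they were manufactured from is a hypothetical selection criterion introduced here for illustration and is not stated in the patent abstract.

```python
def select_learning_images(pattern_images, pattern_data, per_design=1):
    """Hypothetical selection of training images: group stored pattern images by
    the design data they correspond to and keep a few images per design group,
    so the training set covers the stored design variety.

    pattern_images: dict mapping image_id -> pattern image
    pattern_data:   dict mapping image_id -> design-layout identifier
    """
    selected, counts = [], {}
    for image_id, design_id in pattern_data.items():
        if counts.get(design_id, 0) < per_design:
            selected.append(pattern_images[image_id])
            counts[design_id] = counts.get(design_id, 0) + 1
    return selected
```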

1 citation


01 Apr 2020
TL;DR: This paper proposes a length-controllable abstractive summarization model that incorporates a word-level extractive module in the encoder-decoder model instead of length embeddings.
Abstract: We propose a new length-controllable abstractive summarization model. Recent state-of-the-art abstractive summarization models based on encoder-decoder models generate only one summary per source text. However, controllable summarization, especially of the length, is an important aspect for practical applications. Previous studies on length-controllable abstractive summarization incorporate length embeddings in the decoder module for controlling the summary length. Although the length embeddings can control where to stop decoding, they do not decide which information should be included in the summary within the length constraint. Unlike the previous models, our length-controllable abstractive summarization model incorporates a word-level extractive module in the encoder-decoder model instead of length embeddings. Our model generates a summary in two steps. First, our word-level extractor extracts a sequence of important words (we call it the "prototype text") from the source text according to the word-level importance scores and the length constraint. Second, the prototype text is used as additional input to the encoder-decoder model, which generates a summary by jointly encoding and copying words from both the prototype text and source text. Since the prototype text is a guide to both the content and length of the summary, our model can generate an informative and length-controlled summary. Experiments with the CNN/Daily Mail dataset and the NEWSROOM dataset show that our model outperformed previous models in length-controlled settings.
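The second step, "jointly encoding and copying words from both the prototype text and source text", resembles a pointer-generator decoder; the mixing below is a standard copy-mechanism sketch offered as an assumption about how that copying could be realized, not the paper's exact formulation.

```python
import torch

def copy_distribution(p_gen, vocab_dist, attn_weights, input_ids, vocab_size):
    """Mix the vocabulary distribution with attention-based copying over the
    concatenated prototype-and-source input at one decoding step.

    p_gen:        (batch,) probability of generating from the fixed vocabulary
    vocab_dist:   (batch, vocab_size) softmax over the fixed vocabulary
    attn_weights: (batch, input_len) attention over the joint input
    input_ids:    (batch, input_len) vocabulary ids of the joint input
    """
    copy_dist = torch.zeros(vocab_dist.size(0), vocab_size)
    copy_dist.scatter_add_(1, input_ids, attn_weights)   # route attention mass to input words
    return p_gen.unsqueeze(1) * vocab_dist + (1 - p_gen).unsqueeze(1) * copy_dist
```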

1 citation


Proceedings ArticleDOI
01 Dec 2020
TL;DR: The method can identify coordination boundaries without training on labeled data, can be applied even if coordination structure annotations are not available, and is comparable to a recent supervised method when the coordinator conjoins simple noun phrases.
Abstract: We propose a simple method for nominal coordination boundary identification. As the main strength of our method, it can identify the coordination boundaries without training on labeled data, and can be applied even if coordination structure annotations are not available. Our system employs pre-trained word embeddings to measure the similarities of words and detects the span of coordination, assuming that conjuncts share syntactic and semantic similarities. We demonstrate that our method yields good results in identifying coordinated noun phrases in the GENIA corpus and is comparable to a recent supervised method for the case when the coordinator conjoins simple noun phrases.
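The core idea, that conjuncts around a coordinator resemble each other in embedding space, can be sketched as follows; the candidate enumeration and span averaging are a simplification of the described system, and the embedding lookup is assumed given.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def span_vector(tokens, embeddings):
    """Average pretrained word embeddings over a candidate conjunct span."""
    return np.mean([embeddings[t] for t in tokens], axis=0)

def find_conjunct_spans(tokens, coord_index, embeddings, max_len=4):
    """Choose the left/right spans around the coordinator (e.g. 'and') whose
    averaged embeddings are most similar, assuming conjuncts resemble each other."""
    best, best_score = None, -1.0
    for left_len in range(1, max_len + 1):
        for right_len in range(1, max_len + 1):
            left = tokens[max(0, coord_index - left_len):coord_index]
            right = tokens[coord_index + 1:coord_index + 1 + right_len]
            if not left or not right:
                continue
            score = cosine(span_vector(left, embeddings), span_vector(right, embeddings))
            if score > best_score:
                best, best_score = (left, right), score
    return best
```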