Showing papers by "Mark Johnson" published in 2019


Proceedings ArticleDOI
01 Oct 2019
TL;DR: The nocaps benchmark is a large-scale benchmark for novel object captioning, consisting of 166,100 human-generated captions describing 15,100 images from the Open Images validation and test sets.
Abstract: Image captioning models have achieved impressive results on datasets containing limited visual concepts and large amounts of paired image-caption training data. However, if these models are to ever function in the wild, a much larger variety of visual concepts must be learned, ideally from less supervision. To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task. Dubbed ‘nocaps’, for novel object captioning at scale, our benchmark consists of 166,100 human-generated captions describing 15,100 images from the Open Images validation and test sets. The associated training data consists of COCO image-caption pairs, plus Open Images image-level labels and object bounding boxes. Since Open Images contains many more classes than COCO, nearly 400 object classes seen in test images have no or very few associated training captions (hence, nocaps). We extend existing novel object captioning models to establish strong baselines for this benchmark and provide analysis to guide future work.

105 citations


Journal ArticleDOI
TL;DR: A passive localization system in which multiple sniffers monitor the WiFi traffic and locate the standard WiFi transmitters based on the time-of-arrival measurements is presented, and it is shown that the positioning accuracy is significantly improved over existing systems.
Abstract: Ubiquitous wireless indoor localization can be achieved by leveraging the widespread deployment of WiFi systems. Most existing WiFi-based localization solutions are based on received signal strength (RSS) fingerprinting, which requires a database of the RSS values in the application environment to be built and maintained. The latest 802.11ac WiFi standard offers channels with wide bandwidths, which enables accurate timing-based positioning. This paper presents a passive localization system in which multiple sniffers monitor the WiFi traffic and locate the standard WiFi transmitters based on the time-of-arrival measurements. Multiple implementation issues are addressed, including sniffer clock synchronization and hardware delay calibration. The proposed system is evaluated experimentally using a prototype developed by us. It is shown that the positioning accuracy is significantly improved over existing systems.

31 citations
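
To make the time-of-arrival idea in the abstract above concrete, here is a minimal sketch, not the authors' algorithm. It assumes the sniffer positions are known, their clocks are already synchronized, hardware delays are calibrated out, and measurements are noise-free. The names `locate` and `simulate_arrivals`, the 10 m search area, and the grid search itself are illustrative assumptions. Because the transmit time is unknown to a passive sniffer, the sketch removes the common offset before scoring each candidate position.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def locate(sniffers, arrival_times, size=10.0, step=0.1):
    """Grid-search the 2-D point that best explains the arrival times.

    sniffers: list of (x, y) sniffer positions in metres (assumed known).
    arrival_times: one time of arrival per sniffer, in seconds, on a
    common clock (synchronization is assumed to be done already).
    """
    pseudo_ranges = [C * t for t in arrival_times]
    steps = int(round(size / step))
    best, best_cost = None, math.inf
    for i in range(steps + 1):
        for j in range(steps + 1):
            x, y = i * step, j * step
            dists = [math.hypot(x - sx, y - sy) for sx, sy in sniffers]
            # The transmit time is unknown, so every pseudo-range shares
            # one offset; subtracting the mean removes it.
            offsets = [r - d for r, d in zip(pseudo_ranges, dists)]
            mean = sum(offsets) / len(offsets)
            cost = sum((o - mean) ** 2 for o in offsets)
            if cost < best_cost:
                best, best_cost = (x, y), cost
    return best


def simulate_arrivals(sniffers, source, transmit_time):
    """Noise-free arrival times for a source at a known position."""
    return [transmit_time + math.hypot(source[0] - sx, source[1] - sy) / C
            for sx, sy in sniffers]
```

With four sniffers at the corners of the search area, calling `locate(sniffers, simulate_arrivals(sniffers, source, t0))` should recover the source to within one grid step. A real system would replace the grid search with a least-squares solver and model measurement noise.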


Proceedings ArticleDOI
02 Aug 2019
TL;DR: This paper proposes an entailment score that exploits the new facts discovered by a link prediction model, forms entailment graphs between relations, and then uses the learned entailments to improve the link prediction scores.
Abstract: Link prediction and entailment graph induction are often treated as different problems. In this paper, we show that these two problems are actually complementary. We train a link prediction model on a knowledge graph of assertions extracted from raw text. We propose an entailment score that exploits the new facts discovered by the link prediction model, and then form entailment graphs between relations. We further use the learned entailments to predict improved link prediction scores. Our results show that the two tasks can benefit from each other. The new entailment score outperforms prior state-of-the-art results on a standard entailment dataset and the new link prediction scores show improvements over the raw link prediction scores.

21 citations


Proceedings ArticleDOI
01 Jul 2019
TL;DR: This paper investigated how external syntactic information can be used most effectively in the Semantic Role Labeling (SRL) task and showed that using a constituency representation as input features improves performance the most, achieving a new state-of-the-art for non-ensemble SRL models.
Abstract: There are many different ways in which external information might be used in an NLP task. This paper investigates how external syntactic information can be used most effectively in the Semantic Role Labeling (SRL) task. We evaluate three different ways of encoding syntactic parses and three different ways of injecting them into a state-of-the-art neural ELMo-based SRL sequence labelling model. We show that using a constituency representation as input features improves performance the most, achieving a new state-of-the-art for non-ensemble SRL models on the in-domain CoNLL’05 and CoNLL’12 benchmarks.

21 citations


Proceedings ArticleDOI
01 Jun 2019
TL;DR: This article showed that neural parsers can find EDITED disfluency nodes with an accuracy surpassing that of specialized disfluency detection systems, thus making these specialized mechanisms unnecessary, and also investigated a modified loss function that puts more weight on EDITED nodes.
Abstract: This paper studies the performance of a neural self-attentive parser on transcribed speech. Speech presents parsing challenges that do not appear in written text, such as the lack of punctuation and the presence of speech disfluencies (including filled pauses, repetitions, corrections, etc.). Disfluencies are especially problematic for conventional syntactic parsers, which typically fail to find any EDITED disfluency nodes at all. This motivated the development of special disfluency detection systems, and special mechanisms added to parsers specifically to handle disfluencies. However, we show here that neural parsers can find EDITED disfluency nodes, and the best neural parsers find them with an accuracy surpassing that of specialized disfluency detection systems, thus making these specialized mechanisms unnecessary. This paper also investigates a modified loss function that puts more weight on EDITED nodes. It also describes tree-transformations that simplify the disfluency detection task by providing alternative encodings of disfluencies and syntactic information.

16 citations
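
The abstract above mentions a modified loss function that puts more weight on EDITED nodes but does not give its form. As a hedged illustration of the general idea only, the sketch below up-weights EDITED labels in a per-node negative log-likelihood. The function name `weighted_nll`, the `edited_weight` parameter, and the use of a per-node cross-entropy are assumptions made for illustration; the paper's parser may use a different (for example, tree-level) objective.

```python
import math


def weighted_nll(predicted, gold_labels, edited_weight=2.0):
    """Weighted mean negative log-likelihood over tree nodes.

    predicted: one dict per node mapping label -> predicted probability.
    gold_labels: the gold label of each node.
    Nodes whose gold label is "EDITED" count edited_weight times as much.
    """
    total, norm = 0.0, 0.0
    for dist, gold in zip(predicted, gold_labels):
        weight = edited_weight if gold == "EDITED" else 1.0
        total += -weight * math.log(dist[gold])
        norm += weight
    return total / norm
```

With `edited_weight=1.0` this reduces to the ordinary mean negative log-likelihood. Larger values make errors on EDITED nodes dominate the objective.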


Posted Content
TL;DR: It is shown here that neural parsers can find EDITED disfluency nodes, and the best neural parsers find them with an accuracy surpassing that of specialized disfluency detection systems, thus making these specialized mechanisms unnecessary.
Abstract: This paper studies the performance of a neural self-attentive parser on transcribed speech. Speech presents parsing challenges that do not appear in written text, such as the lack of punctuation and the presence of speech disfluencies (including filled pauses, repetitions, corrections, etc.). Disfluencies are especially problematic for conventional syntactic parsers, which typically fail to find any EDITED disfluency nodes at all. This motivated the development of special disfluency detection systems, and special mechanisms added to parsers specifically to handle disfluencies. However, we show here that neural parsers can find EDITED disfluency nodes, and the best neural parsers find them with an accuracy surpassing that of specialized disfluency detection systems, thus making these specialized mechanisms unnecessary. This paper also investigates a modified loss function that puts more weight on EDITED nodes. It also describes tree-transformations that simplify the disfluency detection task by providing alternative encodings of disfluencies and syntactic information.

6 citations


Posted Content
TL;DR: The authors investigated how external syntactic information can be used most effectively in the Semantic Role Labeling (SRL) task and showed that using a constituency representation as input features improves performance the most, achieving a new state-of-the-art for non-ensemble SRL models.
Abstract: There are many different ways in which external information might be used in an NLP task. This paper investigates how external syntactic information can be used most effectively in the Semantic Role Labeling (SRL) task. We evaluate three different ways of encoding syntactic parses and three different ways of injecting them into a state-of-the-art neural ELMo-based SRL sequence labelling model. We show that using a constituency representation as input features improves performance the most, achieving a new state-of-the-art for non-ensemble SRL models on the in-domain CoNLL'05 and CoNLL'12 benchmarks.

6 citations


Proceedings ArticleDOI
01 Jul 2019
TL;DR: All components, including speech recognition, natural language understanding, dialogue management, execution and text-to-speech, run locally on the embedded device, which simplifies deployment, minimizes server costs and, most importantly, eliminates user privacy risks.
Abstract: This paper describes a spoken-language end-to-end task-oriented dialogue system for small embedded devices such as home appliances. While the current system implements a smart alarm clock with advanced calendar scheduling functionality, the system is designed to make it easy to port to other application domains (e.g., the dialogue component factors out domain-specific execution from domain-general actions such as requesting and updating slot values). The system does not require internet connectivity because all components, including speech recognition, natural language understanding, dialogue management, execution and text-to-speech, run locally on the embedded device (our demo uses a Raspberry Pi). This simplifies deployment, minimizes server costs and, most importantly, eliminates user privacy risks. A demo video for the alarm domain is available at youtu.be/N3IBMGocvHU

5 citations


Proceedings ArticleDOI
01 Dec 2019
TL;DR: Stand-off detection of human presence and movement from IEEE 802.11ac compressed beamforming reports is cast as a multi-class image classification problem, and a discrete Fourier transform (DFT) based feature extraction technique is developed to generate input features that vary significantly across classes.
Abstract: This paper considers the problem of stand-off detection of human presence and movement in indoor environments. We develop a novel approach using IEEE 802.11ac compressed beamforming reports (CBRs). In the proposed system, a sniffer device collects CBRs communicated between devices inside an indoor environment by listening to the IEEE 802.11ac channels. We translate the problem into a multi-class image classification problem. We develop a discrete Fourier transform (DFT) based feature extraction technique to generate input features which vary significantly across different classes of the classification problem. The pattern of these interclass variations of extracted features remains consistent across different indoor environments. The proposed system was trained and tested using measurements from offices, meeting rooms and lecture theatres. It achieved an accuracy higher than 90% even for rooms that were not included as part of the training set.
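
The abstract above does not say how the DFT is applied to the compressed beamforming reports, so the following is only a generic sketch of DFT-based feature extraction. It assumes a real-valued time series (for example, one hypothetical per-report value tracked over time) and uses the magnitude spectrum as the feature vector. The function name `dft_magnitudes` and this choice of input are illustrative assumptions, not the paper's pipeline.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return |X[k]| for k = 0..N//2 of a real-valued sequence.

    Uses the direct O(N^2) definition of the DFT, which is adequate
    for short windows; an FFT would be used for longer ones.
    """
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        magnitudes.append(abs(acc))
    return magnitudes
```

Periodic variation in the input, such as might be caused by movement, would show up as energy concentrated in particular bins, whereas a static input concentrates its energy in bin 0.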

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This paper considers target localization from ambient radio frequency (RF) signals transmitted by the target, and presents an algorithm that jointly locates the target and self-locates the receivers, and exploits time-difference-of-arrival (TDoA) between multipath components at each receiver.
Abstract: This paper considers target localization from ambient radio frequency (RF) signals transmitted by the target. We consider a new practical scenario with an array of asynchronous receivers deployed at arbitrary locations, and present an algorithm that jointly locates the target and self-locates the receivers. The approach exploits time-difference-of-arrival (TDoA) between multipath components (multipath TDoA) at each receiver. We derive lower bounds for the localization errors of all the target and receiver locations. The performance is verified numerically and demonstrated experimentally with a hardware implementation that was tested in an anechoic chamber using passive IEEE 802.11ac receivers. We have shown that sub-meter level accuracy can be achieved using 6 static receivers.