Showing papers on "Character (mathematics)" published in 2017

Journal Article•DOI•

Enriching Word Vectors with Subword Information

Piotr Bojanowski¹, Edouard Grave¹, Armand Joulin¹, Tomas Mikolov¹•Institutions (1)

12 Jun 2017-Transactions of the Association for Computational Linguistics

TL;DR: This paper proposes an approach based on the skip-gram model in which each word is represented as a bag of character n-grams and a word's vector is the sum of its n-gram vectors, so that models train quickly on large corpora and can produce representations for words that did not appear in the training data.

Abstract: Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skipgram model, where each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly, and it allows us to compute word representations for words that did not appear in the training data. We evaluate our word representations on nine different languages, both on word similarity and analogy tasks. By comparing to recently proposed morphological word representations, we show that our vectors achieve state-of-the-art performance on these tasks.

7,537 citations
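
As an illustration of the subword scheme described in the abstract of "Enriching Word Vectors with Subword Information" above, the following minimal Python sketch builds a word vector as the sum of its character n-gram vectors. The `<` and `>` boundary markers, the 3-to-6 n-gram range, the function names and the toy vector table are assumptions made for this example; none of them is stated in the abstract.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in boundary markers.

    The markers and the default n-gram range are assumptions of this sketch.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def word_vector(word, ngram_vectors, dim):
    """Sum the vectors of every n-gram of `word` found in `ngram_vectors`."""
    vec = [0.0] * dim
    for gram in char_ngrams(word):
        if gram in ngram_vectors:
            for k, value in enumerate(ngram_vectors[gram]):
                vec[k] += value
    return vec


# Hypothetical 2-dimensional n-gram table, for illustration only.
toy_table = {"<wh": [1.0, 0.0], "re>": [0.5, 2.0]}
seen = word_vector("where", toy_table, dim=2)
unseen = word_vector("there", toy_table, dim=2)
```

Because "where" and "there" share the n-gram `re>`, a word missing from the training data still receives a non-zero vector in this toy setup, which is the property the abstract highlights.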

Journal Article•DOI•

Phase-functioned neural networks for character control

Daniel Holden¹, Taku Komura¹, Jun Saito•Institutions (1)

University of Edinburgh¹

20 Jul 2017

TL;DR: A real-time character control mechanism using a novel neural network architecture called a Phase-Functioned Neural Network that takes as input user controls, the previous state of the character, the geometry of the scene, and automatically produces high quality motions that achieve the desired user control.

Abstract: We present a real-time character control mechanism using a novel neural network architecture called a Phase-Functioned Neural Network. In this network structure, the weights are computed via a cyclic function which uses the phase as an input. Along with the phase, our system takes as input user controls, the previous state of the character, the geometry of the scene, and automatically produces high quality motions that achieve the desired user control. The entire network is trained in an end-to-end fashion on a large dataset composed of locomotion such as walking, running, jumping, and climbing movements fitted into virtual environments. Our system can therefore automatically produce motions where the character adapts to different geometric environments such as walking and running over rough terrain, climbing over large rocks, jumping over obstacles, and crouching under low ceilings. Our network architecture produces higher quality results than time-series autoregressive models such as LSTMs as it deals explicitly with the latent variable of motion relating to the phase. Once trained, our system is also extremely fast and compact, requiring only milliseconds of execution time and a few megabytes of memory, even when trained on gigabytes of motion data. Our work is most appropriate for controlling characters in interactive scenes such as computer games and virtual reality systems.

440 citations
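
The abstract of "Phase-functioned neural networks for character control" above says only that the network weights are computed by a cyclic function of the phase. One way to sketch that idea in Python is to blend four control weight vectors with a cyclic Catmull-Rom spline. The choice of spline, the number of control points and the name `phase_weights` are assumptions of this example, not details given in the abstract.

```python
import math


def phase_weights(phase, control):
    """Blend four control weight vectors as a cyclic function of the phase.

    `phase` is an angle in radians; `control` is a list of four equally long
    lists of floats. A Catmull-Rom spline is assumed for the interpolation.
    """
    scaled = 4.0 * (phase % (2.0 * math.pi)) / (2.0 * math.pi)
    k = int(scaled) % 4          # index of the control point just reached
    w = scaled - int(scaled)     # position between control points k and k+1
    a0, a1, a2, a3 = (control[(k + n - 1) % 4] for n in range(4))
    return [
        y1
        + w * (0.5 * y2 - 0.5 * y0)
        + w * w * (y0 - 2.5 * y1 + 2.0 * y2 - 0.5 * y3)
        + w ** 3 * (1.5 * y1 - 1.5 * y2 + 0.5 * y3 - 0.5 * y0)
        for y0, y1, y2, y3 in zip(a0, a1, a2, a3)
    ]
```

The spline passes through each control vector at phases 0, π/2, π and 3π/2 and wraps around at 2π, so the generated weights vary smoothly and periodically with the phase.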

Proceedings Article•DOI•

chrF++: words helping character n-grams.

Maja Popović

01 Sep 2017

TL;DR: Character n-gram F-score (CHRF) is shown to correlate very well with human relative rankings of different machine translation outputs, especially for morphologically rich target languages; however, its relation with direct human assessments is not yet clear.

Abstract: Character n-gram F-score (CHRF) is shown to correlate very well with human relative rankings of different machine translation outputs, especially for morphologically rich target languages. However, its relation with direct human assessments is not yet clear. In this work, Pearson’s correlation coefficients for direct assessments are investigated for two currently available target languages, English and Russian. First, different β parameters (in the range from 1 to 3) are re-investigated with direct assessment, and it is confirmed that β = 2 is the optimal option. Then separate character and word n-grams are investigated, and the main finding is that, apart from character n-grams, word 1-grams and 2-grams also correlate rather well with direct assessments. Further experiments show that adding word unigrams and bigrams to the standard CHRF score improves the correlations with direct assessments, though it is still not clear which option is better: unigrams only (CHRF+) or unigrams and bigrams (CHRF++). This should be investigated in future work on more target languages.

203 citations
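
To make the role of β in the chrF++ abstract above concrete, here is a simplified Python sketch of a character n-gram F-score. It averages n-gram precision and recall over the orders 1 to `max_n` and combines them as F_β = (1 + β²)·P·R / (β²·P + R). The maximum order of 6, the whitespace handling (none) and the function names are assumptions of this sketch. It covers only the character part and is not the reference implementation of the metric.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count the character n-grams of `text` (empty if the text is too short)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified character n-gram F-score; beta > 1 weights recall more."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

With β = 2, as the abstract confirms to be optimal, recall counts four times as much as precision in the denominator. Adding word unigrams and bigrams, as in CHRF+ and CHRF++, would mean computing the same statistics over word n-grams as well.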

Proceedings Article•DOI•

Detecting Hate Speech in Social Media

Shervin Malmasi¹, Marcos Zampieri²•Institutions (2)

Harvard University¹, University of Wolverhampton²

10 Nov 2017

TL;DR: The authors use character n-grams, word n-grams and word skip-grams as features to detect hate speech in social media while distinguishing it from general profanity; the main challenge lies in discriminating profanity and hate speech from each other.

Abstract: In this paper we examine methods to detect hate speech in social media, while distinguishing this from general profanity. We aim to establish lexical baselines for this task by applying supervised classification methods using a recently released dataset annotated for this purpose. As features, our system uses character n-grams, word n-grams and word skip-grams. We obtain results of 78% accuracy in identifying posts across three classes. Results demonstrate that the main challenge lies in discriminating profanity and hate speech from each other. A number of directions for future work are discussed.

189 citations
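
The three feature types named in the abstract of "Detecting Hate Speech in Social Media" above (character n-grams, word n-grams and word skip-grams) can be sketched in Python as follows. The function names, the whitespace tokenization and the restriction to skip-bigrams are assumptions of this example; the abstract does not give the n-gram orders or the skip distances the authors used.

```python
def character_ngrams(text, n):
    """All contiguous character sequences of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens, n):
    """All contiguous token sequences of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def word_skip_bigrams(tokens, max_skip):
    """Ordered token pairs separated by at most `max_skip` other tokens."""
    pairs = []
    for i in range(len(tokens)):
        last = min(i + max_skip + 2, len(tokens))
        for j in range(i + 1, last):
            pairs.append((tokens[i], tokens[j]))
    return pairs


# Hypothetical post, tokenized on whitespace for illustration only.
tokens = "you are so kind".split()
features = (
    character_ngrams("you are so kind", 4)
    + word_ngrams(tokens, 2)
    + word_skip_bigrams(tokens, 1)
)
```

In a supervised setup such as the one the abstract describes, feature lists like these would be counted per post and passed to a classifier.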

Proceedings Article•DOI•

WordSup: Exploiting Word Annotations for Character Based Text Detection

Han Hu¹, Chengquan Zhang², Yuxuan Luo³, Yuzhuo Wang, Junyu Han³, Errui Ding³•Institutions (3)

Microsoft¹, Huazhong University of Science and Technology², Baidu³

22 Aug 2017

TL;DR: A weakly supervised framework is proposed that can utilize word annotations, either in tight quadrangles or in looser bounding boxes, for character detector training; it trains a robust character detector by exploiting word annotations in the rich large-scale real scene text datasets, e.g. ICDAR15 and COCO-text.

Abstract: Imagery texts are usually organized as a hierarchy of several visual elements, i.e. characters, words, text lines and text blocks. Among these elements, the character is the most basic one for various languages and notations, such as Western, Chinese, Japanese and mathematical expressions. It is natural and convenient to construct a common text detection engine based on character detectors. However, training character detectors requires a vast number of location-annotated characters, which are expensive to obtain. In fact, the existing real text datasets are mostly annotated at the word or line level. To remedy this dilemma, we propose a weakly supervised framework that can utilize word annotations, either in tight quadrangles or in looser bounding boxes, for character detector training. When applied to scene text detection, we are thus able to train a robust character detector by exploiting word annotations in the rich large-scale real scene text datasets, e.g. ICDAR15 [19] and COCO-text [39]. The character detector plays a key role in the pipeline of our text detection engine. It achieves state-of-the-art performance on several challenging scene text detection benchmarks. We also demonstrate the flexibility of our pipeline in various scenarios, including deformed text detection and math expression recognition.

164 citations

Book•

Character Strengths Interventions: A Field Guide for Practitioners

Ryan M. Niemiec

01 Jun 2017

TL;DR: This field guide covers seven core concepts of the science of character, signature strengths, behavioral traps and misconceptions, and strategies for applying character strengths in practice.

Abstract (table of contents):
Dedication
Foreword
Preface
Acknowledgements
Chapter 1. Foundations of Strengths-Based Practice: Seven Core Concepts of the Science of Character
Chapter 2. Signature Strengths: Research and Practice
Chapter 3. Practice Essentials: Six Integration Strategies for a Strengths-Based Practice
Chapter 4. Behavioral Traps, Misconceptions, and Strategies
Chapter 5. Advanced Issues in Applying Character Strengths
Chapter 6. Character Strength Spotlights: 24 Practitioner-Friendly Handouts
Chapter 7. How to Apply Character Strengths Interventions
Chapter 8. Research-Based Interventions for Character Strengths
Chapter 9. Afterword
References
Appendix A. Background on the VIA Classification of Character Strengths and the VIA Survey
Appendix B. Checklist for Strengths-Based Practitioners
Appendix C. A Sampling of Strengths-Based Models
Appendix D. Frequently Asked Questions About Character Strengths
Appendix E. Comparison of VIA Survey with StrengthsFinder (Gallup) and Myers-Briggs Type Indicator (MBTI)
Appendix F. Flagship Papers on Character Strengths
Appendix G. 10 Character Strengths Concepts and Applications in Specific Movies
Appendix H. About the VIA Institute on Character
Index

142 citations

Posted Content•

WordSup: Exploiting Word Annotations for Character based Text Detection

Han Hu¹, Chengquan Zhang², Yuxuan Luo³, Yuzhuo Wang, Junyu Han³, Errui Ding³•Institutions (3)

Microsoft¹, Huazhong University of Science and Technology², Baidu³

22 Aug 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors propose a weakly supervised framework that can utilize word annotations, either in tight quadrangles or in looser bounding boxes, for character detector training.

Abstract: Imagery texts are usually organized as a hierarchy of several visual elements, i.e. characters, words, text lines and text blocks. Among these elements, the character is the most basic one for various languages and notations, such as Western, Chinese, Japanese and mathematical expressions. It is natural and convenient to construct a common text detection engine based on character detectors. However, training character detectors requires a vast number of location-annotated characters, which are expensive to obtain. In fact, the existing real text datasets are mostly annotated at the word or line level. To remedy this dilemma, we propose a weakly supervised framework that can utilize word annotations, either in tight quadrangles or in looser bounding boxes, for character detector training. When applied to scene text detection, we are thus able to train a robust character detector by exploiting word annotations in the rich large-scale real scene text datasets, e.g. ICDAR15 and COCO-text. The character detector plays a key role in the pipeline of our text detection engine. It achieves state-of-the-art performance on several challenging scene text detection benchmarks. We also demonstrate the flexibility of our pipeline in various scenarios, including deformed text detection and math expression recognition.

129 citations

Posted Content•

Detecting Hate Speech in Social Media

Shervin Malmasi¹, Marcos Zampieri²•Institutions (2)

Harvard University¹, University of Wolverhampton²

18 Dec 2017-arXiv: Computation and Language

TL;DR: This paper aims to establish lexical baselines for this task by applying supervised classification methods using a recently released dataset annotated for this purpose, and obtains results of 78% accuracy in identifying posts across three classes.

94 citations

Journal Article•DOI•

On the complete perturbative solution of one-matrix models

A. D. Mironov¹, A. Morozov¹•Institutions (1)

Institute for Theoretical and Experimental Physics¹

10 Aug 2017-Physics Letters B

TL;DR: In this article, the authors summarize recent results on the complete solvability of Hermitian and rectangular complex matrix models, and show that the integrability and Virasoro constraints are simple corollaries, but not vice versa.

94 citations

Patent•

Text named entity recognition method based on Bi-LSTM, CNN and CRF

Tang Siliang, Wu Fei, Ning Zhang, Hongliang Dai, Zhuang Yueting, Yin Zhang

19 Apr 2017

TL;DR: In this paper, a text named entity recognition method based on Bi-LSTM, CNN and CRF was proposed, which is an end-to-end model without the need of data preprocessing in the unmarked corpus with the exception of the pre-trained word vector.

Abstract: The invention discloses a text named entity recognition method based on Bi-LSTM, CNN and CRF. The method includes the following steps: (1) using a convolutional neural network to encode the character-level information of each word in the text into a character vector; (2) combining the character vector and the word vector and passing the combination as input to a bidirectional LSTM neural network, which models the contextual information of every word; and (3) at the output end of the LSTM neural network, using continuous conditional random fields to decode the labels of the whole sentence and mark the entities in it. The invention is an end-to-end model that needs no data pre-processing of the unmarked corpus apart from the pre-trained word vectors, so it can be widely applied to sentence labeling in different languages and fields.

93 citations
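
Step (3) of the patent abstract above decodes the labels of a whole sentence with a conditional random field. A common way to do this, assumed here rather than taken from the patent, is Viterbi decoding over a linear-chain model. In the Python sketch below, `emissions` stands in for the per-word label scores that the Bi-LSTM would produce and `transitions` for the learned label-to-label scores; both names and the toy numbers are hypothetical.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence.

    emissions[t][y]   : score of label y at position t
    transitions[a][b] : score of moving from label a to label b
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    back = []  # back[t-1][y] = best previous label when position t has label y
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for y in range(n_labels):
            best_prev = max(
                range(n_labels), key=lambda a: score[a] + transitions[a][y]
            )
            pointers.append(best_prev)
            new_score.append(
                score[best_prev] + transitions[best_prev][y] + emissions[t][y]
            )
        score = new_score
        back.append(pointers)
    best = max(range(n_labels), key=lambda y: score[y])
    path = [best]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy scores for a three-word sentence and two labels (0 = O, 1 = ENTITY).
emissions = [[3.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
free = [[0.0, 0.0], [0.0, 0.0]]
penalized = [[0.0, -10.0], [0.0, 0.0]]  # discourages the step O -> ENTITY
```

Decoding the sentence as a whole lets the transition scores override locally attractive labels: with `free` transitions the toy input decodes to labels 0, 1, 1, while the `penalized` transitions yield 0, 0, 0.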

Book Chapter•DOI•

National Character: The Study of Modal Personality and Sociocultural Systems

Alex Inkeles

12 Jul 2017

Journal Article•DOI•

Toward a Framework of Leader Character in Organizations

Mary Crossan¹, Alyson Byrne², Gerard Seijts¹, Mark Reno¹, Lucas Monzani¹, Jeffrey Gandz¹•Institutions (2)

University of Western Ontario¹, Memorial University of Newfoundland²

01 Nov 2017-Journal of Management Studies

TL;DR: In this paper, a framework of leader character is proposed, which provides rigor through a three-phase, multi-method approach involving 1,817 leaders, and relevance by using an engaged scholarship epistemology to validate the framework with practicing leaders.

Abstract: While the construct of character is well grounded in philosophy, ethics, and more recently psychology, it lags in acceptance and legitimacy within management research and mainstream practice. Our research seeks to remedy this through four contributions. First, we offer a framework of leader character that provides rigor through a three-phase, multi-method approach involving 1,817 leaders, and relevance by using an engaged scholarship epistemology to validate the framework with practicing leaders. This framework highlights the theoretical underpinnings of the leader character model and articulates the character dimensions and elements that operate in concert to promote effective leadership. Second, we bring leader character into mainstream management research, extending the traditional competency and interpersonal focus on leadership to embrace the foundational component of leader character. In doing this, we articulate how leader character complements and strengthens several existing theories of leadership. Third, we extend the virtues-based approach to ethical decision making to the broader domain of judgment and decision making in support of pursuing individual and organization effectiveness. Finally, we offer promising directions for future research on leader character that will also serve the larger domain of leadership research.

Journal Article•DOI•

Correlators in tensor models from character calculus

A. D. Mironov¹, A. Morozov¹•Institutions (1)

Institute for Theoretical and Experimental Physics¹

10 Nov 2017-Physics Letters B

TL;DR: It is claimed that the 2m-fold Gaussian correlators of rank r tensors are given by r-linear combinations of dimensions with the Young diagrams of size m, which emphasizes a close similarity between technical methods in matrix and tensor models and supports a hope to understand the emerging structures in very similar terms.

Journal Article•DOI•

A tripartite taxonomy of character: Evidence for intrapersonal, interpersonal, and intellectual competencies in children

Daeun Park¹, Eli Tsukayama², Geoffrey P. Goodwin³, Sarah D. Patrick³, Angela L. Duckworth³•Institutions (3)

Chungbuk National University¹, University of Southern California², University of Pennsylvania³

01 Jan 2017-Contemporary Educational Psychology

TL;DR: The findings support a tripartite taxonomy of character in the school context, and positive peer relations were most consistently predicted by interpersonal character, class participation by intellectual character, and report card grades by intrapersonal character.

Journal Article•DOI•

The Fractional Occupation Number Weighted Density as a Versatile Analysis Tool for Molecules with a Complicated Electronic Structure

Christoph Bauer¹, Andreas Hansen¹, Stefan Grimme¹•Institutions (1)

University of Bonn¹

02 May 2017-Chemistry: A European Journal

TL;DR: The fractional occupation number weighted density analysis is explored as a general theoretical diagnostic for complicated electronic structures and opens a full quantum-mechanical, unbiased route to the automatic detection of errors in experimental protein X-ray structures, such as false protonation states or misplaced atoms.

Abstract: The fractional occupation number weighted density (FOD) analysis is explored as a general theoretical diagnostic for complicated electronic structures. Its main feature is to provide robustly and quickly the information on the localization of "hot" (strongly correlated and chemically active) electrons in a molecule. We demonstrate its usage in four different prototypical applications: 1) As a new and fast measure of the biradical character of polycyclic aromatic hydrocarbons, 2) for the selection of active orbital spaces in multiconfigurational or complete active space self consistent field (MCSCF/CASSCF) treatments, 3) as a possibility to describe molecular-energy landscapes consistently in regions with varying biradical character, as exemplified by partial double-bond torsions, and 4) as a powerful visualization method for static electron correlation effects in large biomolecules in connection with an efficient semi-empirical tight-binding molecular orbital scheme. The last application opens a full quantum-mechanical, unbiased route to the automatic detection of errors in experimental protein X-ray structures, such as false protonation states or misplaced atoms. In the first example, the complete (unfragmented) quantum-chemical calculation of the FOD for an entire metalloprotein with more than 7500 atoms is described.

Posted Content•

From Characters to Words to in Between: Do We Capture Morphology?

Clara Vania, Adam Lopez

26 Apr 2017-arXiv: Computation and Language

TL;DR: This article found that character representations are effective across typologies, and that a previously unstudied combination of character trigram representations composed with bi-LSTMs outperforms most others.

Abstract: Words can be represented by composing the representations of subword units such as word segments, characters, and/or character n-grams. While such representations are effective and may capture the morphological regularities of words, they have not been systematically compared, and it is not understood how they interact with different morphological typologies. On a language modeling task, we present experiments that systematically vary (1) the basic unit of representation, (2) the composition of these representations, and (3) the morphological typology of the language modeled. Our results extend previous findings that character representations are effective across typologies, and we find that a previously unstudied combination of character trigram representations composed with bi-LSTMs outperforms most others. But we also find room for improvement: none of the character-level models match the predictive accuracy of a model with access to true morphological analyses, even when learned from an order of magnitude more data.

Journal Article•DOI•

The Critical CoHA of a Quiver With Potential

Ben Davison¹•Institutions (1)

Institute of Science and Technology Austria¹

01 Jun 2017-Quarterly Journal of Mathematics

TL;DR: In this article, the Kontsevich-Soibelman construction of the cohomological Hall algebra (CoHA) of BPS states and Lusztig's construction of canonical bases for quantum enveloping algebras were studied.

Abstract: Pursuing the similarity between the Kontsevich–Soibelman construction of the cohomological Hall algebra (CoHA) of BPS states and Lusztig's construction of canonical bases for quantum enveloping algebras, and the similarity between the integrality conjecture for motivic Donaldson–Thomas invariants and the PBW theorem for quantum enveloping algebras, we build a coproduct on the CoHA associated to a quiver with potential. We also prove a cohomological dimensional reduction theorem, further linking a special class of CoHAs with Yangians, and explaining how to connect the study of character varieties with the study of CoHAs.

Clothing As Culture: Delineating National Character In Costume Prints, C. 1600-1650

Heather A Hughes

01 Jan 2017

TL;DR: The Clothing as Culture project as mentioned in this paper investigates the emergence and didactic functions of costume prints produced between 1600 and 1650, reframing them as artifacts of an era when clothing was considered the primary visual indicator of cultural difference.

Abstract: At the turn of the seventeenth century, European printmakers began issuing single-sheet series portraying how people dressed in different parts of the world. These works are only briefly acknowledged in artists’ monographs—if such studies exist—or treated summarily in studies of fashion illustration, where their aims are insufficiently differentiated from those of fashion plates. This dissertation investigates the emergence and didactic functions of costume prints produced between 1600 and 1650, reframing them as artifacts of an era when clothing was considered the primary visual indicator of cultural difference. Collected in the albums of connoisseurs, affixed to the walls of alehouses, or incorporated into household objects, costume prints that pair national types with descriptions of customs and behaviors instructed viewers to read clothing as an index of civility, morality, and status. The project addresses the interplay between images and inscriptions, parallels with period texts, and the varied modes of reception. To acknowledge the fluid boundaries of early modern print culture, it encompasses a range of artists, audiences, and regions, chiefly the Low Countries, England, and France. Arranged chronologically and according to the geographic scope of each costume series, the dissertation traces how Europeans’ increasing knowledge of global sartorial diversity precipitated an intensified preoccupation with the role of dress in their own societies. 
In three chapters, the project considers how Pieter de Jode’s series of European costumes draw from the representational strategies of illustrated voyage accounts and from the principles of antiquarianism, cosmography, and geography; explores how allegories of the Twelve Months and the Four Continents rely on the premise of an inextricable bond between appearance and character to rank the peoples of the world; and examines the divergent attitudes toward luxury in English society by contrasting the demonization of French fashion in popular satires with Wenceslaus Hollar’s sensuous depictions of women’s attire. Through these studies, Clothing as Culture situates costume prints in the ongoing process of self-awareness about the capacity of clothing to constitute individual and collective identities in early modern Europe.
Degree Type: Dissertation; Degree Name: Doctor of Philosophy (PhD); Graduate Group: History of Art; First Advisor: Larry Silver

Posted Content•

Scene Text Recognition with Sliding Convolutional Character Models.

Fei Yin, Yichao Wu, Xu-Yao Zhang, Cheng-Lin Liu

06 Sep 2017-arXiv: Computer Vision and Pattern Recognition

TL;DR: A scene text recognition method is proposed that slides character models over a convolutional feature map; the character models are trained without a lexicon, so the method can recognize unknown words.

Abstract: Scene text recognition has attracted great interest from the computer vision and pattern recognition community in recent years. State-of-the-art methods use convolutional neural networks (CNNs), recurrent neural networks with long short-term memory (RNN-LSTM) or a combination of them. In this paper, we investigate the intrinsic characteristics of text recognition, and inspired by human cognition mechanisms in reading texts, we propose a scene text recognition method with character models on a convolutional feature map. The method simultaneously detects and recognizes characters by sliding character models over the text line image; the models are learned end-to-end on text line images labeled with text transcripts. The character classifier outputs on the sliding windows are normalized and decoded with a Connectionist Temporal Classification (CTC) based algorithm. Compared to previous methods, our method has a number of appealing properties: (1) it avoids the difficulty of character segmentation, which hinders the performance of segmentation-based recognition methods; (2) the model can be trained simply and efficiently because it avoids the vanishing and exploding gradients that arise in training RNN-LSTM based models; (3) it is based on character models trained free of lexicon, and can recognize unknown words; (4) the recognition process is highly parallel and enables fast recognition. Our experiments on several challenging English and Chinese benchmarks, including the IIIT-5K, SVT, ICDAR03/13 and TRW15 datasets, demonstrate that the proposed method yields superior or comparable performance to state-of-the-art methods while the model size is relatively small.
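
The abstract above states that the sliding-window classifier outputs are decoded with a CTC-based algorithm but does not say which one. The simplest variant, assumed here for illustration, is greedy best-path decoding: take the top class in each window, merge consecutive repeats, and drop the blank symbol. The function name, the blank index and the toy scores in this Python sketch are hypothetical.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks.

    frame_scores : list of per-window score lists, one score per class
    alphabet     : class index -> character (the entry at `blank` is unused)
    """
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    decoded, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:
            decoded.append(alphabet[idx])
        prev = idx
    return "".join(decoded)


# Six sliding windows over a text line; class 0 is the CTC blank.
alphabet = ["-", "a", "b"]
scores = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, merged)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.2, 0.7],    # b
    [0.2, 0.1, 0.7],    # b (repeat, merged)
]
text = ctc_greedy_decode(scores, alphabet)
```

The blank between the second and third windows is what allows a doubled letter to survive the merging step, which is why no explicit character segmentation is needed.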

Journal Article•DOI•

Distinguishing Relational Aspects of Character Strengths with Subjective and Psychological Well-being

Melanie Hausler¹, Melanie Hausler², Cornelia Strecker², Alexandra Huber¹, Alexandra Huber², Mirjam Brenner², Thomas Höge², Stefan Höfer¹•Institutions (2)

Innsbruck Medical University¹, University of Innsbruck²

11 Jul 2017-Frontiers in Psychology

TL;DR: Out of the 24 character strengths, the happiness-related strengths were more likely to correlate with PWB and SWB than any other character strength and “Persistence” showed the highest correlation with the PWB aspect mastery.

Abstract: Research has shown that character strengths are positively linked with well-being in general. However, there has not been a fine-grained analysis to date. This study examines the individual relational aspects between the 24 character strengths, subjective well-being (SWB), and different aspects of psychological well-being (PWB) at two times of measurement (N=117). Results showed that overall the “good character” was significantly more strongly related to PWB than to SWB. The character strength “hope” was at least moderately correlated with the PWB aspects meaning, optimism and autonomy, and “zest” with the PWB aspects relationships and engagement. “Persistence” showed the highest correlation with the PWB aspect mastery. Out of the 24 character strengths, the happiness-related strengths (hope, zest, gratitude, curiosity, and love) were more likely to correlate with PWB and SWB than any other character strength. This study offers a more fine-grained and thorough understanding of specific relational aspects between the 24 character strengths and a broad range of well-being aspects. Future studies should adopt a detailed strategy when exploring relationships between character strengths and well-being.

Journal Article•DOI•

The Varieties of Character and Some Implications for Character Education

Jason Baehr¹•Institutions (1)

Loyola Marymount University¹

22 Mar 2017-Journal of Youth and Adolescence

TL;DR: It is argued that “intellectual character education,” which emphasizes the development of intellectual virtues like curiosity, open-mindedness, and intellectual courage, is an underexplored but especially promising approach in this context.

Abstract: The moral and civic dimensions of personal character have been widely recognized and explored. Recent work by philosophers, psychologists, and education theorists has drawn attention to two additional dimensions of character: intellectual character and "performance" character. This article sketches a "four-dimensional" conceptual model of personal character and some of the character strengths or "virtues" proper to each dimension. In addition to exploring how the dimensions of character are related to each other, the article also examines the implications of this account for character education undertaken in a youth or adolescent context. It is argued that "intellectual character education," which emphasizes the development of intellectual virtues like curiosity, open-mindedness, and intellectual courage, is an underexplored but especially promising approach in this context. The relationship between intellectual character education and traditional character education, which emphasizes the development of moral and civic virtues like kindness, generosity, and tolerance, is also explored.

Proceedings Article•DOI•

Character-based Bidirectional LSTM-CRF with words and characters for Japanese Named Entity Recognition.

Shotaro Misawa¹, Motoki Taniguchi¹, Yasuhide Miura¹, Tomoko Ohkuma¹•Institutions (1)

Fuji Xerox¹

01 Sep 2017

TL;DR: This work proposes a neural model that predicts a tag for each character using word and character information, and demonstrates that the model outperforms the state-of-the-art neural English NER model on Japanese.

Abstract: Recently, neural models have shown superior performance over conventional models in NER tasks. These models use CNN to extract sub-word information along with RNN to predict a tag for each word. However, these models have been tested almost entirely on English texts. It remains unclear whether they perform similarly in other languages. We worked on Japanese NER using neural models and discovered two obstacles of the state-of-the-art model. First, CNN is unsuitable for extracting Japanese sub-word information. Secondly, a model predicting a tag for each word cannot extract an entity when a part of a word composes an entity. The contributions of this work are (1) verifying the effectiveness of the state-of-the-art NER model for Japanese, (2) proposing a neural model for predicting a tag for each character using word and character information. Experimentally obtained results demonstrate that our model outperforms the state-of-the-art neural English NER model in Japanese.
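
The second obstacle named in the abstract above, that a word-level tagger cannot extract an entity that covers only part of a word, is easier to see with character-level labels. The Python sketch below converts entity spans into one tag per character. The BIO tagging scheme, the function name and the example sentence are assumptions made for illustration and do not come from the paper.

```python
def char_bio_tags(text, entities):
    """Assign one BIO tag to every character of `text`.

    entities: list of (start, end, label) character offsets, end exclusive.
    """
    tags = ["O"] * len(text)
    for start, end, label in entities:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical example: the location "日本" is only part of the word "日本人".
text = "日本人です"
tags = char_bio_tags(text, [(0, 2, "LOC")])
# tags pairs with the characters as: 日 B-LOC, 本 I-LOC, 人 O, で O, す O
```

If a segmenter treats "日本人" as a single word, a per-word tagger has to label the whole word or nothing, whereas the per-character tags can mark just the first two characters.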

Proceedings Article•DOI•

Learning Character-level Compositionality with Visual Features

Frederick Liu¹, Han Lu, Chieh Lo¹, Graham Neubig²•Institutions (2)

Carnegie Mellon University¹, Nara Institute of Science and Technology²

17 Apr 2017

TL;DR: This article uses a convolutional neural network (CNN) to produce visual character embeddings for Chinese, Japanese, and Korean text classification, and shows that the model learns to focus on the parts of characters that carry topical content, resulting in embeddings that are coherent in visual space.

Abstract: Previous work has modeled the compositionality of words by creating character-level models of meaning, reducing problems of sparsity for rare words. However, in many writing systems compositionality has an effect even at the character level: the meaning of a character is derived from the sum of its parts. In this paper, we model this effect by creating embeddings for characters based on their visual characteristics, creating an image for the character and running it through a convolutional neural network to produce a visual character embedding. Experiments on a text classification task demonstrate that such a model allows for better processing of instances with rare characters in languages such as Chinese, Japanese, and Korean. Additionally, qualitative analyses demonstrate that our proposed model learns to focus on the parts of characters that carry topical content, resulting in embeddings that are coherent in visual space.

Proceedings Article•DOI•

Bengali handwritten character recognition using deep convolutional neural network

Bishwajit Purkaystha¹, Tapos Datta¹, Saiful Islam¹•Institutions (1)

Shahjalal University of Science and Technology¹

01 Dec 2017

TL;DR: A convolutional deep model for recognizing Bengali handwritten characters is proposed; it first learns a useful set of features using kernels and local receptive fields, and then employs densely connected layers for the discrimination task.

Abstract: Handwritten character recognition is a nontrivial task, as it seeks to recognize the correct class for user-independent handwritten characters. This problem becomes even more challenging for a language like Bengali, whose characters are highly stylized, morphologically complex, and potentially juxtapositional. As a result, the improvements over the years in Bengali character recognition are significantly smaller than for other languages. In this paper, we propose a convolutional deep model to recognize Bengali handwritten characters. We first learn a useful set of features by using kernels and local receptive fields, and then employ densely connected layers for the discrimination task. Our system has been tested on the BanglaLekha-Isolated dataset. It achieves 98.66% accuracy on numerals (10 character classes), 94.99% accuracy on vowels (11 character classes), 91.60% accuracy on compound letters (20 character classes), 91.23% accuracy on alphabets (50 character classes), and 89.93% accuracy on almost all Bengali characters (80 character classes). Most of the errors incurred by our model in the recognition task are due to extreme proximity in shapes among characters. A significant number of errors were caused by mislabeled, irrecoverably distorted, and illegal data examples.

Proceedings Article•DOI•

Word-Context Character Embeddings for Chinese Word Segmentation

Hao Zhou¹, Zhenting Yu¹, Yue Zhang², Shujian Huang¹, Xinyu Dai¹, Jiajun Chen¹•Institutions (2)

Nanjing University¹, Singapore University of Technology and Design²

01 Sep 2017

TL;DR: This work investigates training character embeddings on a word-based context, in a way similar to dependency-context word embeddings for neural parsers, showing that the simple method improves state-of-the-art neural word segmentation models significantly, beating tri-training baselines for leveraging auto-segmented data.

Abstract: Neural parsers have benefited from automatically labeled data via dependency-context word embeddings. We investigate training character embeddings on a word-based context in a similar way, showing that the simple method improves state-of-the-art neural word segmentation models significantly, beating tri-training baselines for leveraging auto-segmented data.

Proceedings Article•DOI•

An Extensive Empirical Evaluation of Character-Based Morphological Tagging for 14 Languages

Georg Heigold¹, Guenter Neumann, Josef van Genabith²•Institutions (2)

German Research Centre for Artificial Intelligence¹, Saarland University²

01 Apr 2017

TL;DR: This paper investigates neural character-based morphological tagging for languages with complex morphology and large tag sets and shows that the CNN-based approach performs slightly worse and less consistently than the RNN-based approach.

Abstract: This paper investigates neural character-based morphological tagging for languages with complex morphology and large tag sets. Character-based approaches are attractive as they can handle rare and unseen words gracefully. We evaluate on 14 languages and observe consistent gains over a state-of-the-art morphological tagger across all languages except for English and French, where we match the state of the art. We compare two architectures for computing character-based word vectors using recurrent (RNN) and convolutional (CNN) nets. We show that the CNN-based approach performs slightly worse and less consistently than the RNN-based approach. Small but systematic gains are observed when combining the two architectures by ensembling.

Proceedings Article•DOI•

Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition

Takaaki Hori¹, Shinji Watanabe¹, John R. Hershey¹•Institutions (1)

Mitsubishi Electric Research Laboratories¹

01 Dec 2017

TL;DR: A novel method for end-to-end ASR decoding with LMs at both the character and word level, which achieves 5.6% WER on the WSJ Eval'92 test set using only the SI284 training set and WSJ text data, the best score reported for end-to-end ASR systems on this benchmark.

Abstract: We propose a combination of character-based and word-based language models in an end-to-end automatic speech recognition (ASR) architecture. In our prior work, we combined a character-based LSTM RNN-LM with a hybrid attention/connectionist temporal classification (CTC) architecture. The character LMs improved recognition accuracy to rival state-of-the-art DNN/HMM systems in Japanese and Mandarin Chinese tasks. Although a character-based architecture can provide for open vocabulary recognition, the character-based LMs generally under-perform relative to word LMs for languages such as English with a small alphabet, because of the difficulty of modeling linguistic constraints across long sequences of characters. This paper presents a novel method for end-to-end ASR decoding with LMs at both the character and word level. Hypotheses are first scored with the character-based LM until a word boundary is encountered. Known words are then re-scored using the word-based LM, while the character-based LM provides for out-of-vocabulary scores. In a standard Wall Street Journal (WSJ) task, we achieved 5.6% WER for the Eval'92 test set using only the SI284 training set and WSJ text data, which is the best score reported for end-to-end ASR systems on this benchmark.
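The scoring rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `score_hypothesis`, `char_logprob`, `word_logprob`, and `vocab` are hypothetical names, and both LMs are reduced to context-free lookups, whereas the models in the paper condition on history and operate inside a beam search.

```python
import math


def score_hypothesis(chars, char_logprob, word_logprob, vocab, boundary=" "):
    """Score a character sequence, re-scoring known words with a word LM.

    Characters are scored with the character LM until a word boundary.
    At the boundary, a known word's character score is replaced by its
    word-LM score; an out-of-vocabulary word keeps its character score.
    """
    total = 0.0
    word_chars = []         # characters of the word in progress
    word_char_score = 0.0   # character-LM score accumulated for that word
    for ch in chars:
        if ch == boundary:
            word = "".join(word_chars)
            if word in vocab:
                total += word_logprob(word)   # known word: use the word LM
            else:
                total += word_char_score      # OOV word: keep character LM score
            word_chars, word_char_score = [], 0.0
        else:
            word_chars.append(ch)
            word_char_score += char_logprob(ch)
    # A trailing partial word has not reached a boundary yet,
    # so it is still scored by the character LM alone.
    return total + word_char_score


# Toy stand-ins for the two language models (assumed values).
vocab = {"the", "cat"}
char_lp = lambda ch: math.log(0.1)
word_lp = lambda w: math.log(0.5)
score = score_hypothesis("the zyx ", char_lp, word_lp, vocab)
```

In this toy run, "the" is in the vocabulary and takes the word-LM score, while "zyx" is out of vocabulary and keeps the sum of its character scores.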

Journal Article•DOI•

Higher traces, noncommutative motives, and the categorified Chern character

Marc Hoyois¹, Sarah Scherotzke², Nicolò Sibilla³•Institutions (3)

Massachusetts Institute of Technology¹, University of Bonn², University of Kent³

17 Mar 2017-Advances in Mathematics

TL;DR: In this article, a categorification of the Chern character is proposed, which refines earlier work of Toën and Vezzosi and of Ganter and Kapranov, and it is shown that the secondary Chern character factors through secondary K-theory.

Journal Article•DOI•

Measuring multi-configurational character by orbital entanglement

Christopher J. Stein¹, Markus Reiher¹•Institutions (1)

ETH Zurich¹

22 Feb 2017-Molecular Physics

TL;DR: In this article, a new orbital-entanglement-based multi-configurational diagnostic termed Zs(1) was proposed, which can be evaluated from a partially converged, but qualitatively correct, and therefore inexpensive density matrix renormalization group wave function.

Abstract: One of the most critical tasks at the very beginning of a quantum chemical investigation is the choice of either a multi- or single-configurational method. Naturally, many proposals exist to define a suitable diagnostic of the multi-configurational character for various types of wave functions in order to assist this crucial decision. Here, we present a new orbital-entanglement-based multi-configurational diagnostic termed Zs(1). The correspondence of orbital entanglement and static (or non-dynamic) electron correlation permits the definition of such a diagnostic. We chose our diagnostic to meet important requirements such as well-defined limits for pure single-configurational and multi-configurational wave functions. The Zs(1) diagnostic can be evaluated from a partially converged, but qualitatively correct, and therefore inexpensive density matrix renormalisation group wave function as in our recently presented automated active orbital selection protocol. Its robustness and the fact that it can ...

Predicting Bitcoin price fluctuation with Twitter sentiment analysis

Evita Stenqvist, Jacob Lönnö

01 Jan 2017

TL;DR: Programmatically deriving sentiment has been the topic of many a thesis: its application in analyzing 140-character sentences, to that of 400-word Hemingway sentences; the methods ranging from nai ...

Abstract: Programmatically deriving sentiment has been the topic of many a thesis: its application in analyzing 140-character sentences, to that of 400-word Hemingway sentences; the methods ranging from nai ...
