Conference

International Conference on Asian Language Processing 

About: International Conference on Asian Language Processing is an academic conference. The conference publishes mainly in the areas of computer science and machine translation. Over its lifetime, the conference has published 900 papers, which have received 3497 citations.


Papers
Proceedings Article
04 Dec 2014
TL;DR: This work produces an Indonesian POS tagset consisting of 23 tags and an Indonesian corpus of over 250,000 lexical tokens manually tagged using this tagset.
Abstract: We describe our work on designing a linguistically principled part-of-speech (POS) tagset for the Indonesian language. The process involves a detailed study and analysis of existing tagsets and the manual tagging of an Indonesian corpus. The results of this work are an Indonesian POS tagset consisting of 23 tags and an Indonesian corpus of over 250,000 lexical tokens that have been manually tagged using this tagset.

76 citations

Proceedings Article
01 Oct 2015
TL;DR: Topic2Vec learns topic representations in the same semantic vector space as words, as an alternative to probability distributions; the authors report interesting and meaningful results.
Abstract: Latent Dirichlet Allocation (LDA), which mines the thematic structure of documents, plays an important role in natural language processing and machine learning. However, the probability distribution from LDA only describes the statistical relationship of occurrences in the corpus, and in practice probability is usually not the best choice for feature representations. Recently, embedding methods such as Word2Vec and Doc2Vec have been proposed to represent words and documents by learning essential concepts and representations. The embedded representations have proven more effective than LDA-style representations in many tasks. In this paper, we propose the Topic2Vec approach, which can learn topic representations in the same semantic vector space as words, as an alternative to probability distributions. The experimental results show that Topic2Vec achieves interesting and meaningful results.

69 citations

Proceedings Article
01 Dec 2017
TL;DR: This study proposes InSet, an Indonesian sentiment lexicon built to identify written opinion and categorize it as positive or negative, which can be used to analyze public sentiment towards a particular topic, event, or product.
Abstract: In this study, we propose InSet, an Indonesian sentiment lexicon built to identify written opinion and categorize it as positive or negative, which can be used to analyze public sentiment towards a particular topic, event, or product. Composed from a collection of words from Indonesian tweets, InSet was constructed by manually weighting each word and enhanced by adding stemming and synonym sets. As a result, we obtained 3,609 positive words and 6,609 negative words with scores ranging between −5 and +5. In an experiment using InSet, our method outperforms the other Indonesian lexicons, which are rare, that we used as baselines.

49 citations
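
The abstract above describes a lexicon in which each word carries a manually assigned weight between −5 and +5. A minimal sketch of how such a lexicon could be used to label a text follows. The toy lexicon, its weights, the function names, and the "neutral" fallback are all assumptions made for illustration; they are not taken from InSet, and the paper's actual scoring procedure may differ.

```python
import string

# Hypothetical toy lexicon: these words and weights are invented for
# illustration and are NOT entries from the InSet lexicon.
TOY_LEXICON = {
    "bagus": 4,    # "good"
    "senang": 3,   # "happy"
    "buruk": -4,   # "bad"
    "kecewa": -3,  # "disappointed"
}


def score_text(text: str, lexicon: dict[str, int]) -> int:
    """Sum the lexicon weights of all tokens; unknown tokens count as 0."""
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return sum(lexicon.get(tok, 0) for tok in tokens)


def classify(text: str, lexicon: dict[str, int]) -> str:
    """Label a text by the sign of its summed score.

    The "neutral" label for a zero score is an assumption of this sketch.
    """
    total = score_text(text, lexicon)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

For example, `classify("Filmnya bagus sekali!", TOY_LEXICON)` sums a single known weight (4) and labels the text positive under these made-up weights.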

Proceedings Article
13 Nov 2012
TL;DR: A novel approach to automatic recognition of code-switching speech that uses parallel automatic speech recognizers followed by lattice rescoring; adapted acoustic models also show a reduction in WER when used for code-switching speech recognition.
Abstract: In this paper, we propose a novel approach to automatic recognition of code-switching speech. The proposed method consists of two phases: automatic speech recognition and rescoring. The framework uses parallel automatic speech recognizers for speech recognition. The lattices produced are subsequently joined and rescored to estimate the most probable word sequence. Experiments show that the proposed approach achieves a reduction of more than 5% in WER when tested on English/Malay code-switching speech. In addition, the framework has been shown to be very robust. We also propose an acoustic model adaptation approach, a hybrid of interpolation and merging, to cross-adapt acoustic models of different languages to recognize code-switching speech. The adapted acoustic models show a reduction in WER when they are used for code-switching speech recognition.

46 citations
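
The abstract above reports its results as word error rate (WER). As a reminder of how that metric is commonly defined, here is a minimal sketch that computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and whitespace tokenization are assumptions of this sketch, not details from the paper, whose evaluation setup is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted against the reference length, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.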

Proceedings Article
01 Nov 2018
TL;DR: This study builds a publicly available Indonesian Twitter dataset for emotion classification, addressing the lack of resources for under-resourced languages such as Indonesian, and conducts feature engineering to determine the best features for emotion classification.
Abstract: The rapid growth of Twitter usage has attracted many researchers to use Twitter data for several purposes, including emotion analysis. However, standard datasets for emotion analysis are scarce for under-resourced languages, especially Indonesian. In this study, we build an Indonesian Twitter dataset for the emotion classification task, which is publicly available. In addition, we conduct feature engineering to determine the best features for emotion classification. The features used in this research are lexicon-based, Bag-of-Words, word embedding, orthography, and Part-of-Speech (POS) tag features. We test those features on two datasets with different characteristics. F1-score is employed as the evaluation metric. The results of our experiments show that combining all proposed features on our dataset achieves an F1-score of 69.73%, which outperforms the baseline model by 26.64%.

43 citations

Performance
Metrics
No. of papers from the Conference in previous years
Year    Papers
2022    85
2020    61
2019    81
2018    67
2017    87
2016    85