Author

Su Mei Xi

Bio: Su Mei Xi is an academic researcher from the University of Suwon. The author has contributed to research in the topics of feature detection (computer vision) and feature extraction. The author has an h-index of 1 and has co-authored 8 publications receiving 10 citations.

Papers
Proceedings ArticleDOI
01 Oct 2013
TL;DR: A novel system which generates sentential annotations for general images by employing a weighted feature clustering algorithm on the semantic concept clusters of the image regions and establishing a relationship between clustering regions and semantic concepts according to the labeled images in the training set.
Abstract: For people to use the numerous images on the web effectively, technologies must be able to explain image contents and must be capable of searching for the data users need. Moreover, images must be described with natural sentences based not only on the names of objects contained in an image but also on their mutual relations. We propose a novel system which generates sentential annotations for general images. Firstly, a weighted feature clustering algorithm is employed on the semantic concept clusters of the image regions. For a given cluster, we determine relevant features based on their statistical distribution and assign greater weights to relevant features than to less relevant ones. In this way the clustering computation avoids being dominated by trivially relevant or irrelevant features. Then, the relationship between clustered regions and semantic concepts is established according to the labeled images in the training set. For the regions of a new unlabeled image, we calculate the conditional probability of each semantic keyword and annotate the image with the keyword of maximal conditional probability. Experiments on the Corel image set show the effectiveness of the new algorithm.

11 citations
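
The abstract above describes a weighted feature clustering step followed by annotation with the keyword of maximal conditional probability. A minimal Python sketch of that general idea is given below; the weighting rule (inverse within-cluster variance), the function names and the parameters are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def weighted_kmeans(X, k, n_iter=20):
    """Toy weighted feature clustering: per-cluster feature weights are set
    inversely to the feature's within-cluster variance, so low-variance
    (relevant) features dominate the distance computation."""
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), k, replace=False)]
    weights = np.ones((k, X.shape[1]))
    for _ in range(n_iter):
        # weighted squared Euclidean distance of every point to every center
        d = ((X[:, None, :] - centers[None]) ** 2 * weights[None]).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts) == 0:
                continue
            centers[j] = pts.mean(0)
            w = 1.0 / (pts.var(0) + 1e-6)
            weights[j] = w / w.sum() * X.shape[1]   # normalise cluster weights
    return centers, weights, labels

def keyword_probabilities(labels, keywords, k):
    """Estimate P(keyword | cluster) from labeled training regions."""
    probs = {}
    for j in range(k):
        kw = [keywords[i] for i in range(len(labels)) if labels[i] == j]
        total = max(len(kw), 1)
        probs[j] = {w: kw.count(w) / total for w in set(kw)}
    return probs

def annotate(region, centers, weights, probs):
    """Assign a new region to its nearest weighted cluster and return the
    keyword with maximal conditional probability."""
    d = ((region - centers) ** 2 * weights).sum(1)
    j = int(d.argmin())
    return max(probs[j], key=probs[j].get) if probs[j] else None
```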

Journal ArticleDOI
TL;DR: Experimental results show that compared with other bimodal speech recognition approaches, this approach obtains better speech recognition performance.
Abstract: Recent years have seen higher demands for automatic speech recognition (ASR) systems that are able to operate robustly in an acoustically noisy environment. This paper proposes an improved product hidden Markov model (HMM) for bimodal speech recognition. A two-dimensional training model is built based on the dependently trained audio-HMM and visual-HMM, reflecting the asynchronous characteristics of the audio and video streams. A weight coefficient is introduced to adjust the weights of the video and audio streams automatically according to differences in the noise environment. Experimental results show that, compared with other bimodal speech recognition approaches, this approach obtains better speech recognition performance.

1 citation
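
The paper above combines audio and visual HMM streams with a noise-dependent weight coefficient. The sketch below illustrates the general multi-stream weighting idea, assuming per-word log-likelihoods from separately trained audio and visual models and a simple SNR-to-weight mapping; the product-HMM details and the mapping itself are assumptions, not the paper's formulation.

```python
import numpy as np

def stream_weight(snr_db, low=0.0, high=30.0):
    """Map the estimated acoustic SNR (dB) to an audio-stream weight in [0, 1]:
    clean audio -> rely mostly on audio, noisy audio -> rely on video."""
    return float(np.clip((snr_db - low) / (high - low), 0.0, 1.0))

def combined_log_likelihood(audio_loglik, video_loglik, snr_db):
    """Weighted multi-stream score: lambda * audio + (1 - lambda) * video."""
    lam = stream_weight(snr_db)
    return lam * audio_loglik + (1.0 - lam) * video_loglik

def recognize(word_scores, snr_db):
    """word_scores: {word: (audio_loglik, video_loglik)} from the two HMMs.
    Returns the word with the highest combined score."""
    return max(word_scores,
               key=lambda w: combined_log_likelihood(*word_scores[w], snr_db))

# usage: scores produced by separately trained audio and visual word models
scores = {"yes": (-42.0, -55.0), "no": (-47.0, -50.0)}
print(recognize(scores, snr_db=5.0))   # noisy: the video stream gets more weight
```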

Journal ArticleDOI
TL;DR: The experimental results show that basic natural language processing techniques, with small computational cost and simple implementation, help information retrieval only a little, so the role of natural language understanding may be larger in question answering systems, automatic summarization and information extraction.
Abstract: In this paper, some applications of natural language processing techniques to information retrieval are introduced, but the results are known not to be satisfactory. In order to find the roles of some classical natural language processing techniques in information retrieval, and to find which ones work better, we compared the effects of various natural language techniques on information retrieval precision. The experimental results show that basic natural language processing techniques, with small computational cost and simple implementation, help information retrieval only a little. Highly complex natural language processing techniques, with high computational cost and low precision, do not improve information retrieval precision and can even harm it, so the role of natural language understanding may be larger in question answering systems, automatic summarization and information extraction.

1 citation

Proceedings ArticleDOI
12 Nov 2012
TL;DR: This article proposed a new approach for automatically correcting queries over multi-XML, called MXDR (Multi-XML Distributed Retrieval), which first classifies multi-XML documents by a clustering method and elicits the common structure information of the XML datasets.
Abstract: This article proposed a new approach for automatically correcting queries over multi-XML, called MXDR (Multi-XML Distributed Retrieval). We first classified multi-XML documents by a clustering method and elicited the common structure information. We then generated certifiable structured queries by analyzing the given keyword query and the common structure information of the XML datasets. The generated structured queries can be evaluated over the XML data sources with any existing structured search engine.
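
The MXDR abstract above describes eliciting a common structure from clustered XML documents and turning a keyword query into structured queries. The toy sketch below illustrates only that last step, under an assumed tag-to-path representation of the common structure; the query-generation rule and the XPath-like syntax are illustrative, not the paper's algorithm.

```python
# Hypothetical common structure elicited from a cluster of XML documents:
# a mapping from element tag to its label path from the root.
common_structure = {
    "title":  "book/title",
    "author": "book/author/name",
    "year":   "book/year",
}

def generate_structured_queries(keywords, structure):
    """For each keyword that matches a known tag, emit a candidate path query;
    unmatched keywords become value predicates on every path, so any existing
    structured search engine can evaluate the resulting queries."""
    queries = []
    for kw in keywords:
        matched = [path for tag, path in structure.items() if kw.lower() in tag]
        if matched:
            queries.extend(f"//{path}" for path in matched)
        else:
            queries.extend(f"//{path}[contains(., '{kw}')]"
                           for path in structure.values())
    return queries

print(generate_structured_queries(["author", "2012"], common_structure))
```
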
Journal ArticleDOI
TL;DR: A new method for cross-media retrieval which uses an ontology to organize different media objects; the experimental results show that the proposed method is effective in cross-media retrieval.
Abstract: With the recent advances in information retrieval, cross-media retrieval has been attracting a lot of attention, but several issues remain, such as constructing effective correlations and calculating similarity between different kinds of media objects. To gain better cross-media retrieval performance, it is crucial to mine the semantic correlations among heterogeneous multimedia data. This paper introduces a new method for cross-media retrieval which uses an ontology to organize different media objects. The experimental results show that the proposed method is effective in cross-media retrieval.

Cited by
Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper presents a unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can simultaneously explore the learning of LSTM and visual-semantic embedding.
Abstract: Automatically describing video content with natural language is a fundamental challenge of computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention for visual interpretation. However, most existing approaches generate a word locally from the given previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually correct but the semantics (e.g., subjects, verbs or objects) are not true. This paper presents a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can simultaneously explore the learning of LSTM and visual-semantic embedding. The former aims to locally maximize the probability of generating the next word given the previous words and visual content, while the latter creates a visual-semantic embedding space for enforcing the relationship between the semantics of the entire sentence and the visual content. The experiments on the YouTube2Text dataset show that our proposed LSTM-E achieves the best published performance to date in generating natural sentences: 45.3% and 31.0% in terms of BLEU@4 and METEOR, respectively. Superior performances are also reported on two movie description datasets (M-VAD and MPII-MD). In addition, we demonstrate that LSTM-E outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.

563 citations
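
LSTM-E, as described above, couples a coherence objective (next-word prediction with an LSTM) with a relevance objective (distance between video and sentence points in a shared embedding space). A minimal PyTorch sketch of that two-part loss is shown below; the dimensions, the pooled CNN video feature, and the trade-off weight are assumptions rather than values from the paper.

```python
import torch
import torch.nn as nn

class LSTME(nn.Module):
    """Minimal sketch of the LSTM-E idea: a coherence loss (next-word
    prediction) plus a relevance loss (distance between the video and the
    sentence points in a shared embedding space)."""
    def __init__(self, vocab_size, video_dim=2048, embed_dim=512, hidden=512):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.video_proj = nn.Linear(video_dim, embed_dim)   # video -> joint space
        self.sent_proj = nn.Linear(hidden, embed_dim)       # sentence -> joint space
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, video_feat, captions):
        # video_feat: (B, video_dim) pooled CNN feature; captions: (B, L) word ids
        v = self.video_proj(video_feat)                     # (B, embed_dim)
        w = self.word_embed(captions[:, :-1])               # teacher-forcing inputs
        inp = torch.cat([v.unsqueeze(1), w], dim=1)         # video embedding starts the sequence
        h, _ = self.lstm(inp)                               # (B, L, hidden)
        logits = self.out(h)                                # position t predicts caption word t
        coherence = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), captions.reshape(-1))
        s = self.sent_proj(h[:, -1, :])                     # sentence point in the joint space
        relevance = ((v - s) ** 2).sum(dim=1).mean()        # squared distance to the video point
        return coherence + 0.3 * relevance                  # trade-off weight is an assumption
```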

Posted Content
TL;DR: A novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can simultaneously explore the learning of LSTM and visual-semantic embedding and outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.
Abstract: Automatically describing video content with natural language is a fundamental challenge of multimedia. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention for visual interpretation. However, most existing approaches generate a word locally from the given previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually correct but the semantics (e.g., subjects, verbs or objects) are not true. This paper presents a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can simultaneously explore the learning of LSTM and visual-semantic embedding. The former aims to locally maximize the probability of generating the next word given the previous words and visual content, while the latter creates a visual-semantic embedding space for enforcing the relationship between the semantics of the entire sentence and the visual content. Our proposed LSTM-E consists of three components: a 2-D and/or 3-D deep convolutional neural network for learning a powerful video representation, a deep RNN for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics. The experiments on the YouTube2Text dataset show that our proposed LSTM-E achieves the best reported performance to date in generating natural sentences: 45.3% and 31.0% in terms of BLEU@4 and METEOR, respectively. We also demonstrate that LSTM-E is superior to several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.

419 citations

Patent
13 Jan 2016
TL;DR: In this article, instead of outputting results of caption analysis directly, the framework is adapted to output points in a semantic word vector space, which are not tied to particular words or a single dictionary.
Abstract: Techniques for image captioning with word vector representations are described. In implementations, instead of outputting results of caption analysis directly, the framework is adapted to output points in a semantic word vector space. These word vector representations reflect distance values in the context of the semantic word vector space. In this approach, words are mapped into a vector space and the results of caption analysis are expressed as points in the vector space that capture semantics between words. In the vector space, similar concepts will have small distance values. The word vectors are not tied to particular words or a single dictionary. A post-processing step is employed to map the points to words and convert the word vector representations to captions. Accordingly, conversion is delayed to a later stage in the process.

65 citations
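
The patent abstract above defers word selection by predicting points in a semantic word vector space and mapping them back to words in a post-processing step. The sketch below shows one straightforward way to do that mapping with nearest-neighbour search under cosine similarity; the tiny embedding table and the helper names are purely illustrative.

```python
import numpy as np

# Hypothetical embedding table: word -> vector in the semantic word space.
vocab = {"dog": np.array([0.9, 0.1, 0.0]),
         "cat": np.array([0.8, 0.2, 0.1]),
         "car": np.array([0.0, 0.1, 0.9])}

def nearest_word(point, vocab):
    """Post-processing step: map a predicted point in the word vector space
    to the closest vocabulary word by cosine similarity."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return max(vocab, key=lambda w: cos(point, vocab[w]))

def points_to_caption(points, vocab):
    """Convert a sequence of predicted points into a caption string."""
    return " ".join(nearest_word(p, vocab) for p in points)

predicted = [np.array([0.85, 0.15, 0.05]), np.array([0.05, 0.05, 0.95])]
print(points_to_caption(predicted, vocab))   # e.g. "dog car"
```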

Patent
13 Jan 2016
TL;DR: In this article, weak supervision data for a target image is obtained and utilized to provide detail information that supplements global image concepts derived for image captioning, where weak supervision refers to noisy data that is not closely curated and may include errors.
Abstract: Techniques for image captioning with weak supervision are described herein. In implementations, weak supervision data regarding a target image is obtained and utilized to provide detail information that supplements global image concepts derived for image captioning. Weak supervision data refers to noisy data that is not closely curated and may include errors. Given a target image, weak supervision data for visually similar images may be collected from sources of weakly annotated images, such as online social networks. Generally, images posted online include “weak” annotations in the form of tags, titles, labels, and short descriptions added by users. Weak supervision data for the target image is generated by extracting keywords for visually similar images discovered in the different sources. The keywords included in the weak supervision data are then employed to modulate weights applied for probabilistic classifications during image captioning analysis.

29 citations
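
The abstract above says that keywords extracted as weak supervision are used to modulate the weights applied to probabilistic classifications. A toy sketch of one such modulation (boosting concepts that also appear among the weak keywords, then renormalising) is given below; the boost factor and the function name are assumptions, not the patent's method.

```python
def modulate_with_weak_supervision(class_probs, weak_keywords, boost=2.0):
    """Toy re-weighting: concepts that also appear among the weak-supervision
    keywords (tags/titles of visually similar images) get their probability
    boosted, then the distribution is renormalised."""
    weighted = {c: p * (boost if c in weak_keywords else 1.0)
                for c, p in class_probs.items()}
    total = sum(weighted.values())
    return {c: p / total for c, p in weighted.items()}

# usage: global concept classifier output for a target image
probs = {"beach": 0.30, "sunset": 0.25, "mountain": 0.45}
weak = {"sunset", "ocean", "waves"}           # keywords from visually similar images
print(modulate_with_weak_supervision(probs, weak))
```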

Book ChapterDOI
09 Jan 2018
TL;DR: The survey presents various techniques used by researchers for scene analysis performed on different image datasets, which helps to generate better image captions.
Abstract: Automatic image captioning is the process of providing natural language captions for images automatically. Considering the huge number of images available today, automatic image captioning is very beneficial for managing huge image datasets by providing appropriate captions. It also finds application in content-based image retrieval. This field includes other image processing areas such as segmentation, feature extraction, template matching and image classification. It also includes the field of natural language processing. Scene analysis is a prominent step in automatic image captioning which is garnering the attention of many researchers. The better the scene analysis, the better the image understanding, which in turn leads to better image captions. The survey presents various techniques used by researchers for scene analysis performed on different image datasets.

13 citations