
Showing papers on "Multimedia database" published in 2019


Journal ArticleDOI
TL;DR: DeepStyle, a multimodal search engine that combines visual and textual cues to retrieve items aesthetically similar to the query from a multimedia database, outperforms baseline methods by more than 20% on the tested datasets.
Abstract: In this paper, we propose a multimodal search engine that combines visual and textual cues to retrieve items from a multimedia database aesthetically similar to the query. The goal of our engine is to enable intuitive retrieval of fashion merchandise such as clothes or furniture. Existing search engines treat textual input only as an additional source of information about the query image and do not correspond to the real-life scenario, where the user looks for “the same shirt but of denim”. Our novel method, dubbed DeepStyle, mitigates those shortcomings by using a joint neural network architecture to model contextual dependencies between features of different modalities. We prove the robustness of this approach on two challenging datasets of fashion items and furniture, where our DeepStyle engine outperforms baseline methods by more than 20%. Our search engine is commercially deployed and available through a Web-based application.

41 citations


Journal ArticleDOI
TL;DR: An approach to automatically convert simple modern standard Arabic children’s stories to the best representative images that can efficiently illustrate the meaning of words is introduced.
Abstract: In this paper, we introduce an approach to automatically convert simple modern standard Arabic children’s stories to the best representative images that can efficiently illustrate the meaning of words. The approach imitates the imaginative process children go through when reading a story, which remains a great challenge for a machine. For simplicity, we apply several techniques to find the images and associate them with related words dynamically. First, we apply natural language processing techniques to analyze the text in stories and extract keywords of all characters and events in each sentence. Second, we apply an image captioning process through a pre-trained deep learning model to all images retrieved from our multimedia database as well as the Google search engine. Third, using sentence similarities, the most significant images are retrieved by selecting the top-$k$ highest similarity values. Using the captioning process to rank the top-$k$ images has shown reasonable precision values in our preliminary results. The option to refine or validate the ranked images to compose the final visualization for each story is also provided to ensure a flexible and safe learning environment.

14 citations
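
The third step of the abstract above (ranking captioned images by sentence similarity and keeping the top-$k$) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the similarity measure, so bag-of-words cosine similarity is an assumption, and the function names, image ids, and English captions are hypothetical stand-ins for the output of a captioning model.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_images(sentence, captions, k=2):
    """Return the ids of the k images whose captions best match the sentence."""
    query = bag_of_words(sentence)
    scored = [
        (cosine_similarity(query, bag_of_words(caption)), image_id)
        for image_id, caption in captions.items()
    ]
    # Highest similarity first; ties broken by image id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image_id for _, image_id in scored[:k]]


# Hypothetical captions, as a captioning model might produce them.
CAPTIONS = {
    "img1": "a boy rides a horse",
    "img2": "a cat sleeps on a sofa",
    "img3": "a horse runs in a field",
}

ranked = top_k_images("the boy rides the horse", CAPTIONS, k=2)
```

In practice the captions would be in Arabic and the similarity would more plausibly come from sentence embeddings; the ranking logic stays the same.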


Book ChapterDOI
01 Jan 2019
TL;DR: A multimedia database of Chinese musical instruments, which includes, for each instrument, text descriptions, images, audio clips of playing techniques, music clips, videos of the craft process and recording process, and acoustic analysis materials is presented.
Abstract: Throughout history, more than 2000 Chinese musical instruments have existed or been historically recorded, and they are of non-negligible importance in Chinese musicology. However, the public knows little about them. In this work, we present a multimedia database of Chinese musical instruments. This database includes, for each instrument, text descriptions, images, audio clips of playing techniques, music clips, videos of the craft process and recording process, and acoustic analysis materials. The motivation and selection criteria of the database are introduced in detail. Potential applications based on this database are discussed, and we take the research on subjective auditory attributes of Chinese musical instruments as an example.

10 citations


Patent
Lu Guang, Liu Shui, Luo Xiajun, Ye Shiquan, Ju Qiang, Xie Jian
16 May 2019
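TL;DR: An intelligent playing method and apparatus based on preference feedback is disclosed: the user's voice feedback on the currently played multimedia is analyzed for intent and, when the intent is to update the currently played multimedia list, the list is updated based on the feedback and the similarity between multimedia in a multimedia database and the current multimedia.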
Abstract: The embodiments of the disclosure disclose an intelligent playing method and apparatus based on preference feedback. An embodiment of the method comprises: receiving voice feedback on currently played multimedia from a user; analyzing a user intention based on the voice feedback; calculating, in response to the user intention indicating updating a currently played multimedia list, a similarity between multimedia in a multimedia database and the current multimedia; and updating the currently played multimedia list based on the voice feedback and the similarity. The embodiment improves the quality and pertinence in playing multimedia.

3 citations
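
The four steps in the patent abstract above (receive feedback, analyze intent, compute similarity, update the list) can be sketched as follows. This is only an illustration under stated assumptions: the feedback is taken to be already transcribed to text, intent analysis is reduced to a keyword rule, similarity is Jaccard overlap of tags, and "like" or "dislike" is interpreted as preferring the most or least similar items. None of these choices come from the patent, and all names and data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two tag sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def parse_intent(feedback):
    """Very rough keyword-based intent analysis of transcribed feedback."""
    text = feedback.lower()
    # Check negative phrases first, since "dislike" contains "like".
    if "don't like" in text or "dislike" in text or "skip" in text:
        return "dislike"
    if "like" in text or "more" in text:
        return "like"
    return "none"


def update_playlist(playlist, current, feedback, database, size=2):
    """Rebuild the playlist from the database according to the feedback."""
    intent = parse_intent(feedback)
    if intent == "none":
        return playlist  # The feedback does not ask for a list update.
    scored = [
        (jaccard(tags, database[current]), title)
        for title, tags in database.items()
        if title != current
    ]
    if intent == "like":
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most similar first
    else:
        scored.sort(key=lambda pair: (pair[0], pair[1]))  # least similar first
    return [title for _, title in scored[:size]]


# Hypothetical multimedia database: title -> descriptive tags.
DATABASE = {
    "song_a": {"rock", "guitar"},
    "song_b": {"rock", "guitar", "live"},
    "song_c": {"jazz", "piano"},
    "song_d": {"rock", "piano"},
}

liked = update_playlist(["song_c"], "song_a", "play more like this", DATABASE)
disliked = update_playlist(["song_b"], "song_a", "I don't like this", DATABASE)
```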


Proceedings ArticleDOI
16 May 2019
TL;DR: An approach to automatically convert modern standard Arabic children’s stories to the finest representative images that can efficiently illustrate the meaning of words is introduced.
Abstract: Children with learning difficulties (LD) are increasing dramatically in the Arab world. Such children require quick intervention especially during the early childhood years. This paper presents a dynamic multimedia system for helping children with LD to overcome their learning problems. We introduce an approach to automatically convert modern standard Arabic children’s stories to the finest representative images that can efficiently illustrate the meaning of words. Specifically, first, we apply natural language processing techniques to analyze the text in stories and we extract keywords of all characters and events in each sentence. Second, we apply an image captioning process through a pre-trained deep learning model for all retrieved images from our multimedia database as well as the Google search engine. Third, using sentence similarities, most significant images are retrieved back by selecting top highest similarity values. The proposed system aims to better enhance understanding, communications, and thinking skills for children with LD in elementary schools and special education centers.

3 citations


Proceedings ArticleDOI
19 Apr 2019
TL;DR: This paper presents an approach to dynamically transform simple Modern Standard Arabic children’s story scripts into the best representative images that can efficiently illustrate the meaning of words and word senses.
Abstract: In this paper, we present an approach to dynamically transform simple Modern Standard Arabic children’s story scripts into the best representative images that can efficiently illustrate the meaning of words and word senses. We formally connect the multiple datasets involved in our framework. We then apply several techniques to find the images and associate them with related word senses. First, we apply natural language processing techniques to analyze the text in stories and build a semantic representation of the main characters and events in each paragraph. Second, we apply various query formulation techniques as a brief scenario to enhance image web search, which showed better accuracy in preliminary results. Third, the most significant queries are chosen to retrieve a list of candidate images from our multimedia database and search engines (i.e., Google and Bing). Instructors can then select and validate the ranked contextual images to compose the final visualization for each paragraph.

1 citation


Book ChapterDOI
26 Jun 2019
TL;DR: WWW CBIR searching using Query by Approximate Shapes decomposes a query into a graph of primitives, which stores in its nodes the type of each primitive with its attributes and in its edges the mutual position relations of connected nodes.
Abstract: The problem of matching two graphs has been considered by many researchers. Our research on WWW CBIR searching using Query by Approximate Shapes is based on decomposing a query into a graph of primitives, which stores in its nodes the type of a primitive with its attributes and in its edges the mutual position relations of connected nodes. When the graphs of primitives are stored in a multimedia database used for World Wide Web CBIR searching, the comparison methods should be effective because of the very large amount of stored data. Finding such methods was the motivation for this research. In this initial research only the simplest methods are examined: NEH-based, random search-based and Greedy.

1 citation
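
The graph of primitives described in the abstract above (primitive types and attributes in nodes, mutual position relations in edges) and a greedy comparison can be sketched as follows. The abstract does not describe how its Greedy method works, so the scoring function, the single "size" attribute, and the relation labels here are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class PrimitiveGraph:
    # node id -> (primitive type, size attribute); size is assumed positive
    nodes: dict
    # (node id, node id) -> mutual position relation label
    edges: dict = field(default_factory=dict)


def node_score(a, b):
    """Score two primitives: 0 if types differ, else closeness of their sizes."""
    if a[0] != b[0]:
        return 0.0
    return 1.0 - abs(a[1] - b[1]) / max(a[1], b[1])


def greedy_match(query, stored):
    """Greedily pair query nodes with stored nodes, best-scoring pairs first."""
    pairs = sorted(
        (
            (node_score(q_attr, s_attr), q, s)
            for q, q_attr in query.nodes.items()
            for s, s_attr in stored.nodes.items()
        ),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    mapping, used = {}, set()
    for score, q, s in pairs:
        if score > 0 and q not in mapping and s not in used:
            mapping[q] = s
            used.add(s)
    return mapping


def edge_agreement(query, stored, mapping):
    """Fraction of mapped query edges whose relation also holds in the stored graph."""
    checked = [(a, b) for (a, b) in query.edges if a in mapping and b in mapping]
    if not checked:
        return 0.0
    hits = sum(
        1
        for (a, b) in checked
        if stored.edges.get((mapping[a], mapping[b])) == query.edges[(a, b)]
    )
    return hits / len(checked)


# Hypothetical query and stored graphs.
QUERY = PrimitiveGraph(
    nodes={"q1": ("circle", 10.0), "q2": ("line", 20.0)},
    edges={("q1", "q2"): "above"},
)
STORED = PrimitiveGraph(
    nodes={"s1": ("line", 18.0), "s2": ("circle", 10.0), "s3": ("circle", 5.0)},
    edges={("s2", "s1"): "above"},
)

mapping = greedy_match(QUERY, STORED)
agreement = edge_agreement(QUERY, STORED, mapping)
```

A greedy pass like this is cheap but can miss the best overall assignment, which is one reason a study might compare it against other heuristics.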


Patent
09 Jul 2019
TL;DR: A multimedia data processing method and system, a storage medium and a mobile device are presented. The method obtains multimedia data selected from a multimedia database and loads additional information (at least one of a video, a picture, a character and a voice) onto it to generate a video-on-demand object, which is then loaded to a video-on-demand platform.
Abstract: The invention discloses a multimedia data processing method and system, a storage medium and mobile device. The method comprises the following steps: obtaining multimedia data selected from a multimedia database; loading additional information to the multimedia data to generate an on-demand object, the additional information comprising at least one of the following types: a video, a picture, a character and a voice; and loading the video-on-demand object to a video-on-demand platform, the video-on-demand platform being used for storing the video-on-demand object allowing the shared video-on-demand. The technical problem of poor user experience caused by the fact that an existing multimedia playing application does not support custom video-on-demand is solved.

Patent
Lu Guang, Ye Shiquan, Luo Xiajun, Yin Xiangjie
16 May 2019
TL;DR: In this article, the authors present a method and apparatus for playing multimedia based on matching between a lexeme of the voice playing request and a semantic slot to obtain semantic slot information of the request.
Abstract: The embodiments of the disclosure disclose a method and apparatus for playing multimedia. An embodiment of the method comprises: receiving a voice playing request inputted by a user; matching between a lexeme of the voice playing request and a semantic slot to obtain semantic slot information of the request; determining, based on a result of matching between multimedia in a multimedia database and the semantic slot information of the request, multimedia used for playing, and feeding back reply information to the voice playing request by voice; and playing the multimedia used for playing. The embodiment improves the accuracy of the voice interaction and the accuracy and pertinence in playing multimedia.
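
The matching steps in the abstract above (lexemes of the request matched to semantic slots, then slot information matched against the multimedia database) can be sketched as follows. This is a toy illustration, assuming the voice request is already transcribed and that each slot has a small fixed vocabulary; the slot names, lexicon, and records are hypothetical and not taken from the patent.

```python
# Hypothetical slot lexicon: slot name -> words that fill it.
SLOT_LEXICON = {
    "artist": {"adele", "beatles"},
    "genre": {"rock", "jazz"},
}


def fill_slots(request):
    """Match each word (lexeme) of the request against the slot vocabularies."""
    slots = {}
    for token in request.lower().split():
        for slot, vocabulary in SLOT_LEXICON.items():
            if token in vocabulary:
                slots[slot] = token
    return slots


def select_multimedia(slots, database):
    """Return titles of records that agree with every filled slot."""
    if not slots:
        return []  # Nothing recognised, so nothing to play.
    return [
        record["title"]
        for record in database
        if all(record.get(slot) == value for slot, value in slots.items())
    ]


# Hypothetical multimedia database records.
DATABASE = [
    {"title": "track_1", "artist": "adele", "genre": "jazz"},
    {"title": "track_2", "artist": "adele", "genre": "rock"},
    {"title": "track_3", "artist": "beatles", "genre": "rock"},
]

slots = fill_slots("play some jazz by adele")
to_play = select_multimedia(slots, DATABASE)
```

Voice output of the reply and actual playback are omitted; only the slot filling and database matching are shown.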

Patent
06 Sep 2019
TL;DR: In this paper, an interactive multimedia content broadcasting system consisting of at least one multimedia database storing multimedia contents and a plurality of characteristics associated with these contents, some of these characteristics being quantized attributes of multimedia contents is presented.
Abstract: The invention relates to an interactive multimedia content broadcasting system, said system comprising: - at least one multimedia database storing multimedia contents and a plurality of characteristics associated with these contents, some of these characteristics being quantized attributes of multimedia contents; - at least one sensor for measuring communication-wise cerebral activity so as to acquire one or more user related data sets; - a communication-wise media reader so as to read the multimedia contents; - a user database associating at least one multimedia content attribute with an information item for reaction characterization of at least one user; - at least one computing device adapted for the implementation of diverse processing, the media reader and/or the computer device(s) being adapted to choose for the user and broadcast thereto a multimedia content as a function of the database characterization information.

Patent
15 Jan 2019
TL;DR: In this article, a file management method, a mobile terminal and a computer-readable storage medium is described, which comprises the following steps: obtaining multimedia file records from a multimedia database, wherein the multimedia database stores records of all files on the mobile terminal.
Abstract: The invention discloses a file management method, a mobile terminal and a computer-readable storage medium. The method comprises the following steps: obtaining multimedia file records from a multimedia database, wherein the multimedia database stores records of all files on the mobile terminal; generating a local database of a file manager according to the multimedia file records; detecting whether the file manager receives a file retrieval instruction; and, when the file retrieval instruction is detected, retrieving from the local database according to the file retrieval instruction and feeding the retrieval result back to the user. Compared with the prior art, since only multimedia file records are stored in the established local database, the amount of data is greatly reduced compared with the multimedia database, so the retrieval result can be quickly obtained during the retrieval, the retrieval efficiency is improved, and the user experience is better.
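
The idea in the abstract above (copy only the multimedia file records into a smaller local database and run retrievals against that) can be sketched with an in-memory SQLite table. This is a minimal sketch under assumptions: the record layout, the set of multimedia kinds, the table name, and the sample paths are all hypothetical, and a real mobile terminal would read the records from the platform's own media store.

```python
import sqlite3

# Assumed set of record kinds that count as multimedia.
MULTIMEDIA_KINDS = {"image", "video", "audio"}


def build_local_index(records):
    """Create the file manager's local database from multimedia records only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (name TEXT, path TEXT, kind TEXT)")
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?)",
        [r for r in records if r[2] in MULTIMEDIA_KINDS],
    )
    conn.commit()
    return conn


def search(conn, keyword):
    """Handle a retrieval instruction: return paths whose name contains the keyword."""
    rows = conn.execute(
        "SELECT path FROM files WHERE name LIKE ? ORDER BY path",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


# Hypothetical records from the full database: (name, path, kind).
ALL_RECORDS = [
    ("holiday.jpg", "/sdcard/DCIM/holiday.jpg", "image"),
    ("notes.txt", "/sdcard/notes.txt", "document"),
    ("holiday.mp4", "/sdcard/Movies/holiday.mp4", "video"),
]

index = build_local_index(ALL_RECORDS)
results = search(index, "holiday")
```

Because the local table holds fewer rows than the full database, each lookup scans less data, which is the efficiency gain the abstract claims.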