Showing papers by "Yahoo!" published in 2016


Journal Article
TL;DR: This publicly available curated dataset of almost 100 million photos and videos is free and legal for all.
Abstract: This publicly available curated dataset of almost 100 million photos and videos is free and legal for all.

1,157 citations


Proceedings Article
Chikashi Nobata, Joel Tetreault, Achint Oommen Thomas, Yashar Mehdad, Yi Chang
11 Apr 2016
TL;DR: A machine learning based method to detect hate speech on online user comments from two domains which outperforms a state-of-the-art deep learning approach and a corpus of user comments annotated for abusive language, the first of its kind.
Abstract: Detection of abusive language in user generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions, however these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine learning based method to detect hate speech on online user comments from two domains which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language, the first of its kind. Finally, we use our detection tool to analyze abusive language over time and in different settings to further enhance our knowledge of this behavior.

945 citations


Journal Article
TL;DR: This study aimed to investigate the incidence and mortality of breast cancer in the world using age-specific incidence and mortality rates for the year 2012 acquired from the global cancer project (GLOBOCAN 2012) as well as data about incidence and mortality of the cancer based on national reports.
Abstract: Breast cancer is the most common malignancy in women around the world. Information on the incidence and mortality of breast cancer is essential for planning health measures. This study aimed to investigate the incidence and mortality of breast cancer in the world using age-specific incidence and mortality rates for the year 2012 acquired from the global cancer project (GLOBOCAN 2012) as well as data about incidence and mortality of the cancer based on national reports. It was estimated that 1,671,149 new cases of breast cancer were identified and 521,907 cases of deaths due to breast cancer occurred in the world in 2012. According to GLOBOCAN, it is the most common cancer in women, accounting for 25.1% of all cancers. Breast cancer incidence in developed countries is higher, while relative mortality is greatest in less developed countries. Education of women is suggested in all countries for early detection and treatment. Plans for the control and prevention of this cancer must be a high priority for health policy makers; also, it is necessary to increase awareness of risk factors and early detection in less developed countries.

792 citations


Book Chapter
08 Oct 2016
TL;DR: This work presents a generalized framework that encompasses a broad family of approaches and includes cross-dimensional pooling and weighting steps that boost the effect of highly active spatial responses and at the same time regulate burstiness effects.
Abstract: We propose a simple and straightforward way of creating powerful image representations via cross-dimensional weighting and aggregation of deep convolutional neural network layer outputs. We first present a generalized framework that encompasses a broad family of approaches and includes cross-dimensional pooling and weighting steps. We then propose specific non-parametric schemes for both spatial- and channel-wise weighting that boost the effect of highly active spatial responses and at the same time regulate burstiness effects. We experiment on different public datasets for image search and show that our approach outperforms the current state-of-the-art for approaches based on pre-trained networks. We also provide an easy-to-use, open source implementation that reproduces our results.
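
The abstract describes spatial weighting that boosts highly active locations and channel weighting that regulates bursty channels, but it does not give the formulas. The following is a minimal sketch under stated assumptions: the function name `cross_dimensional_aggregate`, the square-rooted, L2-normalised channel sum used for spatial weights, and the IDF-like log ratio used for channel weights are all illustrative choices made here, not the authors' published implementation.

```python
import math


def cross_dimensional_aggregate(features, eps=1e-6):
    """Aggregate C feature maps (each H x W, non-negative) into a C-dim vector.

    Returns (spatial_weights, channel_weights, descriptor).
    The weighting formulas are assumptions made for illustration.
    """
    num_channels = len(features)
    height, width = len(features[0]), len(features[0][0])

    # Spatial weights (assumed form): sum activations across channels at each
    # location, L2-normalise the map, then take a square root to compress it.
    summed = [[sum(features[c][i][j] for c in range(num_channels))
               for j in range(width)] for i in range(height)]
    norm = math.sqrt(sum(v * v for row in summed for v in row)) or 1.0
    spatial = [[math.sqrt(v / norm) for v in row] for row in summed]

    # Channel weights (assumed form): channels that fire at few locations get
    # a larger weight than channels that fire almost everywhere.
    fired = [sum(1 for row in features[c] for v in row if v > 0) / (height * width)
             for c in range(num_channels)]
    total = sum(fired)
    channel = [math.log((total + num_channels * eps) / (q + eps)) for q in fired]

    # Weighted sum-pooling over locations, then per-channel scaling.
    descriptor = [
        channel[c] * sum(spatial[i][j] * features[c][i][j]
                         for i in range(height) for j in range(width))
        for c in range(num_channels)
    ]
    return spatial, channel, descriptor


# Hypothetical 2-channel, 2x2 activation maps.
sparse_map = [[1.0, 0.0], [0.0, 0.0]]
dense_map = [[1.0, 1.0], [1.0, 1.0]]
spatial_w, channel_w, desc = cross_dimensional_aggregate([sparse_map, dense_map])
```

In this toy input the location where both channels fire receives the largest spatial weight, and the rarely firing channel receives the larger channel weight, which is the qualitative behaviour the abstract describes.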

426 citations


Proceedings Article
09 Mar 2016
TL;DR: In this paper, recursive recurrent neural networks with attention modeling (R2AM) were used for lexicon-free optical character recognition in natural scene images, and they achieved state-of-the-art performance on the Street View Text, IIIT5k, ICDAR and Synth90k.
Abstract: We present recursive recurrent neural networks with attention modeling (R2AM) for lexicon-free optical character recognition in natural scene images. The primary advantages of the proposed method are: (1) use of recursive convolutional neural networks (CNNs), which allow for parametrically efficient and effective image feature extraction, (2) an implicitly learned character-level language model, embodied in a recurrent neural network which avoids the need to use N-grams, and (3) the use of a soft-attention mechanism, allowing the model to selectively exploit image features in a coordinated way, and allowing for end-to-end training within a standard backpropagation framework. We validate our method with state-of-the-art performance on challenging benchmark datasets: Street View Text, IIIT5k, ICDAR and Synth90k.

333 citations


Posted Content
TL;DR: This work presents recursive recurrent neural networks with attention modeling (R2AM) for lexicon-free optical character recognition in natural scene images and validates the method with state-of-the-art performance on challenging benchmark datasets.
Abstract: We present recursive recurrent neural networks with attention modeling (R$^2$AM) for lexicon-free optical character recognition in natural scene images. The primary advantages of the proposed method are: (1) use of recursive convolutional neural networks (CNNs), which allow for parametrically efficient and effective image feature extraction; (2) an implicitly learned character-level language model, embodied in a recurrent neural network which avoids the need to use N-grams; and (3) the use of a soft-attention mechanism, allowing the model to selectively exploit image features in a coordinated way, and allowing for end-to-end training within a standard backpropagation framework. We validate our method with state-of-the-art performance on challenging benchmark datasets: Street View Text, IIIT5k, ICDAR and Synth90k.

327 citations


Journal Article
01 May 2016
TL;DR: A series of recent advances in real-time WSANs for industrial control systems are reviewed, with a focus on cyber-physical codesign of wireless control systems that integrate wireless and control designs.
Abstract: With recent adoption of wireless sensor-actuator networks (WSANs) in industrial automation, industrial wireless control systems have emerged as a frontier of cyber-physical systems. Despite their success in industrial monitoring applications, existing WSAN technologies face significant challenges in supporting control systems due to their lack of real-time performance and dynamic wireless conditions in industrial plants. This article reviews a series of recent advances in real-time WSANs for industrial control systems: 1) real-time scheduling algorithms and analyses for WSANs; 2) implementation and experimentation of industrial WSAN protocols; 3) cyber-physical codesign of wireless control systems that integrate wireless and control designs; and 4) a wireless cyber-physical simulator for codesign and evaluation of wireless control systems. This article concludes by highlighting research directions in industrial cyber-physical systems.

317 citations


Proceedings Article
23 May 2016
TL;DR: A streaming benchmark for three representative computation engines: Flink, Storm and Spark Streaming is developed and a performance comparison of the three data engines in terms of 99th percentile latency and throughput for various configurations is provided.
Abstract: Streaming data processing has been gaining attention due to its application into a wide range of scenarios. To serve the booming demands of streaming data processing, many computation engines have been developed. However, there is still a lack of real-world benchmarks that would be helpful when choosing the most appropriate platform for serving real-time streaming needs. In order to address this problem, we developed a streaming benchmark for three representative computation engines: Flink, Storm and Spark Streaming. Instead of testing speed-of-light event processing, we construct a full data pipeline using Kafka and Redis in order to more closely mimic the real-world production scenarios. Based on our experiments, we provide a performance comparison of the three data engines in terms of 99th percentile latency and throughput for various configurations.
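
The comparison above is reported in terms of 99th percentile latency and throughput. As a rough illustration of how those two metrics can be computed from raw measurements, here is a small sketch; the nearest-rank percentile definition, the function names, and the sample numbers are assumptions for illustration and are not taken from the benchmark itself.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def throughput(event_count, window_seconds):
    """Events processed per second over a measurement window."""
    return event_count / window_seconds


# Hypothetical end-to-end latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
p99_ms = percentile_nearest_rank(latencies_ms, 99)
events_per_second = throughput(event_count=len(latencies_ms), window_seconds=2.0)
```

A tail percentile is usually preferred over the mean for this kind of comparison because a small number of slow events can dominate user-visible delay while barely moving the average.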

268 citations


Journal Article
TL;DR: A new high UAD-producing alginate lyase, AlySY08, has been purified from the marine bacterium Vibrio sp. SY08.
Abstract: Unsaturated alginate disaccharides (UADs), enzymatically derived from the degradation of alginate polymers, are considered powerful antioxidants. In this study, a new high UAD-producing alginate lyase, AlySY08, has been purified from the marine bacterium Vibrio sp. SY08. AlySY08, with a molecular weight of about 33 kDa and a specific activity of 1070.2 U/mg, showed the highest activity at 40 °C in phosphate buffer at pH 7.6. The enzyme was stable over a broad pH range (6.0-9.0) and retained about 75% activity after incubation at 40 °C for 2 h. Moreover, the enzyme was active in the absence of salt ions and its activity was enhanced by the addition of NaCl and KCl. AlySY08 proved to be an endo-type alginate lyase that degrades both polyM and polyG blocks, yielding UADs as the main product (81.4% of total products). All these features make AlySY08 a promising candidate for industrial applications in the production of antioxidants from alginate polysaccharides.

250 citations


Journal Article
TL;DR: A deeper functional, biochemical and molecular characterization of the various EV classes might identify more selective clinical markers, and significantly advance the knowledge of the pathogenesis and disease progression of many cancer types.
Abstract: Extracellular Vesicles (EVs) have received considerable attention in recent years, both as mediators of intercellular communication pathways that lead to tumor progression, and as potential sources for discovery of novel cancer biomarkers. For many years, research on EVs has mainly investigated either the mechanism of biogenesis and cargo selection and incorporation, or the methods of EV isolation from available body fluids for biomarker discovery. Recent studies have highlighted the existence of different populations of cancer-derived EVs, with distinct molecular cargo, thus pointing to the possibility that the various EV populations might play diverse roles in cancer and that this does not happen randomly. However, data attributing cancer specific intercellular functions to given populations of EVs are still limited. A deeper functional, biochemical and molecular characterization of the various EV classes might identify more selective clinical markers, and significantly advance our knowledge of the pathogenesis and disease progression of many cancer types.

244 citations


Journal Article
TL;DR: This article reviews the mining of signed networks in the context of social media, classifies signed network mining tasks with representative algorithms, and discusses promising research directions and new frontiers.
Abstract: Many real-world relations can be represented by signed networks with positive and negative links, as a result of which signed network analysis has attracted increasing attention from multiple disciplines. With the increasing prevalence of social media networks, signed network analysis has evolved from developing and measuring theories to mining tasks. In this article, we present a review of mining signed networks in the context of social media and discuss some promising research directions and new frontiers. We begin by giving basic concepts and unique properties and principles of signed networks. Then we classify and review tasks of signed network mining with representative algorithms. We also delineate some tasks that have not been extensively studied with formal definitions and also propose research directions to expand the field of signed network mining.

Proceedings Article
Yashar Mehdad, Joel Tetreault
01 Sep 2016
TL;DR: This study investigates the effectiveness of character-based features for abusive language detection in user-generated online comments, and shows that such methods outperform previous state-of-the-art approaches and other strong baselines.
Abstract: Although word and character n-grams have been used as features in different NLP applications, no systematic comparison or analysis has shown the power of character-based features for detecting abusive language. In this study, we investigate the effectiveness of such features for abusive language detection in user-generated online comments, and show that such methods outperform previous state-of-the-art approaches and other strong baselines.
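
To make the notion of character n-gram features concrete, the sketch below extracts them from a string and compares a word with a deliberately misspelled variant. The function name `char_ngrams`, the n-gram range, and the misspelling example are assumptions chosen here for illustration; the abstract does not describe the feature extraction or classifier in this detail.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Count every character n-gram of length n_min..n_max in text."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for start in range(len(text) - n + 1):
            counts[text[start:start + n]] += 1
    return counts


# Hypothetical example: a word and an obfuscated spelling of it.
plain = char_ngrams("idiot")
obfuscated = char_ngrams("id1ot")

# As whole-word tokens the two strings do not match at all, but they still
# share some character n-grams that a classifier could pick up on.
shared = set(plain) & set(obfuscated)
```

In a full system these counts would typically be turned into a sparse feature vector and fed to a classifier; that step is omitted here.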

Proceedings Article
01 Jan 2016
TL;DR: This work proposes a novel algorithm to incorporate boosting weights into the deep learning architecture based on least squares objective function and shows that it is possible to use networks of different structures within the proposed boosting framework and BoostCNN is able to select the best network structure in each iteration.
Abstract: In this work, we propose a new algorithm for boosting Deep Convolutional Neural Networks (BoostCNN) to combine the merits of boosting and modern neural networks. To learn this new model, we propose a novel algorithm to incorporate boosting weights into the deep learning architecture based on least squares objective function. We also show that it is possible to use networks of different structures within the proposed boosting framework and BoostCNN is able to select the best network structure in each iteration. This not only results in superior performance but also reduces the required manual effort for finding the right network structure. Experiments show that the proposed method is able to achieve state-of-the-art performance on several fine-grained classification tasks such as bird, car, and aircraft classification.

Proceedings Article
27 Jun 2016
TL;DR: This work collects a new dataset, Tumblr GIF (TGIF), with 100K animated GIFs from Tumblr and 120K natural language descriptions obtained via crowdsourcing, as a testbed for image sequence description systems.
Abstract: With the recent popularity of animated GIFs on social media, there is need for ways to index them with rich meta-data. To advance research on animated GIF understanding, we collected a new dataset, Tumblr GIF (TGIF), with 100K animated GIFs from Tumblr and 120K natural language descriptions obtained via crowdsourcing. The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips. To ensure a high quality dataset, we developed a series of novel quality controls to validate free-form text input from crowd-workers. We show that there is unambiguous association between visual content and natural language descriptions in our dataset, making it an ideal benchmark for the visual content captioning task. We perform extensive statistical analyses to compare our dataset to existing image and video description datasets. Next, we provide baseline results on the animated GIF description task, using three representative techniques: nearest neighbor, statistical machine translation, and recurrent neural networks. Finally, we show that models fine-tuned from our animated GIF description dataset can be helpful for automatic movie description.

Proceedings Article
12 Feb 2016
TL;DR: A machine-learning model is presented that achieves high performance in predicting clickbaits and shows that the degree of informality of a web-page (as measured by different metrics) is a strong indicator of it being a clickbait.
Abstract: Clickbaits are articles with misleading titles, exaggerating the content on the landing page. Their goal is to entice users to click on the title in order to monetize the landing page. The content on the landing page is usually of low quality. Their presence in user homepage stream of news aggregator sites (e.g., Yahoo news, Google news) may adversely impact user experience. Hence, it is important to identify and demote or block them on homepages. In this paper, we present a machine-learning model to detect clickbaits. We use a variety of features and show that the degree of informality of a web-page (as measured by different metrics) is a strong indicator of it being a clickbait. We conduct extensive experiments to evaluate our approach and analyze properties of clickbait and non-clickbait articles. Our model achieves high performance (74.9% F-1 score) in predicting clickbaits.

Journal Article
TL;DR: A LogDet divergence-based metric learning with triplet constraint model which can learn Mahalanobis matrix with high precision and robustness is established.
Abstract: Multivariate time series (MTS) datasets broadly exist in numerous fields, including health care, multimedia, finance, and biometrics. How to classify MTS accurately has become a hot research topic since it is an important element in many computer vision and pattern recognition applications. In this paper, we propose a Mahalanobis distance-based dynamic time warping (DTW) measure for MTS classification. The Mahalanobis distance builds an accurate relationship between each variable and its corresponding category. It is utilized to calculate the local distance between vectors in MTS. Then we use DTW to align those MTS which are out of synchronization or with different lengths. After that, how to learn an accurate Mahalanobis distance function becomes another key problem. This paper establishes a LogDet divergence-based metric learning with triplet constraint model which can learn Mahalanobis matrix with high precision and robustness. Furthermore, the proposed method is applied on nine MTS datasets selected from the University of California, Irvine machine learning repository and Robert T. Olszewski’s homepage, and the results demonstrate the improved performance of the proposed approach.
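
The combination described above, a Mahalanobis local distance inside dynamic time warping, can be sketched as follows. This is an illustrative sketch only: the matrix `M` is set by hand here, whereas the paper learns it with LogDet divergence-based metric learning, which is not reproduced; the function names and sample series are assumptions.

```python
import math


def mahalanobis(x, y, M):
    """sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    dim = len(diff)
    quad = sum(diff[i] * M[i][j] * diff[j] for i in range(dim) for j in range(dim))
    return math.sqrt(max(quad, 0.0))


def dtw_distance(series_a, series_b, M):
    """Classic DTW over two multivariate series, using Mahalanobis local costs."""
    n, m = len(series_a), len(series_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = mahalanobis(series_a[i - 1], series_b[j - 1], M)
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical two-variable series of different lengths.
identity = [[1.0, 0.0], [0.0, 1.0]]
a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
b = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 2.0)]  # a with one step repeated
c = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]

same_shape = dtw_distance(a, b, identity)
shifted_end = dtw_distance(a, c, identity)
# Up-weighting the second variable makes the same deviation cost more.
shifted_end_weighted = dtw_distance(a, c, [[1.0, 0.0], [0.0, 4.0]])
```

With the identity matrix the local cost reduces to Euclidean distance, so the role of a learned matrix is to re-weight and correlate variables before the alignment is computed.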

Journal Article
10 Dec 2016 - Foods
TL;DR: This chapter describes the use of different plant and vegetable food residues as nutraceuticals and functional foods, covering their uses, their role in disease management, and their action as nutraceutical delivery vehicles.
Abstract: This chapter describes the use of different plant and vegetable food residues as nutraceuticals and functional foods. Different nutraceuticals are mentioned and explained. Their uses are well addressed along with their disease management and their action as nutraceutical delivery vehicles.

Journal Article
TL;DR: A machine learning system that composes fashion outfits automatically by scoring outfit candidates based on appearance and metadata, achieving an AUC of 85% for the scoring component and an accuracy of 77% for a constrained composition task.
Abstract: Composing fashion outfits involves deep understanding of fashion standards while incorporating creativity for choosing multiple fashion items (e.g., Jewelry, Bag, Pants, Dress). In fashion websites, popular or high-quality fashion outfits are usually designed by fashion experts and followed by large audiences. In this paper, we propose a machine learning system to compose fashion outfits automatically. The core of the proposed automatic composition system is to score fashion outfit candidates based on the appearances and meta-data. We propose to leverage outfit popularity on fashion oriented websites to supervise the scoring component. The scoring component is a multi-modal multi-instance deep learning system that evaluates instance aesthetics and set compatibility simultaneously. In order to train and evaluate the proposed composition system, we have collected a large scale fashion outfit dataset with 195K outfits and 368K fashion items from Polyvore. Although the fashion outfit scoring and composition is rather challenging, we have achieved an AUC of 85% for the scoring component, and an accuracy of 77% for a constrained composition task.
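
The scoring component is evaluated by AUC. For readers unfamiliar with the metric, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as half. The scores and labels are made up for illustration, and treating "popular" as label 1 is an assumption about the setup rather than a detail from the abstract.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outfit scores; label 1 = popular outfit, 0 = unpopular outfit.
outfit_scores = [0.9, 0.8, 0.3, 0.1]
outfit_labels = [1, 0, 1, 0]
outfit_auc = auc(outfit_scores, outfit_labels)
```

The quadratic loop is fine for small samples; for large datasets a rank-based computation gives the same value in O(n log n).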

Proceedings Article
13 Aug 2016
TL;DR: This paper introduces three key techniques for base relevance -- ranking functions, semantic matching features and query rewriting, and describes solutions for recency sensitive relevance and location sensitive relevance.
Abstract: Search engines play a crucial role in our daily lives. Relevance is the core problem of a commercial search engine. It has attracted thousands of researchers from both academia and industry and has been studied for decades. Relevance in a modern search engine has gone far beyond text matching, and now involves tremendous challenges. The semantic gap between queries and URLs is the main barrier for improving base relevance. Clicks help provide hints to improve relevance, but unfortunately for most tail queries, the click information is too sparse, noisy, or missing entirely. For comprehensive relevance, the recency and location sensitivity of results is also critical. In this paper, we give an overview of the solutions for relevance in the Yahoo search engine. We introduce three key techniques for base relevance -- ranking functions, semantic matching features and query rewriting. We also describe solutions for recency sensitive relevance and location sensitive relevance. This work builds upon 20 years of existing efforts on Yahoo search, summarizes the most recent advances and provides a series of practical relevance solutions. The performance reported is based on Yahoo's commercial search engine, where tens of billions of urls are indexed and served by the ranking system.

Journal Article
TL;DR: An analysis of humans’ perceptions of formality in four different genres is performed and used to develop a statistical model for predicting formality, which is evaluated under different feature settings and genres.
Abstract: This paper presents an empirical study of linguistic formality. We perform an analysis of humans’ perceptions of formality in four different genres. These findings are used to develop a statistical model for predicting formality, which is evaluated under different feature settings and genres. We apply our model to an investigation of formality in online discussion forums, and present findings consistent with theories of formality and linguistic coordination.

Journal Article
TL;DR: The role of microbial biofilms in the etiology of female UTI and different male prostatitis syndromes, their consequences, as well as the challenges for therapy are presented.
Abstract: Urinary tract infections (UTIs) are one of the most important causes of morbidity and health care spending affecting persons of all ages. Bacterial biofilms play an important role in UTIs, responsible for persistent infections leading to recurrences and relapses. UTIs associated with microbial biofilms developed on catheters account for a high percentage of all nosocomial infections and are the most common source of Gram-negative bacteremia in hospitalized patients. The purpose of this mini-review is to present the role of microbial biofilms in the etiology of female UTI and different male prostatitis syndromes, their consequences, as well as the challenges for therapy.

Journal Article
TL;DR: In this article, the authors explore the relationship between antibiotic concentrations in wastewater before wastewater treatment and quantities of antibiotics used in the rural hospital, over a period of one year in 2013, and find that significant concentrations of antibiotics were present in the wastewater both before and after wastewater treatment of both the rural and the urban hospital.
Abstract: Hospital effluents represent an important source for the release of antibiotics and antibiotic resistant bacteria into the environment. This study aims to determine concentrations of various antibiotics in wastewater before and after wastewater treatment in a rural hospital (60 km from the center of Hanoi) and in an urban hospital (in the center of Hanoi) in Vietnam, and it aims to explore the relationship between antibiotic concentrations in wastewater before wastewater treatment and quantities of antibiotics used in the rural hospital, over a period of one year in 2013. Water samples were collected using continuous sampling for 24 h in the last week of every month. The data on quantities of antibiotics delivered to all inpatient wards were collected from the Pharmacy department in the rural hospital. Solid-phase extraction and high performance liquid chromatography-tandem mass spectrometry were used for chemical analysis. Significant concentrations of antibiotics were present in the wastewater both before and after wastewater treatment of both the rural and the urban hospital. Ciprofloxacin was detected at the highest concentrations in the rural hospital’s wastewater (before treatment: mean = 42.8 µg/L; after treatment: mean = 21.5 µg/L). Metronidazole was detected at the highest concentrations in the urban hospital’s wastewater (before treatment: mean = 36.5 µg/L; after treatment: mean = 14.8 µg/L). A significant correlation between antibiotic concentrations in wastewater before treatment and quantities of antibiotics used in the rural hospital was found for ciprofloxacin (r = 0.78; p = 0.01) and metronidazole (r = 0.99; p < 0.001).

Journal Article
20 Sep 2016 - Viruses
TL;DR: Hepatitis E virus genotypes 1 and 2 cause epidemic and endemic diseases in resource poor countries, primarily spreading through contaminated drinking water, while HEV genotypes 3 and 4 cause autochthonous infections in developed, and many developing countries, by means of a unique zoonotic food-borne transmission.
Abstract: Hepatitis E virus (HEV), an RNA virus of the Hepeviridae family, has marked heterogeneity. While all five HEV genotypes can cause human infections, genotypes HEV-1 and -2 infect humans alone, genotypes HEV-3 and -4 primarily infect pigs, boars and deer, and genotype HEV-7 primarily infects dromedaries. The global distribution of HEV has distinct epidemiological patterns based on ecology and socioeconomic factors. In resource-poor countries, disease presents as large-scale waterborne epidemics, and few epidemics have spread through person-to-person contact; however, endemic diseases within these countries can potentially spread through person-to-person contact or fecally contaminated water and foods. Vertical transmission of HEV from infected mother to fetus causes high fetal and perinatal mortality. Other means of transmission, such as zoonotic transmission, can fluctuate depending upon the region and strain of the virus. For instance, zoonotic transmission can sometimes play an insignificant role in human infections, such as in India, where human and pig HEV infections are unrelated. However, recently China and Southeast Asia have experienced a zoonotic spread of HEV-4 from pigs to humans and this has become the dominant mode of transmission of hepatitis E in eastern China. Zoonotic HEV infections in humans occur by eating undercooked pig flesh, raw liver, and sausages; through vocational contact; or via pig slurry, which leads to environmental contamination of agricultural products and seafood. Lastly, blood transfusion-associated HEV infections occur in many countries and screening of donors for HEV RNA is currently under serious consideration. To summarize, HEV genotypes 1 and 2 cause epidemic and endemic diseases in resource-poor countries, primarily spreading through contaminated drinking water. HEV genotypes 3 and 4, on the other hand, cause autochthonous infections in developed, and many developing countries, by means of a unique zoonotic food-borne transmission.

Journal Article
TL;DR: In this paper, the authors investigate whether the existence of a separate risk committee and risk committee characteristics are associated with market risk disclosures and test whether the role of a risk committee in affecting market risk disclosure varies for different firm life cycle stages.
Abstract: Manuscript Type: Empirical. Research Question/Issue: This study investigates whether the existence of a separate risk committee and risk committee characteristics are associated with market risk disclosures. It also tests whether the role of a risk committee in affecting market risk disclosures varies for different firm life cycle stages. Research Findings/Insights: Using 677 firm-year observations of financial firms from Gulf Cooperation Council (GCC) countries during the years 2007–2011, we find that firms with a separate risk committee are associated with greater market risk disclosures, an effect that is more pronounced for mature-stage firms. Furthermore, findings suggest that risk committee qualifications and size have a significant positive impact on market risk disclosures. Theoretical/Academic Implications: This study complements the corporate governance literature by incorporating agency theory, legitimacy theory, stakeholder theory, and the resource-based theory to provide more robust evidence of the impact of a separate risk committee and the firm life cycle on market risk disclosures. Our results support the monitoring effect of a separate risk committee and suggest that a separate risk committee can improve “firm-level corporate governance” in the GCC countries characterized by a poor informational environment. Practitioner/Policy Implications: Findings from this study provide evidence that the existence, qualifications, and size of risk committees may be used as a channel to improve the disclosure level, suggesting a policy prescription for regulators and policymakers. Investors may also find these results useful in forming their own expectations about firm-level risk disclosures.

Proceedings Article
01 Jan 2016
TL;DR: A principled way to mathematically model positive and negative links simultaneously is provided and a novel framework NCSSN for node classification in signed social networks is proposed.
Abstract: Node classification in social networks has been proven to be useful in many real-world applications. The vast majority of existing algorithms focus on unsigned social networks (or social networks with only positive links), while little work exists for signed social networks. It is evident from recent developments in signed social network analysis that negative links have added value over positive links. Therefore, the incorporation of negative links has the potential to benefit various analytical tasks. In this paper, we study the novel problem of node classification in signed social networks. We provide a principled way to mathematically model positive and negative links simultaneously and propose a novel framework NCSSN for node classification in signed social networks. Experimental results on real-world signed social network datasets demonstrate the effectiveness of the proposed framework NCSSN. Further experiments are conducted to gain a deeper understanding of the importance of negative links for NCSSN.

Journal Article
TL;DR: In this paper, the authors applied a new methodology that relies on tagging information of georeferenced pictures to the cities of London and Barcelona to capture both unpleasant and pleasant sounds, and studied the relationship between soundscapes and emotions.
Abstract: Urban sound has a huge influence over how we perceive places. Yet, city planning is concerned mainly with noise, simply because annoying sounds come to the attention of city officials in the form of complaints, whereas general urban sounds do not come to the attention as they cannot be easily captured at city scale. To capture both unpleasant and pleasant sounds, we applied a new methodology that relies on tagging information of georeferenced pictures to the cities of London and Barcelona. To begin with, we compiled the first urban sound dictionary and compared it with the one produced by collating insights from the literature: ours was experimentally more valid (if correlated with official noise pollution levels) and offered a wider geographical coverage. From picture tags, we then studied the relationship between soundscapes and emotions. We learned that streets with music sounds were associated with strong emotions of joy or sadness, whereas those with human sounds were associated with joy or surprise. Finally, we studied the relationship between soundscapes and people's perceptions and, in so doing, we were able to map which areas are chaotic, monotonous, calm and exciting. Those insights promise to inform the creation of restorative experiences in our increasingly urbanized world.

Proceedings ArticleDOI
01 Oct 2016
TL;DR: This work first studies the relationship between the textual and visual aspects in multimodal posts from three major social media platforms, and runs a crowdsourcing task to quantify the extent to which images are perceived as necessary by human annotators.
Abstract: Sarcasm is a peculiar form of sentiment expression, where the surface sentiment differs from the implied sentiment. The detection of sarcasm in social media platforms has been applied in the past mainly to textual utterances where lexical indicators (such as interjections and intensifiers), linguistic markers, and contextual information (such as user profiles, or past conversations) were used to detect the sarcastic tone. However, modern social media platforms allow users to create multimodal messages where audiovisual content is integrated with the text, making the analysis of a mode in isolation partial. In our work, we first study the relationship between the textual and visual aspects in multimodal posts from three major social media platforms, i.e., Instagram, Tumblr and Twitter, and we run a crowdsourcing task to quantify the extent to which images are perceived as necessary by human annotators. Moreover, we propose two different computational frameworks to detect sarcasm that integrate the textual and visual modalities. The first approach exploits visual semantics trained on an external dataset, and concatenates the semantic features with state-of-the-art textual features. The second method adapts a visual neural network initialized with parameters trained on ImageNet to multimodal sarcastic posts. Results show the positive effect of combining modalities for the detection of sarcasm across platforms and methods.

Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors propose a Robust Deep RankNet that, given a video, generates a ranked list of its segments according to their suitability as a GIF, and train the model to learn what visual content is often selected for GIFs by using over 100K user generated GIFs and their corresponding video sources.
Abstract: We introduce the novel problem of automatically generating animated GIFs from video. GIFs are short looping videos with no sound, a perfect combination of image and video that really captures our attention. GIFs tell a story, express emotion, turn events into humorous moments, and are the new wave of photojournalism. We pose the question: Can we automate the entirely manual and elaborate process of GIF creation by leveraging the plethora of user generated GIF content? We propose a Robust Deep RankNet that, given a video, generates a ranked list of its segments according to their suitability as a GIF. We train our model to learn what visual content is often selected for GIFs by using over 100K user generated GIFs and their corresponding video sources. We effectively deal with the noisy web data by proposing a novel adaptive Huber loss in the ranking formulation. We show that our approach is robust to outliers and picks up several patterns that are frequently present in popular animated GIFs. On our new large-scale benchmark dataset, we show the advantage of our approach over several state-of-the-art methods.
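
Illustration: the abstract above does not spell out the adaptive Huber loss, so the following is only a minimal sketch of a plain (non-adaptive) Huber loss applied to a pairwise ranking margin. The function name `huber_rank_loss`, the fixed `delta`, and the `margin` of 1.0 are assumptions for illustration, not the authors' formulation.

```python
def huber_rank_loss(score_pos, score_neg, delta=1.5, margin=1.0):
    """Pairwise ranking loss with a Huber-style penalty (illustrative sketch).

    score_pos: model score for a segment that was selected as a GIF.
    score_neg: model score for a segment that was not selected.
    delta:     hypothetical fixed switch point between quadratic and linear
               growth; the paper's adaptive scheme is not reproduced here.
    """
    # Margin violation: zero when the positive segment outranks the
    # negative one by at least `margin`.
    u = max(0.0, margin - score_pos + score_neg)
    if u <= delta:
        # Small violations are penalised quadratically.
        return 0.5 * u * u
    # Large violations (likely noisy pairs) grow only linearly,
    # which limits the influence of outliers.
    return delta * u - 0.5 * delta * delta
```

The linear tail is what makes this kind of loss less sensitive to mislabeled pairs than a squared hinge, which is the general motivation for Huber-style losses on noisy web data.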

Proceedings ArticleDOI
11 Apr 2016
TL;DR: This work provides a principled and mathematical approach to exploit signed social networks for recommendation, and proposes a model, RecSSN, to leverage positive and negative links in signed social networks.
Abstract: Recommender systems play a crucial role in mitigating the information overload problem in social media by suggesting relevant information to users. The popularity of pervasively available social activities for social media users has encouraged a large body of literature on exploiting social networks for recommendation. The vast majority of these systems focus on unsigned social networks (or social networks with only positive links), while little work exists for signed social networks (or social networks with positive and negative links). The availability of negative links in signed social networks presents both challenges and opportunities in the recommendation process. We provide a principled and mathematical approach to exploit signed social networks for recommendation, and propose a model, RecSSN, to leverage positive and negative links in signed social networks. Empirical results on real-world datasets demonstrate the effectiveness of the proposed framework. We also perform further experiments to explicitly understand the effect of signed networks in RecSSN.

Journal ArticleDOI
TL;DR: The present study has highlighted the rich herbal knowledge about newfound medicinal plants and their new uses in the Mediterranean region, which could be useful not only in facilitating other studies such as phytochemical and pharmacological investigations and upgrading the sources of biomolecules beneficial to people but also in reopening discussion on pharmacovigilance in herbal medicine as an imperative requirement for local authorities.