Open Access Proceedings Article

A Probabilistic Model of the Categorical Association Between Colors.

TLDR
The paper describes a non-parametric probabilistic model that can be used to encode relationships in color naming datasets, and shows that the uniqueness of a color name (color saliency) can be captured using the entropy of the probability distribution.
Abstract
In this paper we describe a non-parametric probabilistic model that can be used to encode relationships in color naming datasets. This model can be used with datasets with any number of color terms and expressions, as well as terms from multiple languages. Because the model is based on probability theory, we can use classic statistics to compute features of interest to color scientists. In particular, we show that the uniqueness of a color name (color saliency) can be captured using the entropy of the probability distribution. We demonstrate this approach by applying this model to two different datasets: the multi-lingual World Color Survey (WCS), and a database collected via the web by Dolores Labs. We demonstrate how saliency clusters similarly named colors for both datasets, and compare our WCS results to those of Kay and his colleagues. We compare the two datasets to each other by converting them to a common colorspace (IPT).
Introduction
There has been growing interest in how to use color naming data to improve color models. Better color name databases[7, 10, 11, 12, 14, 2] and online naming studies[18, 8] have stimulated recent work. Color naming databases and associated models have been useful in color transfer[5], gamut mapping[19, 20], and methods for specifying or selecting colors in an image[15, 16, 17].
In this paper, we examine the issue of how to represent and quantify the association between colors induced by names. Current methods that incorporate naming data represent the category associated with a color using a single name[5, 6], a vector[19], or a set of fuzzy logic memberships[1, 2, 17]. We present a probabilistic framework for working with colors. We define the categorical association of a color c as a conditional probability P(C|c) over colors C in the color space C. For a color c, the probability P(C|c) represents how likely other colors in the space C are to be assigned the same linguistic label as c.
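The definition of P(C|c) and the entropy-based notion of saliency can be illustrated with a small sketch. The excerpt does not state how P(C|c) is estimated, so the code below assumes one plausible construction: marginalizing over color words, P(C|c) = sum over w of P(C|w) P(w|c), with both factors taken as normalized counts. The function names `association` and `entropy_bits` and the toy counts are hypothetical and are not taken from the paper.

```python
import math

def association(counts):
    """Return assoc[c][k], an estimate of P(C = k | c).

    counts[w][c] is the number of times word w was used for color c.
    ASSUMPTION (not from the paper): P(C|c) = sum_w P(C|w) * P(w|c),
    with both factors estimated as normalized counts. Every word and
    every color must have at least one response.
    """
    n_words = len(counts)
    n_colors = len(counts[0])
    col_tot = [sum(counts[w][c] for w in range(n_words)) for c in range(n_colors)]
    row_tot = [sum(row) for row in counts]
    assoc = []
    for c in range(n_colors):
        row = []
        for k in range(n_colors):
            # P(w|c) is a column-normalized count, P(k|w) a row-normalized count.
            p = sum((counts[w][c] / col_tot[c]) * (counts[w][k] / row_tot[w])
                    for w in range(n_words))
            row.append(p)
        assoc.append(row)
    return assoc

def entropy_bits(p):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Toy data: 2 words (rows) by 3 colors (columns); color 1 is named ambiguously.
toy_counts = [[8, 2, 0],
              [0, 2, 6]]
toy_assoc = association(toy_counts)
toy_entropy = [entropy_bits(row) for row in toy_assoc]
```

Under this sketch, the ambiguously named middle color has a broader association distribution, and therefore a higher entropy, than the first color. Reading low entropy as high saliency is one natural interpretation of the abstract, not a statement of the paper's exact definition.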
Our choice of a probability distribution over colors is motivated by three design goals that current approaches do not meet. (1) Our approach can incorporate categorical effects from any number of color words, expressions involving multiple words, and different languages. (2) Our framework is based on a non-parametric model which can capture the differences in color name distributions, such as “yellow” having a narrow focus and “green” having a wide distribution[21]. (3) Embedding our representation in a probabilistic framework enables us to apply a wide array of statistical and probabilistic tools to further analyze and study the effect of categories on colors.
We implement our model on two datasets. We extract color naming data from six languages in the World Color Survey, which contains naming information for 330 colors on the surface of the Munsell solid[7]. We also investigate online naming data collected by DoloresLabs, which contains names given to 10,000 randomly sampled colors in the RGB cube[8]. Our framework can incorporate cross-linguistic data and combine contributions from color words with similar meanings. We introduce the concept of salient colors based on the statistical notion of entropy. Salient colors from our approach show good correspondence with the basic color terms identified by Berlin and Kay[3]. Our approach also reveals two regions in the sRGB cube that are consistently named but do not correspond to typical basic color terms. We compare qualitatively the differences in salient name regions between the World Color Survey and the DoloresLabs datasets.
Motivations and Related Work
The goal of this paper is to present a computational framework for modeling color categories derived from experimental data. Our framework is motivated by three issues that are at best partially addressed in the current literature.
1. We would like a framework that can include all possible words for describing a color and not be limited to a predefined list of terms.
2. We would like a non-parametric model capable of capturing the details in categorical association but still be robust to noise in the naming dataset.
3. We would like a framework that can support a rich set of computational and mathematical operations, so that more in-depth studies of categorical effects can be built on the framework. In particular, our approach is grounded in probability theory.
The first issue addresses how to account for the many potential expressions for describing a color. In 1969, Berlin and Kay defined color words as basic color terms if their meanings cannot be derived from other words, and proposed that there are a total of eleven basic color terms. Basic color terms were shown to be universal across languages. While some languages such as English contain all eleven terms, others may have developed only a subset of the words[3]. Subsequent studies confirmed that basic color terms are words with the highest consensus between speakers[4], but found twelve basic color terms in Russian, contradicting the limit on the number of terms[22]. Kay and McDaniel hypothesized that as languages evolve, some individuals may consider additional words such as aqua/turquoise (green and blue), chartreuse/lime (yellow and green), and maroon/burgundy (red and black) as basic color terms[9]. Many existing methods assume eleven, or some other fixed number of, color categories and cannot process the full set of responses from recent surveys such as the HP Labs Multilingual Naming Experiment[18] and the DoloresLabs Naming Dataset[8], which have hundreds of color words. Chang et al.’s category-preserving color transfer algorithm defines eleven convex regions in the color space corresponding to the basic color terms[5].
Motomura’s categorical color mapping algorithm maps foci of the eight chromatic basic color terms between the source and target gamuts[19]. Moroney’s system for translating colors to names operates on the n most frequently used color words. We want a framework where all words are included and contributions from words with similar cognitive concepts such as “maroon” and “burgundy” are combined based on their similarity.
Secondly, color names exhibit different naming distributions. Colors such as “red” and “yellow” are known to have a narrow and well-defined center, while colors such as “green” and “blue” are known to be composed of a broad range of hues[21]. We want our framework to have the flexibility to capture the details in the distributions while being robust to noise in the data. Current approaches tend to model color categories as a volume in color space, using various parameterized models, or using non-parameterized approaches such as histograms. Approaches that partition the color space[12, 5] assume that color names occupy discrete and non-overlapping regions in the color space. Motomura’s gamut-mapping algorithm assumes that each basic term has an ellipsoid-shaped distribution and models the distributions using an 81-parameter covariance matrix[19]. Benavente models the color naming space using a set of 6-parameter Sigmoid-Gaussian distributions[1]. One advantage of parameterized models is that they are constructed from a small number of parameters which can be estimated accurately. In his adaptive lexical classification system, Moroney proposes an alternative implementation in which color names are represented as non-parametric histograms[16]. While histograms can capture any shape of distribution, Moroney reported noise in the data due to a limited number of data points and suggests that smoothing operators or hedging be applied to post-process the histograms (see footnote 1).
Finally, we would like a framework capable of supporting a rich set of computational and mathematical tools.
Instead of being merely a representation, the framework should allow us to perform further computation and analysis on how categories affect the way we associate colors. Treating the association between colors as a probability distribution positions our framework within the well-studied domain of probability theory.
Footnote 1: We should emphasize our application differs from Moroney’s in that his work is on modeling the distribution of color names while our work is on modeling the association between colors due to naming effects.
Methodology
Colors and Color Words
A naming dataset consists of a list of responses in the form of “color”-“color word” pairs that record the words used to describe a color. A “color” refers to the stimulus shown to a respondent and varies between datasets, from Munsell color chips viewed under controlled lighting to rectangles of color displayed on uncalibrated monitors. Unconstrained surveys allow respondents to use any expression, whereas constrained surveys ask respondents to choose from a predefined list of words. An unconstrained color expression could include, e.g., “granny smith apple green”, “light robin’s egg pastel blue”, or “mix all the paint together”. In practice, most expressions recorded in unconstrained surveys consist of a single word or a simple set of words such as “blue” or “bluish green”. We will use the term “color words” from this point on even though it could refer to any possible expression for describing a color.
A naming dataset can be tabulated using a word count table where the list of all colors presented in the survey is displayed along the columns, and the list of all color words recorded is displayed along the rows. Each entry in the table indicates the number of times the corresponding color word is used to describe the corresponding color. Depending on the nature of the naming dataset, the density of the word count table may vary.
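The word count table described above can be sketched in a few lines. This is a minimal illustration, assuming responses arrive as (color, color word) pairs; the function names `word_count_table` and `table_density`, the color identifiers, and the sample responses are hypothetical and are not drawn from either dataset.

```python
from collections import Counter

def word_count_table(responses):
    """Tabulate (color, word) responses into a table with one row per word
    and one column per color; each entry is the number of times that word
    was used for that color."""
    colors = sorted({color for color, _ in responses})
    words = sorted({word for _, word in responses})
    tally = Counter((word, color) for color, word in responses)
    # Counter returns 0 for (word, color) pairs that never occurred.
    table = [[tally[(word, color)] for color in colors] for word in words]
    return words, colors, table

def table_density(table):
    """Fraction of entries in the table that are non-zero."""
    n_cells = sum(len(row) for row in table)
    n_nonzero = sum(1 for row in table for value in row if value)
    return n_nonzero / n_cells

# Hypothetical responses: color identifiers paired with free-form expressions.
sample_responses = [
    ("c1", "blue"), ("c1", "blue"),
    ("c2", "green"), ("c2", "bluish green"),
    ("c3", "green"),
]
sample_words, sample_colors, sample_table = word_count_table(sample_responses)
sample_density = table_density(sample_table)
```

Keeping words on the rows and colors on the columns mirrors the layout described in the text. A real pipeline would probably also normalize spelling and case before counting, which this sketch omits.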
The World Color Survey (WCS)[7] is cross-linguistic and unconstrained, and collects naming data on a set of 330 colors. The word count table for the WCS consists of 2300 rows by 330 columns with 20% non-zero entries. In comparison, the DoloresLabs color name dataset[8], while also unconstrained, uses 10000 randomly-sampled colors. A total of 1966 expressi


Citations
Journal ArticleDOI

Selecting semantically-resonant colors for data visualization

TL;DR: An algorithm for automatic selection of semantically‐resonant colors to represent data (e.g., using blue for data about “oceans”, or pink for “love”) is introduced.
Proceedings ArticleDOI

Color naming models for color selection, image editing and palette design

TL;DR: This paper presents a method for constructing a probabilistic model of color naming from a large, unconstrained set of human color name judgments, and describes how the model can be used to map between colors and names and define metrics for color saliency and color name distance.
Proceedings ArticleDOI

Modeling how people extract color themes from images

TL;DR: This work presents a method for extracting color themes from images using a regression model trained on themes created by people, and finds that themes extracted by Turk participants were similar to ones extracted by artists.
Proceedings ArticleDOI

Somewhere Over the Rainbow: An Empirical Assessment of Quantitative Colormaps

TL;DR: It is found that a combination of perceptual color space and color naming measures more accurately predict user performance than either alone, though the overall accuracy is poor.
Book ChapterDOI

Coloring with Words: Guiding Image Colorization Through Text-based Palette Generation

TL;DR: A novel approach to generate multiple color palettes that reflect the semantics of input text and then colorize a given grayscale image according to the generated color palette, using a manually curated dataset called Palette-and-Text (PAT).
References
Journal ArticleDOI

Divergence measures based on the Shannon entropy

TL;DR: A novel class of information-theoretic divergence measures based on the Shannon entropy is introduced, which do not require the condition of absolute continuity to be satisfied by the probability distributions involved and are established in terms of bounds.
Book

Basic Color Terms: Their Universality and Evolution

Paul Kay
TL;DR: In this paper, the data, hypothesis, and general findings have been presented, including the evolution of basic color terms, and the data and hypothesis of the color term evolution, and some speculations.
Journal ArticleDOI

Salience of chromatic basic color terms confirmed by three measures

TL;DR: Using single color terms of their choice, nine subjects named each of 424 colors twice under carefully-controlled conditions to support the conception that basic color terms refer to fundamental sensations for which there is a specific physiological basis.
Journal ArticleDOI

Locating basic colours in the munsell space

TL;DR: In this article, the location of the eleven basic surface colours within Munsell space using a monolexemic colour naming technique is defined and a comparison of the Munsell and OSA results reveals that while the centroids obtained using the two systems correspond reasonably well, Munsell focal samples have a much higher saturation than their OSA counterparts.