Proc Natl Acad Sci U S A. 2018 Jul 31;115(31):7937-7942.
doi: 10.1073/pnas.1800521115. Epub 2018 Jul 18.

Efficient compression in color naming and its evolution

Noga Zaslavsky et al. Proc Natl Acad Sci U S A.

Abstract

We derive a principled information-theoretic account of cross-language semantic variation. Specifically, we argue that languages efficiently compress ideas into words by optimizing the information bottleneck (IB) trade-off between the complexity and accuracy of the lexicon. We test this proposal in the domain of color naming and show that (i) color-naming systems across languages achieve near-optimal compression; (ii) small changes in a single trade-off parameter account to a large extent for observed cross-language variation; (iii) efficient IB color-naming systems exhibit soft rather than hard category boundaries and often leave large regions of color space inconsistently named, both of which phenomena are found empirically; and (iv) these IB systems evolve through a sequence of structural phase transitions, in a single process that captures key ideas associated with different accounts of color category evolution. These results suggest that a drive for information-theoretic efficiency may shape color-naming systems across languages. This principle is not specific to color, and so it may also apply to cross-language variation in other semantic domains.

Keywords: categories; color naming; information theory; language evolution; semantic typology.
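
The complexity/accuracy trade-off described in the abstract can be sketched numerically. The block below is a minimal illustration, not the authors' implementation. It assumes the standard IB form in which complexity is the mutual information I(M;W) between meanings and words, accuracy is I(W;U) between words and objects, and the objective is complexity minus β times accuracy. The function names and array layouts are hypothetical.

```python
import numpy as np

def mutual_information(p_xy):
    """Mutual information I(X;Y) in bits for a joint distribution (2D array)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal over rows, shape (n_x, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal over columns, shape (1, n_y)
    mask = p_xy > 0                         # skip zero cells (0 * log 0 = 0)
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

def ib_terms(p_m, m_u, q_w_given_m):
    """Return (complexity, accuracy), assumed here to be (I(M;W), I(W;U)).

    p_m:         shape (n_m,), prior over meanings
    m_u:         shape (n_m, n_u), row i is meaning i as a distribution over U
    q_w_given_m: shape (n_m, n_w), row i is the naming distribution for meaning i
    """
    p_m = np.asarray(p_m, dtype=float)
    m_u = np.asarray(m_u, dtype=float)
    q = np.asarray(q_w_given_m, dtype=float)
    p_mw = p_m[:, None] * q                 # joint over (meaning, word)
    p_wu = p_mw.T @ m_u                     # joint over (word, object)
    return mutual_information(p_mw), mutual_information(p_wu)

def ib_objective(p_m, m_u, q_w_given_m, beta):
    """Assumed IB objective: complexity minus beta times accuracy (lower is better)."""
    complexity, accuracy = ib_terms(p_m, m_u, q_w_given_m)
    return complexity - beta * accuracy
```

In this sketch, a lexicon that gives every meaning the same word has zero complexity and zero accuracy, while a lexicon with one word per meaning maximizes both. Varying `beta` moves the preferred lexicon between these extremes.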

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
(A) Shannon’s (23) communication model. In our instantiation of this model, the source message M and its reconstruction M̂ are distributions over objects in the universe U. We refer to these messages as meanings. M is compressed into a code, or word, W. We assume that W is transmitted over an idealized noiseless channel and that the reconstruction M̂ of the source message is based on W. The accuracy of communication is determined by comparing M and M̂, and the complexity of the lexicon is determined by the mapping from M to W. (B) Color communication example, where U is a set of colors, shown for simplicity along a single dimension. A specific meaning m is drawn from p(m). The speaker communicates m by uttering the word “blue,” and the listener interprets blue as meaning m̂.
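
The listener side of the model in Fig. 1 can be illustrated with a small sketch. The caption does not say how the reconstruction is computed or how M and M̂ are compared. The code below assumes a Bayesian listener who averages the meanings that could have produced a word, and uses KL divergence as the comparison. Both choices, and all names, are assumptions made for illustration.

```python
import numpy as np

def listener_reconstructions(p_m, m_u, q_w_given_m):
    """Return an (n_w, n_u) array whose row w is the reconstruction for word w.

    Assumes a Bayesian listener: the reconstruction is the average of the
    meanings m, weighted by the posterior q(m|w). Unused words get a zero row.
    """
    p_m = np.asarray(p_m, dtype=float)
    m_u = np.asarray(m_u, dtype=float)
    q = np.asarray(q_w_given_m, dtype=float)
    p_mw = p_m[:, None] * q                    # joint over (meaning, word)
    p_w = p_mw.sum(axis=0)                     # how often each word is used
    used = p_w > 0
    q_m_given_w = np.zeros((q.shape[1], q.shape[0]))
    q_m_given_w[used] = p_mw.T[used] / p_w[used, None]   # Bayes' rule
    return q_m_given_w @ m_u

def kl_divergence(p, q):
    """KL divergence D[p || q] in bits, used here to compare m with its reconstruction."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

# Toy example: three colors along one dimension, two meanings, one shared word.
p_m = np.array([0.5, 0.5])
m_u = np.array([[0.8, 0.2, 0.0],
                [0.0, 0.2, 0.8]])
q_w_given_m = np.array([[1.0, 0.0],
                        [1.0, 0.0]])
m_hat = listener_reconstructions(p_m, m_u, q_w_given_m)
distortion = kl_divergence(m_u[0], m_hat[0])
```

Because both meanings share one word in the toy example, the listener's reconstruction is their average, and the divergence from the speaker's meaning is nonzero.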
Fig. 2.
(Upper) The WCS stimulus palette. Columns correspond to equally spaced Munsell hues. Rows correspond to equally spaced lightness values. Each stimulus is at the maximum available saturation for that hue/lightness combination. (Lower) These colors are irregularly distributed in 3D CIELAB color space.
Fig. 3.
Color-naming systems across languages (blue circles) achieve near-optimal compression. The theoretical limit is defined by the IB curve (black). A total of 93% of the languages achieve better trade-offs than any of their hypothetical variants (gray circles). Small light-blue Xs mark the languages in Fig. 4, which are ordered by complexity.
Fig. 4.
Similarity between color-naming distributions of languages (data rows) and the corresponding optimal encoders at β_l (IB rows). Each color category is represented by the centroid color of the category. (A) Mode maps. Each chip is colored according to its modal category. (B) Contours of the naming distribution. Solid lines correspond to level sets between 0.5 and 0.9; dashed lines correspond to level sets of 0.4 and 0.45. (C) Naming probabilities along the hue dimension of row F in the WCS palette.
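
A mode map of the kind described in Fig. 4A can be computed from a naming distribution by taking the most probable category for each chip. The sketch below assumes a hypothetical array layout with one row per chip, one column per category, and chips listed row by row across the palette.

```python
import numpy as np

def mode_map(naming, n_rows, n_cols):
    """Return an (n_rows, n_cols) grid holding each chip's modal category index.

    naming: shape (n_rows * n_cols, n_categories), row i holds the naming
            probabilities for chip i (assumed layout, chips in row-major order).
    Ties go to the lowest category index, following np.argmax.
    """
    naming = np.asarray(naming, dtype=float)
    return naming.argmax(axis=1).reshape(n_rows, n_cols)
```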
Fig. 5.
Bifurcations of the IB color categories (Movie S1). The y axis shows the relative accuracy of each category w (defined in Materials and Methods). Colors correspond to centroids and width is proportional to the weight of each category, i.e., q_β(w). Black vertical lines correspond to the IB systems in Fig. 4.

References

    1. Ferrer i Cancho R, Solé RV. Least effort and the origins of scaling in human language. Proc Natl Acad Sci USA. 2003;100:788–791.
    2. Levy RP, Jaeger TF. Speakers optimize information density through syntactic reduction. In: Schölkopf B, Platt JC, Hoffman T, editors. Advances in Neural Information Processing Systems. Vol 19. MIT Press; Cambridge, MA: 2007. pp. 849–856.
    3. Piantadosi ST, Tily H, Gibson E. Word lengths are optimized for efficient communication. Proc Natl Acad Sci USA. 2011;108:3526–3529.
    4. Gibson E, et al. A noisy-channel account of crosslinguistic word-order variation. Psychol Sci. 2013;24:1079–1088.
    5. Jameson K, D’Andrade RG. It’s not really red, green, yellow, blue: An inquiry into perceptual color space. In: Hardin CL, Maffi L, editors. Color Categories in Thought and Language. Cambridge Univ Press; Cambridge, UK: 1997. pp. 295–319.
