J R Soc Interface. 2016 Aug;13(121):20160466.
doi: 10.1098/rsif.2016.0466.

Hierarchical compression of Caenorhabditis elegans locomotion reveals phenotypic differences in the organization of behaviour

Alex Gomez-Marin et al. J R Soc Interface. 2016 Aug.

Abstract

Regularities in animal behaviour offer insights into the underlying organizational and functional principles of nervous systems and automated tracking provides the opportunity to extract features of behaviour directly from large-scale video data. Yet how to effectively analyse such behavioural data remains an open question. Here, we explore whether a minimum description length principle can be exploited to identify meaningful behaviours and phenotypes. We apply a dictionary compression algorithm to behavioural sequences from the nematode worm Caenorhabditis elegans freely crawling on an agar plate both with and without food and during chemotaxis. We find that the motifs identified by the compression algorithm are rare but relevant for comparisons between worms in different environments, suggesting that hierarchical compression can be a useful step in behaviour analysis. We also use compressibility as a new quantitative phenotype and find that the behaviour of wild-isolated strains of C. elegans is more compressible than that of the laboratory strain N2 as well as the majority of mutant strains examined. Importantly, in distinction to more conventional phenotypes such as overall motor activity or aggregation behaviour, the increased compressibility of wild isolates is not explained by the loss of function of the gene npr-1, which suggests that erratic locomotion is a laboratory-derived trait with a novel genetic basis. Because hierarchical compression can be applied to any sequence, we anticipate that compressibility can offer insights into the organization of behaviour in other animals including humans.

Keywords: Caenorhabditis elegans; behaviour; genetics.

Figures

Figure 1.
Dictionary-based compression extracts hierarchical structure in posture sequences. (a) Locomotion is represented as a sequence of discrete postural states. At each point in time, the original skeleton (black) is matched by its nearest-neighbour posture in a set of 90 template postures. The orange dot indicates the head. The numbers beneath each shape are the labels of the template postures in each case. (b) Simple sequence to illustrate the compressive algorithm. For the indicated sequence, the subsequence that results in the greatest compression when it is replaced by a new state label is {1, 2, 1}. In the second iteration {3, 2, 2, 3} and {3, 3, 2, 2} are equally compressive. We simply take the sequence that occurs first in the sorted list of unique sequences. The arc diagram on the right connects adjacent repeats of dictionary sequences. (c) An arc diagram for a sequence of worm locomotion (blue) and the corresponding arc diagram for the same sequence following random shuffling (black). (d) Selected c-grams discovered from 150 min (approx. 10^4 postures) of worm behaviour. The most compressive sequence (i), the most nested c-gram (ii) and three other behaviours (iii) are plotted underneath dendrograms that show the hierarchical structure represented in the dictionary. The numbers in red indicate the number of times that the sequence under each branch occurred in the 150 min. (Online version in colour.)
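The iterative procedure described in panel (b) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the gain formula (occurrences × (length − 1) − length), the left-to-right non-overlapping count, the `max_len` cap and all function names are assumptions made for the example. Only the greedy replace-the-most-compressive-subsequence loop and the tie-break by sorted order come from the caption.

```python
def count_nonoverlapping(seq, pattern):
    """Count left-to-right, non-overlapping occurrences of pattern in seq."""
    n, count, i = len(pattern), 0, 0
    while i <= len(seq) - n:
        if tuple(seq[i:i + n]) == pattern:
            count += 1
            i += n
        else:
            i += 1
    return count


def best_pattern(seq, max_len):
    """Return the subsequence with the largest assumed gain, or None if no gain is positive."""
    # Sorted list of unique subsequences: ties go to the first one in this order.
    candidates = sorted({tuple(seq[i:i + n])
                         for n in range(2, max_len + 1)
                         for i in range(len(seq) - n + 1)})
    best, best_gain = None, 0
    for pattern in candidates:
        # Assumed gain: symbols saved by the replacements minus the dictionary entry cost.
        gain = count_nonoverlapping(seq, pattern) * (len(pattern) - 1) - len(pattern)
        if gain > best_gain:  # strict inequality keeps the earliest candidate on ties
            best, best_gain = pattern, gain
    return best


def replace(seq, pattern, label):
    """Replace non-overlapping occurrences of pattern with a single new label."""
    out, i, n = [], 0, len(pattern)
    while i < len(seq):
        if tuple(seq[i:i + n]) == pattern:
            out.append(label)
            i += n
        else:
            out.append(seq[i])
            i += 1
    return out


def compress(seq, max_len=10):
    """Greedily build a dictionary of compressive subsequences (hypothetical helper)."""
    seq = list(seq)
    dictionary = {}
    next_label = max(seq) + 1
    while True:
        pattern = best_pattern(seq, max_len)
        if pattern is None:
            break
        dictionary[next_label] = pattern
        seq = replace(seq, pattern, next_label)
        next_label += 1
    return seq, dictionary


compressed, dictionary = compress([1, 2, 1, 3, 1, 2, 1, 3, 1, 2, 1])
print(compressed, dictionary)
```

Because later dictionary entries may contain earlier labels, the dictionary built this way is hierarchical, which is what the dendrograms in panel (d) depict.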
Figure 2.
c-grams are rare but relevant subsequences. Hits are any sequences that are found to have a different frequency between N2 animals crawling on food, off food or performing chemotaxis. (a) The longest hit is a bout of forward locomotion that is more common during chemotaxis. The box plot shows the frequency of this behaviour in the three conditions (red points are outliers, which lie farther outside the box than the difference between the 25th and 75th percentiles). (b) In each condition, the most compressive sequence is a hit in at least one comparison, indicating that compressive sequences are more likely to be modulated across conditions than n-grams as a whole. (c) The c-gram hits are more evenly spaced across the frequency distribution than those found using all n-grams. (d) Canonical worm behaviours are identified through compression and these would be missed by focusing only on the most frequently occurring n-grams. The behaviours are shown on the left with their highest frequency rank observed across all worms in the comparison group shown in red to the right. (Online version in colour.)
Figure 3.
Worm locomotion sequences are poised between random and deterministic, which leads to intermediate compressibility. (a) The compressibility per posture increases as a function of length for N2 locomotion sequences (orange). Uniform random sequences with 90 states (black) and a deterministic sequence consisting of 1–90 repeated (red) provide lower and upper bounds on compression. Shuffled (blue) and sorted (green) sequences provide related bounds constrained by having the same posture probability distributions as the observed locomotion sequences. A Markov chain simulated using the observed posture transition probabilities provides a more realistic model of locomotion sequences. (b) Compressibility as a function of length for individual worms shows the variability in compressibility. Many of the least compressible individuals have shorter uncompressed lengths, indicating that these worms moved less (had fewer posture transitions) during the 15 min they were recorded. (Online version in colour.)
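The ordering of these reference sequences can be illustrated with a small experiment. The sketch below does not use the paper's dictionary compressor: it swaps in `zlib` as a generic stand-in, defines compressibility as original length divided by compressed length (an assumption), and uses a made-up "observed" sequence in place of real posture data. It only shows why a repeated 1–90 cycle and a sorted sequence compress far better than uniform random and shuffled sequences.

```python
import random
import zlib


def compressibility(states):
    """Assumed definition: raw length over zlib-compressed length (states must be 0..255)."""
    raw = bytes(states)
    return len(raw) / len(zlib.compress(raw, 9))


rng = random.Random(0)
n_states, length = 90, 9000

# Bounds that ignore the observed posture distribution.
uniform_random = [rng.randrange(n_states) for _ in range(length)]
repeated = [i % n_states for i in range(length)]

# Hypothetical "observed" sequence: a noisy walk through the state space,
# standing in for real posture labels.
observed = [0]
for _ in range(length - 1):
    observed.append((observed[-1] + rng.choice([0, 1, 1, 2])) % n_states)

# Bounds that keep the same posture probability distribution as `observed`.
shuffled = observed[:]
rng.shuffle(shuffled)
ordered = sorted(observed)

for name, seq in [("uniform random", uniform_random), ("repeated 1-90", repeated),
                  ("observed (toy)", observed), ("shuffled", shuffled),
                  ("sorted", ordered)]:
    print(f"{name:15s} {compressibility(seq):8.2f}")
```

Shuffling and sorting leave the frequency of each state unchanged, so any difference between those two values comes from sequence order alone.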
Figure 4.
Wild-isolate locomotion is more stereotyped than that of most mutant strains. (a) Two-dimensional histogram of the distribution of compressibility against postural state duration for a set of 239 mutant strains that are not uncoordinated (‘other mutants’). The red bars show the mean ± s.e., for a selection of strains. The contours show the extent at half-maximum of the distributions for 18 wild isolates (orange) and 63 uncoordinated mutants (green). The wild-isolate and uncoordinated distributions are plotted separately in (b). (c) Box plots show the compressibility measured on 500-posture chunks for the strains highlighted in (a). CB4856 is more compressible than either N2 (p = 4.7 × 10^−8) or npr-1(ad609) (p = 3.3 × 10^−5) using a rank-sum test. (Online version in colour.)
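The comparison in panel (c) involves two steps: splitting each posture sequence into 500-posture chunks, then comparing per-chunk compressibility between strains with a rank-sum test. A minimal sketch of both steps follows. The helper names and the sample values are hypothetical, and the test uses the normal approximation without a tie correction, which may differ in detail from the procedure the authors used.

```python
import math


def chunk(seq, size=500):
    """Split seq into consecutive full-length chunks, dropping any remainder."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    combined = sorted((value, group) for group, sample in enumerate((x, y))
                      for value in sample)
    rank_sum_x, i = 0.0, 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the tied ranks i+1 .. j
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical per-chunk compressibility values for two strains (not data from the paper).
strain_a = [1.31, 1.35, 1.40, 1.38, 1.42, 1.36]
strain_b = [1.22, 1.25, 1.19, 1.28, 1.24, 1.21]
z, p = rank_sum_test(strain_a, strain_b)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Measuring compressibility on fixed-length chunks keeps sequence length constant across strains, which matters because compressibility grows with sequence length (Figure 3).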

