PLoS One. 2016 Sep 20;11(9):e0162423.
doi: 10.1371/journal.pone.0162423. eCollection 2016.

Long-Range Correlations in Sentence Series From A Story of the Stone

Free PMC article

Tianguang Yang et al. PLoS One. 2016.

Abstract

A sentence is the natural unit of language. Patterns embedded in series of sentences can be used to model the formation and evolution of languages, and to solve practical problems such as evaluating linguistic ability. In this paper, we apply de-trended fluctuation analysis to detect long-range correlations embedded in sentence series from A Story of the Stone, one of the greatest masterpieces of Chinese literature. We identified a weak long-range correlation, with a Hurst exponent of 0.575±0.002 up to a scale of 10^4. We used a structural stability test to confirm the behavior of the long-range correlation, and found that different parts of the series had almost identical Hurst exponents. We found that noisy records can lead to false results and conclusions, even if the noise covers only a small proportion of the total records (e.g., less than 1%). Thus, the structural stability test, which has been widely neglected in previous studies, is an essential procedure for confirming the existence of long-range correlations. Furthermore, a combination of de-trended fluctuation analysis and diffusion entropy analysis demonstrated that the sentence series was generated by a fractional Brownian motion.
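
As a rough illustration of the de-trended fluctuation analysis named in the abstract, the following sketch estimates a scaling exponent from a one-dimensional series such as a list of sentence lengths. It is not the authors' code: the function names `dfa_fluctuations` and `hurst_from_dfa`, the linear detrending order, and the choice of window sizes are all assumptions made for this example.

```python
import numpy as np

def dfa_fluctuations(series, scales, order=1):
    """Return the DFA fluctuation F(s) for each window size s (hypothetical helper)."""
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())            # integrate the mean-removed series
    fluctuations = []
    for s in scales:
        n_win = len(profile) // s                # number of non-overlapping windows
        windows = profile[: n_win * s].reshape(n_win, s)
        t = np.arange(s, dtype=float)
        coeffs = np.polyfit(t, windows.T, order)         # one polynomial per window
        trend = (np.vander(t, order + 1) @ coeffs).T     # local trends, shape (n_win, s)
        fluctuations.append(np.sqrt(np.mean((windows - trend) ** 2)))
    return np.array(fluctuations)

def hurst_from_dfa(series, scales, order=1):
    """Slope of log F(s) against log s, read here as the Hurst exponent."""
    f = dfa_fluctuations(series, scales, order)
    slope, _intercept = np.polyfit(np.log(scales), np.log(f), 1)
    return slope
```

For uncorrelated noise the estimated slope should be close to 0.5, so a value such as the reported 0.575 would indicate weak but persistent long-range correlation.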

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Statistics of sentence lengths.
(a) Sentence length series. Inset: some very long sentences are clustered together. This cluster is due to the traditional Chinese full stop symbol (a small open circle) that was used in the 30th chapter, which produced noisy data. (b) Sentence length follows a right-skewed distribution, which is shown in (c) to be log-normal.
Fig 2. Long-range correlations in the noisy and cleaned series.
Noisy records can result in incorrect estimates of the Hurst exponent (solid circles) and consequently in false conclusions, even if they cover only a small proportion of the total series. The effect of the noise can be removed using a cleaning procedure (open circles). The X-part and E-part of the text are chapters 1 to 80 and chapters 81 to 120, which are currently attributed to Xueqin Cao and E Gao, respectively.
Fig 3. Structural stability test.
The total noisy series was separated into 12 non-overlapping segments, each of length 2881. (a) All the curves obey almost perfect power-law relationships (gray open circles), except that of the 3rd segment, which deviates significantly (red solid circles). (b) Hurst exponents for the 12 segments. The unreasonably large value of 0.87 for the noisy 3rd segment was corrected to 0.621 (red solid circle) after applying the cleaning procedure. The exponents show a slightly decreasing trend across the segments.
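
The segment-wise check described in this caption can be sketched as follows: the series is cut into equal non-overlapping segments and a scaling exponent is estimated for each one, so that a segment with an outlying exponent stands out. This is a minimal sketch under stated assumptions, not the authors' procedure: `dfa_exponent` and `segment_exponents` are hypothetical names, linear detrending is assumed, and the window sizes must be chosen by the user.

```python
import numpy as np

def dfa_exponent(series, scales):
    """DFA exponent with linear detrending (hypothetical helper)."""
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())
    log_f = []
    for s in scales:
        n_win = len(profile) // s
        windows = profile[: n_win * s].reshape(n_win, s)
        t = np.arange(s, dtype=float)
        slope, intercept = np.polyfit(t, windows.T, 1)    # each has shape (n_win,)
        trend = (np.outer(t, slope) + intercept).T        # shape (n_win, s)
        log_f.append(0.5 * np.log(np.mean((windows - trend) ** 2)))
    return np.polyfit(np.log(scales), log_f, 1)[0]

def segment_exponents(series, seg_len, scales):
    """Estimate one exponent per non-overlapping segment of length seg_len."""
    x = np.asarray(series, dtype=float)
    n_seg = len(x) // seg_len                 # any trailing remainder is ignored
    return np.array([
        dfa_exponent(x[i * seg_len:(i + 1) * seg_len], scales)
        for i in range(n_seg)
    ])
```

With `seg_len=2881` a series of 34572 records yields the 12 segments mentioned above. Because every segment is analysed independently, a cluster of anomalous records changes only the exponent of the segment that contains it.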
Fig 4. Diffusion entropy analysis of the cleaned total, cleaned X-part, and E-part series.
Estimated values of the scaling exponent for the three series were almost identical.
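
A diffusion entropy analysis of the kind named in this caption can be sketched as follows: displacements over sliding windows of length t are collected, their Shannon entropy S(t) is estimated from a histogram, and the scaling exponent is read from the slope of S(t) against ln t. The sketch is illustrative only. The names `diffusion_entropy` and `dea_exponent`, the overlapping windows, and the default bin width of one standard deviation of the series are assumptions, not details taken from the paper.

```python
import numpy as np

def diffusion_entropy(series, window_lengths, bin_width=None):
    """Histogram estimate of the entropy S(t) of window displacements."""
    xi = np.asarray(series, dtype=float)
    xi = xi - xi.mean()
    csum = np.concatenate(([0.0], np.cumsum(xi)))
    if bin_width is None:
        bin_width = xi.std()                      # assumed default bin size
    entropies = []
    for t in window_lengths:
        disp = csum[t:] - csum[:-t]               # sums over overlapping windows
        n_bins = int(np.ceil((disp.max() - disp.min()) / bin_width)) + 1
        edges = disp.min() + bin_width * np.arange(n_bins + 1)
        counts, _ = np.histogram(disp, bins=edges)
        p = counts[counts > 0] / counts.sum()
        # Differential entropy: -sum p ln(p / bin_width)
        entropies.append(-np.sum(p * np.log(p)) + np.log(bin_width))
    return np.array(entropies)

def dea_exponent(series, window_lengths, bin_width=None):
    """Slope of S(t) against ln t."""
    s = diffusion_entropy(series, window_lengths, bin_width)
    return np.polyfit(np.log(window_lengths), s, 1)[0]
```

For fractional Brownian motion this slope is expected to agree with the Hurst exponent obtained from de-trended fluctuation analysis, which is the kind of agreement the paper uses to characterize the sentence series.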

Grant support

This work is supported by the National Science Foundation of China under Grant No. 10975099 to Professor Huijie Yang; National Natural Science Foundation of China 11505114 to Prof. Changgui Gu; the Shanghai Municipal Education Commission 13YZ072 to Professor Huijie Yang; the Shanghai Municipal Education Commission D-USST02 to Professor Huijie Yang; and the Shanghai Municipal Education Commission QD2015016 to Prof. Changgui Gu.