Nat Commun. 2019 Jul 31;10(1):3439.
doi: 10.1038/s41467-019-11401-8.

Increasing trend of scientists to switch between topics

An Zeng et al. Nat Commun.

Abstract

Despite persistent efforts to understand the creativity of scientists over different career stages, little is known about the underlying dynamics of the research topic switching that drives innovation. Here, we analyze the publication records of individual scientists, aiming to quantify their topic switching dynamics and its influence. We find that the co-citing network of a scientist’s papers exhibits a clear community structure in which each major community represents a research topic. Our analysis suggests that the number of topics per scientist follows a narrow distribution. However, researchers nowadays switch between topics more frequently than those in the early days. We also find that a high switching probability in the early career is associated with low overall productivity, whereas in the later career it is associated with high overall productivity. The average citation per paper, by contrast, is negatively correlated with the switching probability in all career stages. We propose a model that can explain the main observed features.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Illustration of the co-citing network (CCN) of a typical highly cited scientist and its growth history. a The data and method used to construct the co-citing network. The papers authored by the scientist are marked in green, and the references of these papers are marked in red. b The co-citing network consists of all the papers published by this scientist. Each paper is represented by a node, and two papers are connected if they share at least one reference. The communities of this network are identified with the fast unfolding algorithm, which detects communities by maximizing the modularity function. The network contains several large communities, as well as some small clusters and isolated nodes. Each major community represents a main research topic of this scientist. c The community connectivity matrix shows that nodes within each community are well connected, whereas nodes of different communities are much less connected. Here, the connectivity between two communities is computed as the actual number of links between them divided by the maximum possible number of links between them. d The time series at the bottom describes the growth history of the network and reveals how this scientist moves from one research topic to another during her career. In the time series, each point is a paper, and its color corresponds to the paper’s community in the co-citing network. The height of the point is the number of links (i.e., connectivity) that the paper has in the network.
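The construction rule in panel b and the connectivity measure in panel c can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `build_ccn` and `community_connectivity` and the toy paper and reference identifiers are hypothetical, the communities are assumed to be disjoint and already known, and the community detection step (the fast unfolding algorithm) is not implemented here.

```python
from itertools import combinations


def build_ccn(references):
    """Build a co-citing network from reference lists.

    references: dict mapping paper id -> iterable of cited reference ids.
    Returns a set of undirected edges (frozensets of two paper ids);
    two papers are linked if they share at least one reference.
    """
    ref_sets = {paper: set(refs) for paper, refs in references.items()}
    edges = set()
    for a, b in combinations(sorted(ref_sets), 2):
        if ref_sets[a] & ref_sets[b]:
            edges.add(frozenset((a, b)))
    return edges


def community_connectivity(edges, comm_a, comm_b):
    """Connectivity between two disjoint communities.

    Actual number of links between the communities divided by the
    maximum possible number, len(comm_a) * len(comm_b).
    """
    comm_a, comm_b = set(comm_a), set(comm_b)
    possible = len(comm_a) * len(comm_b)
    if possible == 0:
        return 0.0
    # An edge crosses the two communities if it has one endpoint in each.
    actual = sum(
        1 for edge in edges
        if len(edge & comm_a) == 1 and len(edge & comm_b) == 1
    )
    return actual / possible


# Hypothetical toy data: four papers and their reference lists.
toy_refs = {
    "P1": ["r1", "r2"],
    "P2": ["r2", "r3"],
    "P3": ["r9"],
    "P4": ["r3", "r9"],
}
toy_edges = build_ccn(toy_refs)
toy_connectivity = community_connectivity(toy_edges, {"P1", "P2"}, {"P3", "P4"})
```

In the toy data only the pair P2 and P4 links the two groups, so one of the four possible cross-community links is present.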
Fig. 2
Structural properties of co-citing networks. a The size of the co-citing network (CCN) versus the size of the CCN’s giant component (GC). Each point represents a scientist. Most of the points are located below but close to the diagonal line, indicating that CCNs are in general connected and have relatively large GCs. This is supported by the inset, which shows the distribution of the relative size of the GC. b The maximized modularity in real CCNs (Qreal) and the maximized modularity in their degree-preserved reshuffled counterparts (Qrand). All the points are located under the diagonal line, indicating that the community structure in real networks is significant. c The distribution of the number of communities (nc) for all scientists. Three curves are presented: all communities taken into account (legend: all communities), communities with fewer than 3 nodes eliminated (legend: size > 2), and communities with fewer than 6 nodes eliminated (legend: size > 5). d Fraction of papers in different communities. e Inverse cumulative probability of the fraction of nodes in the biggest community (legend: top one), the two largest communities (legend: top two), and the three largest communities (legend: top three), respectively. f The Gini coefficient of the distribution of PACS codes in different communities. Communities are ranked by size in descending order. A larger Gini coefficient corresponds to a more heterogeneous distribution, suggesting that a higher fraction of papers in a community share the same PACS codes. The real data are compared with a random counterpart, in which the PACS codes are reshuffled among each individual scientist’s papers while the community structure is preserved. The error bars in this figure represent standard deviations.
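For readers unfamiliar with the Gini coefficient used in panel f, the following Python sketch shows one common way to compute it for PACS code counts within a community. The caption does not state over which set of codes the coefficient is taken; the sketch assumes it is computed over a fixed code universe (for example, all codes appearing in the scientist's papers), with zero counts for codes absent from the community. The function names and the toy codes are hypothetical.

```python
from collections import Counter


def gini(values):
    """Gini coefficient of a list of non-negative numbers.

    Uses the standard rank formula on the ascending-sorted values:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def pacs_gini(community_papers, code_universe):
    """Gini coefficient of PACS code usage within one community.

    community_papers: list of PACS code lists, one list per paper.
    code_universe: all codes to consider (ASSUMPTION: a fixed set, so
    that codes unused in this community contribute a count of zero).
    """
    counts = Counter(code for codes in community_papers for code in codes)
    return gini([counts[code] for code in code_universe])


# Hypothetical codes: a focused community versus a mixed one.
universe = ["A", "B", "C", "D"]
focused = pacs_gini([["A"], ["A"], ["A"], ["A"]], universe)
mixed = pacs_gini([["A"], ["B"], ["C"], ["D"]], universe)
```

Under this assumption a community whose papers all carry the same code scores high and a community with evenly spread codes scores zero, which matches the caption's reading that a larger coefficient means more papers share the same codes.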
Fig. 3
Evolution of yearly involved communities and switching probability. a The mean number of yearly involved major communities for individual scientists in different career years. b The probability that two adjacent publications belong to different major communities (the switching probability) for scientists in different career years. The inset shows the switching probability as a function of the number of papers published in a career. c Comparison of the overall switching probability (all scientists) with the switching probability of the 10% most productive scientists in different career years. The results suggest that high productivity is associated with low switching probability in the early career, but with high switching probability in the later career. d Comparison of the overall switching probability (all scientists) with the switching probability of the 10% of scientists who have the highest mean citation per paper. For each paper, we only consider the number of citations 10 years after its publication (c10). The results suggest that a high average citation per paper correlates with low switching probability in all career periods. In the insets of (c, d), we present the p-value of the Kolmogorov–Smirnov test distinguishing between the two switching probability distributions in each career year.
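The switching probability in panel b can be illustrated with a short Python sketch. It assumes that a scientist's papers are given in publication order with a community label each, and that papers outside the major communities are simply skipped before adjacent pairs are compared; the caption does not specify how such papers are handled, so that choice is an assumption. The function name and labels are hypothetical.

```python
def switching_probability(labels, major_communities):
    """Fraction of adjacent publication pairs that change major community.

    labels: community label of each paper, in publication order.
    major_communities: set of labels counted as major communities.
    ASSUMPTION: papers outside the major communities are dropped first.
    Returns None when fewer than two papers remain.
    """
    sequence = [label for label in labels if label in major_communities]
    if len(sequence) < 2:
        return None
    switches = sum(1 for prev, curr in zip(sequence, sequence[1:]) if prev != curr)
    return switches / (len(sequence) - 1)


# Hypothetical career: "x" marks a paper in a small, non-major cluster.
career = ["A", "A", "x", "B", "B", "A"]
prob = switching_probability(career, {"A", "B"})
```

After dropping the non-major paper the sequence is A, A, B, B, A, which contains two switches among four adjacent pairs.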
Fig. 4
Trends in the number of communities and the switching probability as science develops. a The mean number of communities of scientists who started their career in different years. b The average switching probability of scientists who started their career in different years. The error bars here represent standard deviations. As our data end in 2010, they cannot capture the full career of scientists who started their careers in recent years. We therefore filter out some scientists when studying the evolution of science. We only consider scientists’ first y career years and remove (i) all the scientists who had not yet reached y years of career (for a fair temporal comparison), and (ii) those who published fewer than 30 papers in their first y career years (for a meaningful community detection). The results for y = 10, 20, 30 are presented in this figure. As science evolves over the years, the number of major communities per scientist stays almost unchanged, while the frequency with which scientists switch between communities increases. c Distributions of the number of communities (for y = 30) for scientists who started their career between 1940 and 1950, and for those who started their career between 1970 and 1980. The p-value of the Kolmogorov–Smirnov test is 0.961, indicating no statistically significant difference between these two distributions. d Distributions of the switching probability (for y = 30) of scientists who started their career between 1940 and 1950, and of those who started their career between 1970 and 1980. The p-value of the Kolmogorov–Smirnov test is 2.34 × 10⁻⁸, suggesting a significant difference between these two distributions (i.e., an increase of switching probability).
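The two filtering rules described in this caption can be sketched as follows. This is an illustrative reading, not the authors' procedure: it assumes that a career starts in the year of the first publication and that the first y career years are the calendar years from the start year through start year + y - 1, all of which must fall within the data. The function name and the toy scientists are hypothetical.

```python
def filter_cohort(scientists, y, data_end=2010, min_papers=30):
    """Keep scientists usable for a first-y-career-years comparison.

    scientists: dict mapping scientist id -> list of publication years.
    Rule (i): drop scientists whose first y career years extend past data_end.
    Rule (ii): drop scientists with fewer than min_papers papers in that window.
    ASSUMPTION: career start = year of first publication.
    Returns a dict of scientist id -> sorted publication years in the window.
    """
    kept = {}
    for scientist, pub_years in scientists.items():
        if not pub_years:
            continue
        start = min(pub_years)
        window_end = start + y - 1
        if window_end > data_end:  # rule (i): career too short in the data
            continue
        early = [year for year in pub_years if year <= window_end]
        if len(early) < min_papers:  # rule (ii): too few papers
            continue
        kept[scientist] = sorted(early)
    return kept


# Hypothetical publication records.
toy_scientists = {
    "s1": [1970] * 15 + [1975] * 20,   # 35 papers in first 10 years: kept
    "s2": [2005] * 40,                 # window runs to 2014, beyond the data
    "s3": [1980] * 10 + [1995] * 30,   # only 10 papers in first 10 years
}
cohort = filter_cohort(toy_scientists, y=10)
```

Only the first toy scientist satisfies both rules for y = 10.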
Fig. 5
Performance of the exploitation–exploration model (EEM). a Illustration of the EEM. The research activity is modeled as a node activation process in the knowledge space. When a scientist publishes a paper, she activates a node (i.e., a new piece of knowledge) in the knowledge space. The network activated by this scientist at the end forms her personal network, recording all her papers and the relations between them. The underlying toy network is a demonstration of the knowledge space, and the red nodes are the nodes already activated by a scientist, with a number recording the step in which each node is activated. The simplest model for the node activation process is the standard random walk, which assumes that a scientist randomly activates a neighboring node of the last activated node. Therefore, one of the neighboring nodes (marked in green with a bigger size) of the red node 4 will be randomly picked and activated. In the EEM, we introduce an exploitation process and an exploration process. With probability p, the scientist randomly re-exploits the neighborhood of one of the previously activated nodes. In the figure, the scientist exploits by jumping back to the red node 1 and randomly activating one of its neighbors. With probability q, the scientist explores nodes beyond the closest neighbors of node 4. For simplicity, we assume that in the exploration step the scientist randomly activates a next-nearest neighbor. b Comparison of the co-citing networks (CCN) as well as the paper publishing time series generated by the random walk model and by the EEM. The parameters, including the initial paper and the number of papers in each year, are set the same as in Fig. 1. In (c, d), these parameters are those of all analyzed authors. c The number of yearly involved communities for different p, while q = 0. d The distribution of the number of communities that each scientist is involved in during her career for different q. e, f Estimation of the probabilities p and q of each scientist based on the real data, plotted as their probability density functions.
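The activation process described in panel a can be sketched as a small Python simulation. This is a rough sketch under several assumptions that the caption does not settle: the three moves are treated as mutually exclusive (exploitation with probability p, exploration with probability q, a standard random-walk step otherwise), only nodes that have not yet been activated can be chosen, and when a move has no fresh candidate the walk falls back to any fresh neighbor of the activated set. The ring lattice standing in for the knowledge space and the names `ring_lattice` and `simulate_eem` are hypothetical.

```python
import random


def ring_lattice(n, k=2):
    """Hypothetical toy knowledge space: node i links to its k nearest
    neighbors on each side of a ring of n nodes."""
    return {
        i: sorted({(i + d) % n for d in range(-k, k + 1)} - {i})
        for i in range(n)
    }


def simulate_eem(graph, start, n_papers, p, q, seed=None):
    """Activate up to n_papers nodes; returns them in activation order."""
    if p < 0 or q < 0 or p + q > 1:
        raise ValueError("p and q must be non-negative with p + q <= 1")
    rng = random.Random(seed)
    activated = [start]
    seen = {start}
    while len(activated) < n_papers:
        last = activated[-1]
        r = rng.random()
        if r < p:
            # Exploitation: jump back to a previously activated node
            # and consider its neighbors.
            anchor = rng.choice(activated)
            candidates = set(graph[anchor])
        elif r < p + q:
            # Exploration: next-nearest neighbors of the last node.
            nearest = set(graph[last])
            candidates = {w for v in nearest for w in graph[v]} - nearest - {last}
        else:
            # Standard random walk: neighbors of the last node.
            candidates = set(graph[last])
        fresh = sorted(candidates - seen)
        if not fresh:
            # ASSUMPTION: fall back to any fresh neighbor of activated nodes.
            fresh = sorted({w for v in activated for w in graph[v]} - seen)
            if not fresh:
                break  # nothing left to activate
        node = rng.choice(fresh)
        activated.append(node)
        seen.add(node)
    return activated
```

A call such as `simulate_eem(ring_lattice(50), start=0, n_papers=20, p=0.6, q=0.2, seed=1)` returns one simulated activation sequence. Building a co-citing network from the activated nodes is left out of this sketch.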
Fig. 6
Structural properties of the generated scientists’ CCNs based on the EEM. a The size of the modeled co-citing network (CCN) versus the size of the CCN’s giant component (GC). Each point represents a modeled scientist. b The maximized modularity in the modeled CCNs (Qmodel) and the maximized modularity in their degree-preserved reshuffled counterparts (Qrand). c The Gini coefficient of the distribution of PACS codes in different communities. Communities are ranked by size in descending order. The model data are compared with a random counterpart, in which the PACS codes are reshuffled. d The fraction of papers in different communities for real data and model data. e The inverse cumulative probability of the fraction of nodes in the three largest communities for real data and model data. f The distribution of the maximum degree in scientists’ real CCNs and modeled CCNs. In this figure, the parameters of the EEM are chosen as p = 0.6 and q = 0.2, and the error bars represent standard deviations.
