Elife. 2022 May 10:11:e75308.
doi: 10.7554/eLife.75308.

Conformist social learning leads to self-organised prevention against adverse bias in risky decision making


Wataru Toyokawa et al. Elife. 2022.

Abstract

Given the ubiquity of potentially adverse behavioural bias owing to myopic trial-and-error learning, it seems paradoxical that improvements in decision-making performance through conformist social learning, a process widely considered to be bias amplification, still prevail in animal collective behaviour. Here we show, through model analyses and large-scale interactive behavioural experiments with 585 human subjects, that conformist influence can indeed promote favourable risk taking in repeated experience-based decision making, even though many individuals are systematically biased towards adverse risk aversion. Although strong positive feedback conferred by copying the majority's behaviour could result in unfavourable informational cascades, our differential equation model of collective behavioural dynamics identified a key role for increasing exploration by negative feedback arising when a weak minority influence undermines the inherent behavioural bias. This 'collective behavioural rescue', emerging through coordination of positive and negative feedback, highlights a benefit of collective learning in a broader range of environmental conditions than previously assumed and resolves the ostensible paradox of adaptive collective behavioural flexibility under conformist influences.

Keywords: collective behaviour; computational biology; conformity; hot stove effect; human; physics of living systems; reinforcement learning; risky decision making; social learning; systems biology.

Plain language summary

When it comes to making decisions, like choosing a restaurant or political candidate, most of us rely on limited information that is not accurate enough to find the best option. Considering others’ decisions and opinions can help us make smarter choices, a phenomenon called “collective intelligence”.

Collective intelligence relies on individuals making unbiased decisions. If individuals are biased toward making poor choices over better ones, copying the group’s behavior may exaggerate biases. Humans are persistently biased. To avoid repeated failure, humans tend to avoid risky behavior. Instead, they often choose safer alternatives even when there might be a greater long-term benefit to risk-taking. This may hamper collective intelligence.

Toyokawa and Gaissmaier show that learning from others helps humans make better decisions even when most people are biased toward risk aversion. The study first used computer modeling to assess the effect of individual bias on collective intelligence. Then, Toyokawa and Gaissmaier conducted an online investigation in which 185 people performed a task that involved choosing a safer or riskier alternative, and 400 people completed the same task in groups of 2 to 8. The online experiment showed that participating in a group changed the learning dynamics to make information sampling less biased over time. This mitigated people’s tendency to be risk-averse when risk-taking is beneficial.

The model and experiments help explain why humans have evolved to learn through social interactions. Social learning and the tendency of humans to conform to the group’s behavior mitigate individual risk aversion. Studies of the effect of bias on individual decision-making in other circumstances are needed. For example, would the same finding hold in the context of social media, which allows individuals to share unprecedented amounts of sometimes incorrect information?


Conflict of interest statement

WT, WG: No competing interests declared.

Figures

Figure 1. Mitigation of suboptimal risk aversion by social influence.
(a) A schematic diagram of the task. A safe option provides a constant reward π_s = 1, whereas a risky option provides a reward randomly drawn from a Gaussian distribution with mean μ = 1.5 and s.d. = 1. (b, c) The emergence of suboptimal risk aversion (the hot stove effect) depending on a combination of the reinforcement learning parameters: (b) under no social influence (i.e. the copying weight σ = 0), and (c) under social influences with different values of the conformity exponent θ and copying weight σ. The dashed curve is the asymptotic equilibrium at which asocial learners are expected to end up choosing the two alternatives with equal likelihood (i.e. P_{r,t} = 0.5), which is given analytically by β = (2 - α)/α (Denrell, 2007). The coloured background is a result of the agent-based simulation with total trials T = 150 and group size N = 10, showing the average proportion of choosing the risky option in the second half of the learning trials, P_{r,t>75} > 0.5, under a given combination of the parameters. (d) The differences between the mean proportion of risk aversion of asocial learners and that of social learners, highlighting regions in which performance is improved (orange) or undermined (purple) by social learning.
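The task in panel (a) and the asocial case in panel (b) can be illustrated with a minimal simulation sketch. The caption gives the payoffs (safe reward 1; risky reward Gaussian with mean 1.5 and s.d. 1), the horizon T = 150, and the parameter names α and β. The specific learning rule below is an assumption for illustration only: Q-values starting at 0, a delta-rule update, and a two-option softmax choice rule. The function names are hypothetical and this is not presented as the authors' implementation.

```python
import math
import random


def simulate_asocial_learner(alpha, beta, n_trials=150, rng=None):
    """Return the proportion of risky choices in the second half of the task.

    Assumed model (not spelled out in the caption): Q-values start at 0,
    are updated by a delta rule with learning rate alpha, and choices
    follow a softmax rule with inverse temperature beta.
    """
    rng = rng or random.Random()
    q_safe, q_risky = 0.0, 0.0
    half = n_trials // 2
    risky_late = 0
    for t in range(1, n_trials + 1):
        # With two options, softmax reduces to a logistic function.
        p_risky = 1.0 / (1.0 + math.exp(-beta * (q_risky - q_safe)))
        if rng.random() < p_risky:
            reward = rng.gauss(1.5, 1.0)  # risky option: mean 1.5, s.d. 1
            q_risky += alpha * (reward - q_risky)
            if t > half:
                risky_late += 1
        else:
            q_safe += alpha * (1.0 - q_safe)  # safe option: constant reward 1
    return risky_late / (n_trials - half)


def mean_risky_proportion(alpha, beta, n_agents=500, seed=0):
    """Average the second-half risky proportion over independent agents."""
    rng = random.Random(seed)
    total = sum(simulate_asocial_learner(alpha, beta, rng=rng)
                for _ in range(n_agents))
    return total / n_agents


if __name__ == "__main__":
    # alpha * (beta + 1) = 0.6, below the analytical boundary of 2
    print(mean_risky_proportion(alpha=0.2, beta=2.0))
    # alpha * (beta + 1) = 6.4, above the boundary
    print(mean_risky_proportion(alpha=0.8, beta=7.0))
```

Under these assumptions, an agent only updates the option it chooses, so an unlucky draw from the risky option can suppress further sampling of it. That sampling asymmetry is the mechanism behind the hot stove effect described in the caption.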
Figure 1—figure supplement 1. The simulation result with a wider parameter space.
The effect of the relationship between individual learning rate (α) and individual inverse temperature (β) across the different combinations of social learning parameters on the mean proportion of choosing the risky alternative in the second half of the trials of the two-armed bandit task described in Figure 1 in the main text. The dashed curves give a set of parameter combinations with which asocial learners are expected to choose the risky alternative in the same proportion as they choose the safe alternative (i.e. P_r = 0.5) in the infinite time horizon T → +∞, given by β = (2 - α)/α.
Figure 1—figure supplement 2. The results of the value-shaping social influence model.
The relationships between individual learning rate (α) and individual inverse temperature (β) across different combinations of social learning parameters. The coloured background shows the average proportion of choosing the risky option in the second half of the learning trials, P_{r,t>75} > 0.5. Different social learning weights (σ_vs) are shown from top to bottom (σ_vs ∈ {0, 0.1, 0.25, 0.5, 1, 2}). Different conformity exponents are shown from left to right (θ ∈ {0.5, 1, 2}). The dashed curve is the asymptotic equilibrium at which asocial learners are expected to end up choosing both alternatives with equal likelihood (i.e. P_r = 0.5), given by β = (2 - α)/α.
Figure 1—figure supplement 3. The simulation result with the negative risk premium.
The relationships between individual learning rate (α) and individual inverse temperature (β) across different combinations of social learning parameters. The coloured background shows the average proportion of choosing the risky option in the second half of the learning trials, P_{r,t>75} > 0.5. Different social learning weights (σ) are shown from top to bottom (σ ∈ {0, 0.25, 0.5, 0.75, 0.9}). Different conformity exponents are shown from left to right (θ ∈ {1, 2, 4, 8}). The risk premium is negative, μ = -0.5.
Figure 1—figure supplement 4. The simulation result with the Bernoulli noise distribution.
The relationships between individual learning rate (α) and individual inverse temperature (β) across different combinations of social learning parameters. The coloured background shows the average proportion of choosing the risky option in the second half of the learning trials, P_{r,t>75} > 0.5. Different social learning weights (σ) are shown from top to bottom (σ ∈ {0, 0.2, 0.4, 0.6, 0.8}). Different conformity exponents are shown from left to right (θ ∈ {1, 2, 4, 8}). A binary payoff distribution was used in which the safe alternative always provides π_s = 1, while the risky alternative provides π_r = 0 with a 70% chance or π_r = 5 with a 30% chance. The risk premium was 1.5.
Figure 1—figure supplement 5. The simulation results under the positive risk premium experimental setups (a, d: the 1-risky-1-safe; b, e: the 1-risky-3-safe; c, f: the 2-risky-2-safe).
The relationships between individual learning rate (α) and individual inverse temperature (β) across different combinations of social learning parameters. (a–c) The coloured background shows the average proportion of choosing the risky option in the second half of the learning trials (P_{r,t>75} > 0.5) under social influences with different values of the conformity exponent θ and copying weight σ. The dashed curve is the asymptotic equilibrium at which asocial learners are expected to end up choosing the two alternatives with equal likelihood (i.e. P_r = 0.5). (d–f) The differences between the mean proportion of risk aversion of asocial learners and that of social learners, highlighting regions in which performance is improved (that is, risk-seeking increases; orange) or undermined (that is, risk-aversion is amplified; purple) by social learning.
Figure 1—figure supplement 6. The simulation results under the negative risk premium experimental setup.
The relationships between individual learning rate (α) and individual inverse temperature (β) across different combinations of social learning parameters. (left) The coloured background shows the average proportion of choosing the (optimal) safe option in the second half of the learning trials under social influences with different values of the conformity exponent θ and copying weight σ. The dashed curve shows the proportion of choosing the safe option at P_s = 0.85. (right) The differences between the mean proportion of risk aversion of asocial learners and that of social learners, highlighting regions in which (suboptimal) risk-seeking increases (orange) and (optimal) risk-aversion increases (purple) by social learning.
Figure 2. The effect of social learning on average decision performance.
The x axis is a product of two reinforcement learning parameters, α(β + 1), namely, the susceptibility to the hot stove effect. The y axis is the mean probability of choosing the optimal risky alternative in the last 75 trials in a two-armed bandit task whose setup was the same as in Figure 1. The black solid curve is the analytical prediction of the asymptotic performance of individual reinforcement learning with infinite time horizon T → +∞ (Denrell, 2007). The analytical curve shows a choice shift emerging at α(β + 1) = 2; that is, individual learners ultimately prefer the safe to the risky option in the current setup of the task when α(β + 1) > 2. The dotted curves are mean results of agent-based simulations of social learners with two different mean values of the copying weight, σ ∈ {0.25, 0.5} (green and yellow, respectively), and asocial learners with σ = 0 (purple). The difference between the agent-based simulation with σ = 0 and the analytical result was due to the finite number of decision trials in the simulation, and hence, the longer the horizon, the closer they become (Figure 2—figure supplement 1). Each panel shows a different combination of the inverse temperature β and the conformity exponent θ.
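The threshold α(β + 1) = 2 quoted here and the dashed curve β = (2 - α)/α in Figure 1 describe the same boundary, since β = (2 - α)/α rearranges to αβ + α = 2. A short sketch makes the equivalence concrete; the function names are hypothetical and chosen only for this illustration.

```python
import math


def hot_stove_susceptibility(alpha, beta):
    """Susceptibility to the hot stove effect, alpha * (beta + 1)."""
    return alpha * (beta + 1.0)


def boundary_beta(alpha):
    """Inverse temperature on the dashed curve, beta = (2 - alpha) / alpha."""
    return (2.0 - alpha) / alpha


def asymptotically_prefers_safe(alpha, beta):
    """True when the analytical threshold alpha * (beta + 1) > 2 is exceeded."""
    return hot_stove_susceptibility(alpha, beta) > 2.0


if __name__ == "__main__":
    for alpha in (0.1, 0.5, 0.9):
        beta = boundary_beta(alpha)
        # Every point on the dashed curve has susceptibility 2 (up to rounding).
        print(alpha, beta, hot_stove_susceptibility(alpha, beta))
```

The threshold applies to the asymptotic, asocial case in this task setup; as the caption notes, finite-horizon simulations deviate from it.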
Figure 2—figure supplement 1. The effect of social learning on the average decision performance on the longer time horizon.
The x axis is an interaction of two reinforcement learning parameters, α(β + 1), that is, the susceptibility to the hot stove effect. The y axis is the mean probability of choosing the optimal risky alternative in the last 75 trials in the two-armed bandit task whose setup was the same as in Figures 1 and 2 in the main text (i.e. μ = 0.5, s.d. = 1) except for the longer time horizon T = 1075 compared to the time horizon used in the main text (T = 150). The dotted curves are the mean result of agent-based simulations of groups of social learners with two different mean values of the copying weight, σ ∈ {0.25, 0.5}, or individual learners with σ = 0. Each panel shows a different combination of the inverse temperature β and the conformity exponent θ. The black solid curve is the theoretical benchmark to which individual reinforcement learners were expected to asymptote as T → +∞. Compared to Figure 2 in the main text, individual learners got closer to the benchmark. On the other hand, the performance of social learners continued to deviate from the benchmark, suggesting that social influence had a qualitative impact on the course of learning and decision making, rather than merely slowing the approach to the equilibrium of individual learning.
Figure 2—figure supplement 2. The effect of social learning on the time evolution of decision performance.
The x axis is the number of trials. The y axis is the mean proportion of choosing the optimal risky alternative. Each colour shows a different β. For the asocial learning condition (i.e. σ=0), the analytical benchmark to which reinforcement learners asymptote is shown as a horizontal line. Conformity exponent θ was 2. Group size was 8. The simulation was repeated 1000 times for each combination of parameters. Compared to asocial learning cases, social learning (σ=0.3) qualitatively alters the course of learning, rather than just speeding up or slowing down learning.
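The captions name a copying weight σ and a conformity exponent θ but do not spell out how they enter the choice rule. As a hedged illustration only, the sketch below assumes one common frequency-dependent formulation: the final choice probability is a mixture of the asocial probability (weight 1 - σ) and a social term proportional to the number of others choosing each option raised to the power θ. The function name and the exact functional form are assumptions, not taken from this page.

```python
def conformist_choice_probs(asocial_probs, counts, sigma, theta):
    """Mix asocial choice probabilities with a frequency-dependent social term.

    Assumed form (illustrative): for option j,
        P_j = (1 - sigma) * asocial_j + sigma * n_j**theta / sum_k n_k**theta
    where n_j is the number of other individuals observed choosing option j.
    """
    weights = [n ** theta for n in counts]
    total = sum(weights)
    if total == 0:
        # No social information available: fall back to asocial learning.
        return list(asocial_probs)
    return [(1.0 - sigma) * p + sigma * w / total
            for p, w in zip(asocial_probs, weights)]


if __name__ == "__main__":
    # Seven other group members (group size 8): five chose safe, two chose risky.
    probs = conformist_choice_probs([0.5, 0.5], counts=[5, 2],
                                    sigma=0.3, theta=2)
    print(probs)
```

In this assumed form, θ > 1 weights the majority option more than proportionally, while a small σ keeps the individual's own learning dominant, which matches the roles the captions assign to the two parameters.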
Figure 3. The effect of individual heterogeneity on the proportion of choosing the risky option in the two-armed bandit task.
(a) The effect of heterogeneity of α, (b) β, (c) σ, and (d) θ. Individual values of a focal behavioural parameter were varied across individuals in a group of five. Other non-focal parameters were identical across individuals within a group. The basic parameter values assigned to non-focal parameters were α = 0.5, β = 7, σ = 0.3, and θ = 2, and groups’ mean values of the various focal parameters were matched to these basic values. We simulated three different heterogeneous compositions: the majority (3 of 5 individuals) potentially suffered the hot stove effect, α_i(β_i + 1) > 2 (a, b), or had the highest diversity in social learning parameters (c, d; purple); the majority were able to overcome the hot stove effect, α_i(β_i + 1) < 2 (a, b), or had moderate heterogeneity in the social learning parameters (c, d; blue); and all individuals had α_i(β_i + 1) > 2 but smaller heterogeneity (green). The yellow diamond shows the homogeneous groups’ performance. Lines are drawn through average results across the same compositional groups. Each round dot represents a group member’s mean performance. The diamonds are the average performance of each group for each composition category. For comparison, asocial learners’ performance, with which the performance of social learners can be evaluated, is shown in gray. For heterogeneous α and β, the analytical solution of asocial learning performance is shown as a solid-line curve. We ran 20,000 replications for each group composition.
Figure 4. The population dynamics model.
(a) A schematic diagram of the dynamics. Solid arrows represent a change in population density between connected states at a time step. The thicker the arrow, the larger the per-capita rate of behavioural change. (b, c) The results of the asocial, baseline model where P_S^- = P_R^+ = p_h and P_R^- = P_S^+ = p_l (p_h > p_l). Both figures show the equilibrium bias towards risk seeking (i.e. N_R - N_S) as a function of the degree of risk premium e as well as of the per-capita probability of moving to the less preferred behavioural option p_l. (b) The explicit form of the curve is given by -n(p_h - p_l){(1 - e)p_h - e p_l} / [(p_h + p_l){(1 - e)p_h + e p_l}]. (c) The dashed curve is the analytically derived neutral equilibrium of the asocial system that results in N_R* = N_S*, given by e = p_h/(p_h + p_l). (d) The equilibrium of the collective behavioural dynamics with social influences. The numerical results were obtained with N_{S,t=0}^- = N_{S,t=0}^+ = 5, N_{R,t=0} = 10, and p_h = 0.7.
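The explicit form in panel (b) and the neutral curve in panel (c) can be checked against each other numerically. The sketch below reads the quoted expression as a single fraction whose denominator is the product (p_h + p_l){(1 - e)p_h + e p_l}; that reading, the default n = 20 (the sum of the initial densities listed in the caption), and the function names are assumptions made for illustration.

```python
def equilibrium_bias(e, p_l, p_h=0.7, n=20):
    """Equilibrium bias towards risk seeking in the asocial baseline model.

    Implements -n (p_h - p_l) {(1 - e) p_h - e p_l}
               / [(p_h + p_l) {(1 - e) p_h + e p_l}],
    reading the caption's expression as one fraction (an assumption).
    """
    numerator = -n * (p_h - p_l) * ((1.0 - e) * p_h - e * p_l)
    denominator = (p_h + p_l) * ((1.0 - e) * p_h + e * p_l)
    return numerator / denominator


def neutral_e(p_l, p_h=0.7):
    """Risk premium at which the asocial system is neutral: e = p_h / (p_h + p_l)."""
    return p_h / (p_h + p_l)


if __name__ == "__main__":
    p_l = 0.2
    e_star = neutral_e(p_l)
    # The bias vanishes (up to rounding) on the dashed curve of panel (c).
    print(e_star, equilibrium_bias(e_star, p_l))
    # Below the curve the bias is negative (risk aversion); above it, positive.
    print(equilibrium_bias(0.5, p_l), equilibrium_bias(0.9, p_l))
```

Substituting e = p_h/(p_h + p_l) makes the factor (1 - e)p_h - e p_l equal to zero, which is why the bias changes sign exactly on the dashed curve.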
Figure 4—figure supplement 1. The result of the differential equation model.
The effect of both the per-capita probability of exploration p_l and e (i.e. the ratio of individuals who prefer behavioural state R) on the equilibrium degree of risk seeking (i.e. N_R* - N_S*), across the different combinations of social influence parameters. Different social influence weights are shown from top to bottom (σ ∈ {0, 0.25, 0.5, 0.75}). Different conformity exponents are shown from left to right (θ ∈ {1, 2, 10}). The dashed curve is e = p_h/(p_h + p_l). The numeric solution was obtained with conditions N_{S,t=0}^- = N_{S,t=0}^+ = 5, N_{R,t=0} = 10, and p_h = 0.7.
Figure 5. The approximate bifurcation analysis.
The relationships between the social influence weight σ and the equilibrium number of individuals in the risky behavioural state N_R across different conformity exponents θ ∈ {0, 1, 2, 10} and different values of risk premium e ∈ {0.55, 0.65, 0.7, 0.75} are shown as black dots. The background colours indicate regions where the system approaches either risk aversion (N_R < N_S; blue) or risk seeking (N_R > N_S; red). The horizontal dashed line is N_R = N_S = 10. Two locally stable equilibria emerge when θ ≥ 2, which suggests that the system has a bifurcation when σ is sufficiently large. The other parameters are set to p_h = 0.7, p_l = 0.2, and N = 20.
Figure 5—figure supplement 1. The approximate bifurcation analysis.
The relationship between the social influence weight σ and the equilibrium number of individuals choosing the risky alternative N_R across the different conformity exponents (θ ∈ {0, 1, 2, 10}), shown as black dots. The triangular points shown in the background of each panel indicate regions in which the group approaches risk aversion (i.e. N_R < 10; blue) or the risk-seeking equilibrium (i.e. N_R > 10; red). Two different equilibria mean that the system has a bifurcation under a given σ. The direction of the background triangles indicates whether N_R increases (Δ) or decreases (∇) relative to its starting position. The other parameters are set to p_h = 0.7, p_l = 0.2.
Figure 6. Prediction of the fit learning model.
Results of a series of agent-based simulations with individual parameters that were drawn randomly from the best fit global parameters. Independent simulations were conducted 100,000 times for each condition. Group size was fixed to six for the group condition. Lines are means (black-dashed: individual, coloured-solid: group) and the shaded areas are 80% Bayesian credible intervals. Mean performances of agents with different σ_i are shown in the colour gradient. (a) A two-armed bandit task. (b) A 1-risky-3-safe (four-armed) bandit task. (c) A 2-risky-2-safe (four-armed) bandit task. (d) A negative risk premium two-armed bandit task.
Figure 6—figure supplement 1. Experimental results with the mixed logit model regression.
The black triangles are subjects in the individual learning condition; the orange dots are those in the group condition with group sizes ranging from 2 to 8. The solid lines are predictions from a mixed logit model for the individual condition (black) and for the group condition (orange), with the shaded area showing the 95% Bayesian credible intervals (CIs). (a) A two-armed bandit task (N = 168). (b) A 1-risky-3-safe (four-armed) bandit task (N = 148). (c) A 2-risky-2-safe (four-armed) bandit task (N = 151). (d) A negative risk premium (RP) two-armed bandit task (N = 118). The width of the CI for the individual condition in the negative RP task is due to the lack of data points in the region. The x axis is α_i(β_i + 1), namely, the susceptibility to the hot stove effect. (a, b, and d) The y axis is the mean proportion of choosing the risky alternative averaged over the second half of the trials. (c) The y axis is the mean proportion of choosing the optimal risky alternative averaged over the second half of the trials. The horizontal lines show the chance-level probability.
Figure 6—figure supplement 2. Bayesian model comparison.
(a) The model recovery performance: model frequencies (dark shade) and exceedance probability (XP) for each pair of simulated and fitted models, calculated by the Widely Applicable Information Criterion (WAIC). (b–d) Model comparison results. The lengths of the bars indicate model frequencies. Exceedance probability (XP) of the decision-biasing model is shown.
Figure 6—figure supplement 3. The parameter recovery performance.
The top half and bottom half of the figure are the results of parameter recovery test 1 and 2, respectively. The left column shows the global parameters fitted for each of the two four-armed bandit tasks, the 1-risky-3-safe task (N=105) and the 2-risky-2-safe task (N=105). The red points are the true values and the black points are the mean posterior values (i.e. recovered values). The 95% Bayesian credible intervals are shown with error bars. The middle and right column are individual-level parameters across the two task conditions (N=210). The x axis is the true value and the y axis is the fitted (i.e. the mean posterior) individual value. The differences between the true value and the estimated value are shown in different colours (Dark: fit well). The Pearson’s correlation coefficients between the true and fitted values are shown.



Grants and funding

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.