The beauty and the tragedy of the modern world is that it eliminates many situations that require people to demonstrate a commitment to the collective good.
- Sebastian Junger¹
Imagine two scientists, Kotrina and Amber, who have just obtained their PhDs and are entering the job market.
Kotrina has four empirical papers. She is first author on two, including a publication in a prominent psychology journal, Journal of Experimental Psychology: General. Kotrina has 75 citations, with two papers cited 25 times each—not bad for a newly-minted PhD. She has also mentored five undergraduate students on their honors theses and has obtained a modest research grant.
Amber has seven empirical papers. She is first author on five, including three publications in prominent journals, Psychological Science, Journal of Experimental Psychology: General, and PNAS. Amber has over 200 citations, with four papers cited more than 40 times each—impressive for a newly-minted PhD. She has also mentored five undergraduate students on their honors theses and has obtained a major research grant.
Suppose you were a member of a search committee, and Kotrina and Amber were in the running for your department's final interview spot. Who would you choose?
Amber, right? She has more publications, in more prominent journals, with more citations, and has obtained more funding. Sure, you know that focusing on proxy measures like publication count, citations, and funding can distort science by incentivizing less-rigorous research.² But it really does seem like Amber is doing better work, at a higher rate of productivity, and with more potential for external support. If you had to select the best individual scientist, Amber would seem like the obvious choice.
Is choosing the best scientist that simple?
Now imagine that you talk to colleagues and learn a bit more about each candidate.
You learn that Amber is sometimes negligent: she doesn’t carefully document her experimental procedures, doesn’t check her code for bugs, and doesn’t make her materials available and accessible to others. You also learn that Amber engages in questionable research practices³ to increase her chances of getting statistically significant findings (incentives, right?).⁴ As a consequence, some of Amber’s publications probably contain false positives, which will waste the time of other scientists who try to build on her work.
Amber is so motivated to be successful that she neglects many prosocial aspects of academic work: she rarely performs departmental service or helps colleagues when they ask for assistance, and writes lazy peer reviews. To top it off, Amber is a terrible mentor—colleagues have seen Amber exploiting her students, stealing their ideas without giving proper credit, and withdrawing mentorship from students who were struggling. Sure, Amber may be a productive individual, but she is a lousy colleague and community member.
In contrast, you learn that Kotrina is exceptionally diligent: she carefully documents her experimental procedures, double checks her code for bugs, and makes her materials readily accessible to others. Kotrina works hard to avoid questionable research practices and conducts her research slowly and methodically. As a consequence, her publications are more likely to contain true findings and theoretical advances, contributing to the gradual accumulation of scientific knowledge.
Kotrina is deeply committed to helping the people in her community. She serves on departmental committees, helps colleagues whenever they ask for help, and is a thoughtful, constructive peer reviewer. To top it off, Kotrina is a dedicated mentor: she devotes personal time to help students become better scholars, credits students for their contributions, and steps up her commitment when students are struggling. Sure, Kotrina may not be the most productive individual, but she is an ideal colleague and community member.
Knowing all of this, would you reconsider your choice? Is it possible to separate Amber and Kotrina’s scientific contributions from their effects on the productivity and well-being of colleagues and the broader scientific community?
Typical evaluation criteria ignore indirect effects
Of course, Kotrina and Amber are caricatures—real differences between candidates are rarely so clear cut. Yet, their tale is useful because it illustrates the two pathways by which scientists contribute to science: directly and indirectly.
A ‘direct’ effect is one in which the causal path goes straight from a scientist’s efforts to a measurable scientific outcome. An ‘indirect’ effect is one in which the causal path from a scientist’s efforts to a measurable scientific outcome goes through other scientists. In other words, indirect contributions are mediated by their effects on other scientists’ direct contributions. The following Directed Acyclic Graph (DAG)⁵ ⁶ illustrates these two pathways:
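The structure of that DAG can also be sketched in code. As a minimal illustration (the node names below are our own shorthand, not from any formal model), we can encode the graph and enumerate the two kinds of paths:

```python
# The two causal pathways from a scientist's efforts to a scientific outcome,
# encoded as a DAG. Node names are illustrative shorthand.
edges = {
    "efforts": ["outcome", "other_scientists"],  # direct edge + edge into others
    "other_scientists": ["outcome"],             # others' contributions to the outcome
    "outcome": [],
}

def all_paths(graph, start, end, path=()):
    """Enumerate every directed path from start to end."""
    path = path + (start,)
    if start == end:
        return [path]
    return [p for nxt in graph[start] for p in all_paths(graph, nxt, end, path)]

paths = all_paths(edges, "efforts", "outcome")
direct = [p for p in paths if len(p) == 2]    # efforts -> outcome
indirect = [p for p in paths if len(p) > 2]   # efforts -> other_scientists -> outcome
```

A metric built only from the `direct` path discards everything flowing through `indirect`, which is precisely the problem discussed below.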
Each and every scientist can contribute to science via these two pathways. This means that without accounting for both direct and indirect contributions, it is impossible to determine a scientist’s total contribution to any scientific outcome.
This should make us a bit worried, given that many indirect effects are left out of the metrics that we use to assess scientists’ productivity.
But maybe this is just a minor issue. After all, no metric can capture all relevant factors, so how harmful is it really if we only measure direct effects? Are there tangible repercussions for the efficiency of science, the well-being of scientists, the spread of good scientific practices, or other dimensions that truly matter?
The repercussions of ignoring indirect effects
The thing is, ignoring indirect effects does have serious consequences. Consider three:
1. Ignoring indirect effects fails to reward scientists who help others and fails to penalize scientists who harm others.
2. Ignoring indirect effects increases the intensity of competition between individual scientists.
3. Ignoring indirect effects reduces the incentive to specialize in unique skills that complement others.
Ignoring indirect effects fails to reward scientists who help others and fails to penalize scientists who harm others
Consider an extreme case in which a scientist generates little direct output, such as not having any first-authored publications. Given current evaluation criteria, such a scientist would struggle to find a research position, get grants, and receive awards. Would this be justified?
The problem is that, if the scientist has large positive indirect effects, then their total contribution may be large enough to warrant recognition and rewards, despite the fact that they produce little work themselves.
Many of us know of such scientists—they are not exceptionally productive, but they lift up their department and are a joy to have as colleagues. That is, not only do these scientists improve others’ work, they also increase others’ well-being. By failing to recognize their indirect contributions, current evaluation criteria don’t give them the recognition that they deserve.
Now consider a scientist who generates substantial direct output, such as having many first-authored publications. Should this scientist be hired, get grants, and receive awards?
If the scientist also has large positive indirect effects, then focusing on direct output would lead us to underestimate their total contribution. However, if the scientist is productive in-part by imposing negative indirect effects on others, then they achieve personal success at others’ expense. As a result, ignoring indirect effects would lead us to overestimate their total contribution.
Sure, such a scientist might still be worth hiring and rewarding: a superstar may be so prolific that they are worth keeping around, even if they harm others. But it is impossible to know without accounting for indirect effects.
Many of us also know of such scientists—they are productive, but are lousy colleagues and may even be exploitative. That is, not only do these scientists harm others’ work, they also reduce others’ well-being. By failing to recognize that these individuals indirectly harm science, current evaluation criteria give them more recognition than they deserve.
Thinking back to Kotrina and Amber, accounting for indirect effects could mean that not only is Kotrina the better choice now, she might remain the better choice even if Amber’s research were of stellar quality, simply because Kotrina is a far better colleague and community member.
Ignoring indirect effects increases the intensity of competition between individual scientists
Ignoring indirect effects increases the intensity of individual-level competition by reducing the “stake” that scientists have in the outcomes of other scientists. In biology, this is well established: evolutionary mechanisms that cause individuals to have a stake in each other’s outcomes (such as relatedness) result in a “shared fate,” reducing individual-level competition and promoting cooperation in many cases.⁷ ⁸
Of course, competition can be useful (promoting innovation, increasing effort, and incentivizing individuals to tackle diverse problems).⁹ ¹⁰ ¹¹ The problem is that individual-level competition incentivizes scientists to engage only in behaviors that benefit themselves, even though individually-beneficial behaviors are a mere subset of the behaviors that benefit science as a whole. So, understandably, scientists end up engaging in low levels of many collectively-beneficial behaviors, such as sharing code and well-documented datasets, doing replication research, conducting rigorous peer reviews, and criticizing the work of others.
Competition also incentivizes scientists to harm others, in situations where scientists benefit from the failures of their competitors. Think of two labs competing for priority of discovery or two scientists competing for the same grant—one’s failure increases the other’s chance of success.
In focus-group discussions with scientists at major research universities, Anderson et al.¹² document unnerving examples of the things that scientists do when competition is intense, including strategically withholding and misreporting research findings to sabotage competitors’ progress, delaying peer review of competitors’ papers to “beat them to the punch,” and lying to and exploiting PhD students to make progress on projects.
It’s no surprise that competition and the pursuit of self-interest can make everyone worse off.¹³ And there’s a clear analogy with science: selecting scientists based on individual productivity, while ignoring indirect effects, generates intense individual-level competition, exacerbating the disconnect between what scientists must do to have successful careers and what is best for science and the well-being of scientists.²
Ignoring indirect effects reduces the incentive to specialize in unique skills that complement others
A focus on direct, individual contributions using a narrow set of metrics, such as first-authored papers, creates an additional problem: scientists have fewer incentives to specialize in roles that are not rewarded by the prevailing regime, even if these roles are essential for science.
In psychology, for example, scientists are incentivized to become a ‘content specialist’ and develop a unique and identifying brand (“Pat studies evolved fear predispositions; Terry studies short-term mating strategies; Kim studies working-memory constraints”). By contrast, there is little incentive to become a ‘methodological specialist’ and develop a skill set that complements the skills of others (“Pat is an expert statistician; Terry is a fantastic mentor; Kim is a dedicated peer-reviewer”).
In an empirically-dominated discipline like psychology, more so than in disciplines like physics or economics, a scientist would struggle to find a job as a dedicated theorist, even though having dedicated theorists would likely benefit the discipline. So instead of becoming methodological specialists, people spread themselves thin across those competencies that are rewarded.¹⁴ In this way, ignoring indirect effects hinders the efficient division of labor that is crucial for “large team” science.¹⁵ ¹⁶
Accounting for indirect effects: lessons from animal husbandry and professional sports
How should we approach the problem of accounting for indirect effects in scientific evaluation?
Is there a way to reduce the disconnect between what is in scientists’ self-interest and what is in the interest of the larger entities in which scientists are embedded, such as departments and fields?
For a first hint at a solution, it’s useful to see how other fields have dealt with similar problems.
Just as we seek to select scientists in ways that improve scientific outcomes, livestock breeders seek to select animals in ways that maximize the yield of some commodity.
For example, in laying hens, breeders seek to maximize hens’ lifetime egg production. You might think that the best way to go about this is to select the most productive individuals. The problem is that each hen’s productivity depends on the behaviors of other hens in their social environment, and the most productive hens are the nastiest ones: they feather-peck and cannibalize fellow group members (and it’s hard to produce eggs when you’re getting eaten alive). Because productive individuals achieve their productivity by harming others, selecting the most productive hens can actually lead to lower total egg-yield.¹⁷ ¹⁸
In terms of both economics and animal welfare, feather pecking and cannibalism are severe problems.¹⁹ How can breeders solve them?
Instead of selecting the most productive individuals, breeders can select individuals from the most productive groups (hens in the most productive coops are preferentially allowed to reproduce), which implicitly accounts for hens’ indirect effects on group members.¹⁸ In one application of this approach to poultry, mortality dropped from 68% to 9% in just a few generations, and egg production rose from 91 to 237 eggs per hen.²⁰
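The logic of this result can be illustrated with a toy simulation (every trait definition, cost, and parameter below is invented for illustration and is not calibrated to the poultry studies). Each hen carries an “aggression” trait that raises her own output but taxes every coop-mate; we compare breeding from the best individuals against breeding from the best groups:

```python
import random

def simulate(select_by_group, generations=25, n_groups=20, group_size=10,
             cost=0.3, seed=1):
    """Toy model: each hen has an 'aggression' trait in [0, 1] that raises her
    own output but taxes every coop-mate. Returns mean group output after
    selection. All numbers are invented for illustration."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(group_size)] for _ in range(n_groups)]

    def mutate(a):
        # Offspring inherit the parent's trait plus a small mutation.
        return min(1.0, max(0.0, a + rng.gauss(0, 0.05)))

    def group_output(group):
        s = sum(group)
        # Each hen produces 1 + her aggression, minus the cost imposed
        # on her by every other hen's aggression.
        return sum(1 + a - cost * (s - a) for a in group)

    for _ in range(generations):
        if select_by_group:
            # Family selection: the best half of *groups* each found two
            # daughter groups, so coop-mates share a fate.
            best = sorted(pop, key=group_output, reverse=True)[:n_groups // 2]
            pop = [[mutate(a) for a in g] for g in best for _ in range(2)]
        else:
            # Individual selection: breed the personally most productive hens,
            # then mix their offspring into fresh groups.
            scored = [(1 + a - cost * (sum(g) - a), a) for g in pop for a in g]
            top = [a for _, a in sorted(scored, reverse=True)[:len(scored) // 2]]
            kids = [mutate(a) for a in top for _ in range(2)]
            rng.shuffle(kids)
            pop = [kids[i:i + group_size] for i in range(0, len(kids), group_size)]

    return sum(group_output(g) for g in pop) / n_groups
```

With these made-up parameters, selecting whole groups drives aggression down and group output up, while selecting the top individuals spreads aggression and drags total output down, mirroring the direction of the empirical results.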
Professional sports face an analogous selection problem: team managers must evaluate which players have the largest positive effect on team performance. Superstar players with impressive individual performances (such as scoring many points) might seem like the natural choice.²¹ But superstars aren’t always the players that have the largest positive impact on a team, which is why evaluations of professional athletes rely on metrics that capture indirect effects.²²
In the National Hockey League (NHL), the “plus-minus” statistic tracks a team’s goal differential while a player is on the ice. Other metrics include involvement in goal-scoring attempts, shots blocked, a plus-minus computed over shot attempts (Corsi), and attempts to account for success due to luck (PDO).²³
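As a minimal sketch of how plus-minus works, it can be computed from a list of goal events (the player names and events below are invented):

```python
# Toy plus-minus: each event records whether a goal was scored for or against
# the team, and which players were on the ice at the time.
events = [
    ("for",     {"Anna", "Boris"}),
    ("against", {"Anna", "Clara"}),
    ("for",     {"Boris", "Clara"}),
]

def plus_minus(player, events):
    """+1 for every goal scored, -1 for every goal conceded, while on the ice."""
    return sum(1 if side == "for" else -1
               for side, on_ice in events if player in on_ice)
```

The statistic credits a player for team outcomes that happen around them, not only for goals they score themselves, which is exactly what makes it sensitive to indirect contributions.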
The same holds in Major League Baseball (MLB). A player’s batting average and home-run count are useful for capturing direct contributions but terrible for capturing indirect ones; for those, metrics like runs batted in (RBI) and defensive runs saved (DRS) are more useful. There are even position-specific metrics, such as a pitcher’s earned-run average (ERA), because, well, it wouldn’t make much sense to judge different specialists by identical criteria.
And despite the utility of these measures, it is acknowledged that players affect team performance in ways that are “hidden” from metrics, such as boosting morale.²²
Changing the level of selection: an overarching principle to account for indirect effects
Animal husbandry and professional sports illustrate an overarching principle for improving group-level outcomes: accounting for indirect effects by changing the level of selection, from lower levels (individuals) to higher ones (teams).
When thinking about institutional reform, shifting to a higher level of selection would promote cooperation by causing individuals within groups to have a shared fate: each individual’s success would become tied to that of group members, creating incentives for altruistic behavior at the lower level.
In evolutionary biology, multilevel-selection theory provides a formal framework for analyzing such situations, where ‘individuals’ are structured into ‘groups’ (genes within cells, cells within individuals, individuals within groups) and selection operates at both individual and group levels.²⁴ ²⁵ ²⁶ When the strength of between-group selection is sufficiently strong, evolution can favor within-group cooperation that leads to an advantage in between-group competition.
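Under the standard assumptions of the multilevel Price equation (neglecting transmission bias), this partition can be written explicitly. Writing $w_{ik}$ and $z_{ik}$ for the fitness and trait value of individual $i$ in group $k$, and $W_k$, $Z_k$ for their group means:

```latex
\bar{w}\,\Delta\bar{z}
  \;=\; \underbrace{\operatorname{Cov}_{k}\!\left(W_k,\, Z_k\right)}_{\text{between-group selection}}
  \;+\; \underbrace{\operatorname{E}_{k}\!\left[\operatorname{Cov}_{i}\!\left(w_{ik},\, z_{ik}\right)\right]}_{\text{within-group selection}}
```

A cooperative trait that is costly within groups (a negative within-group covariance) can still spread when it sufficiently benefits the group (a positive between-group covariance), which is exactly the condition described in the text.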
Just change the level of selection. Easy, right?
Nothing is easy in war.
- Dwight D. Eisenhower²⁷
Changing the level of selection has significant potential to improve science. It works in professional sports and animal husbandry; and throughout evolutionary history, in cases where natural selection at higher levels has dominated selection at lower ones, the resulting “superorganisms” (eukaryotic cells, eusocial insects) became so ecologically dominant that many other species had little hope.²⁸
Of course, superorganisms are not a predestined evolutionary outcome, and neither is large-scale cooperation in science, particularly in a system of recognition and rewards that largely ignores indirect effects.
The challenge is that the analogy between the above situations and science is imperfect.
In science, we don’t have a single outcome measure to maximize. We have to worry about Campbell’s Law, “the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”²⁹ We don’t just want large teams, but a mix of large teams and independent scientists.³⁰ We need to worry about crowding out intrinsic motivations with extrinsic ones.³¹ And we must keep in mind that there’s no free lunch when it comes to cooperation: you only get cooperation at lower levels by promoting competition at higher ones.³²
These challenges will be far from easy to overcome, to say the least.
But nothing is easy, especially nothing worth having, so we could do worse than to consider whether changing the level at which we select scientists — from individuals to the larger entities in which scientists are embedded — is a viable approach. Indeed, ongoing initiatives to broaden evaluation criteria³³ are moving in related directions, for example, by assessing research integrity³⁴ and making “room for everyone’s talent” at universities.³⁵
Because what’s the alternative?
The status quo?
A world in which we fail to reward the scientists who make the largest overall contributions, where there is insufficient large-scale cooperation, and where the Kotrinas suffer while the Ambers peck their way to success?
We can do better than that.
1. Junger, S. Tribe: On homecoming and belonging. p. 59.(Twelve, 2016).
2. Smaldino, P. E. & McElreath, R. The natural selection of bad science. R. Soc. Open Sci. 3, 160384 (2016).
3. John, L. K., Loewenstein, G. & Prelec, D. Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol. Sci. 23, 524–532 (2012).
4. Yarkoni, T. No, it’s not The Incentives — it’s you. https://www.talyarkoni.org/blog/2018/10/02/no-its-not-the-incentives-its-you/ (2018).
5. Pearl, J. Causal diagrams for empirical research. Biometrika 82, 669–688 (1995).
6. Rohrer, J. M. Thinking Clearly About Correlations and Causation: Graphical Causal Models for Observational Data. Adv. Methods Pract. Psychol. Sci. 1, 27–42 (2018).
7. Aktipis, A. et al. Understanding cooperation through fitness interdependence. Nat. Hum. Behav. 2, 429 (2018).
8. Fletcher, J. A. & Doebeli, M. A simple and general explanation for the evolution of altruism. Proc. R. Soc. B Biol. Sci. 276, 13–19 (2009).
9. Dechenaux, E., Kovenock, D. & Sheremeta, R. M. A survey of experimental research on contests, all-pay auctions and tournaments. Exp. Econ. 18, 609–669 (2015).
10. Balietti, S., Goldstone, R. L. & Helbing, D. Peer review and competition in the Art Exhibition Game. Proc. Natl. Acad. Sci. 201603723 (2016).
11. Hagstrom, W. O. Competition in science. Am. Sociol. Rev. 1–18 (1974).
12. Anderson, M. S., Ronning, E. A., De Vries, R. & Martinson, B. C. The perverse effects of competition on scientists’ work and relationships. Sci. Eng. Ethics 13, 437–461 (2007).
13. Frank, R. H. The Darwin economy: Liberty, competition, and the common good. (Princeton University Press, 2012).
14. Fried, E. Are we asking too much? A list of competencies people expect me to have. Eiko Fried https://eiko-fried.com/are-we-asking-too-much-a-list-of-competencies-people-expect-me-to-have/ (2017).
15. Forscher, P. S. et al. A manifesto for team science. (2020).
16. Wuchty, S., Jones, B. F. & Uzzi, B. The increasing dominance of teams in production of knowledge. Science 316, 1036–1039 (2007).
17. Muir, W. M. Incorporation of competitive effects in forest tree or animal breeding programs. Genetics 170, 1247–1259 (2005).
18. Wade, M. J., Bijma, P., Ellen, E. D. & Muir, W. Group selection and social evolution in domesticated animals. Evol. Appl. 3, 453–465 (2010).
19. El-Lethey, H., Aerni, V., Jungi, T. W. & Wechsler, B. Stress and feather pecking in laying hens in relation to housing conditions. Br. Poult. Sci. 41, 22–28 (2000).
20. Muir, W. M. Group selection for adaptation to multiple-hen cages: selection program and direct responses. Poult. Sci. 75, 447–458 (1996).
21. Lucifora, C. & Simmons, R. Superstar effects in sport: Evidence from Italian soccer. J. Sports Econ. 4, 35–55 (2003).
22. Duch, J., Waitzman, J. S. & Amaral, L. A. N. Quantifying the Performance of Individual Players in a Team Activity. PLOS ONE 5, e10937 (2010).
23. Beginner’s Guide to Advanced Hockey Statistics — Northwestern Sports Analytics Group. https://sites.northwestern.edu/nusportsanalytics/2020/05/06/advanced-hockey-statistics/.
24. Okasha, S. Evolution and the levels of selection. (Oxford University Press, 2006).
25. Hamilton, W. D. Innate social aptitudes of man: an approach from evolutionary genetics. (1975).
26. Wilson, D. S., Van Vugt, M. & O’Gorman, R. Multilevel selection theory and major evolutionary transitions: Implications for psychological science. Curr. Dir. Psychol. Sci. 17, 6–9 (2008).
27. Eisenhower, D. D. Crusade in Europe. p. 450. (JHU Press, 1997).
28. Szathmáry, E. & Smith, J. M. The major evolutionary transitions. Nature 374, 227–232 (1995).
29. Campbell, D. T. Assessing the impact of planned social change. Eval. Program Plann. 2, 67–90 (1979).
30. Wu, L., Wang, D. & Evans, J. A. Large teams develop and small teams disrupt science and technology. Nature 566, 378–382 (2019).
31. Bowles, S. The moral economy: Why good incentives are no substitute for good citizens. (Yale University Press, 2016).
32. Panchanathan, K. George Price, the Price equation, and cultural group selection. (2011).
33. Moher, D. et al. Assessing scientists for hiring, promotion, and tenure. PLOS Biol. 16, e2004089 (2018).
34. Moher, D. et al. The Hong Kong Principles for assessing researchers: Fostering research integrity. PLOS Biol. 18, e3000737 (2020).
35. Position paper ‘Room for everyone’s talent’ | NWO. https://www.nwo.nl/en/position-paper-room-everyones-talent.