Thursday, June 15, 2017

Guest Post: Adjusting for Publication Bias in Meta-Analysis - A Response to Data Colada [61]

A recent blogpost on Data Colada raises the thorny but important issue of adjusting for publication bias in meta-analysis. In this guest post, three statisticians weigh in with their perspective.

Data Colada Post [61]: Why p-curve excludes ps>.05
Response of Blakeley B. McShane, Ulf Böckenholt, and Karsten T. Hansen

The quick version:
Below, we offer a six-point response to the recent blogpost by Simonsohn, Simmons, Nelson (SSN) on adjusting for publication bias in meta-analysis (or click here for a PDF with figures). We disagree with many of the points raised in the blogpost for reasons discussed in our recent paper on this topic [MBH2016]. Consequently, our response focuses on clarifying and expounding upon points discussed in our paper and provides a more nuanced perspective on selection methods such as the three-parameter selection model (3PSM) and the p-curve (a one-parameter selection model (1PSM)).

We emphasize that all statistical models make assumptions, that many of these are likely to be wrong in practice, and that some of these may strongly impact the results. This is especially the case for selection methods and other meta-analytic adjustment techniques. Given this, it is a good idea to examine how results vary depending on the assumptions made (i.e., sensitivity analysis) and we encourage researchers to do precisely this by exploring a variety of approaches. We also note that it is generally good practice to use models that perform relatively well when their assumptions are violated. The 3PSM performs reasonably well in some respects when its assumptions are violated while the p-curve does not perform so well. Nonetheless, we do not view the 3PSM or any other model as a panacea capable of providing a definitive adjustment for publication bias and so we reiterate our view that selection methods—and indeed any adjustment techniques—should at best be used only for sensitivity analysis.


The full version:
Note: In what follows, “statistically significant” means “statistically significant and directionally consistent” as in the Simonsohn, Simmons, Nelson (SSN) blogpost. In addition, the “p-curve” refers to the methodology discussed in SNS2014 that yields a meta-analytic effect size estimate that attempts to adjust for publication bias.(1)

Point 1: It is impossible to definitively adjust for publication bias in meta-analysis 
As stated in MBH2016, we do not view the three-parameter selection model (3PSM) or any other model as a panacea capable of providing a definitive adjustment for publication bias. Indeed, all meta-analytic adjustment techniques—whether selection methods such as the 3PSM and the p-curve or other tools such as trim-and-fill and PET-PEESE—make optimistic and rather rigid assumptions; further, the adjusted estimates are highly contingent on these assumptions. Thus, these techniques should at best be used only for sensitivity analysis.
[For more details in MBH2016, see the last sentence of the abstract; last paragraph of the introduction; point 7 in Table 1; and most especially the entire Discussion.]

Point 2: Methods discussions must be grounded in the underlying statistical model
All statistical models make assumptions. Many of these are likely to be wrong in practice and some of these may strongly impact the results. This is especially the case for selection methods and other meta-analytic adjustment techniques. Therefore, grounding methods discussions in the underlying statistical model is incredibly important for clarity of both thought and communication.
SSN argue against the 3PSM assumption that, for example, a p=0.051 study and a p=0.190 study are equally likely to be published; we agree this is probably false in practice. The question, then, is what is the impact of this assumption and can it be relaxed? Answer: it is easily relaxed, especially with a large number of studies. We believe the p-curve assumptions that (i) effect sizes are homogeneous, (ii) non-statistically significant studies are entirely uninformative (and are thus discarded), and (iii) a p=0.049 study and a p=0.001 study are equally likely to be published are also doubtful. Further, we know via Jensen’s Inequality that the homogeneity assumption can have substantial ramifications when it is false, as it is in practically all psychology research.
[For more details in MBH2016, see the Selection Methods and Modeling Considerations sections for grounding a discussion in a statistical model and the Simulation Evaluation section for the performance of the p-curve.]
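
To make the role of nonlinearity concrete, here is a minimal numerical sketch. It is our own illustration rather than the derivation in MBH2016, and every number in it is an assumption: a two-group design with 50 subjects per group (so the standard error of d is roughly sqrt(2/50)), a mean effect of 0.2, and heterogeneity of τ=0.2. Because the probability of reaching significance is a nonlinear function of the true effect, averaging that probability over heterogeneous effects is not the same as evaluating it at the average effect, and the studies that survive selection over-represent the larger true effects.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_CRIT = 1.96  # two-sided alpha = 0.05, keeping only the positive direction


def power(d, se):
    """Probability that a study with true effect d is significant and positive."""
    return 1.0 - STD_NORMAL.cdf(Z_CRIT - d / se)


def heterogeneity_summary(mu, tau, se, grid_points=4001, width=8.0):
    """Integrate over true effects d ~ Normal(mu, tau^2) on an evenly spaced grid.

    Returns (average power, mean true effect among significant studies).
    """
    effects = NormalDist(mu, tau)
    lo = mu - width * tau
    step = 2 * width * tau / (grid_points - 1)
    total_w = pub_w = pub_d = 0.0
    for i in range(grid_points):
        d = lo + i * step
        w = effects.pdf(d) * step   # probability mass near this true effect
        p = power(d, se)            # chance such a study ends up significant
        total_w += w
        pub_w += w * p
        pub_d += w * p * d
    return pub_w / total_w, pub_d / pub_w


# Assumed illustration values (not taken from MBH2016 or from SSN's simulation).
se = math.sqrt(2 / 50)
mu, tau = 0.2, 0.2

power_at_mean = power(mu, se)
avg_power, mean_effect_published = heterogeneity_summary(mu, tau, se)

print(f"power at the mean effect:        {power_at_mean:.3f}")
print(f"average power over all effects:  {avg_power:.3f}")
print(f"mean true effect, significant:   {mean_effect_published:.3f}")
```

Under these assumed values the average power exceeds the power at the mean effect, and the mean true effect among significant studies sits well above 0.2. A model that treats the significant studies as sharing one common effect therefore has no way to recover the population average.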

Point 3: Model evaluation should focus on estimation (ideally across a variety of settings and metrics)
SSN’s simulation focuses solely on Type I error—a rather uninteresting quantity given that the null hypothesis of zero effect for all people in all times and in all places is generally implausible in psychology research (occasional exceptions like ESP notwithstanding). Indeed, we generally expect effects to be small and variable across people, times, and places. Thus, “p < 0.05 means true” dichotomous reasoning is overly simplistic and contributes to current difficulties in replication. Instead, we endorse a more holistic assessment of model performance—one that proceeds across a variety of settings and metrics and that focuses on estimation of effect sizes and the uncertainty in them. Such an evaluation reveals that the 3PSM actually performs quite well in some respects—even in SSN’s Cases 2-5 and variants thereof in which it is grossly misspecified (i.e., when its assumptions are violated; see Point 6 below).
[For more details in MBH2016, see the Simulation Design and Evaluation Metrics subsection.]

Point 4: The statistical model underlying the p-curve is identical to the model of Hedges, 1984 [H1984]
Both the p-curve and H1984 are one-parameter selection models (1PSM) that make identical statistical assumptions: effect sizes are homogeneous across studies and only studies with results that are statistically significant are “published” (i.e., included in the meta-analysis). Stated another way, the statistical model underlying the two approaches is 100% identical and hence if you accept the assumptions of the p-curve you therefore accept the assumptions of H1984 and vice versa.
The only difference between the two methods is how the single effect size parameter is estimated from the data: H1984 uses principled maximum likelihood estimation (MLE) while p-curve minimizes the Kolmogorov-Smirnov (KS) test statistic. Because MLE (i) possesses a number of mathematical optimality properties, (ii) easily generalizes to more complicated models such as the 3PSM (as well as others even more complicated), and (iii) yields likelihood values, standard errors, and confidence intervals, it falls on SSN to mathematically justify why they view the proposed KS approach to be superior to MLE for psychology data.(2)
[For more details in MBH2016, see the Early Selection Methods and p-methods subsections.]
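
The "same model, two estimators" point can be sketched in a few lines. The code below is a simplified illustration under our own assumptions, not the implementation used in H1984, SNS2014, or MBH2016: it works with z-statistics rather than t-statistics, uses a common sample size of 50 per group, a true effect of d=0.3, and a plain grid search. The function names are hypothetical. Both estimators use the same truncated-normal model for significant studies; the first maximizes its likelihood, the second minimizes the KS distance between the conditional probabilities of the observed results and the uniform distribution.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()
Z_CRIT = 1.96  # significant and directionally consistent: z > 1.96


def simulate_published(d, n_per_group, k, rng):
    """Draw studies until k are significant and positive; return (z, se) pairs."""
    se = math.sqrt(2 / n_per_group)
    kept = []
    while len(kept) < k:
        z = rng.gauss(d / se, 1.0)
        if z > Z_CRIT:
            kept.append((z, se))
    return kept


def neg_log_lik(d, studies):
    """Negative log-likelihood of the 1PSM: normal density truncated at Z_CRIT."""
    total = 0.0
    for z, se in studies:
        delta = d / se
        log_density = -0.5 * (z - delta) ** 2 - 0.5 * math.log(2 * math.pi)
        log_tail = math.log(PHI.cdf(delta - Z_CRIT))  # log P(Z > Z_CRIT)
        total -= log_density - log_tail
    return total


def ks_stat(d, studies):
    """KS distance between conditional CDF values and the uniform distribution."""
    pps = sorted(
        1.0 - PHI.cdf(d / se - z) / PHI.cdf(d / se - Z_CRIT)  # P(Z <= z | Z > Z_CRIT)
        for z, se in studies
    )
    k = len(pps)
    return max(max((i + 1) / k - p, p - i / k) for i, p in enumerate(pps))


def grid_argmin(objective, studies, lo=-0.5, hi=1.0, steps=301):
    """Return the candidate effect size on the grid with the smallest objective."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda d: objective(d, studies))


rng = random.Random(2017)
studies = simulate_published(d=0.3, n_per_group=50, k=200, rng=rng)

naive = sum(z * se for z, se in studies) / len(studies)  # ignores selection
d_mle = grid_argmin(neg_log_lik, studies)
d_ks = grid_argmin(ks_stat, studies)
print(f"naive mean: {naive:.3f}  MLE: {d_mle:.3f}  KS: {d_ks:.3f}")
```

With this many studies and a correctly specified model, the two estimates should land close to each other and to the true value, while the naive mean of the significant studies is pushed upward by selection. Any difference between the two estimators would have to show up in small samples.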

Point 5: Simulations require empirical and mathematical grounding
For a simulation to be worthwhile (i.e., in the sense of leading to generalizable insight), the values of the simulation parameters chosen (e.g., effect sizes, sample sizes, number of studies, etc.) and the data-generating process must reflect reality reasonably well. Further still, there should ideally be mathematical justification of the results. Indeed, with sufficient mathematical justification a simulation is entirely unnecessary and can be used merely to illustrate results graphically.
The simulations in MBH2016 provide ample mathematical justification for the results based on: (i) the optimal efficiency properties of the maximum likelihood estimator (MLE; Simulation 1), (ii) the loss of efficiency resulting from discarding data (Simulation 2), and (iii) the bias which results from incorrectly assuming homogeneity as a consequence of Jensen’s Inequality (Simulation 3). We remain uncertain about the extent to which Cases 2-5 of the SSN simulations reflect reality and thus seek mathematical justification for the generalizability of the results. Nonetheless, they seem of value if viewed solely for the purpose of assessing the 3PSM model estimates when that model is misspecified.
[For more details in MBH2016, see the Simulation Evaluation section.]

Point 6: The 3PSM actually performs quite well in SSN’s simulation—even when misspecified.
Only in Case 1 of the SSN simulation is the 3PSM properly specified (and even this is not quite true as the 3PSM allows for heterogeneity but the simulation assumes homogeneity). SSN show that when the 3PSM is misspecified (Cases 2-5), its Type I error is far above the nominal α=0.05 level. We provide further results in the figures here.
• The blue bars in the left panel of Figure 1 reproduce the SSN result. We also add results for the 1PSM as estimated via KS (p-curve) and MLE (H1984). As can be seen, the Type I error of the 1PSM MLE remains calibrated at the nominal level. In the right panel, we plot estimation accuracy as measured by RMSE (i.e., the typical deviation of the estimated value from the true value). As can be seen, the 3PSM is vastly superior to the two 1PSM implementations in some cases and approximately equivalent to them in the remaining ones.
• In Figure 2, we change the effect size from zero to small (d=0.2); the 3PSM has much higher power and better estimation accuracy as compared to the two 1PSM implementations.
• In Figure 3, we return to zero effect size but add heterogeneity (τ=0.2). The 1PSM has uncalibrated Type I error for all cases while the 3PSM remains calibrated in Case 1; in terms of estimation accuracy, the 3PSM is vastly superior to the two 1PSM implementations in some cases and approximately equivalent to them in the remaining ones.(3)
• In Figure 4, we change the effect size from zero to small and add heterogeneity. The 3PSM generally has similar power and better estimation accuracy as compared to the two 1PSM implementations (indeed, only in Case 1 does the 1PSM have better power but this comes at the expense of highly inaccurate estimates). 
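
For readers who want to see how the two evaluation metrics in these figures are computed, here is a minimal sketch. It is a toy setup of our own, not the simulation behind Figures 1-4: the estimator is simply the mean of 20 unselected study estimates with an assumed standard error of 0.2, the true effect is zero, and the helper names are hypothetical. The same two functions could be applied to the output of any of the methods discussed above.

```python
import math
import random


def rmse(estimates, truth):
    """Root mean squared error of the estimates around the true value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def rejection_rate(z_stats, z_crit=1.96):
    """Share of replications whose two-sided z-test rejects at alpha = 0.05."""
    return sum(abs(z) > z_crit for z in z_stats) / len(z_stats)


def toy_meta_estimate(true_d, se, k, rng):
    """Hypothetical estimator: mean of k unselected study estimates, plus its z."""
    draws = [rng.gauss(true_d, se) for _ in range(k)]
    est = sum(draws) / k
    return est, est / (se / math.sqrt(k))


# Assumed toy settings; the true effect is zero, so every rejection is a false positive.
rng = random.Random(61)
true_d, se, k, reps = 0.0, 0.2, 20, 2000

results = [toy_meta_estimate(true_d, se, k, rng) for _ in range(reps)]
estimates = [est for est, _ in results]
z_stats = [z for _, z in results]

type_i_error = rejection_rate(z_stats)
accuracy = rmse(estimates, true_d)
print(f"Type I error: {type_i_error:.3f}, RMSE: {accuracy:.3f}")
```

The two numbers answer different questions: the rejection rate says how often a method declares an effect when there is none, while RMSE says how far its estimate typically lands from the truth, which remains meaningful when the true effect is not zero.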

In sum, the 3PSM actually performs quite well compared to the two 1PSM implementations—particularly when the focus is on estimation accuracy as is proper; this is especially encouraging given that the 1PSM is correctly specified in all five cases of Figures 1-2 while the 3PSM is only correctly specified in Case 1 of the figures. Although these results favor the 3PSM relative to the two 1PSM implementations, we reiterate our view that selection methods—and indeed any adjustment techniques—should at best be used only for sensitivity analysis.


Footnotes
(1) The same authors have developed a distinct methodology also labelled p-curve that attempts to detect questionable research practices. This note does not comment on that methodology.
(2) Both MLE and KS are asymptotically consistent and thus asymptotically equivalent for the statistical model specified here. Consequently, any justification will likely hinge on small sample properties which can be mathematically intractable for this class of models. Justifications based on robustness to model specification are not germane here because if a different specification were deemed more appropriate, the model would be re-specified according to this more appropriate specification and that model estimated.
(3) A careful reading of SNS2014 reveals that the p-curve is not meant to estimate the population average effect size. As shown here and in MBH2016, it cannot, as no 1PSM can. This is important because we believe that heterogeneous effect sizes (i.e., τ > 0) are the norm in psychology research.


References
[H1984] Hedges, L. V. (1984). Estimation of effect size under nonrandom sampling: The effects of censoring studies yielding statistically insignificant mean differences. Journal of Educational and Behavioral Statistics, 9, 61–85.

[MBH2016] McShane, B.B., Böckenholt, U., and Hansen, K.T. (2016), “Adjusting for Publication Bias in Meta-analysis: An Evaluation of Selection Methods and Some Cautionary Notes.” Perspectives on Psychological Science, 11(5), 730-749.

[SNS2014] Simonsohn, U., Nelson, L.D., and Simmons, J.P. (2014), “p-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results.” Perspectives on Psychological Science, 9(6), 666-681.

Monday, April 17, 2017

Everything is F*cking Nuanced: The Syllabus

Psych 342: Everything is Fucking Nuanced
Prof. Alison Ledgerwood
Class meetings: Ongoing, forever

A common theme in discussions about replicability and improving research practices across scientific disciplines has been debating whether or not science (or a specific scientific discipline) is “in crisis.” The implicit logic seems to be that we have to first establish that there is a crisis before research practices can begin to improve, or conversely, that research practices need not change if there is not a crisis. This debate can be interesting, but it also risks missing the point. Science is hard, reality is messy, and doing research well requires constantly pushing ourselves and our field to recognize where there is room for improvement in our methods and practices.

We can debate how big the sense of crisis should be till the cows come home. But the fact is, whether you personally prefer to describe the current state of affairs as “science in shambles” or “science working as it should,” we have a unique opportunity right now to improve our methods and practices simply because (a) there is always room for improvement and (b) we are paying far more attention to several key problems than we were in the past (when many of the same issues were raised and then all too often ignored; e.g., Cohen, 1992; Greenwald, 1975; Maxwell, 2004; Rosenthal, 1979).

In this class, we will move beyond splashy headlines like “Why most published research findings are false,” “Everything is fucked,”* and “Psychology is in crisis over whether it’s in crisis” to consider the less attention-grabbing but far more important question of Where do we go from here? Along the way, we will learn that the problems we face are both challenging and nuanced, and that they require careful and nuanced solutions.


Week 1: Introduction to the F*cking Nuance: How we got here, and the single most important lesson we can learn going forward
Spellman, B. A. (2015). A short (personal) future history of Revolution 2.0. Perspectives on Psychological Science, 10, 886-899.

Ledgerwood, A. (2016). Introduction to the special section on improving research practices: Thinking deeply across the research cycle. Perspectives on Psychological Science, 11, 661-663.


 

Week 2: Estimating Replicability is F*cking Nuanced

Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.


Etz, A., & Vandekerckhove, J. (2016). A Bayesian Perspective on the Reproducibility Project: Psychology. PLoS ONE 11(2): e0149794. 


Stanley, D. J., & Spence, J. R. (2014). Expectations for replications: Are yours realistic? Perspectives on Psychological Science, 9, 305-318.


Anderson, S. F., & Maxwell, S. E. (2016). There’s more than one way to conduct a replication study: Beyond statistical significance. Psychological Methods, 21, 1.


Week 3: Power is F*cking Nuanced
 

Maxwell, S. E. (2004). The persistence of underpowered studies in psychological research: Causes, consequences, and remedies. Psychological Methods, 9, 147-163.

Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365-376.


Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard power as a protection against imprecise power estimates. Perspectives on Psychological Science, 9, 319-332.


McShane, B. B., & Böckenholt, U. (2014). You cannot step into the same river twice: When power analyses are optimistic. Perspectives on Psychological Science, 9, 612-625.

 

Week 4: Selecting an Optimal Research Strategy is F*cking Nuanced
 

Finkel, E. J., Eastwick, P. W., & Reis, H. T. (in press). Replicability and other features of a high-quality science: Toward a balanced and empirical approach. Journal of Personality and Social Psychology.
 

Miller, J., & Ulrich, R. (2016). Optimizing research payoff. Perspectives on Psychological Science, 11, 664-691.


Week 5: Interpreting Results from Individual Studies is F*cking Nuanced
 

De Groot, A. D. (2014). The meaning of “significance” for different types of research [translated and annotated by Eric-Jan Wagenmakers, Denny Borsboom, Josine Verhagen, Rogier Kievit, Marjan Bakker, Angelique Cramer, Dora Matzke, Don Mellenbergh, and Han LJ van der Maas]. Acta psychologica, 148, 188-194.

Schönbrodt, F. D., & Perugini, M. (2013). At what sample size do correlations stabilize? Journal of Research in Personality, 47, 609-612. 


Ledgerwood, A., Soderberg, C. K., & Sparks, J. (in press). Designing a study to maximize informational value. In J. Plucker & M. Makel (Eds.), Toward a more perfect psychology: Improving trust, accuracy, and transparency in research. Washington, DC: American Psychological Association. (See section on “Distinguishing Exploratory and Confirmatory Analyses.”)


Week 6: Maximizing What We Learn from Exploratory (Data-Dependent) Analyses is F*cking Nuanced


Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11, 702-712.

Sagarin, B. J., Ambler, J. K., & Lee, E. M. (2014). An ethical approach to peeking at data. Perspectives on Psychological Science, 9, 293-304.

Wang, Y., Sparks, J., Gonzales, J., Hess, Y. D., & Ledgerwood, A. (2017). Using independent covariates in experimental designs: Quantifying the trade-off between power boost and Type I error inflation. Journal of Experimental Social Psychology, 72, 118-124.

 

Week 7: The Role of Direct, Systematic, and Conceptual Replications is F*cking Nuanced

Pashler, H., & Harris, C. R. (2012). Is the replicability crisis overblown? Three arguments examined. Perspectives on Psychological Science, 7, 531-536.

Roediger, H. L. (2012). Psychology’s woes and a partial cure: The value of replication. APS Observer, 25, 9.

Fabrigar, L. R., & Wegener, D. T. (2016). Conceptualizing and evaluating the replication of research results. Journal of Experimental Social Psychology, 66, 68-80.

Crandall, C. S., & Sherman, J. W. (2016). On the scientific superiority of conceptual replications for scientific progress. Journal of Experimental Social Psychology, 66, 93-99.


Week 8: Thinking Cumulatively about Evidence is F*cking Nuanced 

Braver, S. L., Thoemmes, F. J., & Rosenthal, R. (2014). Continuously cumulating meta-analysis and replicability. Perspectives on Psychological Science, 9, 333-342.

Tsuji, S., Bergmann, C., & Cristia, A. (2014). Community-augmented meta-analyses: Toward cumulative data assessment. Perspectives on Psychological Science, 9, 661-665.


McShane, B.B. and Böckenholt, U. (2017). Single paper meta-analysis: Benefits for study summary, theory-testing, and replicability. Journal of Consumer Research, 43, 1048-1063.



Week 9: Dealing with Publication Bias in Meta-Analysis is F*cking Nuanced

 
Inzlicht, M., Gervais, W., & Berkman, E. (September 11, 2015). Bias-Correction Techniques Alone Cannot Determine Whether Ego Depletion is Different from Zero: Commentary on Carter, Kofler, Forster, & McCullough, 2015.


McShane, B.B., Böckenholt, U., and Hansen, K.T. (2016). Adjusting for publication bias in meta-analysis: An evaluation of selection methods and some cautionary notes. Perspectives on Psychological Science, 11, 730-749.
 


Week 10: Incentive Structures Need Some F*cking Nuance

Maner, J. K. (2014). Let’s put our money where our mouth is: If authors are to change their ways, reviewers (and editors) must change with them. Perspectives on Psychological Science, 9, 343-351. 


Tullett, A. M. (2015). In search of true things worth knowing: Considerations for a new article prototype. Social and Personality Psychology Compass 9: 188–201.

Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7, 615-631.


Pickett, C. (2017, April 12). Let's Look at the Big Picture: A System-Level Approach to Assessing Scholarly Merit. Retrieved from osf.io/tv6nb
 


Week 11: Keep Reading...



*Note that in contrast to the other two headlines mentioned here, Sanjay’s “Everything is Fucked” title is obviously intentionally hyperbolic for comedic effect. He goes on to write: “What does it mean, in science, for something to be fucked? …In this class we will go a step further and say that something is fucked if it presents hard conceptual challenges to which implementable, real-world solutions for working scientists are either not available or routinely ignored in practice.” His post, as well as the other two articles noted here, raise important issues in thoughtful ways. But if you just focus on the titles, as many people have, you might find yourself sliding into a polarizing argument about how bad things are or aren’t. And this polarizing argument can distract us from the more pressing question of how do we get better, right now, starting today.

Tuesday, January 24, 2017

Why the F*ck I Waste My Time Worrying about Equality


Last week, I spent an enjoyable hour of a conference hanging out with five extremely smart people in remarkably tall chairs on a stage, talking about some of the opportunities and potential pitfalls of social media as a vehicle for scientific discourse. (Video here.)

The conversation was thought-provoking, as conversations with smart people tend to be…I sometimes disagreed with the other speakers, but always found their positions reasonable and often we would realize that we agreed more than we disagreed as we delved further into a topic.

One of the issues that came up early on was the question of gender bias in online discussions of research, because data from a survey of psychologists using social media suggested that there are some pervasive discrepancies between men and women when it comes to participating in scientific discourse on social media as well as with respect to how helpful men and women think participating in social media is for their careers. A comment from a female participant in the open-ended section of the survey summed up a common sentiment: “I just don’t have time for this sh*t!”

Late in the panel discussion that followed, an audience member asked about whether gender bias was apparent in the panel itself—were the male panelists talking more often or longer than the female panelists? Some intrepid coders went back to the video and figured out the answer was almost certainly no. But the fact that this question was even asked seems to have offended some people online, as illustrated by this comment:


At the apparent risk of causing someone to become literally sick, I’m going to take just a moment here to wonder why the fuck I worry about gender equality.

Let’s set aside for a moment the current political context in the United States and why that might make a person especially prone to worrying about gender equality. Let’s talk about just what’s going on in our science these days.

Early on in our panel discussion last Friday, Brian Nosek made the excellent point that “science proceeds through conversation.” He went on to elaborate that scientific conversation needs criticism and skepticism in order to flourish—and I completely agree. But I also think it’s worth juxtaposing this idea that science proceeds through conversation against the data presented at the beginning of the session, which suggested some big inequalities in WHO is participating in scientific discourse online. Across various social media platforms (PsychMAP, PMDG, and Twitter), the data from the SPSP survey suggest that men participate more than women. Moreover, if you look at who is posting in the Facebook forums, it turns out most of the content is being driven by about nine people. Think about that for a moment. NINE people—out of thousands of scholars involved in these forums—are driving what we talk about in these conversations. [UPDATE 1/26/17: The "about nine people" estimate mentioned in the presentation of the SPSP survey was a ballpark estimate of the number of people IN THE SURVEY saying that they post frequently on Facebook methods groups. You could translate this estimate as "about 2% of respondents post frequently," but it should definitely NOT be taken as meaning that only nine people post on social media! The point I was trying to make here was that a very tiny fraction of the field is currently driving the majority of the conversation on these platforms, and that I think we could do better.]

The idea that conversation is central to the entire scientific enterprise highlights why we should care deeply about WHO is participating in these conversations. If there are inequalities in who is talking, that means there are inequalities in who is participating in science itself. To the extent that the forums we build for scientific discourse enable and promote equality in conversation, they are enabling and promoting equality in who can be part of science. And the reverse is true as well: If we create forums that exclude rather than include, then we are creating a science that excludes as well.*

What makes a science exclusionary? Proponents of open science often point (rightly) to things like old boys’ networks and the tendency for established gatekeepers to sometimes prioritize well-known names over merit in publication, funding, or speaker invitation decisions. But there are other factors that influence the exclusiveness or inclusiveness of a science as well. For instance, we know from Amanda Diekman’s work on why women opt out of STEM careers that when a career is perceived as less likely to fulfill communal goals, women are more likely than men to lose interest in the field (see also Sapna Cheryan's research on gendered stereotypes in computer science). Changes that make a field seem more combative and less communal are therefore likely to disproportionately push away women (and indeed anyone who prioritizes communal goals).

Meanwhile, participating in a conversation about science obviously means not only that you are talking, but that someone is listening to you. To the extent that audience attention is finite (we only have so many hours a day to devote to listening, after all), the more one person speaks, the less attention is left over to spend on other speakers. That means that the people who talk the most end up setting the threshold for getting heard: if you don’t comment as loudly or as frequently as the loudest and most frequent contributors, you risk being drowned out in the din. In such an environment, who is talking (that is, who gets to participate in science itself) becomes less of an open, level playing field and more of a competition where people with more time and more willingness to engage in this particular style of discourse get to disproportionately drive the content of scientific conversation.**

Here again, we might think about various demographic inequalities. Take just the question of time: Women in academia tend to spend substantially more time on service commitments than do men. Scholars at teaching institutions spend more time in scheduled teaching activities than do their peers with more flexible schedules at research institutions. Primary caregivers have greater demands on their time than people with stay-at-home partners or people with the means to pay for full time childcare. If we create venues for scientific discourse where your ability to participate effectively depends on how much time you have to make your voice heard over the din, then we are effectively saying: We prioritize the voices of men more than women, of scholars at research rather than teaching institutions, and of people with more versus less childcare support.

So to those who keep saying why worry about inequality in scientific discourse, just so you know, this is what it sounds like you are saying: Why worry about inequality, because the existing inequalities don’t bother me. I’m fine with them. I’m okay with our science excluding some groups more than others. I’d like to focus on other things instead, and let psychology become more like other STEM fields in terms of what they look like demographically.

And you know what? You are totally entitled to that opinion.

And I am entitled to mine. Which is, in a nutshell: Fuck that.


--
*Note that I'm talking about any forum for scientific discourse, not just social media. For the record, I think social media offers some amazing opportunities for increasing inclusiveness in science. And I think that with some careful attention and creativity, we could maximize those benefits while mitigating some of the issues I raise here. (Here's an example of one recent attempt to do that.)

**Again, this issue is not remotely unique to social media...it's true of lab meetings, conference panels, publishing in traditional journals with limited page space, you name it.

Monday, September 5, 2016

The Only Heuristic You'll Ever Need


I don’t know about you, but when the shouting gets shouty, I like to wrap myself in a warm blanket of thoughtful nuance. Fortunately, I have here in front of me a set of six manuscripts that do exactly that, and they are headed your way in the latest special section on improving research practices in the forthcoming September issue of Perspectives on Psychological Science. 

I have talked before about the tendency for humans to love a good cognitive shortcut, and I suspect that cognitive shortcuts act as both antecedents to and consequences of the shouting matches that sometimes erupt in the ongoing conversation on research practices. One of my favorite drinking games these days* is to take a shot every time somebody claims “everyone knows X” or “nobody is arguing Y” or “I don’t think anyone would do Z.” It turns out that this is a prime example of the false consensus effect—a heuristic that leads people to overestimate the extent to which other people share their own beliefs, preferences, and behaviors. We tend to use our own beliefs and behaviors as a guesstimate and generalize from there.

Meanwhile, if I simplify the landscape of perspectives into two sides, I’m more likely to perceive the “other” side as unified, homogeneous, and extreme in their positions, and I contribute in turn to other people’s perceptions that there are only two sides. These and other heuristics tend to sink us further into polarizing arguments and unhelpful finger-pointing, and impede our ability to have constructive discussions, learn from each other, change our own minds, and build consensus.

Moreover, cognitive shortcuts also played a major role in creating the problems with our methods and practices that we are now confronting (p < .05, anyone?). As I note in my introduction to our new special section (available here, in UC’s open access repository, if you’d like a sneak peek): The single most important lesson we can draw from our past in this respect is that we need to think more carefully and more deeply about our methods and our data. Heuristics got us into this mess. Careful thinking will help get us out. The only heuristic you'll ever need in science is this: Don't rely on heuristics. 

And this is why the papers in this special section feel like a warm blanket of thoughtful nuance to me: Together, they highlight the importance of thinking carefully at each phase of the research process, from selecting among multiple possible research strategies, to analyzing one’s data, to aggregating across multiple studies to build a more comprehensive picture of a given topic area.

They hammer home the importance of thinking carefully about tradeoffs when choosing one research strategy over another (e.g., running fewer studies with larger samples or more studies with smaller samples), echoing and building on recent calls to fully consider both the pros and cons of a given research strategy when seeking to design smart changes for one’s own lab or for the field as a whole (see, e.g., Finkel, Eastwick, & Reis, in press; Gelman, 2013; Ledgerwood, Soderberg, & Sparks, in press). They push us to more carefully examine and transparently communicate the assumptions we make when we analyze our data. And they unpack some of the idealized assumptions underlying various meta-analytic techniques (including p-curve and p-uniform, as well as traditional methods) and show us what happens when those assumptions are violated, as they often are in the real world. (Don’t worry, there’s a better way to do meta-analysis, and the last article in the special section explains how.)

Most importantly, the articles all provide concrete advice both on how we can be more careful and more transparent about the assumptions we make throughout the research process, and on how we can continue to improve our research practices in a thoughtful, smart, and nuanced way. 

So if you’re feeling tired of the shouting, and you’re ready for some nuance, stay tuned: The following articles are coming your way, open access, very shortly.
 

*Just kidding!**

**Or am I?