Your science is justified by the question, not the answer

Registration of clinical trials, prior to any patient recruitment, is now common. Though trial registrations often omit important details of their proposed analyses (despite advice to the contrary), most trialists seem to agree, at least in principle, that you should transparently describe your plans for clinical trial data before they are collected. Unfortunately, this remains a foreign concept in other areas of clinical and public health research.

Before I started working as a biostatistician involved in clinical trials, I worked with birth cohort studies. Whereas the vast majority of clinical trials are designed to answer a single research question (or small set of them), the prospective cohort study design allows researchers to answer many questions, even thousands, including questions that nobody envisioned at the start of the study. Given the costs of recruiting and following up hundreds or thousands of study participants, many believe that it’s our responsibility to extract as much value from these data as we can. However, this well-intentioned mindset can quickly be distorted to promote toxic research practices, where perverse incentives to publish, combined with the poor overall quality of peer review, have created a body of flawed, often useless, research with results affected by various forms of p-hacking (intentional or otherwise) and file-drawer effects.

I imagine that most people reading this will be well aware of these issues, but the world of cohort studies has done little, in my opinion, to address them. Pre-registration of planned data analyses, as many others have repeatedly suggested, would go a long way toward solving these problems. The key questions then are where people should register their analyses, and how we can realistically compel them to do so.

Research funders have played an important role in the open access research movement, through decree, and by covering the costs of open access publishing. They must play a similar role in the pre-registration of data analyses. I believe that any new cohort study should be required by the funder to maintain a transparent, current list of the analyses they plan to conduct. Many studies already use some form of publication committee to plan research outputs, so this would be a natural extension of their work. These study-specific registries could be hosted using a variety of online, easy-to-use, low-cost tools that provide a timestamp and DOI. Most importantly, funders should then require researchers to report progress on the registered analyses as a part of their regular reporting processes (hopefully replacing the less useful information that is often asked for). Subsequent manuscripts should then reference the respective analysis registration(s), and take the opportunity to justify any departures in the supplemental material. I also suspect such public registries would help identify helpful collaborators, and help justify cases where researchers might prefer not to share the study data yet.

While I don’t see a substantial logistical challenge to cohort studies maintaining their own registry of planned data analyses, I suspect that many reading this will think that it’s impossible nonetheless.

This is why: I continue to collaborate on cohort studies that I didn’t design, where my involvement is typically sought long after the data have been collected. I will typically ask my colleagues to first send me a draft of the introduction and the methods of recruitment and measurement, as if we were preparing a manuscript. I then tell them that I will add the appropriate statistical methods and a dispassionate summary of the results, including the tables and plots, after which we will plan to meet minds, outline the discussion, and start editing the paper. This request almost always causes discomfort, even when everyone acknowledges it’s just a process to move things along. People always want to see the results first. In my opinion, this is because we have adopted a view of science where the justification for a research project is given by the results. But this is completely backwards! It’s the question that drives us. Is it important enough to answer? And can it be answered in a scientific manner?

In other words, you should be able to start drafting a manuscript, using exactly what went into your grant and/or ethics application, before the data are collected, let alone analysed. This shouldn’t cause any discomfort at all. This is the standard we should all strive to attain. This propensity to want to see the results first and then spin the paper (and that’s exactly what it is – spin) is harming science, and those who want to discredit science are starting to exploit it.

“It is difficult to get a man to understand something, when his salary depends upon his not understanding it!” – Upton Sinclair