Are the Gatekeepers Failing? Rethinking How We Validate Financial Research
"A new study questions whether current statistical standards are enough to prevent false discoveries in financial research, suggesting a need to focus on more reliable methods."
Imagine you've stumbled upon a groundbreaking new factor that seems to predict stock returns. Exciting, right? But how do we know if this 'factor' is real or just a statistical fluke? For decades, researchers have relied on a simple rule: if the factor's 't-statistic' (a measure of its statistical significance) exceeds 1.96, it's considered a genuine discovery. This threshold has acted as a gatekeeper, separating true insights from false positives.
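To make the gatekeeper concrete, here is a minimal sketch of how such a test is typically run. The returns are simulated and the portfolio is hypothetical; the only thing taken from the research convention is comparing the t-statistic against 1.96, the 5% two-sided critical value.

```python
import numpy as np

# Hypothetical monthly returns (in %) of a long-short portfolio built on the new factor.
rng = np.random.default_rng(0)
monthly_returns = rng.normal(loc=0.25, scale=3.0, size=360)  # 30 years of simulated data

# t-statistic for the null hypothesis "the factor's mean return is zero"
mean_ret = monthly_returns.mean()
std_err = monthly_returns.std(ddof=1) / np.sqrt(len(monthly_returns))
t_stat = mean_ret / std_err

# The traditional gatekeeper: |t| > 1.96, i.e. significance at the 5% level (two-sided)
print(f"t-statistic: {t_stat:.2f}")
print("Clears the 1.96 hurdle" if abs(t_stat) > 1.96 else "Does not clear the hurdle")
```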
However, some experts are starting to question whether this gatekeeper is doing its job effectively. They argue that the 1.96 hurdle might be too low, leading to a flood of published findings that don't hold up under scrutiny. This has sparked a debate about raising the bar – increasing the stringency of statistical tests to reduce the number of false discoveries in academic publications.
A recent study throws a wrench into this debate, suggesting that simply raising the statistical hurdle might not be the answer. The author, Andrew Y. Chen, argues that the pool of published findings is a biased sample of the research that was actually run, which makes it difficult to empirically justify raising the existing standard. This article breaks down Chen's findings and explores the complexities of validating financial research.
The Problem with Raising the Bar: Why Higher T-Statistic Hurdles Are Hard to Justify

Chen's research highlights a critical issue: publication bias. The academic world tends to favor statistically significant results, meaning that studies finding no effect or a weak effect are less likely to be published. This creates a skewed picture, where the existing literature overrepresents successful findings and underrepresents those that 'failed' to meet the statistical hurdle.
- Unseen Data: Researchers can only directly analyze published results; studies that failed to clear the hurdle, or were never written up at all, remain invisible.
- Extrapolation Challenges: To properly adjust statistical hurdles, we need to understand the distribution of both published and unpublished results. However, unpublished data is, by definition, unobserved. Estimating the characteristics of this missing data requires extrapolation, which can be unreliable.
- Weak Identification: Because of this reliance on extrapolation, Chen argues that attempts to empirically justify raising t-statistic hurdles suffer from weak identification. In other words, the available data doesn't provide enough information to confidently determine the optimal hurdle; the toy simulation below illustrates why.
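A toy simulation makes the identification problem visible. Everything here is invented for illustration (the number of candidate factors, the share that are "real", the effect sizes); the point is simply that the hurdle itself hides exactly the data one would need in order to recalibrate the hurdle.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical factor zoo: 1,000 candidate factors, most with no true effect.
n_factors = 1_000
true_effect = np.where(rng.random(n_factors) < 0.2, 3.0, 0.0)  # 20% are "real" (assumed)
observed_t = true_effect + rng.standard_normal(n_factors)      # noisy t-statistics

# Publication bias: only results clearing the 1.96 hurdle reach the journals.
published = observed_t[np.abs(observed_t) > 1.96]
unpublished = observed_t[np.abs(observed_t) <= 1.96]

print(f"Published t-stats:   n = {len(published):4d}")
print(f"Unpublished t-stats: n = {len(unpublished):4d}")

# A researcher sees only `published`. Choosing a better hurdle requires knowing
# the shape of `unpublished` -- the part of the distribution that is, by
# construction, never observed and can only be guessed at by extrapolation.
```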
A New Path Forward: Focusing on What We Can See
So, if raising t-statistic hurdles isn't the answer, what is? Chen's research offers a more promising alternative: focusing on statistical methods that target only published findings. These methods, such as empirical Bayes shrinkage and False Discovery Rate (FDR) control, can be strongly identified because they rely only on data that is actually observed.
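As a rough illustration of a published-data-only tool, the sketch below applies the Benjamini-Hochberg false discovery rate procedure to a handful of invented t-statistics from "published" factors, using a normal approximation for the p-values. This is not Chen's estimator (his methods also model the publication threshold itself), but it shows the basic point: everything the procedure needs is observable.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical t-statistics of *published* factors -- the only data we observe.
published_t = np.array([1.97, 2.0, 2.05, 2.2, 2.6, 3.1, 3.9, 4.8])

# Two-sided p-values under a normal approximation to the null distribution.
p_values = 2 * norm.sf(np.abs(published_t))

# Benjamini-Hochberg step-up: find the largest k with p_(k) <= (k/m) * q,
# then reject the k smallest p-values. This controls the FDR at level q.
q = 0.01
m = len(p_values)
order = np.argsort(p_values)
sorted_p = p_values[order]
passes = sorted_p <= (np.arange(1, m + 1) / m) * q
k = passes.nonzero()[0].max() + 1 if passes.any() else 0

rejected = np.zeros(m, dtype=bool)
rejected[order[:k]] = True

for t, keep in zip(published_t, rejected):
    print(f"t = {t:.2f}  ->  {'survives' if keep else 'does not survive'} FDR control at q = 1%")
```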