
Kernel Choice: The Unsung Hero of Accurate Boundary Inference

"Why selecting the right kernel can dramatically improve the reliability of local polynomial density estimation."


In various fields, including economics, statistics, and social sciences, understanding probability density at boundary points is crucial for accurate analysis. For example, economists study wage distributions at the lower tail to understand poverty and inequality. In regression discontinuity (RD) analysis, testing for manipulation often involves examining the density of a 'running' variable at a specific threshold. A popular tool for this is the local polynomial density (LPD) estimator.

The LPD estimator, known for desirable features such as automatic boundary adaptation, has become a go-to method for manipulation testing in RD designs, and its widespread adoption is evident in numerous papers in empirical economics and related fields. However, despite this popularity, the LPD estimator sometimes exhibits poor numerical performance at boundaries, leading to unreliable results.

Recent research reveals that kernel selection has a profound impact on the performance of LPD estimators at boundary points. Commonly used kernels with compact support lead to larger variances, both asymptotically and in finite samples. This negatively affects manipulation testing, often resulting in low statistical power. Conversely, using a kernel function with unbounded support, such as the spline-type kernel (the Laplace density), can significantly improve accuracy and power.
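To see what compact support means in practice, here is a minimal Python sketch (our illustration, not code from the paper) comparing the triangular kernel, a common compactly supported choice, with the spline-type Laplace kernel. The triangular kernel assigns exactly zero weight to any observation more than one bandwidth from the evaluation point, while the Laplace kernel keeps a strictly positive, exponentially decaying weight everywhere:

```python
import numpy as np

def triangular(u):
    """Triangular kernel: compact support, zero outside [-1, 1]."""
    return np.maximum(1.0 - np.abs(u), 0.0)

def laplace(u):
    """Spline-type kernel (Laplace density): positive weight everywhere."""
    return 0.5 * np.exp(-np.abs(u))

# Scaled distances u = (x - evaluation point) / bandwidth
u = np.array([0.0, 0.5, 1.0, 1.5, 3.0])
print("u         :", u)
print("triangular:", np.round(triangular(u), 3))  # zero beyond |u| = 1
print("laplace   :", np.round(laplace(u), 3))     # small but never zero
```

Near a boundary this difference matters: with a compact kernel, only the few observations inside one bandwidth carry any information, whereas the Laplace kernel lets every observation contribute at least a little.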

Why Kernel Choice Matters: Beyond the Standard Assumptions


Kernel selection often takes a backseat in discussions of kernel-based estimators, on the general belief that it doesn't significantly impact performance. For LPD estimators at boundary points, however, this assumption doesn't hold. Compactly supported kernels, the common choice in practice, lead to larger asymptotic and finite-sample variances. This translates into a non-negligible efficiency loss, making the estimator less reliable.

One critical drawback of pairing compactly supported kernels with LPD estimators is the lack of a finite variance in finite samples. This is not just a theoretical concern; it can severely undermine the accuracy and reliability of manipulation tests in RD designs. Manipulation testing makes matters worse, since manipulation itself thins out observations on one side of the cutoff, potentially leaving the test with low power even when the discontinuity is large.

  • Asymptotic Efficiency: Contrary to common assumptions, the asymptotic efficiency of the LPD estimator is heavily influenced by the kernel choice. Compactly supported kernels have a larger asymptotic variance than spline-type kernels.
  • Finite Sample Variance: LPD estimators using compactly supported kernels do not have a finite variance in finite samples, a property inherited from local polynomial techniques.
  • Impact on Manipulation Testing: Undesirable variance properties can significantly lower the power of manipulation tests in RD designs, especially when sample sizes are small near the cutoff.

Theoretical and numerical investigations suggest that using non-compactly supported kernel functions, such as the spline-type kernel, is preferable for boundary inference with LPD estimators. This approach offers a simple yet powerful remedy to improve efficiency, power, and overall numerical performance. The spline-type kernel's ability to assign larger weights near the boundary reduces bias and improves the accuracy of the estimation.
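The simulation sketch below makes this concrete. It implements a simplified version of the LPD idea, regressing the empirical CDF on a local polynomial in (x - x0) with kernel weights so that the slope coefficient estimates the density, and then compares the spread of boundary estimates under the two kernels. The function names, the exponential data-generating process, and the fixed bandwidth are our illustrative assumptions, not the paper's setup:

```python
import numpy as np
from math import factorial

def lpd_estimate(x, x0, h, kernel, p=2):
    """Simplified LPD estimate at x0: weighted least squares of the
    empirical CDF on a polynomial in (x - x0); the coefficient on the
    linear term estimates the density f(x0)."""
    n = len(x)
    ecdf = np.empty(n)
    ecdf[np.argsort(x)] = np.arange(1, n + 1) / n       # ECDF at each X_i
    w = kernel((x - x0) / h)                            # kernel weights
    Z = np.column_stack([(x - x0) ** k / factorial(k) for k in range(p + 1)])
    Zw = Z * w[:, None]                                 # weighted design
    beta = np.linalg.solve(Z.T @ Zw, Zw.T @ ecdf)       # weighted LS fit
    return beta[1]                                      # slope term ~ f(x0)

triangular = lambda u: np.maximum(1.0 - np.abs(u), 0.0)  # compact support
laplace = lambda u: 0.5 * np.exp(-np.abs(u))             # unbounded support

rng = np.random.default_rng(0)
n, h, reps = 500, 0.5, 300
results = {"triangular": [], "laplace": []}
for _ in range(reps):
    data = rng.exponential(1.0, size=n)   # boundary at 0, true f(0) = 1
    for name, k in (("triangular", triangular), ("laplace", laplace)):
        results[name].append(lpd_estimate(data, 0.0, h, k))
for name, est in results.items():
    print(f"{name:10s} mean = {np.mean(est):.3f}  sd = {np.std(est):.3f}")
```

Comparing the two reported standard deviations is the point of the exercise: the compactly supported kernel should show the larger spread at the boundary, in line with the theory described above.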

The Path Forward: Kernel Selection as a Key Tool

Kernel selection is a critical aspect of boundary inference when using the LPD estimator. Choosing the right kernel function, particularly one with non-compact support like the spline-type kernel, can significantly improve estimation and inference accuracy. This has far-reaching implications, particularly for manipulation testing in RD designs, where accurate and powerful tests are essential for reliable policy and research conclusions.
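To connect this to manipulation testing, the stylized sketch below (reusing lpd_estimate, triangular, and laplace from the previous example, with a made-up data-generating process) estimates the running-variable density separately on each side of a cutoff and reports the gap. This illustrates the mechanics only; a genuine test, such as those implemented in the rddensity package, divides the gap by a valid standard error, and the paper's message is that the Laplace kernel makes that standard error smaller:

```python
import numpy as np
# Assumes lpd_estimate, triangular, and laplace from the previous sketch.

rng = np.random.default_rng(1)
cutoff = 0.0
# Stylized sample whose density jumps at the cutoff, as manipulation would:
below = rng.uniform(-1.0, 0.0, size=350)   # true density 0.35 just left of 0
above = rng.uniform(0.0, 1.0, size=650)    # true density 0.65 just right of 0
n_tot = len(below) + len(above)

h = 0.4
for name, k in (("triangular", triangular), ("laplace", laplace)):
    # One-sided boundary estimates, rescaled from subsample to full-sample scale
    f_minus = lpd_estimate(below, cutoff, h, k) * len(below) / n_tot
    f_plus = lpd_estimate(above, cutoff, h, k) * len(above) / n_tot
    print(f"{name:10s} f-(0) = {f_minus:.2f}  f+(0) = {f_plus:.2f}  "
          f"gap = {f_plus - f_minus:.2f}")
```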

About this Article

This article was crafted using a collaborative human-AI approach. AI assisted our team with initial drafting, research insights, identifying key questions, and image generation. Our human editors guided topic selection, defined the angle, structured the content, ensured factual accuracy and relevance, refined the tone, and conducted thorough editing to deliver helpful, high-quality information. See our About page for more information.

This article is based on research published under:

DOI: https://doi.org/10.48550/arXiv.2306.07619

Title: Kernel Choice Matters For Boundary Inference Using Local Polynomial Density: With Application To Manipulation Testing

Subject: econ.EM

Authors: Shunsuke Imai, Yuta Okamoto

Published: 13-06-2023

Everything You Need To Know

1. What is the role of the Local Polynomial Density (LPD) estimator in the context of boundary inference?

The Local Polynomial Density (LPD) estimator is a key tool used for boundary inference, particularly in fields like economics and statistics. It helps in understanding probability density at boundary points. For example, it is used to study wage distributions at the lower tail to understand poverty and inequality, and it is used in regression discontinuity (RD) analysis to test for manipulation by examining the density of a 'running' variable at a specific threshold. The LPD estimator's ability to adapt to boundaries makes it a popular choice in many empirical studies.

2. Why is kernel selection so important for the Local Polynomial Density (LPD) estimator, especially at boundaries?

Kernel selection significantly impacts the performance of the Local Polynomial Density (LPD) estimator at boundary points. Contrary to the common assumption that kernel choice doesn't greatly affect performance, for LPD estimators the choice of kernel has a profound effect: commonly used compactly supported kernels lead to larger variances, both asymptotically and in finite samples. Non-compactly supported kernel functions, such as the spline-type kernel, are therefore preferable for boundary inference with LPD estimators.

3. What are the drawbacks of using compactly supported kernels with the Local Polynomial Density (LPD) estimator?

Compactly supported kernels with the Local Polynomial Density (LPD) estimator have several drawbacks. One critical issue is the lack of finite variance in finite samples. This characteristic can severely undermine the accuracy and reliability of manipulation tests. The asymptotic efficiency of the LPD estimator is also affected, with compactly supported kernels leading to a larger asymptotic variance compared to spline-type kernels. This translates to reduced statistical power in manipulation tests, especially when sample sizes are small near the cutoff.

4. How does the choice of kernel function influence the accuracy and power of manipulation tests in regression discontinuity (RD) designs?

The choice of kernel function significantly impacts the accuracy and power of manipulation tests in regression discontinuity (RD) designs. Using compactly supported kernels can lead to lower power due to larger variances, making it harder to detect manipulation. In contrast, employing a kernel function with unbounded support, like the spline-type kernel, improves accuracy and power. The spline-type kernel assigns larger weights near the boundary, reducing bias and improving the accuracy of the estimation, which is essential for reliable policy and research conclusions.

5. What are the key advantages of using a spline-type kernel (Laplace density) with the Local Polynomial Density (LPD) estimator for boundary inference?

The spline-type kernel (Laplace density) offers several key advantages when used with the Local Polynomial Density (LPD) estimator for boundary inference. It significantly improves estimation and inference accuracy compared to compactly supported kernels. This is due to its ability to assign larger weights near the boundary, reducing bias and improving the accuracy of the estimation. Furthermore, the spline-type kernel enhances the power of manipulation tests in regression discontinuity (RD) designs. This results in more reliable tests, supporting sound policy and research decisions.
