Current Research
This page provides an overview of the chair's current research. Working papers are available on request.
Twofold nested error regression models with data-driven transformation
Kyalo, R.K.; Schmid, T.; Würz, N.
Abstract: Small area estimation effectively addresses the issue of small sample sizes within sub-populations. Typically, the target population is divided into multiple nested hierarchical levels, such as counties and sub-counties. A twofold nested error regression model with area and sub-area random effects captures the variability across these levels. For estimating non-linear indicators like poverty measures, the twofold empirical best predictor (EBP) model can be used. The model relies on normality assumptions for the error terms, a condition often unmet in real data applications. This research enhances the twofold nested error regression model by incorporating data-driven transformations, improving the model's robustness and flexibility. Mean squared error (MSE) estimation is performed using resampling methods. Model-based simulations compare the proposed model's performance with onefold EBP methods that include either area or sub-area random effects. Results show that the proposed twofold EBP method with data-driven transformation adapts to the distribution shape, thereby providing more efficient estimates than a fixed logarithmic transformation or no transformation. Finally, the twofold EBP with data-driven transformation is used to generate poverty estimates for rural and urban regions within Kenyan counties, offering a more nuanced and accurate assessment of poverty levels.
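For orientation, a twofold nested error regression model with a transformed response is commonly written as below. The notation is an assumption made for illustration and is not taken from the paper: i indexes areas, j sub-areas within an area, k units within a sub-area, and T_λ is a transformation whose parameter λ is estimated from the data.

```latex
T_\lambda(y_{ijk}) = \mathbf{x}_{ijk}^\top \boldsymbol{\beta} + u_i + v_{ij} + e_{ijk},
\qquad
u_i \sim N(0, \sigma_u^2), \quad
v_{ij} \sim N(0, \sigma_v^2), \quad
e_{ijk} \sim N(0, \sigma_e^2),
```

with all random terms assumed mutually independent. Dropping either u_i or v_{ij} gives a onefold model with only sub-area or only area random effects, respectively.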
Small Area Estimation under limited auxiliary data and complex survey data
Neef, S.; Schmid, T.; Würz, N.
Abstract: The paper proposes an Empirical Best Linear Unbiased Predictor that allows for fixed and data-driven transformations under limited auxiliary data while simultaneously adjusting for complex survey designs. Fixed or data-driven transformations are a common method for reducing the skewness of a variable. However, when the area mean is calculated and only limited auxiliary data are available, first- and second-order biases are introduced due to Jensen’s inequality (Würz et al., 2022). Disregarding a complex survey design introduces additional bias. By incorporating the design weights into the estimation and using kernel density estimation (KDE), we aim to reduce this bias. Furthermore, we propose a weighted bootstrap estimator to quantify the variance. The method is applied to data from the Socio-Economic Panel and evaluated in a model-based simulation study. We aim to show that, compared to established methods such as the (weighted) EBP, the proposed estimator produces similar results while requiring less information.
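The bias from Jensen's inequality mentioned in the abstract can be illustrated with a minimal sketch for the simplest case, a fixed log transformation. This is not the estimator proposed in the paper: the data are simulated, and the parameter values `mu` and `sigma` are hypothetical.

```python
import numpy as np

# Hypothetical skewed variable: log(y) ~ N(mu, sigma^2), so y is log-normal.
rng = np.random.default_rng(seed=1)
mu, sigma = 7.0, 0.8
y = rng.lognormal(mean=mu, sigma=sigma, size=200_000)

true_mean = np.exp(mu + sigma**2 / 2)  # exact mean of a log-normal variable
log_y = np.log(y)

# Naive back-transformation: exp() of the mean on the log scale.
# By Jensen's inequality this underestimates the mean on the original scale.
naive = np.exp(log_y.mean())

# Bias-corrected back-transformation, valid under normality on the log scale.
corrected = np.exp(log_y.mean() + log_y.var(ddof=1) / 2)

print(f"true mean: {true_mean:.1f}")
print(f"naive:     {naive:.1f}")
print(f"corrected: {corrected:.1f}")
```

The correction term here relies on normality on the transformed scale; for data-driven transformations and limited auxiliary data, the adjustment is more involved.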
Alternative Selection Mechanisms in Online Surveys
Prücklmair, F.; Rendtel, U.
Abstract: The rapid changes in internet accessibility and the evolving ways it is used could raise doubts about whether earlier findings on selection effects in internet surveys still hold true. We address the following questions: 1) Is selection by mere internet access still a reasonable self-selection criterion? 2) If this is no longer true, how should alternative self-selection processes be modeled? 3) How can these selection processes be controlled? Are demographic control variables sufficient to establish the Missing at Random (MAR) condition, or is it possible to establish the MAR condition with more powerful control variables? 4) To what extent do the weighting procedures correct a self-selection bias? We investigate these questions in a simulation study in which we assume four different selection models. These involve the length of internet use, posting behavior, and interest in politics and are based on theoretical considerations. We use the European Social Survey (ESS) as a simulation environment, which contains these variables and demographic background variables. It also includes our outcome variable, the vote in the 2017 Bundestag election in Germany. To judge the differences between the non-probability results and the simulated universe, we compare the ESS estimates for the 2017 Bundestag election with the actual election results.
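The role of control variables and weighting in question 3) and 4) can be sketched with a toy simulation. It does not reproduce the ESS data or the four selection models of the paper: the population, the single control variable `heavy_user`, and all probabilities are hypothetical. Selection depends only on the control variable, so the MAR condition holds given that variable and post-stratification can remove the bias.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 100_000

# Hypothetical population: one binary control variable and a binary outcome.
heavy_user = rng.random(n) < 0.4              # e.g. long-time internet users
p_outcome = np.where(heavy_user, 0.30, 0.60)  # outcome depends on the control
outcome = rng.random(n) < p_outcome

# Self-selection depends only on the control variable (MAR given heavy_user).
p_select = np.where(heavy_user, 0.20, 0.05)
selected = rng.random(n) < p_select

pop_mean = outcome.mean()
naive_mean = outcome[selected].mean()  # unweighted estimate from the sample

# Post-stratification: weight each sample stratum by its population share.
weighted_mean = 0.0
for group in (True, False):
    in_group = heavy_user == group
    weighted_mean += in_group.mean() * outcome[selected & in_group].mean()

print(f"population: {pop_mean:.3f}")
print(f"naive:      {naive_mean:.3f}")
print(f"weighted:   {weighted_mean:.3f}")
```

If selection also depended on the outcome itself, or on a variable missing from the weighting, the weighted estimate would remain biased.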
Area-level small area estimation with random forests
Harmening, S.; Lee, Y.; Runge, M.; Schmid, T.
Abstract: This paper presents the area-level mixed-effects random forest, an approach that combines a small area estimation model with tree-based methods to provide a solution when only area-level data are available. In particular, the linear regression synthetic part of the Fay-Herriot model is replaced by a random forest to link survey data with related administrative information or data from other sources. By using a random forest, possible interactions and nonlinear relationships are accounted for, and automatic variable selection and robustness to outliers are indirectly provided as a property of the random forest. To obtain point estimates for an indicator of interest, the familiar structure of the Fay-Herriot estimator is retained. The estimation is done by implementing an expectation maximization algorithm. To determine the uncertainty of the point estimator, a nonparametric bootstrap method for estimating the mean squared error is presented. The use of data transformations like the log transformation is investigated in the context of machine learning methods. In particular, a log transformation is applied to the direct estimates and, due to the nonlinearity of the logarithm, the final mixed-effects random forest point estimates and mean squared error estimates are back-transformed to the original scale with a bias correction. To evaluate the accuracy and precision of the proposed estimator and its uncertainty measure, model-based simulations are carried out. The presented methodology is illustrated by using household survey and remote sensing data from Mozambique to estimate average per capita consumption at a km grid-level.
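The "familiar structure of the Fay-Herriot estimator" is a weighted average of the direct estimate and a synthetic prediction, with weights driven by the sampling variance. The sketch below shows only this shrinkage step under simplifying assumptions: the function name `fay_herriot_predict` is hypothetical, the synthetic predictions are passed in as given numbers (in the paper they would come from a random forest), and the random-effect variance `sigma2_u` is treated as known rather than estimated by the expectation maximization algorithm.

```python
import numpy as np

def fay_herriot_predict(direct, sampling_var, synthetic, sigma2_u):
    """Combine direct and synthetic estimates with Fay-Herriot shrinkage.

    direct       : direct survey estimates per area
    sampling_var : known sampling variances of the direct estimates
    synthetic    : model-based predictions per area (assumed given here)
    sigma2_u     : variance of the area random effect (assumed known here)
    """
    direct = np.asarray(direct, dtype=float)
    sampling_var = np.asarray(sampling_var, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)

    # Shrinkage factor: close to 1 when the direct estimate is precise.
    gamma = sigma2_u / (sigma2_u + sampling_var)
    estimate = gamma * direct + (1.0 - gamma) * synthetic
    return estimate, gamma

# Two hypothetical areas: the first has a precise direct estimate,
# the second a noisy one that is pulled towards the synthetic prediction.
est, gamma = fay_herriot_predict(
    direct=[10.0, 20.0],
    sampling_var=[1.0, 9.0],
    synthetic=[12.0, 14.0],
    sigma2_u=3.0,
)
print(gamma)  # shrinkage factors 0.75 and 0.25
print(est)    # combined estimates 10.5 and 15.5
```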
Small area estimation with generalized random forests: Estimating poverty rates in Mexico
Frink, N.; Schmid, T.
Abstract: Identifying and addressing poverty is challenging in administrative units with limited information on income distribution and well-being. To overcome this obstacle, small area estimation methods have been developed to provide reliable and efficient estimators at disaggregated levels, enabling informed decision-making by policymakers despite the data scarcity. From a theoretical perspective, we propose a robust and flexible approach for estimating poverty indicators based on binary response variables within the small area estimation context: the generalized mixed effects random forest. Our method employs machine learning techniques to identify predictive, non-linear relationships from data, while also modeling hierarchical structures. Mean squared error estimation is explored using a parametric bootstrap. From an applied perspective, we examine the impact of information loss due to converting continuous variables into binary variables on the performance of small area estimation methods. We evaluate the proposed point and uncertainty estimates in both model- and design-based simulations. Finally, we apply our method to a case study revealing spatial patterns of poverty in the Mexican state of Tlaxcala.
Gradient Boosting for Hierarchical Data in Small Area Estimation
Messer, P.; Schmid, T.
Abstract: This paper introduces Mixed Effect Gradient Boosting (MEGB), which combines the strengths of Gradient Boosting with Mixed Effects models to address complex, hierarchical data structures often encountered in statistical analysis. The methodological foundations, including a review of the Mixed Effects model and the Extreme Gradient Boosting method, are presented in detail and lead to the introduction of MEGB. The paper highlights how MEGB can derive area-level mean estimates from unit-level data and calculate Mean Squared Error (MSE) estimates using a nonparametric bootstrap approach. The pa
