# Statistics Ghostwriting | Data Science, Big Data and Data Variety Ghostwriting and Exam-Taking | DATA100

## Leveraging Machine Learning Algorithms for Finite Population

Breidt and Opsomer (2017) explored how MLMs can be used as the basis of the working models in the model-assisted approach to design-based estimation for probability samples. They provided a general framework within which predictions from various MLMs can be incorporated and used to derive finite population estimates and inference. They specifically illustrated how methods like $k$-nearest neighbors, CARTs and neural networks could be used within this framework.

More recently, Buelens et al. (2018) explored similar approaches for using MLMs for inference from nonprobability samples. Their work compares quasi-randomization methods to model-based methods (or super-population models, as more formally explained by Elliott and Valliant (2017)), where various MLMs are used in creating the models. More specifically, Buelens et al. (2018) compared sample mean estimation, quasi-randomization pseudo-weighting based on poststratification on an auxiliary variable known for the entire population, generalized linear models, and a host of MLMs including $k$-nearest neighbors, ANNs, regression trees and support vector machines as the basis of generating model-based estimates.

They tuned each of the MLMs using a repeated split-sample scenario based on 10 bootstrap replications, and the optimal values of the respective tuning parameters were then used to form models upon which final estimates were generated. In predicting a continuous outcome, Buelens and colleagues reported generally adequate, although varied, results from the MLMs compared to using either the sample mean or the pseudo-weighted quasi-randomization based estimator. Generally, the MLMs removed more bias due to self-selection than, or nearly the same amount as, either the sample mean or the pseudo-weighted estimator, especially under moderate to severe levels of self-selection. They also identified support vector machines as a top performer compared to all other methods in almost all scenarios they examined.
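
The three families of estimators compared above (sample mean, poststratified pseudo-weighting, and model-based prediction) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the setup of Buelens et al. (2018): the simulated population, the self-selection rule, the five strata, and the use of a simple least-squares line as a stand-in for an MLM. The tuning step is omitted.

```python
import random
import statistics


def simulate_population(n=20_000, seed=1):
    """Hypothetical finite population: x is known for everyone, y only for the sample."""
    rng = random.Random(seed)
    pop = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = 2.0 + 1.5 * x + rng.gauss(0.0, 1.0)
        pop.append((x, y))
    return pop


def self_selected_sample(pop, seed=2):
    """Nonprobability sample: the chance of taking part rises with x (self-selection)."""
    rng = random.Random(seed)
    return [i for i, (x, _) in enumerate(pop) if rng.random() < 0.01 + 0.02 * x]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def poststratified_mean(pop, sample_idx, n_strata=5):
    """Pseudo-weighting: weight stratum sample means by known population counts."""
    def stratum(x):
        return min(int(x / 10.0 * n_strata), n_strata - 1)

    pop_counts = [0] * n_strata
    for x, _ in pop:
        pop_counts[stratum(x)] += 1
    sums = [0.0] * n_strata
    counts = [0] * n_strata
    for i in sample_idx:
        x, y = pop[i]
        sums[stratum(x)] += y
        counts[stratum(x)] += 1
    total = sum(pop_counts[h] * sums[h] / counts[h] for h in range(n_strata))
    return total / len(pop)


def model_based_mean(pop, sample_idx):
    """Observed y for sampled units, model predictions for everyone else."""
    in_sample = set(sample_idx)
    a, b = fit_line([pop[i][0] for i in sample_idx], [pop[i][1] for i in sample_idx])
    total = 0.0
    for i, (x, y) in enumerate(pop):
        total += y if i in in_sample else a + b * x
    return total / len(pop)


population = simulate_population()
sample_idx = self_selected_sample(population)
true_mean = statistics.fmean(y for _, y in population)
naive = statistics.fmean(population[i][1] for i in sample_idx)
post = poststratified_mean(population, sample_idx)
model = model_based_mean(population, sample_idx)
print(f"true={true_mean:.2f} sample mean={naive:.2f} "
      f"poststratified={post:.2f} model-based={model:.2f}")
```

In this toy setting, self-selection depends only on the auxiliary variable and the working model is correctly specified, so the model-based estimate is expected to land close to the true mean. That favorable outcome comes from the simulation design and should not be read as a general result.
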

Model-based estimates generally work well if the model's predictions are well suited to population members not included in the sample. The models are estimated using data from the sample (be it probability or nonprobability) based on a set of covariates that are available for members of both the sample and the population. If these covariates fail to fully capture the self-selection mechanism, the range of values of the outcome of interest may differ between the sample and the population members not included in the sample. If so, some models will not be able to generate predicted values that extrapolate beyond the range of values observed in the sample, and this limitation will result in biased model-based estimates. Buelens and colleagues noted this limitation for the sample mean, pseudo-weighted estimator, $k$-nearest neighbors, and regression trees. They noted that generalized linear models, neural networks, and support vector machines (from among the MLMs they explored) are stronger choices in this situation. Reiterating the points on exchangeability made by Mercer et al. (2017), predictive algorithms that can make use of the available covariate information are very important. As this current work demonstrates, MLMs are not all equal in their ability to use such information adequately, and some have limitations for such applications, making method selection an important component, in addition to variable selection, of finite population inference from nonprobability samples.
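
The extrapolation limitation can be made concrete with a small sketch. The training data, the choice of $k = 5$, and the prediction point are hypothetical and are not taken from Buelens et al. (2018). A $k$-nearest-neighbors prediction is an average of observed outcomes, so it can never exceed the largest outcome in the sample, whereas a fitted line continues the trend beyond the observed range.

```python
import statistics


def knn_predict(train, x_new, k=5):
    """Average the outcomes of the k training points whose x is closest to x_new."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x_new))[:k]
    return statistics.fmean(y for _, y in nearest)


def fit_line(train):
    """Ordinary least squares for y = intercept + slope * x."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical sample: x only covers 0 to 5, and y = 2 + 1.5 * x exactly.
train = [(i / 10, 2.0 + 1.5 * (i / 10)) for i in range(51)]
max_observed_y = max(y for _, y in train)

# A non-sampled population member far outside the sampled range of x.
x_outside = 9.0
knn_value = knn_predict(train, x_outside)
intercept, slope = fit_line(train)
linear_value = intercept + slope * x_outside

print(f"largest y in sample: {max_observed_y:.2f}")
print(f"k-NN prediction at x={x_outside}: {knn_value:.2f}")
print(f"linear prediction at x={x_outside}: {linear_value:.2f}")
```

If the non-sampled part of the population contains many units like `x_outside`, the bounded $k$-NN predictions would pull a model-based estimate toward the sample's range, which is the source of the bias described above.
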

## Discussion and Conclusions

This collection of examples, while not exhaustive, provides a glimpse into how survey researchers and social scientists are applying an assortment of data science methods within the new landscape. These examples show that data science methods are adding, and can add, value to the survey research process. What is not yet as clear is just how the social sciences can add value to the broader data science community and Big Data ecosystem. Grimmer (2015) was quick to identify strengths that social scientists and survey researchers can bring to this conversation by stating, “Data scientists have significantly more experience with large datasets but they tend to have little training in how to infer causal effects in the face of substantial selection. Social scientists must have an integral role in this collaboration; merely being able to apply statistical techniques to massive datasets is insufficient. Rather, the expertise from a field that has handled observational data for many years is required.” In fact, the much-reported algorithmic bias among data scientists is a problem almost certainly related to coverage bias – an issue survey researchers have been investigating and mitigating for decades. More generally, advances in understanding and quantifying data error sources are yet another example where survey researchers can add value to the Big Data ecosystem. Apart from methodology, we believe that survey data in and of themselves can add much-needed insights to this ecosystem, a value that stems directly from the fact that surveys are often designed to maximize information about context and provide signals related to the “why” question. In this vein, survey data offer valuable additional sources of information that can enhance prediction accuracy.

This chapter is in no way comprehensive. Our goal was to provide survey researchers and social scientists, as well as data and computer scientists, with some examples and ideas that illustrate how MLMs are being used, and can be used, throughout the main phases of the survey research process. If we were to cluster these examples by how MLMs have been used, we would find four major uses, simply labeled processing, preparation, prioritization, and prediction.

## Adapting Machine Learning Methods to the Survey Setting
