# Generalized Linear Models | MAST30025

## Zero Inflated Count Models

Sometimes we see count response data where the number of zeroes appearing is significantly greater than the Poisson or negative binomial models would predict. Consider the number of arrests for criminal offenses incurred by individuals. A large number of people have never been arrested by the police while a smaller number have been detained on multiple occasions. Modifying the Poisson by adding a dispersion parameter does not adequately model this divergence from the standard count distributions.
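One way to see the problem is to compare the observed share of zeroes with the share a Poisson distribution with the same mean would imply, which is $e^{-\mu}$. The following is a minimal Python sketch on made-up counts (the `counts` list is hypothetical and is not taken from any dataset mentioned here):

```python
import math

# Hypothetical count data (illustrative only): many zeroes, a few large counts.
counts = [0] * 60 + [1] * 10 + [2] * 10 + [3] * 8 + [5] * 6 + [8] * 6

n = len(counts)
mean = sum(counts) / n

# Under a Poisson model with this mean, P(Y = 0) = exp(-mean).
poisson_zero_prob = math.exp(-mean)
observed_zero_prob = counts.count(0) / n

print(f"mean count            : {mean:.3f}")
print(f"observed share of 0s  : {observed_zero_prob:.3f}")
print(f"Poisson-implied share : {poisson_zero_prob:.3f}")
```

When the observed share is far above the Poisson-implied share, as it is for these invented numbers, a plain Poisson fit is likely to underpredict the zeroes.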

We consider a sample of 915 biochemistry graduate students as analyzed by Long (1990). The response is the number of articles produced during the last three years of the PhD. We are interested in how this is related to the gender, marital status, number of children, prestige of the department and productivity of the advisor of the student. The dataset may be found in the pscl package of Zeileis et al. (2008), which also provides the new model fitting functions needed in this section.

We start by fitting a Poisson regression model. The deviance of this fit is significantly larger than the degrees of freedom. Some experimentation reveals that this cannot be solved by using a richer linear predictor or by eliminating some outliers. We might consider a dispersed Poisson model or a negative binomial model, but some thought suggests that there are good reasons why a student might produce no articles at all.

We count and predict how many students produce between zero and seven articles. Very few students produce more than seven articles, so we ignore these. The predprob function produces the predicted probabilities of each count for each case. By summing these over the cases, we get the expected number of students for each article count.
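The summing step can be sketched in a few lines. This is a Python illustration of the idea only, not the pscl code: the `fitted_means` values are hypothetical stand-ins for the per-student means a fitted Poisson model would supply, and `poisson_pmf` is a helper defined here.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Hypothetical fitted means, one per case (stand-ins for a fitted model's output).
fitted_means = [0.8, 1.2, 1.5, 2.0, 2.6, 3.1]
max_count = 7

# Row i holds the predicted probabilities of counts 0..7 for case i.
pred_prob = [[poisson_pmf(k, mu) for k in range(max_count + 1)]
             for mu in fitted_means]

# Summing each column over the cases gives the expected number of cases
# with that count.
expected = [sum(row[k] for row in pred_prob) for k in range(max_count + 1)]

for k, e in enumerate(expected):
    print(f"count {k}: expected {e:.3f}")
```

The expected numbers sum to slightly less than the number of cases because counts above seven are left out. Comparing them with the observed frequencies for each count shows where the model fits poorly, in particular at zero.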

## Contingency Tables

A contingency table is used to show cross-classified categorical data on two or more variables. The variables can be nominal or ordinal. A nominal variable has categories with no natural ordering; for example, consider the automotive companies Ford, General Motors and Toyota. An ordering could be imposed using some criterion like sales, but there is nothing inherent in the categories that makes any particular ordering obvious. An ordinal variable has a natural default ordering. For example, a disease might be recorded as absent, mild or severe. The five-point Likert scale ranging through strongly disagree, disagree, neutral, agree and strongly agree is another example.
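As a small illustration, a two-way contingency table can be built by counting how often each pair of categories occurs. The sketch below uses invented records that cross a nominal variable (company) with an ordinal one (a three-level response); none of the data comes from the text.

```python
from collections import Counter

companies = ["Ford", "General Motors", "Toyota"]  # nominal: listing order is arbitrary
levels = ["disagree", "neutral", "agree"]         # ordinal: order is meaningful

# Hypothetical records, one (company, response) pair per respondent.
records = [
    ("Ford", "agree"), ("Ford", "neutral"), ("Toyota", "agree"),
    ("General Motors", "disagree"), ("Toyota", "agree"), ("Ford", "agree"),
    ("General Motors", "neutral"), ("Toyota", "disagree"),
]

cell_counts = Counter(records)

# table[i][j] = number of records with company i and response level j.
table = [[cell_counts[(c, l)] for l in levels] for c in companies]

print(f"{'':16}" + "".join(f"{l:>10}" for l in levels))
for c, row in zip(companies, table):
    print(f"{c:16}" + "".join(f"{v:>10}" for v in row))
```

Reordering the rows loses nothing because company is nominal, whereas the columns are kept in their natural order because the response is ordinal.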

An interval scale is an ordinal variable that has categories with a distance measure. This is often the result of continuous data that has been discretized into intervals. For example, age groups 0-18, 18-34, 34-55 and 55+ might be used to record age information. If the intervals are relatively wide, then methods for ordinal data can be used, where the additional information about the intervals may be useful in the modeling. If the intervals are quite narrow, then we could replace the interval response with the midpoint of the interval and then use continuous data methods. One could argue that all so-called continuous data is of this form, because such data cannot be measured with arbitrary precision. Height might be given to the nearest centimeter, for example.
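The midpoint replacement for narrow intervals might look like the following Python sketch. The five-year age bands and their frequencies are hypothetical. An open-ended group such as 55+ has no midpoint, so it would need a separate assumption and is not handled here.

```python
# Hypothetical grouped ages: (lower bound, upper bound, number of people).
# The bands are narrow, so the midpoint is a reasonable stand-in for each age.
age_groups = [(20, 25, 12), (25, 30, 18), (30, 35, 9), (35, 40, 6)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in age_groups]

# Replace each interval by its midpoint, repeated once per person in the group.
ages = []
for lo, hi, n_people in age_groups:
    ages.extend([(lo + hi) / 2] * n_people)

# The midpoint values can now be treated as (approximately) continuous data.
mean_age = sum(ages) / len(ages)

print(midpoints)
print(f"approximate mean age: {mean_age:.2f}")
```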


