### Poisson regression

A Poisson regression model allows you to model the relationship between a Poisson distributed response variable and one or more explanatory variables. It is suitable for modelling the number of *events* that occur in a given time period or area.

The explanatory variables can be either numeric or categorical. You can also include interaction terms in the model.

You can build a Poisson regression model with the `glm`

function. For example, to build a model with a response variable named `counts`

and three explanatory variables named `var1`

, `var2`

and `var3`

, use the command:

> glm(`counts`

~`var1`

+`var2`

+`var3`

,`dataset`

, family=poisson)

Setting the `family`

argument to `poisson`

tells R to treat the response variable as Poisson distributed and build a Poisson regression model using the log link function.

The `glm`

output displays some basic information about the model including the coefficient estimates. However there are more components to the output that are not automatically displayed, such as summary statistics, residuals and fitted values. You can save the all of this output to an object, as shown below.

>`modelname`

<-glm(`counts`

~`var1`

+`var2`

+`var3`

,`dataset`

, family=poisson)

Once you have created a glm object, you can access the various components of the results in the same way that you would for any other R model output object, using functions such as `summary`

, `anova`

, `coef`

and `residuals`

. For example, to view a summary of the model, use the command:

`> summary(``modelname`

)

Other useful functions include

## Example: Poisson regression using `warpbreaks`

data

Consider the `warpbreaks`

dataset, which is included with R. The dataset gives the results of an experiment to determine the effect of wool type (A or B) and tension (low, medium or high) on the number of warp breaks per loom. Data was collected for nine looms for each combination of settings. To view more information about the dataset, enter `help(warpbreaks)`

. To view the dataset, enter the dataset name:

> warpbreaks

breaks wool tension 1 26 A L 2 30 A L 3 54 A L 4 25 A L 5 70 A L 6 52 A L 7 51 A L 8 26 A L 9 67 A L 10 18 A M 11 21 A M 12 29 A M 13 17 A M 14 12 A M 15 ... 54 28 B H

Suppose that you want to build a model to see how the frequency of warp breaks is effected by wool type, wool tension and the interaction between the two.

As the response variable `breaks`

is a count, it is best modelled as a Poisson distributed variable. The model should have terms for the wool type, wool tension and the interaction between the two.

To build the model, use the command:

> breaksmodel<-glm(breaks~wool*tension, warpbreaks, family=poisson)

Once you have created the model object, use the `summary`

function to view the results:

> summary(breaksmodel)

The output is shown below.

Call: glm(formula = breaks ~ wool * tension, family = poisson, data = warpbreaks) Deviance Residuals: Min 1Q Median 3Q Max -3.3383 -1.4844 -0.1291 1.1725 3.5153 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 3.79674 0.04994 76.030 < 2e-16 *** woolB -0.45663 0.08019 -5.694 1.24e-08 *** tensionM -0.61868 0.08440 -7.330 2.30e-13 *** tensionH -0.59580 0.08378 -7.112 1.15e-12 *** woolB:tensionM 0.63818 0.12215 5.224 1.75e-07 *** woolB:tensionH 0.18836 0.12990 1.450 0.147 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 297.37 on 53 degrees of freedom Residual deviance: 182.31 on 48 degrees of freedom AIC: 468.97 Number of Fisher Scoring iterations: 4

From the results we can see that wool, tension and their interaction all have a significant effect on the number of warp breaks.