A Poisson regression model allows you to model the relationship between a Poisson distributed response variable and one or more explanatory variables. It is suitable for modelling the number of events that occur in a given time period or area.
The explanatory variables can be either numeric or categorical. You can also include interaction terms in the model.
You can build a Poisson regression model with the
glm function. For example, to build a model with a response variable named
counts and three explanatory variables named
var3, use the command:
family argument to
poisson tells R to treat the response variable as Poisson distributed and build a Poisson regression model using the log link function.
glm output displays some basic information about the model including the coefficient estimates. However there are more components to the output that are not automatically displayed, such as summary statistics, residuals and fitted values. You can save the all of this output to an object, as shown below.
Once you have created a glm object, you can access the various components of the results in the same way that you would for any other R model output object, using functions such as
residuals. For example, to view a summary of the model, use the command:
Other useful functions include
Example: Poisson regression using
warpbreaks dataset, which is included with R. The dataset gives the results of an experiment to determine the effect of wool type (A or B) and tension (low, medium or high) on the number of warp breaks per loom. Data was collected for nine looms for each combination of settings. To view more information about the dataset, enter
help(warpbreaks). To view the dataset, enter the dataset name:
breaks wool tension 1 26 A L 2 30 A L 3 54 A L 4 25 A L 5 70 A L 6 52 A L 7 51 A L 8 26 A L 9 67 A L 10 18 A M 11 21 A M 12 29 A M 13 17 A M 14 12 A M 15 ... 54 28 B H
Suppose that you want to build a model to see how the frequency of warp breaks is effected by wool type, wool tension and the interaction between the two.
As the response variable
breaks is a count, it is best modelled as a Poisson distributed variable. The model should have terms for the wool type, wool tension and the interaction between the two.
To build the model, use the command:
> breaksmodel<-glm(breaks~wool*tension, warpbreaks, family=poisson)
Once you have created the model object, use the
summary function to view the results:
The output is shown below.
Call: glm(formula = breaks ~ wool * tension, family = poisson, data = warpbreaks) Deviance Residuals: Min 1Q Median 3Q Max -3.3383 -1.4844 -0.1291 1.1725 3.5153 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 3.79674 0.04994 76.030 < 2e-16 *** woolB -0.45663 0.08019 -5.694 1.24e-08 *** tensionM -0.61868 0.08440 -7.330 2.30e-13 *** tensionH -0.59580 0.08378 -7.112 1.15e-12 *** woolB:tensionM 0.63818 0.12215 5.224 1.75e-07 *** woolB:tensionH 0.18836 0.12990 1.450 0.147 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 297.37 on 53 degrees of freedom Residual deviance: 182.31 on 48 degrees of freedom AIC: 468.97 Number of Fisher Scoring iterations: 4
From the results we can see that wool, tension and their interaction all have a significant effect on the number of warp breaks.