Category Archives: Statistical models

Building a Poisson regression model in R

Poisson regression

A Poisson regression model allows you to model the relationship between a Poisson distributed response variable and one or more explanatory variables. It is suitable for modelling the number of events that occur in a given time period or area.

The explanatory variables can be either numeric or categorical. You can also include interaction terms in the model.

You can build a Poisson regression model with the glm function. For example, to build a model with a response variable named counts and three explanatory variables named var1, var2 and var3, use the command:

> glm(counts~var1+var2+var3, dataset, family=poisson)

Setting the family argument to poisson tells R to treat the response variable as Poisson distributed and build a Poisson regression model using the log link function.

The glm output displays some basic information about the model including the coefficient estimates. However there are more components to the output that are not automatically displayed, such as summary statistics, residuals and fitted values. You can save the all of this output to an object, as shown below.

> modelname<-glm(counts~var1+var2+var3, dataset, family=poisson)

Once you have created a glm object, you can access the various components of the results in the same way that you would for any other R model output object, using functions such as summary, anova, coef and residuals. For example, to view a summary of the model, use the command:

> summary(modelname)

Other useful functions include

Example: Poisson regression using warpbreaks data

Consider the warpbreaks dataset, which is included with R. The dataset gives the results of an experiment to determine the effect of wool type (A or B) and tension (low, medium or high) on the number of warp breaks per loom. Data was collected for nine looms for each combination of settings. To view more information about the dataset, enter help(warpbreaks). To view the dataset, enter the dataset name:

> warpbreaks
   breaks wool tension
1      26    A       L
2      30    A       L
3      54    A       L
4      25    A       L
5      70    A       L
6      52    A       L
7      51    A       L
8      26    A       L
9      67    A       L
10     18    A       M
11     21    A       M
12     29    A       M
13     17    A       M
14     12    A       M
15     ...

54     28    B       H

Suppose that you want to build a model to see how the frequency of warp breaks is effected by wool type, wool tension and the interaction between the two.

As the response variable breaks is a count, it is best modelled as a Poisson distributed variable. The model should have terms for the wool type, wool tension and the interaction between the two.

To build the model, use the command:

> breaksmodel<-glm(breaks~wool*tension, warpbreaks, family=poisson)

Once you have created the model object, use the summary function to view the results:

> summary(breaksmodel)

The output is shown below.

glm(formula = breaks ~ wool * tension, family = poisson, data = warpbreaks)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.3383  -1.4844  -0.1291   1.1725   3.5153  

               Estimate Std. Error z value Pr(>|z|)    
(Intercept)     3.79674    0.04994  76.030  < 2e-16 ***
woolB          -0.45663    0.08019  -5.694 1.24e-08 ***
tensionM       -0.61868    0.08440  -7.330 2.30e-13 ***
tensionH       -0.59580    0.08378  -7.112 1.15e-12 ***
woolB:tensionM  0.63818    0.12215   5.224 1.75e-07 ***
woolB:tensionH  0.18836    0.12990   1.450    0.147    
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 297.37  on 53  degrees of freedom
Residual deviance: 182.31  on 48  degrees of freedom
AIC: 468.97

Number of Fisher Scoring iterations: 4

From the results we can see that wool, tension and their interaction all have a significant effect on the number of warp breaks.

Social Widgets powered by