Category Archives: Hypothesis tests

Performing a one-sample t-test in R

One-sample t-test

A t-test is used to test hypotheses about the mean value of a population from which a sample is drawn. A t-test is suitable if the data is believed to be drawn from a normal distribution, or if the sample size is large.
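If you are unsure whether the normality assumption is reasonable, a quick informal check is often helpful before running the test. The sketch below assumes your sample is stored in a hypothetical variable dataset$sample1, and uses a normal quantile plot and the Shapiro-Wilk test from base R.

> qqnorm(dataset$sample1)
> qqline(dataset$sample1)
> shapiro.test(dataset$sample1)

A roughly straight quantile plot and a Shapiro-Wilk p-value well above your significance level are consistent with the normality assumption.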

A one-sample t-test is used to compare the mean value of a sample with a constant value denoted μ0. The test has the null hypothesis that the population mean is equal to μ0 and the alternative hypothesis that it is not equal to μ0.

The test can also be performed with a one-sided alternative hypothesis, which is known as a one-tailed test. The one-sided alternative hypothesis is either that the population mean is less than μ0 or that the population mean is greater than μ0.

You can perform a one-sample t-test with the t.test function. To compare a sample mean with a constant value mu0, use the command:

> t.test(dataset$sample1, mu=mu0)

The mu argument gives the value with which you want to compare the sample mean. It is optional and has a default value of zero.

By default, R performs a two-tailed test. To perform a one-tailed test, set the alternative argument to "greater" or "less", as shown below.

> t.test(dataset$sample1, mu=mu0, alternative="greater")

A 95% confidence interval for the population mean is included with the output. To adjust the confidence level of the interval, use the conf.level argument.

> t.test(dataset$sample1, mu=mu0, conf.level=0.99)

Example 10.1. One-tailed, one-sample t-test using the bottles data

A bottle filling machine is set to fill bottles with soft drink to a volume of 500 ml. The actual volume is known to follow a normal distribution. The manufacturer believes the machine is under-filling bottles. A sample of 20 bottles is taken and the volume of liquid inside each is measured. The results are given in the bottles dataset, shown below.

> bottles
   Volume
1  484.11
2  459.49
3  471.38
4  512.01
5  494.48
6  528.63
7  493.64
8  485.03
9  473.88
10 501.59
11 502.85
12 538.08
13 465.68
14 495.03
15 475.32
16 529.41
17 518.13
18 464.32
19 449.08
20 489.27

To calculate the sample mean, use the command:

> mean(bottles$Volume)
[1] 491.5705

Suppose you want to use a one-sample t-test to determine whether the bottles are being consistently under-filled, or whether the low mean volume for the sample is purely the result of random variation. A one-sided test is suitable because the manufacturer is specifically interested in knowing whether the volume is less than 500 ml. The test has the null hypothesis that the mean filling volume is equal to 500 ml, and the alternative hypothesis that the mean filling volume is less than 500 ml. A significance level of 0.01 is to be used.

To perform the test, use the command:

> t.test(bottles$Volume, mu=500, alternative="less", conf.level=0.99)

This gives the following output:

        One Sample t-test

data:  bottles$Volume
t = -1.5205, df = 19, p-value = 0.07243
alternative hypothesis: true mean is less than 500
99 percent confidence interval:
     -Inf 505.6495
sample estimates:
mean of x
 491.5705

From the output, we can see that the mean bottle volume for the sample is 491.6 ml. The one-sided 99% confidence interval tells us that the mean filling volume is likely to be less than 505.6 ml. The p-value of 0.07243 tells us that if the true mean filling volume were 500 ml, the probability of selecting a sample with a mean volume less than or equal to this one would be approximately 7%.

Since the p-value is not less than the significance level of 0.01, we cannot reject the null hypothesis that the mean filling volume is equal to 500 ml. There is not enough evidence to conclude that the bottles are being under-filled.
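If you prefer to work with the result programmatically rather than reading it from the console, the object returned by t.test stores the p-value and other components. A minimal sketch using the same command as above (the object name result is arbitrary):

> result <- t.test(bottles$Volume, mu=500, alternative="less", conf.level=0.99)
> result$p.value < 0.01
[1] FALSE

The value FALSE confirms that the p-value does not fall below the 0.01 significance level.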


Performing Bartlett’s test in R

Bartlett’s test

Bartlett’s test allows you to compare the variances of two or more samples to determine whether they are drawn from populations with equal variance. It is suitable for normally distributed data. The test has the null hypothesis that the variances are equal and the alternative hypothesis that they are not equal [1].

This test is useful for checking the assumptions of an analysis of variance.

You can perform Bartlett’s test with the bartlett.test function. If your data is in stacked form (with the values for both samples stored in one variable), use the command:

> bartlett.test(values~groups, dataset)

where values is the name of the variable containing the data values and groups is the name of the variable that specifies which sample each value belongs to.

If your data is in unstacked form (with the samples stored in separate variables), nest the variable names inside the list function as shown below.

> bartlett.test(list(dataset$sample1, dataset$sample2, dataset$sample3))

If you are unsure whether your data is in stacked or unstacked form, see the article Stacking a dataset in R for examples of data in both forms.
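As an illustration, you can also convert unstacked data to stacked form yourself with the base R stack function before calling bartlett.test. The sketch below assumes hypothetical variables sample1, sample2 and sample3 in dataset:

> stacked <- stack(dataset[, c("sample1", "sample2", "sample3")])
> bartlett.test(values~ind, stacked)

The stack function places the data values in a variable named values and the originating sample names in a variable named ind.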

Example 10.10. Bartlett’s test using the PlantGrowth data

Consider the PlantGrowth dataset (included with R), which gives the dried weight of three groups of ten batches of plants, where each group of ten batches received a different treatment. The weight variable gives the weight of the batch and the group variable gives the treatment received (either ctrl, trt1 or trt2). To view more information about the dataset, enter help(PlantGrowth). To view the data, enter the dataset name:

> PlantGrowth
   weight group
1    4.17  ctrl
2    5.58  ctrl
3    5.18  ctrl
4    6.11  ctrl
5    4.50  ctrl
6    4.61  ctrl
7    5.17  ctrl
8    4.53  ctrl
9    5.33  ctrl
10   5.14  ctrl
11   4.81  trt1
12   4.17  trt1
13   ...

30   5.26  trt2

Suppose you want to use Bartlett’s test to determine whether the variance in weight is the same for all treatment groups. A significance level of 0.05 will be used.

To perform the test, use the command:

> bartlett.test(weight~group, PlantGrowth)

This gives the output:

        Bartlett test of homogeneity of variances

data:  weight by group 
Bartlett's K-squared = 2.8786, df = 2, p-value = 0.2371

From the output we can see that the p-value of 0.2371 is not less than the significance level of 0.05, so we cannot reject the null hypothesis that the variance is the same for all treatment groups. There is no evidence to suggest that the variance in plant growth differs between the three treatment groups.
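Since the equal-variance assumption looks reasonable, you could go on to fit the analysis of variance that Bartlett’s test is often used to support. A minimal sketch using the base R aov function (the object name model1 is arbitrary):

> model1 <- aov(weight~group, PlantGrowth)
> summary(model1)

The summary gives the F-test for whether the mean weight differs between the treatment groups.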

[1] Montgomery, D.C. and Runger, G.C., 2007. Applied Statistics and Probability for Engineers, 4th ed. John Wiley & Sons.

Performing a binomial test in R

Binomial test

A binomial test compares the number of successes observed in a given number of trials with a hypothesised probability of success. The test has the null hypothesis that the real probability of success is equal to some value denoted p, and the alternative hypothesis that it is not equal to p. The test can also be performed with a one-sided alternative hypothesis that the real probability of success is either greater than p or that it is less than p.

You can perform a binomial test with the binom.test function. The command takes the general form:

> binom.test(nsuccesses, ntrials, p)

where nsuccesses is the number of successes observed, ntrials is the total number of trials and p is the hypothesised probability of success.

Alternatively, you can give the number of successes and the number of failures observed, as shown below.

> binom.test(c(nsuccesses, nfailures), p)

To perform a one-sided test, set the alternative argument to "less" or "greater" as required.

> binom.test(nsuccesses, ntrials, p, alternative="greater")

The output includes a 95% confidence interval for the true probability. To adjust the confidence level of this interval, use the conf.level argument as shown.

> binom.test(nsuccesses, ntrials, p, conf.level=0.99)

Example: Binomial test for die rolls

In a game, you suspect your opponent is using a die that is biased to roll a six more than 1/6 of the time. Suppose you want to investigate this by rolling the die 300 times and using a binomial test to determine whether the probability of rolling a six is equal to 1/6. A one-tailed test with a significance level of 0.05 will be used.

You roll the die 300 times and throw a total of 60 sixes. To perform the test, use the command:

> binom.test(60, 300, 1/6, alternative="greater")

        Exact binomial test

data:  60 and 300 
number of successes = 60, number of trials = 300, p-value = 0.07299
alternative hypothesis: true probability of success is greater than 0.1666667 
95 percent confidence interval:
 0.1626847 1.0000000 
sample estimates:
probability of success 
                   0.2

From the output you can see that the p-value is 0.07299. As this is not less than the significance level of 0.05, we cannot reject the null hypothesis that the probability of rolling a six is 1/6. There is not enough evidence to conclude that the die is biased.
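As a quick check, the p-value reported above can be reproduced directly from the binomial distribution, since it is the probability of rolling 60 or more sixes in 300 throws of a fair die:

> pbinom(59, 300, 1/6, lower.tail=FALSE)

This returns approximately 0.07299, matching the p-value in the test output.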

