Accuracy of Trip Data from Sample Surveys of Travel Demand




The travel data used to derive travel demand estimates (trip ends, trip matrices and flows along transport links), and to estimate and calibrate transport models, come from sample surveys.

Two common types of travel survey are household travel surveys and travel intercept surveys. Household travel surveys are typically based on very low sample rates, perhaps 0.5% to 5% of households in the study area. Intercept surveys involve interviewing people while they are making a journey. They include roadside interview surveys, in which cars are stopped at the roadside (or lower-cost versions in which questionnaires in the form of reply-paid postcards are handed out at the roadside), and public transport passenger surveys carried out on buses and trains or at bus stops and stations. Travel intercept surveys can achieve much higher sampling rates, of the order of 10% to 15%.

My purpose here is to offer some rules of thumb which are helpful in thinking about the confidence you can have in sample travel data.

Sources of error in travel surveys

The sources of error may be grouped into three types: sampling error, measurement error and bias in the sampling frame.

Examples of these types of error are given in the table.

Sampling error
- Household travel survey: Relatively high errors due to the low trip samples and indirect sampling of trips.
- Intercept travel survey: Much higher samples of the targeted travel can normally be achieved, reducing sampling error.

Measurement error
- Household travel survey: Under-recording of irregular trips. Misreporting of trips of household members (where these are absent when the questionnaire is completed).
- Intercept travel survey: Interviews must be taken quickly and responses to any question may be inaccurate, incomplete or misrecorded.
- Both survey types: Insufficiently accurate trip origin and destination address data (can be particularly marked in intercept data). Misinterpretation of the trip purpose codes by respondents. The survey period may not be typical of the average travel patterns, usually a greater risk for intercept surveys carried out on just one day.

Bias in the sampling frame
- Household travel survey: In part this is dependent on the source of the sample. For example, if it is taken from a telephone directory it will omit those households not in the directory. Survey response rates vary by household type and location implying some households will be over-represented and others under-represented.
- Intercept travel survey: For roadside surveys this may arise from minor roads omitted from the survey or variable sampling rates across the day and by vehicle type. For public transport surveys there are far greater difficulties in designing representative sampling methods, leading to significant potential for bias. Often, intercept surveys are in only one direction of travel. The potential for double-counting through multiple interception of particular movements is possible in all types of intercept survey.

Household travel surveys use many different techniques to collect interviews, each introducing its own degree of bias, sometimes significant; this is much discussed in the literature. This paper on the under-reporting of trips in household travel surveys touches on the topic.

I discuss only sampling error here, but there is plenty of international literature on the other sources of error.

The accuracy of sampled trip data

What follows stems from work done by Martin and Voorhees Associates in the late 1970s for the UK Department of Transport, related to the Regional Highway Traffic Model Project. Key players in the work were Mike Brewer and John Bates.

For an intercept survey in which a flow of people or vehicles is sampled such that each person or vehicle has an equal chance of being intercepted, the estimate of the proportion of the flow accounted for by a particular sub-group has a Binomial distribution, corrected for sampling without replacement. For most practical purposes this can be approximated by a Poisson distribution (broadly assuming a low sampling rate, a sub-group that is a small part of the total flow and a uniform sampling rate).
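As a sketch of why the approximation holds, write N for the total flow, S for the number sampled, p for the true share of the sub-group in the flow and s for the number of sub-group members caught in the sample (N and p are symbols introduced here for illustration, not notation from the study). Under equal-probability sampling without replacement, the variance of s is the Binomial variance multiplied by a finite-population correction:

$$
\operatorname{Var}(s) = S\,p\,(1-p)\,\frac{N-S}{N-1} \approx S\,p = \operatorname{E}(s) \quad \text{when } p \ll 1 \text{ and } S/N \ll 1
$$

Both factors, (1 - p) and (N - S)/(N - 1), are then close to 1, which is the sense in which a small sub-group and a low sampling rate justify treating the count as Poisson.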

The simplest way of expressing the key aspect of a Poisson distribution is that if a sample S of a flow (of vehicles or passengers) is repeatedly taken, then the sub-sample s for a particular sub-flow would have a variance equal to the mean (ie s).
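This property can be checked numerically. The following is a minimal simulation sketch, not part of the original study: the flow size, sub-flow size, sampling rate and number of repeats are illustrative assumptions, and the function name is hypothetical.

```python
import random
import statistics


def simulate_subsample_counts(flow_size, subflow_size, sampling_rate, repeats, seed=1):
    """Repeatedly sample a flow without replacement and count sub-flow members caught.

    All arguments are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    sample_size = round(flow_size * sampling_rate)
    counts = []
    for _ in range(repeats):
        # Vehicles numbered 0 .. subflow_size - 1 are treated as the sub-flow.
        intercepted = rng.sample(range(flow_size), sample_size)
        counts.append(sum(1 for vehicle in intercepted if vehicle < subflow_size))
    return counts


# A sub-flow that is 1% of the total flow, sampled at 2%: expected s is 20.
counts = simulate_subsample_counts(
    flow_size=100_000, subflow_size=1_000, sampling_rate=0.02, repeats=2_000
)
mean_s = statistics.mean(counts)
var_s = statistics.variance(counts)
print(f"mean of s = {mean_s:.1f}, variance of s = {var_s:.1f}")
```

With a small sub-flow and a low sampling rate the two printed values should be close to each other; raising the sampling rate or the sub-flow share in the call should pull the variance below the mean, as the Binomial correction implies.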

There is no such simple way of calculating the variance of trip data from a household survey, except through the use of the observed sample variance. But the aforementioned study was able to compare the resulting variance characteristics with those of the Poisson distribution which approximates the variance of intercept survey samples. This suggested that the standard error (ie the square root of the variance) of a household survey was about 50% higher than a Poisson calculation.

The formulae for computing the standard error of the estimate of a set of trips (in the form of trip ends, trip matrices or flows on links or services) are as follows.

If $S_x$ is the sample for the set of trips x, g is the average expansion factor (ie 1/sampling rate) and $T_x$ is the expanded estimate of the set of trips x, then:

Intercept travel survey:

Standard error of $S_x$ = $S_x^{1/2}$

Standard error of $T_x$ = $g \cdot S_x^{1/2}$

Alternatively the standard error expressed as a % error on the flow $T_x$ is $100 \cdot S_x^{-1/2}$

Household travel survey:

Standard error of $S_x$ = $1.5 \cdot S_x^{1/2}$

Standard error of $T_x$ = $1.5 \cdot g \cdot S_x^{1/2}$

Alternatively the standard error expressed as a % error on the flow $T_x$ is $1.5 \cdot 100 \cdot S_x^{-1/2}$
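The formulae above can be sketched in Python as follows. The function and constant names are hypothetical, chosen here for illustration; the 1.5 factor is the household multiplier quoted above, and the % error is taken as 100 divided by the square root of the sample, scaled by that factor for household surveys.

```python
import math

# Household surveys: standard error about 50% above the Poisson value.
HOUSEHOLD_FACTOR = 1.5


def standard_errors(sample_trips, expansion_factor, household=False):
    """Return (SE of Sx, SE of Tx, % error on Tx) for a sampled set of trips.

    sample_trips is Sx and expansion_factor is g (1 / sampling rate).
    """
    factor = HOUSEHOLD_FACTOR if household else 1.0
    se_sample = factor * math.sqrt(sample_trips)
    se_expanded = expansion_factor * se_sample
    pct_error = 100.0 * factor / math.sqrt(sample_trips)
    return se_sample, se_expanded, pct_error


# 100 sampled trips with an expansion factor of 10 (a 10% sampling rate).
print(standard_errors(100, 10))                  # (10.0, 100.0, 10.0)
print(standard_errors(100, 10, household=True))  # (15.0, 150.0, 15.0)
```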

These formulae are incorporated in the spreadsheet. A very important point is that the % error of the estimate is simply a function of the sample size Sx (and not the sampling rate).
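The reason is that the expansion factor cancels. A short derivation for the intercept case, using $T_x = g \cdot S_x$ (the household case simply carries the factor 1.5 through):

$$
\frac{\text{standard error of } T_x}{T_x} = \frac{g \cdot S_x^{1/2}}{g \cdot S_x} = S_x^{-1/2}
$$

For example, a sample of 100 trips gives a standard error of 10% on the expanded flow whether it was obtained by sampling 10% of a flow of 1,000 trips or 1% of a flow of 10,000 (illustrative figures, not taken from the study).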

Figure 1 and Figure 2 illustrate the range of the 95% confidence limits on estimates of travel demand from the two types of survey for varying levels of demand and different sampling rates; the confidence limits are expressed as a % of the estimated mean. The limitations of household travel data for estimating anything other than dense flows are evident, as is the improvement achieved by collecting more data (through higher rates of sampling).
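Confidence limits of this kind can be approximated with a short calculation. The sketch below is an assumption about how such limits might be computed, not a reproduction of the figures: it takes the 95% limits as 1.96 standard errors (a normal approximation, which is poor for very small samples), and the function name, demand levels and sampling rates are illustrative.

```python
import math


def ci95_percent(demand, sampling_rate, household=False):
    """Approximate 95% confidence half-width as a % of the estimated demand.

    Assumes limits of +/- 1.96 standard errors (normal approximation).
    """
    factor = 1.5 if household else 1.0
    expected_sample = demand * sampling_rate  # expected Sx for this demand
    return 1.96 * 100.0 * factor / math.sqrt(expected_sample)


# Illustrative rates: 2% for a household survey, 10% for an intercept survey.
for demand in (100, 1_000, 10_000):
    hh = ci95_percent(demand, 0.02, household=True)
    ic = ci95_percent(demand, 0.10)
    print(f"demand {demand:>6}: household +/-{hh:.0f}%, intercept +/-{ic:.0f}%")
```

Under these assumptions the household limits stay wide until the demand reaches several thousand trips, which is consistent with the point that household data are only reliable for dense flows.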