Technical details


# Analysis techniques

## Cohort analysis

A number of the chapters in this report employ 'pseudo-cohort' analysis (which we describe as 'cohort analysis' throughout). When applied to attitudinal research, cohort analysis traditionally uses longitudinal survey data to examine how the views of the same individuals have changed over time. Pseudo-cohort analysis instead uses cross-sectional data and is based on the assumption that a particular age group within a given year is equivalent to the age group five years older, five years later.

Cohort analysis is used to explore whether change over time can be explained by generation, period or lifecycle effects - or a combination of the three:

- A **generational effect** can be identified when each successive generation expresses an attitude that is different to the one which preceded it. As a result, and when these differences all occur in a similar direction, change at the population level can be driven by the ageing of the population, as older generations die out and younger generations enter the population (in the case of British Social Attitudes, those aged 18+).
- A **period effect** can be identified when the views of all or most generations change in a consistent way within a particular period. This can often be linked to an external event.
- A **lifecycle effect** can be identified when the views of all or most generations change in a particular way during a particular life-stage such as adolescence or retirement or, alternatively, across the life-cycle.

While a cohort or generation is a subjective construct and these can be defined in a number of ways, in this report we have consistently allocated respondents to cohorts based on their decade of birth (for instance, those born between 1980 and 1989 are defined as the '1980s' cohort). This approach was adopted to explore in considerable detail how the attitudes of generations have changed over time and in relation to one another.

Data has only been included in the charts and tables produced to illustrate cohort analysis when data on the measure of interest is available for at least 100 cases in a given year. This means that, in some cases, data is not presented for the oldest cohort, given the small sample sizes involved.
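The cohort allocation and the 100-case rule described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used for the report: it assumes that year of birth is approximated as survey year minus age, and the function names and the example respondents are hypothetical.

```python
from collections import Counter

MIN_CASES = 100  # minimum cases per cohort per year, as stated in the text


def birth_cohort(survey_year, age):
    """Return a decade-of-birth label such as '1980s'.

    Assumption: year of birth is approximated as survey year minus age.
    """
    birth_year = survey_year - age
    return f"{birth_year // 10 * 10}s"


def reportable_cells(respondents, min_cases=MIN_CASES):
    """Count respondents per (survey year, cohort) and keep only the
    cells with at least `min_cases` respondents."""
    counts = Counter(
        (year, birth_cohort(year, age)) for year, age in respondents
    )
    return {cell: n for cell, n in counts.items() if n >= min_cases}


# Hypothetical data: 120 respondents aged 30 and 40 aged 95, all in 2012.
respondents = [(2012, 30)] * 120 + [(2012, 95)] * 40
print(reportable_cells(respondents))
```

In this made-up example the 1910s cohort has only 40 cases and so would be left out of the charts and tables, while the 1980s cohort would be shown.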

## Regression

Regression analysis aims to summarise the relationship between a 'dependent' variable and one or more 'independent' variables. It shows how well we can estimate a respondent's score on the dependent variable from knowledge of their scores on the independent variables. It is often undertaken to support a claim that the phenomena measured by the independent variables cause the phenomenon measured by the dependent variable. However, the causal ordering, if any, between the variables cannot be verified or falsified by the technique. Causality can only be inferred through special experimental designs or through assumptions made by the analyst.

All regression analysis assumes that the relationship between the dependent and each of the independent variables takes a particular form. In linear regression it is assumed that the relationship can be adequately summarised by a straight line. This means that a one-unit increase in the value of an independent variable is assumed to have the same impact on the value of the dependent variable on average, irrespective of the previous values of those variables.

Strictly speaking the technique assumes that both the dependent and the independent variables are measured on an interval-level scale, although it may sometimes still be applied even where this is not the case. For example, one can use an ordinal variable (e.g. a Likert scale) as a *dependent* variable if one is willing to assume that there is an underlying interval-level scale and the difference between the observed ordinal scale and the underlying interval scale is due to random measurement error. Often the answers to a number of Likert-type questions are averaged to give a dependent variable that is more like a continuous variable. Categorical or nominal data can be used as *independent* variables by converting them into dummy or binary variables; these are variables where the only valid scores are 0 and 1, with 1 signifying membership of a particular category and 0 otherwise.
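The two conversions described above, averaging Likert-type items and turning a categorical variable into dummy variables, can be illustrated with a short sketch. The function names, the example categories and the choice to omit a reference category (a common convention that is not stated in the text) are assumptions made for illustration.

```python
def likert_average(item_scores):
    """Average a respondent's answers to several Likert-type items to
    give a score that behaves more like a continuous variable."""
    return sum(item_scores) / len(item_scores)


def dummy_code(values, reference):
    """Convert a categorical variable into 0/1 dummy variables.

    One dummy is created per category other than `reference`; a score
    of 1 signifies membership of that category and 0 otherwise.
    Assumption: the reference category is omitted, as is conventional.
    """
    categories = sorted(set(values) - {reference})
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical example: housing tenure for four respondents.
tenure = ["owner", "renter", "owner", "other"]
print(dummy_code(tenure, reference="owner"))
print(likert_average([1, 2, 2, 4, 5]))
```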

The assumptions of linear regression cause particular difficulties where the *dependent* variable is binary. The assumption that the relationship between the dependent and the independent variables is a straight line means that it can produce estimated values for the dependent variable of less than 0 or greater than 1. In this case it may be more appropriate to assume that the relationship between the dependent and the independent variables takes the form of an S-curve, where the impact on the dependent variable of a one-point increase in an independent variable becomes progressively smaller the closer the value of the dependent variable comes to 0 or 1. *Logistic* regression is an alternative form of regression which fits such an S-curve rather than a straight line. The technique can also be adapted to analyse multinomial non-interval-level dependent variables, that is, variables which classify respondents into more than two categories.
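The contrast between a straight line and an S-curve can be seen in a small sketch. The intercepts and slopes used here are invented purely for illustration and are not estimates from any survey data.

```python
import math


def linear_prediction(intercept, slope, x):
    """Straight-line prediction: can fall below 0 or rise above 1."""
    return intercept + slope * x


def logistic_prediction(intercept, slope, x):
    """S-curve prediction: always lies between 0 and 1."""
    z = intercept + slope * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for large negative z
    return ez / (1.0 + ez)


# Hypothetical coefficients, chosen only to show the shape of each curve.
print(linear_prediction(0.5, 0.2, 4))    # exceeds 1
print(logistic_prediction(0.0, 1.0, 4))  # stays below 1

# The effect of a one-point increase shrinks as the prediction nears 1.
near_middle = logistic_prediction(0.0, 1.0, 1) - logistic_prediction(0.0, 1.0, 0)
near_top = logistic_prediction(0.0, 1.0, 4) - logistic_prediction(0.0, 1.0, 3)
print(near_middle, near_top)
```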

The two statistical scores most commonly reported from the results of regression analyses are:

*A measure of variance explained:* This summarises how well all the independent variables combined can account for the variation in respondents' scores in the dependent variable. The higher the measure, the more accurately we are able in general to estimate the correct value of each respondent's score on the dependent variable from knowledge of their scores on the independent variables.

*A parameter estimate:* This shows how much the dependent variable will change on average, given a one-unit change in the independent variable (while holding all other independent variables in the model constant). The parameter estimate has a positive sign if an increase in the value of the independent variable results in an increase in the value of the dependent variable. It has a negative sign if an increase in the value of the independent variable results in a decrease in the value of the dependent variable. If the parameter estimates are standardised, it is possible to compare the relative impact of different independent variables; those variables with the largest standardised estimates can be said to have the biggest impact on the value of the dependent variable.
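Both scores can be computed by hand for the simplest case of a single independent variable. The sketch below is an illustration under that assumption (the models in this report generally include several independent variables); the function name and the data are hypothetical.

```python
from statistics import mean, pstdev


def simple_ols(x, y):
    """Least-squares fit of y on a single independent variable x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx                    # the parameter estimate
    intercept = mean_y - slope * mean_x

    fitted = [intercept + slope * xi for xi in x]
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_residual / ss_total  # proportion of variance explained

    # Standardised estimate: change in y, in standard deviations, for a
    # one standard deviation change in x.
    standardised = slope * pstdev(x) / pstdev(y)
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": r_squared,
        "standardised": standardised,
    }


# Hypothetical data for five respondents.
result = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result)
```

With only one independent variable the standardised estimate equals the correlation between the two variables, so its square matches the measure of variance explained. That equality does not hold once further independent variables are added.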

Regression also tests for the statistical significance of parameter estimates. A parameter estimate is said to be significant at the five per cent level if the values encompassed by its 95 per cent confidence interval (see also section on sampling errors) are either all positive or all negative. This means that, if no relationship actually existed in the general population, there would be less than a five per cent chance of sampling error alone producing an association as strong as the one we have found between the dependent variable and the independent variable.
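The confidence-interval rule can be expressed in a few lines. This sketch assumes a normal approximation, with the interval taken as the estimate plus or minus 1.96 standard errors; the function names and the example figures are hypothetical, and statistical packages may use slightly different multipliers.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Approximate 95 per cent confidence interval for a parameter
    estimate (assumption: normal approximation, z = 1.96)."""
    return estimate - z * standard_error, estimate + z * standard_error


def is_significant(estimate, standard_error, z=1.96):
    """True if every value in the interval has the same sign, that is,
    the interval does not include zero."""
    lower, upper = confidence_interval(estimate, standard_error, z)
    return lower > 0 or upper < 0


# Hypothetical estimates, each with a standard error of 0.2.
print(is_significant(0.5, 0.2))   # interval lies wholly above zero
print(is_significant(0.3, 0.2))   # interval spans zero
print(is_significant(-0.5, 0.2))  # interval lies wholly below zero
```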

## Factor analysis

Factor analysis is a statistical technique which aims to identify whether there are one or more apparent sources of commonality to the answers given by respondents to a set of questions. It ascertains the smallest number of *factors* (or dimensions) which can most economically summarise all of the variation found in the set of questions being analysed. Factors are established where respondents who gave a particular answer to one question in the set tended to give the same answer as each other to one or more of the other questions in the set. The technique is most useful when a relatively small number of factors are able to account for a relatively large proportion of the variance in all of the questions in the set.

The technique produces a *factor loading* for each question (or variable) on each factor. Where questions have a high loading on the same factor, then it will be the case that respondents who gave a particular answer to one of these questions tended to give a similar answer to each other to the other questions. The technique is most commonly used in attitudinal research to try to identify the underlying ideological dimensions which apparently structure attitudes towards the subject in question.
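For readers who find notation helpful, the idea can be written using the common factor model. This is a standard textbook formulation rather than a description of the exact estimation method used in this report, and the symbols are introduced here for illustration only.

```latex
% x_{ij}      : answer of respondent i to question j (standardised)
% f_{ik}      : score of respondent i on factor k, for k = 1, ..., m
% \lambda_{jk}: loading of question j on factor k
% e_{ij}      : the part of the answer not accounted for by the factors
x_{ij} = \sum_{k=1}^{m} \lambda_{jk} f_{ik} + e_{ij}
```

If the factors are assumed to be uncorrelated with one another and the questions are standardised, each loading can be read as the correlation between a question and a factor, and the sum of a question's squared loadings gives the proportion of its variance that the factors account for.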

## Notes

- Until 1991 all British Social Attitudes samples were drawn from the Electoral Register (ER). However, following concern that this sampling frame might be deficient in its coverage of certain population subgroups, a 'splicing' experiment was conducted in 1991. We are grateful to the Market Research Development Fund for contributing towards the costs of this experiment. Its purpose was to investigate whether a switch to PAF would disrupt the time series - for instance, by lowering response rates or affecting the distribution of responses to particular questions. In the event, it was concluded that the change from ER to PAF was unlikely to affect time trends in any noticeable way, and that no adjustment factors were necessary. Since significant differences in efficiency exist between PAF and ER, and because we considered it untenable to continue to use a frame that is known to be biased, we decided to adopt PAF as the sampling frame for future British Social Attitudes surveys. For details of the PAF/ER 'splicing' experiment, see Lynn and Taylor (1995).
- This includes households not containing any adults aged 18 or over, vacant dwelling units, derelict dwelling units, non-resident addresses and other deadwood.
- In 1993 it was decided to mount a split-sample experiment designed to test the applicability of Computer-Assisted Personal Interviewing (CAPI) to the British Social Attitudes survey series. As the name implies, CAPI involves the use of a laptop computer during the interview, with the interviewer entering responses directly into the computer. There was, however, concern that a different interviewing technique might alter the distribution of responses and so affect the year-on-year consistency of British Social Attitudes data. Following the experiment, it was decided to change over to CAPI completely in 1994 (the self-completion questionnaire still being administered in the conventional way). The results of the experiment are discussed in The 11th Report (Lynn and Purdon, 1994).
- Interview times recorded as less than 20 minutes were excluded, as these timings were likely to be errors.
- An experiment was conducted on the 1991 British Social Attitudes survey (Jowell et al., 1992) which showed that sending advance letters to sampled addresses before fieldwork begins has very little impact on response rates. However, interviewers do find that an advance letter helps them to introduce the survey on the doorstep, and a majority of respondents have said that they preferred some advance notice. For these reasons, advance letters have been used on the British Social Attitudes surveys since 1991.
- Because of methodological experiments on scale development, the exact items detailed in this section have not been asked on all versions of the questionnaire each year.
- In 1994 only, this item was replaced by: Ordinary people get their fair share of the nation's wealth. *[Wealth1]*
- In constructing the scale, a decision had to be taken on how to treat missing values ("Don't know" and "Not answered"). Respondents who had more than two missing values on the left-right scale, or more than three missing values on the libertarian-authoritarian or welfarism scales, were excluded from that scale. For respondents with fewer missing values, "Don't know" was recoded to the mid-point of the scale and "Not answered" was recoded to the scale mean for that respondent on their valid items.
- See www.bsa-30.natcen.ac.uk.
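The treatment of missing values set out in the note on scale construction can be sketched as follows. Several details are assumptions made for illustration and are not taken from the note itself: items are taken to be scored 1 to 5 (so the mid-point is 3), the scale score is taken to be the average of the recoded items, and the missing-value codes and function name are hypothetical.

```python
DONT_KNOW = "DK"      # hypothetical code for "Don't know"
NOT_ANSWERED = "NA"   # hypothetical code for "Not answered"


def scale_score(answers, max_missing, midpoint=3):
    """Return a respondent's scale score, or None if they are excluded.

    `max_missing` is 2 for the left-right scale and 3 for the
    libertarian-authoritarian and welfarism scales, following the note.
    Assumptions: items are scored 1 to 5 and the score is the item mean.
    """
    valid = [a for a in answers if a not in (DONT_KNOW, NOT_ANSWERED)]
    n_missing = len(answers) - len(valid)
    if n_missing > max_missing or not valid:
        return None  # too many missing values: excluded from this scale

    valid_mean = sum(valid) / len(valid)
    recoded = []
    for answer in answers:
        if answer == DONT_KNOW:
            recoded.append(midpoint)    # mid-point of the scale
        elif answer == NOT_ANSWERED:
            recoded.append(valid_mean)  # respondent's mean on valid items
        else:
            recoded.append(answer)
    return sum(recoded) / len(recoded)


# Hypothetical respondents on a five-item scale (max_missing=2).
print(scale_score([1, 2, DONT_KNOW, 4, NOT_ANSWERED], max_missing=2))
print(scale_score([1, DONT_KNOW, DONT_KNOW, NOT_ANSWERED, 5], max_missing=2))
```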