Stats and Methods Urban Legend 1: Formative Measurement

2011 April 25

by Richard N. Landers

Previous Post:	SIOP 2011: Day 2/3 Summary
Next Post:	Stats and Methods Urban Legend 2: Control Variables Improve Your Study

In what I can only assume is a special issue of Organizational Research Methods, several researchers discuss statistical and methodological myths and urban legends commonly seen in the organizational sciences (a term I’ve adopted to cover organizational behavior, human resources, and industrial/organizational psychology). Four articles in this issue stuck out to me, and those are the ones I’ll be discussing over the next few weeks. First up: Edwards[1] writes “The Fallacy of Formative Measurement.”

Before getting into this, I will give a small disclaimer: I am not a research methods specialist, nor am I a quantitative psychologist. I am an industrial/organizational psychologist with an interest in research methods as a means to study phenomena I find interesting. As a result, I am approaching this topic as an end-user of methods rather than as someone who studies them directly. So any misrepresentation here is probably my fault! That said…

According to Edwards, there are two models of the relationships between constructs (theoretical concepts we want to measure, like job satisfaction, happiness, learning, etc.) and measures (the actual data we end up analyzing). The distinction between these is important when creating models to analyze data, especially in the context of structural equation modeling or other explicit data modeling techniques.

The first and more traditional view is called reflective measurement. From this perspective, constructs cause measures, i.e. the construct is the “real” underlying characteristic, and a measure is simply a reflection of that characteristic.

The second and more recent view is called formative measurement. From this perspective, measures cause constructs, such that the only real constructs are latent constructs, i.e. ones that can be detected by looking for patterns in data and then carefully refining measurement.
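A small simulation can make the difference in causal direction concrete. This is my own illustration, not something from Edwards’ article, and every number in it is an assumption: standard normal variables, four items, a loading of 0.8 on every reflective item, and equal weights of 0.5 on every formative item. In the reflective version the construct is generated first and each item is a noisy copy of it, with error attached to each item. In the formative version the items are generated first, independently of one another, and the construct is computed from them, with the only error term (zeta) attached to the construct.

```python
import random

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def simulate_reflective(n, loading=0.8, k=4, seed=1):
    """Construct -> items: x_j = loading * eta + sqrt(1 - loading^2) * e_j."""
    rng = random.Random(seed)
    noise_sd = (1 - loading ** 2) ** 0.5
    rows = []
    for _ in range(n):
        eta = rng.gauss(0, 1)  # the construct exists before it is measured
        # each item gets its own error term (item-level measurement error)
        rows.append([loading * eta + noise_sd * rng.gauss(0, 1) for _ in range(k)])
    return rows

def simulate_formative(n, weight=0.5, k=4, seed=1):
    """Items -> construct: eta = sum(weight * x_j) + zeta."""
    rng = random.Random(seed)
    rows, etas = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(k)]  # causal indicators with no common cause
        # the only error term (zeta) sits on the construct, not on the items
        eta = sum(weight * xj for xj in x) + rng.gauss(0, 0.5)
        rows.append(x)
        etas.append(eta)
    return rows, etas

def mean_inter_item_r(rows):
    """Average correlation over all distinct pairs of item columns."""
    k = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(k)]
    rs = [pearson(cols[i], cols[j]) for i in range(k) for j in range(i + 1, k)]
    return sum(rs) / len(rs)

refl = simulate_reflective(5000)
form_items, form_eta = simulate_formative(5000)
print(round(mean_inter_item_r(refl), 2))        # should be close to 0.8 * 0.8 = 0.64
print(round(mean_inter_item_r(form_items), 2))  # should be close to 0
```

Both data sets describe a perfectly coherent construct, but only the reflective one produces items that hang together. That single difference is what drives several of the contrasts Edwards draws.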

The major conflict between these two perspectives is the role of theory. From a reflective perspective, we should develop theories of the relationships between constructs and use those theories to shape how data will be collected. From a formative perspective, we should look for patterns in data and let the data provide the basis for our theories, progressively shaping our theories as the data lead us to do so.

To add to the confusion, the two perspectives typically provide similar results. One researcher coming from the reflective perspective (developing theory, then testing it) may reach the same conclusions as another researcher coming from the formative perspective (measuring a host of constructs, then looking for patterns). But that doesn’t necessarily mean that the two approaches are equally sound.

Edwards discusses six key ways that formative measurement differs from reflective measurement:

Dimensionality: Reflective measurement accepts that redundancy between measures is inevitable because all measurement is an imperfect representation of a construct. The goal, however, is to get as “pure” a measure of the construct as possible, so redundancy should be minimized. This is reflected in the typical measure development process – iteratively removing items that load on multiple factors in order to tap the construct as effectively as possible. In a formative measurement model, measures (and items) that are multidimensional by definition reflect multidimensional constructs. Consider this double-barreled Likert-type agreement item: “Sometimes I like pancakes and sometimes I like broccoli.” This question is multidimensional, but is it more likely that this is a bad question and should be refined (reflective) or that there is an underlying multidimensional construct (formative)?
Internal Consistency: In a reflective model, two measures/items designed to tap the same construct should correlate highly. Formative measures carry no such expectation; in fact, high internal consistency might be a sign of a poor model, since there is no definitive reason that the facets of a construct should be correlated. Edwards argues that this difference has led some researchers, upon finding that their reflective measures are multidimensional, to conclude that those measures are in fact formative. Unfortunately, this is not a valid conclusion.
Identification: Identification refers to the ability to derive unique values for each model parameter. It’s probably a little overkill to go into the details of this here, but in a nutshell, a model must be overidentified in order to determine how well it fits the data. For a formative measure to be identified, one must add reflective measures as outcomes of the formative construct. But because measures cause constructs in a formative model, adding these measures has implications for the construct itself (which, if I may editorialize a bit here, makes a huge mess of things).
Measurement Error: In a reflective model, measurement error is associated with each item – this is the degree to which the item is unique and measures constructs other than the construct that it is supposed to measure. In a formative model, measurement error is associated with the latent construct instead. Each measure contributes some explanation of the latent construct, and whatever is left over is error. Thus the implicit assumption in a formative model is that the items do not possess unique measurement error – which seems a little odd, at least in the context of psychological constructs. Errors in recall, day-to-day fluctuations in affect, and many other sources of item-level measurement error appear to be discarded in formative models.
Construct Validity: In a reflective model, construct validity refers to the degree to which a measure reflects the construct it is supposed to measure. Construct validation is intended to determine how well the measures represent the construct, a process that ultimately relies on researcher judgment. In a formative model, these decisions are made statistically, by examining the interrelationships within the model itself. This means that the construct’s validity is driven by the kinds of variables you test its validity with, which makes it quite tricky.
Causality: As stated before, in a reflective model, constructs cause measures, while in a formative model, measures cause constructs.
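The identification point can be made concrete with the standard counting rule (the “t-rule”): a model can only be identified if its number of free parameters does not exceed the number of unique variances and covariances among the observed variables, p(p + 1)/2, and it is overidentified, and therefore testable, only when the difference (the model’s degrees of freedom) is positive. The sketch below applies that rule to one formative construct with q causal indicators and r added reflective outcomes. The parameter bookkeeping is my own assumption for illustration: the causal indicators’ variances and covariances are free, the first outcome loading is fixed to 1 to give the construct a scale, and each outcome has its own error variance. The t-rule is necessary but not sufficient, so a positive count does not by itself guarantee identification.

```python
def mimic_degrees_of_freedom(q, r):
    """t-rule count for one formative construct with q causal indicators
    and r reflective outcomes (assumed parameterization, see comments)."""
    if q < 1 or r < 1:
        # with no reflective outcome, the construct never enters the observed
        # covariance matrix, so nothing about it can be estimated at all
        raise ValueError("need at least one causal indicator and one reflective outcome")
    p = q + r
    moments = p * (p + 1) // 2   # unique variances/covariances among observed variables
    free = (
        q * (q + 1) // 2         # variances/covariances of the causal indicators
        + q                      # paths (gammas) from the indicators to the construct
        + 1                      # construct-level disturbance variance (zeta)
        + (r - 1)                # outcome loadings, the first fixed to 1 for scale
        + r                      # error variance of each reflective outcome
    )
    return moments - free

for r in (1, 2, 3):
    print(r, mimic_degrees_of_freedom(3, r))
# 1 -1
# 2 2
# 3 6
```

With three causal indicators and a single reflective outcome the count is negative; with two outcomes the model becomes testable. Fit can only be assessed after measures have been attached that, in the formative logic, also shape what the construct is.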

Thus, reflective models describe measures as imperfect indicators of underlying phenomena, while formative models describe measures as an inseparable part of the constructs they are tied to.

I think the problem with pure formative models can be illustrated with this example. Consider a 10-question scale measuring the psychological state of happiness. Using a reflective model, we would use theory to conceptually define happiness and develop 10 questions to measure the construct. We would iteratively work to include only items that had high item-total correlations so that we had consistency of measurement, although we accept that we won’t have perfect measurement, as that would be impossible. We also realize that happiness exists independently of whether or not we measure it, and whether or not we measure it well.
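The reflective scale-development loop described above (keep the items with high item-total correlations) can be sketched in a few lines. This is a hypothetical illustration rather than a prescribed procedure: the function names, the 0.30 cutoff, the drop-one-item-at-a-time rule, and the simulated 10-item “happiness” data are all my assumptions, and real scale development also leans on theory and factor analysis. It uses the corrected item-total correlation (each item against the sum of the other items, so no item is correlated with itself) and reports Cronbach’s alpha as the internal-consistency summary.

```python
import random

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def variance(v):
    """Sample variance (n - 1 denominator)."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

def cronbach_alpha(rows, keep):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = len(keep)
    item_var = sum(variance([r[j] for r in rows]) for j in keep)
    total_var = variance([sum(r[j] for j in keep) for r in rows])
    return k / (k - 1) * (1 - item_var / total_var)

def corrected_item_total(rows, keep):
    """Correlation of each kept item with the sum of the *other* kept items."""
    out = {}
    for j in keep:
        item = [r[j] for r in rows]
        rest = [sum(r[i] for i in keep if i != j) for r in rows]
        out[j] = pearson(item, rest)
    return out

def prune_items(rows, cutoff=0.30, min_items=3):
    """Drop the weakest item, one at a time, until every item clears the cutoff."""
    keep = list(range(len(rows[0])))
    while len(keep) > min_items:
        citc = corrected_item_total(rows, keep)
        worst = min(citc, key=citc.get)
        if citc[worst] >= cutoff:
            break
        keep.remove(worst)
    return keep, cronbach_alpha(rows, keep)

# Hypothetical 10-item scale: items 0-7 reflect one construct, items 8-9 are
# unrelated noise (think of the pancakes/broccoli item from earlier).
rng = random.Random(42)
rows = []
for _ in range(2000):
    eta = rng.gauss(0, 1)
    good = [0.7 * eta + 0.71 * rng.gauss(0, 1) for _ in range(8)]
    noise = [rng.gauss(0, 1) for _ in range(2)]
    rows.append(good + noise)

keep, alpha = prune_items(rows)
print(keep)             # expected to retain items 0-7
print(round(alpha, 2))
```

The loop bakes in the reflective assumption: an item that does not correlate with the rest is treated as a flawed measure of the construct and discarded. A formative reading would instead keep it as a legitimate facet of the construct, which is exactly the disagreement the happiness example turns on.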

A formative model would not make many of these assumptions. For example, once we have created a 10-question scale, we have created a construct. If multidimensionality in measurement is discovered, that indicates a multidimensional construct. Rather than saying happiness exists as a human characteristic that we are attempting to measure, we say that this measurement is part of the definition of whatever construct is being measured (hopefully happiness). Individual items in the measure do not contain both common elements of happiness and their own unique contributions; instead, they contribute directly to the variance in the construct. For example, when explored reflectively, the item “I am generally happy at work.” contains variance contributed both by general happiness and by work-specific (unique) happiness. From a formative perspective, the unique variance gets added to the error associated with the latent trait – it is a part of the construct, but a part that doesn’t help explain anything theoretically. Simply because we asked the question, it becomes a part of the construct – which, in my opinion, doesn’t really make any sense.

Thus, Edwards’ conclusion is that formative measurement is based on invalid assumptions about the nature of data in the six categories listed above. He goes on to suggest alternatives to pure formative models that combine them with elements of reflective models. These mixed models are no longer purely formative, but they maintain many of the advantages of formative measurement – for example, better handling of multidimensional constructs.

For my own work, this has convinced me that, at a minimum, formative models as they exist currently are too controversial to be employed safely, both from a publishing perspective and for me to be confident in my own conclusions.

[1] Edwards, J. (2010). The fallacy of formative measurement. Organizational Research Methods, 14(2), 370-388. DOI: 10.1177/1094428110378369


3 Responses

disgruntledphd
April 27, 2011
You might want to check out the in press section of New Ideas in Psychology, as there appears to be a similar discussion taking place there.
Thanks for pointing me towards the articles above though, they should prove very useful for my thesis methodology section.
jebyrnes
May 2, 2011
Formative measurements (aka composite variables) have come on in a big way in Ecology these days. I know, I know, different discipline. But, we use statistical tools such as Structural Equation Modeling for very different purposes with different types of measurements. As such, formative constructs are often quite relevant to us. It may be interesting to you to see how the kind of data one uses can help create a distinction, and aid in determining when one is more useful than the other. For an excellent discussion of this, see this USGS report by Jim Grace and Ken Bollen. Interesting stuff.

Trackbacks and Pingbacks

The baby sitter studying statistics | Cranky Ron
