In a recent issue of the Journal of Virtual Worlds Research, Beck et al. examine the role of the psychological construct “presence” in the context of virtual environments (VE). They do this by exploring how presence has been studied across several disciplines. I’ll summarize them here:
- Mass Communication: This is a discipline studying how mass media can be used to communicate to large groups. Researchers in this field discuss presence in terms of “being there” as a part of the virtual environment, a concept called “sentient presence” (SP).
- Human-Computer Interaction: HCI is a discipline combining elements of both human factors engineering and human factors psychology, with a focus on how humans experience computers. In HCI, discussion surrounds a “sense of being there,” called “non-sentient presence” (NSP). This line explains it fairly clearly: “People are considered present in a VE when they report a sensation of being inside the virtual world.” SP is still a part of HCI, but focuses on multi-user VEs (MUVEs). HCI also quantifies NSP in terms of four dimensions of immersion: inclusiveness, extensiveness, surrounding, and vividness.
- Education: Definitions of presence here are derived from mass communication and HCI, although presence has been studied as a function of both media and participants in media. An interesting dimension within this field is the study of SP as the ability of the VE resident to project a realistic persona into the VE. One education researcher even found that increased SP was related to increased learning in online courses.
- Psychology: In my field, presence is also discussed in terms of “being there.” Greater presence in a VE is indicated by the perception of virtual objects and experiences as “real.” NSP is examined from three perspectives: focus of attention, locus of attention, and sensus of attention.
Probably the most basic problem I see with this article is that even after reading it, I am not totally clear on what SP and NSP refer to. The researchers attempt to synthesize these perspectives into definitions of both SP and NSP, but do this by listing one sentence from each field, creating two paragraph-long definitions – which doesn’t really accomplish anything useful, as far as definitions go.
My best guess at this is that NSP refers to immersive presence, i.e. feeling like you are a part of the simulation/VE, while SP refers to an awareness of a living, breathing virtual world around your avatar. One would need SP in order to achieve NSP, by this definition. But I am not quite convinced that these are both best conceptualized as “presence.”
SP seems similar to the old media construct of suspension of disbelief – that feeling you get when you are really engrossed in a good movie or book and forget/fail to notice that some things aren’t quite realistic. When we watch Star Wars, for example, we don’t stop to think “that doesn’t make sense!” because we have been drawn into the narrative. The same principle seems to apply to virtual worlds, and at least one study in psychology explores this by examining the extent to which VE participants try to interact with computer-controlled characters and constructs as if they were real. This certainly makes sense to me – people high in suspension of disbelief (i.e. SP) are willing to forget that the virtual world is a computer program, and instead think of it as a real virtual world that they can explore.
My personal definition of presence, then, is closer to NSP – when a person loses track of the fact that they are playing a game/participating in a simulation, becoming wholly drawn into the virtual world. It is the same experience an actor might have when fully immersed in a role – a total and complete willingness to participate in the narrative. When you have this sense of presence, you feel disoriented and surprised to be pulled out of that world. It is this level of engagement that virtual worlds enable – and it looks like we’re getting pretty close to creating such experiences with the VEs currently available.
Now we just need some scales to measure it!
Footnotes:
- Beck, D., Fishwick, P., Kamhawi, R., Coffey, A. J., & Henderson, J. (2011). Synthesizing presence: A multidisciplinary review of the literature. Journal of Virtual Worlds Research, 3(3).
In what I can only assume is a special issue of Organizational Research Methods, several researchers discuss common statistical and methodological myths and urban legends (MUL) commonly seen in the organizational sciences (for more introduction, see the first article in the series). Fourth and final in this series: Cortina and Landis write “The Earth is Not Round (p = .00)”
I had initially hoped to report on this fourth article in the same week as Part 3 of this series, but quickly realized that I would need to parse it with a much finer-toothed comb. Cortina and Landis, with this article, are jumping squarely into the null hypothesis significance testing (NHST) vs. effect size testing (EST) debate with a reasonably strong position. If you aren’t familiar with what these terms refer to, here’s a little reminder:
- NHST: The comparison of obtained differences/relationships to a theoretical sampling distribution to determine the probability that we would find that difference/relationship (or one larger) if there were really no difference/relationship in the population (called the null hypothesis). If an observed result is improbable (assuming the null hypothesis were true), we typically use the term “statistically significant.”
- EST: The simple reporting of the observed result, such as the size of a correlation or d-statistic, and the confidence with which we have made that estimation.
An influential 1994 paper by Cohen entitled “The earth is round (p < .05)” raised a large number of valid criticisms of the state of NHST in psychology at the time, and its strong perspective is best summarized by this statement:
NHST has not only failed to support the advance of psychology as a science, but also has seriously impeded it. (p. 997)
NHST, Cohen argued, is so commonly misused and misunderstood by psychologists that it has had a net negative effect on scientific progress. I’ve detailed potential problems with NHST elsewhere, so I won’t go into them here, but here’s the basic problem: it’s so easy to reduce statistical significance testing to a binary “it’s significant and therefore a real effect”/“it’s not significant and therefore not a real effect” that many researchers do exactly that, despite the fact that this is a completely invalid conclusion.
Cohen argues that replacing NHST with EST would go a long way toward fixing this problem. Instead of statements like this…
- NHST: The difference between conditions was statistically significant [t(145) = 4.12, p < .05].
…you would see statements like this…
- EST: The difference between conditions was 0.12 standard deviations in magnitude [d = .12, CI(.95): .01 < d < .23].
Same phenomenon; different reporting. EST discourages researchers from making categorical statements like “there was an effect” while simultaneously giving information about the precision of the estimate obtained.
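To make the contrast concrete, here is a minimal sketch in Python (mine, not anything from the article) that produces both kinds of report from the same hypothetical data. The group values and the large-sample standard error formula for d are illustrative assumptions only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group_a = rng.normal(0.00, 1.0, 75)  # hypothetical control condition
group_b = rng.normal(0.12, 1.0, 72)  # hypothetical treatment condition

# NHST report: test the observed difference against the null hypothesis
t, p = stats.ttest_ind(group_b, group_a)

# EST report: Cohen's d with an approximate 95% confidence interval
n1, n2 = len(group_a), len(group_b)
pooled_sd = np.sqrt(((n1 - 1) * group_a.var(ddof=1) +
                     (n2 - 1) * group_b.var(ddof=1)) / (n1 + n2 - 2))
d = (group_b.mean() - group_a.mean()) / pooled_sd
se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))  # large-sample SE
ci_low, ci_high = d - 1.96 * se_d, d + 1.96 * se_d

print(f"NHST: t({n1 + n2 - 2}) = {t:.2f}, p = {p:.3f}")
print(f"EST:  d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Both reports come from identical numbers; only the EST line tells the reader how large the effect is and how precisely it was estimated.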
In the present article, Cortina and Landis argue that this shift will do no good. They say that while NHST is frequently misused, there at least exists a well-structured system by which to make judgments about NHST. An effect either is statistically significant, or it is not. EST, on the other hand, has very few standards by which to make judgments.
The most prominent standard currently used in EST is in fact a bastardization of Cohen’s own work. Cohen defined what might be considered “small,” “medium,” and “large” effects, for example, but recommended that researchers come up with their own standards of comparison within individual research literatures. And yet, researchers typically take the specific values supplied by Cohen and brandish them like a hammer, making claims about “medium effects” regardless of context. This is the same “dichotomous thinking” that plagues the use of NHST (although perhaps in this context, it is better called “trichotomous” or “polytomous thinking”).
Cortina and Landis thus argue that a shift to EST would leave us even worse off than NHST does, due to the lack of structured expectations surrounding EST. Although researchers misuse NHST, at least there are standards by which to say there is misuse! This is a straw man. Any new technique that gains popularity in the literature will require a “breaking in” period as it is explored independently by new researchers – just look at meta-analysis, SEM, HLM, and other relatively new approaches to data. A lack of clarity and direction with a new technique does not mean that the old technique is better or even safer.
So what exactly is the urban legend here? It is that a shift to EST will magically solve the interpretability problems associated with a research literature filled with NHST. I certainly agree that this belief is a myth and potentially a problem. Both NHST and EST are simplifications and summaries of much more complex phenomena, and as a result, information is lost along the way. It takes an expert in the content area, the methodology, and the stats to make valid conclusions about constructs from data. Sometimes these are the same person; sometimes they are not. Even with a complete shift to EST, many researchers would continue to overestimate their abilities and misuse the tools they are given.
The authors ultimately conclude that it’s time to simply embrace that taking one analytic approach to a dataset is not enough. Some combination of well-used NHST and EST is needed. What I find peculiar, though, is that after decrying a lack of structure and expectations in EST, they give a set of “situations…we as a field should be able to agree upon” that are a little vague:
- “If one’s sample is very large, then chance is irrelevant.” N = 50,000 is given as an example of a “large sample.”
- “If one’s sample is very small, then quantitative analysis as a whole is irrelevant.” This seems ripe for articles citing this and saying “We only had N = 30, so we don’t need to do quantitative analysis.”
- “If one’s model is simple, then expectations for rejection of chance and for effect magnitude should be higher than if one’s model is complex.”
- “If one’s results could be characterized as counterintuitive, either because of the design or because of the questions being asked, then expectations for magnitude should be lower than for results that replicate previous research or might be characterized as reflecting conventional wisdom.”
Something tells me this problem will not be solved any time soon.
I’m virtually attending the Federal Consortium for Virtual Worlds 2011 conference today. The purpose of this conference is to discuss innovation in the area of 3D virtual worlds in government service. This entry will be a live blog of my experiences at this virtual conference.
Day 1 10:20 AM - About half an hour after first attempting to get to this conference, I am finally able to listen to a session. The conference has multiple gateway points to get to the live streaming content, all of which are inside virtual worlds.
That’s fine theoretically, but it doesn’t really meet my needs as an attendee – I need simple, straightforward instructions so that I can get into the conference material easily and quickly. Instead, there’s an artificial barrier to entry. Instead of viewing a webpage with the live video stream, I have to download software and walk an avatar over to a webpage with a live video stream. Why?! If there were dedicated content inside the virtual world that encouraged participant interaction, that would be fine, as that would add value to the experience – but as it is, I had to hop between 3 virtual worlds to find anyone else even present, and when I finally got there, the only added value I received was that one of the other users was having microphone problems, which drowned out the conference speaker audio. Hopefully this experience will improve now that I’m settled in…
10:30 AM - Apparently we’ll now be listening to another presenter through Second Life, through the live-stream, through another VW. A lot of layers.
10:33 AM - Speaker: there is no reason to have a 3D environment unless it is valuable in a way that can’t be replicated elsewhere, e.g. on a webpage. Yes! 3D virtual worlds add specific value, but using them blindly across situations without specific consideration of their value in those situations doesn’t make any sense.
10:37 AM - Apparently kids REALLY get into role playing in 3D virtual environments. Is the engagement better than in-person, though, I wonder? Apparently Chinese kids are very shy in person, so the MUVE gives them an opportunity to act out in a way they would not otherwise feel comfortable doing.
10:39 AM - “Artificial intelligence is not yet good enough to replace a real teacher.” Inevitable though?
10:54 AM - The best projects, with the most added value, are those that benefit from creative use of the 3D virtual environment – for example, 3D content creation.
11:04 AM - Panel disagreement: “Be one person” and determine who you are online and offline versus “Multi-persona” approaches, where you choose to be a different person depending on the goal – social media, virtual world, whatever. Tricky, tricky. But I wonder what effect these different approaches would have on the person engaging in them.
12:50 PM - The feed has been dead for over an hour now – we’re in a lunch break and vendor fair. Keynote upcoming in about 10 minutes.
12:57 PM - At a vendor booth for the National Center for Telehealth & Technology. They have a PTSD simulator, which starts with the simulation of a “traumatic event.” Disturbing… but looks effective.
1:08 PM - Sitting in the virtual expo hall, but no video. Not sure what’s happening.
1:17 PM - Switched to Internet Explorer… didn’t seem to like Firefox. Working now, but have to catch up with what’s going on.
1:21 PM - Keynote by Ms. Mk Haley: VR not being used so much in the public sphere, but more so behind the scenes. Working on full-body interfaces. Considers “virtual reality” to be a much wider term – Disneyland as a virtual reality, for example.
1:33 PM - I am definitely using the Marshmallow Challenge.
1:38 PM - Engineers given a basically-unwinnable game to play as an exercise in innovation. By working outside the perceived “rules,” the engineers could have won, but considered that cheating. Sometimes the innovation/cheating line is unclear. The engineers got mad.
3:07 PM - Legal ambiguity surrounding virtual environments (e.g. what happens if your student vandalizes something online while in your class?), although scary, is not really important because “on the whole, we want to use these spaces for fairly well-known, fairly understood, fairly innocuous things” and thus teaching activities online are no more risky than doing them in person. Not sure I agree…
3:15 PM - The tech needs to evolve such that anyone can access MUVEs anywhere, from mobile phones to immersive VR systems, for them to be truly valuable for getting things done.
3:19 PM - Question from the audience: How do we create standards of behavior in MUVEs? Sometimes people come out of their shell online and the thing that comes out of that shell isn’t very pretty.
3:45 PM - That’s it for me for today – tune in tomorrow!
Day 2 8:48 AM - Back in the saddle! This speaker’s content makes sense – using virtual worlds for science education by creating simulations that accomplish what cannot be accomplished in person (situated learning). Personal simulated ecosystem, for example.
9:01 AM - Paper-and-pencil tests are invalid in education? Methinks someone doesn’t write very good tests. The immersive assessment strategy can certainly assess different competencies than a paper-and-pencil test, but that doesn’t invalidate paper-and-pencil tests – that argument is unnecessary.
9:13 AM - Has similar goals to mine: virtual environments automatically customizing themselves to student needs.
9:16 AM - Oi… my concern with that video is that it encourages children to run in seedy back alleys without supervision.
10:36 AM - Panel on the use of VWs for “command and control centers”
10:40 AM - Again, the appeal of virtual worlds seems to be largely in the ability to quickly make cheap simulations. Rapid prototyping with VOIP and instant messaging.
10:47 AM - Using VW-built prototypes to model hypothetical tactical systems and system displays… sounds similar to the process modeling work by Ross Brown but with more of a focus on information flow rather than physical flow
11:07 AM - “People do better playing war on XBOX Live than we do in the field in…urban occupation environment[s]”
11:35 AM - In Q&A with command and control session… and I’m out of time. Very interesting comments – will summarize thoughts in a dedicated post Monday.
In what I can only assume is a special issue of Organizational Research Methods, several researchers discuss common statistical and methodological myths and urban legends (MUL) commonly seen in the organizational sciences (for more introduction, see the first article in the series). Third up: Aguinis et al. write “Debunking Myths and Urban Legends About Meta-Analysis.”
Meta-analysis has become such a de facto method for synthesizing a research literature in the organizational sciences that I can hardly imagine a modern narrative literature review without one. If you aren’t familiar with it, meta-analysis essentially involves computing a grand mean of some statistic across research studies. This might be a mean difference (usually a Cohen’s d) or a correlation (usually a Pearson’s r).
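As a rough illustration of that “grand mean” (my own sketch, not anything from Aguinis et al.), here is a bare-bones fixed-effect meta-analysis of correlations in Python. The study correlations and sample sizes are invented, and inverse-variance weighting via Fisher’s z is one common convention among several.

```python
import numpy as np

# Invented per-study correlations and sample sizes
rs = np.array([0.25, 0.10, 0.32, 0.18])
ns = np.array([120, 85, 200, 60])

# Fisher's z-transform stabilizes the sampling variance of r,
# which becomes 1 / (n - 3) regardless of the population value
zs = np.arctanh(rs)
weights = ns - 3  # inverse-variance weights

z_bar = np.sum(weights * zs) / np.sum(weights)
se = 1 / np.sqrt(np.sum(weights))

# Back-transform the weighted mean and its CI to the r metric
r_bar = np.tanh(z_bar)
ci_low, ci_high = np.tanh(z_bar - 1.96 * se), np.tanh(z_bar + 1.96 * se)

print(f"meta-analytic r = {r_bar:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

Notice that larger studies dominate the average – which is exactly why the quality of the inputs matters so much in the myths below.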
Unfortunately, the surge in popularity of this statistical technique has brought with it a large number of researchers employing it without really understanding it – imagine the person who computes an ANOVA without any clue what “ratio of the between- to within-group variability” means. And even if we were to assume all researchers do understand it completely, we now have a large population of “consumers of meta-analyses” that need that same understanding just to accurately interpret a literature review.
Aguinis et al. provide a list of what they believe to be the seven most common myths and urban legends associated with meta-analysis. My understanding is that this list came out of a session I attended at SIOP 2010 and subsequent discussions. I’ll list each of the myths as Aguinis et al. present them, along with my own interpretation of each:
- MUL #1: A single effect size can summarize a literature. Much as you cannot use a sample mean or sample correlation to conclude anything about a single person within that sample, you cannot generalize from a single meta-analytic estimate about any particular setting. This is why we have moderator analyses; the overall effect size from a meta-analysis only tells you what happens “on average.” There is not necessarily even a single study or setting where you would find the relationship described by that overall effect size.
- MUL #2: Meta-analysis can make lemonade out of lemons; meta-analysis allows researchers to gather a group of inconclusive and perhaps poorly designed studies and draw impressive conclusions with confidence. Meta-analysis certainly gathers larger samples than is possible in a single study, which is a genuine strength of approaching data from this perspective. But this has led to the common misconception that you can throw anything you want into a meta-analysis and get out “good” results. It reminds me of the old computer science expression, GIGO: garbage in, garbage out. If you include only poor quality studies, you’ll get a poor quality average.
- MUL #3: File drawer analysis is a valid indicator of possible publication bias. One of the techniques recommended to identify whether your meta-analysis suffers from publication bias (published studies tend to show stronger results than unpublished ones) is to compute a failsafe N. This value represents how many studies with null results would need to be added to nullify the results of the present meta-analysis (a minimal sketch of the computation appears after this list). While a low failsafe N indicates potential publication bias, a high failsafe N does not necessarily indicate the absence of it.
- MUL #4: Meta-analysis provides evidence about causal relationships. GIGO all over again. If you aren’t meta-analyzing experiments that provide evidence of causality, your meta-analysis will not magically add that interpretation.
- MUL #5: Meta-analysis has sufficient statistical power to detect moderating effects. It’s a common assumption that by meta-analyzing a research literature, you automatically have sufficient power to detect moderators. While it is true that meta-analyses have greater power to detect moderators than individual primary studies, you do not automatically have sufficient power to detect anything you want to detect.
- MUL #6: A discrepancy between results of a meta-analysis and randomized controlled trials means that the meta-analysis is defective. While a discrepancy might indicate a poorly designed meta-analysis, this is by no means conclusive. Some discrepancy is inevitable because a meta-analysis is an average of studies, and those studies will vary randomly.
- MUL #7: Meta-analytic technical refinements lead to important scientific and practical advancements. Most refinements in meta-analytic technique do not dramatically alter computed estimates. Although you should certainly use the most recent refinements (as they will produce the most accurate estimates), you don’t need to worry too much about forgetting one… although there are certainly a few exceptions to this (my own work on indirect range restriction comes to mind!). The biggest mistake is to redo and attempt to publish a meta-analysis that directly replicates another meta-analysis with only minor changes in approach; the difference between the old and new results will almost never be large enough to justify this unless the meta-analytic k is also dramatically increased.
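For the curious, here is the failsafe N computation mentioned under MUL #3, as a minimal Python sketch of Rosenthal’s classic formula. The p-values are invented, and this is exactly the kind of diagnostic MUL #3 warns against over-trusting.

```python
import numpy as np
from scipy import stats

# Invented one-tailed p-values from the k studies in a meta-analysis
ps = np.array([0.020, 0.001, 0.045, 0.008, 0.030])
k = len(ps)

# Stouffer's method: convert each p to a z-score and sum them
z_sum = np.sum(stats.norm.isf(ps))

# Rosenthal's failsafe N: how many null-result (z = 0) studies would
# drag the combined z below the one-tailed .05 criterion of 1.645?
# Solve z_sum / sqrt(k + X) = 1.645 for X.
failsafe_n = (z_sum / 1.645) ** 2 - k
print(f"failsafe N = {failsafe_n:.0f} additional null studies")
```

A large value here is reassuring but not conclusive – it says nothing about the file drawer that actually exists.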
In what I can only assume is a special issue of Organizational Research Methods, several researchers discuss common statistical and methodological myths and urban legends commonly seen in the organizational sciences (for more introduction, see the first article in the series). Second in the exploration: Spector and Brannick write “Methodological Urban Legends: The Misuse of Statistical Control Variables.”
Spector and Brannick criticize the tendency for researchers conducting correlational research to blindly include “control variables” in an attempt to get better estimates of population correlations, regression slopes, and other statistics. Such researcher effort is typically an attempt to improve methodological rigor when true experimentation isn’t possible, feasible, or convenient. Unfortunately, this is a methodological urban legend. And yet, shockingly, the authors report a study finding a mean of 7.7 control variables in macro-org research and 3.7 in micro-org research.
I will let the authors explain the problem:
Rather than being included on the basis of theory, control variables are often entered with limited (or even no) comment, as if the controls have somehow, almost magically, purified the results, revealing the true relationships among underlying constructs of interest that were distorted by the action of the control variables. This is assumed with often little concern about the existence and nature of mechanisms linking control variables and the variables of interest. Unfortunately, the nature of such mechanisms is critical to determining what inclusion of controls actually does to an analysis and to conclusions based on that analysis.
The authors call the blind inclusion of control variables in any attempt to get more accurate results the purification principle. The problem with the purification principle is that it is false; the inclusion of statistical controls does not purify measurement. Instead, it simply removes the covariance between the control variable and the other variables from later analyses, even though that covariance may be meaningful to the researcher’s hypotheses. The authors give this illustrative example:
A supervisor’s liking for a person might inflate the supervisor’s rating of that person’s job performance across multiple dimensions. Correlations among those dimensions might well be influenced by liking, which in effect has contaminated ratings of performance. Thus, researchers might be tempted to control for liking when evaluating relationships among rating dimensions. Note, however, that whether it is reasonable to control liking in this instance depends on whether liking is in fact distorting observed relationships. If it is not (perhaps, liking is the result of good performance), treating liking as a control will lead to erroneous conclusions. This is because removing variance attributable to a control variable (liking) that is caused by a variable of interest (performance) will remove the effect you wish to study (relationships among performance dimensions) before testing the effect you wish to study, or “throwing out the baby with the bathwater.”
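The authors’ point is easy to see in a toy simulation (mine, not theirs), in which liking is caused by performance rather than contaminating it; partialling liking out of the correlation between two rating dimensions strips away the very performance variance under study. All variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# True model: a common performance factor drives two rating dimensions,
# and liking is *caused by* performance (not a contaminant of it).
performance = rng.normal(size=n)
dim1 = performance + rng.normal(scale=0.8, size=n)
dim2 = performance + rng.normal(scale=0.8, size=n)
liking = performance + rng.normal(scale=0.8, size=n)

def partial_r(x, y, z):
    """Correlation of x and y after removing their covariance with z."""
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

print(f"zero-order r(dim1, dim2)  = {np.corrcoef(dim1, dim2)[0, 1]:.2f}")
print(f"controlling for liking: r = {partial_r(dim1, dim2, liking):.2f}")
# The partial correlation is substantially attenuated - not because the
# dimensions are unrelated, but because "controlling" liking removed
# performance variance along with it.
```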
So how should one actually use control variables? Two recommendations are given:
- Use specific, well-explored theory to drive the inclusion of controls, which goes beyond simple statements like “previous researchers used this control” or “this variable is correlated with my outcomes.” If you believe that a specific relationship may be contaminating your results, this may be justification for a control, but you should explicitly state why and defend this decision when describing your methods. Follow up on this discussion; test hypotheses about control variables.
- Don’t control for demographic variables, e.g. race, gender, sex, age. For example, if you find a gender difference in your outcome of interest, controlling for that variable may hide real variance in the outcome that could be explained by whatever real phenomenon is causing that difference. In my own research area, it is not uncommon to control for age when examining the effects of technology on outcomes of interest (e.g. learning). But age does not itself cause trouble with technology; instead, underlying differences like familiarity with technology, comfort with technology, or other characteristics may be driving those differences. Simply controlling for age not only removes “real” variance that should remain in the equation but also camouflages a real relationship of interest.
So generally, Spector and Brannick are calling for an organizational science based on iterative theory building, progressively testing alternative hypotheses and narrowing in on answers bit by bit. This approach is closer to what is employed in the natural sciences; instead of testing one-off theories, researchers build and build, approaching a problem from as many perspectives as possible to narrow in on real results.
My only concern is this: since one-off studies with revealing and/or controversial results are the ones most often rewarded with recognition, is this an approach that organizational researchers will really take?
In what I can only assume is a special issue of Organizational Research Methods, several researchers discuss statistical and methodological myths and urban legends commonly seen in the organizational sciences (which is a term I’ve adopted for organizational behavior, human resources, and industrial/organizational psychology). Four articles in this issue stuck out to me, which are the ones I’ll be discussing over the next few weeks. First up: Edwards writes “The Fallacy of Formative Measurement.”
Before getting into this, I will give a small disclaimer: I am not a research methods specialist, nor am I a quantitative psychologist. I am an industrial/organizational psychologist with an interest in research methods as a means to study phenomena I find interesting. As a result, I am approaching this from the perspective of an end-user of methods rather than a person who studies them explicitly. So any misrepresentation here is probably my fault! That said…
According to Edwards, there are two models of the relationships between constructs (theoretical concepts we want to measure, like job satisfaction, happiness, learning, etc.) and measures (the actual data we end up analyzing). The distinction between these is important when creating models to analyze data, especially in the context of structural equation modeling or other explicit data modeling techniques.
The first and more traditional view is called reflective measurement. From this perspective, constructs cause measures, i.e. the construct is the “real” underlying characteristic, and a measure is simply a reflection of that characteristic.
The second and more recent view is called formative measurement. From this perspective, measures cause constructs, such that the only real constructs are latent constructs, i.e. ones that can be detected by looking for patterns in data and then carefully refining measurement.
The major conflict between these two perspectives is the role of theory. From a reflective perspective, we should develop theories of the relationships between constructs and use those theories to shape how data will be collected. From a formative perspective, we should look for patterns in data and let the data provide the basis for our theories, progressively shaping our theories as the data lead us to do so.
To add to the confusion, the two perspectives typically provide similar results. One researcher coming from the perspective of reflective measurement (developing theory, testing it) may end up at the same conclusions as another researcher coming from the formative perspective (measuring a host of constructs, looking for patterns). But that doesn’t necessarily mean that the approaches are equally preferable.
Edwards discusses six key ways that formative measurement differs from reflective measurement:
- Dimensionality: Reflective measurement accepts that redundancy between measures is inevitable because all measurement is an imperfect representation of a construct. The goal, however, is to get as “pure” a measure of the construct as possible, so redundancy should be minimized. This is reflected in the typical measure development process – iteratively removing items that load on multiple factors in order to tap the construct as effectively as possible. In a formative measurement model, measures (and items) that are multidimensional by definition reflect multidimensional constructs. Consider this double-barreled Likert-type agreement item: “Sometimes I like pancakes and sometimes I like broccoli.” This question is multidimensional, but is it more likely that this is a bad question and should be refined (reflective) or that there is an underlying multidimensional construct (formative)?
- Internal Consistency: In a reflective model, two measures/items designed to tap the same construct should correlate highly. Formative measures have no such expectation; in fact, internal consistency might be a sign of a poor model, as there is no definitive reason that the facets of a construct should be correlated. Edwards argues that this difference has led some researchers, upon finding that their reflective measures are multidimensional, to conclude that those measures are in fact formative. Unfortunately, this is not a valid conclusion.
- Identification: Identification refers to the ability to derive unique values for each model parameter. It’s probably a little overkill to go into the details of this here. But in a nutshell, a model needs to be overidentified in order to determine how well that model fits the data. For a formative measure to be identified, one must add reflective measures as outcomes of the formative measure. But because measures cause constructs in a formative model, adding these measures has implications for the construct (which, if I may editorialize a bit here, makes a huge mess of things).
- Measurement Error: In a reflective model, measurement error is associated with each item – this is the degree to which the item is unique and measures constructs other than the construct it is supposed to measure. In a formative model, measurement error is associated with the latent construct instead. Each measure contributes some explanation of the latent construct, and whatever is left over is error. Thus the implicit assumption in a formative model is that the items do not possess unique measurement error – which seems a little odd, at least in the context of psychological constructs. Errors in recall, day-to-day affect fluctuations, etc. – these and many other sources of item-level measurement error appear to be discarded in formative models.
- Construct Validity: In a reflective model, construct validity refers to the degree to which measures reflect the construct they are supposed to measure. Construct validation is intended to determine how well the measures represent the construct, which is a process that ultimately relies on researcher judgment. In a formative model, these decisions are made statistically, by examining the interrelationships within the model itself. This means that the construct’s validity is driven by the kinds of variables you test its validity with, which makes it quite tricky.
- Causality: As stated before, in a reflective model, constructs cause measures, while in a formative model, measures cause constructs.
Thus, reflective models describe measures as imperfect indicators of underlying phenomena while formative models describe measures as an indistinguishable part of the constructs they are tied to.
I think the problem with pure formative models can be illustrated with this example. Consider a 10-question scale measuring the psychological state of happiness. Using a reflective model, we would use theory to conceptually define happiness and develop 10 questions to measure the construct. We would iteratively work to include only items that had high item-total correlations so that we had consistency of measurement, although we accept that we won’t have perfect measurement, as that would be impossible. We also realize that happiness exists independently of whether or not we measure it, and whether or not we measure it well.
A formative model would not make many of these assumptions. For example, once we created a 10-question scale, we have created a construct. If multidimensionality in measurement is discovered, that indicates a multidimensional construct. Rather than saying happiness exists as a human characteristic that we are attempting to measure, we say that this measurement is a part of the definition of whatever construct is being measured (hopefully happiness). Individual items in the measure do not contain both common elements of happiness and their own unique contributions; instead, they contribute directly to the variance in the construct. For example, an item like “I am generally happy at work” contains variance contributed both by general happiness and work-specific (unique) happiness when explored reflectively. From a formative perspective, the unique variance gets added to the error associated with the latent trait – it is a part of the construct, but a part that doesn’t help explain anything theoretically. Simply because we asked the question, it becomes a part of the construct – which, in my opinion, doesn’t really make any sense.
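The contrast in data-generating assumptions can be simulated directly. Here is a minimal Python sketch (my own illustration, not Edwards’) in which reflective items are caused by a latent happiness score, while a formative “construct” is just a weighted composite of whatever items we happened to ask; the internal-consistency expectation described above falls out immediately.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5_000, 10

# Reflective model: a latent happiness score causes every item response,
# so items share variance and internal consistency should be high.
happiness = rng.normal(size=n)
reflective_items = happiness[:, None] + rng.normal(size=(n, k))

# Formative model: the items are whatever we chose to ask (here,
# mutually independent), and the "construct" is simply their weighted
# composite -- it has no existence apart from the items.
formative_items = rng.normal(size=(n, k))
weights = rng.uniform(0.5, 1.5, size=k)
formative_construct = formative_items @ weights  # the construct, by definition

def cronbach_alpha(items):
    """Classical internal-consistency estimate for a set of items."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

print(f"alpha, reflective items: {cronbach_alpha(reflective_items):.2f}")  # ~.9
print(f"alpha, formative items:  {cronbach_alpha(formative_items):.2f}")   # ~0
```

Under the formative view, the near-zero alpha is not a problem at all – which is precisely why finding multidimensionality in a reflective scale is not license to relabel it formative.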
Thus, Edwards’ conclusion is that formative measurement is based on invalid assumptions about the nature of data in the six categories listed above. He continues by suggesting alternatives to pure formative models by combining elements of reflective models with them. These mixed models are no longer purely formative, but maintain many of the advantages of formative measurement – for example, better handling of multidimensional constructs.
For my own work, this has convinced me that, at a minimum, formative models as they exist currently are too controversial to be employed safely, both from a publishing perspective and for me to be confident in my own conclusions.
Day 2 at SIOP started with a session not quite related to tech research, but rather something I found personally interesting: ways that I/O Psychology is currently “making a difference.” The presentation that struck me the most in the set was one covering the role of I/O in the Bureau of Indian Affairs, which is apparently a division of the US Department of the Interior tasked with working with the several hundred Native American tribes residing in the United States. Historically, the Bureau was responsible for overseeing “Indian Affairs” but is currently in the midst of a cultural transition towards an advisory help-from-within sort of role.
As a result, the Bureau is actively trying to hire Native Americans to fill its ranks (Native Americans serving Native Americans), and there are many, many job roles within the Bureau that need to be filled. Although their recruitment efforts are fairly successful – they are able to recruit several hundred Native Americans each year – these folks often leave within a year. I/O psychologists working within the Bureau discovered the reason and helped design new recruitment and other materials to support Native American retention.
I also attended a session on online recruitment. It was fine, but there was not a whole lot of new information. Online recruitment in the military, for example, consists of live chat and e-mail with potential recruits. The Navy alone apparently holds 700-800 recruitment chats per day with around 100 e-mails per day – that’s a lot of recruits. But that’s also a recruitment technology and approach that’s been around for at least a decade. While the volume is impressive, it’s not particularly innovative.
The one piece of that presentation that I did find interesting was the report on their online social network called MyNavySpace, which is a space for potential recruits to chat and communicate prior to showing up for basic training. Across the board, 26% of recruits don’t show up for basic training, but among those using MyNavySpace, the number drops to about 6%. Whether that’s because recruits using the social network are more motivated or because the social network motivates them is unclear.
On Day 3, I only attended one session, but it was a good one: a group of practitioners discussing serious games and virtual worlds. Several major issues that remain relatively unexplored in these research literatures were discussed, including: the distinction between serious games and gamification, the limitations of artificial intelligence for automatic assessment within serious games, and the lack of evidence of transfer of behaviors from serious games to the workplace.
Two ideas discussed were particularly interesting to me. First, Ben Hawkes at Kenexa brought up research on the uncanny valley and its implications for video-based simulations. The uncanny valley is a fascinating theory – the idea that increasing the fidelity of human representation in 2D/3D media is appealing only up to a point, past which it suddenly becomes very disturbing. For example, a small photo of a person is more “human” than a name in a chatroom; a Second Life avatar is more “human” than a photo. But at a certain point – think Polar Express – the representation is just downright creepy. It’s close to “human” and yet there is something wrong that really jumps out at us. Some researchers say this is why people find zombies so disturbing – human, but not quite.
Second, I really did not like the idea of “stealth assessment.” There was some belief that people really engaged in a serious game would enter a “flow state,” and people in this flow state would forget they were being assessed (i.e. they would drop their self-monitoring defenses because they were so engaged). Thus, the assessor would get a more honest read of the applicant’s personality. The two problems I see are 1) this may be somewhat unethical, as we should never be tricking job applicants for any reason, and 2) this creates measurement inequalities. If John the Applicant enters the flow state and starts yelling in frustration and Mary the Applicant does not enter the flow state and does not yell, it doesn’t mean that Mary has greater emotional stability than John. There is no way to disentangle the propensity for an applicant to enter the flow state from the psychological constructs we think that flow state should let us see.
So that’s it for my SIOP 2011 conference experience. The relatively light density of technology presentations meant that I only spent about half my time at presentations and posters, and the other half chatting with old friends and new collaborators. And isn’t that what conferencing is all about?
This is a capture of my (@rnlanders) Twitter feed on the second and third days of the conference. Day 2 summary will be combined with the Day 3 summary, since each day was a little light on tech.
Day 2 Twitter Feed
9:35AM Day 2 at #SIOP11 begins with a training poster session
11:12AM how I-O is making a difference – work on national intelligence related to cybersecurity #SIOP11
11:16AM – IOs role in India
Nevermind… IOs role in American Indian affairs (part of the dept of interior). Didn’t connect “Indian” to native Americans
RT Sick of hearing about “important” moderators that have an r-squared change of .01. #SIOP #SIOP11
@ErikaWendt @workpsy is there a tweetup today?
US Navy has a Cyberspace Ops unit which includes online recruiting… neat
The #navy uses live chat, Facebook, email for online recruitment… 700-800 chats + 100 emails per day with low Sundays
145k chats per year Navy, 252k chats Air Force, 202k chats Army… Amazing volume
MyNavySpace as corporate social network; members using it tend to be retained – 26% of recruits don’t show up for basic, 6% MNS users dont
Now another piece on applicant reactions to websites. H1 is that nav bar at top of site is best. A bit atheoretical, eh?
@lukasneville Too much of a good thing!
@lukasneville and I do think we can do better than “top is best, left is okay”
Day 3 Twitter Feed
At serious games panel at #SIOP11
@WorkPsy talking about being careful to distinguish serious games and gameification #siop11
computer based simulations as a logical approach to learning and assessment for computer-centric jobs #siop11
@WorkPsy on using peer assessment within a game context to get creativity/innovation scores from games #siop11
serious game assessment is limited by current artificial intelligence tech available – human assessors still needed #siop11
games as a way to assess “real behaviors” b/c employees forget they are being assessed…not sure stealth assessment is so desirable #siop11
@mrand308 it is bad from a measurement perspective too…even if you are getting more information on some folks, you don’t know about others
Is there evidence of transfer from 3D immersive games? Short answer: no, not really. But we’re getting there. #SIOP11
gender, age differences in responses to video vs avatar based SJT… uhoh #siop11
getting a smidge heated in discussion of value of serious games… Several challengers in the audience #siop11
@LenaOgan we actually do know a little, but most of the research is on children
If you’re new here (I always get a bit of a surge of readers around SIOP), you should know that my main interests surround the use of technology in IO/OBHR. As a result, “technology” generally describes the sessions that I attend at the SIOP conference.
This year, academic sessions on technology seem to be a bit light. I’m not sure why. But it does mean that I am not attending as much as usual (and have less to talk about!).
Today, I started with an invited talk by Andrea Goldberg on social media and its evolving role in business (and by extension, I/O Psychology). Andrea is an excellent speaker, and as usual, I found myself pondering social media and its quickly evolving role in organizations. Will I/O lead this revolution or be led?
I continued on to my own symposium on current research on multi-user virtual environments (like Second Life). It was well-attended for an event on a new technology. I gave my own introduction on what MUVEs are, their history, and some general examples of the kinds of uses to which they might be put. Sam Kaminsky presented fascinating research from Tara Behrend’s lab on the role of virtual worlds in recruiting, the takeaway being: virtual worlds are quite distracting and make it more difficult to remember information about an organization. My own graduate student, Rachel Johnson, then presented on the potential value of virtual worlds when designing training programs. Thomas Whelan then presented his work in the lab of Lynda Aiman-Smith on teamwork in virtual worlds. Finally, we had a very special presentation by Ross Brown on the intersection of business process modeling as realized in Second Life with the work we traditionally conduct in I/O Psychology. It was all wrapped up with fantastic moderation by our discussant, Jeff Stanton. All in all, quite a success with some very interesting questions.
After lunch, I headed to another session on social media in the workplace. This was a little data-light (which I personally find frustrating) until the third presentation, which was a very intriguing discussion of the use of a corporate social media platform in an attempt to better support and retain aboriginal Canadian employees (an effort roughly equivalent to affirmative action policies in the United States). The platform was initially quite popular, followed by a drop in popularity, followed by a later resurgence after system redesign. The drop in popularity was likely due to a lack of obvious value-added to employees; although they were very enthusiastic about the idea of the corporate social network, they would often forget it was even there. There were a lot of parallels to my own work trying to promote underrepresented groups in STEM fields using the support systems provided by social media.
Finally, I attended a panel discussion on the virtual workplace. While very interesting, there was not much data to speak of – it was almost as if a group of virtual workers got together in a room to complain about how frustrating virtual work can be.
So that’s it. Tomorrow has more posters and fewer sessions on the schedule, so the live stream will probably be a little less dense.
Post-Conference Edit: Turns out that Christopher Rosett is doing a retrospective on SIOP over at the SIOP Exchange. Here’s the link to Day 1. Take a look!
This is a permanent record of tweets by rnlanders on Day 1 (Thursday) of SIOP. Summary later tonight.
#SIOP11 Day 1 begins!
10:29AM – I always intend to go to the opening plenary… But 8:30 is just so early!
At invited talk by @dcctips on the social media revolution at #SIOP11
I am revolutioning #siop11 #siop11rev
Talking about #wikinomics as a model of mass collaboration and networking
10:47AM – Interesting selection of case studies showcasing ambiguity surrounding social media policy
Got a mention in the social media revolution #SIOP11 #siop11rev
11:04AM – BlogNog platform sounds promising for gathering data
Tech savvy orgs started down the #socialmedia path, but its reach is now much longer #SIOP11 #siop11rev
Heading to get ready for our #siop11 #secondlife and MUVEs symposium
Virtual world / #secondlife session went great! Now to grab lunch? #siop11
Post lunch panel on social media in the workplace #SIOP11
@BreannePH terrible misinformation – direct them toward my January JAP
starting with the standard “everyone is on social media!” preaching to the choir? #siop11
ohhh io psychologists.. how i have missed you and your scientific rigor and structured methodologies…#SIOP #iopsychology
3:54PM – Still hypothetical benefits… “everyone says” doesn’t convince me… Need research!
Still waiting for some #psychology in these #socialmedia presentations… Lots of market research #SIOP11
4:06PM – Can’t help but wonder about results from a presentation on social media from a social media company #siop11 conflict of interests?
I am being buried under a flood of self report data
4:31PM – Using #socialmedia to improve inclusion, retention, sense of community for aboriginal (minority) Canadians… Much better #SIOP11
@BreannePH there are several with a financial interest saying no one cheats in UIT, ever, so stop studying it- a frustrating #siop sometimes
Fascinating case study by RBC, curious about all of their data #SIOP11
5:13PM – A little late to Preparing for the Virtual Workplace, but I made it
@ErikaWendt I think we are in the same room – I am in back right toward the door, in all black
“virtual employees need to toot their own horns more” – I remind my online students that I really am working on the class #SIOP11
Day 1 #SIOP11 complete! Now for nightlife
This post is a live-blog of Wednesday, April 13, during which time I attended the SIOP Junior Faculty Consortium. This blog is sourced from Twitter.
11:35AM – At the junior faculty consortium at #SIOP11
11:55AM – Suzanne Bell talking about “time burglars” – don’t wait on others, budget your own time, schedule research days
12:45PM – Always have a list of different projects and types of projects – insurance against unexpected failure
12:50PM – Excessive documentation for #tenure – keep folders for each year separated by service, research, teaching
1:02PM – James LeBreton: get an academic best buddy to share experiences, review each others papers, etc
1:07PM – Don’t be too good an organizational citizen – there’s only so much time in the day
1:15PM – There’s plenty of time after tenure to do it all – don’t rush into advising, volunteering, reviewing, service
1:23PM – Paper discussed in the Chronicle on the “anti-vita” – our history of missed jobs, unsupported grants, rejected pubs – what a great idea
1:26PM – #SIOP teacher’s bureau is a list of IO psychologists volunteering to give local talks on what IO is for free
2:20PM – Post lunch, ready for words of wisdom
2:23PM – Neal Schmitt – be willing to move
2:30PM – Schmitt: be strategic without being Machiavellian
2:35PM – Paul Sackett: use the software Publish or Perish and interpret the results yourself
2:45PM – Sackett: we all have horror stories (referring to data collection)
2:50PM – Sackett: one such story involves chasing a garbage truck in a police car!
3:05PM – Pat Sackett (via Paul): save some fun for tomorrow
3:10PM – Campion: don’t study what interests you, because you don’t know what interests you
3:14PM – Sackett and Campion: an academic career is a marathon, not a sprint
3:16PM – Campion: methods not content drive publication; study what you can study well
3:17PM – Campion: reviewers – wear them down!
3:30PM – In their entire careers, Sackett and Schmitt have had 1 paper accepted on the first try, Campion has never had one #SIOP11
3:45PM – Sackett: my job as editor was often to decide which rejected paper to publish
4:07PM – Dan Sachau on building student culture
4:10PM – Sachau: assigning a grad student to be consigliere (Godfather reference!)
4:13PM – Sachau: Fall student miniconference with alumni on pontoon boats – awesome
4:27PM – Still getting lots of good advice on being #faculty at #SIOP11 Jr. Faculty Consortium
5:12PM – Barnes-Ferrell: find service opportunities that you love… Avoid all others
5:34PM – Rogelberg: help your department give you good service
Like last year, I’ll be live-blogging from the SIOP conference, which begins next Thursday. This post contains my hypothetical schedule. Of course, the events that you want to see are not always the events you end up seeing, so this schedule is by no means final.
In comparison to last year, I noticed that coverage of “technology at work” is reduced. Whether that’s because fewer people sent in technology-related submissions or because the program committee was biased against such submissions (I’m not bitter!!) is unclear.
Personally, I’m on three pieces at SIOP this year, which are highlighted in the chart below. I encourage you to attend Empirical Evidence for Emerging Technology: MUVEs/Virtual Worlds in HR, a symposium I put together with Dr. Tara Behrend at George Washington University. We’ve got a fascinating line-up, including discussion on recruiting, training, and performance appraisal applications of virtual worlds (like Second Life).
| Day | Start | End | Session Title | Room | Session Type |
| --- | --- | --- | --- | --- | --- |
| Thu | 8:30 | 9:50 | Opening Plenary | International Ballroom | Official |
| Thu | 10:30 | 11:20 | Collaborative, Virtual, and Open: How the Social Media Revolution Is Changing the Workplace | Marquette, 3rd Floor | Invited Speaker |
| Thu | 12:00 | 1:20 | Empirical Evidence for Emerging Technology: MUVEs/Virtual Worlds in HR | Continental B | Symposium |
| Thu | 3:30 | 4:20 | Applications of Social Media in the Workplace | International Ballroom South | Symposium |
| Thu | 5:00 | 5:50 | Preparing for the Workplace – the Virtual Workplace | Boulevard AB | Panel |
| Fri | 9:00 | 9:50 | Training Poster Session: Personality and Synchronicity Interaction Predicts Training Performance in Online Discussion; Training Students to Increase Employment Opportunity Using Social Networking Web Sites; Learner-Controlled Practice Difficulty: The Roles of Cognitive and Motivational Processes | SE Exhibit Hall | Posters |
| Fri | 10:30 | 11:20 | The Greater Good: How I-O Is Making a Difference | Boulevard AB | Symposium |
| Fri | 11:30 | 12:20 | Job Performance/Deviance/CWB Poster Session: Predicting Dishonest Online Test Taking Behavior in Unproctored Internet-Based Testing | SE Exhibit Hall | Posters |
| Fri | 2:00 | 2:50 | Catch-all Poster Session: The Viability of Crowdsourcing for Survey Research | SE Exhibit Hall | Posters |
| Fri | 3:30 | 4:20 | Online Recruiting: Taking It to the Next Level | Continental A | Symposium |
| Sat | 9:00 | 9:50 | Staffing Poster Session: Internet Job Seekers’ Information Expectations Predict Organizational Attraction | SE Exhibit Hall | Posters |
| Sat | 10:30 | 11:20 | Justice/Ethics/Legal Poster Session: Engagement in Online Communities: All About Pride and Respect | SE Exhibit Hall | Posters |
| Sat | 12:30 | 1:20 | Serious Games and Virtual Worlds: The Next I-O Frontier | Lake Michigan | Symposium/Forum |
| Sat | 2:00 | 2:50 | Theme Track: What Convinces Us, Doesn’t Necessarily Convince Execs: What They Didn’t Teach You in Grad School About Influencing | Williford C | Theme Track |
| Sat | 3:30 | 4:20 | Theme Track: Closing Keynote and Wrap Up: People Analytics: Is It All In Our Heads | Williford C | Theme Track |
| Sat | 4:30 | 5:20 | Closing Plenary | International Ballroom | Official |