Gamification, which refers to the use of game elements in non-game contexts, is commonly used as a way to influence the motivation of people in a variety of contexts, including consumer behavior, employee behavior, and student behavior. Much prior research on gamification has been imprecise about which particular game elements are adopted; for example, a study might implement 3 or 4 game elements simultaneously and compare performance in a “gamified” group to a control group. This generally results in difficult-to-interpret and impractical results; how do you know what aspect of gamification actually changed behavior? And beyond this challenge, the simple framing of an activity as a “game” can potentially alter behavior even further. How can we disentangle the effects of game elements from the effect of game framing?
In an upcoming article in Games and Culture, Lieberoth sought to explore the effect of such framing empirically, which he called shallow gamification. In this study, all participants completed a 30-minute task in groups of 6, discussing a variety of questions related to solving a “business problem” – in this case, understanding why student satisfaction statistics were low in the researcher’s department’s recent student evaluation surveys. The procedure involved deception: after the task, participants were told that the experiment was over but that they needed to stick around for an additional 20 minutes to wait for another group to finish, during which time they were able to continue engaging in the task. Within this procedure, they were randomly assigned to one of three conditions:
- In the control condition, 23 participants just did the task as described.
- In the shallow gamification condition, 22 participants completed the control task but were additionally provided a game board; no other game mechanics were used – players simply progressed along the board as they provided answers.
- In the deep gamification condition, 25 participants completed the control task but were additionally provided a game board, and some game mechanics were introduced; specifically, the rating of their discussion was used to determine how many spaces they progressed.
I don’t personally agree with this characterization; I would say that the shallow gamification condition simply incorporates fewer game elements than the deep gamification condition. I would argue this because the shallow/deep dichotomy is quite artificial – just how much gamification is needed to move from one to the other?
Regardless, did more gamification produce better results than less? Unfortunately, presumably due to the small sample size, the researcher did not use a hierarchical analytic approach despite the grouping. This is problematic because there may have been an effect of the composition of gameplay groups – and if you count the number of groups, there were only 3 to 4 per condition, which is a tiny sample. Instead, the researcher ignored group membership and focused upon individual-level effects. That may or may not have mattered; there’s no way to know given what was reported.
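The cost of ignoring grouping can be illustrated with the standard design-effect formula from survey statistics (a general result, not a calculation from the article; the ICC values below are hypothetical):

```python
def effective_n(n, cluster_size, icc):
    """Effective sample size after adjusting for clustering.

    Uses the standard design effect, DEFF = 1 + (m - 1) * ICC,
    where m is the cluster size and ICC is the intraclass
    correlation (the share of variance attributable to groups).
    """
    deff = 1 + (cluster_size - 1) * icc
    return n / deff

# 70 participants in discussion groups of 6; even a modest
# ICC shrinks the effective sample size considerably.
print(effective_n(70, 6, 0.0))   # no group effect: 70.0
print(effective_n(70, 6, 0.10))  # ICC = .10: about 46.7
print(effective_n(70, 6, 0.25))  # ICC = .25: about 31.1
```

In other words, if even a quarter of the variance in outcomes came from which group you happened to sit in, the “real” sample is closer to 31 people than 70.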
At the individual level though, some interesting results appeared. Specifically:
- Gamification conditions were rated as more interesting than the control condition, although the two gamification conditions were not significantly different from each other; this may be a power issue, however, given the small sample size (there was about a .2 SD difference between these conditions, with higher scores in the “deep” condition). There were no differences on any other dimensions of intrinsic motivation.
- The control condition addressed more items in the task than either of the gamification conditions.
- There were no other differences in behavior between conditions, e.g., time spent on task.
The researcher concluded from this that “framing has a significant effect on enjoyment.”
My major concern with this conclusion is that framing is not really what changed here. To draw conclusions about framing, I would rather the researcher had changed only one thing about the study: whether the activity was called a game. Instead, participants were presented with a game board, which is itself a type of gamification. To me, the major difference between the gamification conditions is not that one is shallow and one is deep, but that one is gamified with one element (the board) and the other with two (the board and a set of rules). Would we have seen the same effect with simple framing, i.e., “You’re about to play a game”? Would we have seen the same effect with a different gamification element? What if, for example, only rules had been used, and players had been asked to record their scores on paper? There is no way to know from this study alone.
Regardless, this study provides a compelling example that relatively simple changes inspired by gaming – which I would argue is the heart of “gamification” – can produce measurable effects on motivation. Interestingly, the number of discussion items addressed decreased as a result of such gamification. The researcher suggested that this was because the game framing reduced the feeling of this being a “serious” task. As the researcher put it:
I surmise that that [sic] adding a playful frame to the task actually took away some of the grit and output orientation of more goal-oriented work.
If this type of gamification reduces performance in some contexts, this is certainly an important starting point for future research. But I am hesitant to attribute this to “shallowness” or “depth” alone.
Amazon Mechanical Turk (MTurk) has quickly become a highly visible source of participants for human subjects research. Psychologists, in particular, have begun to use MTurk as a major source of quick, cheap data. Data from hundreds or thousands of participants can be collected in mere days, or sometimes, even a few hours. When it takes a full semester to get a couple hundred undergraduate research participants, the attractiveness of MTurk is obvious. But is it safe to use MTurk this way? Won’t our research be biased toward MTurk workers?
New work by Landers and Behrend explores the suitability of MTurk and other convenience sampling techniques for research purposes within the field of industrial/organizational psychology (I/O). I/O is a particularly interesting area in which to explore this problem because I/O research focuses upon employee behavior, and questions of sampling have been a longtime concern in that field – for example: under what conditions does sampling undergraduates lead to valid conclusions about employees?
Traditionally, there are some very extreme opinions here. Because I/O research is concerned with organizations, some researchers say the only valid research is research conducted within real organizations. Unfortunately, this preference is based largely in tradition and a superficial understanding of sampling.
Sampling in the social sciences works by choosing a population you’re interested in and then choosing people at random from that population. For example, you might say, “I’m interested in Walmart employees” (that’s your population), so you send letters to every Walmart employee asking them to respond to a survey. This is called probability sampling.
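Probability sampling is straightforward to express in code; here is a minimal sketch using a hypothetical employee roster as the sampling frame:

```python
import random

# Hypothetical sampling frame: every employee in the population of interest.
population = [f"employee_{i}" for i in range(10_000)]

# Simple random sampling: every employee has an equal, known
# probability of selection (here 500 / 10,000 = 5%).
random.seed(42)
survey_sample = random.sample(population, k=500)

print(len(survey_sample))       # 500
print(len(set(survey_sample)))  # 500 (sampled without replacement)
```

The hard part, of course, is not the code – it is obtaining the complete sampling frame in the first place.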
The key issue is that probability sampling is effectively impossible in the real world for most psychological questions (including those in I/O). Modern social science research is generally concerned with global relationships between variables. For example, I might want to know, “In general, are people that are highly satisfied with their jobs also high performers?” To sample appropriately, I would need to randomly select employees from every organization on Earth.
Having access to a convenient organization does not solve this problem. Employees in a particular company are a convenience sample just as college students and MTurk workers are. The difference that researchers should pay attention to is not the simple fact that these are convenience samples, but instead what that convenience does to your sample.
For example, if we use a convenient organization, we’re also pulling with it all the hiring procedures that the company has ever used. We’re grabbing organizational culture. We’re grabbing attrition practices. We’re grabbing all sorts of other sample characteristics that are part of this organization. As long as none of our research questions have anything to do with all these extra characteristics we’ve grabbed, there’s no problem. The use of convenience sampling in such a case will introduce unsystematic error – in other words, it doesn’t bias our results and instead just adds noise.
The problems occur only when what we grab is related to our research questions. For example, what if we want to know the relationship between personality and job performance? If our target organization hired on the basis of personality, any statistics we calculate based upon data from this organization will potentially be biased. Fortunately there are ways to address this statistically (for the statistically inclined: corrections for range restriction), but you must consider all of this before you conduct your study.
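As one example of such a correction, here is the classic Thorndike Case II formula for direct range restriction; the numbers below are illustrative, not drawn from the article:

```python
import math

def correct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    """Thorndike Case II correction for direct range restriction.

    r_restricted:    correlation observed in the restricted sample
    sd_unrestricted: SD of the predictor in the unrestricted population
    sd_restricted:   SD of the predictor in the restricted sample
    """
    u = sd_unrestricted / sd_restricted
    return (r_restricted * u) / math.sqrt(1 + r_restricted**2 * (u**2 - 1))

# If hiring on personality cut the predictor's SD from 10 to 6,
# an observed r of .20 corresponds to a noticeably larger population r.
print(round(correct_range_restriction(0.20, 10, 6), 3))  # 0.322
```

Note that the correction does nothing when there is no restriction (when the two SDs are equal, the corrected value equals the observed value), which is a useful sanity check.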
MTurk brings the same concerns. People use MTurk for many reasons. Maybe they need a little extra money. Maybe they’re just bored. As long as the reasons people are on MTurk and the differences between MTurkers and your target population aren’t related to your research question, there’s no problem. MTurk away! But you need to explicitly set aside some time to reason through it. If you’re interested in investigating how people respond to cash payments, MTurk probably isn’t a good choice (at least as long as MTurk workers aren’t your population!).
As the article states:
- Don’t assume organizational sampling is a gold standard sampling strategy. This applies to whatever your field’s “favorite” sampling strategy happens to be.
- Don’t automatically condemn college students, online panels, or crowdsourced samples. Each of these potentially brings value.
- Don’t assume that “difficult to collect” is synonymous with “good data.” It’s so natural to assume “difficult = good” but that’s a lazy mental shortcut.
- Do consider the specific merits and drawbacks of each convenience sampling approach. Unless you’ve managed to get a national or worldwide probability sample, whatever you’re using is probably a convenience sample. Think through it carefully: why are people in this sample? Does it have anything to do with what I’m doing in my study?
- Do incorporate recommended data integrity practices regardless of sampling strategy. Poor survey-takers and lazy respondents exist in all samples. Deal with these people before your study even begins by incorporating appropriate detection questions and planning those statistical analyses a priori.
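The last recommendation can be made concrete with a minimal screening sketch; the item names and time threshold here are hypothetical, standing in for whatever detection rules you register before collecting data:

```python
# Hypothetical responses: 'check' is an instructed-response item
# ("Please select 'agree' for this question") and 'seconds' is
# total completion time for the survey.
responses = [
    {"id": 1, "check": "agree",    "seconds": 412},
    {"id": 2, "check": "disagree", "seconds": 390},  # failed attention check
    {"id": 3, "check": "agree",    "seconds": 45},   # implausibly fast
]

MIN_SECONDS = 120  # threshold decided before data collection, not after

def passes_screening(r):
    """Keep only respondents who pass the attention check and spent
    a plausible amount of time on the survey."""
    return r["check"] == "agree" and r["seconds"] >= MIN_SECONDS

clean = [r for r in responses if passes_screening(r)]
print([r["id"] for r in clean])  # [1]
```

The point is not these particular rules but that they exist, and are fixed, before the first participant is recruited.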
Landers, R.N., & Behrend, T.S. (2015). An inconvenient truth: Arbitrary distinctions between organizational, Mechanical Turk, and other convenience samples. Industrial and Organizational Psychology, 8(2).