Gamification, which refers to the use of game elements in non-game contexts, is commonly used as a way to influence the motivation of people in a variety of contexts, including consumer behavior, employee behavior, and student behavior. Much prior research on gamification has been imprecise about which particular game elements are adopted; for example, a study might implement 3 or 4 game elements simultaneously and compare performance in a “gamified” group to a control group. This generally produces findings that are difficult to interpret and difficult to act upon: how do you know which aspect of gamification actually changed behavior? And beyond this challenge, simply framing an activity as a “game” can potentially alter behavior even further. How can we disentangle the effects of game elements from the effect of game framing?
In an upcoming article in Games and Culture, Lieberoth1 sought to explore the effect of such framing empirically, which he called shallow gamification. In this study, all participants completed a 30-minute task in groups of 6, discussing a variety of questions related to solving a “business problem” – in this case, understanding why student satisfaction statistics were low in the researcher’s department’s recent student evaluation surveys. Deception was involved. After the task, participants were told that the experiment was over but that they needed to stick around for an additional 20 minutes to wait for another group to finish, during which time they were able to continue engaging in the task. Within this procedure, they were randomly assigned to one of three conditions:
- In the control condition, 23 participants just did the task as described.
- In the shallow gamification condition, 22 participants completed the same task but were additionally provided a game board; no additional game mechanics were used – players simply progressed along the board as they provided answers.
- In the deep gamification condition, 25 participants completed the control task but were additionally provided a game board, and some game mechanics were introduced; specifically, the rating of their discussion was used to determine how many spaces they progressed.
I don’t personally agree with this characterization; I would say that the shallow gamification condition simply incorporates fewer game elements than the deep gamification condition. I would argue this because the shallow/deep dichotomy is quite artificial – just how much gamification is needed to move from one to the other?
Regardless, did more gamification produce better results than less? Unfortunately, presumably due to the small sample size, the researcher did not use a hierarchical analytic approach despite the grouping. This is problematic because there may have been an effect of the composition of gameplay groups – and if you count the number of groups, there were only 3 to 4 per condition, which is a tiny sample. Instead, the researcher ignored group membership and focused upon individual-level effects. That may or may not have mattered; there’s no way to know given what was reported.
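For readers unfamiliar with what a hierarchical approach would look like here, a minimal sketch appears below. It fits a mixed-effects model with a random intercept for each discussion group, so condition effects are evaluated against group-level variability; the file and column names are hypothetical stand-ins, not the study’s actual data.

```python
# A minimal sketch of a hierarchical (mixed-effects) analysis, assuming a
# hypothetical data file with one row per participant and columns "interest",
# "condition", and "group_id".
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("gamification_study.csv")  # hypothetical file

# Random intercept for each gameplay group nests participants within groups.
model = smf.mixedlm("interest ~ C(condition)", data=df, groups=df["group_id"])
result = model.fit()
print(result.summary())
```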
At the individual level though, some interesting results appeared. Specifically:
- Gamification conditions were rated as more interesting than the control condition, although the two gamification conditions were not significantly different from each other; this may be a power issue, however, given the small sample size (there was about a .2 SD difference between these conditions, with higher scores in the “deep” condition; a rough power calculation after this list illustrates just how underpowered that comparison was). There were no differences on any other dimensions of intrinsic motivation.
- The control condition addressed more items in the task than either of the gamification conditions.
- There were no other differences in behavior between conditions, e.g., in time spent on task.
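As promised above, here is a back-of-the-envelope power check of my own (not the author’s analysis) for detecting a d = 0.2 difference between conditions of the sizes used in this study:

```python
# A rough power calculation, assuming an independent-samples comparison with
# roughly 23 participants per condition and a true effect of d = 0.2.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

power = analysis.power(effect_size=0.2, nobs1=23, ratio=1.0, alpha=0.05)
print(f"Power to detect d = 0.2 with ~23 per group: {power:.2f}")  # roughly .10

n_needed = analysis.solve_power(effect_size=0.2, power=0.80, alpha=0.05)
print(f"Per-group n needed for 80% power: {n_needed:.0f}")  # roughly 394
```

In other words, a difference of that size would have required samples many times larger to be detected reliably.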
The researcher concluded from this that “framing has a significant effect on enjoyment.”
My major concern with this conclusion is that framing is not really what changed here. To make conclusions about framing, I would have rather the researcher changed only one thing about the study: whether the activity was called a game. Instead, the researcher presented a game board, which is itself a type of gamification. The major difference between the gamification conditions, to me, is not that one is shallow and one is deep, but instead that one is gamified with one element (the board) and the other with two (the board and a set of rules). Would we have seen the same effect with simple framing, e.g., “You’re about to play a game”? Would we have seen the same effect with a different gamification element? What if, for example, only rules had been used, and players had been asked to record their scores on paper? There is no way to know from this study alone.
Regardless, this study provides a compelling example that relatively simple changes inspired by gaming – which I would argue is the heart of “gamification” – can produce measurable effects on motivation. Interestingly, the number of discussion items addressed decreased as a result of such gamification. The researcher suggested that this was because the game framing reduced the feeling of this being a “serious” task. As the researcher put it:
I surmise that that [sic] adding a playful frame to the task actually took away some of the grit and output orientation of more goal-oriented work.
If this type of gamification reduces performance in some contexts, this is certainly an important starting point for future research. But I am hesitant to attribute this to “shallowness” or “depth” alone.
- Lieberoth, A. (2014). Shallow gamification: Testing psychological effects of framing an activity as a game. Games and Culture. DOI: 10.1177/1555412014559978 [↩]
Amazon Mechanical Turk (MTurk) has quickly become a highly visible source of participants for human subjects research. Psychologists, in particular, have begun to use MTurk as a major source of quick, cheap data. Samples of hundreds or thousands of participants can be collected in mere days or, sometimes, even a few hours. When it takes a full semester to get a couple hundred undergraduate research participants, the attractiveness of MTurk is obvious. But is it safe to use MTurk this way? Won’t our research be biased toward MTurk workers?
New work by Landers and Behrend1 explores the suitability of MTurk and other convenience sampling techniques for research purposes within the field of industrial/organizational psychology (I/O). I/O is a particularly interesting area in which to explore this problem because I/O research focuses upon employee behavior, and sampling questions have been a longtime concern in that field, such as: Under what conditions does sampling undergraduates lead to valid conclusions about employees?
Traditionally, there are some very extreme opinions here. Because I/O research is concerned with organizations, some researchers say the only valid research is research conducted within real organizations. Unfortunately, this preference is rooted largely in tradition and a superficial understanding of sampling.
Sampling in the social sciences works by defining a population you’re interested in and then selecting people at random from that population. For example, you might say, “I’m interested in Walmart employees” (that’s your population), so you send letters to a randomly selected subset of all Walmart employees asking them to respond to a survey. This is called probability sampling.
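To make that concrete, here is a toy sketch of probability sampling, under the obviously unrealistic assumption that we have a complete roster of the population of interest (the roster and sample size are made up for illustration):

```python
# A toy illustration of probability sampling: every member of the defined
# population has an equal, known chance of being selected.
import random

# Hypothetical complete roster of the population of interest.
population = [f"walmart_employee_{i}" for i in range(2_000_000)]

# Simple random sample of 500 employees to receive the survey invitation.
sample = random.sample(population, k=500)
print(sample[:5])
```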
The key issue is that probability sampling is effectively impossible in the real world for most psychological questions (including those in I/O). Modern social science research is generally concerned with global relationships between variables. For example, I might want to know, “In general, are people that are highly satisfied with their jobs also high performers?” To sample appropriately, I would need to randomly select employees from every organization on Earth.
Having access to a convenient organization does not solve this problem. Employees in a particular company are a convenience sample just as college students and MTurk workers are. The difference that researchers should pay attention to is not the simple fact that these are convenience samples, but instead what that convenience does to your sample.
For example, if we use a convenient organization, we’re also pulling with it all the hiring procedures that the company has ever used. We’re grabbing organizational culture. We’re grabbing attrition practices. We’re grabbing all sorts of other sample characteristics that are part of this organization. As long as none of our research questions have anything to do with all these extra characteristics we’ve grabbed, there’s no problem. The use of convenience sampling in such a case will introduce unsystematic error – in other words, it doesn’t bias our results and instead just adds noise.
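To illustrate what “unsystematic error” means here, consider a toy simulation of my own (not from the article): when selection into the sample has nothing to do with the variables under study, the estimate stays on target and simply gets noisier because fewer people remain.

```python
# A toy simulation: selecting the sample on something unrelated to the research
# question leaves the correlation estimate essentially unbiased.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

satisfaction = rng.normal(size=n)
performance = 0.3 * satisfaction + rng.normal(size=n)  # true r is about .29
lives_near_hq = rng.normal(size=n) > 1.0               # convenience factor, unrelated to either variable

full_r = np.corrcoef(satisfaction, performance)[0, 1]
convenient_r = np.corrcoef(satisfaction[lives_near_hq], performance[lives_near_hq])[0, 1]
print(f"Full population r:    {full_r:.3f}")        # about .29
print(f"Convenience sample r: {convenient_r:.3f}")  # still about .29, just from fewer people
```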
The problems occur only when what we grab is related to our research questions. For example, what if we want to know the relationship between personality and job performance? If our target organization hired on the basis of personality, any statistics we calculate based upon data from this organization will potentially be biased. Fortunately, there are ways to address this statistically (for the statistically inclined: corrections for range restriction), but you must consider all of this before you conduct your study.
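Continuing the toy simulation above, here is a minimal sketch of that bias and of the classic correction for direct range restriction (Thorndike’s Case II). The numbers and variable names are mine, not the article’s, and the correction assumes selection happened directly on the predictor.

```python
# Selection on a study variable biases the estimate; the Case II correction can
# recover the unrestricted correlation when the selection variable is known.
import math
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
personality = rng.normal(size=n)
performance = 0.3 * personality + rng.normal(size=n)  # true r is about .29

# Suppose the organization only hired people roughly in the top 16% on personality.
hired = personality > 1.0
r_restricted = np.corrcoef(personality[hired], performance[hired])[0, 1]

def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Estimate the unrestricted correlation from a range-restricted one."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / math.sqrt(1 + r**2 * (u**2 - 1))

r_corrected = correct_range_restriction(
    r_restricted,
    sd_unrestricted=personality.std(),
    sd_restricted=personality[hired].std(),
)
print(f"Restricted r: {r_restricted:.3f}")  # noticeably below .29
print(f"Corrected r:  {r_corrected:.3f}")   # close to .29 again
```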
MTurk brings the same concerns. People use MTurk for many reasons. Maybe they need a little extra money. Maybe they’re just bored. As long as the reasons people are on MTurk, and the differences between MTurkers and your target population, aren’t related to your research question, there’s no problem. MTurk away! But you need to explicitly set aside some time to reason through it. If you’re interested in investigating how people respond to cash payments, for example, MTurk probably isn’t a good choice (unless, of course, MTurk workers are your target population!).
As the article states:
- Don’t assume organizational sampling is a gold standard sampling strategy. This applies to whatever your field’s “favorite” sampling strategy happens to be.
- Don’t automatically condemn college students, online panels, or crowdsourced samples. Each of these potentially brings value.
- Don’t assume that “difficult to collect” is synonymous with “good data.” It’s so natural to assume “difficult = good” but that’s a lazy mental shortcut.
- Do consider the specific merits and drawbacks of each convenience sampling approach. Unless you’ve managed to get a national or worldwide probability sample, whatever you’re using is probably a convenience sample. Think through it carefully: why are people in this sample? Does it have anything to do with what I’m doing in my study?
- Do incorporate recommended data integrity practices regardless of sampling strategy. Poor survey-takers and lazy respondents exist in all samples. Deal with these people before your study even begins by incorporating appropriate detection questions and planning those statistical analyses a priori (a minimal example of such screening appears after this list).
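As referenced in the last point, here is a minimal sketch of my own (not from the article) of two common data-integrity checks: an instructed-response attention item and a completion-speed flag. The column names and thresholds are hypothetical.

```python
# Flag respondents who fail an instructed-response item or finish implausibly fast.
import pandas as pd

df = pd.read_csv("survey_responses.csv")  # hypothetical export, one row per respondent

# 1. Instructed-response item, e.g., "Please select 'Strongly disagree' for this item."
#    (Here a value of 1 is assumed to indicate the instructed response.)
failed_attention = df["attention_check"] != 1

# 2. Completion time: flag anyone averaging under 2 seconds per survey item.
too_fast = df["completion_seconds"] < 2 * df["n_items"]

df["flagged"] = failed_attention | too_fast
print(f"Flagged {df['flagged'].sum()} of {len(df)} respondents for review")
```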
- Landers, R.N., & Behrend, T.S. (2015). An inconvenient truth: Arbitrary distinctions between organizational, Mechanical Turk, and other convenience samples. Industrial and Organizational Psychology, 8 (2) [↩]
The most promising and yet most disappointing aspects of the Internet are the written comments left by the general public. On one hand, comment sections are a great democratization of personal opinion. With public commenting, anyone can make their opinion known to the world on whatever topic interests them. On the other hand, comment sections give voice to absolutely any nutjob with Internet access. As it turns out, and as is evident to anyone who has ever scrolled down on any video anywhere on YouTube, comment sections often devolve into base attacks, non sequiturs, and general insanity.
So how do we deal with that problem? In an upcoming paper in the Journal of Computer-Mediated Communication, Stroud, Scacco, Muddiman, and Curry1 explore this issue on news websites by randomly assigning 70 political posts made by a local TV station to one of three conditions:
- Condition 1: An unidentified staff member from the TV station participated in the discussion.
- Condition 2: A political reporter participated in the discussion.
- Condition 3: The discussion was permitted to run unmonitored.
A total of 2,703 comments were made on these posts. And as it turns out, participation in your comment section can change the tone. Findings included:
- Reporter participation improved the deliberative tone of discussion, decreased uncivil comments (17% reduction in probability), and increased the degree to which commenters provided supporting evidence for their points (15% increase in probability).
- Reporter participation also appeared to increase the probability that commenters left comments relevant to the post and asked genuine questions, but these effects were much smaller (and not statistically significant, given the sample size).
- Presence of the unidentified staff member produced no improvement in comparison to unmonitored discussion. Troublingly, uncivil comments in fact went up and genuine questions went down when staff members were present.
- The specific type of prompt posted right before the discussion began had a smaller effect on discussion quality. Specifically, open-ended questions (e.g., “What do you think about x?”) produced slightly more genuine questions and supporting evidence than closed questions, but the changes in probability were only about 10%.
The take-home here is that an “official”, knowledgeable, and active participant in the comment section did improve the quality of discussion. Considering the link between discussion quality and time spent on websites, this has important implications for the use of discussion forums in contexts other than that of TV stations.
And although I don’t think we’re going to fix YouTube any time soon, this is definitely a step in the right direction.
- Stroud, N., Scacco, J., Muddiman, A., & Curry, A. (2014). Changing deliberative norms on news organizations’ Facebook sites. Journal of Computer-Mediated Communication. DOI: 10.1111/jcc4.12104 [↩]