A recent study by Giumetti et al. examines cyber incivility, defined as low-intensity “rude and discourteous” behavior that takes place through an internet- or intranet-based communication system (e.g. e-mail, chat, or Facebook). They found that employees who reported experiencing cyber incivility were more likely to skip work, burn out, and report an intention to look for a new job.
Incivility differs from traditional interpersonal organizational deviance in that it is less severe and more casual. While a person engaging in interpersonal deviance might steal a coworker’s stapler in order to annoy them, incivility might be expressed simply through seemingly off-hand negative comments about that coworker’s workspace. Deviance is much easier to detect because it is generally obvious and clearly purposeful; it is immediately apparent that someone acted inappropriately. Incivility, on the other hand, might be purposeful or it might not – it is difficult to know for sure, even for the victim.
As an example of cyber incivility, consider this scenario: a supervisor assigns work over the weekend to a subordinate. On Monday morning, the supervisor shoots off a quick e-mail that only says, “I hope you had a good weekend.” The subordinate is unclear on how to interpret this, as it might 1) be a half-hearted apology for assigning work over the weekend, 2) indicate that the supervisor didn’t remember or didn’t care about the extra work, or 3) represent the supervisor sarcastically rubbing in that s/he ruined the subordinate’s weekend. Thus, this is incivility; the supervisor should not have sent the e-mail at all, or if it was not ill-intentioned, should have made the apology clearer.
To examine their hypotheses, the authors collected surveys from two samples: 1) 407 university staff members and 2) 207 business school alumni. They modified a standard incivility questionnaire (the Workplace Incivility Scale) by adding the word “online” to the end of each statement. In addition to the effects of incivility on absenteeism, burnout, and intention to quit, the researchers also found these relationships to be moderated by neuroticism, i.e. the relationship between experienced incivility by a supervisor and burnout/intention to quit was stronger for people higher in neuroticism.
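That scale modification is easy to picture in code. Here is a minimal sketch; the item stems below are hypothetical placeholders, not the actual Workplace Incivility Scale items:

```python
# Sketch of the scale modification described above: appending "online" to
# each item stem. The stems below are hypothetical placeholders, not the
# actual Workplace Incivility Scale items.
items = [
    "Your supervisor made demeaning remarks about you",
    "Your supervisor ignored your contributions",
]

online_items = [item + " online" for item in items]

for stem in online_items:
    print(stem)
```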
As a survey study, we can’t use this to conclude that supervisors acting rudely to their employees online causes these negative outcomes (it is equally plausible that a third variable predicts both), but it does explicitly link negative employee outcomes with the experience of a boss acting rudely online.
A few weeks ago, a new cheating scandal erupted at the University of Florida: 97 students in a 250-person Computer Science & Engineering course (39%) were caught cheating on an online exam. Students were given the option to come clean before the presentation of evidence, in which case they faced reduced penalties. One student complained because she cheated in a way that students in the past had cheated, but she was caught while they were not.
The method of cheating detection was, according to the course instructor, perfect. Hidden markers were embedded in old exams so that any copying and pasting would carry those markers into the student’s submission. The markers could not appear any way other than by copy/pasting. Thus, there may have been more cheaters than were detected, but there were definitely at least this many.
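The article does not say what the hidden markers actually were, so here is one hypothetical mechanism: zero-width Unicode characters embedded in circulating exam text, which are invisible on screen but survive copy/paste. A minimal sketch under that assumption:

```python
# Hypothetical illustration only: the instructor did not disclose how the
# hidden markers worked. Zero-width Unicode characters are one plausible
# mechanism - invisible when rendered, but preserved by copy/paste.
ZERO_WIDTH_MARKERS = {"\u200b", "\u200c", "\u200d"}  # assumed marker set

def contains_marker(submission):
    """Return True if any hidden marker character appears in the text."""
    return any(ch in submission for ch in ZERO_WIDTH_MARKERS)

clean = "The answer is polymorphism."
copied = "The answer is poly\u200bmorphism."  # pasted from a marked source

print(contains_marker(clean))   # False
print(contains_marker(copied))  # True
```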
What caught my eye in this story was a comment from a student interviewed by the reporter:
Julie Rothe, an 18-year-old finance and information systems freshman, said she plans to accept responsibility. But she will challenge the penalty, she said, because students cheated in years past.
“I’m really angry at the fact that students got away with this in earlier semesters,” she said. “We are taking the hit, and I believe that is unfair.”
According to a commenter, the online format was only adopted last semester, so this student’s premise is probably false anyway. But it still offers an intriguing anecdote in which to explore how people perceive “fairness” in decision-making.
In I/O Psychology, we talk about fairness in terms of organizational justice theory. This theory poses three types of justice:
- Distributive justice: rewards/punishments are distributed fairly
- Procedural justice: the process by which rewards/punishments are distributed is fair
- Interactional justice: adequate information about reward/punishment distribution is provided respectfully
As instructors, we think primarily about distributive justice. In this case, the students were caught cheating and they should be punished accordingly – end of story. To an academic, cheating is a breach of the most sacred trust between students and faculty: that students represent only their own work as their own. Violation of this trust should result in substantial penalties, up to and including expulsion, because the transgression is so extreme.
But to the student interviewed, this is not the primary concern. Instead, she is concerned with how the decision was made in the past. Although her facts may not be correct, we can summarize her thinking as, “In the past, students weren’t penalized for doing what I did. Therefore, this is unfair.” This is a judgment about procedural justice. Although she accepts responsibility for being caught, she believes that because cheating detection in the past was not like this, she should not be penalized severely.
Which is more valid? Ultimately, the punishment itself is what should be judged as fair or unfair, so the student’s position is not tenable. But we can still appreciate the “logic” of her position. It highlights that in organizational settings, perceptions of fairness ultimately affect employee behavior and attitudes more than the actual fairness of decisions does.
New research by Crede, Harms, Niehorster and Gaye-Valentine in the Journal of Personality and Social Psychology investigates the impact of using abbreviated personality measures. Short answer: don’t do it.
In their study, the researchers surveyed 437 employed people (collected via StudyResponse) and 395 undergraduates. Personality was assessed with common 1-item, 2-item, 4-item, 6-item, and 8-item measures, along with a variety of outcomes, including job performance, GPA, stress, and health behaviors.
Almost universally, longer measures resulted in higher correlations with outcomes. For example, Conscientiousness predicted Job Task Performance at r = .6 with the 8-item measure (Saucier’s, in this case), with correlations as low as r = .2 for 1-item measures.
This is most critical for personality researchers, who often rely on incremental variance over the Big Five to provide evidence of a new, unique personality trait. For example, they administer a single-item Big Five measure alongside their new measure in a survey study, find incremental variance in job performance or life satisfaction explained by the new measure over the Big Five, and declare that they have found a new, useful personality trait. This study suggests that evidence for many such traits may be spurious; instead, the incremental variance found is better explained by the lack of a reasonable multi-item Big Five measure for comparison.
The authors also find that the use of abbreviated personality measures increases both Type I and Type II statistical conclusion errors. Single-item measures are especially risky. Although the shortened length of the scales is attractive from a practical perspective, this is completely empirically unjustified – the loss of validity is simply not worth it.
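Classical test theory gives a clean account of why shortened scales lose validity: fewer items means lower reliability (per the Spearman-Brown formula), and lower reliability attenuates observed correlations. A sketch with assumed numbers (the single-item reliability and true validity below are illustrative, not values from the study):

```python
import math

def spearman_brown(rel_one_item, k):
    """Reliability of a scale lengthened to k parallel items (Spearman-Brown)."""
    return k * rel_one_item / (1 + (k - 1) * rel_one_item)

def observed_r(true_r, rel_x, rel_y=1.0):
    """Correlation attenuated by measurement unreliability."""
    return true_r * math.sqrt(rel_x * rel_y)

rel_1 = 0.30          # assumed reliability of a single item
true_validity = 0.65  # assumed true-score correlation with the criterion

for k in (1, 2, 4, 8):
    rel_k = spearman_brown(rel_1, k)
    print(f"{k}-item scale: reliability {rel_k:.2f}, "
          f"observed r {observed_r(true_validity, rel_k):.2f}")
```

Under these assumptions, the observed correlation climbs from about .36 with one item to about .57 with eight, mirroring the pattern the authors report.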
All of this together suggests a very simple decision rule: don’t use abbreviated personality measures.
In a recent study appearing in the International Journal of Training and Development, researchers Saks and Burke discovered that the frequencies of behavioral and results-based training evaluation were related to actual transfer of training. In other words, organizations that evaluated behavior changes and monetary benefits resulting from training tended to see better results from that training. In their words:
Overall, the results of this study suggest a training evaluation-transfer of training paradox: organizations are most likely to evaluate trainee reactions and learning, and yet only behavior and results criteria are significantly related to higher rates of training transfer.
The authors investigated this by surveying 150 members of a T&D association in Canada. They asked respondents to report the extent to which employees actually made good use of organizational training initiatives and the degree to which their organization evaluated its training programs. Kirkpatrick’s 4-level training evaluation model (reactions, learning, behavior, and results) was used as a framework for these questions. On a scale of 1 (never) to 5 (always), and consistent with prior studies, the researchers found that evaluation of trainee reactions was by far the most common (M=4.08), followed by learning (M=2.85), behavior (M=2.59), and results (M=2.22).
Frequency of training evaluation predicted immediate transfer and far transfer (6 months and 1 year later). Correlations generally decreased with later evaluations, as would be expected. When looking at specific types of evaluative criteria, only behavior and results were statistically significantly related to transfer variables, with moderate effects (r = .28 to .43). Relationships with reactions and learning were much lower (r = .02 to .18).
This indicates that organizations that evaluate behavior and results tend to have better transfer, at least as rated by those responsible for implementing those training programs. As a survey study, these results do not speak to causation. While the act of evaluating may improve transfer, organizations that evaluate may simply pay more attention to their training programs in general than organizations that do not evaluate. The researchers controlled for organizational age and the size of the firm, but this would not address this issue. A more in-depth consideration of the source of variance in transfer is needed. This study also relied solely on T&D professionals’ impressions of their organizations, which may not reflect actual transfer. Research with real organizational transfer data is needed to address this limitation.
A recent article by Landers, Sackett and Tuzinski examined response distortion among 32,311 applicants at a nationwide retailer who completed a personality test for promotion to or selection into managerial positions. Up to 6% of the sample (nearly 2,000 applicants) distorted their responses on the personality test by responding with only the extreme ends of the scale, a phenomenon the authors labeled “blatant extreme responding” (BER). The percentage of applicants exhibiting BER dropped about 2 percentage points after an interactive warning was implemented.
The logic of the applicants’ response strategy is easy to appreciate. On a multiple-choice personality survey, one approach to maximizing your score might be to respond only “strongly agree” or “strongly disagree” – in effect, only the extreme responses. By identifying which answer was “best,” an applicant could earn the maximum possible score on such a test. It’s important to note, though, that not all tests are scored like this; there are many on which BER would result in a very low score.
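A minimal sketch of how such a pattern could be flagged (the authors' exact operationalization is not spelled out here, so this simply checks whether every response sits at a scale endpoint):

```python
# Illustrative BER flag: a respondent who answers only at the endpoints of a
# 1-5 Likert scale. The study's actual detection rule may differ.

def is_ber(responses, low=1, high=5):
    """Return True if every response is at a scale endpoint."""
    return all(r in (low, high) for r in responses)

print(is_ber([1, 5, 5, 1, 5]))  # all endpoints -> True
print(is_ber([1, 5, 3, 1, 5]))  # one midpoint -> False
```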
In this study, applicants were both internal (lower-level managers applying for promotions) and external (applicants from outside the organization). Among the internal applicants, a rumor spread among managers that BER would help them beat the test. This resulted in an increase in BER over the first 12 months that the test was available.
At the 12-month mark, the organization implemented an interactive warning – a process only possible in online testing. If BER was detected on the first page of the personality test, a warning popped up. This was successful in reducing the rate of BER at test completion (see the difference between the dashed and solid lines in the figure above).
So, the take-home? The authors describe a real-world setting in which applicants lied on a personality test. This lying was detectable, and warnings reduced the effect. If you’re implementing online personality testing, maybe you should use warnings too!
A recent study by Fang, Wen and Pavur investigated the extent to which the reputation of survey sponsors (e.g. corporations) and technology providers (e.g. SurveyMonkey) impact response rates. They discovered an interaction between the two and concluded, “A sponsoring corporation with a weak reputation who contracts with a survey provider having a strong reputation results in increased participation willingness from potential respondents if the identity of the sponsoring corporation is disguised in a survey.”
They discovered this with a series of 3 fairly small experiments (N=100, 100, 200) with Chinese participants. In each experiment, “strong reputation” providers were picked from well-known Chinese firms while the “weak reputation” providers were fictitious. So “strong” and “weak” might be better described as “having a strong reputation” versus “having no reputation at all.”
In each experiment, participants were exposed to both strong and weak providers, so all results here are within-subject. The surveys were counterbalanced so as to neutralize order effects. Participants looked at several surveys and then rated their willingness to take each. So what did they do in each?
- Experiment 1 examined the effect of sponsoring corporation reputation and found a small positive effect of corporate reputation (d = .17).
- Experiment 2 examined the effect of technology provider reputation and found a moderate positive effect of provider reputation (d = .30).
- Experiment 3 examined both simultaneously and found both main effects and an interaction between the two. When the sponsoring corporation reputation was weak, technology provider did not matter. When the sponsoring corporation was strong, a strong technology provider reputation was even more beneficial.
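The counterbalancing mentioned above can be sketched by rotating participant groups through every ordering of the conditions; the condition labels here are illustrative stand-ins:

```python
from itertools import permutations

# Sketch of order counterbalancing in a within-subject design: each condition
# appears in every possible serial position across participant groups.
conditions = ["strong-reputation survey", "weak-reputation survey"]

orders = list(permutations(conditions))
for i, order in enumerate(orders, start=1):
    print(f"participant group {i}: {' -> '.join(order)}")
```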
So as far as I can tell, the broad conclusion stated above was not actually tested anywhere. Instead, what we can safely conclude is that technology provider reputation matters most, while sponsor reputation also plays a role, and a strong reputation for both is better still. As the study operationalized low reputation as “no reputation,” the effect may be even larger in comparison to institutions with genuinely poor reputations. And finally, it is unclear to what extent a population of Chinese respondents resembles respondents of any other nationality. In the United States, for example, corporate sponsorship might be met with more skepticism.
In a recent article by Blackhurst, Congemi, Meyer and Sachau in The Industrial-Organizational Psychologist, e-mail addresses from a group of 14,718 people who had applied for entry-level jobs in manufacturing were examined for their appropriateness. The researchers found that roughly 25% of e-mail addresses were inappropriate or antisocial, and that the level of inappropriateness predicted several qualities of interest to hiring managers: conscientiousness, professionalism, and work-related experience. Interestingly, cognitive ability was not related.
The types of e-mail addresses found appear in the table below; they were extracted by a team of 25 graduate students with high inter-rater reliability (a sub-sample of 1,000 addresses was used for this purpose).
The graduate students next categorized all 15,000ish e-mail addresses (600 addresses assigned to each of the 25 students). At the same time, the students coded the addresses as either “appropriate when applying to a job,” “questionable,” or “inappropriate when applying to a job.” Afterward, one of the researchers reviewed all 15,000 and brought any questionable judgments to the attention of a 3-person panel for discussion. The researchers then compared mean scores on cognitive ability, conscientiousness, professionalism, and work-related experience across those with appropriate, questionable, and inappropriate e-mail addresses. Statistically significant differences were found on all dimensions.
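The study used trained human raters, but the flavor of the categorization task can be illustrated with a crude keyword pre-screen; the flag-word list below is invented for illustration and is no substitute for human judgment:

```python
# Crude illustrative pre-screen for e-mail address appropriateness. The
# flag words are invented; the actual study relied on human raters.
FLAG_WORDS = {"weed", "sexy", "420", "killer"}

def screen_address(email):
    """Return a rough appropriateness category for an e-mail address."""
    local_part = email.split("@")[0].lower()
    if any(word in local_part for word in FLAG_WORDS):
        return "inappropriate"
    return "appropriate"

print(screen_address("jane.doe@example.com"))      # appropriate
print(screen_address("BlazinWeedClown@Mail.Com"))  # inappropriate
```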
Unfortunately, statistical significance is easy to attain in a sample this large; even tiny effects will be statistically significant. The article did not report any standard deviations to give us a sense of effect sizes, so I had to do a little detective work. Here’s a table comparing outcomes for specific subtypes of inappropriate e-mail addresses:
This is the only table containing means, and fortunately, there are also degrees of freedom – that means we can reverse-engineer the t-test formula to estimate the standard deviation of this scale. It’s not perfect, but it’s the best available. Because these are independent-samples t-tests, t equals the mean difference (in this case, code group minus control group) divided by the pooled standard error (roughly s/SQRT(N)), and we can get the value of N by adding 2 to df (some of the numbers in here are a little odd – for example, df should be equal to N*2-2, but it’s not – so this is my best guess). If we assume the SD of each group is equal, we can solve for s: s = sqrt(N)*(mean difference)/t. That produces SDs for each group between 54.6 and 58.9, so if we assume these SDs hold for the other scales, the differences on the predictors between appropriate and inappropriate e-mail addresses range from d = 0.00 to d = 0.11. These are not, by any stretch of the imagination, big effects. But they are effects, about in line with what we’d expect from the intercorrelations between psychological predictors and performance generally.
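That detective work can be sketched in code using the rough approximation above; the mean difference, t, and df below are illustrative stand-ins, not numbers from the article:

```python
import math

def estimate_sd(mean_diff, t, df):
    """Back out a rough SD from a reported independent-samples t-test,
    using the approximation s = sqrt(N) * (mean difference) / t."""
    n_total = df + 2  # total N for an independent-samples t-test
    return math.sqrt(n_total) * mean_diff / t

def cohens_d(mean_diff, sd):
    """Standardized mean difference."""
    return mean_diff / sd

# Illustrative values, not taken from the article's table.
mean_diff, t_stat, df = 6.0, 3.0, 798

sd = estimate_sd(mean_diff, t_stat, df)
print(f"estimated SD: {sd:.1f}, d: {cohens_d(mean_diff, sd):.3f}")
```

With these stand-in numbers, the estimated SD lands near the 54.6–58.9 range described above, and d comes out to roughly 0.11.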
The study is not without limitations; all of the measures were provided by a consulting firm, so we have no way to independently verify their content. The e-mail address ratings were also made by graduate students, and it is not clear how well their judgments would generalize to actual hiring managers. Actual hiring decisions made later were not available, nor was job performance data, so validation evidence is missing. All we really have is a new correlate of predictors.
Interestingly, the study also identified that roughly 5% of e-mail addresses contained information that looked like a date; considering legislation forbidding discrimination on the basis of age, the legality of hiring managers having access to this information is unclear. Although e-mail address appropriateness predicts characteristics of interest (and thus should potentially be included in a packet of information used for hiring), it may itself contain information inappropriate for a manager to see (and thus should not be included). Further research is needed.

Footnotes:
- Blackhurst, E., Congemi, P., Meyer, J., & Sachau, D. (2011). Should you hire BlazinWeedClown@Mail.Com? The Industrial-Organizational Psychologist, 49(2), 27-38.
A recent study by Information Solutions Group, sponsored by PopCap Games, led Gamasutra to claim:
A new study from PopCap Games finds that those who cheat while playing social games are nearly 3.5 times more likely to be dishonest in the real world than non-cheaters, with offenses ranging from cheating on taxes to illegally parking in handicapped spaces.
Although it doesn’t say so explicitly, that’s pretty obviously phrased to lead you to believe that social cheating leads to real-life cheating. Since this was most likely a survey study, it seemed quite unlikely that they could make causal conclusions like that. So that led me to investigate the original study, which you can find for yourself here.
From that, it is clear that this was in fact a survey: a web-based presentation of 38 questions to a sample that ultimately consisted of 801 US respondents and 400 UK respondents (total N=1201). The study specifically excluded anyone who played less than 15 minutes of social games per week. There’s no discussion of how many respondents completed the survey but were excluded, so it’s not very clear how well this survey would generalize to gamers in general (or any larger group).
The survey report starts by emphasizing the growing importance of social games by referencing another study that estimates 118.5 million social gamers in the US and UK in 2011, about a 17% increase from January 2010. There are a lot of social gamers; no surprise there. In PopCap’s study, they further identified that about 81 million play at least once per day, with 49 million playing more than once per day.
The report continued by exploring the profiles of current social gamers: mostly women (55%) with a mean age of 39 years old (down a little bit from last time). They play these games because they think they’re fun, competitive, stress-relieving, and a mental workout. I’m curious exactly which social games are a mental workout (FarmVille?), but it was left unreported.
8% of respondents reported using hacks, bots, or cheats in a social game, with 11% saying they had considered it but had not actually tried it. That actually seems a bit high to me, and I wonder how their sample was located; they do not say. If their sample is loaded more toward (and bear with me here) “hardcore social gamers,” the rest of their results are a little less trustworthy. Without details on sampling, there’s no way to know.
Imagine my surprise when I reached the end of the report with absolutely no mention of the finding Gamasutra reports above. You are welcome to search for yourself (and if you find it, please let me know!), but after scanning through page by page, I searched the text for “cheat”, and for the specific percentages reported by Gamasutra. Nothing. So we are left to simply trust Gamasutra’s reporting with no verifiable source. That’s not that uncommon, but it is a bit suspicious when they point to a PDF report to provide support for their statement.
Without that support, there’s not much available to analyze, but we can at least say that the reporting above is a bit misleading. Here are several possible explanations for the reported cheating correlation, assuming it is accurate in the first place:
- People that cheat in social games are rewarded for doing so, and that leads them to cheat in real life.
- People that cheat in real life are rewarded for doing so, and that leads them to cheat in social games.
- People that self-report cheating in one category are more likely to self-report cheating in another category.
- There is an underlying psychological characteristic (e.g. integrity) that leads to cheating behaviors across situations.
As you can probably guess, the last two are more probable than the first two. Although it’s tempting to attribute causality here (much like in the debate on violence in video games causing violence), there is no evidence to suggest this – correlation is not causation. It is more likely that cheaters are cheaters, regardless of context. We’ve just found a new way to identify them.
In a recent study in the Journal of Experimental Psychology: Applied, Roediger, Agarwal, McDaniel and McDermott provide additional evidence for test-enhanced learning as a way to improve memory. It echoes an earlier study of Roediger’s in which he found in a controlled laboratory experiment that students randomly assigned to take a test had greater long-term retention than students randomly assigned to study the material. In this new research, Roediger and colleagues replicate this finding in 3 quasi-experimental field studies.
- Experiment 1: Students were quizzed on the material. They then completed a later exam containing items parallel to the quiz items. Both chapter exam and semester exam scores of those completing quizzes were higher. Students from multiple course sections participated, and different sections received different pre-test questions; the effect held only for those questions presented in the pre-test.
- Experiment 2: Students were quizzed on the material. They then completed a later exam containing both parallel and identical items to the quiz items. Again, exam scores were higher. The design in this experiment was similar, except for the addition of a control condition. Recall on the control items was similar to that of the non-pre-tested items, lending additional support to the effect.
- Experiment 3: Students were given a multiple choice quiz in class and encouraged to continue quizzing themselves at home using a web-based tool. Students using the quizzing tool had higher exam scores on items from the quiz.
The third of these is the iffiest – the increased test scores could reflect higher-quality students rather than higher-quality studying, and it’s not clear to what extent the initial in-class quiz or the at-home testing elicited the effect. But the general approach did seem to work for at least some students.
Test-enhanced learning is potentially valuable in several ways. First, it is a potential application of gamification, which I covered last week. By motivating students to complete optional quizzes using badges and other motivational game-derived elements, students may learn more (and enjoy it!). Second, it is a potential pedagogical tool in both education and employee training. For example, a mid-training 5-minute practice test may increase retention more than simply asking people to review their notes for 5 minutes.
But here’s the big question for me as an educator – does this mean that adding regular quizzes to a course will increase scores on the final exam, even if the quiz questions don’t appear on the final? And perhaps more importantly, does that mean that students actually learned more, or is it because you focused their attention on the topics you knew you’d be testing on? Future research is clearly warranted.
You might have noticed that I missed last week. Well, that’s because it’s the holiday season, which for academics means intense sessions of writing to make up for all the not-writing during the Fall semester! I’ll be returning to my regular weekly coverage of technology, education, and psychological scholarly articles in January.
In the meantime, I wanted to assure everyone that I was in fact coming back, as well as wish everyone a happy, safe, and productive holiday season!
One of the questions faced by survey designers is presentation order. Does it matter if I put the demographics first? Should I put the cognitive items up front because they require more attention? If I put 500 personality items in a row, will anyone actually complete this thing? Some recent research in the Journal of Business and Psychology reveals that placing demographic items at the beginning of a survey increases the response rate to those items compared with placing them at the end. More importantly, it did not affect scores on the three noncognitive measures that came afterward – in this case, measures of leadership, conflict resolution, and culture and goals.
To investigate this, Teclaw, Price and Osatuke conducted a large survey (roughly N = 75,000) on behalf of the Veterans Health Administration. Respondents were randomly assigned to one of three surveys. Of those randomly assigned to the third survey, participants received one of seven scales, three of which were those listed above, resulting in a sample size for this study of N = 4,508. Respondents completing each of these three surveys were in turn randomly assigned to complete demographic items at either the beginning or the end of their survey. The authors compared response rates, counting both skipped items and “Don’t Know” responses as a lack of response, and included all respondents who opened the survey regardless of how many questions were actually completed.
Response rates were indeed different. On the first of the three focal surveys, the response rate to demographics placed at the beginning of the survey was around 97%, while the response rate to demographics placed at the end was around 87%. While this isn’t a huge difference, if demographics are involved in your primary research questions (and they often are), then placing them first may be a good idea.
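With samples anywhere near this study's size, a 10-point gap in response rates is far outside chance. A two-proportion z-test sketch makes that concrete; the counts below are assumed for illustration, since the exact cell sizes aren't reported here:

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """z statistic for the difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Assumed counts: ~97% of 750 vs. ~87% of 750 answering demographics.
z = two_prop_z(728, 750, 653, 750)
print(f"z = {z:.2f}")  # well beyond the 1.96 threshold for p < .05
```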
What’s especially interesting about this is that conventional wisdom is to place demographic items at the end. The argument that I have most often heard is that priming your survey respondents with their demographic characteristics (e.g. race) will lead them to respond differently than they otherwise would have. This is especially salient in the context of race-based stereotype threat, the tendency for minority group performance on cognitive measures to decrease as a result of anxiety associated with confirming negative stereotypes about intelligence. So what should we do?
There are two important facts about this study that limit its applicability. First, all measures investigated were noncognitive, i.e. survey items. Stereotype threat typically applies in contexts where there is a “right answer,” for example, knowledge tests or intelligence tests. So the placement of demographic items on such measures may still be important. Second, the study did not control for cognitive fatigue – survey length was confounded with experimental condition. Is it because the demographic items were at the beginning vs. the end, or was it simply because respondents had already responded to many, many items and were bored/tired/at a loss for time? Would the effect still hold with a 20-item survey? A 50-item survey? We don’t really know.
If you’re giving a noncognitive voluntary survey, you are probably interested in demographics specifically and want to ensure they are answered more so than any other items. For now, it appears to be safe to put demographic items up front if that is your goal. Whether your survey is 20 items or 200 items, moving the demographic items is low-cost. But if your survey has cognitively loaded items, I’d still recommend against it.