One of the most difficult aspects of studying cheating on selection/promotion/training tests, and other dishonest behavior in work or school, is figuring out what percentage of employees/students actually engage in such behavior.
Statistical power is often low in research on organizations because sample sizes are restricted to however many employees the researcher has access to. Now imagine if only a small percentage of those already small samples were the people you were really interested in. Would it be worth it to conduct research on cheating with a sample of 400? How many people who actually cheat would you really get? This is where base rates come in.
In a recent article from Network World, Professor Ed Lazowska, the Computer Science and Engineering department chair at the University of Washington, says that roughly 1% – 2% of submitted computer science assignments involve cheating. Computer science is a particularly interesting case because of the nature of the training material: computer science students submit programs and algorithms via computer, and these can be easily examined for cheating through automated methods. This makes tracking cheating, even at such a low base rate, much easier – in this case, by automatically examining the assignments of roughly 2750 students per year.
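As a toy illustration of that kind of automated check (this is not the University of Washington's actual system; real tools such as MOSS use far more robust fingerprinting), consider comparing normalized token streams between submissions:

```python
# Toy similarity check between two code submissions (illustrative only;
# not the actual system described in the article).
import re

KEYWORDS = {"def", "for", "in", "return", "if", "else", "while"}

def fingerprints(source: str, n: int = 4) -> set:
    """Token n-grams, with identifiers normalized so renaming can't hide copying."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    tokens = ["ID" if t.isidentifier() and t not in KEYWORDS else t
              for t in tokens]
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(a: str, b: str) -> float:
    """Jaccard overlap of fingerprints; values near 1.0 suggest copying."""
    fa, fb = fingerprints(a), fingerprints(b)
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0

sub1 = "def total(xs):\n    s = 0\n    for x in xs: s = s + x\n    return s"
sub2 = "def total(ys):\n    t = 0\n    for y in ys: t = t + y\n    return t"
print(f"{similarity(sub1, sub2):.2f}")  # 1.00 -- identical after normalization
```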
The 1% – 2% figure is portrayed in the article as a high estimate, but is actually lower than the estimate that two colleagues and I found just a few years ago (Retesting after initial failure, coaching rumors, and warnings against faking in online personality measures for selection). There, we found evidence that roughly 3% of people completing online personality tests for selection purposes engaged in a particular purposeful attempt to increase their scores (and thus their probability of getting hired). But it’s impossible at this point to determine why those two estimates are different, to what extent these base rates change by environment, how test-taker motivation applies, and so on.
So how many cases do we need to investigate which psychological attributes might be relevant to cheating? Well, the depressing thing is that this research gives us a fairly small range of low base rates for cheating behavior. If you want 100 cheaters to study, the back-of-the-envelope calculation below suggests you need an overall sample of somewhere between 3,000 and 10,000. Good luck!
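The arithmetic behind that range is simple expected-value math: divide the number of cases you want by the base rate. A minimal sketch (it ignores sampling variability, which would push the required n even higher):

```python
# Expected-value estimate of the sample size needed to observe a given
# number of cheaters at a given base rate.
import math

def required_sample(target_cases: int, base_rate: float) -> int:
    """Smallest n whose expected number of cases reaches target_cases."""
    return math.ceil(target_cases / base_rate)

for rate in (0.01, 0.02, 0.03):
    print(f"base rate {rate:.0%}: n >= {required_sample(100, rate):,}")
# base rate 1%: n >= 10,000
# base rate 2%: n >= 5,000
# base rate 3%: n >= 3,334

# And the sample of 400 from earlier? At a 2% base rate you would expect
# only 400 * 0.02 = 8 cheaters -- far too few to study.
```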
SIOP 2010 Coverage
General: Schedule Planning | Lament Over Wireless Coverage
Live-Blogs: Day 1 | Day 2 | Day 3
Daily Summaries: Day 1 | Day 2 | Day 3
Day 1 was all over the place, Day 2 had an excellent training track, and Day 3 was about research methods and statistics.
At 8:30 AM came Statistical and Methodological Myths and Urban Legends: Part V, an interesting set of presentations from several well-known I/O psychologists, including Spector, Brannick, Landis, Cortina, and Aguinis. One talk was on the overuse of control variables without any rationale for including them. This is common in survey research: researchers think that because they are using correlational data, they must throw some control variables in. That reasoning is oversimplified; people tend to assume that statistical corrections will magically “fix” the causality problems of a correlational research design, and this simply is not true. Statistical corrections cannot compensate for a lack of modeling.
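One practical safeguard, echoing a mantra from the live blog below (test model fit with and without controls, modeling the controls on step 2), can be sketched in a few lines. This is my own minimal illustration with simulated data and hypothetical variable names, assuming pandas and statsmodels:

```python
# Hierarchical regression: substantive predictor on step 1, proposed
# controls on step 2, then compare model fit (simulated data throughout).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "autonomy": rng.normal(0, 1, n),   # substantive predictor
    "age": rng.normal(40, 10, n),      # proposed control
    "tenure": rng.normal(8, 4, n),     # proposed control
})
df["satisfaction"] = 0.5 * df["autonomy"] + rng.normal(0, 1, n)

step1 = smf.ols("satisfaction ~ autonomy", data=df).fit()
step2 = smf.ols("satisfaction ~ autonomy + age + tenure", data=df).fit()

print(f"R^2 without controls: {step1.rsquared:.3f}")
print(f"R^2 with controls:    {step2.rsquared:.3f}")

# F-test for the increment: do the controls explain anything beyond the
# substantive predictor? If not, they add noise, not rigor.
f_value, p_value, df_diff = step2.compare_f_test(step1)
print(f"F = {f_value:.2f}, p = {p_value:.3f}")
```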
We also had a talk about the merits of statistical significance testing. If you’re unaware of the debate, it centers on a very simple problem: statistical significance testing has a very specific purpose and gives you very specific information about your data, but most researchers over-interpret it. For example, a person finding a statistically significant difference with a t-test comparing Condition A and Condition B might want to conclude, “Condition A produces different results from Condition B.” That is not correct. The correct conclusion is, “if Condition A and Condition B were not different in the population, a difference as large as the one we observed would be improbable.” But that doesn’t exactly roll off the tongue, so people tend to oversimplify.
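To make the distinction concrete, here is a minimal simulated example of what a t-test’s p-value does and does not tell you (my illustration, assuming numpy and scipy; all numbers are made up):

```python
# Simulated two-condition comparison to illustrate p-value interpretation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
condition_a = rng.normal(loc=100, scale=15, size=50)
condition_b = rng.normal(loc=108, scale=15, size=50)

t_stat, p_value = stats.ttest_ind(condition_a, condition_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Correct reading: if A and B did not differ in the population, a difference
# at least this large would arise in samples like these with probability p.
# It is NOT the probability that A and B are the same, and a small p says
# nothing by itself about how large or important the difference is.
```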
There are two major camps on this topic: 1) we don’t have anything better, so use it anyway (the viewpoint of the speaker in this session), and 2) conclusions drawn from such tests are evil, and the tests should never be used under any circumstances. Much like the conservatism/liberalism debate, the moderates (it has flaws, but it does produce interesting findings when interpreted correctly) are not really the vocal ones.
There was also a talk on meta-analysis, which was pretty good. Two major conclusions came out of that. First, meta-analysis is not infallible, but neither are randomized controlled trials. Each has its strengths and weaknesses, and interpreting them with those considerations in mind is central to correctly understanding research findings in any field. Second, meta-analysis is often interpreted too liberally; it does not provide “the answer” in any field. It is simply another piece to add to the puzzle.
After this, I attended Archiving Data: Pitfalls and Possibilities, which was a very interesting panel discussion on what should happen to data after publication in journals. Should we require open access to all data used to produce journal articles? This is already a common model in the hard sciences – Science, for example, requires the authors of articles it publishes to produce their data on demand, and often to include it in an online supplement. Does this make sense in psychology? We didn’t come to any firm conclusions, but a taskforce or I/O subcommittee may be in our future.
I next attended Verification of Unproctored Online Testing: Considerations and Research, a predominantly practitioner-oriented look at how to deal with cheating in unproctored Internet-based testing (UIT). There aren’t many good ways to deal with this problem; the panel primarily discussed what has been tried so far, chiefly follow-up in-person verification testing. But many questions remain. Do you use the in-person scores or the online scores? What do you do with people you think cheated (since you can never be 100% sure)? How do you include cheating-prevention measures without implying to your applicants that you don’t trust them?
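To give a flavor of what verification can look like statistically, here is a hedged sketch of one simple approach, my own illustration rather than anything presented in the session: flag applicants whose proctored retest score falls further below their unproctored score than measurement error can plausibly explain.

```python
# Sketch of a score-verification check: is the drop from the unproctored
# (UIT) score to the proctored retest larger than measurement error allows?
import math

def flag_discrepancy(uit_score: float, verify_score: float,
                     sem: float, z_crit: float = 1.645) -> bool:
    """Return True if the score drop exceeds what measurement error permits.

    sem is the test's standard error of measurement; the standard error of
    the difference between two administrations is sem * sqrt(2).
    """
    se_diff = sem * math.sqrt(2)
    z = (uit_score - verify_score) / se_diff
    return z > z_crit  # one-tailed: we only care about drops

# Hypothetical applicant: UIT score 85, verification score 70, SEM of 5
print(flag_discrepancy(85, 70, sem=5))  # True -> review this applicant
```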
After a break, I attended the closing plenary, a talk given by Dave Ulrich on the future of our profession. Engaging, but not too surprising. One thing that stood out was a suggestion that we move from a model of knowledge warehouses, where academics quietly store information in journals for retrieval at will, to knowledge networks, where knowledge is actively shared and moved to where it is needed. This really resonated with me; one of the reasons I push for social media in I/O psychology is that I don’t think we can be truly relevant or even useful to the world of work without getting our hands dirty on the front lines.
The session ended with the handing of the gavel to our new SIOP president, ODU alum Ed Salas.
That’s it for day-to-day reactions. After letting the conference percolate for a few days, I’ll post summary thoughts.
SIOP 2010: Day 3 Live Blog
7:01:21 AM: NeoAcademic: SIOP 2010: End of Day 2 (http://cli.gs/eduXL) #SIOP #SIOP10
7:55:15 AM: NeoAcademic: SIOP 2010: Day 3 Live Blog (http://cli.gs/WnzD7) #SIOP #SIOP10
8:35:05 AM: At Statistical and Methodological Myths and Urban Legends 5 with some all-stars: Spector, Brannick, Landis, Cortina, Aguinis… #SIOP
8:37:34 AM: Controlling for variables in regression often introduces contamination, not corrects for it #SIOP
8:40:39 AM: Statistical corrections cannot be used to compensate for a lack of modeling #SIOP
8:45:18 AM: Ah, one of my mantras when teaching: test model fit with and without controls (model controls on step 2) and compare models #SIOP
8:50:11 AM: Cortina taking a diplomatic approach to disagreeing with null hyp sig testing critics… Hmmm #SIOP
8:58:00 AM: Cortina seems to be claiming NHST is fine bc we compare to 0 in other tests like EFA (eigen > 1). But we know eigen > 1 is bad too. #SIOP
9:02:57 AM: Cortina: social science hypotheses are crude, so our statistical processes should be too… What?! #SIOP
9:13:23 AM: Statistical testing is like being an archaeologist of data… I like it! #SIOP
9:21:21 AM: Myths around meta-analysis – some excellent points so far, mostly around too much faith that meta solves all problems #SIOP
9:23:40 AM: Small failsafe N = publication bias, but large failsafe N could indicate anything #SIOP
9:28:22 AM: Randomized controlled trials are not infallible, and are not necessarily “better” than metas (but the reverse is true too) #SIOP
10:32:13 AM: At Archiving Data: Pitfalls and Possibilities
10:39:19 AM: Thanks, but Truth might be a little generous! RT @greghall12 I love your #SIOP “truths,” I wish I could’ve made it out there.
10:41:11 AM: I almost lol’d midsession! RT @BreannePH Men of SIOP- if your shirt is unbuttoned so far that I can see chest hair, you’re doin’ it wrong.
10:43:27 AM: Would open access to data help or hurt the advancement of IO? It’s already common in the hard sciences… #SIOP
11:04:39 AM: Human subject rights as a barrier to making data available for archiving by journals #SIOP
11:45:00 AM: Are we headed down an all-meta path, devoid of primary studies? #SIOP
12:17:19 PM: Archiving was either #1 or #2 most interesting thing I saw today. Now to posters! #SIOP
2:12:16 PM: At Verification of Unproctored Online Testing: Considerations and Research #SIOP
2:15:23 PM: UIT should be verified with follow-up in-person testing #SIOP
2:18:54 PM: Interesting approach – Using verification test only for verification – use online test results as actual test scores #SIOP
2:32:28 PM: Computerized adaptive testing in both proctored and unproctored settings #SIOP
2:44:09 PM: Speaker needed to develop a way with CAT to discourage the impression that verification testing was used to detect cheating #SIOP
2:47:59 PM: Just means they need a wifi sponsor! RT @WorkPsy @SQNguyen @iopsychology They thought about wifi, but couldn’t afford it.
2:49:37 PM: Cognitive ability CAT UIT for selection saved company $100,000 over 18 months #SIOP
2:51:22 PM: Because items are customized to the applicant, CAT is seen by speakers’ lawyers as a reduced litigation risk #SIOP
2:52:27 PM: Done with sessions! Short break until closing plenary. #SIOP
4:41:10 PM: At the closing plenary with Dave Ulrich #SIOP
4:49:55 PM: 25+ years of experience give us important reliable IO findings, students give us “Who’s going to hire me?” #SIOP
4:55:24 PM: Small group discussion with 900 attendees? Oh my. #SIOP
4:59:56 PM: Theory on the downturn in #SIOP? My impression was the opposite, but perhaps that highlights the practitioner-academic gap…
5:03:36 PM: Paralleling Republican vs. Democrat with internal struggles for IO? Not sure it’s quite THAT bad. #SIOP
5:25:59 PM: We should move from knowledge warehouses to knowledge networks – that, I’ll agree with #SIOP
5:36:25 PM: Ed Salas takes over and so #SIOP10 ends. Excellent #SIOP everyone! So many great people. Special thanks to all new followers!