In an upcoming issue of Social Science Computer Review, Villar, Callegaro, and Yang conducted a meta-analysis on the impact of progress bars on survey completion. In doing so, they identified 32 randomized experiments from 10 sources where a control group (no progress bar) was compared to an experimental group (progress bar). Among the experiments, they identified three types of progress bars:
- Constant. The progress bar increased in a linear, predictable fashion (e.g. with each page of the survey, or based upon how many questions were remaining).
- Fast-to-slow. The progress bar increased a lot at the beginning of the survey, but slowly at the end, in comparison to the “constant” rate.
- Slow-to-fast. The progress bar increased slowly at the beginning of the survey, but quickly at the end, in comparison to the “constant” rate.
From their meta-analysis, the authors concluded that constant progress bars did not substantially affect completion rates, fast-to-slow reduced drop offs, and slow-to-fast increased drop offs.
However, a few aspects of this study don’t make sense to me, leading me to question this interpretation. For each study, the researchers report calculating the ratio of people leaving the survey early to people starting the survey as the “drop-off rate” (they note this as a formula, so it is very clear which should be divided by which). Yet the mean drop-off rates reported in the tables are always above 14. To be consistent with their own statements, this would mean that 14 people dropped for every 1 person that took the survey – which obviously doesn’t make any sense. My next thought was that perhaps the authors converted their percentages to whole numbers – e.g., 14 is really 14% – or flipped the ratio – e.g. 14 is really 1:14 instead of 14:1.
However, the minimum and maximum associated with the drop off rates in the Constant case are 0.60 and 78.26, respectively, which rules out the “typo” explanation. This min and max imply that for some studies, more people dropped than began the survey, but the opposite for other studies. So something is miscoded somewhere, and it appears to have been done inconsistently. It is unclear if this miscoding was done just in the tabular reporting or if it carried through to analyses.
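To make the arithmetic behind this concern concrete, here is a minimal sketch of the sanity check I have in mind. It assumes the drop-off rate is defined as described above (people leaving early divided by people starting). The function names and the example counts are my own, for illustration only; they are not taken from the paper.

```python
def drop_off_rate(dropped: int, started: int) -> float:
    """Ratio of respondents who left early to respondents who started."""
    if started <= 0:
        raise ValueError("started must be positive")
    if not 0 <= dropped <= started:
        raise ValueError("dropped must be between 0 and started")
    return dropped / started


def is_plausible_ratio(value: float) -> bool:
    """A ratio computed as dropped / started must lie in [0, 1]."""
    return 0.0 <= value <= 1.0


# Hypothetical counts, for illustration only (not from the paper).
example_rate = drop_off_rate(dropped=14, started=100)

# Minimum and maximum reported for the Constant case, as quoted above.
reported = {"constant_min": 0.60, "constant_max": 78.26}
plausible = {name: is_plausible_ratio(v) for name, v in reported.items()}
```

Under that definition, any value above 1 cannot be a ratio of dropouts to starters, which is why the reported maximum gets flagged here while the minimum does not.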
A forest plot is provided to give a visual indication of the pattern of results; however, the type of miscoding described above would reverse some of the observed effects, so this does not seem reliable. Traditional moderator analysis (as we would normally see produced from meta-analysis) was not presented in a tabular format for some reason. Instead, various subgroup analyses were embedded in the text – expected versus actual duration, incentives, and survey duration. However, with the coding problem described earlier, these are impossible to interpret as well.
Overall, I was saddened by this meta-analysis, because it seems like the researchers have an interesting dataset to work with yet coding errors cause me to question all of their conclusions. Hopefully an addendum/correction will be released addressing these issues.
In an upcoming article in Business Communication Quarterly, Washington, Okoro and Cardon investigated how appropriate people found various mobile-phone-related behaviors during formal business meetings. Highlights from the respondents included:
- 51% of 20-somethings believe it appropriate to read texts during formal business meetings, whereas only 16% of workers 40+ believe the same thing
- 43% of 20-somethings believe it appropriate to write texts during formal business meetings, whereas only 6% of workers 40+ believe the same thing
- 34% of 20-somethings believe it appropriate to answer phone calls during formal business meetings, whereas only 6% of workers 40+ believe the same thing
- People with higher incomes are more judgmental about mobile phone use than people with lower incomes
- At least 54% of all respondents believe it is inappropriate to use mobile phones at all during formal meetings
- 86% believe it is inappropriate to answer phone calls during formal meetings
- 84% believe it is inappropriate to write texts or emails during formal meetings
- 75% believe it is inappropriate to read texts or emails during formal meetings
- At least 22% believe it is inappropriate to use mobile phones during any meetings
- 66% believe it is inappropriate to write texts or emails during any meetings
To collect these tidbits, they conducted two studies. In the first, they conducted an exploratory study asking 204 employees at an eastern US beverage distributor about what types of inappropriate cell phone usage they observed. From this, they identified 8 mobile phone actions deemed potentially objectionable: making or answering calls, writing and sending texts or emails, checking texts or emails, browsing the Internet, checking the time, checking received calls, bringing a phone, and interrupting a meeting to leave it and answer a call.
In the second study, the researchers administered a survey developed around those 8 mobile phone actions, each rated on a 4-point scale ranging from usually appropriate to never appropriate. The authors state that it was given to a “random sample…of full-time working professionals” but do not reveal the precise source. Rated inappropriateness of behaviors varied by dimension, from 54.6% at the low end for leaving a meeting to answer a call, up to 87% for answering a call in the middle of the meeting. Which leaves me wondering about the 13% who apparently take phone calls in the middle of meetings!
Writing and reading texts and emails was deemed inappropriate by 84% and 74% of respondents respectively; however, there were striking differences on this dimension by age, as depicted below:
Although only 16% of people over age 40 viewed checking texts during formal meetings as acceptable, more than half (51%) of people in their 20s saw it as acceptable. It is unclear, at this point, whether this pattern results from younger workers’ early exposure to texting or from older workers’ greater experience with interpersonal interaction at work. Regardless, it will probably be a point of contention between younger and older workers for quite some time.
So if you’re a younger worker, consider leaving your phone alone in meetings to avoid annoying your coworkers. And if you’re an older worker annoyed at what you believe to be rude behavior, just remember, it’s not you – it’s them!
Recently, Old Dominion University embarked on an initiative to improve the teaching of disciplinary writing across courses university-wide. This is part of ODU’s Quality Enhancement Plan, an effort to improve undergraduate instruction in general. It’s an extensive program, involving extra instructional training and internal grant competitions, among other initiatives.
Writing quality is one of the best indicators of deep understanding of subject matter, and the feedback on that writing is among the most valuable content an instructor can provide. Unfortunately, large class sizes have resulted in a shift of responsibility for grading writing from the faculty teaching courses to the graduate teaching assistants supporting them in those courses. Or more plainly, when you have 150 students in a single class, there’s simply no way to reasonably provide detailed writing feedback by yourself several times in a semester on top of the other duties required of instructors without working substantial overtime.
With that in the background, I was pleasantly surprised to discover a new paper by Doe, Gingerich and Richards appearing in the most recent issue of Teaching of Psychology on the training of graduate student teaching assistants in evaluating writing. In their study, they compared 12 GTA graders, who received a 5-week course on “theory and best practices” for grading, with 8 professional graders at two time points about 3 months apart. For this study, the professional graders were considered to be a “gold standard” for grading quality.
Overall, the researchers found that GTAs were more lenient than professional graders at both time points. Both groups provided higher grades at Time 2 than at Time 1. This is most likely due to student learning over the course of the semester, although it might also be due to variance in assignment difficulty – the researchers did not describe any attempts to account for this. Professional graders assessed papers blind to time point, ruling out a purely temporal effect.
The researchers also found a significant interaction between GTA grader identity and time, indicating that graders changed differently over time. In other words, GTAs were not homogeneous, which suggests important individual difference moderators (perhaps GTAs become harsher or more lenient at different rates?). The researchers also assessed the quality of the comments written by the student raters, finding increases in comment quality over time, but with differences across dimensions of comment quality (e.g. discussion of strengths and weaknesses, rhetorical concerns, positive feedback, etc). The relative magnitudes of increases were not examined, although these could be computed from the provided table.
One major issue with this study is that the reliabilities of the comment quality outcomes are quite low. Correlations between raters ranged from .69 to .92, but both raters assessed only 10% of the papers. Using these correlations as estimates of inter-rater reliability assumes that everyone rates every paper and that the mean is used. Since most papers were assessed by only one rater, these reliabilities are overestimates and should have been corrected down using the Spearman-Brown prophecy formula. Having said that, the effect of low reliability is that observed relationships are attenuated – that is, they are smaller in the dataset than they should be. So had this been done correctly, the increased accuracy would have made the researchers’ results stronger. The effect of time (and of grader) may be much larger than indicated here.
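As a rough sketch of the correction described above, the snippet below applies the general Spearman-Brown prophecy formula to the two reported correlations. It takes at face value my assumption that the reported values describe the mean of two raters and need to be stepped down to a single rater (a length factor of k = 0.5). The function names are mine, and the disattenuation helper uses the standard correction for attenuation; none of this code comes from the paper.

```python
import math


def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability when the number of raters changes by factor k.

    k = 2 doubles the number of raters; k = 0.5 halves it.
    """
    return k * reliability / (1 + (k - 1) * reliability)


def disattenuate(r_observed: float, rel_x: float, rel_y: float = 1.0) -> float:
    """Standard correction for attenuation due to unreliable measures."""
    return r_observed / math.sqrt(rel_x * rel_y)


# Reported range of inter-rater correlations, stepped down from an
# assumed two-rater mean to a single rater (k = 0.5).
single_low = spearman_brown(0.69, 0.5)    # approx. 0.527
single_high = spearman_brown(0.92, 0.5)   # approx. 0.852

# Hypothetical observed correlation of 0.30, for illustration only:
# the lower the reliability, the larger the corrected relationship.
corrected = disattenuate(0.30, rel_x=single_low)
```

The point of the second function is simply that dividing by the square root of a smaller reliability yields a larger corrected value, which is why a properly lowered reliability would imply stronger underlying effects.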
Overall, the researchers conclude that the assumption of GTA quality when grading writing assignments is misplaced. Even with training and practice, GTA performance did not reach that of professional graders. On the bright side, training did help. The researchers conclude that continuous training is necessary to ensure high quality grading; a one-time training (or, I suspect, more commonly, no training) is insufficient.
One thing that was not assessed by the researchers was a comparison of GTA comment quality and professional comment quality versus professor comment quality. But perhaps that would hit a bit too close to home!