Calculating LongString in Excel to Detect Careless Responders
Careless responding is one of the most fundamental challenges of survey research. We need our respondents to respond honestly and with effort, but when they don’t, we need to be able to detect and remove them from our datasets. A few years ago, Meade and Craig1 published an article in Psychological Methods exploring a significant number of techniques for doing exactly this, ultimately recommending a combination of three detection techniques for rigorous data cleaning, which, let’s face it, is a necessary step when analyzing any internet survey. These techniques are even-odd consistency, maximum longstring, and Mahalanobis D:
- The even-odd consistency index involves calculating the subscale means for each measure on your survey, split by even and odd items. For example, the mean of items 1, 3, 5, and 7 would become one subscale whereas the mean of items 2, 4, 6, and 8 would become the other. Next, you take all of the even subscales and pair them with all of the odd subscales across all of the measures in your survey, calculate a correlation, and then apply the Spearman-Brown prophecy formula to adjust the value up to a scale of -1 to 1.
- Maximum LongString is the largest value for LongString across all scales on your survey, where LongString is the number of identical response in a row. Meade and Craig recommended LongString would be most useful when the items were randomly ordered.
- Mahalnobis D is calculated from the regression of scale means onto all the scores that inform them. In a sense, you are trying to see if responses to individual items correspond with the scale mean they created consistently across individuals. Some conceptualizations of this index regress participant number onto scores, which conceptually accomplishes basically the same thing.
In all three cases, the next step is to create a histogram of the values and see if you see any outliers.
Calculating Careless Responding Indices
Of these three, Mahalanobis D is the most easily calculated, because saving Mahalanobis D values is a core feature in regression toolkits. It is done easily in SPSS, SAS, R, etc.
The second, the even-odd consistency index, is a bit harder but still fundamentally not too tricky; you just need to really understand how your statistical software works. Each step, individually, is simple: calculate scale means, calculate a correlation, apply a formula.
The third, Max LongString, is the most intuitively understandable but also, often unexpectedly, the most difficult to calculate. I imagine that the non-technically-inclined generally count by hand – “this person has a maximum of 5 identical answers in a row, the next person has 3…”
An SPSS macro already exists to do this, although it’s not terribly intuitive. You need to manually change pieces of the code in order to customize the function to your own data.
Given that, I decided to port the SPSS macro into Excel and make it a little easier to use.
An Excel Macro to Calculate LongString
Function LongString(cells As Range) Dim cell As Range Dim run As Integer Dim firstrow As Boolean Dim maxrun As Integer Dim lastvalue As String firstrow = True run = 1 maxrun = 1 For Each cell In cells If firstrow = True Then firstrow = False lastvalue = cell.Value Else If cell.Value = lastvalue Then run = run + 1 maxrun = Application.Max(run, maxrun) Else run = 1 End If lastvalue = cell.Value End If Next cell LongString = maxrun End Function
To Use This Code Yourself
- With Excel open, press Alt+F11 to open the VBA Editor.
- Copy/paste the code block above into the VBA Editor. In newer versions of Excel, you may not see a code window open by default. If that happens, right-click on VBAProject for the workbook you want to add the macro to, click Insert, and then Module. This should open a blank text window called “Module1” where you can paste the code block.
- Close the VBA Editor (return to Excel).
- In an empty cell, simply type =LONGSTRING() and put the cell range of your scale’s survey items inside. For example, if your first scale was between B2 and G2, you’d use =LONGSTRING(B2:G2)
- Repeat this for each scale you’ve used. For example, if you measured five personality dimensions, you’d have five longstrings calculated.
- Finally, in a new cell use the =MAX() function to determine the largest of that set. For example, if you put your five LongStrings in H2 to L2, you’d use =MAX(H2:L2)
That’s it! Importantly, the cells needs to be in Excel in the order they were administered. If you used randomly ordered items, this adds an additional layer of complexity, because you’ll need to recreate the original order for each participant first before you can apply LongString. That takes a bit of Excel magic, but if you need to do this, I recommend you read up on =INDIRECT() and =OFFSET(), which will help you get that original order back, assuming you saved that item order somewhere.
Final Steps
Once you have Max LongString calculated for each participant, create a histogram of those values to see if any strange outliers appear. If you see clear outliers (i.e., a cluster of very high Max LongString values off by itself, away from the main distribution), congratulations, because it’s obvious which cases you should drop. If you don’t see clear outliers, then it’s probably safer to ignore LongString for this analysis.
Another potential issue with Max LongString is that if you have scales with varying numbers of items, shorter scales can be lost when you calculate the Max. To avoid that problem, try converting each of your longstrings into a proportion. For example, for a 4-item scale, try =LONGSTRING(H4:H7)/4 and then calculate MAX() on those results.
- Meade AW, & Craig SB (2012). Identifying careless responses in survey data. Psychological Methods, 17 (3), 437-55 PMID: 22506584 [↩]
Previous Post: | The Difference Between Industrial and Organizational Psychology |
Next Post: | Learn Web Scraping/Data Science at SIOP, APA, and IPAC Workshops |
Just used the LongString Macro. Worked perfectly. Thanks for putting this together — it’s super helpful.
Agreed, fantastic resource for my research
Thanks. I try the codes following your steps. However, it did not run in my Excel. The excel told me that “#name?”. I do not know where is the error. Can you help me?
A #name error means that you have referred to a cell that does not exist. So you need to fix your cell references. Most likely, you specified your cell range incorrectly. You might get an error if you have blanks in your data – in that case, you should replace skipped items with a missing cell code (e.g., -99).
Do you also have the syntax to calculate the maximum long string in SPSS? That would be very helpful!
Unfortunately, I don’t, because it is much, much more complicated than Excel’s version. So back when I still used SPSS, I would still do longstring calculations in Excel and copy the resulting column back into SPSS.
Thank you for your reply.
Another question: I have a dataset where an EFA resulted in four scales which I divided in even and odd subscales and calculated the mean to obtain an even-odd correlation. However, I do not know how to calculate a within-subject correlation in SPSS.
Could you help me with that?
I think you mean within-subject correlations, i.e., one correlation per person. Unfortunately, I don’t know any way to do that in SPSS without a significant amount of syntax. It is much easier in R.
Thank you again for your reply.
Now I tried your Excel syntax for maximum Long String calculations. However, I also get the Name error even though I specified empty cells as -99 and referred to the cell range I want to. Any idea what I am doing wrong?
The other major way to get a #NAME error is if Excel can’t find your macro. This could be caused by several things, but most likely one of the following: 1) You aren’t using the same name in Excel that you gave it in the VBA Macro Editor, 2) You didn’t save the macro in the same workbook that you are currently editing, which is common if you have more than one workbook open, or 3) Macros are disabled in your workbook.
I also got the #NAME? Error, because Macros had been disabled in my workbook. I changed the document type to an Macro enabled format (.xlsm) but when I try to use the LongString function now, Excel tells me “Compile error: Ambiguous name detected: LongString” and opens the VBA Editor again.
Can you tell me what I have done wrong or what I can do now to fix this?
Ambiguous name usually means you have defined something in two places, e.g., if you have two files open that have both defined LongString. So my suspicion is that you still had the original and macro-enabled workbooks open.
For clarity, because I was running into some up the same issues mentioned in earlier comments. These instructions assume you know how to run a Macro (at least a little bit, I did not) haha. See this short YouTube video, as well as address the potential road blocks that Richard mentioned earlier.
https://www.youtube.com/watch?v=JIU2N2Z0eDQ
Thanks!
Matt
Thanks a ton for your help, Mr. Richard!! *_*
And Mr. Matt, also!! *_*
I’m trying to apply this macro but keep on getting the #NAME error. the function even shows up when I start typing it and I am sure I am not misspelling the cell selection (I also tried to select it manually but that didn’t work).
I am not proficient with macros so maybe I did something wrong creating it, but then it seems weird that the function shows up when I type it.
A #NAME error suggests you are referencing a cell that Excel cannot identify/find. For example, if you enter =AAA1 in cell A1, it will copy the value from AAA1. If you enter =AAAA1 in cell A1, it will throw a #NAME error, because cell AAAA1 does not exist in Excel due to Excel’s limit of 16384 columns.
This error will also be thrown any time you reference something that looks like a cell name but Excel can’t match to an actual cell, for whatever reason. For example, if you enter =HAPPY, unless you have a named range called “HAPPY”, it will throw a #NAME error.
I don’t have any way to directly debug code I can’t see, but the error suggests Excel is trying to find a cell that doesn’t exist at some point. That could be from 1) the name of the formula, 2) the name of a parameter you passed to the formula or 3) something mistyped inside the macro. For example, =MYFORMULA(AHA) would throw a #NAME error if either MYFORMULA did not exist as a macro (cause 1) or if AHA did not exist as a name cell or cell range (cause 2). It would also throw an error if both of those were correct but the code internally caused a reference to something that didn’t exist (cause 3).
A common way cause 1 manifests is if you record the macro in the wrong place so that Excel can’t find it where it expects it to be. Different versions of Excel handle this in slightly different ways. In Excel 2020/365, after opening the VBA editor, you will need to right-click on VBAProject for the workbook you are working on, then click Insert, and Module. Copy/paste the macro code there.
Clever man!!!
I found it very interesting, easy to try out and on-spot with the findings.
Bravo!