Over the last week or so, I wrote a web-based tool to automatically generate datasets and worked-out solutions. It creates and displays a dataset, a completed solution, and the results of most intermediary computational steps. It is freely available online for instructors or students: http://rlanders.net/datasets.php

As an example, if you select “Paired Samples t” with “n = 10” and an outcome type of “Survey”, it will output:

- Instructions on what to do with the dataset generated
- A dataset with Time 1 and Time 2 “Survey” variables (1=min, 5=max, 3ish=mean, 1ish=sd)
- A “completed” dataset containing calculated difference scores and squared difference scores
- Sum of d, Mean of d, Sum of d^2, (Sum of d)^2, sd of d, se of d, critical t statistic, degrees of freedom, observed t, CI lower and upper bounds
- A box with all important “final” output: the research question, null and alternative hypotheses, alpha, critical value, journal-type reporting of CI and t with precise p-value, unstandardized effect size (difference score), standardized effect size (Cohen’s d) and a NARRATIVE CONCLUSION including interpretation of the effect size. As an example: “
**Conclusion:** Reject the null and accept the alternative. The difference is statistically significant. The Survey variable decreases over time. If we assume this sample to represent the population, we would expect 95% of sample means to fall between -1.68 and 0.08. On average, the Survey variable was 0.80 lower at Time 2. The difference over time was medium.”

This is customized to each test. The program will also randomly choose between directional and non-directional tests when directional tests are plausible.
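
For readers who want to check the intermediate quantities in the paired-samples example by hand, here is a minimal Python sketch of those steps (Sum of d, Sum of d^2, sd of d, se of d, observed t). It is not the generator's own JavaScript code. The function name `paired_t_steps`, the Time 2 minus Time 1 direction for difference scores, and the choice of mean of d divided by sd of d for Cohen's d are assumptions made for illustration. The critical value and p-value are left out because they need a t table or a statistics library (for example `scipy.stats.t.ppf`).

```python
import math


def paired_t_steps(time1, time2):
    """Return the intermediate quantities of a paired-samples t-test.

    Assumes difference scores are Time 2 minus Time 1 (an illustrative choice).
    """
    if len(time1) != len(time2) or len(time1) < 2:
        raise ValueError("Need two samples of equal length with n >= 2")

    n = len(time1)
    d = [after - before for before, after in zip(time1, time2)]

    sum_d = sum(d)
    sum_d_sq = sum(x * x for x in d)   # Sum of d^2
    sq_sum_d = sum_d ** 2              # (Sum of d)^2
    mean_d = sum_d / n

    # Computational formula for the sum of squares of the difference scores.
    ss_d = sum_d_sq - sq_sum_d / n
    sd_d = math.sqrt(ss_d / (n - 1))
    if sd_d == 0:
        raise ValueError("All difference scores are identical; t is undefined")

    se_d = sd_d / math.sqrt(n)
    return {
        "d": d,
        "sum_d": sum_d,
        "sum_d_sq": sum_d_sq,
        "sq_sum_d": sq_sum_d,
        "mean_d": mean_d,
        "sd_d": sd_d,
        "se_d": se_d,
        "df": n - 1,
        "t_obs": mean_d / se_d,
        "cohens_d": mean_d / sd_d,  # one common definition for paired data
    }


if __name__ == "__main__":
    # Made-up scores for illustration only, not output from the generator.
    result = paired_t_steps([3, 4, 2, 5, 3], [2, 3, 2, 4, 1])
    print(result["sum_d"], result["mean_d"])  # -5 -1.0
    print(round(result["t_obs"], 2))          # -3.16
```

Comparing the returned values against the generator's displayed steps is one quick way to spot where a hand calculation went wrong.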

I have tried to tweak the generation algorithms so that you get statistically significant results about 50% of the time. As you might expect, you will get more significant results with larger sample sizes and fewer with smaller ones, but if you leave it set to n = 10, it should be about even.
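
To see why the share of significant results moves with sample size, here is a small simulation sketch in Python. It does not reproduce the generator's actual algorithm. It assumes normally distributed difference scores with a standard deviation of 1, a hypothetical true mean shift of 0.7 chosen only for illustration, and a non-directional test at alpha = .05. The critical values are the usual table values for df = n - 1, and `significant_rate` is a made-up helper name.

```python
import math
import random

# Two-tailed critical t values at alpha = .05 from a standard t table,
# keyed by sample size n (df = n - 1).
CRITICAL_T = {5: 2.776, 10: 2.262, 30: 2.045}


def significant_rate(n, mean_shift=0.7, trials=2000, seed=0):
    """Estimate how often a paired-samples t-test rejects the null.

    Difference scores are drawn from Normal(mean_shift, 1), an assumption
    made for this sketch rather than the generator's real method.
    """
    rng = random.Random(seed)
    critical = CRITICAL_T[n]
    rejections = 0
    for _ in range(trials):
        d = [rng.gauss(mean_shift, 1.0) for _ in range(n)]
        mean_d = sum(d) / n
        sd_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
        t_obs = mean_d / (sd_d / math.sqrt(n))
        if abs(t_obs) > critical:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    for n in (5, 10, 30):
        print(n, significant_rate(n))
```

With the same underlying effect, the estimated rejection rate rises as n grows, which is the pattern described above.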

The tool includes fully worked out problems for: central tendency and variability statistics, z-score calculations, confidence intervals, z-tests, one-sample t-tests, paired-samples t-tests, independent-samples t-tests, one-way ANOVA, chi-squared goodness of fit and test of independence, and correlation/regression.

If you’re wondering why I wrote this, it is because I wrote a one-semester undergraduate introduction to statistics for business students, which will be published in 2013. You don’t need the book to use the dataset generator (just ignore the references to “Chapters”). The generator is customized to the statistical methods I teach in that text, though, so if you like it, I’d ask you to consider my textbook when it comes out.

If you choose to use this to generate datasets for your classes or if you provide it to students so that they can test themselves at home (strongly recommended), I only ask that you test it out a couple of times first – copy/paste datasets into SPSS and ensure that the output matches what the generator produces. I have tested it myself in Firefox, IE, and Chrome, and it looks like it works great, but bugs can crop up unexpectedly. The entire program is written in JavaScript, so you’ll need a fairly modern web browser.

Also, if you decide to adopt this or provide it to your students, please leave a note in the comments saying so. Feedback is appreciated!

I am teaching Statistical Methods for the first time this semester and your dataset generator has been a great help to me in devising lab materials for my students. Thank you very much!

I’m so glad you found it useful! I’d also recommend letting students use it as a method of self-testing their computational skills.