Comparison of online statistical significance calculators


Videos and training materials about testing often recommend that you "use an online statistics calculator" to find the right sample size or to find out whether the results of your test are statistically significant (that is, whether they reject the null hypothesis), but they never say which one to use or how to find and compare them. As a Business Analytics Manager, I have made the same mistake with my own team of Business Analysts. As it happens, there are a lot of online calculators to help people with testing, and each one has its own peculiarities about what kind of test it helps you perform and what premises it uses under the hood.

It's hard to pick the best one, given that they are so numerous and differ in purpose, inputs, and outputs. None of the ones I found are wrong. When they give different results, it's because they rely on different premises. It is also noticeable that all of them were built with effort and care by developers who, no doubt, dedicated their time and energy to building something good and useful for others to enjoy. So, thank you, good Samaritan developers.
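To make the point about premises concrete, here is a minimal Python sketch of the textbook sample size formula for estimating one proportion within a margin of error. It is not taken from any of the calculators compared here. The function name `sample_size_for_proportion` and its parameters are hypothetical, and the premises (normal approximation, two tails, optional finite population correction) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, margin, confidence=0.95, population=None):
    """Sample size to estimate one proportion within +/- margin.

    Assumptions (illustrative only): normal approximation, two tails,
    and an infinite population unless `population` is given.
    """
    alpha = 1.0 - confidence
    # Critical value for a two-tailed interval at the given confidence level.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Sample size under the "infinite population" premise.
    n = z ** 2 * p * (1.0 - p) / margin ** 2
    if population is not None:
        # Finite population correction: a smaller population needs fewer samples.
        n = n / (1.0 + (n - 1.0) / population)
    return math.ceil(n)


# Same confidence level and margin, three different sets of premises.
print(sample_size_for_proportion(0.5, 0.05))                   # p = 50%, infinite population
print(sample_size_for_proportion(0.5, 0.05, population=1000))  # p = 50%, population of 1000
print(sample_size_for_proportion(0.1, 0.05))                   # p = 10%, infinite population
```

The three calls use the same confidence level and margin of error but return different sample sizes, which is the kind of discrepancy you may see between two calculators that are both correct under their own premises.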

To help testers, including my own Business Analyst team, navigate the abundance of online statistics calculators and build great tests, I put together the table below comparing the most popular (according to Google search) online calculators for testing, sample sizing, and statistical significance verification. I hope it helps you. Enjoy!

Calculator | Supported types of test: one-sample proportion, one-sample mean, two-sample proportion, two-sample mean | Inputs (user can set) | Premised (user cannot set) | Output
samplesizecalculator.github.io Yes Yes Yes Yes or ( and ), CL, CI, , tails according to the selected type of test power but results for power=80% are shown in the details below the answer, pop ∞
infrrr Yes Yes Yes Yes or ( and ), , CL, , tails power, pop ∞ hypothesis evaluation (says if it is statistically significant or not) with p-value
Statistics Kingdom Yes Yes Yes Yes or ( and ), CL, CI, , tails, choose between Z-test (assumes normal distribution) or t-test (assumes Student's t distribution) pop ∞ hypothesis evaluation (says if it is statistically significant or not) with p-value and power
abtestresult.com No No Yes Yes and or ( , , and ), , CL, tails power, pop ∞ hypothesis evaluation (says if it is statistically significant or not) with p-value and cool charts
No No Yes Yes or ( and ), CL, power, MDE, tails groups have the same size, pop ∞
sample-size.net Yes No No No , CL, power, two-tails CI
Yes No No No , CL, pop, MDE power, two-tails
No Yes No No , , CL, power, two-tails CI
No No Yes No , CL, power, , , left or right tail one-tail, pop ∞
No No Yes No , , CL, power, two-tails, pop ∞
No No No Yes , CL, power, , two-tails, pop ∞ CI
No No No Yes , CL, power, two-tails, pop ∞
365datascience.com No Yes No No , , CL, power, two-tails CI
No Yes No No , , , CL, , tails power hypothesis evaluation (says if it is statistically significant or not) with p-value
No No No Yes , , , , CL, , tails, choose between dependent and independent samples, choose between equal or unequal variances power hypothesis evaluation (says if it is statistically significant or not) with p-value
bookingcom.github.io/powercalculator No No Yes Yes or ( and ), CL, power, MDE (to output ), (to output MDE), tails groups have the same size, pop ∞ or MDE
calculator.net Yes No No No , CL, pop, MDE (to output ), (to output MDE) power, two-tails or MDE
No Yes No No , , CL, power, two-tails, pop ∞ CI
abtestguide.com No No Yes No , , , , CL, tails pop ∞ hypothesis evaluation (says if it is statistically significant or not) with p-value and power
No No Yes No , CL, power MDE, tails pop ∞
surveymonkey.com, questionpro.com, qualtrics.com Yes No No No CL, pop, MDE =50%, power, two-tails
raosoft.com, select-statistics.co.uk Yes No No No , CL, pop, MDE power, two-tails
Legend:
proportion, or rate that is being measured in the test.
average of a continuous variable that is being measured in the test.
standard deviation of the continuous variable.
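• p-value probability of obtaining results at least as extreme as the observed ones when the null hypothesis is true.
• CL Confidence Level (usually 95%), the complement of the significance level α (CL = 1 − α), where α is the maximum p-value accepted as statistically significant.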
• CI Confidence Interval, the accepted precision around the measured metric.
• MDE Minimum Detectable Effect, the smallest difference the test is designed to detect; it is another way to refer to the accepted precision around the measured metric.
sample size (or sample sizes).
ratio between the sizes of the two samples.
• tails indicates whether the test is one-tailed (non-inferiority or superiority) or two-tailed (difference in either direction).
• power is the statistical power of the test: the probability of rejecting the null hypothesis when it is false, i.e. one minus the risk of a Type II error (failing to reject the null hypothesis when it is false).
power means the calculator does not consider any special treatment for guaranteeing a certain power level.
• "pop" size of the total population being targeted by the test (not only the sampled observations)
• "pop ∞" means the calculator considers the population to be large enough not to matter to the test.
population mean.
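As an illustration of how the legend terms fit together, the "hypothesis evaluation with p-value" output for a two-sample proportion test can be sketched as below. This is a minimal example of a pooled two-proportion z-test under the "pop ∞" premise, not the code of any calculator in the table. The function name `two_proportion_z_test`, its parameters, and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b, tails=2):
    """Pooled z-test for the difference between two proportions.

    Assumptions (illustrative only): normal approximation, independent
    samples, infinite population. With tails=1 the alternative is that
    group B has the larger proportion (right tail).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under the null hypothesis both groups share one pooled proportion.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / std_error
    if tails == 2:
        p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    else:
        p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Made-up example: 200/1000 conversions in group A, 240/1000 in group B.
confidence_level = 0.95
alpha = 1.0 - confidence_level
for tails in (2, 1):
    z, p_value = two_proportion_z_test(200, 1000, 240, 1000, tails=tails)
    print(f"tails={tails} z={z:.3f} p-value={p_value:.4f} significant={p_value < alpha}")
```

Switching between one and two tails changes the p-value for the same data, which is why the "tails" premise is worth checking before comparing the output of two calculators.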