**Sample size** is critical to ensuring the validity of your study and should be determined in the very early stages of study design. The effect size of your study is also critical; this measurement tells you the strength or importance of a particular relationship.

**Power** is the probability of detecting a relationship that actually exists; it is equal to one minus the probability of committing a Type II error (the probability of not finding a relationship that exists in your analysis). The a priori power is unique to every study.

The alpha or significance level of your study is the probability of committing a Type I error. More simply, it is the probability of finding a relationship that does not exist. Generally, committing a Type I error is considered more severe than committing a Type II error.

The significance level measurement is unique to your study. The significance level for a study involving airbag deployment failures would not be the same as the significance level for a study involving the satisfaction of five-year-old children with a particular brand of red crayon.
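As a rough illustration of how effect size, significance level, and power interact, here is a minimal Python sketch (standard library only) of the required sample size per group for a two-sample comparison of means. It uses the normal approximation rather than the exact t-based calculation, and the effect size, alpha, and power values are illustrative, not recommendations:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sample comparison of
    means, via the normal approximation: n = 2 * ((z_a + z_b) / d)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)          # power = 1 - P(Type II error)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# A "medium" effect size (Cohen's d = 0.5) at alpha = 0.05 and power = 0.80
print(n_per_group(0.5))  # → 63 per group
```

Note how a stricter alpha (as in the airbag example) or a smaller effect size drives the required n sharply upward; the exact t-based answer is slightly larger than this approximation.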

via Sample Size and Power Analysis | Statistics Solutions.

**Getting the Sample Size Right: A Brief Introduction to Power Analysis** (link)

Filed under: First Principles, Statistical Concepts Tagged: Effect size, Probability, Statistical power, Statistical significance, Type I and type II errors

For more detail follow this link: Non-probability Sampling.

Filed under: First Principles, Sampling

**A probability sampling** method is any method of sampling that utilizes some form of random selection. In order to have a random selection method, you must set up some process or procedure that assures that the different units in your population have equal probabilities of being chosen. Humans have long practiced various forms of random selection, such as picking a name out of a hat, or choosing the short straw. These days, we tend to use computers as the mechanism for generating random numbers as the basis for random selection.
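A minimal sketch of this idea using Python's standard-library `random.sample` (the population of 20 numbered units and the sample size of 5 are made up for illustration): over many repetitions, every unit in the population is selected at roughly the same rate, which is exactly the equal-probability property a random selection procedure must guarantee.

```python
import random

def inclusion_frequencies(pop_size=20, k=5, trials=10_000, seed=42):
    """Estimate how often each unit enters a simple random sample of size k.
    Under simple random sampling, every unit should appear ~k/pop_size of the time."""
    rng = random.Random(seed)  # seeded so the demo is reproducible
    counts = dict.fromkeys(range(pop_size), 0)
    for _ in range(trials):
        for unit in rng.sample(range(pop_size), k):
            counts[unit] += 1
    return {unit: c / trials for unit, c in counts.items()}

freqs = inclusion_frequencies()
print(min(freqs.values()), max(freqs.values()))  # both close to 5/20 = 0.25
```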

For more detail follow this link: Probability Sampling.

Filed under: First Principles, Sampling, Statistical Concepts Tagged: Sampling

100% of all disasters are failures of design, not analysis.

— Ron Marks, Toronto, August 16, 1994

To propose that poor design can be corrected by subtle analysis

techniques is contrary to good scientific thinking.

— Stuart Pocock (Controlled Clinical Trials, p 58) regarding the use of retrospective adjustment for trials with historical controls.

Issues of design always trump issues of analysis.

— GE Dallal, 1999, explaining to a client why it would be wasted effort to focus on the analysis of data from a study whose design was fatally flawed.

Bias dominates variability.

— John C. Bailler, III, Indianapolis, August 14, 2000

**Statistics** is not just a collection of computational techniques. It is a way of thinking about the world. Anyone can take a set of numbers and apply formulas to them. There are many computer programs that will do the calculations for you. But there is no point to analyzing data from a study that was not properly designed to answer the research question under investigation. In fact, there’s a real point in refusing to analyze such data lest faulty results be responsible for implementing a program or policy contrary to what’s really needed. Continue to read this valuable article at this link. From “Some Aspects of Study Design” by Gerard E. Dallal, Ph.D. via Study Design.

However, statistics still often get a bad press:

## Lies, Damned Lies, and Medical Science

Much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong. So why are doctors—to a striking extent—still drawing upon misinformation in their everyday practice? Dr. John Ioannidis has spent his career challenging his peers by exposing their bad science.

Read more of this here: link

Additional material on research design.

- What Research Can You Believe? (psychcentral.com)
- Statistics and Statisticians in Clinical Trials – Beginning with the End in Mind (ask-cato.com)
- Is There Really a Systematic Problem in Medical Publishing? Or Just a Reporter With a Narrative? (scholarlykitchen.sspnet.org)

Filed under: Design, First Principles Tagged: Clinical study design, Clinical trial

For example, in the early part of the twentieth century, it was noticed that, when viewed over time, the number of crimes increased with membership in the Church of England. This had nothing to do with criminals finding religion. Rather, both crimes and Church membership increased as the population increased.
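This kind of spurious association is easy to reproduce. The sketch below (hypothetical numbers, standard-library Python only) generates two yearly series that are both driven by a growing population and shows that they are strongly correlated even though neither causes the other:

```python
import random

random.seed(1)  # reproducible demo

# Hypothetical yearly data: a growing population drives BOTH series.
population = [100_000 + 2_000 * year for year in range(50)]
crimes  = [0.010 * p + random.gauss(0, 30)  for p in population]  # ~1% of pop
members = [0.050 * p + random.gauss(0, 150) for p in population]  # ~5% of pop

def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Strongly positive correlation, yet no causal link between the two series
print(round(pearson(crimes, members), 3))
```

The shared driver (population growth) is what statisticians call a confounding variable.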

via Causality: Cause & Effect.

There are more examples of this important principle at the address above.

Filed under: First Principles

One thing most people (even statisticians!) would like to do is describe how likely a theory or hypothesis might be in light of a particular set of data. This is not possible in the commonly used classical/frequentist approach to statistics, which is the approach taken in these notes. Instead, statistics talks about the probability of observing particular sets of data, assuming a theory holds.

We are NOT allowed to say, “Because of these data, there is only a small probability that this theory is true.” Instead, we say things like, “If this theory is true, the probability of seeing data like these is small.”

The first statement is relatively clear. If we could say that based on a particular set of data a theory has a 10% chance of being true, then the theory has a 10% chance of being true. The second statement is murky. If the result is that there is only a 10% chance of seeing data like these if a theory is true, is that small enough to make us doubt the theory? How likely are the data under some other theory? Perhaps there’s no theory under which data like these are more likely! This means we need methods for translating this latter type of statement into a declaration that a theory is true or false. As a result…
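The coin-tossing sketch below illustrates the second kind of statement. Assuming the theory "the coin is fair" holds, it computes the probability of seeing data at least as extreme as 60 heads in 100 tosses (the numbers are illustrative):

```python
from math import comb

def prob_at_least(heads, tosses, p=0.5):
    """P(X >= heads) for X ~ Binomial(tosses, p): the probability of seeing
    data at least this extreme IF the fair-coin theory is true."""
    return sum(comb(tosses, k) * p**k * (1 - p)**(tosses - k)
               for k in range(heads, tosses + 1))

# 60 or more heads in 100 tosses of a (hypothesized) fair coin
print(round(prob_at_least(60, 100), 3))  # ≈ 0.028
```

Note what this number is and is not: it says the data are unlikely *if* the coin is fair; it does not say there is a 2.8% chance the coin is fair.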

Statistical methods are convoluted! In order to show an effect exists, […]

via Is statistics hard?.

There are many more important points made on this site that need to be read by all those new (and not so new) to statistics.

Filed under: First Principles Tagged: Statistical hypothesis testing

Filed under: Statistical Concepts

For predictive purposes it is often desirable to understand the shape of the underlying distribution of the population. To determine this underlying distribution, it is common to fit the observed distribution to a theoretical distribution by comparing the frequencies observed in the data to the expected frequencies of the theoretical distribution (i.e., a Chi-square goodness of fit test). In addition to this type of test, some software packages also allow you to compute Maximum Likelihood tests and Method of Matching Moments (see Fitting Distributions by Moments in the Process Analysis topic) tests.
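As an illustration, here is a hand-rolled Chi-square goodness-of-fit calculation in plain Python (the die-roll counts are made up; a real analysis would use a statistics package to obtain the p-value):

```python
# Hypothetical data: 120 rolls of a die; are the counts consistent with a
# fair (uniform) die?
observed = [18, 22, 21, 19, 24, 16]
expected = [120 / 6] * 6          # 20 per face under the uniform model

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))           # 2.1

# The critical value for df = 6 - 1 = 5 at alpha = 0.05 is about 11.07,
# so 2.1 gives no reason to reject the uniform (fair-die) model.
```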

**Which Distribution to use**. As described above, certain types of variables follow specific distributions. Variables whose values are determined by an infinite number of independent random events will be distributed following the normal distribution, whereas variables whose values are the result of an extremely rare event would follow the Poisson distribution. The major distributions that have been proposed for modeling survival or failure times are the exponential (and linear exponential) distribution, the Weibull distribution of extreme events, and the Gompertz distribution. The section on types of distributions contains a number of distributions generally giving a brief example of what type of data would most commonly follow a specific distribution as well as the probability density function (pdf) for each distribution.

via Distribution Fitting.

- What distribution does my data have? (johndcook.com)

Filed under: Statistical Concepts

Not all, but most statistical tests are either based on the normal distribution directly or on distributions that are related to, and can be derived from, the normal, such as the t, F, or Chi-square distributions.

Typically, these tests require that the variables analyzed are themselves normally distributed in the population, that is, they meet the so-called “normality assumption.” Many observed variables actually are normally distributed, which is another reason why the normal distribution represents a “general feature” of empirical reality.

The problem may occur when we try to use a normal distribution-based test to analyze data from variables that are themselves not normally distributed. In such cases, we have two general choices. First, we can use some alternative "nonparametric" test; but this is often inconvenient because such tests are typically less powerful and less flexible in terms of the types of conclusions they can provide.

Alternatively, in many cases we can still use the normal distribution-based test if we only make sure that the size of our samples is large enough. The latter option is based on an extremely important principle that is largely responsible for the popularity of tests that are based on the normal function. Namely, as the sample size increases, the shape of the sampling distribution (i.e., distribution of a statistic from the sample; this term was first used by Fisher, 1928a) approaches normal shape, even if the distribution of the variable in question is not normal.
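This principle (the central limit theorem) is easy to see in a small simulation. The sketch below draws repeated samples from a distinctly non-normal variable, Uniform(0, 1), and shows that the sample means cluster around the population mean with the spread predicted by theory:

```python
import random
from statistics import mean, stdev

random.seed(7)  # reproducible demo

# The population variable is decidedly non-normal: Uniform(0, 1).
# Draw 2,000 samples of n = 50 and examine the sampling distribution
# of the sample mean.
sample_means = [mean(random.random() for _ in range(50)) for _ in range(2_000)]

print(round(mean(sample_means), 3))   # near the population mean 0.5
print(round(stdev(sample_means), 3))  # near sigma/sqrt(n) = sqrt(1/12)/sqrt(50) ≈ 0.041
```

Plotting `sample_means` as a histogram would show the familiar bell shape, even though the individual observations are flat-distributed.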

via Elementary Concepts in Statistics.

- The central limit theorem (annezelenka.com)

Filed under: Statistical Concepts Tagged: Chi-square test, Non-parametric statistics, Sample size

The concept of the probability distribution, and of the random variables it describes, underlies the mathematical discipline of probability theory and the science of statistics. There is spread or variability in almost any value that can be measured in a population (e.g. height of people, durability of a metal, sales growth, traffic flow, etc.). For these and many other reasons, simple numbers are often inadequate for describing a quantity, while probability distributions are often more appropriate.

There are many probability distributions that show up in different applications. Two of the most important are the normal distribution and the categorical distribution. The normal distribution, also known as the Gaussian distribution, has a familiar "bell curve" shape and approximates many different naturally occurring distributions over real numbers. The categorical distribution describes the result of an experiment with a fixed, finite number of outcomes. For example, the toss of a fair coin follows a categorical distribution, where the possible outcomes are heads and tails, each with probability 1/2.
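A short sketch of sampling from both kinds of distribution using Python's standard library (the height parameters are invented purely for illustration):

```python
import random

rng = random.Random(0)  # seeded for a reproducible demo

# Normal (Gaussian): say heights in cm with mean 170 and sd 8 (made-up values).
heights = [rng.gauss(170, 8) for _ in range(10_000)]
# Roughly 68% of a normal variable falls within one sd of the mean.
within_1sd = sum(162 <= h <= 178 for h in heights) / len(heights)

# Categorical: a fair coin, two outcomes with probability 1/2 each.
tosses = rng.choices(["heads", "tails"], weights=[1, 1], k=10_000)
share_heads = tosses.count("heads") / len(tosses)

print(round(within_1sd, 2), round(share_heads, 2))  # ≈ 0.68 and ≈ 0.50
```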

via Probability distribution – Wikipedia, the free encyclopedia.

For most people, knowing about distributions is only important insofar as it helps you to decide whether to use parametric or non-parametric tests on your data.

- What distribution does my data have? (johndcook.com)
- The normal distribution (annezelenka.com)

Filed under: Statistical Concepts Tagged: Distributions, Normal distribution