Archive for the ‘First Principles’ Category

Sample Size and Power Analysis

November 5, 2010

Each of these four components of your study (sample size, statistical power, effect size, and significance level) is a function of the other three, meaning that altering one causes changes in the others.

Sample size is critical to ensuring the validity of your study and should be determined in the very early stages of study design. The effect size of your study is also critical; this unique measurement will tell you the strength or importance of a particular relationship.

Power is the probability of not committing a Type II error, that is, the probability of finding a relationship that really exists. (A Type II error is failing to find a relationship that exists in your analysis.) The a priori power is unique to every study.

The alpha or significance level of your study is the probability of committing a Type I error. More simply, it is the probability of finding a relationship that does not exist. Generally, committing a Type I error is considered more severe than committing a Type II error.

The significance level measurement is unique to your study. The significance level for a study involving airbag deployment failures would not be the same as the significance level for a study involving the satisfaction of five-year-old children with a particular brand of red crayon.

via Sample Size and Power Analysis | Statistics Solutions.

Getting the Sample Size Right: A Brief Introduction to Power Analysis Link
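
The link between the four components can be made concrete with a small calculation. The sketch below is only an illustration and is not taken from the linked articles: it assumes a two-sided, two-sample comparison of means, uses the normal approximation rather than the t distribution, and expresses the effect size as a standardized difference (Cohen's d). The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group (normal approximation)."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = _Z.inv_cdf(power)           # quantile matching the target power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def achieved_power(effect_size, n, alpha=0.05):
    """Approximate power of the same test with n subjects per group."""
    z_alpha = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(effect_size * sqrt(n / 2) - z_alpha)


print(n_per_group(0.5))               # 63
print(n_per_group(0.5, alpha=0.01))   # stricter alpha needs more subjects
print(n_per_group(0.2))               # smaller effect needs more subjects
print(round(achieved_power(0.5, 63), 3))
```

Fixing any three of the quantities determines the fourth: here sample size is solved from effect size, alpha and power, and power is solved from effect size, alpha and sample size. An exact t-based calculation would give slightly larger sample sizes.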


Non-probability Sampling

November 1, 2010

The difference between non-probability and probability sampling is that non-probability sampling does not involve random selection and probability sampling does. Does that mean that non-probability samples aren’t representative of the population? Not necessarily. But it does mean that non-probability samples cannot depend upon the rationale of probability theory. At least with a probabilistic sample, we know the odds or probability that we have represented the population well. We are able to estimate confidence intervals for the statistic. With non-probability samples, we may or may not represent the population well, and it will often be hard for us to know how well we’ve done so. In general, researchers prefer probabilistic or random sampling methods over non-probabilistic ones, and consider them to be more accurate and rigorous. However, in applied social research there may be circumstances where it is not feasible, practical or theoretically sensible to do random sampling. Here, we consider a wide range of non-probabilistic alternatives.

For more detail follow this link: Non-probability Sampling.

Categories: First Principles, Sampling

Probability Sampling

November 1, 2010

[Image: The chosen "random" sample. Image by Marco De Cesaris via Flickr]

A probability sampling method is any method of sampling that utilizes some form of random selection. In order to have a random selection method, you must set up some process or procedure that assures that the different units in your population have equal probabilities of being chosen. Humans have long practiced various forms of random selection, such as picking a name out of a hat, or choosing the short straw. These days, we tend to use computers as the mechanism for generating random numbers as the basis for random selection.

For more detail follow this link: Probability Sampling.
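
As a rough illustration of using a computer as the mechanism for random selection, here is a minimal sketch of a simple random sample drawn without replacement. The sampling frame (`names`), the function name and the seed are all made up for the example; a fixed seed is used only so the draw can be reproduced.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # private generator so the seed affects only this draw
    return rng.sample(population, k)


# Hypothetical sampling frame of 100 units.
names = [f"unit_{i:03d}" for i in range(1, 101)]

chosen = simple_random_sample(names, 10, seed=2010)
print(chosen)
```

This is the electronic version of picking names out of a hat: no unit can be drawn twice, and no unit is favoured over another.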

Study Design

October 30, 2010


100% of all disasters are failures of design, not analysis.

— Ron Marks, Toronto, August 16, 1994

To propose that poor design can be corrected by subtle analysis techniques is contrary to good scientific thinking.

— Stuart Pocock (Controlled Clinical Trials, p 58) regarding the use of retrospective adjustment for trials with historical controls.

Issues of design always trump issues of analysis.

— GE Dallal, 1999, explaining to a client why it would be wasted effort to focus on the analysis of data from a study whose design was fatally flawed.

Bias dominates variability.

— John C. Bailar, III, Indianapolis, August 14, 2000

Statistics is not just a collection of computational techniques. It is a way of thinking about the world. Anyone can take a set of numbers and apply formulas to them. There are many computer programs that will do the calculations for you. But there is no point to analyzing data from a study that was not properly designed to answer the research question under investigation. In fact, there’s a real point in refusing to analyze such data lest faulty results be responsible for implementing a program or policy contrary to what’s really needed. Continue to read this valuable article at this link. From “Some Aspects of Study Design” by Gerard E. Dallal, Ph.D. via Study Design.

However, statistics still often gets a bad press:

Lies, Damned Lies, and Medical Science

Much of what medical researchers conclude in their studies is misleading, exaggerated, or flat-out wrong. So why are doctors—to a striking extent—still drawing upon misinformation in their everyday practice? Dr. John Ioannidis has spent his career challenging his peers by exposing their bad science.

Read more of this here: link

Additional material on research design.

Cause & Effect

October 30, 2010

“Cause and Effect”! You almost never hear these words in an introductory statistics course. The subject is commonly ignored. Even on this site, all it gets is this one web page. If cause and effect is addressed at all, it is usually by giving the (proper) warning “Association does not imply causation!” along with a few illustrations.

For example, in the early part of the twentieth century, it was noticed that, when viewed over time, the number of crimes increased with membership in the Church of England. This had nothing to do with criminals finding religion. Rather, both crimes and Church membership increased as the population increased.

via Causality: Cause & Effect.

There are more examples of this important principle at the address above.
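
The crime and Church membership example can be imitated with a small simulation. Everything in the sketch below is invented for illustration (the population figures, the rates and the noise level are assumptions, not historical data): two series are generated that each depend only on population size, and their correlation is compared before and after dividing by population.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(1)

# Hypothetical population growing steadily over 100 years.
population = [1_000_000 + 20_000 * year for year in range(100)]

# Each series is driven by population alone, with its own independent noise.
crimes = [0.01 * p * rng.gauss(1, 0.05) for p in population]
members = [0.30 * p * rng.gauss(1, 0.05) for p in population]

raw_r = pearson(crimes, members)

# Per-capita rates remove the common driver (population size).
crime_rate = [c / p for c, p in zip(crimes, population)]
member_rate = [m / p for m, p in zip(members, population)]
rate_r = pearson(crime_rate, member_rate)

print(f"correlation of raw counts:       {raw_r:.2f}")
print(f"correlation of per-capita rates: {rate_r:.2f}")
```

The raw counts are strongly correlated even though neither influences the other in the simulation; once the shared dependence on population is removed, the association largely disappears.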

Categories: First Principles

Is statistics hard?

October 30, 2010

Statistics is backwards! You might think that given a particular set of data, you are able to say how likely it is that a particular theory is true. Unfortunately, you would be wrong!

One thing most people (even statisticians!) would like to do is describe how likely a theory or hypothesis might be in light of a particular set of data. This is not possible in the commonly used classical/frequentist approach to statistics, which is the approach taken in these notes. Instead, statistics talks about the probability of observing particular sets of data, assuming a theory holds.

We are NOT allowed to say, “Because of these data, there is only a small probability that this theory is true.” Instead, we say things like, “If this theory is true, the probability of seeing data like these is small.”

The first statement is relatively clear. If we could say that based on a particular set of data a theory has a 10% chance of being true, then the theory has a 10% chance of being true. The second statement is murky. If the result is that there is only a 10% chance of seeing data like these if a theory is true, is that small enough to make us doubt the theory? How likely are the data under some other theory? Perhaps there’s no theory under which data like these are more likely! This means we need methods for translating this latter type of statement into a declaration that a theory is true or false. As a result…

Statistical methods are convoluted! In order to show an effect exists, […]

via Is statistics hard?.

There are many more important points made on this site that need to be read by all those new (and not so new) to statistics.
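
A statement of the form "If this theory is true, the probability of seeing data like these is small" can be illustrated with a made-up example. In the sketch below, the theory is that a coin is fair and the data are 60 heads in 100 tosses; both the scenario and the function name are assumptions chosen for illustration, and "data like these" is taken to mean 60 or more heads.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """P(X >= k) when X follows a Binomial(n, p) distribution."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Probability of 60 or more heads in 100 tosses, assuming the coin is fair.
p_data_given_theory = prob_at_least(60, 100)
print(f"P(at least 60 heads | fair coin) = {p_data_given_theory:.4f}")
```

The result is the probability of the data given the theory. It is not the probability that the coin is fair given the data, which is the quantity most people would actually like to know.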