## Sample Size and Power Analysis

Each of these four components of your study (sample size, statistical power, effect size, and significance level) is a function of the other three, meaning that altering one causes changes in the others.

**Sample size** is critical to ensuring the validity of your study and should be determined in the very early stages of study design. The **effect size** of your study is also critical; this measurement tells you the strength or importance of a particular relationship.

**Power** is the probability of avoiding a Type II error; in other words, it is the probability that your analysis detects a relationship that actually exists. If β is the probability of committing a Type II error, then power = 1 − β. The a priori power is unique to every study.

The alpha or significance level of your study is the probability of committing a Type I error. More simply, it is the probability of finding a relationship that does not exist. Generally, committing a Type I error is considered more severe than committing a Type II error.

The significance level measurement is unique to your study. The significance level for a study involving airbag deployment failures would not be the same as the significance level for a study involving the satisfaction of five-year-old children with a particular brand of red crayon.
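The interdependence of these four quantities can be sketched numerically. The following is a minimal illustration, not a full power-analysis tool: it solves for the per-group sample size of a two-sided, two-sample z-test under the normal approximation, and the effect size d = 0.5, alpha = 0.05, and power = 0.80 are conventional example values, not recommendations.

```python
import math
from statistics import NormalDist

def sample_size_two_group(effect_size, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for Type I error
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)

# Medium effect (d = 0.5), alpha = 0.05, power = 0.80:
print(sample_size_two_group(0.5))  # 63 per group
# A larger assumed effect needs fewer subjects:
print(sample_size_two_group(0.8))  # 25 per group
```

Note how the trade-off works in both directions: shrinking alpha or raising the desired power inflates the required n, while a larger effect size shrinks it.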

via Sample Size and Power Analysis | Statistics Solutions.

**Getting the Sample Size Right: A Brief Introduction to Power Analysis** (link)

## Probability Sampling

**A probability sampling** method is any method of sampling that utilizes some form of random selection. In order to have a random selection method, you must set up some process or procedure that assures that the different units in your population have equal probabilities of being chosen. Humans have long practiced various forms of random selection, such as picking a name out of a hat, or choosing the short straw. These days, we tend to use computers as the mechanism for generating random numbers as the basis for random selection.
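In code, the hat-drawing procedure reduces to sampling without replacement from a list of units. A minimal sketch using Python's standard library (the population of 100 named units is made up for the example):

```python
import random

population = [f"unit_{i}" for i in range(1, 101)]  # 100 hypothetical units

random.seed(42)  # fixed seed so the draw is reproducible
sample = random.sample(population, k=10)  # simple random sample, no replacement

print(sample)
```

Because `random.sample` draws without replacement and treats every element alike, each unit has the same 10/100 probability of selection, which is exactly the equal-probability requirement described above.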

For more detail follow this link: Probability Sampling.

## Main parameters of a distribution

The main parameters of a distribution are generally spoken of as (i) central tendency, (ii) variability, (iii) skew, (iv) kurtosis, and (v) modality. Central tendency and variability will need to be examined separately in considerable detail, since these two parameters need to be clearly understood and precisely measured…**for more on this follow the link below…**
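Each of these parameters can be computed directly from a sample. A brief sketch with Python's standard library; since the `statistics` module does not provide skew or kurtosis, they are computed here from their population-moment definitions, and the data are made up for the example:

```python
import statistics

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]  # made-up sample with one high outlier

mean = statistics.mean(data)      # central tendency
median = statistics.median(data)  # robust central tendency
var = statistics.pvariance(data)  # variability (population variance)
sd = var ** 0.5

# Moment-based population skewness and excess kurtosis
n = len(data)
skew = sum((x - mean) ** 3 for x in data) / (n * sd ** 3)
kurt = sum((x - mean) ** 4 for x in data) / (n * sd ** 4) - 3

print(mean, median, round(skew, 2), round(kurt, 2))
```

The outlier at 14 pulls the mean (5.0) above the median (4.0) and produces a clearly positive skew, a small numerical illustration of why the two measures of central tendency need to be examined separately.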

## What shape is my data?

**Fitting my observed data to a distribution**

For predictive purposes it is often desirable to understand the shape of the underlying distribution of the population. To determine this underlying distribution, it is common to fit the observed distribution to a theoretical distribution by comparing the frequencies observed in the data to the expected frequencies of the theoretical distribution (i.e., a Chi-square goodness of fit test). In addition to this type of test, some software packages also allow you to compute Maximum Likelihood tests and Method of Matching Moments tests (see Fitting Distributions by Moments in the Process Analysis topic).
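The chi-square statistic itself is simple to compute by hand: for each bin, square the difference between observed and expected counts, divide by the expected count, and sum. A small sketch with made-up die-roll counts standing in for "observed vs. theoretical frequencies":

```python
# Observed counts for 60 rolls of a die; under a fair-die (uniform) model
# each face is expected 10 times.
observed = [8, 9, 19, 5, 8, 11]
expected = [10] * 6

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_square)  # 11.6
```

The resulting statistic (11.6) is compared against a chi-square critical value with 5 degrees of freedom (about 11.07 at the 5% level), so in this hypothetical case the uniform model would be rejected.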

**Which Distribution to use**. As described above, certain types of variables follow specific distributions. Variables whose values arise as the combined result of a large number of small, independent random events tend to follow the normal distribution, whereas variables that count occurrences of an extremely rare event follow the Poisson distribution. The major distributions that have been proposed for modeling survival or failure times are the exponential (and linear exponential) distribution, the Weibull distribution of extreme events, and the Gompertz distribution. The section on types of distributions describes a number of distributions, generally giving a brief example of what type of data would most commonly follow each distribution, as well as its probability density function (pdf).
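Python's standard `random` module can draw from several of these distributions directly, which is a quick way to see what data from each model look like. A short sketch; the rate and shape parameters are arbitrary choices for illustration:

```python
import random
from statistics import mean

random.seed(0)

# Exponential failure times with rate lambda = 0.5 (theoretical mean = 2.0)
exp_times = [random.expovariate(0.5) for _ in range(10_000)]

# Weibull failure times with scale 1.0 and shape 1.5
weibull_times = [random.weibullvariate(1.0, 1.5) for _ in range(10_000)]

print(round(mean(exp_times), 2))  # close to the theoretical mean of 2.0
```

Simulated draws like these are also useful as a sanity check on a fitting procedure: fit the theoretical distribution to data you generated yourself and confirm the known parameters are recovered.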

via Distribution Fitting.

###### Related Articles

- What distribution does my data have? (johndcook.com)

## Statistical Tests & Distributions

Are all test statistics normally distributed?

Not all, but most of them are either based on the normal distribution directly or on distributions that are related to, and can be derived from, the normal, such as the t, F, or Chi-square distributions.

Typically, these tests require that the variables analyzed are themselves normally distributed in the population, that is, they meet the so-called “normality assumption.” Many observed variables actually are normally distributed, which is another reason why the normal distribution represents a “general feature” of empirical reality.

The problem may occur when we try to use a normal distribution-based test to analyze data from variables that are themselves not normally distributed. In such cases, we have two general choices. First, we can use some alternative "nonparametric" test; but this is often inconvenient because such tests are typically less powerful and less flexible in the types of conclusions they can provide.

Alternatively, in many cases we can still use the normal distribution-based test if we only make sure that the size of our samples is large enough. The latter option is based on an extremely important principle that is largely responsible for the popularity of tests that are based on the normal function. Namely, as the sample size increases, the shape of the sampling distribution (i.e., distribution of a statistic from the sample; this term was first used by Fisher, 1928a) approaches normal shape, even if the distribution of the variable in question is not normal.
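This principle (the central limit theorem) is easy to see by simulation. Below, a clearly non-normal variable (exponential, strongly right-skewed) is sampled repeatedly; the means of those samples cluster around the population mean with a spread close to σ/√n. The sample sizes are arbitrary choices for the demonstration:

```python
import random
from statistics import mean, stdev

random.seed(1)

# An Exp(rate=0.5) variable: mean 2, standard deviation 2, right-skewed
n = 50  # size of each individual sample
sample_means = [
    mean(random.expovariate(0.5) for _ in range(n))
    for _ in range(2_000)
]

print(round(mean(sample_means), 2))   # close to the population mean, 2.0
print(round(stdev(sample_means), 2))  # close to 2 / sqrt(50), about 0.28
```

Increasing `n` makes the histogram of `sample_means` look ever more like a normal curve, even though the underlying variable is nothing like normal, which is exactly why large samples rescue normal distribution-based tests.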

via Elementary Concepts in Statistics.

###### Related Articles

- The central limit theorem (annezelenka.com)

## Distributions

The concepts of the probability distribution and the random variable it describes underlie the mathematical discipline of probability theory and the science of statistics. There is spread or variability in almost any value that can be measured in a population (e.g., height of people, durability of a metal, sales growth, traffic flow, etc.). For these and many other reasons, simple numbers are often inadequate for describing a quantity, while probability distributions are often more appropriate.

There are various probability distributions that show up in various different applications. Two of the most important ones are the normal distribution and the categorical distribution. The normal distribution, also known as the Gaussian distribution, has a familiar "bell curve" shape and approximates many different naturally occurring distributions over real numbers. The categorical distribution describes the result of an experiment with a fixed, finite number of outcomes. For example, the toss of a fair coin follows a categorical distribution, where the possible outcomes are heads and tails, each with probability 1/2.
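Both distributions are available in, or easy to build from, Python's standard library: `statistics.NormalDist` handles the Gaussian, while `random.choices` with weights plays the role of a categorical draw. A minimal sketch (the height parameters are hypothetical):

```python
import random
from statistics import NormalDist

# Normal (Gaussian): half the probability mass lies below the mean
heights = NormalDist(mu=170, sigma=10)  # hypothetical heights in cm
print(heights.cdf(170))  # 0.5
print(round(heights.cdf(180) - heights.cdf(160), 3))  # P(160 < X < 180), about 0.683

# Categorical: a fair coin is the two-outcome special case
random.seed(7)
toss = random.choices(["heads", "tails"], weights=[0.5, 0.5], k=1)[0]
print(toss)
```

The 0.683 figure is the familiar "one standard deviation on either side of the mean" rule for the normal curve.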

via Probability distribution – Wikipedia, the free encyclopedia.

For most people, knowing about distributions is only important insofar as it helps you to decide whether to use parametric or non-parametric tests on your data.

###### Related Articles

- What distribution does my data have? (johndcook.com)
- The normal distribution (annezelenka.com)

## More on P-Values

To any confused readers out there: The p-value is the probability of seeing something as extreme as the data or more so, if the null hypothesis were true. In social science (and I think in psychology as well), the null hypothesis is almost certainly false, false, false, and you don’t need a p-value to tell you this. The p-value tells you the extent to which a certain aspect of your data is consistent with the null hypothesis. A lack of rejection doesn’t tell you that the null hypothesis is likely true; rather, it tells you that you don’t have enough data to reject the null hypothesis. For more on this, see for example this paper with David Weakliem which was written for a nontechnical audience (link to pdf).
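Concretely, for a test statistic with a normal sampling distribution, the two-sided p-value is the probability of landing at least as far from zero as the observed statistic. A minimal sketch using the standard library; the observed z values are made up for illustration:

```python
from statistics import NormalDist

def two_sided_p(z):
    """P(|Z| >= |z|) under the null hypothesis, with Z ~ N(0, 1)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(two_sided_p(1.96), 3))  # 0.05: borderline at the 5% level
print(round(two_sided_p(0.5), 3))   # a large p-value: the data are consistent
                                    # with the null, which is not evidence
                                    # that the null is true
```

The second call illustrates the point in the excerpt: a p-value of 0.62 does not make the null hypothesis likely, it only says this dataset cannot rule it out.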

via Statistical Modeling, Causal Inference, and Social Science.