Probability refers to the likelihood of an event occurring. For example, when an event like throwing a ball or picking a card from a deck occurs, there is a certain probability associated with that event, which quantifies the chance of it happening. This "Last Minute Notes" article provides a quick and concise revision of the key concepts in Probability and Statistics.
Permutation:
Arrangement of items where order matters.
Formula: $P(n, r) = \frac{n!}{(n-r)!}$
Example:
1) Arranging 2 out of 3 letters (A, B, C): P(3, 2) = 6 (AB, BA, AC, CA, BC, CB).
2) The number of ways to arrange 3 books out of 5: $P(5, 3) = \frac{5!}{(5-3)!} = 60$
Combination:
Selection of items where order does not matter.
Formula: $C(n, r) = \frac{n!}{r!\,(n-r)!}$
Example:
1) The number of ways to select 2 items from 5: $C(5, 2) = \frac{5!}{2!\,3!} = 10$
2) Selecting 2 out of 3 letters (A, B, C): C(3,2) = 3 (AB, AC, BC).
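The letter and book examples above can be checked by enumeration. The following is a minimal sketch using Python's standard library; the variable names are illustrative and not part of the original notes.

```python
import math
from itertools import combinations, permutations

letters = ["A", "B", "C"]

# Ordered arrangements of 2 letters out of 3 (order matters).
arrangements = ["".join(p) for p in permutations(letters, 2)]

# Unordered selections of 2 letters out of 3 (order does not matter).
selections = ["".join(c) for c in combinations(letters, 2)]

print(arrangements)     # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']
print(selections)       # ['AB', 'AC', 'BC']
print(math.perm(5, 3))  # 60 ways to arrange 3 books out of 5
print(math.comb(5, 2))  # 10 ways to select 2 items from 5
```

`math.perm` and `math.comb` require Python 3.8 or later.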
Differences Between Permutations and Combinations:
| Permutation | Combination |
|---|---|
| Order is important. | Order is not important. |
| Formula: P(n, r). | Formula: C(n, r). |
| Example: AB ≠ BA. | Example: AB = BA. |
Sample Space:
The set of all possible outcomes of a random experiment.
For example, tossing two coins has S = {HH, HT, TH, TT}.
Events:
A subset of the sample space. For example, getting two heads is the event A = {HH}.
Compound Event:
A compound event is an event that consists of two or more outcomes.
Mutually Exclusive Events:
Events that cannot happen simultaneously.
Mathematically: $P(A \cap B) = 0$
Key Rules: $P(A \cup B) = P(A) + P(B)$
Examples:
1) Coin toss: Getting Heads (A) and Tails (B) are mutually exclusive events.
2) Rolling a die: Getting Odd (A) and Even (B) number are mutually exclusive events.
Independent Events:
Events where the occurrence of one does not affect the other.
Mathematically: $P(A \cap B) = P(A) \cdot P(B)$
Key Rules: $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$
Examples:
1) Two coin tosses: Heads on the first toss (A) and Tails on the second (B).
2) Rolling two dice: Rolling a 6 (A) on one die and a 4 (B) on the other.
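Both properties can be verified by enumerating the sample space for two fair dice. This is a small sketch with exact fractions; the helper `prob` and the event names are hypothetical, introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die) for two fair dice.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate over outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))

# Independent events: 6 on the first die (A) and 4 on the second die (B).
p_a = prob(lambda o: o[0] == 6)
p_b = prob(lambda o: o[1] == 4)
p_a_and_b = prob(lambda o: o[0] == 6 and o[1] == 4)
print(p_a_and_b == p_a * p_b)  # True: P(A and B) = P(A) * P(B) = 1/36

# Mutually exclusive events: first die odd (C) and first die even (D).
p_c = prob(lambda o: o[0] % 2 == 1)
p_d = prob(lambda o: o[0] % 2 == 0)
p_c_and_d = prob(lambda o: o[0] % 2 == 1 and o[0] % 2 == 0)
p_c_or_d = prob(lambda o: o[0] % 2 == 1 or o[0] % 2 == 0)
print(p_c_and_d)              # 0
print(p_c_or_d == p_c + p_d)  # True
```

Note that mutually exclusive events with non-zero probability are never independent, since their joint probability is 0 rather than the product of the individual probabilities.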
Important Rules:
Joint Probability:
Joint probability represents the likelihood of two or more events occurring simultaneously. It is denoted as $P(A \cap B)$, where A and B are events. In general, $P(A \cap B) = P(A \mid B) \cdot P(B)$.
If A and B are independent, the formula simplifies to: $P(A \cap B) = P(A) \cdot P(B)$
Marginal Probability:
Marginal probability is the probability of a single event regardless of the outcomes of other events. It is obtained by summing or integrating joint probabilities over all possible values of the other event.
Conditional Probability:
Conditional probability calculates the probability of event A given that event B has already occurred. It is denoted as P(A∣B) and is computed using: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.
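To see how joint, marginal, and conditional probabilities relate, here is a short sketch built on a made-up joint table. The weather and arrival categories and all the numbers are hypothetical and serve only to illustrate the definitions.

```python
from fractions import Fraction

# Hypothetical joint distribution P(weather, arrival); values are illustrative only.
joint = {
    ("rain", "late"): Fraction(3, 20),
    ("rain", "on_time"): Fraction(5, 20),
    ("dry", "late"): Fraction(2, 20),
    ("dry", "on_time"): Fraction(10, 20),
}

def marginal_weather(w):
    """P(weather = w), summing the joint table over all arrival values."""
    return sum(p for (weather, _), p in joint.items() if weather == w)

def marginal_arrival(a):
    """P(arrival = a), summing the joint table over all weather values."""
    return sum(p for (_, arrival), p in joint.items() if arrival == a)

def weather_given_arrival(w, a):
    """P(weather = w | arrival = a) = joint / marginal of the condition."""
    return joint[(w, a)] / marginal_arrival(a)

print(marginal_weather("rain"))               # 2/5
print(marginal_arrival("late"))               # 1/4
print(weather_given_arrival("rain", "late"))  # 3/5
```

In this table P(rain) · P(late) = 1/10, which differs from the joint value 3/20, so the two events are not independent.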
Bayes’ Theorem provides a way to calculate the conditional probability of an event A, given that another event B has already occurred. It uses prior knowledge about related events to update the probability of A.
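Formula: $P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$
Here: $P(A \mid B)$ is the posterior probability of A given B, $P(B \mid A)$ is the likelihood of B given A, $P(A)$ is the prior probability of A, and $P(B)$ is the total probability of B, with $P(B) > 0$.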
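A small numeric sketch of Bayes' Theorem follows. The scenario (a diagnostic test) and every number in it are assumptions chosen for illustration; they do not come from the original notes.

```python
from fractions import Fraction

# Hypothetical inputs (illustrative only).
prior = Fraction(1, 100)                 # P(A): condition is present
likelihood = Fraction(95, 100)           # P(B | A): test positive when present
false_positive_rate = Fraction(5, 100)   # P(B | not A): test positive when absent

# Total probability of a positive test, P(B).
evidence = likelihood * prior + false_positive_rate * (1 - prior)

# Bayes' Theorem: P(A | B) = P(B | A) * P(A) / P(B).
posterior = likelihood * prior / evidence

print(evidence)                   # 59/1000
print(posterior)                  # 19/118
print(round(float(posterior), 3)) # 0.161
```

Even with an accurate test, the posterior stays low here because the prior is small, which is the kind of update the theorem is meant to capture.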
Descriptive statistics involves summarizing and organizing data to make it easier to understand. It includes measures like mean, median, mode, standard deviation, and variance.
Mean (Average):
The mean is the central value of a dataset, calculated by summing all data points and dividing by the number of points.
Example: For {4,8,6,5,3,7}: $\text{Mean} = \frac{4 + 8 + 6 + 5 + 3 + 7}{6} = \frac{33}{6} = 5.5$
Median:
The median is the middle value of a sorted dataset. If the dataset has an odd number of elements, it’s the middle value; if even, it’s the average of the two middle values.
Example: For {3,4,5,6,7,8} the median is: $\text{Median} = \frac{5 + 6}{2} = 5.5$
Mode:
The mode is the value(s) that appear most frequently in the dataset.
Example: For {3,7,7,19,24}, the mode is: 7
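The three measures of central tendency above can be reproduced with Python's built-in `statistics` module. This is a minimal sketch that reuses the example datasets from the text.

```python
import statistics

print(statistics.mean([4, 8, 6, 5, 3, 7]))     # 5.5
print(statistics.median([3, 4, 5, 6, 7, 8]))   # 5.5
print(statistics.mode([3, 7, 7, 19, 24]))      # 7

# multimode returns every most-frequent value when there is a tie.
print(statistics.multimode([1, 1, 2, 2, 3]))   # [1, 2]
```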
Variance:
Variance is a measure of how much the values in a dataset deviate from the mean (average). It quantifies the spread or dispersion of the data.
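Formula: For a dataset $x_1, x_2, \dots, x_n$ with mean $\mu$, the (population) variance is calculated as: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$
Example: Consider the dataset: 3, 7, 8, 10, 12. The mean is 8, so $\sigma^2 = \frac{25 + 1 + 0 + 4 + 16}{5} = 9.2$.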
Standard Deviation (SD):
The standard deviation measures the spread of the data from the mean. It is the square root of the variance.
Example: For {4,8,6,5,3,7}, with μ = 5.5: $\sigma = \sqrt{\frac{17.5}{6}} \approx 1.71$
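The sketch below computes variance and standard deviation for the example datasets. It assumes the population form (dividing by n); the sample form (dividing by n − 1) is shown alongside for comparison, since which one applies depends on whether the data is the whole population or a sample.

```python
import statistics

data = [3, 7, 8, 10, 12]

# Population variance: average squared deviation from the mean (divide by n).
print(statistics.pvariance(data))  # 9.2

# Sample variance: divide by n - 1 instead of n.
print(statistics.variance(data))   # 11.5

# Population standard deviation is the square root of the population variance.
sd = statistics.pstdev([4, 8, 6, 5, 3, 7])
print(round(sd, 2))                # 1.71
```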
Covariance:
Measures how two variables vary together.
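Range: −∞ to +∞.
Types: positive covariance (the variables tend to increase together), negative covariance (one tends to increase as the other decreases), and zero covariance (no linear relationship).
Formula: $\text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$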
Correlation:
Standardized measure of the strength and direction of the linear relationship.
Range: −1 to +1.
Formula: $\rho_{X,Y} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$
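Covariance and correlation can be computed directly from their definitions. The sketch below uses the population form (dividing by n) and small made-up datasets; the function names and data are illustrative assumptions.

```python
import math

def covariance(xs, ys):
    """Population covariance: mean of the products of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # Cov(X, X) is the variance of X
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x
y_down = [10, 8, 6, 4, 2]  # falls as x rises

print(covariance(x, y_up))              # 4.0
print(round(correlation(x, y_up), 6))   # 1.0
print(round(correlation(x, y_down), 6)) # -1.0
```

Covariance depends on the units of the data, while correlation rescales it to the fixed range −1 to +1, which is why correlation is easier to compare across datasets.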
Random Variables:
A random variable is a function that maps outcomes of a random experiment to real numbers. It helps quantify uncertainty and calculate probabilities.
Example: If two unbiased coins are tossed, let the random variable X represent the number of heads.
The sample space is S = {HH, HT, TH, TT}, and X can take the values {0, 1, 2}.
Cumulative Distribution Function (CDF):
The Cumulative Distribution Function (CDF), F(x), represents the probability that a random variable X takes a value less than or equal to x. It provides an accumulated probability up to a certain point x.
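The two-coin example can be turned into an explicit probability mass function and CDF. This is a minimal sketch; the names `X`, `pmf`, and `cdf` are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two fair coin tosses: HH, HT, TH, TT.
sample_space = ["".join(t) for t in product("HT", repeat=2)]

def X(outcome):
    """Random variable: number of heads in the outcome."""
    return outcome.count("H")

# PMF: each outcome is equally likely, so count outcomes per value of X.
counts = Counter(X(o) for o in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

def cdf(x):
    """F(x) = P(X <= x): accumulate the PMF up to and including x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

print(pmf[0], pmf[1], pmf[2])   # 1/4 1/2 1/4
print(cdf(0), cdf(1), cdf(2))   # 1/4 3/4 1
```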
Conditional Expectation:
Conditional expectation is the expected value (mean) of a random variable Y, given that another random variable X has a specific value or distribution. It provides the average value of Y, considering the information provided by X.
Conditional Variance:
Conditional variance measures the spread or variability of a random variable Y, given that another random variable X takes a specific value.
Conditional Probability Density Function:
The Conditional PDF describes the probability distribution of a random variable X, given that another random variable Y is known to take a specific value.
Mathematically: $f_{X \mid Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$, provided $f_Y(y) > 0$.
Discrete Probability Distribution:
Applies to discrete random variables, which can only take specific, countable values (e.g., integers). The probabilities of these outcomes are represented by the Probability Mass Function (PMF).
Continuous Probability Distribution:
Applies to continuous random variables, which can take any value within a range or interval. Probabilities are described using the Probability Density Function (PDF).
Uniform Distribution:
The Uniform Distribution, also called the Rectangular Distribution, is a type of Continuous Probability Distribution. It represents a scenario where a continuous random variable X is uniformly distributed over a finite interval [a, b]. This means that every value within [a, b] is equally likely, and the probability density function f(x) is constant over this range.
The probability density function (PDF) of a uniform distribution is defined as: $f(x) = \frac{1}{b-a}$ for $a \le x \le b$, and $f(x) = 0$ otherwise.
This constant density ensures that the total probability over the interval [a, b] is 1.
Mean: μ = (a+b)/2
Variance: σ² = (b−a)²/12
Binomial Distribution:
A probability distribution that models the number of successes in n independent Bernoulli trials.
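Key Parameters: n (the number of trials) and p (the probability of success in each trial).
Probability Mass Function: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, for $k = 0, 1, \dots, n$
Mean = np
Variance = np(1-p)
Bernoulli Trials:
Independent trials that each have exactly two outcomes, success or failure, with the same success probability p in every trial.
Examples: Tossing a coin (Head = success, Tail = failure).
Theorem:
Probability of r successes in n trials is: $P(X = r) = \binom{n}{r} p^r q^{n-r}$, where $q = 1 - p$.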
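As a quick check of the binomial formulas, the sketch below evaluates the PMF for 10 fair coin tosses and recovers the mean and variance from it. The parameter values are illustrative assumptions.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n Bernoulli trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # e.g. 10 tosses of a fair coin
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(binomial_pmf(3, n, p))  # 0.1171875
print(round(mean, 6))         # 5.0, matching n * p
print(round(variance, 6))     # 2.5, matching n * p * (1 - p)
```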
Exponential Distribution:
The Exponential Distribution models the time between events in a process where events occur continuously and independently at a constant average rate.
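For a positive real number λ (the rate parameter), the Probability Density Function (PDF) of an exponentially distributed random variable X is: $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and $f(x) = 0$ for $x < 0$.
Mean: $\frac{1}{\lambda}$
Variance: $\frac{1}{\lambda^2}$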
Poisson Distribution:
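The Poisson distribution is a discrete probability distribution used to model the number of occurrences of an event in a fixed interval of time, space, or volume, where events occur independently of one another and at a constant average rate.
Probability Mass Function (PMF): $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$, for $k = 0, 1, 2, \dots$
where λ is the mean (expected number of events) in the interval.
Mean: $\lambda$
Variance: $\lambda$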
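The Poisson PMF is easy to evaluate numerically. The sketch below assumes an average of λ = 3 events per interval (an illustrative value) and confirms that the mean and variance both come out equal to λ; the infinite sum is truncated at k = 59, where the remaining probability is negligible.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # hypothetical average number of events per interval

# Truncate the support; terms beyond k = 59 are vanishingly small for lam = 3.
probs = [poisson_pmf(k, lam) for k in range(60)]
mean = sum(k * pk for k, pk in enumerate(probs))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))

print(round(poisson_pmf(2, lam), 4))       # 0.224
print(round(mean, 6), round(variance, 6))  # 3.0 3.0
```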
Normal Distribution:
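The Normal Distribution is a continuous probability distribution that models many natural and real-world phenomena. It is characterized by its symmetric, bell-shaped curve, where the mean, median, and mode coincide at the center μ and the spread is set by the standard deviation σ.
Probability Density Function (PDF) is: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Mean: $\mu$
Variance: $\sigma^2$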
Standard Normal Distribution:
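The Standard Normal Distribution, also called the Z-distribution, is a special case of the normal distribution where the mean is μ = 0 and the standard deviation is σ = 1.
It is used to compare and analyze data by standardizing values using the z-score: $z = \frac{x - \mu}{\sigma}$
Probability Density Function (PDF): $\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$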
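Standardizing a value and looking up its cumulative probability can be sketched with `statistics.NormalDist` from the standard library (Python 3.8+). The score, mean, and standard deviation below are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Hypothetical example: a score of 85 when the mean is 70 and the SD is 10.
z = z_score(85, 70, 10)

# Probability that a standard normal variable is at most z.
p_below = NormalDist().cdf(z)

print(z)                  # 1.5
print(round(p_below, 4))  # 0.9332
```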
t-Distribution:
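The t-distribution (Student's t-distribution) is used in statistics to infer population means when the sample size is small (typically n < 30) and the population standard deviation is unknown.
Key Formula:
The t-score: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
Where: $\bar{x}$ is the sample mean, $\mu$ is the population mean, $s$ is the sample standard deviation, and $n$ is the sample size. The statistic has $n - 1$ degrees of freedom.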
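A t-score compares a sample mean with a hypothesized population mean, using the sample standard deviation in place of the unknown population value. The sketch below uses a made-up sample and a hypothesized mean of 12; both are assumptions for illustration.

```python
import math
import statistics

sample = [12, 15, 14, 10, 13, 14]  # hypothetical measurements
mu0 = 12                           # hypothesized population mean

n = len(sample)
x_bar = statistics.mean(sample)    # sample mean
s = statistics.stdev(sample)       # sample standard deviation (divides by n - 1)

# t-score: distance of the sample mean from mu0 in units of standard error.
t = (x_bar - mu0) / (s / math.sqrt(n))

print(x_bar)        # 13
print(round(t, 3))  # 1.369
print(n - 1)        # 5 degrees of freedom
```

The resulting value would then be compared with a critical value from a t-table at n − 1 degrees of freedom.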
Chi-Squared Distribution:
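The Chi-Squared distribution represents the sum of the squares of k independent standard normal random variables, where k is the number of degrees of freedom: $\chi^2 = \sum_{i=1}^{k} Z_i^2$
Probability Density Function (PDF): $f(x; k) = \frac{x^{k/2 - 1}\, e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}$ for $x > 0$
Mean: k
Variance: 2k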
Inferential statistics makes predictions or inferences about a population based on sample data.
Sampling Distribution: A sampling distribution is the probability distribution of a statistic (such as the sample mean) obtained through repeated sampling from a population. It shows how the statistic varies across different samples.
Central limit theorem:
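The Central Limit Theorem (CLT) states that for a sufficiently large sample size (n > 30 is a common rule of thumb), the distribution of the sample mean approaches a normal distribution, regardless of the shape of the population distribution. The population must have a finite variance.
Formula: For a random variable X with mean $\mu$ and variance $\sigma^2$:
The sample mean follows (approximately, for large n): $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
The Z-score for the sample mean is given by: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$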
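The theorem can be illustrated with a small simulation. The sketch below draws repeated samples from a skewed exponential population (mean 1, variance 1) and checks that the sample means cluster around μ with spread close to σ/√n. The population, sample size, repetition count, and seed are all arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

rate = 1.0         # exponential population: mean 1/rate, variance 1/rate**2
sample_size = 40   # n, the size of each sample
num_samples = 5000 # how many sample means to collect

# Each entry is the mean of one sample of size n drawn from the population.
sample_means = [
    statistics.fmean(rng.expovariate(rate) for _ in range(sample_size))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theoretical_se = (1 / rate) / math.sqrt(sample_size)  # sigma / sqrt(n)

print(mean_of_means, sd_of_means, theoretical_se)
```

Plotting a histogram of `sample_means` would show an approximately bell-shaped curve even though the underlying population is strongly right-skewed.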
Confidence Interval:
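A Confidence Interval (CI) is a range of values, computed from sample data, that is expected to contain the true population parameter (e.g., mean) with a certain confidence level (e.g., 95%).
Key Formula: $\text{CI} = \bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$, where $\bar{x}$ is the sample mean, $\sigma$ is the population standard deviation, $n$ is the sample size, and $z_{\alpha/2}$ is the critical value for the chosen confidence level (about 1.96 for 95%).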
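A z-based confidence interval for a mean can be sketched as follows, assuming the population standard deviation is known. The function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval sample_mean +/- z * sigma / sqrt(n) for a known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 50, known sigma 8, sample size 64.
low, high = z_confidence_interval(sample_mean=50, sigma=8, n=64)
print(round(low, 2), round(high, 2))  # 48.04 51.96
```

When the population standard deviation is unknown and the sample is small, the critical value would come from the t-distribution instead.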
Z-Test:
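A statistical test used to determine if a sample mean differs significantly from a population mean, applicable when the population standard deviation is known and the sample size is large (typically n > 30).
Formula: $Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$, where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesized population mean, $\sigma$ is the population standard deviation, and $n$ is the sample size.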
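A one-sample z-test can be sketched in a few lines. The example assumes a known population standard deviation and a two-sided alternative; the function name and numbers are illustrative.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """Return the z statistic and two-sided p-value for a one-sample z-test."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: sample mean 52 vs hypothesized mean 50, sigma 8, n 64.
z, p = z_test(sample_mean=52, mu0=50, sigma=8, n=64)
print(z)            # 2.0
print(round(p, 4))  # 0.0455
```

With a 5% significance level, a p-value of about 0.0455 would lead to rejecting the hypothesis that the population mean is 50.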
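T-Test:
A t-test is a statistical method to compare the means of two groups and determine if the difference is statistically significant. It is used when the sample size is small (typically n < 30) and the population standard deviation is unknown.
Key Types:
One-Sample T-Test: Compares the mean of a single sample with a known or hypothesized population mean.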
Independent T-Test: Compares means of two independent groups.
Paired T-Test: Compares means from the same group at two different times.
Chi-Square Test:
A chi-square (χ²) test assesses whether there is a significant relationship between two categorical variables. It compares observed data against expected frequencies to identify if the results are likely to occur by chance.
Example: When tossing a coin, the test can show if heads or tails appear disproportionately often, suggesting that the result isn't just random.
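Formula: $\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$
where: $O_i$ is the observed frequency and $E_i$ is the expected frequency for category i.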
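The coin example can be worked through numerically. The sketch below assumes 100 tosses with 60 heads and 40 tails (made-up counts) and compares the statistic against 3.841, the standard 5% critical value for 1 degree of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical coin experiment: 100 tosses, a fair coin expects 50 / 50.
observed = [60, 40]  # heads, tails
expected = [50, 50]

stat = chi_square_statistic(observed, expected)
critical_value = 3.841  # chi-square critical value, 1 degree of freedom, 5% level

print(stat)                   # 4.0
print(stat > critical_value)  # True
```

Because the statistic exceeds the critical value, the imbalance would be judged unlikely to arise from chance alone at the 5% level.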