Definition, PMF, PDF, CDF, expectation, variance, and standard transformations.
Prereqs: Combinatorics
Overview
This topic builds the entire probabilistic foundation. We start from the
formal probability space (Ω,F,P), define random variables
and their key characteristics, then cover the essential tools — moments,
inequalities, and transforms — that appear directly in quant interview
problems.
1. Probability Space
1.1 Probability Space (Ω,F,P)
Definition— Sample Space Ω
The sample space Ω (also written Γ) is the set of all possible outcomes of a random experiment. Each element ω∈Ω is called an outcome.
Definition— Event and σ-algebra F
An event is a subset A⊆Ω. The collection F of all observable events is a σ-algebra (French: tribu) on Ω: it contains Ω and is closed under complements and countable unions.
Definition— Probability Measure P
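A probability measure is a function P:F→[0,1] satisfying the Kolmogorov axioms: normalization, $P(\Omega) = 1$; non-negativity, $P(A) \ge 0$ for all $A \in \mathcal{F}$; and countable additivity, $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$ for pairwise disjoint events $(A_i)$.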
Remark
In practice, you never need to manipulate F explicitly in
interview problems. What matters is the probability measure P and its
properties below.
1.2 Basic Properties of P
Properties
$P(\emptyset) = 0$
$P(\bar{A}) = 1 - P(A)$
$A \subseteq B \Rightarrow P(A) \le P(B)$ (monotonicity)
Theorem— Boole's Inequality (Union Bound)
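For any sequence of events A1,A2,… (not necessarily disjoint) and any n: $P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} P(A_i)$. The same bound holds for the countable union.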
💡Tip
Boole's inequality is the go-to tool to upper bound the probability that at least one of many bad events occurs. If the sum of the P(Ai) is small, the probability of the union is small too, whatever the dependence between the events.
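As a quick sanity check of the union bound, here is a minimal sketch on a single roll of a fair die. The three events are arbitrary illustrative choices, not taken from the text above, and probabilities are computed exactly with fractions.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an illustrative assumption).
omega = range(1, 7)

def prob(event):
    """Exact probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), 6)

# Three overlapping events, chosen arbitrarily for the illustration.
events = [
    {w for w in omega if w % 2 == 0},  # the roll is even
    {w for w in omega if w >= 5},      # the roll is at least 5
    {w for w in omega if w == 6},      # the roll is exactly 6
]

union = set().union(*events)
p_union = prob(union)                  # P(A1 ∪ A2 ∪ A3)
bound = sum(prob(a) for a in events)   # P(A1) + P(A2) + P(A3)

print(p_union, bound, p_union <= bound)
```

Because the events overlap, the bound is loose here; it is tight only when the events are disjoint.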
1.3 Continuity of P
Theorem— Sequential Continuity
For any monotone sequence of events:
If $A_1 \subseteq A_2 \subseteq \cdots$ (increasing), then $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n\to\infty} P(A_n)$.
Remark
This theorem justifies passing limits through probability signs for monotone
sequences — a technique used often in convergence proofs.
2. Random Variables
2.1 Definition of a Random Variable
Definition
A random variable X is a measurable function X:Ω→R, i.e. for all x∈R: $\{\omega \in \Omega : X(\omega) \le x\} \in \mathcal{F}$
Remark
In practice, a random variable is simply a quantity whose value depends on
the outcome of a random experiment. The measurability condition ensures that
P(X≤x) is always well-defined.
2.2 Cumulative Distribution Function (CDF)
Definition
The cumulative distribution function (CDF; FDR in French) of a random variable X is:
$F_X(x) = P(X \le x), \quad x \in \mathbb{R}$
Properties
FX is non-decreasing
$\lim_{x\to-\infty} F_X(x) = 0$ and $\lim_{x\to+\infty} F_X(x) = 1$
Remark
The CDF uniquely characterizes the distribution of X. Two random variables
with the same CDF have the same distribution.
2.3 Independence
Definition
Two events A,B∈F are independent if:
$P(A \cap B) = P(A)\,P(B)$
Properties
X,Y independent ⇒E[XY]=E[X]E[Y]
independent
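Remark
Pairwise independence does not imply mutual independence. A classic counterexample: toss two fair coins and let A={first is H}, B={second is H}, C={both show the same face}. Each pair is independent (every pairwise intersection has probability 1/4 = 1/2 · 1/2), but P(A∩B∩C) = 1/4 ≠ 1/8 = P(A)P(B)P(C), so the three events are not mutually independent.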
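The counterexample can be checked by brute-force enumeration of the four equally likely outcomes. The sketch below is illustrative only; the helper names `prob` and `both` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two fair coin tosses.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] == "H"    # first toss is heads
B = lambda w: w[1] == "H"    # second toss is heads
C = lambda w: w[0] == w[1]   # both tosses show the same face

def both(e, f):
    """Intersection of two events."""
    return lambda w: e(w) and f(w)

# Pairwise independence: P(E ∩ F) = P(E) P(F) for every pair.
pairwise = all(
    prob(both(e, f)) == prob(e) * prob(f)
    for e, f in [(A, B), (A, C), (B, C)]
)

# Mutual independence would also require the triple product rule.
p_triple = prob(lambda w: A(w) and B(w) and C(w))
mutual = p_triple == prob(A) * prob(B) * prob(C)

print(pairwise, p_triple, mutual)
```

The triple intersection is just the outcome HH, which is why its probability stays at 1/4 instead of dropping to 1/8.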
3. Moments
3.1 Expected Value
Definition
The expected value (French: espérance) of a random variable X is:
Discrete: $E[X] = \sum_{x} x\,P(X = x)$
Continuous: $E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$
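Formula— Transfer Formula
For any measurable function g:R→R:
$E[g(X)] = \sum_{x} g(x)\,P(X = x)$ (discrete) or $E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$ (continuous)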
Formula— Tail Formula
For a non-negative random variable X≥0:
$E[X] = \int_{0}^{\infty} P(X > t)\,dt$
Properties
E[aX+b]=aE[X]+b (linearity)
$E[X+Y] = E[X] + E[Y]$ (linearity, always, even without independence)
💡Tip
The tail formula $E[X] = \int_{0}^{\infty} P(X > t)\,dt$ is extremely powerful for non-negative integer-valued RVs, where it becomes $E[X] = \sum_{n=0}^{\infty} P(X > n)$. Use it for geometric distributions, waiting times, and coupon collector-type problems.
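As an illustration, the sketch below compares the tail-sum formula with the direct definition for a geometric variable on {1, 2, ...}. The success probability p = 0.25 and the truncation at 10,000 terms are arbitrary assumptions; both sums should be close to 1/p.

```python
def tail_sum_mean(p, n_terms=10_000):
    """Approximate E[X] = sum_{n>=0} P(X > n) for X ~ Geometric(p) on {1, 2, ...}."""
    # P(X > n) = (1 - p)**n, i.e. the first n trials all fail.
    return sum((1 - p) ** n for n in range(n_terms))

def direct_mean(p, n_terms=10_000):
    """Approximate E[X] = sum_{k>=1} k * P(X = k) with P(X = k) = p (1 - p)**(k - 1)."""
    return sum(k * p * (1 - p) ** (k - 1) for k in range(1, n_terms + 1))

p = 0.25
mean_tail = tail_sum_mean(p)
mean_direct = direct_mean(p)
print(mean_tail, mean_direct, 1 / p)
```

The tail sum is a plain geometric series, which is why it is usually the faster route by hand.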
3.2 Variance
Definition
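The variance of X measures the spread around its mean μ=E[X]: $\mathrm{Var}(X) = E[(X-\mu)^2] = E[X^2] - (E[X])^2$
Properties
$\mathrm{Var}(aX+b) = a^2\,\mathrm{Var}(X)$
$\mathrm{Var}(X) \ge 0$, and $\mathrm{Var}(X) = 0$ if and only if X is constant a.s.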
Formula— Covariance
Cov(X,Y)=E[XY]−E[X]E[Y]
Formula— Correlation
$\rho(X,Y) = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_X\,\sigma_Y}$
💡Tip
In interviews, always reach for $\mathrm{Var}(X) = E[X^2] - (E[X])^2$ to
compute variance: it avoids expanding $E[(X-\mu)^2]$ directly. Compute
$E[X^2]$ via the transfer formula.
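A minimal worked example of this shortcut, assuming a fair six-sided die as the distribution (an illustrative choice). The helper `expect` is a hypothetical name that applies the transfer formula to a discrete PMF.

```python
from fractions import Fraction

# PMF of a fair six-sided die (illustrative assumption).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def expect(g, pmf):
    """Transfer formula for a discrete RV: E[g(X)] = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x, pmf)              # E[X]
second_moment = expect(lambda x: x**2, pmf)  # E[X^2]
variance = second_moment - mean**2           # Var(X) = E[X^2] - (E[X])^2

print(mean, second_moment, variance)
```

Using exact fractions keeps the result in closed form (35/12 for the die) instead of a rounded float.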
3.3 Higher Moments
Definition
The moment of order k of X is:
$m_k = E[X^k]$
The central moment of order k is $\mu_k = E[(X-\mu)^k]$, with μ=E[X].
Properties
μ1=0, μ2=Var(X)
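Skewness (French: asymétrie): $\gamma_1 = \mu_3 / \sigma^3$, which measures the asymmetry of the distribution
Excess kurtosis: $\gamma_2 = \mu_4 / \sigma^4 - 3$, which measures tail heaviness relative to the normal distribution
Remark
For a normal distribution: γ1=0 (symmetric) and γ2=0 (excess kurtosis = 0). A distribution with $\gamma_2 > 0$ is leptokurtic (heavy tails), which is relevant in finance where asset returns exhibit excess kurtosis.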
4. Inequalities
4.1 Markov's Inequality
Theorem— Markov's Inequality
For any random variable X≥0 and a>0:
$P(X \ge a) \le \dfrac{E[X]}{a}$
💡Tip
Markov only requires X≥0 and knowledge of E[X] — it is the weakest but most general bound. In interviews, use it when you only know the mean and need an upper bound on a tail probability.
4.2 Bienaymé-Chebyshev Inequality
Theorem— Bienaymé-Chebyshev
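For any random variable X with mean μ and variance σ2, and for any k>0: $P(|X-\mu| \ge k\sigma) \le \dfrac{1}{k^2}$. Equivalently, with a=kσ: $P(|X-\mu| \ge a) \le \dfrac{\sigma^2}{a^2}$.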
💡Tip
Chebyshev is the standard tool to prove the weak Law of Large Numbers and to bound probabilities when only the mean and variance are known. It is distribution-free: valid for any X with finite variance.
4.3 Cantelli's Inequality (One-sided Chebyshev)
Theorem— Cantelli's Inequality
For any random variable X with mean μ and variance σ2, and for any λ>0: $P(X-\mu \ge \lambda) \le \dfrac{\sigma^2}{\sigma^2+\lambda^2}$
Remark
Cantelli is sharper than Chebyshev for a one-sided tail: applied to
P(X≥μ+λ), Chebyshev gives σ2/λ2, while Cantelli gives the smaller bound
σ2/(σ2+λ2), which never exceeds 1. Useful when the direction of deviation matters.
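To see how the three bounds compare, here is a small sketch for X ~ Exponential(1), which has mean 1 and variance 1, at the threshold a = 3. The distribution and threshold are illustrative assumptions; the exact tail is $P(X \ge a) = e^{-a}$.

```python
import math

# X ~ Exponential(1): mean 1 and variance 1 (illustrative assumption).
mu, sigma2 = 1.0, 1.0
a = 3.0          # threshold for the upper tail P(X >= a)
lam = a - mu     # deviation above the mean

exact = math.exp(-a)                    # true tail probability
markov = mu / a                         # needs only X >= 0 and E[X]
chebyshev = sigma2 / lam**2             # bounds P(|X - mu| >= lam), hence also the upper tail
cantelli = sigma2 / (sigma2 + lam**2)   # one-sided bound on P(X - mu >= lam)

for name, value in [("exact", exact), ("Markov", markov),
                    ("Chebyshev", chebyshev), ("Cantelli", cantelli)]:
    print(f"{name:>9}: {value:.4f}")
```

All three bounds are far from the exact value here: they trade tightness for the fact that they need only the mean, or the mean and variance.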
4.4 Jensen's Inequality
Theorem— Jensen's Inequality
Let φ:R→R be a convex function and X a random variable with E[∣X∣]<∞. Then: $\varphi(E[X]) \le E[\varphi(X)]$
Properties
$\varphi(x) = x^2$ convex $\Rightarrow (E[X])^2 \le E[X^2]$ (i.e. $\mathrm{Var}(X) \ge 0$)
💡Tip
Jensen is the key inequality when dealing with convex/concave transformations of expectations — log-returns, utility functions, option pricing bounds. Whenever you see E[f(X)] vs f(E[X]), ask yourself if f is convex or concave.
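A small numerical illustration, assuming a hypothetical three-point distribution for a gross return X. The logarithm is concave, so $E[\log X] \le \log E[X]$. The square is convex, so $(E[X])^2 \le E[X^2]$.

```python
import math

# Hypothetical gross-return distribution (illustrative values only).
values = [0.5, 1.0, 2.0]
probs = [0.25, 0.5, 0.25]

def expect(g):
    """E[g(X)] for the discrete distribution above."""
    return sum(p * g(x) for x, p in zip(values, probs))

mean = expect(lambda x: x)
e_log = expect(math.log)            # E[log X]
log_mean = math.log(mean)           # log E[X]
e_square = expect(lambda x: x**2)   # E[X^2]

print(e_log, log_mean)      # concave case: E[log X] <= log E[X]
print(mean**2, e_square)    # convex case: (E[X])^2 <= E[X^2]
```

In this example the expected log-return is zero even though the expected gross return exceeds 1, which is the usual gap between arithmetic and geometric averages.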
4.5 Cauchy-Schwarz Inequality
Theorem— Cauchy-Schwarz
For any two random variables X,Y with finite second moments:
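$(E[XY])^2 \le E[X^2]\,E[Y^2]$
Properties
Equality holds if and only if X and Y are linearly dependent a.s., e.g. Y=cX for some constant c
Implies ∣ρ(X,Y)∣≤1: applying the inequality to the centered variables gives $|\mathrm{Cov}(X,Y)| \le \sigma_X\,\sigma_Y$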
4.6 Hölder's Inequality
Theorem— Hölder's Inequality
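For any random variables X,Y and conjugate exponents p,q≥1 with $\frac{1}{p} + \frac{1}{q} = 1$: $E[|XY|] \le \left(E[|X|^p]\right)^{1/p} \left(E[|Y|^q]\right)^{1/q}$
Remark
Cauchy-Schwarz is the special case p=q=2. Hölder generalizes to any
conjugate pair. The case p=1,q=∞ gives $E[|XY|] \le E[|X|]\,\|Y\|_{\infty}$.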
5. Generating Functions & Transforms
5.1 Probability Generating Function (PGF)
Definition
For a non-negative integer-valued random variable X, the probability generating function is:
$G_X(z) = E[z^X] = \sum_{k=0}^{\infty} P(X = k)\,z^k$
Properties
GX(1)=1
$P(X = k) = \dfrac{G_X^{(k)}(0)}{k!}$ (probabilities from derivatives at 0)
💡Tip
PGFs are most useful for sums of independent discrete RVs and branching processes. The factorization property $G_{X+Y} = G_X \cdot G_Y$, valid for independent X and Y, is the key tool.
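The factorization property can be illustrated by representing a PGF as its list of coefficients, so that multiplying two PGFs is polynomial multiplication. The sketch below assumes two independent fair dice; `pgf_product` is a hypothetical helper name.

```python
from fractions import Fraction

def pgf_product(g, h):
    """Multiply two PGFs given as coefficient lists (index k holds P(X = k))."""
    out = [Fraction(0)] * (len(g) + len(h) - 1)
    for i, a in enumerate(g):
        for j, b in enumerate(h):
            out[i + j] += a * b   # the z^i * z^j term contributes to z^(i+j)
    return out

# PGF of one fair die: coefficients of z^0 .. z^6 (illustrative assumption).
die = [Fraction(0)] + [Fraction(1, 6)] * 6

# PGF of the sum of two independent dice: G_{X+Y} = G_X * G_Y.
two_dice = pgf_product(die, die)

print(two_dice[7], two_dice[2], sum(two_dice))
```

Reading off the coefficient of z^7 gives the probability that the two dice sum to 7, with no need to enumerate the 36 outcomes.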
5.2 Moment Generating Function (MGF)
Definition
The moment generating function of X is:
$M_X(t) = E[e^{tX}]$
Properties
MX(0)=1
$M_X^{(k)}(0) = E[X^k]$ (moments from derivatives)
Remark
The MGF does not always exist (e.g. heavy-tailed distributions like Cauchy). In that case, use the characteristic function instead.
💡Tip
In interviews, the MGF is used to: (1) identify a distribution by matching
its MGF to a known one, (2) compute moments quickly via differentiation, (3)
prove that a sum of independent random variables follows a known law.
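Point (2) can be sketched numerically. The example assumes X ~ Exponential(rate = 2), whose MGF is $M_X(t) = \frac{\lambda}{\lambda - t}$ for $t < \lambda$, and approximates the derivatives at 0 with central finite differences. The step size `h` is an arbitrary choice, and the function names are hypothetical.

```python
def mgf_exponential(t, rate=2.0):
    """MGF of an Exponential(rate) variable, valid for t < rate."""
    return rate / (rate - t)

def moments_from_mgf(mgf, h=1e-3):
    """Estimate E[X] and E[X^2] from central finite differences of the MGF at 0."""
    first = (mgf(h) - mgf(-h)) / (2 * h)                # approximates M'(0) = E[X]
    second = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h**2   # approximates M''(0) = E[X^2]
    return first, second

mean, second_moment = moments_from_mgf(mgf_exponential)
variance = second_moment - mean**2
print(mean, second_moment, variance)
```

The estimates should land close to the closed-form values 1/λ = 0.5, 2/λ² = 0.5 and 1/λ² = 0.25. By hand you would differentiate symbolically instead.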
5.3 Characteristic Function
Definition
The characteristic function of X is:
$\varphi_X(t) = E[e^{itX}]$
Properties
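$\varphi_X(0) = 1$ and $|\varphi_X(t)| \le 1$ for all $t \in \mathbb{R}$
Remark
When X has a density, the characteristic function is its Fourier transform (up to the sign convention). It always exists, making it more general than the MGF. When $\varphi_X$ is integrable, the inversion formula recovers the density: $f_X(x) = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\,\varphi_X(t)\,dt$.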
5.4 Laplace Transform
Definition
For a non-negative random variable X≥0, the Laplace transform is:
$L_X(s) = E[e^{-sX}], \quad s \ge 0$
Properties
LX(0)=1
$E[X^k] = (-1)^k\,L_X^{(k)}(0)$ (moments from derivatives at 0)
Remark
The Laplace transform is essentially the MGF evaluated at −s: LX(s)=MX(−s). It is preferred for non-negative RVs (exponential, gamma) and in queuing theory / reliability contexts.
6. Quick Reference
Inequalities Summary
Inequality | Condition | Bound
Markov | X≥0, a>0 | $P(X \ge a) \le E[X]/a$
Transforms Summary
Transform | Definition | Exists always? | Recovers moments?
PGF | $E[z^X]$ | For integer X≥0 | Yes, via derivatives at z=1 (factorial moments)
MGF | $E[e^{tX}]$ | No (heavy tails) | Yes, via derivatives at 0
All Key Formulas
Concept | Formula
CDF | $F_X(x) = P(X \le x)$
PDF from CDF | $f_X(x) = F_X'(x)$
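σ-algebra axioms:
$\Omega \in \mathcal{F}$
$A \in \mathcal{F} \Rightarrow \bar{A} = \Omega \setminus A \in \mathcal{F}$
$A_1, A_2, \ldots \in \mathcal{F} \Rightarrow \bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$
Kolmogorov axioms:
$P(\Omega) = 1$
$P(A) \ge 0$ for all $A \in \mathcal{F}$
For any sequence of pairwise disjoint events (Ai): $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
The triple (Ω,F,P) is called a probability space.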
A⊆B⇒P(A)≤P(B)
P(A∪B)=P(A)+P(B)−P(A∩B)
0≤P(A)≤1
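Union bound: $P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} P(A_i)$
If $A_1 \subseteq A_2 \subseteq \cdots$ (increasing), then: $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \lim_{n\to\infty} P(A_n)$
If $A_1 \supseteq A_2 \supseteq \cdots$ (decreasing), then: $P\left(\bigcap_{n=1}^{\infty} A_n\right) = \lim_{n\to\infty} P(A_n)$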
{ω∈Ω:X(ω)≤x}∈F
The distribution (French: loi) of X is the probability measure $P_X$ on R defined by $P_X(B) = P(X \in B)$.
X is discrete if it takes values in a countable set
X is continuous if its distribution admits a density f with respect to the Lebesgue measure
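$F_X(x) = P(X \le x), \quad x \in \mathbb{R}$
$\lim_{x\to-\infty} F_X(x) = 0$
$\lim_{x\to+\infty} F_X(x) = 1$
$F_X$ is right-continuous: $\lim_{y\to x^+} F_X(y) = F_X(x)$
$P(a < X \le b) = F_X(b) - F_X(a)$
$P(X > x) = 1 - F_X(x)$
For continuous X: $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$ and $f_X(x) = F_X'(x)$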
$P(A \cap B) = P(A)\,P(B)$
Two random variables X,Y are independent if for all x,y∈R:
P(X≤x,Y≤y)=P(X≤x)P(Y≤y)
i.e. FX,Y(x,y)=FX(x)FY(y).
A family (Xi)i∈I is mutually independent if for every finite subset J⊆I: