
URL: https://www.analyticsvidhya.com/blog/2021/07/how-much-mathematics-do-you-need-to-know-for-machine-learning/

Mathematics For Machine Learning | Maths to understand ML Algorithms



How much Mathematics do you need to know for Machine Learning?

KAVITA Last Updated : 31 Mar, 2023
8 min read

This article was published as a part of the Data Science Blogathon


No matter how long-running a love-hate relationship you have with maths, understanding its core concepts is essential for designing Machine Learning Models and making strategic decisions. Mathematics for Machine Learning is a prerequisite for building a career in Data Science and AI, so embracing its concepts and implementing them in your future work is crucial.

Machine learning is built on mathematics, which in turn helps in creating an ML algorithm that can learn from the data provided to make an accurate prediction. The prediction might be as simple as classifying cats or dogs from a given set of images, or deciding what kind of products to recommend to a customer based on past purchases. Having a proper understanding of the mathematics behind ML algorithms will help you choose the right algorithms for your data science and machine learning projects.

Once you understand why maths is used, you'll find it more interesting. You'll also understand why we pick one machine learning algorithm over another and how that choice affects the performance of the machine learning model.

  • Vectors and Vector Spaces
  • Linear Transformations and matrices

Vectors and Vector Spaces

The ability to visualize data is one of the most useful skills to possess as a data science professional, and a solid foundation in linear algebra enables one to do that. Some concepts and algorithms are quite easy to understand if one can visualize them as vectors and matrices, rather than looking at the data as lists and arrays of numbers.

Linear Algebra is the workhorse of Data Science and ML. While training a machine learning model using a library (such as in R or Python), much of what happens behind the scenes is a series of matrix operations. TensorFlow, one of the most widely used deep learning libraries, is at its core an optimized (i.e. fast and reliable) library for manipulating matrices and their higher-dimensional generalizations. Scikit-learn, the Python library for machine learning, likewise relies heavily on matrix operations.

A vector is an object that has both magnitude and direction. Vectors are usually represented in two ways: as ordered lists, such as $\mathbf{x} = [x_1, x_2, \ldots, x_n]$, or using the 'hat' notation, such as $\mathbf{x} = x_1\hat{i} + x_2\hat{j} + x_3\hat{k}$, where $\hat{i}$, $\hat{j}$, $\hat{k}$ represent the three perpendicular directions (or axes).

The number of elements in a vector is the dimensionality of the vector. For example, $\mathbf{x} = [x_1, x_2]$ is a two-dimensional (2-D) vector, $\mathbf{x} = [x_1, x_2, x_3]$ is a 3-D vector, and so on.

The magnitude of a vector is the distance of its tip from the origin. For an n-dimensional vector $\mathbf{x} = [x_1, x_2, \ldots, x_n]$, the magnitude is given by

$$\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$

A unit vector is one whose distance from the origin is exactly 1 unit. For example, the vectors $\hat{i}$, $\hat{j}$, and $\frac{\hat{i}}{\sqrt{2}} + \frac{\hat{j}}{\sqrt{2}}$ are unit vectors.
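
To make the magnitude and unit-vector definitions concrete, here is a minimal sketch in plain Python. The helper names `magnitude` and `normalize` are illustrative choices, not part of any library; in practice a numerical library would usually do this work.

```python
import math

def magnitude(x):
    """Euclidean length of a vector given as a list of numbers."""
    return math.sqrt(sum(xi * xi for xi in x))

def normalize(x):
    """Scale a non-zero vector so that its magnitude becomes 1."""
    norm = magnitude(x)
    if norm == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [xi / norm for xi in x]

v = [3.0, 4.0]
print(magnitude(v))   # 5.0
u = normalize(v)
print(u)              # [0.6, 0.8]

# The vector i/sqrt(2) + j/sqrt(2) from the text, written as a list.
diagonal = [1 / math.sqrt(2), 1 / math.sqrt(2)]
print(math.isclose(magnitude(diagonal), 1.0))   # True
```

Dividing a vector by its own magnitude keeps its direction and sets its length to 1, which is why the result is a unit vector.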

1. Vector Addition/Subtraction: It is the element-wise sum/difference of two vectors.
Mathematically,

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \pm \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 \pm y_1 \\ x_2 \pm y_2 \\ \vdots \\ x_n \pm y_n \end{bmatrix}$$

2. Scalar Multiplication/Division: It is the element-wise multiplication/division of a vector by a scalar value.
Mathematically,

$$a \cdot \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a \cdot x_1 \\ a \cdot x_2 \\ \vdots \\ a \cdot x_n \end{bmatrix}$$

3. Vector Multiplication (Dot Product): It is the sum of the element-wise products of the two vectors, also known as the dot product. The dot product of two vectors returns a scalar quantity. Mathematically,

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \cdot \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$

Geometrically,

$$\vec{x} \cdot \vec{y} = \|\vec{x}\| \, \|\vec{y}\| \cos\theta$$

where $\theta$ is the angle between the two vectors.

The dot product of two perpendicular vectors (also called orthogonal vectors) is 0. The dot product can be used to compute the angle between two vectors using the formula,

$$\cos\theta = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\| \, \|\vec{y}\|}$$

This simple property of the dot product is extensively used in data science applications.
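
The vector operations above (addition/subtraction, scalar multiplication, the dot product and the angle formula) can be sketched in a few lines of plain Python. The function names `add`, `subtract`, `scale`, `dot` and `angle_between` are hypothetical helpers written for this illustration.

```python
import math

def _check_same_length(x, y):
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimensionality")

def add(x, y):
    """Element-wise sum of two vectors."""
    _check_same_length(x, y)
    return [a + b for a, b in zip(x, y)]

def subtract(x, y):
    """Element-wise difference of two vectors."""
    _check_same_length(x, y)
    return [a - b for a, b in zip(x, y)]

def scale(a, x):
    """Multiply every element of x by the scalar a."""
    return [a * xi for xi in x]

def dot(x, y):
    """Sum of the element-wise products; the result is a scalar."""
    _check_same_length(x, y)
    return sum(a * b for a, b in zip(x, y))

def angle_between(x, y):
    """Angle in degrees, from cos(theta) = x.y / (|x| |y|)."""
    norms = math.sqrt(dot(x, x)) * math.sqrt(dot(y, y))
    if norms == 0:
        raise ValueError("the angle is undefined for a zero vector")
    cos_theta = dot(x, y) / norms
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

x = [1.0, 0.0]
y = [0.0, 2.0]
z = [1.0, 1.0]

print(add(x, y))        # [1.0, 2.0]
print(subtract(x, y))   # [1.0, -2.0]
print(scale(3, z))      # [3.0, 3.0]
print(dot(x, y))        # 0.0, so x and y are orthogonal
print(angle_between(x, y))   # about 90 degrees
print(angle_between(x, z))   # about 45 degrees
```

A dot product of zero is a quick test for perpendicular vectors, and the angle (or its cosine) is a common way to measure how similar two data points are in direction.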
  1. Basis Vectors: A basis of a vector space V is a set of vectors (v1, v2, . . . vn) in V that are linearly independent and span V; its members are called basis vectors. Consequently, a list of vectors (v1, v2, . . . vn) in V forms a basis if and only if every v in V can be uniquely written as $v = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n$, where $a_1, a_2, \ldots, a_n$ are scalars.
  2. Span: The span of two or more vectors is the set of all possible vectors that one can get by scaling the vectors with arbitrary scalars and adding the results.
  3. Linear Combination: The linear combination of two vectors is the sum of the scaled vectors.
  4. Linearly Dependent: A set of vectors is called linearly dependent if at least one of the vectors can be expressed as a linear combination of the other vectors.
  5. Linearly Independent: If none of the vectors in a set can be expressed as a linear combination of the other vectors, the vectors are called linearly independent.
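
As a small illustration of these definitions, the sketch below works in two dimensions only. It tests whether two vectors are linearly independent using the 2 x 2 determinant, and finds the scalars that express a vector in a given basis using Cramer's rule. The helper names and the tolerance value are assumptions made for this example.

```python
def det2(v1, v2):
    """Determinant of the 2 x 2 matrix whose columns are v1 and v2."""
    return v1[0] * v2[1] - v2[0] * v1[1]

def are_independent(v1, v2, tol=1e-12):
    """Two 2-D vectors are linearly independent when the determinant is non-zero."""
    return abs(det2(v1, v2)) > tol

def coordinates(v, v1, v2, tol=1e-12):
    """Scalars (a1, a2) with v = a1*v1 + a2*v2, found with Cramer's rule."""
    d = det2(v1, v2)
    if abs(d) <= tol:
        raise ValueError("v1 and v2 are linearly dependent, so they do not form a basis")
    return det2(v, v2) / d, det2(v1, v) / d

v1 = [1.0, 1.0]
v2 = [1.0, -1.0]
print(are_independent(v1, v2))                   # True
print(are_independent([1.0, 2.0], [2.0, 4.0]))   # False: the second is 2 times the first

a1, a2 = coordinates([3.0, 1.0], v1, v2)
print(a1, a2)   # 2.0 1.0, i.e. [3, 1] = 2*[1, 1] + 1*[1, -1]
```

Because `v1` and `v2` are independent, the pair of scalars is unique, which is exactly the uniqueness property in the definition of a basis.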

Linear Transformations and Matrices

Matrices are a time-tested and powerful data structure used to perform numerical computations. Briefly, a matrix is a collection of values stored as rows and columns, i.e.

$$A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$$

  1. Rows: Rows are horizontal. The matrix A has m rows. Each row itself is a vector, so they are also called row vectors.
  2. Columns: Columns are vertical. The matrix A has n columns. Each column itself is a vector, so they are also called column vectors.
  3. Entities: Entities are the individual values in a matrix. For a given matrix A, the value in row i and column j is represented as $A_{ij}$.
  4. Dimensions: The number of rows and columns. For m rows and n columns, the dimensions are (m × n).
  5. Square Matrices: These are matrices where the number of rows is equal to the number of columns, i.e. m = n.
  6. Diagonal Matrices: These are square matrices where all the off-diagonal elements are zero, i.e. $\begin{bmatrix} x_{11} & 0 & \cdots & 0 \\ 0 & x_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{nn} \end{bmatrix}$
  7. Identity Matrices: These are diagonal matrices where all the diagonal elements are 1, i.e. $\begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$
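
The terms in this list map directly onto a simple data structure. The sketch below stores a matrix as a list of rows in plain Python; the helper names are illustrative, and note that Python indexes from 0 whereas the mathematical notation $A_{ij}$ counts from 1.

```python
A = [[1, 2, 3],
     [4, 5, 6]]   # 2 rows, 3 columns

def shape(M):
    """Dimensions (m, n): number of rows and number of columns."""
    return len(M), len(M[0])

def column(M, j):
    """The j-th column (0-based) as a list, i.e. a column vector."""
    return [row[j] for row in M]

def is_square(M):
    m, n = shape(M)
    return m == n

def is_diagonal(M):
    """Square, and every off-diagonal element is zero."""
    return is_square(M) and all(
        M[i][j] == 0
        for i in range(len(M))
        for j in range(len(M))
        if i != j
    )

def identity(n):
    """The n x n identity matrix: ones on the diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

print(shape(A))       # (2, 3)
print(A[0])           # [1, 2, 3]  first row vector
print(column(A, 1))   # [2, 5]     second column vector
print(A[1][2])        # 6, the entity in row 2, column 3
print(is_square(A))   # False
print(identity(3))    # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(is_diagonal(identity(3)))   # True
```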
  1. Matrix Addition/Subtraction: It is the element-wise sum/difference of two matrices. Mathematically, $\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} \pm \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1n} \\ y_{21} & y_{22} & \cdots & y_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mn} \end{bmatrix} = \begin{bmatrix} x_{11} \pm y_{11} & x_{12} \pm y_{12} & \cdots & x_{1n} \pm y_{1n} \\ x_{21} \pm y_{21} & x_{22} \pm y_{22} & \cdots & x_{2n} \pm y_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} \pm y_{m1} & x_{m2} \pm y_{m2} & \cdots & x_{mn} \pm y_{mn} \end{bmatrix}$
  2. Scalar Multiplication/Division: It is the element-wise multiplication/division of a matrix by a scalar value. Mathematically, $a \cdot \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix} = \begin{bmatrix} a \cdot x_{11} & a \cdot x_{12} & \cdots & a \cdot x_{1n} \\ a \cdot x_{21} & a \cdot x_{22} & \cdots & a \cdot x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a \cdot x_{m1} & a \cdot x_{m2} & \cdots & a \cdot x_{mn} \end{bmatrix}$
  3. Matrix Multiplication: Unlike addition, this is not an element-wise operation. The (i, j) element of the output matrix is the dot product of the i-th row of the first matrix and the j-th column of the second matrix. Mathematically, for a first matrix of dimensions (m × n) and a second matrix of dimensions (o × p) with n = o,

$$\underbrace{\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}}_{(m \times n)} \underbrace{\begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1p} \\ y_{21} & y_{22} & \cdots & y_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ y_{o1} & y_{o2} & \cdots & y_{op} \end{bmatrix}}_{(o \times p)} = \underbrace{\begin{bmatrix} x_{11}y_{11} + x_{12}y_{21} + \cdots + x_{1n}y_{o1} & \cdots & x_{11}y_{1p} + x_{12}y_{2p} + \cdots + x_{1n}y_{op} \\ x_{21}y_{11} + x_{22}y_{21} + \cdots + x_{2n}y_{o1} & \cdots & x_{21}y_{1p} + x_{22}y_{2p} + \cdots + x_{2n}y_{op} \\ \vdots & \ddots & \vdots \\ x_{m1}y_{11} + x_{m2}y_{21} + \cdots + x_{mn}y_{o1} & \cdots & x_{m1}y_{1p} + x_{m2}y_{2p} + \cdots + x_{mn}y_{op} \end{bmatrix}}_{(m \times p)}$$

     Not all matrices can be multiplied with each other. For the matrix multiplication AB to be valid, the number of columns in A should be equal to the number of rows in B, i.e. for two matrices A and B with dimensions (m × n) and (o × p), AB exists if and only if n = o, and BA exists if and only if p = m. Matrix multiplication is not commutative, i.e. in general $AB \neq BA$.
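  4. Matrix Inverse: The inverse of a matrix A is a matrix $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$ (the identity matrix). Only square matrices can have an inverse, and not every square matrix has one.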
  5. Matrix Transpose: The transpose of a matrix produces a matrix in which the rows and columns are interchanged. Mathematically, $A^T = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{m1} \\ x_{12} & x_{22} & \cdots & x_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1n} & x_{2n} & \cdots & x_{mn} \end{bmatrix}$, where $A = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$
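
The matrix operations in this list can be sketched in plain Python with matrices stored as lists of rows. All function names here are hypothetical helpers for illustration, and the inverse is shown only for the 2 x 2 case using the standard closed-form formula; larger matrices need a general method from a numerical library.

```python
def shape(M):
    return len(M), len(M[0])

def mat_add(A, B):
    """Element-wise sum; both matrices must have the same dimensions."""
    if shape(A) != shape(B):
        raise ValueError("matrices must have the same dimensions")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(a, A):
    """Multiply every entity of A by the scalar a."""
    return [[a * x for x in row] for row in A]

def mat_mul(A, B):
    """Entry (i, j) is the dot product of row i of A and column j of B."""
    m, n = shape(A)
    o, p = shape(B)
    if n != o:
        raise ValueError(f"cannot multiply ({m} x {n}) by ({o} x {p})")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

def transpose(A):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*A)]

def inverse_2x2(A):
    """Closed-form inverse of a 2 x 2 matrix; fails when the determinant is 0."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[1, 2, 3], [4, 5, 6]]   # dimensions (2 x 3)

print(mat_add(A, B))      # [[1, 3], [4, 4]]
print(mat_scale(2, A))    # [[2, 4], [6, 8]]
print(mat_mul(A, B))      # [[2, 1], [4, 3]]
print(mat_mul(B, A))      # [[3, 4], [1, 2]], so AB != BA
print(mat_mul(A, C))      # [[9, 12, 15], [19, 26, 33]], dimensions (2 x 3)
print(transpose(C))       # [[1, 4], [2, 5], [3, 6]]
print(mat_mul(A, inverse_2x2(A)))   # [[1.0, 0.0], [0.0, 1.0]]

try:
    mat_mul(C, A)         # (2 x 3) times (2 x 2): 3 columns vs 2 rows
except ValueError as err:
    print(err)
```

The last call fails because the number of columns of the first matrix (3) does not match the number of rows of the second (2), which is the validity condition stated above.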


     Any transformation can be geometrically visualized as the distortion of the n-dimensional space (it can be squishing, stretching, rotating, etc.). The distortion of space can be visualized as a distortion of the grid lines that make up the coordinate system. Space can be distorted in several different ways. A linear transformation, however, is a special distortion with two distinct properties,
  1.  Straight lines remain straight and parallel to each other
  2.  The origin remains fixed
Consider a linear transformation where the original basis vectors $\hat{i}$ and $\hat{j}$ move to the new points $\hat{i} = [1, -2]$ and $\hat{j} = [3, 0]$ (where $\hat{i}$ and $\hat{j}$ are unit vectors along the x-direction and y-direction of the coordinate system respectively). This means that $\hat{i}$ moves to (1, −2) from (1, 0) and $\hat{j}$ moves to (3, 0) from (0, 1) in the linear transformation. This transformation both stretches and rotates the space: $\hat{i}$ is stretched to a length of $\sqrt{5}$ and turned roughly 63 degrees clockwise, while $\hat{j}$ is stretched to a length of 3 and turned 90 degrees clockwise. One can combine the two vectors where $\hat{i}$ and $\hat{j}$ land and write them as a single matrix, i.e.

$$L = \begin{bmatrix} 1 & 3 \\ -2 & 0 \end{bmatrix}$$
As can be seen, each of these vectors forms one column of the matrix (and hence they are often called column vectors). This matrix fully represents the linear transformation. Now, if one wants to find where any given vector v would land after this transformation, one simply needs to multiply the matrix L with the vector v, i.e. $v_{new} = Lv$. It is convenient to think of this matrix as a function that describes the transformation, i.e. it takes the original vector v as the input and returns the new vector $v_{new}$. The following figures represent the linear transformation.

Image source 
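
To see the matrix acting as a function, the minimal sketch below applies the matrix L from this example to a few vectors in plain Python. The helper name `apply` and the test vector `v` are choices made for this illustration.

```python
# Columns of L are the landing points of i-hat and j-hat.
L = [[1, 3],
     [-2, 0]]

def apply(M, v):
    """Matrix-vector product: each output entry is a row of M dotted with v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

i_hat = [1, 0]
j_hat = [0, 1]
print(apply(L, i_hat))   # [1, -2], the first column of L
print(apply(L, j_hat))   # [3, 0], the second column of L

v = [2, 1]               # v = 2*i_hat + 1*j_hat
print(apply(L, v))       # [5, -4], which equals 2*[1, -2] + 1*[3, 0]
```

The last line shows why the two columns are enough to describe the whole transformation: any vector is a combination of the basis vectors, and it lands on the same combination of their transformed versions.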

Formally, a transformation is linear if it satisfies the following two properties,
  1. Additivity or Distributivity, i.e L(v + w) = L(v) + L(w) .
  2. Associativity or Homogeneity, i.e. L(cv) = cL(v), where c is a scalar.
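
Both properties can be checked numerically for the example matrix L = [[1, 3], [-2, 0]]. The sketch below is only a spot check on a few hand-picked vectors, not a proof; the vectors `v`, `w`, the scalar `c` and the `translate` counter-example are assumptions chosen for illustration.

```python
L = [[1, 3],
     [-2, 0]]

def apply(M, v):
    """Matrix-vector product."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def add(v, w):
    return [a + b for a, b in zip(v, w)]

def scale(c, v):
    return [c * a for a in v]

v = [2, 1]
w = [-1, 4]
c = 3

# Additivity: L(v + w) == L(v) + L(w)
print(apply(L, add(v, w)), add(apply(L, v), apply(L, w)))   # [16, -2] [16, -2]

# Homogeneity: L(c v) == c L(v)
print(apply(L, scale(c, v)), scale(c, apply(L, v)))         # [15, -12] [15, -12]

# A translation is not linear: it moves the origin and breaks additivity.
def translate(v):
    return [v[0] + 1, v[1] + 1]

print(translate([0, 0]))                                    # [1, 1]
print(translate(add(v, w)) == add(translate(v), translate(w)))   # False
```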

I hope you enjoyed the article! There are plenty of valuable resources available on the internet that explain concepts such as matrix decompositions, vector calculus, linear algebra, geometry, matrices, and the mathematics behind principal component analysis, support vector machines, and more. The following links may help you to understand the mathematical concepts:

  1. Khan Academy's courses: Comprehensive free courses on complex mathematical concepts.

  2. 3Blue1Brown: Here you can explore most of the mathematical concepts in depth.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.

A Mathematics student turned Data Scientist. I am an aspiring data scientist who aims to learn all the necessary concepts in Data Science in detail. I am passionate about Data Science, including data manipulation, data visualization, data analysis, EDA, Machine Learning, etc., which help to find valuable insights from data.


