A sparse matrix is a matrix in which most elements are zeros. Sparse matrices are widely used in machine learning, natural language processing (NLP), and large-scale data processing, where storing all zero values is inefficient.
Example of a sparse matrix:
0 0 3 0 4
0 0 5 7 0
0 0 0 0 0
0 2 6 0 0
Storing such a matrix as a normal 2D array wastes memory, because most elements are zero. Instead, we store only the non-zero elements along with their row and column indices (the triplet format).
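To make the triplet idea concrete, here is a minimal pure-Python sketch that pulls the non-zero entries out of the example matrix. The names `dense` and `triplets` are illustrative only and are not part of any library.

```python
# Dense form of the example matrix
dense = [
    [0, 0, 3, 0, 4],
    [0, 0, 5, 7, 0],
    [0, 0, 0, 0, 0],
    [0, 2, 6, 0, 0],
]

# Keep only the non-zero entries as (row, col, value) triplets
triplets = [
    (r, c, v)
    for r, row in enumerate(dense)
    for c, v in enumerate(row)
    if v != 0
]

print(triplets)
# [(0, 2, 3), (0, 4, 4), (1, 2, 5), (1, 3, 7), (3, 1, 2), (3, 2, 6)]
```

Only 6 of the 20 positions hold a value. The saving is negligible at this size but grows quickly for large matrices that are mostly zero.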
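Benefits of using sparse matrices: storing only the non-zero elements reduces memory use, and operations that skip the zeros can run faster.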
The scipy.sparse module provides several formats for storing sparse matrices, each optimized for different operations:
| Format | Best For | Description |
|---|---|---|
| csr_matrix | Fast row slicing, math operations | Compressed Sparse Row; good for arithmetic and row access. |
| csc_matrix | Fast column slicing | Compressed Sparse Column; efficient for column-based operations. |
| coo_matrix | Easy matrix building | Coordinate format using (row, col, value) triplets. |
| lil_matrix | Incremental row-wise construction | List of Lists; rows are easy to modify before converting. |
| dia_matrix | Banded (diagonal) matrices | Stores only the diagonals, which saves space. |
| dok_matrix | Fast item assignment | Dictionary of Keys; ideal for random updates. |
CSR format stores non-zero values row-wise, enabling fast row slicing and efficient matrix operations.
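A minimal sketch of building the example matrix in CSR form, assuming the `(data, (row, col))` constructor form of `csr_matrix`. The variable names are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Non-zero values of the example matrix and their positions
data = np.array([3, 4, 5, 7, 2, 6])
row = np.array([0, 0, 1, 1, 3, 3])
col = np.array([2, 4, 2, 3, 1, 2])

# Build the 4x5 matrix in Compressed Sparse Row format
csr = csr_matrix((data, (row, col)), shape=(4, 5))

# Convert back to a dense NumPy array for display
print(csr.toarray())
```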
Output
[[0 0 3 0 4]
[0 0 5 7 0]
[0 0 0 0 0]
[0 2 6 0 0]]
Explanation: csr_matrix stores only the non-zero values, together with their column indices and a row-pointer array marking where each row starts. Calling toarray() reconstructs the full dense matrix.
CSC format stores data column-wise, making column-based operations faster.
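A minimal sketch of the same matrix in CSC form, again assuming the `(data, (row, col))` constructor form. The variable names are illustrative.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Non-zero values of the example matrix and their positions
data = np.array([3, 4, 5, 7, 2, 6])
row = np.array([0, 0, 1, 1, 3, 3])
col = np.array([2, 4, 2, 3, 1, 2])

# Build the 4x5 matrix in Compressed Sparse Column format
csc = csc_matrix((data, (row, col)), shape=(4, 5))

# Convert back to a dense NumPy array for display
print(csc.toarray())
```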
Output
[[0 0 3 0 4]
[0 0 5 7 0]
[0 0 0 0 0]
[0 2 6 0 0]]
Explanation: Stores non-zero values in column-compressed format, efficient for column operations.
COO format represents the matrix using (row, col, value) triplets. Useful when constructing matrices dynamically before converting to CSR/CSC.
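A minimal sketch of building the example matrix from triplets with `coo_matrix`. The variable names are illustrative.

```python
import numpy as np
from scipy.sparse import coo_matrix

# One (row, col, value) triplet per non-zero element,
# split across three parallel arrays
row = np.array([0, 0, 1, 1, 3, 3])
col = np.array([2, 4, 2, 3, 1, 2])
data = np.array([3, 4, 5, 7, 2, 6])

# Build the 4x5 matrix in coordinate format
coo = coo_matrix((data, (row, col)), shape=(4, 5))

# Convert back to a dense NumPy array for display
print(coo.toarray())
```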
Output
[[0 0 3 0 4]
[0 0 5 7 0]
[0 0 0 0 0]
[0 2 6 0 0]]
Explanation: Stores the matrix as three parallel arrays (row, col, and data), so each non-zero element corresponds to one (row, col, value) triplet.
LIL (List of Lists) format allows efficient row-wise construction. You can easily insert or modify values before converting the matrix to CSR or CSC for faster computation.
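A minimal sketch of filling an empty `lil_matrix` element by element. The default dtype is float, which is why the values print with a trailing dot. The variable name is illustrative.

```python
from scipy.sparse import lil_matrix

# Start from an empty 4x5 matrix (float64 by default)
lil = lil_matrix((4, 5))

# Assign the non-zero entries of the example matrix by (row, col)
lil[0, 2] = 3
lil[0, 4] = 4
lil[1, 2] = 5
lil[1, 3] = 7
lil[3, 1] = 2
lil[3, 2] = 6

# Convert to a dense NumPy array for display
print(lil.toarray())
```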
Output
[[0. 0. 3. 0. 4.]
[0. 0. 5. 7. 0.]
[0. 0. 0. 0. 0.]
[0. 2. 6. 0. 0.]]
Explanation: Creates a List of Lists (LIL) matrix and assigns values directly by row and column.
DOK (Dictionary of Keys) format is ideal for random assignments. You can assign elements at any position efficiently, making it perfect for incremental matrix construction.
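A minimal sketch of the same construction with `dok_matrix`, assigning entries in arbitrary order. As with LIL, the default dtype is float. The variable name is illustrative.

```python
from scipy.sparse import dok_matrix

# Start from an empty 4x5 matrix (float64 by default)
dok = dok_matrix((4, 5))

# Entries can be assigned in any order
dok[3, 2] = 6
dok[0, 4] = 4
dok[1, 3] = 7
dok[0, 2] = 3
dok[3, 1] = 2
dok[1, 2] = 5

# Convert to a dense NumPy array for display
print(dok.toarray())
```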
Output
[[0. 0. 3. 0. 4.]
[0. 0. 5. 7. 0.]
[0. 0. 0. 0. 0.]
[0. 2. 6. 0. 0.]]
Explanation: Internally stored as dictionary {(row, col): value}.
DIA (Diagonal) format stores only the diagonals of the matrix. It is very memory-efficient for banded matrices, where the non-zero elements lie along a small number of diagonals.
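A minimal sketch of a 4x5 matrix with a single main diagonal, assuming the `(data, offsets)` constructor form of `dia_matrix`. The diagonal values are chosen for illustration, and the data row is padded here to the matrix width.

```python
import numpy as np
from scipy.sparse import dia_matrix

# One row per stored diagonal; the trailing 0 pads the row to 5 columns
# and falls outside the 4-row matrix, so it is never used
data = np.array([[3, 5, 6, 7, 0]])

# Offset 0 is the main diagonal (positive = above, negative = below)
offsets = np.array([0])

dia = dia_matrix((data, offsets), shape=(4, 5))

# Convert to a dense NumPy array for display
print(dia.toarray())
```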
Output
[[3 0 0 0 0]
[0 5 0 0 0]
[0 0 6 0 0]
[0 0 0 7 0]]
Explanation: Creates a Diagonal (DIA) matrix storing only specified diagonals.