![]() |
VOOZH | about |
In Apache Cassandra, data can be accessed using attributes that are part of the partition key. For example, if Emp_id is a column in an Employee table and it serves as the partition key, you can filter or search data using this key. In this case, the WHERE clause can be used to define conditions over the attribute to retrieve data efficiently.
However, if you want to filter or search using a column that is not a partition key, the query will not execute and will return an error.
You need to define an index to enable efficient data lookup using attributes other than the partition key. Indexes can be used for various purposes, such as indexing collections, static columns, and other columns except for counter columns.
Suppose you have a cricket match table with millions of entries for players across hundreds of matches. If you want to look up players' ranks by the number of matches played, and many players share the same match_year, this match_year column is a good candidate for indexing.
Syntax to create an Index
CREATE INDEX [ IF NOT EXISTS ] index_name
ON [keyspace_name.]table_name
([ ( KEYS | FULL ) ] column_name)
(ENTRIES column_name);
Consider a table in keyspace1 called Task:
CREATE TABLE keyspace1.Task
(
Task_id text,
Task_name text,
Task_time timestamp,
T_location text,
PRIMARY KEY (Task_id, Task_name)
);
Cassandra is a distributed and decentralized database, and data is organized by partition key. Typically, WHERE clause queries must include a partition key.
Valid Queries
SELECT *
FROM Task
WHERE Task_id = 'T210';
This query would work fine.
SELECT *
FROM Task
WHERE Task_id = 'T210' AND Task_name = 'set alarm';
Note: In above table Task_id and Task_name columns are the part of primary key.
Invalid Query
SELECT *
FROM Task
WHERE Task_time = '2019-09-30 15:02:56';
This query will not work because Task_time is not part of the partition key.
Error:
Bad Request: no indexed columns present in by columns clause with Equal operator.
To resolve this error, you can create an index on the clustering column. Define a table with a composite partition key, and then create an index on the clustering column.
CREATE TABLE keyspace1.Task (
Task_id text,
Task_name text,
Task_time timestamp,
T_location text,
PRIMARY KEY ((Task_id, Task_name), Task_time)
);
CREATE INDEX ON keyspace1.Task(Task_time);
Now, this query will work:
SELECT *
FROM Task
WHERE Task_time = '2019-09-30 15:02:56';
Note:
Creating secondary indexes does not necessarily increase the speed of queries in Cassandra. One of the main advantages of secondary indexes is that they allow WHERE clauses referencing columns beyond the primary and clustering keys to execute successfully.
However, for better query performance, it's often preferable to create a table specifically designed for the query pattern.
Another Example:
Consider a table Student_record in keyspace1:
CREATE TABLE Student_record
(
Stu_state text,
Stu_zip text,
Stu_address text,
PRIMARY KEY (Stu_state, Stu_zip)
);
In this table, Stu_state and Stu_zip might be the same for multiple records. To uniquely define each record, you can add Stu_id as a primary key.
ALTER TABLE Student_record ADD Stu_id int PRIMARY KEY;Output:
👁 ImageTable :
To describe the table structure, use
DESCRIBE TABLE Student_record;To query data using Stu_id
SELECT *
FROM Student_record
WHERE Stu_id = '107';
Output:
👁 ImageA composite partition key involves more than one column. Here's an example:
CREATE TABLE Registration (
Name text,
Date timestamp,
Email text,
Query text,
PRIMARY KEY((Name, Date), Email)
) WITH CLUSTERING ORDER BY (Date DESC);
To query the table
SELECT *
FROM Registration LIMIT 2;
Note:
When creating an index in Cassandra, each row has a row key. It's common to have denormalized data in Cassandra. However, it is a misconception that secondary indexes make queries faster. The use of parentheses is essential when specifying a composite partition key.