The Indian Express

Google AI model helps unmask cancer cells to the immune system: Lead scientist explains breakthrough


Google DeepMind recently announced that its AI model C2S-Scale had generated a “novel hypothesis” about how cancer cells behave, which was later confirmed through lab experiments. The research was conducted in collaboration with Yale University. The lab believes this marks a milestone for AI in science and opens up a promising new direction for developing cancer treatments.

Shekoofeh Azizi, Staff Research Scientist and Research Lead at Google DeepMind, speaks with Kaunain Sheriff M about the significance of this breakthrough.

C2S-Scale is a family of large language models (LLMs) built upon Google’s Gemma-2 architecture. Think of it as a specialised AI model that we’ve taught to understand the language of biology, in the form of gene expression inside cells. We do this by taking the complex gene activity inside a single cell, measured by a technique called single-cell RNA sequencing (scRNA-seq), and translating it into a simple “cell sentence”: a list of the cell’s most active genes, ordered from most to least active.

The model “reads” these sentences across millions of cells and learns the patterns of gene expression that define what a cell is and what it’s doing. The paradigm shift is that this approach bridges the gap between raw genomic data and human language, and allows LLMs to perform complex tasks on cells in natural language.
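The cell-sentence idea described above can be illustrated with a minimal Python sketch. It assumes expression values arrive as a plain gene-to-level mapping; the function name `to_cell_sentence`, the `top_k` cutoff, and the toy values are illustrative assumptions, not the actual C2S-Scale preprocessing.

```python
from typing import Mapping


def to_cell_sentence(expression: Mapping[str, float], top_k: int = 100) -> str:
    """Turn one cell's gene-expression profile into a rank-ordered 'cell sentence'."""
    # Keep only genes that were actually detected in this cell.
    expressed = [(gene, level) for gene, level in expression.items() if level > 0]
    # Most active genes first; ties are broken alphabetically so output is deterministic.
    expressed.sort(key=lambda pair: (-pair[1], pair[0]))
    # Hypothetical cutoff: keep only the top_k most active genes.
    return " ".join(gene for gene, _ in expressed[:top_k])


# Toy expression values for a single cell (made-up numbers for illustration).
cell = {"CD3E": 12.0, "GAPDH": 40.0, "ACTB": 35.0, "MS4A1": 0.0, "IL7R": 5.0}
print(to_cell_sentence(cell, top_k=3))  # prints "GAPDH ACTB CD3E"
```

The point of the encoding is that the numeric measurements disappear: only the order of gene names remains, which is a form an ordinary text model can read.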

Our immune system is constantly looking for unhealthy or diseased cells, but cancer cells are often good at hiding. We asked our model to find drugs that could make cancer cells more “visible” to the immune system by acting as a conditional amplifier: a drug that increases antigen presentation in cancer cells only when low levels of interferon (a key immune signaling protein) are already present.

Our model predicted that a drug called silmitasertib would significantly boost antigen presentation in the immune-context-positive setting, that is, when low-level interferon signaling is present. This prediction serves as a promising hypothesis that now requires rigorous validation through research and clinical trials.
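The "conditional amplifier" criterion can be sketched as a simple filter over predicted effects in two contexts. Everything here is an assumption for illustration: the score units, the thresholds `min_boost` and `max_neutral`, and the placeholder drug names. It is not the screening pipeline used in the study.

```python
from dataclasses import dataclass


@dataclass
class PredictedEffect:
    drug: str
    # Predicted change in an antigen-presentation score (hypothetical units).
    with_low_interferon: float
    without_interferon: float


def conditional_amplifiers(effects, min_boost=0.5, max_neutral=0.1):
    """Return drugs predicted to boost the score only when interferon is present."""
    hits = [
        e for e in effects
        if e.with_low_interferon >= min_boost
        and abs(e.without_interferon) <= max_neutral
    ]
    # Rank by how much larger the effect is in the immune context.
    hits.sort(key=lambda e: e.with_low_interferon - e.without_interferon, reverse=True)
    return [e.drug for e in hits]


# Made-up predictions for placeholder compounds.
predictions = [
    PredictedEffect("drug_A", 0.9, 0.05),   # strong effect only with interferon
    PredictedEffect("drug_B", 0.8, 0.70),   # boosts regardless of context
    PredictedEffect("drug_C", 0.1, 0.00),   # little effect in either context
    PredictedEffect("drug_D", 0.6, -0.05),  # moderate, context-dependent effect
]
print(conditional_amplifiers(predictions))  # prints ['drug_A', 'drug_D']
```

A drug like `drug_B`, which raises the score in both contexts, is excluded on purpose: the question posed to the model was about effects that depend on the immune signal.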

The key is in its training. Before we asked it to do a complex task like drug screening, we put C2S-Scale through a rigorous pre-training phase. We trained it on a massive dataset of over 50 million cells from public repositories like the Human Cell Atlas, covering a wide range of human and mouse tissues, diseases, and conditions.

During this pre-training, we gave it a series of fundamental tasks, like predicting a cell’s type based on its “cell sentence,” identifying its tissue of origin, or even generating a realistic new cell from scratch. By mastering these foundational tasks, the model learns the fundamental patterns of gene expression. This biological intuition is what allows it to make sense of new, complex information and perform sophisticated reasoning in later stages.
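One way to picture these pre-training tasks is as text pairs built from a cell sentence and its labels. The prompt wording, function names, and example labels below are hypothetical; the actual templates used to train C2S-Scale are not given here.

```python
def cell_type_example(cell_sentence: str, cell_type: str) -> dict:
    """Task: predict the cell type from the cell sentence."""
    return {
        "prompt": f"Predict the cell type of this cell: {cell_sentence}",
        "response": cell_type,
    }


def tissue_example(cell_sentence: str, tissue: str) -> dict:
    """Task: identify the tissue of origin from the cell sentence."""
    return {
        "prompt": f"Identify the tissue of origin of this cell: {cell_sentence}",
        "response": tissue,
    }


def generation_example(cell_type: str, tissue: str, cell_sentence: str) -> dict:
    """Task: generate a cell sentence from a description (the reverse direction)."""
    return {
        "prompt": f"Generate a {cell_type} from {tissue}:",
        "response": cell_sentence,
    }


# Illustrative inputs only; the sentence and labels are made up.
sentence = "GAPDH ACTB CD3E IL7R"
examples = [
    cell_type_example(sentence, "T cell"),
    tissue_example(sentence, "kidney"),
    generation_example("T cell", "kidney", sentence),
]
for example in examples:
    print(example["prompt"], "->", example["response"])
```

Framing all three tasks as plain text means a single language model can be trained on them together, in both the cell-to-label and label-to-cell directions.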

Scale is critical because biology is unimaginably complex. A large model, like our 27-billion-parameter C2S-Scale, has a greater capacity to learn and remember the countless subtle relationships between genes, cells, and tissues. There’s a well-known pattern in AI described by “scaling laws”: performance improves predictably as models grow, and larger models often go beyond incremental gains to develop new, emergent capabilities that smaller models lack. For a problem as vast as understanding life at the cellular level, that massive scale is essential for the model to have enough capacity to uncover genuinely new biological insights.

The model predicted that a drug called silmitasertib could make certain cancer cells more visible to the immune system, but only under very specific conditions.

To validate the AI’s prediction, we took it to the lab. We used human neuroendocrine cancer cell lines that the model had never seen before, and set up a controlled experiment with two scenarios: cells treated with silmitasertib alone, and cells treated with a low dose of the immune signal (interferon) along with silmitasertib.

The results confirmed the AI’s prediction. The drug by itself had no effect on the cells’ visibility markers. But when we combined it with low levels of interferon signaling, we saw a marked and significant increase in the molecules that make cancer cells visible to the immune system. It was a clear demonstration of the synergy the model had predicted, moving an AI-generated hypothesis from the computer to a real biological outcome.

It’s important to note the limitations of this validation: these experiments were conducted in vitro, not in a living organism. Furthermore, this was observed in a specific neuroendocrine cancer cell line. While these results are highly promising, significant further research and clinical trials would be required to understand if this effect translates into a safe and effective therapy for patients.

Traditional drug discovery involves physically screening thousands of compounds in a lab, which is incredibly slow, expensive, and often misses the mark. C2S-Scale allows us to perform these massive screening experiments in silico — inside the computer — at a scale and speed that would be impossible in the real world. This shows AI can be a powerful accelerator for science.

This doesn’t replace scientists, but it empowers them. It allows us to rapidly identify and prioritise the most promising and often non-obvious drug candidates. By narrowing the search space, AI can help researchers focus their lab experiments where they’re most likely to succeed, dramatically shortening the timeline from an initial idea to a potential new therapy.

This gets to the heart of our multimodal approach. During its training, C2S-Scale wasn’t just fed raw cell sentences. It saw them alongside the human-generated context they came from — things like scientific annotations, tissue and disease labels, and even summaries from the research papers where the data was published.

By being trained on this rich mixture of biological data and natural language simultaneously, the model learns to connect the dots. It understands that a certain pattern of genes is not just a list, but corresponds to a “T-cell in a kidney from a patient with this disease,” as described in a scientific abstract. This ability to bridge the world of cellular data with the world of human knowledge is what allows it to generate novel hypotheses.