Build Your Own Transaction Foundation Model for Financial Intelligence
Every swipe, transfer, and payment on a modern financial network encodes a pattern of human behavior. Transaction data is one of the richest signals an...
Reducing CUDA Binary Size to Distribute cuML on PyPI
Starting with the 25.10 release, pip-installable cuML wheels can now be downloaded directly from PyPI. No more complex installation steps or managing Conda...
How to GPU-Accelerate Model Training with CUDA-X Data Science
In previous posts on AI in manufacturing and operations, we covered the unique data challenges in the supply chain and how smart feature engineering can...
How to Accelerate Community Detection in Python Using GPU-Powered Leiden
Community detection algorithms play an important role in understanding data by identifying hidden groups of related entities in networks. Social network...
The Kaggle Grandmasters Playbook: 7 Battle-Tested Modeling Techniques for Tabular Data
Over hundreds of Kaggle competitions, we've refined a playbook that consistently lands us near the top of the leaderboard—no matter if we’re working with...
NVIDIA RAPIDS 25.08 Adds New Profiler for cuML, Updates to the Polars GPU Engine, Additional Algorithm Support, and More
The 25.08 release of RAPIDS continues to push the boundaries toward making accelerated data science more accessible and scalable with the addition of several...
How to Spot (and Fix) 5 Common Performance Bottlenecks in pandas Workflows
Slow data loads, memory-intensive joins, and long-running operations—these are problems every Python practitioner has faced. They waste valuable time and make...
NVIDIA Hardware Innovations and Open Source Contributions Are Shaping AI
Open source AI models such as Cosmos, DeepSeek, Gemma, GPT-OSS, Llama, Nemotron, Phi, Qwen, and many more are the foundation of AI innovation. These models are...
Efficient Transforms in cuDF Using JIT Compilation
RAPIDS cuDF offers a broad set of ETL algorithms for processing data with GPUs. For pandas users, cuDF accelerated algorithms are available with the zero code...
Train with Terabyte-Scale Datasets on a Single NVIDIA Grace Hopper Superchip Using XGBoost 3.0
Gradient-boosted decision trees (GBDTs) power everything from real-time fraud filters to petabyte-scale demand forecasts. XGBoost open source library has long...
7 Drop-In Replacements to Instantly Speed Up Your Python Data Science Workflows
You've been there. You wrote the perfect Python script, tested it on a sample CSV, and everything worked flawlessly. But when you unleashed it on the full 10...
Serverless Distributed Data Processing with Apache Spark and NVIDIA AI on Azure
The process of converting vast libraries of text into numerical representations known as embeddings is essential for generative AI. Various technologies—from...
3 pandas Workflows That Slowed to a Crawl on Large Datasets—Until We Turned on GPUs
If you work with pandas, you’ve probably hit the wall. It’s that moment when your trusty workflow, so elegant on smaller datasets, grinds to a halt on a large...
Feature Engineering at Scale: Optimizing ML Models in Semiconductor Manufacturing with NVIDIA CUDA‑X Data Science
In our previous post, we introduced the setup of predictive modeling in chip manufacturing and operations, highlighting common challenges such as imbalanced...
RAPIDS Adds GPU Polars Streaming, a Unified GNN API, and Zero-Code ML Speedups
RAPIDS, a suite of NVIDIA CUDA-X libraries for Python data science, released version 25.06, introducing exciting new features. These include a Polars GPU...
How to Work with Data Exceeding VRAM in the Polars GPU Engine
In high-stakes fields such as quant finance, algorithmic trading, and fraud detection, data practitioners frequently need to process hundreds of gigabytes (GB)...