URL: https://gpuopen.com/orochi/

Orochi - AMD GPUOpen


Support both HIP and CUDA® with ease

The Orochi library loads HIP and CUDA® APIs dynamically, allowing you to switch between them at runtime. Orochi is named after a legendary Japanese dragon with eight heads and eight tails on a single body. In keeping with its namesake, Orochi enables a single library to use multiple backends at runtime.

Download the latest version - v2.0

This release adds the following features:

  • Supports many more CUDA/HIP functions than Orochi 1; coverage should be almost exhaustive.
  • We will keep one branch per CUDA/HIP version (example branch name: release/hip5.7_cuda12.2),
    so developers can switch branches depending on their environment.
    If you need a combination that doesn’t exist, open an issue on the project’s GitHub.
  • Change compared to Orochi 1: you need to install the CUDA SDK corresponding to the branch you are using.
    For example, if you use branch release/hip5.7_cuda12.2, install CUDA SDK 12.2.
    However, CUDA is still loaded dynamically at runtime; only the SDK’s include files are used at compile time.
  • New demo for textures.
  • New demo for Direct3D® 12 interop.
  • Some refactoring and improvement of OrochiUtils.
  • Orochi.h can be included in the kernel files to have the oro* names.
  • The HIP/CUDA binding and naming have been reworked so that they should be easier to maintain in future versions.
  • Most of the Orochi/OrochiUtils API is unchanged, so updating a project from Orochi 1.0 to 2.0 should be straightforward.
  • We included an experimental high-performance radix sort; we plan to publish the details in the future.
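
The branch naming scheme above encodes which CUDA SDK to install. As a minimal sketch, assuming every branch follows the release/hip<version>_cuda<version> pattern of the one example given, the versions could be extracted like this. The parseBranch helper and the BranchVersions struct are hypothetical illustration names and are not part of Orochi.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper type (not part of Orochi): versions encoded in a branch name.
struct BranchVersions {
    std::string hip;   // e.g. "5.7"
    std::string cuda;  // e.g. "12.2"
};

// Parses names of the assumed form "release/hip<HIP>_cuda<CUDA>".
// Returns std::nullopt when the name does not follow that pattern.
std::optional<BranchVersions> parseBranch(const std::string& branch) {
    const std::string prefix = "release/hip";
    const std::string sep = "_cuda";

    // The name must start with the prefix.
    if (branch.compare(0, prefix.size(), prefix) != 0) return std::nullopt;

    // The separator must appear somewhere after the prefix.
    const std::size_t sepPos = branch.find(sep, prefix.size());
    if (sepPos == std::string::npos) return std::nullopt;

    BranchVersions v;
    v.hip = branch.substr(prefix.size(), sepPos - prefix.size());
    v.cuda = branch.substr(sepPos + sep.size());

    // Both version strings must be non-empty.
    if (v.hip.empty() || v.cuda.empty()) return std::nullopt;
    return v;
}
```

For the branch release/hip5.7_cuda12.2, the cuda field comes out as 12.2, which is the CUDA SDK version the release notes say to install.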

Features

  • No need to compile two separate implementations for HIP and CUDA.
  • Compile and maintain a single binary that can run on both AMD and NVIDIA® GPUs.
  • Dynamically load the corresponding HIP/CUDA shared libraries depending on your platform.
  • Combines the functionality offered by both HIPEW and CUEW into a single library.
  • No need to link to CUDA (for the driver APIs) or HIP (for both driver and runtime APIs) at build time.
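
One common way to get a single binary that serves several backends is a table of function pointers filled in at runtime. The sketch below illustrates that general idea only; it is not Orochi's actual implementation. The fake_hip and fake_cuda functions are hypothetical stand-ins for entry points that a real loader would resolve from the vendor's shared library, and every name here is invented for illustration.

```cpp
#include <string>

// Hypothetical stand-ins for two vendor backends. In a real loader these
// would be function pointers resolved from the HIP or CUDA shared library.
namespace fake_hip  { inline const char* deviceName() { return "fake AMD device"; } }
namespace fake_cuda { inline const char* deviceName() { return "fake NVIDIA device"; } }

enum class Backend { Hip, Cuda };

// One table of function pointers shared by all backends.
struct DispatchTable {
    const char* (*deviceName)() = nullptr;
};

// Fills the table once, at runtime, for the chosen backend.
inline DispatchTable makeTable(Backend b) {
    DispatchTable t;
    t.deviceName = (b == Backend::Hip) ? &fake_hip::deviceName
                                       : &fake_cuda::deviceName;
    return t;
}

// The application calls one backend-neutral function; the table forwards it.
inline std::string exampleGetDeviceName(const DispatchTable& t) {
    return t.deviceName ? t.deviceName() : "no backend loaded";
}
```

Because the application code only ever calls the backend-neutral function, the choice between backends is a runtime value and needs no second build.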

Requirements

To run an application compiled with Orochi, install a driver that provides the .dll/.so files matching the GPU(s) available. Orochi automatically loads the corresponding shared library at runtime.
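
To illustrate what loading a shared library at runtime involves, here is a minimal POSIX-only sketch that probes a list of candidate libraries with dlopen and reports the first one the loader can open. It is not Orochi's code; the firstLoadable helper is a hypothetical name, and on Windows the equivalent call would be LoadLibrary.

```cpp
#include <cstddef>
#include <dlfcn.h>   // POSIX dlopen/dlclose
#include <string>
#include <vector>

// Returns the index of the first library in `candidates` that the dynamic
// loader can open, or -1 if none can be opened.
int firstLoadable(const std::vector<std::string>& candidates) {
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        void* handle = dlopen(candidates[i].c_str(), RTLD_NOW | RTLD_LOCAL);
        if (handle != nullptr) {
            // This sketch only probes; a real loader would keep the handle
            // and resolve function pointers from it with dlsym.
            dlclose(handle);
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A caller might pass names such as libamdhip64.so and libcuda.so.1, which are assumed here to be the typical Linux driver library names and may differ on your system. A result of -1 means no matching driver library is installed.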

Related software

HIP Ray Tracing
HIP RT is a ray tracing library for HIP, making it easy to write ray tracing applications in HIP.
Radeon™ ProRender Suite
AMD Radeon™ ProRender is our fast, easy, and incredible physically-based rendering engine built on industry standards that enables accelerated rendering on virtually any GPU, any CPU, and any OS in over a dozen leading digital content creation and CAD applications.
AMD Radeon™ ProRender SDK
AMD Radeon™ ProRender SDK is a powerful physically-based path traced rendering engine that enables creative professionals to produce stunningly photorealistic images.
Radeon™ Rays
The lightweight accelerated ray intersection library for DirectX®12 and Vulkan®.