This is an archive of the discontinued LLVM Phabricator instance.

copying the known-working openmp driver from rocm into the trunk source tree
deleting lots of stuff that didn't look necessary
deleting some stuff that is broadly necessary, but the specifics are up for debate

The idea is to move language-independent but amdgcn-specific code into ROCMToolChain. Some has already gone in, others (like computeMSVCVersion) will likely move too.

Regarding the rest of the end to end stack:

host plugin works, same code in trunk / rocm / aomp
device plugin will work once it's building as openmp, modulo printf and malloc
compiler backend will work for spmd kernels today, will work for generic kernels after D94648 or equivalent lands

Regarding tests (which need the unimplemented bit filled in with the next patch):

Runtime tests (for spmd kernels) are working locally with a jury rigged devicertl
Codegen tests are proving awkward to update. Dropping line number would help, but there's still a difference in addrspace cast distribution. I'm hoping the scripts involved in generating the nvptx cases can be adapted.

Harbormaster completed remote builds in B85692: Diff 317514.Jan 19 2021, 4:31 AM

pdhaliwal updated this revision to Diff 317553.Jan 19 2021, 6:56 AM

Comment Actions

Fix clang-tidy error

jdoerfert added a comment.Jan 19 2021, 6:59 AM

Comment Actions

Won't this just prevent us from building clang due to the missing cmake changes? We need somewhat testable chunks.

ABataev added a comment.Jan 19 2021, 6:59 AM

Comment Actions

In D94961#2506460, @JonChesterfield wrote:

This patch was written, roughly, by:

copying the known-working openmp driver from rocm into the trunk source tree

deleting lots of stuff that didn't look necessary

deleting some stuff that is broadly necessary, but the specifics are up for debate

The idea is to move language-independent but amdgcn-specific code into ROCMToolChain. Some has already gone in, others (like computeMSVCVersion) will likely move too.

Regarding the rest of the end to end stack:

host plugin works, same code in trunk / rocm / aomp

device plugin will work once it's building as openmp, modulo printf and malloc

compiler backend will work for spmd kernels today, will work for generic kernels after D94648 or equivalent lands

Regarding tests (which need the unimplemented bit filled in with the next patch):

Runtime tests (for spmd kernels) are working locally with a jury rigged devicertl

Codegen tests are proving awkward to update. Dropping line number would help, but there's still a difference in addrspace cast distribution. I'm hoping the scripts involved in generating the nvptx cases can be adapted.

Could you add the tests for the tools invocation to check that the newly added classes/functions correctly translate options/flags?

Harbormaster completed remote builds in B85714: Diff 317553.Jan 19 2021, 7:43 AM

pdhaliwal updated this revision to Diff 317810.Jan 20 2021, 1:49 AM

Comment Actions

Won't this just prevent us from building clang due to the missing cmake changes?

It compiles and builds fine, however, I wasn't actually aware such sanity checking being present. It turns out
the unknown files inside llvm/ will lead cmake to report error but such reporting will not happen inside clang. Maybe such checks
were not enabled inside clang. Anyways thanks for pointing out. I will keep that in mind in future.

The idea for this patch was basically to introduce AMDGPUToolChain classes without much of the functionality in order
to keep its size in check. And the second patch would have integrated the toolchain with driver along with testing.
But during the intermediate time of the two patches, bare files would have existed (never built and tested).

I have updated this patch to now include somewhat functional driver along with tests.

Herald added a subscriber: mgorny. · View Herald TranscriptJan 20 2021, 1:49 AM

pdhaliwal retitled this revision from [OpenMP] Add OpenMP offloading toolchain skeleton for AMDGPU to [OpenMP] Add OpenMP offloading toolchain for AMDGPU.Jan 20 2021, 2:16 AM

pdhaliwal edited the summary of this revision. (Show Details)

Harbormaster completed remote builds in B85856: Diff 317810.Jan 20 2021, 2:22 AM

pdhaliwal updated this revision to Diff 317831.Jan 20 2021, 4:09 AM

Comment Actions

Fixed failing debian tests

Harbormaster completed remote builds in B85860: Diff 317831.Jan 20 2021, 4:41 AM

pdhaliwal updated this revision to Diff 318120.Edited · Jan 20 2021, 11:51 PM

Comment Actions

Moved common methods of HIP and OpenMP to base AMDGPUToolChain
Removed unnecessary asserts

pdhaliwal added a reviewer: yaxunl.Jan 20 2021, 11:53 PM

Harbormaster completed remote builds in B86047: Diff 318120.Jan 21 2021, 1:17 AM

JonChesterfield added a comment.Jan 25 2021, 5:39 PM

Comment Actions

The deviceRTL is changing from cuda/hip to openmp at present. It would be good to be able to compile that as openmp for amdgpu, which needs a patch roughly like this and probably some follow up.

It's plausible that the contents of this file are only really of interest to AMD people, in which case we should probably land it in order to iterate in tree instead of downstream. It now has the cmake hook and a proof of concept test case.

@ABataev @grokos @jdoerfert your guidance would be very welcome, but equally if you'd prefer leave us to sink or swim that's completely reasonable too.

Thanks!

pdhaliwal added a reviewer: saiislam.Jan 27 2021, 6:42 AM

pdhaliwal added a comment.Feb 1 2021, 2:08 AM

Comment Actions

Ping!

saiislam added a comment.Feb 1 2021, 3:43 AM

Comment Actions

LGTM. But, please wait from someone outside AMD to accept it.

jdoerfert added a comment.Feb 1 2021, 8:01 AM

Comment Actions

Overall, looks reasonable. I'm not a fan of the way things are hooked up but that is also true for the NVPTX toolchain and for now unavoidable I guess. Some nits below.

clang/lib/Driver/Driver.cpp
2995	Why the copy?
clang/lib/Driver/ToolChains/AMDGPU.h
72	What is the coding style here? Pick upper case or lower case, form the above I assume upper case.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
66	To verify, O0 is not mapped to O2, correct?

JonChesterfield added a comment.Feb 1 2021, 8:12 AM

Comment Actions

In D94961#2533848, @jdoerfert wrote:

Overall, looks reasonable. I'm not a fan of the way things are hooked up but that is also true for the NVPTX toolchain and for now unavoidable I guess.

This is loosely based on the nvptx and hip toolchain files, but looking at others in clang the style seems similar.

clang/lib/Driver/Driver.cpp
2995	As in ? Slightly prettier. Not sure ad hoc arg parsing like this is ever good though
clang/lib/Driver/ToolChains/AMDGPU.h
72	Yeah, should be as it's a function. Should probably propose that as a minimal patch to the existing code. There may be other casing mistakes. ConstructJob looks suspect, but all the files call it that instead of constructJob, so maybe I'm missing a part of the style guide.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
66	I think is an error, though it does look like this will rewrite it to O2 instead. Which seems bad.

JonChesterfield added inline comments.Feb 1 2021, 8:21 AM

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
66	Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are passed to clang as a single invocation, but I don't see why we need an opt invocation here. Passing the llvm-link'ed code to llc as-is, without this implicit lto style opt invocation, is probably a better default.

JonChesterfield added a comment.Feb 1 2021, 9:46 AM

Comment Actions

I believe this is 'safe', in the sense that it can't break anything other than AMDGPU's openmp implementation, which doesn't work yet anyway. The change to non-amd code is to OpenMPActionBuilder, and while parsing isn't pretty, it won't fire on nvptx. I'm not particularly confident it is correct, but it's hard to do comprehensive testing before there's an end to end spike. I'd like to iterate in tree, with an expectation that parts of this will need to change - @jdoerfert does the OpenMP patch look adequate for now?

clang/lib/Driver/Driver.cpp
752	This follows the same logic as nvptx so is likely to be correct

ronlieb mentioned this in D95752: [OpenMP][DeviceRTL] Extract shuffle idiom and port it to declare variant.Feb 1 2021, 5:01 PM

jdoerfert added a comment.Feb 1 2021, 6:00 PM

Comment Actions

Generally OK, the more I look at it the more things pop up. Other than my comments, I'm fine with this going in.

clang/lib/Driver/Driver.cpp
764	what if we say in line 745, rename the Cuda things to "device" and the toolchain stuff is in a conditional? I see the triple is hardcoded here for some reason and I don't know why not to use TT. Neverhteless, it seems less copying is better, we will eventually have more architectures and 3 * 13 lines means 3 times the bugs and 3 times the required changes in the future.
3000	This does not look right. IsAMDGCN is not a question that should be answered by a flag. In general, why skip these passes here. Isn't what you really want to do, pass to the device compilation job?
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
67	Just to make it clear, we should default to O0.
clang/test/Driver/amdgpu-openmp-toolchain.c
3	Would this at all work without the march flag?

pdhaliwal updated this revision to Diff 320722.Feb 2 2021, 2:39 AM

Comment Actions

Addressed review comments.

Combined the toolchain creation logic for nvptx and amdgcn
Replaced -Xopenmp-target with -emit-llvm-bc inside AMDGPUOpenMP.cpp
Removed opt from pipeline

pdhaliwal marked 3 inline comments as done.Feb 2 2021, 2:47 AM

Comment Actions

After addressing the review comments, I have internally verified changes on few simple test programs. They seem to be working fine.

clang/lib/Driver/Driver.cpp
3000	I have removed the logic from here instead now I have added -emit-llvm-bc using addClangTargetOptions in AMDGPUOpenMP.
clang/test/Driver/amdgpu-openmp-toolchain.c
3	This would not work without -march flag as amdgcn is not forward/backward compatible and there is currently no way to detect system gpu arch.

JonChesterfield added inline comments.Feb 2 2021, 3:03 AM

clang/test/Driver/amdgpu-openmp-toolchain.c
3	The command line invocation openmp needs could be friendlier. Auto-detecting march is convenient in general but probably bad for clang tests as we want them to run on systems that don't have an amdgpu installed.

JonChesterfield added a comment.Feb 2 2021, 3:13 AM

Comment Actions

Looks much simpler, thanks!

clang/lib/Driver/Driver.cpp
755	Minor suggestion, if (TT.isNVPTX() { ... } else { assert(TT.isAMDGCN()); ... } ? Semantically equivalent, but looks like there is a missing else clause.
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
44	Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably the same logic in another toolchain
65	Default("0") and update the comment 'we map anything else'

Harbormaster completed remote builds in B87497: Diff 320722.Feb 2 2021, 3:53 AM

pdhaliwal updated this revision to Diff 320740.Feb 2 2021, 3:58 AM

Comment Actions

Use 0 for default -O option
Rename addOptLevelArgs to addLLCOptArg

clang/lib/Driver/Driver.cpp
755	I think, having assert in last else is bit cleaner. What do you think?

JonChesterfield added inline comments.Feb 2 2021, 4:02 AM

clang/lib/Driver/Driver.cpp
755	I also prefer the if () else {assert()} construct.

Harbormaster completed remote builds in B87505: Diff 320740.Feb 2 2021, 4:45 AM

jdoerfert accepted this revision.Feb 2 2021, 10:50 AM

Comment Actions

fine with me, assuming this doesn't disturb anything outside the AMD pipeline.

This revision is now accepted and ready to land.Feb 2 2021, 10:50 AM

Closed by commit rGfcf03e728007: [OpenMP] Add OpenMP offloading toolchain for AMDGPU (authored by pdhaliwal). · Explain WhyFeb 2 2021, 9:43 PM

This revision was automatically updated to reflect the committed changes.

pdhaliwal added a commit: rGfcf03e728007: [OpenMP] Add OpenMP offloading toolchain for AMDGPU.

Revision Contents

Path

Size

clang/

lib/

Driver/

CMakeLists.txt

1 line

Driver.cpp

24 lines

ToolChains/

7 lines

106 lines

262 lines

9 lines

test/

Driver/

amdgpu-openmp-toolchain.c

36 lines

Diff 320999

clang/lib/Driver/CMakeLists.txt

Show All 30 Lines	add_clang_library(clangDriver
ToolChains/Arch/RISCV.cpp	ToolChains/Arch/RISCV.cpp
ToolChains/Arch/Sparc.cpp	ToolChains/Arch/Sparc.cpp
ToolChains/Arch/SystemZ.cpp	ToolChains/Arch/SystemZ.cpp
ToolChains/Arch/VE.cpp	ToolChains/Arch/VE.cpp
ToolChains/Arch/X86.cpp	ToolChains/Arch/X86.cpp
ToolChains/AIX.cpp	ToolChains/AIX.cpp
ToolChains/Ananas.cpp	ToolChains/Ananas.cpp
ToolChains/AMDGPU.cpp	ToolChains/AMDGPU.cpp
ToolChains/AMDGPUOpenMP.cpp
ToolChains/AVR.cpp	ToolChains/AVR.cpp
ToolChains/BareMetal.cpp	ToolChains/BareMetal.cpp
ToolChains/Clang.cpp	ToolChains/Clang.cpp
ToolChains/CloudABI.cpp	ToolChains/CloudABI.cpp
ToolChains/CommonArgs.cpp	ToolChains/CommonArgs.cpp
ToolChains/Contiki.cpp	ToolChains/Contiki.cpp
ToolChains/CrossWindows.cpp	ToolChains/CrossWindows.cpp
ToolChains/Cuda.cpp	ToolChains/Cuda.cpp
Show All 40 Lines

clang/lib/Driver/Driver.cpp

//===--- Driver.cpp - Clang GCC Compatible Driver -------------------------===//	//===--- Driver.cpp - Clang GCC Compatible Driver -------------------------===//
//	//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.	// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//	//
//===----------------------------------------------------------------------===//	//===----------------------------------------------------------------------===//
#include "clang/Driver/Driver.h"	#include "clang/Driver/Driver.h"
#include "InputInfo.h"	#include "InputInfo.h"
#include "ToolChains/AIX.h"	#include "ToolChains/AIX.h"
#include "ToolChains/AMDGPU.h"	#include "ToolChains/AMDGPU.h"
#include "ToolChains/AMDGPUOpenMP.h"
#include "ToolChains/AVR.h"	#include "ToolChains/AVR.h"
#include "ToolChains/Ananas.h"	#include "ToolChains/Ananas.h"
#include "ToolChains/BareMetal.h"	#include "ToolChains/BareMetal.h"
#include "ToolChains/Clang.h"	#include "ToolChains/Clang.h"
#include "ToolChains/CloudABI.h"	#include "ToolChains/CloudABI.h"
#include "ToolChains/Contiki.h"	#include "ToolChains/Contiki.h"
#include "ToolChains/CrossWindows.h"	#include "ToolChains/CrossWindows.h"
#include "ToolChains/Cuda.h"	#include "ToolChains/Cuda.h"
▲ Show 20 Lines • Show All 713 Lines • ▼ Show 20 Lines	if (OpenMPTargets->getNumValues()) {
// following iterations.	// following iterations.
FoundNormalizedTriples[NormalizedName] = Val;	FoundNormalizedTriples[NormalizedName] = Val;
// If the specified target is invalid, emit a diagnostic.	// If the specified target is invalid, emit a diagnostic.
if (TT.getArch() == llvm::Triple::UnknownArch)	if (TT.getArch() == llvm::Triple::UnknownArch)
Diag(clang::diag::err_drv_invalid_omp_target) << Val;	Diag(clang::diag::err_drv_invalid_omp_target) << Val;
else {	else {
const ToolChain *TC;	const ToolChain *TC;
// CUDA toolchains have to be selected differently. They pair host	// Device toolchains have to be selected differently. They pair host
// and device in their implementation.	// and device in their implementation.
if (TT.isNVPTX()) {	if (TT.isNVPTX()\|\|TT.isAMDGCN()) {
const ToolChain *HostTC =	const ToolChain *HostTC =
C.getSingleOffloadToolChain<Action::OFK_Host>();	C.getSingleOffloadToolChain<Action::OFK_Host>();
assert(HostTC && "Host toolchain should be always defined.");	assert(HostTC && "Host toolchain should be always defined.");
auto &CudaTC =	auto &DeviceTC =
ToolChains[TT.str() + "/" + HostTC->getTriple().normalize()];	ToolChains[TT.str() + "/" + HostTC->getTriple().normalize()];
if (!CudaTC)	if (!DeviceTC){
CudaTC=std::make_unique<toolchains::CudaToolChain>(	if(TT.isNVPTX())
JonChesterfieldUnsubmitted Not Done This follows the same logic as nvptx so is likely to be correct JonChesterfield: This follows the same logic as nvptx so is likely to be correct
DeviceTC = std::make_unique<toolchains::CudaToolChain>(
this, TT, HostTC, C.getInputArgs(), Action::OFK_OpenMP);	this, TT, HostTC, C.getInputArgs(), Action::OFK_OpenMP);
TC=CudaTC.get();	elseif(TT.isAMDGCN())
JonChesterfieldUnsubmitted Not Done Minor suggestion, if (TT.isNVPTX() { ... } else { assert(TT.isAMDGCN()); ... } ? Semantically equivalent, but looks like there is a missing else clause. JonChesterfield: Minor suggestion, ``` if (TT.isNVPTX() { ... } else { assert(TT.isAMDGCN()); ... }``` ?
pdhaliwalAuthorUnsubmitted Done I think, having assert in last else is bit cleaner. What do you think? pdhaliwal: I think, having assert in last else is bit cleaner. What do you think?
JonChesterfieldUnsubmitted Not Done I also prefer the if () else {assert()} construct. JonChesterfield: I also prefer the if () else {assert()} construct.
DeviceTC =
std::make_unique<toolchains::AMDGPUOpenMPToolChain>(
this, TT, HostTC, C.getInputArgs());
else
assert(DeviceTC && "Device toolchain not defined.");
}
TC = DeviceTC.get();
} else	} else
jdoerfertUnsubmitted Done what if we say in line 745, rename the Cuda things to "device" and the toolchain stuff is in a conditional? I see the triple is hardcoded here for some reason and I don't know why not to use TT. Neverhteless, it seems less copying is better, we will eventually have more architectures and 3 * 13 lines means 3 times the bugs and 3 times the required changes in the future. jdoerfert: what if we say `TT.isNVPTX() \|\| TT.isAMDGCN()` in line 745, rename the Cuda things to "device"…
TC = &getToolChain(C.getInputArgs(), TT);	TC = &getToolChain(C.getInputArgs(), TT);
C.addOffloadDeviceToolChain(TC, Action::OFK_OpenMP);	C.addOffloadDeviceToolChain(TC, Action::OFK_OpenMP);
}	}
}	}
} else	} else
Diag(clang::diag::err_drv_expecting_fopenmp_with_fopenmp_targets);	Diag(clang::diag::err_drv_expecting_fopenmp_with_fopenmp_targets);
} else	} else
Diag(clang::diag::warn_drv_empty_joined_argument)	Diag(clang::diag::warn_drv_empty_joined_argument)
▲ Show 20 Lines • Show All 2,214 Lines • ▼ Show 20 Lines	getDeviceDependences(OffloadAction::DeviceDependences &DA,
"Number of OpenMP actions and toolchains do not match.");	"Number of OpenMP actions and toolchains do not match.");
// The host only depends on device action in the linking phase, when all	// The host only depends on device action in the linking phase, when all
// the device images have to be embedded in the host image.	// the device images have to be embedded in the host image.
if (CurPhase == phases::Link) {	if (CurPhase == phases::Link) {
assert(ToolChains.size() == DeviceLinkerInputs.size() &&	assert(ToolChains.size() == DeviceLinkerInputs.size() &&
"Toolchains and linker inputs sizes do not match.");	"Toolchains and linker inputs sizes do not match.");
auto LI = DeviceLinkerInputs.begin();	auto LI = DeviceLinkerInputs.begin();
for (auto *A : OpenMPDeviceActions) {	for (auto *A : OpenMPDeviceActions) {
jdoerfertUnsubmitted Not Done Why the copy? jdoerfert: Why the `substr` copy?
JonChesterfieldUnsubmitted Not Done As in ? Slightly prettier. Not sure ad hoc arg parsing like this is ever good though JonChesterfield: As in `VStr.startswith("-march=gfx")`? Slightly prettier. Not sure ad hoc arg parsing like this…
LI->push_back(A);	LI->push_back(A);
++LI;	++LI;
}	}
// We passed the device action as a host dependence, so we don't need to	// We passed the device action as a host dependence, so we don't need to
jdoerfertUnsubmitted Not Done This does not look right. IsAMDGCN is not a question that should be answered by a flag. In general, why skip these passes here. Isn't what you really want to do, pass to the device compilation job? jdoerfert: This does not look right. IsAMDGCN is not a question that should be answered by a `-Xopenmp…
pdhaliwalAuthorUnsubmitted Done I have removed the logic from here instead now I have added -emit-llvm-bc using addClangTargetOptions in AMDGPUOpenMP. pdhaliwal: I have removed the logic from here instead now I have added -emit-llvm-bc using…
// do anything else with them.	// do anything else with them.
OpenMPDeviceActions.clear();	OpenMPDeviceActions.clear();
return ABRT_Success;	return ABRT_Success;
}	}
// By default, we produce an action for each device arch.	// By default, we produce an action for each device arch.
for (Action *&A : OpenMPDeviceActions)	for (Action *&A : OpenMPDeviceActions)
A = C.getDriver().ConstructPhaseAction(C, Args, CurPhase, A);	A = C.getDriver().ConstructPhaseAction(C, Args, CurPhase, A);
▲ Show 20 Lines • Show All 2,391 Lines • Show Last 20 Lines

clang/lib/Driver/ToolChains/AMDGPU.h

Show First 20 Lines • Show All 58 Lines • ▼ Show 20 Lines
public:	public:
AMDGPUToolChain(const Driver &D, const llvm::Triple &Triple,	AMDGPUToolChain(const Driver &D, const llvm::Triple &Triple,
const llvm::opt::ArgList &Args);	const llvm::opt::ArgList &Args);
unsigned GetDefaultDwarfVersion() const override { return 4; }	unsigned GetDefaultDwarfVersion() const override { return 4; }
bool IsIntegratedAssemblerDefault() const override { return true; }	bool IsIntegratedAssemblerDefault() const override { return true; }
bool IsMathErrnoDefault() const override { return false; }	bool IsMathErrnoDefault() const override { return false; }
bool useIntegratedAs() const override { return true; }
bool isCrossCompiling() const override { return true; }
bool isPICDefault() const override { return false; }
bool isPIEDefault() const override { return false; }
bool isPICDefaultForced() const override { return false; }
bool SupportsProfiling() const override { return false; }
jdoerfertUnsubmitted Not Done What is the coding style here? Pick upper case or lower case, form the above I assume upper case. jdoerfert: What is the coding style here? Pick upper case or lower case, form the above I assume upper…
JonChesterfieldUnsubmitted Not Done Yeah, should be as it's a function. Should probably propose that as a minimal patch to the existing code. There may be other casing mistakes. ConstructJob looks suspect, but all the files call it that instead of constructJob, so maybe I'm missing a part of the style guide. JonChesterfield: Yeah, should be `supportsProfiling()` as it's a function. Should probably propose that as a…
llvm::opt::DerivedArgList *	llvm::opt::DerivedArgList *
TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,	TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const override;	Action::OffloadKind DeviceOffloadKind) const override;
void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,	void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,	llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadKind) const override;	Action::OffloadKind DeviceOffloadKind) const override;
▲ Show 20 Lines • Show All 44 Lines • Show Last 20 Lines

clang/lib/Driver/ToolChains/AMDGPUOpenMP.h

This file was added.

//===- AMDGPUOpenMP.h - AMDGPUOpenMP ToolChain Implementation -- C++ ----===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H
#define LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H
#include "AMDGPU.h"
#include "clang/Driver/Tool.h"
#include "clang/Driver/ToolChain.h"
namespace clang {
namespace driver {
namespace tools {
namespace AMDGCN {
// Runs llvm-link/opt/llc/lld, which links multiple LLVM bitcode, together with
// device library, then compiles it to ISA in a shared object.
class LLVM_LIBRARY_VISIBILITY OpenMPLinker : public Tool {
public:
OpenMPLinker(const ToolChain &TC)
: Tool("AMDGCN::OpenMPLinker", "amdgcn-link", TC) {}
bool hasIntegratedCPP() const override { return false; }
void ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output, const InputInfoList &Inputs,
const llvm::opt::ArgList &TCArgs,
const char *LinkingOutput) const override;
private:
/// \return llvm-link output file name.
const char *constructLLVMLinkCommand(Compilation &C, const JobAction &JA,
const InputInfoList &Inputs,
const llvm::opt::ArgList &Args,
llvm::StringRef SubArchName,
llvm::StringRef OutputFilePrefix) const;
/// \return llc output file name.
const char *constructLlcCommand(Compilation &C, const JobAction &JA,
const InputInfoList &Inputs,
const llvm::opt::ArgList &Args,
llvm::StringRef SubArchName,
llvm::StringRef OutputFilePrefix,
const char *InputFileName,
bool OutputIsAsm = false) const;
void constructLldCommand(Compilation &C, const JobAction &JA,
const InputInfoList &Inputs, const InputInfo &Output,
const llvm::opt::ArgList &Args,
const char *InputFileName) const;
};
} // end namespace AMDGCN
} // end namespace tools
namespace toolchains {
class LLVM_LIBRARY_VISIBILITY AMDGPUOpenMPToolChain final
: public ROCMToolChain {
public:
AMDGPUOpenMPToolChain(const Driver &D, const llvm::Triple &Triple,
const ToolChain &HostTC,
const llvm::opt::ArgList &Args);
const llvm::Triple *getAuxTriple() const override {
return &HostTC.getTriple();
}
llvm::opt::DerivedArgList *
TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const override;
void
addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadKind) const override;
void addClangWarningOptions(llvm::opt::ArgStringList &CC1Args) const override;
CXXStdlibType GetCXXStdlibType(const llvm::opt::ArgList &Args) const override;
void
AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const override;
void AddIAMCUIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const override;
SanitizerMask getSupportedSanitizers() const override;
VersionTuple
computeMSVCVersion(const Driver *D,
const llvm::opt::ArgList &Args) const override;
const ToolChain &HostTC;
protected:
Tool *buildLinker() const override;
};
} // end namespace toolchains
} // end namespace driver
} // end namespace clang
#endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

This file was added.

//===- AMDGPUOpenMP.cpp - AMDGPUOpenMP ToolChain Implementation -- C++ --===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#include "AMDGPUOpenMP.h"
#include "AMDGPU.h"
#include "CommonArgs.h"
#include "InputInfo.h"
#include "clang/Driver/Compilation.h"
#include "clang/Driver/Driver.h"
#include "clang/Driver/DriverDiagnostic.h"
#include "clang/Driver/Options.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Path.h"
using namespace clang::driver;
using namespace clang::driver::toolchains;
using namespace clang::driver::tools;
using namespace clang;
using namespace llvm::opt;
namespace {
static const char *getOutputFileName(Compilation &C, StringRef Base,
const char *Postfix,
const char *Extension) {
const char *OutputFileName;
if (C.getDriver().isSaveTempsEnabled()) {
OutputFileName =
C.getArgs().MakeArgString(Base.str() + Postfix + "." + Extension);
} else {
std::string TmpName =
C.getDriver().GetTemporaryPath(Base.str() + Postfix, Extension);
OutputFileName = C.addTempFile(C.getArgs().MakeArgString(TmpName));
}
return OutputFileName;
}
static void addLLCOptArg(const llvm::opt::ArgList &Args,
llvm::opt::ArgStringList &CmdArgs) {
JonChesterfieldUnsubmitted Not Done Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably the same logic in another toolchain JonChesterfield: Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably…
if (Arg *A = Args.getLastArg(options::OPT_O_Group)) {
StringRef OOpt = "0";
if (A->getOption().matches(options::OPT_O4) \|\|
A->getOption().matches(options::OPT_Ofast))
OOpt = "3";
else if (A->getOption().matches(options::OPT_O0))
OOpt = "0";
else if (A->getOption().matches(options::OPT_O)) {
// Clang and opt support -Os/-Oz; llc only supports -O0, -O1, -O2 and -O3
// so we map -Os/-Oz to -O2.
// Only clang supports -Og, and maps it to -O1.
// We map anything else to -O2.
OOpt = llvm::StringSwitch<const char *>(A->getValue())
.Case("1", "1")
.Case("2", "2")
.Case("3", "3")
.Case("s", "2")
.Case("z", "2")
.Case("g", "1")
.Default("0");
}
JonChesterfieldUnsubmitted Not Done Default("0") and update the comment 'we map anything else' JonChesterfield: Default("0") and update the comment 'we map anything else'
CmdArgs.push_back(Args.MakeArgString("-O" + OOpt));
jdoerfertUnsubmitted Not Done To verify, O0 is not mapped to O2, correct? jdoerfert: To verify, O0 is not mapped to O2, correct?
JonChesterfieldUnsubmitted Not Done I think is an error, though it does look like this will rewrite it to O2 instead. Which seems bad. JonChesterfield: I think `opt -O0` is an error, though it does look like this will rewrite it to O2 instead.
JonChesterfieldUnsubmitted Done Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are passed to clang as a single invocation, but I don't see why we need an opt invocation here. Passing the llvm-link'ed code to llc as-is, without this implicit lto style opt invocation, is probably a better default. JonChesterfield: Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are…
}
jdoerfertUnsubmitted Done Just to make it clear, we should default to O0. jdoerfert: Just to make it clear, we should default to O0.
}
} // namespace
const char *AMDGCN::OpenMPLinker::constructLLVMLinkCommand(
Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
const ArgList &Args, StringRef SubArchName,
StringRef OutputFilePrefix) const {
ArgStringList CmdArgs;
for (const auto &II : Inputs)
if (II.isFilename())
CmdArgs.push_back(II.getFilename());
// Add an intermediate output file.
CmdArgs.push_back("-o");
const char *OutputFileName =
getOutputFileName(C, OutputFilePrefix, "-linked", "bc");
CmdArgs.push_back(OutputFileName);
const char *Exec =
Args.MakeArgString(getToolChain().GetProgramPath("llvm-link"));
C.addCommand(std::make_unique<Command>(
JA, *this, ResponseFileSupport::AtFileCurCP(), Exec, CmdArgs, Inputs,
InputInfo(&JA, Args.MakeArgString(OutputFileName))));
return OutputFileName;
}
const char *AMDGCN::OpenMPLinker::constructLlcCommand(
Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
const llvm::opt::ArgList &Args, llvm::StringRef SubArchName,
llvm::StringRef OutputFilePrefix, const char *InputFileName,
bool OutputIsAsm) const {
// Construct llc command.
ArgStringList LlcArgs;
// The input to llc is the output from opt.
LlcArgs.push_back(InputFileName);
// Pass optimization arg to llc.
addLLCOptArg(Args, LlcArgs);
LlcArgs.push_back("-mtriple=amdgcn-amd-amdhsa");
LlcArgs.push_back(Args.MakeArgString("-mcpu=" + SubArchName));
LlcArgs.push_back(
Args.MakeArgString(Twine("-filetype=") + (OutputIsAsm ? "asm" : "obj")));
for (const Arg *A : Args.filtered(options::OPT_mllvm)) {
LlcArgs.push_back(A->getValue(0));
}
// Add output filename
LlcArgs.push_back("-o");
const char *LlcOutputFile =
getOutputFileName(C, OutputFilePrefix, "", OutputIsAsm ? "s" : "o");
LlcArgs.push_back(LlcOutputFile);
const char *Llc = Args.MakeArgString(getToolChain().GetProgramPath("llc"));
C.addCommand(std::make_unique<Command>(
JA, *this, ResponseFileSupport::AtFileCurCP(), Llc, LlcArgs, Inputs,
InputInfo(&JA, Args.MakeArgString(LlcOutputFile))));
return LlcOutputFile;
}
void AMDGCN::OpenMPLinker::constructLldCommand(
Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
const InputInfo &Output, const llvm::opt::ArgList &Args,
const char *InputFileName) const {
// Construct lld command.
// The output from ld.lld is an HSA code object file.
ArgStringList LldArgs{"-flavor", "gnu", "--no-undefined",
"-shared", "-o", Output.getFilename(),
InputFileName};
const char *Lld = Args.MakeArgString(getToolChain().GetProgramPath("lld"));
C.addCommand(std::make_unique<Command>(
JA, *this, ResponseFileSupport::AtFileCurCP(), Lld, LldArgs, Inputs,
InputInfo(&JA, Args.MakeArgString(Output.getFilename()))));
}
// For amdgcn the inputs of the linker job are device bitcode and output is
// object file. It calls llvm-link, opt, llc, then lld steps.
void AMDGCN::OpenMPLinker::ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output,
const InputInfoList &Inputs,
const ArgList &Args,
const char *LinkingOutput) const {
assert(getToolChain().getTriple().isAMDGCN() && "Unsupported target");
StringRef GPUArch = Args.getLastArgValue(options::OPT_march_EQ);
assert(GPUArch.startswith("gfx") && "Unsupported sub arch");
// Prefix for temporary file name.
std::string Prefix;
for (const auto &II : Inputs)
if (II.isFilename())
Prefix =
llvm::sys::path::stem(II.getFilename()).str() + "-" + GPUArch.str();
assert(Prefix.length() && "no linker inputs are files ");
// Each command outputs different files.
const char *LLVMLinkCommand =
constructLLVMLinkCommand(C, JA, Inputs, Args, GPUArch, Prefix);
const char *LlcCommand = constructLlcCommand(C, JA, Inputs, Args, GPUArch,
Prefix, LLVMLinkCommand);
constructLldCommand(C, JA, Inputs, Output, Args, LlcCommand);
}
AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain(const Driver &D,
const llvm::Triple &Triple,
const ToolChain &HostTC,
const ArgList &Args)
: ROCMToolChain(D, Triple, Args), HostTC(HostTC) {
// Lookup binaries into the driver directory, this is used to
// discover the clang-offload-bundler executable.
getProgramPaths().push_back(getDriver().Dir);
}
void AMDGPUOpenMPToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {
HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);
StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
assert(DeviceOffloadingKind == Action::OFK_OpenMP &&
"Only OpenMP offloading kinds are supported.");
CC1Args.push_back("-target-cpu");
CC1Args.push_back(DriverArgs.MakeArgStringRef(GpuArch));
CC1Args.push_back("-fcuda-is-device");
CC1Args.push_back("-emit-llvm-bc");
}
llvm::opt::DerivedArgList *AMDGPUOpenMPToolChain::TranslateArgs(
const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const {
DerivedArgList *DAL =
HostTC.TranslateArgs(Args, BoundArch, DeviceOffloadKind);
if (!DAL)
DAL = new DerivedArgList(Args.getBaseArgs());
const OptTable &Opts = getDriver().getOpts();
if (DeviceOffloadKind != Action::OFK_OpenMP) {
for (Arg *A : Args) {
DAL->append(A);
}
}
if (!BoundArch.empty()) {
DAL->eraseArg(options::OPT_march_EQ);
DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
BoundArch);
}
return DAL;
}
Tool *AMDGPUOpenMPToolChain::buildLinker() const {
assert(getTriple().isAMDGCN());
return new tools::AMDGCN::OpenMPLinker(*this);
}
void AMDGPUOpenMPToolChain::addClangWarningOptions(
ArgStringList &CC1Args) const {
HostTC.addClangWarningOptions(CC1Args);
}
ToolChain::CXXStdlibType
AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList &Args) const {
return HostTC.GetCXXStdlibType(Args);
}
void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs(
const ArgList &DriverArgs, ArgStringList &CC1Args) const {
HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args);
}
void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList &Args,
ArgStringList &CC1Args) const {
HostTC.AddIAMCUIncludeArgs(Args, CC1Args);
}
SanitizerMask AMDGPUOpenMPToolChain::getSupportedSanitizers() const {
// The AMDGPUOpenMPToolChain only supports sanitizers in the sense that it
// allows sanitizer arguments on the command line if they are supported by the
// host toolchain. The AMDGPUOpenMPToolChain will actually ignore any command
// line arguments for any of these "supported" sanitizers. That means that no
// sanitization of device code is actually supported at this time.
//
// This behavior is necessary because the host and device toolchains
// invocations often share the command line, so the device toolchain must
// tolerate flags meant only for the host toolchain.
return HostTC.getSupportedSanitizers();
}
VersionTuple
AMDGPUOpenMPToolChain::computeMSVCVersion(const Driver *D,
const ArgList &Args) const {
return HostTC.computeMSVCVersion(D, Args);
}

clang/lib/Driver/ToolChains/HIP.h

Show First 20 Lines • Show All 65 Lines • ▼ Show 20 Lines	public:
}	}
llvm::opt::DerivedArgList *	llvm::opt::DerivedArgList *
TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,	TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const override;	Action::OffloadKind DeviceOffloadKind) const override;
void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,	void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,	llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadKind) const override;	Action::OffloadKind DeviceOffloadKind) const override;
bool useIntegratedAs() const override { return true; }
bool isCrossCompiling() const override { return true; }
bool isPICDefault() const override { return false; }
bool isPIEDefault() const override { return false; }
bool isPICDefaultForced() const override { return false; }
bool SupportsProfiling() const override { return false; }
bool IsMathErrnoDefault() const override { return false; }
void addClangWarningOptions(llvm::opt::ArgStringList &CC1Args) const override;	void addClangWarningOptions(llvm::opt::ArgStringList &CC1Args) const override;
CXXStdlibType GetCXXStdlibType(const llvm::opt::ArgList &Args) const override;	CXXStdlibType GetCXXStdlibType(const llvm::opt::ArgList &Args) const override;
void	void
AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs,	AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const override;	llvm::opt::ArgStringList &CC1Args) const override;
void AddClangCXXStdlibIncludeArgs(	void AddClangCXXStdlibIncludeArgs(
const llvm::opt::ArgList &Args,	const llvm::opt::ArgList &Args,
llvm::opt::ArgStringList &CC1Args) const override;	llvm::opt::ArgStringList &CC1Args) const override;
Show All 24 Lines

clang/test/Driver/amdgpu-openmp-toolchain.c

This file was added.

// REQUIRES: amdgpu-registered-target
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \
// RUN: \| FileCheck %s
jdoerfertUnsubmitted Not Done Would this at all work without the march flag? jdoerfert: Would this at all work without the march flag?
pdhaliwalAuthorUnsubmitted Done This would not work without -march flag as amdgcn is not forward/backward compatible and there is currently no way to detect system gpu arch. pdhaliwal: This would not work without -march flag as amdgcn is not forward/backward compatible and there…
JonChesterfieldUnsubmitted Not Done The command line invocation openmp needs could be friendlier. Auto-detecting march is convenient in general but probably bad for clang tests as we want them to run on systems that don't have an amdgpu installed. JonChesterfield: The command line invocation openmp needs `-fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp…
// verify the tools invocations
// CHECK: clang{{.}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.}}"-x" "c"{{.*}}
// CHECK: clang{{.}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.}}"-x" "ir"{{.*}}
// CHECK: clang{{.}}"-cc1"{{.}}"-triple" "amdgcn-amd-amdhsa"{{.}}"-target-cpu" "gfx906" "-fcuda-is-device" "-emit-llvm-bc"{{.}}
// CHECK: llvm-link{{.}}"-o" "{{.}}amdgpu-openmp-toolchain-{{.}}-gfx906-linked-{{.}}.bc"
// CHECK: llc{{.}}amdgpu-openmp-toolchain-{{.}}-gfx906-linked-{{.}}.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=gfx906" "-filetype=obj" "-o"{{.}}amdgpu-openmp-toolchain-{{.}}-gfx906-{{.}}.o"
// CHECK: lld{{.}}"-flavor" "gnu" "--no-undefined" "-shared" "-o"{{.}}amdgpu-openmp-toolchain-{{.}}.out" "{{.}}amdgpu-openmp-toolchain-{{.}}-gfx906-{{.}}.o"
// CHECK: clang-offload-wrapper{{.}}"-target" "x86_64-unknown-linux-gnu" "-o" "{{.}}a-{{.}}.bc" {{.}}amdgpu-openmp-toolchain-{{.*}}.out"
// CHECK: clang{{.}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.}}"-o" "{{.}}a-{{.}}.o" "-x" "ir" "{{.}}a-{{.}}.bc"
// CHECK: ld{{.}}"-o" "a.out"{{.}}"{{.}}amdgpu-openmp-toolchain-{{.}}.o" "{{.}}a-{{.}}.o" "-lomp" "-lomptarget"
// RUN: %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \
// RUN: \| FileCheck --check-prefix=CHECK-PHASES %s
// phases
// CHECK-PHASES: 0: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (host-openmp)
// CHECK-PHASES: 1: preprocessor, {0}, cpp-output, (host-openmp)
// CHECK-PHASES: 2: compiler, {1}, ir, (host-openmp)
// CHECK-PHASES: 3: backend, {2}, assembler, (host-openmp)
// CHECK-PHASES: 4: assembler, {3}, object, (host-openmp)
// CHECK-PHASES: 5: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (device-openmp)
// CHECK-PHASES: 6: preprocessor, {5}, cpp-output, (device-openmp)
// CHECK-PHASES: 7: compiler, {6}, ir, (device-openmp)
// CHECK-PHASES: 8: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (amdgcn-amd-amdhsa)" {7}, ir
// CHECK-PHASES: 9: backend, {8}, assembler, (device-openmp)
// CHECK-PHASES: 10: assembler, {9}, object, (device-openmp)
// CHECK-PHASES: 11: linker, {10}, image, (device-openmp)
// CHECK-PHASES: 12: offload, "device-openmp (amdgcn-amd-amdhsa)" {11}, image
// CHECK-PHASES: 13: clang-offload-wrapper, {12}, ir, (host-openmp)
// CHECK-PHASES: 14: backend, {13}, assembler, (host-openmp)
// CHECK-PHASES: 15: assembler, {14}, object, (host-openmp)
// CHECK-PHASES: 16: linker, {4, 15}, image, (host-openmp)

URL: https://reviews.llvm.org/D94961

⇱ ⚙ D94961 [OpenMP] Add OpenMP offloading toolchain for AMDGPU

This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Add OpenMP offloading toolchain for AMDGPU
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 320999

clang/lib/Driver/CMakeLists.txt

clang/lib/Driver/Driver.cpp

clang/lib/Driver/ToolChains/AMDGPU.h

clang/lib/Driver/ToolChains/AMDGPUOpenMP.h

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

clang/lib/Driver/ToolChains/HIP.h

clang/test/Driver/amdgpu-openmp-toolchain.c

URL: https://reviews.llvm.org/D94961

⇱ ⚙ D94961 [OpenMP] Add OpenMP offloading toolchain for AMDGPU

This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Add OpenMP offloading toolchain for AMDGPUClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 320999

clang/lib/Driver/CMakeLists.txt

clang/lib/Driver/Driver.cpp

clang/lib/Driver/ToolChains/AMDGPU.h

clang/lib/Driver/ToolChains/AMDGPUOpenMP.h

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

clang/lib/Driver/ToolChains/HIP.h

clang/test/Driver/amdgpu-openmp-toolchain.c

[OpenMP] Add OpenMP offloading toolchain for AMDGPU
ClosedPublic