VOOZH about

URL: https://reviews.llvm.org/D94961

⇱ ⚙ D94961 [OpenMP] Add OpenMP offloading toolchain for AMDGPU


This is an archive of the discontinued LLVM Phabricator instance.

  • -
    clang/
  • -
    lib/Driver/
  • -
    Driver/
  • -
    CMakeLists.txt
  • 3/9
    Driver.cpp
  • -
    ToolChains/
  • 2
    AMDGPU.h
  • -
    AMDGPUOpenMP.h
  • 2/6
    AMDGPUOpenMP.cpp
  • -
    HIP.h
  • -
    test/Driver/
  • -
    Driver/
  • 1/3
    amdgpu-openmp-toolchain.c

[OpenMP] Add OpenMP offloading toolchain for AMDGPU
ClosedPublic

Authored by pdhaliwal on Jan 19 2021, 3:45 AM.

Details

Summary

This patch adds AMDGPUOpenMPToolChain for supporting OpenMP
offloading to AMD GPU's.

Originally authored by Greg Rodgers

Diff Detail

Event Timeline

pdhaliwal created this revision.Jan 19 2021, 3:45 AM
pdhaliwal requested review of this revision.Jan 19 2021, 3:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 19 2021, 3:45 AM
Comment Actions

This patch was written, roughly, by:

  • copying the known-working openmp driver from rocm into the trunk source tree
  • deleting lots of stuff that didn't look necessary
  • deleting some stuff that is broadly necessary, but the specifics are up for debate

The idea is to move language-independent but amdgcn-specific code into ROCMToolChain. Some has already gone in, others (like computeMSVCVersion) will likely move too.

Regarding the rest of the end to end stack:

  • host plugin works, same code in trunk / rocm / aomp
  • device plugin will work once it's building as openmp, modulo printf and malloc
  • compiler backend will work for spmd kernels today, will work for generic kernels after D94648 or equivalent lands

Regarding tests (which need the unimplemented bit filled in with the next patch):

  • Runtime tests (for spmd kernels) are working locally with a jury rigged devicertl
  • Codegen tests are proving awkward to update. Dropping line number would help, but there's still a difference in addrspace cast distribution. I'm hoping the scripts involved in generating the nvptx cases can be adapted.
pdhaliwal updated this revision to Diff 317553.Jan 19 2021, 6:56 AM
Comment Actions

Fix clang-tidy error

Comment Actions

Won't this just prevent us from building clang due to the missing cmake changes? We need somewhat testable chunks.

Comment Actions

This patch was written, roughly, by:

  • copying the known-working openmp driver from rocm into the trunk source tree
  • deleting lots of stuff that didn't look necessary
  • deleting some stuff that is broadly necessary, but the specifics are up for debate

The idea is to move language-independent but amdgcn-specific code into ROCMToolChain. Some has already gone in, others (like computeMSVCVersion) will likely move too.

Regarding the rest of the end to end stack:

  • host plugin works, same code in trunk / rocm / aomp
  • device plugin will work once it's building as openmp, modulo printf and malloc
  • compiler backend will work for spmd kernels today, will work for generic kernels after D94648 or equivalent lands

Regarding tests (which need the unimplemented bit filled in with the next patch):

  • Runtime tests (for spmd kernels) are working locally with a jury rigged devicertl
  • Codegen tests are proving awkward to update. Dropping line number would help, but there's still a difference in addrspace cast distribution. I'm hoping the scripts involved in generating the nvptx cases can be adapted.

Could you add the tests for the tools invocation to check that the newly added classes/functions correctly translate options/flags?

pdhaliwal updated this revision to Diff 317810.Jan 20 2021, 1:49 AM
Comment Actions

Won't this just prevent us from building clang due to the missing cmake changes?

It compiles and builds fine, however, I wasn't actually aware such sanity checking being present. It turns out
the unknown files inside llvm/ will lead cmake to report error but such reporting will not happen inside clang. Maybe such checks
were not enabled inside clang. Anyways thanks for pointing out. I will keep that in mind in future.

The idea for this patch was basically to introduce AMDGPUToolChain classes without much of the functionality in order
to keep its size in check. And the second patch would have integrated the toolchain with driver along with testing.
But during the intermediate time of the two patches, bare files would have existed (never built and tested).

I have updated this patch to now include somewhat functional driver along with tests.

pdhaliwal retitled this revision from [OpenMP] Add OpenMP offloading toolchain skeleton for AMDGPU to [OpenMP] Add OpenMP offloading toolchain for AMDGPU.Jan 20 2021, 2:16 AM
pdhaliwal edited the summary of this revision. (Show Details)
pdhaliwal updated this revision to Diff 317831.Jan 20 2021, 4:09 AM
Comment Actions

Fixed failing debian tests

pdhaliwal updated this revision to Diff 318120.Edited · Jan 20 2021, 11:51 PM
Comment Actions
  • Moved common methods of HIP and OpenMP to base AMDGPUToolChain
  • Removed unnecessary asserts
Comment Actions

The deviceRTL is changing from cuda/hip to openmp at present. It would be good to be able to compile that as openmp for amdgpu, which needs a patch roughly like this and probably some follow up.

It's plausible that the contents of this file are only really of interest to AMD people, in which case we should probably land it in order to iterate in tree instead of downstream. It now has the cmake hook and a proof of concept test case.

@ABataev @grokos @jdoerfert your guidance would be very welcome, but equally if you'd prefer leave us to sink or swim that's completely reasonable too.

Thanks!

Comment Actions

LGTM. But, please wait from someone outside AMD to accept it.

Comment Actions

Overall, looks reasonable. I'm not a fan of the way things are hooked up but that is also true for the NVPTX toolchain and for now unavoidable I guess. Some nits below.

clang/lib/Driver/Driver.cpp
2995

Why the copy?

clang/lib/Driver/ToolChains/AMDGPU.h
72

What is the coding style here? Pick upper case or lower case, form the above I assume upper case.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
66

To verify, O0 is not mapped to O2, correct?

Comment Actions

Overall, looks reasonable. I'm not a fan of the way things are hooked up but that is also true for the NVPTX toolchain and for now unavoidable I guess.

This is loosely based on the nvptx and hip toolchain files, but looking at others in clang the style seems similar.

clang/lib/Driver/Driver.cpp
2995

As in ? Slightly prettier. Not sure ad hoc arg parsing like this is ever good though

clang/lib/Driver/ToolChains/AMDGPU.h
72

Yeah, should be as it's a function. Should probably propose that as a minimal patch to the existing code. There may be other casing mistakes. ConstructJob looks suspect, but all the files call it that instead of constructJob, so maybe I'm missing a part of the style guide.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
66

I think is an error, though it does look like this will rewrite it to O2 instead. Which seems bad.

JonChesterfield added inline comments.Feb 1 2021, 8:21 AM
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
66

Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are passed to clang as a single invocation, but I don't see why we need an opt invocation here. Passing the llvm-link'ed code to llc as-is, without this implicit lto style opt invocation, is probably a better default.

Comment Actions

I believe this is 'safe', in the sense that it can't break anything other than AMDGPU's openmp implementation, which doesn't work yet anyway. The change to non-amd code is to OpenMPActionBuilder, and while parsing isn't pretty, it won't fire on nvptx. I'm not particularly confident it is correct, but it's hard to do comprehensive testing before there's an end to end spike. I'd like to iterate in tree, with an expectation that parts of this will need to change - @jdoerfert does the OpenMP patch look adequate for now?

clang/lib/Driver/Driver.cpp
752

This follows the same logic as nvptx so is likely to be correct

Comment Actions

Generally OK, the more I look at it the more things pop up. Other than my comments, I'm fine with this going in.

clang/lib/Driver/Driver.cpp
764

what if we say in line 745, rename the Cuda things to "device" and the toolchain stuff is in a conditional? I see the triple is hardcoded here for some reason and I don't know why not to use TT. Neverhteless, it seems less copying is better, we will eventually have more architectures and 3 * 13 lines means 3 times the bugs and 3 times the required changes in the future.

3000

This does not look right. IsAMDGCN is not a question that should be answered by a flag.

In general, why skip these passes here. Isn't what you really want to do, pass to the device compilation job?

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
67

Just to make it clear, we should default to O0.

clang/test/Driver/amdgpu-openmp-toolchain.c
3

Would this at all work without the march flag?

pdhaliwal updated this revision to Diff 320722.Feb 2 2021, 2:39 AM
Comment Actions

Addressed review comments.

  • Combined the toolchain creation logic for nvptx and amdgcn
  • Replaced -Xopenmp-target with -emit-llvm-bc inside AMDGPUOpenMP.cpp
  • Removed opt from pipeline
pdhaliwal marked 3 inline comments as done.Feb 2 2021, 2:47 AM
Comment Actions

After addressing the review comments, I have internally verified changes on few simple test programs. They seem to be working fine.

clang/lib/Driver/Driver.cpp
3000

I have removed the logic from here instead now I have added -emit-llvm-bc using addClangTargetOptions in AMDGPUOpenMP.

clang/test/Driver/amdgpu-openmp-toolchain.c
3

This would not work without -march flag as amdgcn is not forward/backward compatible and there is currently no way to detect system gpu arch.

JonChesterfield added inline comments.Feb 2 2021, 3:03 AM
clang/test/Driver/amdgpu-openmp-toolchain.c
3

The command line invocation openmp needs could be friendlier.

Auto-detecting march is convenient in general but probably bad for clang tests as we want them to run on systems that don't have an amdgpu installed.

Comment Actions

Looks much simpler, thanks!

clang/lib/Driver/Driver.cpp
755

Minor suggestion,

if (TT.isNVPTX() {
 ...
} else {
 assert(TT.isAMDGCN());
 ...
}

? Semantically equivalent, but looks like there is a missing else clause.

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp
44

Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably the same logic in another toolchain

65

Default("0") and update the comment 'we map anything else'

pdhaliwal updated this revision to Diff 320740.Feb 2 2021, 3:58 AM
Comment Actions
  • Use 0 for default -O option
  • Rename addOptLevelArgs to addLLCOptArg
clang/lib/Driver/Driver.cpp
755

I think, having assert in last else is bit cleaner. What do you think?

JonChesterfield added inline comments.Feb 2 2021, 4:02 AM
clang/lib/Driver/Driver.cpp
755

I also prefer the if () else {assert()} construct.

jdoerfert accepted this revision.Feb 2 2021, 10:50 AM
Comment Actions

fine with me, assuming this doesn't disturb anything outside the AMD pipeline.

This revision is now accepted and ready to land.Feb 2 2021, 10:50 AM
This revision was automatically updated to reflect the committed changes.

Revision Contents

PathSize
clang/
lib/
Driver/
1 line
24 lines
ToolChains/
7 lines
106 lines
262 lines
9 lines
test/
Driver/
36 lines

Diff 320999

clang/lib/Driver/CMakeLists.txt

Show All 30 Linesadd_clang_library(clangDriver
ToolChains/Arch/RISCV.cpp ToolChains/Arch/RISCV.cpp
ToolChains/Arch/Sparc.cpp ToolChains/Arch/Sparc.cpp
ToolChains/Arch/SystemZ.cpp ToolChains/Arch/SystemZ.cpp
ToolChains/Arch/VE.cpp ToolChains/Arch/VE.cpp
ToolChains/Arch/X86.cpp ToolChains/Arch/X86.cpp
ToolChains/AIX.cpp ToolChains/AIX.cpp
ToolChains/Ananas.cpp ToolChains/Ananas.cpp
ToolChains/AMDGPU.cpp ToolChains/AMDGPU.cpp
ToolChains/AMDGPUOpenMP.cpp
ToolChains/AVR.cpp ToolChains/AVR.cpp
ToolChains/BareMetal.cpp ToolChains/BareMetal.cpp
ToolChains/Clang.cpp ToolChains/Clang.cpp
ToolChains/CloudABI.cpp ToolChains/CloudABI.cpp
ToolChains/CommonArgs.cpp ToolChains/CommonArgs.cpp
ToolChains/Contiki.cpp ToolChains/Contiki.cpp
ToolChains/CrossWindows.cpp ToolChains/CrossWindows.cpp
ToolChains/Cuda.cpp ToolChains/Cuda.cpp
Show All 40 Lines

clang/lib/Driver/Driver.cpp

//===--- Driver.cpp - Clang GCC Compatible Driver -------------------------===// //===--- Driver.cpp - Clang GCC Compatible Driver -------------------------===//
// //
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information. // See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// //
//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//
#include "clang/Driver/Driver.h" #include "clang/Driver/Driver.h"
#include "InputInfo.h" #include "InputInfo.h"
#include "ToolChains/AIX.h" #include "ToolChains/AIX.h"
#include "ToolChains/AMDGPU.h" #include "ToolChains/AMDGPU.h"
#include "ToolChains/AMDGPUOpenMP.h"
#include "ToolChains/AVR.h" #include "ToolChains/AVR.h"
#include "ToolChains/Ananas.h" #include "ToolChains/Ananas.h"
#include "ToolChains/BareMetal.h" #include "ToolChains/BareMetal.h"
#include "ToolChains/Clang.h" #include "ToolChains/Clang.h"
#include "ToolChains/CloudABI.h" #include "ToolChains/CloudABI.h"
#include "ToolChains/Contiki.h" #include "ToolChains/Contiki.h"
#include "ToolChains/CrossWindows.h" #include "ToolChains/CrossWindows.h"
#include "ToolChains/Cuda.h" #include "ToolChains/Cuda.h"
▲ Show 20 LinesShow All 713 Lines▼ Show 20 Lines if (OpenMPTargets->getNumValues()) {
// following iterations. // following iterations.
FoundNormalizedTriples[NormalizedName] = Val; FoundNormalizedTriples[NormalizedName] = Val;
// If the specified target is invalid, emit a diagnostic. // If the specified target is invalid, emit a diagnostic.
if (TT.getArch() == llvm::Triple::UnknownArch) if (TT.getArch() == llvm::Triple::UnknownArch)
Diag(clang::diag::err_drv_invalid_omp_target) << Val; Diag(clang::diag::err_drv_invalid_omp_target) << Val;
else { else {
const ToolChain *TC; const ToolChain *TC;
// CUDA toolchains have to be selected differently. They pair host // Device toolchains have to be selected differently. They pair host
// and device in their implementation. // and device in their implementation.
if (TT.isNVPTX()) { if (TT.isNVPTX()||TT.isAMDGCN()) {
const ToolChain *HostTC = const ToolChain *HostTC =
C.getSingleOffloadToolChain<Action::OFK_Host>(); C.getSingleOffloadToolChain<Action::OFK_Host>();
assert(HostTC && "Host toolchain should be always defined."); assert(HostTC && "Host toolchain should be always defined.");
auto &CudaTC = auto &DeviceTC =
ToolChains[TT.str() + "/" + HostTC->getTriple().normalize()]; ToolChains[TT.str() + "/" + HostTC->getTriple().normalize()];
if (!CudaTC) if (!DeviceTC){
CudaTC=std::make_unique<toolchains::CudaToolChain>( if(TT.isNVPTX())
JonChesterfieldUnsubmitted
Not Done

This follows the same logic as nvptx so is likely to be correct

JonChesterfield: This follows the same logic as nvptx so is likely to be correct
DeviceTC = std::make_unique<toolchains::CudaToolChain>(
*this, TT, *HostTC, C.getInputArgs(), Action::OFK_OpenMP); *this, TT, *HostTC, C.getInputArgs(), Action::OFK_OpenMP);
TC=CudaTC.get(); elseif(TT.isAMDGCN())
JonChesterfieldUnsubmitted
Not Done

Minor suggestion,

if (TT.isNVPTX() {
 ...
} else {
 assert(TT.isAMDGCN());
 ...
}

? Semantically equivalent, but looks like there is a missing else clause.

JonChesterfield: Minor suggestion, ``` if (TT.isNVPTX() { ... } else { assert(TT.isAMDGCN()); ... }``` ?
pdhaliwalAuthorUnsubmitted
Done

I think, having assert in last else is bit cleaner. What do you think?

pdhaliwal: I think, having assert in last else is bit cleaner. What do you think?
JonChesterfieldUnsubmitted
Not Done

I also prefer the if () else {assert()} construct.

JonChesterfield: I also prefer the if () else {assert()} construct.
DeviceTC =
std::make_unique<toolchains::AMDGPUOpenMPToolChain>(
*this, TT, *HostTC, C.getInputArgs());
else
assert(DeviceTC && "Device toolchain not defined.");
}
TC = DeviceTC.get();
} else } else
jdoerfertUnsubmitted
Done

what if we say in line 745, rename the Cuda things to "device" and the toolchain stuff is in a conditional? I see the triple is hardcoded here for some reason and I don't know why not to use TT. Neverhteless, it seems less copying is better, we will eventually have more architectures and 3 * 13 lines means 3 times the bugs and 3 times the required changes in the future.

jdoerfert: what if we say `TT.isNVPTX() || TT.isAMDGCN()` in line 745, rename the Cuda things to "device"…
TC = &getToolChain(C.getInputArgs(), TT); TC = &getToolChain(C.getInputArgs(), TT);
C.addOffloadDeviceToolChain(TC, Action::OFK_OpenMP); C.addOffloadDeviceToolChain(TC, Action::OFK_OpenMP);
} }
} }
} else } else
Diag(clang::diag::err_drv_expecting_fopenmp_with_fopenmp_targets); Diag(clang::diag::err_drv_expecting_fopenmp_with_fopenmp_targets);
} else } else
Diag(clang::diag::warn_drv_empty_joined_argument) Diag(clang::diag::warn_drv_empty_joined_argument)
▲ Show 20 LinesShow All 2,214 Lines▼ Show 20 Lines getDeviceDependences(OffloadAction::DeviceDependences &DA,
"Number of OpenMP actions and toolchains do not match."); "Number of OpenMP actions and toolchains do not match.");
// The host only depends on device action in the linking phase, when all // The host only depends on device action in the linking phase, when all
// the device images have to be embedded in the host image. // the device images have to be embedded in the host image.
if (CurPhase == phases::Link) { if (CurPhase == phases::Link) {
assert(ToolChains.size() == DeviceLinkerInputs.size() && assert(ToolChains.size() == DeviceLinkerInputs.size() &&
"Toolchains and linker inputs sizes do not match."); "Toolchains and linker inputs sizes do not match.");
auto LI = DeviceLinkerInputs.begin(); auto LI = DeviceLinkerInputs.begin();
for (auto *A : OpenMPDeviceActions) { for (auto *A : OpenMPDeviceActions) {
jdoerfertUnsubmitted
Not Done

Why the copy?

jdoerfert: Why the `substr` copy?
JonChesterfieldUnsubmitted
Not Done

As in ? Slightly prettier. Not sure ad hoc arg parsing like this is ever good though

JonChesterfield: As in `VStr.startswith("-march=gfx")`? Slightly prettier. Not sure ad hoc arg parsing like this…
LI->push_back(A); LI->push_back(A);
++LI; ++LI;
} }
// We passed the device action as a host dependence, so we don't need to // We passed the device action as a host dependence, so we don't need to
jdoerfertUnsubmitted
Not Done

This does not look right. IsAMDGCN is not a question that should be answered by a flag.

In general, why skip these passes here. Isn't what you really want to do, pass to the device compilation job?

jdoerfert: This does not look right. IsAMDGCN is not a question that should be answered by a `-Xopenmp…
pdhaliwalAuthorUnsubmitted
Done

I have removed the logic from here instead now I have added -emit-llvm-bc using addClangTargetOptions in AMDGPUOpenMP.

pdhaliwal: I have removed the logic from here instead now I have added -emit-llvm-bc using…
// do anything else with them. // do anything else with them.
OpenMPDeviceActions.clear(); OpenMPDeviceActions.clear();
return ABRT_Success; return ABRT_Success;
} }
// By default, we produce an action for each device arch. // By default, we produce an action for each device arch.
for (Action *&A : OpenMPDeviceActions) for (Action *&A : OpenMPDeviceActions)
A = C.getDriver().ConstructPhaseAction(C, Args, CurPhase, A); A = C.getDriver().ConstructPhaseAction(C, Args, CurPhase, A);
▲ Show 20 LinesShow All 2,391 LinesShow Last 20 Lines

clang/lib/Driver/ToolChains/AMDGPU.h

Show First 20 LinesShow All 58 Lines▼ Show 20 Lines
public: public:
AMDGPUToolChain(const Driver &D, const llvm::Triple &Triple, AMDGPUToolChain(const Driver &D, const llvm::Triple &Triple,
const llvm::opt::ArgList &Args); const llvm::opt::ArgList &Args);
unsigned GetDefaultDwarfVersion() const override { return 4; } unsigned GetDefaultDwarfVersion() const override { return 4; }
bool IsIntegratedAssemblerDefault() const override { return true; } bool IsIntegratedAssemblerDefault() const override { return true; }
bool IsMathErrnoDefault() const override { return false; } bool IsMathErrnoDefault() const override { return false; }
bool useIntegratedAs() const override { return true; }
bool isCrossCompiling() const override { return true; }
bool isPICDefault() const override { return false; }
bool isPIEDefault() const override { return false; }
bool isPICDefaultForced() const override { return false; }
bool SupportsProfiling() const override { return false; }
jdoerfertUnsubmitted
Not Done

What is the coding style here? Pick upper case or lower case, form the above I assume upper case.

jdoerfert: What is the coding style here? Pick upper case or lower case, form the above I assume upper…
JonChesterfieldUnsubmitted
Not Done

Yeah, should be as it's a function. Should probably propose that as a minimal patch to the existing code. There may be other casing mistakes. ConstructJob looks suspect, but all the files call it that instead of constructJob, so maybe I'm missing a part of the style guide.

JonChesterfield: Yeah, should be `supportsProfiling()` as it's a function. Should probably propose that as a…
llvm::opt::DerivedArgList * llvm::opt::DerivedArgList *
TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch, TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const override; Action::OffloadKind DeviceOffloadKind) const override;
void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs, void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args, llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadKind) const override; Action::OffloadKind DeviceOffloadKind) const override;
▲ Show 20 LinesShow All 44 LinesShow Last 20 Lines

clang/lib/Driver/ToolChains/AMDGPUOpenMP.h

  • This file was added.
//===- AMDGPUOpenMP.h - AMDGPUOpenMP ToolChain Implementation -*- C++ -*---===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H
#define LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H
#include "AMDGPU.h"
#include "clang/Driver/Tool.h"
#include "clang/Driver/ToolChain.h"
namespace clang {
namespace driver {
namespace tools {
namespace AMDGCN {
// Runs llvm-link/opt/llc/lld, which links multiple LLVM bitcode, together with
// device library, then compiles it to ISA in a shared object.
class LLVM_LIBRARY_VISIBILITY OpenMPLinker : public Tool {
public:
OpenMPLinker(const ToolChain &TC)
: Tool("AMDGCN::OpenMPLinker", "amdgcn-link", TC) {}
bool hasIntegratedCPP() const override { return false; }
void ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output, const InputInfoList &Inputs,
const llvm::opt::ArgList &TCArgs,
const char *LinkingOutput) const override;
private:
/// \return llvm-link output file name.
const char *constructLLVMLinkCommand(Compilation &C, const JobAction &JA,
const InputInfoList &Inputs,
const llvm::opt::ArgList &Args,
llvm::StringRef SubArchName,
llvm::StringRef OutputFilePrefix) const;
/// \return llc output file name.
const char *constructLlcCommand(Compilation &C, const JobAction &JA,
const InputInfoList &Inputs,
const llvm::opt::ArgList &Args,
llvm::StringRef SubArchName,
llvm::StringRef OutputFilePrefix,
const char *InputFileName,
bool OutputIsAsm = false) const;
void constructLldCommand(Compilation &C, const JobAction &JA,
const InputInfoList &Inputs, const InputInfo &Output,
const llvm::opt::ArgList &Args,
const char *InputFileName) const;
};
} // end namespace AMDGCN
} // end namespace tools
namespace toolchains {
class LLVM_LIBRARY_VISIBILITY AMDGPUOpenMPToolChain final
: public ROCMToolChain {
public:
AMDGPUOpenMPToolChain(const Driver &D, const llvm::Triple &Triple,
const ToolChain &HostTC,
const llvm::opt::ArgList &Args);
const llvm::Triple *getAuxTriple() const override {
return &HostTC.getTriple();
}
llvm::opt::DerivedArgList *
TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const override;
void
addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadKind) const override;
void addClangWarningOptions(llvm::opt::ArgStringList &CC1Args) const override;
CXXStdlibType GetCXXStdlibType(const llvm::opt::ArgList &Args) const override;
void
AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const override;
void AddIAMCUIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const override;
SanitizerMask getSupportedSanitizers() const override;
VersionTuple
computeMSVCVersion(const Driver *D,
const llvm::opt::ArgList &Args) const override;
const ToolChain &HostTC;
protected:
Tool *buildLinker() const override;
};
} // end namespace toolchains
} // end namespace driver
} // end namespace clang
#endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H

clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp

  • This file was added.
//===- AMDGPUOpenMP.cpp - AMDGPUOpenMP ToolChain Implementation -*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#include "AMDGPUOpenMP.h"
#include "AMDGPU.h"
#include "CommonArgs.h"
#include "InputInfo.h"
#include "clang/Driver/Compilation.h"
#include "clang/Driver/Driver.h"
#include "clang/Driver/DriverDiagnostic.h"
#include "clang/Driver/Options.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Path.h"
using namespace clang::driver;
using namespace clang::driver::toolchains;
using namespace clang::driver::tools;
using namespace clang;
using namespace llvm::opt;
namespace {
static const char *getOutputFileName(Compilation &C, StringRef Base,
const char *Postfix,
const char *Extension) {
const char *OutputFileName;
if (C.getDriver().isSaveTempsEnabled()) {
OutputFileName =
C.getArgs().MakeArgString(Base.str() + Postfix + "." + Extension);
} else {
std::string TmpName =
C.getDriver().GetTemporaryPath(Base.str() + Postfix, Extension);
OutputFileName = C.addTempFile(C.getArgs().MakeArgString(TmpName));
}
return OutputFileName;
}
static void addLLCOptArg(const llvm::opt::ArgList &Args,
llvm::opt::ArgStringList &CmdArgs) {
JonChesterfieldUnsubmitted
Not Done

Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably the same logic in another toolchain

JonChesterfield: Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably…
if (Arg *A = Args.getLastArg(options::OPT_O_Group)) {
StringRef OOpt = "0";
if (A->getOption().matches(options::OPT_O4) ||
A->getOption().matches(options::OPT_Ofast))
OOpt = "3";
else if (A->getOption().matches(options::OPT_O0))
OOpt = "0";
else if (A->getOption().matches(options::OPT_O)) {
// Clang and opt support -Os/-Oz; llc only supports -O0, -O1, -O2 and -O3
// so we map -Os/-Oz to -O2.
// Only clang supports -Og, and maps it to -O1.
// We map anything else to -O2.
OOpt = llvm::StringSwitch<const char *>(A->getValue())
.Case("1", "1")
.Case("2", "2")
.Case("3", "3")
.Case("s", "2")
.Case("z", "2")
.Case("g", "1")
.Default("0");
}
JonChesterfieldUnsubmitted
Not Done

Default("0") and update the comment 'we map anything else'

JonChesterfield: Default("0") and update the comment 'we map anything else'
CmdArgs.push_back(Args.MakeArgString("-O" + OOpt));
jdoerfertUnsubmitted
Not Done

To verify, O0 is not mapped to O2, correct?

jdoerfert: To verify, O0 is not mapped to O2, correct?
JonChesterfieldUnsubmitted
Not Done

I think is an error, though it does look like this will rewrite it to O2 instead. Which seems bad.

JonChesterfield: I think `opt -O0` is an error, though it does look like this will rewrite it to O2 instead.
JonChesterfieldUnsubmitted
Done

Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are passed to clang as a single invocation, but I don't see why we need an opt invocation here. Passing the llvm-link'ed code to llc as-is, without this implicit lto style opt invocation, is probably a better default.

JonChesterfield: Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are…
}
jdoerfertUnsubmitted
Done

Just to make it clear, we should default to O0.

jdoerfert: Just to make it clear, we should default to O0.
}
} // namespace
const char *AMDGCN::OpenMPLinker::constructLLVMLinkCommand(
Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
const ArgList &Args, StringRef SubArchName,
StringRef OutputFilePrefix) const {
ArgStringList CmdArgs;
for (const auto &II : Inputs)
if (II.isFilename())
CmdArgs.push_back(II.getFilename());
// Add an intermediate output file.
CmdArgs.push_back("-o");
const char *OutputFileName =
getOutputFileName(C, OutputFilePrefix, "-linked", "bc");
CmdArgs.push_back(OutputFileName);
const char *Exec =
Args.MakeArgString(getToolChain().GetProgramPath("llvm-link"));
C.addCommand(std::make_unique<Command>(
JA, *this, ResponseFileSupport::AtFileCurCP(), Exec, CmdArgs, Inputs,
InputInfo(&JA, Args.MakeArgString(OutputFileName))));
return OutputFileName;
}
const char *AMDGCN::OpenMPLinker::constructLlcCommand(
Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
const llvm::opt::ArgList &Args, llvm::StringRef SubArchName,
llvm::StringRef OutputFilePrefix, const char *InputFileName,
bool OutputIsAsm) const {
// Construct llc command.
ArgStringList LlcArgs;
// The input to llc is the output from opt.
LlcArgs.push_back(InputFileName);
// Pass optimization arg to llc.
addLLCOptArg(Args, LlcArgs);
LlcArgs.push_back("-mtriple=amdgcn-amd-amdhsa");
LlcArgs.push_back(Args.MakeArgString("-mcpu=" + SubArchName));
LlcArgs.push_back(
Args.MakeArgString(Twine("-filetype=") + (OutputIsAsm ? "asm" : "obj")));
for (const Arg *A : Args.filtered(options::OPT_mllvm)) {
LlcArgs.push_back(A->getValue(0));
}
// Add output filename
LlcArgs.push_back("-o");
const char *LlcOutputFile =
getOutputFileName(C, OutputFilePrefix, "", OutputIsAsm ? "s" : "o");
LlcArgs.push_back(LlcOutputFile);
const char *Llc = Args.MakeArgString(getToolChain().GetProgramPath("llc"));
C.addCommand(std::make_unique<Command>(
JA, *this, ResponseFileSupport::AtFileCurCP(), Llc, LlcArgs, Inputs,
InputInfo(&JA, Args.MakeArgString(LlcOutputFile))));
return LlcOutputFile;
}
void AMDGCN::OpenMPLinker::constructLldCommand(
Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
const InputInfo &Output, const llvm::opt::ArgList &Args,
const char *InputFileName) const {
// Construct lld command.
// The output from ld.lld is an HSA code object file.
ArgStringList LldArgs{"-flavor", "gnu", "--no-undefined",
"-shared", "-o", Output.getFilename(),
InputFileName};
const char *Lld = Args.MakeArgString(getToolChain().GetProgramPath("lld"));
C.addCommand(std::make_unique<Command>(
JA, *this, ResponseFileSupport::AtFileCurCP(), Lld, LldArgs, Inputs,
InputInfo(&JA, Args.MakeArgString(Output.getFilename()))));
}
// For amdgcn the inputs of the linker job are device bitcode and output is
// object file. It calls llvm-link, opt, llc, then lld steps.
void AMDGCN::OpenMPLinker::ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output,
const InputInfoList &Inputs,
const ArgList &Args,
const char *LinkingOutput) const {
assert(getToolChain().getTriple().isAMDGCN() && "Unsupported target");
StringRef GPUArch = Args.getLastArgValue(options::OPT_march_EQ);
assert(GPUArch.startswith("gfx") && "Unsupported sub arch");
// Prefix for temporary file name.
std::string Prefix;
for (const auto &II : Inputs)
if (II.isFilename())
Prefix =
llvm::sys::path::stem(II.getFilename()).str() + "-" + GPUArch.str();
assert(Prefix.length() && "no linker inputs are files ");
// Each command outputs different files.
const char *LLVMLinkCommand =
constructLLVMLinkCommand(C, JA, Inputs, Args, GPUArch, Prefix);
const char *LlcCommand = constructLlcCommand(C, JA, Inputs, Args, GPUArch,
Prefix, LLVMLinkCommand);
constructLldCommand(C, JA, Inputs, Output, Args, LlcCommand);
}
AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain(const Driver &D,
const llvm::Triple &Triple,
const ToolChain &HostTC,
const ArgList &Args)
: ROCMToolChain(D, Triple, Args), HostTC(HostTC) {
// Lookup binaries into the driver directory, this is used to
// discover the clang-offload-bundler executable.
getProgramPaths().push_back(getDriver().Dir);
}
void AMDGPUOpenMPToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {
HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);
StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
assert(DeviceOffloadingKind == Action::OFK_OpenMP &&
"Only OpenMP offloading kinds are supported.");
CC1Args.push_back("-target-cpu");
CC1Args.push_back(DriverArgs.MakeArgStringRef(GpuArch));
CC1Args.push_back("-fcuda-is-device");
CC1Args.push_back("-emit-llvm-bc");
}
llvm::opt::DerivedArgList *AMDGPUOpenMPToolChain::TranslateArgs(
const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const {
DerivedArgList *DAL =
HostTC.TranslateArgs(Args, BoundArch, DeviceOffloadKind);
if (!DAL)
DAL = new DerivedArgList(Args.getBaseArgs());
const OptTable &Opts = getDriver().getOpts();
if (DeviceOffloadKind != Action::OFK_OpenMP) {
for (Arg *A : Args) {
DAL->append(A);
}
}
if (!BoundArch.empty()) {
DAL->eraseArg(options::OPT_march_EQ);
DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
BoundArch);
}
return DAL;
}
Tool *AMDGPUOpenMPToolChain::buildLinker() const {
assert(getTriple().isAMDGCN());
return new tools::AMDGCN::OpenMPLinker(*this);
}
void AMDGPUOpenMPToolChain::addClangWarningOptions(
ArgStringList &CC1Args) const {
HostTC.addClangWarningOptions(CC1Args);
}
ToolChain::CXXStdlibType
AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList &Args) const {
return HostTC.GetCXXStdlibType(Args);
}
void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs(
const ArgList &DriverArgs, ArgStringList &CC1Args) const {
HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args);
}
void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList &Args,
ArgStringList &CC1Args) const {
HostTC.AddIAMCUIncludeArgs(Args, CC1Args);
}
SanitizerMask AMDGPUOpenMPToolChain::getSupportedSanitizers() const {
// The AMDGPUOpenMPToolChain only supports sanitizers in the sense that it
// allows sanitizer arguments on the command line if they are supported by the
// host toolchain. The AMDGPUOpenMPToolChain will actually ignore any command
// line arguments for any of these "supported" sanitizers. That means that no
// sanitization of device code is actually supported at this time.
//
// This behavior is necessary because the host and device toolchains
// invocations often share the command line, so the device toolchain must
// tolerate flags meant only for the host toolchain.
return HostTC.getSupportedSanitizers();
}
VersionTuple
AMDGPUOpenMPToolChain::computeMSVCVersion(const Driver *D,
const ArgList &Args) const {
return HostTC.computeMSVCVersion(D, Args);
}

clang/lib/Driver/ToolChains/HIP.h

Show First 20 LinesShow All 65 Lines▼ Show 20 Linespublic:
} }
llvm::opt::DerivedArgList * llvm::opt::DerivedArgList *
TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch, TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const override; Action::OffloadKind DeviceOffloadKind) const override;
void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs, void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args, llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadKind) const override; Action::OffloadKind DeviceOffloadKind) const override;
bool useIntegratedAs() const override { return true; }
bool isCrossCompiling() const override { return true; }
bool isPICDefault() const override { return false; }
bool isPIEDefault() const override { return false; }
bool isPICDefaultForced() const override { return false; }
bool SupportsProfiling() const override { return false; }
bool IsMathErrnoDefault() const override { return false; }
void addClangWarningOptions(llvm::opt::ArgStringList &CC1Args) const override; void addClangWarningOptions(llvm::opt::ArgStringList &CC1Args) const override;
CXXStdlibType GetCXXStdlibType(const llvm::opt::ArgList &Args) const override; CXXStdlibType GetCXXStdlibType(const llvm::opt::ArgList &Args) const override;
void void
AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs, AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args) const override; llvm::opt::ArgStringList &CC1Args) const override;
void AddClangCXXStdlibIncludeArgs( void AddClangCXXStdlibIncludeArgs(
const llvm::opt::ArgList &Args, const llvm::opt::ArgList &Args,
llvm::opt::ArgStringList &CC1Args) const override; llvm::opt::ArgStringList &CC1Args) const override;
Show All 24 Lines

clang/test/Driver/amdgpu-openmp-toolchain.c

  • This file was added.
// REQUIRES: amdgpu-registered-target
// RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \
// RUN: | FileCheck %s
jdoerfertUnsubmitted
Not Done

Would this at all work without the march flag?

jdoerfert: Would this at all work without the march flag?
pdhaliwalAuthorUnsubmitted
Done

This would not work without -march flag as amdgcn is not forward/backward compatible and there is currently no way to detect system gpu arch.

pdhaliwal: This would not work without -march flag as amdgcn is not forward/backward compatible and there…
JonChesterfieldUnsubmitted
Not Done

The command line invocation openmp needs could be friendlier.

Auto-detecting march is convenient in general but probably bad for clang tests as we want them to run on systems that don't have an amdgpu installed.

JonChesterfield: The command line invocation openmp needs `-fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp…
// verify the tools invocations
// CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-x" "c"{{.*}}
// CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-x" "ir"{{.*}}
// CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx906" "-fcuda-is-device" "-emit-llvm-bc"{{.*}}
// CHECK: llvm-link{{.*}}"-o" "{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-linked-{{.*}}.bc"
// CHECK: llc{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-linked-{{.*}}.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=gfx906" "-filetype=obj" "-o"{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-{{.*}}.o"
// CHECK: lld{{.*}}"-flavor" "gnu" "--no-undefined" "-shared" "-o"{{.*}}amdgpu-openmp-toolchain-{{.*}}.out" "{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-{{.*}}.o"
// CHECK: clang-offload-wrapper{{.*}}"-target" "x86_64-unknown-linux-gnu" "-o" "{{.*}}a-{{.*}}.bc" {{.*}}amdgpu-openmp-toolchain-{{.*}}.out"
// CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-o" "{{.*}}a-{{.*}}.o" "-x" "ir" "{{.*}}a-{{.*}}.bc"
// CHECK: ld{{.*}}"-o" "a.out"{{.*}}"{{.*}}amdgpu-openmp-toolchain-{{.*}}.o" "{{.*}}a-{{.*}}.o" "-lomp" "-lomptarget"
// RUN: %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \
// RUN: | FileCheck --check-prefix=CHECK-PHASES %s
// phases
// CHECK-PHASES: 0: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (host-openmp)
// CHECK-PHASES: 1: preprocessor, {0}, cpp-output, (host-openmp)
// CHECK-PHASES: 2: compiler, {1}, ir, (host-openmp)
// CHECK-PHASES: 3: backend, {2}, assembler, (host-openmp)
// CHECK-PHASES: 4: assembler, {3}, object, (host-openmp)
// CHECK-PHASES: 5: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (device-openmp)
// CHECK-PHASES: 6: preprocessor, {5}, cpp-output, (device-openmp)
// CHECK-PHASES: 7: compiler, {6}, ir, (device-openmp)
// CHECK-PHASES: 8: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (amdgcn-amd-amdhsa)" {7}, ir
// CHECK-PHASES: 9: backend, {8}, assembler, (device-openmp)
// CHECK-PHASES: 10: assembler, {9}, object, (device-openmp)
// CHECK-PHASES: 11: linker, {10}, image, (device-openmp)
// CHECK-PHASES: 12: offload, "device-openmp (amdgcn-amd-amdhsa)" {11}, image
// CHECK-PHASES: 13: clang-offload-wrapper, {12}, ir, (host-openmp)
// CHECK-PHASES: 14: backend, {13}, assembler, (host-openmp)
// CHECK-PHASES: 15: assembler, {14}, object, (host-openmp)
// CHECK-PHASES: 16: linker, {4, 15}, image, (host-openmp)