1/3 amdgpu-openmp-toolchain.c
| [OpenMP] Add OpenMP offloading toolchain for AMDGPU
Authored by pdhaliwal on Jan 19 2021, 3:45 AM. Summary This patch adds AMDGPUOpenMPToolChain for supporting OpenMP
offloading to AMD GPU's.
Originally authored by Greg Rodgers Event TimelineComment ActionsThis patch was written, roughly, by:
- copying the known-working openmp driver from rocm into the trunk source tree
- deleting lots of stuff that didn't look necessary
- deleting some stuff that is broadly necessary, but the specifics are up for debate
The idea is to move language-independent but amdgcn-specific code into ROCMToolChain. Some has already gone in, others (like computeMSVCVersion) will likely move too.
Regarding the rest of the end to end stack:
- host plugin works, same code in trunk / rocm / aomp
- device plugin will work once it's building as openmp, modulo printf and malloc
- compiler backend will work for spmd kernels today, will work for generic kernels after D94648 or equivalent lands
Regarding tests (which need the unimplemented bit filled in with the next patch):
- Runtime tests (for spmd kernels) are working locally with a jury rigged devicertl
- Codegen tests are proving awkward to update. Dropping line number would help, but there's still a difference in addrspace cast distribution. I'm hoping the scripts involved in generating the nvptx cases can be adapted.
Comment ActionsWon't this just prevent us from building clang due to the missing cmake changes? We need somewhat testable chunks. Comment Actions
This patch was written, roughly, by:
- copying the known-working openmp driver from rocm into the trunk source tree
- deleting lots of stuff that didn't look necessary
- deleting some stuff that is broadly necessary, but the specifics are up for debate
The idea is to move language-independent but amdgcn-specific code into ROCMToolChain. Some has already gone in, others (like computeMSVCVersion) will likely move too.
Regarding the rest of the end to end stack:
- host plugin works, same code in trunk / rocm / aomp
- device plugin will work once it's building as openmp, modulo printf and malloc
- compiler backend will work for spmd kernels today, will work for generic kernels after D94648 or equivalent lands
Regarding tests (which need the unimplemented bit filled in with the next patch):
- Runtime tests (for spmd kernels) are working locally with a jury rigged devicertl
- Codegen tests are proving awkward to update. Dropping line number would help, but there's still a difference in addrspace cast distribution. I'm hoping the scripts involved in generating the nvptx cases can be adapted.
Could you add the tests for the tools invocation to check that the newly added classes/functions correctly translate options/flags? Comment ActionsWon't this just prevent us from building clang due to the missing cmake changes?
It compiles and builds fine, however, I wasn't actually aware such sanity checking being present. It turns out
the unknown files inside llvm/ will lead cmake to report error but such reporting will not happen inside clang. Maybe such checks
were not enabled inside clang. Anyways thanks for pointing out. I will keep that in mind in future.
The idea for this patch was basically to introduce AMDGPUToolChain classes without much of the functionality in order
to keep its size in check. And the second patch would have integrated the toolchain with driver along with testing.
But during the intermediate time of the two patches, bare files would have existed (never built and tested).
I have updated this patch to now include somewhat functional driver along with tests. pdhaliwal retitled this revision from [OpenMP] Add OpenMP offloading toolchain skeleton for AMDGPU to [OpenMP] Add OpenMP offloading toolchain for AMDGPU. Jan 20 2021, 2:16 AM Comment Actions
- Moved common methods of HIP and OpenMP to base AMDGPUToolChain
- Removed unnecessary asserts
Comment ActionsThe deviceRTL is changing from cuda/hip to openmp at present. It would be good to be able to compile that as openmp for amdgpu, which needs a patch roughly like this and probably some follow up.
It's plausible that the contents of this file are only really of interest to AMD people, in which case we should probably land it in order to iterate in tree instead of downstream. It now has the cmake hook and a proof of concept test case.
@ABataev @grokos @jdoerfert your guidance would be very welcome, but equally if you'd prefer leave us to sink or swim that's completely reasonable too.
Thanks! Comment ActionsLGTM. But, please wait from someone outside AMD to accept it. Comment ActionsOverall, looks reasonable. I'm not a fan of the way things are hooked up but that is also true for the NVPTX toolchain and for now unavoidable I guess. Some nits below. | clang/lib/Driver/Driver.cpp |
| 2995 | |
| clang/lib/Driver/ToolChains/AMDGPU.h |
| 72 | What is the coding style here? Pick upper case or lower case, form the above I assume upper case. |
| clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp |
| 66 | To verify, O0 is not mapped to O2, correct? |
Comment Actions
Overall, looks reasonable. I'm not a fan of the way things are hooked up but that is also true for the NVPTX toolchain and for now unavoidable I guess.
This is loosely based on the nvptx and hip toolchain files, but looking at others in clang the style seems similar. | clang/lib/Driver/Driver.cpp |
| 2995 | As in ? Slightly prettier. Not sure ad hoc arg parsing like this is ever good though |
| clang/lib/Driver/ToolChains/AMDGPU.h |
| 72 | Yeah, should be as it's a function. Should probably propose that as a minimal patch to the existing code. There may be other casing mistakes. ConstructJob looks suspect, but all the files call it that instead of constructJob, so maybe I'm missing a part of the style guide. |
| clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp |
| 66 | I think is an error, though it does look like this will rewrite it to O2 instead. Which seems bad. |
| clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp |
| 66 | Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are passed to clang as a single invocation, but I don't see why we need an opt invocation here. Passing the llvm-link'ed code to llc as-is, without this implicit lto style opt invocation, is probably a better default. |
Comment ActionsI believe this is 'safe', in the sense that it can't break anything other than AMDGPU's openmp implementation, which doesn't work yet anyway. The change to non-amd code is to OpenMPActionBuilder, and while parsing isn't pretty, it won't fire on nvptx. I'm not particularly confident it is correct, but it's hard to do comprehensive testing before there's an end to end spike. I'd like to iterate in tree, with an expectation that parts of this will need to change - @jdoerfert does the OpenMP patch look adequate for now? | clang/lib/Driver/Driver.cpp |
| 752 | This follows the same logic as nvptx so is likely to be correct |
Comment ActionsGenerally OK, the more I look at it the more things pop up. Other than my comments, I'm fine with this going in. | clang/lib/Driver/Driver.cpp |
| 764 | what if we say in line 745, rename the Cuda things to "device" and the toolchain stuff is in a conditional? I see the triple is hardcoded here for some reason and I don't know why not to use TT. Neverhteless, it seems less copying is better, we will eventually have more architectures and 3 * 13 lines means 3 times the bugs and 3 times the required changes in the future. |
| 3000 | This does not look right. IsAMDGCN is not a question that should be answered by a flag.
In general, why skip these passes here. Isn't what you really want to do, pass to the device compilation job? |
| clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp |
| 67 | Just to make it clear, we should default to O0. |
| clang/test/Driver/amdgpu-openmp-toolchain.c |
| 3 | Would this at all work without the march flag? |
Comment ActionsAddressed review comments.
- Combined the toolchain creation logic for nvptx and amdgcn
- Replaced -Xopenmp-target with -emit-llvm-bc inside AMDGPUOpenMP.cpp
- Removed opt from pipeline
Comment ActionsAfter addressing the review comments, I have internally verified changes on few simple test programs. They seem to be working fine. | clang/lib/Driver/Driver.cpp |
| 3000 | I have removed the logic from here instead now I have added -emit-llvm-bc using addClangTargetOptions in AMDGPUOpenMP. |
| clang/test/Driver/amdgpu-openmp-toolchain.c |
| 3 | This would not work without -march flag as amdgcn is not forward/backward compatible and there is currently no way to detect system gpu arch. |
| clang/test/Driver/amdgpu-openmp-toolchain.c |
| 3 | The command line invocation openmp needs could be friendlier.
Auto-detecting march is convenient in general but probably bad for clang tests as we want them to run on systems that don't have an amdgpu installed. |
Comment ActionsLooks much simpler, thanks! | clang/lib/Driver/Driver.cpp |
| 755 | Minor suggestion,
if (TT.isNVPTX() {
...
} else {
assert(TT.isAMDGCN());
...
}
? Semantically equivalent, but looks like there is a missing else clause. |
| clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp |
| 44 | Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably the same logic in another toolchain |
| 65 | Default("0") and update the comment 'we map anything else' |
Comment Actions
- Use 0 for default -O option
- Rename addOptLevelArgs to addLLCOptArg
| clang/lib/Driver/Driver.cpp |
| 755 | I think, having assert in last else is bit cleaner. What do you think? |
| clang/lib/Driver/Driver.cpp |
| 755 | I also prefer the if () else {assert()} construct. |
Comment Actionsfine with me, assuming this doesn't disturb anything outside the AMD pipeline. This revision was automatically updated to reflect the committed changes. | Path | Size |
|---|
| | | 1 line | | 24 lines | | | 7 lines | | 106 lines | | 262 lines | | 9 lines | | | 36 lines |
clang/lib/Driver/CMakeLists.txt| Show All 30 Lines | add_clang_library(clangDriver
| | ToolChains/Arch/RISCV.cpp
| ToolChains/Arch/RISCV.cpp
| | ToolChains/Arch/Sparc.cpp
| ToolChains/Arch/Sparc.cpp
| | ToolChains/Arch/SystemZ.cpp
| ToolChains/Arch/SystemZ.cpp
| | ToolChains/Arch/VE.cpp
| ToolChains/Arch/VE.cpp
| | ToolChains/Arch/X86.cpp
| ToolChains/Arch/X86.cpp
| | ToolChains/AIX.cpp
| ToolChains/AIX.cpp
| | ToolChains/Ananas.cpp
| ToolChains/Ananas.cpp
| | ToolChains/AMDGPU.cpp
| ToolChains/AMDGPU.cpp
| | ToolChains/AMDGPUOpenMP.cpp
| | ToolChains/AVR.cpp
| ToolChains/AVR.cpp
| | ToolChains/BareMetal.cpp
| ToolChains/BareMetal.cpp
| | ToolChains/Clang.cpp
| ToolChains/Clang.cpp
| | ToolChains/CloudABI.cpp
| ToolChains/CloudABI.cpp
| | ToolChains/CommonArgs.cpp
| ToolChains/CommonArgs.cpp
| | ToolChains/Contiki.cpp
| ToolChains/Contiki.cpp
| | ToolChains/CrossWindows.cpp
| ToolChains/CrossWindows.cpp
| | ToolChains/Cuda.cpp
| ToolChains/Cuda.cpp
| | Show All 40 Lines |
clang/lib/Driver/Driver.cpp| //===--- Driver.cpp - Clang GCC Compatible Driver -------------------------===//
| //===--- Driver.cpp - Clang GCC Compatible Driver -------------------------===//
| | //
| //
| | // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
| // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
| | // See https://llvm.org/LICENSE.txt for license information.
| // See https://llvm.org/LICENSE.txt for license information.
| | // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
| | //
| //
| | //===----------------------------------------------------------------------===//
| //===----------------------------------------------------------------------===//
| | #include "clang/Driver/Driver.h"
| #include "clang/Driver/Driver.h"
| | #include "InputInfo.h"
| #include "InputInfo.h"
| | #include "ToolChains/AIX.h"
| #include "ToolChains/AIX.h"
| | #include "ToolChains/AMDGPU.h"
| #include "ToolChains/AMDGPU.h"
| | #include "ToolChains/AMDGPUOpenMP.h"
| | #include "ToolChains/AVR.h"
| #include "ToolChains/AVR.h"
| | #include "ToolChains/Ananas.h"
| #include "ToolChains/Ananas.h"
| | #include "ToolChains/BareMetal.h"
| #include "ToolChains/BareMetal.h"
| | #include "ToolChains/Clang.h"
| #include "ToolChains/Clang.h"
| | #include "ToolChains/CloudABI.h"
| #include "ToolChains/CloudABI.h"
| | #include "ToolChains/Contiki.h"
| #include "ToolChains/Contiki.h"
| | #include "ToolChains/CrossWindows.h"
| #include "ToolChains/CrossWindows.h"
| | #include "ToolChains/Cuda.h"
| #include "ToolChains/Cuda.h"
| | ▲ Show 20 Lines • Show All 713 Lines • ▼ Show 20 Lines | if (OpenMPTargets->getNumValues()) {
| | // following iterations.
| // following iterations.
| | FoundNormalizedTriples[NormalizedName] = Val;
| FoundNormalizedTriples[NormalizedName] = Val;
| | // If the specified target is invalid, emit a diagnostic.
| // If the specified target is invalid, emit a diagnostic.
| | if (TT.getArch() == llvm::Triple::UnknownArch)
| if (TT.getArch() == llvm::Triple::UnknownArch)
| | Diag(clang::diag::err_drv_invalid_omp_target) << Val;
| Diag(clang::diag::err_drv_invalid_omp_target) << Val;
| | else {
| else {
| | const ToolChain *TC;
| const ToolChain *TC;
| | // CUDA toolchains have to be selected differently. They pair host
| // Device toolchains have to be selected differently. They pair host
| | // and device in their implementation.
| // and device in their implementation.
| | if (TT.isNVPTX()) {
| if (TT.isNVPTX()||TT.isAMDGCN()) {
| | const ToolChain *HostTC =
| const ToolChain *HostTC =
| | C.getSingleOffloadToolChain<Action::OFK_Host>();
| C.getSingleOffloadToolChain<Action::OFK_Host>();
| | assert(HostTC && "Host toolchain should be always defined.");
| assert(HostTC && "Host toolchain should be always defined.");
| | auto &CudaTC =
| auto &DeviceTC =
| | ToolChains[TT.str() + "/" + HostTC->getTriple().normalize()];
| ToolChains[TT.str() + "/" + HostTC->getTriple().normalize()];
| | if (!CudaTC)
| if (!DeviceTC){
| | CudaTC=std::make_unique<toolchains::CudaToolChain>(
| if(TT.isNVPTX())
| JonChesterfieldUnsubmitted This follows the same logic as nvptx so is likely to be correct JonChesterfield: This follows the same logic as nvptx so is likely to be correct | | DeviceTC = std::make_unique<toolchains::CudaToolChain>(
| | *this, TT, *HostTC, C.getInputArgs(), Action::OFK_OpenMP);
| *this, TT, *HostTC, C.getInputArgs(), Action::OFK_OpenMP);
| | TC=CudaTC.get();
| elseif(TT.isAMDGCN())
| JonChesterfieldUnsubmitted Minor suggestion,
if (TT.isNVPTX() {
...
} else {
assert(TT.isAMDGCN());
...
}
? Semantically equivalent, but looks like there is a missing else clause. JonChesterfield: Minor suggestion,
```
if (TT.isNVPTX() {
...
} else {
assert(TT.isAMDGCN());
...
}```
? | pdhaliwalAuthorUnsubmitted I think, having assert in last else is bit cleaner. What do you think? pdhaliwal: I think, having assert in last else is bit cleaner. What do you think? | JonChesterfieldUnsubmitted I also prefer the if () else {assert()} construct. JonChesterfield: I also prefer the if () else {assert()} construct. | | DeviceTC =
| | std::make_unique<toolchains::AMDGPUOpenMPToolChain>(
| | *this, TT, *HostTC, C.getInputArgs());
| | else
| | assert(DeviceTC && "Device toolchain not defined.");
| | }
| | TC = DeviceTC.get();
| | } else
| } else
| what if we say in line 745, rename the Cuda things to "device" and the toolchain stuff is in a conditional? I see the triple is hardcoded here for some reason and I don't know why not to use TT. Neverhteless, it seems less copying is better, we will eventually have more architectures and 3 * 13 lines means 3 times the bugs and 3 times the required changes in the future. jdoerfert: what if we say `TT.isNVPTX() || TT.isAMDGCN()` in line 745, rename the Cuda things to "device"… | | TC = &getToolChain(C.getInputArgs(), TT);
| TC = &getToolChain(C.getInputArgs(), TT);
| | C.addOffloadDeviceToolChain(TC, Action::OFK_OpenMP);
| C.addOffloadDeviceToolChain(TC, Action::OFK_OpenMP);
| | }
| }
| | }
| }
| | } else
| } else
| | Diag(clang::diag::err_drv_expecting_fopenmp_with_fopenmp_targets);
| Diag(clang::diag::err_drv_expecting_fopenmp_with_fopenmp_targets);
| | } else
| } else
| | Diag(clang::diag::warn_drv_empty_joined_argument)
| Diag(clang::diag::warn_drv_empty_joined_argument)
| | ▲ Show 20 Lines • Show All 2,214 Lines • ▼ Show 20 Lines | getDeviceDependences(OffloadAction::DeviceDependences &DA,
| | "Number of OpenMP actions and toolchains do not match.");
| "Number of OpenMP actions and toolchains do not match.");
| | // The host only depends on device action in the linking phase, when all
| // The host only depends on device action in the linking phase, when all
| | // the device images have to be embedded in the host image.
| // the device images have to be embedded in the host image.
| | if (CurPhase == phases::Link) {
| if (CurPhase == phases::Link) {
| | assert(ToolChains.size() == DeviceLinkerInputs.size() &&
| assert(ToolChains.size() == DeviceLinkerInputs.size() &&
| | "Toolchains and linker inputs sizes do not match.");
| "Toolchains and linker inputs sizes do not match.");
| | auto LI = DeviceLinkerInputs.begin();
| auto LI = DeviceLinkerInputs.begin();
| | for (auto *A : OpenMPDeviceActions) {
| for (auto *A : OpenMPDeviceActions) {
| jdoerfert: Why the `substr` copy? | JonChesterfieldUnsubmitted As in ? Slightly prettier. Not sure ad hoc arg parsing like this is ever good though JonChesterfield: As in `VStr.startswith("-march=gfx")`? Slightly prettier. Not sure ad hoc arg parsing like this… | | LI->push_back(A);
| LI->push_back(A);
| | ++LI;
| ++LI;
| | }
| }
| | // We passed the device action as a host dependence, so we don't need to
| // We passed the device action as a host dependence, so we don't need to
| This does not look right. IsAMDGCN is not a question that should be answered by a flag.
In general, why skip these passes here. Isn't what you really want to do, pass to the device compilation job? jdoerfert: This does not look right. IsAMDGCN is not a question that should be answered by a `-Xopenmp… | pdhaliwalAuthorUnsubmitted I have removed the logic from here instead now I have added -emit-llvm-bc using addClangTargetOptions in AMDGPUOpenMP. pdhaliwal: I have removed the logic from here instead now I have added -emit-llvm-bc using… | | // do anything else with them.
| // do anything else with them.
| | OpenMPDeviceActions.clear();
| OpenMPDeviceActions.clear();
| | return ABRT_Success;
| return ABRT_Success;
| | }
| }
| | // By default, we produce an action for each device arch.
| // By default, we produce an action for each device arch.
| | for (Action *&A : OpenMPDeviceActions)
| for (Action *&A : OpenMPDeviceActions)
| | A = C.getDriver().ConstructPhaseAction(C, Args, CurPhase, A);
| A = C.getDriver().ConstructPhaseAction(C, Args, CurPhase, A);
| | ▲ Show 20 Lines • Show All 2,391 Lines • Show Last 20 Lines |
clang/lib/Driver/ToolChains/AMDGPU.h| Show First 20 Lines • Show All 58 Lines • ▼ Show 20 Lines | | public:
| public:
| | AMDGPUToolChain(const Driver &D, const llvm::Triple &Triple,
| AMDGPUToolChain(const Driver &D, const llvm::Triple &Triple,
| | const llvm::opt::ArgList &Args);
| const llvm::opt::ArgList &Args);
| | unsigned GetDefaultDwarfVersion() const override { return 4; }
| unsigned GetDefaultDwarfVersion() const override { return 4; }
| | bool IsIntegratedAssemblerDefault() const override { return true; }
| bool IsIntegratedAssemblerDefault() const override { return true; }
| | bool IsMathErrnoDefault() const override { return false; }
| bool IsMathErrnoDefault() const override { return false; }
| | bool useIntegratedAs() const override { return true; }
| | bool isCrossCompiling() const override { return true; }
| | bool isPICDefault() const override { return false; }
| | bool isPIEDefault() const override { return false; }
| | bool isPICDefaultForced() const override { return false; }
| | bool SupportsProfiling() const override { return false; }
| What is the coding style here? Pick upper case or lower case, form the above I assume upper case. jdoerfert: What is the coding style here? Pick upper case or lower case, form the above I assume upper… | JonChesterfieldUnsubmitted Yeah, should be as it's a function. Should probably propose that as a minimal patch to the existing code. There may be other casing mistakes. ConstructJob looks suspect, but all the files call it that instead of constructJob, so maybe I'm missing a part of the style guide. JonChesterfield: Yeah, should be `supportsProfiling()` as it's a function. Should probably propose that as a… | | llvm::opt::DerivedArgList *
| llvm::opt::DerivedArgList *
| | TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
| TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
| | Action::OffloadKind DeviceOffloadKind) const override;
| Action::OffloadKind DeviceOffloadKind) const override;
| | void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
| void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
| | llvm::opt::ArgStringList &CC1Args,
| llvm::opt::ArgStringList &CC1Args,
| | Action::OffloadKind DeviceOffloadKind) const override;
| Action::OffloadKind DeviceOffloadKind) const override;
| | ▲ Show 20 Lines • Show All 44 Lines • Show Last 20 Lines |
clang/lib/Driver/ToolChains/AMDGPUOpenMP.h| //===- AMDGPUOpenMP.h - AMDGPUOpenMP ToolChain Implementation -*- C++ -*---===//
| | //
| | // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
| | // See https://llvm.org/LICENSE.txt for license information.
| | // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
| | //
| | //===----------------------------------------------------------------------===//
| | #ifndef LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H
| | #define LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H
| | #include "AMDGPU.h"
| | #include "clang/Driver/Tool.h"
| | #include "clang/Driver/ToolChain.h"
| | namespace clang {
| | namespace driver {
| | namespace tools {
| | namespace AMDGCN {
| | // Runs llvm-link/opt/llc/lld, which links multiple LLVM bitcode, together with
| | // device library, then compiles it to ISA in a shared object.
| | class LLVM_LIBRARY_VISIBILITY OpenMPLinker : public Tool {
| | public:
| | OpenMPLinker(const ToolChain &TC)
| | : Tool("AMDGCN::OpenMPLinker", "amdgcn-link", TC) {}
| | bool hasIntegratedCPP() const override { return false; }
| | void ConstructJob(Compilation &C, const JobAction &JA,
| | const InputInfo &Output, const InputInfoList &Inputs,
| | const llvm::opt::ArgList &TCArgs,
| | const char *LinkingOutput) const override;
| | private:
| | /// \return llvm-link output file name.
| | const char *constructLLVMLinkCommand(Compilation &C, const JobAction &JA,
| | const InputInfoList &Inputs,
| | const llvm::opt::ArgList &Args,
| | llvm::StringRef SubArchName,
| | llvm::StringRef OutputFilePrefix) const;
| | /// \return llc output file name.
| | const char *constructLlcCommand(Compilation &C, const JobAction &JA,
| | const InputInfoList &Inputs,
| | const llvm::opt::ArgList &Args,
| | llvm::StringRef SubArchName,
| | llvm::StringRef OutputFilePrefix,
| | const char *InputFileName,
| | bool OutputIsAsm = false) const;
| | void constructLldCommand(Compilation &C, const JobAction &JA,
| | const InputInfoList &Inputs, const InputInfo &Output,
| | const llvm::opt::ArgList &Args,
| | const char *InputFileName) const;
| | };
| | } // end namespace AMDGCN
| | } // end namespace tools
| | namespace toolchains {
| | class LLVM_LIBRARY_VISIBILITY AMDGPUOpenMPToolChain final
| | : public ROCMToolChain {
| | public:
| | AMDGPUOpenMPToolChain(const Driver &D, const llvm::Triple &Triple,
| | const ToolChain &HostTC,
| | const llvm::opt::ArgList &Args);
| | const llvm::Triple *getAuxTriple() const override {
| | return &HostTC.getTriple();
| | }
| | llvm::opt::DerivedArgList *
| | TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
| | Action::OffloadKind DeviceOffloadKind) const override;
| | void
| | addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
| | llvm::opt::ArgStringList &CC1Args,
| | Action::OffloadKind DeviceOffloadKind) const override;
| | void addClangWarningOptions(llvm::opt::ArgStringList &CC1Args) const override;
| | CXXStdlibType GetCXXStdlibType(const llvm::opt::ArgList &Args) const override;
| | void
| | AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs,
| | llvm::opt::ArgStringList &CC1Args) const override;
| | void AddIAMCUIncludeArgs(const llvm::opt::ArgList &DriverArgs,
| | llvm::opt::ArgStringList &CC1Args) const override;
| | SanitizerMask getSupportedSanitizers() const override;
| | VersionTuple
| | computeMSVCVersion(const Driver *D,
| | const llvm::opt::ArgList &Args) const override;
| | const ToolChain &HostTC;
| | protected:
| | Tool *buildLinker() const override;
| | };
| | } // end namespace toolchains
| | } // end namespace driver
| | } // end namespace clang
| | #endif // LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPUOPENMP_H
|
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp| //===- AMDGPUOpenMP.cpp - AMDGPUOpenMP ToolChain Implementation -*- C++ -*-===//
| | //
| | // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
| | // See https://llvm.org/LICENSE.txt for license information.
| | // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
| | //
| | //===----------------------------------------------------------------------===//
| | #include "AMDGPUOpenMP.h"
| | #include "AMDGPU.h"
| | #include "CommonArgs.h"
| | #include "InputInfo.h"
| | #include "clang/Driver/Compilation.h"
| | #include "clang/Driver/Driver.h"
| | #include "clang/Driver/DriverDiagnostic.h"
| | #include "clang/Driver/Options.h"
| | #include "llvm/Support/FileSystem.h"
| | #include "llvm/Support/Path.h"
| | using namespace clang::driver;
| | using namespace clang::driver::toolchains;
| | using namespace clang::driver::tools;
| | using namespace clang;
| | using namespace llvm::opt;
| | namespace {
| | static const char *getOutputFileName(Compilation &C, StringRef Base,
| | const char *Postfix,
| | const char *Extension) {
| | const char *OutputFileName;
| | if (C.getDriver().isSaveTempsEnabled()) {
| | OutputFileName =
| | C.getArgs().MakeArgString(Base.str() + Postfix + "." + Extension);
| | } else {
| | std::string TmpName =
| | C.getDriver().GetTemporaryPath(Base.str() + Postfix, Extension);
| | OutputFileName = C.addTempFile(C.getArgs().MakeArgString(TmpName));
| | }
| | return OutputFileName;
| | }
| | static void addLLCOptArg(const llvm::opt::ArgList &Args,
| | llvm::opt::ArgStringList &CmdArgs) {
| JonChesterfieldUnsubmitted Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably the same logic in another toolchain JonChesterfield: Maybe rename this since opt is no longer involved. addLLCOptArg or similar? There's probably… | | if (Arg *A = Args.getLastArg(options::OPT_O_Group)) {
| | StringRef OOpt = "0";
| | if (A->getOption().matches(options::OPT_O4) ||
| | A->getOption().matches(options::OPT_Ofast))
| | OOpt = "3";
| | else if (A->getOption().matches(options::OPT_O0))
| | OOpt = "0";
| | else if (A->getOption().matches(options::OPT_O)) {
| | // Clang and opt support -Os/-Oz; llc only supports -O0, -O1, -O2 and -O3
| | // so we map -Os/-Oz to -O2.
| | // Only clang supports -Og, and maps it to -O1.
| | // We map anything else to -O2.
| | OOpt = llvm::StringSwitch<const char *>(A->getValue())
| | .Case("1", "1")
| | .Case("2", "2")
| | .Case("3", "3")
| | .Case("s", "2")
| | .Case("z", "2")
| | .Case("g", "1")
| | .Default("0");
| | }
| JonChesterfieldUnsubmitted Default("0") and update the comment 'we map anything else' JonChesterfield: Default("0") and update the comment 'we map anything else' | | CmdArgs.push_back(Args.MakeArgString("-O" + OOpt));
| To verify, O0 is not mapped to O2, correct? jdoerfert: To verify, O0 is not mapped to O2, correct? | JonChesterfieldUnsubmitted I think is an error, though it does look like this will rewrite it to O2 instead. Which seems bad. JonChesterfield: I think `opt -O0` is an error, though it does look like this will rewrite it to O2 instead. | JonChesterfieldUnsubmitted Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are passed to clang as a single invocation, but I don't see why we need an opt invocation here. Passing the llvm-link'ed code to llc as-is, without this implicit lto style opt invocation, is probably a better default. JonChesterfield: Suggest we drop the opt invocation. llvm-link is required for the case where multiple files are… | | }
| Just to make it clear, we should default to O0. jdoerfert: Just to make it clear, we should default to O0. | | }
| | } // namespace
| | const char *AMDGCN::OpenMPLinker::constructLLVMLinkCommand(
| | Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
| | const ArgList &Args, StringRef SubArchName,
| | StringRef OutputFilePrefix) const {
| | ArgStringList CmdArgs;
| | for (const auto &II : Inputs)
| | if (II.isFilename())
| | CmdArgs.push_back(II.getFilename());
| | // Add an intermediate output file.
| | CmdArgs.push_back("-o");
| | const char *OutputFileName =
| | getOutputFileName(C, OutputFilePrefix, "-linked", "bc");
| | CmdArgs.push_back(OutputFileName);
| | const char *Exec =
| | Args.MakeArgString(getToolChain().GetProgramPath("llvm-link"));
| | C.addCommand(std::make_unique<Command>(
| | JA, *this, ResponseFileSupport::AtFileCurCP(), Exec, CmdArgs, Inputs,
| | InputInfo(&JA, Args.MakeArgString(OutputFileName))));
| | return OutputFileName;
| | }
| | const char *AMDGCN::OpenMPLinker::constructLlcCommand(
| | Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
| | const llvm::opt::ArgList &Args, llvm::StringRef SubArchName,
| | llvm::StringRef OutputFilePrefix, const char *InputFileName,
| | bool OutputIsAsm) const {
| | // Construct llc command.
| | ArgStringList LlcArgs;
| | // The input to llc is the output from opt.
| | LlcArgs.push_back(InputFileName);
| | // Pass optimization arg to llc.
| | addLLCOptArg(Args, LlcArgs);
| | LlcArgs.push_back("-mtriple=amdgcn-amd-amdhsa");
| | LlcArgs.push_back(Args.MakeArgString("-mcpu=" + SubArchName));
| | LlcArgs.push_back(
| | Args.MakeArgString(Twine("-filetype=") + (OutputIsAsm ? "asm" : "obj")));
| | for (const Arg *A : Args.filtered(options::OPT_mllvm)) {
| | LlcArgs.push_back(A->getValue(0));
| | }
| | // Add output filename
| | LlcArgs.push_back("-o");
| | const char *LlcOutputFile =
| | getOutputFileName(C, OutputFilePrefix, "", OutputIsAsm ? "s" : "o");
| | LlcArgs.push_back(LlcOutputFile);
| | const char *Llc = Args.MakeArgString(getToolChain().GetProgramPath("llc"));
| | C.addCommand(std::make_unique<Command>(
| | JA, *this, ResponseFileSupport::AtFileCurCP(), Llc, LlcArgs, Inputs,
| | InputInfo(&JA, Args.MakeArgString(LlcOutputFile))));
| | return LlcOutputFile;
| | }
| | void AMDGCN::OpenMPLinker::constructLldCommand(
| | Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
| | const InputInfo &Output, const llvm::opt::ArgList &Args,
| | const char *InputFileName) const {
| | // Construct lld command.
| | // The output from ld.lld is an HSA code object file.
| | ArgStringList LldArgs{"-flavor", "gnu", "--no-undefined",
| | "-shared", "-o", Output.getFilename(),
| | InputFileName};
| | const char *Lld = Args.MakeArgString(getToolChain().GetProgramPath("lld"));
| | C.addCommand(std::make_unique<Command>(
| | JA, *this, ResponseFileSupport::AtFileCurCP(), Lld, LldArgs, Inputs,
| | InputInfo(&JA, Args.MakeArgString(Output.getFilename()))));
| | }
| | // For amdgcn the inputs of the linker job are device bitcode and output is
| | // object file. It calls llvm-link, opt, llc, then lld steps.
| | void AMDGCN::OpenMPLinker::ConstructJob(Compilation &C, const JobAction &JA,
| | const InputInfo &Output,
| | const InputInfoList &Inputs,
| | const ArgList &Args,
| | const char *LinkingOutput) const {
| | assert(getToolChain().getTriple().isAMDGCN() && "Unsupported target");
| | StringRef GPUArch = Args.getLastArgValue(options::OPT_march_EQ);
| | assert(GPUArch.startswith("gfx") && "Unsupported sub arch");
| | // Prefix for temporary file name.
| | std::string Prefix;
| | for (const auto &II : Inputs)
| | if (II.isFilename())
| | Prefix =
| | llvm::sys::path::stem(II.getFilename()).str() + "-" + GPUArch.str();
| | assert(Prefix.length() && "no linker inputs are files ");
| | // Each command outputs different files.
| | const char *LLVMLinkCommand =
| | constructLLVMLinkCommand(C, JA, Inputs, Args, GPUArch, Prefix);
| | const char *LlcCommand = constructLlcCommand(C, JA, Inputs, Args, GPUArch,
| | Prefix, LLVMLinkCommand);
| | constructLldCommand(C, JA, Inputs, Output, Args, LlcCommand);
| | }
| | AMDGPUOpenMPToolChain::AMDGPUOpenMPToolChain(const Driver &D,
| | const llvm::Triple &Triple,
| | const ToolChain &HostTC,
| | const ArgList &Args)
| | : ROCMToolChain(D, Triple, Args), HostTC(HostTC) {
| | // Lookup binaries into the driver directory, this is used to
| | // discover the clang-offload-bundler executable.
| | getProgramPaths().push_back(getDriver().Dir);
| | }
| | void AMDGPUOpenMPToolChain::addClangTargetOptions(
| | const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,
| | Action::OffloadKind DeviceOffloadingKind) const {
| | HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);
| | StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
| | assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
| | assert(DeviceOffloadingKind == Action::OFK_OpenMP &&
| | "Only OpenMP offloading kinds are supported.");
| | CC1Args.push_back("-target-cpu");
| | CC1Args.push_back(DriverArgs.MakeArgStringRef(GpuArch));
| | CC1Args.push_back("-fcuda-is-device");
| | CC1Args.push_back("-emit-llvm-bc");
| | }
| | llvm::opt::DerivedArgList *AMDGPUOpenMPToolChain::TranslateArgs(
| | const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
| | Action::OffloadKind DeviceOffloadKind) const {
| | DerivedArgList *DAL =
| | HostTC.TranslateArgs(Args, BoundArch, DeviceOffloadKind);
| | if (!DAL)
| | DAL = new DerivedArgList(Args.getBaseArgs());
| | const OptTable &Opts = getDriver().getOpts();
| | if (DeviceOffloadKind != Action::OFK_OpenMP) {
| | for (Arg *A : Args) {
| | DAL->append(A);
| | }
| | }
| | if (!BoundArch.empty()) {
| | DAL->eraseArg(options::OPT_march_EQ);
| | DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ),
| | BoundArch);
| | }
| | return DAL;
| | }
| | Tool *AMDGPUOpenMPToolChain::buildLinker() const {
| | assert(getTriple().isAMDGCN());
| | return new tools::AMDGCN::OpenMPLinker(*this);
| | }
| | void AMDGPUOpenMPToolChain::addClangWarningOptions(
| | ArgStringList &CC1Args) const {
| | HostTC.addClangWarningOptions(CC1Args);
| | }
| | ToolChain::CXXStdlibType
| | AMDGPUOpenMPToolChain::GetCXXStdlibType(const ArgList &Args) const {
| | return HostTC.GetCXXStdlibType(Args);
| | }
| | void AMDGPUOpenMPToolChain::AddClangSystemIncludeArgs(
| | const ArgList &DriverArgs, ArgStringList &CC1Args) const {
| | HostTC.AddClangSystemIncludeArgs(DriverArgs, CC1Args);
| | }
| | void AMDGPUOpenMPToolChain::AddIAMCUIncludeArgs(const ArgList &Args,
| | ArgStringList &CC1Args) const {
| | HostTC.AddIAMCUIncludeArgs(Args, CC1Args);
| | }
| | SanitizerMask AMDGPUOpenMPToolChain::getSupportedSanitizers() const {
| | // The AMDGPUOpenMPToolChain only supports sanitizers in the sense that it
| | // allows sanitizer arguments on the command line if they are supported by the
| | // host toolchain. The AMDGPUOpenMPToolChain will actually ignore any command
| | // line arguments for any of these "supported" sanitizers. That means that no
| | // sanitization of device code is actually supported at this time.
| | //
| | // This behavior is necessary because the host and device toolchains
| | // invocations often share the command line, so the device toolchain must
| | // tolerate flags meant only for the host toolchain.
| | return HostTC.getSupportedSanitizers();
| | }
| | VersionTuple
| | AMDGPUOpenMPToolChain::computeMSVCVersion(const Driver *D,
| | const ArgList &Args) const {
| | return HostTC.computeMSVCVersion(D, Args);
| | }
|
clang/lib/Driver/ToolChains/HIP.h| Show First 20 Lines • Show All 65 Lines • ▼ Show 20 Lines | public:
| | }
| }
| | llvm::opt::DerivedArgList *
| llvm::opt::DerivedArgList *
| | TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
| TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
| | Action::OffloadKind DeviceOffloadKind) const override;
| Action::OffloadKind DeviceOffloadKind) const override;
| | void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
| void addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
| | llvm::opt::ArgStringList &CC1Args,
| llvm::opt::ArgStringList &CC1Args,
| | Action::OffloadKind DeviceOffloadKind) const override;
| Action::OffloadKind DeviceOffloadKind) const override;
| | bool useIntegratedAs() const override { return true; }
| | bool isCrossCompiling() const override { return true; }
| | bool isPICDefault() const override { return false; }
| | bool isPIEDefault() const override { return false; }
| | bool isPICDefaultForced() const override { return false; }
| | bool SupportsProfiling() const override { return false; }
| | bool IsMathErrnoDefault() const override { return false; }
| | void addClangWarningOptions(llvm::opt::ArgStringList &CC1Args) const override;
| void addClangWarningOptions(llvm::opt::ArgStringList &CC1Args) const override;
| | CXXStdlibType GetCXXStdlibType(const llvm::opt::ArgList &Args) const override;
| CXXStdlibType GetCXXStdlibType(const llvm::opt::ArgList &Args) const override;
| | void
| void
| | AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs,
| AddClangSystemIncludeArgs(const llvm::opt::ArgList &DriverArgs,
| | llvm::opt::ArgStringList &CC1Args) const override;
| llvm::opt::ArgStringList &CC1Args) const override;
| | void AddClangCXXStdlibIncludeArgs(
| void AddClangCXXStdlibIncludeArgs(
| | const llvm::opt::ArgList &Args,
| const llvm::opt::ArgList &Args,
| | llvm::opt::ArgStringList &CC1Args) const override;
| llvm::opt::ArgStringList &CC1Args) const override;
| | Show All 24 Lines |
clang/test/Driver/amdgpu-openmp-toolchain.c| // REQUIRES: amdgpu-registered-target
| | // RUN: %clang -### --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \
| | // RUN: | FileCheck %s
| Would this at all work without the march flag? jdoerfert: Would this at all work without the march flag? | pdhaliwalAuthorUnsubmitted This would not work without -march flag as amdgcn is not forward/backward compatible and there is currently no way to detect system gpu arch. pdhaliwal: This would not work without -march flag as amdgcn is not forward/backward compatible and there… | JonChesterfieldUnsubmitted The command line invocation openmp needs could be friendlier.
Auto-detecting march is convenient in general but probably bad for clang tests as we want them to run on systems that don't have an amdgpu installed. JonChesterfield: The command line invocation openmp needs `-fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp… | | // verify the tools invocations
| | // CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-x" "c"{{.*}}
| | // CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-x" "ir"{{.*}}
| | // CHECK: clang{{.*}}"-cc1"{{.*}}"-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx906" "-fcuda-is-device" "-emit-llvm-bc"{{.*}}
| | // CHECK: llvm-link{{.*}}"-o" "{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-linked-{{.*}}.bc"
| | // CHECK: llc{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-linked-{{.*}}.bc" "-mtriple=amdgcn-amd-amdhsa" "-mcpu=gfx906" "-filetype=obj" "-o"{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-{{.*}}.o"
| | // CHECK: lld{{.*}}"-flavor" "gnu" "--no-undefined" "-shared" "-o"{{.*}}amdgpu-openmp-toolchain-{{.*}}.out" "{{.*}}amdgpu-openmp-toolchain-{{.*}}-gfx906-{{.*}}.o"
| | // CHECK: clang-offload-wrapper{{.*}}"-target" "x86_64-unknown-linux-gnu" "-o" "{{.*}}a-{{.*}}.bc" {{.*}}amdgpu-openmp-toolchain-{{.*}}.out"
| | // CHECK: clang{{.*}}"-cc1" "-triple" "x86_64-unknown-linux-gnu"{{.*}}"-o" "{{.*}}a-{{.*}}.o" "-x" "ir" "{{.*}}a-{{.*}}.bc"
| | // CHECK: ld{{.*}}"-o" "a.out"{{.*}}"{{.*}}amdgpu-openmp-toolchain-{{.*}}.o" "{{.*}}a-{{.*}}.o" "-lomp" "-lomptarget"
| | // RUN: %clang -ccc-print-phases --target=x86_64-unknown-linux-gnu -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx906 %s 2>&1 \
| | // RUN: | FileCheck --check-prefix=CHECK-PHASES %s
| | // phases
| | // CHECK-PHASES: 0: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (host-openmp)
| | // CHECK-PHASES: 1: preprocessor, {0}, cpp-output, (host-openmp)
| | // CHECK-PHASES: 2: compiler, {1}, ir, (host-openmp)
| | // CHECK-PHASES: 3: backend, {2}, assembler, (host-openmp)
| | // CHECK-PHASES: 4: assembler, {3}, object, (host-openmp)
| | // CHECK-PHASES: 5: input, "{{.*}}amdgpu-openmp-toolchain.c", c, (device-openmp)
| | // CHECK-PHASES: 6: preprocessor, {5}, cpp-output, (device-openmp)
| | // CHECK-PHASES: 7: compiler, {6}, ir, (device-openmp)
| | // CHECK-PHASES: 8: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (amdgcn-amd-amdhsa)" {7}, ir
| | // CHECK-PHASES: 9: backend, {8}, assembler, (device-openmp)
| | // CHECK-PHASES: 10: assembler, {9}, object, (device-openmp)
| | // CHECK-PHASES: 11: linker, {10}, image, (device-openmp)
| | // CHECK-PHASES: 12: offload, "device-openmp (amdgcn-amd-amdhsa)" {11}, image
| | // CHECK-PHASES: 13: clang-offload-wrapper, {12}, ir, (host-openmp)
| | // CHECK-PHASES: 14: backend, {13}, assembler, (host-openmp)
| | // CHECK-PHASES: 15: assembler, {14}, object, (host-openmp)
| | // CHECK-PHASES: 16: linker, {4, 15}, image, (host-openmp)
|
|