5. Limitations
The following are known issues with the current release.
- A security vulnerability required profiling tools to disable all features for non-root and non-admin users. As a result, CUPTI cannot profile an application for these users when using Windows driver 419.17 or later, or Linux driver 418.43 or later. More details about the issue and the solutions can be found on this web page.
Note: Starting with CUDA 10.2, CUPTI allows tracing features for non-root and non-admin users on desktop platforms. But events and metrics profiling is still restricted for non-root and non-admin users.
- The CUPTI event APIs from the header cupti_events.h and the metric APIs from the header cupti_metrics.h are not supported for devices with compute capability 7.5 and higher. They are replaced by the Profiling API and the Perfworks metrics API. Refer to the section Migration to the new Profiling API.
- Profiling results might be inconsistent when auto boost is enabled. The profiler tries to disable auto boost by default, but this can fail under some conditions; profiling then continues and the results may be inconsistent. The API cuptiGetAutoBoostState() can be used to query the auto boost state of the device. This API returns the error CUPTI_ERROR_NOT_SUPPORTED on devices that don't support auto boost. Note that auto boost is supported only on certain Tesla devices with compute capability 3.0 and higher.
- CUPTI doesn't populate deprecated activity structures; instead, the newer version of the activity structure is filled with the information.
- While collecting events in continuous mode, event reporting may be delayed; that is, event values may be returned by a later call to the readEvent(s) API, and the event values for the last readEvent(s) call may be lost.
- When profiling events, the domain instance that gets profiled may report an event value of 0 because it had no workload, since CUPTI profiles only one instance of the domain by default. To profile all instances of the domain, set the event group attribute CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES through the API cuptiEventGroupSetAttribute().
- Starting with CUDA Toolkit 9.0, CUPTI doesn't support CUDA Dynamic Parallelism (CDP) kernel launch tracing and source level metrics for devices with compute capability 7.0 and later.
- Events and metrics profiling is not supported on virtual GPUs (vGPU).
- Profiling results might be incorrect for CUDA applications compiled with an nvcc version older than 9.0 for devices with compute capability 6.0 and 6.1. The profiling session will continue, and CUPTI will report this condition with the error code CUPTI_ERROR_CUDA_COMPILER_NOT_COMPATIBLE. It is advised to recompile the application code with nvcc version 9.0 or later. Ignore this warning if the code is already compiled with the recommended nvcc version.
- Because of the low resolution of the timer on Windows, the start and end timestamps can be the same for activities with a short execution duration.
- Profiling (event and metric collection) is not supported for multidevice cooperative kernels, that is, kernels launched by using the API functions cudaLaunchCooperativeKernelMultiDevice or cuLaunchCooperativeKernelMultiDevice.
- An application that calls CUPTI APIs cannot be used with NVIDIA tools such as nvprof, NVIDIA Visual Profiler, Nsight Compute, Nsight Systems, NVIDIA Nsight Visual Studio Edition, cuda-gdb, and cuda-memcheck.
- Profiling is not supported for CUDA kernel nodes launched by a CUDA Graph.
- CUDA runtime and driver API callbacks for kernel launch are not issued when the stream is in capture mode.
- PCIe and NVLink records are not captured when CUPTI is initialized lazily after CUDA initialization.
- CUPTI fails to profile an OpenACC application when the OpenACC library linked with the application is missing definitions of one or more OpenACC API routines. This is indicated by the error code CUPTI_ERROR_OPENACC_UNDEFINED_ROUTINE.
- OpenACC profiling might fail when the OpenACC library is linked statically into the user application. This happens because definitions of the OpenACC API routines needed for OpenACC profiling are missing, as the compiler might omit definitions of functions that are not used in the application. This issue can be mitigated by linking the OpenACC library dynamically.
- PC Sampling is not supported on Tegra platforms.
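For the auto boost limitation listed above, a tool may want to check the boost state before trusting its measurements. The following is a minimal sketch, not a definitive implementation. The BoostStatus enum, the functions classify_boost() and query_boost(), and the SKETCH_WITH_CUPTI macro are hypothetical names introduced for illustration. The sketch also assumes that the CUpti_ActivityAutoBoostState structure exposes an enabled field and that the caller already holds a valid CUcontext.

```c
#include <stdint.h>

#ifdef SKETCH_WITH_CUPTI   /* hypothetical macro: define it when building against CUPTI */
#include <cuda.h>
#include <cupti.h>
#endif

/* Hypothetical status values for this sketch; they are not part of CUPTI. */
typedef enum {
    BOOST_OFF,           /* auto boost is disabled */
    BOOST_ON,            /* auto boost is still enabled; results may be inconsistent */
    BOOST_NOT_SUPPORTED, /* the device does not support auto boost */
    BOOST_QUERY_FAILED   /* any other error returned by the query */
} BoostStatus;

/* Pure decision logic, kept separate so it can be exercised without a GPU. */
BoostStatus classify_boost(int query_succeeded, int not_supported, uint32_t enabled)
{
    if (not_supported) {
        return BOOST_NOT_SUPPORTED;
    }
    if (!query_succeeded) {
        return BOOST_QUERY_FAILED;
    }
    return enabled ? BOOST_ON : BOOST_OFF;
}

#ifdef SKETCH_WITH_CUPTI
/* Queries the auto boost state for a CUDA context created by the caller. */
BoostStatus query_boost(CUcontext ctx)
{
    CUpti_ActivityAutoBoostState state = {0};
    CUptiResult res = cuptiGetAutoBoostState(ctx, &state);

    return classify_boost(res == CUPTI_SUCCESS,
                          res == CUPTI_ERROR_NOT_SUPPORTED,
                          state.enabled);
}
#endif
```

When query_boost() reports BOOST_ON, one reasonable choice is to flag the collected results as potentially inconsistent rather than abort, since profiling itself continues in that case.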
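For the limitation on event values of 0 from a single domain instance, the attribute mentioned above can be set before the event group is enabled. The sketch below is illustrative only. The helper names enable_all_instances() and sum_instance_values() and the SKETCH_WITH_CUPTI macro are hypothetical. It assumes the attribute takes a uint32_t value of 1, and that once the attribute is set a read returns one uint64_t value per domain instance, which the caller then aggregates.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef SKETCH_WITH_CUPTI   /* hypothetical macro: define it when building against CUPTI */
#include <cupti.h>
#endif

/* Adds up the values reported for one event, one entry per domain instance. */
uint64_t sum_instance_values(const uint64_t *values, size_t instance_count)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < instance_count; ++i) {
        sum += values[i];
    }
    return sum;
}

#ifdef SKETCH_WITH_CUPTI
/* Asks CUPTI to profile every instance of the domain for this event group. */
CUptiResult enable_all_instances(CUpti_EventGroup group)
{
    uint32_t profile_all = 1; /* assumed encoding: nonzero enables the attribute */

    return cuptiEventGroupSetAttribute(
        group,
        CUPTI_EVENT_GROUP_ATTR_PROFILE_ALL_DOMAIN_INSTANCES,
        sizeof(profile_all),
        &profile_all);
}
#endif
```

Summing across instances means that an idle instance reporting 0 no longer hides activity recorded on the other instances.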