198 points by Flex247A 8 months ago | 27 comments
vblanco 8 months ago
Tracy requires you to add macros to your codebase to log functions/scopes, so it's not an automatic sampling profiler like Superluminal, Very Sleepy, the VS profiler, or others. Each of those macros has around 50 nanoseconds of overhead, so you can use them liberally, even in the millions. In the UI, it has a stats window that records the average, deviation, and min/max of those profiler zones, which can be used to profile functions at the level of single nanoseconds.
It's the main thing I use for all my profiling and optimization work. I combine it with Superluminal (a sampling profiler) to get a high-level overview of the program, then I put Tracy zones in the important places to get the detailed information.
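To give an idea, instrumenting with Tracy looks roughly like this (a minimal sketch; the function and zone names are just illustrative):

    #include <tracy/Tracy.hpp>

    void UpdatePhysics() {
        ZoneScoped;               // zone named after the enclosing function
        // ... simulated work ...
    }

    int main() {
        for (int frame = 0; frame < 3; ++frame) {
            ZoneScopedN("Frame"); // explicitly named zone, ~50 ns overhead each
            UpdatePhysics();
            FrameMark;            // marks a frame boundary in Tracy's frame view
        }
    }

Build with TRACY_ENABLE defined to emit the instrumentation; without it, the macros compile away to nothing.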
forrestthewoods 8 months ago
My only issue with Superluminal is that I can't get proper call stacks for interpreted languages like Python: it treats all the C++ call stacks as the same. Not sure if Tracy can handle that nicely or not…
Flex247A 8 months ago
Thanks for the good work.
Flex247A 8 months ago
Web demo of Tracy: https://tracy.nereid.pl/
This blows my mind. It's faster and more responsive than I ever expected a WebAssembly application to be!
mastax 8 months ago
Is the latest Windows build broken for anyone else? It doesn't start, and in WinDbg it looks like it dereferences a null pointer. I built it myself and that build works fine.
mastax 8 months ago
MSVC changed the std::mutex constructor to constexpr, breaking binary backward compatibility. They say WONTFIX: you must use the latest MSVCRT with the latest MSVC. But I have the latest MSVCRT installed? Whatever - a workaround was pushed to master yesterday.
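For anyone hitting the same crash: I can't say this is the exact workaround that landed on master, but the commonly cited escape hatch for this STL change is to opt out of the constexpr constructor before including <mutex>:

    // Opts back into the old, non-constexpr std::mutex constructor so
    // binaries keep working with an older msvcp140.dll.
    #define _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR
    #include <mutex>

    std::mutex g_lock;  // no longer constant-initialized, but ABI-compatible

    int main() {
        std::lock_guard<std::mutex> guard(g_lock);
        return 0;
    }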
simonask 8 months ago
The resolution of the actual measurements depends on the kind of measurement:
1. If the measurement is based on high-resolution timers on the CPU, the resolution depends on the hardware and the OS. On Windows, `QueryPerformanceFrequency()` returns the counter frequency (the resolution is its inverse), and I believe it is often on the order of tens or hundreds of nanoseconds (see the sketch after this list).
2. If the measurement is based on GPU-side performance counters, it depends on the driver and the hardware. Graphics APIs let you query a "time-per-tick" value to translate from performance counters to nanoseconds. Performance counters can be as fine-grained as "number of instructions executed", and since a single instruction can take on the order of 1-2 nanoseconds in some cases, translating a performance counter value to a time period requires nanosecond precision.
3. Modern GPUs also include their own high-precision timers for profiling things that are not necessarily easy to capture with performance counters (like barriers, contention, and complicated cache interactions).
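To make point 1 concrete, here is a minimal Windows-only sketch of timing a scope with the performance counter (error handling omitted):

    #include <windows.h>
    #include <cstdio>

    int main() {
        LARGE_INTEGER freq, start, end;
        QueryPerformanceFrequency(&freq);  // ticks per second; resolution is 1/freq
        QueryPerformanceCounter(&start);

        // ... work being measured ...

        QueryPerformanceCounter(&end);
        double ns = double(end.QuadPart - start.QuadPart) * 1e9 / double(freq.QuadPart);
        printf("elapsed: %.0f ns\n", ns);
        return 0;
    }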
drpossum 8 months ago
There was even a discussion on this not long ago on how to market to technical folks and things to not do (this is one of the things not to do)
vardump 8 months ago
https://github.com/wolfpld/tracy/blob/master/public/client/T...
vardump 8 months ago
Again, due to bad calibration code, the measured timestamps have quite a bit of jitter.
Edit: the TSC might not be synchronized in multi-socket systems (multiple physical CPU sockets), which can introduce a large error.
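For illustration, reading the raw TSC directly looks like this (a sketch; the 3 GHz figure is a made-up placeholder, and a real profiler has to calibrate the TSC frequency against a known clock, which is exactly where that jitter creeps in):

    #include <x86intrin.h>  // __rdtsc; use <intrin.h> with MSVC
    #include <cstdint>
    #include <cstdio>

    int main() {
        uint64_t t0 = __rdtsc();
        // ... measured work ...
        uint64_t t1 = __rdtsc();
        const double tsc_hz = 3.0e9;  // placeholder; must be calibrated at runtime
        printf("elapsed: %.1f ns\n", double(t1 - t0) * 1e9 / tsc_hz);
        return 0;
    }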
Green-Man 8 months ago
https://github.com/yse/easy_profiler
Especially interesting if based on real practical experience.
boywitharupee 8 months ago
For example, Apple wraps Metal buffers in "Debug" buffers to record allocations/deallocations.
MindSpunk 8 months ago
But in principle it's not that different from how you grab timestamps on the CPU. On Vulkan, the API used is called "timestamp queries".
It's quite tricky on tiled renderers like Arm/Qualcomm/Apple, as they can't provide meaningful timestamps at much tighter granularity than a whole render pass. I believe Metal only allows you to query timestamps at the encoder level, which roughly maps to a render pass in Vulkan (at the hardware level, anyway).
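A rough sketch of what those Vulkan timestamp queries look like (assumes an already-created device, physical device, and recording command buffer; error handling and queue submission omitted):

    #include <vulkan/vulkan.h>
    #include <cstdint>
    #include <cstdio>

    void measure_gpu_span(VkDevice device, VkPhysicalDevice phys, VkCommandBuffer cmd) {
        VkQueryPoolCreateInfo info{VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO};
        info.queryType  = VK_QUERY_TYPE_TIMESTAMP;
        info.queryCount = 2;
        VkQueryPool pool;
        vkCreateQueryPool(device, &info, nullptr, &pool);

        vkCmdResetQueryPool(cmd, pool, 0, 2);
        vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, pool, 0);
        // ... GPU work being measured ...
        vkCmdWriteTimestamp(cmd, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, pool, 1);

        // After the command buffer has executed on the queue:
        uint64_t ticks[2];
        vkGetQueryPoolResults(device, pool, 0, 2, sizeof(ticks), ticks,
                              sizeof(uint64_t),
                              VK_QUERY_RESULT_64_BIT | VK_QUERY_RESULT_WAIT_BIT);

        VkPhysicalDeviceProperties props;
        vkGetPhysicalDeviceProperties(phys, &props);
        // timestampPeriod is nanoseconds per tick for this device.
        double ns = double(ticks[1] - ticks[0]) * props.limits.timestampPeriod;
        printf("GPU span: %.0f ns\n", ns);
        vkDestroyQueryPool(device, pool, nullptr);
    }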
ossobuco 8 months ago
- 0: https://github.com/brendan-duncan/webgpu_inspector/blob/main...
e-dant 8 months ago
The overhead of observing the time is around 60 nanoseconds.
Light travels about a foot in a nanosecond.
Below ~500 nanoseconds, things get fuzzy.
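That ~60 ns figure is easy to sanity-check yourself; a minimal sketch using std::chrono (the exact number varies by OS and CPU):

    #include <chrono>
    #include <cstdio>

    int main() {
        using clk = std::chrono::steady_clock;
        constexpr int N = 1000000;

        auto start = clk::now();
        for (int i = 0; i < N; ++i) {
            auto t = clk::now();  // the operation whose overhead we're measuring
            (void)t;
        }
        auto end = clk::now();

        auto total = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
        printf("~%.1f ns per clock read\n", double(total.count()) / N);
        return 0;
    }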