In the past I used PMP (the poor man's profiler) a lot, with occasional use of perf. PMP was great for me because my focus was on stalls from IO and synchronization: it is simple but good at showing off-CPU stalls. As I catch up to modern performance tools I have begun to use flamegraphs with perf. However, I also shared bogus flamegraphs because there is a bug and I did not validate that the flamegraphs matched the output from perf report. Fortunately, Domas pointed out the problem.
The bug is FlameGraph issue 165 and the problem is that stackcollapse-perf.pl assumes that all stack traces are weighted equally but the output from perf script does not match that assumption. In the perf script output the stack traces have weights like 1 cycles and 258 cycles (collected via a script like this). If you use those values to compute time spent in each stack trace then your results will match those provided by perf report (at least in the cases I checked). But stackcollapse-perf.pl doesn't parse that number and just assumes each stack trace has weight 1 (see here).
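To make the weighting issue concrete, here is a minimal Python sketch of collapsing perf script output with and without the per-sample weight. It is not the Perl fix from the FlameGraph repo. The header and frame layouts it parses, the `collapse` function, the `use_period` flag and the sample symbols (`cheap_leaf`, `hot_leaf`) are all assumptions made for illustration, and real perf script output varies with the perf version and the fields selected.

```python
import re
from collections import Counter

# Assumed header shape: "comm pid [cpu] time:   PERIOD event:"
HEADER_RE = re.compile(r":\s+(\d+)\s+(\S+?):\s*$")
# Assumed frame shape: "<hex addr> symbol+0xoff (dso)"
FRAME_RE = re.compile(r"^\s+[0-9a-f]+\s+(.*?)\s+\(.*\)\s*$")
OFFSET_RE = re.compile(r"\+0x[0-9a-f]+$")


def collapse(text, use_period=True):
    """Fold stacks into 'root;...;leaf' -> weight.

    use_period=False mimics the buggy behavior (every stack counts as 1).
    use_period=True weights each stack by the number in its header line.
    """
    folded = Counter()
    weight, frames = 0, []
    # The trailing "" guarantees the last sample is flushed.
    for line in text.splitlines() + [""]:
        is_blank = not line.strip()
        is_header = not is_blank and not line[0].isspace()
        if (is_blank or is_header) and frames:
            # perf script lists the leaf first, so reverse for folded format.
            folded[";".join(reversed(frames))] += weight
            weight, frames = 0, []
        if is_blank:
            continue
        if is_header:
            m = HEADER_RE.search(line)
            period = int(m.group(1)) if m else 1
            weight = period if use_period else 1
            continue
        m = FRAME_RE.match(line)
        if m:
            frames.append(OFFSET_RE.sub("", m.group(1)))
    return folded


# Hypothetical two-sample input using the weights mentioned above.
SAMPLE = """\
mysqld 1234 100.000001:          1 cycles:
\tffffffff8100 cheap_leaf+0x4 ([kernel.kallsyms])
\t      401000 main+0x10 (/usr/bin/mysqld)

mysqld 1234 100.000002:        258 cycles:
\t      402000 hot_leaf+0x8 (/usr/bin/mysqld)
\t      401000 main+0x10 (/usr/bin/mysqld)
"""

if __name__ == "__main__":
    for label, flag in (("equal weights", False), ("period weights", True)):
        print(label)
        for stack, w in collapse(SAMPLE, use_period=flag).items():
            print(f"  {stack} {w}")
```

With equal weights the two stacks each get half of the flamegraph. With the weights from the header lines, the `hot_leaf` stack gets 258 of 259 units, which is the kind of difference that made the flamegraphs disagree with perf report.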
I started with PR 250 but it is ~20 diffs behind and I am not sure it is correct. My variant of PR 250 that works for me is here. And now I get valid flamegraphs again that are extremely useful. Note that I am not an expert in Perl, perf or the FlameGraph repo.
- I am now less confused about perf record
- A fix was pushed!