PMP has been great for off-CPU profiling, as long as you remember to strip the binaries. Percona shared a way to make flame graphs from PMP output. Maybe the next improvement can be a tool to make PMP useful for on-CPU profiling.
The idea is to remove all stacks that appear to be off-CPU (blocked on a mutex or IO) and treat what remains as on-CPU. This won't be exact, and I wonder whether it will be useful. It won't remove threads that are ready to run but not running. Whether that is an issue might depend on whether a workload runs with more threads than cores.
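Here is a rough sketch of what such a filter might look like. It assumes the aggregated PMP output has one line per distinct stack, with a count followed by comma-separated frames, leaf frame first. The list of blocking functions, the `depth` parameter and the function names `looks_off_cpu` and `filter_on_cpu` are my own placeholders, not part of PMP, and the list would need tuning per server and platform.

```python
# Illustrative (assumed, incomplete) set of functions that usually mean
# a thread is blocked on a mutex, condition variable, sleep or IO.
BLOCKING_FRAMES = {
    "pthread_cond_wait", "pthread_cond_timedwait", "__lll_lock_wait",
    "pthread_mutex_lock", "poll", "epoll_wait", "select", "nanosleep",
    "read", "pread64", "write", "pwrite64", "fsync", "fdatasync",
    "recv", "accept",
}


def looks_off_cpu(frames, depth=3):
    """Return True if any of the top `depth` frames is a known blocking call."""
    for frame in frames[:depth]:
        # Drop symbol version suffixes such as @@GLIBC_2.3.2.
        name = frame.split("@", 1)[0]
        if name in BLOCKING_FRAMES:
            return True
    return False


def filter_on_cpu(lines):
    """Keep (count, stack) pairs for stacks that do not look blocked.

    Assumes each line is "<count> <frame1,frame2,...>" with the leaf first.
    Lines that do not match that shape are skipped.
    """
    kept = []
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        count, stack = int(parts[0]), parts[1].strip()
        if not looks_off_cpu(stack.split(",")):
            kept.append((count, stack))
    return kept
```

Feeding it the lines of an existing PMP report returns only the stacks that did not match a blocking call. Those can then be passed to whatever you already use for flame graphs. Threads that are runnable but not scheduled will still be counted, as described above.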
If you already run PMP for off-CPU profiling, then you already have the thread stacks. Perhaps this makes them more useful.