I haven't done serious Java development since 2000. I am happy for comments that help me learn more as a lot has changed since then. I was recently confronted with Java that used too much CPU time (Linkbench client for an OSS DBMS). I started with jPMP and the results were useful. I then found VisualVM. It has a CPU sampler and profiler. The sampler uses wall clock time while the profiler uses CPU time.
How much time does the JIT need to do its magic before I attach a CPU profiler? It depends. This article is great but if some decisions depend on the number of calls to a method then you might want to wait for that method to be called enough times. Thus it depends. If the method is called once per database query then this depends on the query response time.
I added -XX:+PrintCompilation to the Linkbench client command line to observe the JIT in action. It looks like most optimization is done within 3 minutes for a workload doing ~1000 QPS based on the output and that seems to match what the article states about compilation thresholds and tiered compilation.
Be careful when comparing profile output between two runs. If method A accounts for 10% of time with server 1 and 20% of time with server 2 that doesn't always mean method A is a problem for server 2. Assume that QPS is 3X better with server 2, then normalize CPU overheads by QPS (CPU / QPS) and the method A / server 2 combination doesn't look so bad.