High System CPU usage

After upgrading an Apache web server from Debian Wheezy to Jessie, I noticed that system CPU usage had gone up. Running strace on one of the Apache processes gave me very little information:

strace -c -p 10112
Process 10112 attached - interrupt to quit
^CProcess 10112 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.71   28.549784     3172198         9         4 futex

I had already tried spreading interrupts across cores and limiting network activity, both common steps when working out why system CPU is high, but neither brought the CPU usage down.

One culprit remained: futex/mutex locks, which is also where strace showed nearly all of the time going. I changed Apache’s default Mutex [1] to file and, as if by magic, the system CPU usage went down.

[1] Apache 2.4 Mutex doc

HHVM notes

  • Impressive throughput improvements (>100%) with the app that I am working on.
  • phpinfo() doesn’t output what you would expect.
  • The xhprof output_dir setting doesn’t get read from ini files; you need to set it in the constructor of XHProfRuns_Default.
  • Set hhvm.server.thread_count to a high value (>= MaxRequestWorkers), otherwise a few slow MySQL queries could bring the server to a halt. The documentation is minimal; see HHVM server architecture (worker thread => hhvm.server.thread_count). I suggest keeping it even higher while JIT compilation is still happening.
  • If you are using New Relic, tough luck!
    The unofficial New Relic HHVM extension uses XHProf internally, so you cannot get any data out of your own XHProf usage.
    The extension relies on the agent SDK, which has no support for MySQL slow traces.
    Transactions show very low MySQL time.
    Transactions show strange traces.
  • CGI differences: apache_getenv is not available, so use $_SERVER; SCRIPT_NAME will not be the same as REQUEST_URI.
  • Use realpath for the font file passed to imageftbbox; relative paths for fonts don’t work.
  • Use Apache 2.4 as it has FastCGI support.
  • Set hhvm.log.header = true to get the date and time in the HHVM log.
  • The HHVM log will also contain slow SQL queries.
  • The .hhbc file was growing very large; it turned out to be due to Smarty file caching being enabled (the cached files were themselves PHP files that HHVM was compiling).
  • The .hhbc file is an SQLite 3 database that you can query (that is how I worked out the above).
  • High timeout values in memcached were leading to very high system CPU usage.
  • The @ operator wasn’t suppressing errors (this could be New Relic related).
  • There are friendly folks in the HHVM IRC channel (get the link from the HHVM homepage), but you need to be online during daytime in the US.
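
Since the .hhbc file can be opened as an SQLite database, a first look at what is taking up space could be sketched roughly like this. It is only a minimal sketch: the function name table_row_counts is made up for illustration, and it deliberately assumes nothing about HHVM's table names; it just lists whatever tables the file contains and how many rows each one holds.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in an SQLite file.

    Hypothetical helper: it makes no assumption about the schema and only
    reads the table list from SQLite's own sqlite_master catalogue.
    """
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names still work.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                "SELECT COUNT(*) FROM " + quoted
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling table_row_counts with the path to a copy of the .hhbc file (the location depends on your HHVM configuration) returns a dictionary that shows which tables are unusually large. Working on a copy rather than the live file is probably the safer choice while HHVM is running.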

Keepalived instance not entering FAILED state

When a monitored interface goes down, the instance immediately enters the FAILED state and the other instance takes over as MASTER.

But if you have a script block for the check (say you are monitoring HAProxy) and HAProxy goes down, the MASTER will not enter the FAILED state unless you do this:

Set the weight to a negative number (if MASTER priority is 101 and BACKUP priority is 100, the weight could be -2).

This way, when HAProxy goes down, the priority of the master becomes 101 - 2 = 99, so the backup with a priority of 100 wins the election and enters the MASTER state.

When HAProxy on the master comes back, its priority increases by 2 to become 101 again and, if you have nopreempt disabled, this instance will enter the MASTER state again.
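
The priority arithmetic above can be checked with a small Python sketch. This is a simplified model rather than keepalived itself: the node names and both helper functions are invented for illustration, preemption is assumed to be allowed (nopreempt disabled), and the tie-breaking rules of a real election are ignored.

```python
def effective_priority(base, weight, check_ok):
    """Priority of an instance whose check script has a negative weight.

    While the check fails, the (negative) weight is added to the base
    priority; while it passes, the base priority is used unchanged.
    """
    if weight < 0 and not check_ok:
        return base + weight
    return base


def elect_master(priorities):
    """Return the name of the node with the highest effective priority."""
    return max(priorities, key=priorities.get)


# Values from the text: MASTER priority 101, BACKUP priority 100, weight -2.
MASTER_BASE, BACKUP_BASE, WEIGHT = 101, 100, -2

# HAProxy is down on the original master (node_a) and fine on node_b.
haproxy_down = {
    "node_a": effective_priority(MASTER_BASE, WEIGHT, check_ok=False),
    "node_b": effective_priority(BACKUP_BASE, WEIGHT, check_ok=True),
}

# HAProxy is running again on node_a.
haproxy_back = {
    "node_a": effective_priority(MASTER_BASE, WEIGHT, check_ok=True),
    "node_b": effective_priority(BACKUP_BASE, WEIGHT, check_ok=True),
}

print(haproxy_down, "->", elect_master(haproxy_down))
print(haproxy_back, "->", elect_master(haproxy_back))
```

The sketch also shows why the size of the weight matters: it has to be larger than the gap between the two priorities (here 1), otherwise the failed master would still outrank the backup.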