Log Flushing Triggers Outages

bmarks
Contributor II

After a lot of trial-and-error testing, I have come to the conclusion that log flushing is causing outages on my JSS. I have two Tomcat web apps behind a load balancer, and both connect to a third VM running MySQL. All three VMs run RHEL. Each Tomcat instance is configured for a maximum of 90 connections to the DB. Once log flushing starts, both instances spike to 90 connections and don't drop down again unless I restart Tomcat. I have let my JSS sit in this state for hours, but once the two VMs max out their connections to the DB, it never recovers on its own. I haven't made much progress with our TAM, so I figured I'd ask here next. I originally noticed this because technicians in Asian time zones were opening support tickets stating that Casper Imaging would time out and not open. Thoughts?
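For anyone following along, here is a rough sketch of the arithmetic behind the pool limits described above. The 90-per-instance limit and the two Tomcat instances come from the post; the 151 figure is MySQL's stock default for `max_connections` and is only an assumption about this server (the real value can be read with `SHOW VARIABLES LIKE 'max_connections';`). The helper name `pool_headroom` is made up for illustration.

```python
def pool_headroom(pool_size_per_node, node_count, mysql_max_connections):
    """Return the MySQL connections left over if every Tomcat node fills its pool.

    A negative result means the combined pools can ask for more connections
    than the database server is configured to accept.
    """
    worst_case = pool_size_per_node * node_count
    return mysql_max_connections - worst_case


# 90 per node and 2 nodes are from the post; 151 is an assumed server setting.
headroom = pool_headroom(pool_size_per_node=90, node_count=2, mysql_max_connections=151)
print(headroom)
```

If the result is negative on the real server, the two web apps together could exhaust MySQL before either one reaches its own ceiling, which would be worth ruling out alongside the log flushing itself.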

5 REPLIES

bmarks
Contributor II

During the rest of the day, however, if I leave MySQL Workbench open and monitor the connections, I don't see more than 40 at any given time. It's only when log flushing starts that I see the spike.
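As a sketch of how that monitoring could be broken down per web app instead of watched by eye, the snippet below tallies connections by client host. It assumes the rows have already been fetched from `SHOW PROCESSLIST` as dictionaries keyed by column name, with `Host` in the usual `hostname:port` form. The function name `connections_by_host` and the sample host names are hypothetical, not taken from this environment.

```python
from collections import Counter


def connections_by_host(rows):
    """Count open MySQL connections per client host.

    `rows` is assumed to be a list of dicts holding SHOW PROCESSLIST columns,
    where "Host" looks like "hostname:port".
    """
    counts = Counter()
    for row in rows:
        # Drop the ephemeral client port so all connections from one VM group together.
        host = row["Host"].rsplit(":", 1)[0]
        counts[host] += 1
    return counts


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"Host": "jss-web1:51234", "Command": "Sleep"},
    {"Host": "jss-web1:51240", "Command": "Query"},
    {"Host": "jss-web2:40112", "Command": "Sleep"},
]

for host, count in connections_by_host(sample_rows).items():
    print(host, count)
```

Logging these counts every minute or so around the flush window would show which instance climbs to 90 first and whether the connections are idle or stuck in a query.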

gabester
Contributor III

I can only say that, anecdotally, I think I have seen this in the past, though I didn't realize it at the time. On a similar configuration (2 Tomcat servers on a VIP connecting to a MySQL server, all running on Windows in my case), the JSS would sometimes become unresponsive. That was almost always preceded by log flushing, if memory serves. And we're talking back in the JSS 9.2-9.4 days. I've moved away from that day-to-day proximity to server operations since then.

bmarks
Contributor II

I ended up simplifying my JSS by removing the load balancer and the issue instantly went away at the next log flushing interval.

The problem now is that I don't own the load balancer; another team does. The guy on that team who was helping us is no longer in that position, so I have no resources available to figure out why the load balancer might have been causing this issue.

bmarks
Contributor II

Actually, I lied. About one week after the load balancer was removed (and with no other changes made), the outages returned. At this time, I am running one JSS in our DMZ and one internal JSS for management only. Every day like clockwork, when we hit the time of day for log flushing, the connections between the VM in the DMZ and our database max out at 90 and all check-ins fail until Tomcat is restarted.

bentoms
Release Candidate Programs Tester

@bmarks so you have a DMZ JSS connected to the DB, which is located on the internal JSS, & the internal JSS is a limited JSS?