automatic JSS Status, Health, Policies, Clients monitoring - anyone?

aamjohns
Contributor II

Please delete this post.

13 REPLIES

bpavlov
Honored Contributor

Why aren't you testing the policy before deploying to the rest of your company? Set up VMs for each OS, enroll them in Casper, run the policy you've set up, and gather your results that way before expanding the policy scope.

There was also a feature request not long ago to be able to do a 'test run' within Casper so that you can see what steps the policy would take before actually enabling it. Not sure what the status of that is or what the thread was called though. May be relevant to what you're looking for.

aamjohns
Contributor II

Sorry. The question was about the forest, not the trees. I don't think that is easily conveyed so I'm just not going to ask.

@bpavlov - thank you for your response.

mm2270
Legendary Contributor III

I'm not against setting up test environments; it's generally a very good idea to do that, but I don't know that a test policy scoped to test Macs would have caught something like a miscalculated Smart Group (due to human error or general Smart Group oddities) for the actual policy. Just sayin'

I think it would actually be a nice feature to set a threshold for failure rates, either how many within a time period, or just how many are reasonable to expect before being notified. You can get failure emails right now from the JSS, but you get ALL of the failures, so unless you have something set up to filter those emails and alert you when a certain number of them (for a given policy) hit your mailbox, it's not always easy to use the regular email method for this. Since you may be getting errors from many different policies sent to you, you can't just say, once my sub folder hits a certain number of emails. Also, we all know many "failures" are not failures at all, but silly things like a failover share point needing to be used to complete the policy, which the JSS registers as a failure.
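The per-policy threshold described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `(policy, timestamp, message)` tuples, the `BENIGN_MARKERS` list, and the function name are all hypothetical, and nothing here reflects how the JSS actually formats or delivers its failure emails.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical message fragments treated as "not a real failure",
# e.g. a failover share being used. Adjust to whatever your logs contain.
BENIGN_MARKERS = ("failover",)


def policies_over_threshold(failures, threshold, window, now):
    """Return {policy_name: count} for policies whose recent, non-benign
    failures reach the threshold.

    failures: iterable of (policy_name, timestamp, message) tuples
              (an assumed record shape, not a JSS format).
    window:   timedelta; failures older than now - window are ignored.
    """
    cutoff = now - window
    counts = defaultdict(int)
    for policy, when, message in failures:
        if when < cutoff:
            continue  # outside the time period of interest
        if any(marker in message.lower() for marker in BENIGN_MARKERS):
            continue  # skip the known "silly" failures
        counts[policy] += 1
    return {policy: n for policy, n in counts.items() if n >= threshold}
```

Counting per policy, rather than per mailbox folder, is what avoids the "sub folder hits a certain number of emails" problem: one noisy policy cannot mask or trigger an alert for another.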

aamjohns
Contributor II

Hi mm2270,
I guess you got in before I removed my original post.

I think you understand what I mean. I agree with filtering, thresholds, customization of what is monitored and when to notify. I also envision something like a webpage that one can go and see a summary of data that may be of interest, but not emergency level. One might see an unusual pattern in the summary and want to look into what is causing it.

My two initial thoughts are the API and the MySQL. I'm leaning to the MySQL where I could build some views, and query for criteria that I want to be notified about.
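As a rough sketch of the "build some views and query for criteria" idea: the example below uses Python's built-in `sqlite3` as a stand-in for MySQL, and the `policy_log` table, its columns, and the `policy_failure_summary` view are invented for illustration. They are not the JSS database schema, which would need to be inspected before writing real queries.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a made-up log table and a summary view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical table; the real JSS schema is different.
        CREATE TABLE policy_log (policy_name TEXT, computer TEXT, status TEXT);

        -- One row per policy: total runs and how many of them failed.
        CREATE VIEW policy_failure_summary AS
            SELECT policy_name,
                   COUNT(*) AS runs,
                   SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS failures
            FROM policy_log
            GROUP BY policy_name;
        """
    )
    return conn


def policies_to_review(conn, max_failure_rate):
    """Return (policy_name, runs, failures) rows whose failure rate exceeds the limit."""
    return conn.execute(
        "SELECT policy_name, runs, failures FROM policy_failure_summary "
        "WHERE CAST(failures AS REAL) / runs > ? ORDER BY policy_name",
        (max_failure_rate,),
    ).fetchall()
```

A scheduled job could call `policies_to_review` and send a notification only when the result is non-empty, while the same view could feed a summary webpage.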

That is where I am right now. There have been some things over the last couple of years where this would have helped me get a fix in place quicker, or catch accidental mistakes and keep them from propagating or wasting CPU cycles.

In addition, there are things that can happen that we have never experienced before and something watching for oddities and such would be nice. Nice to know it is there.

I think something somewhat analogous would be a system used here that monitors event log data, network traffic, and these types of things to catch owned machines or accounts, or phishing emails. This is not our department, but it's the same idea: looking for patterns and saying 'hey you might want to take a closer look at me because I might be bad'.

ttyl mm2270.

aamjohns
Contributor II

Oh, and people do make mistakes. How long until you figure that out just depends.

bpavlov
Honored Contributor

@mm2270 I do believe the test VMs would have caught it in this case. If he forgot to create a criterion requiring a specific OS, he would have picked up that it installed on an OS that it shouldn't have (or at least the logs would have displayed some failure if the policy continued to fail because the package couldn't run).

@aamjohns I hope my response didn't cause you to delete your post. I think what you ran into is a common problem (I'm sure I'm not immune to it). I was only making a recommendation based on the situation you described. The idea of setting a failure threshold does seem interesting though. If a policy fails X times, have it automatically disable itself. However, as it stands a policy can fail for rather silly reasons (fails to mount a DP on the first attempt but successfully mounts the failover DP and installs the package successfully). So I think this would work great so long as JAMF resolved how a policy status is determined, because right now it can be rather unhelpful.

To take your idea a step further, granular failure thresholds:
Specify whether to disable a policy based on how many failed actions occur. Action being the step you decided to take within the policy. That's to say if Maintenance-Inventory Update fails X times then disable the policy. If a script fails to run, then disable the policy. If a pkg or dmg fails to install, then disable the policy. Hopefully you get the idea.
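A minimal sketch of the per-action thresholds described above, assuming failures are reported as a simple list of action names. The action labels, the `limits` mapping, and the function name are hypothetical; this is not an existing Casper feature.

```python
from collections import Counter


def actions_over_limit(failed_actions, limits):
    """Return the sorted list of actions whose failure count reached its limit.

    failed_actions: list of action names, one entry per observed failure
                    (assumed input shape).
    limits:         {action_name: failures allowed before disabling}.
    A non-empty result means the policy would be disabled.
    """
    counts = Counter(failed_actions)
    return sorted(action for action, limit in limits.items() if counts[action] >= limit)
```

For example, with limits of 2 for `"package"` and 3 for `"script"`, two package failures and one script failure would flag only `"package"`.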

mm2270
Legendary Contributor III

@aamjohns Agreed, we are human and make mistakes. Sometimes those mistakes are of the "geez, I'm an idiot that didn't have enough coffee before I did that" variety, and sometimes they are of the "I didn't know it actually worked that way because it's not well documented" type. Meaning, I've made errors with Smart Groups that are the result of the fact that sometimes getting complex Smart Groups just right is hard and not always intuitive. One item set wrong can throw the whole thing off in some cases.

BTW, @bpavlov was referring to my Feature Request here I think. Since version 9 now lets us see the scope more easily for policies than in the past, this feature request has morphed a bit. Basically, it would be great to see what will happen when a policy is enabled before actually enabling it. Again, test servers and test systems can be great, but they can't possibly catch all issues that will occur in the production environment.

aamjohns
Contributor II

Hi bpavlov,
I deleted my post because I felt pretty sure that it was not going to be interpreted as I intended, due to my wording. And I didn't want to discuss best practices for ensuring successful policy deployment. I just used that as an example. Don't feel bad. I just think I worded my question poorly.

And to let you know, where I work I don't have much time to do testing like you describe. Casper admin is just one of the things I do and I try to do it smart, but at the same time, I don't have time to test a policy against VMs of different OS versions, nor am I provided the resources here to do that. I wish I could just sit and focus on Casper, but I have to make do the best I can in the shortest time possible.

I'm just glad we even have Casper. A couple of years ago we had no management of our Macs at all. Casper has been a wonderful addition.

On to your discussion of granularity in monitoring, I agree. I also agree that if you come in one morning, open your email, and see that something is failing that shouldn't be, that is better than finding out later from a trouble ticket or from just happening to check the JSS. I'd rather know at that point in time.

I also am not really saying this needs to be some massive project. Just have the ability to define some criteria and rules, and let it go. I think it would help me. Maybe it helps to know that I am a lone admin, I have no backup, and I admin many other things (seriously). A second pair of eyes on the JSS would help me know when to go and pay attention to something I might not have expected.

aamjohns
Contributor II

@mm2270 ,
Yes, catching the unexpected. Mistake or not (I know I make them). Or like you said, not realizing that what you set actually targets something else. You understand what I mean. And aside from the human factor, there are other factors like OS or app updates installing on Macs, software patches that did something you didn't expect, etc.

I have to finish up some stuff before my ride gets here but I'll post back if I get any ideas worth sharing in case you two are interested. Or if you do, let me know.

Have a good evening.

Aaron.

bpavlov
Honored Contributor

@mm2270 That's the one. Funny you should comment on this thread and have created that feature request. I think @aamjohns's idea really enhances the feature request made there. Hopefully he feels inclined to submit a feature request because I don't think there's a way to do what he currently asked for and I think everyone could definitely benefit from something like that.

But like I mentioned depending on how granular they get they would need to really fix this:
https://jamfnation.jamfsoftware.com/featureRequest.html?id=1892

In fact it reminds me of two anecdotes:
1. I recall a policy that I made where the package was installing fine but every policy was failing because the distribution point was not mounting properly. I found out through the policy logs since I was monitoring the deployment and got it sorted out rather quickly, but what a pain that was to see.
2. A former coworker of mine used to manage SCCM, and every time he deployed software he would stick around or take his laptop home to monitor the progress of the deployment so that he could see if anything went wrong. I never understood it until I stepped into a similar role.

aamjohns
Contributor II

Agreed. Good examples.

mm2270
Legendary Contributor III

@bpavlov Oh, I agree that there are quite a few silly "errors" coming from policies, like the failover distro mounting. I even mentioned that in my first post above. JAMF really does need to address these issues because as it is, these lead too many of us on wild goose chases, or worse, make us ignore error messages coming from the JSS (classic Boy Who Cried Wolf scenario). It's always when you assume they are not anything needing your attention that it's something you should pay attention to.

In fact, this has been one of my biggest issues with how policies in general seem to work. The jamf binary, or whatever process runs policies, seems blind to what has come before and what comes after. Scripts that run in a "Before" mode have no way of telling the policy that if they don't exit with a 0 status, 'stop right there because there's no point in going any further'. Scripts running in "After" mode may not work if the package itself failed for any reason, so there's no point in running the script. The jamf binary will happily continue to try to install packages, run additional scripts or Run Commands, and do anything else the policy states it should do, even if it's pointless to try. I really would like to see some additional smarts make their way into how it all works. Right now it's just more or less a series of disparate items strung together that hopefully all work or don't have prerequisites attached to them.
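The "stop right there" behavior wished for above might look something like this sketch. It is not how the jamf binary works; `run_policy`, the step names, and the convention that each step is a callable returning an exit status are all assumptions made for illustration.

```python
def run_policy(steps):
    """Run steps in order and stop at the first one that fails.

    steps: list of (name, callable) pairs, where each callable returns an
           integer exit status and 0 means success (an assumed convention).
    Returns the list of (name, status) pairs for the steps that actually ran.
    """
    results = []
    for name, step in steps:
        status = step()
        results.append((name, status))
        if status != 0:
            break  # later steps depend on this one, so do not attempt them
    return results
```

With a failing "Before" script as the first step, the package install and "After" script would never be attempted, and the returned list would show exactly where the run stopped.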

bpavlov
Honored Contributor

@mm2270 And guess what there's a feature request for that too that was made not too long ago.

https://jamfnation.jamfsoftware.com/featureRequest.html?id=3453

I know JAMF is focusing on some really big new features for sometime this year, but I do hope they get to focus on refining/extending some of the other features that already exist.