Connection error while connecting to... - Casper Imaging

Kumarasinghe
Valued Contributor

We are experiencing an issue with Casper Imaging 8.xx. Currently we're on 8.73, but the issue was there even on 8.62.
At the end of Casper Imaging we're getting this error:

"Connection error while connecting to
https://our_jss_url:8443/:42
Check your network connection and try again"

[Screenshot of the error dialog]

It seems the connection is timing out if we have a large number of packages assigned to the Imaging Config.

The only way to get rid of this issue is to remove a few packages from the imaging config, but that is not a proper workaround.
Did anyone experience the same issue and have a solution or a proper workaround?

Thanks

20 REPLIES

Kumarasinghe
Valued Contributor

I know @tkimpton had this issue before.
From our testing we found that it is not a specific package or script causing it, but the total number of packages in the Imaging config.

tkimpton
Valued Contributor II

I didn't get to the bottom of it, I'm afraid, and support couldn't replicate it. For some time I had a launch daemon looking at the jamf log on the NetBoot and a script that rebooted the machine if it found the error.

In the end I only resolved it by completely stripping down every config and starting again to keep things as minimal as possible.
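
As a rough illustration of the log check described above (a sketch only, not the actual script): the marker text is taken from the dialog quoted at the top of the thread, and it is an assumption that the jamf log uses the same wording. The sample log lines are made up, and the reboot step is deliberately left out.

```python
# Marker copied from the error dialog quoted in this thread; whether the
# jamf log contains exactly this wording is an assumption.
ERROR_MARKER = "Connection error while connecting to"


def log_has_connection_error(log_text: str) -> bool:
    """Return True if any line of the log text contains the error marker."""
    return any(ERROR_MARKER in line for line in log_text.splitlines())


# Made-up sample log content, for illustration only.
sample_log = (
    "Installing packages...\n"
    "Connection error while connecting to https://our_jss_url:8443/:42\n"
)

print(log_has_connection_error(sample_log))
```

A launch daemon could run a check like this on an interval and trigger whatever recovery step is wanted when it returns True.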

daz_wallace
Contributor III

I'm also hitting this exact same issue with 9.31.

In your experience was it something that always happened with an affected workflow or intermittently?

I mean I can kick off 16 Macs on the same workflow, with 4 working and 12 showing the above error.

Also did you happen to figure out the exact number that is having issues?

Thanks

Darren

were_wulff
Valued Contributor II

Hey all,

None of you would happen to be using this script or a variant of it in any of your configurations, by any chance, would you?

I ask because we've found that it seems just this one particular script, when used as part of imaging, creates the error above, and it's a false error; the script actually does run, the device name gets populated where it's supposed to be, and everything works just fine.

Additionally, the script runs without error if run via policy or locally.

We have a defect open specifically for that error when paired with that script (D-006884) in 9.3x, but the script itself has been around and available since 2012, so it wouldn't surprise me if it causes a problem in 8 as well.

When we saw this initially, we thought it was due to a large number of scripts (that's what the particular client environment had: they had 62 scripts and found that at 59 scripts it worked, and at 60 it did not). After a lot of process of elimination, we narrowed it down to one particular script causing the issue, removed that from the imaging process, and set it to run as a policy at the Enrollment Complete trigger instead.

@Kumarasinghe I can certainly do some testing here and see if I can get it to fail at a certain number of packages consistently, though if we do find it's an issue that's confined to the 8 series, the solution is likely going to be 'upgrade to 9' as updates to 8 are likely to only be to add ability to manage, inventory, and add new OS/iOS versions from Apple.

I apologize if I've missed it, but did you find that there seemed to be a specific number of packages at which it stopped working?

Thanks!
Amanda Wulff
JAMF Software Support

daz_wallace
Contributor III

Hi Amanda,

I'm sorry to say we're not using that script in any of our configurations.

Currently, the theory seems to be that 125 items (using Casper Admin to check) is fine, whilst 130+ causes the error. We're testing this now and hope to update the post with the results over the next few days.

The strange thing is it only drops out on 90% of the Macs imaged and seems to be slightly worse if we are imaging more than a few at a time.

Just in case it's any help we're using a Windows 2012 VM for the JSS and the Distribution Point. We're using the latest NetSUS for netboot, however, I've also tested this using a local Mac Distribution Point and netboot server with the same outcome.

Darren

were_wulff
Valued Contributor II

@daz_dar

And now, I go about building 131 packages...

This may take a little while! :)

Have you noticed any sort of pattern in which package appears to cause the error or does it seem to be solely based on the number of packages in your environment?

Amanda Wulff
JAMF Software Support

Kumarasinghe
Valued Contributor

@amanda.wulff
No we don't have that script in our imaging config.
Most of our packages and scripts are to be run at reboot (to the boot volume at imaging time).

We have: Packages: 90, Scripts: 12

If we remove a package to bring the total number of packages down to 89, it works fine.

Thanks

daz_wallace
Contributor III

Hi @amanda.wulff

No specific package / script I'm afraid. After making sure that all of our configs (including the smart ones) are 125 or fewer items, all of our imaging (for the last 24 hours) has gone without a hitch, even at full capacity.

It's strange as prior to the main rollout we've spent the last few months imaging 2-3 Macs at a time (whilst we build, package and test the varying configurations) and we didn't come across the issue. I can only guess it's a combination of the number of items in the config we're using and the number of Macs / load the JSS is under.

Any chance you've managed to replicate it yet?

Darren

were_wulff
Valued Contributor II

@Kumarasinghe @daz_dar

I haven’t managed to replicate it yet, but I wonder, since it doesn’t seem to be a specific package/script that’s causing it if it might be a matter of needing to tweak some Tomcat & MySQL settings.

That may be best done with your TAM, since they’d know your environment a bit better, particularly if you’re not comfortable making the changes below on your own. For reference, these are the settings I tend to use for up to about 2000 devices (to pick an arbitrary mid-size number).

MySQL:

We’ll need to edit the my.cnf/my.ini file for this.

In Windows, we’ll typically find my.ini in ProgramData\MySQL\MySQL Server 5.x. It may also be in Program Files\MySQL or Program Files\MySQL\MySQL Server 5.x somewhere, but it’s usually in the ProgramData location.
It can be edited in notepad, but I’ve seen some Windows installations complain if MySQL is still running when we try to make changes.

On Mac & Linux it’s in /etc/my.cnf.

If we don’t have that file in /etc, we can cp it over from /usr/local/mysql/support-files/. The file, depending on your version of MySQL, will be called either my-huge.cnf or my-default.cnf; copying either over to /etc will work.
Once it’s copied, we’ll edit the file with elevated privileges (so, with a sudo).

- max_allowed_packet set to at least 512M.
- max_connections set to 601.
- Strict mode OFF if 5.6. Look for the line that starts with sql_mode and comment it out with #.

Save, exit, and restart the MySQL service.
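
The three MySQL settings above can be sanity-checked before restarting the service. This is a minimal sketch, not a Jamf tool: the function names are hypothetical, it reads the file contents as plain text, it does not handle inline comments, and treating any max_connections value of 601 or more as acceptable is an assumption. The sql_mode check only matters for MySQL 5.6, as noted above.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a MySQL-style size such as '512M' to a number of bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def read_settings(cnf_text: str) -> dict:
    """Collect key=value pairs, skipping blanks, comments, and [section] headers."""
    settings = {}
    for raw in cnf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # MySQL accepts dashes and underscores interchangeably in option names.
        settings[key.strip().replace("-", "_")] = value.strip()
    return settings


def check_mysql_settings(cnf_text: str) -> list:
    """Return a list of problems; an empty list means all three checks pass."""
    s = read_settings(cnf_text)
    problems = []
    if parse_size(s.get("max_allowed_packet", "0")) < parse_size("512M"):
        problems.append("max_allowed_packet is below 512M")
    if int(s.get("max_connections", "0")) < 601:
        problems.append("max_connections is below 601")
    if "sql_mode" in s:
        problems.append("sql_mode line is not commented out")
    return problems


# Illustrative file contents only, not taken from a real server.
before = """[mysqld]
max_allowed_packet = 16M
max_connections = 151
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
"""
after = """[mysqld]
max_allowed_packet = 512M
max_connections = 601
# sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
"""

print(check_mysql_settings(before))
print(check_mysql_settings(after))
```

The first sample reports all three problems and the second reports none.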

Tomcat:

For Windows servers, it may be necessary to stop the Tomcat service first. It’s a good idea to stop it on any platform, but Windows is most likely to not let you edit at all if it’s still running.

On Mac and Linux, it’s easiest to edit with sudo nano/sudo vi /path/to/the/file, but GUI text editors work as well.

server.xml (/path/to/Tomcat/conf):

- maxThreads set to 1502.

If we’re suspecting it may be a legitimate timeout issue, we can also change connectionTimeout to something a bit higher than the default.

If we’re suspecting it may be an issue with the amount of data being sent, we can also try changing maxPostSize to something higher than the default as well.

DataBase.xml (/path/to/Tomcat/webapps/ROOT/WEB-INF/xml):

- MaxPoolSize (it may read slightly different depending on your version, but ‘max pool size’ will be in there somewhere) to 150 from 90.

- MaxConnectionAgeInMinutes (may be slightly different depending on your version, but it will reference maximum connection age) to 8 or 10 minutes from 5, if we're suspecting it may be a timeout-due-to-volume issue.

Save both files, exit, and restart the Tomcat service.

The general rule for MySQL connections vs. Tomcat threads is that we want the Tomcat thread number to be roughly 2.5 times the number of MySQL connections.
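
As a worked example of that ratio, here is a small sketch (the function name is hypothetical, and rounding down is an assumption chosen because it matches the 601 and 1502 figures given above):

```python
import math

THREADS_PER_CONNECTION = 2.5  # ratio quoted in the post


def tomcat_max_threads(mysql_max_connections: int) -> int:
    """Apply the roughly-2.5x rule, rounding down to a whole thread count."""
    return math.floor(mysql_max_connections * THREADS_PER_CONNECTION)


print(tomcat_max_threads(601))  # 601 * 2.5 = 1502.5, rounded down to 1502
```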

Checking the amount of available memory for Tomcat never hurts either. The bottom-of-the-barrel bare minimum we recommend in support, for environments UNDER 1000 devices, is 2GB available for Tomcat to use.

Ideally, 4GB would be the bare minimum, but I like to aim a bit high on settings as 18 or so years of working with computers in various capacities has taught me that ‘minimum requirements’ tends to mean, “Well, it’ll technically run, but you’ll probably want to tear your hair out, so…”

For environments over 1000 devices, we typically go with a minimum of 4GB allocated to Tomcat, with 8 or more being preferable.

General rule of thumb: Let Tomcat’s maximum allowed RAM be roughly half of the server’s available RAM if the server is primarily or completely dedicated to only running the JSS.
The JSSDatabaseUtil’s slider only goes to 8GB, and for most environments that’s plenty, but it can also be edited manually if necessary.

In Windows there’s a file called tomcat7w.exe (on the Java tab) where we can change it, and for other OSes, we have this KB.

Minimum memory can stay at the 256 default, as we don’t want Tomcat eating up a full 2GB of RAM even when it’s sitting mostly idle.
We don’t typically need to change the PermGen sizes either.

Tomcat will need to be restarted again if we make changes to its memory allocation.
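
The memory guidance above can be put into numbers with a short sketch. The function names are hypothetical, and treating exactly 1000 devices as the larger tier is an assumption, since the post only speaks of under and over 1000.

```python
def minimum_tomcat_heap_gb(device_count: int) -> int:
    """Support's stated floor: 2 GB under 1000 devices, otherwise 4 GB."""
    return 2 if device_count < 1000 else 4


def half_of_server_ram_gb(server_ram_gb: float) -> float:
    """Rule of thumb for a server dedicated to the JSS: about half its RAM."""
    return server_ram_gb / 2


def meets_minimum(server_ram_gb: float, device_count: int) -> bool:
    """Check whether the half-of-RAM figure reaches the stated floor."""
    return half_of_server_ram_gb(server_ram_gb) >= minimum_tomcat_heap_gb(device_count)


# A dedicated 16 GB server with 2000 devices: 8 GB for Tomcat, above the 4 GB floor.
print(half_of_server_ram_gb(16), meets_minimum(16, 2000))
# A 4 GB server with 1500 devices: 2 GB for Tomcat, below the 4 GB floor.
print(half_of_server_ram_gb(4), meets_minimum(4, 1500))
```

When the check comes back False, the choices described above apply: add memory to the server or accept a lower allocation.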

Let me know if any of that makes a difference; my day today is pretty booked up with meetings so I may not get much spare time to pop back to JN, but I'll try to keep on top of it.

Thanks!

Amanda Wulff
JAMF Software Support

Kumarasinghe
Valued Contributor

@amanda.wulff
Thanks.
How would these changes be applied if we have a clustered setup? Divide those settings across all web apps?
Is there anything special that needs to be done for the MASTER WebApp?

Thanks

were_wulff
Valued Contributor II

@Kumarasinghe

In clustered situations, we’d need to make the Tomcat changes to each webapp.
We’d want to follow the same procedure we do for upgrades: Change settings on the Master first, get it restarted, then go through and make the changes to the child clusters.

If some of the non-Master web apps have less RAM than the Master, we’d want to either get some more memory for those machines/VMs, or adjust the amount we’re allocating to Tomcat accordingly.

For adjusting memory in Tomcat, we have this KB: https://jamfnation.jamfsoftware.com/article.html?id=139

Unless we’re doing MySQL replication or have a JSS to JSS Plugin setup going, there should only be one instance of MySQL and nothing would need to be done to the other clusters in that regard.

If we DO have more than one MySQL running for the JSS, we’d want to make those changes for each of those as well.

If you’ve got additional questions on that, it may be best to contact your Technical Account Manager, as they’ll likely know your environment better than I do and may be able to get on a call or a WebEx if you’d prefer that instead. They can also get additional information/screenshots/logs that we probably wouldn’t want posted here due to security & privacy concerns.

Thanks!
Amanda Wulff
JAMF Software Support

daz_wallace
Contributor III

Hi Amanda

We'll be looking to make these changes shortly; however, due to deadlines I may not be able to test whether they resolve the original issue for our site, I'm afraid!

Darren

daz_wallace
Contributor III

Just an update.

I made the changes as suggested above and we are still having the same issue with configurations over 125 items.

Darren

were_wulff
Valued Contributor II

@daz_dar

I was able to inconsistently reproduce this (sometimes it worked, sometimes it didn't) in my test environment, so I went ahead and opened up a defect.

For reference, the defect ID is D-007212.

I did notice something odd in the error I got, as well as in @Kumarasinghe's screenshot: That /:42 at the end of the JSS URL. I've seen connection errors in imaging before, but not with that particular URL ending; however, I'm not sure what the significance of it (if any) is. It is mentioned in the defect itself, and I'm guessing someone in development will have a better idea on that than I do.

There's probably a Douglas Adams joke in there somewhere.

Do you happen to recall, on your errors, if you're seeing that same /:42 at the end of your JSS URL?

Thanks!
Amanda Wulff
JAMF Software Support

unknown_err
New Contributor III

We are also getting this error 99% of the time when imaging computers. We have 8 DMGs that install, then a .ppd, then the computer is named, then 14 packages are installed after the FirstRun script is prepared. It always happens right before it needs to reboot and doesn't seem to actually affect anything other than making the computer wait until you click "ok" before it will reboot. It has the same /:42 at the end of our JSS URL. We've been getting it on and off since version 8.x with multiple configurations.

were_wulff
Valued Contributor II

@mwhitaker

Outside of the known issue we currently have with more than 125 items in a configuration, generally when the error pops up it’s due to something in one of the configurations being the root cause of the problem.

Most times it tends to be a script, but it can also be a particular package or file we’re trying to use as well.

I’d recommend opening up a case with your Technical Account Manager so they can assist, but to get started, we typically start troubleshooting this by attempting to image with JUST a base image.

No packages, no printer drivers, no scripts, etc…just a plain, vanilla base OS.

If the error doesn’t occur, then we add one package/file/script and repeat, and keep going down the list until we hit an error.

At that point, we’ll usually try imaging with the base OS and the package/script/file that seemed to have caused the error in past testing.

If the error still occurs, we’ve found the problem, and usually rebuilding the package in question takes care of it. If it’s a script, then it’s time to take a look at the script and see if there is anything that might need to be changed there.

If the error doesn’t occur, then we start adding things back in to see if it appears to be a certain number/combination that’s doing it.

We’d also want to make note as to whether it happens if we’re imaging groups of computers only (and if so, how many) or if it doesn’t seem to matter if it’s groups of computers vs. a single computer.

The type of distribution point and method of imaging we’re using can also be relevant, so that’s something you’d want to mention to your Technical Account Manager as well.
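
The elimination process above can be sketched as a small routine. This is only an illustration: `imaging_succeeds` is a hypothetical callback standing in for an actual imaging run, the item names are made up, and the two simulated environments exist purely to show the two outcomes described (a single bad item versus a number/combination problem).

```python
from typing import Callable, List, Optional

# Hypothetical callback: takes the items layered on top of the base OS and
# returns True when imaging finishes without the connection error.
ImagingTest = Callable[[List[str]], bool]


def find_first_failing_item(items: List[str], imaging_succeeds: ImagingTest) -> Optional[str]:
    """Start from the base image only, add one item per run, and return the
    item whose addition first produces the error (None if none does)."""
    config: List[str] = []
    if not imaging_succeeds(config):
        # Not covered in the post: the bare base image itself fails.
        raise ValueError("base image alone reproduces the error")
    for item in items:
        config.append(item)
        if not imaging_succeeds(config):
            return item
    return None


def fails_with_base_only(item: str, imaging_succeeds: ImagingTest) -> bool:
    """Re-test the suspect item on top of the base OS alone."""
    return not imaging_succeeds([item])


# Simulated environments standing in for real imaging runs.
items = ["office.pkg", "printer.ppd", "naming.sh"]  # made-up names


def bad_item_env(config: List[str]) -> bool:
    return "printer.ppd" not in config  # one specific item is the culprit


def count_limit_env(config: List[str]) -> bool:
    return len(config) <= 2  # error depends only on how many items there are


suspect = find_first_failing_item(items, bad_item_env)
print(suspect, fails_with_base_only(suspect, bad_item_env))

suspect = find_first_failing_item(items, count_limit_env)
print(suspect, fails_with_base_only(suspect, count_limit_env))
```

In the first simulation the suspect also fails on its own, which points at the item itself. In the second it passes on its own, which points at the number or combination of items instead.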

Thanks!

Amanda Wulff
JAMF Software Support

unknown_err
New Contributor III

@amanda.wulff

Thanks Amanda. I was able to test each item in the configuration and it is the .ppd script that is causing it. Even if that is the only thing selected to install in the image, it gives the error message, but without it everything is fine. I'll see if our team member who built it can redo it and see if that helps.

Kumarasinghe
Valued Contributor

@amanda.wulff
Any updates on this issue?

Thanks

were_wulff
Valued Contributor II

@Kumarasinghe

This issue will not be remedied for the 8.x series, as we are no longer releasing any bug fixes for the 8 series; the solution for open 8 series defects is to move to the 9 series.

The only way we will be able to get around the issue in 8 will be to reduce the number of items in an imaging configuration to 125 items or fewer.

If we’re needing to remain on the 8 series, we will need to continue to use the workaround of configurations that have a maximum of 125 items as there will not be an additional fix or patch for the 8 series to address the issue.

In the 9 series, we do still have D-007212 open, but that only affects environments with configurations that have more than 125 items. At the moment, the workaround is the same as it is for the 8 series: Reduce the number of items in the configuration to 125 or fewer.

In all other cases, when we’re on the 9.x series and do not have configurations with more than 125 items, the culprit has turned out to be a particular script or package that needed to be either rebuilt or modified.

Thanks!

Amanda Wulff
JAMF Software Support

Kumarasinghe
Valued Contributor

@amanda.wulff

Sadly, for v8.73 the limit is 89 packages. It has to be 89 packages or fewer.
Just for your bug report. This limit was mentioned earlier as well.

Thanks