SCOM troubleshoot cross platform agent discovery and installation – Part 3

This is a continuation of part 1 and part 2 of this blog post.
Alright. I have my own test environment in another part of the customer's network, with a few Red Hat machines in it and everything working. So let's try the same kind of command there. Guess what? Error! But a different error!
Apparently the winrm command cannot handle an ampersand (&) in the password. We were getting impatient, so we changed the password of that account on the Linux box, and in the winrm command of course, and tried again. Bingo. The output looks a bit like this (it is longer; I left out a lot of additional lines with info and changed the server name):


<wsman:Results xmlns:wsman="http://schemas.dmtf.org/wbem/wsman/1/wsman/results">
<p:SCX_OperatingSystem xmlns:p="http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <p:CSCreationClassName>SCX_ComputerSystem</p:CSCreationClassName>
    <p:CSName>server.domain.local</p:CSName>
    <p:Caption>Red Hat Enterprise Linux AS release 4 (Nahant Update 8)</p:Caption>
    <p:CreationClassName>SCX_OperatingSystem</p:CreationClassName>
    <p:CurrentTimeZone>120</p:CurrentTimeZone>
    <p:Description>Red Hat Enterprise Linux AS release 4 (Nahant Update 8)</p:Description>
    <p:Distributed xsi:nil="true"></p:Distributed>
    <p:ElementName xsi:nil="true"></p:ElementName>
    <p:EnabledDefault>2</p:EnabledDefault>
</p:SCX_OperatingSystem>
</wsman:Results>
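
If you save output like this to a file, a few lines of Python can pull out the fields that matter for troubleshooting. This is only a minimal sketch: the function name summarize_scx_os and the trimmed SAMPLE text are my own illustration, and it assumes you have kept just the XML part of the winrm output.

```python
import xml.etree.ElementTree as ET

# Namespace of the SCX_OperatingSystem class, copied from the output above.
SCX_OS_NS = "http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem"


def summarize_scx_os(xml_text):
    """Return CSName and Caption for every SCX_OperatingSystem instance found."""
    root = ET.fromstring(xml_text)
    ns = {"p": SCX_OS_NS}
    summaries = []
    for instance in root.iter("{%s}SCX_OperatingSystem" % SCX_OS_NS):
        summaries.append({
            "CSName": instance.findtext("p:CSName", default="", namespaces=ns),
            "Caption": instance.findtext("p:Caption", default="", namespaces=ns),
        })
    return summaries


# Trimmed copy of the sample output, for illustration only.
SAMPLE = """<wsman:Results xmlns:wsman="http://schemas.dmtf.org/wbem/wsman/1/wsman/results">
<p:SCX_OperatingSystem xmlns:p="http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <p:CSName>server.domain.local</p:CSName>
    <p:Caption>Red Hat Enterprise Linux AS release 4 (Nahant Update 8)</p:Caption>
    <p:Distributed xsi:nil="true"></p:Distributed>
</p:SCX_OperatingSystem>
</wsman:Results>"""

if __name__ == "__main__":
    for item in summarize_scx_os(SAMPLE):
        print(item["CSName"], "-", item["Caption"])
```

The CSName value is the interesting one here: it shows which name the agent reports for the machine, which matters for the certificate check later in this post.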


Back to the other test environment again. Time to ask the man who manages the ISA firewall to check whether he sees traffic passing through. Yes, he does see some traffic on one of the firewalls, where it does not belong. Remember, this is a network with several possible exits (gateways, firewalls). He told us the traffic was not going to the default gateway, but towards the wrong ISA server.
Alright, add the IP addresses and FQDNs of the remote Red Hat machines to the proxy configuration in Internet Explorer and re-run the NetSH command.
Now the command connected, and we found the traffic in tcpdump as well.
Networking, networking, networking. Told you so!
In the meantime we had already manually removed the installed agent from the Red Hat box and removed the signed certificate. We were actually installing on two machines at this point: an RH4 and an RH5 box. On the RH5 box we had just made the changes to the scoma account to enable it to su and sudo. At this point there was a discussion about whether the Red Hat box needed a full DNS name. Of course it does, as the discovery wizard insists on using it and checks the certificates on that basis. But to prove it we just went ahead and tried to push to both machines.
Now the discovery wizard run did progress. Of course the agent was no longer installed, so this time it was right that the wizard wanted to continue by installing the agent. We installed the agent and validated. Yes, the next step said it wanted to sign the certificate for the RH4 server and did not want to talk about the RH5 server, as that one was not using an FQDN in its self-signed certificate.
If you want to check this, you can do the following. Copy the self-signed .pem certificate file from the cross platform machine to the management server and rename it to a .cer file; you can then open it like any other certificate and check it. In this case it was obvious that the RH5 machine did not use an FQDN for itself, but a short name. By the way, you can use the same trick after the cross signing between the servers to see what this double signing is about.
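
Once you have read the name out of the certificate, the short name versus FQDN check itself is trivial. A rough sketch, assuming you paste in the name by hand; the helper looks_like_fqdn is hypothetical and only looks at the shape of the name, it does not verify anything against DNS.

```python
def looks_like_fqdn(name):
    """Return True when the name has at least two non-empty dot-separated labels."""
    # Ignore surrounding whitespace and a trailing root dot ("host.domain.local.").
    labels = name.strip().rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


if __name__ == "__main__":
    # Example names only; use the name shown in your own certificate.
    for candidate in ("server.domain.local", "server"):
        verdict = "FQDN" if looks_like_fqdn(candidate) else "short name"
        print(candidate, "->", verdict)
```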
So we needed to give the RH5 box a DNS name. We went to “/etc/sysconfig/network-scripts/ifcfg-eth0”, entered a domain name there, and restarted the server.
We also deleted the rpm again along with the remaining files (including the wrong certificate file) and restarted the discovery wizard. This worked and we had the server in SCOM! So this confirmed that an FQDN is needed.
Conclusion:

  • Networking is very important. Routing, firewalls, perhaps proxy settings or the lack thereof: whatever it takes, as long as it works. We have to be able to connect first.
  • DNS is important. Make sure name resolution works and that the cross platform machine also has an FQDN. Otherwise either the machines cannot find each other or the certificate process will break.
  • Use the tools if something is going wrong: DebugView, enableopsmgrmodulelogging.
  • Use PuTTY, for instance, to connect and check the logon and elevation process.
  • Pre-requisite software on cross platform machines. Make sure you have it covered.
  • Use the latest update (especially the latest cumulative update) for the cross platform components and the latest management packs, and make sure they are consistent across all management servers where the components are used.
  • The winrm command can also help you troubleshoot, although it can create new issues of its own, such as broken command formatting after copy/paste and characters in the password that it does not like.
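
To avoid the password surprise from the winrm test, you could check a test account's password for awkward characters before pasting it into a command line. This is a sketch under assumptions: only the ampersand was actually confirmed as a problem in this post, the other characters are ones cmd.exe generally treats specially, and the name risky_password_chars is made up for this example.

```python
# Characters cmd.exe treats specially on a command line.
# Only "&" was confirmed as a problem in this post; the rest are an assumption.
CMD_SPECIAL_CHARS = set('&|<>^%"')


def risky_password_chars(password):
    """Return the sorted distinct characters that may need quoting or escaping."""
    return sorted(set(password) & CMD_SPECIAL_CHARS)


if __name__ == "__main__":
    # Made-up example password, not a real credential.
    found = risky_password_chars("P&ss^word")
    if found:
        print("Characters that may break the command line:", " ".join(found))
    else:
        print("No known problem characters found.")
```

An empty result does not guarantee the command will work; it only rules out the characters listed in CMD_SPECIAL_CHARS.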

So, I think we have touched every part of the diagram so graciously provided by Robert Hearn in some way. Please make sure to check out Robert's troubleshooting series as well (already linked a few times in here).
Happy cross platform monitoring!
Bob Cornelissen