We had a chance the other week to dig into the memory usage of the SCOM cross-platform agents on Red Hat Linux, and I wanted to share this info with you as I could not find much about it. The takeaway, by the way, is that the agent is not using much memory.
The case:
After deploying the X-plat agent to a number of Red Hat 4 and 5 servers during a proof of concept, the Linux admins discovered that the agent appeared to be consuming a lot of memory: between 200 and 500 MB of RAM.
Trying to find out what was going on:
We ran the “top” command to see which processes were using memory and how much they were using. This showed a few processes from the SCOM agent picking up around 150 MB each. Here is part of the output of that command:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5777 root 16 0 128m 9244 6004 S 0.0 0.1 0:02.30 scxcimprovagt
5709 root 17 0 114m 8436 5388 S 0.0 0.1 0:03.57 scxcimserver
5737 scoma 17 0 34548 5688 5076 S 0.0 0.1 0:00.09 scxcimprovagt
So column number 5 (VIRT) says that these three processes take 128+114+34 MB, right?
So let’s take a “pmap” of the first process (this is a long list!):
# pmap 5777
5777: /opt/microsoft/scx/bin/scxcimprovagt 0 15 18 root SCXCoreProviderModule
00111000 32K r-x-- /lib/libcrypt-2.3.4.so
00119000 4K r-x-- /lib/libcrypt-2.3.4.so
0011a000 4K rwx-- /lib/libcrypt-2.3.4.so
0011b000 156K rwx-- [ anon ]
00142000 132K r-x-- /lib/tls/libm-2.3.4.so
00163000 4K r-x-- /lib/tls/libm-2.3.4.so
00164000 4K rwx-- /lib/tls/libm-2.3.4.so
00165000 36K r-x-- /lib/libnss_files-2.3.4.so
0016e000 4K r-x-- /lib/libnss_files-2.3.4.so
0016f000 4K rwx-- /lib/libnss_files-2.3.4.so
00174000 156K r-x-- /opt/microsoft/scx/lib/libpegprm.so.1
0019b000 4K rwx-- /opt/microsoft/scx/lib/libpegprm.so.1
0019c000 308K r-x-- /opt/microsoft/scx/lib/libpegrepository.so.1
001e9000 4K rwx-- /opt/microsoft/scx/lib/libpegrepository.so.1
001ea000 196K r-x-- /lib/libssl.so.0.9.7a
0021b000 12K rwx-- /lib/libssl.so.0.9.7a
0021e000 76K r-x-- /usr/lib/libgssapi_krb5.so.2.2
00231000 4K rwx-- /usr/lib/libgssapi_krb5.so.2.2
00232000 60K r-x-- /lib/libresolv-2.3.4.so
00241000 4K r-x-- /lib/libresolv-2.3.4.so
00242000 4K rwx-- /lib/libresolv-2.3.4.so
00243000 8K rwx-- [ anon ]
00245000 56K r-x-- /lib/libaudit.so.0.0.0
00253000 8K rwx-- /lib/libaudit.so.0.0.0
0026e000 156K r-x-- /opt/microsoft/scx/lib/providers/libpegprovider.so.1
00295000 4K rwx-- /opt/microsoft/scx/lib/providers/libpegprovider.so.1
00296000 396K r-x-- /usr/lib/libkrb5.so.3.2
002f9000 8K rwx-- /usr/lib/libkrb5.so.3.2
00367000 32K r-x-- /lib/libpam.so.0.77
0036f000 4K rwx-- /lib/libpam.so.0.77
0039c000 32K r-x-- /lib/tls/librt-2.3.4.so
003a4000 4K r-x-- /lib/tls/librt-2.3.4.so
003a5000 4K rwx-- /lib/tls/librt-2.3.4.so
003a6000 40K rwx-- [ anon ]
003d5000 88K r-x-- /lib/ld-2.3.4.so
003eb000 4K r-x-- /lib/ld-2.3.4.so
003ec000 4K rwx-- /lib/ld-2.3.4.so
003ed000 2100K r-x-- /opt/microsoft/scx/lib/libpegcommon.so.1
005fa000 28K rwx-- /opt/microsoft/scx/lib/libpegcommon.so.1
00601000 8K rwx-- [ anon ]
00603000 560K r-x-- /opt/microsoft/scx/lib/libCMPIProviderManager.so.1
0068f000 12K rwx-- /opt/microsoft/scx/lib/libCMPIProviderManager.so.1
006a7000 768K r-x-- /usr/lib/libstdc++.so.6.0.3
00767000 20K rwx-- /usr/lib/libstdc++.so.6.0.3
0076c000 24K rwx-- [ anon ]
00779000 8K r-x-- /lib/libcom_err.so.2.1
0077b000 4K rwx-- /lib/libcom_err.so.2.1
007d6000 84K r-x-- /opt/microsoft/scx/lib/libpegwql.so.1
007eb000 4K rwx-- /opt/microsoft/scx/lib/libpegwql.so.1
0087d000 40K r-x-- /opt/microsoft/scx/bin/scxcimprovagt
00887000 4K rwx-- /opt/microsoft/scx/bin/scxcimprovagt
00888000 1188K r-x-- /lib/tls/libc-2.3.4.so
009b1000 8K r-x-- /lib/tls/libc-2.3.4.so
009b3000 8K rwx-- /lib/tls/libc-2.3.4.so
009b5000 8K rwx-- [ anon ]
009ef000 60K r-x-- /opt/microsoft/scx/lib/libpegquerycommon.so.1
009fe000 4K rwx-- /opt/microsoft/scx/lib/libpegquerycommon.so.1
009ff000 852K r-x-- /lib/libcrypto.so.0.9.7a
00ad4000 72K rwx-- /lib/libcrypto.so.0.9.7a
00ae6000 12K rwx-- [ anon ]
00b3f000 56K r-x-- /lib/tls/libpthread-2.3.4.so
00b4d000 4K r-x-- /lib/tls/libpthread-2.3.4.so
00b4e000 4K rwx-- /lib/tls/libpthread-2.3.4.so
00b4f000 8K rwx-- [ anon ]
00bca000 172K r-x-- /opt/microsoft/scx/lib/libDefaultProviderManager.so.1
00bf5000 4K rwx-- /opt/microsoft/scx/lib/libDefaultProviderManager.so.1
00bf6000 116K r-x-- /opt/microsoft/scx/lib/libpegpmservice.so.1
00c13000 4K rwx-- /opt/microsoft/scx/lib/libpegpmservice.so.1
00d71000 60K r-x-- /usr/lib/libz.so.1.2.1.2
00d80000 4K rwx-- /usr/lib/libz.so.1.2.1.2
00db0000 148K r-x-- /opt/microsoft/scx/lib/libpegprovidermanager.so.1
00dd5000 16K rwx-- /opt/microsoft/scx/lib/libpegprovidermanager.so.1
00e0a000 36K r-x-- /lib/libgcc_s-3.4.6-20060404.so.1
00e13000 4K rwx-- /lib/libgcc_s-3.4.6-20060404.so.1
00e17000 128K r-x-- /usr/lib/libk5crypto.so.3.0
00e37000 4K rwx-- /usr/lib/libk5crypto.so.3.0
00e38000 220K r-x-- /opt/microsoft/scx/lib/libpegclient.so.1
00e6f000 8K rwx-- /opt/microsoft/scx/lib/libpegclient.so.1
00f1f000 8K r-x-- /lib/libdl-2.3.4.so
00f21000 4K r-x-- /lib/libdl-2.3.4.so
00f22000 4K rwx-- /lib/libdl-2.3.4.so
00f44000 172K r-x-- /opt/microsoft/scx/lib/libpegconfig.so.1
00f6f000 8K rwx-- /opt/microsoft/scx/lib/libpegconfig.so.1
00f95000 20K r-x-- /opt/microsoft/scx/lib/libpegqueryexpression.so.1
00f9a000 4K rwx-- /opt/microsoft/scx/lib/libpegqueryexpression.so.1
00f9b000 2176K r-x-- /opt/microsoft/scx/lib/providers/libSCXCoreProviderModule.so
011bb000 16K rwx-- /opt/microsoft/scx/lib/providers/libSCXCoreProviderModule.so
08990000 1020K rw--- [ anon ]
aebfc000 4K ----- [ anon ]
aebfd000 10240K rw--- [ anon ]
af5fd000 4K ----- [ anon ]
af5fe000 10240K rw--- [ anon ]
afffe000 4K ----- [ anon ]
affff000 10240K rw--- [ anon ]
b09ff000 4K ----- [ anon ]
b0a00000 10860K rw--- [ anon ]
b149b000 404K ----- [ anon ]
b15fe000 4K ----- [ anon ]
b15ff000 10240K rw--- [ anon ]
b2000000 728K rw--- [ anon ]
b20b6000 296K ----- [ anon ]
b21ff000 4K ----- [ anon ]
b2200000 10920K rw--- [ anon ]
b2caa000 344K ----- [ anon ]
b2d6b000 4K ----- [ anon ]
b2d6c000 10240K rw--- [ anon ]
b4b70000 4K ----- [ anon ]
b4b71000 10240K rw--- [ anon ]
b5571000 4K ----- [ anon ]
b5572000 10240K rw--- [ anon ]
b6973000 4K rw--- [ anon ]
b6974000 4K ----- [ anon ]
b6975000 10240K rw--- [ anon ]
b7375000 24K r--s- /usr/lib/gconv/gconv-modules.cache
b737b000 2048K r---- /usr/lib/locale/locale-archive
b758b000 4K ----- [ anon ]
b758c000 10264K rw--- [ anon ]
b7fa2000 8K rw--- [ anon ]
bfec0000 1280K rw--- [ stack ]
ffffe000 4K r-x-- [ anon ]
total 131508K
I told you the list was long.
We see a number of libraries and a number of anon entries of roughly 10 MB each.
Hmmm strange.
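To put a number on that observation, the listing can be totalled per kind of mapping. The following is a minimal Python sketch and was not part of the original troubleshooting: it assumes the pmap layout shown above (address, size in K, permissions, name), and the short excerpt and the function name `summarize_pmap` are illustrative only.

```python
from collections import defaultdict

# A few lines taken from the pmap listing above (illustrative excerpt only).
PMAP_EXCERPT = """\
00888000 1188K r-x-- /lib/tls/libc-2.3.4.so
009b1000 8K r-x-- /lib/tls/libc-2.3.4.so
00f9b000 2176K r-x-- /opt/microsoft/scx/lib/providers/libSCXCoreProviderModule.so
08990000 1020K rw--- [ anon ]
aebfc000 4K ----- [ anon ]
aebfd000 10240K rw--- [ anon ]
af5fd000 4K ----- [ anon ]
af5fe000 10240K rw--- [ anon ]
bfec0000 1280K rw--- [ stack ]
total 131508K
"""


def summarize_pmap(text):
    """Sum mapping sizes (in K) per category: 'file', 'anon' or 'stack'."""
    totals = defaultdict(int)
    for line in text.splitlines():
        parts = line.split(None, 3)  # address, size, permissions, name
        if len(parts) != 4:
            continue  # skips the trailing "total" line
        size = parts[1]
        if not (size.endswith("K") and size[:-1].isdigit()):
            continue  # skips the header line with the command name
        name = parts[3].strip()
        if name == "[ anon ]":
            category = "anon"
        elif name == "[ stack ]":
            category = "stack"
        else:
            category = "file"  # libraries, the binary, locale files
        totals[category] += int(size[:-1])
    return dict(totals)


if __name__ == "__main__":
    for category, kib in sorted(summarize_pmap(PMAP_EXCERPT).items()):
        print(f"{category:>5}: {kib}K")
```

By my count, the eleven anon regions of roughly 10 MB in the full listing add up to 113964K of the 131508K total, so they account for almost all of what pmap reports.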
So we opened a CSS call with Microsoft and got to discuss things with Microsoft CSS and the X-plat team. Thanks to Bhushan and Robert we were able to gather the following.
The memory usage is not what it seems to be from these commands, and we can actually see the real memory footprint on the live system. I will come back to that. This is what happens (this is part of the email conversation, containing my attempt to explain what I think happens and the product team’s answers):
==================================
- The agent (several processes, as you know) seems to ask Linux for a few hundred MB of memory. Let’s say 350 MB.
- Linux seems to say OK to the request.
[Response] As long as the memory being allocated fits into the total virtual memory available (physical RAM+Swap), memory allocation will succeed. However before the virtual address is touched, the system won’t bother to allocate the page for you.
- Linux seems to only give the actual physical memory when the process actually writes to it.
[Response] Yes this is expected. The system only allocates pages/virtual memory when the process actually uses it. As the process does not require any pages, no pages are allocated.
- The agent seems to take around 20 MB of resident memory (perhaps this grows when we add more rules or when time passes, will have to see about this)
[Response] This is the memory being touched by the code.
- Swap space is not used on the machine
[Response] The paging strategy on Linux is different from Windows. So this is expected.
- When looking with top and drilling into the memory, we see a number of “anon” entries in chunks of 10 MB each. We cannot see what they do. Perhaps they are reserved or claimed, but the agent is not yet writing into those parts of memory.
[Response] In “pmap” output, there are entries for the process code segment itself (mapped into virtual space) and for shared libs; these are named. More importantly, there are [stack] entries and [anon] entries, which are memory allocated on the heap (at runtime). So basically, pmap shows the total size of the process, which includes all of the shared libraries. This is not a fair number (since the libraries are shared, many different processes use them). From the pmap output you provided earlier, it seems the stack being used is very reasonable and the heap [anon] was used a lot.
- So a question for the dev team would be why the agent requests so much memory to start with, and especially such a large amount (an agent using 20 MB could easily request 40, perhaps…?), and whether it is expected to grow that large. Why not start smaller and request more when needed?
[Response] There are some valid reasons for doing that; the most common cases would be file/device mapping [see mmap(2)] and memory pools for avoiding memory fragmentation. Allocating a big chunk of memory at the beginning and consuming it little by little is not an issue. However, memory fragmentation is an issue, unintended big memory allocation is an issue, and a memory leak is an issue.
==================================
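The behaviour described in this exchange is easy to reproduce on a Linux box. The sketch below is only an illustration and has nothing to do with the agent's actual code: it assumes Python 3 on Linux, reads the `VmSize` and `VmRSS` fields from `/proc/self/status`, and uses a 64 MB private anonymous mapping as a stand-in for whatever a process reserves up front. The names `vm_kib` and `demo` are made up for this example.

```python
import mmap


def vm_kib(field):
    """Return a Vm* field of this process in KiB, read from /proc/self/status."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    raise RuntimeError(f"{field} not found in /proc/self/status")


def demo(size=64 * 1024 * 1024):
    """Reserve `size` bytes, then touch every page, sampling memory each time."""
    result = {"virt_before": vm_kib("VmSize"), "rss_before": vm_kib("VmRSS")}

    # Reserve address space only; no page has been written to yet.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    result["virt_after_map"] = vm_kib("VmSize")
    result["rss_after_map"] = vm_kib("VmRSS")

    # Write one byte into every page so the kernel has to back it with RAM.
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1
    result["rss_after_touch"] = vm_kib("VmRSS")

    region.close()
    return result


if __name__ == "__main__":
    r = demo()
    print("virtual growth after mapping :", r["virt_after_map"] - r["virt_before"], "KiB")
    print("resident growth after mapping:", r["rss_after_map"] - r["rss_before"], "KiB")
    print("resident growth after touch  :", r["rss_after_touch"] - r["rss_before"], "KiB")
```

On a typical system the virtual size should jump by the full mapping right away, while the resident size should only follow once the pages are written to. That is the same gap we saw between the two top columns for the agent.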
So, to summarize: the agent requests this amount of memory but is not using it. It requests this much up front in order to avoid memory issues later. If we now look at the top output again:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5777 root 16 0 128m 9244 6004 S 0.0 0.1 0:02.30 scxcimprovagt
5709 root 17 0 114m 8436 5388 S 0.0 0.1 0:03.57 scxcimserver
5737 scoma 17 0 34548 5688 5076 S 0.0 0.1 0:00.09 scxcimprovagt
We should be looking at column number 6 (RES), which top reports in KB. So it takes 9+8+5=22 MB basically. We also checked this earlier by stopping and starting the agent and watching the amount of available physical memory on the box: it changed by only a few MB, not hundreds. That is also when we started doubting that the agent was actually taking that much memory.
This 22 MB usage, by the way, is while running only the default Linux management packs that come with the product. We can expect this to become a bit bigger, due to more rules running against the agent, if you load more custom management packs onto it.
It is also possible that more memory gets allocated when a large number of instances is returned for a class (e.g. thousands of concurrent processes on the Linux server). The same goes for some WinRM queries being run, such as some process monitors, but I think this only applies to the first call, as the agent needs to load whatever it needs to execute the query into memory.
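For those who want to check the arithmetic, here is a small Python sketch that adds up both columns from the top output above. It assumes the old procps convention where a plain number is in KB and an `m` suffix means MB; the helper names `to_kib` and `sum_columns` are hypothetical.

```python
# The three agent processes as shown by top earlier in this post.
TOP_OUTPUT = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5777 root 16 0 128m 9244 6004 S 0.0 0.1 0:02.30 scxcimprovagt
5709 root 17 0 114m 8436 5388 S 0.0 0.1 0:03.57 scxcimserver
5737 scoma 17 0 34548 5688 5076 S 0.0 0.1 0:00.09 scxcimprovagt
"""

UNITS = {"k": 1, "m": 1024, "g": 1024 * 1024}


def to_kib(field):
    """Convert a top memory field such as '128m' or '9244' to KiB."""
    suffix = field[-1].lower()
    if suffix in UNITS:
        return float(field[:-1]) * UNITS[suffix]
    return float(field)  # assumption: no suffix means KiB


def sum_columns(text):
    """Return (total VIRT, total RES) in KiB over all process lines."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    virt_index, res_index = header.index("VIRT"), header.index("RES")
    virt = res = 0.0
    for line in lines[1:]:
        fields = line.split()
        virt += to_kib(fields[virt_index])
        res += to_kib(fields[res_index])
    return virt, res


if __name__ == "__main__":
    virt, res = sum_columns(TOP_OUTPUT)
    print(f"requested (VIRT): {virt / 1024:.1f} MB")
    print(f"resident  (RES) : {res / 1024:.1f} MB")
```

That comes to roughly 276 MB requested against about 23 MB actually resident for the three processes together.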
So, if your Linux admins are asking about the memory usage and are knocking at your door (or holding you out the window :> ), please point them to this post and have them check what is really being used and not what is being requested. There probably is no problem.
This exercise and some things we ran into before (some of them on this blog in previous posts) helped me to understand a bit more about the cross-platform agent. All this research was done together with the Linux admins at this customer’s location, and it helped to gain their confidence and understanding (and I was happy to not be hanging by my ankles from the fourth floor anymore).
If in the end you do find high memory usage, then please check whether you have created any custom rules that go crazy (you can disable them to check), and if that is not the case, perhaps you need to contact Microsoft through the forums or CSS.
Thanks to the guys in the linux team and to Microsoft CSS and the product team for clearing things up and for working quite well together!
Good luck with cross plat monitoring! It’s grrrrreat!
Bob Cornelissen