ConfigMgr\MEM 101: Advanced Performance Troubleshooting Part 1

Continuing the ConfigMgr\MEM series, this session looks at advanced performance troubleshooting and at what you can do to get the most out of the environment you already have.

In the previous blog post we discussed some SQL performance basics.

As mentioned in that post, MEM relies heavily on SQL, and incorrect maintenance or configuration can have a huge impact on the stability and usability of the service.

However, SQL is not the only indicator of performance on a system.

MEM Basics

As you know, MEM uses agents installed on machines to perform actions and report back to the MEM server.

The messages sent back to the server cover everything from inventory files (what is installed on the machines, where it is located, etc.) to status messages (the app is downloading, the app is installing, etc.).

The server then needs to process these messages so that the DB has up-to-date information and our Collections (logical groupings of users or computers based on a set of criteria queried from the DB) and reports are accurate.

These messages are processed in what we call Inboxes.

We are going to look at Inboxes in this post, as they potentially have a HUGE impact on performance, while Collections will be covered in Part 2 (coming soon).

Inboxes

Inbox folders live in the ConfigMgr site server installation directory, and many components have their own Inbox folder.

The Inbox structure and file types are not documented and are subject to change. However, if we look at the purpose of the Inboxes, the files fall into two functional types:

Data files

Site systems copy files containing client data to the site server Inboxes. The files are then processed and their contents recorded in the site database.

Trigger files

Site server components use service and trigger files to communicate with each other.

So what does this mean to me as an admin? If the Inboxes are backlogged, the information is not being processed and loaded into the DB.

Most issues with Inboxes are caused by an excessive number of files residing in a given folder.

Data and trigger files in the subfolders of <ConfigMgr Installation>\Inboxes are:
•Processed by various ConfigMgr components
•Processed at a speed that may be limited by SQL health, code design, number of threads, size of backlog, OS performance, or poorly configured antivirus exclusions
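To see which folder actually holds the backlog, it helps to count the files per Inbox subfolder. The following is a minimal Python sketch, not a supported tool: the function names `count_inbox_files` and `top_backlogs` are illustrative, and you pass in your own Inboxes path. It uses `os.scandir` so that folders with a very large number of files are streamed rather than loaded into memory in one go.

```python
import os
from typing import Dict, List, Tuple


def count_inbox_files(inboxes_root: str) -> Dict[str, int]:
    """Return {folder path relative to inboxes_root: files directly inside it}."""
    counts: Dict[str, int] = {}
    stack = [inboxes_root]
    while stack:
        folder = stack.pop()
        file_count = 0
        try:
            with os.scandir(folder) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)  # visit subfolders later
                    elif entry.is_file(follow_symlinks=False):
                        file_count += 1
        except OSError:
            # Folder disappeared or access was denied: skip it.
            continue
        counts[os.path.relpath(folder, inboxes_root)] = file_count
    return counts


def top_backlogs(counts: Dict[str, int], limit: int = 10) -> List[Tuple[str, int]]:
    """Return the folders with the most files, largest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:limit]
```

Calling `top_backlogs(count_inbox_files(path_to_inboxes))` gives a short list of the fullest folders, which is usually enough to decide which component to investigate first.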

Backlog Causes

Too Much Data Coming in

Many components are single-threaded, or programmed to wait a period of time after processing an inbox file. For example, if all clients upload data in one simultaneous cycle, a backlog is expected. In that case, if the backlog affects ConfigMgr operations, reduce the data send frequency or randomize the data sending cycle.

Example: launching a software installation (non-OSD) Task Sequence with 15 steps on 100 000 clients will result in more than 2 500 000 status messages in the offersum.box Inbox folder on a primary site.

SMS_OFFER_SUMMARIZER is single-threaded by design: it takes one file at a time, and the other files wait untouched until it is done. In this case it is clear that the backlog will not be processed in a reasonable time, and administrators should probably reconsider using a Task Sequence for mass non-OSD deployments.

Slow Processing

On the other hand, if the site's database server is not performing well, inserting the data from incoming files will take longer than expected, even if a component is designed to be multi-threaded.

Example: a backlog of .MIF files in ..\inboxes\auth\dataldr.box\. Hardware Inventory from clients is processed on the site server by SMS_INVENTORY_DATA_LOADER, which is a multi-threaded component.

The component reads files and inserts or updates rows in the database. If the database indexes are in bad shape, this process will take longer. Administrators can fix the database index problem or throttle the flow (frequency) of Hardware Inventory on the clients.

Help, I have an Inbox backlog. Should I just delete the files?

The short answer is NO.

While it may seem tempting to just delete the backlog files, this will not resolve the issue and may make it worse.

Deleting ANY delta file triggers a storm of resynchronization files, which causes another backlog.

..\Inboxes\ folder | Description | Data loss forever | Resync storm and backlog | Safe* to delete | Self-sustaining
--- | --- | --- | --- | --- | ---
\auth\dataldr.box | Hardware inventory | No, returns after resync | Yes | No | Yes
\auth\ddm.box | Discovery data | No, returns at Discovery cycle | No | *.DDR | No
\auth\sinv.box | Software inventory | No, returns after resync | Yes | No | Yes
\auth\statesys.box | State messages | No, returns after resync | Yes | No | Yes
\offersum.box | Deployment status messages | Yes | No | No | No
\colleval.box | Collection triggers | No | No | *.*DC_ | No
\policypv.box | Policy provider trigger files | No | No | No | No
\statmgr.box | Status Messages from clients | Yes | No | No | No

You can refer to this article for some guidance on troubleshooting State Message Backlogs.
Useful read from ConfigMgr 2007 (still can be relevant for Current Branch)

So if I cannot delete the files, what should I do?

Let's start by analyzing the backlog files.

Looking inside backlogged files

Discovery data record (*.ddr) snippet

The AGENTINFO section reveals the discovery method that generated the file, and the source site for that record

DDR files can contain a specific OU or discovery method that causes the backlog. Address the discovery settings after learning the data influx source.

Status message (*.svf) snippet
SVF is a binary file. It is possible to extract the Deployment ID and Package ID to investigate the source of the backlog.

MININT-1337.copr.contoso.com is the machine name of the client that sent the status message

TS4209A0 is the deployment ID

TS400153 02: Daily scripts run (system) is the package and program name

SVF file analysis can reveal the cause of a status message backlog, e.g. the ID of the mass deployment.
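Because the SVF layout is not documented, one practical approach is to pull out whatever readable text the file contains and look for ID-like tokens. The sketch below is a generic "strings"-style extractor in Python, not a parser for the SVF format. It assumes that text may be stored either as single-byte ASCII or as UTF-16LE, and that deployment and package IDs look like a three-character site code followed by five hex digits (as TS4209A0 and TS400153 do above). Both are assumptions, and the function names are illustrative.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumed ID shape: 3-character site code + 5 hex digits, e.g. TS4209A0.
ID_PATTERN = re.compile(r"\b[A-Z0-9]{3}[0-9A-F]{5}\b")


def extract_strings(data: bytes, min_len: int = 6) -> List[str]:
    """Return runs of printable characters found as ASCII or as UTF-16LE."""
    ascii_runs = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    utf16_runs = re.findall(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len, data)
    found = [run.decode("ascii") for run in ascii_runs]
    found += [run.decode("utf-16-le") for run in utf16_runs]
    return found


def find_ids(strings: Iterable[str]) -> Counter:
    """Count ID-like tokens across the extracted strings."""
    ids: Counter = Counter()
    for text in strings:
        ids.update(ID_PATTERN.findall(text))
    return ids
```

Running `find_ids(extract_strings(...))` over the bytes of each backlogged file and summing the counters shows which IDs dominate the backlog. Expect some false positives, since any eight-character token of that shape will match.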

Analyze the Backlogged Files

So let us delve a bit deeper into the files now, to start seeing what is wrong.

We have a couple of options available to us:

•Use CMD or PowerShell to get data from backlogged files

•Search for specific text like “AGENTINFO” or “Software Distribution” or “deployment ID”

•Output the results of this search to a text file

•Analyze the output file in Excel to determine the source of incoming files

CMD For DDR’s

findstr /I "AGENTINFO" *.ddr >C:\Temp\Output_AGENTINFO.txt

The output shows which Discovery method generated each file. Counting the files per method shows where the data is coming from, so you can take action to reduce the amount of incoming data.

PowerShell for XML-based files (*.smx state messages)

# Run from the folder that contains an SMX subfolder with the copied *.smx files
$ScriptPath = Split-Path -Parent $MyInvocation.MyCommand.Definition
Set-Location $ScriptPath
$files = Get-ChildItem -Path "$ScriptPath\SMX" -Filter *.smx -File
foreach ($file in $files) {
    # Cast to [xml] so the Report/ReportBody/StateMessage nodes can be navigated
    [xml]$statemsg = Get-Content -Path $file.FullName
    $messages = @($statemsg.Report.ReportBody.StateMessage)
    foreach ($msg in $messages) {
        # One semicolon-delimited line per state message: file;Topic ID;Topic Type;Topic IDType
        $currentstring = $file.Name + ";" + $msg.Topic.ID + ";" + $msg.Topic.Type + ";" + $msg.Topic.IDType
        Out-File -FilePath "_SMX.output.txt" -InputObject $currentstring -Append
    }
}

Add the output file as CSV in Excel:

Get data > From File > From Text/CSV > Use Custom delimiter (“>” for the *.ddr output, “;” for the SMX output)


Use the Excel Pivot Table function to count records with the Discovery method. This example screenshot shows that the majority of the DDR backlog came from an AD System Discovery method.
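If Excel is not at hand, the same count can be produced with a few lines of Python. This is a minimal sketch under one assumption: each matching line in the findstr output contains AGENTINFO followed by the agent name in angle brackets, for example `AGENTINFO<SMS_AD_SYSTEM_DISCOVERY_AGENT>`. Check a few lines of your own output file before relying on it. The function name `count_discovery_agents` is illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape: ...AGENTINFO<agent name><...>
AGENT_PATTERN = re.compile(r"AGENTINFO<([^>]*)>", re.IGNORECASE)


def count_discovery_agents(lines: Iterable[str]) -> Counter:
    """Count how many matching lines name each discovery agent."""
    counts: Counter = Counter()
    for line in lines:
        match = AGENT_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Usage against the file written by the findstr command:
# with open(r"C:\Temp\Output_AGENTINFO.txt", errors="replace") as handle:
#     for agent, total in count_discovery_agents(handle).most_common():
#         print(total, agent)
```

`most_common()` lists the agents from the largest file count down, which is the same view the pivot table gives.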


Gradual build up of files

A gradual build-up of files may be a sign of excessive data influx AND limited component performance.

Some components have limited speed by design (one thread, a worker cycle every 1 sec., etc.).

Processing speed greatly depends on SQL health, disk performance, and the size of the backlog.

When the file count is extremely high, use command line tools instead of File Explorer to move backlogged files to a temporary folder.

So we need to be aware of what determines component speed:

•SQL performance

•Disk performance (latency, throughput, etc.)

•Number of files to process (querying the folder contents takes longer)

•Number of threads used by the component (see the logs to determine the thread count)

•Component code design limitations (select top 1, wait 10 seconds)

Perfmon Counters for Backlog related issues

Component | Counter Name | \inboxes\ Folder
--- | --- | ---
SMS Discovery Data Manager | DDRs Processed/minute | \Auth\ddm.box
SMS Inventory Data Loader | MIFs Processed/minute | \Auth\dataldr.box
SMS Software Inventory Processor | SINVs Processed/minute | \Auth\sinv.box
SMS Software Metering Processor | SWM Usage Records Processed/minute | \Swmproc.box\usage
SMS Status Manager | Processed/sec : Total | \Statmgr.box\statmsgs
SMS State System | Message Files Processed/min | \Auth\statesys.box\incoming
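A counter reading becomes more useful when it is turned into an estimate of how long the backlog will take to clear. The sketch below is simple arithmetic in Python: backlog size divided by the net rate (files processed minus files still arriving). The function name and the rates in the example are hypothetical, and only the 2 500 000 file count comes from the Task Sequence example earlier in this post.

```python
from typing import Optional


def estimate_drain_minutes(
    backlog_files: int,
    processed_per_minute: float,
    incoming_per_minute: float = 0.0,
) -> Optional[float]:
    """Return minutes until the backlog is empty, or None if it is not shrinking."""
    net_rate = processed_per_minute - incoming_per_minute
    if net_rate <= 0:
        # Files arrive at least as fast as they are processed.
        return None
    return backlog_files / net_rate


# Hypothetical rates: 1 000 files/minute processed, 200 files/minute still arriving.
minutes = estimate_drain_minutes(2_500_000, 1000, 200)
print(minutes)        # 3125.0 minutes
print(minutes / 60)   # roughly 52 hours
```

A result of None means the backlog will keep growing until either the incoming data is reduced or the processing rate improves.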

So, in the end, let us put this all together.

A monitoring and early warning system for component health is crucial for finding performance issues before it is too late.

Check component logs to see if files are processed at all. Use Perfmon counters to see if the processing rate is at a normal level. Check the real-time scan AV exclusions on the inboxes folder:

https://techcommunity.microsoft.com/t5/core-infrastructure-and-security/configuration-manager-current-branch-antivirus-exclusions/ba-p/884831

Temporarily reduce the amount of incoming data so the backlog can be processed. Check that settings (e.g. Hardware Inventory, Discovery) are in a normal range and not set to run hourly, etc.

When a directory contains a massive number of files (>1m), components have difficulty reading the backlog, and even viewing the folder in Explorer is not possible. Use the command prompt to move the backlog to another location.

Analyze the moved files if you are not sure about the cause of the backlog.

Once the massive influx of data stops, copy the files back in small batches to improve the speed of processing.
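Feeding the files back in batches can be scripted. The following Python sketch moves a limited number of files per call from a holding folder into a target folder. The function name `move_batch`, the batch size, and the suffix filter are illustrative assumptions, and the script does not know anything about ConfigMgr itself. It reads the directory with `os.scandir` and stops as soon as the batch is full, so it does not need to enumerate a folder with millions of files.

```python
import os
import shutil
from itertools import islice
from typing import Optional


def move_batch(
    source_dir: str,
    target_dir: str,
    batch_size: int,
    suffix: Optional[str] = None,
) -> int:
    """Move up to batch_size files from source_dir to target_dir; return the count."""
    os.makedirs(target_dir, exist_ok=True)
    wanted = suffix.lower() if suffix else None
    with os.scandir(source_dir) as entries:
        candidates = (
            entry.path
            for entry in entries
            if entry.is_file()
            and (wanted is None or entry.name.lower().endswith(wanted))
        )
        # Collect the batch first, then move, so the scan is not disturbed.
        batch = list(islice(candidates, batch_size))
    for path in batch:
        shutil.move(path, os.path.join(target_dir, os.path.basename(path)))
    return len(batch)
```

A cautious way to use it is to move one batch, wait until the component has emptied the inbox folder again (the Perfmon counters above help here), and only then move the next batch.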

If you are interested in learning even more, please reach out to your TAM for ConfigMgr Troubleshooting Workshop information.
