How to optimise the Data Warehouse - 5 Point Plan
Overview
SCOM simultaneously writes performance data, events and monitor state changes to the Operations Manager database (Ops DB) and the Data Warehouse (DW). Dashboard Server accesses data from both the Ops DB and the Data Warehouse:
- Dashboard Server authenticates through the SCOM SDK using role-based access control (RBAC) to access core runtime data such as users/roles, entities, classes, monitors, alerts and health status.
- Dashboard Server accesses the Data Warehouse directly using the published schema, to show performance data, state history (SLA tile) and event data (SQL tile).
So performance tuning the Data Warehouse is key to fast data retrieval in Dashboard Server. How do you get the most out of your Data Warehouse performance? There isn't one configuration setting, but this article covers a five point plan for reviewing the performance of your Data Warehouse.
You may also like to watch the Dashboard Server webinar 'Tuning the SCOM Data Warehouse':
1. Sizing
What sort of resources does my SQL server running the Data Warehouse server require? As much as you can get!
Before Dashboard Server the Data Warehouse was being written to, but was probably largely unused.
- No fixed sizing rules, but the Data Warehouse is now a core component and will significantly influence user experience.
2. Configuration
The two resources you must follow:
SQL 2012 and System Center 2012 R2 Database Tweaks
- System Center Operations Manager Field Experience e-book
Then monitor using Tao Yang’s Community resources:
3. Data In
SCOM is constantly writing data to the Data Warehouse. Check whether you do need all that data.
- Tune collection intervals - Do you need high frequency granular performance metrics? If not, perhaps tune down the collection intervals.
- Disable unnecessary rules - Disable any rules that are not useful to your organization. Share details with your organization about what SCOM is collecting, to see if they want that data at that frequency, or if anything is missing.
Here are several common sources of high load:
Flip-flopping monitors
Check for noisy monitors 'flip-flopping'. It's not just performance data, every health state change is written to the Ops DB and the Data Warehouse, so if there are monitors that are changing from green to red frequently, find those monitors and tune the thresholds, or disable the monitors, to save all those write actions.
Make use of Kevin Holman’s 'noisiest monitors' query: "Tuning tip: Do you have monitors constantly 'flip-flopping'?"
Rules and Events data
These queries can help you identify rules and events which might need tweaking. Run these queries on the Data Warehouse database in SQL Server Management Studio or in Dashboard Server:
High frequency performance collection rules
SELECT TOP 50 r.RuleSystemName, Count(*) AS Count FROM perf.vPerfRaw AS p JOIN vPerformanceRuleInstance AS pri ON p.PerformanceRuleInstanceRowId = pri.PerformanceRuleInstanceRowId JOIN vRule AS r ON pri.RuleRowId = r.RuleRowId GROUP BY r.RuleSystemName ORDER BY Count DESC
Event data
SELECT TOP 50 r.RuleSystemName, Count(*) AS Count FROM Event.vEvent AS e JOIN Event.vEventRule AS er ON e.EventOriginId = er.EventOriginId JOIN vRule AS r ON er.RuleRowId = r.RuleRowId GROUP BY r.RuleSystemName ORDER BY Count DESC
4. Data Retention
The Data Warehouse stores data in three forms: Raw, Hourly and Daily.
Kevin Holman's blog shows you how to run a Microsoft tool called DWDATARP.EXE which shows the percentage of the Data Warehouse taken up by different types of data:
Understanding and modifying Data Warehouse retention and grooming
This shows that hourly performance data can become huge – typically 33% of the Data Warehouse size.
- Do you need all that hourly data? It's not just about the capacity of the Data Warehouse for storing the data, but more the performance, in terms of the SQL server's memory. If hourly data is available for a year, then users may well be querying it and pulling all that data back. Consider reducing the amount of hourly data retained.
- How long do you keep the data for? Consider reducing hourly retention to three to six months.
Take a look at the percentage for event data, as this is often little used, but can take up considerable space.
5. Data Out
When using the Performance tile (see How to use the Performance tile), carefully choose your resolution:
- Raw: slow
- Hourly: fast
- Daily: fastest
Recommendations:
- Performance Bar chart (especially with top n option) should use hourly or daily where possible.
- For large groups (>100 objects) use hourly or daily where possible.
Timeouts:
- Timeout is 30 seconds
- It cannot be configured
- If queries are taking > 30 seconds, the query is not working as expected!
For more information see Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding