Snippet Name: SQL_Server_Health_Check
Tags:
Created Date: 2025-06-16 13:12:44
Last Modified Date: 2025-05-31 10:32:01
Dependencies:
DBA Notes:
Sample Output:
Title: SQL Server Health Check
Author: user
I am doing a SQL Server health check help me with teh categories of things to review. for example, I have Performance, Server Configuration, SSRS, SSIS among others.
Author: ChatGPT
For a **comprehensive SQL Server health check**, here are the key categories along with specific items to review within each:
---
### 1. **Performance**
- **Server Performance**
- CPU usage and bottlenecks.
- Memory allocation (Page Life Expectancy, Buffer Cache Hit Ratio).
- Disk I/O metrics (Disk Queue Length, Read/Write Latency).
- Query Execution Plans (Identify slow queries, missing indexes).
- **Database Performance**
- Index fragmentation and maintenance.
- Statistics up-to-date status.
- Wait stats analysis (e.g., `sys.dm_os_wait_stats`).
- TempDB contention and configuration (files, growth settings).
- **Workload Monitoring**
- Query execution time and duration.
- Long-running transactions.
- Blocking and deadlocks.
---
### 2. **Server Configuration**
- **Instance Settings**
- Max Server Memory (`sp_configure`).
- Max Degree of Parallelism (MAXDOP) and Cost Threshold for Parallelism.
- Default fill factor and autogrowth settings.
- Priority Boost (should be disabled).
- CLR Integration and XP_CMDSHELL settings.
- **Service Accounts**
- Verify SQL Server and Agent service accounts (principle of least privilege).
- **Security**
- Authentication Mode (Windows vs. Mixed).
- Unused logins and orphaned users.
- SA account status (disable if not used).
---
### 3. **Database Integrity and Availability**
- **DBCC CHECKDB**
- Verify database integrity regularly.
- **Backups**
- Review recovery model (Full, Simple, or Bulk-Logged).
- Verify full, differential, and transaction log backups are running.
- Check backup history and last backup times.
- **High Availability/Disaster Recovery (HA/DR)**
- Status of AlwaysOn Availability Groups or Log Shipping.
- Failover Cluster or Database Mirroring configurations.
- Verify database replicas are in sync.
- RPO (Recovery Point Objective) and RTO (Recovery Time Objective) alignment.
---
### 4. **SQL Server Agent**
- **Job Management**
- Review enabled/disabled jobs and schedules.
- Check for job failures (last run status).
- **Alerts and Operators**
- Review configured alerts (e.g., severity level errors).
- Ensure notifications for failures are active.
---
### 5. **SSRS (SQL Server Reporting Services)**
- **SSRS Configuration**
- Verify `RSReportServer.config` settings.
- URL reservations for Report Manager and Web Service.
- **Performance and Logs**
- Analyze SSRS Execution Log for slow reports.
- Monitor report subscriptions and data refresh schedules.
- **Security**
- Review permissions (folder, data source access).
- Ensure service accounts are configured securely.
---
### 6. **SSIS (SQL Server Integration Services)**
- **Package Validation**
- Check for package failures and error logs.
- Ensure SSISDB is healthy (if using the SSIS Catalog).
- **Configuration**
- Package execution schedules and resource usage.
- Review deployment methods and version control.
- **Performance**
- Monitor SSIS packages for bottlenecks (source/destination delays, memory issues).
---
### 7. **Security**
- **Logins and Users**
- Review logins, roles, and users with elevated privileges.
- Remove unused or orphaned logins.
- **Encryption**
- Transparent Data Encryption (TDE) enabled where required.
- Column-level encryption if sensitive data exists.
- **Auditing**
- Review SQL Server Audit logs for critical events.
- Check for unauthorized login attempts.
---
### 8. **Index and Storage Management**
- **Index Health**
- Analyze index fragmentation (use `sys.dm_db_index_physical_stats`).
- Identify unused indexes and missing indexes.
- **Storage and File Sizes**
- Check database file sizes and autogrowth settings.
- Monitor space usage in database and log files.
- **TempDB**
- Multiple data files with uniform growth.
- Review TempDB usage and contention.
---
### 9. **Monitoring and Logs**
- **SQL Server Logs**
- Check for errors, warnings, and startup issues.
- **Extended Events**
- Set up event sessions to capture slow queries, deadlocks, etc.
- **System Health Session**
- Analyze built-in `system_health` Extended Events for anomalies.
---
### 10. **Patching and Version Management**
- **SQL Server Version**
- Check for latest Cumulative Updates (CUs) and Service Packs.
- Validate SQL Server version compatibility with applications.
- **Windows Updates**
- Ensure the OS hosting SQL Server is up-to-date.
---
### 11. **Licensing and Compliance**
- **License Usage**
- Check licensing mode (e.g., Core-based, CAL-based).
- **Compliance**
- Ensure GDPR, HIPAA, or other regulatory compliance is met.
---
### 12. **Extended Services**
- **Linked Servers**
- Validate linked server connections and credentials.
- **Replication**
- Check status and health of replication (Transactional, Snapshot, or Merge).
- **Full-Text Search**
- Ensure catalogs are up-to-date and queries are efficient.
---
If you want **queries, scripts, or checklists** for any of these categories, let me know!
Author: user
can you give me a query related to Page Life Expectancy
Author: ChatGPT
Here is a **query** to check the **Page Life Expectancy (PLE)** for each buffer node in SQL Server. Page Life Expectancy is a critical performance metric indicating how long pages stay in memory before being flushed to disk. A low PLE can signal memory pressure.
### Query for Page Life Expectancy
```sql
SELECT
RTRIM(instance_name) AS [NUMA Node],
cntr_value AS [Page Life Expectancy (Seconds)]
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
AND OBJECT_NAME LIKE '%Buffer Manager%';
```
---
### **How to Interpret Results**
- **Threshold**: The general rule of thumb is **PLE > 300 seconds** (5 minutes) per NUMA node.
- **Low PLE**: If PLE drops significantly below 300, it often indicates memory pressure due to heavy read operations or insufficient memory.
---
### **Additional Notes**
- If your server has multiple **NUMA nodes**, this query will return a PLE value for each node.
- Modern systems may require a higher threshold depending on memory size. For example:
- Servers with **64 GB RAM or more** might aim for **PLE > 1,200 seconds** (20 minutes).
Author: user
how about a query for Buffer Cache Hit Ratio
Author: ChatGPT
Here???s a **query to check the Buffer Cache Hit Ratio** in SQL Server. This metric represents the percentage of pages read from the buffer cache instead of disk. A high ratio (close to 100%) indicates good memory performance.
### **Query for Buffer Cache Hit Ratio**
```sql
SELECT
(1.0 - (a.cntr_value * 1.0 / b.cntr_value)) * 100 AS [Buffer Cache Hit Ratio (%)]
FROM
sys.dm_os_performance_counters a
JOIN
sys.dm_os_performance_counters b
ON a.object_name = b.object_name
WHERE
a.counter_name = 'Cache Pages'
AND b.counter_name = 'Page Reads/sec'
AND a.object_name LIKE '%Buffer Manager%';
```
---
### **How to Interpret Results**
- **Ideal Value**: Close to **100%**. A value above **95%** is considered healthy.
- **Low Ratio**: If below **90%**, it may indicate:
- Insufficient memory for the workload.
- High I/O due to inefficient queries or index fragmentation.
- A need to review indexing and query performance.
---
### **Notes**
- If your system has **NUMA nodes**, this ratio should ideally hold true for each node.
- You can cross-reference this with **Page Life Expectancy** to confirm memory pressure.
Let me know if you need scripts for monitoring other SQL Server performance metrics! ????