Active Directory housekeeping is a critical operational practice that ensures the health, security, and optimal performance of your directory services infrastructure. Regular maintenance tasks prevent security vulnerabilities, eliminate operational inefficiencies, and maintain compliance with organizational security policies and industry standards.
Why Housekeeping Matters
Neglecting Active Directory maintenance can lead to severe consequences:
- Security Risks: Stale accounts, orphaned objects, and misconfigured permissions create attack vectors that malicious actors can exploit to gain unauthorized access to your environment.
- Operational Issues: Lingering objects, replication errors, and database inconsistencies can cause authentication failures, slow performance, and unexpected service disruptions.
- Compliance Violations: Unmanaged privileged accounts, unauthorized group memberships, and poor audit trails can result in regulatory non-compliance and failed audits.
- Resource Waste: Obsolete accounts and objects consume storage, processing power, and administrative overhead unnecessarily.
Task Structure Explained
Each housekeeping task in this document follows a standardized structure to ensure clarity and consistency:
| Column | Purpose |
|---|---|
| Name | Task identifier and short description. A concise label that describes the housekeeping activity being performed. |
| Description | Objective and context. Explains why the task is necessary, what problem it solves, and its impact on security or operations. |
| Task | Execution procedure. Step-by-step instructions, commands, or tools required to perform the housekeeping activity. |
| Impact Definition | Technical consequences. Documents the expected impact based on technical procedures, including what changes occur and potential side effects. |
| Automated | Automation status. Indicates whether the task is automated, who manages it, whether scripts exist (such as the EguibarIT PowerShell modules), or whether manual intervention is required. |
Task Frequency Categories
Housekeeping tasks are organized by frequency to establish a sustainable maintenance schedule:
- Daily Tasks: Critical operations that require constant monitoring to maintain security posture and detect anomalies immediately (e.g., privileged group audits, replication checks)
- Weekly Tasks: Regular maintenance to prevent accumulation of stale objects and maintain account hygiene (e.g., disabling inactive accounts, delegation reviews)
- Bi-Weekly Tasks: Specialized security scans that balance thoroughness with operational efficiency (e.g., password discovery in GPOs)
- Monthly Tasks: Comprehensive reviews and cleanup activities that address accumulated issues (e.g., AdminCount normalization, subnet validation)
- Odd Monthly Tasks: Group Policy maintenance performed during lower-activity periods to minimize disruption risk (e.g., orphaned GPO cleanup)
- Quarterly Tasks: Strategic reviews of privileged access, delegation models, and security configurations (e.g., user reviews, ACL audits)
- Semi-Annual Tasks: Deep technical maintenance and critical security updates requiring careful planning (e.g., KRBTGT password rotation, database consistency checks)
- Annual Tasks: Comprehensive governance reviews and long-term maintenance activities (e.g., trust audits, ownership reviews)
- As-Needed Tasks: Event-driven monitoring for critical security events requiring immediate investigation (e.g., privileged group membership changes)
Automation and Compliance
Many housekeeping tasks can be automated using PowerShell scripts and modules. The EguibarIT PowerShell modules provide comprehensive automation for delegation model management, privileged account hygiene, and semiprivileged user provisioning. Where automation is not possible or requires approval, tasks are marked with specific requirements:
- Requires an RFC: Change management approval required before execution
- Managed by HR/IAM: Integrated with identity lifecycle management systems
- Script provided by MS: Official Microsoft tools or scripts available
DNS-Specific Maintenance
This housekeeping guide includes DNS maintenance tasks integrated with Active Directory operations. For comprehensive DNS-specific guidance including configuration, security hardening, troubleshooting, and detailed monitoring scripts, see:
DNS Configuration for Active Directory
The DNS article provides in-depth coverage of zone design, scavenging configuration, forwarder setup, DNSSEC implementation, and DNS Best Practices Analyzer (BPA) usage.
DHCP-Specific Maintenance
This housekeeping guide includes DHCP maintenance tasks integrated with Active Directory operations. For comprehensive DHCP-specific guidance including failover configuration, scope design, security hardening, and detailed automation scripts, see:
DHCP Configuration & Best Practices for Active Directory
The DHCP article provides in-depth coverage of high availability strategies, scope management, option configuration, troubleshooting, and PowerShell automation for enterprise environments.
IPv6-Specific Implementation
This housekeeping guide includes IPv6 maintenance tasks for dual-stack Active Directory environments. For comprehensive IPv6 implementation guidance including address planning, static configuration, DHCPv6 setup, security hardening, and transition mechanism management, see:
IPv6 Implementation & Best Practices for Windows Server
The IPv6 article provides in-depth coverage of enterprise address planning, Active Directory integration, firewall configuration, Router Advertisement (RA) guard, and PowerShell automation for IPv6 deployment.
AD Sites & Services
This housekeeping guide includes AD Sites & Services maintenance tasks for Active Directory environments. For comprehensive AD Sites & Services implementation guidance including site topology, replication configuration, and security hardening, see:
AD Sites & Services Implementation & Best Practices
The AD Sites & Services article provides in-depth coverage of site design, subnet configuration, and replication management for Active Directory environments.
DFS Namespace & Replication Maintenance
This housekeeping guide includes DFS Namespace and DFS Replication (DFS-R) maintenance tasks for enterprise file services and SYSVOL replication. For comprehensive DFS implementation guidance including namespace design, replication topologies, SYSVOL monitoring, performance tuning, and disaster recovery procedures, see:
DFS Namespace & Replication Implementation Guide
The DFS article provides in-depth coverage of namespace architecture, Full Mesh vs Hub-and-Spoke topologies, staging quota sizing, conflict resolution, authoritative restore, and PowerShell automation for DFS operations.
Hyper-V Networking & Clustering Maintenance
This housekeeping guide now includes Hyper-V cluster maintenance tasks covering networking, Live Migration, RDMA/SMB Direct, failover clustering, CAU, and DR. For the full implementation guide, design decisions, troubleshooting procedures, and automation scripts, see:
Hyper-V Networking & Clustering — Enterprise Virtualization Platform
The Hyper-V article includes vSwitch/SET architecture, RDMA validation, CAU automation, security hardening, DR strategies, war stories, and production PowerShell scripts.
Group Policy Object (GPO) Maintenance
This housekeeping guide includes GPO health monitoring, baseline drift detection, AGPM lifecycle tasks, and security filtering validation at daily, weekly, monthly, quarterly, semi-annual, and annual frequencies. For comprehensive guidance on monolithic GPO architecture, Tier-aligned design patterns, AGPM workflow automation, and delegation best practices, see:
GPO Monolithic Approach — Tier-Aligned Baseline Architecture
The GPO Monolithic Approach article provides in-depth coverage of consolidating functional GPOs into managed baselines, controlled exception governance, AGPM version control, PowerShell automation for creation/delegation/backup, and cross-tier contamination prevention strategies.
Daily Tasks
| # | Name | Description | Task | Impact Definition | Automated |
|---|---|---|---|---|---|
| 1 | OU Service Accounts | Ensure every account in the Service-Account OU belongs to the corresponding deny-logon group for its own Tier | Read all user accounts and ManagedServiceAccounts and check whether they belong to the WO_SVC Deny Interactive RDP login security group. If they are not members, add them. | Service accounts are added to deny groups that prevent interactive logon, reducing the attack surface for lateral movement. No service disruption, because service logon remains permitted. | Can be automated with EguibarIT Housekeeping module - Test-ServiceAccountGroupMembership |
| 2 | Audit Group Membership changes on privileged groups | Verify that only authorized members belong to Privileged groups (Domain Admins, Schema Admins, or any other delegated privileged group) | Use Get-ADGroupMember cmdlet to enumerate members of privileged groups. Compare against authorized baseline. Generate alerts for unauthorized additions. Review change logs in Event Viewer (Event ID 4728, 4732, 4756). | Unauthorized privileged access detected immediately, preventing potential compromise. Alerts generated for security team response. No operational impact on legitimate users. | |
| 3 | Privileged Admin Accounts Consistency Check | A privileged admin account (Tier 0, Tier 1, Tier 2) may exist and be enabled only while its corresponding standard user account exists and is enabled. Check that the standard user exists and is enabled; otherwise, disable the admin account. | Query all privileged admin accounts across all tiers. For each admin account, verify that the corresponding standard user exists and is enabled. If the standard user is disabled or deleted, disable the admin account. Use Get-ADUser with naming convention filters (e.g., T0_*, T1_*, T2_*). | Orphaned admin accounts are disabled, preventing credential theft from abandoned privileged accounts. Maintains the 1:1 admin-to-user relationship required by the delegation model. | |
| 4 | Non-Privileged Groups having Privileged Members Check | Verify that privileged accounts (Tier 0, Tier 1, Tier 2) are in "controlled & Secure" groups. | Enumerate all privileged accounts across all tiers. Check membership in standard (non-privileged) groups. Flag violations where privileged tier accounts appear in regular user groups. Use EguibarIT delegation module Get-PrivilegedGroupMembership. | Privilege isolation maintained. Prevents privilege escalation through group policy inheritance. Privileged accounts remain segregated from standard user operations. | Can be automated with EguibarIT Delegation module - Test-PrivilegedAccountSegregation |
| 5 | Tier-Based Group Members Account Check | Verify that privileged groups only contain properly segregated tier-based admin accounts | Query membership of all privileged groups (T0, T1, T2 groups). Verify members follow tier naming convention (T0_* for Tier 0, T1_* for Tier 1, T2_* for Tier 2 admins). Flag standard user accounts in privileged groups. | Ensures tier model segregation. Prevents standard users from having privileged access. Maintains clear audit trail for privileged operations. | Can be automated with EguibarIT Delegation module - Test-AdminGroupMembership |
| 6 | PG Group membership check | Verification on Privileged group membership. This is related to the delegation model. | Validate all Privileged Groups (PG_*) contain only authorized members per delegation matrix. Cross-reference with approved delegation documentation. Check for nested group memberships. Use Get-ADPrincipalGroupMembership recursively. | Delegation model integrity maintained. Unauthorized delegation detected early. Role-based access control enforced per organizational security policy. | Can be automated with EguibarIT Delegation module - Test-DelegationModel |
| 7 | Check overall AD replication within a given site | Verify AD topology and ensure all domain controllers are replicating successfully | Use repadmin /replsummary to check replication status. Run dcdiag /test:replications on each DC. Monitor replication latency. Check for replication errors in Event Viewer. Use MS AD Replication Status Tool, PowerShell (Get-ADReplicationPartnerMetadata), or SolarWinds monitoring. | Replication failures detected early, preventing authentication issues and data inconsistencies. Directory convergence maintained across all DCs. No impact on production if monitoring only. | Can be monitored with PowerShell scripts - Get-ADReplicationFailure |
| 8 | Check SYSVOL Replication Consistency | Ensure SYSVOL folder structure and Group Policy templates replicate correctly across all domain controllers | Run dcdiag /test:sysvolcheck and /test:frssysvol on all DCs. Verify the SYSVOL share is accessible. Compare folder structure and file counts across DCs. Check DFS Replication health for Windows Server 2008 R2 and newer. | Group Policy inconsistencies prevented. Login scripts and policies deploy correctly. SYSVOL corruption detected before impacting users. No operational impact during check. | Can be automated with PowerShell - Test-SysvolReplication |
| 9 | Check for lingering objects and remove them | Detect and remove objects that still exist on some DCs after being deleted on others, which causes replication inconsistencies | Use dcdiag /test:replications and Event IDs 1388 and 1988 to identify lingering object errors. Run repadmin /removelingeringobjects with /advisory_mode first to list the affected objects, then run it again without /advisory_mode to remove them. | Replication errors resolved. Database consistency restored. Object tombstone cleanup completed. Risk of unintended object deletion if lingering objects are not properly identified, so an RFC is required for safety. | Requires an RFC |
| 10 | Automate DNS Zone Backup on Daily Basis | Ensure DNS zone data is backed up daily to enable recovery from corruption or accidental deletion | For AD-integrated zones, included in system state backup. For primary zones, export zone files using dnscmd /zoneexport. Verify backup integrity. Store backups in secure location with retention policy. | DNS zone data protected. Recovery point objective (RPO) of 24 hours maintained. Name resolution can be restored quickly after failure. No operational impact - backup runs during low activity. | DNS is AD integrated, included in the AD Backup |
| 11 | DNS Service Health Check | Verify DNS service is running on all DNS servers to ensure name resolution availability | Check DNS service status on all DNS servers. Review DNS server event logs for errors (Event IDs 4000-4015 = DNS issues). Validate critical SRV records are resolving (_ldap._tcp.dc._msdcs.domain.com). Use Get-Service cmdlet and Test-DnsServer for validation. | DNS service failures detected immediately preventing authentication outages. Critical SRV records validated ensuring domain controller location works. Name resolution availability maintained for all clients. | Can be automated with PowerShell script scheduled via Task Scheduler, sends alert email if failures detected |
| 12 | Monitor DNS Query Performance | Review DNS query response times to identify performance degradation | Review DNS query response times (should be <10ms for local queries). Check for excessive recursive queries (may indicate forwarder issues). Monitor failed query count (unexpected spikes indicate problems). Use performance counters or Get-DnsServerStatistics. | Performance issues detected early before impacting users. Slow queries and failures identified for troubleshooting. Baseline metrics maintained for trending and capacity planning. | Can be automated with performance counters via SCOM, Azure Monitor, or custom PowerShell script |
| 13 | KCC Topology Health Check | Verify Knowledge Consistency Checker (KCC) is generating proper replication topology and connection objects are valid. | Run repadmin /kcc on each DC to force KCC execution. Check for KCC errors in Directory Service event log (Event ID 1311, 1925). Verify automatic connection objects created. Check for redundant connection objects. Review site link bridge configuration if using manual topology. | Replication topology optimal for network infrastructure. KCC failures detected preventing replication issues. Connection object problems identified early. Topology automatically adjusts to DC/site changes. | Can be automated with PowerShell - Test-KCCTopology |
| 14 | Authentication Failure Pattern Analysis | Monitor for unusual authentication failure patterns that may indicate attacks (password spray, brute force) or service issues. | Query Security event log for Event ID 4625 (failed logons), 4771 (Kerberos pre-auth failed), 4776 (NTLM auth failed). Analyze patterns: multiple accounts from single source, single account from multiple sources, timing patterns. Alert when thresholds are exceeded. Correlate with account lockout events (4740). | Attacks (password spray, brute force) detected in real time. Service account misconfigurations detected. Account lockout root causes identified. Security team alerted for response. Critical for detecting credential-based attacks early. | Integrate with SIEM |
| 15 | RID Pool Monitoring | Monitor available RID pool on all DCs to ensure sufficient RIDs available for new object creation, preventing service outages. | Query RID pool status: dcdiag /test:ridmanager /v. Check RID Master availability. For each DC, query allocated vs. consumed RIDs: Get-ADDomainController \| Get-RIDPoolStatus. Alert if any DC below 10% available. Request new RID pool allocation if needed. Verify RID Master can allocate new pools. | RID pool exhaustion prevented. New object creation (users, groups, computers) maintained. Early warning allows proactive RID pool allocation. RID pool depletion blocks all new object creation domain-wide. | Can be automated with PowerShell - Test-RIDPoolAvailability with alerting |
| 16 | Critical DC Event Log Monitoring | Monitor Domain Controllers for specific critical Event IDs indicating replication failures, authentication issues, or security concerns requiring immediate attention. | Query the Directory Service log for critical errors. | Critical issues identified immediately. Replication failures caught before data divergence. Security incidents detected in real-time. Service failures trigger rapid response. Delayed detection causes extended outages and security exposure. | Integrate with SIEM/monitoring platform for 24/7 alerting |
| 17 | Privileged Account Lockout Monitoring | Monitor Tier 0 and privileged admin accounts for lockouts which indicate either attack attempts or credential misconfigurations requiring immediate investigation. | Query Security log for Event ID 4740 (account locked out) filtered for privileged accounts (T0_*, T1_*, T2_*, Domain Admins members). Investigate source of bad password attempts. Check for service accounts with cached credentials. Identify brute force or password spray attacks. Document legitimate lockouts (forgotten passwords) vs. security incidents. | Privileged account attacks detected immediately. Compromised service account credentials identified. Attack patterns recognized for threat intelligence. Security incident response triggered. Privileged lockout indicates active attack or serious misconfiguration. | Real-time alerting required for Tier 0 accounts |
| 18 | PAW Compliance Verification | Verify Tier 0 administrators are only using Privileged Access Workstations (PAW) for domain administration, not standard workstations, to prevent credential theft. | Query Security log for privileged logons (Event ID 4624 Logon Type 10 - RemoteInteractive). Cross-reference logon source machines with approved PAW inventory. Flag any Tier 0 admin logons from non-PAW systems. Review interactive console logons to DCs. Verify PAW systems meet the hardening baseline. | Clean source principle enforced. Tier 0 credential theft risk minimized. Admin workstation compromise doesn't expose domain admin credentials. Compliance with privileged access security model maintained. Non-PAW admin access creates critical attack vector. | Can be partially automated - requires PAW inventory integration and SIEM correlation |
| 19 | Check DHCP Service Status | Verify DHCP Server service is running on all DHCP servers to ensure IP address assignment availability | Check DHCP service status on all DHCP servers using Get-Service -ComputerName DHCP01 -Name DHCPServer. Review DHCP event logs for service failures or critical errors. Verify service is set to automatic startup. Test DHCP response from client perspective using ipconfig /renew. | Service outage prevents all DHCP clients from obtaining IP addresses. Critical impact within lease expiration time (T2 = 87.5% of lease duration). Network-wide connectivity failures possible. | Can be automated with PowerShell script scheduled via Task Scheduler, sends alert email if service stopped |
| 20 | Monitor DHCP Scope Utilization | Check for DHCP scopes exceeding 80% utilization (warning) or 90% (critical) to prevent exhaustion | Query scope statistics using Get-DhcpServerv4ScopeStatistics -ComputerName DHCP01 \| Where {($_.PercentageInUse) -gt 80}. Review scopes approaching exhaustion. Identify scopes needing immediate expansion. Track utilization trends for capacity planning. | Scope exhaustion prevents new devices from obtaining IP addresses. Causes widespread connectivity failures and help desk escalations. Business impact from inability to connect new devices or renew leases. | Can be automated with PowerShell - alert at 80%/90% thresholds |
| 21 | Verify DHCP Failover Partnership Status | Check DHCP failover state is "Normal" (not CommunicationInterrupted or PartnerDown) to ensure high availability | Query failover status using Get-DhcpServerv4Failover -ComputerName DHCP01. Verify state shows "Normal". Check partner server reachability. Review failover replication lag. Investigate any CommunicationInterrupted or PartnerDown states immediately. | Failover degradation creates single point of failure. Loss of high availability protection. If remaining server fails, complete DHCP outage occurs affecting all clients. | Can be automated with PowerShell - alert if state != Normal |
| 22 | Review DHCP Event Log Errors | Check DHCP event logs for critical errors including scope exhaustion, rogue DHCP detection, and replication failures | Query DHCP event log for errors: Event ID 1020 (scope exhaustion), 1034 (rogue DHCP detected), 1042 (authorization failed), replication errors. Use Get-WinEvent -LogName 'Microsoft-Windows-DHCP-Server/Operational' -FilterXPath "*[System[Level=2]]" to return error-level events (Level=2 is Error; Level=1 is Critical). | Undetected errors accumulate into major issues. Rogue DHCP servers cause network-wide failures by providing incorrect configuration. Replication failures lead to failover inconsistencies. | Can be automated - integrate with SIEM for correlation rules |
| 23 | Verify DHCP Audit Log Writing | Confirm DHCP audit logs are being written and forwarded to SIEM for compliance and troubleshooting | Check audit log configuration using Get-DhcpServerAuditLog -ComputerName DHCP01. Verify log path has recent files (within 24 hours). Confirm log forwarding to SIEM is functioning. Check disk space on log volume. | Loss of audit trail creates compliance violation. Inability to troubleshoot DHCP incidents or investigate security events. Regulatory audit failures (SOX, PCI-DSS, HIPAA require logging). | Can be automated - alert if logs older than 24 hours |
| 24 | Monitor IPv6 Router Advertisement Activity | Detect rogue Router Advertisements (RAs) that can redirect IPv6 traffic or misconfigure hosts through fake default gateway announcements | Monitor network for unauthorized RA messages using Get-NetNeighbor -AddressFamily IPv6 and check default route changes with Get-NetRoute -AddressFamily IPv6 -DestinationPrefix "::/0". Compare detected routers against authorized router list. Alert on any new or unexpected routers. | Rogue RAs can hijack default gateway, redirect traffic to attacker-controlled systems, or cause denial of service by misconfiguring client IPv6 stacks. Immediate detection prevents network-wide compromise. | Can be automated with PowerShell script - alert on unauthorized router detection |
| 25 | Check IPv6 Duplicate Address Detection (DAD) Failures | Identify IPv6 address conflicts that prevent servers from acquiring working IPv6 connectivity | Query Event Viewer for TCPIP Event ID 4199 (DAD failure). Check for addresses in tentative state using Get-NetIPAddress -AddressFamily IPv6 -AddressState Tentative (use -AddressState Duplicate to list addresses that already failed DAD). Investigate duplicate address sources and resolve conflicts. | DAD failures prevent IPv6 address assignment, causing connectivity failures. Critical servers may lose dual-stack functionality, falling back to IPv4-only operation. | Can be automated - alert on Event ID 4199 or tentative addresses detected |
| 26 | Verify IPv6 Connectivity to Critical Services | Ensure key infrastructure services (Domain Controllers, DNS servers, file servers) are reachable over IPv6 | Test IPv6 connectivity using Test-NetConnection against critical servers. Verify DNS AAAA record resolution with Resolve-DnsName -Type AAAA. Check ICMPv6 reachability to domain controllers and infrastructure servers. Validate IPv6 default routes are present. | Loss of IPv6 connectivity fragments dual-stack environment. Applications preferring IPv6 experience connection failures. Intermittent failures difficult to troubleshoot without daily monitoring. | Can be automated with PowerShell - test connectivity matrix daily, alert on failures |
| 27 | Replication Failures | Check for replication failures across the forest. | Get-ADReplicationFailure -Target (Get-ADForest).Name -Scope Forest | Expected: zero failures. | Remediation: investigate immediately; check network connectivity, DNS, and time sync. |
| 28 | Replication Lag | Check replication lag for all partners. | Get-ADReplicationPartnerMetadata (check LastReplicationSuccess) | Expected: <15 minutes intrasite; <interval+15 minutes intersite. | Remediation: check for backlogs; verify site link schedules are not blocking replication. |
| 29 | Bridgehead Health | Check health of all bridgehead domain controllers. | Get-ADDomainController -Filter {IsGlobalCatalog -eq $true} plus dcdiag | Expected: all bridgeheads online and replicating. | Remediation: if a preferred bridgehead is down, remove the designation or designate a replacement. |
| 30 | Unmapped Subnets | Check for unmapped subnets in event logs. | Check event logs for "client has no site" warnings (Event ID 5805, 5807) | Expected: zero unmapped subnets. | Remediation: map the subnet to the appropriate site. |
| 31 | Backup FSMO DCs | Verify that scheduled system-state backups completed successfully for all FSMO role holders. | Check backup job status and retention for FSMO DCs; remediate failures and re-run backups as needed. Ensure backups include system state and are stored per retention policy. | High — ensures recoverability of AD and FSMO state; missed backups increase RTO and recovery risk. | Yes (backup solution reports) |
| 32 | Service & reachability | Verify DCs are reachable and AD services (NTDS, Kerberos, DNS) are running. | Run health probe and remediate unreachable hosts. | High — DC unreachability affects authentication and FSMO availability. | Yes (monitoring) |
| 33 | Event log alerting | Surface critical errors from System/Directory Services logs to operations. | Ensure SIEM/alerting rules processed events; investigate new critical alerts. | Medium — early detection of issues that can lead to role impact. | Partial (SIEM) |
| 34 | Time sync check | Confirm PDC Emulator time sync to authoritative NTP and that other DCs are synchronized. | Run w32tm /query /status checks; remediate drift >5s for domain operations. | High — time skew breaks Kerberos and replication. | Yes (scripted) |
| 35 | Check SYSVOL DFS-R backlog | Monitor SYSVOL replication backlog to ensure Group Policy and login scripts replicate promptly across all domain controllers. | Run Get-DfsrBacklog -GroupName 'Domain System Volume' on all DCs. Check backlog count between replication partners. | Normal: 0-10 files. Warning: >50 files (investigate change patterns). Critical: >100 files (SYSVOL replication not keeping pace—requires immediate attention). | Can be automated with PowerShell - alert at 50/100 file thresholds |
| 36 | Verify DFSR service running on all DFS members | Ensure DFS Replication service is running on all file servers and domain controllers participating in DFS-R groups. | Query service status: Get-Service DFSR -ComputerName (Get-DfsrMember).ComputerName. Check for stopped or crashed services. | All services should show Status=Running. If stopped, start service and check Event Log (Event IDs 1202, 2212) for crash cause or database corruption. | Can be automated with PowerShell - alert on stopped service |
| 37 | Check DFS-R staging quota utilization | Monitor staging folder usage to prevent replication stalls caused by staging exhaustion (most common DFS-R failure). | Run Test-DfsReplicationHealth.ps1 to check staging quota percentage on all members. Review staging folder size vs quota. | Normal: <60% utilization. Warning: 60-80% (consider quota increase). Critical: >80% (immediate quota increase required; replication will stall at 100%). | Can be automated with health check script - alert at 60%/80% thresholds |
| 38 | Review DFS Replication Event Log | Check for DFS-R errors that indicate replication failures, database corruption, or staging issues. | Filter DFS Replication Event Log for Level=Error in last 24 hours. Prioritize Event IDs: 4012 (staging quota exhausted), 2212 (database corruption), 6002 (replication stopped). | Expected: 0 errors. Any errors require immediate investigation. Database corruption (2212) requires scheduled rebuild. Staging exhaustion (4012) requires quota increase. | Can be automated - integrate with SIEM for real-time alerting on critical Event IDs |
| 39 | Cluster Node Status Check | Verify all cluster nodes are online and responding | Get-ClusterNode \| Select Name, State, StatusInformation; all nodes State=Up. | Early detection of node failures; prevents quorum loss and VM outages. | Yes (scheduled script + alert) |
| 40 | CSV Space Monitoring | Monitor Cluster Shared Volume free space thresholds | Get-ClusterSharedVolume; alert if free <10% or growth spike >5%/day. | Prevents disk-full VM crashes and corruption; enables capacity planning. | Yes (monitoring integration) |
| 41 | Hyper-V Service Health | Ensure VMMS service running on every node | Get-Service vmms -ComputerName (Get-ClusterNode).Name \| Where Status -ne 'Running' | Prevents lifecycle operation failures (start/stop/migrate). | Yes (auto-recovery + alert) |
| 42 | Hyper-V / Cluster Critical Events | Scan last 24h for VM crash, resource fail, node removal | Filter logs for 18590, 1069, 1135; triage & remediate. | Early hardware/net/storage failure visibility; avoids cascading impacts. | Partial (collection automated, review manual) |
| 43 | Replication / SYSVOL Health | Verify AD & SYSVOL/DFSR replication; check key event IDs (4012, 4016, 1058, 1030) for emerging issues. | Run health script; review DC logs; confirm SYSVOL consistency. | Early detection prevents baseline drift and inconsistent policy enforcement. | Partial (Script + Log Review) |
| 44 | AGPM Pending Approvals | Identify changes awaiting approval to maintain controlled lifecycle. | List Pending in AGPM; escalate items >24h. | Prevents unreviewed changes lingering and risking silent drift. | Manual (AGPM UI) |
| 45 | Baseline Drift Quick Scan | Detect unsanctioned edits outside AGPM workflow. | Run AGPM Difference report vs production. | Minimizes silent security regression in Tier baselines. | Partial (AGPM Diff) |
| 46 | GPO Processing Errors | Review client/system GPO application errors (1058/1030) & replication related issues. | Parse SIEM / sample gpresult logs. | Ensures enforcement integrity and reduces configuration gap exposure. | Partial (SIEM Query) |
| 47 | Privileged Filtering Anomalies | Check for newly added privileged groups to baseline filtering. | Enumerate filters; diff against whitelist manifest. | Prevents privilege expansion & lateral movement. | Yes (Script) |
| 48 | Backup Job Success | Confirm last scheduled backup completed properly. | Review backup log & artifact count. | Assures recoverability & rollback readiness. | Yes (Script) |
| 49 | Monitor Unauthorized Tier 0 Logon Attempts (EAM) | Track failed authentication attempts to Tier 0 assets (DCs, ADFS, PKI, PAWs) to detect brute-force attacks or silo policy violations | Query Security Event Log for Event IDs 4625 (failed logon), 4769 (Kerberos ticket denied), 4776 (NTLM auth failure) on all DCs. Filter by Tier 0 account names (EAM_Tier0_*). Alert on ≥10 failures from single source within 1 hour. | <5 failures per DC per day baseline (excluding known service account retries). Threshold breach indicates potential credential stuffing, misconfigured silo, or compromised account attempting lateral movement. Immediate investigation prevents Tier 0 breach. | SIEM correlation rule (Sentinel/Splunk); PowerShell script via scheduled task querying Event Logs and sending Teams/email alerts |
| 50 | Review Tier 0 Authentication Policy Silo Violations (EAM) | Validate that all Tier 0 interactive logons originate from PAW devices only, enforced via Authentication Policy Silos | Filter Event ID 4624 (successful logon) Type 10 (RemoteInteractive) for EAM_Tier0_* accounts. Cross-reference source workstation name against approved PAW device list. Check Event ID 4768 (Kerberos TGT) for silo claim presence. Any mismatch = policy violation. | Zero violations expected in production. Single violation indicates either silo misconfiguration or admin bypassing PAW requirement (potential insider threat or compromised credential). Immediate lockout + root cause analysis required. | SIEM detection rule; Azure Monitor + Log Analytics query for Entra ID sign-ins; custom PowerShell script cross-referencing CMDB PAW inventory |
| 51 | Check PIM Activations for Global Admin / Privileged Role Admin (EAM) | Audit all Entra ID Privileged Identity Management role activations to ensure no standing privilege and proper approval workflows | Review Entra ID Audit Logs for PIM role activation events. Verify: (1) All activations include justification text, (2) Approvals completed where required, (3) No activation duration exceeds 8 hours, (4) No more than 2 activations per user per day (indicates potential standing privilege workaround). | 100% compliance with approval + justification requirements. Activations >8 hours or >2/day per user indicate process abuse (admin leaving role active vs. using JIT). Reduces blast radius of compromised admin session and creates forensic audit trail. | Entra ID PIM built-in reports; Power BI dashboard; Microsoft Graph API query via Logic App to send daily summary to SOC |
| 52 | Validate Conditional Access Policy Hits for Admin Portals (EAM) | Ensure all admin sign-ins to Azure Portal, Microsoft 365 Admin Center, and Entra ID comply with device compliance + MFA policies | Query Entra ID Sign-in Logs filtered by admin roles (Global Admin, Privileged Role Admin, etc.). Check Conditional Access policy results: 100% should show "Success" with device compliance pass. Flag any "Block" events (non-compliant device) or "RequireMFA" step-ups (>2% indicates weak primary authentication). | 100% admin sign-ins from compliant PAWs; <2% require manual MFA step-up (baseline = passwordless auth + device compliance). Any sign-in from non-compliant device = immediate block + alert to SOC. Prevents token theft and session hijacking from unmanaged devices. | Entra ID Conditional Access Insights and Reporting; Azure Monitor alerts; custom KQL query in Log Analytics with automated Teams notification |
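The alert rule in task 49 (10 or more failures from a single source within 1 hour) can be sketched as a sliding-window count. This is a minimal illustration, not the SIEM rule itself: the function name `Find-FailedLogonBurst` and the input property names `TimeCreated` and `Source` are assumptions, and the mapping from raw 4625/4769/4776 events to those properties is left out.

```powershell
# Sketch only: flags sources whose failures reach the threshold inside any
# sliding window. Input objects are assumed to expose TimeCreated ([datetime])
# and Source (string); both names are illustrative.
function Find-FailedLogonBurst {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $FailedLogon,
        [int] $Threshold = 10,
        [timespan] $Window = [timespan]::FromHours(1)
    )

    foreach ($group in ($FailedLogon | Group-Object -Property Source)) {
        # Sort this source's failures chronologically.
        $times = @($group.Group | Sort-Object TimeCreated |
                ForEach-Object { $_.TimeCreated })
        $start = 0
        $peak  = 0
        for ($end = 0; $end -lt $times.Count; $end++) {
            # Advance the left edge until the window spans at most $Window.
            while (($times[$end] - $times[$start]) -gt $Window) { $start++ }
            $peak = [math]::Max($peak, $end - $start + 1)
        }
        if ($peak -ge $Threshold) {
            [pscustomobject]@{ Source = $group.Name; PeakFailures = $peak }
        }
    }
}
```

In practice the input would come from the Security log (for example via Get-WinEvent filtered on the event IDs listed in the task) after each event has been projected to a timestamp and a source address or workstation name.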
Weekly Tasks
| # | Name | Description | Task | Impact Definition | Automated |
|---|---|---|---|---|---|
| 1 | Find and Disable OLD computer accounts | Find computer objects that have not changed their password in more than 30 days. Found objects must be moved to a secure container (OU) and disabled. | Query computers with Get-ADComputer -Filter {PasswordLastSet -lt $30DaysAgo -and Enabled -eq $true}. Move identified computers to quarantine OU (e.g., OU=Disabled,OU=Computers). Set Enabled=$false. Log actions for audit trail. | Inactive systems removed from production OUs. Attack surface reduced by disabling abandoned endpoints. Computer objects preserved for potential recovery. Group policies no longer apply to disabled systems. | Managed by HR and the IAM Team using identity lifecycle management system |
| 2 | Find and Disable OLD User accounts | Find user objects whose password has not been changed in more than 90 days. Found objects must be moved to a secure container (OU) and disabled. | Query users with Get-ADUser -Filter {PasswordLastSet -lt $90DaysAgo -and Enabled -eq $true}. Exclude service accounts and approved exceptions. Move to quarantine OU. Disable accounts. Generate report for review. | Dormant user accounts secured, preventing credential compromise. Abandoned accounts cannot be used for unauthorized access. User objects retained for recovery during grace period. | Automated by IT Security |
| 3 | Delete Stale computer accounts | Delete computers that are in the secure container and whose password has not been changed in more than 60 days. | Query computers in quarantine OU with PasswordLastSet older than 60 days. Verify no recent activity. Generate deletion report. Execute Remove-ADComputer after approval. Backup objects before deletion. | Directory cleaned of truly abandoned systems. Storage and licensing costs reduced. Replication traffic minimized. Permanent deletion - recovery only possible from backup. | Automated by IT Security |
| 4 | Delete Stale user accounts | Delete user accounts that are in the secure container and whose password has not been changed in more than 150 days (60 days disabled + 90 days grace period). | Query users in quarantine OU disabled for more than 150 days total. Coordinate with HR to confirm terminations. Generate final report. Execute Remove-ADUser. Archive user data per retention policy. | Former employee accounts permanently removed. License reclamation completed. Directory hygiene maintained. Permanent deletion - restoration requires backup recovery. | Managed by HR and the IAM Team using identity lifecycle management system |
| 5 | Check Constrained and unconstrained delegation on privileged accounts | Verify delegations made to each account. Privileged tier accounts should always have the "Sensitive Account and Cannot be Delegated" flag, except for authorized exceptions. | Query all privileged accounts: Get-ADUser -Filter {Name -like "T0_*" -or Name -like "T1_*" -or Name -like "T2_*"} -Properties TrustedForDelegation,AccountNotDelegated. Flag accounts with TrustedForDelegation=$true or AccountNotDelegated=$false. Review exceptions, set AccountNotDelegated=$true where appropriate. | Kerberos delegation attacks prevented. Privileged accounts protected from credential relay. "Sensitive and cannot be delegated" flag enforced. No operational impact except on specifically authorized delegation scenarios. | Can be automated with EguibarIT Delegation module - Test-AdminAccountDelegation |
| 6 | DC decommissioning from physical site | Properly remove domain controllers that are being retired to prevent replication errors and orphaned metadata | Graceful demotion: Run Uninstall-ADDSDomainController on the DC to remove AD roles (Windows Server 2012 and later; dcpromo /unattend applies only to Windows Server 2008 R2 and earlier). Forced removal if DC offline: Use ntdsutil metadata cleanup to remove DC from AD Sites and Services. Clean up DNS records. Remove SYSVOL references. Verify FSMO roles transferred if DC held any. | Replication topology updated. Orphaned metadata prevented. DNS cleaned. SYSVOL replication adjusted. If improperly done, can cause replication failures and lingering objects. | Requires careful planning and RFC approval due to infrastructure impact |
| 7 | Group Policy Version Mismatch Detection | Identify GPOs with version mismatch between Active Directory (GPC) and SYSVOL (GPT) indicating replication issues or corruption. | Query all GPOs: Get-GPO -All \| Select DisplayName, GpoStatus, @{n='ADVersion';e={$_.Computer.DSVersion}}, @{n='SysvolVersion';e={$_.Computer.SysvolVersion}}. Flag where ADVersion ≠ SysvolVersion. Check SYSVOL replication status: Get-DfsrState. Investigate root cause: replication failures, permissions, antivirus interference. Force replication if needed: repadmin /syncall. | GPO application consistency ensured. Policy settings applied uniformly across domain. Replication issues identified before user impact. Version mismatch causes inconsistent policy enforcement between DCs. | Can be automated with PowerShell - Test-GPOVersionConsistency with alerting |
| 8 | DNS Zone Transfer Security Review | Review DNS zone transfer settings to ensure AD-integrated zones only allow transfers to authorized name servers, preventing zone data exposure. | For each AD-integrated zone: Get-DnsServerZone \| Where-Object {$_.IsAutoCreated -eq $false} \| Select ZoneName, SecureSecondaries. Verify SecureSecondaries set to "NoTransfer" or specific name servers list. Check for zones allowing "TransferAnyServer". Review zone transfer ACLs. Validate only DCs in name servers list. | DNS zone data protected from unauthorized disclosure. AD-integrated zone enumeration prevented. Unauthorized zone transfers blocked. Reconnaissance data denied to attackers. Open zone transfers expose entire AD DNS namespace. | Can be automated with PowerShell - Test-DNSZoneTransferSecurity |
| 9 | DNS Zone Integrity Check | Verify critical host records are present and accurate across all DNS zones | Verify critical host records (DCs, file servers, application servers) are present. Check for duplicate or conflicting records. Validate reverse lookup zones match forward zones (A vs PTR records). Use Get-DnsServerResourceRecord and Compare-Object for validation. | DNS zone data integrity maintained. Duplicate or conflicting records identified and resolved. Forward and reverse lookup consistency verified preventing resolution issues. Name resolution accuracy ensured. | Can be automated with PowerShell script comparing DNS records against AD computer accounts |
| 10 | DNS Scavenging Review | Verify DNS scavenging is running correctly to remove stale records | Verify scavenging is running on schedule (check Event ID 2501 = scavenging completed). Review number of records scavenged (high numbers may indicate configuration issue). Ensure scavenging enabled on only ONE DNS server per zone. Validate aging/scavenging settings consistency. | Stale DNS records cleaned preventing name resolution issues. DNS database size controlled. Scavenging configuration validated preventing accidental record deletion if misconfigured on multiple servers. Zone hygiene maintained. | Can be automated by querying Event Log for scavenging events |
| 11 | DNS Forwarder Performance Check | Test external DNS resolution and forwarder responsiveness | Test external name resolution (nslookup microsoft.com). Verify forwarders are responding (not timing out). Check for forwarder failures in DNS debug logs. Test conditional forwarders if configured. Measure query response times. | External name resolution validated. Forwarder performance issues identified before impacting users. Alternative forwarders can be configured if primary forwarders failing. Internet connectivity confirmed via DNS. | Can be automated with Test-DnsServer PowerShell cmdlet against forwarders |
| 12 | Review DHCP Scope Utilization Trends | Analyze DHCP scope growth trends to predict exhaustion before it occurs and enable proactive capacity planning | Query scope statistics using Get-DhcpServerv4ScopeStatistics \| Select ScopeId, PercentageInUse, AddressesInUse, AddressesFree \| Export-Csv. Compare against previous week's baseline. Calculate growth rate. Identify scopes requiring expansion within 30-60 days. | Reactive scope expansions during emergencies avoided. Proactive capacity planning enables scheduled maintenance windows. Lack of trend analysis leads to frequent outages and firefighting. | ○ Partial - report generation automated, human review required for trend analysis |
| 13 | Verify DHCP Backup Success | Confirm DHCP configuration backups completed successfully and are restorable for disaster recovery | Verify backup using Backup-DhcpServer -Path "\\FileServer\Backup\DHCP\$(Get-Date -Format 'yyyyMMdd')". Check backup file integrity and size. Verify backup includes all scopes, reservations, and options. Test restore in lab environment quarterly. | Cannot restore DHCP configuration after disaster. Extended RTO during DHCP server failures or database corruption. Manual recreation of hundreds of scopes and reservations required. | Can be automated - scheduled task runs daily backup, verify weekly |
| 14 | Review DHCP Lease Database Statistics | Check for unusual lease patterns, abandoned leases, and potential scope conflicts | Query active leases using Get-DhcpServerv4Lease -ComputerName DHCP01 -ScopeId 10.10.100.0. Identify leases with unusual characteristics (very old leases, duplicate hostnames, unknown devices). Review bad address detection results. Check for IP conflicts. | IP address conflicts detected and resolved. Wasted address space identified. Potential security incidents (unknown devices) discovered. Lease database health maintained. | ○ Partial - report generation automated, analysis requires human review |
| 15 | Audit Active DHCP Reservations | Verify DHCP reservations match asset database and remove stale entries for decommissioned devices | Query all reservations across scopes using Get-DhcpServerv4Scope -ComputerName DHCP01 \| Get-DhcpServerv4Reservation -ComputerName DHCP01 \| Where {-not (Test-Connection $_.IPAddress -Quiet)}. Cross-reference with CMDB/asset database. Identify reservations for decommissioned systems. Document and remove obsolete entries after approval. | Reservation sprawl prevented. IP address waste eliminated. Documentation kept current. Troubleshooting simplified by accurate reservation inventory. | ○ Partial - identify stale reservations automatically, human approval required for removal |
| 16 | Check Disk Space on DHCP Log Volumes | Verify adequate disk space for DHCP database and audit logs (10GB+ free recommended) to prevent service failures | Check disk space using Get-PSDrive -PSProvider FileSystem \| Select Name, @{N='FreeGB';E={[math]::Round($_.Free/1GB,2)}}. Review DHCP log directory size. Verify log rotation and archival policies. Alert if free space below 10GB threshold. | DHCP stops logging when disk space below threshold. Service may fail if database cannot grow. Emergency disk cleanup required causing service interruption. | Can be automated - storage monitoring platform alerts on disk space thresholds |
| 17 | Review IPv6 Address Assignment Compliance | Verify servers have correct static IPv6 addresses per address plan; workstations have proper DHCPv6/SLAAC configuration | Enumerate IPv6 addresses using Get-NetIPAddress -AddressFamily IPv6 across infrastructure. Cross-reference with IPAM and address allocation plan. Identify unauthorized addresses (not in documented ranges). Check for privacy extension addresses on servers (should be disabled). Verify ULA vs GUA usage matches policy. | Address sprawl prevented. Security zone violations detected (wrong address ranges crossing tiers). Privacy extensions creating unstable server addresses identified and corrected. | ○ Partial - automated collection, human review of exceptions required |
| 18 | Audit DHCPv6 Server Statistics | Review DHCPv6 scope utilization, lease statistics, and identify scopes approaching exhaustion | Query DHCPv6 statistics using Get-DhcpServerv6ScopeStatistics. Check scope utilization percentages. Review active leases vs scope size. Identify trends indicating need for address pool expansion. Compare against growth projections. | Scope exhaustion prevented through proactive capacity planning. Address space waste identified (oversized scopes). DHCPv6 adoption metrics tracked for transition planning. | Can be automated - generate weekly DHCPv6 utilization report |
| 19 | Replication Topology Review | Review replication topology and site links. | Get-ADReplicationSiteLink and Get-ADReplicationSite | Verify site links and costs still match network topology | |
| 20 | Site Coverage Validation | Validate AD sites match physical locations. | Compare AD sites to physical locations; check for missing sites | Ensure new locations are properly configured | |
| 21 | Replication Performance | Measure replication time for test change across all sites. | Measure replication time for test change across all sites | Establish baseline; detect degradation trends | |
| 22 | WAN Utilization | Check WAN utilization for replication traffic. | Check network monitoring for replication traffic patterns | Verify replication not saturating WAN links during business hours | |
| 23 | FSMO Event log review | Perform a focused review of Directory Services and System logs for warnings/errors not auto-resolved. | Triaged by on-call team; escalate persistent issues. | Medium — prevents escalation of latent issues. | Partial (alerts) |
| 24 | Verify FSMO holders | Confirm current FSMO assignments and reconcile with documentation. | Run Get-ADForest/Get-ADDomain and update inventory. | Medium — keeps documentation accurate for fast response. | Yes (PowerShell) |
| 25 | FSMO Backup verification | Validate recent backups by checking job logs and performing periodic integrity checks. | Verify job success and retention policies; schedule restore test if needed. | High — undetected backup failures increase recovery time and risk. | Yes (backup reports) |
| 26 | FSMO Patch status check | Review recent patch deployments and identify pending reboots affecting DCs. | Coordinate maintenance windows for pending reboots. | Medium — delayed reboots can leave systems in inconsistent states. | Partial (patch management) |
| 27 | Full DFS-R replication backlog report | Generate comprehensive backlog report for all replication groups to identify slow or stuck connections. | Run Get-DfsrBacklogReport.ps1 for all replication groups. Review per-connection backlog counts. Identify connections with persistent backlogs. | Expected: All connections <50 files backlog. If any connection consistently >100 files, investigate network latency, replication schedule restrictions, or bandwidth throttling. | Can be automated with PowerShell - export to CSV for trending analysis |
| 28 | DFS Conflict and Deleted file count review | Monitor ConflictAndDeleted folder to detect multi-master conflict patterns and prevent disk space exhaustion. | Check ConflictAndDeleted folder size on each replication member. Count files and review conflict patterns. Identify folders with frequent conflicts. | Normal: <100 files per member. If >500 files, investigate multi-user file editing patterns. Consider file locking solutions or user training to reduce conflicts. | Can be automated with PowerShell - alert at 100/500 file thresholds |
| 29 | DFS-R database health check | Verify DFS-R database integrity to prevent replication failures caused by database corruption. | Run Get-DfsrState on all replication members. Check for corruption warnings or errors. Review Event Log for Event ID 2212 (database corruption). | Expected: No corruption warnings. If corruption detected, schedule database rebuild during maintenance window (requires full resync—avoid business hours). | Can be automated with PowerShell - alert on corruption detection |
| 30 | DFS Namespace target availability test | Verify all namespace targets are accessible from multiple sites to ensure automatic failover is functioning. | Test DFS path access from multiple sites using Test-Path \\domain\namespace\folder from different locations. Verify all configured targets respond. | All targets should be accessible. If any target fails, check share permissions, network connectivity between sites, namespace server health, and target server availability. | Can be automated with PowerShell - test from each site, alert on accessibility failures |
| 31 | Hyper-V Replica Health | Verify replica VMs healthy and current | Get-VMReplication \| Where {$_.Health -ne 'Normal' -or $_.State -ne 'Replicating'} | Ensures DR readiness; prevents RPO/RTO breaches. | Yes (health script) |
| 32 | Checkpoint Cleanup | Remove stale checkpoints >7 days old | Get-VM \| Get-VMCheckpoint \| Where CreationTime -lt (Get-Date).AddDays(-7) \| Remove-VMCheckpoint | Prevents long chain performance penalties & storage bloat. | Partial (report auto, deletion approved) |
| 33 | Live Migration Spot Test | Validate LM throughput & latency | Measure-Command { Move-VM -Name TestVM -DestinationHost Node2 }; baseline compare. | Detects silent RDMA fallback or network regression early. | Partial |
| 34 | Backup Restore Sample | Test restore of 1–2 VMs from backup | Isolated network restore, boot & service verification. | Confirms recoverability; uncovers backup drift. | Partial |
| 35 | Run EAM Drift Detection Audit (GitOps Baseline) | Compare current privileged group membership against GitOps source of truth to detect unauthorized manual changes (drift) | Execute EAM-DriftAudit.ps1 script against all EAM_* groups. Script compares current AD group membership (Get-ADGroupMember) with baseline JSON exported from GitOps repo. Query Event IDs 4728/4729 (global group changes) and 4732/4733 (local group changes) for audit trail of modifications in last 7 days. | Zero drift expected (all changes via approved GitOps PR). Any drift not matching PR approval in last 7 days = unauthorized change → automatic rollback + root cause investigation. Prevents persistence via hidden admin account additions and ensures deterministic privilege state. | Scheduled Task / GitHub Actions workflow running EAM-DriftAudit.ps1 nightly; output sent to SIEM for correlation; automatic PR creation for rollback if drift detected |
| 36 | Review Privileged Account Usage Patterns (EAM) | Analyze Tier 0 account logon behavior to detect anomalous patterns (off-hours usage, cross-tier violations, unexpected source IPs) | Query Event ID 4624 (successful logon) filtered by EAM_Tier0_*, EAM_Tier1_*, EAM_Tier2_* accounts. Validate: (1) All Tier 0 logons from PAW only, (2) No Tier 0 account interactive on Tier 1/2 assets, (3) Check Event ID 4672 (special privileges assigned) for unexpected SeTakeOwnershipPrivilege or SeDebugPrivilege usage outside maintenance windows. | All Tier 0 logons originate from PAW device group; zero cross-tier interactive sessions; privileged rights usage correlates with approved change requests. Single cross-tier logon = P1 incident (assume credential compromise requiring immediate password reset + session termination). | SIEM correlation rule with baseline behavioral analytics; Azure Sentinel UEBA (User Entity Behavior Analytics) for Entra ID accounts; custom PowerShell report emailed to security team |
| 37 | Audit Authentication Policy Silo Membership vs Intended Design (EAM) | Verify that silo membership (users, computers, service accounts) matches documented plane-map.yaml classification | Run Get-ADAuthenticationPolicySilo for each defined silo (Tier0_ControlPlane, Tier1_Management, etc.). Export membership lists and compare against plane-map.yaml in GitOps repo. Check Event ID 4662 (AD object access) and 5136 (directory object modified) for unauthorized silo membership changes in last 7 days. | 100% alignment between silo membership and plane-map.yaml; no orphaned accounts in multiple silos (violates containment). Unplanned silo changes indicate potential attacker attempting to expand TGT scope for lateral movement → revert + investigate. | PowerShell script comparing Get-ADAuthenticationPolicySilo output to YAML file; differences generate ServiceNow incident; Azure DevOps pipeline validates silo configuration on every commit |
| 38 | Check PAW Compliance Status (Intune/SCCM) (EAM) | Ensure all Privileged Access Workstations meet security baseline (OS patches, WDAC enabled, BitLocker, no unapproved software) | Query Intune device compliance reports or SCCM compliance dashboard for all devices tagged "PAW". Verify: (1) OS version within vendor support window, (2) WDAC policy active (AppLocker/WDAC event logs), (3) BitLocker enabled + recovery keys escrowed, (4) No compliance policy violations in last 7 days. Cross-reference with Conditional Access device state logs. | ≥98% PAWs compliant. Non-compliance <95% = block non-compliant PAWs from Control Plane via CA policy update. Single non-compliant PAW used for Tier 0 access = immediate lockout + forensic review (potential backdoor/malware installation). | Intune/SCCM built-in compliance reporting; Azure Monitor alerts for compliance drops below 98%; Power Automate flow to disable CA exemptions for non-compliant devices |
| 39 | Review gMSA Password Retrieval Events (EAM) | Monitor which service hosts are retrieving Group Managed Service Account passwords to detect unauthorized lateral movement attempts | Query Event ID 4662 (object access) filtered by msDS-ManagedPassword attribute reads. For each gMSA (especially Tier 0 gMSAs like SQL Server service accounts on DCs), list all computers that retrieved the password in last 7 days. Compare against PrincipalsAllowedToRetrieveManagedPassword AD attribute. Check Event ID 5136 for modifications to this attribute. | Only authorized service hosts (listed in PrincipalsAllowedToRetrieveManagedPassword) retrieve gMSA passwords; <5 unique requestors per gMSA per week. Unexpected host retrieving Tier 0 gMSA password = kill affected service immediately + rotate gMSA + investigate source host for compromise. | SIEM query aggregating 4662 events by gMSA object; custom PowerShell script comparing requestors to AD attribute; automated alert to SOC + ServiceNow P1 incident creation |
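The version comparison in task 7 (Group Policy Version Mismatch Detection) can be sketched as a small filter over Get-GPO output. This is an illustration under stated assumptions rather than the Test-GPOVersionConsistency function named in the table: the function name `Get-GpoVersionMismatch` is hypothetical, and unlike the one-liner in the task it also compares the User side of each GPO.

```powershell
# Sketch: report GPOs whose AD (GPC) and SYSVOL (GPT) version numbers differ.
# Expects objects shaped like Get-GPO output: DisplayName plus Computer and
# User members, each exposing DSVersion and SysvolVersion.
function Get-GpoVersionMismatch {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [object] $Gpo
    )
    process {
        foreach ($side in 'Computer', 'User') {
            $ad     = $Gpo.$side.DSVersion
            $sysvol = $Gpo.$side.SysvolVersion
            if ($ad -ne $sysvol) {
                # Emit one record per mismatching side.
                [pscustomobject]@{
                    DisplayName   = $Gpo.DisplayName
                    Side          = $side
                    ADVersion     = $ad
                    SysvolVersion = $sysvol
                }
            }
        }
    }
}
```

A typical call would be `Get-GPO -All | Get-GpoVersionMismatch`; an empty result means every GPO reports matching versions on the DC that answered the query, so checking more than one DC is still advisable.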
Bi-Weekly Tasks
| # | Name | Description | Task | Impact Definition | Automated |
|---|---|---|---|---|---|
| 1 | Find passwords in GPOs | Scan all Group Policy Objects for embedded passwords (cpassword in Groups.xml), which are encrypted with a publicly known AES key and therefore create a critical security vulnerability. | Use Get-GPO -All to enumerate GPOs. Search SYSVOL for Groups.xml, Services.xml, ScheduledTasks.xml, DataSources.xml files containing cpassword attribute. PowerShell: Get-ChildItem -Path "\\domain\sysvol" -Recurse -Include Groups.xml,Services.xml,ScheduledTasks.xml,DataSources.xml \| Select-String "cpassword". Remove vulnerable preferences from GPOs. | Critical vulnerability eliminated (MS14-025). Passwords in GPOs can be decrypted by any domain user. Removal prevents credential exposure. May require reconfiguration of systems using affected GPO preferences. | Automated by IT Security |
| 2 | Certificate Expiration Check (DC & LDAPS) | Verify Domain Controller authentication certificates and LDAPS certificates are valid and not approaching expiration to prevent authentication failures. | On each DC, check certificate store: Get-ChildItem Cert:\LocalMachine\My \| Where {$_.Subject -like "*DC*"}. Verify expiration dates. Check LDAPS functionality: ldp.exe or Test-LdapsCertificate. Review CA health if using Enterprise CA. Alert if certificates expire within 60 days. | Certificate expiration prevented avoiding authentication outages. LDAPS connectivity maintained for secure LDAP communications. Early warning allows planned certificate renewal. Certificate expiration causes immediate authentication failures. | Can be automated with PowerShell - Test-DCCertificateExpiration |
| 3 | Trust Health Validation | Verify all trust relationships are functioning correctly and trust passwords are synchronized to prevent cross-domain/forest authentication failures. | Test each trust: netdom trust TrustingDomain /domain:TrustedDomain /verify. List trusts with Get-ADTrust -Filter * and check the secure channel for each from a DC: nltest /sc_verify:TrustedDomain. Verify Kerberos trust tickets working. Review Event Viewer for trust-related errors (Event ID 5805, 5722, 5723). | Trust failures detected early preventing cross-domain access issues. Trust password mismatches identified and resolved. Authentication across trust boundaries maintained. Trust failure causes widespread authentication issues. | Can be automated with PowerShell - Test-ADTrustHealth |
| 4 | FSMO Role Holder Availability Check | Verify all five FSMO role holders are online and responding to ensure critical AD operations can be performed when needed. | Identify FSMO role holders: netdom query fsmo. Test connectivity to each role holder. Verify services running on FSMO DCs. Check Event Viewer for role-related errors. Document role holder locations for DR planning. For PDC Emulator, verify time sync source configured. | FSMO availability confirmed for critical operations (password changes, schema modifications, RID pool allocation). Offline role holders identified before causing operational issues. DR preparedness validated. Unavailable FSMO roles block specific operations. | Can be automated with PowerShell - Test-FSMORoleAvailability |
| 5 | DNS Scavenging Configuration Review | Review DNS scavenging settings to ensure stale DNS records are cleaned up without accidentally removing valid records. | Check aging is enabled on each DNS zone: Get-DnsServerZoneAging -Name ZoneName (AgingEnabled, NoRefreshInterval, RefreshInterval). Review scavenging settings (No-Refresh and Refresh intervals). Verify scavenging schedule on DNS servers with Get-DnsServerScavenging. Check for stale record accumulation. Validate scavenging not too aggressive for DHCP environment. | Stale DNS records cleaned preventing name resolution issues. DNS database size controlled. Scavenging properly configured to avoid removing valid records. Improper scavenging can remove active records causing connectivity failures. | Manual review process - scavenging requires careful tuning per environment |
| 6 | GPO Backup Execution | Run full Tier0/1/2 baseline & exception backup routine. | Execute backup script; verify log; rotate old archives. | Maintains point-in-time recovery integrity. | Yes (Script) |
| 7 | Disabled / Unlinked Report | List dormant GPOs for cleanup or re-link decision. | Generate report; tag candidates >30d unused. | Controls sprawl & audit complexity. | Yes (Script) |
| 8 | RSoP Validation Sample | Confirm baseline enforcement on representative endpoints. | Run gpresult; compare vs baseline manifest. | Assures effective policy delivery. | Partial (Script + Compare) |
| 9 | WMI Filter Performance | Check execution performance & relevance of filters. | Extract definitions; flag obsolete predicates. | Improves logon/boot performance. | Partial (Script) |
| 10 | AGPM Archive Hygiene | Prune superseded versions beyond retention policy. | Retain last N + milestone snapshots. | Reduces clutter while preserving rollback fidelity. | Manual (AGPM UI) |
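The cpassword search in task 1 (Find passwords in GPOs) can be sketched as a function that covers all four preference files named in the task rather than Groups.xml alone. This is a minimal sketch: the function name `Find-GppCpassword` is hypothetical, the caller supplies the SYSVOL path, and only non-empty cpassword values are reported. It returns the file and line number without echoing the encrypted value.

```powershell
# Sketch: list Group Policy Preference XML files containing a non-empty
# cpassword attribute. $Path is supplied by the caller (for example the
# domain's SYSVOL Policies folder); no path is assumed here.
function Find-GppCpassword {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $Path
    )
    # File names taken from the task description; -contains is case-insensitive.
    $names = 'Groups.xml', 'Services.xml', 'ScheduledTasks.xml', 'DataSources.xml'

    Get-ChildItem -Path $Path -Recurse -File -ErrorAction SilentlyContinue |
        Where-Object { $names -contains $_.Name } |
        Select-String -Pattern 'cpassword="[^"]+"' |
        ForEach-Object {
            [pscustomobject]@{ File = $_.Path; Line = $_.LineNumber }
        }
}
```

Each hit still needs manual follow-up: the preference item has to be removed from the GPO and the exposed credential rotated, since removing the file alone does not invalidate a password that was already readable.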
Monthly Tasks
| # | Name | Description | Task | Impact Definition | Automated |
|---|---|---|---|---|---|
| 1 | Clear AdminCount for Non-Privileged accounts. | When a user is added to a privileged group (such as Domain Admins), the account becomes protected by a process called AdminSDHolder. This process controls the security of the user object and its inheritance. When the user is removed from the privileged group, the process continues to "protect" the object unless the protection is manually removed. To minimize the risk of a security breach, these accounts must be normalized by clearing the AdminCount attribute and enabling inheritance. Additional reference can be found at https://technet.microsoft.com/en-us/magazine/2009.09.sdadminholder.aspx | Get the list of users whose AdminCount attribute is greater than 0: Get-ADUser -Filter { AdminCount -gt 0 } -Properties * \| Select-Object Name, SamAccountName, DistinguishedName, Enabled. Check whether each user currently belongs to a privileged group (Domain Admins, Enterprise Admins, Administrators, etc.). If the user does not belong to a privileged group, set the AdminCount attribute to 0 and reset the ACL to permit permission inheritance (PowerShell script). | | |
| 2 | Missing Subnets in AD | Verify the Netlogon.log on each DC, looking for probable subnets not defined on AD. Create subnets as required to optimize authentication and site-aware services. | Review %SystemRoot%\Debug\Netlogon.log on all DCs for "NO_CLIENT_SITE" warnings. Extract IP addresses of clients without site mapping. Create missing subnet objects in AD Sites and Services. Associate with appropriate sites based on network topology. | Optimal DC selection for authentication. Reduced authentication latency. DFS and other site-aware services function correctly. Group Policy application performance improved. No operational disruption. | Can be automated with PowerShell script to parse Netlogon.log and generate subnet recommendations |
| 3 | Remove Duplicates | Find and delete any duplicated object within the directory caused by replication conflicts or import errors | Search for objects with CNF (conflict) naming: Get-ADObject -Filter {Name -like "*CNF:*"}. Identify legitimate object vs. duplicate. Compare attributes, creation dates, and usage. Delete duplicate objects. Clean up any references to deleted duplicates. | Directory consistency restored. Replication conflicts resolved. Ambiguous object references eliminated. Potential for accidental deletion of wrong object if not carefully identified. | Requires RFC due to deletion risk - careful validation needed before removal |
| 4 | SidHistory matching the domain | Review any account having the current domain SID within the SidHistory attribute. Remove such record on any found account as this represents a misconfiguration. | Query objects with SidHistory: Get-ADUser -Filter {SidHistory -like "*"} -Properties SidHistory. Identify accounts where SidHistory contains current domain SID (indicates migration error or misconfiguration). Remove invalid SidHistory entries using PowerShell or ADSI. | Invalid SID references removed. Token bloating prevented (reduces Kerberos token size). Authorization processed correctly. No functional impact as SidHistory for same domain is redundant. | Can be automated with PowerShell script - Remove-InvalidSidHistory |
| 5 | Review domain "Remote Desktop Users" group | Review and amend the group. As this is a Built-in group, it may grant RDP to any device within the domain, including sensitive assets such as DCs and RODCs. | Enumerate members: Get-ADGroupMember "Remote Desktop Users". Identify any members (should typically be empty at domain level). Remove unauthorized accounts. Ensure RDP access is controlled via delegated OU-level groups instead. Audit group membership changes (Event ID 4732). | RDP access to Domain Controllers prevented. Security boundary enforced. Privileged access properly controlled through delegation model. Removal may break RDP access for users expecting domain-wide RDP rights. | Should be monitored as part of privileged group auditing - integrate with IT Security monitoring |
| 6 | Is Critical OU & DNS zone Accidental protection Enabled? | Verify that all critical OUs and DNS zones have the "Protect object from accidental deletion" checkbox enabled. | For OUs: Get-ADOrganizationalUnit -Filter * -Properties ProtectedFromAccidentalDeletion \| Where-Object {$_.ProtectedFromAccidentalDeletion -eq $false}. For DNS: Review zone properties in DNS Manager. Enable protection on Domain Controllers OU, Admin OU, critical resource OUs, and all AD-integrated DNS zones. | Accidental deletion prevented for critical infrastructure. "Protect object from accidental deletion" flag enforced. Requires explicit confirmation to delete protected objects. No operational impact - protection is transparent to normal operations. | Can be automated with PowerShell - Set-ADOrganizationalUnit -ProtectedFromAccidentalDeletion $true |
| 7 | Time Synchronization Hierarchy Verification | Verify NTP time synchronization hierarchy is properly configured from external source through PDC Emulator to all DCs and member servers. | Check PDC Emulator external time source: w32tm /query /source. Verify other DCs sync to PDC: w32tm /query /status on each DC. Check time offset: w32tm /stripchart /computer:PDC. Review Event Viewer for time service errors (Event ID 36, 37, 50). Validate NTP stratum levels. | Time synchronization maintained preventing Kerberos authentication failures (5-minute tolerance). Event log correlation accurate. Time-sensitive applications function correctly. Time drift causes authentication failures and audit trail issues. | Can be monitored with PowerShell - Test-TimeSyncHierarchy |
| 8 | GPO Link Review and Cleanup | Review Group Policy links to ensure no orphaned links, correctly scoped GPOs, and proper link order for expected policy application. | Use GPMC or PowerShell: Get-ADOrganizationalUnit -Filter * \| Get-GPInheritance to enumerate all GPO links. Identify orphaned links (GPO deleted but link remains). Verify link order matches intended precedence. Check for disabled links. Review link enforcement and blocking inheritance settings. | GPO application predictable and documented. Orphaned links cleaned preventing errors. Link precedence correct for security policies. Performance improved by removing unnecessary links. Incorrect link order can cause security policy failures. | Manual review with GPMC - can be assisted with PowerShell reporting |
| 9 | RID Pool Allocation Review | Review RID Master allocation patterns and individual DC RID pool consumption to ensure adequate RID pools are available for object creation and to identify potential exhaustion scenarios. | On RID Master DC: dcdiag /test:ridmanager /v. Check allocated RID pool blocks. Review RID pool request frequency. For each DC: Check current RID pool allocation vs. consumption. Calculate time to exhaustion based on object creation rate. Alert if domain approaching global RID pool limit (2^30 - 1 objects). Plan RID pool ceiling raise if needed (requires domain controllers running Windows Server 2012 or later). | RID pool exhaustion prevented proactively. Domain object creation capacity maintained. Long-term capacity planning informed. RID pool ceiling adjustments planned before crisis. Global RID exhaustion prevents all new object creation domain-wide. | Should be reviewed monthly - reporting can be automated with PowerShell |
| 10 | Default Domain and DC Password Policy Review | Review Default Domain Policy and Default Domain Controllers Policy password settings to ensure compliance with organizational security baseline and regulatory requirements. | Review Default Domain Policy: Get-ADDefaultDomainPasswordPolicy. Verify: Password history (24), max age (60-90 days), min age (1 day), min length (14+ chars), complexity requirements (enabled), lockout threshold (5-10), lockout duration (30 min), lockout observation window. Review Default Domain Controllers Policy for DC-specific settings. Document any Fine-Grained Password Policies (FGPP). | Password security baseline maintained. Compliance requirements met (NIST, CIS, PCI-DSS). Weak password prevention enforced. Account lockout protects against brute force. Weak password policies enable credential attacks. | Policy review is manual - changes require RFC approval and testing |
| 11 | Kerberos Encryption Type Review | Review Kerberos encryption types in use to ensure legacy weak encryption (DES, RC4) is disabled and strong encryption (AES128, AES256) is enforced for security. | Check Default Domain Policy: Computer Configuration > Policies > Windows Settings > Security Settings > Local Policies > Security Options > "Network security: Configure encryption types allowed for Kerberos". Ensure only AES128_HMAC_SHA1 and AES256_HMAC_SHA1 enabled. Verify DES and RC4 disabled. Check DC functional level supports AES (2008+). Test application compatibility before enforcing AES-only. | Strong Kerberos encryption enforced. Legacy weak encryption (DES, RC4) disabled. Pass-the-hash and downgrade attacks mitigated. Compliance with security frameworks (CIS, DISA STIG). RC4 vulnerable to cryptographic attacks. | Policy change requires RFC and extensive application testing |
| 12 | DNS Zone Cleanup | Remove stale and orphaned DNS records to maintain database hygiene | Review stale computer records (computers no longer in AD but still in DNS). Remove orphaned records from decommissioned servers. Clean up test/temporary records that are no longer needed. Compare DNS records against AD computer accounts using Get-DnsServerResourceRecord and Get-ADComputer. | DNS database size reduced. Stale records removed preventing name resolution conflicts. Query performance improved with smaller zone database. Name resolution accuracy maintained. | Can be automated by comparing DNS records against AD computer accounts and flagging orphans |
| 13 | DNS Configuration Backup and Documentation | Export DNS zone data and configuration for disaster recovery purposes | Export all DNS zones (AD-Integrated zones backed up with System State, but export for documentation). Document forwarders, conditional forwarders, and zone settings. Backup DNS server configuration (registry settings, custom scripts). Use Export-DnsServerZone cmdlet. | DNS zone data protected for recovery. Configuration documented for reference. Recovery point objective (RPO) maintained for disaster recovery scenarios. Documentation aids troubleshooting and knowledge transfer. | Can be automated with Export-DnsServerZone cmdlet via scheduled backup task |
| 14 | Review and Apply Security Patches to DHCP Servers | Install Windows Updates on DHCP servers ensuring high availability is maintained during patching process | Test patches in lab environment first. Coordinate patching with failover partner (never patch both simultaneously). Verify failover partnership healthy before patching. Patch one partner, verify clients continue receiving addresses. Wait 24-48 hours then patch second partner. Use Get-HotFix to verify installation. | Security vulnerabilities remediated. Compliance violations prevented. Potential compromise of Tier 0 infrastructure avoided. Unpatched DHCP servers create security exposure and audit findings. | ○ Partial - WSUS/SCCM automates patch deployment but requires maintenance window coordination |
| 15 | Audit Authorized DHCP Servers List | Verify Active Directory authorized server list matches production inventory and remove decommissioned servers | Query authorized servers using Get-DhcpServerInDC. Compare against approved DHCP server inventory. Identify unauthorized or decommissioned servers. Remove stale authorizations. Document all authorized servers with business justification. | Stale authorized servers create security risk. Old servers may be re-provisioned without proper hardening. Rogue DHCP prevention maintained. Accurate inventory enables security audits. | ○ No - requires human review and approval before removal |
| 16 | Review DHCP Administrator Permissions | Audit members of DHCP Administrators group, remove stale accounts, and verify Tier 0 separation compliance | Query DHCP Administrators group membership using Get-LocalGroupMember -Group "DHCP Administrators". Cross-reference with HR database for active employees. Verify Tier 0 admins use separate Tier 0 accounts. Remove terminated employee accounts. Document business justification for each member. | Excessive permissions prevented (privilege creep). Compliance violations avoided. Attack surface reduced by removing unnecessary access. Terminated employee access revoked preventing security incidents. | ○ No - security review requires human judgment and approval |
| 17 | Test DHCP Failover Behavior | Controlled failover test to verify surviving partner assumes responsibility during partner failure | Schedule maintenance window. Stop DHCP service on one partner using Stop-Service DHCPServer. Verify clients successfully renew leases from surviving partner. Monitor scope utilization during test. Verify failover state transitions correctly. Restart service using Start-Service DHCPServer. Verify replication resumes. | Untested failover creates false confidence in HA design. Real failures may expose undiscovered configuration issues. Controlled testing identifies problems before production incidents affect business. | ○ No - requires coordination, monitoring, and validation during test |
| 18 | Review DHCP Scope Naming and Documentation | Verify scope names match naming standards, descriptions are accurate, and IPAM database is current | Export scope list using Get-DhcpServerv4Scope \| Select ScopeId, Name, Description \| Export-Csv. Review scope naming consistency. Verify descriptions accurately reflect location and purpose. Update IPAM database with any changes. Document VLAN associations and network diagrams. | Poor documentation causes troubleshooting delays. Misconfigured scope expansions occur from inaccurate information. Confusion during incidents wastes time. Accurate documentation enables faster problem resolution. | ○ No - documentation quality assessment requires human review |
| 19 | Test DHCP Configuration Restore | Perform test restore of DHCP backup in lab environment to verify recoverability (quarterly minimum, monthly recommended) | Copy latest DHCP backup to lab environment. Execute Restore-DhcpServer -Path "\\FileServer\Backup\DHCP\Latest" in lab. Verify all scopes, reservations, and options restored correctly. Test client lease assignment. Document restore procedure and RTO/RPO actuals. | Untested backups create unknown recovery time. Backup corruption discovered during actual disaster too late. Recovery procedures validated ensuring confidence in DR capabilities. | ○ Partial - restore command automated but validation requires human review |
| 20 | Archive Old DHCP Audit Logs | Archive and compress DHCP audit logs older than retention period (90+ days) and move to long-term storage | Identify logs older than retention policy. Compress using Compress-Archive -Path "D:\DHCP\AuditLogs\Old\*" -DestinationPath "\\Archive\DHCP\". Move compressed archives to long-term storage. Verify original logs deleted after successful archive. Maintain archive index for retrieval. | Disk space exhaustion prevented. Historical logs preserved for compliance. Audit log retrieval during incident investigations remains possible. Storage costs optimized through compression and tiering. | Can be automated - scheduled task for log archival and cleanup |
| 24 | Audit IPv6 Firewall Rules and ACLs | Verify Windows Firewall and network ACLs protect IPv6 traffic equivalently to IPv4, eliminating security blind spots | Review firewall rules using Get-NetFirewallRule \| Get-NetFirewallAddressFilter \| Where {$_.RemoteAddress -like "*:*"}. Verify ICMPv6 types permitted match security policy (types 1,2,3,4,128,129,133-137 allowed; all others blocked). Check application rules cover both IPv4 and IPv6. Test firewall blocks unauthorized IPv6 traffic. Review network device ACLs for IPv6 coverage. | IPv6 security blind spots eliminated. Attack surface reduced to match IPv4 protections. Compliance with security policies requiring protocol-agnostic controls. Lateral movement via IPv6 prevented. | ○ Partial - automated rule collection, human analysis of coverage required |
| 25 | Review IPv6 Transition Mechanism Status | Verify 6to4, Teredo, and ISATAP remain disabled per security policy (dual-stack environments should not use tunneling) | Check transition mechanisms using Get-Net6to4Configuration, Get-NetTeredoConfiguration, Get-NetIsatapConfiguration. Verify all show State=Disabled or Type=Disabled. Review Group Policy enforcement of tunnel disablement. Scan network for unexpected tunnel traffic (protocol 41, UDP 3544 Teredo). | Transition mechanism security risks prevented (firewall bypass, untrusted relay routing, NAT traversal exploits). Network complexity reduced. Troubleshooting simplified by eliminating encapsulation. | Can be automated - alert if any transition mechanism becomes enabled |
| 26 | Validate IPv6 DNS AAAA Records | Ensure critical servers have correct AAAA records registered; remove stale entries for decommissioned systems | Query DNS for AAAA records of infrastructure servers. Verify records match current IPv6 addresses using Get-DnsServerResourceRecord -ZoneName "corp.EguibarIT.local" -RRType AAAA. Test AAAA record resolution from clients. Identify orphaned records for deleted servers. Verify reverse DNS (PTR) records exist for all AAAA records. | DNS inconsistencies preventing IPv6 connectivity resolved. Stale records removed reducing confusion during troubleshooting. Monitoring systems correctly resolve hostnames over IPv6. | ○ Partial - identify mismatches automatically, cleanup requires approval |
| 27 | DFS - Generate replication health report | Produce and review a report of replication status, failures, and lag for all sites. | | Ensures management visibility and early detection of problems. | Yes |
| 28 | DFS - Review/update site topology documentation | Verify diagrams and inventories match current AD configuration. | | Prevents drift and supports troubleshooting. | No |
| 29 | DFS - Validate disaster recovery procedures | Test manual bridgehead failover and site link redundancy. | | Ensures DR readiness and operational resilience. | No |
| 30 | DFS - Check for orphaned site links | Identify and remove site links referencing deleted sites. | | Reduces replication errors and topology confusion. | No |
| 31 | Audit preferred bridgehead designations | Review and remove unnecessary manual bridgehead assignments. | | Prevents single points of failure. | No |
| 32 | Replication & topology review | Analyze replication latency, queue sizes, and inter-site links for issues. | Run repadmin reports; adjust schedules or troubleshoot broken links. | High — replication problems affect directory integrity. | Yes (reports) |
| 33 | FSMO Monitoring rules & thresholds | Review and tune monitoring alerts to reduce noise and ensure coverage. | Adjust SIEM/monitoring thresholds and update runbooks accordingly. | Medium — reduces alert fatigue and ensures critical events are noticed. | Partial |
| 34 | FSMO Inventory & documentation | Ensure FSMO owner records, contact info, and runbook versions are current. | Update CMDB/inventory and commit runbook changes to version control. | Medium — accurate docs speed incident response. | Partial (automation for reports) |
| 35 | FSMO Security & audit review | Review privileged access to FSMO DCs and recent administrative activity. | Audit privileged groups, review access logs, and rotate credentials if needed. | High — prevents unauthorized role manipulation. | Partial (audit tooling) |
| 36 | DFS ConflictAndDeleted folder cleanup | Review and clean up conflict and deleted files that accumulate when replication encounters file conflicts. | Examine \\ServerName\ConflictAndDeleted share, identify legitimate files vs. replication artifacts, move valid data, delete obsolete conflicts. | Moderate — reclaims disk space and reduces confusion. Target: <100 files. | Manual (scripted reporting available) |
| 37 | DFS-R staging quota review | Check if staging quotas need adjustment based on historical utilization and peak file-change patterns. | Review staging utilization metrics, adjust quotas if frequently exceeding 60%, document changes. | High — prevents replication stalls due to undersized staging areas. Target: 60-80% max utilization. | Partial (monitoring automated) |
| 38 | DFS replication performance trending | Analyze monthly replication backlog trends, bandwidth usage, and identify emerging bottlenecks. | Generate monthly performance report, compare against SLAs, identify sites with degrading performance. | Moderate — enables proactive capacity planning and prevents service degradation. | Partial (reporting automated) |
| 39 | DFS-R topology verification | Verify replication topology matches documented design (Hub-and-Spoke vs. Full Mesh) and all connections are healthy. | Run Get-DfsrConnection \| Select-Object GroupName, SourceComputerName, DestinationComputerName, Enabled. Compare against design doc. | Moderate - detects configuration drift and ensures redundancy expectations are met. | Partial (topology reporting automated) |
| 40 | Exception Expiry Audit | Identify temporary exception GPOs past lifespan. | Compare exception list vs expiry; retire or renew. | Limits risk accumulation & drift. | Partial (Script + Approval) |
| 41 | Consolidation Review | Assess recurring exceptions for baseline inclusion. | Analyze frequency; propose merge if pattern stable. | Strengthens consistency; reduces overhead. | Manual (Analysis) |
| 42 | Baseline Compliance Diff | Diff baselines vs current MS / CIS updates. | Pull benchmark delta; draft update proposal. | Keeps parity with external standards. | Partial (Script + Review) |
| 43 | Delegation ACL Review | Verify only approved groups retain rights. | Enumerate ACLs; diff against manifest. | Prevents privilege creep. | Yes (Script) |
| 44 | PIM Access Review for All Privileged Roles (EAM) | Conduct formal access review of all Entra ID PIM eligible assignments to ensure only authorized users retain privileged access | Run Entra ID PIM access review for all privileged roles (Global Admin, Privileged Role Admin, Security Admin, etc.). Reviewers (typically managers + security team) verify each eligible assignment is still required. Check PIM role assignment history for: (1) Zero standing assignments exist, (2) All eligible assignments reapproved by manager, (3) Dormant roles (no activation in last 90 days) flagged for removal. | 100% of eligible assignments reviewed and reapproved. Dormant roles (>90 days no activation) automatically removed to reduce attack surface. Reviewers skip >10% of reviews → escalate to CISO for mandatory completion. Ensures least-privilege principle and removes stale access (former employees, role changes). | Entra ID PIM built-in access review workflows with email reminders; Power Automate to escalate incomplete reviews; Microsoft Graph API to query dormant roles and auto-remove eligibility |
| 45 | Conditional Access Policy Effectiveness Review (EAM) | Analyze Conditional Access policy success/failure rates and identify gaps in coverage or usability issues causing excessive break-glass usage | Query Entra ID Sign-in Logs for last 30 days filtered by admin roles. Calculate: (1) Policy success rate (≥99.5% target), (2) Block events from non-compliant devices, (3) Break-glass account usage frequency. Use What If tool to simulate test scenarios (non-compliant device, risky sign-in, legacy auth) and validate policies block correctly. | ≥99.5% admin sign-ins comply with CA policies; <0.1% break-glass usage. >5 break-glass uses in 30 days indicates policy too restrictive (usability issue) or legitimate break-glass process needs refinement. Policy gaps detected via What If tool require immediate remediation to prevent bypass. | Entra ID Conditional Access Insights and Reporting workbook; custom Power BI dashboard; Azure Monitor alerts for break-glass usage spikes; Azure DevOps pipeline to test CA policies in staging tenant |
| 46 | Tier 0 Backup Restore Test (DC/ADFS/CA) (EAM) | Validate disaster recovery procedures by performing full restore of critical Tier 0 components in isolated test environment | In isolated lab network: (1) Restore Domain Controller from latest backup using Windows Server Backup or third-party tool, (2) Verify SYSVOL replication post-restore (check DFSR/FRS event logs), (3) Test LDAP queries and Kerberos authentication against restored DC, (4) Validate ADFS farm restore and certificate validity, (5) Document recovery time (target <4 hours from backup media to functional DC). | Full DC restore completes in <4 hours; SYSVOL replication healthy; authentication services functional post-restore. Restore failure or >6 hour recovery indicates backup corruption or process gap → immediate backup tooling remediation + revalidation required. Monthly testing ensures DR readiness and meets compliance audit requirements (SOC 2, ISO 27001). | Scripted restore process documented in runbook; backup software (Veeam/Commvault) test restore jobs scheduled monthly; automated validation tests (LDAP bind, Kerberos ticket request) via PowerShell |
| 47 | Review Privileged Credential Rotation Compliance (EAM) | Audit all Tier 0 service accounts and admin passwords to ensure rotation meets 60-day policy (gMSA auto-rotation verified) | Query AD for all Tier 0 accounts: (1) Get-ADServiceAccount for gMSAs → verify automatic rotation enabled (msDS-ManagedPasswordInterval attribute), (2) Get-ADUser for admin accounts → check PasswordLastSet <60 days, (3) Query Azure Key Vault for stored secrets (service principal credentials, API keys) and flag any >60 days old, (4) Review gMSA rotation logs for failures (Event ID 4662). | 100% Tier 0 service accounts use gMSA with auto-rotation; manual admin passwords ≤60 days old; Key Vault secrets ≤60 days. Any Tier 0 password >90 days = force reset immediately + investigate why rotation skipped. Non-gMSA service account found = migration plan required within 30 days to prevent pass-the-hash attacks. | PowerShell script querying PasswordLastSet and msDS-ManagedPasswordInterval; Azure Key Vault compliance report via Azure Policy; automated email alerts for passwords approaching expiration; ServiceNow ticket auto-creation for non-compliant accounts |
| 48 | Audit Tier 0 Application Inventory for Drift (EAM) | Detect unauthorized software installations on Domain Controllers and PAWs that could indicate backdoor persistence or malware | Run software inventory scan (SCCM/Intune) on all DCs and PAWs. Compare installed applications against approved baseline stored in GitOps repo. Check: (1) File Integrity Monitoring (FIM) on System32 and SYSVOL for new executables, (2) Event ID 4688 (process creation) for unusual binaries, (3) Application Control (WDAC/AppLocker) policy violations, (4) Service installations (Event ID 7045). | Installed applications 100% match approved baseline; zero unapproved executables in System32; no WDAC policy violations. Unauthorized software detected on DC = immediate quarantine + forensic imaging (potential APT implant or ransomware). Non-baseline software on PAW = user violation requiring device reimage + security awareness training. | SCCM/Intune software inventory reports; File Integrity Monitoring via Windows Defender ATP or third-party FIM tool; SIEM query aggregating 4688/7045 events; automated baseline comparison script generating variance report |
| 49 | Validate Emergency Break-Glass Account Functionality (EAM) | Test break-glass accounts (on-prem and cloud) to ensure they remain functional for emergency access during CA policy failures or PIM outages | In controlled test: (1) Authenticate to DC using on-prem break-glass account (stored in physical safe), (2) Sign in to Azure Portal with cloud break-glass account (excluded from CA policies), (3) Verify PIM override capability (manual role assignment without approval), (4) Check Event ID 4624 for successful logon, (5) Validate credentials stored in offline location (safe) match current AD/Entra ID passwords. | Break-glass accounts authenticate successfully to both on-prem and cloud; credentials in secure offline storage are current (no password expiration drift). Break-glass logon fails = immediate credential reset + access path verification (ensure CA exclusion policies intact). Monthly testing prevents "locked out of production" scenario during real incidents. | Manual test documented in runbook with screenshot evidence; credential vault sync verification via PowerShell comparing AD PasswordLastSet to stored credential date; automated reminder to security team to perform test |
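Monthly task 2 (Missing Subnets in AD) lends itself to scripting. The following is a minimal sketch under stated assumptions, not the implementation shipped in any existing module: `Get-NoClientSiteEntry` is a hypothetical helper name, the regular expression assumes each NO_CLIENT_SITE entry ends with the client name followed by an IPv4 address, and the /24 grouping is purely illustrative (real subnet masks must be confirmed with the network team).

```powershell
# Hypothetical helper: extract clients that Netlogon could not map to an AD site.
# Assumption: NO_CLIENT_SITE entries end with "<ClientName> <IPv4Address>".
function Get-NoClientSiteEntry {
    [CmdletBinding()]
    param(
        [Parameter(ValueFromPipeline)]
        [string[]]$Line
    )
    process {
        foreach ($entry in $Line) {
            if ($entry -match 'NO_CLIENT_SITE:\s+(\S+)\s+(\d{1,3}(?:\.\d{1,3}){3})\s*$') {
                $client = $Matches[1]
                $ip     = $Matches[2]
                [pscustomobject]@{
                    Client          = $client
                    IPAddress       = $ip
                    # Illustrative /24 bucket only; not a recommendation for the real mask
                    CandidateSubnet = ($ip -replace '\.\d{1,3}$', '.0/24')
                }
            }
        }
    }
}
```

On a DC, something like `Get-Content "$env:SystemRoot\Debug\Netlogon.log" | Get-NoClientSiteEntry | Group-Object CandidateSubnet` would then group unmapped clients by candidate subnet for review before any subnet object is created.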
Odd Monthly Tasks
| # | Name | Description | Task | Impact Definition | Automated |
|---|---|---|---|---|---|
| 1 | GPO Delete Empty | Identify and delete empty Group Policy Objects that contain no settings in either Computer or User configuration sections to reduce administrative overhead and clutter. | Query all GPOs and load each XML report: Get-GPO -All \| ForEach-Object { [xml](Get-GPOReport -Guid $_.Id -ReportType Xml) }. Flag a GPO as empty when its report contains no settings in either the Computer or User Configuration section. Verify GPO last modified date older than 60 days. Check for any OU or site links using Get-GPInheritance. Document GPO purpose from description field. If truly empty and unlinked for 60+ days, delete GPO object and associated SYSVOL folder. Requires RFC approval before deletion. | GPMC console clutter reduced. Administrative overhead minimized. SYSVOL replication traffic decreased. No operational impact if GPO is truly empty and unlinked. Risk of deleting GPO intended for future use - verify with policy owners first. | Requires an RFC - can be identified with PowerShell script but deletion needs approval |
| 2 | GPO Fix Orphaned (GPC without GPT) | Identify and remediate orphaned Group Policy Containers (GPC) in Active Directory that lack corresponding Group Policy Templates (GPT) in SYSVOL due to replication failures or manual deletions. | Enumerate all GPOs in AD: Get-GPO -All. For each GPO, verify corresponding GPT folder exists in SYSVOL at \\domain\SYSVOL\domain\Policies\{GUID}. If GPT folder missing: Check if GPO is linked anywhere using Get-GPInheritance across all OUs. Review GPO last modified date and description. Option 1: If GPO is linked and recently used, restore GPT from backup or recreate settings. Option 2: If GPO is not linked and old, delete GPC object from AD using Remove-GPO. Document all actions for audit trail. | GPO consistency restored between AD and SYSVOL. Replication errors eliminated. GPMC console displays accurate GPO status. If GPO was linked, policy application resumes after GPT restoration. If deleted, removes misleading object from AD. Incorrect deletion of linked orphaned GPO causes policy application failures. | Requires an RFC due to potential impact - manual investigation and remediation needed |
| 3 | GPT Fix Orphaned (GPT without GPC) | Identify and clean up orphaned Group Policy Templates (GPT) in SYSVOL that lack corresponding Group Policy Container (GPC) objects in Active Directory, typically caused by improper GPO deletion. | Enumerate all policy folders in SYSVOL: Get-ChildItem \\domain\SYSVOL\domain\Policies -Directory. For each GUID folder (excluding {6AC1786C-016F-11D2-945F-00C04fB984F9} and {31B2F340-016D-11D2-945F-00C04FB984F9} which are default GPOs), verify corresponding GPC exists in AD: Get-GPO -Guid {GUID}. If GPC not found in AD: Document GPT folder path and size. Review gpt.ini file for version and display name hints. If no matching GPC found, delete orphaned GPT folder from SYSVOL on PDC Emulator (replication will propagate). Verify SYSVOL replication completes successfully. | SYSVOL storage reclaimed. Replication consistency improved. Directory hygiene maintained. No operational impact as orphaned GPT cannot apply policy without GPC object. Incorrect identification could delete a valid GPT, causing policy application failures until the folder is restored from backup (SYSVOL replication propagates the deletion rather than undoing it). | Requires an RFC - must be performed on PDC Emulator to ensure proper SYSVOL replication |
| 4 | GPO Unlinked Cleanup | Identify and evaluate Group Policy Objects that are not linked to any organizational unit, site, or domain for potential deletion or archival to reduce administrative complexity. | Query all GPOs: Get-GPO -All. For each GPO, check for links using Get-GPInheritance on all OUs, Get-ADDomain to check domain-level links, and Get-ADReplicationSite to check site-level links. Flag GPOs with zero links. Review GPO description, settings report, and last modified date. If unlinked for 90+ days: Contact GPO owner (if documented) to verify if needed for future. For confirmed unused GPOs: Option 1: Backup GPO using Backup-GPO before deletion. Option 2: Move to "Archived GPOs" OU if using AGPM. Option 3: Delete using Remove-GPO. Document justification for deletion or retention decision. | GPO inventory streamlined. Administrative confusion reduced. SYSVOL storage and replication traffic minimized. No production impact as unlinked GPOs don't apply policy. Risk of deleting GPO being prepared for future deployment - verify retention requirements before deletion. | Requires an RFC - backup and approval mandatory before deletion |
| 5 | Domain Controller Logon/Startup Scripts Review | Review and validate any logon or startup scripts assigned to Domain Controllers or Read-Only Domain Controllers through Group Policy to ensure only approved scripts execute on critical infrastructure. | Identify GPOs linked to Domain Controllers OU: Get-GPInheritance -Target "OU=Domain Controllers,DC=EguibarIT,DC=local". For each linked GPO: Export GPO settings report using Get-GPOReport -ReportType Html. Review Computer Configuration > Policies > Windows Settings > Scripts > Startup section. Review User Configuration > Policies > Windows Settings > Scripts > Logon section. Document all scripts found (script path, parameters, run order). Verify each script: Review script code for malicious content. Validate business justification for script execution on DCs. Check script digital signature if available. Remove unauthorized or unnecessary scripts. Test DC functionality after script removal in isolated test environment first. | DC security posture improved by removing unnecessary script execution. Attack surface reduced on Tier 0 infrastructure. Unauthorized startup/logon scripts eliminated. Only validated, approved scripts execute on domain controllers. Script removal may break DC functionality if script was providing critical operation - thorough testing required. | Requires an RFC and thorough testing - script removal on DCs can have domain-wide impact |
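Tasks 2 and 3 above both reduce to comparing two lists of GUIDs: the GPOs known to Active Directory (GPC) and the GUID-named folders under the SYSVOL Policies directory (GPT). The sketch below illustrates only that comparison. `Compare-GpoStore` is a hypothetical name introduced here, it performs no deletion, and it assumes that non-GUID folders (for example a PolicyDefinitions central store) should be ignored.

```powershell
# Hypothetical helper: compare GPO GUIDs found in AD (GPC) with GUID-named
# folders found under SYSVOL\<domain>\Policies (GPT). Report only, no deletion.
function Compare-GpoStore {
    [CmdletBinding()]
    param(
        [string[]]$GpcGuid = @(),
        [string[]]$GptFolderName = @()
    )

    $guidPattern = '^\{?[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}?$'

    # Normalize to upper case without braces; skip anything that is not a GUID.
    $normalize = {
        param([string[]]$Values)
        foreach ($value in $Values) {
            if ($value -match $guidPattern) {
                ($value -replace '[{}]', '').ToUpperInvariant()
            }
        }
    }

    $gpc = @(& $normalize $GpcGuid)
    $gpt = @(& $normalize $GptFolderName)

    [pscustomobject]@{
        GpcWithoutGpt = @($gpc | Where-Object { $_ -notin $gpt })   # task 2 candidates
        GptWithoutGpc = @($gpt | Where-Object { $_ -notin $gpc })   # task 3 candidates
    }
}
```

A possible invocation, using the commands already named in the tasks, is `Compare-GpoStore -GpcGuid (Get-GPO -All).Id -GptFolderName (Get-ChildItem \\domain\SYSVOL\domain\Policies -Directory).Name`. The output is a starting point for the RFC, since each candidate still needs the link, backup, and ownership checks described in the table.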
Quarterly Tasks
| # | Name | Description | Task | Impact Definition | Automated |
|---|---|---|---|---|---|
| 1 | Review of PasswordNeverExpires accounts | Control, review, and validate accounts with the PasswordNeverExpires flag to ensure only authorized service accounts have this exception. | Query all accounts: Get-ADUser -Filter {PasswordNeverExpires -eq $true} -Properties PasswordNeverExpires, PasswordLastSet. Review business justification for each account. Verify service accounts are documented and approved. Remove flag from unauthorized accounts. Update exception documentation. | Password policy compliance enforced. Unauthorized exceptions eliminated. Service account inventory validated. Security risk reduced from accounts with static passwords. No impact to approved service accounts. | For Service Accounts, this is automated with EguibarIT Housekeeping module |
| 2 | Review AD Sites & Services network | Ensure that logical network topology reflects the physical network infrastructure. Capture and document any changes to network architecture. | Open AD Sites and Services console. Review all sites, subnets, and site links. Verify subnet assignments match current network design. Update site link costs based on actual bandwidth. Document changes to replication topology. Coordinate with network team for upcoming infrastructure changes. | Replication topology optimized for current network. Site-aware services function optimally. Authentication routes to nearest DC. Documentation current for DR planning. No disruption to production during review. | Requires network team coordination - manual review process with documentation updates |
| 3 | Review AGPM policies | Advanced Group Policy Management (AGPM) policy review to ensure proper change control and versioning of Group Policy Objects | Review AGPM archive for all controlled GPOs. Verify change approval workflow functioning. Check for uncontrolled GPOs that should be managed. Review delegation of AGPM roles. Audit GPO version history and rollback capability. Validate backup and restore procedures. | GPO change control maintained. Version history preserved. Unauthorized GPO changes prevented. Rollback capability verified. Compliance with change management policy ensured. No production impact during review. | Manual review process coordinated with change management team |
| 4 | Review Privileged Users | Comprehensive review of all Tier 0 and highly privileged accounts to ensure only authorized personnel have elevated access | Enumerate all Tier 0 admin accounts (Domain Admins, Enterprise Admins, Schema Admins, BUILTIN\Administrators). Verify each account against authorized user list. Review account usage logs. Check for dormant privileged accounts. Validate MFA enrollment. Confirm privileged access recertification completed. Remove unauthorized accounts. | Privileged access governance enforced. Unauthorized elevated access eliminated. Compliance with least privilege principle. Audit trail for privileged access maintained. Account removal may impact users with unauthorized access. | Can be partially automated with EguibarIT Delegation module - Generate-PrivilegedUserReport |
| 5 | Review Tier 1 and Tier 2 Privileged Users | Review all Tier 1 and Tier 2 administrative accounts to ensure proper segregation and least privilege access | Query all Tier 1 and Tier 2 privileged accounts. Verify assignment matches current job roles. Review group memberships for appropriate delegation. Check for privilege creep (accumulation of unnecessary rights). Validate naming convention compliance (T1_*, T2_*). Confirm associated standard user account still active. | Tier model segregation maintained. Excessive privileges removed. Role-based access current with organizational changes. Delegation model integrity preserved. May require user access modification if roles changed. | Can be partially automated with EguibarIT Delegation module - Test-TierPrivilegedAccounts |
| 6 | Review granted Privileges and Rights | Audit User Rights Assignments across all domain and OU-level GPOs to ensure no unauthorized privilege escalation paths exist | Export all GPOs: Get-GPOReport -All -ReportType XML. Parse User Rights Assignment section. Review sensitive privileges: SeDebugPrivilege, SeTakeOwnershipPrivilege, SeBackupPrivilege, SeRestorePrivilege, SeLoadDriverPrivilege. Verify assignments match delegation documentation. Remove unauthorized grants. | Privilege escalation paths eliminated. User Rights Assignment properly delegated. Separation of duties enforced. Attack surface reduced. Changes may affect custom administrative workflows if unauthorized privileges were in use. | Manual review process - can be assisted with PowerShell GPO report parsing scripts |
| 7 | ACL check for Unknown SIDs | Scan Active Directory ACLs for orphaned SIDs (security principals that no longer exist) to clean up permission structures | Use dsacls or PowerShell Get-Acl to enumerate permissions on OUs, GPOs, and critical objects. Identify ACEs containing SIDs that don't resolve to valid security principals (appear as S-1-5-21-...). Document orphaned SIDs. Remove invalid ACEs after verification. Use Remove-PermissionEntry function. | ACL hygiene maintained. Orphaned permissions cleaned. Security descriptor bloat reduced. Unknown SIDs cannot grant unexpected access. No operational impact as SIDs no longer represent valid principals. | Can be automated with PowerShell script - Find-OrphanedSIDsInACLs |
| 8 | Forest/Domain Functional Level Review | Review current functional levels and plan upgrades to take advantage of new features and improved security capabilities. | Check current levels: (Get-ADForest).ForestMode and (Get-ADDomain).DomainMode. Verify all DCs running supported OS versions. Document functional level features in use. Plan upgrade path if legacy DCs removed. Review new features available at higher levels (e.g., PAM, authentication policies). | Functional level roadmap maintained. New security features identified for implementation. Legacy constraints documented. Upgrade readiness assessed. Higher functional levels enable advanced security features like Privileged Access Management. | Manual planning process - requires business justification and testing |
| 9 | AD Backup Validation and Restore Testing | Verify Active Directory backups are completing successfully and test restore procedures to ensure disaster recovery capability. | Verify system state backups completing on all DCs. Check backup integrity and retention. Perform test restore in isolated environment: boot to DSRM, restore system state, verify AD functionality. Document restore procedures. Validate authoritative restore procedures documented. Test forest recovery plan. | Backup integrity confirmed. Restore procedures validated. Recovery time objectives (RTO) verified. DR preparedness ensured. Team trained on restore procedures. Untested backups may fail during actual recovery need. | Requires scheduled maintenance window - coordinate with change management |
| 10 | AD Database Size Monitoring and Growth Trend Analysis | Monitor NTDS.dit database size and analyze growth trends to plan capacity, identify bloat, and optimize performance. | On each DC: Check NTDS.dit file size and location: Get-ItemProperty "C:\Windows\NTDS\ntds.dit". Compare against baseline and previous quarters. Calculate growth rate. Use ntdsutil to check internal database statistics: "files" > "info". Identify white space percentage. Plan offline defragmentation if white space >30%. Review deleted object tombstone accumulation. | Database growth trends identified. Capacity planning informed. Storage requirements forecasted. Performance degradation predicted. Defragmentation needs identified. Uncontrolled growth impacts DC performance and backup windows. | Monitoring automated - analysis and planning require manual review |
| 11 | Directory Service Access Auditing Review (SACL) | Review System Access Control List (SACL) audit configurations on critical AD objects to ensure security-relevant access is being logged for forensics and compliance. | Review audit policy: Audit Directory Service Access enabled in Default Domain Controllers Policy. Check SACLs on critical objects: AdminSDHolder, Domain root, Schema partition, Configuration partition, KRBTGT account. Verify auditing: Success for all access, Failure for write attempts. Review Event ID 4662 (directory service access) volume. Tune SACL to balance security visibility with log volume. | Security-relevant AD access logged. Forensic investigation capability maintained. Compliance requirements met (PCI-DSS, HIPAA). Unauthorized schema/configuration changes detected. Privileged object access tracked. Without SACL auditing, malicious AD changes go undetected. | Audit policy configuration manual - log collection automated via SIEM |
| 12 | Privileged Access Management (PAM) Feature Usage Review | If using PAM features (time-limited group membership, authentication policies), review usage and effectiveness for Tier 0 privileged access control. | Check forest functional level (2016+ required for PAM). Review time-limited group memberships: run Get-ADGroup with -Properties member -ShowMemberTimeToLive against privileged groups. Verify authentication policy usage: Get-ADAuthenticationPolicy -Filter *. Check authentication policy silos: Get-ADAuthenticationPolicySilo -Filter *. Review PAM shadow security principal usage. Analyze privileged access patterns via audit logs. | PAM feature adoption tracked. Time-limited privileges enforced. Just-in-time admin access validated. Authentication policies scoped correctly. Privileged access least-privilege model verified. Underutilized PAM features represent missed security opportunities. | Manual review - PAM configuration complex and environment-specific |
| 13 | DHCP Capacity Planning Review | Analyze 90-day DHCP scope utilization trends to plan expansions or subnet redesigns before exhaustion occurs | Collect scope statistics using Get-DhcpServerv4ScopeStatistics (ScopeId, InUse, Free, PercentageInUse). The cmdlet returns current values only, so compare against snapshots recorded over the previous 90 days. Generate trend analysis in Excel or Power BI. Calculate growth rate over 90 days. Project exhaustion dates for scopes above 70%. Plan subnet expansions or supernet consolidation. | Unplanned scope exhaustion avoided. Proactive capacity planning enables scheduled changes. Firefighting and emergency subnet changes eliminated. Business growth accommodated without outages. | ○ No - analyst review required for capacity planning decisions |
| 14 | DHCP Security Posture Assessment | Verify DHCP hardening checklist compliance including Tier 0 controls, logging, and RBAC | Review DHCP hardening checklist from configuration guide. Verify audit logging enabled and forwarded to SIEM. Check DHCP Administrators group membership. Validate Tier 0 classification controls. Verify PAW usage for administration. Review authorized servers list. Compare against CIS and Microsoft baselines. | Security drift from baseline detected. Attack surface minimized through proper hardening. Compliance risk reduced. Tier 0 protections validated. Unaddressed security gaps create exploitable vulnerabilities. | ○ No - security assessment requires expert judgment |
| 15 | DHCP Disaster Recovery Drill | Simulate DHCP server failure and execute recovery procedures to validate RTO/RPO and runbook accuracy | Schedule DR drill during maintenance window. Simulate DHCP server failure. Execute recovery procedures from runbook. Restore DHCP configuration from backup using Restore-DhcpServer. Measure actual recovery time. Verify client lease assignment post-recovery. Document lessons learned and update runbook. | Unverified DR procedures create surprises during real incidents. Unknown recovery times prevent accurate RTO commitments. Runbook gaps discovered and remediated. Confidence in recovery capability validated. | ○ Partial - restore execution scriptable but validation and timing require manual review |
| 16 | DHCP Compliance Audit Preparation | Assemble DHCP audit logs, change tickets, access reviews, and configuration exports for regulatory audits | Export DHCP audit logs for audit period. Correlate configuration changes to approved change tickets. Document authorized server list with business justification. Export current configuration using Backup-DhcpServer for evidence. Prepare access control review documentation. Compile security assessment results. | Audit evidence readily available. Compliance findings prevented through proactive preparation. Change management traceability demonstrated. Access controls documented. Poor evidence trails create audit findings and remediation work. | ○ No - documentation assembly and curation requires human effort |
| 17 | DHCP Performance Baseline Review | Compare lease latency and failover replication lag against historical baselines to detect degradation | Collect DHCP performance metrics from monitoring system. Analyze lease assignment latency trends. Review failover replication lag statistics. Compare against quarterly and annual baselines. Investigate any performance degradation patterns. Document performance trends for capacity planning. | Performance regressions detected before becoming critical. Degradation trends identified for investigation. Capacity planning informed by performance data. Baseline metrics enable objective performance assessment. | Can be automated - metrics collection automated, trend analysis requires human interpretation |
| 20 | Review IPv6 Adoption and Traffic Metrics | Track IPv6 usage trends, identify applications and services not yet IPv6-capable, plan for increased IPv6 reliance | Analyze network traffic logs to calculate percentage of connections using IPv6 vs IPv4. Review application logs for IPv6 connection attempts. Survey critical applications for dual-stack readiness. Measure DNS AAAA query volume vs A record queries. Document IPv6 adoption by department/application. Identify blockers to IPv6 expansion. | IPv6 adoption strategy data-driven. Investment priorities identified (applications needing IPv6 support). Readiness for IPv6-only segments assessed. Trend analysis shows adoption velocity for planning. | ○ Partial - log analysis automated, strategic planning requires human input |
| 21 | Audit IPv6 Address Plan vs Actual Allocations | Verify address allocation hierarchy matches documented plan; identify unauthorized subnets or scope creep | Review IPAM documentation for IPv6 address plan (hierarchical allocation, ULA vs GUA usage, subnet assignments). Enumerate in-use prefixes using Get-NetRoute, DHCPv6 scopes, and router configs. Identify deviations from plan (unauthorized subnets, incorrect prefix lengths, misaligned allocation hierarchy). Document and remediate variances. | Address plan integrity maintained. Unauthorized subnet sprawl prevented. Hierarchical addressing structure preserved for summarization and scalability. Documentation reflects reality enabling troubleshooting. | ○ Partial - actual allocations collected automatically, variance analysis requires human review |
| 22 | Full topology review | Comprehensive audit of all sites, subnets, links, and bridgeheads. | | Ensures long-term reliability and supports strategic changes. | No |
| 23 | Disaster recovery simulation | Simulate hub site or bridgehead failure and validate recovery steps. | | Tests organizational readiness for major incidents. | No |
| 24 | DFS failover testing | Test DFS Namespace client failover by simulating primary target server failures. | Disable one DFS target, verify clients automatically failover to alternate target within referral timeout (typically 5 minutes). Document results. | High — validates client referral mechanism and ensures business continuity during server failures. | Manual (failover simulation not recommended for automation) |
| 25 | DFS-R capacity planning review | Project future DFS storage and bandwidth needs based on growth trends and business initiatives. | Analyze historical data growth rates, upcoming projects (mergers, branch openings), calculate projected storage/bandwidth needs for next 12 months. | High — prevents capacity exhaustion and enables proactive infrastructure investment. | Partial (data collection automated) |
| 26 | DFS replication topology optimization | Evaluate if current replication topology (Hub-and-Spoke or Full Mesh) still meets organizational needs. | Review replication latency requirements, WAN utilization, site growth. Recommend topology changes if warranted. | Moderate — ensures replication efficiency as infrastructure evolves. | Manual (requires business context and analysis) |
| 27 | DFS disaster recovery testing | Validate DFS-R database restore procedures and authoritative restore processes. | Simulate DFSR database corruption, perform database recovery using Repair-DfsReplication.ps1 or manual steps. Test authoritative restore (D4) in lab environment. | Critical — validates DR preparedness and ensures runbooks are accurate. Prevents panic during real incidents. | Manual (DR testing not suitable for automation) |
| 28 | Failover Drill | Simulate node outage & observe behavior | Gracefully stop cluster service or power off one node (planned window). | Validates resilience & witness configuration. | No |
| 29 | Capacity Forecast | Project CPU/RAM/Storage 12-month runway | Trend utilization; produce forecast report. | Prevents emergency procurement. | Partial |
| 30 | Replica Test Failover | Validate Hyper-V Replica recovery | Start-VMFailover -VMName Sample -AsTest; application smoke tests. | Assures DR path works end-to-end. | No |
| 31 | Firmware & Driver Review | Assess vendor releases & apply updates | Stage updates; rolling node maintenance. | Improves stability & security posture. | No |
| 32 | Benchmark Alignment | Deep review vs CIS, STIG & MS security baselines. | Map deltas; document remediate/accept decisions. | Mitigates divergence from industry posture. | Manual (Workshop) |
| 33 | Backup Restore Test | Restore sample baseline GPOs in lab. | Import from archive; validate checksum. | Proves recovery viability. | Partial (Script) |
| 34 | gMSA Rotation Verification | Confirm password rotations executed. | Review AD events & service usage. | Assures hardened service identity. | Manual / SIEM |
| 35 | OU / Tier Boundary Assessment | Validate separation still supports org changes. | Review new workloads; adjust OU design. | Prevents cross-tier leakage. | Manual (Architecture) |
| 36 | Filtering Hygiene | Confirm only minimal necessary groups filtered. | Enumerate filters; remove residual pilot groups. | Reduces unintended exposure. | Yes (Script) |
| 37 | krbtgt Password Dual Rotation (Per Microsoft Guidance) (EAM) | Rotate krbtgt account password twice (10 hours apart) to invalidate all existing Kerberos tickets and mitigate Golden Ticket persistence | Follow Microsoft AD Forest Recovery guidance: (1) Reset krbtgt password on PDC Emulator (Event ID 4724, password reset), (2) Wait 10 hours for replication + ticket expiration (default 10h lifetime), (3) Reset krbtgt password second time, (4) For RODCs, rotate krbtgt_XXXXX per RODC (Event ID 4724), (5) Monitor replication health post-rotation (check Event IDs for replication failures), (6) Temporarily reduce ticket lifetime to 4h for 48h to expedite cache expiration. | Both krbtgt and krbtgt_XXXXX (RODC) passwords rotated successfully; zero AD replication failures post-rotation; no service disruptions reported. Replication breaks post-rotation = rollback immediately + schedule maintenance window for retry with extended replication monitoring. Quarterly rotation limits the persistence window of any Golden Ticket to less than 90 days. | Manual execution following documented runbook; PowerShell script to reset passwords + verify replication (for example, Set-ADAccountPassword -Reset against the krbtgt account); automated monitoring of Event ID 4724 for krbtgt accounts + replication health via SIEM |
| 38 | Red Team / Purple Team Validation of EAM Controls | Conduct adversary simulation exercises to validate EAM controls prevent Tier 0 compromise from lower tiers and detect attack paths within SLA | Engage internal or external red team to simulate: (1) Tier 1 foothold → attempt DC compromise, (2) Phished Global Admin credentials → tenant takeover attempt, (3) Golden Ticket generation post-krbtgt theft, (4) gMSA credential replay attack, (5) Run BloodHound to enumerate attack paths. Validate: Authentication Policy Silos block lateral movement, PAW enforcement prevents non-PAW DC access, PIM JIT limits standing privilege abuse. Measure time-to-detect (target <15 minutes). | Zero successful Tier 0 compromises from Tier 1 foothold; lateral movement blocked by silos; detection occurs <15 minutes after initial compromise indicator. Red team achieves Domain Admin from Tier 1 = control gap requiring emergency remediation sprint. Detection >15 minutes = SIEM tuning + additional event log collection required. | Manual red team engagement scheduled quarterly; BloodHound analysis automated via Jenkins job; attack simulation results documented in ServiceNow + remediation backlog; detection timing measured via SIEM correlation rule timestamps |
| 39 | Review and Update plane-map.yaml Classification (EAM) | Audit asset inventory to ensure all new systems, applications, and cloud workloads are classified into correct Tier and Plane within 30 days of deployment | Query CMDB for all assets deployed in last 90 days. Cross-reference against plane-map.yaml in GitOps repo. Check: (1) All assets have Tier (0/1/2) and Plane (Control/Management/Data/User/App) classification, (2) No "unknown" or "pending" classification entries >30 days old, (3) Decommissioned assets removed from plane-map.yaml, (4) New cloud workloads (Azure VMs, App Services, Function Apps) tagged with Tier/Plane in Azure Resource Manager. | 100% assets classified within 30 days of deployment; zero orphaned entries in plane-map.yaml. >5% assets unclassified = delay new deployments until classification complete (prevents privilege escalation via uncontained workloads). Classification accuracy validated by reviewing 10% sample manually (quality control check). | PowerShell script querying CMDB + Azure Resource Graph for unclassified assets; automated PR generation to update plane-map.yaml with suggested classifications (ML-based); Azure Policy enforcement requiring Tier/Plane tags on all resources before deployment allowed |
| 40 | Audit Admin Workstation (PAW/VDI) Usage Compliance (EAM) | Verify all Tier 0 admins use dedicated PAWs exclusively for privileged tasks with no dual-use (email/web/productivity apps) violations | Analyze last 90 days of Event ID 4624 logon events for all EAM_Tier0_* accounts. Cross-reference source workstation names against PAW device inventory (tagged "PAW" in CMDB). Query endpoint detection (EDR) telemetry for: (1) Email client launches on PAWs (Outlook.exe), (2) Web browser usage (Edge/Chrome process creation Event ID 4688), (3) Productivity app usage (Excel/Word), (4) Web proxy logs showing PAW IPs. Validate 100% Tier 0 interactive logons from dedicated PAWs. | 100% Tier 0 interactive logons from dedicated PAWs; zero email/web/productivity app usage on PAWs. Admin uses personal device or dual-use PAW = immediate account suspension + mandatory security retraining + device reimage. Compliance <98% = escalate to management for policy enforcement + user education campaign. | SIEM query correlating 4624 events with CMDB PAW inventory; EDR query (Defender for Endpoint, CrowdStrike) detecting prohibited process launches; web proxy log analysis via Splunk; automated compliance scorecard emailed to CISO monthly |
| 41 | Validate Authentication Policy Silo Enforcement (No Audit Mode Remnants) (EAM) | Ensure all Authentication Policy Silos transitioned from audit to enforce mode and validate no accounts bypass silo restrictions | Run Get-ADAuthenticationPolicy -Filter * and Get-ADAuthenticationPolicySilo -Filter * for all defined policies. Verify: (1) Enforce property = $true for all silos (not Audit), (2) Event ID 4768 (Kerberos TGT) includes silo claim in all Tier 0 account tickets, (3) Event ID 4769 denials occur when cross-silo access attempted (expected behavior), (4) No accounts excluded from silo assignment via UserAccountControl flags or OU placement outside silo scope. | 100% silos in "Enforced" mode (not Audit); silo claim correctly applied to all TGTs; expected 4769 denials occur for cross-tier access attempts. Silo still in audit mode >90 days post-deployment = mandatory enforcement or documented risk acceptance required. Enforcement disabled without approval = immediate P1 incident (potential attacker disabling control). | PowerShell script querying silo Enforce property; SIEM rule alerting on 4768 TGTs missing expected silo claim; Azure DevOps pipeline validating silo configuration against infrastructure-as-code baseline; monthly compliance report sent to security architecture team |
| 42 | Review MITRE ATT&CK Coverage for EAM-Relevant TTPs | Map SIEM detection rules to MITRE ATT&CK techniques targeting privileged access (T1558.001 Golden Ticket, T1558.003 Kerberoasting, T1021 Remote Services, T1078 Valid Accounts, T1550 Use Alternate Authentication Material) and validate ≥90% coverage | Export all active SIEM detection rules (Sentinel Analytics Rules, Splunk Saved Searches). Map each rule to MITRE ATT&CK technique using ATT&CK Navigator. Calculate coverage percentage for Tier 0-relevant techniques: T1558.001 (Golden Ticket), T1558.003 (Kerberoasting), T1021 (RDP/WinRM lateral movement), T1078 (credential abuse), T1550.002 (pass-the-hash), T1003 (credential dumping). Test alerts quarterly by simulating attacks in lab (purple team testing) and measure false positive rate (target <20%). | ≥90% coverage for Tier 0 attack techniques; alerts tested and tuned quarterly with <20% false positive rate. <80% coverage = detection engineering sprint required to close gaps. False positives >20% = alert tuning + baseline refinement (reduces analyst fatigue and missed true positives). Coverage validated via purple team simulations ensuring detections trigger correctly. | MITRE ATT&CK Navigator integration with SIEM; automated coverage report via PowerShell querying Sentinel Analytics API; purple team attack simulation framework (Atomic Red Team, Caldera) with automated detection verification; quarterly detection rule review via ServiceNow workflow |
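The growth-rate and exhaustion-date arithmetic in the DHCP Capacity Planning Review task can be sketched as a small PowerShell function. This is an illustrative sketch under stated assumptions: the function name `Get-DhcpScopeExhaustionForecast` and its parameters are hypothetical, growth is assumed to be linear over the window, and the starting `InUse` value is assumed to come from a snapshot you stored earlier, because Get-DhcpServerv4ScopeStatistics reports only current values.

```powershell
# Hypothetical helper: linear projection of DHCP scope exhaustion from two
# InUse snapshots taken $WindowDays apart. Not part of any DHCP module.
function Get-DhcpScopeExhaustionForecast {
    param(
        [Parameter(Mandatory)][string]$ScopeId,
        [Parameter(Mandatory)][int]$InUseStart,   # InUse at start of window (stored snapshot)
        [Parameter(Mandatory)][int]$InUseNow,     # InUse today
        [Parameter(Mandatory)][int]$Free,         # Free addresses today
        [int]$WindowDays = 90,
        [double]$ThresholdPercent = 70,
        [datetime]$AsOf = (Get-Date)
    )

    $total = $InUseNow + $Free
    $percentInUse = if ($total -gt 0) {
        [math]::Round(100.0 * $InUseNow / $total, 1)
    } else { 0 }

    # Net leases added over the window; zero or negative means no projected exhaustion.
    $growth = $InUseNow - $InUseStart
    $daysLeft = $null
    $exhaustionDate = $null
    if ($growth -gt 0) {
        # Free / (growth / WindowDays), rearranged to avoid rounding error.
        $daysLeft = [int][math]::Ceiling([double]($Free * $WindowDays) / $growth)
        $exhaustionDate = $AsOf.Date.AddDays($daysLeft)
    }

    [pscustomobject]@{
        ScopeId           = $ScopeId
        PercentageInUse   = $percentInUse
        AboveThreshold    = ($percentInUse -ge $ThresholdPercent)
        DaysToExhaustion  = $daysLeft
        ProjectedDate     = $exhaustionDate
    }
}
```

For example, a scope that grew from 120 to 150 leases in 90 days with 50 addresses free is at 75% utilization and would be projected to run out in 150 days. A linear model ignores seasonality and lease-duration changes, so the result is a prompt for analyst review and not a commitment date.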
Semi-Annual Tasks
| # | Name | Description | Task | Impact Definition | Automated |
|---|---|---|---|---|---|
| 1 | AD DB Consistency Check | Verify the consistency and integrity of the AD Database. | Use NTDSUTIL to verify AD database syntax, consistency, and integrity. | | No - manual process must be executed. The process is defined in the "AD NTDS Database maintenance" document stored on the EMEA Infrastructure SharePoint. |
| 2 | Change KRBTGT password | Change the password of the Trust Anchor account to protect against Kerberos Golden Tickets. https://blogs.microsoft.com/microsoftsecure/2015/02/11/krbtgt-account-password-reset-scripts-now-available-for-customers/ | Reset the KRBTGT password twice. | | Script provided by Microsoft |
| 3 | SidHistory | This attribute should only be populated during the migration of objects. Once the migration is completed, this attribute must be empty to prevent token bloat and security risks. | Query all objects with SidHistory: Get-ADUser -LDAPFilter "(sIDHistory=*)" -Properties SidHistory. Verify migration completed and no ongoing migration activities. Clear SidHistory attribute using Set-ADUser -Clear SidHistory or ADSI. Coordinate with migration team before removal. | Token size reduced improving authentication performance. Security risk mitigated (SidHistory can be exploited for privilege escalation). Access should be validated after removal as SidHistory may have granted necessary permissions during migration transition. | Requires coordination with migration team - can be automated after migration completion verified |
| 4 | Topology change validation | Run validation after any major site, subnet, or link change. | | Prevents unexpected replication or authentication issues. | No |
| 5 | Incident response | Investigate and resolve replication or authentication failures as they arise. | | Minimizes downtime and user impact. | No |
| 6 | Review accounts with PNE flag | Some accounts are legitimate to have the PasswordNeverExpires flag. Review the exceptions for consistency and re-approve the exception to maintain password policy compliance. | Same process as quarterly review but with additional validation. Review approval documentation for each exception. Verify accounts still require exception (job role unchanged). Check if gMSA or other alternatives can replace static passwords. Update exception approval dates. | Exceptions remain validated and approved. Password policy compliance maintained long-term. Opportunities identified to eliminate exceptions through gMSA adoption. Re-approval ensures ongoing business need. No impact to validated service accounts. | For Service Accounts, this is automated with EguibarIT Housekeeping module |
| 7 | Tombstone Lifetime Verification | Verify tombstone lifetime (TSL) is properly configured and backups are within TSL window to prevent lingering object issues after restore. | Check current TSL: Get-ADObject "CN=Directory Service,CN=Windows NT,CN=Services,$((Get-ADRootDSE).configurationNamingContext)" -Properties tombstoneLifetime. Forests created on Windows Server 2003 SP1 or later set the value to 180 days; if the attribute is not set, the effective value is 60 days. Verify all backups are newer than TSL. Document TSL for DR planning. Ensure backup retention policy accounts for TSL. | Lingering object prevention ensured. Backup validity window confirmed. DR procedures aligned with TSL constraints. Restore operations within supported timeframe. Restoring backup older than TSL creates lingering objects causing replication issues. | Manual verification - critical for backup policy validation |
| 8 | AD Partition Health Check | Verify health of all Active Directory partitions (Domain, Configuration, Schema, Application partitions) and their replication status. | Enumerate all partitions: (Get-ADRootDSE).namingContexts. For each partition, verify replication: repadmin /showrepl. Check partition metadata: dcdiag /test:crossrefvalidation. Review application partition health (DNS, DFS if applicable). Verify all DCs hold expected partition replicas. | All AD partitions healthy and replicating. Application partition issues identified (DNS, DFS). Partition metadata consistent. Forest-wide data integrity confirmed. Partition replication failures cause service outages (DNS, DFS). | Can be automated with PowerShell - Test-ADPartitionHealth |
| 9 | Offline AD Database Defragmentation and Compaction | Perform offline defragmentation of NTDS.dit to reclaim white space, reduce database size, and improve performance when white space exceeds thresholds. | Schedule DC maintenance window. Boot DC to Directory Services Restore Mode (DSRM). Use ntdsutil: "activate instance ntds" > "files" > "compact to c:\temp". Review compaction results (space reclaimed). Replace original ntds.dit with compacted version. Delete log files. Reboot to normal mode. Verify AD functionality. Only perform if white space >30% or performance issues observed. | Database size reduced (can reclaim 10-40% depending on white space). Performance improved through defragmentation. Storage space reclaimed. Backup and restore times reduced. Requires DC downtime during compaction (plan for multiple hours on large databases). | Manual process requiring downtime - schedule during maintenance window with RFC approval |
| 10 | DNS Aging and Scavenging Effectiveness Review | Review DNS aging and scavenging configuration effectiveness to ensure stale resource records are being removed as intended without impacting valid records. | Review server scavenging settings: Get-DnsServerScavenging. Check no-refresh interval (default 7 days) and refresh interval (default 7 days). Verify aging is enabled on each zone: run Get-DnsServerZoneAging with -Name for every zone and confirm AgingEnabled is True. Review scavenging logs (Event ID 2501-2505). Compare zone record counts over time. Identify stale record accumulation. Verify that dynamic updates refresh record timestamps correctly. | DNS zone hygiene maintained. Stale records removed automatically. DNS database size controlled. Zone transfer and replication efficiency maintained. Query performance optimized. Ineffective scavenging allows unlimited stale record growth. | Configuration automated - effectiveness review requires manual analysis |
| 11 | DHCP Database Defragmentation | Compact DHCP database to reclaim space and improve performance after high lease turnover periods | Schedule maintenance window. Stop DHCP service using Stop-Service DHCPServer. Navigate to DHCP database directory (typically C:\Windows\System32\DHCP). Run jetpack.exe dhcp.mdb temp.mdb to compact database. Verify successful compaction. Restart DHCP service using Start-Service DHCPServer. Test client lease acquisition. | Database fragmentation reduced improving query performance. File size bloat eliminated. Disk space reclaimed. Backup and restore times improved. Requires brief DHCP service downtime during compaction. | ○ Partial - command scriptable but requires maintenance window and service interruption |
| 12 | Review DHCP Option Configurations | Audit server-level, scope-level, and reservation-level DHCP options for accuracy and consistency across infrastructure | Export all DHCP options using Get-DhcpServerv4OptionValue: without a scope for server-level options, with -ScopeId for scope options, and with -ReservedIP for reservation options. Review against standards document. Verify DNS servers (option 6), routers (option 3), domain name (option 15), WINS servers if applicable. Check for conflicting option inheritance. | Incorrect options cause DNS resolution failures, incorrect default gateways, and connectivity issues. Conflicting option inheritance creates inconsistent client configuration. Standardized options ensure predictable network behavior. | ○ No - configuration review requires architectural knowledge and standards compliance validation |
| 13 | Comprehensive DHCP Failover Testing | Test full failover scenarios including partner down, replication recovery, and manual intervention procedures | Schedule comprehensive test window. Test scenarios: normal failover (service stop), partner communication failure (firewall block), database corruption recovery, manual failover intervention. Measure client impact during each scenario. Verify replication resumes after recovery. Document actual vs expected behavior. Update runbooks with findings. | Incomplete testing leaves failure modes undiscovered until production incidents. Edge cases identified and documented. Recovery procedures validated under various failure conditions. Confidence in HA design verified through comprehensive testing. | ○ No - requires planning, execution, and validation across multiple failure scenarios |
| 14 | Review DHCP Lease Duration Settings | Validate lease durations are appropriate for environment type and network characteristics | Query all scope lease durations using Get-DhcpServerv4Scope \| Select ScopeId, LeaseDuration. Review against policy: workstations (8 days typical), guest networks (shorter - 4-8 hours), IoT/servers (longer or reservation). Identify scopes with non-standard durations. Document business justification for exceptions. Adjust as needed. | Improper lease durations cause excessive DHCP renewal traffic (too short) or stale lease accumulation (too long). Network performance impacted by renewal storms. Address exhaustion from abandoned long leases. | ○ No - policy decision requires architectural review and business context |
| 15 | Audit DHCP Reservation Usage and Cleanup | Review all DHCP reservations and remove obsolete entries for decommissioned devices to reclaim IP space | Export all reservations using Get-DhcpServerv4Scope \| Get-DhcpServerv4Reservation \| Export-Csv (the reservation cmdlet needs a scope, so scopes are piped in). Test connectivity to reserved addresses using Test-Connection. Cross-reference with CMDB for device status. Identify reservations for decommissioned systems (failed connectivity + not in CMDB). Document findings. Remove obsolete reservations after approval. Archive removed reservations for reference. | Reservation sprawl creates IP address waste. Obsolete reservations cause confusion during troubleshooting. Address space efficiency maintained. Documentation accuracy improved enabling faster problem resolution. | ○ Partial - detection of inactive reservations scriptable, cleanup requires approval process |
| 16 | Comprehensive IPv6 Security Assessment | Conduct thorough security audit of entire IPv6 implementation including firewall rules, ACLs, monitoring coverage, and incident response procedures | Perform penetration testing targeting IPv6 stack (rogue RA, ND spoofing, ICMPv6 attacks). Verify firewall rules provide equivalent protection for IPv4 and IPv6. Test IDS/IPS detection of IPv6 attacks. Review SIEM dashboards for IPv6 visibility. Validate incident response runbooks cover IPv6 scenarios. Audit transition mechanism status (all should be disabled). Document findings and remediate gaps. | IPv6 security posture validated. Blind spots eliminated through comprehensive testing. Incident response readiness verified. Compliance with security frameworks (NIST, CIS Benchmarks) demonstrated. Audit evidence collected. | × Manual - requires security assessment team, penetration testing tools, human analysis |
| 17 | DFS replication schedule review | Evaluate if current replication schedules still align with business operations and WAN utilization patterns. | Review replication windows, assess if business hours changed, analyze WAN bandwidth utilization trends. Adjust schedules if needed. | Moderate — ensures replication doesn't impact business operations and maximizes WAN efficiency. | Manual (requires business context) |
| 18 | DFS security and permissions audit | Comprehensive review of DFS share permissions, NTFS permissions, and administrative access to DFS infrastructure. | Audit DFS share permissions, verify NTFS permissions alignment, review membership of DFS Administrators group, validate least-privilege access. | High — prevents unauthorized data access and ensures security compliance. | Partial (permission reporting automated) |
| 19 | DFS documentation and runbook update | Update DFS architecture diagrams, namespace structure, replication topology, and operational runbooks. | Review documentation for accuracy, update topology diagrams, verify runbook procedures, commit changes to version control. | Moderate — ensures accurate documentation for troubleshooting and knowledge transfer. | Manual (documentation requires human review) |
| 20 | Hyper-V Security Baseline Audit | Compare host config vs CIS/Microsoft baseline | Run baseline tools; remediate deviations. | Prevents drift weakening host hardening. | Partial |
| 21 | Network Configuration Consistency | Validate vSwitch / VLAN / QoS uniformity | Export configs; diff across nodes. | Removes drift causing LM or failover anomalies. | Yes |
| 22 | Documentation Currency | Update cluster diagrams & runbooks | Refresh topology, dependency & escalation data. | Improves incident response speed (lower MTTR). | No |
| 23 | Naming Convention Audit | Ensure all GPOs follow prefix + Tier pattern. | List & flag deviations; schedule rename. | Preserves clarity & troubleshooting speed. | Yes (Script) |
| 24 | GPO Count KPI Review | Assess total & exception counts vs targets. | Generate metrics; propose reductions. | Prevents sprawl re-emergence. | Partial (Script + Analysis) |
| 25 | WMI Filter Relevance | Remove obsolete OS/hardware filters. | Enumerate; retire stale predicates. | Improves performance & reduces complexity. | Yes (Script) |
| 26 | Cross-Tier Contamination Review | Ensure Tier0 baselines not linked outside Tier0. | Enumerate links; remediate anomalies. | Maintains privileged boundary. | Yes (Script) |
| 27 | Legacy Preference Migration | Identify legacy Preferences to modernize. | Map to ADMX or MDM CSP replacements. | Reduces technical debt. | Manual (Planning) |
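
The lease duration review (task 14 above) lends itself to a small filter that flags scopes outside an approved list. The following is a minimal sketch, not an existing cmdlet: the function name `Find-NonStandardLeaseDuration`, its `-Approved` parameter, and the sample scopes are all assumptions made for illustration. It only relies on the `ScopeId` and `LeaseDuration` properties named in the task.

```powershell
# Sketch only: flags DHCP scopes whose lease duration is not in an approved list.
# Function and parameter names are illustrative, not part of the DhcpServer module.
function Find-NonStandardLeaseDuration {
    [CmdletBinding()]
    param(
        # Objects exposing ScopeId and LeaseDuration (as Get-DhcpServerv4Scope returns).
        [Parameter(Mandatory, ValueFromPipeline)]
        [psobject]$Scope,

        # Lease durations that policy accepts without further justification.
        [Parameter(Mandatory)]
        [timespan[]]$Approved
    )
    process {
        $lease = [timespan]$Scope.LeaseDuration
        if ($Approved -notcontains $lease) {
            # Emit only the scopes that need a documented exception.
            [pscustomobject]@{
                ScopeId       = $Scope.ScopeId
                LeaseDuration = $lease
            }
        }
    }
}

# Example with in-memory sample data (no DHCP server required).
$sample = @(
    [pscustomobject]@{ ScopeId = '10.0.1.0'; LeaseDuration = (New-TimeSpan -Days 8) }
    [pscustomobject]@{ ScopeId = '10.0.2.0'; LeaseDuration = (New-TimeSpan -Hours 4) }
    [pscustomobject]@{ ScopeId = '10.0.3.0'; LeaseDuration = (New-TimeSpan -Days 30) }
)
$approved = (New-TimeSpan -Days 8), (New-TimeSpan -Hours 4)
$flagged  = $sample | Find-NonStandardLeaseDuration -Approved $approved
$flagged | Format-Table -AutoSize

# Against a live server, the same filter would be fed by the cmdlet from the task:
# Get-DhcpServerv4Scope | Find-NonStandardLeaseDuration -Approved $approved
```

Keeping the policy in a parameter rather than in the function body means the approved durations can come from the standards document instead of being hard-coded.
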
Annual Tasks
| # | Name | Description | Task | Impact Definition | Automated |
|---|---|---|---|---|---|
| 1 | Trust Relationship Audit | Review if any given trust relationship is still valid and if it should continue. Check for the correct configuration of the trust and ensure it meets current security requirements. | Enumerate all trusts: Get-ADTrust -Filter *. For each trust, verify business justification still exists. Check trust properties: direction (one-way vs. two-way), type (forest vs. external), authentication scope (forest-wide vs. selective). Test trust: netdom trust /verify. Review SID filtering status. Validate trust authentication levels. Document findings and remove obsolete trusts. | Obsolete trusts removed reducing attack surface. Trust configurations validated against security baseline. SID filtering properly configured preventing privilege escalation across boundaries. Trust removal will break cross-domain/forest access - requires thorough validation before removal. | Manual review with stakeholder coordination - trust changes require RFC and business approval |
| 2 | Review Groups with no members | Empty groups are considered orphaned objects. Delete those groups if not used to maintain directory hygiene and reduce clutter. | Query all groups: Get-ADGroup -Filter * -Properties Members \| Where {-not $_.Members}. Filter out built-in groups and groups used for delegation even when empty. Verify group not actively used for permissions (check ACLs on resources). Generate list for review. Delete confirmed orphaned groups after approval. | Directory clutter reduced. Group enumeration performance improved. Administrative overhead decreased. If group is actually used (e.g., in ACLs or GPO security filtering even when empty), deletion will remove that permission structure. | Requires an RFC |
| 3 | Review ownership of important AD objects | Ownership attribute grants modification rights over the object. Any sensitive asset of the domain MUST be owned by Domain Admins to prevent unauthorized modification. | Query ownership of critical objects: Domain Controllers OU, Sites, Subnets, AdminSDHolder, default domain policy. Use Get-Acl against the AD: drive and check the Owner property. For DCs: Get-ADComputer -Filter 'PrimaryGroupID -eq 516' \| ForEach-Object { Get-Acl "AD:$($_.DistinguishedName)" }. Reset ownership to Domain Admins using Set-Acl or ADSI if incorrect. Document any exceptions. | Ownership attack vector eliminated. Sensitive infrastructure protected from unauthorized modification. Object ownership properly controlled per security policy. Ownership change may affect delegated administration if custom ownership was intentional - validate before changing. | Can be partially automated with PowerShell - Test-CriticalObjectOwnership and Set-ADObjectOwner |
| 4 | AdminSDHolder permissions | Verify that permissions have not been changed. Apart from default permissions, there might be additional legitimate permissions as described in the Delegation Model. Amend if necessary. | Export AdminSDHolder ACL: (Get-Acl "AD:CN=AdminSDHolder,CN=System,$((Get-ADDomain).DistinguishedName)").Access. Compare against documented baseline. Verify default permissions intact (Domain Admins, Enterprise Admins full control). Check for unauthorized additions. Validate legitimate delegation model exceptions. Remove unauthorized ACEs. Trigger SDProp propagation by setting the RunProtectAdminGroupsTask operational attribute on RootDSE to 1, or wait for the automatic 60-minute cycle. | AdminSDHolder integrity maintained. Unauthorized changes to privileged account permissions prevented. Delegation model properly enforced. Permission changes may affect legitimate delegation - validate carefully before modification. | Manual review against delegation documentation - PowerShell assisted with Compare-AdminSDHolderACL |
| 5 | Disaster Recovery Full Forest Recovery Test | Perform complete forest recovery drill to validate documented procedures, identify gaps, and ensure team readiness for catastrophic AD failure. | Build isolated test environment. Simulate forest-level failure. Execute documented forest recovery procedures: restore first DC in forest from system state backup, clean up metadata for other DCs, rebuild forest infrastructure, restore remaining DCs or rebuild. Validate all services (DNS, SYSVOL replication, authentication, GPO application). Document time to recovery (RTO actual). Update procedures based on lessons learned. | Forest recovery procedures validated and documented. Team trained on critical recovery processes. Recovery time objectives (RTO) known. Gaps in documentation identified and corrected. Confidence in disaster recovery capability. Untested recovery procedures often fail during actual disasters. | Requires significant lab environment and planning - annual drill with change management approval |
| 6 | Active Directory Security Baseline Audit | Comprehensive audit of AD security configuration against industry best practices (CIS Benchmarks, Microsoft Security Baselines, DISA STIG) to identify security gaps. | Run security baseline assessment tools: Microsoft Security Compliance Toolkit, CIS-CAT, PowerSTIG. Review GPO settings against benchmarks. Audit privileged group memberships. Verify authentication protocols (disable NTLM where possible). Check legacy protocol usage (SMBv1, LM hashes). Review delegation model implementation. Assess Tier model compliance. Document findings and remediation plan. | Security posture validated against industry standards. Compliance gaps identified. Remediation priorities established. Regulatory compliance demonstrated (PCI-DSS, HIPAA, SOX). Security drift prevented through annual revalidation. Without baseline audit, security degradation goes undetected. | Manual review process - use automated scanning tools to accelerate assessment |
| 7 | DHCP Infrastructure Architecture Review | Comprehensive review of DHCP design, high availability strategy, capacity planning, and alignment with business needs | Review current DHCP architecture documentation. Assess high availability implementation (failover vs split-scope). Evaluate scope design against network growth. Review server placement and site topology. Analyze performance baselines and capacity trends. Compare against DHCP best practices. Identify architecture gaps and improvement opportunities. Plan multi-year roadmap. | Architecture drift leads to inefficiencies, security gaps, and scalability issues. Long-term capacity requirements identified. Business alignment validated. Technical debt documented. Investment priorities established for infrastructure improvements. | ○ No - strategic review requires architect involvement and business context |
| 8 | DHCP Security Audit | Full security assessment including permissions, audit logging, network segmentation, and Tier 0 controls | Perform comprehensive security review using CIS benchmarks, Microsoft baselines, and internal security standards. Audit DHCP Administrators group membership and permission delegation. Verify audit logging configuration and SIEM integration. Review network segmentation and DHCP server placement. Validate Tier 0 classification and controls. Assess rogue DHCP detection capabilities. Document findings and create remediation plan. | Unidentified security weaknesses accumulate into exploitable attack vectors. Compliance violations prevented through systematic assessment. Security controls validated. Audit findings addressed proactively. Regulatory compliance demonstrated (SOX, PCI-DSS, HIPAA). | ○ No - security expertise required for comprehensive assessment |
| 9 | DHCP Disaster Recovery Plan Update | Update DHCP disaster recovery runbooks based on infrastructure changes and lessons learned from drills | Review current DR documentation and runbooks. Incorporate lessons learned from quarterly DR drills. Update procedures for infrastructure changes (new servers, scope additions, configuration changes). Validate contact information and escalation procedures. Document recovery dependencies (DNS, AD, network). Test runbook procedures for accuracy. Publish updated documentation and train team. | Outdated runbooks fail during actual disasters when procedures no longer match reality. Recovery time objectives (RTO) affected by inaccurate documentation. Team confusion during crisis prevented through current procedures. Infrastructure changes incorporated ensuring runbook accuracy. | ○ No - documentation authoring and validation requires human expertise |
| 10 | DHCP Performance Benchmark | Establish annual performance baselines for lease assignment latency, database query times, and failover replication lag | Collect comprehensive performance metrics over representative period. Measure lease assignment latency (DORA process timing). Analyze database query response times. Monitor failover replication lag. Document scope utilization patterns. Establish performance baselines for future quarterly comparisons. Generate annual performance report. Identify performance trends requiring attention. | Without baselines, performance degradation goes unnoticed until becoming critical. Annual benchmarking enables year-over-year performance comparison. Capacity planning informed by performance data. Performance trends identified for proactive optimization. | Can be automated - metrics collection automated, baseline establishment and analysis requires human review |
| 11 | Review DHCP Administrator Training and Knowledge Transfer | Assess staff knowledge of DHCP operations and provide training on new features, procedures, or tools | Review operational procedures with team. Assess knowledge gaps through interviews or practical exercises. Provide training on DHCP best practices, troubleshooting techniques, and new Windows Server features. Document tribal knowledge from senior administrators. Conduct knowledge transfer sessions. Update operational documentation based on team feedback. Cross-train team members to reduce single points of failure. | Knowledge gaps lead to operational mistakes and longer incident resolution times. Single points of failure eliminated through cross-training. Tribal knowledge documented preventing knowledge loss from staff turnover. Team capability improved enabling more efficient operations. | ○ No - training and knowledge transfer require human interaction and adult learning principles |
| 12 | Seize FSMO role | Seize a role when the original owner will not be recovered. | Follow emergency runbook: verify backups, seize via NTDSUTIL/PowerShell, document actions. | Critical — restores functionality when role owner is permanently lost. | No (manual) |
| 13 | Transfer FSMO roles during decommission | Move FSMO roles off a DC that is being retired or replaced. | Perform graceful transfer and validate replication; update inventory. | High — prevents accidental outages during planned changes. | Partial (scripted transfer) |
| 14 | Schema change coordination | Planned schema updates require availability of the Schema Master owner. | Schedule maintenance with Schema Master owner, take backups, perform change, validate. | High — schema changes are forest-wide and potentially irreversible. | No (manual/planned) |
| 15 | FSMO Emergency patching | Apply urgent security patches to FSMO DCs outside normal windows. | Follow emergency change control; ensure backups and rollbacks are available. | High — patching can cause reboots or service interruptions. | Partial |
| 16 | FSMO Respond to split-brain or replication corruption | Investigate and remediate AD inconsistencies after catastrophic events. | Invoke advanced recovery procedures per runbook and coordinate across teams. | Critical — prevents data loss and inconsistent authentication behavior. | No |
| 17 | DFS replication and Windows patching coordination | Plan DFS replication member patching to minimize service disruption and prevent prolonged replication outages. | Stagger patching of replication partners (never patch hub and all spokes simultaneously), verify replication health before/after patching, document maintenance windows. | High — prevents cascading replication failures and data synchronization gaps during patching cycles. | Manual (requires change management coordination) |
| 18 | DFS architecture review | Comprehensive evaluation of DFS Namespace structure, replication topology, and alignment with organizational changes (mergers, branch closures, datacenter migrations). | Review namespace hierarchy, assess if folder targets reflect current infrastructure, evaluate replication topology efficiency, recommend consolidations or redesigns. | Moderate — ensures DFS architecture evolves with business needs and prevents technical debt accumulation. | Manual (requires strategic planning) |
| 19 | DFS operations training and knowledge transfer | Conduct annual training for IT staff on DFS troubleshooting, replication health monitoring, and disaster recovery procedures. | Schedule training sessions, review runbooks with team, practice DR scenarios in lab environment, update documentation based on lessons learned. | Moderate — ensures team readiness and reduces mean time to resolution (MTTR) during incidents. | Manual (training requires human instruction) |
| 20 | Virtualization Architecture Review | Strategic assessment vs business roadmap | Evaluate scaling, HCI adoption, DR posture, cost efficiency. | Identifies modernization & optimization opportunities. | No |
| 21 | Full DR Hyper-V Exercise | Execute replica failover & recovery timing | Run planned drill; measure achieved RPO/RTO. | Validates disaster readiness & compliance. | No |
| 22 | Operations Team Skills Refresh | Training on new Hyper-V/cluster features | Workshops: LM perf, CAU tuning, RDMA troubleshooting. | Reduces single-point knowledge risks. | No |
| 23 | Baseline Rationalization Workshop | Stakeholder review of all baselines & exceptions. | Conduct workshop; record decisions. | Keeps architecture intentionally minimal. | Manual (Workshop) |
| 24 | AGPM Role Recertification | Validate Approver/Editor/Reviewer memberships. | Export roles; confirm justification. | Prevents privilege creep undermining control. | Partial (Script + Manual) |
| 25 | Full DR Simulation | Restore all Tier0 baselines in isolated lab. | Automate import; validate checksums. | Assures end-to-end recoverability. | Partial (Script) |
| 26 | Decision Matrix Refresh | Update exception criteria with new platforms. | Review emerging tech & security patterns. | Maintains governance clarity. | Manual (Review) |
| 27 | External Security Validation | Independent assessment of baseline effectiveness. | Engage audit; remediate findings. | Provides external assurance & compliance evidence. | Manual (External Audit) |
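
The empty-group review (annual task 2 above) can be narrowed to a reviewable candidate list before anyone checks ACLs by hand. The sketch below is an assumption-laden illustration: the function name `Select-EmptyGroupCandidate`, the `-Exclude` parameter, and the sample groups are hypothetical. It treats a group as a system group when `isCriticalSystemObject` is true or when it lives in the Builtin container, and it does not replace the manual check for use in ACLs or GPO security filtering.

```powershell
# Sketch only: narrows empty groups down to deletion candidates for review.
# Function name, -Exclude parameter, and sample data are illustrative.
function Select-EmptyGroupCandidate {
    [CmdletBinding()]
    param(
        # Group objects exposing Name, DistinguishedName, Members, isCriticalSystemObject.
        [Parameter(Mandatory, ValueFromPipeline)]
        [psobject]$Group,

        # Names of groups that are intentionally kept empty (e.g. delegation groups).
        [string[]]$Exclude = @()
    )
    process {
        $isEmpty    = -not $Group.Members
        $isSystem   = ($Group.isCriticalSystemObject -eq $true) -or
                      ($Group.DistinguishedName -like '*,CN=Builtin,*')
        $isExcluded = $Exclude -contains $Group.Name

        if ($isEmpty -and -not $isSystem -and -not $isExcluded) {
            $Group | Select-Object Name, DistinguishedName
        }
    }
}

# Example with in-memory sample data (no domain required).
$sampleGroups = @(
    [pscustomobject]@{ Name = 'App-Finance-RW'; DistinguishedName = 'CN=App-Finance-RW,OU=Groups,DC=example,DC=com'
                       Members = @('CN=alice,OU=Users,DC=example,DC=com'); isCriticalSystemObject = $null }
    [pscustomobject]@{ Name = 'Print Operators'; DistinguishedName = 'CN=Print Operators,CN=Builtin,DC=example,DC=com'
                       Members = @(); isCriticalSystemObject = $true }
    [pscustomobject]@{ Name = 'DLG-Helpdesk-Reset'; DistinguishedName = 'CN=DLG-Helpdesk-Reset,OU=Groups,DC=example,DC=com'
                       Members = @(); isCriticalSystemObject = $null }
    [pscustomobject]@{ Name = 'Old-Project-X'; DistinguishedName = 'CN=Old-Project-X,OU=Groups,DC=example,DC=com'
                       Members = @(); isCriticalSystemObject = $null }
)
$candidates = $sampleGroups | Select-EmptyGroupCandidate -Exclude 'DLG-Helpdesk-Reset'
$candidates | Format-Table -AutoSize

# Against a live domain, the input would come from the query in the task:
# Get-ADGroup -Filter * -Properties Members, isCriticalSystemObject |
#     Select-EmptyGroupCandidate -Exclude 'DLG-Helpdesk-Reset'
```

Note that primary-group membership is not stored in the member attribute, so a group used only as a primary group also appears empty; the output is a review list, not a delete list.
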
As-Needed Tasks
| # | Name | Description | Task | Impact Definition | Automated |
|---|---|---|---|---|---|
| 1 | Group Membership Change - Critical Groups | Monitor and investigate immediate membership changes to critical privileged groups requiring instant security review. | Monitor membership changes on the Schema Admins, Enterprise Admins, Domain Admins, Server Operators, Backup Operators, Account Operators, Print Operators, and Replicator groups, and changes to the KRBTGT account. Alert on Event ID 4728, 4732, 4756 (member added to a global, local, or universal security-enabled group, respectively). Immediate investigation of unauthorized additions. Validate with change management records. | Unauthorized privileged access detected immediately. Security incident response triggered. Forensics capability maintained through real-time alerts. Compromise contained before lateral movement. Critical for Tier 0 breach detection. | Integrate with SIEM for real-time alerting |
| 2 | Post-Breach Password Reset Campaign | Execute mass password reset for all accounts after confirmed credential compromise or security breach. | Identify scope of compromise. Prioritize Tier 0 accounts first (immediate reset). Force password change for affected users: Set-ADUser -ChangePasswordAtLogon $true. Reset service account passwords (coordinate with application teams). Reset KRBTGT twice (see semi-annual task). Invalidate existing Kerberos tickets. Monitor for authentication failures. | Compromised credentials invalidated. Attacker access terminated. Golden ticket attacks mitigated by KRBTGT reset. Service disruption minimized through coordination. Mass password reset causes temporary authentication issues - plan carefully. | Requires incident response coordination - use prepared PowerShell scripts for rapid execution |
| 3 | Emergency Privilege Elevation Review | Review and revoke temporary privileged access granted during emergencies or incidents after resolution. | Query recently added privileged group members with timestamp filter. Cross-reference with emergency access requests and tickets. Verify incident closure. Remove temporary access: Remove-ADGroupMember. Disable break-glass accounts if activated. Audit emergency access usage for compliance. | Temporary privileged access revoked preventing privilege creep. Break-glass accounts secured. Emergency access audit trail complete. Compliance maintained for privileged access reviews. Forgotten emergency access creates ongoing security risk. | Manual review coordinated with incident management - track via ticketing system |
| 4 | Schema Modification Review | Review and validate schema changes after extension (Exchange, ConfigMgr, applications) to ensure no unauthorized or malicious modifications. | Query schema version: Get-ADObject (Get-ADRootDSE).schemaNamingContext -Properties objectVersion (the schema partition lives under the forest root, so the domain DN is only correct in the root domain). Review schema modification history in Event Viewer (Event ID 1565). Document legitimate schema extensions. Identify unauthorized schema changes. Verify schema replication to all DCs. Backup schema partition before and after changes. | Schema integrity maintained. Unauthorized schema extensions detected. Application deployment schema requirements validated. Schema corruption identified early. Schema changes are forest-wide and permanent - cannot be easily reversed. | Requires Schema Admin coordination - manual review after any schema extension activity |
| 5 | FSMO Role Seizure Execution | Forcibly seize FSMO roles when role holder permanently failed and cannot be brought back online (disaster recovery scenario). | Verify original role holder truly offline and unrecoverable. Document business justification. Identify target DC for role seizure. Use ntdsutil: "seize [role]" for each affected role. Clean up metadata of failed DC. Verify role transfer successful: netdom query fsmo. Document seizure in change management. Monitor for replication issues post-seizure. | FSMO roles operational on surviving DCs. Critical AD operations resume (password changes, schema mods, RID allocation). Service restored after DC failure. Metadata cleanup prevents replication conflicts. Improper seizure can cause duplicate FSMO roles and corruption. | Emergency procedure - Requires RFC and senior AD administrator approval due to risk |
| 6 | Rapid Malware/Ransomware Response | Execute AD-specific containment and recovery procedures when malware or ransomware detected targeting domain infrastructure. | Isolate affected DCs from network immediately. Disable compromised accounts (especially admin accounts). Reset krbtgt twice to invalidate golden tickets. Change all service account passwords. Review recent AD changes (deleted OUs, modified groups). Restore from clean backup if necessary. Perform authoritative restore for targeted OUs. Scan all DCs for malware before reintegration. | Malware spread contained. AD infrastructure protected. Attacker persistence mechanisms eliminated. Ransomware encryption prevented on domain controllers. Clean recovery possible from backups. Rapid response prevents complete domain compromise. | Emergency IR procedure - coordinate with security team, requires tested runbooks |
| 7 | Domain Controller Failure or Crash Recovery | Respond to unexpected DC crash or hardware failure with proper recovery, metadata cleanup, and service restoration procedures. | Assess DC failure: hardware failure vs. software corruption. If hardware replacement: perform metadata cleanup on surviving DCs (ntdsutil metadata cleanup), remove DNS records, clean up SYSVOL references. Transfer/seize FSMO roles if DC held any. Promote replacement DC or restore from backup. If restoring: boot to DSRM, restore system state, verify AD database integrity, reboot to normal mode. Force replication from partners. Validate all AD services operational. | DC service restored with minimal downtime. Orphaned metadata cleaned preventing replication errors. FSMO roles redistributed if necessary. Replication topology updated automatically by KCC. Directory data loss prevented through backup restore. Improper recovery leaves orphaned metadata causing ongoing issues. | Emergency procedure - requires senior AD administrator and may need vendor engagement for hardware |
| 8 | Emergency Active Directory Rollback (Authoritative Restore) | Perform authoritative restore to roll back unintended or malicious AD changes (mass deletion, permission changes, group modifications) affecting critical objects. | Identify scope of damage: deleted objects, modified attributes, timeframe. Take DC offline and boot to DSRM. Restore system state backup from before incident. Use ntdsutil authoritative restore: "activate instance ntds" > "authoritative restore" > restore specific subtree or objects. Increment object version numbers to force replication. Reboot DC. Monitor authoritative replication to all DCs. Verify restored objects replicated correctly. | Malicious or accidental changes reversed. Deleted objects recovered with full attributes. Damage contained to specific scope. Directory rolled back to known-good state. Service restored. Authoritative restore requires recent backup and causes temporary inconsistency during replication. | Emergency procedure - requires RFC approval, extensive coordination, and tested procedures |
| 9 | DFS-R database corruption recovery (Event 2212) | When Event ID 2212 appears indicating DFS-R database corruption, immediate recovery action required to restore replication. | Stop DFSR service, delete corrupted database (C:\System Volume Information\DFSR\), restart service to rebuild database. Use Repair-DfsReplication.ps1 script for automated recovery. Monitor Event ID 4112 (database initialized successfully). | Critical — database corruption halts replication entirely. Recovery typically completes in 15-30 minutes. Data loss risk minimal (existing files preserved). | Partial (detection automated, recovery requires approval) |
| 10 | DFS-R stuck replication (persistent high backlog) | Replication backlog exceeds 1,000 files and persists for >24 hours despite normal service operation. | Check Event IDs 4012 (staging quota), 4302 (dirty shutdown), 6002 (communication failure). Verify network connectivity between partners. Check staging quota utilization. Force sync using Sync-DfsReplication.ps1. If unsuccessful, consider re-initializing replication using "dfsradmin membership set /rgname:GroupName /rfname:FolderName /memname:ServerName /isprimary:true". | High — prolonged replication delays cause data inconsistency. Users may access stale data. Recovery can take hours to days depending on backlog size. | Partial (detection automated, resolution varies) |
| 11 | Add new DFS replication member (branch office opening, DR site) | New server needs to join existing DFS replication group (typically due to site expansion or disaster recovery preparation). | Use New-DfsReplicationGroup.ps1 script with -AddMember parameter, or manually add member using Add-DfsrMember, configure connection using Add-DfsrConnection, set membership using Set-DfsrMembership. Perform initial sync (prestage data or full replication). Monitor backlog until new member fully synchronized. | Moderate — initial sync can consume significant WAN bandwidth. Recommend prestaging data via backup/restore to minimize network impact. | Partial (membership configuration automated, initial sync requires planning) |
| 12 | Investigate DFS stale data reports (users accessing outdated files) | User reports indicate they're accessing outdated file versions, suggesting replication lag or namespace referral issues. | Check replication backlog for affected folder: Get-DfsrBacklogReport.ps1 -SourceServer ServerA -DestinationServer ServerB. Verify namespace referrals: dfsutil.exe client referral \\domain.com\namespace\folder. Check file timestamps across all targets. If one target significantly behind, investigate specific replication path (network, staging quota, Event Logs). | Moderate — stale data can cause business process errors (e.g., editing outdated documents). Typically indicates localized replication issue rather than systemic failure. | Manual (requires user context and investigation) |
| 13 | DFS replication schedule change (bandwidth constraint, business hours shift) | WAN bandwidth constraints require adjusting replication schedules, or changes in business operations necessitate different replication windows. | Review the current schedule for each connection: Get-DfsrConnectionSchedule -GroupName GroupName -SourceComputerName ServerA -DestinationComputerName ServerB. Modify schedules using Set-DfsrConnectionSchedule with -ScheduleType (UseGroupSchedule, Always, or Never), or define a custom window with the -Day and -BandwidthDetail parameters. Test the new schedule and monitor for replication delays. | Low — schedule changes don't disrupt replication immediately, but poorly planned schedules can cause prolonged data staleness. | Manual (requires business context and change management) |
| 23 | Zero-Day Hardening Overlay | Rapid deployment of emergent mitigation settings. | Create exception GPO; set sunset date. | Reduces exposure window during exploitation. | Manual (Expedited) |
| 24 | Rollback Misapplied Policy | Undo accidental or harmful baseline change. | Restore last good via AGPM / backup. | Minimizes outage & instability. | Partial (AGPM + Script) |
| 25 | New OS Baseline Introduction | Onboard baseline for new Windows version. | Clone template; adapt ADMX; stage approval. | Ensures secure posture at adoption. | Manual (Initial Build) |
| 26 | Legacy Preference Modernization | Migrate legacy Preferences to ADMX / MDM CSP. | Inventory & map replacements. | Reduces technical debt. | Manual (Project) |
| 27 | Decommission Test / Pilot GPO | Remove obsolete pilot GPOs post rollout. | Confirm no links; archive & delete. | Prevents clutter & accidental reuse. | Partial (Script) |
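
The "confirm no links; archive & delete" procedure in task 27 could be sketched roughly as follows. This is a minimal sketch that assumes the GroupPolicy RSAT module is installed; the function name Remove-PilotGpoIfUnlinked and the backup path are hypothetical, and the link check only covers links that appear in the GPO report for the current domain.

```powershell
# Sketch only: requires the GroupPolicy module (RSAT).
# Remove-PilotGpoIfUnlinked is an illustrative name, not an existing cmdlet.
function Remove-PilotGpoIfUnlinked {
    [CmdletBinding(SupportsShouldProcess, ConfirmImpact = 'High')]
    param(
        [Parameter(Mandatory)] [string] $Name,
        [Parameter(Mandatory)] [string] $BackupPath
    )

    $gpo = Get-GPO -Name $Name -ErrorAction Stop

    # The XML report contains one <LinksTo> element per linked container.
    [xml]$report = Get-GPOReport -Guid $gpo.Id -ReportType Xml
    $links = @($report.GPO.LinksTo | Where-Object { $_ })

    if ($links.Count -gt 0) {
        Write-Warning "GPO '$Name' is still linked to: $($links.SOMPath -join '; ')"
        return
    }

    if ($PSCmdlet.ShouldProcess($Name, 'Back up and delete GPO')) {
        # Archive first so the deletion can be reversed with Restore-GPO.
        Backup-GPO -Guid $gpo.Id -Path $BackupPath -Comment 'Archived before decommission' | Out-Null
        Remove-GPO -Guid $gpo.Id
    }
}
```

Running it with -WhatIf first, for example `Remove-PilotGpoIfUnlinked -Name 'Pilot-Baseline' -BackupPath 'D:\GpoArchive' -WhatIf` (both values are placeholders), shows what would be removed without changing anything.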
Implementation and Best Practices
Successful Active Directory housekeeping requires a structured approach and organizational commitment:
Getting Started
- Prioritize Critical Tasks: Begin with daily and weekly tasks that address immediate security risks (privileged group audits, replication monitoring, stale account management)
- Establish Baseline: Document current state before implementing cleanup tasks to understand scope and measure progress
- Start Small: Implement one frequency category at a time rather than attempting all tasks simultaneously
- Validate Changes: Always test procedures in a non-production environment before applying them to production Active Directory
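One way to establish the baseline mentioned above is to snapshot a few account counts before any cleanup runs. The following is a minimal sketch: New-HousekeepingBaseline is a hypothetical helper, and the 90-day inactivity threshold is an assumption rather than a value taken from this document.

```powershell
# Sketch only: New-HousekeepingBaseline is an illustrative helper.
# The 90-day default is an assumed threshold; adjust it to local policy.
function New-HousekeepingBaseline {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Account,
        [int] $StaleAfterDays = 90,
        [datetime] $AsOf = (Get-Date)
    )

    $cutoff = $AsOf.AddDays(-$StaleAfterDays)

    # An enabled account counts as stale when it has never logged on
    # or its last recorded logon is older than the cutoff.
    $stale = @($Account | Where-Object {
        $_.Enabled -and (-not $_.LastLogonDate -or $_.LastLogonDate -lt $cutoff)
    })

    [pscustomobject]@{
        CapturedOn    = $AsOf
        TotalAccounts = $Account.Count
        Disabled      = @($Account | Where-Object { -not $_.Enabled }).Count
        StaleEnabled  = $stale.Count
    }
}

# Usage on a host with the ActiveDirectory module:
# $users = Get-ADUser -Filter * -Properties LastLogonDate
# New-HousekeepingBaseline -Account $users | Export-Csv .\baseline.csv -NoTypeInformation
```

Keeping the counting logic separate from the directory query lets the same function be exercised against test data in a lab. LastLogonDate is derived from the replicated lastLogonTimestamp attribute, which can lag actual logons by several days, so the stale count is an approximation.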
Automation Strategy
Leverage automation to maintain consistency and reduce administrative burden:
- EguibarIT PowerShell Modules: Utilize the Delegation, Housekeeping, and Privileged Account Management modules for common housekeeping tasks
- Scheduled Tasks: Configure Windows Task Scheduler or Azure Automation to run scripts on defined schedules
- Alerting: Integrate with monitoring systems (SIEM, email, Teams) for automated notifications of issues requiring attention
- Reporting: Generate regular reports showing housekeeping metrics and trends over time
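As an illustration of the Scheduled Tasks item, a housekeeping script could be registered with the built-in ScheduledTasks cmdlets roughly as shown here. This is a sketch under stated assumptions: Register-HousekeepingTask, the task name, and the 06:00 start time are placeholders, and the task runs as the registering user unless a principal is supplied.

```powershell
# Sketch only: Register-HousekeepingTask, the task name, and the start time
# are placeholders. Uses the ScheduledTasks module that ships with Windows.
function Register-HousekeepingTask {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)] [string] $ScriptPath,
        [string] $TaskName = 'AD-Housekeeping-Daily',
        [datetime] $At = '06:00'
    )

    # Run the script with Windows PowerShell and no user profile.
    $action  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument "-NoProfile -File `"$ScriptPath`""
    $trigger = New-ScheduledTaskTrigger -Daily -At $At

    if ($PSCmdlet.ShouldProcess($TaskName, 'Register scheduled task')) {
        Register-ScheduledTask -TaskName $TaskName -Action $action -Trigger $trigger -Description 'Runs AD housekeeping checks'
    }
}
```

In production, it is usually preferable to run such a task under a dedicated service identity with only the rights the script needs, passed through the -Principal parameter of Register-ScheduledTask.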
Change Management Integration
Many housekeeping tasks have operational impact and require proper change control:
- RFC Process: Tasks marked "Requires an RFC" must follow the formal change approval workflow
- Communication: Notify stakeholders before running tasks that may affect users or systems
- Rollback Plans: Maintain backups and rollback procedures for destructive operations
- Documentation: Keep detailed records of housekeeping activities for audit and troubleshooting
Measuring Success
Track key metrics to demonstrate the value of your housekeeping program:
- Number of stale accounts removed per period
- Reduction in replication errors and directory inconsistencies
- Percentage of privileged accounts properly segregated
- Time to resolve security audit findings
- Compliance scores for regulatory requirements (SOX, PCI-DSS, etc.)
- Reduction in AD-related helpdesk tickets
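Several of these metrics are reductions between two reporting periods, which can be computed from two snapshots. The helper below is a hypothetical sketch (Get-MetricReduction is not part of any module named in this document), and the sample figures in the usage lines are invented purely for illustration.

```powershell
# Sketch only: Get-MetricReduction is an illustrative helper.
# Returns the percentage reduction from Before to After, rounded to one decimal.
function Get-MetricReduction {
    param(
        [Parameter(Mandatory)] [double] $Before,
        [Parameter(Mandatory)] [double] $After
    )

    # A reduction is undefined when there was nothing to reduce.
    if ($Before -le 0) { return $null }

    [math]::Round((($Before - $After) / $Before) * 100, 1)
}

# Usage with made-up sample values:
$previous = @{ StaleAccounts = 200; ReplicationErrors = 12 }
$current  = @{ StaleAccounts = 150; ReplicationErrors = 3 }

foreach ($metric in $previous.Keys) {
    $pct = Get-MetricReduction -Before $previous[$metric] -After $current[$metric]
    "{0}: {1}% reduction" -f $metric, $pct
}
```

A negative result indicates that the metric grew between the two periods, which is worth flagging in the same report.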
Related Resources
For implementation assistance and automation tools, explore these related topics:
- EguibarIT PowerShell Module - Comprehensive delegation and housekeeping automation
- Housekeeping Module - Specialized scripts for AD maintenance tasks
- Privileged Account Hygiene - Tier 0 account management best practices
- Delegation Model - Framework for proper privilege segregation
- Tier Model - Enterprise administration tier architecture
Key Takeaway
Active Directory housekeeping is not optional—it's a critical security and operational practice. Implementing a structured housekeeping program reduces risk, improves performance, ensures compliance, and maintains the long-term health of your directory services infrastructure. Start today with high-priority tasks and expand your program iteratively.