Microsoft Sentinel data lake direct ingestion guidance

You can configure log sources to ingest directly into the data lake tier without mirroring to the analytics tier. Direct ingestion is useful when you have high-volume log sources that you want to retain for hunting and forensic purposes but don't need for real-time alerting.

By ingesting these sources directly into the data lake tier, you avoid analytics tier ingestion costs while still making the data available for KQL queries, Spark notebooks, and long-term analysis. This article helps you determine which log sources are good candidates for direct data lake ingestion based on their value for detection, hunting, and investigation workloads.
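To get a rough sense of the savings, you can compare the monthly ingestion cost of one high-volume source in each tier. The following is a minimal sketch, not a pricing calculator. The per-GB rates and the daily volume are hypothetical placeholders, and the function name is made up for illustration. Substitute the current prices for your region and your own measured volume.

```python
def monthly_ingestion_cost(gb_per_day: float, rate_per_gb: float, days: int = 30) -> float:
    """Return the ingestion cost for one month at a flat per-GB rate."""
    return gb_per_day * rate_per_gb * days


# Hypothetical placeholder rates, NOT real prices. Look up current pricing.
ANALYTICS_RATE_PER_GB = 2.00
DATA_LAKE_RATE_PER_GB = 0.25

# Hypothetical daily volume for a single high-volume log source.
source_gb_per_day = 200.0

analytics_cost = monthly_ingestion_cost(source_gb_per_day, ANALYTICS_RATE_PER_GB)
data_lake_cost = monthly_ingestion_cost(source_gb_per_day, DATA_LAKE_RATE_PER_GB)
savings = analytics_cost - data_lake_cost

print(f"Analytics tier: {analytics_cost:.2f}")
print(f"Data lake tier: {data_lake_cost:.2f}")
print(f"Difference:     {savings:.2f}")
```

The sketch ignores retention, query, and compute charges, so treat the result as a first-pass comparison only.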

Configure direct data lake ingestion from the connector setup pages or the Table management page in the Microsoft Defender portal. For more information, see Configure table settings in Microsoft Sentinel.

Which logs should you ingest into the data lake?

After you onboard to Microsoft Sentinel data lake, you can choose which log sources to send to the data lake tier, the analytics tier, or both. The analytics tier is optimized for real-time detection and alerting, while the data lake tier is optimized for cost-effective long-term retention and hunting. Use the following guidance to determine which log sources to ingest into each tier based on their value for different security workloads.

Analytics tier use cases

You use the analytics tier to ingest log data into Microsoft Sentinel workspaces, where you can run analytics rules, custom detections, and live queries. Ingest log sources into the analytics tier when you need:

  • Real-time detection and correlation: Alert on critical events from endpoints, identity systems, cloud security controls, and network perimeter devices.
  • Active incident investigation: Run live queries against current data during incident response.
  • High-fidelity signals: Ingest sources with direct detection value, such as EDR alerts, privileged access logs, authentication events, and threat intelligence indicators.

Data lake tier use cases

You use the data lake tier to store logs at lower cost for workloads that don't require real-time alerting. Ingest log sources into the data lake tier when you need:

  • High-volume log retention: Store sources that are useful for forensic analysis or periodic threat hunts but are too costly to retain in the analytics tier.
  • Threat hunting: Run cross-log searches, trend analysis, and historical queries to identify patterns across extended time ranges.
  • Batch analytics and summarization: Use Spark notebooks, KQL, or similar tools to enrich, correlate, or summarize data, then forward only high-priority signals to the analytics tier.
  • Machine learning and advanced analytics: Apply big data techniques to identify complex relationships and anomalies in historical data.
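The batch summarization pattern in the list above can be sketched as follows. This is an illustrative example in plain Python, not a Spark notebook or a Microsoft Sentinel API. The record fields (`src_ip`, `action`), the function name, and the threshold are assumptions. The idea is to reduce raw high-volume records to a small summary and keep only the rows worth forwarding to the analytics tier.

```python
from collections import Counter
from typing import Iterable


def summarize_denied_connections(records: Iterable[dict], threshold: int) -> list[dict]:
    """Count denied connections per source IP and keep only sources at or above the threshold."""
    denied_counts = Counter(
        record["src_ip"] for record in records if record["action"] == "deny"
    )
    return [
        {"src_ip": src_ip, "denied_count": count}
        for src_ip, count in denied_counts.most_common()
        if count >= threshold
    ]


# Hypothetical raw firewall records as they might sit in the data lake tier.
raw_records = [
    {"src_ip": "10.0.0.5", "action": "deny"},
    {"src_ip": "10.0.0.5", "action": "deny"},
    {"src_ip": "10.0.0.5", "action": "deny"},
    {"src_ip": "10.0.0.9", "action": "allow"},
    {"src_ip": "10.0.0.7", "action": "deny"},
]

# Only the summarized, high-priority rows would be forwarded for alerting.
high_priority = summarize_denied_connections(raw_records, threshold=3)
print(high_priority)
```

In a real deployment, the same aggregation would typically run as a scheduled KQL or Spark job over the data lake tables, with the summary written to an analytics tier table.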

Analytics rules and custom detections

You can't run analytics rules or custom detections on data in the data lake tier. If you ingest logs only into the data lake tier, those logs don't generate alerts. To maintain real-time detection coverage, keep time-sensitive, high-fidelity log sources in the analytics tier.
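Before you move a source to data lake only ingestion, it can help to check whether any detection depends on it. The following sketch shows one way to flag those gaps. The tier plan, rule names, and source names are hypothetical, and the helper isn't part of any Microsoft Sentinel API. You would build the inputs from your own rule inventory.

```python
from enum import Enum


class Tier(Enum):
    ANALYTICS = "analytics"
    DATA_LAKE_ONLY = "data lake only"


def find_detection_gaps(
    tier_plan: dict[str, Tier], rule_sources: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Map each rule to the sources it needs that are planned as data lake only."""
    gaps: dict[str, list[str]] = {}
    for rule_name, sources in rule_sources.items():
        blocked = [s for s in sources if tier_plan.get(s) is Tier.DATA_LAKE_ONLY]
        if blocked:
            gaps[rule_name] = blocked
    return gaps


# Hypothetical ingestion plan and rule inventory.
tier_plan = {
    "Identity": Tier.ANALYTICS,
    "Firewall Traffic Logs": Tier.DATA_LAKE_ONLY,
    "DNS Logs": Tier.DATA_LAKE_ONLY,
}
rule_sources = {
    "Impossible travel sign-in": ["Identity"],
    "Beaconing to rare domain": ["DNS Logs", "Firewall Traffic Logs"],
}

gaps = find_detection_gaps(tier_plan, rule_sources)
for rule_name, blocked in gaps.items():
    print(f"{rule_name}: no alerts while {', '.join(blocked)} is data lake only")
```

Any rule that appears in the result would stop generating alerts under the planned configuration, so either keep its sources in the analytics tier or replace the rule with a summarization job.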

Choose an ingestion tier by log source type

Use the following table as a general guide to decide where to ingest each log source type. Assess your own workloads, alerting requirements, and risk tolerance before you configure log ingestion. Some log sources have dedicated Microsoft Sentinel connectors, while others might require Syslog, CEF, API-based, or custom connectors.

Log source type | Typical log volume | Value for real-time threat detection and alerting | Value for threat hunting | Value for incident investigation and forensics | Data lake only ingestion fit
AAA (TACACS/Radius) | Medium | High | High | High | Poor fit
Active Directory (on-premises) | High | High | High | High | Poor fit
Application Logs | High | Medium | Medium | High | Suitable fit
AV Logs (Windows Events 5000s & 3rd party) | Medium | High | High | High | Poor fit
Azure Activity | Medium | High | High | High | Poor fit
Biometric Access System Logs | Low | Medium | Low | High | Suitable fit
Building Security System Logs | Low | Low | Low | Medium | Suitable fit
Call Center/VoIP Logs | Medium | Low | Low | Medium | Suitable fit
CASB | High | High | High | High | Suitable fit
Citrix/Horizon/ALBs | Medium | Medium | Medium | High | Suitable fit
Cloud IAM | Medium | High | High | High | Poor fit
Cloud PaaS | High | High | High | High | Suitable fit
Cloud Security Controls | Medium | High | Medium | High | Poor fit
Cloud Storage (S3, Blob, etc.) Logs | High | Low | High | High | Poor fit
CRM Audit Logs | Low-Medium | Low | Low | Medium | Poor fit
Database Audit Tools | Medium | High | High | High | Suitable fit
DHCP Logs | Medium | Medium | Medium | High | Suitable fit
DLP Alerts | Low | High | High | High | Suitable fit
DNS Logs | High | High | High | High | Suitable fit
Endpoint Detection and Response (EDR) (Alerts) | Medium | High | High | High | Poor fit
Endpoint Detection and Response (EDR) (Raw) | High | High | High | High | Suitable fit
Email Security (3rd party alerts) | Medium | High | Medium | High | Poor fit
ERP Audit Logs | Low-Medium | Low | Low | Medium | Suitable fit
File Integrity | Low | Medium | Medium | High | Suitable fit
Firewall Threat/Malware/IPS/IDS | High | High | High | High | Poor fit
Firewall Traffic Logs | High | High | High | High | Suitable fit
GitHub/GitLab/Code Repo Logs | Low-Medium | Medium | Medium | High | Suitable fit
Google Workspace Logs | Medium | Medium | Medium | High | Suitable fit
Identity (Microsoft Entra ID, Okta, LDAP) | Medium | High | High | High | Poor fit
IIS/Apache Logs | Medium | High | High | High | Suitable fit
IoT Device Logs | High | Medium | Medium | Medium | Suitable fit
Kubernetes/Container Logs (alerts, critical) | High | High | High | High | Poor fit
Kubernetes/Container Logs (raw logs) | High | High | High | High | Suitable fit
LAN/WAN Router Switch | High | Medium | Medium | Medium | Suitable fit
Linux Server AuditD | Medium | High | High | High | Poor fit
Mobile Device Management (Microsoft Intune) | Medium | Medium | Medium | Medium | Suitable fit
Microsoft Office Logs (Teams, Office, SharePoint) | Medium | Medium | Medium | High | Poor fit
Microsoft XDR Alerts (Defender: Office, Identity, Endpoint, CloudApp) | Medium | High | High | High | Poor fit
Multifactor authentication (MFA) | Medium | High | Medium | High | Poor fit
Netflow | High | Medium | High | Medium | Suitable fit
Network Detection (Corelight, Vectra, Darktrace) | High | High | High | High | Poor fit
OT/ICS System Logs | Medium | High | High | High | Suitable fit
PAM (Privileged Access Management) | Low | High | High | High | Poor fit
PIM (Privileged Identity Management) | Low | High | High | High | Poor fit
POS System Logs | High | High | High | High | Suitable fit
Proxy Logging (URL filtering) | High | High | High | High | Suitable fit
Salesforce Audit Logs | Medium | Medium | Medium | High | Suitable fit
SD-WAN | Medium | Medium | Medium | Medium | Suitable fit
ServiceNow Audit Logs | Low | Low | Low | Medium | Suitable fit
SIEM/SOAR Platform Logs | Medium | High | High | High | Not recommended
Slack/Teams Collaboration Logs | Medium | Low | Medium | Medium | Suitable fit
Sysmon (Endpoint, for EDR complement) | Medium | High | High | High | Suitable fit
Threat Intelligence Indicators | Low | High | High | High | Not recommended
VDI Logs | Medium | Medium | Medium | High | Suitable fit
VPN | Medium | High | High | High | Not recommended
Vulnerability Scanning | Low | Medium | Medium | Medium | Suitable fit
Web Application Firewall (WAF) Logs | Medium | High | High | High | Suitable fit
Windows Server Events | High | High | High | High | Not recommended
XDR Source Logs (Defender: Office, Identity, Endpoint, CloudApp) | Medium | High | High | High | Not recommended
Zoom Meeting Logs | Low-Medium | Low | Low | Medium | Suitable fit
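If you want to apply this guidance programmatically, for example to shortlist candidates for a cost review, you can encode the table as data and filter it. The following sketch copies six rows from the table above. The type name, field names, and helper function are illustrative assumptions, not part of any Microsoft Sentinel tooling.

```python
from typing import NamedTuple


class LogSourceGuidance(NamedTuple):
    name: str
    volume: str
    detection_value: str
    hunting_value: str
    forensics_value: str
    data_lake_only_fit: str


# A subset of the guidance table, copied as-is.
GUIDANCE = [
    LogSourceGuidance("DNS Logs", "High", "High", "High", "High", "Suitable fit"),
    LogSourceGuidance("Firewall Traffic Logs", "High", "High", "High", "High", "Suitable fit"),
    LogSourceGuidance("Netflow", "High", "Medium", "High", "Medium", "Suitable fit"),
    LogSourceGuidance("PAM (Privileged Access Management)", "Low", "High", "High", "High", "Poor fit"),
    LogSourceGuidance("Threat Intelligence Indicators", "Low", "High", "High", "High", "Not recommended"),
    LogSourceGuidance("Vulnerability Scanning", "Low", "Medium", "Medium", "Medium", "Suitable fit"),
]


def data_lake_only_candidates(rows: list[LogSourceGuidance], volume: str = "High") -> list[str]:
    """Return names of sources rated 'Suitable fit' that have the given typical volume."""
    return [
        row.name
        for row in rows
        if row.data_lake_only_fit == "Suitable fit" and row.volume == volume
    ]


candidates = data_lake_only_candidates(GUIDANCE)
print(candidates)
```

Filtering on high volume first surfaces the sources where data lake only ingestion is likely to have the largest cost effect. The ratings remain general guidance, so review each candidate against your own alerting requirements before you change its tier.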