
Modern Observability for Sitecore 10.4 on AKS: Grafana, Alloy, Loki, and Prometheus


In this post, I’ll walk through how I extended the base Sitecore XM 10.4 AKS setup with a modern observability stack using Grafana, Alloy, Loki, and Prometheus. This setup provides deep insights into both infrastructure and application health, with powerful log aggregation and visualization.

Project Overview

  • Base: Sitecore XM 10.4 running on Azure Kubernetes Service (AKS)
  • Enhancements: a full Grafana observability stack:
      • Grafana for dashboards and visualization
      • Alloy (Grafana Alloy, the successor to Promtail) for log collection and multiline parsing
      • Loki for log aggregation and querying
      • Prometheus for metrics collection

All configuration files and setup scripts are available in my public GitHub repo.


Why Add Grafana, Alloy, Loki, and Prometheus?

Sitecore’s default AKS setup provides only basic health checks and logging. Out of the box, log files are stored on a persistent volume as plain .txt files. There is no built-in UI for searching, filtering, or correlating these logs across pods or services, which makes troubleshooting and monitoring in a distributed Kubernetes environment difficult. With the Grafana stack, you gain:

  • Centralized log aggregation 
  • Powerful, flexible dashboards
  • Metrics-based alerting and troubleshooting

Component Breakdown

1. Grafana

Grafana is the visualization layer. It connects to both Loki (for logs) and Prometheus (for metrics), letting you build dashboards that combine infrastructure and application data. 

Example dashboard:


  • Top: CPU and memory usage for the Windows node (akswin000005)
  • Bottom: Real-time Sitecore CD and CM logs (warnings and errors), parsed and searchable

2. Alloy (Grafana Alloy, the successor to Promtail)

Alloy is the log collector and shipper. It runs as a DaemonSet on both Windows and Linux nodes, tails log files, and forwards them to Loki.

Key detail: Multiline log parsing

Sitecore logs are not simple one-line entries. For example, an ERROR log often includes a stack trace and nested exceptions, spanning many lines:

If you simply tail these files line-by-line, you lose the context of the error. That’s why Grafana Alloy is configured with a custom multiline parser (see alloy-configmap.yaml in the repo). This parser recognizes the start of a new log entry (using a regex for the timestamp or thread prefix) and groups all subsequent lines until the next entry, ensuring each log event is captured in full.
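The grouping idea can be sketched in a few lines of Python. This is only an illustration of the technique, not the Alloy configuration itself: the regex assumes a line shape of "<thread> HH:MM:SS LEVEL message", and the sample lines are hypothetical. The actual pattern is the one in alloy-configmap.yaml.

```python
import re

# Assumed shape of the first line of an entry: "<thread> HH:MM:SS LEVEL message".
# Illustrative only; the real pattern lives in alloy-configmap.yaml.
FIRST_LINE = re.compile(r"^(?:ManagedPoolThread #\d+|\d+) \d{2}:\d{2}:\d{2} [A-Z]+ ")


def group_entries(lines):
    """Merge continuation lines into the entry that precedes them."""
    entries = []
    for line in lines:
        if FIRST_LINE.match(line) or not entries:
            # A new entry starts here (or this is the very first line).
            entries.append(line)
        else:
            # Stack trace or nested exception: append to the current entry.
            entries[-1] += "\n" + line
    return entries


# Hypothetical sample lines, not taken from a real Sitecore log.
sample = [
    "4012 10:15:31 INFO  Cache created: 'demo'",
    "ManagedPoolThread #3 10:15:32 ERROR Exception while processing item",
    "Exception: System.InvalidOperationException",
    "Message: Sample failure",
    "   at Demo.Pipeline.Process()",
    "4012 10:15:33 WARN  Slow request",
]

entries = group_entries(sample)
for entry in entries:
    print(repr(entry))
```

The six input lines collapse into three entries, and the ERROR entry keeps its exception details and stack frame attached.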

The full error then appears as a single entry in Grafana Loki:

As logs are parsed, Alloy extracts key fields such as timestamp and log level, and attaches extra labels like job=sitecore, role=cm or role=cd, etc. This makes it easy to filter logs in Grafana by environment, service, or severity.

Example log labels in Loki/Grafana:

job="sitecore"

role="cm" or role="cd"

level="ERROR" or level="INFO"
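As a rough sketch of what this labelling step does, the Python below pulls the level out of the first line of an already-grouped entry and attaches the static labels. The header regex and the function name to_labelled_entry are assumptions made for illustration; Alloy does this through its own pipeline stages, not through code like this.

```python
import re

# Assumed header shape: "<thread> HH:MM:SS LEVEL message" (illustrative only).
HEADER = re.compile(
    r"^(?P<thread>.+?) (?P<time>\d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+)\s+(?P<message>.*)",
    re.DOTALL,
)


def to_labelled_entry(entry, role):
    """Return (labels, entry) for one grouped log entry.

    `role` is "cm" or "cd"; `level` is added only when the header parses.
    """
    labels = {"job": "sitecore", "role": role}
    match = HEADER.match(entry)
    if match:
        labels["level"] = match.group("level")
    return labels, entry


# Hypothetical entry with one stack frame attached.
labels, _ = to_labelled_entry(
    "4012 10:15:32 ERROR Sample failure\n   at Demo.Run()", role="cm"
)
print(labels)

# A fragment without a header gets no level label.
orphan_labels, _ = to_labelled_entry("   at Demo.Run()", role="cd")
print(orphan_labels)
```

Keeping the label set this small (job, role, level) matters in Loki, because every distinct label combination creates a separate stream.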


3. Loki

Loki is the log aggregation backend. It stores logs from Sitecore pods, indexed by labels (e.g., role=cm, role=cd, job=sitecore).

Log querying in Grafana:


  • Filter logs by labels (e.g., role=cm or role=cd)
  • Search for errors, warnings, or specific text
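Both kinds of query come down to a LogQL label selector with an optional line filter. The small helper below is a hypothetical convenience for building such query strings from the labels described in this post; the function name logql and the search text are made up, and it does not escape quotes inside values.

```python
def logql(labels, contains=None):
    """Build a LogQL query: a label selector plus an optional line filter."""
    selector = ", ".join(f'{name}="{value}"' for name, value in labels.items())
    query = "{" + selector + "}"
    if contains:
        # |= keeps only log lines that contain the given text.
        query += f' |= "{contains}"'
    return query


# All ERROR entries from the CM role.
cm_errors = logql({"job": "sitecore", "role": "cm", "level": "ERROR"})

# Free-text search across CD logs (the search text is just an example).
cd_search = logql({"job": "sitecore", "role": "cd"}, contains="NullReferenceException")

print(cm_errors)
print(cd_search)
```

The resulting strings can be pasted into the query field of Grafana's Explore view with Loki selected as the data source.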

Example Loki Role Filter:






4. Prometheus

Prometheus scrapes metrics from your cluster (including Windows nodes via the Windows exporter). It provides data for CPU, memory, and custom application metrics.

Example Prometheus query:

100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[5m])) * 100)

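This returns the CPU usage percentage per Windows node, averaged across cores over the last 5 minutes: the idle rate is the fraction of time each core spent idle, so subtracting it from 100 gives the busy share.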
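To make the arithmetic behind the query concrete, here is a simplified Python version for a single node. The counter values are hypothetical, and the function ignores details that Prometheus's rate() handles, such as counter resets and extrapolation at the window edges.

```python
def cpu_usage_percent(idle_start, idle_end, window_seconds):
    """Approximate 100 - avg(rate(idle counter)) * 100 for one node.

    idle_start / idle_end hold the idle-seconds counter of each core
    at the start and at the end of the window.
    """
    # Per-core idle rate: idle seconds gained per second of wall time.
    rates = [
        (end - start) / window_seconds
        for start, end in zip(idle_start, idle_end)
    ]
    avg_idle = sum(rates) / len(rates)
    return 100 - avg_idle * 100


# Hypothetical counters for a 2-core node over a 300-second (5m) window.
usage = cpu_usage_percent([1000.0, 2000.0], [1240.0, 2210.0], 300)
print(round(usage, 1))
```

Here the two cores were idle for 240 and 210 of the 300 seconds, an average idle share of 75%, so the node was about 25% busy.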

Visualization:




How to Set Up

  1. Clone the repo:

git clone https://github.com/dogabenli/grafana-sitecore-demo.git

  2. Deploy all components:

Deploy Sitecore on AKS first, then the Grafana components.

  3. Access Grafana:

Expose the Grafana service and log in to start exploring dashboards.


Conclusion

By adding Grafana, Alloy, Loki, and Prometheus to your Sitecore AKS environment, you gain:

  • Centralized, searchable logs (with multiline support)
  • Real-time dashboards for both infrastructure and application health
  • The ability to troubleshoot and correlate issues quickly

All code and configuration are available on GitHub: https://github.com/dogabenli/grafana-sitecore-demo



