
Modern Observability for Sitecore 10.4 on AKS: Grafana, Alloy, Loki, and Prometheus


In this post, I’ll walk through how I extended the base Sitecore XM 10.4 AKS setup with a modern observability stack using Grafana, Alloy, Loki, and Prometheus. This setup provides deep insights into both infrastructure and application health, with powerful log aggregation and visualization.

Project Overview

  • Base: Sitecore XM 10.4 running on Azure Kubernetes Service (AKS)
  • Enhancements: a full Grafana observability stack, consisting of:
      • Grafana for dashboards and visualization
      • Alloy (Grafana Alloy, the successor to Promtail) for log collection and multiline parsing
      • Loki for log aggregation and querying
      • Prometheus for metrics collection

All configuration files and setup scripts are available in my public GitHub repo.


Why Add Grafana, Alloy, Loki, and Prometheus?

Sitecore’s default AKS setup provides only basic health checks and logging. Out of the box, log files are written to a persistent volume as plain .txt files, with no built-in UI for searching, filtering, or correlating them across pods or services. That makes troubleshooting and monitoring in a distributed Kubernetes environment difficult. With the Grafana stack, you gain:

  • Centralized log aggregation 
  • Powerful, flexible dashboards
  • Metrics-based alerting and troubleshooting

Component Breakdown

1. Grafana

Grafana is the visualization layer. It connects to both Loki (for logs) and Prometheus (for metrics), letting you build dashboards that combine infrastructure and application data. 

Example dashboard:


  • Top: CPU and memory usage for Windows node (akswin000005)
  • Bottom: Real-time Sitecore CD and CM logs (warnings and errors), parsed and searchable

2. Alloy (Grafana Alloy, the successor to Promtail)

Alloy is the log collector and shipper. It runs as a DaemonSet on both Windows and Linux nodes, tails log files, and forwards them to Loki.

Key detail: Multiline log parsing

Sitecore logs are not simple one-line entries. For example, an ERROR entry often includes a stack trace and nested exceptions, spanning many lines.

If you simply tail these files line-by-line, you lose the context of the error. That’s why Grafana Alloy is configured with a custom multiline parser (see alloy-configmap.yaml in the repo). This parser recognizes the start of a new log entry (using a regex for the timestamp or thread prefix) and groups all subsequent lines until the next entry, ensuring each log event is captured in full.
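To make the grouping idea concrete, here is a minimal Python sketch of the same logic. It is not the Alloy configuration from the repo, only an illustration of how a "first line" regex splits a stream into complete events. The entry format (thread id or ManagedPoolThread prefix, then HH:MM:SS, then the level) and the sample lines are assumptions for the example; the real pattern lives in alloy-configmap.yaml and may differ.

```python
import re
from typing import Iterable, Iterator

# ASSUMPTION: an entry starts with "<thread> HH:MM:SS LEVEL ", where <thread>
# is a numeric id or "ManagedPoolThread #N". The pattern actually used by
# Alloy is defined in alloy-configmap.yaml and may differ.
FIRST_LINE = re.compile(r"^(?:ManagedPoolThread #\d+|\d+) \d{2}:\d{2}:\d{2} [A-Z]+ ")


def group_entries(lines: Iterable[str]) -> Iterator[str]:
    """Yield one string per log event, keeping continuation lines attached."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # A new "first line" closes the event collected so far.
        if FIRST_LINE.match(line) and buffer:
            yield "\n".join(buffer)
            buffer = []
        buffer.append(line)
    if buffer:
        yield "\n".join(buffer)


# Hypothetical sample lines, made up for this example.
sample = [
    "4512 10:15:30 INFO  Cache created: 'web[items]'",
    "4512 10:15:31 ERROR Application error.",
    "Exception: System.NullReferenceException",
    "Message: Object reference not set to an instance of an object.",
    "   at Example.Feature.Render()",
    "4512 10:15:32 WARN  Slow request",
]

entries = list(group_entries(sample))
for entry in entries:
    print(repr(entry))
```

The stack trace lines stay attached to the ERROR line that precedes them, so the sample produces three events instead of six separate lines.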

In Grafana, the whole error then shows up as a single Loki log entry:

As logs are parsed, Alloy extracts key fields such as timestamp and log level, and attaches extra labels like job=sitecore, role=cm or role=cd, etc. This makes it easy to filter logs in Grafana by environment, service, or severity.

Example log labels in Loki/Grafana:

job="sitecore"

role="cm" or role="cd"

level="ERROR" or level="INFO"
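As a rough illustration of the labeling step, the Python sketch below pulls the level out of an already grouped entry and combines it with static labels. The regex reuses the same assumed entry format as above, and the function name to_labels is hypothetical; in the actual setup this work is done by the Alloy pipeline, not by Python code.

```python
import re
from typing import Dict

# ASSUMPTION: same entry layout as the multiline sketch
# ("<thread> HH:MM:SS LEVEL message"); adjust to the real log format.
ENTRY = re.compile(
    r"^(?:ManagedPoolThread #\d+|\d+) \d{2}:\d{2}:\d{2} (?P<level>[A-Z]+)\s"
)


def to_labels(entry: str, role: str) -> Dict[str, str]:
    """Build the label set for one log event (hypothetical helper)."""
    match = ENTRY.match(entry)
    level = match.group("level") if match else "UNKNOWN"
    # job and role are static per pod; level is parsed from the entry.
    return {"job": "sitecore", "role": role, "level": level}


labels = to_labels("4512 10:15:31 ERROR Application error.\n   at Example.Render()", "cm")
print(labels)
```

All three labels have only a handful of possible values, which suits Loki: it indexes labels rather than log content, so low-cardinality labels keep the index small.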


3. Loki

Loki is the log aggregation backend. It stores logs from Sitecore pods, indexed by labels (e.g., role=cm, role=cd, job=sitecore).

Log querying in Grafana:


  • Filter logs by labels (e.g., role=cm or role=cd)
  • Search for errors, warnings, or specific text
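The two filter styles above map onto a LogQL label selector plus a line filter. The Python sketch below only builds the query string and a request URL for Loki's query_range HTTP endpoint; it does not send anything. The base URL http://loki:3100 and the helper names are assumptions for the example, so substitute whatever address your Loki service is reachable at.

```python
from urllib.parse import urlencode


def logql_selector(**labels: str) -> str:
    """Render labels as a LogQL stream selector, e.g. {job="sitecore"}."""
    pairs = ", ".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return "{" + pairs + "}"


def query_range_url(base_url: str, query: str, limit: int = 100) -> str:
    """Build a URL for Loki's /loki/api/v1/query_range endpoint."""
    return f"{base_url}/loki/api/v1/query_range?" + urlencode(
        {"query": query, "limit": limit}
    )


# Label filter (role=cm) combined with a text search for "Exception".
query = logql_selector(job="sitecore", role="cm") + ' |= "Exception"'

# ASSUMPTION: Loki is reachable at this in-cluster address.
url = query_range_url("http://loki:3100", query)

print(query)
print(url)
```

The same query string can be pasted directly into the Explore view in Grafana with Loki selected as the data source.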

Example Loki Role Filter:






4. Prometheus

Prometheus scrapes metrics from your cluster (including Windows nodes via the Windows exporter). It provides data for CPU, memory, and custom application metrics.

Example Prometheus query:

100 - (avg by (instance) (rate(windows_cpu_time_total{mode="idle"}[5m])) * 100)

This shows CPU usage percentage per Windows node.
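The arithmetic behind that query can be sketched in a few lines of Python. windows_cpu_time_total is a per-core counter of seconds, so the rate of the idle series is the fraction of time each core spent idle; averaging over cores and subtracting from 100 gives the usage percentage. The numbers below are made up, and the function ignores the extrapolation that Prometheus's rate() applies at window boundaries.

```python
from typing import Sequence


def cpu_usage_percent(
    idle_start: Sequence[float],
    idle_end: Sequence[float],
    window_seconds: float,
) -> float:
    """Simplified version of 100 - avg(rate(idle[window])) * 100 for one node."""
    # Per-core idle fraction: idle seconds gained divided by wall-clock seconds.
    idle_rates = [
        (end - start) / window_seconds for start, end in zip(idle_start, idle_end)
    ]
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


# Hypothetical node with two cores over a 5-minute (300 s) window:
# core 0 was idle for 240 s, core 1 for 210 s.
usage = cpu_usage_percent([1000.0, 2000.0], [1240.0, 2210.0], 300.0)
print(round(usage, 1))
```

Here the cores were idle 80% and 70% of the time, so the node averages 75% idle and about 25% CPU usage.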

Visualization:




How to Set Up

  1. Clone the repo:

git clone https://github.com/dogabenli/grafana-sitecore-demo.git

  2. Deploy all components:

Deploy Sitecore AKS first, followed by Grafana components.

  3. Access Grafana:

Expose the Grafana service and log in to start exploring dashboards.


Conclusion

By adding Grafana, Alloy, Loki, and Prometheus to your Sitecore AKS environment, you gain:

  • Centralized, searchable logs (with multiline support)
  • Real-time dashboards for both infrastructure and application health
  • The ability to troubleshoot and correlate issues quickly

All code and configuration is available on GitHub: https://github.com/dogabenli/grafana-sitecore-demo 



