Diagnose common scenarios with Azure Service Fabric

This article illustrates common monitoring and diagnostics scenarios that users have encountered with Service Fabric. The scenarios cover all three layers of Service Fabric: application, cluster, and infrastructure. Each solution uses the Azure monitoring tools Application Insights and Log Analytics, and the steps in each solution introduce how to use these tools in the context of Service Fabric.

Prerequisites and Recommendations

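The solutions in this article use Application Insights (with the Service Fabric Application Insights SDK) and Log Analytics (with the Service Fabric Analytics solution and the OMS Agent). We recommend you have these set up and configured.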

How can I see unhandled exceptions in my application?

  1. Navigate to the Application Insights resource that your application is configured with.
  2. Click Search in the top left, then click Filter on the next panel.

  3. You will see many types of events (traces, requests, custom events). Choose “Exception” as your filter.

    By clicking an exception in the list, you can look at more details including the service context if you are using the Service Fabric Application Insights SDK.
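
The portal filter above can also be written as a query in Application Insights Analytics. This is a minimal sketch, assuming a classic Application Insights resource where exceptions are stored in the `exceptions` table. The 24-hour window and the projected columns are illustrative choices, not requirements of this article.

```kusto
// Exceptions recorded in the last 24 hours, newest first.
// Assumption: classic Application Insights schema (table `exceptions`).
exceptions
| where timestamp > ago(24h)
| project timestamp, type, outerMessage, cloud_RoleName, operation_Name
| order by timestamp desc
```

The `cloud_RoleName` column is a convenient way to see which service reported each exception, provided your telemetry populates it.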

How do I view which HTTP calls are used in my services?

  1. In the same Application Insights resource, you can filter on “requests” instead of exceptions to view all requests made.
  2. If you are using the Service Fabric Application Insights SDK, you can see a visual representation of your services connected to one another, along with the number of succeeded and failed requests. On the left, click “Application Map”.

    For more information on the application map, see the Application Map documentation.
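
As a complement to the Search filter, request telemetry can be summarized with an Analytics query. This is a sketch under the same assumption of a classic Application Insights resource (table `requests`). Grouping by request name and result code is one reasonable choice among many.

```kusto
// Request volume and average duration per operation and HTTP result code.
// Assumption: classic Application Insights schema (table `requests`).
requests
| where timestamp > ago(24h)
| summarize RequestCount = count(), AvgDurationMs = avg(duration) by name, resultCode
| order by RequestCount desc
```

Rows with 4xx or 5xx result codes show which HTTP calls are failing and how often.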

How do I create an alert when a node goes down?

  1. Node events are tracked by your Service Fabric cluster. Navigate to the Service Fabric Analytics solution resource named ServiceFabric.
  2. Click the graph at the bottom of the blade titled “Summary”.

  3. Here you have many graphs and tiles displaying various metrics. Click on one of the graphs and it will take you to the Log Search. Here you can query for any cluster events or performance counters.

  4. Enter the following query. These event IDs are found in the Node events reference.

    | where EventId >= 25623 and EventId <= 25626

  5. Click “New Alert Rule” at the top. From then on, whenever an event matching this query arrives, you will receive an alert through your chosen method of communication.
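
The filter in step 4 is shown without its source table. A complete query might look like the sketch below, assuming the Service Fabric Analytics solution writes cluster events to a table named `ServiceFabricOperationalEvent`. That table name and the one-hour window are assumptions not stated in this article, so check the names in your own workspace.

```kusto
// Node events in the ID range 25623 to 25626 from the last hour.
// Assumption: cluster events live in ServiceFabricOperationalEvent.
ServiceFabricOperationalEvent
| where TimeGenerated > ago(1h)
| where EventId between (25623 .. 25626)
| order by TimeGenerated desc
```

`between` is inclusive on both ends, so it selects only the four event IDs in the range. Joining the two bounds with `or` instead of `and` would match every event.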

How can I be alerted of application upgrade rollbacks?

  1. In the same Log Search window as before, enter the following query for upgrade rollbacks. These event IDs are found in the Application events reference.

    | where EventId == 29623 or EventId == 29624

  2. Click “New Alert Rule” at the top. From then on, whenever an event matching this query arrives, you will receive an alert.
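
To review rollbacks over time before you set up the alert, the same filter can be aggregated. This is a sketch that again assumes the events are in a table named `ServiceFabricOperationalEvent`, which this article does not name. The hourly bin size is arbitrary.

```kusto
// Count of upgrade rollback events per hour.
// Assumption: cluster events live in ServiceFabricOperationalEvent.
ServiceFabricOperationalEvent
| where EventId in (29623, 29624)
| summarize Rollbacks = count() by bin(TimeGenerated, 1h)
| order by TimeGenerated desc
```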

How do I see container metrics?

In the same view with all the graphs, you will see some tiles for the performance of your containers. You need the OMS Agent and Container Monitoring solution for these tiles to populate.

How can I monitor performance counters?

  1. Once you have added the OMS Agent to your cluster, you need to add the specific performance counters you want to track. Navigate to the OMS workspace’s page in the portal. From the solution’s page, the workspace tab is on the left menu.

  2. Once you’re on the workspace’s page, click on “Advanced settings” in the same left menu.

  3. Click Data > Windows Performance Counters (Data > Linux Performance Counters for Linux machines) to start collecting specific counters from your nodes via the OMS Agent. Here are examples of the format for counters to add:

    • .NET CLR Memory(<ProcessNameHere>)\# Total committed Bytes
    • Processor(_Total)\% Processor Time
    • Service Fabric Service(*)\Average milliseconds per request

      In the quickstart, VotingData and VotingWeb are the process names used, so tracking these counters would look like this:

    • .NET CLR Memory(VotingData)\# Total committed Bytes

    • .NET CLR Memory(VotingWeb)\# Total committed Bytes

  4. This lets you see how your infrastructure is handling your workloads and set relevant alerts based on resource utilization. For example, you may want to set an alert if total processor utilization goes above 90% or below 5%. The counter name to use for this is “% Processor Time”. You could do this by creating an alert rule for the following query:

    Perf | where CounterName == "% Processor Time" and InstanceName == "_Total" | where CounterValue >= 90 or CounterValue <= 5
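
A query on raw samples can fire on a single momentary spike. One possible refinement, offered as a sketch and not as the method this article prescribes, is to average the counter per node over a short window before applying the thresholds. The five-minute bin is an arbitrary choice.

```kusto
// Nodes whose 5-minute average CPU is at or above 90% or at or below 5%.
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
| summarize AvgCpu = avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
| where AvgCpu >= 90 or AvgCpu <= 5
```

Grouping by `Computer` keeps each node separate, so one busy node is not hidden by idle ones.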

How do I track performance of my Reliable Services and Actors?

To track the performance of Reliable Services or Actors in your applications, you should also add the Service Fabric Actor, Actor Method, Service, and Service Method counters. You can add these counters in the same way as in the scenario above. Here are examples of Reliable Services and Actors performance counters to add in OMS:

* `Service Fabric Service(*)\Average milliseconds per request`
* `Service Fabric Service Method(*)\Invocations/Sec`
* `Service Fabric Actor(*)\Average milliseconds per request`
* `Service Fabric Actor Method(*)\Invocations/Sec`

For the full list of performance counters, see the Reliable Services and Reliable Actors documentation.
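
Once these counters are being collected, they can be queried from the `Perf` table like any other counter. The sketch below assumes that `ObjectName` holds the counter category exactly as written in the counter paths above (for example, `Service Fabric Service`). Verify this against the values in your own workspace.

```kusto
// Average request latency per service or actor instance, in 5-minute bins.
// Assumption: ObjectName matches the category names in the counter paths.
Perf
| where ObjectName in ("Service Fabric Service", "Service Fabric Actor")
| where CounterName == "Average milliseconds per request"
| summarize AvgMs = avg(CounterValue) by ObjectName, InstanceName, bin(TimeGenerated, 5m)
| order by AvgMs desc
```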

Next steps