Setting up OpenTelemetry (OTel) with OCaml and Azure Monitor Integration
After weeks of iteration, testing, and debugging through GitHub Actions CI/CD pipelines, I've successfully implemented a production-ready OpenTelemetry integration for Chaufr. This post documents the complete journey from initial setup through deployment, covering the design decisions and architectural patterns that made it possible.
Project: Chaufr – Personal drivers, on demand, in your own vehicle
Tech Stack: OCaml, Dream Framework, OpenTelemetry, Azure Monitor, Docker, GitHub Actions
Status: ✅ GitHub Actions Green (Production Ready)
The Challenge: Observability at Scale
When deploying an OCaml application to production on Azure App Service, traditional logging and debugging approaches fall short. You need:
- Distributed tracing to understand request flow across components
- Real-time metrics for performance monitoring
- Structured logging with context propagation
- Seamless Azure integration without platform-specific code
This is where OpenTelemetry comes in—a vendor-neutral, open-source standard for collecting telemetry data.
Architecture Overview
Here's the high-level architecture of how telemetry flows through Chaufr:
OCaml Application (Chaufr)
↓
Telemetry Middleware (Dream)
↓
OpenTelemetry Client (ocurl)
↓
Environment Decision:
├─ Production/Staging → Azure Monitor (Direct OTLP)
├─ Development → Local OpenTelemetry Collector (Port 4318)
└─ Fallback → Console Logging Only
Prerequisites
You need to have the following libraries installed in your OCaml project:
- opentelemetry
- opentelemetry-client-ocurl
- opentelemetry-lwt
- opentelemetry.trace
- opentelemetry-cohttp-lwt
Step 1: Core Configuration Module (telemetry.ml)
The foundation of our observability system is the telemetry.ml module, which handles all configuration and initialization.
Configuration Type System
type config = {
service_name : string;
service_version : string;
environment : string;
azure_monitor_connection_string : string option;
otlp_endpoint : string option;
enable_console_export : bool;
sample_rate : float;
}
This type captures all necessary telemetry configuration parameters, allowing for flexibility across different deployment environments.
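To make the type concrete, here is one way such a config could be populated from environment variables for local development. The `default_config` name, the defaults, and the exact variable-to-field mapping are illustrative assumptions, not part of the module shown above:

```ocaml
(* Hypothetical helper: build a config from environment variables with
   development-friendly defaults. Names and defaults are illustrative. *)
let default_config () = {
  service_name =
    Option.value (Sys.getenv_opt "OTEL_SERVICE_NAME") ~default:"chaufr";
  service_version =
    Option.value (Sys.getenv_opt "OTEL_SERVICE_VERSION") ~default:"1.0.0";
  environment =
    Option.value (Sys.getenv_opt "ENVIRONMENT") ~default:"development";
  azure_monitor_connection_string =
    Sys.getenv_opt "APPLICATIONINSIGHTS_CONNECTION_STRING";
  otlp_endpoint = Sys.getenv_opt "OTEL_EXPORTER_OTLP_ENDPOINT";
  enable_console_export = true;
  sample_rate =
    (match Sys.getenv_opt "OTEL_SAMPLE_RATE" with
     | Some s -> (try float_of_string s with _ -> 1.0)
     | None -> 1.0);
}
```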
Environment-Aware Initialization
One of the key design decisions was automatic environment detection:
let environment =
match Sys.getenv_opt "ENVIRONMENT" with
| Some env -> env
| None -> (
match Sys.getenv_opt "WEBSITE_SITE_NAME" with
| Some _ -> "production" (* Running on Azure App Service *)
| None -> "development") (* Local development *)
This detection means:
- When running locally, environment defaults to "development"
- When deployed to Azure App Service, the WEBSITE_SITE_NAME environment variable is automatically set by Azure, enabling automatic production detection
- You can still override with an explicit ENVIRONMENT variable if needed
Smart Endpoint Routing
The initialization logic routes to the appropriate telemetry backend based on environment:
let export_endpoint =
match cfg.environment with
| "production" | "staging" -> (
(* Production/Staging: Use Azure Monitor direct OTLP endpoint *)
match cfg.azure_monitor_connection_string with
| Some conn_str ->
(* Parse Azure Monitor connection string for OTLP endpoint *)
let _, ingestion_endpoint_opt =
parse_azure_connection_string conn_str
in
(match ingestion_endpoint_opt with
| Some endpoint -> Some (endpoint ^ "/v1/traces")
| None -> Some "http://localhost:4318/v1/traces")
| None -> Some "http://localhost:4318/v1/traces")
| _ -> (
(* Development: Use local OpenTelemetry Collector *)
match cfg.otlp_endpoint with
| Some endpoint -> Some endpoint
| None -> Some "http://localhost:4318/v1/traces")
Key Design Decisions:
- Production uses Azure Monitor directly - No intermediary collector needed
- Graceful fallbacks - If configuration is missing, fall back to the local collector
- Configurable collector URL - Via OTEL_EXPORTER_OTLP_ENDPOINT for flexibility
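The routing code above calls a `parse_azure_connection_string` helper whose body isn't shown in this post. Azure Monitor connection strings are semicolon-separated `Key=Value` pairs (e.g. `InstrumentationKey=...;IngestionEndpoint=https://...`), so a minimal sketch, assuming a pair of optional results, might look like:

```ocaml
(* Sketch only: the real helper's implementation is not shown in the post.
   Returns (instrumentation_key option, ingestion_endpoint option). *)
let parse_azure_connection_string conn_str =
  let pairs =
    String.split_on_char ';' conn_str
    |> List.filter_map (fun part ->
           match String.index_opt part '=' with
           | Some i ->
               let key = String.sub part 0 i in
               let value =
                 String.sub part (i + 1) (String.length part - i - 1)
               in
               Some (String.trim key, String.trim value)
           | None -> None)
  in
  (List.assoc_opt "InstrumentationKey" pairs,
   List.assoc_opt "IngestionEndpoint" pairs)
```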
Span Creation with Automatic Timing
Two complementary functions handle span creation:
Async version (for Lwt code):
let with_span ~name ?(attrs = []) f =
if not !telemetry_enabled then f ()
else
let cfg = !service_config in
Opentelemetry.Trace.with_ ~service_name:cfg.service_name ~attrs name
(fun scope ->
Lwt.catch
(fun () -> f ())
(fun exn ->
let event_fn () =
Opentelemetry.Event.make "exception"
~attrs:[
("exception.type", `String (Printexc.exn_slot_name exn));
("exception.message", `String (Printexc.to_string exn));
("exception.stacktrace", `String (Printexc.get_backtrace ()));
]
in
Opentelemetry.Scope.add_event scope event_fn;
Lwt.fail exn))
Sync version (for blocking code):
let with_span_sync ~name ?(attrs = []) f =
if not !telemetry_enabled then f ()
else
let cfg = !service_config in
let start_time = Unix.gettimeofday () in
try
let result = f () in
let duration = Unix.gettimeofday () -. start_time in
let attrs_with_duration =
attrs @ [ ("duration_ms", `Float (duration *. 1000.0)) ]
in
(* ... send span with duration ... *)
result
with exn ->
(* ... log exception with duration ... *)
raise exn
Both functions automatically track execution time and exception handling, reducing boilerplate throughout the codebase.
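In practice the two helpers wrap call sites like these. This usage is illustrative: `Db.find_driver` and the span/attribute names are assumptions, not functions from the Chaufr codebase:

```ocaml
(* Async: wrap an Lwt database call in a span (Db.find_driver is hypothetical). *)
let fetch_driver_profile driver_id =
  Telemetry.with_span ~name:"db.fetch_driver_profile"
    ~attrs:[ ("driver.id", `String driver_id) ]
    (fun () -> Db.find_driver driver_id)

(* Sync: wrap a blocking utility in a timed span. *)
let checksum payload =
  Telemetry.with_span_sync ~name:"util.checksum"
    (fun () -> Digest.string payload |> Digest.to_hex)
```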
Step 2: Dream Framework Integration (telemetry_middleware.ml)
The middleware layer integrates OpenTelemetry seamlessly with the Dream web framework.
Request/Response Attribute Extraction
let extract_request_attrs request =
let method_str = Dream.method_to_string (Dream.method_ request) in
let path = Dream.target request in
let client_ip = Dream.client request in
let user_agent =
Dream.header request "User-Agent" |> Option.value ~default:"unknown"
in
[
("http.method", `String method_str);
("http.target", `String path);
("http.route", `String path);
("http.client_ip", `String client_ip);
("http.user_agent", `String user_agent);
("http.scheme", `String "http");
]
This captures all relevant HTTP context for later analysis in Azure Monitor or other visualization tools.
Dual-Middleware Pattern
The implementation uses two complementary middleware layers:
- Trace Middleware - Creates spans for each HTTP request with timing and error handling
- Logger Middleware - Logs request lifecycle events
let telemetry_middleware handler =
handler |> trace_middleware |> logger_middleware
Why dual middleware?
- Separation of concerns - Tracing and logging are independent
- Flexibility - Either middleware can be disabled independently
- Performance - Logger middleware doesn't require span overhead
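The trace middleware itself isn't shown in full in this post; a sketch of what it might look like, reusing `extract_request_attrs` and the `Telemetry` helpers above (the span name, event name, and timing details are assumptions):

```ocaml
(* Sketch: wrap each request in a span, then log status and duration. *)
let trace_middleware inner_handler request =
  let attrs = extract_request_attrs request in
  Telemetry.with_span ~name:"http.server.request" ~attrs (fun () ->
      let start = Unix.gettimeofday () in
      let open Lwt.Syntax in
      let* response = inner_handler request in
      let duration_ms = (Unix.gettimeofday () -. start) *. 1000.0 in
      Telemetry.log_event ~name:"http.server.response"
        ~attrs:
          [ ("http.status_code",
             `Int (Dream.status_to_int (Dream.status response)));
            ("duration_ms", `Float duration_ms) ]
        ();
      Lwt.return response)
```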
Error Handling with Context
The middleware captures exceptions with rich context:
Lwt.catch
(fun () -> inner_handler request)
(fun exn ->
let error_attrs = [
("error", `Bool true);
("error.type", `String (Printexc.to_string exn));
("error.message", `String (Printexc.to_string exn));
] in
Telemetry.log_event ~name:"http.server.error" ~attrs:error_attrs ();
Lwt.fail exn)
This ensures every error is captured and available in Azure Monitor before the exception propagates.
Step 3: OpenTelemetry Collector Configuration
For local development, the OpenTelemetry Collector (Docker container) acts as a central aggregation point:
Receiver Configuration
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
Both gRPC and HTTP protocols are enabled for maximum compatibility. The HTTP endpoint (4318) is used by our OCaml client.
Processor Pipeline
processors:
batch:
timeout: 10s
send_batch_size: 1024
memory_limiter:
check_interval: 1s
limit_mib: 512
resource:
attributes:
- key: service.name
value: chaufr
action: upsert
- key: deployment.environment
from_attribute: ENVIRONMENT
action: insert
Why these processors?
- Batch processor - Groups telemetry data for efficient transmission
- Memory limiter - Prevents the collector from consuming excessive resources
- Resource processor - Enriches all telemetry with service context
Multi-Pipeline Export Strategy
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, batch, resource]
exporters: [azuremonitor, debug]
metrics:
receivers: [otlp]
processors: [memory_limiter, batch, resource]
exporters: [azuremonitor, prometheus, debug]
logs:
receivers: [otlp]
processors: [memory_limiter, batch, resource]
exporters: [azuremonitor, debug]
Each signal type (traces, metrics, logs) has its own pipeline, allowing independent configuration and export strategies.
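The `azuremonitor`, `prometheus`, and `debug` exporters referenced by these pipelines are configured in their own section, which isn't shown above. A minimal sketch, assuming the connection string is supplied via an environment variable and that Prometheus scrapes port 8889:

```yaml
exporters:
  azuremonitor:
    connection_string: "${APPLICATIONINSIGHTS_CONNECTION_STRING}"
  prometheus:
    endpoint: 0.0.0.0:8889
  debug:
    verbosity: detailed
```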
Step 4: Application Integration (main.ml)
The main entry point integrates telemetry initialization with the application lifecycle:
Initialization Strategy
let init_result =
Lwt_main.run
(let* telemetry_result = Telemetry.initialize () in
match telemetry_result with
| Error _err ->
Printf.printf "Warning: Failed to initialize telemetry\n";
Connection.init ()
| Ok () ->
Printf.printf "Telemetry initialized successfully\n";
Connection.init ())
Key Design:
- Telemetry initialization is non-blocking
- If telemetry fails, application continues without it
- Database initialization proceeds regardless of telemetry status
- This is important for resilience—observability should never prevent the app from starting
Graceful Shutdown
Sys.set_signal Sys.sigterm
(Signal_handle
(fun _ ->
Printf.printf "Received SIGTERM, shutting down gracefully...\n";
Lwt.async (fun () ->
let* () = Telemetry.shutdown () in
Connection.close ())))
On shutdown signals (like during Azure deployment), telemetry data is flushed to ensure no traces are lost.
Health Check Endpoints
The application provides multiple observability endpoints:
let health_routes = [
Dream.get "/health" (fun _ -> Dream.respond ~status:`OK "ok");
Dream.get "/health/detailed" (fun _request ->
let* health_result = Connection.get_health_status () in
match health_result with
| Ok status -> Dream.respond ~status:`OK status
    | Error err -> Dream.respond ~status:`Service_Unavailable
        ("Database Health Check Failed\n" ^ err));
Dream.get "/diagnostic" (fun _request ->
(* Returns environment variables and diagnostic info *)
Dream.html ("<pre>Azure App Service Diagnostics\n" ^ diagnostics ^ "</pre>"));
Dream.get "/status" (fun _request ->
(* Returns container status and timestamp *)
Dream.html ("<pre>" ^ status_info ^ "</pre>"));
]
These endpoints are automatically instrumented with telemetry through the middleware layer.
Deployment Configuration
Environment Variables for Production
When deploying to Azure App Service, set these environment variables:
# Required for Azure Monitor integration
APPLICATIONINSIGHTS_CONNECTION_STRING=InstrumentationKey=<key>;IngestionEndpoint=<endpoint>
# Optional: override environment detection
ENVIRONMENT=production
# Optional: custom OTEL settings
OTEL_SERVICE_NAME=chaufr
OTEL_SERVICE_VERSION=1.0.0
OTEL_SAMPLE_RATE=1.0
Local Development Setup
For local development, run the OpenTelemetry Collector:
docker-compose up -d otel-collector
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
dune exec bin/main.exe
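The `otel-collector` service referenced above comes from the project's Docker Compose file. A sketch of what such a service definition might look like (the image tag, config file path, and volume mapping are assumptions):

```yaml
# Illustrative docker-compose service for the local collector.
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP (used by the OCaml client)
```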
Key Design Decisions Explained
1. Automatic Environment Detection
Rather than requiring explicit configuration, the code detects
WEBSITE_SITE_NAME (Azure's automatic environment variable)
to automatically enable production mode. This reduces configuration
errors.
2. Two Middleware Functions
Separating trace and logging middleware allows each to evolve independently and provides better debugging when one fails.
3. Graceful Degradation
If telemetry initialization fails:
- Application continues running
- Console-only logging takes over
- Observability isn't a single point of failure
4. Both Sync and Async Spans
OCaml codebases often mix Lwt async code with synchronous utilities.
Providing both with_span and
with_span_sync ensures comprehensive coverage.
5. Direct Azure Monitor Export
In production, we export directly to Azure Monitor's OTLP endpoint rather than routing through a local collector. This reduces latency and operational complexity.
Testing and Validation
To verify the telemetry is working in production, send a burst of requests to any health endpoint:
for i in {1..65}; do \
curl -s http://localhost:8080/health > /dev/null; \
curl -s http://localhost:8080/health/detailed > /dev/null; \
curl -s http://localhost:8080/status > /dev/null; \
curl -s http://localhost:8080/diagnostic > /dev/null; \
done && echo "Generated 260 requests (65 iterations × 4 endpoints)"
Then check the OpenTelemetry Collector's debug output to see the traces appear in real time.
Lessons Learned
1. Environment Detection is Powerful
Automatic environment detection via
WEBSITE_SITE_NAME eliminated entire classes of
configuration errors.
2. Middleware Composition Matters
Dream's middleware pattern enabled clean separation between telemetry concerns without tangling code.
3. Graceful Degradation Builds Confidence
Knowing that telemetry failures won't crash the app gave confidence to deploy observability to production first.
4. OTLP Standardization is Worth It
Using the OpenTelemetry Protocol standard means we could easily swap backends (local collector → Azure Monitor) without code changes.
Resources & Next Steps
For Local Development
- Start with the Docker Compose setup for the OpenTelemetry Collector
- Use Jaeger UI (port 16686) to visualize traces locally
- Use Prometheus (port 9090) to view metrics
For Production
- Azure Monitor Application Insights dashboard shows all traces, metrics, and logs
- Set up alerts based on error rates, latency percentiles, and custom metrics
- Use the Application Map to understand service dependencies
Conclusion
Setting up OpenTelemetry in OCaml isn't just about adding monitoring—it's about building observability into the application architecture from day one. The patterns we've established in Chaufr provide a foundation for understanding application behavior at scale, debugging production issues quickly, and making data-driven performance improvements.
Happy tracing! 🚀