Wed. Nov 20th, 2019

Zipkin 2.19.2 released: a distributed tracing system


Zipkin is a distributed tracing system. It helps gather timing data needed to troubleshoot latency problems in microservice architectures. It manages both the collection and lookup of this data. Zipkin’s design is based on the Google Dapper paper.

This project includes a dependency-free library and a Spring Boot server. Storage options include in-memory, JDBC (MySQL), Cassandra, and Elasticsearch.


Applications are instrumented to report timing data to Zipkin. The Zipkin UI also presents a dependency diagram showing how many traced requests went through each application. If you are troubleshooting latency problems or errors, you can filter or sort all traces based on the application, length of trace, annotation, or timestamp. Once you select a trace, you can see the percentage of the total trace time each span takes, which allows you to identify the problem application.
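
To make the "percentage of the total trace time" idea concrete, here is a minimal sketch of how such a figure could be derived from span data. It assumes spans shaped like Zipkin v2 JSON, with `timestamp` and `duration` in microseconds. The function name `span_time_shares` and the sample trace are hypothetical, and this is not necessarily how the Zipkin UI computes its numbers.

```python
def span_time_shares(spans):
    """Return {span id: percentage of the whole trace's duration}."""
    # Only spans that carry both timing fields can be placed on the timeline.
    timed = [s for s in spans if "timestamp" in s and "duration" in s]
    if not timed:
        return {}
    # The trace runs from the earliest start to the latest end.
    start = min(s["timestamp"] for s in timed)
    end = max(s["timestamp"] + s["duration"] for s in timed)
    total = end - start
    if total <= 0:
        return {s["id"]: 0.0 for s in timed}
    return {s["id"]: 100.0 * s["duration"] / total for s in timed}


# Hypothetical three-span trace (times in microseconds).
trace = [
    {"id": "a", "name": "get /checkout", "timestamp": 1_000_000, "duration": 200_000},
    {"id": "b", "name": "charge card", "timestamp": 1_020_000, "duration": 150_000},
    {"id": "c", "name": "send receipt", "timestamp": 1_180_000, "duration": 10_000},
]
shares = span_time_shares(trace)
```

In this made-up trace, span `b` accounts for 75% of the total time, so it would be the first place to look.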

Four components make up Zipkin:

  • collector
  • storage
  • search
  • web UI

Zipkin Collector

Once the trace data arrives at the Zipkin collector daemon, the collector validates it, stores it, and indexes it for lookups.
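
The validate, store, and index steps can be pictured with a toy in-memory model. This is only an illustration under simplifying assumptions: the class `ToyCollector` and its attributes are hypothetical, the validation is far looser than a real collector's, and it is not Zipkin's implementation. Span field names follow the Zipkin v2 JSON format.

```python
from collections import defaultdict


class ToyCollector:
    """Hypothetical stand-in for a collector: validate, store, index."""

    def __init__(self):
        self.spans_by_trace = defaultdict(list)     # storage
        self.traces_by_service = defaultdict(set)   # lookup index

    def accept(self, span):
        # 1. Validate: every span needs a trace ID and its own ID.
        for field in ("traceId", "id"):
            value = span.get(field)
            if not isinstance(value, str) or not value:
                raise ValueError(f"span is missing {field}")
        # 2. Store: group spans under their trace.
        self.spans_by_trace[span["traceId"]].append(span)
        # 3. Index: remember which traces touched which service.
        service = span.get("localEndpoint", {}).get("serviceName")
        if service:
            self.traces_by_service[service].add(span["traceId"])


collector = ToyCollector()
collector.accept({
    "traceId": "463ac35c9f6413ad",
    "id": "a2fb4a1d1a96d312",
    "name": "get /checkout",
    "localEndpoint": {"serviceName": "frontend"},
})
```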

Storage

Zipkin was initially built to store data on Cassandra since Cassandra is scalable, has a flexible schema, and is heavily used within Twitter. However, we made this component pluggable. In addition to Cassandra, we natively support Elasticsearch and MySQL. Other back-ends might be offered as third-party extensions.

Zipkin Query Service

Once the data is stored and indexed, we need a way to extract it. The query daemon provides a simple JSON API for finding and retrieving traces. The primary consumer of this API is the Web UI.
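
As a rough illustration of using the JSON API, the sketch below builds a trace search URL and fetches the result with the Python standard library. It assumes a Zipkin server reachable at `http://localhost:9411` and the v2 search endpoint `/api/v2/traces` with its `serviceName`, `limit`, and `lookback` parameters. The helper names `traces_url` and `find_traces` and the service name `frontend` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def traces_url(base_url, service_name, limit=10, lookback_ms=3_600_000):
    """Build a search URL for recent traces of one service."""
    query = urlencode({
        "serviceName": service_name,
        "limit": limit,
        "lookback": lookback_ms,  # how far back to search, in milliseconds
    })
    return f"{base_url.rstrip('/')}/api/v2/traces?{query}"


def find_traces(base_url, service_name, **kwargs):
    """Fetch matching traces; each trace is a list of span objects."""
    with urlopen(traces_url(base_url, service_name, **kwargs), timeout=5) as response:
        return json.load(response)


# Assumed local server address; only the URL is built here, no request is sent.
url = traces_url("http://localhost:9411", "frontend")
```

Calling `find_traces("http://localhost:9411", "frontend")` against a running server would return the parsed JSON response.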

Web UI

We created a GUI for viewing traces. The web UI lets you find and view traces based on service, time, and annotations. Note: there is no built-in authentication in the UI!

Zipkin 2.19.2 has been released.

This is a patch release that fixes a regression in handling of certain span names and adds a feature to control server timeout of queries.

Bugfixes

  • JSON API responses that include whitespace in names (such as service names) now escape strings correctly. #2915

New Features

  • The QUERY_TIMEOUT environment variable can be set to a duration string to control how long the server will wait while serving a query (e.g., search for traces) before failing with a timeout. For example, when increasing the client timeout of Elasticsearch with ES_TIMEOUT it is often helpful to increase QUERY_TIMEOUT to a larger number so the server waits longer than the Elasticsearch client. #2809
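
One way to keep the two timeouts consistent is to derive them together before starting the server. The sketch below is a hypothetical helper, not part of Zipkin. It assumes that ES_TIMEOUT is given in milliseconds and that QUERY_TIMEOUT accepts a duration string such as `31000ms`; check the server documentation for the formats your version accepts.

```python
def timeout_env(es_timeout_ms, margin_ms=1000):
    """Return environment settings where the server outwaits the client."""
    if es_timeout_ms <= 0 or margin_ms <= 0:
        raise ValueError("timeouts and margin must be positive")
    return {
        # Assumption: ES_TIMEOUT is a plain number of milliseconds.
        "ES_TIMEOUT": str(es_timeout_ms),
        # Assumption: QUERY_TIMEOUT takes a duration string with a unit suffix.
        "QUERY_TIMEOUT": f"{es_timeout_ms + margin_ms}ms",
    }


# Raise the Elasticsearch client timeout to 30 s; the query timeout follows.
env = timeout_env(30_000)
```

The resulting dictionary can be merged into the environment of whatever process launches the server, for example through the `env` argument of `subprocess.run`.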

Downloads