# Logs Schema and Writing ClickHouse Queries for Building Dashboard Panels
At SigNoz, we store our data in ClickHouse. In this documentation, we will go through the schema of the logs table and see how we can write ClickHouse queries to create different dashboard panels from logs data.
## Migrating Logs ClickHouse Queries
If you are migrating from an older version of SigNoz to version >= 0.53.0, here are the changes you need to make in your ClickHouse queries.
- Change the table name from `signoz_logs.distributed_logs` to `signoz_logs.distributed_logs_v2`.
- The attributes and resources data type has changed from arrays to maps:
  - `attributes_string_key`, `attributes_string_value` are now a single map `attributes_string`
  - `attributes_int64_key`, `attributes_int64_value` are now a single map `attributes_number`
  - `attributes_float64_key`, `attributes_float64_value` are now a single map `attributes_number`
  - `attributes_bool_key`, `attributes_bool_value` are now a single map `attributes_bool`
  - `resources_string_key`, `resources_string_value` are now a single map `resources_string`
For example:

- `attributes_string_value[indexOf(attributes_string_key, 'method')]` will change to `attributes_string['method']`
- `resources_string_value[indexOf(resources_string_key, 'service.name')]` will change to `resources_string['service.name']`
- `indexOf(attributes_string_key, 'method') != 0` will change to `mapContains(attributes_string, 'method')`
- `indexOf(resources_string_key, 'host') != 0` will change to `mapContains(resources_string, 'host')`
## Logs Schema V2
If you are using SigNoz version <= 0.52.0, please follow the guide here.
If we check the schema of the logs table in ClickHouse, this is what it looks like. The table was created with respect to the OpenTelemetry Logs Data Model:
```sql
(
    `timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `observed_timestamp` UInt64 CODEC(DoubleDelta, LZ4),
    `id` String CODEC(ZSTD(1)),
    `trace_id` String CODEC(ZSTD(1)),
    `span_id` String CODEC(ZSTD(1)),
    `trace_flags` UInt32,
    `severity_text` LowCardinality(String) CODEC(ZSTD(1)),
    `severity_number` UInt8,
    `body` String CODEC(ZSTD(2)),
    `attributes_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `attributes_number` Map(LowCardinality(String), Float64) CODEC(ZSTD(1)),
    `attributes_bool` Map(LowCardinality(String), Bool) CODEC(ZSTD(1)),
    `resources_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    `scope_name` String CODEC(ZSTD(1)),
    `scope_version` String CODEC(ZSTD(1)),
    `scope_string` Map(LowCardinality(String), String) CODEC(ZSTD(1)),
    INDEX body_idx lower(body) TYPE ngrambf_v1(4, 60000, 5, 0) GRANULARITY 1,
    INDEX id_minmax id TYPE minmax GRANULARITY 1,
    INDEX severity_number_idx severity_number TYPE set(25) GRANULARITY 4,
    INDEX severity_text_idx severity_text TYPE set(25) GRANULARITY 4,
    INDEX trace_flags_idx trace_flags TYPE bloom_filter GRANULARITY 4,
    INDEX scope_name_idx scope_name TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 4,
    INDEX attributes_string_idx_key mapKeys(attributes_string) TYPE tokenbf_v1(1024, 2, 0) GRANULARITY 1,
    INDEX attributes_string_idx_val mapValues(attributes_string) TYPE ngrambf_v1(4, 5000, 2, 0) GRANULARITY 1,
    INDEX attributes_int64_idx_key mapKeys(attributes_number) TYPE tokenbf_v1(1024, 2, 0) GRANULARITY 1,
    INDEX attributes_int64_idx_val mapValues(attributes_number) TYPE bloom_filter GRANULARITY 1,
    INDEX attributes_bool_idx_key mapKeys(attributes_bool) TYPE tokenbf_v1(1024, 2, 0) GRANULARITY 1
)
```
There is a distributed logs table named `distributed_logs_v2` which references the above table in each shard. Its schema is the same as above. Any query that we write will be written against the `distributed_logs_v2` table.
## Columns in the Logs Table
| NAME | DESCRIPTION |
|---|---|
| timestamp | Time when the log line was generated at the source. The default value is the time at which it was received, and it can be changed using the time parser. |
| observed_timestamp | Time when the log line is observed at the collection system. It is automatically added by the collector. |
| id | A ksuid; it helps us in paginating and sorting log lines. It is automatically added by the collector. |
| trace_id | Trace ID of the log line (W3C Trace Context). It can be filled using the trace parser. |
| span_id | Span ID for the log line or set of log lines that are part of a particular processing span. It can be filled using the trace parser. |
| trace_flags | Trace flags of the log line (W3C Trace Context). It can be filled using the trace parser. |
| severity_text | The log level, e.g. `info`. It can be filled using the severity parser. |
| severity_number | Numerical value of the severity_text. It can be filled using the severity parser. |
| body | The body/message of the log record. |
| resources_string | All the resource attributes are stored in this map. |
| attributes_string | All the attributes with string data type are stored in this map. |
| attributes_number | All the attributes with number data type are stored in this map. |
| attributes_bool | All the attributes with boolean data type are stored in this map. |
| scope_name | Instrumentation scope name. |
| scope_version | Instrumentation scope version. |
| scope_string | Instrumentation scope attributes are stored in this map. |
The attributes and resources can be added and transformed using different processors and operators. You can read more about them here.
## Timestamp Bucketing
In the new logs schema, i.e. `distributed_logs_v2`, we have a new column `ts_bucket_start`, which stores the start timestamp of the bucket a log line falls into. It is used for faster filtering on the timestamp attribute.

How do we use this column in queries? If your timestamp filter is `timestamp BETWEEN {{.start_datetime}} AND {{.end_datetime}}`, then you will also have to add `ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}}`. We subtract 1800 because bucketing is done in 30-minute (1800-second) intervals, so a log line inside the query window can belong to a bucket that started up to 30 minutes before the window.
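To make this concrete, here is a minimal sketch of a count query that pairs the precise timestamp filter with the coarser bucket filter:

```sql
-- Sketch: the ts_bucket_start predicate lets ClickHouse prune whole
-- 30-minute buckets before evaluating the precise timestamp filter.
SELECT count() AS value
FROM signoz_logs.distributed_logs_v2
WHERE (timestamp >= {{.start_timestamp_nano}} AND timestamp <= {{.end_timestamp_nano}})
  AND ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}};
```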
## Selected Attributes/Resources
When an attribute/resource field is converted to a selected (indexed) field, two new columns are added. For example, if our attribute name is `method`, which is present in `attributes_string`, then the corresponding columns that will be created are `attribute_string_method` and `attribute_string_method_exists`. They look like the following in the logs schema:

```sql
`attribute_string_method` String MATERIALIZED attributes_string['method'] CODEC(ZSTD(1)),
`attribute_string_method_exists` Bool MATERIALIZED if(mapContains(attributes_string, 'method') != 0, true, false) CODEC(ZSTD(1)),
```
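As an illustrative sketch, such a selected field can then be queried through the materialized columns directly; the `_exists` column distinguishes logs that actually carry the attribute from logs where the map lookup would return an empty default:

```sql
-- Sketch: filter on the materialized columns instead of the map lookup.
SELECT count() AS value
FROM signoz_logs.distributed_logs_v2
WHERE attribute_string_method_exists = true
  AND attribute_string_method = 'GET';
```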
## Writing ClickHouse Queries for Dashboard Panels
While writing queries for the logs table, if you want to use an attribute/resource attribute in your query, you will have to reference it in the format `<type>_<dataType>['<keyname>']`, where `type` can be `attributes`/`resources`, `dataType` can be `number`/`string`/`bool`, and `keyname` is the name of the key.

E.g., if your `keyname` is `status`, with `dataType` `string` and `type` `attributes`, it needs to be referenced as `attributes_string['status']`.

In the above example, if `status` is a selected field, then it can be referenced as `attribute_string_status`.
We will use two variables, `{{.start_timestamp_nano}}` and `{{.end_timestamp_nano}}`, while writing our queries to specify the time range.
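Putting the referencing format and the time-range variables together, a minimal sketch (the `status` attribute and its `success` value are hypothetical):

```sql
-- Sketch: count logs in the selected time range where a hypothetical
-- string attribute 'status' equals 'success'.
SELECT count() AS value
FROM signoz_logs.distributed_logs_v2
WHERE (timestamp >= {{.start_timestamp_nano}} AND timestamp <= {{.end_timestamp_nano}})
  AND ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}}
  AND attributes_string['status'] = 'success';
```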
## Timeseries
This panel is used when you want to view your aggregated data as a timeseries.
### Examples
#### Count of log lines per minute
```sql
SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), INTERVAL 1 MINUTE) AS ts,
    toFloat64(count()) AS value
FROM signoz_logs.distributed_logs_v2
WHERE
    (timestamp >= {{.start_timestamp_nano}} AND timestamp <= {{.end_timestamp_nano}}) AND
    ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}}
GROUP BY ts
ORDER BY ts ASC;
```
#### Count of log lines per minute grouped by container name
```sql
SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), INTERVAL 1 MINUTE) AS ts,
    attributes_string['container_name'] AS container_name,
    toFloat64(count()) AS value
FROM signoz_logs.distributed_logs_v2
WHERE
    (timestamp >= {{.start_timestamp_nano}} AND timestamp <= {{.end_timestamp_nano}}) AND
    ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}} AND
    mapContains(attributes_string, 'container_name')
GROUP BY container_name, ts
ORDER BY ts ASC;
```
#### Count of log lines per minute where `severity_text = 'INFO'`

```sql
SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), INTERVAL 1 MINUTE) AS ts,
    toFloat64(count()) AS value
FROM signoz_logs.distributed_logs_v2
WHERE
    (timestamp >= {{.start_timestamp_nano}} AND timestamp <= {{.end_timestamp_nano}}) AND
    severity_text = 'INFO' AND
    ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}}
GROUP BY ts
ORDER BY ts ASC;
```
#### Count of log lines per minute where `severity_text = 'INFO'`, `method = 'GET'`, and `service.name = 'demo'`

Here `method` is an attribute while `service.name` is a resource attribute.

```sql
SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), INTERVAL 1 MINUTE) AS ts,
    toFloat64(count()) AS value
FROM signoz_logs.distributed_logs_v2
WHERE
    (timestamp >= {{.start_timestamp_nano}} AND timestamp <= {{.end_timestamp_nano}}) AND
    severity_text = 'INFO' AND
    attributes_string['method'] = 'GET' AND
    resources_string['service.name'] = 'demo' AND
    ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}}
GROUP BY ts
ORDER BY ts ASC;
```
#### Count of log lines per minute where `severity_text = 'INFO'`, `method = 'GET'`, and `service.name = 'demo'`, with both as selected fields

Here `method` is an attribute while `service.name` is a resource attribute, and both `method` and `service.name` are selected fields.

```sql
SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), INTERVAL 1 MINUTE) AS ts,
    toFloat64(count()) AS value
FROM signoz_logs.distributed_logs_v2
WHERE
    (timestamp >= {{.start_timestamp_nano}} AND timestamp <= {{.end_timestamp_nano}}) AND
    severity_text = 'INFO' AND
    attribute_string_method = 'GET' AND
    resource_string_service$$name = 'demo' AND
    ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}}
GROUP BY ts
ORDER BY ts ASC;
```
## Value
For the value type panel, the overall query will be similar to the timeseries query, except that you will have to reduce the result to a single absolute value at the end. You can reduce your end result to the average, latest, sum, min, or max.
### Examples
#### Average count of log lines where `severity_text = 'INFO'`, `method = 'GET'`, and `service.name = 'demo'`

Here `method` is an attribute while `service.name` is a resource attribute.

```sql
SELECT
    avg(value) AS value,
    any(ts) AS ts
FROM (
    SELECT
        toStartOfInterval(fromUnixTimestamp64Nano(timestamp), INTERVAL 1 MINUTE) AS ts,
        toFloat64(count()) AS value
    FROM signoz_logs.distributed_logs_v2
    WHERE
        (timestamp >= {{.start_timestamp_nano}} AND timestamp <= {{.end_timestamp_nano}}) AND
        severity_text = 'INFO' AND
        attributes_string['method'] = 'GET' AND
        resources_string['service.name'] = 'demo' AND
        ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}}
    GROUP BY ts
    ORDER BY ts ASC
)
```
`attributes_string['method']` will change to `attribute_string_method` if `method` is a selected field, and `resources_string['service.name']` will change to `resource_string_service$$name` if `service.name` is a selected field.
## Table
This is used when you want to view the timeseries data in a tabular format. The query is similar to the timeseries query, but instead of using a time interval we just use `now() as ts` in the select clause.
### Examples
#### Count of log lines where `severity_text = 'INFO'` and `method = 'GET'`, grouped by `service.name`

Here `method` is an attribute while `service.name` is a resource attribute.

```sql
SELECT
    now() AS ts,
    resources_string['service.name'] AS service_name,
    toFloat64(count()) AS value
FROM signoz_logs.distributed_logs_v2
WHERE
    (timestamp >= {{.start_timestamp_nano}} AND timestamp <= {{.end_timestamp_nano}}) AND
    severity_text = 'INFO' AND
    attributes_string['method'] = 'GET' AND
    ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}}
GROUP BY service_name, ts
ORDER BY ts ASC;
```
`attributes_string['method']` will change to `attribute_string_method` if `method` is a selected field, and `resources_string['service.name']` will change to `resource_string_service$$name` if `service.name` is a selected field.
## Real-Life Use Case Examples
#### Number of log lines generated by each Kubernetes cluster
```sql
SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), INTERVAL 1 MINUTE) AS ts,
    resources_string['k8s.cluster.name'] AS k8s_cluster_name,
    toFloat64(count()) AS value
FROM signoz_logs.distributed_logs_v2
WHERE
    (timestamp >= {{.start_timestamp_nano}} AND timestamp <= {{.end_timestamp_nano}}) AND
    mapContains(resources_string, 'k8s.cluster.name') AND
    ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}}
GROUP BY k8s_cluster_name, ts
ORDER BY ts ASC;
```
`resources_string['k8s.cluster.name']` will change to `resource_string_k8s$$cluster$$name` if `k8s.cluster.name` is a selected field, and `mapContains(resources_string, 'k8s.cluster.name')` will change to `resource_string_k8s$$cluster$$name_exists = true`.
#### Number of error logs generated by each service
```sql
SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), INTERVAL 1 MINUTE) AS ts,
    resources_string['service.name'] AS service_name,
    toFloat64(count()) AS value
FROM signoz_logs.distributed_logs_v2
WHERE
    (timestamp >= {{.start_timestamp_nano}} AND timestamp <= {{.end_timestamp_nano}}) AND
    severity_text = 'ERROR' AND
    mapContains(resources_string, 'service.name') AND
    ts_bucket_start BETWEEN {{.start_timestamp}} - 1800 AND {{.end_timestamp}}
GROUP BY service_name, ts
ORDER BY ts ASC;
```
`resources_string['service.name']` will change to `resource_string_service$$name` if `service.name` is a selected field, and `mapContains(resources_string, 'service.name')` will change to `resource_string_service$$name_exists = true`.
## Panel Time Preference

Using the panel time preference present on the right, you can select a custom time range for your panel. When you open the dashboard, that panel will render for the time range specified for it.