# How the DECODE_HISTOGRAM() function works in Mariadb?

The `DECODE_HISTOGRAM()`

function is a built-in function that can decode the histogram data stored in the `mysql.column_stats`

table and return a string of comma-separated numeric values that represent the probability distribution of the column values.

Histograms are a useful tool for analyzing the distribution of data in a column. They can help optimize queries by providing more accurate statistics for the query optimizer. Mariadb supports histogram-based statistics for columns, which can be collected and stored in the `mysql.column_stats`

table using the `ANALYZE TABLE`

statement.

The `DECODE_HISTOGRAM()`

function is a built-in function that can decode the histogram data stored in the `mysql.column_stats`

table and return a string of comma-separated numeric values that represent the probability distribution of the column values. This function can be used to inspect the histogram data and understand how the column values are distributed.

## Syntax

The syntax of the `DECODE_HISTOGRAM()`

function is as follows:

```
DECODE_HISTOGRAM(hist_type, histogram)
```

The function takes two arguments:

`hist_type`

: This is an enumeration value that specifies the type of the histogram. It can be either`SINGLE_PREC_HB`

or`DOUBLE_PREC_HB`

, which stand for single-precision and double-precision histograms, respectively. The histogram type is determined by the`histogram_type`

system variable when the histogram data is collected by the`ANALYZE TABLE`

statement.`histogram`

: This is a binary string that contains the histogram data for a column. It can be obtained from the`histogram`

column of the`mysql.column_stats`

table.

The function returns a string of comma-separated numeric values that correspond to the probability distribution of the column values. The number of values in the string is equal to the number of buckets in the histogram, which is determined by the `histogram_size`

system variable when the histogram data is collected by the `ANALYZE TABLE`

statement. The values are normalized to sum up to 1, and each value represents the fraction of rows that fall into the corresponding bucket of the histogram.

## Examples

In this section, we will show some examples of how to use the `DECODE_HISTOGRAM()`

function to decode and inspect the histogram data for a column.

### Example 1: Decoding a single-precision histogram

Suppose we have a table called `origin`

that contains two columns: `i`

and `v`

. The column `i`

is the primary key and has values from 1 to 18. The column `v`

has some random values as shown below:

```
SELECT * FROM origin;
+----+-----+
| i | v |
+----+-----+
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 10 |
| 7 | 20 |
| 8 | 30 |
| 9 | 40 |
| 10 | 50 |
| 11 | 60 |
| 12 | 70 |
| 13 | 80 |
| 14 | 90 |
| 15 | 100 |
| 16 | 200 |
| 17 | 400 |
| 18 | 800 |
+----+-----+
```

We can collect and store the histogram data for the column `v`

by using the following statement:

```
SET histogram_size=10, histogram_type=SINGLE_PREC_HB;
ANALYZE TABLE origin PERSISTENT FOR ALL;
```

The `histogram_size`

variable specifies the number of buckets in the histogram, and the `histogram_type`

variable specifies the type of the histogram. In this case, we use 10 buckets and a single-precision histogram.

The `ANALYZE TABLE`

statement collects the histogram data and stores it in the `mysql.column_stats`

table. We can query this table to see the histogram data for the column `v`

as follows:

```
SELECT db_name, table_name, column_name, hist_type, hex(histogram)
FROM mysql.column_stats
WHERE db_name='test' AND table_name='origin' AND column_name='v';
+---------+------------+-------------+----------------+----------------------+
| db_name | table_name | column_name | hist_type | hex(histogram) |
+---------+------------+-------------+----------------+----------------------+
| test | origin | v | SINGLE_PREC_HB | 000001060C0F161C1F7F |
+---------+------------+-------------+----------------+----------------------+
```

The `hist_type`

column shows the type of the histogram, and the `histogram`

column shows the binary string that contains the histogram data. We can use the `DECODE_HISTOGRAM()`

function to decode this binary string and get the probability distribution of the column values as follows:

```
SELECT DECODE_HISTOGRAM(hist_type, histogram)
FROM mysql.column_stats
WHERE db_name='test' AND table_name='origin' AND column_name='v';
+-------------------------------------------------------------------+
| DECODE_HISTOGRAM(hist_type, histogram) |
+-------------------------------------------------------------------+
| 0.000,0.000,0.004,0.020,0.024,0.012,0.027,0.024,0.012,0.376,0.502 |
+-------------------------------------------------------------------+
```

The result is a string of 11 comma-separated numeric values that sum up to 1. Each value represents the fraction of rows that fall into the corresponding bucket of the histogram. For example, the first value 0.000 means that no rows have values between 0 and 80, the second value 0.000 means that no rows have values between 80 and 160, and so on. The last value 0.502 means that half of the rows have values between 720 and 800.

We can also visualize the histogram by using a bar chart as shown below:

![Histogram for column v]

The x-axis shows the range of values for each bucket, and the y-axis shows the fraction of rows for each bucket. The bar chart shows that the column values are skewed to the right, with most of the rows having values close to 800.

### Example 2: Decoding a double-precision histogram

Suppose we have the same table `origin`

as in the previous example, but we want to use a double-precision histogram instead of a single-precision histogram. We can collect and store the histogram data for the column `v`

by using the following statement:

```
SET histogram_size=20, histogram_type=DOUBLE_PREC_HB;
ANALYZE TABLE origin PERSISTENT FOR ALL;
```

The `histogram_size`

variable specifies the number of buckets in the histogram, and the `histogram_type`

variable specifies the type of the histogram. In this case, we use 20 buckets and a double-precision histogram.

The `ANALYZE TABLE`

statement collects the histogram data and stores it in the `mysql.column_stats`

table. We can query this table to see the histogram data for the column `v`

as follows:

```
SELECT db_name, table_name, column_name, hist_type, hex(histogram)
FROM mysql.column_stats
WHERE db_name='test' AND table_name='origin' AND column_name='v';
+---------+------------+-------------+----------------+------------------------------------------+
| db_name | table_name | column_name | hist_type | hex(histogram) |
+---------+------------+-------------+----------------+------------------------------------------+
| test | origin | v | DOUBLE_PREC_HB | 5200F600480116067E0CB30F1B16831CB81FD67F |
+---------+------------+-------------+----------------+------------------------------------------+
```

The `hist_type`

column shows the type of the histogram, and the `histogram`

column shows the binary string that contains the histogram data. We can use the `DECODE_HISTOGRAM()`

function to decode this binary string and get the probability distribution of the column values as follows:

```
SELECT DECODE_HISTOGRAM(hist_type, histogram)
FROM mysql.column_stats
WHERE db_name='test' AND table_name='origin' AND column_name='v';
+-----------------------------------------------------------------------------------------+
| DECODE_HISTOGRAM(hist_type, histogram) |
+-----------------------------------------------------------------------------------------+
| 0.00125,0.00250,0.00125,0.01877,0.02502,0.01253,0.02502,0.02502,0.01253,0.37546,0.50063 |
+-----------------------------------------------------------------------------------------+
```

The result is a string of 11 comma-separated numeric values that sum up to 1. Each value represents the fraction of rows that fall into the corresponding bucket of the histogram. For example, the first value 0.00125 means that 0.125% of the rows have values between 0 and 40, the second value 0.00250 means that 0.25% of the rows have values between 40 and 80, and so on. The last value 0.50063 means that 50.063% of the rows have values between 760 and 800.