Blog

Databend Monthly (Sep 2024)

avatarEricOct 16, 2024
Databend Monthly (Sep 2024)

We're excited to bring you the latest updates, new features, and improvements for September 2024 in Databend! We hope these enhancements are helpful, and we look forward to your feedback.

New Way to Integrate: Dictionary

You can now create a dictionary by defining its structure and specifying an external source. Databend's dictionary feature provides an efficient way to integrate and query data from supported external sources (currently MySQL and Redis) directly within Databend. By acting as an in-memory key-value store, the dictionary enables rapid access to external data without the need for complex data pipelines or traditional ETL processes.

The following creates a dictionary that references the MySQL

orders
table. For a complete example, see https://docs.databend.com/guides/query/dictionary#usage-example.

CREATE DICTIONARY order_dict
(
order_id INT,
customer_name STRING,
order_total INT
)
PRIMARY KEY order_id
SOURCE(MYSQL(
host='mysql'
port='3306'
username='root'
password='admin'
db='dict'
table='orders'
));

If you're interested, follow along this tutorial for hands-on experience: Accessing MySQL and Redis with Dictionaries.

SQL Variables

Databend now allows you to store and manage temporary data with SQL variables. To use variables in your queries, leverage both

$
for value substitution and IDEN`TIFIER for accessing database objects like tables. For more information, see SQL Variables.

You can reference the value of a variable within a SQL statement using either the

$
symbol or the
getvariable()
function. Both methods allow dynamic substitution, where the variable's value is directly embedded into the query at runtime.

-- Set a variable to use as a filter value
SET VARIABLE threshold = 100;

-- Use the variable in a query with $
SELECT * FROM sales WHERE amount > $threshold;

-- Alternatively, use the getvariable() function
SELECT * FROM sales WHERE amount > getvariable('threshold');

The

IDENTIFIER
keyword allows you to dynamically reference database objects whose names are stored in variables. Please note that accessing objects with
IDENTIFIER
is not supported by BendSQL yet.

-- Create a table with sales data
CREATE TABLE sales_data (region TEXT, sales_amount INT, month TEXT) AS
SELECT 'North', 5000, 'January' UNION ALL
SELECT 'South', 3000, 'January';

select * from sales_data;

-- Set variables for the table name and column name
SET VARIABLE table_name = 'sales_data';
SET VARIABLE column_name = 'sales_amount';

-- Use IDENTIFIER to dynamically reference the table and column in the query
SELECT region, IDENTIFIER($column_name)
FROM IDENTIFIER($table_name)
WHERE IDENTIFIER($column_name) > 4000;

Fail-Safe

Fail-Safe in Databend refers to mechanisms aimed at recovering lost or accidentally deleted data from object storage. Currently, Fail-Safe supports only S3-compatible storage types. For more information about Fail-Safe, see Fail-Safe.

CALL SYSTEM$FUSE_AMEND('<database_name>', '<table_name>');

For Fail-Safe to work, bucket versioning must be enabled. Note that data created before enabling versioning cannot be recovered using this method.

Stream Hints

Stream hints allow you to specify various stream configuration options using hints to control how a stream is processed.

SELECT ...
FROM <stream_name> WITH (<hint1> = <value1>[, <hint2> = <value2>, ...])

Available hints:

  • CONSUME
    : Specifies whether this query will consume the stream. Defaults to False.
  • MAX_BATCH_SIZE
    : Defines the maximum number of rows per batch processed from the stream.

For more information about these hints and their examples, see WITH Stream Hints.

New FUSE Engine Option

We introduced a new FUSE engine option,

data_retention_period_in_hours
, allowing you to specify the number of hours to retain table data. For more information, see Fuse Engine Options.

Databend Cloud Enhancements

We've enhanced your Databend Cloud experience, so you can enjoy improved features and performance.

TASK_HISTORY Table Function

A new table function, TASK_HISTORY, was introduced for displaying task running history. Examples:

-- Retrieve all task history records
SELECT
*
FROM TASK_HISTORY() order by scheduled_time;

-- Retrieve the task history records where the scheduled time range starts at '2022-01-02T01:12:00-07:00' and ends at '2022-01-02T01:12:30-07:00'.
SELECT *
FROM TASK_HISTORY(
SCHEDULED_TIME_RANGE_START=>TO_TIMESTAMP('2022-01-02T01:12:00-07:00'),
SCHEDULED_TIME_RANGE_END=>TO_TIMESTAMP('2022-01-02T01:12:30-07:00'));

Bulk Operations

You can perform bulk operations on warehouses, including bulk restart, bulk suspend, bulk resume, and bulk delete. To do so, select the warehouses for bulk operations by checking the checkboxes in the warehouse list, and then click the ellipse button for the desired operation.

alt text

New Tutorials

  • Analyzing AWS Billing: In this tutorial, we’ll walk you through the process of importing AWS billing data and conducting cost analysis using SQL. You’ll learn how to load the AWS billing data into Databend Cloud, query it to find key cost drivers, and gain insights into your AWS usage.

  • Migrating from Snowflake: In this tutorial, we’ll walk you through the process of exporting data from Snowflake in Parquet format to an Amazon S3 bucket, and then loading it into Databend Cloud.

Granting Access to Finance Personnel

To facilitate the work of your finance team while ensuring data security, you can create a role named

billing
within Databend Cloud.

CREATE ROLE billing;

This role will be specifically tailored to provide access only to billing-related information. For more information, see Granting Access to Finance Personnel.

alt text

Share this post

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!