This Week in Databend #109
PsiACESep 3, 2023
Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .
What's On In Databend
Stay connected with the latest news about Databend.
Understanding Cluster Key
By defining a Cluster Key, you can enhance query performance by clustering tables. In this case, data will be organized and grouped based on the Cluster Key, rather than relying solely on the order in which data is ingested. This optimizes the data retrieval logic for large tables and accelerates queries.
When using the
COPY INTO
REPLACE INTO
Clustering or re-clustering a table takes time. Databend suggests defining cluster keys primarily for sizable tables that experience slow query performance.
If you are interested in learning more, please check out the resources listed below.
Code Corner
Discover some fascinating code snippets or projects that showcase our work or learning journey.
Exploring Databend Local Mode
Databend Local Mode aims to provide a simple Databend environment where users can directly interact with Databend using SQL without the need to perform a full Databend deployment. This makes it convenient for developers to perform basic data processing using Databend.
databend-query local
❯ alias databend-local="databend-query local"
❯ echo " select sum(a) from range(1, 100000) as t(a)" | databend-local
4999950000
❯ databend-local
databend-local:) select number %3 n, number %4 m, sum(number) from numbers(1000000000) group by n,m limit 3 ;
┌───────────────────────────────────┐
│ n │ m │ sum(number) │
│ UInt8 │ UInt8 │ UInt64 NULL │
├───────┼───────┼───────────────────┤
│ 0 │ 0 │ 41666666833333332 │
│ 1 │ 0 │ 41666666166666668 │
│ 2 │ 0 │ 41666666500000000 │
└───────────────────────────────────┘
0 row result in 1.669 sec. Processed 1 billion rows, 953.67 MiB (599.02 million rows/s, 4.46 GiB/s)
If you need to use Databend in a production environment, we recommend deploying a Databend cluster according to the official documentation or using Databend Cloud. databend-local is a good choice for developers to quickly leverage Databend's capabilities without the need to deploy a full Databend instance.
If you are interested in learning more, please check out the resources listed below:
Highlights
We have also made these improvements to Databend that we hope you will find helpful:
- Added initial support for .
MERGE INTO
- Implemented SQLsmith testing framework to support more accurate fuzzy testing.
- Read the document Docs | Setting Environment Variables for how to change Databend configurations through environment variables.
- Implemented and
json_strip_nulls
functions. You can also read Docs | Semi-Structured Functions to check out Databend functions for semi-structured data processing.json_typeof
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Optimizing the implementation of MERGE INTO
In PR #12350 | feat: support Merge-Into V1, Databend has initially supported the
MERGE INTO
Based on this, there are many optimizations worth considering, such as providing parallel and distributed implementations, reducing I/O, and simplifying data block splitting.
Issue #12595 | Feature: Merge Into Optimizations
Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.com/i-m-feeling-lucky to get started.
Changelog
You can check the changelog of Databend Nightly for details about our latest developments.
Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.87-nightly...v1.2.96-nightly
Subscribe to our newsletter
Stay informed on feature releases, product roadmap, support, and cloud offerings!