This Week in Databend #137
PsiACEMar 25, 2024
Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .
What's New
Stay informed about the latest features of Databend.
Adding Support for Query Matching Inverted Index
Inverted indexes are the most commonly used data structures in document retrieval systems, storing the mapping of where a word is located in a document or a set of documents in full text search.
Databend now supports the use of
match
CREATE TABLE t (id int, content string)
INSERT INTO t VALUES
(1, 'The quick brown fox jumps over the lazy dog'),
(2, 'A picture is worth a thousand words'),
(3, 'The early bird catches the worm'),
(4, 'Actions speak louder than words'),
(5, 'Time flies like an arrow; fruit flies like a banana'),
(6, 'Beauty is in the eye of the beholder'),
(7, 'When life gives you lemons, make lemonade'),
(8, 'Put all your eggs in one basket'),
(9, 'You can not judge a book by its cover'),
(10, 'An apple a day keeps the doctor away')
CREATE INVERTED INDEX IF NOT EXISTS idx1 ON t(content)
REFRESH INVERTED INDEX idx1 ON t
SELECT id, score(), content FROM t WHERE match(content, 'word')
----
2 1.5948367 A picture is worth a thousand words
4 1.6550698 Actions speak louder than words
If you would like to learn more, please contact the Databend team or refer to the resources listed below:
Code Corner
Discover some fascinating code snippets or projects that showcase our work or learning journey.
Leveraging TASK & STREAM to Capture and Track User Activity in Real-Time
Streams in Databend are dynamic real-time representations of table changes. Creating a stream can capture and track modifications to related tables for ongoing analysis.
A task encapsulates specific SQL statements intended to be executed at predefined time intervals, upon specific event triggers, or as part of a broader task sequence.
When creating a task, you can follow the workflow below:
The following example demonstrates how to combine TASK and STREAM to capture and track user activity in real-time, periodically synchronizing the
user_activity_profiles
activities_stream
user_activity_profiles
-- Define a task in Databend
CREATE TASK user_activity_task
WAREHOUSE = 'default'
SCHEDULE = 1 MINUTE
-- Trigger task when new data arrives in activities_stream
WHEN stream_status('activities_stream') AS
-- Insert new records into user_activity_profiles
INSERT INTO user_activity_profiles
SELECT
-- Join activities_stream with user_profiles based on user_id
a.user_id, p.username, p.location, a.activity, a.timestamp
FROM
activities_stream AS a
JOIN user_profiles AS p
ON a.user_id = p.user_id
-- Include only rows where the action is 'INSERT'
WHERE a.change$action = 'INSERT';
If you're interested, please read the documentation below to learn how to complete this task using Databend Cloud.
- Docs | Automating Data Loading with Tasks
- Docs | Example: Tracking and Transforming Data in Real-Time
Highlights
We have also made these improvements to Databend that we hope you will find helpful:
- Added support for and
show views
.desc view
- Added ,
is_error
,is_not_error
functions.error_or
- Added pagination for .
task_history
- Added support for the PRQL query language.
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Supporting CHANGES
Clause
CHANGES
The
CHANGES
Combining multiple queries can be used to retrieve change tracking metadata between different transaction boundaries.
SELECT ...
FROM ...
CHANGES ( INFORMATION => { DEFAULT | APPEND_ONLY } )
AT ( { TIMESTAMP => <timestamp> | OFFSET => <time_difference> | STREAM => '<name>' } )
[ END( { TIMESTAMP => <timestamp> | OFFSET => <time_difference> } ) ]
[ ... ]
Issue #15028 | Feature: support CHANGES clause
Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.com/i-m-feeling-lucky to get started.
New Contributors
We always open arms to everyone and can't wait to see how you'll help our community grow and thrive.
- @ncuwaln implemented support for PRQL, #14922.
- @blackstar-baba implemented support for &
show views
, #14926.desc view
Changelog
You can check the changelog of Databend Nightly for details about our latest developments.
Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.378-nightly...v1.2.386-nightly
Subscribe to our newsletter
Stay informed on feature releases, product roadmap, support, and cloud offerings!