This Week in Databend #107
PsiACE, Aug 20, 2023
Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .
What's On In Databend
Stay connected with the latest news about Databend.
Understanding Connection Parameters
Connection parameters are the set of connection details required to establish a secure link to a supported external storage service, such as Amazon S3. They are enclosed in parentheses and consist of key-value pairs separated by commas or spaces. They are commonly used in operations such as creating a stage, copying data into Databend, and querying staged files from external sources.
For example, the following statement creates an external stage on Amazon S3 with the connection parameters:
CREATE STAGE my_s3_stage
's3://load/files/'
CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-id>',
SECRET_ACCESS_KEY = '<your-secret-access-key>'
);
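The same CONNECTION clause can also appear when copying data into Databend. The following is a minimal sketch rather than a definitive recipe: the table name my_table, the file name data.csv, and the file format options are assumptions made for illustration, and the key-value pairs are separated by spaces here to show the alternative to commas.

```sql
-- Hypothetical target table and source file; replace the placeholders.
COPY INTO my_table
FROM 's3://load/files/data.csv'
CONNECTION = (
    ACCESS_KEY_ID = '<your-access-key-id>'
    SECRET_ACCESS_KEY = '<your-secret-access-key>'
)
-- Assumed file layout: comma-separated values with one header row.
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' SKIP_HEADER = 1);
```

Creating a stage once and reusing it keeps credentials out of individual statements, while passing CONNECTION inline suits one-off loads.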
Adding Storage Parameters for Hive Catalog
Over the past week, Databend introduced storage parameters for the Hive Catalog, allowing the configuration of specific storage services. This means that the catalog no longer relies on the storage backend of the default catalog.
The following example shows how to create a Hive Catalog using MinIO as the underlying storage service:
CREATE CATALOG hive_ctl
TYPE = HIVE
CONNECTION =(
ADDRESS = '127.0.0.1:9083'
URL = 's3://warehouse/'
AWS_KEY_ID = 'admin'
AWS_SECRET_KEY = 'password'
ENDPOINT_URL = 'http://localhost:9000/'
)
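Once the catalog exists, its tables can be addressed by a catalog-qualified name. The statements below are a sketch under assumptions: my_db and my_table are hypothetical names standing in for a database and table that already exist in the Hive metastore.

```sql
-- List the available catalogs; hive_ctl should appear next to the default catalog.
SHOW CATALOGS;

-- Query a Hive table using a catalog.database.table name.
-- my_db and my_table are hypothetical placeholders.
SELECT * FROM hive_ctl.my_db.my_table LIMIT 10;
```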
If you are interested in learning more, please check out the resources listed below.
- Issue #12407 | Feature: Add storage support for Hive catalog
- PR #12469 | feat: Add storage params in hive catalog
Code Corner
Discover some fascinating code snippets or projects that showcase our work or learning journey.
Using gitoxide to Speed Up Git Dependency Downloads
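gitoxide is a pure Rust implementation of Git. Cargo has an unstable gitoxide feature that uses gitoxide instead of git2 for Git operations such as fetching the registry index and Git dependencies, which can speed up those downloads.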
Databend has recently enabled this feature for cargo {build | clippy | test}. The same flag can be passed on the command line with a nightly toolchain, for example:
cargo -Zgitoxide=fetch,shallow-index,shallow-deps build
Highlights
We have also made these improvements to Databend that we hope you will find helpful:
- The VALUES clause can be used without being combined with SELECT.
- You can now set a default value when modifying the type of a column. See Docs | ALTER TABLE COLUMN for details.
- Databend can now automatically recluster a table after write operations such as COPY INTO and REPLACE INTO.
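The first two highlights can be illustrated with a short sketch. The table and column names are hypothetical, and the exact ALTER TABLE syntax should be checked against the ALTER TABLE COLUMN docs referenced above.

```sql
-- A standalone VALUES clause returns its rows directly, with no SELECT around it.
VALUES (1, 'one'), (2, 'two');

-- Hypothetical table used to illustrate changing a column type with a default.
CREATE TABLE students_demo (id INT, name VARCHAR, age INT);

-- Change the type of age and set a default value in the same statement.
ALTER TABLE students_demo MODIFY COLUMN age VARCHAR DEFAULT '0';
```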
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Enhancing infer_schema for All File Locations
Currently, it is possible to query files using file locations or from stages in Databend.
select * from 'fs:///home/...';
select * from 's3://bucket/...';
select * from @stage;
However, the infer_schema function currently works only with staged files:
select * from infer_schema(location=>'@stage/...');
When attempting to use infer_schema with a file location instead, the query panics:
select * from infer_schema(location =>'fs:///home/...'); -- this will panic.
So, the improvement involves extending the infer_schema function so that it supports all of these file locations, not only stages.
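Until that work lands, one possible approach is to expose the files through a stage and point infer_schema at the stage, since the stage form is the one shown to work above. This is a sketch under assumptions: the stage name, bucket path, and file name are hypothetical, and it has not been verified against a specific Databend version.

```sql
-- Hypothetical external stage over a hypothetical S3 location; replace the placeholders.
CREATE STAGE my_schema_stage
's3://my-bucket/data/'
CONNECTION = (
    ACCESS_KEY_ID = '<your-access-key-id>',
    SECRET_ACCESS_KEY = '<your-secret-access-key>'
);

-- infer_schema accepts stage locations, so the file is addressed through the stage.
SELECT * FROM infer_schema(location => '@my_schema_stage/example.parquet');
```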
Issue #12458 | Feature: infer_schema
Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.com/i-m-feeling-lucky to get started.
Changelog
You can check the changelog of Databend Nightly for details about our latest developments.
Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.62-nightly...v1.2.74-nightly