Roadmap
As of 2025Q1, the content and order of these goals are very much in flux.
Your input is welcome; please engage us via a support contract or on the community forums.
About this page
Tags used in the roadmap
- #external: Likely source of the bug is not us, but in a dependency
- #maybe: We might not do this
- #urgency/high
- #urgency/medium
- #urgency/low
- #severity/high
- #severity/medium
- #severity/low
- #size/small
- #size/medium
- #size/big
- #api-change: Changes already-published API
- #bug: Likely bug in the software (not documentation)
- #enhancement: New feature or improving behavior some other way
- #performance: Improves performance, minimal effect on other behaviors
- #ops: Affects operational aspects, not functionality; for example, observability
- #dev: Only affects development and building from source, not the final product
- #test: Topic where test coverage needs to be better
- #doc: Improvements or additions to documentation
- #good-first-issue: Good for newcomers
- #need-more-info: We don't know enough yet to work on this
- #help-wanted: Extra attention would be appreciated
- #ecosystem/*: We are hoping our dependencies will improve
- #waiting: Cannot proceed at this time
The checklist items are in very rough priority order. Order will be adjusted based on feedback and discovery of critical paths.
The big picture
- v0: Bootstrap to a real SQL database
- v1: Production-ready database using RocksDB
- v2: Pure Rust with new storage subsystem
Platforms
Support for non-Linux operating systems is not currently on the roadmap. We intend to actively use Linux-specific functionality for performance, security and simplicity.
For now, only x86-64 is actively tested, but everything is supposed to work across architectures (higher risk: 32-bit processing >4 GiB of data at once, 32-bit in general, big-endian).
Pending work
Work that is more or less "about to be done".
- #dev make sequence `inc` take a type, promise to never return `u64::MAX`; add some intermediate type or trait to make that work, and avoid lots of error checks
- #dev re-evaluate `rocky` vs `rust-rocksdb` high-level API
- #dev stringy internal errors like `internal error#`, in `key-format-v1` and `record-format-v1`; do we need to return even more errors
- #dev extract a higher-level backend provider, split non-RocksDB specifics from `kanto_backend_rocksdb`
- #dev eliminate remaining `expect` (Panic policy)
- #dev eliminate or excuse clippy exceptions that are either missing a reason or whose reason includes TODO
  - `clippy::indexing_slicing`
- #dev `BlockingSentinel` or such to make sure we're doing blocking right (see the sketch in the clippy section of the ecosystem wishlist)
- #bug random rare `pthread lock: invalid argument` (likely RocksDB shutdown, which is supposed to be safe; cancelling background tasks is said to be only a speedup)
- table scan: start/stop from DataFusion predicate pushdown #performance #severity/high #ref/1twdr3jqbpzen
- table scan: inform DataFusion of the sort order we guarantee for primary keys #performance
  - populate `output_ordering`: `ExecutionPlan::properties`, `ExecutionPlanProperties::output_ordering`
- implement `TableProvider::constraints` https://docs.rs/datafusion/latest/datafusion/catalog/trait.TableProvider.html#method.constraints #performance
  - background: https://docs.rs/datafusion/latest/datafusion/common/struct.FunctionalDependence.html
  - note the non-public helper function for building constraints: https://github.com/apache/datafusion/issues/15443 #ecosystem/datafusion
  - ugly: to fill `Unique`, we need to know all indexes for the table; maybe merge `IndexDef` into `TableDef` (and then maybe `IndexId` gets scoped within one table; change RocksDB keys)
- #performance use indexes for lookups, not just correctness
  - see https://github.com/apache/datafusion/discussions/9963
  - variants: table scan (current), table get, index scan, index get, index and table
  - is that maybe `TableProvider::get_logical_plan`? probably need a custom query planner step
- index scan: inform DataFusion of the sort order we guarantee for keys of this index #performance
- index scan: start/stop from DataFusion predicate pushdown #performance #severity/high #ref/1twdr3jqbpzen
- maybe use RocksDB `multi_get` for table lookups from index scan: https://docs.rs/rocksdb/latest/rocksdb/struct.DBCommon.html#method.batched_multi_get_cf_opt
  - also `multi_get_for_update`
- #test fix everything from `cargo-mutants`
- #test validate index state better, integrity test to run between mutations
- #test transaction semantics better, including isolation and concurrent access, `SELECT FOR UPDATE`
  - use https://github.com/ept/hermitage
  - use http://www.redbook.io/ch6-isolation.html
  - see https://jepsen.io/consistency
  - see https://www.postgresql.org/docs/current/sql-select.html#SQL-FOR-UPDATE-SHARE
  - see `LOCK IN SHARE MODE` https://stackoverflow.com/questions/32827650/mysql-innodb-difference-between-for-update-and-lock-in-share-mode
- ensure iterator snapshot consistency
  - add a mutation test tap point where disabling this causes test failure
- hide `arbitrary` behind a feature
- re-evaluate unsafe (non-validating) use of strings
  - consider https://crates.io/crates/bstr -- but we can't make DataFusion use that
- study RocksDB size estimation
  - https://github.com/facebook/rocksdb/wiki/Approximate-Size
  - https://docs.rs/librocksdb-sys/0.17.1+9.9.3/librocksdb_sys/fn.rocksdb_approximate_sizes_cf.html
  - https://docs.rs/librocksdb-sys/0.17.1+9.9.3/librocksdb_sys/fn.rocksdb_approximate_sizes_cf_with_flags.html
  - https://docs.rs/librocksdb-sys/0.17.1+9.9.3/librocksdb_sys/fn.rocksdb_column_family_metadata_get_size.html
- `kantodb rocksdb-create` twice with the same path gives a confusing error message
- ensure `fsync` on commit, audit our use of RocksDB, validate with tests (see the sketch after this list)
  - prefer to delay this until we can demonstrate it working, for trust
  - leave a mutation test tap point where disabling fsync on commit causes test failure
  - is this just `WriteOptions` `sync`, or more involved?
    - maybe that can be optimized by making writes no-sync within a transaction, with only the transaction commit being sync
  - https://github.com/facebook/rocksdb/wiki/Basic-Operations#synchronous-writes
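For the `fsync`-on-commit item, the open question is whether setting `sync` in `WriteOptions` at commit time is enough. A minimal sketch of that approach, assuming the plain `rocksdb` crate API (the `commit_batch` function is illustrative, not KantoDB code):

```rust
use rocksdb::{WriteBatch, WriteOptions, DB};

/// Write a batch with a synchronous WAL write, so the data survives a
/// process crash; see the "synchronous writes" wiki page linked above.
fn commit_batch(db: &DB, batch: WriteBatch) -> Result<(), rocksdb::Error> {
    let mut opts = WriteOptions::default();
    opts.set_sync(true); // fsync the WAL before returning
    db.write_opt(batch, &opts)
}
```

The optimization mentioned above would keep individual writes within a transaction no-sync and issue only the commit write with `set_sync(true)`.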
Unimplemented or buggy SQL operations
- DataFusion explicitly refuses to work with many SQL types even when it has the corresponding underlying type; see https://docs.rs/datafusion/latest/datafusion/sql/planner/struct.SqlToRel.html#method.build_schema and `convert_data_type`, e.g. `UInt64` vs `UNSIGNED BIGINT`
- large number literals behave differently than in Postgres; `SELECT i64_max_plus_1` gives a float, while Postgres looks to be using big numbers
- `CREATE UNIQUE INDEX .. [NULLS DISTINCT]` with null columns (#ref/unimplemented_sql/w9j8q485anior)
  - use a multi-index with the nullable columns in the value not the key, and a prefix scan on insert to detect conflicts (see the sketch after this list)
- `UPDATE` of a column in the primary key (#ref/unimplemented_sql/fpb157cronqcy)
- tables without primary keys are not usable
  - DataFusion `UPDATE` semantics are tricky: we need to dissect the `input` `LogicalPlan` and convert it to a table scan of the current data (instead of blindly executing it), updating records as we see them in the scan
  - #bug `DELETE` or `UPDATE` for a table using `row_id` (not primary keys) #ref/gs5muy13tu1uk #ref/i61jpodrazscs
    - possible fix: adjust the `logical_plan` to also get a magic-named column `__kanto_row_key`, use that to do deletes
    - possible fix: understand the `logical_plan`, don't use DataFusion to produce record batches, run a custom table/filter scan that also deletes on the fly
- return the number of rows affected
- index existing data at `CREATE INDEX` time #severity/high #urgency/high
  - ideally go straight to concurrent index building logic, no table locks
- `INSERT RETURNING` (#ref/hixyde6h9n77o), `UPDATE RETURNING` (#ref/pp5wz7nq4yhzs), `DELETE RETURNING` (#ref/yn84qzsk1f9ny) refused by DataFusion SQL parser, not sure how to work around #ecosystem/datafusion #severity/high #urgency/medium
- CTEs (test coverage): `SELECT`, `UPDATE`, `INSERT`, `DELETE` inside a CTE; `SELECT`, `UPDATE`, `INSERT`, `DELETE` following a CTE
- rest of the data types
  - should we enforce `VARCHAR(30)` etc length, truncate; forbidding that syntax until clarified
  - `INTEGER(10)` etc precision
    - `type { INTEGER(precision) | INT(precision) | INT4(precision) }` (#ref/unimplemented_sql/tiptbb5d8buxa)
    - `type { UNSIGNED INTEGER(precision) | UNSIGNED INT(precision) | UNSIGNED INT4(precision) }` (#ref/unimplemented_sql/fzt1q8mfgkr6n)
    - `type { (UNSIGNED) INT2(precision) | (UNSIGNED) SMALLINT(precision) }` (#ref/unimplemented_sql/g57gd8499t4hg)
    - `type { BIGINT(precision) | INT8(precision) }` (#ref/unimplemented_sql/88fzntq3mz14g)
    - `type { UNSIGNED BIGINT(precision) | UNSIGNED INT8(precision) }` (#ref/unimplemented_sql/sze33j7qzt91s)
    - `type TINYINT(precision)` (#ref/unimplemented_sql/hueii8mjgk8qq)
    - `type UNSIGNED TINYINT(precision)` (#ref/unimplemented_sql/ewt6tneh7ecps)
  - `type { NUMERIC | DECIMAL | DEC }` (#ref/unimplemented_sql/uoakmzwax448h)
  - `FLOAT(42)` etc precision
    - `type FLOAT(precision)` (#ref/unimplemented_sql/xfwbr486jaw5q)
    - `type DOUBLE(precision)` (#ref/unimplemented_sql/wo43sei9ubpon)
  - `NUMERIC(precision, scale)`, `NUMERIC(precision)`, `NUMERIC`; aka `DECIMAL(precision, scale)`
  - `type { CHARACTER | CHAR | FIXEDSTRING }` (#ref/unimplemented_sql/hdhf8ygp9zna4): fixed-length strings
  - `type INT128 (ClickHouse syntax)` (#ref/unimplemented_sql/4sg3mftgxu9j4)
  - `type INT256 (ClickHouse syntax)` (#ref/unimplemented_sql/ex9mnxxipbqqs)
  - `type UINT128 (ClickHouse syntax)` (#ref/unimplemented_sql/riob7j6jahpte)
  - `type UINT256 (ClickHouse syntax)` (#ref/unimplemented_sql/q6tzjzd7xf3ga)
  - `type { MEDIUMINT (UNSIGNED) }` (#ref/unimplemented_sql/m7snzsrqj68sw)
  - `type { BIGNUMERIC | BIGDECIMAL } (BigQuery syntax)` (#ref/unimplemented_sql/an57g4d4uzwew)
  - `type BIT (VARYING)` (#ref/unimplemented_sql/88uneott9owc1)
  - `type UUID` (#ref/unimplemented_sql/531ax7gb73ce1)
  - `type TIMESTAMP` (#ref/unimplemented_sql/njad51sbxzwnn)
  - `type TIME` (#ref/unimplemented_sql/bppzhg7ck7xhr)
  - `type DATE` (#ref/unimplemented_sql/1r5ez1z8j7ryo)
  - `type DATE32` (#ref/unimplemented_sql/781y5fxih7pnn)
  - `type DATETIME (ClickHouse syntax)` (#ref/unimplemented_sql/oak3s5g3rnutq)
  - `type DATETIME (MySQL syntax)` (#ref/unimplemented_sql/dojz85ngwo5wr)
  - `type INTERVAL` (#ref/unimplemented_sql/5wxynwyfua69s)
  - `type { JSON | JSONB }` (#ref/unimplemented_sql/abcw878jr35qc)
    - JSON manipulation functions #ecosystem/datafusion (best done in DataFusion: https://github.com/apache/datafusion/issues/7845)
    - inspiration: https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse
  - `type BINARY` (#ref/unimplemented_sql/i16wqwmie17eg)
  - `type ENUM` (#ref/unimplemented_sql/1geqh5qxdeoko)
  - `type ARRAY` (#ref/unimplemented_sql/hnmqj3qbtutn4)
  - `type SET` (#ref/unimplemented_sql/hm53sq3gco3sw)
  - `type STRUCT` (#ref/unimplemented_sql/nsapujxjjk9hg)
  - `type MAP` (#ref/unimplemented_sql/7ts8zgnafnp7g)
  - `type UNION` (#ref/unimplemented_sql/oownfhnfj5b9r)
  - `type TUPLE` (#ref/unimplemented_sql/x7qwkwdhdznxe)
  - `type NESTED` (#ref/unimplemented_sql/fa6nofmo1s49g)
  - `type { CHARACTER LARGE OBJECT | CHAR LARGE OBJECT | CLOB }` (#ref/unimplemented_sql/ew7uhufhkzj9w)
  - `type TRIGGER` (#ref/unimplemented_sql/zf5jx1ykoc5s1)
  - `custom type` (#ref/unimplemented_sql/dt315zutjqpzc)
  - `type UNSPECIFIED` (#ref/unimplemented_sql/916n48obgiqh6)
  - `type LOWCARDINALITY` (#ref/unimplemented_sql/78exey8xjz3yk)
  - `type REGCLASS (Postgres syntax)` (#ref/unimplemented_sql/quzjzcsusttgc)
  - `type NULLABLE (ClickHouse syntax)` (#ref/unimplemented_sql/ra5kspnstauu6)
- emulate more Postgres system tables, like `pg_cancel_backend`
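For the `NULLS DISTINCT` unique-index item above, the sketched approach keeps nullable columns in the value rather than the key and detects conflicts with a prefix scan on insert. Roughly, assuming the plain `rocksdb` crate and an illustrative key encoding (not KantoDB's actual key format):

```rust
use rocksdb::{Direction, IteratorMode, DB};

/// Returns true if some index entry already starts with `prefix`, where
/// `prefix` encodes the non-null key columns of the row being inserted.
fn unique_conflict(db: &DB, prefix: &[u8]) -> Result<bool, rocksdb::Error> {
    let mut iter = db.iterator(IteratorMode::From(prefix, Direction::Forward));
    match iter.next() {
        Some(entry) => {
            let (key, _value) = entry?;
            Ok(key.starts_with(prefix))
        }
        None => Ok(false),
    }
}
```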
Unimplemented SQL syntax support
See also DataFusion SQL support status: https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html, roadmap https://arrow.apache.org/datafusion/contributor-guide/roadmap.html.
For reference, see Postgres SQL, SQLite SQL, and MariaDB SQL.
- `SET LOCAL` (#ref/unimplemented_sql/7b1ab6tms3uz4)
- `SHOW variable` (#ref/unimplemented_sql/7a69iu5nr1oxk)
- `CREATE TABLE IF NOT EXISTS` (#ref/unimplemented_sql/pf1mfhd9sz3jn)
- `DROP TABLE` (#ref/unimplemented_sql/47hnz51gohsx6)
- `ALTER TABLE` (#ref/unimplemented_sql/dny7j9hx3ihha)
- `CREATE INDEX IF NOT EXISTS` (#ref/unimplemented_sql/tzwc96kk9173n)
- `DROP INDEX` (#ref/unimplemented_sql/4kgkg4jhqhfrw)
- `ALTER INDEX` (#ref/unimplemented_sql/ha1fybdqqwa8k)
- `SET TRANSACTION` (#ref/unimplemented_sql/yybqytw7q8bmn)
  - may cause a bigger refactoring of our transaction start; we currently do eager creation
- `CREATE TABLE` without columns (#ref/unimplemented_sql/g7hdmcm5bk69a)
- constraints on tables #severity/high #urgency/high
  - `CREATE TABLE` with table constraint `CHECK` (#ref/unimplemented_sql/8zaqnh9mb9tor)
- constraints on columns
  - `CREATE TABLE` with column option `{ PRIMARY KEY | UNIQUE }` (#ref/unimplemented_sql/hwcjakao83bue) #severity/high #urgency/medium
  - `CREATE TABLE` with column option `CHECK` (#ref/unimplemented_sql/dwcpda8aj66d6) #severity/high #urgency/medium
- default and computed values for columns
  - `CREATE TABLE` with column option `DEFAULT` (#ref/unimplemented_sql/944dzboki4djc) #severity/high #urgency/medium
  - `CREATE TABLE` with column option `GENERATED` (#ref/unimplemented_sql/bxzpaj1quo5a4) #severity/high #urgency/medium
- `CREATE INDEX` with dotted column name (#ref/unimplemented_sql/kbezqzh7p6yky)
- `CREATE INDEX` with non-column expression (#ref/unimplemented_sql/8cjqfzsfpnduw)
- `CREATE INDEX .. WHERE` (#ref/unimplemented_sql/9134bk3fe98x6): partial indexes
- `CREATE INDEX .. WITH` (#ref/unimplemented_sql/w7t6c8xrehsnh)
- `CREATE INDEX .. INCLUDE` (#ref/unimplemented_sql/ip415b5s8sa6h): non-key included columns
- foreign keys #severity/high #urgency/high
  - `CREATE TABLE` with table constraint `FOREIGN KEY` (#ref/unimplemented_sql/mamfde4rdzgeo)
  - `CREATE TABLE` with column option `REFERENCES` (#ref/unimplemented_sql/ufnoiinkkp3wy)
  - enforce
- roles and access control #severity/high #urgency/medium
  - `CREATE ROLE` (#ref/unimplemented_sql/hqyj3srdk1g4h)
  - `DROP ROLE` (#ref/unimplemented_sql/q8wh8nqixoytq)
  - `ALTER ROLE` (#ref/unimplemented_sql/u6set5k1gzyhh)
  - `GRANT` (#ref/unimplemented_sql/mekqqm6hxy64s)
  - `REVOKE` (#ref/unimplemented_sql/wha7884jy4bty)
  - `SET ROLE` (#ref/unimplemented_sql/ryqxdr55r49sr)
- prepared statements
  - `PREPARE` (#ref/unimplemented_sql/fgi986yc4d7cw)
  - `EXECUTE` (#ref/unimplemented_sql/3w91mfrbzguyh)
  - `DEALLOCATE` (#ref/unimplemented_sql/fn1q45ys1t8ks)
- views: first dynamic, then materialized, then incremental updates and other extras #severity/high
  - `CREATE VIEW` (#ref/unimplemented_sql/b9m5uhu9pnnsw)
  - `ALTER VIEW` (#ref/unimplemented_sql/uhwpi8nzojcxy)
  - `DROP VIEW` (#ref/unimplemented_sql/hwk7gp6ffh8zy)
- `TRUNCATE` (#ref/unimplemented_sql/uf3rr4diw3o1o)
- temporary tables
  - `CREATE TEMPORARY TABLE` (#ref/unimplemented_sql/3u561ykck5m76)
- sequences (exposed to SQL)
  - `CREATE SEQUENCE` (#ref/unimplemented_sql/rdkjx9ryredcw)
  - `DROP SEQUENCE` (#ref/unimplemented_sql/ca149fq1ixm5h)
- `START TRANSACTION READ ONLY` (#ref/unimplemented_sql/bhetj3emnc8n1)
- multiple schemas
  - make schema creation explicit, not implicit
  - `CREATE SCHEMA` (#ref/unimplemented_sql/x9cna73uf8cew)
  - `DROP SCHEMA` (#ref/unimplemented_sql/5rnpn33hia5a6)
- multiple databases/catalogs/backends; all three might be the same thing for us
  - re-evaluate the whole "catalog name is backend" idea, especially when a backend can store multiple catalogs: duplication can lead to mismatch, renames are weird; maybe it's a lookup order?
  - is a backend a `TABLESPACE`? `sqlparser-rs` doesn't have `TABLESPACE` support as of 2025-03
  - re-evaluate the idea of invisibly spreading every transaction across all backends; maybe force each transaction to stick to one, maybe make it lazy
  - Postgres docs specifically say "Databases are equivalent to catalogs"
  - `CREATE DATABASE` (#ref/unimplemented_sql/6c76p5eqsuea6)
  - `ALTER DATABASE`: not in `sqlparser` v0.54.0 #ecosystem/sqlparser-rs
  - `DROP DATABASE` (#ref/unimplemented_sql/qn6s3c397johw)
- `EXPLAIN` (#ref/unimplemented_sql/fugjaua7zkfd4)
- `CREATE INDEX DESC` (#ref/unimplemented_sql/n7x3zhjk7bf8k): descending sort order
- `CREATE INDEX CONCURRENTLY` (#ref/unimplemented_sql/wya1a3w874xba)
- `COMMIT AND CHAIN` (#ref/unimplemented_sql/ic5jpcqsjghm6)
- `ROLLBACK AND CHAIN` (#ref/unimplemented_sql/q1pay866nux9n)
- `COPY` (#ref/unimplemented_sql/1fxo667s3wwes): the kind that transfers data over the wire; missing support in https://github.com/sunng87/pgwire #waiting
  - we do not intend to support anything where a server-side file read/write gets its path from the user over the network
- `RELEASE SAVEPOINT` (#ref/unimplemented_sql/hyzhkewjpnug6): #severity/low #urgency/low #size/small
  - test the error from trying to use it again
- `{ LISTEN | UNLISTEN | NOTIFY }` (#ref/unimplemented_sql/8t9hqpx4umzaq)
- user-defined functions #severity/medium #urgency/low
  - `CREATE FUNCTION` (#ref/unimplemented_sql/fj9ny1gu79cbg)
  - `DROP FUNCTION` (#ref/unimplemented_sql/ptp9d3xe1456h)
  - `SHOW FUNCTIONS` (#ref/unimplemented_sql/yqiswgsqjc5uh): #severity/low
- user-defined types #severity/medium #urgency/low
  - `CREATE TYPE` (#ref/unimplemented_sql/ur33y834br1wa): user-defined types (see Postgres https://www.postgresql.org/docs/16/sql-createtype.html, DataFusion issue)
  - `DROP TYPE` (#ref/unimplemented_sql/t1fti3yz8jsdn)
- collation, as in locale-specific ordering:
  - `CREATE TABLE .. COLLATE` (#ref/unimplemented_sql/qd36gh6hj78br): configurable collation order for columns: #severity/medium #urgency/low
  - see https://databasearchitects.blogspot.com/2016/08/equivalence-of-unicode-strings-is.html
  - see https://en.wikipedia.org/wiki/Common_Locale_Data_Repository
- procedures #severity/medium #urgency/low
  - `CREATE PROCEDURE` (#ref/unimplemented_sql/69jzm59u8c6jc)
  - `CALL` (#ref/unimplemented_sql/dx5ic5bkaaq7c): see Postgres
  - `DROP PROCEDURE` (#ref/unimplemented_sql/w173ctuodtcaa)
- triggers #severity/medium #urgency/low
  - `CREATE TRIGGER` (#ref/unimplemented_sql/cxts4du59pwqn)
  - `DROP TRIGGER` (#ref/unimplemented_sql/i1omqjczqksqk)
- `KILL [CONNECTION]` (#ref/unimplemented_sql/jqji1j6my4f5w): #severity/medium #urgency/low
- `KILL QUERY` (#ref/unimplemented_sql/77ecapaed3bso): #severity/medium #urgency/low
- `MERGE` (#ref/unimplemented_sql/y5gzyy5kxg5ws): #severity/low #urgency/low
- `CREATE TABLE .. COMMENT` (#ref/unimplemented_sql/dgyhzknpfx756)
- `CREATE TABLE` with column option `COMMENT` (#ref/unimplemented_sql/4itjrujemcztq) #severity/medium #urgency/low
- miscellaneous column options
- `COMMENT ON` (#ref/unimplemented_sql/e8uhkzz1krcgw): #severity/low #urgency/low
- populate a table with data at creation time #severity/low #urgency/low
  - `CREATE TABLE AS SELECT` (#ref/unimplemented_sql/cirnphk69igxy): (`Ctas` in https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.WriteOp.html)
  - `CREATE TABLE AS SELECT` (#ref/unimplemented_sql/nuqha5e3r7g6h)
- `READ COMMITTED`: currently upgrading to `REPEATABLE READ`, could support it directly (#ref/bm4q45m5ajz9s)
- `START TRANSACTION ISOLATION LEVEL SERIALIZABLE` (#ref/unimplemented_sql/dy1ch15p7n3ze): maybe just a better error message: #severity/low #urgency/low
  - I think supporting this requires two things: `TransactionOptions::set_snapshot(true)`, and upgrading every `get` to `get_for_update` (see the sketch after this list)
  - then marking transactions as `READ ONLY` can relax that for performance
- cursors (see Postgres; not in standard interactive SQL) #severity/low #urgency/low
  - `DECLARE` (#ref/unimplemented_sql/159jmakdyuewy)
  - `FETCH` (#ref/unimplemented_sql/9nf8hju6ie9e1)
  - `CLOSE` (#ref/unimplemented_sql/gc1557zek4hne)
- row-level security (see Postgres)
  - `CREATE POLICY` (#ref/unimplemented_sql/gce4sb6hptjfy)
  - `DROP POLICY` (#ref/unimplemented_sql/n9zgxwn6736wr)
  - `ALTER POLICY` (#ref/unimplemented_sql/wj1tnq4hkth7s)
- `DISCARD` (#ref/unimplemented_sql/9a5zkcwgbhewh): see Postgres #severity/low #urgency/low
- `REPLACE INTO` (MySQL syntax) or `INSERT OR REPLACE` (SQLite syntax) (#ref/unimplemented_sql/54xqh7ornzpso): #severity/low #urgency/low
- `CREATE OR REPLACE TABLE` (#ref/unimplemented_sql/65so99qr1hyck): #severity/low #urgency/low
- `ASSERT` (PowerSQL syntax) (#ref/unimplemented_sql/qq1s1fq45ztdw): #severity/low #urgency/low
- `CREATE TABLE .. LIKE` (#ref/unimplemented_sql/xutp6m7rjnc54)
- `CREATE TABLE .. CLONE` (#ref/unimplemented_sql/7b5kah43yqquc): optionally also copies data; the RocksDB backend will likely not support that option
- `CREATE TABLE .. WITH` (#ref/unimplemented_sql/3kozdfoeczmrk): see Postgres; likely parse superficially and error if given #severity/low #urgency/low
- `RENAME TABLE` (MySQL syntax) (#ref/unimplemented_sql/85ax9gtrkcneg)
- `CREATE TABLE` with column option `CHARACTER SET` (#ref/unimplemented_sql/ifi7dygtua3oc): parse superficially and refuse non-UTF-8 #severity/low #urgency/low
- `CREATE TABLE .. DEFAULT CHARSET` (MySQL syntax) (#ref/unimplemented_sql/dcan9mz8g4eja): parse superficially and refuse non-UTF-8 #urgency/low #severity/low #size/small
- relaxing constraints to end of transaction:
  - `CREATE TABLE` with table constraint `PRIMARY KEY .. DEFERRABLE` (#ref/unimplemented_sql/uzf4oabd178rr)
  - `CREATE TABLE` with table constraint `PRIMARY KEY .. INITIALLY DEFERRED` (#ref/unimplemented_sql/ysnkhzw6ukbms)
- `BEGIN { DEFERRED | IMMEDIATE | EXCLUSIVE }` (SQLite syntax) (#ref/unimplemented_sql/o969ikdsjz76r): #severity/low #urgency/low
- `CREATE TABLE` with column option `ON CONFLICT` (SQLite syntax) (#ref/unimplemented_sql/ehefbqgnnfqwe) #severity/medium #urgency/low
- `PRAGMA` (SQLite syntax) (#ref/unimplemented_sql/a1k38hk48fpzr): #severity/low #urgency/low
- `SHOW CREATE` (MySQL syntax) (#ref/unimplemented_sql/c7cewp15zm5ys)
- `SHOW STATUS` (MySQL syntax) (#ref/unimplemented_sql/ejj6ir6dsxhrc)
- `SHOW DATABASES` (MySQL syntax) (#ref/unimplemented_sql/kskgotui7rupc)
- `SHOW COLUMNS` (MySQL syntax) (#ref/unimplemented_sql/zx98bjxguht5o)
- `SHOW VARIABLES` (MySQL syntax) (#ref/unimplemented_sql/n5cs7ue5gzpde)
- `SHOW SCHEMAS` (Snowflake syntax) (#ref/unimplemented_sql/poi5oxunjyq5a)
- `SHOW VIEWS` (Snowflake syntax) (#ref/unimplemented_sql/zzi1wb3jq5jeh)
- `{ EXPLAIN | DESCRIBE } TABLE` (MySQL syntax) (#ref/unimplemented_sql/4tpmma7ewdf3a): #severity/low #urgency/low
- `CREATE TABLE .. AUTO_INCREMENT` (MySQL syntax) (#ref/unimplemented_sql/18kxgbi8rzb5r): #severity/low #urgency/low
- `USE` (#ref/unimplemented_sql/ng34e6kt6rrt4): #severity/low #urgency/low
- `CREATE TABLE` with table constraint `PRIMARY KEY .. COMMENT` (MySQL syntax) (#ref/unimplemented_sql/ss9iqz1c8ffrh)
- MySQL-style names on constraints or indexes:
  - `CREATE TABLE` with table constraint `PRIMARY KEY` given a name (MySQL syntax) (#ref/unimplemented_sql/5rr7ouyasn1dk)
  - `CREATE TABLE` with table constraint `PRIMARY KEY` given an index name (MySQL syntax) (#ref/unimplemented_sql/mexuaio6ngad4)
- `CREATE TABLE` with table constraint `UNIQUE { INDEX | KEY }` (MySQL syntax) (#ref/unimplemented_sql/m3op6enhb5pgw)
- `CREATE TABLE` with table constraint `PRIMARY KEY .. NOT ENFORCED` (MySQL syntax) (#ref/unimplemented_sql/u339u8jzzkqdy)
- `CREATE TABLE` with column option `ON UPDATE` timestamp function (MySQL syntax) (#ref/unimplemented_sql/zaub41ukgqfmq) #severity/low #urgency/low
- `CREATE TABLE` with column option `MATERIALIZED` (ClickHouse syntax) (#ref/unimplemented_sql/ouq6dfrgbxn9w) #severity/low #urgency/low
- `CREATE TABLE` with column option `EPHEMERAL` (ClickHouse syntax) (#ref/unimplemented_sql/78qipewmqn31h) #severity/low #urgency/low
- `CREATE TABLE` with column option `ALIAS` (ClickHouse syntax) (#ref/unimplemented_sql/1adkef6yd1qgc) #severity/low #urgency/low
- `CREATE TABLE` with column option `OPTIONS` (BigQuery syntax) (#ref/unimplemented_sql/7drpfhozbre1k) #severity/low #urgency/low
- `CREATE TABLE` with column option `TAG` (Snowflake syntax) (#ref/unimplemented_sql/sjpe4f636kn8n) #severity/low #urgency/low
- `CREATE TABLE` with column option `{ MASKING POLICY | PROJECTION POLICY }` (Snowflake syntax) (#ref/unimplemented_sql/ynucdy5u7a35w) #severity/low #urgency/low
- `CREATE TABLE` with a weird column option (#ref/unimplemented_sql/4jqhus4cj4fho) #severity/low #urgency/low
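For the `SERIALIZABLE` item above, a minimal sketch of the two ingredients mentioned (a snapshot taken at transaction start, and every read upgraded to a locking read), assuming the `rust-rocksdb` transaction API; the key and function name are illustrative:

```rust
use rocksdb::{TransactionDB, TransactionOptions, WriteOptions};

fn read_serializably(db: &TransactionDB) -> Result<Option<Vec<u8>>, rocksdb::Error> {
    let mut txn_opts = TransactionOptions::default();
    txn_opts.set_snapshot(true); // consistent snapshot at transaction start
    let txn = db.transaction_opt(&WriteOptions::default(), &txn_opts);
    // A locking read: conflicting concurrent writers make the transaction
    // fail instead of us silently reading stale data.
    let value = txn.get_for_update(b"some-key", true)?; // true = exclusive
    txn.commit()?;
    Ok(value)
}
```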
There are plenty of corners of the SQL syntax that we have chosen not to support.
These are marked in the source code as `NotSupportedSql`.
We may have judged wrong; let us know if you need some non-supported syntax.
Later
Features and improvements we would like to work on, but that aren't started yet.
- #admin set up security contacts
- reconsider each item in https://datafusion.apache.org/user-guide/configs.html
- maybe interpret `INT8` and `UINT8` as bits vs bytes based on SQL dialect; ClickHouse is the odd one out
- safer arithmetic than silent wraparound in SQL
- we could pack booleans tighter in a record; prefer finding an unused bit in the last byte/u32 allocated to booleans
- the existence of a `test_session` method in a couple of places probably means the non-test API is not convenient enough
- move away from `arrow-row` for keys, it's not guaranteed stable; we're mostly using it to avoid writing our own SIMD code for now
- #maybe #dev split types so different byte slices aren't confused with each other: row key, index key, rocksdb key, record, arrow data, etc
  - would require wrapping the RocksDB API to enforce that?
- #maybe #performance special-case point queries so they don't go through a table scan and a RocksDB iterator; `rust-rocksdb` makes copies of keys to do FFI, a bad design
  - perhaps there is even a way to avoid the columnar result overhead?
- make `information_schema` from DataFusion show our types, not `Utf8View` etc
  - potential: DataFusion supporting logical types: https://github.com/apache/datafusion/issues/12622 #waiting
  - alternatively, re-implement it ourselves
  - issue on what is missing from `information_schema`: https://github.com/apache/datafusion/issues/8916
- #performance #maybe partition workloads (inputs for insert, key ranges for table scan) and run multiple RocksDB blocking worker threads
  - `datafusion::physical_plan::execute_stream_partitioned` instead of `execute_stream` (see the sketch after this list)
- #performance consider switching to minitrace: https://crates.io/crates/minitrace, or NIH it: https://news.ycombinator.com/item?id=41570163
- #performance support `pgwire` extended queries; newer clients might not implement simple queries
- #ops make the server use only UTC time as native, never local time
- #ops landlock, seccomp etc sandboxing
- #cleanup #dev maybe we want stronger types to never confuse key-value keys and values?
- #dev #doc `KantoError` convention, error codes
- #dev lint: every `UnimplementedSql` and `NotSupportedSql` must have an SLT test
- #dev lint: handle `Unimplemented` like `UnimplementedSql`, figure out a way to avoid duplication
- #dev lint: all `expect` must be internal errors and unique (in Panic policy)
- #dev lint: all #waiting must have a URL to the upstream issue
- #dev lint: `thiserror` source field: named `source` or `error`, be consistent
- #ops more readable logging (maybe by postprocessing the JSON)
- #ops collect metrics in-memory and expose them via a Prometheus endpoint
  - maybe also expose Tokio metrics: https://docs.rs/tokio-metrics/latest/tokio_metrics/
- TLS support in the Postgres-compatible server
- shutdown on I/O error (in Panic policy)
- re-evaluate panic on OOM
- create a REPL for ad hoc interaction (useful for the library, not as useful for the server) (limitation: exclusive access only): https://docs.rs/clap/latest/clap/_derive/_cookbook/repl/index.html
- #enhancement Streaming remote backup
- #dev set up branch coverage (needs nightly Rust: https://github.com/taiki-e/cargo-llvm-cov/issues/8)
- #dev continuous-integration: `./task/nextest-run-valgrind`
- #dev continuous-integration: `cargo miri nextest run`
- #dev continuous-integration: (with unstable) `cargo fmt --check -- --unstable-features --config-path=./rustfmt-unstable.toml`
- #dev continuous-integration: `cargo run -p lint-unique-error-codes`
- #dev continuous-integration: `cargo run -p lint-unimplemented-sql`
- #dev continuous-integration: `cargo shear`
- #dev continuous-integration: `cargo spellcheck`
- #dev continuous-integration: `typos`
- #test tests for `CREATE TABLE`, `CREATE INDEX` etc that dump the rkyv `TableDef`/`IndexDef` as JSON
- #test regularly test on nightly
  - with `-Zrandomize-layout` (see `flake.nix`)
  - re-evaluate `rustc` flags for enhanced testing every now and then
- #dev lint: no accidental public API changes, and eyeball the public API for sanity
  - `cargo-public-api`
    - broken with Rust >=1.84: https://github.com/cargo-public-api/cargo-public-api/issues/750 #ecosystem #waiting
    - the diff logic is annoying; I really don't want networking or touching my git checkout; use it as a library only: https://docs.rs/public-api/latest/public_api/
  - `cargo-semver-checks`
- #dev lint that we re-export every crate exposed by public APIs
- #dev speed up builds
  - study https://matklad.github.io/2021/09/04/fast-rust-builds.html
  - study https://fasterthanli.me/articles/why-is-my-rust-build-so-slow
  - study https://doc.rust-lang.org/cargo/reference/timings.html
  - study https://crates.io/crates/cargo-llvm-lines
  - study https://www.reddit.com/r/rust/comments/1bhpfeb/vastly_improved_recompile_times_in_rust_with/
  - study https://corrode.dev/blog/tips-for-faster-rust-compile-times/
  - study https://nnethercote.github.io/perf-book/build-configuration.html#minimizing-compile-times
- #maybe speed up linking with `lld` or `mold`
- #dev set up an MSRV policy
- #doc #website support `/docs/<version>`, and navigation between those and `/docs/latest`
- bindings to C
- bindings to TypeScript/JavaScript: Deno, Node.js
- bindings to Python
- does compiling to WASM make sense, given the filesystem use; is a good enough API emerging?
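For the partitioned-workload item above, a minimal sketch of consuming one stream per partition with `execute_stream_partitioned`, assuming DataFusion's `physical_plan` API; `plan` and `ctx` are placeholders for a real physical plan and task context:

```rust
use std::sync::Arc;

use datafusion::error::Result;
use datafusion::execution::TaskContext;
use datafusion::physical_plan::{execute_stream_partitioned, ExecutionPlan};
use futures::TryStreamExt;

async fn drain_partitions(plan: Arc<dyn ExecutionPlan>, ctx: Arc<TaskContext>) -> Result<()> {
    // One SendableRecordBatchStream per partition of the plan, instead of a
    // single merged stream from execute_stream.
    let streams = execute_stream_partitioned(plan, ctx)?;
    for (partition, mut stream) in streams.into_iter().enumerate() {
        // Each partition could be driven by its own worker, e.g. paired with
        // a dedicated RocksDB blocking thread.
        while let Some(batch) = stream.try_next().await? {
            println!("partition {partition}: {} rows", batch.num_rows());
        }
    }
    Ok(())
}
```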
Maybe
Possible features that we have not committed to at this time, or that are so far out that we don't yet have a clear picture of how to implement them.
- #performance `kanto-backend-rocksdb`: use merges where possible (e.g. `x=x+1`); see the sketch after this list
  - what are the wanted update messages? set, delete, delta_wraparound, delta_saturated? https://en.wikipedia.org/wiki/Fractal_tree_index#Upserts
- `kanto-backend-rocksdb`: use compaction filtering for vacuum
- encryption at rest
- standalone integrity checker/fixer ("fsck")?
- UNIX domain socket support in the Postgres-compatible server
- RocksDB bind to C++ directly
- #performance #maybe tracing: filter out some low-level tracing messages when in release mode
  - https://docs.rs/tracing/latest/tracing/level_filters/index.html#compile-time-filters
  - not sure I actually want to remove the `TRACE` level; it's a bit annoying that `DEBUG` is more important than `TRACE`
- #doc #maybe #urgency/low include a section in the docs with all used licenses
- #admin study tokio-console, would it benefit us
- re-evaluate legacy compatibility with SQLite
  - we have marked some vendor-specific SQL as `NotSupportedSql` that could be implemented; they were just not relevant for Postgres compatibility
- re-evaluate legacy compatibility with MySQL
  - we have marked some vendor-specific SQL as `NotSupportedSql` that could be implemented; they were just not relevant for Postgres compatibility
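For the merge item above, a sketch of what an associative merge operator could look like with the `rocksdb` crate, using an illustrative little-endian `u64` delta encoding (not KantoDB's actual record format):

```rust
use rocksdb::{MergeOperands, Options, DB};

/// Fold u64 deltas into the existing value; wraparound semantics are chosen
/// arbitrarily here (the roadmap item asks which semantics are wanted).
fn add_u64(existing: Option<&[u8]>, operands: &MergeOperands) -> Option<Vec<u8>> {
    let mut sum = existing
        .and_then(|v| <[u8; 8]>::try_from(v).ok())
        .map(u64::from_le_bytes)
        .unwrap_or(0);
    for op in operands.iter() {
        if let Ok(delta) = <[u8; 8]>::try_from(op) {
            sum = sum.wrapping_add(u64::from_le_bytes(delta));
        }
    }
    Some(sum.to_le_bytes().to_vec())
}

fn open_with_merge(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.set_merge_operator_associative("add_u64", |_key, existing, operands| {
        add_u64(existing, operands)
    });
    DB::open(&opts, path)
}
```

With this, `x = x + 1` becomes a `merge` of an 8-byte delta instead of a read-modify-write.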
Small cleanups
- `exact_zip` helper: we use `Iterator::zip` in many places where having non-equal lengths would be a bug; systematize (see the sketch below)
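A sketch of what such a helper could look like; whether it should panic or return an error is an open choice:

```rust
/// Like `Iterator::zip`, but panics if the iterators have different lengths,
/// instead of silently truncating to the shorter one.
fn exact_zip<A, B>(
    a: impl IntoIterator<Item = A>,
    b: impl IntoIterator<Item = B>,
) -> impl Iterator<Item = (A, B)> {
    let mut a = a.into_iter();
    let mut b = b.into_iter();
    std::iter::from_fn(move || match (a.next(), b.next()) {
        (Some(x), Some(y)) => Some((x, y)),
        (None, None) => None,
        _ => panic!("exact_zip: iterators have different lengths"),
    })
}

fn main() {
    let pairs: Vec<_> = exact_zip([1, 2, 3], ["a", "b", "c"]).collect();
    assert_eq!(pairs, vec![(1, "a"), (2, "b"), (3, "c")]);
}
```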
Ecosystem wishlist
DataFusion
Specific bugs and missing features:
- DataFusion `CASE WHEN` is wrong: https://github.com/apache/arrow-datafusion/issues/8475
- DataFusion `DataFrame::cache` breaks if some `CASE` branches are `NULL`: https://github.com/apache/arrow-datafusion/issues/8476
- DataFusion `CASE` mishandling: https://github.com/apache/arrow-datafusion/issues/8909
- DataFusion `COALESCE` mishandling: https://github.com/apache/arrow-datafusion/issues/8910 #ecosystem/datafusion #waiting #severity/medium #urgency/low
- DataFusion `MIN` mishandling: https://github.com/apache/datafusion/issues/8911
- DataFusion does not support `avg(DISTINCT)`: https://github.com/apache/arrow-datafusion/issues/2408 #ecosystem/datafusion #waiting #severity/medium #urgency/low
- DataFusion non-unique column names: https://github.com/apache/arrow-datafusion/issues/6543, https://github.com/apache/arrow-datafusion/issues/8379 #ecosystem/datafusion #waiting #severity/high #urgency/medium
- support Postgres `'\x0102'` style escaped strings
- wishlist: DataFusion should have better ways to convert from `ResolvedTableReference` to `TableReference::Full`
- #project-management enumerate DataFusion trouble from sqllogictest `is_datafusion_limitation`, ensure upstream tickets exist
- the `TableProvider::constraints` helper function is no longer public: https://github.com/apache/datafusion/issues/15443

Architectural:

- Arrow `is_null`/`is_valid` API is broken by design: https://github.com/apache/arrow-rs/issues/4840
- DataFusion schema handling is not great for us
  - see https://github.com/apache/datafusion/discussions/8887
  - with the "pre-scan" idea, the statement gets scanned twice: once by us, once by DataFusion
- `INSERT RETURNING` (#ref/yn84qzsk1f9ny) et al; or, take more control over DML execution
sqlparser-rs
- `UNCACHE` fails to parse with a semicolon: https://github.com/sqlparser-rs/sqlparser-rs/issues/1244
- docs link to non-English content: https://github.com/apache/datafusion-sqlparser-rs/issues/1523
- `CreateTable.location` is confusing: https://github.com/apache/datafusion-sqlparser-rs/issues/1518 #severity/low
- Snowflake `COPY INTO` fails to parse with a semicolon: https://github.com/apache/datafusion-sqlparser-rs/issues/1519 #severity/low
- Snowflake `LIST @foo` and `REMOVE @foo` fail to parse with a semicolon: https://github.com/apache/datafusion-sqlparser-rs/issues/1773 #severity/low
- `SKEWED BY` not supported/buggy: https://github.com/apache/datafusion-sqlparser-rs/issues/1499 #severity/low
- docs: typo in docs of ClickHouse column option `MATERIALIZED`: https://github.com/apache/datafusion-sqlparser-rs/issues/1760 #severity/low #urgency/low #doc #dev
- docs: `ColumnOption::ForeignKey` docs have wrong SQL syntax: https://github.com/apache/datafusion-sqlparser-rs/issues/1761 #severity/low #urgency/low #doc #dev
sqllogictest-rs
- expected column type API is the wrong way around: https://github.com/risinglightdb/sqllogictest-rs/issues/227
  - our `*.slt` files probably have lots of incorrect `query I` lines because nothing is enforcing correctness yet
- sqllogictest-rs can't run the sqlite tests: https://github.com/risinglightdb/sqllogictest-rs/issues/148
clippy (?)
- no way to lint or `debug_assert` that we're not blocking in an async context; how do I validate that I never use RocksDB's potentially-blocking FFI functions in Tokio async context? (see the sketch below)
  - blocking RocksDB calls should only be made inside `block_in_place` or such
  - https://github.com/rust-lang/wg-async/issues/19, https://github.com/rust-lang/wg-async/issues/248
  - clippy issues: https://github.com/rust-lang/rust-clippy/issues/4377, https://github.com/rust-lang/rust-clippy/issues/10794
  - https://github.com/rust-lang/wg-async/issues/73
  - https://github.com/rust-lang/async-book/issues/64
  - https://internals.rust-lang.org/t/warning-when-calling-a-blocking-function-in-an-async-context/11440
  - maybe only use my own wrapper that uses thread-local state for that
  - Tokio's own docs claim that their `block_on` panics if called in the wrong context
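A sketch of the thread-local wrapper idea; `BlockingSentinel` is a hypothetical name taken from the pending-work list, not existing code:

```rust
use std::cell::Cell;

thread_local! {
    static MAY_BLOCK: Cell<bool> = Cell::new(false);
}

/// RAII guard marking the current thread as allowed to make blocking calls.
struct BlockingSentinel {
    prev: bool,
}

impl BlockingSentinel {
    fn enter() -> Self {
        let prev = MAY_BLOCK.with(|b| b.replace(true));
        BlockingSentinel { prev }
    }
}

impl Drop for BlockingSentinel {
    fn drop(&mut self) {
        MAY_BLOCK.with(|b| b.set(self.prev));
    }
}

/// Call at the top of every potentially-blocking FFI wrapper.
fn assert_may_block() {
    debug_assert!(
        MAY_BLOCK.with(Cell::get),
        "blocking call outside a BlockingSentinel scope"
    );
}

fn main() {
    // e.g. inside tokio::task::block_in_place(|| { ... }):
    let _guard = BlockingSentinel::enter();
    assert_may_block(); // the debug_assert fires if the guard is missing
}
```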
tracing
- `tracing_subscriber` (and thus `test_log`) intentionally leaks, which makes valgrind annoying: https://github.com/tokio-rs/tracing/issues/2069#issuecomment-1096929365
- the tracing output formats are not very useful
  - wish to combine the best of tailing logs and nested traces
  - I think I want to log to JSON and use some sort of a viewer GUI -- but what?
Rust
- `rustfmt` fails silently extremely often: https://github.com/rust-lang/rustfmt/issues/3863, https://github.com/rust-lang/rustfmt/issues/6202, https://github.com/rust-lang/rustfmt/issues/3697, https://github.com/rust-lang/rustfmt/issues/6491
  - I find any discussion about "overlong lines" simply a sign of bad design; a formatter needs to work toward a goal, not silently stop working except in the one case where it can achieve perfection; the fact that the line remains long doesn't mean you should stop indenting it (or even worse, the whole containing block)
  - the fact that `rustfmt` can eat comments is really concerning to me; this is just a bad architecture: https://github.com/rust-lang/rustfmt/issues/6044
- `error_generic_member_access`, once stabilized, could help with providing context in `KantoError`: https://github.com/rust-lang/rust/issues/99301, https://github.com/rust-lang/rust/issues/96024
  - likely reimplement `Op`, maybe `code` too, with this? Make sure the unique code scanner keeps working, including the code kind that now comes from an enum variant.
misc
- branch coverage stabilized: https://github.com/taiki-e/cargo-llvm-cov/issues/8
- uninitialized value in `opendir` (seen with valgrind)
Details
RocksDB bind to C++ directly #maybe
RocksDB is a C++ library that we currently use through a less-featureful C shim. The cxx crate allows calling C++ directly: https://github.com/dtolnay/cxx/. What are the downsides?
See also https://github.com/cozodb/cozo/tree/main/cozorocks and https://crates.io/crates/autocxx.
Compare to https://github.com/hkalbasi/zngur and https://crates.io/crates/cpp.
Prior work at https://github.com/google/autocxx/issues/1073.
`autocxx` has been stagnant since 2022, so I'm pessimistic about this approach now.
MSRV policy
- https://doc.rust-lang.org/cargo/reference/manifest.html#the-rust-version-field
- verify it builds with `cargo-msrv` or just Nix
- discussions:
Streaming remote backup
- also usable for Change Data Capture (CDC), e.g. send to pubsub and update consumers
- inspiration: Litestream (WAL follower), LiteFS (FUSE server that understands SQLite file formats), LiteVFS (SQLite plugin):
- Litestream: https://litestream.io/how-it-works/, https://fly.io/blog/litefs-cloud/, https://fly.io/blog/all-in-on-sqlite-litestream/
- LiteFS has a format that is more efficient to restore (tl;dr: it stores pages, not WAL segments): https://news.ycombinator.com/item?id=36604556
- some details on the LiteFS file format https://news.ycombinator.com/item?id=36606114
- using LiteFS as basis of read-only replicas: https://news.ycombinator.com/item?id=36604370
- experiences using Litestream: https://news.ycombinator.com/item?id=39065201
Streaming remote backup for RocksDB backend
- could be done in a second process with https://github.com/facebook/rocksdb/blob/main/examples/multi_processes_example.cc
- inspiration: replication features of https://github.com/apache/kvrocks
- inspiration: https://github.com/pinterest/rocksplicator (dead project)