KantoDB developer guide
Welcome to the KantoDB developer guide.
This document is intended to give a competent software developer, one familiar with backend software, storage, data structures, and Rust, a better idea of how KantoDB works internally.
The Open Source source code for KantoDB is available at https://git.kantodb.com/kantodb/kantodb.
Keeping documentation up to date is a difficult task, doubly so for developer-oriented documentation. Statements made in this document may in fact be false. Projections of what work will be done in the future, and how things might work one day, are even more hazardous.
If anything here seems out of touch with reality, please reach out on the community forums.
Roadmap
As of 2025Q1, the content and order of goals are very much in flux.
Your input is very much welcome; please engage with us through a support contract or on the community forums.
About this page
Tags used in the roadmap
- #external: Likely source of the bug is not us, but in a dependency
- #maybe: We might not do this
- #urgency/high
- #urgency/medium
- #urgency/low
- #severity/high
- #severity/medium
- #severity/low
- #size/small
- #size/medium
- #size/big
- #api-change: Changes already-published API
- #bug: Likely bug in the software (not documentation)
- #enhancement: New feature or improving behavior some other way
- #performance: Improves performance, minimal effect on other behaviors
- #ops: Affects operational aspects, not functionality; for example, observability
- #dev: Only affects development and building from source, not the final product
- #test: Topic where test coverage needs to be better
- #doc: Improvements or additions to documentation
- #good-first-issue: Good for newcomers
- #need-more-info: We don't know enough yet to work on this
- #help-wanted: Extra attention would be appreciated
- #ecosystem/*: We are hoping our dependencies will improve
- #waiting: Cannot proceed at this time
The checklist items are in very rough priority order. Order will be adjusted based on feedback and discovery of critical paths.
The big picture
- v0: Bootstrap to a real SQL database
- v1: Production-ready database using RocksDB
- v2: Pure Rust with new storage subsystem
Platforms
Support for non-Linux operating systems is not currently on the roadmap. We intend to actively use Linux-specific functionality for performance, security and simplicity.
For now, only x86-64 is actively tested, but everything is supposed to work cross-architecture (higher risk: 32-bit processing >4 GiB of data at once, 32-bit in general, big-endian).
Pending work
Work that is more or less "about to be done".
- #dev make sequence inc take a type, promise never to return `u64::MAX`, add some intermediate type or trait to make that work, avoid lots of error checks
- #dev re-evaluate `rocky` vs `rust-rocksdb` high-level API
- #dev stringy internal errors like `internal error#`, in `key-format-v1` and `record-format-v1`; do we need to return even more errors
- #dev extract a higher-level backend provider, split non-RocksDB specifics from `kanto_backend_rocksdb`
- #dev eliminate remaining `expect` (Panic policy)
- #dev eliminate or excuse clippy exceptions that are either missing a reason or whose reason includes `TODO`
  - `clippy::indexing_slicing`
- #dev `BlockingSentinel` or such to make sure we're doing blocking right
- #bug random rare `pthread lock: invalid argument` (likely RocksDB shutdown, which is supposed to be safe; cancelling background tasks is said to be a speedup only)
- table scan: start/stop from DataFusion predicate pushdown #performance #severity/high #ref/1twdr3jqbpzen
- table scan: inform DataFusion of the sort order we guarantee for primary keys #performance
  - populate `output_ordering`: `ExecutionPlan::properties`, `ExecutionPlanProperties::output_ordering`
- implement `TableProvider::constraints` https://docs.rs/datafusion/latest/datafusion/catalog/trait.TableProvider.html#method.constraints #performance
  - background: https://docs.rs/datafusion/latest/datafusion/common/struct.FunctionalDependence.html
  - note non-public helper function for building constraints https://github.com/apache/datafusion/issues/15443 #ecosystem/datafusion
  - ugly: to fill `Unique`, we need to know all indexes for the table; maybe merge `IndexDef` into `TableDef` (and then maybe `IndexId` gets scoped within one table, change RocksDB keys)
- #performance use indexes for lookups, not just correctness
  - see https://github.com/apache/datafusion/discussions/9963
  - variants: table scan (current), table get, index scan, index get, index and table
  - is that maybe `TableProvider::get_logical_plan`?
  - probably need a custom query planner step
- index scan: inform DataFusion of the sort order we guarantee for keys of this index #performance
- index scan: start/stop from DataFusion predicate pushdown #performance #severity/high #ref/1twdr3jqbpzen
- maybe use RocksDB `multi_get` for table lookups from index scan: https://docs.rs/rocksdb/latest/rocksdb/struct.DBCommon.html#method.batched_multi_get_cf_opt
  - also `multi_get_for_update`
- #test fix everything from `cargo-mutants`
- #test validate index state better, integrity test to run between mutations
- #test transaction semantics better, including isolation and concurrent access, `SELECT FOR UPDATE`
  - use https://github.com/ept/hermitage
  - use http://www.redbook.io/ch6-isolation.html
  - see https://jepsen.io/consistency
  - see https://www.postgresql.org/docs/current/sql-select.html#SQL-FOR-UPDATE-SHARE
  - see `LOCK IN SHARE MODE` https://stackoverflow.com/questions/32827650/mysql-innodb-difference-between-for-update-and-lock-in-share-mode
- ensure iterator snapshot consistency
  - add a mutation test tap point where disabling this causes a test failure
- hide `arbitrary` behind a feature
- re-evaluate unsafe (non-validating) use of strings
  - consider https://crates.io/crates/bstr -- but can't make DataFusion use that
- study RocksDB size estimation
  - https://github.com/facebook/rocksdb/wiki/Approximate-Size
  - https://docs.rs/librocksdb-sys/0.17.1+9.9.3/librocksdb_sys/fn.rocksdb_approximate_sizes_cf.html
  - https://docs.rs/librocksdb-sys/0.17.1+9.9.3/librocksdb_sys/fn.rocksdb_approximate_sizes_cf_with_flags.html
  - https://docs.rs/librocksdb-sys/0.17.1+9.9.3/librocksdb_sys/fn.rocksdb_column_family_metadata_get_size.html
- `kantodb rocksdb-create` twice with the same path gives a confusing error message
- ensure `fsync` on commit, audit our use of RocksDB, validate with tests (see the sketch after this list)
  - prefer to delay until we can demonstrate it working, for trust
  - leave a mutation test tap point where disabling fsync on commit causes a test failure
  - is this just `WriteOptions` `sync`, or more involved?
    - maybe that can be optimized based on writes being no-sync within a transaction, with only the transaction commit being sync
  - https://github.com/facebook/rocksdb/wiki/Basic-Operations#synchronous-writes
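A minimal sketch of the `WriteOptions`-based idea referenced above, using the `rocksdb` crate's plain `DB` API for illustration. Whether setting `sync` only on the commit write is actually sufficient for our transactional use is exactly the open question, so treat this as an assumption to verify, not the implementation.

```rust
use rocksdb::{Options, WriteBatch, WriteOptions, DB};

fn main() -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    let db = DB::open(&opts, "/tmp/kanto-fsync-sketch")?;

    // Writes accumulated during a "transaction" could stay non-sync...
    let mut batch = WriteBatch::default();
    batch.put(b"key1", b"value1");
    batch.put(b"key2", b"value2");

    // ...while the commit itself asks RocksDB to fsync the WAL before returning.
    let mut commit_opts = WriteOptions::default();
    commit_opts.set_sync(true);
    db.write_opt(batch, &commit_opts)?;
    Ok(())
}
```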
Unimplemented or buggy SQL operations
- DataFusion explicitly refuses to work with many SQL types even when it has the corresponding underlying type; see https://docs.rs/datafusion/latest/datafusion/sql/planner/struct.SqlToRel.html#method.build_schema and `convert_data_type`, e.g. `UInt64` vs `UNSIGNED BIGINT`
- large number literals behave differently than in Postgres; `SELECT i64_max_plus_1` gives a float, Postgres looks to be using big numbers
- `CREATE UNIQUE INDEX .. [NULLS DISTINCT] with null columns` (#ref/unimplemented_sql/w9j8q485anior)
  - use a multi-index with nullable columns in the value not the key, prefix scan on insert to detect conflicts
- `UPDATE of column in primary key` (#ref/unimplemented_sql/fpb157cronqcy)
- tables without primary keys are not usable
  - DataFusion `UPDATE` semantics are tricky, we need to dissect the `input` `LogicalPlan` and convert it to a table scan of the current data (instead of blindly executing it), updating records as we see them in the scan
- #bug `DELETE` (#ref/gs5muy13tu1uk) or `UPDATE` (#ref/i61jpodrazscs) for a table using `row_id` (not primary keys)
  - possible fix: adjust `logical_plan` to also get a magic-named column `__kanto_row_key`, use that to do deletes
  - possible fix: understand the `logical_plan`, don't use DataFusion to produce record batches, run a custom table/filter scan that also deletes on the fly
- return number of rows affected
- index existing data at `CREATE INDEX` time #severity/high #urgency/high
  - ideally go straight to concurrent index building logic, no table locks
- `INSERT RETURNING` (#ref/hixyde6h9n77o), `UPDATE RETURNING` (#ref/pp5wz7nq4yhzs), `DELETE RETURNING` (#ref/yn84qzsk1f9ny) refused by DataFusion SQL parser, not sure how to work around #ecosystem/datafusion #severity/high #urgency/medium
- CTEs (test coverage): `SELECT`, `UPDATE`, `INSERT`, `DELETE` inside a CTE; `SELECT`, `UPDATE`, `INSERT`, `DELETE` following a CTE
- rest of the data types
  - should we enforce `VARCHAR(30)` etc length, truncate; forbidding that syntax until clarified
  - `INTEGER(10)` etc precision
    - `type { INTEGER(precision) | INT(precision) | INT4(precision) }` (#ref/unimplemented_sql/tiptbb5d8buxa)
    - `type { UNSIGNED INTEGER(precision) | UNSIGNED INT(precision) | UNSIGNED INT4(precision) }` (#ref/unimplemented_sql/fzt1q8mfgkr6n)
    - `type { (UNSIGNED) INT2(precision) | (UNSIGNED) SMALLINT(precision) }` (#ref/unimplemented_sql/g57gd8499t4hg)
    - `type { BIGINT(precision) | INT8(precision) }` (#ref/unimplemented_sql/88fzntq3mz14g)
    - `type { UNSIGNED BIGINT(precision) | UNSIGNED INT8(precision) }` (#ref/unimplemented_sql/sze33j7qzt91s)
    - `type TINYINT(precision)` (#ref/unimplemented_sql/hueii8mjgk8qq)
    - `type UNSIGNED TINYINT(precision)` (#ref/unimplemented_sql/ewt6tneh7ecps)
  - `type { NUMERIC | DECIMAL | DEC }` (#ref/unimplemented_sql/uoakmzwax448h)
  - `FLOAT(42)` etc precision
    - `type FLOAT(precision)` (#ref/unimplemented_sql/xfwbr486jaw5q)
    - `type DOUBLE(precision)` (#ref/unimplemented_sql/wo43sei9ubpon)
  - `NUMERIC(precision, scale)`, `NUMERIC(precision)`, `NUMERIC`; aka `DECIMAL(precision, scale)`
  - `type { CHARACTER | CHAR | FIXEDSTRING }` (#ref/unimplemented_sql/hdhf8ygp9zna4): fixed-length strings
  - `type INT128 (ClickHouse syntax)` (#ref/unimplemented_sql/4sg3mftgxu9j4)
  - `type INT256 (ClickHouse syntax)` (#ref/unimplemented_sql/ex9mnxxipbqqs)
  - `type UINT128 (ClickHouse syntax)` (#ref/unimplemented_sql/riob7j6jahpte)
  - `type UINT256 (ClickHouse syntax)` (#ref/unimplemented_sql/q6tzjzd7xf3ga)
  - `type { MEDIUMINT (UNSIGNED) }` (#ref/unimplemented_sql/m7snzsrqj68sw)
  - `type { BIGNUMERIC | BIGDECIMAL } (BigQuery syntax)` (#ref/unimplemented_sql/an57g4d4uzwew)
  - `type BIT (VARYING)` (#ref/unimplemented_sql/88uneott9owc1)
  - `type UUID` (#ref/unimplemented_sql/531ax7gb73ce1)
  - `type TIMESTAMP` (#ref/unimplemented_sql/njad51sbxzwnn)
  - `type TIME` (#ref/unimplemented_sql/bppzhg7ck7xhr)
  - `type DATE` (#ref/unimplemented_sql/1r5ez1z8j7ryo)
  - `type DATE32` (#ref/unimplemented_sql/781y5fxih7pnn)
  - `type DATETIME (ClickHouse syntax)` (#ref/unimplemented_sql/oak3s5g3rnutq)
  - `type DATETIME (MySQL syntax)` (#ref/unimplemented_sql/dojz85ngwo5wr)
  - `type INTERVAL` (#ref/unimplemented_sql/5wxynwyfua69s)
  - `type { JSON | JSONB }` (#ref/unimplemented_sql/abcw878jr35qc)
    - JSON manipulation functions #ecosystem/datafusion (best done in DataFusion https://github.com/apache/datafusion/issues/7845)
    - inspiration: https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse
  - `type BINARY` (#ref/unimplemented_sql/i16wqwmie17eg)
  - `type ENUM` (#ref/unimplemented_sql/1geqh5qxdeoko)
  - `type ARRAY` (#ref/unimplemented_sql/hnmqj3qbtutn4)
  - `type SET` (#ref/unimplemented_sql/hm53sq3gco3sw)
  - `type STRUCT` (#ref/unimplemented_sql/nsapujxjjk9hg)
  - `type MAP` (#ref/unimplemented_sql/7ts8zgnafnp7g)
  - `type UNION` (#ref/unimplemented_sql/oownfhnfj5b9r)
  - `type TUPLE` (#ref/unimplemented_sql/x7qwkwdhdznxe)
  - `type NESTED` (#ref/unimplemented_sql/fa6nofmo1s49g)
  - `type { CHARACTER LARGE OBJECT | CHAR LARGE OBJECT | CLOB }` (#ref/unimplemented_sql/ew7uhufhkzj9w)
  - `type TRIGGER` (#ref/unimplemented_sql/zf5jx1ykoc5s1)
  - `custom type` (#ref/unimplemented_sql/dt315zutjqpzc)
  - `type UNSPECIFIED` (#ref/unimplemented_sql/916n48obgiqh6)
  - `type LOWCARDINALITY` (#ref/unimplemented_sql/78exey8xjz3yk)
  - `type REGCLASS (Postgres syntax)` (#ref/unimplemented_sql/quzjzcsusttgc)
  - `type NULLABLE (ClickHouse syntax)` (#ref/unimplemented_sql/ra5kspnstauu6)
- emulate more Postgres system tables, like `pg_cancel_backend`
Unimplemented SQL syntax support
See also DataFusion SQL support status: https://arrow.apache.org/datafusion/user-guide/sql/sql_status.html, roadmap https://arrow.apache.org/datafusion/contributor-guide/roadmap.html.
For reference, see Postgres SQL, SQLite SQL, and MariaDB SQL.
- `SET LOCAL` (#ref/unimplemented_sql/7b1ab6tms3uz4)
- `SHOW variable` (#ref/unimplemented_sql/7a69iu5nr1oxk)
- `CREATE TABLE IF NOT EXISTS` (#ref/unimplemented_sql/pf1mfhd9sz3jn)
- `DROP TABLE` (#ref/unimplemented_sql/47hnz51gohsx6)
- `ALTER TABLE` (#ref/unimplemented_sql/dny7j9hx3ihha)
- `CREATE INDEX IF NOT EXISTS` (#ref/unimplemented_sql/tzwc96kk9173n)
- `DROP INDEX` (#ref/unimplemented_sql/4kgkg4jhqhfrw)
- `ALTER INDEX` (#ref/unimplemented_sql/ha1fybdqqwa8k)
- `SET TRANSACTION` (#ref/unimplemented_sql/yybqytw7q8bmn)
  - may cause a bigger refactoring of our transaction start; we currently do eager creation
- `CREATE TABLE without columns` (#ref/unimplemented_sql/g7hdmcm5bk69a)
- constraints on tables #severity/high #urgency/high
  - `CREATE TABLE with table constraint CHECK` (#ref/unimplemented_sql/8zaqnh9mb9tor)
- constraints on columns
  - `CREATE TABLE with column option { PRIMARY KEY | UNIQUE }` (#ref/unimplemented_sql/hwcjakao83bue) #severity/high #urgency/medium
  - `CREATE TABLE with column option CHECK` (#ref/unimplemented_sql/dwcpda8aj66d6) #severity/high #urgency/medium
- default and computed values for columns
  - `CREATE TABLE with column option DEFAULT` (#ref/unimplemented_sql/944dzboki4djc) #severity/high #urgency/medium
  - `CREATE TABLE with column option GENERATED` (#ref/unimplemented_sql/bxzpaj1quo5a4) #severity/high #urgency/medium
- `CREATE INDEX with dotted column name` (#ref/unimplemented_sql/kbezqzh7p6yky)
- `CREATE INDEX with non-column expression` (#ref/unimplemented_sql/8cjqfzsfpnduw)
- `CREATE INDEX WHERE` (#ref/unimplemented_sql/9134bk3fe98x6): partial indexes
- `CREATE INDEX .. WITH` (#ref/unimplemented_sql/w7t6c8xrehsnh)
- `CREATE INDEX .. INCLUDE` (#ref/unimplemented_sql/ip415b5s8sa6h): non-key included columns
- foreign keys #severity/high #urgency/high
  - `CREATE TABLE with table constraint FOREIGN KEY` (#ref/unimplemented_sql/mamfde4rdzgeo)
  - `CREATE TABLE with column option REFERENCES` (#ref/unimplemented_sql/ufnoiinkkp3wy)
  - enforce
- roles and access control #severity/high #urgency/medium
  - `CREATE ROLE` (#ref/unimplemented_sql/hqyj3srdk1g4h)
  - `DROP ROLE` (#ref/unimplemented_sql/q8wh8nqixoytq)
  - `ALTER ROLE` (#ref/unimplemented_sql/u6set5k1gzyhh)
  - `GRANT` (#ref/unimplemented_sql/mekqqm6hxy64s)
  - `REVOKE` (#ref/unimplemented_sql/wha7884jy4bty)
  - `SET ROLE` (#ref/unimplemented_sql/ryqxdr55r49sr)
- prepared statements
  - `PREPARE` (#ref/unimplemented_sql/fgi986yc4d7cw)
  - `EXECUTE` (#ref/unimplemented_sql/3w91mfrbzguyh)
  - `DEALLOCATE` (#ref/unimplemented_sql/fn1q45ys1t8ks)
- views, first dynamic, then materialized, then incremental updates etc extras #severity/high
  - `CREATE VIEW` (#ref/unimplemented_sql/b9m5uhu9pnnsw)
  - `ALTER VIEW` (#ref/unimplemented_sql/uhwpi8nzojcxy)
  - `DROP VIEW` (#ref/unimplemented_sql/hwk7gp6ffh8zy)
- `TRUNCATE` (#ref/unimplemented_sql/uf3rr4diw3o1o)
- temporary tables
  - `CREATE TEMPORARY TABLE` (#ref/unimplemented_sql/3u561ykck5m76)
- sequences (exposed to SQL)
  - `CREATE SEQUENCE` (#ref/unimplemented_sql/rdkjx9ryredcw)
  - `DROP SEQUENCE` (#ref/unimplemented_sql/ca149fq1ixm5h)
- `START TRANSACTION READ ONLY` (#ref/unimplemented_sql/bhetj3emnc8n1)
- multiple schemas
  - make schema creation explicit not implicit
  - `CREATE SCHEMA` (#ref/unimplemented_sql/x9cna73uf8cew)
  - `DROP SCHEMA` (#ref/unimplemented_sql/5rnpn33hia5a6)
- multiple databases/catalogs/backends; all three might be the same thing for us
  - re-evaluate the whole "catalog name is backend" idea, especially when a backend can store multiple catalogs; duplication can lead to mismatch, renames are weird; maybe it's a lookup order?
    - is a backend a `TABLESPACE`? `sqlparser-rs` doesn't have `TABLESPACE` support as of 2025-03
  - re-evaluate the idea of invisibly spreading every transaction across all backends; maybe force each transaction to stick to one, maybe make it lazy
  - Postgres docs specifically say "Databases are equivalent to catalogs"
  - `CREATE DATABASE` (#ref/unimplemented_sql/6c76p5eqsuea6)
  - `ALTER DATABASE`: not in `sqlparser` v0.54.0 #ecosystem/sqlparser-rs
  - `DROP DATABASE` (#ref/unimplemented_sql/qn6s3c397johw)
- `EXPLAIN` (#ref/unimplemented_sql/fugjaua7zkfd4)
- `CREATE INDEX DESC` (#ref/unimplemented_sql/n7x3zhjk7bf8k): descending sort order
- `CREATE INDEX CONCURRENTLY` (#ref/unimplemented_sql/wya1a3w874xba)
- `COMMIT AND CHAIN` (#ref/unimplemented_sql/ic5jpcqsjghm6)
- `ROLLBACK AND CHAIN` (#ref/unimplemented_sql/q1pay866nux9n)
- `COPY` (#ref/unimplemented_sql/1fxo667s3wwes): the kind that transfers data over the wire; missing support in https://github.com/sunng87/pgwire #waiting
  - we do not intend to support anything where server-side file read/write gets a path from the user over the network
- `RELEASE SAVEPOINT` (#ref/unimplemented_sql/hyzhkewjpnug6): #severity/low #urgency/low #size/small
  - test the error from trying to use it again
- `{ LISTEN | UNLISTEN | NOTIFY }` (#ref/unimplemented_sql/8t9hqpx4umzaq)
- user-defined functions #severity/medium #urgency/low
  - `CREATE FUNCTION` (#ref/unimplemented_sql/fj9ny1gu79cbg)
  - `DROP FUNCTION` (#ref/unimplemented_sql/ptp9d3xe1456h)
  - `SHOW FUNCTIONS` (#ref/unimplemented_sql/yqiswgsqjc5uh): #severity/low
- user-defined types #severity/medium #urgency/low
  - `CREATE TYPE` (#ref/unimplemented_sql/ur33y834br1wa): user-defined types (see Postgres https://www.postgresql.org/docs/16/sql-createtype.html, DataFusion issue)
  - `DROP TYPE` (#ref/unimplemented_sql/t1fti3yz8jsdn)
- collation, as in locale-specific ordering:
  - `CREATE TABLE .. COLLATE` (#ref/unimplemented_sql/qd36gh6hj78br): configurable collation order for columns: #severity/medium #urgency/low
  - see https://databasearchitects.blogspot.com/2016/08/equivalence-of-unicode-strings-is.html
  - see https://en.wikipedia.org/wiki/Common_Locale_Data_Repository
- procedures #severity/medium #urgency/low
  - `CREATE PROCEDURE` (#ref/unimplemented_sql/69jzm59u8c6jc)
  - `CALL` (#ref/unimplemented_sql/dx5ic5bkaaq7c): see Postgres
  - `DROP PROCEDURE` (#ref/unimplemented_sql/w173ctuodtcaa)
- triggers #severity/medium #urgency/low
  - `CREATE TRIGGER` (#ref/unimplemented_sql/cxts4du59pwqn)
  - `DROP TRIGGER` (#ref/unimplemented_sql/i1omqjczqksqk)
- `KILL [CONNECTION]` (#ref/unimplemented_sql/jqji1j6my4f5w): #severity/medium #urgency/low
- `KILL QUERY` (#ref/unimplemented_sql/77ecapaed3bso): #severity/medium #urgency/low
- `MERGE` (#ref/unimplemented_sql/y5gzyy5kxg5ws): #severity/low #urgency/low
- `CREATE TABLE .. COMMENT` (#ref/unimplemented_sql/dgyhzknpfx756)
- `CREATE TABLE with column option COMMENT` (#ref/unimplemented_sql/4itjrujemcztq) #severity/medium #urgency/low
- miscellaneous column options
- `COMMENT ON` (#ref/unimplemented_sql/e8uhkzz1krcgw): #severity/low #urgency/low
- populate table with data at creation time #severity/low #urgency/low
  - `CREATE TABLE AS SELECT` (#ref/unimplemented_sql/cirnphk69igxy): (`Ctas` in https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.WriteOp.html)
  - `CREATE TABLE AS SELECT` (#ref/unimplemented_sql/nuqha5e3r7g6h)
- `READ COMMITTED`: currently upgrading to `REPEATABLE READ`, could support directly (#ref/bm4q45m5ajz9s)
- `START TRANSACTION ISOLATION LEVEL SERIALIZABLE` (#ref/unimplemented_sql/dy1ch15p7n3ze): maybe just a better error message: #severity/low #urgency/low
  - I think supporting this requires two things: `TransactionOptions::set_snapshot(true)`, and every `get` is upgraded to `get_for_update`.
  - then marking transactions as `READ ONLY` can relax that for performance
- cursors (see Postgres; not in standard interactive SQL) #severity/low #urgency/low
  - `DECLARE` (#ref/unimplemented_sql/159jmakdyuewy)
  - `FETCH` (#ref/unimplemented_sql/9nf8hju6ie9e1)
  - `CLOSE` (#ref/unimplemented_sql/gc1557zek4hne)
- row-level security (see Postgres)
  - `CREATE POLICY` (#ref/unimplemented_sql/gce4sb6hptjfy)
  - `DROP POLICY` (#ref/unimplemented_sql/n9zgxwn6736wr)
  - `ALTER POLICY` (#ref/unimplemented_sql/wj1tnq4hkth7s)
- `DISCARD` (#ref/unimplemented_sql/9a5zkcwgbhewh): (see Postgres): #severity/low #urgency/low
- `REPLACE INTO (MySQL syntax) or INSERT OR REPLACE (SQLite syntax)` (#ref/unimplemented_sql/54xqh7ornzpso): #severity/low #urgency/low
- `CREATE OR REPLACE TABLE` (#ref/unimplemented_sql/65so99qr1hyck): #severity/low #urgency/low
- `ASSERT (PowerSQL syntax)` (#ref/unimplemented_sql/qq1s1fq45ztdw): #severity/low #urgency/low
- `CREATE TABLE .. LIKE` (#ref/unimplemented_sql/xutp6m7rjnc54)
- `CREATE TABLE .. CLONE` (#ref/unimplemented_sql/7b5kah43yqquc): optionally also copies data; the RocksDB backend likely will not support that option
- `CREATE TABLE .. WITH` (#ref/unimplemented_sql/3kozdfoeczmrk): see Postgres, likely parse superficially and error if given #severity/low #urgency/low
- `RENAME TABLE (MySQL syntax)` (#ref/unimplemented_sql/85ax9gtrkcneg)
- `CREATE TABLE with column option CHARACTER SET` (#ref/unimplemented_sql/ifi7dygtua3oc): parse superficially and refuse non-UTF-8 #severity/low #urgency/low
- `CREATE TABLE .. DEFAULT CHARSET (MySQL syntax)` (#ref/unimplemented_sql/dcan9mz8g4eja): parse superficially and refuse non-UTF-8 #urgency/low #severity/low #size/small
- relaxing constraints to end of transaction:
  - `CREATE TABLE with table constraint PRIMARY KEY .. DEFERRABLE` (#ref/unimplemented_sql/uzf4oabd178rr)
  - `CREATE TABLE with table constraint PRIMARY KEY .. INITIALLY DEFERRED` (#ref/unimplemented_sql/ysnkhzw6ukbms)
- `BEGIN {DEFERRED | IMMEDIATE | EXCLUSIVE} (SQLite syntax)` (#ref/unimplemented_sql/o969ikdsjz76r): #severity/low #urgency/low
- `CREATE TABLE with column option ON CONFLICT (SQLite syntax)` (#ref/unimplemented_sql/ehefbqgnnfqwe) #severity/medium #urgency/low
- `PRAGMA (SQLite syntax)` (#ref/unimplemented_sql/a1k38hk48fpzr): #severity/low #urgency/low
- `SHOW CREATE (MySQL syntax)` (#ref/unimplemented_sql/c7cewp15zm5ys)
- `SHOW STATUS (MySQL syntax)` (#ref/unimplemented_sql/ejj6ir6dsxhrc)
- `SHOW DATABASES (MySQL syntax)` (#ref/unimplemented_sql/kskgotui7rupc)
- `SHOW COLUMNS (MySQL syntax)` (#ref/unimplemented_sql/zx98bjxguht5o)
- `SHOW VARIABLES (MySQL syntax)` (#ref/unimplemented_sql/n5cs7ue5gzpde)
- `SHOW SCHEMAS (Snowflake syntax)` (#ref/unimplemented_sql/poi5oxunjyq5a)
- `SHOW VIEWS (Snowflake syntax)` (#ref/unimplemented_sql/zzi1wb3jq5jeh)
- `{ EXPLAIN | DESCRIBE } TABLE (MySQL syntax)` (#ref/unimplemented_sql/4tpmma7ewdf3a): #severity/low #urgency/low
- `CREATE TABLE .. AUTO_INCREMENT (MySQL syntax)` (#ref/unimplemented_sql/18kxgbi8rzb5r): #severity/low #urgency/low
- `USE` (#ref/unimplemented_sql/ng34e6kt6rrt4): #severity/low #urgency/low
- `CREATE TABLE with table constraint PRIMARY KEY .. COMMENT (MySQL syntax)` (#ref/unimplemented_sql/ss9iqz1c8ffrh)
- MySQL-style names on constraints or indexes:
  - `CREATE TABLE with table constraint PRIMARY KEY given a name (MySQL syntax)` (#ref/unimplemented_sql/5rr7ouyasn1dk)
  - `CREATE TABLE with table constraint PRIMARY KEY given an index name (MySQL syntax)` (#ref/unimplemented_sql/mexuaio6ngad4)
- `CREATE TABLE with table constraint UNIQUE {INDEX | KEY} (MySQL syntax)` (#ref/unimplemented_sql/m3op6enhb5pgw)
- `CREATE TABLE with table constraint PRIMARY KEY .. NOT ENFORCED (MySQL syntax)` (#ref/unimplemented_sql/u339u8jzzkqdy)
- `CREATE TABLE with column option ON UPDATE timestamp function (MySQL syntax)` (#ref/unimplemented_sql/zaub41ukgqfmq) #severity/low #urgency/low
- `CREATE TABLE with column option MATERIALIZED (ClickHouse syntax)` (#ref/unimplemented_sql/ouq6dfrgbxn9w) #severity/low #urgency/low
- `CREATE TABLE with column option EPHEMERAL (ClickHouse syntax)` (#ref/unimplemented_sql/78qipewmqn31h) #severity/low #urgency/low
- `CREATE TABLE with column option ALIAS (ClickHouse syntax)` (#ref/unimplemented_sql/1adkef6yd1qgc) #severity/low #urgency/low
- `CREATE TABLE with column option OPTIONS (BigQuery syntax)` (#ref/unimplemented_sql/7drpfhozbre1k) #severity/low #urgency/low
- `CREATE TABLE with column option TAG (Snowflake syntax)` (#ref/unimplemented_sql/sjpe4f636kn8n) #severity/low #urgency/low
- `CREATE TABLE with column option { MASKING POLICY | PROJECTION POLICY } (Snowflake syntax)` (#ref/unimplemented_sql/ynucdy5u7a35w) #severity/low #urgency/low
- `CREATE TABLE with weird column option` (#ref/unimplemented_sql/4jqhus4cj4fho) #severity/low #urgency/low
There are plenty of corners of SQL syntax we have chosen not to support.
These are marked in the source code as `NotSupportedSql`.
We may have judged wrong; let us know if you need some unsupported syntax.
Later
Features and improvements we would like to work on, but that aren't started yet.
- #admin set up security contacts
- reconsider each item in https://datafusion.apache.org/user-guide/configs.html
- maybe interpret `INT8` and `UINT8` as bits vs bytes based on SQL dialect; ClickHouse is the odd one out
- safer arithmetic than silent wraparound in SQL
- we could pack booleans tighter in a record, prefer finding an unused bit in the last byte/u32 allocated to booleans
- existence of a `test_session` method in a couple of places probably means the non-test API is not convenient enough
- move away from `arrow-row` for keys, it's not guaranteed stable; we're mostly using it to avoid writing our own SIMD code for now
- #maybe #dev split types so different byte slices aren't confused with each other: row key, index key, RocksDB key, record, Arrow data, etc
  - would require wrapping the RocksDB API to enforce that?
- #maybe #performance special-case point queries so they don't go through a table scan and RocksDB iterator
  - `rust-rocksdb` makes copies of keys to do FFI, bad design
  - perhaps there is even a way to avoid columnar result overhead?
- make `information_schema` from DataFusion show our types, not `Utf8View` etc
  - potential: DataFusion supporting logical types: https://github.com/apache/datafusion/issues/12622 #waiting
  - alternatively, re-implement ourselves
  - issue on what is missing from `information_schema`: https://github.com/apache/datafusion/issues/8916
- #performance #maybe partition workloads (inputs for insert, key ranges for table scan) and run multiple RocksDB blocking worker threads
  - `datafusion::physical_plan::execute_stream_partitioned` instead of `execute_stream`
- #performance consider switching to minitrace: https://crates.io/crates/minitrace, or NIH it https://news.ycombinator.com/item?id=41570163
- #performance support `pgwire` extended queries, newer clients might not implement simple queries
- #ops make server use only UTC time as native, never local time
- #ops landlock, seccomp etc sandboxing
- #cleanup #dev maybe we want stronger types to never confuse key-value keys and values?
- #dev #doc `KantoError` convention, error codes
- #dev lint: every `UnimplementedSql` and `NotSupportedSql` must have an SLT test
- #dev lint: handle `Unimplemented` like `UnimplementedSql`, figure out a way to avoid duplication
- #dev lint: all `expect` must be internal errors and unique (in Panic policy)
- #dev lint: all `#waiting` have a URL to an upstream issue
- #dev lint: `thiserror` source field: named `source` or `error`, be consistent
- #ops more readable logging (maybe by postprocessing the JSON)
- #ops collect metrics in-memory and expose them via a Prometheus endpoint
  - maybe also expose Tokio metrics: https://docs.rs/tokio-metrics/latest/tokio_metrics/
- TLS support in Postgres-compatible server
- shutdown on I/O error (in Panic policy)
- reevaluate panic on OOM
- create a REPL for ad hoc interaction (useful for library, not as useful for server) (limitation: exclusive access only): https://docs.rs/clap/latest/clap/_derive/_cookbook/repl/index.html
- #enhancement Streaming remote backup
- #dev set up branch coverage (needs nightly Rust: https://github.com/taiki-e/cargo-llvm-cov/issues/8)
- #dev continuous-integration: `./task/nextest-run-valgrind`
- #dev continuous-integration: `cargo miri nextest run`
- #dev continuous-integration: (with unstable) `cargo fmt --check -- --unstable-features --config-path=./rustfmt-unstable.toml`
- #dev continuous-integration: `cargo run -p lint-unique-error-codes`
- #dev continuous-integration: `cargo run -p lint-unimplemented-sql`
- #dev continuous-integration: `cargo shear`
- #dev continuous-integration: `cargo spellcheck`
- #dev continuous-integration: `typos`
- #test tests for `CREATE TABLE`, `CREATE INDEX` etc that dump the rkyv `TableDef`/`IndexDef` as JSON
- #test regularly test on nightly
  - with `-Zrandomize-layout` (see `flake.nix`)
  - re-evaluate `rustc` flags for enhanced testing every now and then
- #dev lint: no accidental public API changes, and eyeball the public API for sanity
  - `cargo-public-api`
    - broken with Rust >=1.84: https://github.com/cargo-public-api/cargo-public-api/issues/750 #ecosystem #waiting
    - the diff logic is annoying, I really don't want networking or touching my git checkout; use as a library only: https://docs.rs/public-api/latest/public_api/
  - `cargo-semver-checks`
- #dev lint that we re-export every crate exposed by public APIs
- #dev speed up builds
  - study https://matklad.github.io/2021/09/04/fast-rust-builds.html
  - study https://fasterthanli.me/articles/why-is-my-rust-build-so-slow
  - study https://doc.rust-lang.org/cargo/reference/timings.html
  - study https://crates.io/crates/cargo-llvm-lines
  - study https://www.reddit.com/r/rust/comments/1bhpfeb/vastly_improved_recompile_times_in_rust_with/
  - study https://corrode.dev/blog/tips-for-faster-rust-compile-times/
  - study https://nnethercote.github.io/perf-book/build-configuration.html#minimizing-compile-times
- #maybe speed up linking with `lld` or `mold`
- #dev set up MSRV policy
- #doc #website support `/docs/<version>`, navigation between those and `/docs/latest`
- bindings to C
- bindings to TypeScript/JavaScript: Deno, Node.js
- bindings to Python
- does compiling to WASM make sense, given the filesystem use; is a good enough API emerging
Maybe
Possible features we have not at this time committed to implementing, or that are so far out that we just don't have a clear picture of how to implement them.
- #performance `kanto-backend-rocksdb`: use merges where possible (e.g. `x=x+1`)
  - what are the wanted update messages? set, delete, delta_wraparound, delta_saturated? https://en.wikipedia.org/wiki/Fractal_tree_index#Upserts
- `kanto-backend-rocksdb`: use compaction filtering for vacuum
- encryption at rest
- standalone integrity checker/fixer ("fsck")?
- UNIX domain socket support in Postgres-compatible server
- RocksDB bind to C++ directly
- #performance #maybe tracing: filter out some low-level tracing messages when in release mode
  - https://docs.rs/tracing/latest/tracing/level_filters/index.html#compile-time-filters
  - not sure I actually want to remove the `TRACE` level; it's a bit annoying that `DEBUG` is more important than `TRACE`.
- #doc #maybe #urgency/low include a section in the docs with all used licenses
- #admin study tokio-console, would it benefit us
- re-evaluate legacy compatibility with SQLite
  - we have marked some vendor-specific SQL as `NotSupportedSql` that could be implemented; they were just not relevant for Postgres compatibility
- re-evaluate legacy compatibility with MySQL
  - we have marked some vendor-specific SQL as `NotSupportedSql` that could be implemented; they were just not relevant for Postgres compatibility
Small cleanups
- `exact_zip` helper: we use `Iterator::zip` in many places where having non-equal lengths would be a bug; systematize (a sketch follows below)
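A possible shape for the `exact_zip` helper mentioned above; the name and panic behavior are hypothetical, not existing KantoDB API. The point is that a length mismatch surfaces as a loud bug instead of silent truncation.

```rust
/// Like `Iterator::zip`, but panics if the two iterators have different lengths.
/// Hypothetical sketch; a real version might return `Result` or require `ExactSizeIterator`.
fn exact_zip<A, B>(
    a: impl IntoIterator<Item = A>,
    b: impl IntoIterator<Item = B>,
) -> impl Iterator<Item = (A, B)> {
    let mut a = a.into_iter();
    let mut b = b.into_iter();
    std::iter::from_fn(move || match (a.next(), b.next()) {
        (Some(x), Some(y)) => Some((x, y)),
        (None, None) => None,
        _ => panic!("exact_zip: iterators have different lengths"),
    })
}

fn main() {
    let columns = ["id", "name"];
    let values = [1, 2];
    for (col, val) in exact_zip(columns, values) {
        println!("{col} = {val}");
    }
}
```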
Ecosystem wishlist
DataFusion
Specific bugs and missing features:
- DataFusion `CASE WHEN` is wrong: https://github.com/apache/arrow-datafusion/issues/8475
- DataFusion `DataFrame::cache` breaks if some `CASE` branches are `NULL`: https://github.com/apache/arrow-datafusion/issues/8476
- DataFusion `CASE` mishandle: https://github.com/apache/arrow-datafusion/issues/8909
- DataFusion `COALESCE` mishandle: https://github.com/apache/arrow-datafusion/issues/8910 #ecosystem/datafusion #waiting #severity/medium #urgency/low
- DataFusion `MIN` mishandle: https://github.com/apache/datafusion/issues/8911
- DataFusion does not support `avg(DISTINCT)`: https://github.com/apache/arrow-datafusion/issues/2408 #ecosystem/datafusion #waiting #severity/medium #urgency/low
- DataFusion non-unique column names: https://github.com/apache/arrow-datafusion/issues/6543, https://github.com/apache/arrow-datafusion/issues/8379 #ecosystem/datafusion #waiting #severity/high #urgency/medium
- support Postgres `'\x0102'` style escaped strings
- wishlist: DataFusion should have better ways to convert from `ResolvedTableReference` to `TableReference::Full`
- #project-management enumerate DataFusion trouble from sqllogictest `is_datafusion_limitation`, ensure upstream tickets exist
- `TableProvider::constraints` helper function is no longer public: https://github.com/apache/datafusion/issues/15443
Architectural:
- Arrow `is_null`/`is_valid` API is broken by design: https://github.com/apache/arrow-rs/issues/4840
- DataFusion schema handling is not great for us
  - see https://github.com/apache/datafusion/discussions/8887
  - with the "pre-scan" idea, the statement gets scanned twice, once by us and once by DataFusion
- `INSERT RETURNING` (#ref/yn84qzsk1f9ny) et al; or, take more control over DML execution
sqlparser-rs
- `UNCACHE` fails to parse with a semicolon: https://github.com/sqlparser-rs/sqlparser-rs/issues/1244
- docs link to non-English content: https://github.com/apache/datafusion-sqlparser-rs/issues/1523
- `CreateTable.location` is confusing: https://github.com/apache/datafusion-sqlparser-rs/issues/1518 #severity/low
- Snowflake `COPY INTO` fails to parse with a semicolon: https://github.com/apache/datafusion-sqlparser-rs/issues/1519 #severity/low
- Snowflake `LIST @foo` and `REMOVE @foo` fail to parse with a semicolon: https://github.com/apache/datafusion-sqlparser-rs/issues/1773 #severity/low
- `SKEWED BY` not supported/buggy: https://github.com/apache/datafusion-sqlparser-rs/issues/1499 #severity/low
- docs: typo in docs of ClickHouse column option `MATERIALIZED`: https://github.com/apache/datafusion-sqlparser-rs/issues/1760 #severity/low #urgency/low #doc #dev
- docs: `ColumnOption::ForeignKey` docs have wrong SQL syntax: https://github.com/apache/datafusion-sqlparser-rs/issues/1761 #severity/low #urgency/low #doc #dev
sqllogictest-rs
- expected column type API is the wrong way around: https://github.com/risinglightdb/sqllogictest-rs/issues/227
  - our `*.slt` probably has lots of incorrect `query I` lines because nothing is enforcing correctness yet
- sqllogictest-rs can't run sqlite tests: https://github.com/risinglightdb/sqllogictest-rs/issues/148
clippy (?)
- no way to lint or `debug_assert` that we're not blocking in an async context; how do I validate that I never use RocksDB's potentially-blocking FFI functions in Tokio async context?
  - blocking RocksDB calls should only be made inside `block_in_place` or such
  - https://github.com/rust-lang/wg-async/issues/19, https://github.com/rust-lang/wg-async/issues/248
  - clippy issues: https://github.com/rust-lang/rust-clippy/issues/4377, https://github.com/rust-lang/rust-clippy/issues/10794
  - https://github.com/rust-lang/wg-async/issues/73
  - https://github.com/rust-lang/async-book/issues/64
  - https://internals.rust-lang.org/t/warning-when-calling-a-blocking-function-in-an-async-context/11440
  - maybe only use my own wrapper that uses thread-local state for that (a sketch follows below)
  - Tokio's own docs claim that their `block_on` panics if called in the wrong context
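A sketch of the thread-local wrapper idea from the last bullet; names like `allow_blocking` and `debug_assert_blocking_allowed` are hypothetical, not existing KantoDB code. Blocking sections set a flag, and a debug-build check at every potentially-blocking FFI wrapper catches calls made outside such a section.

```rust
use std::cell::Cell;

thread_local! {
    // True while the current thread is inside a section where blocking is allowed.
    static BLOCKING_ALLOWED: Cell<bool> = Cell::new(false);
}

/// Mark a region of code where blocking calls (e.g. RocksDB FFI) are permitted,
/// e.g. the body of a `tokio::task::block_in_place` closure.
fn allow_blocking<R>(f: impl FnOnce() -> R) -> R {
    BLOCKING_ALLOWED.with(|flag| {
        let prev = flag.replace(true);
        let result = f();
        flag.set(prev);
        result
    })
}

/// Call at the top of every wrapper around a potentially-blocking FFI function.
fn debug_assert_blocking_allowed() {
    debug_assert!(
        BLOCKING_ALLOWED.with(Cell::get),
        "blocking call made outside an allow_blocking section"
    );
}

fn rocksdb_get_stub(key: &[u8]) -> Option<Vec<u8>> {
    debug_assert_blocking_allowed();
    // ... a real wrapper would call into RocksDB here ...
    let _ = key;
    None
}

fn main() {
    let value = allow_blocking(|| rocksdb_get_stub(b"k"));
    assert!(value.is_none());
}
```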
tracing
- `tracing_subscriber` (and thus `test_log`) intentional leak makes valgrind annoying: https://github.com/tokio-rs/tracing/issues/2069#issuecomment-1096929365
- the tracing output formats are not very useful
  - wish to combine the best of tailing logs and nested traces
  - I think I want to log to JSON, and use some sort of a viewer GUI -- but what?
Rust
- `rustfmt` fails silently extremely often: https://github.com/rust-lang/rustfmt/issues/3863, https://github.com/rust-lang/rustfmt/issues/6202, https://github.com/rust-lang/rustfmt/issues/3697, https://github.com/rust-lang/rustfmt/issues/6491
  - I find any discussion about "overlong lines" simply a sign of bad design; a formatter needs to work toward a goal, not silently stop working except in the one case where it can achieve perfection; the fact that the line remains long doesn't mean you should stop indenting it (or even worse, the whole containing block)
  - the fact that `rustfmt` can eat comments is really concerning to me; this is just a bad architecture: https://github.com/rust-lang/rustfmt/issues/6044
- `error_generic_member_access`, once stabilized, could help with providing context in `KantoError`: https://github.com/rust-lang/rust/issues/99301, https://github.com/rust-lang/rust/issues/96024
  - Likely reimplement `Op`, maybe `code` too, with this? Make sure the unique code scanner keeps working, including the code kind that now comes from an enum variant.
misc
- branch coverage stabilized: https://github.com/taiki-e/cargo-llvm-cov/issues/8
- uninitialized value in `opendir` (seen with valgrind)
Details
RocksDB bind to C++ directly #maybe
RocksDB is a C++ library we're currently using through a less-featureful C shim. This crate allows calling C++ directly: https://github.com/dtolnay/cxx/. What are the downsides?
See also https://github.com/cozodb/cozo/tree/main/cozorocks, https://crates.io/crates/autocxx.
Compare to https://github.com/hkalbasi/zngur, https://crates.io/crates/cpp.
Prior work at https://github.com/google/autocxx/issues/1073.
`autocxx` has been stagnant since 2022, so I'm pessimistic about this approach now.
MSRV policy
- https://doc.rust-lang.org/cargo/reference/manifest.html#the-rust-version-field
- verify it builds with `cargo-msrv` or just Nix
- discussions:
Streaming remote backup
- also usable for Change Data Capture (CDC), e.g. send to pubsub and update consumers
- inspiration: Litestream (WAL follower), LiteFS (FUSE server that understands SQLite file formats), LiteVFS (SQLite plugin):
- Litestream: https://litestream.io/how-it-works/, https://fly.io/blog/litefs-cloud/, https://fly.io/blog/all-in-on-sqlite-litestream/
- LiteFS has a format that is more efficient to restore (tl;dr: it stores pages, not WAL segments): https://news.ycombinator.com/item?id=36604556
- some details on the LiteFS file format https://news.ycombinator.com/item?id=36606114
- using LiteFS as basis of read-only replicas: https://news.ycombinator.com/item?id=36604370
- experiences using Litestream: https://news.ycombinator.com/item?id=39065201
Streaming remote backup for RocksDB backend
- could be done in a second process with https://github.com/facebook/rocksdb/blob/main/examples/multi_processes_example.cc
- inspiration: replication features of https://github.com/apache/kvrocks
- inspiration: https://github.com/pinterest/rocksplicator (dead project)
Coding conventions
Prefer lints over documenting things here. What needs to be explicitly said, we do not have a lint for.
C pointers
- Name the struct field `raw` (or `raw_*` for multiple).
- Use `std::ptr::NonNull` wherever possible.
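A sketch of the naming convention, with a made-up opaque C handle type standing in for something from a `-sys` crate:

```rust
use std::ptr::NonNull;

// `FfiHandle` is a hypothetical stand-in for an opaque C struct.
#[repr(C)]
struct FfiHandle {
    _private: [u8; 0],
}

// Convention: the C pointer lives in a field named `raw`,
// and `NonNull` documents that it is never null.
struct Handle {
    raw: NonNull<FfiHandle>,
}

impl Handle {
    fn as_ptr(&self) -> *mut FfiHandle {
        self.raw.as_ptr()
    }
}
```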
Keepalives
- Name the struct field `_keepalive_*`, or if it needs to be assigned to after creation, `keepalive_*`.
- Put keepalives at the end of the struct; struct fields are dropped in declaration order.
Reference count keepalives
- If `A` depends on `B` (for example, `rocksdb_a_destroy` must be called before `rocksdb_b_destroy`), and doesn't naturally refer to `B`, add a struct field `_keepalive_b: B` (or `Arc<B>`, if the reference count is not hidden as an `inner` field).
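A sketch of the keepalive conventions, with hypothetical `Database` and `ColumnFamily` types (think of a RocksDB column family handle that must be destroyed before the database):

```rust
use std::sync::Arc;

struct Database {
    // ... would own the underlying C database handle ...
}

struct ColumnFamily {
    name: String,
    // `ColumnFamily` does not otherwise refer to `Database`, but the database
    // must stay alive until this handle is gone.
    // Declared last: struct fields are dropped in declaration order.
    _keepalive_database: Arc<Database>,
}

fn open_column_family(db: &Arc<Database>, name: &str) -> ColumnFamily {
    ColumnFamily {
        name: name.to_owned(),
        _keepalive_database: Arc::clone(db),
    }
}
```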
C pointer keepalives
- If you are passing Rust-allocated memory to C, where C holds on to that memory past the function call, use `std::pin::Pin` to ensure Rust does not move the data from underneath.
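A sketch combining the `Pin` and keepalive conventions; `ffi_register_buffer` is a hypothetical C function that keeps the pointer after the call returns:

```rust
use std::pin::Pin;

// Hypothetical C function that stores the pointer for later use.
unsafe fn ffi_register_buffer(_ptr: *const u8, _len: usize) {}

struct Registered {
    // C still points into this allocation, so keep it alive and pinned
    // for as long as the registration exists.
    _keepalive_buffer: Pin<Box<[u8]>>,
}

fn register(data: Vec<u8>) -> Registered {
    let buffer: Pin<Box<[u8]>> = Pin::new(data.into_boxed_slice());
    // SAFETY (sketch only): the buffer outlives the registration because it is stored below.
    unsafe { ffi_register_buffer(buffer.as_ptr(), buffer.len()) };
    Registered {
        _keepalive_buffer: buffer,
    }
}
```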
No lint leftovers
(Needed to state justification for manual `#[allow]`s.)
Use `#[expect(...)]` instead of `#[allow(...)]` to allow lints.
Acceptable exceptions:
- macros or other generated code that only sometimes triggers the lint rule
For lints that trigger only on some architectures/platforms/configurations, express that with `cfg`.
Indent don't align
Trying to align "columns" in source code where the amount of content in the "left column" varies leads to unnecessary code churn; don't do it.
Examples:
- aligning `#` or `//` in inline comments
- aligning `=` in assignments
`tx` means database transactions, not transmit
The identifier `tx` shall refer to database transactions, or to lower-level transactions such as the RocksDB transaction used to implement a database transaction.
For consistency, avoid `txn`.
Ends of channels are called "sender" and "receiver", not tx/rx.
Rust value builders have `.build()` methods
For consistency, avoid `.finish()`.
Transactions are finished.
Inner error fields are named `error`, not `source`
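A sketch of the convention using `thiserror`; the error type and message here are made up, not the real KantoError definition:

```rust
/// Example only; not actual KantoDB code.
#[derive(Debug, thiserror::Error)]
#[error("backend I/O failed")]
struct BackendIoError {
    // Convention: the wrapped inner error is a field named `error`, not `source`.
    #[source]
    error: std::io::Error,
}
```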
Prefer `Box<[T]>` over `Vec<T>` when the length does not change
Also collect directly into `Box<[T]>`, not into `Vec<T>` only to then call `Vec::into_boxed_slice`.
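A small illustration of the convention (example data only):

```rust
fn main() {
    let names = ["id", "name", "created_at"];

    // Collect straight into the final fixed-length container...
    let upper: Box<[String]> = names.iter().map(|s| s.to_uppercase()).collect();

    // ...instead of going through Vec<T> and into_boxed_slice.
    let upper_via_vec: Box<[String]> = names
        .iter()
        .map(|s| s.to_uppercase())
        .collect::<Vec<_>>()
        .into_boxed_slice();

    assert_eq!(upper, upper_via_vec);
}
```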
Check vs validate vs assert
To validate user input, parse, don't validate: `fn parse(input: &[u8]) -> Result<..., KantoError>`.
Check non-parsing-related operational preconditions, where failing the check is within normal operation and depends on user input.
Think `42.checked_add(13)`.
Typically `fn check_foo(...) -> Result<(), KantoError>`.
Validate invariants and such that are always expected to be true, where failing means corruption or an internal error.
Typically `fn validate_foo(...) -> Result<(), KantoError>`.
Assert additional, potentially costly checks that are only done in debug builds.
Typically `fn assert_foo(...)` with no return value.
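A sketch of the three shapes side by side. The function names and the placeholder error type are illustrative only; the real code would use `KantoError`.

```rust
// Placeholder standing in for the real KantoError.
#[derive(Debug)]
struct KantoError(String);

// Check: a precondition that can fail in normal operation, driven by user input.
fn check_insert_fits(row_count: u64, to_insert: u64) -> Result<(), KantoError> {
    row_count
        .checked_add(to_insert)
        .map(|_| ())
        .ok_or_else(|| KantoError("too many rows".to_owned()))
}

// Validate: an invariant that should always hold; failure means corruption or an internal error.
fn validate_key_length(key: &[u8], expected: usize) -> Result<(), KantoError> {
    if key.len() == expected {
        Ok(())
    } else {
        Err(KantoError(format!(
            "internal error: key length {} != {expected}",
            key.len()
        )))
    }
}

// Assert: potentially costly extra checking, debug builds only, no return value.
fn assert_sorted(keys: &[&[u8]]) {
    debug_assert!(keys.windows(2).all(|w| w[0] <= w[1]), "keys must be sorted");
}

fn main() -> Result<(), KantoError> {
    check_insert_fits(10, 5)?;
    validate_key_length(b"12345678", 8)?;
    assert_sorted(&[b"a".as_slice(), b"b".as_slice()]);
    Ok(())
}
```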
It is less painful to type `tracing::debug!` than to deal with import churn
Sometimes being explicit and verbose is actually the shorter and simpler way.
Manifesto
(For things we do have a lint for.)
Panic policy
The codebase still has undesirable uses of `.unwrap()` and `.expect()`.
We are working to limit their use.
KantoDB should not panic regardless of library API misuse, corrupt or malicious database files, or user input (SQL queries, network protocol). Please report it on the community forum if you see an unexpected panic.
- using `.unwrap()` can be ok during development, but typically not in final code
- can panic on out of memory; for now, this is the easy path, waiting for Rust Allocator changes and the ecosystem to catch up
- ideally, never panic on corrupt data, but this is tough to enforce if panicking on internal errors is allowed; fuzzing is only probabilistic
- debug-mode-only invariant checking can panic (use `debug_assert!`); this should be used only where it is impossible to trigger by file/network input
- open question: should we panic on internal errors where we "know it will be `Some`" but can't express that through APIs; this should be used only where it is impossible to trigger by file/network input
  - current plan is to unify to the `.expect("internal error XYZ: ...")` style (sketched below), add a lint that those are the only expects, and then let them stay in the code as long as a fuzzer doesn't hit them
- never panic on I/O error; however, the default policy might change to a controlled stop & recommend reboot
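A sketch of the planned `.expect()` convention from the list above; the error code and message are made up:

```rust
use std::collections::HashMap;

fn lookup(table: &HashMap<u64, String>, id: u64) -> &str {
    // Internal-error style: a unique, grep-able message.
    // The planned lint would enforce that every remaining expect looks like this.
    table
        .get(&id)
        .expect("internal error kanto/0000: row must exist for id handed out by the scan")
}

fn main() {
    let mut table = HashMap::new();
    table.insert(1, "row one".to_owned());
    println!("{}", lookup(&table, 1));
}
```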
Resources:
- https://www.thecodedmessage.com/posts/2022-07-14-programming-unwrap/ (discussion: HN)
- https://blog.burntsushi.net/unwrap/
Decisions
Not using TableProvider.insert_into
That's a fine convenience, but since there's nothing comparable for `UPDATE` or `DELETE`, we're going to need the less convenient code path anyway.
Links:
- https://docs.rs/datafusion/latest/datafusion/catalog/trait.TableProvider.html#method.insert_into
- https://docs.rs/datafusion/latest/datafusion/physical_plan/insert/struct.InsertExec.html
- https://docs.rs/datafusion/latest/datafusion/physical_plan/insert/trait.DataSink.html
Developer glossary
Composite key
A key consisting of multiple values.
Primary key
A primary key identifies a row in tables that have `PRIMARY KEY` specified in the schema.
It is a Composite key.
Row key
Whatever identifies a row, whether a Row ID or Primary key.
Row ID
A Row ID identifies a row in tables that do not have `PRIMARY KEY` specified in the schema.
It is a numeric sequential identifier automatically assigned to rows as needed.
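A hypothetical sketch of how these glossary terms relate; these are not the actual KantoDB types:

```rust
// Illustration only, not real KantoDB code.

#[derive(Debug)]
enum Value {
    Int64(i64),
    Text(String),
}

/// Composite key: multiple values acting as one key.
type CompositeKey = Vec<Value>;

/// Row key: whatever identifies a row.
enum RowKey {
    /// Tables without PRIMARY KEY: an automatically assigned sequential number.
    RowId(u64),
    /// Tables with PRIMARY KEY: the primary key, which is a composite key.
    PrimaryKey(CompositeKey),
}
```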
Resources
Unorganized
- `rocksdb` crate docs
- The Design of the Postgres Storage System (VLDB 1987)
- How CockroachDB's MVCC Model Works (RocksDB Meetup 2015, video, slides)
  - they store one key for metadata, then a key per MVCC version
  - custom ordering logic to get MVCC versions sorted largest-first
- RocksDB transactions: https://github.com/facebook/rocksdb/wiki/Transactions#optimistictransactiondb, https://github.com/facebook/rocksdb/wiki/WritePrepared-Transactions, https://github.com/facebook/rocksdb/wiki/WriteUnprepared-Transactions
- RocksDB objects must be deleted before exit (or you crash with `"pure virtual method called" in destructor`): https://github.com/facebook/rocksdb/issues/649#issuecomment-119779477 via https://github.com/rust-rocksdb/rust-rocksdb/issues/463
- RocksDB merge operator: https://github.com/facebook/rocksdb/wiki/Merge-Operator, https://github.com/facebook/rocksdb/wiki/Merge-Operator-Implementation
- each RocksDB column family takes about 64 MiB of RAM: https://github.com/facebook/rocksdb/issues/2426