Building a Simple DB in Rust - Part 3 - Less Basic Execution
This article is part of the Building a Simple DB in Rust series.
Where we left off
In the last post we ended up with extremely basic query execution. We could parse queries, update our tables in memory, and read full tables back; all while having decent error messages throughout.
While that was a great milestone, it leaves a lot to be desired:
- While we parse columns in select queries, we never actually project/use them, so you always get all the columns back
- We aren't validating types on insert, so the column types mean nothing
Those two points are what I want to tackle in this post.
This post is much less of a tutorial/guide than the previous posts. I needed to clean up what I have, so the changes are pretty straightforward, small, and all over. Check the repo for more detailed changes.
Actually selecting columns
For a refresher, here's what the select response looks like
pub enum ExecResponse<'a> {
Select(Vec<Row<'a>>),
Insert,
Create,
}
pub struct Row<'a> {
id: usize,
columns: Rc<ColumnInfo>,
data: &'a HashMap<String, String>, // reference to the actual row in the table struct
}
So the select response is just a Vec of rows, each row holding info on the columns of the table and the data for that row.
There are a few things not great about this:
- Why return a Vec when we could just return an iterator? The consumer is going to iterate over it anyway.
- The select response does not return any column info outside of the rows, so empty results would not give any column info.
- The data in the row is just a reference to the table; that's not horrible for select * but we need to modify it to only hold the columns we care about.
- We probably will update this type over time to reduce space, but for now we'll stick with a hash map.
To address the first two issues, we can just return our existing TableIter struct: it's an iterator, and it holds column info.
pub enum ExecResponse<'a> {
Select(TableIter<'a>),
Insert,
Create,
}
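The TableIter struct itself isn't shown in this post, so here's a minimal sketch of what it plausibly holds, inferred from how it's constructed and iterated later. The field names, the plain HashMap row type, and the stub ColumnInfo here are all assumptions; the real types in the repo carry more:

```rust
use std::collections::{btree_map, BTreeMap, HashMap};
use std::rc::Rc;

// Stub stand-in for the real ColumnInfo, just for this sketch.
pub struct ColumnInfo {
    pub columns: Vec<String>,
}

// Sketch of TableIter: it wraps the table's BTreeMap iterator and carries the
// (possibly projected) column info alongside it.
pub struct TableIter<'a> {
    map_iter: btree_map::Iter<'a, usize, HashMap<String, String>>,
    columns: Rc<ColumnInfo>,
}

impl<'a> TableIter<'a> {
    pub fn new(
        map_iter: btree_map::Iter<'a, usize, HashMap<String, String>>,
        columns: Rc<ColumnInfo>,
    ) -> Self {
        Self { map_iter, columns }
    }

    // Exposing the column info is what lets even an empty result
    // still describe its columns.
    pub fn columns(&self) -> &ColumnInfo {
        &self.columns
    }
}
```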
For real let's do a select
We didn't actually have a select function on Table before, so let's add one that validates that the columns we pass in actually exist.
#[derive(Debug, Clone, Default, Serialize, Deserialize, derive_more::From)]
pub struct ColumnInfo {
columns: Vec<Column>,
}
impl ColumnInfo {
pub fn iter(&self) -> impl Iterator<Item = &Column> {
self.columns.iter()
}
pub fn find_column(&self, column_name: &String) -> Result<&Column, QueryExecutionError> {
self.iter()
.find(|col| col.name == *column_name)
.ok_or_else(|| QueryExecutionError::ColumnDoesNotExist(column_name.to_owned()))
}
}
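As a quick sanity check on the behavior, here's find_column exercised against pared-down stand-ins for Column and the error type (the real ones carry more fields; this is just to show the Ok/Err shape):

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Column {
    name: String, // the real Column also carries type info
}

#[derive(Debug, PartialEq)]
pub enum QueryExecutionError {
    ColumnDoesNotExist(String),
}

pub struct ColumnInfo {
    columns: Vec<Column>,
}

impl ColumnInfo {
    // Same lookup as above: a linear scan by name, with a descriptive error on a miss.
    pub fn find_column(&self, column_name: &str) -> Result<&Column, QueryExecutionError> {
        self.columns
            .iter()
            .find(|col| col.name == column_name)
            .ok_or_else(|| QueryExecutionError::ColumnDoesNotExist(column_name.to_owned()))
    }
}
```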
// Also updated Table.columns to be `ColumnInfo` instead of Vec<Column>
impl Table {
pub fn select(&self, columns: Vec<String>) -> Result<TableIter, QueryExecutionError> {
let selected_columns = columns
.into_iter()
.map(|column_name| {
self.columns
.find_column(&column_name)
.map(|col| col.clone())
})
.collect::<Result<Vec<_>, _>>()?;
let col_info: Rc<ColumnInfo> = Rc::new(selected_columns.into());
Ok(TableIter::new(self.rows.iter(), col_info))
}
}
Since the select function validates that all the columns exist, the iterator can avoid repeating that error checking, like so
impl<'a> Iterator for TableIter<'a> {
type Item = Row<'a>;
fn next(&mut self) -> Option<Self::Item> {
self.map_iter.next().map(|(id, data)| {
let projected_data = data
.data // the stored row hashmap
.iter()
.filter_map(|(key, value)| self.columns.find_column(key).ok().map(|_| (key, value)))
.collect();
Row::new(self.columns.clone(), *id, projected_data)
})
}
}
All the iterator does is walk the stored table's b-tree; for each row, it filters the hash map of row data down to only the columns we selected.
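That per-row filtering can be isolated into a tiny helper. A dependency-free sketch (project_row and the plain Vec of selected names are mine, not from the repo):

```rust
use std::collections::HashMap;

// Keep only the entries of `row` whose keys are among the selected column
// names -- the same filtering the iterator does for each row.
fn project_row(row: &HashMap<String, String>, selected: &[String]) -> HashMap<String, String> {
    row.iter()
        .filter(|(key, _)| selected.contains(*key))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}
```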
Doing type checking
Right now we track column types when you create tables
CREATE TABLE foo (
col1 int,
col2 string
);
but we can just throw anything in there
INSERT INTO foo
VALUES
thisIsNotAnInt, 3
To have proper type checking, we first need to update the insert statement to be type aware (instead of having every value be a string).
So let's make a SQL value type
#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize, Display)]
pub enum Value {
Number(BigDecimal),
String(String),
}
Here we just make an enum of what types/values we support in our SQL language. I decided to have one number representation with BigDecimal instead of separate Int/Float types. Eventually I will add those; for now I'm just getting the basics in.
Now we need to parse these values
/// Parse a single quoted string value
fn parse_string_value(input: RawSpan<'_>) -> ParseResult<'_, Value> {
// TODO: look into https://github.com/rust-bakery/nom/blob/main/examples/string.rs
// for escaped strings
let (remaining, (_, str_value, _)) = context(
"String Literal",
tuple((
tag("'"),
take_until("'").map(|s: RawSpan| Value::String(s.fragment().to_string())),
tag("'"), // take_until does not consume the ending quote
)),
)(input)?;
Ok((remaining, str_value))
}
/// Parse a numeric literal
fn parse_number_value(input: RawSpan<'_>) -> ParseResult<'_, Value> {
let (remaining, digits) =
context("Number Literal", take_while(|c: char| c.is_numeric()))(input)?; // TODO: handle floats
let digits = digits.fragment();
Ok((
remaining,
Value::Number(BigDecimal::from_str(digits).unwrap()),
))
}
impl<'a> Parse<'a> for Value {
fn parse(input: RawSpan<'a>) -> ParseResult<'a, Self> {
context(
"Value",
preceded(
multispace0,
terminated(
// peek_then_cut will give better errors so if we see
// a quote we knows to only try to parse a string
alt((peek_then_cut("'", parse_string_value), parse_number_value)),
multispace0,
),
),
)(input)
}
}
The parsing logic is mostly straightforward. The main weirdness you might be confused by is the calls to .fragment(): the input to the parse functions is a RawSpan, and you need to call fragment() to get the actual string out of the span.
Now when we want strings we will need to quote them, like 'aString'
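If the nom combinators are hard to follow, the same (remaining, value) shape can be hand-rolled with plain string slicing. A dependency-free sketch, with i64 standing in for BigDecimal and none of nom's error context:

```rust
#[derive(Debug, PartialEq)]
enum Value {
    Number(i64), // the post uses BigDecimal; i64 keeps this sketch crate-free
    String(String),
}

// Parse a leading value off `input`, returning the remaining input and the
// parsed value -- the same (remaining, parsed) shape the nom parsers use.
fn parse_value(input: &str) -> Option<(&str, Value)> {
    let input = input.trim_start();
    if let Some(rest) = input.strip_prefix('\'') {
        // Single-quoted string: take everything up to the closing quote.
        let end = rest.find('\'')?;
        Some((&rest[end + 1..], Value::String(rest[..end].to_string())))
    } else {
        // Number: take the run of leading ascii digits.
        let end = input
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(input.len());
        if end == 0 {
            return None;
        }
        let num = input[..end].parse().ok()?;
        Some((&input[end..], Value::Number(num)))
    }
}
```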
Now that we have this value type, we need to replace all our string values with the new type. This change was sorta all over; if you want a good diff, this commit has the needed changes. This representation is still probably a little wasteful, but it starts us down the path to better types and performance.
We can now check the value on insert and see if it lines up with the given column's data type
impl Table {
pub fn insert(&mut self, values: Vec<Value>) -> Result<(), QueryExecutionError> {
let id = self
.rows
.last_key_value()
.map_or(0, |(max_id, _)| max_id + 1);
let row = values
.into_iter()
.zip(self.columns.iter())
.map(|(value, col)| match (col.type_info, value) {
(SqlTypeInfo::String, v @ Value::String(_)) => Ok((col.name.to_owned(), v)),
(SqlTypeInfo::Int, v @ Value::Number(_)) => Ok((col.name.to_owned(), v)), // TODO: when we add floats make sure number is an int
(_,v) => Err(QueryExecutionError::InsertTypeMismatch(col.type_info, v)),
})
.collect::<Result<HashMap<_, _>,_>>()?;
self.rows.insert(id, row.into());
Ok(())
}
}
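The heart of that insert is the match on the (column type, value) tuple. A pared-down, runnable version of just that check (again with i64 in place of BigDecimal, and a tuple error instead of the real QueryExecutionError):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SqlTypeInfo {
    Int,
    String,
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(i64), // stand-in for BigDecimal to keep the sketch crate-free
    String(String),
}

// Return the value unchanged if it fits the column type, or an error pair
// describing the mismatch -- the same shape the insert loop produces per column.
fn check_type(type_info: SqlTypeInfo, value: Value) -> Result<Value, (SqlTypeInfo, Value)> {
    match (type_info, value) {
        (SqlTypeInfo::String, v @ Value::String(_)) => Ok(v),
        (SqlTypeInfo::Int, v @ Value::Number(_)) => Ok(v),
        (t, v) => Err((t, v)),
    }
}
```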
Here it's pretty basic: we match on the tuple of column type and value. If the value and type match up, we allow the insert; otherwise we error with a type mismatch. Later on we could add some logic for implicit conversions, like converting ints to floats, but for now we are going to be strict.
So now we can run
CREATE TABLE sad (
col1 int,
col2 string
);
INSERT INTO sad
VALUES
1, 2;
and get back
× Value 2 can not be inserted into a String column
For the happy path
CREATE TABLE foo (
col1 int,
col2 string
);
INSERT INTO foo
VALUES
1, 'aString';
INSERT INTO foo
VALUES
4, 'aDiffString with spaces';
SELECT
col1,
col2
FROM
foo;
will return
+------+-------------------------+
| col1 | col2 |
+------+-------------------------+
| 1 | aString |
+------+-------------------------+
| 4 | aDiffString with spaces |
+------+-------------------------+
Wrap up
This is definitely getting to the point in every personal project where the initial burst of interest starts to wane. While I would love to focus on more "sexy" features like where clauses, joins, and saving the DB to disk, I really needed to clean up what I had first.
I do plan on focusing on some other side projects for now (and may write some posts on them), but I do plan on focusing more on the DB soon.
If you really want some more, I would highly recommend trying this yourself! As you can probably tell from reading these posts, I'm far from a DB expert. Just start somewhere; you would be impressed with how much you can accomplish and how much you will learn as you go!