![]() |
VOOZH | about |
26th February 2020
As discussed previously, the biggest hole in Datasette’s feature set at the moment involves writing to the database.
Datasette was born as a hack to abuse serverless, stateless hosting by bundling a static, immutable database as part of the deployment. The key idea was that for some use-cases—such as data journalism—you don’t need to be able to continually update your data. It’s just the facts that support the story you are trying to tell.
I also believed the conventional wisdom that SQLite is fine for reads but shouldn’t be trusted to handle web application writes. I no longer believe this to be the case: SQLite is great at handling writes, as millions of iPhone and Android apps will attest.
Meanwhile, the biggest blocker to people trying out Datasette is that they would need to convert their data to SQLite somehow in order to use it. I’ve been building a family of CLI tools for this, but that requires users to both be familiar with the command-line and to install software on their computers.
So: Datasette needs to grow web-based tools for loading data into the database.
Datasette’s plugin system is the ideal space for experimenting with ways of doing this, without needing to try out crazy new features on Datasette’s own core.
There’s just one big problem: SQLite may be great at fast, reliable writes but it still doesn’t like concurrent writes: it’s important to only ever have one connection writing to a SQLite database at a time.
I’ve been mulling over the best way to handle this for the best part of a year... and then a couple of days ago I had a breakthrough: with a dedicated write thread for a database file, I could use a Python queue to ensure only one write could access the database at a time.
There’s prior art for this: SQLite wizard Charles Leifer released code plus a beautiful explanation of how to queue writes to SQLite back in 2017. I’m not sure why I didn’t settle on his approach sooner.
So... Datasette 0.37, released this evening, has a new capability exposed to plugins: they can now request that an operation (either a SQL statement or a full custom Python function) be queued up to execute inside a thread that posesses an exclusive write connection to a SQLite database.
I’ve documented how plugins can use this in the new plugin internals documentation: execute_write() and execute_write_fn().
So far there’s only one public plugin that takes advantage of this: datasette-upload-csvs, which previously used a dirty hack but has now been upgraded to use the new execute_write_fn() method.
I’m really excited about the potential plugins this unlocks though. I experimented with a logging plugin and a plugin for deleting tables while I was building the hooks (full implementations of those are posted as comments in the pull request).
Other use-cases I’m interested to explore include:
This is Weeknotes: Datasette Writes by Simon Willison, posted on 26th February 2020.
csv 33 plugins 127 sqlite 467 threads 18 datasette 1,519 weeknotes 193Next: Weeknotes: datasette-ics, datasette-upload-csvs, datasette-configure-fts, asgi-csrf
Previous: Things I learned about shapefiles building shapefile-to-sqlite
Sponsor me for $10/month and get a curated email digest of the month's most important LLM developments.
Pay me to send you less!
Sponsor & subscribeWeeknotes: Datasette Writes - in which Datasette plugins gain the ability to send writes to the database! https://t.co/KPBT5TPn8z
— Simon Willison (@simonw) February 26, 2020