What are the properties of file system UX and how we implemented it in postgres.
Before we begin, here's the table SQL
CREATE TABLE documents (
id text NOT NULL,
collection_id text NULL,
created_at timestamptz NULL,
updated_at timestamptz NULL,
deleted_at timestamptz NULL,
url text NULL,
"type" text NULL,
size_bytes int8 NULL,
"name" text NULL,
"path" ltree NULL,
status text NULL DEFAULT 'pending'::text,
status_text text NULL,
nbx_bucket bool NULL DEFAULT false,
size_k_b int8 NULL,
metadata jsonb NULL,
CONSTRAINT documents_pkey PRIMARY KEY (id),
CONSTRAINT fk_documents_collection FOREIGN KEY (collection_id) REFERENCES public.collections(id)
);
Remember:
- file systems are glorified key:value pairs
- we give meaning to key by spltting around
/
Have you ever seen two files with the same path? Even the idea feels weird, that is because all files are unique. However in our flat tables we can have any number of same name
because our primary is id. So we created a unique index on (collection_id, name)
but what happens when a soft delete happens and we try to create a file with the same name. In this case the logic will fail, so we have to add deleted_at
as a column in the unique command:
ALTER TABLE public.documents
ADD CONSTRAINT
uq_documents_deleted_collection_name unique NULLS NOT DISTINCT (deleted_at, collection_id, name);
See the big NULLS NOT DISTINCT
, that is because before Postgres 15 nulls were treated differently, read stackoverflow. So we upgraded out Postgres to v16 and fixed this.
Now you can keep on deleting and keep creating same file again and again.
File system use the simple /
split key value. Filesystems show folders and files both really fast, a simple ls -la
shows you this:
-rw-r--r-- 1 yashbonde staff 480 Sep 20 17:22 .env
-rw-r--r-- 1 yashbonde staff 190 Sep 19 19:08 README.md
drwxr-xr-x 5 yashbonde staff 160 Sep 18 12:30 armoury
where the d
at the start tells that it is a directory. This can then be brought to life in a UI that contains folders and files as icons. This means that a simple list-table
kinda structure would need to be modified for filepaths.
Assume we have the file structure like this:
├── a
│ ├── b.pdf
│ └── c.pdf
├── d.pdf
├── e.txt
└── f
│ ├── g.txt
│ └── h
│ └── i.pdf
One method is to use LIKE
operator, read stackoverflow. But it solves only one part of the problem which is the start. Imagine by some sorting order these are the files in the table:
a/b.pdf
e.txt
d.pdf
a/c.pdf
f/h/i.pdf
f/g.pdf
If you do an LIKE
with a/
as prefix you get [a/b.pdf, a/c.pdf]
What if I ask you what are all the folders here the answer should be [a, f]
and if I ask at prefix=f
then should be [h]
. This is a very useful addition to the file system UI. Doing this with LIKE is almost impossible to get right.
We use an extension called ltree
, read pg doc. This allows us to store data in a herarchical format so we can do complex queries like an actual prefix match. We need to create a new index for this:
CREATE EXTENSION IF NOT EXISTS ltree;
ADD CONSTRAINT uq_documents_deleted_collection_name unique NULLS NOT DISTINCT (deleted_at, collection_id, name);
create index tree_path_idx on public.documents using gist ("path");
Without going too much in implementation:
- we add a concept of root folder
_r
that is added as the fist element in the path - we get speed ups over doing LIKE where all the strings would be matched in all the rows. There is potential for funny business with
offset
andlimit
where you might end up interating over the entire dataset to find matches. so you can get all the files inside a folder with query like:
select *
FROM public.documents d
where
d.path = '_r.baz'
offset 0
limit 4;
- folders became an absolute breeze with powerful queries which can group and aggregate complex information about folders which would have been impossible without
ltree
:
select
replace(ltree2text(d.path), '_r.', '') as folder_name,
COUNT(*) as file_count,
SUM(d.size_bytes) as indexed_size,
SUM(d.hit_count) as hits,
FROM public.documents d
where
d.path <@ '_r' and
nlevel(d.path) = 2
group by d.path;
A single API:
With these you should be able to build all the possible reasonable file system UIs:
- icons view
- list view
- gallery: More information with creative covers can be shown
- columns (nested-list view): different list are shown for different folders, depth is towards the right (depth in armoury is locked to 4 folders. So UI can be built with that assumption)