All the key information in this file was generated by an LLM. Treat it as a starting point, don't ever run auto-generated code without a sandbox unless you have checked it yourself.
The primary intent for this analysis is to go from an object to all the tools you need to use this library. The hardest part is the object-to-relational conversion, of converting a (potentially) nested json into flat tables that can be queried.
We will generate the following things:
- A typespec - We need good type definition for our incoming object. You might already have this, in which case make sure to provide a string version to the function to treat this as a starting point.
- DDL - The SQLite table descriptors that can hold the information in the object.
- Structured DDL - This is an internal type that is used by the library to add additional information - like example values, potential min and max ranges, which fields are visible to the LLM converting natural language to query, etc. Most of this is metadata intended for the library and the LLM, and are stripped out when creating the tables.
- ObjectToRow - This is ostensibly the hardest function out of the bunch, so treat it as a starting point. It should do the drudge work of writing a function to turn every field in the JSON into a flat string array that sqlite can ingest.
interface Genre {
id: number;
name: string;
}
interface ProductionCompany {
name: string;
id: number;
}
interface ProductionCountry {
iso_3166_1: string;
name: string;
}
interface SpokenLanguage {
iso_639_1: string;
name: string;
}
interface Movie {
adult: string;
budget: string;
genres: Genre[];
id: string;
imdb_id: string;
original_language: string;
original_title: string;
overview: string;
popularity: string;
production_companies: ProductionCompany[];
production_countries: ProductionCountry[];
release_date: string;
revenue: string;
runtime: string;
spoken_languages: SpokenLanguage[];
status: string;
tagline: string;
title: string;
vote_average: string;
vote_count: string;
}
-- Main Movies Table
CREATE TABLE Movies (
id TEXT, -- Movie ID
adult TEXT, -- Is adult content
budget_int INT, -- Budget as integer
original_language TEXT, -- Original language
original_title TEXT, -- Original title
overview TEXT, -- Overview description
popularity_float FLOAT, -- Popularity as float
release_date TEXT, -- Release date
revenue_int INT, -- Revenue as integer
runtime_int INT, -- Runtime as integer
status TEXT, -- Status
tagline TEXT, -- Tagline
title TEXT, -- Title
vote_average_float FLOAT, -- Vote average as float
vote_count_int INT -- Vote count as integer
);
-- Genres Table
CREATE TABLE Genres (
movie_id TEXT, -- Corresponding movie ID
genre_id INT, -- Genre ID
genre_name TEXT -- Genre name
);
-- Production Companies Table
CREATE TABLE ProductionCompanies (
movie_id TEXT, -- Corresponding movie ID
company_id INT, -- Company ID
company_name TEXT -- Company name
);
-- Production Countries Table
CREATE TABLE ProductionCountries (
movie_id TEXT, -- Corresponding movie ID
country_code TEXT, -- Country code
country_name TEXT -- Country name
);
-- Spoken Languages Table
CREATE TABLE SpokenLanguages (
movie_id TEXT, -- Corresponding movie ID
language_code TEXT, -- Language code
language_name TEXT -- Language name
);
const moviesTable: DDLTable = {
name: 'Movies',
columns: [
{
name: 'id',
columnSpec: 'TEXT',
description: 'Movie ID',
},
{
name: 'adult',
columnSpec: 'TEXT',
description: 'Is adult content',
},
{
name: 'budget_int',
columnSpec: 'INT',
description: 'Budget as integer',
},
{
name: 'original_language',
columnSpec: 'TEXT',
description: 'Original language',
},
{
name: 'original_title',
columnSpec: 'TEXT',
description: 'Original title',
},
{
name: 'overview',
columnSpec: 'TEXT',
description: 'Overview description',
},
{
name: 'popularity_float',
columnSpec: 'FLOAT',
description: 'Popularity as float',
},
{
name: 'release_date',
columnSpec: 'TEXT',
description: 'Release date',
},
{
name: 'revenue_int',
columnSpec: 'INT',
description: 'Revenue as integer',
},
{
name: 'runtime_int',
columnSpec: 'INT',
description: 'Runtime as integer',
},
{
name: 'status',
columnSpec: 'TEXT',
description: 'Status',
},
{
name: 'tagline',
columnSpec: 'TEXT',
description: 'Tagline',
},
{
name: 'title',
columnSpec: 'TEXT',
description: 'Title',
},
{
name: 'vote_average_float',
columnSpec: 'FLOAT',
description: 'Vote average as float',
},
{
name: 'vote_count_int',
columnSpec: 'INT',
description: 'Vote count as integer',
},
]
};
const genresTable: DDLTable = {
name: 'Genres',
columns: [
{
name: 'movie_id',
columnSpec: 'TEXT',
description: 'Corresponding movie ID',
foreignKey: {
table: 'Movies',
column: 'id'
}
},
{
name: 'genre_id',
columnSpec: 'INT',
description: 'Genre ID',
},
{
name: 'genre_name',
columnSpec: 'TEXT',
description: 'Genre name',
}
]
};
const productionCompaniesTable: DDLTable = {
name: 'ProductionCompanies',
columns: [
{
name: 'movie_id',
columnSpec: 'TEXT',
description: 'Corresponding movie ID',
foreignKey: {
table: 'Movies',
column: 'id'
}
},
{
name: 'company_id',
columnSpec: 'INT',
description: 'Company ID',
},
{
name: 'company_name',
columnSpec: 'TEXT',
description: 'Company name',
}
]
};
const productionCountriesTable: DDLTable = {
name: 'ProductionCountries',
columns: [
{
name: 'movie_id',
columnSpec: 'TEXT',
description: 'Corresponding movie ID',
foreignKey: {
table: 'Movies',
column: 'id'
}
},
{
name: 'country_code',
columnSpec: 'TEXT',
description: 'Country code',
},
{
name: 'country_name',
columnSpec: 'TEXT',
description: 'Country name',
}
]
};
const spokenLanguagesTable: DDLTable = {
name: 'SpokenLanguages',
columns: [
{
name: 'movie_id',
columnSpec: 'TEXT',
description: 'Corresponding movie ID',
foreignKey: {
table: 'Movies',
column: 'id'
}
},
{
name: 'language_code',
columnSpec: 'TEXT',
description: 'Language code',
},
{
name: 'language_name',
columnSpec: 'TEXT',
description: 'Language name',
}
]
};
import { Database, SqlValue } from 'sql.js';
function objectToRows(movies: Movie[]): string[][][] {
const rows: string[][][] = [];
movies.forEach(movie => {
// Movies table
const movieRow = [
movie.id,
movie.adult,
movie.budget ? movie.budget.toString() : '0',
movie.original_language,
movie.original_title,
movie.overview,
movie.popularity ? movie.popularity.toString() : '0',
movie.release_date,
movie.revenue ? movie.revenue.toString() : '0',
movie.runtime ? movie.runtime.toString() : '0',
movie.status,
movie.tagline,
movie.title,
movie.vote_average ? movie.vote_average.toString() : '0',
movie.vote_count ? movie.vote_count.toString() : '0'
];
rows.push([movieRow]);
// Genres table
if (movie.genres) {
movie.genres.forEach(genre => {
const genreRow = [
movie.id,
genre.id,
genre.name
];
rows.push([genreRow]);
});
}
// Production companies table
if (movie.production_companies) {
movie.production_companies.forEach(company => {
const companyRow = [
movie.id,
company.id,
company.name
];
rows.push([companyRow]);
});
}
// Production countries table
if (movie.production_countries) {
movie.production_countries.forEach(country => {
const countryRow = [
movie.id,
country.iso_3166_1,
country.name
];
rows.push([countryRow]);
});
}
// Spoken languages table
if (movie.spoken_languages) {
movie.spoken_languages.forEach(language => {
const languageRow = [
movie.id,
language.iso_639_1,
language.name
];
rows.push([languageRow]);
});
}
});
return rows;
}
See [Examples] for an example of how to use this code to create an instance of the search engine.
{
"adult": "False",
"budget": "30000000",
"genres": [
{
"id": 16,
"name": "Animation"
},
{
"id": 35,
"name": "Comedy"
},
{
"id": 10751,
"name": "Family"
}
],
"id": "862",
"imdb_id": "tt0114709",
"original_language": "en",
"original_title": "Toy Story",
"overview": "Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences.",
"popularity": "21.946943",
"production_companies": [
{
"name": "Pixar Animation Studios",
"id": 3
}
],
"production_countries": [
{
"iso_3166_1": "US",
"name": "United States of America"
}
],
"release_date": "1995-10-30",
"revenue": "373554033",
"runtime": "81.0",
"spoken_languages": [
{
"iso_639_1": "en",
"name": "English"
}
],
"status": "Released",
"tagline": "",
"title": "Toy Story",
"vote_average": "7.7",
"vote_count": "5415"
}
Here is how I would structure SQLite tables to hold the movie data following the guidelines:
This will be the main table holding the core movie information.
CREATE TABLE Movies (
id TEXT, -- Movie ID
adult TEXT, -- Is adult content
budget_int INT, -- Budget as integer
original_language TEXT, -- Original language
original_title TEXT, -- Original title
overview TEXT, -- Overview description
popularity_float FLOAT, -- Popularity as float
release_date TEXT, -- Release date
revenue_int INT, -- Revenue as integer
runtime_int INT, -- Runtime as integer
status TEXT, -- Status
tagline TEXT, -- Tagline
title TEXT, -- Title
vote_average_float FLOAT, -- Vote average as float
vote_count_int INT -- Vote count as integer
);
- I used descriptive column names with suffixes to indicate the datatype where it may be ambiguous like integers or floats.
- The core movie info is in this single table.
The genres data can be flattened into a single table since it's a simple 1-to-many relationship.
CREATE TABLE Genres (
movie_id TEXT, -- Corresponding movie ID
genre_id INT, -- Genre ID
genre_name TEXT -- Genre name
);
- This contains the genre information for each movie in a flattened form, with movie ID copied for each genre to maintain the relationship.
Same flattening approach for production companies:
CREATE TABLE ProductionCompanies (
movie_id TEXT, -- Corresponding movie ID
company_id INT, -- Company ID
company_name TEXT -- Company name
);
And same for production countries:
CREATE TABLE ProductionCountries (
movie_id TEXT, -- Corresponding movie ID
country_code TEXT, -- Country code
country_name TEXT -- Country name
);
And finally, the spoken languages:
CREATE TABLE SpokenLanguages (
movie_id TEXT, -- Corresponding movie ID
language_code TEXT, -- Language code
language_name TEXT -- Language name
);
So in summary, one main Movies table containing the core info, and additional flattened tables for the 1-to-many relationships to simplify the structure. The movie_id column is used to maintain the relationship between the main Movies table and the other tables.
Some columns may need comments for additional clarity on things like _int suffixes indicating data type. But overall this provides a reasonably simple normalized structure for the data.