Practical AI with Postgres
There's a lot of excitement around AI, and even more discussion than excitement. The question of Postgres and AI isn't a single question, there are a ton of paths you can take under that heading...
- Can I use Postgres for building AI related apps? Absolutely.
- Can Postgres become an autonomous database with AI? Maybe.
- Should you use AI to help you troubleshoot issues? Perhaps, but better left to experts.
- Should I avoid AI when working with my database? Not at all!
We've already talked about the first item previously here. Today we want to drill into the last item on the list and look at some very practical use cases for using it. In particular ChatGPT can help you be more effective when working with your database.
The basics
ChatGPT is absolutely wonderful at the basics of helping you generate schema, queries, and basic Postgres commands. The PostgreSQL docs themselves are wonderful if you know what to look for. They are wonderful reference docs. But if you don't know what to look for (you mean common table expressions and window functions aren't in your everyday lexicon?) then you find yourself hunting and pecking. But before I get too far ahead of myself, let’s get started with the very basics. I want a schema we can use as an example:
I asked Chat GPT: Can you help me generate a schema for a multi-tenant CRM app in Postgres?
And here's what it gave me:
CREATE TABLE tenants (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW()
);
CREATE TABLE users (
id SERIAL PRIMARY KEY,
tenant_id INTEGER NOT NULL,
email VARCHAR(255) NOT NULL UNIQUE,
password VARCHAR(255) NOT NULL,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
FOREIGN KEY (tenant_id) REFERENCES tenants(id) ON DELETE CASCADE
);
CREATE TABLE accounts (
id SERIAL PRIMARY KEY,
tenant_id INTEGER NOT NULL,
name VARCHAR(255) NOT NULL,
website VARCHAR(255),
phone VARCHAR(255),
address VARCHAR(255),
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
FOREIGN KEY (tenant_id) REFERENCES tenants(id) ON DELETE CASCADE
);
CREATE TABLE contacts (
id SERIAL PRIMARY KEY,
tenant_id INTEGER NOT NULL,
first_name VARCHAR(255) NOT NULL,
last_name VARCHAR(255) NOT NULL,
email VARCHAR(255) NOT NULL UNIQUE,
phone VARCHAR(255),
address VARCHAR(255),
account_id INTEGER NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
FOREIGN KEY (tenant_id) REFERENCES tenants(id) ON DELETE CASCADE,
FOREIGN KEY (account_id) REFERENCES accounts(id) ON DELETE CASCADE
);
CREATE TABLE deals (
id SERIAL PRIMARY KEY,
tenant_id INTEGER NOT NULL,
name VARCHAR(255) NOT NULL,
amount DECIMAL(10,2),
close_date DATE,
stage VARCHAR(255),
account_id INTEGER NOT NULL,
contact_id INTEGER NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW(),
FOREIGN KEY (tenant_id) REFERENCES tenants(id) ON DELETE CASCADE,
FOREIGN KEY (account_id) REFERENCES accounts(id) ON DELETE CASCADE,
FOREIGN KEY (contact_id) REFERENCES contacts(id) ON DELETE CASCADE
);
In fact it gave me even more tables including activities
. I personally wanted to trim this example so our blog post wasn't so long, but you'd lose quite a bit. Now is this schema perfect? No. But did it save me a ton of time? 💯 it very much did. A quick look at changes I might make:
- Would use
bigint
likely for the primary key - May or may not have
ON DELETE CASCADE
in a lot of cases logical deletes can be nice - Lots of use of
VARCHAR(255)
in places that may not be ideal such as phone, address, email.
ChatGPT did do some things I especially like on this schema which includes:
- Putting the
tenant_id
on all the tables - Setting up foreign keys
Onward to data
Let’s keep progressing, this is maybe my favorite use for ChatGPT which is generating fake data. The tables above aren't entirely complex, but it takes some time to generate data. I simply asked ChatGPT:
Can you help me generate some sample records for the above schema?
And viola:
INSERT INTO tenants (name, created_at) VALUES ('Acme Inc.', NOW());
INSERT INTO tenants (name, created_at) VALUES ('Globex Corporation', NOW());
INSERT INTO users (tenant_id, email, password, first_name, last_name, created_at, updated_at)
VALUES (1, 'johndoe@acme.com', 'password123', 'John', 'Doe', NOW(), NOW());
INSERT INTO users (tenant_id, email, password, first_name, last_name, created_at, updated_at)
VALUES (1, 'janedoe@acme.com', 'password456', 'Jane', 'Doe', NOW(), NOW());
INSERT INTO users (tenant_id, email, password, first_name, last_name, created_at, updated_at)
VALUES (2, 'bobsmith@globex.com', 'password789', 'Bob', 'Smith', NOW(), NOW());
INSERT INTO accounts (tenant_id, name, website, phone, address, created_at, updated_at)
VALUES (1, 'Acme Corporation', '<https://www.acme.com>', '555-1234', '123 Main St., Anytown, USA', NOW(), NOW());
INSERT INTO accounts (tenant_id, name, website, phone, address, created_at, updated_at)
VALUES (2, 'Globex Inc.', '<https://www.globex.com>', '555-5678', '456 Broad St., Anytown, USA', NOW(), NOW());
INSERT INTO contacts (tenant_id, first_name, last_name, email, phone, address, account_id, created_at, updated_at)
VALUES (1, 'Tom', 'Jones', 'tjones@acme.com', '555-4321', '456 Oak St., Anytown, USA', 1, NOW(), NOW());
INSERT INTO contacts (tenant_id, first_name, last_name, email, phone, address, account_id, created_at, updated_at)
VALUES (2, 'Alice', 'Johnson', 'ajohnson@globex.com', '555-8765', '789 Maple St., Anytown, USA', 2, NOW(), NOW());
INSERT INTO deals (tenant_id, name, amount, close_date, stage, account_id, contact_id, created_at, updated_at)
VALUES (1, 'New Contract', 10000.00, '2023-06-01', 'Proposal', 1, 1, NOW(), NOW());
INSERT INTO deals (tenant_id, name, amount, close_date, stage, account_id, contact_id, created_at, updated_at)
VALUES (2, 'Expansion Deal', 5000.00, '2023-07-01', 'Negotiation', 2, 2, NOW(), NOW());
INSERT INTO activities (tenant_id, type, subject, due_date, completed, deal_id, contact_id, user_id, created_at, updated_at)
VALUES (1, 'Call', 'Follow up on proposal', '2023-05-15', false, 1, 1, 1, NOW(), NOW());
INSERT INTO activities (tenant_id, type, subject, due_date, completed, deal_id, contact_id, user_id, created_at, updated_at)
VALUES (2, 'Email', 'Send contract terms', '2023-05-20', false, 2, 2, 3, NOW(), NOW());
Is it just me or does it feel like asking nicely "Can you" is a smart choice vs. just demanding?
At first ChatGPT only gave me data for 2 tenants, so I asked it for 10 tenants, and it kindly obliged. I only mention that because tenants number 3 and 4 I especially loved Wayne Enterprises
and Stark Industries
.
Indexing
In general the best plan for indexing is not to overcomplicate it. Wait to see some of the workload of your application, then add indexes accordingly. One of the best tools for monitoring what consumes time in your database is via the pg_stat_statements extension. But let’s still see what ChatGPT can do to help us here:
What would a good indexing scheme be for common queries if I were to build an app on top of this?
To determine a good indexing scheme for common queries, it's important to
understand the types of queries that will be executed frequently in the
application.
Here are some common queries that may be executed in a CRM app:
Retrieve all the customers for a given tenant.
Retrieve all the orders for a given customer.
Retrieve all the orders for a given tenant.
Retrieve all the products for a given tenant.
Retrieve all the users for a given tenant.
Retrieve all the tasks for a given user.
Retrieve all the tasks for a given customer.
Based on these queries, here are some suggested indexes:
Index on the customers table: (tenant_id).
Index on the orders table: (customer_id).
Index on the orders table: (tenant_id).
Index on the products table: (tenant_id).
Index on the users table: (tenant_id).
Index on the tasks table: (user_id).
Index on the tasks table: (customer_id).
It's important to keep in mind that adding too many indexes can slow down write
performance, so it's important to only add indexes that are necessary for
frequently executed queries. Additionally, you should monitor the performance of
your queries and adjust your indexing scheme as necessary.
At first glance we’ve got a decent set of inputs here and could easily create indexes on this, and ChatGPT could help us with the commands for creating those indexes. If you do follow the above approach of creating a bunch of stock indexes just make sure to clean up your unused indexes over time.
Reporting
How’s your SQL? Do you love writing SQL and use it as the first tool you reach for when you want to know something about your data? No? You’re not alone, the majority of developers aren’t SQL experts, that doesn’t mean SQL isn’t the right tool for the job. Just because you’re not an expert at SQL doesn’t mean you should be scared of it. ChatGPT is great at helping you become more familiar with SQL by expressing in plain english what you want and getting SQL out.
I want to create a report that shows me open opportunities that are expected to close this quarter, can you help me write a query that would generate that report?
SELECT *
FROM opportunities
WHERE status = 'Open'
AND close_date BETWEEN DATE_TRUNC('quarter', CURRENT_DATE) AND DATE_TRUNC('quarter', CURRENT_DATE) + INTERVAL '3 months' - INTERVAL '1 day';
In cases ChatGPT’s SQL may not be perfect, it may pick wrong columns that don’t actually exist or do the wrong calculation. While ChatGPT isn’t perfect it does offer a good method for quickly iterating often faster than having to go reference SQL that you’re looking to create.
More SQL usage
As we talked about earlier PostgreSQL has great reference documentation–IF you know what you’re looking for. But if you don’t know the name of the function to truncate a date is date_trunc
then you may find yourself spending extra time to guess how Postgres references it or how that is classified in the docs. When it comes to how do I do X in SQL ChatGPT is great for this:
How do I round a date to the nearest date in Postgres?
ChatGPT:
To round a date to the nearest date in PostgreSQL, you can use the date_trunc
function in combination with the round
function. Here's an example:
SELECT round(date_trunc('day', your_date_column)::date) AS rounded_date
FROM your_table;
In this example, replace your_date_column
with the actual name of the column that contains the date you want to round, and your_table
with the name of the table where the column resides.
The date_trunc
function is used to truncate the time portion of the date, keeping only the year, month, and day. The round
function is then used to round the truncated date to the nearest day.
By casting the result of date_trunc
to date
using ::date
, we ensure that the result is a proper date data type.
After executing this query, you will get a result set with a column named rounded_date
, which contains the rounded dates for each row in the table.
SQL is your friend and ChatGPT is it’s side kick
You can have a lot of fun with ChatGPT and go down a lot of rabbit holes. Want to have ChatGPT create a four course menu themed around Postgres? Go for it, we won’t stop you. In fact, you can use our Postgres Playground to load data and play with Postgres queries. We’ve seen a lot of takes ranging from I’m smarter than ChatGPT because I found out something it said wrong to how it’s going change anything.
The simple reality is ChatGPT can help you work better with your database. We hope these practical examples make you a little less afraid of SQL and your database.
Related Articles
- Sidecar Service Meshes with Crunchy Postgres for Kubernetes
12 min read
- pg_incremental: Incremental Data Processing in Postgres
11 min read
- Smarter Postgres LLM with Retrieval Augmented Generation
6 min read
- Postgres Partitioning with a Default Partition
16 min read
- Iceberg ahead! Analyzing Shipping Data in Postgres
8 min read