Ruby on Rails Neighbor Gem for AI Embeddings

Christopher Winslett

3 min read

Over the past 12 months, AI has taken over budgets and initiatives. Postgres is a popular store for AI embedding data because it can store, calculate, optimize, and scale using the pgvector extension. A recently introduced gem to the Ruby on Rails ecosystem, the neighbor gem, makes working with pgvector and Rails even better.

Background on AI in Postgres

An “embedding” is a set of floating point values that represent the characteristics of a thing (nothing new, we’ve had these since the 70s). Using the OpenAI API or any of their competitors, you can send over blocks of text, images, and PDFs, and OpenAI will return an embedding with 1536 values representing the characteristics. With the pgvector extension, you can store that embedding in a vector column type on Postgres. Then, using nearest neighbor calculations, you can find the most similar objects. For a deeper review of AI with Postgres, see my previous posts in this series.
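To make "nearest neighbor calculations" concrete, here is a minimal plain-Ruby sketch of a brute-force search over toy embeddings. The recipe names, the 3-value vectors, and the `nearest` helper are all made up for illustration; pgvector does this work inside Postgres, and real embeddings are far longer.

```ruby
# Toy 3-value embeddings (hypothetical data). Real embeddings from an
# embedding API are much longer, e.g. 1536 values.
EMBEDDINGS = {
  "pancakes" => [0.9, 0.1, 0.0],
  "waffles"  => [0.8, 0.2, 0.1],
  "chili"    => [0.0, 0.9, 0.7]
}.freeze

# Euclidean (L2) distance between two equal-length vectors.
def euclidean_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Brute-force search: compare the query item against every other item
# and return the names of the `limit` closest ones, nearest first.
def nearest(name, embeddings, limit: 1)
  query = embeddings.fetch(name)
  embeddings
    .reject { |other, _| other == name }
    .min_by(limit) { |_, vector| euclidean_distance(query, vector) }
    .map(&:first)
end

puts nearest("pancakes", EMBEDDINGS).inspect # => ["waffles"]
```

Scanning every row like this gets slow as the table grows, which is why index support in the database matters.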

The neighbor gem

By default, Ruby on Rails does not know about the "vector" data type. If you've used Ruby on Rails + Postgres + pgvector, you've probably written SQL queries in your migrations, and implemented some other janky-code. The neighbor gem will remove the janky-code, and take you back to a native ActiveRecord experience.

At a minimum, all you have to do is add the following to your Gemfile:

gem 'neighbor'

Side note: I can't overstate the impact Andrew Kane has had on embedding data in Postgres. He's also making it easy for developers to use those vector data types with Ruby on Rails and Node.

Fixed schema dump

The biggest risk of not using Neighbor is that ActiveRecord will create a broken db/schema.rb file. Because ActiveRecord does not understand the vector data type, running rails db:schema:dump does not fail; instead, it omits any table with that data type and leaves this comment in your db/schema.rb:

# Could not dump table "recipe_embeddings" because of following StandardError
#   Unknown type 'vector(1536)' for column 'embedding'

With Neighbor, you'll get a fully-functional schema like the following:

create_table "recipe_embeddings", primary_key: "recipe_id", id: :bigint, default: nil, force: :cascade do |t|
  t.vector "embedding", limit: 1536, null: false
  t.datetime "created_at", null: false
  t.datetime "updated_at", null: false
  t.index ["embedding"], name: "recipe_embeddings_embedding", opclass: :vector_l2_ops, using: :hnsw
  t.index ["recipe_id"], name: "index_recipe_embeddings_on_recipe_id"
end

Notice that Neighbor also understands the [hnsw index type](https://www.crunchydata.com/blog/hnsw-indexes-with-postgres-and-pgvector) released with pgvector 0.5.

Side note: for projects that go all-in on Postgres, I opt to use the following to dump to a db/structure.sql:

SCHEMA_FORMAT=sql rails db:schema:dump

Easier migrations + data type handling

Without Neighbor, ActiveRecord is not informed of the vector type, so migrations cannot reference it directly. Just as Neighbor fixes your db/schema.rb file, it also lets migrations use the type natively. A typical migration would look something like the following:

create_table :recipe_embeddings, primary_key: [:recipe_id] do |t|
  t.references :recipe, null: false, foreign_key: true
  t.vector :embedding, limit: 1536, null: false

  t.timestamps
end

Additionally, you get improved handling of the vector data type. Without Neighbor, working with embedding data required to_s to manipulate the values when inserting into Postgres. But, with Neighbor, it simplifies to a native process:

RecipeEmbedding.create!(recipe_id: Recipe.last.id, embedding: [-0.078427136, 0.0014401458, ...])
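For contrast, here is a rough sketch of the kind of manual string handling this replaces. Both helpers are hypothetical (they are not part of Neighbor or pgvector); they only assume that pgvector's text form for a vector is a bracketed, comma-separated list such as `[0.1,0.2,0.3]`.

```ruby
# Hypothetical helper: builds the bracketed, comma-separated text form
# for a vector value from a Ruby array of numbers.
def to_vector_literal(values)
  "[#{values.map { |v| Float(v) }.join(',')}]"
end

# Hypothetical inverse: parses that text form back into an array of floats.
def from_vector_literal(text)
  text.delete("[]").split(",").map { |v| Float(v) }
end

literal = to_vector_literal([-0.078427136, 0.0014401458, 1])
puts literal
puts from_vector_literal(literal).inspect
```

With Neighbor, the model attribute accepts and returns a plain Ruby array, so none of this conversion code needs to live in your app.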

But, wait! There's more …

The nearest_neighbors method

After you add the embedding column to a table, you can use has_neighbors to define your nearest neighbor queries:

class RecipeEmbedding < ApplicationRecord
  has_neighbors :embedding
end

Then, you can find the nearest neighbors like so:

recipe_embedding.nearest_neighbors(:embedding, distance: "euclidean").first

The distance calculations include euclidean and cosine.
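The choice of distance can change which row counts as "nearest". The plain-Ruby sketch below uses made-up 2-value vectors to show the difference; it illustrates the math only and is not how Neighbor or pgvector compute distances internally.

```ruby
# Euclidean distance: straight-line distance, sensitive to magnitude.
def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Cosine distance: 1 - cosine similarity. Depends only on the angle
# between the vectors, not on their length.
def cosine_distance(a, b)
  dot  = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  1.0 - dot / (norm.(a) * norm.(b))
end

query  = [1.0, 0.0]
scaled = [3.0, 0.0] # same direction as query, larger magnitude
tilted = [1.0, 0.5] # close by, but pointing in a different direction

# Euclidean ranks `tilted` nearer to the query than `scaled`.
puts euclidean_distance(query, tilted) < euclidean_distance(query, scaled)

# Cosine ranks `scaled` nearer, because it points the same way as the query.
puts cosine_distance(query, scaled) < cosine_distance(query, tilted)
```

Which one fits depends on how the embeddings were produced, so check the guidance from your embedding provider before picking a distance and its matching index operator class.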

Conclusion

Launching a project to use embeddings with Ruby on Rails?

Step 1: use the neighbor gem

Step 2: provision your database on Crunchy Bridge with pgvector

Step 3: profit