Implementing Semantic Search with Supabase, Next.js, and OpenAI: A Tutorial

Introduction

As the amount of information available online continues to grow at an unprecedented pace, traditional search engines are struggling to keep up. This is where semantic search comes in—a technique that aims to understand the intent behind a user's query and provide more accurate results. In this tutorial, we will explore how to implement semantic search using Supabase and PostgreSQL for the database, Next.js for the frontend, and OpenAI's GPT-3 for the natural language processing. By the end of this tutorial, you will have a fully functional semantic search engine that can provide relevant and accurate results to your users.

One of the key components of implementing semantic search is the ability to represent textual data in a way that allows for meaningful comparison and analysis. This is where vector representations come in—they allow us to transform words or phrases into high-dimensional vectors that capture their meaning and context.

In PostgreSQL, the pg-vector module provides a way to store and manipulate vector data within the database. By using pg-vector, we can represent our text data as vectors and perform vector operations such as cosine similarity to compare the similarity of different documents or search queries.

To implement semantic search with Supabase and PostgreSQL, we can first create a table to store our documents and their vector representations using the pg-vector data type. Then, when a user submits a search query, we can convert the query into a vector representation using the same method as the documents, and perform a cosine similarity search to find the most relevant results.

By using pg-vector in conjunction with Supabase and Next.js, we can create a powerful semantic search engine that can provide accurate and relevant results to our users. This approach is especially useful when dealing with large amounts of text data, where traditional keyword-based search engines may struggle to provide meaningful results.

This tutorial will be using code snippets from my portfolio site (which is a public repository). Link are at the bottom.

The Process

The secret to making this work is OpenAI's GPT-3 embeddings API, which handles converting the text snippets into a very large array of numbers called a vector (1536 to be precise). By using a chunk size of 200 tokens, it gives the embeddings API just enough information to match relevant results to the query, but not too much information to overload our answer request to ChatGPT, who will interpret that information.

The process is as follows:

We parse the document text into chunks approximately 200-tokens each
We run each chunk through OpenAI's embeddings API to get the semantic vector
We store that text, including the vector in our PG table

That's all done in advance. The continuous process is as follows:

A user asks a question on the front-end form
The question is converted to a vector via OpenAI on-the-fly
We run a vector search query against our database to get the top 3 results
We concatenate those results and feed them and the original question to ChatGPT
ChatGPT gives us the answer

Setting Up Supabase

First, we need to create a Supabase project and database. If you haven't done so already, sign up for a free Supabase account and create a new project.
Once you have created your project, navigate to the SQL editor and connect to your database. You can do this by clicking on the "SQL" button in the left sidebar, and then clicking "Connect to Database".
In the SQL editor, create a new table to store your documents. For this example, we will create a table called "documents" with the following columns:

--  RUN 1st
create extension vector;

-- RUN 2nd
create table vsearch (
  id bigserial primary key,
  document_title text,
  page_no int2,
  content text,
  content_length bigint,
  content_tokens bigint,
  embedding vector (1536)
);

-- RUN 3rd after running the scripts
create or replace function vector_search (
  query_embedding vector(1536),
  similarity_threshold float,
  match_count int
)
returns table (
  id bigint,
  document_title text,
  page_no int2,
  content text,
  content_length bigint,
  content_tokens bigint,
  similarity float
)
language plpgsql
as $$
begin
  return query
  select
    vsearch.id,
    vsearch.document_title,
    vsearch.page_no,
    vsearch.content,
    vsearch.content_length,
    vsearch.content_tokens,
    1 - (vsearch.embedding <=> query_embedding) as similarity
  from vsearch
  where 1 - (vsearch.embedding <=> query_embedding) > similarity_threshold
  order by vsearch.embedding <=> query_embedding
  limit match_count;
end;
$$;

-- RUN 4th
create index on vsearch 
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);

Here is a breakdown of what each section of the code is doing:

create extension vector; is creating the vector extension which provides support for vector data types and operations in PostgreSQL.
create table vsearch ... is creating a new table called vsearch with columns for the document title, page number, content, content length, content tokens, and embedding vector.
create or replace function vector_search ... is defining a new function called vector_search that takes in a query embedding vector, a similarity threshold, and a match count, and returns a table of document information with a similarity score. This function is written in PL/pgSQL, a procedural language for PostgreSQL.
create index on vsearch ... is creating an index on the embedding column using the IVFFLAT method with 100 lists. This index speeds up similarity searches by clustering vectors together in groups, reducing the number of comparisons needed to find the most similar vectors.

Parsing A Document Into the PG-Vector Table

Ok, hopefully everything went to plan and you have your table and the RPC function created. If not, just ask ChatGPT—he'll know what to do. 🤣

In my experiment, I used a PDF file, a short book about early IBM programmers, titled True Hackers for my body of information. I wrote several scripts that I run from my dev environment to pre-process information and then insert into the table.

Here's the script for parsing the PDF:

import fs from 'fs'
import { encode } from 'gpt-3-encoder';
import PDF from 'pdf-scraper';

const inFile = 'data/hackers.pdf';
const outFile = 'data/hackers.json';
const dataBuffer = fs.readFileSync(inFile);
const tokensPerChunk = 200;
const documentTitle = "True Hackers"

PDF(dataBuffer).then(function (data) {
    console.log(`Successfully parsed ${data.numpages} pages from ${inFile}`);

    // Iterate over PDF pages
    let chunkIndex = 0;
    let currentChunk = '';
    let currentChunkWords = 0;

    const output = [];
    let content = '';
    for (let pageIndex = 0; pageIndex < data.pages.length; pageIndex++) {
        console.log(`Parsing page #${pageIndex}...`);

        const pushChunk = (chunk) => {
            const contentLength = encode(chunk).length;
            console.log('Creating chunk of token length:', contentLength);
            output.push({
                documentTitle: documentTitle,
                pageNo: pageIndex + 1,
                tokens: contentLength,
                content: chunk.trim()
            });
        }

        // Normalize all whitespace to a single space
        content = data.pages[pageIndex].replace(/\s+/g, ' ');

        // If page content is longer than tokens limit, parse into sentences
        if (encode(content).length > tokensPerChunk) {

            // Split content into sentences
            let sentences = content.split('. ');
            let chunk = '';

            for (let i = 0; i < sentences.length; i++) {
                const sentence = sentences[i];
                const sentenceTokenLength = encode(sentence).length;
                const chunkTokenLength = encode(chunk).length;

                // If our chunk has grown to exceed the tokens limit, append to output buffer
                if (chunkTokenLength + sentenceTokenLength > tokensPerChunk) {
                    pushChunk(chunk);
                    chunk = '';
                }

                // If current sentence ends with a character, append a period, otherwise a space
                if (sentence && sentence[sentence.length - 1].match(/[a-z0-9]/i)) {
                    chunk += sentence + ". ";
                } else {
                    chunk += sentence + " ";
                }
            }
            // Append the remaining text
            pushChunk(chunk);
        } else {
            pushChunk(content);
        }
    }
    fs.writeFileSync(outFile, JSON.stringify(output));
});

The code begins by importing necessary modules such as 'fs', 'gpt-3-encoder', and 'pdf-scraper'.

Then it defines the input and output file paths, and reads the content of the input PDF file using fs.readFileSync() into a buffer object.

Next, the PDF content is passed through the pdf-scraper module using PDF(dataBuffer). The pdf-scraper module extracts the text content from the PDF file, and returns a Promise which is handled in a callback function. The text content is stored in data.pages, where each element of the array represents a page of the PDF.

The script then iterates over the pages of the PDF using a for loop. For each page, it checks if the length of the page content exceeds the maximum token length specified by tokensPerChunk. If it does, the content is split into sentences, and each sentence is concatenated into a chunk of text with a maximum token length of tokensPerChunk. If a sentence exceeds this limit, it is split into multiple chunks.

The gpt-3-encoder module is used to encode the text chunks into a format that can be used as input for the OpenAI GPT-3 language model. The encoded text is stored in the output array, which is a collection of objects that contain the document title, page number, number of tokens, and the content chunk itself.

Finally, the output array is written to a JSON file using fs.writeFileSync() with the output file path specified by outFile.

Next, here is the code for generating the embeddings and inserting into our PG table. Be sure to check the JSON output for validity first. That's why I split these functions into two scripts:

import { loadEnvConfig } from "@next/env";
import { createClient } from "@supabase/supabase-js";
import fs from "fs";
import { Configuration, OpenAIApi } from "openai";
import { encode } from "gpt-3-encoder";

loadEnvConfig("");

(async function () {
    try {
        const configuration = new Configuration({ apiKey: process.env.OPENAI_KEY });
        const openai = new OpenAIApi(configuration);

        const supabase = createClient(process.env.NEXT_PUBLIC_SB_URL, process.env.SB_SERVICE_KEY);

        const inFile = 'data/hackers.json';
        const dataBuffer = fs.readFileSync(inFile);
        const data = JSON.parse(dataBuffer);

        data.forEach(async item => {
            // Generate embedding via GPT model ada-002
            const aiRes = await openai.createEmbedding({
                model: "text-embedding-ada-002",
                input: item.content
            });
            const [{ embedding }] = aiRes.data.data;

            // Insert data and embedding into PG table

            const { data, error } = await supabase
                .from("vsearch")
                .insert({
                    document_title: item.documentTitle,
                    page_no: item.pageNo,
                    embedding,
                    content: item.content,
                    content_length: item.content.length,
                    content_tokens: encode(item.content).length
                })
                .select("*");
            return;
        });
    } catch (err) {
        console.error(err.message, err.stack);
    }
})();

We load our JSON data from the file we created earlier
We use the text-embedding-ada-002 OpenAI endpoint to get our vector
We use the Supabase SDK to insert our snippets into our search table

Putting It Into Practice

We'll divide the back-end process into two separate controllers. The first will query our PG table and get the 3 most relevant results. The second will prompt ChatGPT with the pulled data and the original query.

Here's the first:

import { createClient } from "@supabase/supabase-js";
import readRequestBody from "@/lib/api/readRequestBody";
import checkTurnstileToken from "@/lib/api/checkTurnstileToken";

// Use Next.js edge runtime
export const config = {
    runtime: 'edge',
}

export default async function handler(request) {
    const MATCHES = 3; // max matches to return
    const THRESHOLD = 0.01; // similarity threshold

    try {
        const requestData = await readRequestBody(request);

        // Validate CAPTCHA response
        if (!await checkTurnstileToken(requestData.token)) {
            throw new Error('Captcha verification failed');
        }

        const supabase = createClient(process.env.NEXT_PUBLIC_SB_URL, process.env.SB_SERVICE_KEY);

        // Get embedding vector for search term via OpenAI
        const input = requestData.searchTerm.replace(/\n/g, " ");
        const result = await fetch("https://api.openai.com/v1/embeddings", {
            headers: {
                "Content-Type": "application/json",
                Authorization: `Bearer ${process.env.OPENAI_KEY}`
            },
            method: "POST",
            body: JSON.stringify({
                model: "text-embedding-ada-002",
                input
            })
        });

        if (!result.ok) {
            const mess = await result.text();
            throw new Error(mess)
        }

        const json = await result.json();
        const embedding = json.data[0].embedding;

        // Perform cosine similarity search via RPC call
        const { data: chunks, error } = await supabase.rpc("vector_search", {
            query_embedding: embedding,
            similarity_threshold: THRESHOLD,
            match_count: MATCHES
        });

        if (error) {
            console.error(error);
            throw new Error(error);
        }

        return new Response(JSON.stringify(chunks), {
            status: 200,
            headers: {
                'Content-Type': 'application/json'
            }
        });
    } catch (err) {
        return new Response(`Server error: ${err.message}`, { status: 500 });
    }
}

Here's what is does:

The config object specifies that this is a Next.js edge runtime function. Edge runtime allows the function to run closer to the user, reducing latency.
The handler function is the main function that takes in a request object and returns a Response object.
The MATCHES variable determines the maximum number of matches to return.
The THRESHOLD variable specifies the similarity threshold. Matches with a similarity score below this threshold will not be returned.
The readRequestBody function reads the request body as a string.
The checkTurnstileToken function validates the CAPTCHA response by sending it to Supabase.
The createClient function creates a Supabase client with the Supabase URL and service key.
The fetch function sends a request to the OpenAI API to get the embedding vector for the search term.
The rpc function sends a remote procedure call to the Supabase server to perform the cosine similarity search.
The Response object is returned with the matching chunks in JSON format.

The next is a function that sends a prompt to the OpenAI API and returns a ReadableStream that receives the response in real-time:

import { createParser } from "eventsource-parser";

const openAiStream = async (system, prompt) => {
    const encoder = new TextEncoder();
    const decoder = new TextDecoder();

    const res = await fetch("https://api.openai.com/v1/chat/completions", {
        headers: {
            "Content-Type": "application/json",
            Authorization: `Bearer ${process.env.OPENAI_KEY}`
        },
        method: "POST",
        body: JSON.stringify({
            model: 'gpt-3.5-turbo',
            messages: [
                {
                    role: "system",
                    content: system
                },
                {
                    role: "user",
                    content: prompt
                }
            ],
            max_tokens: 150,
            temperature: 0.0,
            stream: true
        })
    });

    if (res.status !== 200) {
        const mess = await res.text();
        throw new Error(mess);
    }

    const stream = new ReadableStream({
        async start(controller) {
            const onParse = (event) => {
                if (event.type === "event") {
                    const data = event.data;

                    if (data === "[DONE]") {
                        controller.close();
                        return;
                    }

                    try {
                        const json = JSON.parse(data);
                        const text = json.choices[0].delta.content;
                        const queue = encoder.encode(text);
                        controller.enqueue(queue);
                    } catch (e) {
                        controller.error(e);
                    }
                }
            };

            const parser = createParser(onParse);

            for await (const chunk of res.body) {
                parser.feed(decoder.decode(chunk));
            }
        }
    });

    return stream;
};

export default openAiStream;

The createParser function creates a new Parser object that parses an EventSource stream.
The openAiStream function takes two parameters: system and prompt. system is a message from the system, and prompt is a prompt from the user.
The function sends a POST request to the OpenAI API with the specified system and prompt messages.
If the response status is not 200, an error is thrown.
The ReadableStream is created with an async start method that takes a controller object. The controller is used to manage the stream.
A callback function onParse is created to handle parsed events from the Parser.
Inside the start method, a for await...of loop is used to iterate over the response body chunks.
Each chunk is decoded from binary to UTF-8 using decoder.decode(chunk).
The decoded chunk is fed into the parser using parser.feed().
If the parser encounters an "event" type, the data field is parsed into JSON, and the response text is encoded and enqueued into the controller object.
If the parser encounters a "[DONE]" message, the controller is closed, and the start method exits.
If there is an error parsing the response, it is thrown using controller.error().

Lastly, here is the handler that will put it all together:

import readRequestBody from "@/lib/api/readRequestBody";
import endent from "endent";
import openAiStream from "@/lib/openAi/openAiStream";

// Use Next.js edge runtime
export const config = {
    runtime: 'edge',
}

export default async function handler(request) {
    try {
        const requestData = await readRequestBody(request);
        const query = requestData.searchTerm.replace(/\n/g, " ");

        const system = endent`
    You are a helpful assistant that answers questions based on a provided body of text.
    Use only the provided text to answer the question. Try not to copy the text word-for-word.
    If you are not certain of the answer, then answer: "Sorry, I can't help you with that."
    `

        const textBody = requestData.chunks.map(c => {
            return c.content + ' ';
        });

        const prompt = `Please answer this query: ${query}\n\n`
            + `Use only the following information:\n\n${textBody}`;

        const stream = await openAiStream(system, prompt);

        return new Response(stream);
    } catch (err) {
        return new Response(`Server error: ${err.message}`, { status: 500 });
    }
}

The first part is the system message that primes ChatGPT on how it should handle subsequent queries. The second is the actually query containing the user's search term and the data pulled from the database. We then feed it to our openAiStream function and then return the stream to the client.

Creating a Client

You're welcome to view the client I created on my portfolio site (see below), but this is getting to be long, so I won't put the full code here as most of it is not important to making this work. Here is the good stuff, though:

const embedSearchTerm = async () => {
	const body = new FormData();
	body.append('searchTerm', searchTerm);
	body.append('token', token);
	const result = await fetch('/api/vsearch/embed', {
		method: 'POST',
		body
	});
	if (!result.ok) {
		const mess = await result.text();
		throw new Error(mess);
	}
	const chunks = await result.json();
	if (!chunks.length) {
		setErrorMess('Sorry, I was unable to find a match');
		return;
	}
	setPageNumbers(chunks.map(c => (c.page_no)));

	return chunks;
}

const getAnswer = async (chunks) => {
	const body = {
		searchTerm,
		chunks
	};

	const response = await fetch('/api/vsearch/answer', {
		method: 'POST',
		body: JSON.stringify(body),
		headers: {
			'Content-Type': 'application/json'
		}
	});

	if (!response.ok) {
		const message = await response.text();
		throw new Error(message);
	}

	const stream = response.body;
	const reader = stream.getReader();

	while (true) {
		const { done, value } = await reader.read();

		if (done) {
			break;
		}

		const chunk = new TextDecoder('utf-8').decode(value);
		setAnswer((prev) => prev + chunk);
	}
};

const handleSearch = async (e) => {
	e.preventDefault();
	e.stopPropagation();
	if (isLoading) return;

	setSearchErrorMess('');
	if (!searchTerm.length) {
		setSearchErrorMess('Please fill this out');
		return;
	}

	setIsLoading(true);
	try {
		const chunks = await embedSearchTerm();
		if (chunks.length) {
			await getAnswer(chunks);
		}
	} catch (err) {
		setErrorMess(err.message);
		captcha.reset();
	} finally {
		setIsLoading(false);
	}
}

Conclusion

In this tutorial, we've explored how to implement semantic search using Supabase/PostgreSQL, Next.js, and OpenAI. We've set up a database table in Supabase to store our documents and their vector representations, and defined a function for performing semantic search against these vectors. We've also utilized the pg-vector extension in PostgreSQL to enable vector operations, and set up an index on the embedding column to speed up similarity searches.

Semantic search is a powerful tool for improving search accuracy and relevance, especially when dealing with large amounts of unstructured data. With the rise of natural language processing and machine learning, it's becoming increasingly feasible to implement semantic search in our own projects. By following the steps outlined in this tutorial, you can get started with semantic search using Supabase/PostgreSQL, Next.js, and OpenAI, and tailor it to your specific use case.

The power of these LLMs will unlock knowledge to people in a way never before possible. The world will be forever changed by this technology.

Links

Thank you for taking the time to read my article and I hope you found it useful (or at the very least, mildly entertaining). For more great information about web dev, systems administration and cloud computing, please read the Designly Blog. Also, please leave your comments! I love to hear thoughts from my readers.

I use Hostinger to host my clients' websites. You can get a business account that can host 100 websites at a price of $3.99/mo, which you can lock in for up to 48 months! It's the best deal in town. Services include PHP hosting (with extensions), MySQL, Wordpress and Email services.

Looking for a web developer? I'm available for hire! To inquire, please fill out a contact form.