Supabase’s GraphQL API is powered by `pg_graphql`. In this post, we will look at the internal workings of `pg_graphql`. Since it is an extension written in the Rust programming language, familiarity with Rust will help, although it’s not a requirement to understand this post.

This article will give you a deeper understanding of `pg_graphql`, helping you to:
- Make design decisions about how to use GraphQL in your application.
- Understand how a GraphQL query is processed internally by `pg_graphql`.
- Understand, or even fix, simple bugs and implement beginner-friendly features in `pg_graphql`.
What is pg_graphql?
`pg_graphql` is a Postgres extension that reads the SQL schema in a database and exposes it as a GraphQL schema. The GraphQL interface is made available through a SQL function, `graphql.resolve(...)`, which allows any programming language to use GraphQL without any additional servers, processes, or libraries. It is also possible to call the `graphql.resolve(...)` function from PostgREST, or any other HTTP proxy, to safely expose the GraphQL API via HTTP/S.
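For example, the whole request/response cycle can be exercised from psql with a single SQL call. A minimal sketch, assuming a hypothetical `book` table is visible to the extension:

```sql
-- Call the GraphQL entry point directly from SQL.
-- "bookCollection" is hypothetical; pg_graphql exposes one
-- <table_name>Collection field per visible table.
select graphql.resolve($$
  query {
    bookCollection {
      edges {
        node {
          id
        }
      }
    }
  }
$$);
```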
GraphQL request
When a client sends a GraphQL request to `pg_graphql`, it gets a response back. But do you know what happens inside `pg_graphql` to serve that request? Let’s take a look at the life of a GraphQL request.

The entry point of a request is the `graphql.resolve(query, ..)` function. This function calls the `graphql._internal_resolve(query)` function, which is written in Rust and is where the real magic starts. The first line in that function uses the `graphql-parser` crate to parse the GraphQL query into a tree structure. Let’s see how that works.
Parsing GraphQL
Parsing converts a query string into an abstract syntax tree (AST). This is a two-step process: first, a tokenizer converts the input string into tokens, and then a parser organizes the tokens into the nodes of the AST.
Tokenizer
A tokenizer (aka a lexer) reads the query string and spits out tokens in the language. For example, take a look at the following query:
```graphql
query {
  bookCollection {
    edges {
      node {
        id
      }
    }
  }
}
```
It will be turned into the following tokens by the tokenizer: `query`, `{`, `bookCollection`, `{`, `edges`, `{`, `node`, `{`, `id`, `}`, `}`, `}`, and `}`. How does the tokenizer know where a token starts and ends? The tokenizer relies on the lexical grammar to figure out token boundaries. It looks at the next character in the text to first find the kind of token to expect and then uses the grammar rules for that token to find where it ends. For example, the first character in our example is `q`, which means it must be a `Name` because its grammar looks like this:
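Paraphrased from the GraphQL specification’s lexical grammar:

```
Name ::
  NameStart NameContinue*

NameStart ::
  Letter
  _

NameContinue ::
  Letter
  Digit
  _
```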
The grammar tells the tokenizer that if the next character is a `Letter` or an underscore, then it is the start of a `Name`. And when the tokenizer finds a character that is not a `NameContinue`, it ends the token. In our example, the `Name` token ends before the first whitespace after `query`.
Note that the tokenizer’s job is to just produce valid lexical tokens, even if those tokens do not make a valid GraphQL query. For example, the tokenizer will happily produce the tokens `}`, `query`, and `{` for an input string `} query {`. It is the parser’s job to reject these sequences of tokens as invalid.
There are also tokens that the tokenizer ignores. There’s good reason, for example, to ignore whitespace because it allows you to format your code as you please. But a quirk of the lexical structure of GraphQL is that it also ignores commas. This means you can stick a comma just about anywhere and the query would still be valid. For example, the last example can also be rewritten as:
```graphql
query, {,
  bookCollection, {,
    edges, {,
      node, {,
        id,
      },
    },
  },
},
```
It’s possible you haven’t heard that before. We suggest not abusing the comma; use it thoughtfully to write queries that are easy to read.
Parser
The list of tokens produced by the tokenizer is consumed by the parser to generate the AST. The parsing grammar dictates how the parser makes sense of the tokens. Since `pg_graphql` needs to execute the query, it expects an `ExecutableDocument`. So the parser tries to parse an `ExecutableDocument`, which is defined in the grammar like this:
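In the spec’s notation (paraphrased):

```
ExecutableDocument :
  ExecutableDefinition+
```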
An `ExecutableDocument` contains a list of `ExecutableDefinition`s, each of which is defined like this:
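Again paraphrasing the spec:

```
ExecutableDefinition :
  OperationDefinition
  FragmentDefinition
```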
An `ExecutableDefinition` can be either an `OperationDefinition` or a `FragmentDefinition`. Which one should the parser try to parse? Similar to how the tokenizer looks at the next character, the parser can look at the next token to know which definition lies next. The next token in our example is `query`. Can `query` appear at the beginning of an `OperationDefinition`? Let’s check its definition:
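From the spec, roughly:

```
OperationDefinition :
  OperationType Name(opt) VariableDefinitions(opt) Directives(opt) SelectionSet
  SelectionSet
```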
The parser again has two choices, since an `OperationDefinition` can either be an `OperationType Name(opt) VariableDefinitions(opt) Directives(opt) SelectionSet` or a `SelectionSet`. So which of those two can start with `query`? The first one starts with `OperationType`, which is defined like this:
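Paraphrasing the spec once more:

```
OperationType : one of
  query  mutation  subscription
```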
Et voilà! The `query` token can start an `OperationType`, which means the parser now knows that it has to parse the first arm of the `OperationDefinition`, which in turn means that it is going to parse the `OperationDefinition` arm of the `ExecutableDefinition`. Can’t the other arm of the definition also start with `query`? No, the language designers crafted the grammar to avoid such ambiguities.

Now that the parser knows it will parse `OperationType Name(opt) VariableDefinitions(opt) Directives(opt) SelectionSet`, it skips the `query` token and looks at the next token, which is `{`. A `{` can’t appear at the beginning of `Name(opt)`, `VariableDefinitions(opt)`, or `Directives(opt)`, but it can start a `SelectionSet`:
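Paraphrased from the spec:

```
SelectionSet :
  { Selection+ }

Selection :
  Field
  FragmentSpread
  InlineFragment
```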
So the parser skips past the `{` token and then tries to parse a list of `Selection`s. The parser parses the rest of the input string using the same process. It rejects an invalid list of tokens like `}`, `query`, and `{` because no grammar rule starts with a `}`. For our example query, the parser generates the following AST:
```
Document {
    definitions: [Operation(Query(Query {
        name: None,
        variable_definitions: [],
        directives: [],
        selection_set: SelectionSet {
            items: [Field(Field {
                alias: None,
                name: "bookCollection",
                arguments: [],
                directives: [],
                selection_set: SelectionSet {
                    items: [Field(Field {
                        alias: None,
                        name: "edges",
                        arguments: [],
                        directives: [],
                        selection_set: SelectionSet {
                            items: [Field(Field {
                                alias: None,
                                name: "node",
                                arguments: [],
                                directives: [],
                                selection_set: SelectionSet {
                                    items: [Field(Field {
                                        alias: None,
                                        name: "id",
                                        arguments: [],
                                        directives: [],
                                        selection_set: SelectionSet { items: [] },
                                    })],
                                },
                            })],
                        },
                    })],
                },
            })],
        },
    }))],
}
```
An AST in `graphql-parser` is just Rust structs and enums. For example, the `Document` is a struct:
```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Document<'a, T: Text<'a>> {
    pub definitions: Vec<Definition<'a, T>>,
}
```
A `Definition` is an enum:
```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Definition<'a, T: Text<'a>> {
    Operation(OperationDefinition<'a, T>),
    Fragment(FragmentDefinition<'a, T>),
}
```
However, it is not enough to parse the query into an AST. Why? Just as a valid list of tokens doesn’t guarantee a valid AST, a valid AST doesn’t guarantee that it can be executed by `pg_graphql`. So before execution, `pg_graphql` validates the AST.
Validation
To understand why a valid AST doesn’t mean that the query can be executed, take the following example. The query below produces a valid AST, but it is still invalid because a GraphQL document can’t contain a named operation together with an anonymous operation:
```graphql
{
  bookCollection {
    edges {
      node {
        id
      }
    }
  }
}

query getBookNames {
  bookCollection {
    edges {
      node {
        name
      }
    }
  }
}
```
The above was just one example of the kinds of validation performed. The GraphQL spec defines many other validations, such as:
- Operation names must be unique.
- Fields in a selection set must be defined on the target type.
- Literal values in the query must be coercible to the expected type.
- Only defined argument names are accepted.
- And many more.
`pg_graphql` performs these validations and returns errors if they fail (we’ll see what such an error response looks like below). For example:

- Operation name uniqueness is tested in the `resolve_inner` function in `resolve.rs`.
- Existence of fields in a selection set is tested in the `resolve_selection_set` function in `resolve.rs`.
- Types of literal values are tested in the `validate_arg_from_type` function in `parser_util.rs`.
- Allowed arguments are tested in the `restrict_allowed_arguments` function in `builder.rs`.
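When a validation fails, the failure is reported in the `errors` array of the `jsonb` value returned by `graphql.resolve(...)` rather than as a SQL-level error. A quick sketch, reusing the hypothetical `book` schema from earlier and selecting a field that doesn’t exist on the type:

```sql
-- "title" is not defined on the (hypothetical) book type, so the
-- field-selection validation fails and the response carries an
-- "errors" array instead of data.
select graphql.resolve($$
  query {
    bookCollection {
      edges {
        node {
          title
        }
      }
    }
  }
$$);
```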
Some of the validations need to know the types in the reflected GraphQL schema. For example, for the field selection validation, the validation code must know which fields are defined on a GraphQL object. This information is contained in the reflected GraphQL schema.
Schema Reflection
`pg_graphql` builds the GraphQL schema by reading information from many system catalogs. The `load_sql_context` function reads this information into a `Context` object by running the query in the `load_sql_context.sql` file. `load_sql_context.sql` loads information about enums, composite and non-composite types, foreign keys, schemas, tables, and functions.
It might look like the context is loaded for each GraphQL query, but this function is memoized for performance by the `cached` attribute. This means it will only be called again if its input argument `_config`, of type `Config`, changes from the last time it was called. `Config` has three parts:
- Current search path. It is a list of schemas that are searched in order when looking up a database object.
- Current role. It is the role under which any SQL statements will be executed.
- Current schema version. It is the current version of the database’s schema. The version is updated by a couple of triggers that detect schema changes.
It makes sense to reload the schema if any `Config` fields change because they can potentially alter the results of running transpiled SQL statements.
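The first two parts are ordinary Postgres session state, so it’s easy to peek at the values the cache is keyed on; the schema version is maintained by `pg_graphql`’s own triggers and is omitted from this sketch:

```sql
-- Session state that feeds into Config: if the search path or the
-- current role changes, the cached Context is rebuilt on the next call.
select current_setting('search_path') as search_path,
       current_role                   as role;
```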
The `Context` object is wrapped in a `__Schema` object, which is used not only to serve introspection queries but also to provide information for running validations and for transpilation to SQL. For example, take a look at how tables are converted into `<table_name>Collection`s. This code adds `Collection` objects to the `QueryType` object by iterating over tables and adding a `<table_name>Collection` field. Notice how `QueryType`’s `schema` field is used extensively throughout this code snippet.
Transpilation and Query Execution
Transpilation is a two-step process. First, builder objects are constructed from the AST and `__Schema`, and then the builder objects are converted into SQL. A builder object contains all the information needed to produce a SQL query.
For example, when a table is added as a collection object to the `QueryType` object, its `type_` field is set to `__Type::Connection(_)`. This type is then checked in the `resolve_selection_set` function and converted to an appropriate builder type. A builder implements the `QueryEntrypoint` trait. `QueryEntrypoint` has only one required method, named `to_sql_entrypoint`, which the `ConnectionBuilder` implements by calling its `to_sql` method. The `to_sql` method contains the meat of the logic to generate SQL.
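The exact SQL depends on the builder and the selection set, but for the `bookCollection` query used throughout this post the output is conceptually similar to the following hand-written sketch (illustrative only, not the literal SQL `pg_graphql` emits):

```sql
-- Aggregate the selected rows into a single jsonb value whose shape
-- mirrors the GraphQL selection set: { edges: [ { node: { id } } ] }.
select jsonb_build_object(
         'edges',
         coalesce(
           jsonb_agg(jsonb_build_object('node', jsonb_build_object('id', b.id))),
           '[]'::jsonb
         )
       ) as "bookCollection"
from "public"."book" as b;
```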
An important aspect of the SQL generation code is how it calls the `quote_ident` and `quote_literal` functions to avoid SQL injection. Without them, a caller could potentially send specially crafted input to execute arbitrary SQL code.
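Postgres ships built-in functions with the same names and purpose, so the effect is easy to demonstrate in plain SQL (the Rust helpers in `pg_graphql` may be implemented differently, but the idea is the same):

```sql
-- Identifiers are double-quoted and literals are single-quoted with
-- embedded quotes escaped, so crafted input cannot break out of its
-- position in the generated query.
select quote_ident('book"; drop table book; --')        as safe_identifier,
       quote_literal($$Robert'); drop table book; --$$) as safe_literal;
```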
The generated SQL code is then run in the `execute` method of the `QueryEntrypoint` trait. The transpiled queries return a `jsonb` object, which is deserialized into a `pgrx::JsonB` object. Since `pgrx::JsonB` is a wrapper over a `serde_json::Value` object, it is trivial for the code to unwrap it and return it as a JSON response to the client.
Conclusion
In this post, we looked at how `pg_graphql` processes a GraphQL request. A request goes through the steps of tokenization, parsing, validation, transpilation, and execution. We looked in detail at the actions `pg_graphql` performs in each step. This knowledge should equip you to understand how `pg_graphql` works internally and help you make more informed decisions about how you can better use GraphQL APIs. If you feel ambitious, you can also start contributing to `pg_graphql`, which we always welcome.