Completing a Golang Backend With Postgres

Emmanuel Byrd

July 07, 2022

My first experience as a developer was working with Ruby on Rails, and the transition to learning Go scared me because I was used to having most of the heavy lifting already done for me. Rails has all the functionality you need to start coding right away, but that also means that it obscures what is happening behind abstractions.

One illustrative example is integrating an app and a database. One of the things I love about Ruby on Rails is the database support that it comes with. In Rails you simply choose a database, and it already has all the Command Line Interface (CLI) tools needed to operate it: it creates and drops the database and handles its migrations. None of that functionality exists in Go by default, but that doesn't mean we can't build it ourselves.

After some more experience and patience, learning Go helped me realize that I don't need Rails for every single web development app. It also helped me understand how developers might end up misusing the framework if they don't have enough insight into its inner workings.

Coupling your app with a database is a huge topic in itself. The most common paradigm is the relational database, like MySQL, SQLite, and Postgres, but many use cases are better served by other paradigms, like NoSQL (MongoDB) or graph databases (Neo4j).

This blog post will walk through the steps needed to create a robust database management environment in a Golang app using a Makefile and the relational database Postgres. It builds on a previous post explaining how to bootstrap a quality Golang backend, but it stands on its own and can be approached without that prior knowledge.

  1. Postgres
  2. Seeding Data
  3. Database Driver
  4. Wrapping Up

Postgres

You are going to need at least two databases on your machine: one for automated tests and one for running the server locally. Let's call them test and dev, respectively. There are many migration tools out there, and you don't need one written in Go. We are looking for a tool that creates migration files in a scalable format, uses plain SQL, supports PostgreSQL, and maintains a schema.sql file to keep track of changes.

A great tool for all of this is dbmate, which describes itself as "a lightweight, framework-agnostic database migration tool."
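dbmate is distributed as a standalone binary; on macOS, for example, it can be installed with Homebrew:

> brew install dbmate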

We are going to create two different ENV variables, one for each database. Typing environment variables every time you open a new shell session (terminal) is cumbersome, but permanently defining them in your ~/.bashrc file will pollute your workspace, and you may forget to remove them when you are done with this project. To solve both problems, you can store them in a .env file and load them by running source .env from your project's root folder:

# .env
export DATABASE_URL=postgres://user:password@localhost:5432/my_db?sslmode=disable
export DATABASE_URL_TEST=postgres://user:password@localhost:5432/my_test_db?sslmode=disable

Now let's start designing our Make targets. First, we want a tool to create and drop the databases. Handling the creation and deletion of both databases in the same Make target makes this easier for other developers:

db-create: ## Creates the databases
				@dbmate -u $(DATABASE_URL_TEST) create
				@dbmate -u $(DATABASE_URL) create

db-drop: ## Drops the databases
				@dbmate -u $(DATABASE_URL_TEST) drop
				@dbmate -u $(DATABASE_URL) drop
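With the .env file sourced, both databases can then be created with a single command (and removed again with make db-drop):

> make db-create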

Now we want to be able to create migration files. There are some good practices you might want to follow: migration files are going to be executed in production, so they shouldn't include data, only schema changes such as adding tables and columns. We can use a Makefile argument that we pass to dbmate to create our migration:

# This target can be called e.g. with `make db-migration name=my_migration`
db-migration: ## Creates a new migration
				@dbmate new $(name) # Will save the migration under ./db/migrations

Applying the migrations will update the ./db/schema.sql file, which contains a consolidated set of statements to recreate the same database from scratch. If needed, this file allows you to rebuild your database without unnecessary steps, such as creating a table that a later migration drops. We need to apply this process to both databases, but we only need to update the schema file once:

migrate: db-migrate ## Alias for db-migrate
db-migrate: ## Applies pending migrations to the dev and test databases
				@dbmate --wait --no-dump-schema -u $(DATABASE_URL_TEST) up
				@dbmate --wait -u $(DATABASE_URL) up # Updates the ./db/schema.sql file
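If you ever need to rebuild a database straight from ./db/schema.sql, dbmate's load command applies that file directly. A sketch of such a target might look like this (the db-reset name is just a suggestion):

db-reset: ## Recreates the dev database from ./db/schema.sql
	@dbmate -u $(DATABASE_URL) drop
	@dbmate -u $(DATABASE_URL) create
	@dbmate -u $(DATABASE_URL) load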

I prefer to apply all changes to the test database first. That way, if there is an error in a migration file, it only affects the test database and we have a better chance of catching it in time. We want to avoid the burden of debugging the dev database, which may already contain data that you don't want to lose.

Seeding Data

I previously mentioned that we don't want to add data through our migration files, but you will probably benefit from some predefined data while developing. You can keep this seed data under a db/seeds/ folder, using a format similar to that of the migrations: a timestamp followed by the name of the seed. These seed files are fairly simple, and they should always be applied to an up-to-date database. We can create them using shell commands and a template:

-- ./db/seed_template.sql

-- Add a description here

-- Note: 
-- Seed files MUST be idempotent by design
-- Seed files are not meant to be executed in production.

You can create a new seed file using a Make target and the cp tool:

db-seed-file: ## Creates a new seed file using the provided name (name=)
	@cp ./db/seed_template.sql ./db/seeds/$$(date +%Y%m%d%H%M%S)_$(name).sql

Apply the seeds in order using another Make target with psql:

db-seed: db-migrate ## Seeds the dev database
				@for file in ./db/seeds/*.sql; do psql -f $$file $(DATABASE_URL); done

Note that the seed is only applied to the DATABASE_URL variable: the test database remains untouched. If you seed your test database, you will create tests that rely on data assumptions (e.g. relying on an already created user), and maintaining that seeded database becomes an extra hassle. Imagine you have a seeded user and then add a migration that changes its table; now you have to change the seed file, but you also need to drop the previously existing user. It is better to create isolated tests that don't leak beyond their runtime.

Let's put the migrations and seeds to the test. Let's create a users table with a created_at and an updated_at column. The updated_at column needs a trigger, so let's create that one first:

> make db-migration name=last_update_trigger

Update the newly created migration file ./db/migrations/<timestamp>_last_update_trigger.sql:

-- migrate:up
CREATE FUNCTION last_update_trigger() RETURNS trigger
		LANGUAGE plpgsql AS
$$BEGIN
		NEW.updated_at := CURRENT_TIMESTAMP;
		RETURN NEW;
END;
$$;

-- migrate:down
DROP FUNCTION last_update_trigger();

And use another migration to create a simple users table to see the trigger in action.
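The migration file is created the same way as before; the create_users name here is just an example:

> make db-migration name=create_users

Then fill in the new file: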


-- migrate:up
CREATE TABLE users (
				id INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
				uuid UUID DEFAULT gen_random_uuid() NOT NULL, -- do not show id to the public
				name VARCHAR(255),
				email VARCHAR(255) NOT NULL,
				created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
				updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER update_users_updated_at
				BEFORE INSERT OR UPDATE ON users
				FOR EACH ROW
				EXECUTE PROCEDURE last_update_trigger();

-- migrate:down
DROP TRIGGER update_users_updated_at ON users; -- drop the trigger FIRST
DROP TABLE users;

You can now create a new seed file with data to be injected in the users table:

> make db-seed-file name=insert_initial_users

And update it with:

-- Insert users to be used in the dev database
INSERT INTO users (id, name, email)
VALUES
		(1, 'John Doe', 'john@example.com'),
		(2, 'Jannet Doe', 'jannet@example.com');

Run the migrations and the seed, and you should see the correct values in both the created_at and updated_at columns of the users table.

> make db-migrate
> make db-seed
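One way to verify (assuming the .env file is sourced so $DATABASE_URL is available in your shell) is to query the table directly with psql:

> psql $DATABASE_URL -c "SELECT id, name, email, created_at, updated_at FROM users;"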

Try running make db-seed a second time and it will fail because the ids already exist, violating the PRIMARY KEY constraint. Good job!
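If you would rather have the seed honor the template's idempotency note, one option (a sketch using Postgres's ON CONFLICT clause) is to skip rows whose id already exists:

-- Insert users to be used in the dev database
INSERT INTO users (id, name, email)
VALUES
		(1, 'John Doe', 'john@example.com'),
		(2, 'Jannet Doe', 'jannet@example.com')
ON CONFLICT (id) DO NOTHING;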

Database Driver

Now we face an important decision: to ORM or not to ORM, that is the question. It is a controversial one. All over the internet you will find arguments for and against using ORMs, and about how long you can defer the decision. One thing is certain, however: you do not want to end up creating your own ORM. Object-Relational Mappers try to solve a difficult problem, and taking your own shot at it might do more harm than good.

If you choose to use one, it is better to know when not to use it, as Ted Neward points out: an ORM might solve 80 percent of the object-relational problem, and the rest is better written directly in SQL.

I collected some advice from my colleagues at 8th Light about how to approach this decision:

Christoph

[...] The main principle I design any application is to make sure that any types of these ORMs do not leak outside of where they're used. Meaning just because an ORM might give me a `User` type that can give me rows from a `users` table easily does not mean that I will pass this very object anywhere outside of the function where the query itself happened. [...] Then, when all of the usage is centralized somewhere it doesn't matter *that* much anymore whether or not an ORM is used or raw SQL queries.

Christoph Gockel, Principal Software Crafter

Brad

I'd highly recommend making this a decision that you can revisit later. Especially since you don't know if it's a good idea. [...] Philosophically, to me this means thinking of the business logic first, not the data. First you design (and test) how the thing is supposed to work, *then* you write a web server as a "client" of your business logic.

Brad Ediger, Head of Technology

Eric

I tend to reach for an ORM not so much for all the classes you create, but because it is very likely to be better at SQL than I am, and have mechanisms for things like proper query sanitization already in place. It's essentially a buffer against [my mistakes].

Eric Smith, Principal Software Crafter

In general, you want to avoid leaking database interfaces all over your application, and you want to make decisions that you can revisit later. Because you will be writing some SQL even when using an ORM, it is often better to start with a plain SQL driver and integrate an ORM later if needed.
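One sketch of Christoph's advice about not letting the driver (or ORM) leak: define a small interface that the rest of the application depends on, and keep every query behind its implementation. The UserStore name and its methods are purely illustrative, and a fuller User model appears later in this post:

import "context"

// User is the shape the rest of the application works with (illustrative).
type User struct {
	ID    int
	Name  string
	Email string
}

// UserStore is the boundary the application depends on. Whether its
// implementation uses raw SQL queries or an ORM stays hidden behind it,
// so the decision can be revisited later.
type UserStore interface {
	GetUser(ctx context.Context, id int) (User, error)
	CreateUser(ctx context.Context, name, email string) (User, error)
}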

As of May 2022, there are several popular Postgres drivers for Go:

Driver            Stars   Forks   First release   Latest release
jackc/pgx         5.4k    526     14-Jul-2014     22-Apr-2022
lib/pq            7.3k    852     23-Aug-2018     08-Nov-2021
go-pg/pg (ORM)    5.1k    376     30-Sep-2020     23-Sep-2021


Quoting pgx's README: it aims to be low-level, fast, and performant, while also enabling PostgreSQL-specific features that the standard database/sql package does not allow for. There are benchmarks comparing pgx and lib/pq, and pgx consistently comes out faster.

We are choosing pgx this time. At the time of writing, its version 5 is not ready yet, so we install version 4:

> go get github.com/jackc/pgx/v4

A simple database connection looks like this:


import (
				"context"
				"fmt"
				"os"
				"github.com/jackc/pgx/v4"
)

// Opens and returns a persistent connection to the given database URL
func NewConnection(ctx context.Context) (*pgx.Conn, error) {
	// Make sure that when this code runs in automated tests,
	// it connects to DATABASE_URL_TEST!
				localDB, exists := os.LookupEnv("DATABASE_URL")
				if !exists {
								panic("No ENV variable DATABASE_URL")
				}

				pgConnection, err := pgx.Connect(ctx, localDB)
				if err != nil {
								return nil, fmt.Errorf("database connection failed: %w", err)
				}

				return pgConnection, nil
}
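The comment about automated tests deserves a sketch. One option (an assumption, not the only approach) is a TestMain that points DATABASE_URL at the test database for the whole test binary, combined with tests that wrap their work in a transaction and roll it back, in line with the isolation advice from the seeding section:

import (
	"context"
	"os"
	"testing"
)

// Redirect DATABASE_URL to the test database before any test in the package
// runs. You may prefer to fail hard if DATABASE_URL_TEST is missing.
func TestMain(m *testing.M) {
	if testURL, ok := os.LookupEnv("DATABASE_URL_TEST"); ok {
		os.Setenv("DATABASE_URL", testURL)
	}
	os.Exit(m.Run())
}

// An isolated test: all work happens inside a transaction that is rolled
// back, so the test database stays empty afterwards.
func TestInsertUser(t *testing.T) {
	ctx := context.Background()

	conn, err := NewConnection(ctx)
	if err != nil {
		t.Fatalf("could not connect: %v", err)
	}
	defer conn.Close(ctx)

	tx, err := conn.Begin(ctx)
	if err != nil {
		t.Fatalf("could not begin transaction: %v", err)
	}
	defer tx.Rollback(ctx) // undoes the insert below

	if _, err := tx.Exec(ctx, `INSERT INTO users (name, email) VALUES ($1, $2)`,
		"Test User", "test@example.com"); err != nil {
		t.Fatalf("could not insert user: %v", err)
	}
}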

And reading from our recently created users table looks like this:

import (
				"context"
				"time"
				"github.com/jackc/pgx/v4"
)

// User model
type User struct {
				Id int
				UUID string
				Name string
				Email string
				CreatedAt time.Time
				UpdatedAt time.Time
}

// Uses an int id to get a user from the database
func GetUser(ctx context.Context, conn *pgx.Conn, id int) (User, error) {
				var user User

				row := conn.QueryRow(ctx, `SELECT id, uuid, name, email, created_at, updated_at
				FROM users WHERE id=$1;`, id)

				err := row.Scan(
								&user.Id, &user.UUID, &user.Name, &user.Email,
								&user.CreatedAt, &user.UpdatedAt)
				if err != nil {
								return User{}, err
				}

				return user, nil
}

Note that we are not formatting the given id directly into the SQL query with fmt.Sprintf, as that would open the door to SQL injection, a dangerous security risk that lets attackers execute arbitrary SQL in your database. Instead, the placeholder in the query (id=$1) passes the given id as a bound parameter.
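To tie the pieces together, here is a minimal usage sketch, assuming both helpers above live in the same package and keeping error handling short for brevity:

import (
	"context"
	"fmt"
	"log"
)

func main() {
	ctx := context.Background()

	conn, err := NewConnection(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	// id 1 exists in the dev database thanks to the seed file
	user, err := GetUser(ctx, conn, 1)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%s <%s> created at %v\n", user.Name, user.Email, user.CreatedAt)
}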

Wrapping Up

Database management is a deep topic with plenty of controversial and opinionated solutions. This blog post walked through some of the most important aspects to take into account when creating a new Golang application from scratch. Use this knowledge wisely, and happy coding!