Will King
Database

Don't design your database like a frontend developer

As a frontend developer who has slowly meandered his way to the backend of the stack I had a lot of blind spots when it came to designing my data.

I saw the backend as a just a tool to give me the exact data I needed to render the apps I was building.

(Starting with MongoDB definitely didn't help with this approach)

You'll never guess what issue happens when you approach your data thinking primarily about what you're trying to render.

How many times has the first design you built for an app been the last time you changed it?

ZERO…unless your app went nowhere in which case you probably have bigger problems than database schema. 😅

As you grow a product to adapt to new use cases and customer growth a data layer built like a frontend developer will constantly run into fragility issues.

More fragility in your database means more time spent on both ends to change a product. Your shipping speed slows down to a crawl, and your willingness (and ability) to grow with your customers disappears.

Okay, so how should we design our database oh great designer of database schema?

Great question! I am still getting better at this myself, but let's look at a an example and I'll share what I have learned so far.

A Meal App

In this app we are building an interface that allow ls people to create a Meal and save it so that they can put together a weekly meal plan and get a list of ingredients for shopping.

What data do we need to render this?

model Meal { 
  id String @id @default(cuid())
  name String
  description String?
  ingredients String[]

  creator User @relation(fields: [creatorId], references: [id])
  creatorId String
}

Okay, let's say we take that data model and run with it. A couple weeks later we hear from our users that they are tired of having to copy the same ingredients across meals.

The first question we should ask

The first question we should be asking is, can we do this with the data we already have?

Could we technically fetch all of the meals and pull the list of ingredients to display to our users? Yes. Would it be really slow? Also, yes.

  • Run database query to fetch all ingredients arrays off of the meal.

  • Looping / reducing returned data filtering out duplicates.

    • The catch? You also need to filter out the amounts included with the ingredient strings.

Okay, that doesn't sound THAT bad right?

The second question we should ask

How does this look as your data grows? Let's say we currently only have 10 users with 10 meals.

Let's say that out of the 100 meals created on our platform 20 of them are chicken dishes, and I think all of them (hopefully) use salt.

Okay let's say in one year that meal count jumps to 5000 as your customer base grows. Now we run into some more serious concerns in two areas.

Rendering performance

We are now pulling 5000 arrays of unknown length, looping through these arrays and removing duplicates and measurements.

You are going to be experiencing a lot of latency now matter how you run this workflow, but that isn't our only concern.

Database Size and Cost

In the frontend we call pulling out repeated code DRY-ing up the code base. Database architecture has the same concern but it is called normalization.

Can you spot where the issue is with our data model?

There is a huge amount of data overlap and repetition in the ingredients field. Since it is stored as an array, every new meal stores their version of chicken, salt, you name it.

For 5000 meals you are looking at most likely 4000 instances of salt, not to mention the data duplication for other common ingredients like flour, chicken, butter, etc.

Without normalization for the ingredients field you are looking at a fixed growth in both size and cost for every meal you're customers are creating.

So, what do we do? Thankfully, databases are really great at solving both of these concerns.

Step One: Make ingredients independent

What makes up an ingredient?

  • The thing

  • How much of it

First pass at pulling ingredients out of the meal schema may look like this:

model Meal { 
  id String @id @default(cuid())
  name String
  description String?

  ingedients Ingredient[]
  creator User @relation(fields: [creatorId], references: [id])
  creatorId String
} 

model Ingredient { 
  id String @id @default(cuid())
  name String
  amount String
}

Now you have an Ingredients table that you can easily get a list from. However, we are still breaking rule one.

What does this table look like when you have one recipe that has 2 chicken breasts and another that has 4?

Step Two: Extract amount from Ingredient model

Here is the mental model for being able to spot this type of issue.

Does this table have data that only matters in the context of another one?

The amount of an ingredient only matters in the context of a specific meal. So, let's make one more change.

The pattern we are using in Prisma is referred to as an explicit many-to-many relationship.

model Meal { 
  id String @id @default(cuid())
  name String
  description String?

  ingredients IngredientForMeal[]
  creator User @relation(fields: [creatorId], references: [id])
  creatorId String
} 

model Ingredient { 
  id String @id @default(cuid())
  name String
  
  meals IngredientForMeal[]
} 

model IngredientForMeal { 
  amount String

  ingredient Ingredient @relation(fields: [ingredientId], references: [id])
  ingredientId String

  meal Meal @relation(fields: [mealId], references: [id])
  mealId String
}

I still need to render my app…

Okay, now that we have covered a better way to store our Recipe data, does that mean we have to use the optimized schemas directly to render our app pages?

No way! In this article we covered the rendering layer and the database layer, but there is a layer in between that is built for transforming our data layer into what our rendering layer can work with best. You can dig into that here:

Updates and More

Get updated when new articles, products, or components are released. Also, whatever else I feel like would be fun to send out.