Not Content With Contentful

Brian Winkers · Published in The Startup · Nov 18, 2020 · 4 min read


This is the fourth in a short series on my attempts to monetize my parked domains. It covers my goals for Git-based workflows in more detail and finally starts to provide code.

I’ve been looking for a centralized data store that 100+ domains can pull content from. A headless CMS is a must, since the same content may be served with vastly different formatting and styling between sites. As a developer, a Git-based solution would be ideal. I couldn’t find one I liked, so I started building one.

[Figure: Contentful Reader data flow]

Contentful — Headless CMS

Contentful is the leading headless CMS provider. I’ve used Contentful for some of my own blog content as well as at places I’ve worked. It makes it easy to create complex, multi-dimensional data that is fully localized. Contentful even has a free Community Space that supports a pretty complete site.

The Cons:

  • Only one free space is allowed, and additional spaces start at $489.
  • The number of object types, or schemas, is limited.
  • Response times are not amazing, especially for a “CDN”.

Netlify CMS — Git-based CMS

I’m intrigued by the promise of Netlify CMS. It is Git-based to fit in with the rest of the workflow, and you can even store the editor alongside the code.

The Cons:

  • Tightly coupled with the Netlify website.
  • No easy way to mix and match content on sites.

Ghost — Headless CMS

Ghost seemed like it would be a good fit for a blog. But making a single blog stream into hundreds of blogs is more of a job for an application.

The Cons:

  • A limited object model paradigm.
  • No easy way to mix and match content on sites.

The Solution — Create a CMS on AWS

The intent was never to write any code; I thought maybe I’d end up with a couple of CDK or CloudFormation resources. As a cloud engineer, I always find building a solution on AWS appealing. Done correctly it can be cost-effective and can scale forever with zero changes or human attention. Using AWS it can be fast and reliable with very little effort.

Start with Contentful Export Files

As I said, I have some blog content in a free Contentful account and would like to use it. If I can export that content, I can keep my free Community Space for playing with and testing Contentful using other content. Contentful also has a strong object paradigm that will be easy to build upon.
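If you want to pull the same kind of export, the official Contentful CLI can produce one; the space ID and token below are placeholders:

```bash
# Dumps content types, entries and assets for a space into a single JSON file.
contentful space export --space-id <SPACE_ID> --management-token <CMA_TOKEN>
```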

Break it up!

To make use of this large export file I created a Contentful Reader. Given a single large Contentful export, it breaks the JSON up into many smaller files, one per content type and entry. The reader can handle files that are gigabytes in size and can process 60+ entries per second, so a 1 GB file with 50,000 entries should take less than 15 minutes to process.
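The reader itself is a larger tool, but a minimal sketch of the splitting step could look like the following. It assumes the standard Contentful export layout (a top-level entries array) and the stream-json package; the entries/<type>/<id>.json output layout is my own illustration:

```javascript
// split-export.js: stream a large Contentful export and write one file per entry.
const fs = require('fs');
const path = require('path');
const { chain } = require('stream-chain');
const { parser } = require('stream-json');
const { pick } = require('stream-json/filters/Pick');
const { streamArray } = require('stream-json/streamers/StreamArray');

const pipeline = chain([
  fs.createReadStream('export.json'),
  parser(),
  pick({ filter: 'entries' }), // stream only the top-level "entries" array
  streamArray(),               // emits { key, value } per array element
]);

pipeline.on('data', ({ value: entry }) => {
  const type = entry.sys.contentType.sys.id; // e.g. "blogPost"
  const dir = path.join('entries', type);
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(
    path.join(dir, `${entry.sys.id}.json`),
    JSON.stringify(entry, null, 2)
  );
});

pipeline.on('end', () => console.log('done splitting entries'));
```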

Git it together

These files are ideal for committing to a Git repository. That lets you apply all the Git workflow tools on the market to your Contentful data. Git-based workflows and CMS data are like peanut butter and chocolate: they go great together.

Benefits of Git

  • Easily support multiple concurrent branches or versions.
  • Increased auditability.
  • Easy to revert mistakes.
  • Integrates with other development processes.

Dynamo is Dynamite

The last step between the CMS and the web page still benefits from the flexibility of content stored in a database. Flat Git files alone can’t support a performant GraphQL API.

For that we turn to DynamoDB and single table design. I use denormalization and hydration to create a Dynamo structure that can easily match my expected query patterns.

The non-key field data in the table looks much like the data in the JSON files. The magic is in the keys created for the data. The JSON files from the repo are used to create complex partition key, sort key and GSI key values. These keys make it easy to fetch the data already formatted in the way we need it.
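As an illustration only (the SITE#/POST# key shapes and the GSI below are hypothetical examples of the pattern, not my production schema), hydrating one exported entry into a single-table item might look like this:

```javascript
// build-item.js: turn one exported Contentful entry into a single-table item.
// The key shapes here are hypothetical illustrations, not a published schema.
function toDynamoItem(entry, site) {
  const { id, createdAt } = entry.sys;
  const slug = entry.fields.slug['en-US'];
  return {
    pk: `SITE#${site}`,              // partition: everything for one site
    sk: `POST#${createdAt}#${slug}`, // sort: posts fall into date order
    gsi1pk: 'POST',                  // GSI1: all posts across every site
    gsi1sk: `${createdAt}#${id}`,
    ...flattenFields(entry.fields),  // denormalized, non-key content
  };
}

// Collapse Contentful's { field: { locale: value } } shape into plain fields.
function flattenFields(fields, locale = 'en-US') {
  return Object.fromEntries(
    Object.entries(fields).map(([name, byLocale]) => [name, byLocale[locale]])
  );
}
```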

Serve it up

The Dynamo data can easily be served through both a REST API and an AppSync GraphQL API. The data is stored in an optimized format so that little if any processing needs to take place when serving the data.
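For example, with the hypothetical keys sketched above, a REST Lambda that fetches one site’s posts becomes a single Query (a sketch using AWS SDK v3; the table and path names are assumptions):

```javascript
// get-posts.js: Lambda handler sketch that reads one site's posts.
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, QueryCommand } = require('@aws-sdk/lib-dynamodb');

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

exports.handler = async (event) => {
  const site = event.pathParameters.site; // e.g. "example.com"
  const { Items } = await ddb.send(new QueryCommand({
    TableName: process.env.TABLE_NAME,
    KeyConditionExpression: 'pk = :pk AND begins_with(sk, :post)',
    ExpressionAttributeValues: { ':pk': `SITE#${site}`, ':post': 'POST#' },
    ScanIndexForward: false, // sk sorts by date, so this returns newest first
  }));
  return { statusCode: 200, body: JSON.stringify(Items) };
};
```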

Due to the nature of AppSync and its resolvers, it’s unlikely the same Lambda would be able to service both REST and GraphQL requests. Any logic that is shared between the two would be moved into custom JavaScript modules. I have a simple CloudFormation file for creating your own npm and pip repository using AWS CodeArtifact.

Save cash with cache

From a consumer perspective the Dynamo data is read-only. This means we can easily put a CloudFront distribution in front of it with caching enabled. That can limit calls to the backend to a handful per month, no matter how much traffic a site generates.
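Concretely, the only change needed in the handler sketch above is a Cache-Control header telling CloudFront how long it may serve a response without calling the origin; the one-day TTL is an arbitrary example:

```javascript
// In the handler above, return a Cache-Control header so the CloudFront
// distribution answers repeat requests without touching Lambda or Dynamo.
return {
  statusCode: 200,
  headers: { 'Cache-Control': 'public, max-age=86400' }, // one day; tune to taste
  body: JSON.stringify(Items),
};
```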

Optimizing the Dynamo Schema

First let’s address a common misconception: DynamoDB shouldn’t be thought of as “schemaless”. It should be thought of as multi-schema, or flexible schema. A single table can hold items that adhere to different schemas. The schema for those items may not be explicitly defined, but it is there, or the data would be useless in an API. I highly suggest defining your schemas as explicitly as possible.
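One lightweight way to do that (a sketch; the field lists are hypothetical) is to declare each item type’s shape in code and validate every item before it is written:

```javascript
// schemas.js: declare each item type's required shape and check writes.
const schemas = {
  post:   { required: ['pk', 'sk', 'title', 'slug', 'body'] },
  author: { required: ['pk', 'sk', 'name'] },
};

function assertValid(type, item) {
  const schema = schemas[type];
  if (!schema) throw new Error(`Unknown item type: ${type}`);
  for (const field of schema.required) {
    if (!(field in item)) throw new Error(`${type} item is missing "${field}"`);
  }
  return item;
}

module.exports = { assertValid };
```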

Optimizing keys for parts and aggregations

For the use case of breaking one blog stream into many, there are two important considerations.

  1. How are sub-parts of the data used?
  2. How are parts of the data aggregated?

I’m still trying a few different approaches and modeling how importing, updating, and serving will work most efficiently.

Part 1 — Parking for Pennies
Part 2 — AWS SSL Certificates
Part 3 — Mass Hosting Paradigm
