Protobuf management at Carousell

2020-04-09

Foreword

At Carousell we use a microservice architecture for our backend services. Communication between services is done via protobuf over gRPC. We started moving towards this architecture back in 2017, and along the way we’ve encountered some problems with how we manage protobufs across different services. In this post I’ll go over the problems we faced and how we overcame them.

How we manage .proto files

Proto files (.proto) for all services live in a single repository. This repository, named shared-proto, acts as the source of truth for our services. If service A wishes to communicate with service B, service A would use the shared-proto repository to generate language-specific protobuf code.

Take this sample shared-proto directory structure, in which each directory corresponds to a single service.

▾ foo/
    foo.proto
▾ bar/
    bar.proto
▾ baz/
    baz.proto

Initial approach

When we first introduced protobuf over gRPC, this was how we initially set up our approach to protobuf management.

Each service would contain a bash script to generate protobufs for downstream services. At its most basic level, the bash script does the following:

Clone the shared-proto repository to a temporary directory
Execute protoc commands to generate protobuf files

Problem: consistency

There are a couple of problems with this approach, both caused by inconsistencies across different developer environments.

First of all, the generated protobuf files were not consistent between builds by different developers. For consistent builds, both the protoc binary and the installed protoc-gen-go package must have their versions pinned.

Secondly, compatibility between the generated protobuf files and the service code that reads them is not guaranteed. To guarantee that, the protoc-gen-go package (used for generation) and the protobuf package (used to read) must have the same version.

Thus the bash script was revised to do the following:

Check that the protoc version is X.X.X, and short-circuit on a mismatch
Read the service’s expected github.com/golang/protobuf package version from go.mod or vendor.json.
Read the system installed github.com/golang/protobuf package version
If the system installed version is different, temporarily install the expected version
Clone the shared-proto repository to a temporary directory
Execute protoc commands to generate protobuf files
Reset to the original system installed github.com/golang/protobuf package version
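
The version check in the first step could look roughly like the sketch below. This is an illustration rather than our actual script: the function names and the pinned version are made up for this post, and it assumes `protoc --version` prints a line such as `libprotoc 3.7.1`.

```shell
#!/bin/bash
# Sketch only: function names and EXPECTED_PROTOC_VERSION are illustrative.

EXPECTED_PROTOC_VERSION="3.7.1"

# Extract the bare version from `protoc --version` output, e.g. "libprotoc 3.7.1".
parse_protoc_version() {
    local version_output="$1"
    # Strip everything up to and including the last space.
    printf '%s\n' "${version_output##* }"
}

# Succeed only when the installed version equals the pinned one.
versions_match() {
    local expected="$1" actual="$2"
    [ "$expected" = "$actual" ]
}

# Short-circuit on mismatch; takes the raw `protoc --version` output.
check_protoc_version() {
    local actual
    actual="$(parse_protoc_version "$1")"
    if ! versions_match "$EXPECTED_PROTOC_VERSION" "$actual"; then
        echo "protoc ${actual} found, ${EXPECTED_PROTOC_VERSION} required" >&2
        return 1
    fi
}
```

A generation script would then start with something like `check_protoc_version "$(protoc --version)" || exit 1` before doing any other work.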

At this point the script has gotten pretty messy, but we’re not done yet; there are still more problems with this workflow.

Problem: cascading build errors

While there is a source of truth for protobuf definitions, there isn’t one for how to generate the protobuf files. Each service is responsible for declaring the protoc directives to generate for another service. If the downstream service makes a change that requires a corresponding change to the generation directives, then all upstream services will have their generation script broken.

As a visual example, let’s say service foo has the following protobuf definitions:

syntax = "proto3";
package foo;
message Foo {
    string name = 1;
}

Service X generates protobufs for service foo using the following generation directives:

protoc -I shared-proto/foo --go_out=plugins=grpc:. shared-proto/foo/foo.proto

Service foo introduces a new import, bar.proto, which is intended to come from the shared-proto/bar/ directory, and as such requires an additional -I shared-proto/bar protoc flag.

syntax = "proto3";
package foo;
import "bar.proto";
// <omitted>

Now service X will have its generation script broken for service foo until it adds the -I shared-proto/bar flag.

Problem: conventions (or the lack thereof)

As we began building microservices communicating using protobufs, we did not spend time establishing a set of conventions for services to adhere to. Fast forward a few months, and we found ourselves in a situation where different teams started to establish their own conventions, with these conventions not always being compatible with another team’s. I’ve had my fair share of headaches trying to integrate with another service.

Here’s a quick summary of some differences between services:

Protobuf files with different packages in the same shared-proto directory.
Fully-qualified vs non-fully-qualified imports. For instance, bar/bar.proto vs bar.proto, with the latter being undescriptive regarding which package bar.proto is supposed to come from.
Inconsistent use of the go_package directive.
Inconsistent output directory for generated protobuf code.

Revised approach

Armed with the benefit of hindsight, I set out to solve those problems we were facing. I took this on as a side project, motivated both by the fact that I would be solving a long-standing problem within the company, and that I would be making my own life easier.

Establishing conventions

The first part of the solution is to introduce conventions for how we define and store our protobuf definitions. Carousell adopts the RFC process quite heavily, so this came in the form of an RFC document.

Import declarations

We enforce that imports must start from the root of our shared-proto repository. This makes it immediately clear to the reader which file is being imported. The additional benefit of this is that the root of shared-proto is the only include path that needs to be provided to the protoc command.

In our previous example, it would have been better written as:

syntax = "proto3";
package foo;
- import "bar.proto";
+ import "bar/bar.proto";
// <omitted>
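
A convention like this is easier to keep if it is checked automatically. The sketch below shows one way such a check might look. It is not how we actually enforce the rule: `find_unqualified_imports` is a hypothetical helper, and it assumes that every legitimate import path contains at least one `/` because each .proto file lives in a service directory.

```shell
#!/bin/bash
# Sketch only: find_unqualified_imports is a hypothetical helper, not part of
# protogen or shared-proto.

# Print (with line numbers) every import whose path contains no "/",
# i.e. imports that are not written relative to the shared-proto root.
# Exits non-zero when no such import is found, like grep itself.
find_unqualified_imports() {
    local proto_file="$1"
    grep -nE '^[[:space:]]*import[[:space:]]+((public|weak)[[:space:]]+)?"[^"/]+"' "$proto_file"
}
```

A lint step could then fail the build with `if find_unqualified_imports foo/foo.proto; then exit 1; fi`.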

Single package per directory

Each directory in shared-proto must contain only a single package. This provides clear demarcation between services, visible from the directory structure without having to delve into individual .proto files.

Additionally, later on we introduce tooling that generates protobufs into the same directory structure as shared-proto, and this convention will help to facilitate that.
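
The single-package rule can be checked in a similar spirit. The following is a minimal sketch, assuming package declarations sit on their own line and end with a semicolon; `list_packages` and `has_single_package` are hypothetical names, not part of our tooling.

```shell
#!/bin/bash
# Sketch only: both helpers are hypothetical and not part of protogen.

# Print the distinct package names declared by the .proto files in a directory.
list_packages() {
    local dir="$1"
    sed -nE 's/^[[:space:]]*package[[:space:]]+([A-Za-z0-9_.]+)[[:space:]]*;.*$/\1/p' \
        "$dir"/*.proto | sort -u
}

# Succeed only if the directory declares exactly one package.
has_single_package() {
    local count
    count="$(list_packages "$1" | wc -l | tr -d '[:space:]')"
    [ "$count" -eq 1 ]
}
```

Running `has_single_package foo` from the root of shared-proto would then succeed only while every file under foo/ declares the same package.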

No long-form `go_package` directives

This is a little contentious, but I believe it’s the right call. The go_package directive tends to create subdirectories that you may not want. In our case, each service generates protobufs for the services it wishes to interact with, and having go_package declarations makes this messy.

Long-form go_package directives make sense if you have each service generate its own protobufs. In this approach, upstream services will pull in downstream services as a dependency in order to use the downstream service’s protobufs. We opted not to go down this road due to possible backward incompatibilities with the github.com/golang/protobuf package. For instance, protobufs generated using v1.3.0 are unreadable by services using a lower package version. This scenario would come into play if an upstream service is on an older package version than the downstream service.

Consistent build environment

We use a docker image to provide a consistent environment to generate protobufs from. This docker image pins the versions of the protoc binary as well as those of language-specific plugins. The docker image is uploaded to a private registry, so engineers don’t have to build it themselves.

Custom tooling for generating protobufs (protogen)

We use an inhouse-built tool called protogen. It’s inspired by Uber’s prototool, but slightly different in the way we approach generating protobufs. Both protogen and prototool provide a declarative way to describe protobuf generation directives. However, these two tools differ in that the protogen config lives in the service’s repository (away from the source-of-truth repository for .proto files), whereas prototool’s config lives alongside the .proto files.

Some benefits of protogen:

Protogen allows services to simply specify which downstream service they wish to generate protobufs for. No need to concern themselves with invoking protoc with the right flags.
Protogen insulates services from breaking changes; if a service uses protogen they get a guarantee that their protobuf generation script will always work.

We’ll go into more detail on protogen in the next section.

Protogen: a deep dive

Protogen is generally unopinionated. It’s essentially a thin wrapper that allows us to define protoc flags declaratively. However, the way we integrate protogen into our workflow is fairly opinionated.

In this section we’ll first dive into the features of protogen. The next section will describe how protogen is integrated with our workflows.

Configuration

Here’s a documented version of protogen’s config file (protogen.yaml):

# Protogen specific configuration
protogen:
  # The protoc version to use when generating protobufs. Versions are
  # automatically downloaded and cached for future use.
  protoc_version: 3.7.1
  # Another configuration to inherit from. Directives defined in this
  # file will inherit from those defined in the base file only if the values in
  # this file are left empty.
  base_config: "/path/to/another/protogen.yaml"


# "Environment" variables that can be used when defining protobuf generation
# directives. This is defined using a dictionary of key-value pairs. The keys
# can be accessed using `{{key}}`.
envs:
  # `{{proto_dir}}` will be replaced with `./proto`
  "proto_dir": "./proto"

# Configuration for the `generate` command
generate:
  # Python specific generation configuration
  python:
    # Use the grpcio-tools package to generate protobufs instead of the
    # grpc_python_plugin binary. This uses `python -m grpc_tools.protoc` in lieu
    # of `protoc`.
    use_grpcio_tools_package: false

  # Default values for elements under the `services` section. These values will
  # be used if left empty in a service definition.
  service_defaults:
    includes:
      - "{{shared_proto}}"
    plugins:
      go:
        output: "{{output_dir}}"
        flags:
          - "plugins=grpc"

  # A list of services to generate protobufs for. The protogen command supports
  # generating protobufs for a single service.
  services:
    # The name of the service.
    - name: "foo"
      # Include paths for the protoc command. These will be transformed into
      # `-I <path>` entries.
      includes:
        - "{{proto_dir}}/foo"
        - "{{proto_dir}}"
      # Input files for the protoc command. Glob syntax can be used.
      inputs:
        - "{{proto_dir}}/foo/foo.proto"
      # Plugins for the protoc command. Multiple plugins can be defined.
      plugins:
        # The plugin name. This will be transformed into the `--<name>_out=`
        # directive.
        go:
          # The output directory for the plugin.
          output: "./pb/foopb"
          # Any additional flags to pass to the plugin. For Go, `plugins=grpc`
          # should be all you need
          flags:
            - "plugins=grpc"
      # List of plugins to use. Omit this to generate with all defined plugins.
      selected_plugins:
        - go
      # Additional options specific to the `go` plugin.
      go_options:
        # A map of modifiers. This will be rendered as `Mkey=value` and is used
        # to resolve imported files to their go package.
        modifiers:
          foo/foo.proto: "{{base_import}}/foo"

`protoc` version management

The protogen config specifies the version of the protoc binary to use. This version is automatically downloaded and cached on the user’s system. Since we use docker, where we pin the version in the image itself, we typically don’t need this feature. However, it helps to enable deterministic builds outside of a docker environment.

Config inheritance

In the config there’s a field called protogen.base_config. This specifies the path to another config that the current config will inherit from. Config inheritance allows us to define the bulk of the configuration in shared-proto, our source-of-truth. Each service’s protogen.yaml will simply inherit from shared-proto, ensuring that all services using protogen will generate another service’s protobufs the same way.

Variables

Variables can be defined using the envs key, which is a string->string map. Subsequently in the config, values can contain a {{var}} string, and this will be replaced by the corresponding variable (var in this case).

Firstly, this reduces code duplication. Secondly and more importantly, this ties into how configs can inherit from another; a service inheriting from another config can define service-specific variables which will be used in the final output.
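
To make the substitution concrete, here is a minimal sketch of how `{{var}}` placeholders could be expanded. It only illustrates the behaviour described above; `render_template` is a hypothetical helper, and protogen’s real implementation may well differ.

```shell
#!/bin/bash
# Sketch only: render_template is a hypothetical helper; protogen's real
# implementation may differ.

# Replace every {{key}} in the template with its value.
# Usage: render_template <template> [key=value ...]
render_template() {
    local text="$1"
    shift
    local pair key value needle
    for pair in "$@"; do
        key="${pair%%=*}"      # text before the first "="
        value="${pair#*=}"     # text after the first "="
        needle="{{${key}}}"    # the literal placeholder, e.g. {{proto_dir}}
        # Quoting the pattern makes bash treat it as a literal string.
        text=${text//"$needle"/$value}
    done
    printf '%s\n' "$text"
}

render_template "{{proto_dir}}/foo/foo.proto" "proto_dir=./proto"
# prints "./proto/foo/foo.proto"
```

The same mechanism is what makes inheritance useful: a base config can say `{{output_dir}}`, and each inheriting service supplies its own value for `output_dir`.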

Declarative approach to `protoc` flags

In the generate section, the directives are largely the same as prototool. For the most part the documented config file above should be self-explanatory.

Of note is the selected_plugins field. Because of config inheritance, a service inheriting from another config might contain directives for plugins they don’t need, so the selected_plugins allows the service to specify which plugins they’re interested in.
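
As a rough illustration of what “declarative” means here, the sketch below turns the includes, plugin settings and inputs of the documented foo service into a single protoc command line. It assumes the `{{proto_dir}}` variable has already been expanded to `./proto`. `build_protoc_cmd` is a hypothetical helper and the exact command protogen emits is an assumption; only the `-I` and `--<name>_out=` mappings come from the config comments above.

```shell
#!/bin/bash
# Sketch only: build_protoc_cmd is hypothetical; the exact command protogen
# emits may differ.

# Usage: build_protoc_cmd <plugin> <output_dir> <plugin_flags> <includes...> -- <inputs...>
build_protoc_cmd() {
    local plugin="$1" output="$2" flags="$3"
    shift 3
    local cmd="protoc"
    # Every argument before "--" is an include path, rendered as -I <path>.
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        cmd+=" -I $1"
        shift
    done
    # Drop the "--" separator if present.
    if [ "$#" -gt 0 ]; then
        shift
    fi
    # Plugin flags are joined to the output directory as --<name>_out=<flags>:<dir>.
    if [ -n "$flags" ]; then
        cmd+=" --${plugin}_out=${flags}:${output}"
    else
        cmd+=" --${plugin}_out=${output}"
    fi
    # Remaining arguments are the input .proto files.
    local input
    for input in "$@"; do
        cmd+=" $input"
    done
    printf '%s\n' "$cmd"
}

build_protoc_cmd go ./pb/foopb plugins=grpc ./proto/foo ./proto -- ./proto/foo/foo.proto
# prints "protoc -I ./proto/foo -I ./proto --go_out=plugins=grpc:./pb/foopb ./proto/foo/foo.proto"
```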

Dry run

Protogen can be invoked with the --dry-run flag, which will output the protoc commands that will be run. The output is valid bash, and can be redirected into a bash script.

Protogen in Carousell

shared-proto

First, let’s revisit our sample shared-proto directory structure, which now includes a protogen.yaml at the root:

▾ foo/
    foo.proto
▾ bar/
    bar.proto
▾ baz/
    baz.proto
  protogen.yaml

File: shared-proto/protogen.yaml

In shared-proto, protogen.yaml contains the source of truth for how to generate protobufs for all services. Here’s what it might look like for our foo, bar, baz services:

protogen:
  protoc_version: "3.7.1"

envs:
  "shared_proto": "."
  "output_dir": "./pb/"
  "vendor_dir": "./vendor"
  "go_import_base": "github.com/carousell/shared-proto/pb"

generate:
  service_defaults:
    includes:
      - "{{shared_proto}}/"
    plugins:
      go:
        output: "{{output_dir}}"
        flags:
          - "plugins=grpc"
      orion:
        output: "{{output_dir}}"

  services:
    - name: "foo"
      inputs:
        - "{{shared_proto}}/foo/*.proto"
      go_options:
        modifiers:
          bar/bar.proto: "{{go_import_base}}/bar"

    - name: "bar"
      inputs:
        - "{{shared_proto}}/bar/bar.proto"

    - name: "baz"
      inputs:
        - "{{shared_proto}}/baz/baz.proto"

Some things to notice:

For the majority of services, only the input files need to be defined. The rest are specified in the generate.service_defaults section.
Inputs have a {{shared_proto}} variable, because the actual location on disk may vary between developers.
Outputs have a {{output_dir}} variable, because each service would output to their own repository.
There is a {{go_import_base}} variable, which is generally used to map imports to a package in the generated output.
This configuration does not inherit from anything, because shared-proto’s protogen.yaml is intended to be the base for all consuming services.

File: shared-proto/generate_protobufs.sh

The shared-proto repository also contains a script to generate protobufs. Services will invoke this script whenever they generate protobufs for another service. This allows us to make changes to this script and have services be able to pick up those changes immediately.

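#!/bin/bash
# Shell options are set here rather than in the shebang, because Linux passes
# everything after the interpreter path as a single argument.
set -euo pipefail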

PROTOGEN_VERSION="v0.2.3"
SERVICE=""
SHARED_PROTO_DIR=""
GO_PROTOBUF_VERSION=""
PYTHON_GRPCIO_TOOLS_VERSION=""
USE_LOCAL_IMAGE=false

# OMITTED FOR BREVITY: flags to populate above variables
# --shared-proto-dir, --protogen-version, --go-protobuf-version, --python-grpcio-tools-version, --service, --use-local-image

# Use this script's directory as SHARED_PROTO_DIR unless specified otherwise
if [ -z "${SHARED_PROTO_DIR}" ]; then
    SHARED_PROTO_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
fi

# Support builds from local docker images, in case a developer does not or
# should not have access to our online docker image registry.
if [ $USE_LOCAL_IMAGE = true ]; then
    dockerfile_lines=("FROM protogen:${PROTOGEN_VERSION}")
else
    dockerfile_lines=("FROM <private_registry_url>/protogen:${PROTOGEN_VERSION}")
fi

# Construct Dockerfile based on provided variables
[ ! -z "${GO_PROTOBUF_VERSION}" ] && dockerfile_lines+=("RUN go_plugin_install ${GO_PROTOBUF_VERSION}")
[ ! -z "${PYTHON_GRPCIO_TOOLS_VERSION}" ] && dockerfile_lines+=("RUN python_plugin_install ${PYTHON_GRPCIO_TOOLS_VERSION}")
dockerfile=$(IFS=$'\n'; echo "${dockerfile_lines[*]}")

# Build image
image_tag="protogen-custom"
docker build -t $image_tag -<<< "${dockerfile}"

# Generate protobufs
docker run --rm -v "$(pwd):/work" -v "$SHARED_PROTO_DIR:/tmp/shared-proto:ro" "$image_tag" protogen generate --service="$SERVICE"

This script exposes some flags for customizing plugin versions, and dynamically constructs a dockerfile to build a custom image from. Notice that the custom image inherits from the image uploaded to our private registry. The custom image simply overrides the versions of installed plugins. This provides a completely consistent environment for all engineers.
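
To see what the dynamically constructed Dockerfile looks like, the assembly step can be isolated into a small function. This is a sketch that mirrors the script above; `build_dockerfile` is a hypothetical name and the version in the example call is purely illustrative.

```shell
#!/bin/bash
# Sketch only: build_dockerfile mirrors the Dockerfile assembly in the script
# above; the version passed in the example call is illustrative.

# Usage: build_dockerfile <base_image> <go_protobuf_version> <grpcio_tools_version>
# Empty version arguments leave the base image's plugin untouched.
build_dockerfile() {
    local base_image="$1" go_protobuf_version="$2" grpcio_tools_version="$3"
    local lines=("FROM ${base_image}")
    if [ -n "$go_protobuf_version" ]; then
        lines+=("RUN go_plugin_install ${go_protobuf_version}")
    fi
    if [ -n "$grpcio_tools_version" ]; then
        lines+=("RUN python_plugin_install ${grpcio_tools_version}")
    fi
    printf '%s\n' "${lines[@]}"
}

build_dockerfile "protogen:v0.2.3" "v1.3.2" ""
# prints:
#   FROM protogen:v0.2.3
#   RUN go_plugin_install v1.3.2
```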

Build tests

We’ve also integrated build tests into shared-proto. This ensures that the directives defined in protogen.yaml are valid. As a consequence of this guarantee, we can provide a guarantee to services inheriting from shared-proto that they will not face errors during generation.
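
Conceptually, such a build test just tries to generate protobufs for every service and fails if any of them breaks. The sketch below shows that idea under stated assumptions: `run_build_tests` is a hypothetical helper, and passing the generator in as a command name is a choice made for this example, not a description of our CI setup.

```shell
#!/bin/bash
# Sketch only: run_build_tests and the idea of passing the generator as a
# command name are assumptions, not our actual CI setup.

# Run the given generator command once per service and report the outcome.
# Returns non-zero if any service fails to generate.
run_build_tests() {
    local generate_cmd="$1"
    shift
    local service failed=0
    for service in "$@"; do
        if "$generate_cmd" "$service" >/dev/null 2>&1; then
            echo "ok   ${service}"
        else
            echo "FAIL ${service}"
            failed=1
        fi
    done
    return "$failed"
}
```

In CI this could be wired up with a small wrapper such as `generate_service() { protogen generate --service="$1"; }` followed by `run_build_tests generate_service foo bar baz`.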

Services

Now that you’re familiar with how protogen integrates with shared-proto, this is what it looks like on a service level. Here’s protogen.yaml for our example foo service.

File: <service>/protogen.yaml

protogen:
  protoc_version: "3.7.1"
  base_config: "/tmp/shared-proto/protogen.yaml"

envs:
  "shared_proto": "/tmp/shared-proto"
  "output_dir": "./foo/pb"
  "go_import_base": "github.com/carousell/foo/pb"

generate:
  services:
    - name: "foo"
      selected_plugins: [go, orion]

    - name: "bar"
      selected_plugins: [go]

    - name: "baz"
      selected_plugins: [go]

At the service level, we inherit configuration from shared-proto while setting some service-specific variables.

Notice that from the service’s perspective, there is no need to be aware of protoc flags. The service is insulated from having to maintain this knowledge. This means that if, for instance, bar service requires a change to their generation directives, only shared-proto needs to be updated. All upstream services relying on bar service require no changes and will continue to just work.

Adding a new service is a matter of adding two lines to protogen.yaml.

File: <service>/generate_protobufs.sh

Here’s the script a service would use to invoke protogen.

#!/bin/bash

PROTOGEN_VERSION=""
GOLANG_PROTOBUF_VERSION=$(go list -f "{{.Version}}" -m github.com/golang/protobuf)
BRANCH="master"
SERVICE=""

while [[ $# -gt 0 ]]
do
key="$1"
case $key in
	-b|--branch)
	BRANCH="$2"
	shift
	shift
	;;
	--service)
	SERVICE="$2"
	shift
	shift
	;;
	*)
	shift
	;;
esac
done

SHARED_PROTO_DIR="$HOME/.shared-proto"
rm -rf "$SHARED_PROTO_DIR" || true
git clone -b "${BRANCH}" --depth 1 git@github.com:carousell/shared-proto.git "$SHARED_PROTO_DIR"

$SHARED_PROTO_DIR/generate_protobufs.sh \
    --protogen-version      "${PROTOGEN_VERSION}" \
    --go-protobuf-version   "${GOLANG_PROTOBUF_VERSION}" \
    --service               "${SERVICE}"

This script does not differ between services for the most part. Notice how the protobuf version is pinned: it is retrieved dynamically from the service’s Go module dependencies. This script essentially just clones shared-proto and invokes shared-proto’s generate_protobufs.sh script. Everything else is taken care of by protogen’s config.

Closing thoughts

These changes were primarily motivated by wanting to solve a problem I faced myself, but as the solution took shape I started to realize how it could be applied to the company at large. I encourage engineers to take a look at what their pain points are, and to start looking for a solution. If the solution can be applied more broadly (to your team, to your company, to the wider public), then even better. But even if your solution has little adoption, at least you’ve solved a personal problem, and that’s something!

Overall I’m quite happy with how protobuf management at Carousell has improved since these changes and tooling were introduced. So far I’ve not had protobuf generation break since integrating my services with protogen. I’ve also experienced far fewer headaches since.