Thriftrw

Triftrw is a general purpose, serialization-only, thrift encoding libraries written by Uber. Since there’re some thrift code generation ralated issues in my working project right now, I decide to give myself a chance to dive into this attractive library. Of course, the Go implementation.

AST

There are three basic interfaces in thriftrw’s ast package

  • Node is a single element in the Thrift AST.
  • Walker provides acccess to information about the state of the AST walker.
  • Visitor walks an AST. The Visit function is called on each node of the AST.
type Node interface {
	node()
	visitChildren(nodeStack, visitor)
}

type Walker interface {
	Ancestors() []Node
	Parent() Node
}

type Visitor interface {
	Visit(w Walker, n Node) Visitor
}

visitor adapts a user-provided Visitor so that we can use the internal visitChildren method on nodes. While nodeStack keeps nodes in the order they were visited

type visitor struct {
	Visitor Visitor
}

type nodeStack []Node

All the following types implement the Node interface

var _ Node = (*Annotation)(nil)
var _ Node = BaseType{}
var _ Node = (*Constant)(nil)
var _ Node = ConstantBoolean(true)
var _ Node = ConstantDouble(1.0)
var _ Node = ConstantInteger(1)
var _ Node = ConstantList{}
var _ Node = ConstantMap{}
var _ Node = ConstantMapItem{}
var _ Node = ConstantReference{}
var _ Node = ConstantString("hi")
var _ Node = (*Enum)(nil)
var _ Node = (*EnumItem)(nil)
var _ Node = (*Field)(nil)
var _ Node = (*Function)(nil)
var _ Node = (*Include)(nil)
var _ Node = ListType{}
var _ Node = MapType{}
var _ Node = (*Namespace)(nil)
var _ Node = (*Program)(nil)
var _ Node = (*Service)(nil)
var _ Node = SetType{}
var _ Node = (*Struct)(nil)
var _ Node = TypeReference{}
var _ Node = (*Typedef)(nil)

Type

In the very beginning, there is a Type interface representing all viable field types in Thrift.

type Type interface {
	Node
	fmt.Stringer
	fieldType()
}

For bool, byte, i16, i32, i64, double, string, binary these base types, BaseType gives us a good example.

type BaseTypeID int

const (
	BoolTypeID   BaseTypeID = iota + 1 // bool
	I8TypeID                           // byte/i8
	I16TypeID                          // i16
	I32TypeID                          // i32
	I64TypeID                          // i64
	DoubleTypeID                       // double
	StringTypeID                       // string
	BinaryTypeID                       // binary
)

type BaseType struct {
	ID          BaseTypeID
	Annotations []*Annotation
	Line        int
}

func (BaseType) node()      {}
func (BaseType) fieldType() {}

func (bt BaseType) lineNumber() int { return bt.Line }

func (bt BaseType) visitChildren(ss nodeStack, v visitor) {
	for _, ann := range bt.Annotations {
		v.visit(ss, ann)
	}
}

func (bt BaseType) String() string {
	var name string

	switch bt.ID {
	case BoolTypeID:
		name = "bool"
	case I8TypeID:
        // ...
	default:
		panic(fmt.Sprintf("unknown base type %v", bt.ID))
	}

	return appendAnnotations(name, bt.Annotations)
}

Further, MapType’s implementation turns to pretty straightforward.

type MapType struct {
	KeyType, ValueType Type
	Annotations        []*Annotation
	Line               int
}

func (MapType) node()      {}
func (MapType) fieldType() {}

func (mt MapType) lineNumber() int { return mt.Line }

func (mt MapType) visitChildren(ss nodeStack, v visitor) {
	v.visit(ss, mt.KeyType)
	v.visit(ss, mt.ValueType)
	for _, ann := range mt.Annotations {
		v.visit(ss, ann)
	}
}

func (mt MapType) String() string {
	return appendAnnotations(
		fmt.Sprintf("map<%s, %s>", mt.KeyType, mt.ValueType),
		mt.Annotations,
	)
}

In some language bindings, map type could be followed by type annotations, i.e. map<string, list<i32>> (java.type = "MultiMap"). Beside the key value nodes, each annotation should be visited as well.

Definition

Definition interface unifies the different types representing items defined in the Thrift file.

type Definition interface {
	Node
	Info() DefinitionInfo
	definition()
}

type DefinitionInfo struct {
	Name string
	Line int
}

Given const i32 foo = 42, a canonical implementation of Constant is defined as follows.

type Constant struct {
	Name  string
	Type  Type
	Value ConstantValue
	Line  int
	Doc   string
}

func (*Constant) node()       {}
func (*Constant) definition() {}

func (c *Constant) lineNumber() int { return c.Line }

func (c *Constant) visitChildren(ss nodeStack, v visitor) {
	v.visit(ss, c.Type)
	v.visit(ss, c.Value)
}

func (c *Constant) Info() DefinitionInfo {
	return DefinitionInfo{Name: c.Name, Line: c.Line}
}

type ConstantValue interface {
	Node
	constantValue()
}

Particularly ConstantValue interface is leveraged to unify all the different types representing constant values. Here is the builtin trivial implementation for boolean value:

type ConstantBoolean bool

func (ConstantBoolean) node()   {}
func (ConstantBoolean) visitChildren(nodeStack, visitor)   {}
func (ConstantBoolean) constantValue()   {}

Field

Before diving into complicated definitions, let’s take a look that how to define a single field in all the struct, union, exception and function parameter.

type Requiredness int

const (
	Unspecified Requiredness = iota // unspecified (default)
	Required                        // required
	Optional                        // optional
)

type Field struct {
	ID           int
	Name         string
	Type         Type
	Requiredness Requiredness
	Default      ConstantValue
	Annotations  []*Annotation
	Line         int
	Doc          string
}

Requiredness represents whether a field was marked as required or optional, or if the user did not specify either.

Struct

Struct is a collection of named fields with different types. For example,

struct User {
	1: required string name (min_length = "3")
	2: optional Status status = Enabled;
}

struct i128 {
	1: required i64 high
	2: required i64 low
} (py.serializer = "foo.Int128Serializer")

union Contents {
	1: string plainText
	2: binary pdf
}

exception ServiceError { 1: required string message }

Given all the above typical use cases, Struct’s definition comes clear.

type StructureType int

const (
	StructType    StructureType = iota + 1 // struct
	UnionType                              // union
	ExceptionType                          // exception
)

type Struct struct {
	Name        string
	Type        StructureType
	Fields      []*Field
	Annotations []*Annotation
	Line        int
	Doc         string
}

Function

Function is a single function inside a service.

type Function struct {
	Name        string
	Parameters  []*Field
	ReturnType  Type
	Exceptions  []*Field
	OneWay      bool
	Annotations []*Annotation
	Line        int
	Doc         string
}

Service

Service is a collection of functions. For inheriting from another service, ServiceReference references to its parent.

type Service struct {
	Name      string
	Functions []*Function
	Parent      *ServiceReference
	Annotations []*Annotation
	Line        int
	Doc         string
}

type ServiceReference struct {
	Name string
	Line int
}

Annotation

Annotation represents a type annotation. Type annotations are key-value pairs in the form (foo = "bar", baz = "qux") They may be used to customize the generated code. Annotations are optional anywhere in the code where they’re accepted and may be skipped completely.

type Annotation struct {
	Name  string
	Value string
	Line  int
}

func (*Annotation) node() {}
func (*Annotation) visitChildren(nodeStack, visitor) {}
func (ann *Annotation) lineNumber() int { return ann.Line }
func (ann *Annotation) String() string {
	return fmt.Sprintf("%s = %q", ann.Name, ann.Value)
}

Include

Include is a request to include another Thrift file, like include "shared.thrift"

type Include struct {
	Path string
	Name string
	Line int
}

In thriftrw, there is a custom Include-As syntax may be used to change the name under which the file is imported. For example, include t "shared.thrift"

Namespace

Namespace statements allow users to choose the package name used by the generated code in certain languages.

type Namespace struct {
	Scope string
	Name  string
	Line  int
}

What’s more is that both Include and Namespace implement the Header interface, which holds the header part information by parsing the whole thrift file into a Program node.

// HeaderInfo provides a common way to access the line for a header.
type HeaderInfo struct {
	Line int
}

// Header unifies types representing header in the AST.
type Header interface {
	Node

	Info() HeaderInfo
	header()
}

// Program represents the full syntax tree for a single .thrift file.
type Program struct {
	Headers     []Header
	Definitions []Definition
}

func (*Program) node() {}

func (p *Program) visitChildren(ss nodeStack, v visitor) {
	for _, h := range p.Headers {
		v.visit(ss, h)
	}
	for _, d := range p.Definitions {
		v.visit(ss, d)
	}
}

Compiler

There is an internel compiler taking charge of compiling a thrift file to predefined type specifications.

// compiler is responsible for compiling Thrift files.
type compiler struct {
	// fs is the interface used to interact with the filesystem.
	fs FS
	// nonStrict will compile Thrift files that do not pass strict validation.
	nonStrict bool
	// Map from file path to Module representing that file.
	Modules map[string]*Module
}

Module represents a compiled Thrift module. It contains all information about all known types, constants, services, and includes from the Thrift file.

type Module struct {
	Name       string
	ThriftPath string

	// Mapping from the /Thrift name/ to the compiled representation of
	// different definitions.

	Includes  map[string]*IncludedModule
	Constants map[string]*Constant
	Types     map[string]TypeSpec
	Services  map[string]*ServiceSpec

	Raw []byte // The raw IDL input.
}

An offhand Compile function for getting compiled module from a specific thrift file is exported as well.

func Compile(path string, opts ...Option) (*Module, error) {/*...*/}

Furthermore, Thriftrw use ragel and yacc to generate parser for gathering parsed definitions into module’s fields, I will dive into details in another post.

Generator

Package gen generates Go code based on a compiled Thrift module specification. The core logic can be extracted as follows.

func Generate(m *compile.Module, o *Options) error {
    // ...
    files := make(map[string][]byte)
    generate := func(m *compile.Module) error {
        // contents are generated go codes comply with the module
        path, contents, err := generateModule(m, o, // ...)
        // mark the target thrift file has been generated
        addFile(files, path, contents)
    }
    // ...

    // walk along with any other thrift files it includes
    if err := m.Walk(generate); err != nil {
        return err
    }
    // ...

    // writes to files
    for relPath, contents := range files {
        ioutil.WriteFile(relPath, contents)
    }
    
    return nil
}

Generally, it should always start walking from a given compiled thrift module, and generate the corresponding Go code with any other thrift files if it includes. The generated code will be kept in a map with target go file paths as keys, which is used to deal with conflict just in case.

Let’s dive into generateModule, the core generation process for a specific module, which returns a <key, value> pair in the above map. Here we drop some other parameters to keep the function logic simple and clear.

func generateModule(
	m *compile.Module,
	o *Options,
    // ...
) (outputFilepath string, contents []byte, err error) {
    // ...
    
    // new an internal generator
    g := NewGenerator(o)
    
    if len(m.Contants) > 0 {
        for _, constantName := range sortStringKeys(m.Constants) {
            if err := Constant(g, m.Constants[constantName]); err != nil {
                return "", nil, err
            }
        }
    }
    
    if len(m.Types) > 0 {
		for _, typeName := range sortStringKeys(m.Types) {
			if err := TypeDefinition(g, m.Types[typeName]); err != nil {
				return "", nil, err
			}
		}
	}
    
    // ...
    
    if len(m.Services) > 0 {
        if err = Services(g, m.Services); err != nil {
			return "", nil, fmt.Errorf("could not generate code for services %v", err)
		}
    }
    
    // output contents
    buff := new(bytes.Buffer)
	if err := g.Write(buff, nil); err != nil {
        return "", nil, err
	}

	return outputFilepath, buff.Bytes(), nil
}

The main process looks pretty straightforward. With an internal generator object, it takes turn to generate Constants, Types and Services.

For Constants generator, TemplateFunc is a utility function wrapper to register template functions, and DeclareFromTemplate is used to make sure the constant declarations are unique in the same namespace.

func Constant(g Generator, c *compile.Constant) error {
	err := g.DeclareFromTemplate(
		`<formatDoc .Doc><if canBeConstant .Type>const<else>var<end> <constantName .Name> <typeReference .Type> = <constantValue .Value .Type>`,
		c,
		TemplateFunc("constantValue", ConstantValue),
		TemplateFunc("canBeConstant", canBeConstant),
		TemplateFunc("constantName", constantName),
	)
	return wrapGenerateError(c.Name, err)
}

The TypeDefinition is also pretty straightforward, and tamplating processes for three different TypeSpecs are basically same to Constant.

func TypeDefinition(g Generator, spec compile.TypeSpec) error {
	switch s := spec.(type) {
	case *compile.EnumSpec:
		return enum(g, s)
	case *compile.StructSpec:
		return structure(g, s)
	case *compile.TypedefSpec:
		return typedef(g, s)
	default:
		panic(fmt.Sprintf("%q is not a defined type", spec.ThriftName()))
	}
}

Services generates code for all services into a single file and stores the code in the generator to be written. ServiceFunction is also essentially same to above mentioned templating staffs, but with more inevitable details around a function definition.

func Services(g Generator, services map[string]*compile.ServiceSpec) error {
	for _, serviceName := range sortStringKeys(services) {
		s := services[serviceName]
		for _, functionName := range sortStringKeys(s.Functions) {
			function := s.Functions[functionName]
			if err := ServiceFunction(g, s, function); err != nil {
				return fmt.Errorf(
					"could not generate types for %s.%s: %v",
					s.Name, functionName, err)
			}
		}
	}

	return nil
}

Summary

There’re a lot of examples in gen/internel/tests helping you understand Go code generation more comprehensively.