Thriftrw
Thriftrw is a general-purpose, serialization-only Thrift encoding library written by Uber. Since my current project has some Thrift code-generation-related issues, I decided to take the chance to dive into this attractive library, specifically its Go implementation.
AST
There are three basic interfaces in thriftrw’s ast package:
- Node is a single element in the Thrift AST.
- Walker provides access to information about the state of the AST walker.
- Visitor walks an AST. Its Visit function is called on each node of the AST.
type Node interface {
	node()
	visitChildren(nodeStack, visitor)
}

type Walker interface {
	Ancestors() []Node
	Parent() Node
}

type Visitor interface {
	Visit(w Walker, n Node) Visitor
}
visitor adapts a user-provided Visitor so that the internal visitChildren method can be used on nodes, while nodeStack keeps the nodes in the order they were visited:
type visitor struct {
	Visitor Visitor
}

type nodeStack []Node
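To make the Node/Walker/Visitor relationship concrete, here is a minimal, self-contained sketch of the same pattern. It is not thriftrw's code: the children() method, the Walk function, the closest-first ancestor order and the Struct, Field and fieldPrinter types are simplified assumptions for illustration. What it shows is that the walker pushes each node onto a stack before descending, so a Visitor can ask for its Parent, and that returning nil from Visit skips a subtree.

```go
package main

import "fmt"

// Simplified, hypothetical versions of the ast interfaces.
type Node interface {
	children() []Node
}

type Walker interface {
	Ancestors() []Node
	Parent() Node
}

type Visitor interface {
	Visit(w Walker, n Node) Visitor
}

// nodeStack records the path from the root to the node being visited.
type nodeStack []Node

// Ancestors returns the ancestors closest-first (an assumption of this sketch).
func (s nodeStack) Ancestors() []Node {
	out := make([]Node, len(s))
	for i, n := range s {
		out[len(s)-1-i] = n
	}
	return out
}

func (s nodeStack) Parent() Node {
	if len(s) == 0 {
		return nil
	}
	return s[len(s)-1]
}

// Walk visits n and then its children, with n pushed onto the stack.
// A nil Visitor returned from Visit stops descent into that subtree.
func Walk(v Visitor, n Node) { walk(nil, v, n) }

func walk(ss nodeStack, v Visitor, n Node) {
	next := v.Visit(ss, n)
	if next == nil {
		return
	}
	ss = append(ss, n)
	for _, c := range n.children() {
		walk(ss, next, c)
	}
}

type Field struct{ Name string }

type Struct struct {
	Name   string
	Fields []*Field
}

func (*Field) children() []Node { return nil }

func (s *Struct) children() []Node {
	out := make([]Node, len(s.Fields))
	for i, f := range s.Fields {
		out[i] = f
	}
	return out
}

// fieldPrinter records each field qualified by the name of its parent struct.
type fieldPrinter struct{ lines []string }

func (p *fieldPrinter) Visit(w Walker, n Node) Visitor {
	if f, ok := n.(*Field); ok {
		if s, ok := w.Parent().(*Struct); ok {
			p.lines = append(p.lines, s.Name+"."+f.Name)
		}
	}
	return p
}

func main() {
	user := &Struct{Name: "User", Fields: []*Field{{Name: "name"}, {Name: "status"}}}
	p := &fieldPrinter{}
	Walk(p, user)
	fmt.Println(p.lines) // [User.name User.status]
}
```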
All of the following types implement the Node interface, as these compile-time assertions show:
var _ Node = (*Annotation)(nil)
var _ Node = BaseType{}
var _ Node = (*Constant)(nil)
var _ Node = ConstantBoolean(true)
var _ Node = ConstantDouble(1.0)
var _ Node = ConstantInteger(1)
var _ Node = ConstantList{}
var _ Node = ConstantMap{}
var _ Node = ConstantMapItem{}
var _ Node = ConstantReference{}
var _ Node = ConstantString("hi")
var _ Node = (*Enum)(nil)
var _ Node = (*EnumItem)(nil)
var _ Node = (*Field)(nil)
var _ Node = (*Function)(nil)
var _ Node = (*Include)(nil)
var _ Node = ListType{}
var _ Node = MapType{}
var _ Node = (*Namespace)(nil)
var _ Node = (*Program)(nil)
var _ Node = (*Service)(nil)
var _ Node = SetType{}
var _ Node = (*Struct)(nil)
var _ Node = TypeReference{}
var _ Node = (*Typedef)(nil)
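The var _ Node = ... lines are Go's compile-time interface assertions: they produce no runtime code, but the build fails if a type stops satisfying Node. A minimal sketch of the idiom, with hypothetical BaseType and Struct stand-ins, shows why value types such as BaseType appear as literals while types whose node() method has a pointer receiver, such as Struct, appear as typed nil pointers:

```go
package main

import "fmt"

// Node is a stand-in for ast.Node; the unexported method means only
// types declared in this package can implement it.
type Node interface{ node() }

type BaseType struct{ ID int }

type Struct struct{ Name string }

// BaseType implements Node with a value receiver, Struct with a pointer receiver.
func (BaseType) node() {}
func (*Struct) node()  {}

// Compile-time checks. (*Struct)(nil) satisfies the check without allocating.
var _ Node = BaseType{}
var _ Node = (*Struct)(nil)

// var _ Node = Struct{} // would not compile: node() has a pointer receiver

func main() {
	var n Node = &Struct{Name: "User"}
	_, isStruct := n.(*Struct)
	fmt.Println(isStruct) // true
}
```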
Type
At the root of the type hierarchy is the Type interface, which represents every valid field type in Thrift.
type Type interface {
	Node
	fmt.Stringer

	fieldType()
}
The base types bool, byte, i16, i32, i64, double, string and binary are all represented by BaseType, which makes a good first example.
type BaseTypeID int

const (
	BoolTypeID   BaseTypeID = iota + 1 // bool
	I8TypeID                           // byte/i8
	I16TypeID                          // i16
	I32TypeID                          // i32
	I64TypeID                          // i64
	DoubleTypeID                       // double
	StringTypeID                       // string
	BinaryTypeID                       // binary
)

type BaseType struct {
	ID          BaseTypeID
	Annotations []*Annotation
	Line        int
}

func (BaseType) node()      {}
func (BaseType) fieldType() {}

func (bt BaseType) lineNumber() int { return bt.Line }

func (bt BaseType) visitChildren(ss nodeStack, v visitor) {
	for _, ann := range bt.Annotations {
		v.visit(ss, ann)
	}
}

func (bt BaseType) String() string {
	var name string
	switch bt.ID {
	case BoolTypeID:
		name = "bool"
	case I8TypeID:
		// ...
	default:
		panic(fmt.Sprintf("unknown base type %v", bt.ID))
	}
	return appendAnnotations(name, bt.Annotations)
}
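appendAnnotations is not shown in the excerpt. The following sketch assumes it appends the annotations in parentheses after the type name, reusing the %s = %q format of Annotation.String; the helper body and the spelling i8 for I8TypeID are assumptions, not thriftrw's code.

```go
package main

import (
	"fmt"
	"strings"
)

type BaseTypeID int

const (
	BoolTypeID BaseTypeID = iota + 1
	I8TypeID
	I16TypeID
	I32TypeID
	I64TypeID
	DoubleTypeID
	StringTypeID
	BinaryTypeID
)

type Annotation struct {
	Name, Value string
}

func (ann *Annotation) String() string {
	return fmt.Sprintf("%s = %q", ann.Name, ann.Value)
}

type BaseType struct {
	ID          BaseTypeID
	Annotations []*Annotation
}

// appendAnnotations is a hypothetical helper: it renders annotations as
// ` (a = "x", b = "y")` after s, or returns s unchanged if there are none.
func appendAnnotations(s string, anns []*Annotation) string {
	if len(anns) == 0 {
		return s
	}
	parts := make([]string, len(anns))
	for i, ann := range anns {
		parts[i] = ann.String()
	}
	return s + " (" + strings.Join(parts, ", ") + ")"
}

func (bt BaseType) String() string {
	var name string
	switch bt.ID {
	case BoolTypeID:
		name = "bool"
	case I8TypeID:
		name = "i8" // the IDL also accepts byte
	case I16TypeID:
		name = "i16"
	case I32TypeID:
		name = "i32"
	case I64TypeID:
		name = "i64"
	case DoubleTypeID:
		name = "double"
	case StringTypeID:
		name = "string"
	case BinaryTypeID:
		name = "binary"
	default:
		panic(fmt.Sprintf("unknown base type %v", bt.ID))
	}
	return appendAnnotations(name, bt.Annotations)
}

func main() {
	bt := BaseType{ID: StringTypeID, Annotations: []*Annotation{{Name: "foo", Value: "bar"}}}
	fmt.Println(bt) // string (foo = "bar")
}
```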
MapType’s implementation is equally straightforward.
type MapType struct {
	KeyType, ValueType Type
	Annotations        []*Annotation
	Line               int
}

func (MapType) node()      {}
func (MapType) fieldType() {}

func (mt MapType) lineNumber() int { return mt.Line }

func (mt MapType) visitChildren(ss nodeStack, v visitor) {
	v.visit(ss, mt.KeyType)
	v.visit(ss, mt.ValueType)
	for _, ann := range mt.Annotations {
		v.visit(ss, ann)
	}
}

func (mt MapType) String() string {
	return appendAnnotations(
		fmt.Sprintf("map<%s, %s>", mt.KeyType, mt.ValueType),
		mt.Annotations,
	)
}
A map type may be followed by type annotations that customize it for a particular language binding, e.g. map<string, list<i32>> (java.type = "MultiMap").
Besides the key and value type nodes, each annotation is visited as well.
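Because MapType.String formats its key and value with %s, nested types render recursively. The sketch below reproduces the map<string, list<i32>> (java.type = "MultiMap") example with trimmed-down, hypothetical types (BaseType holds a plain name, and ListType mirrors MapType), and collects annotations in the same order as MapType.visitChildren: key subtree, value subtree, then the map's own annotations.

```go
package main

import (
	"fmt"
	"strings"
)

// Type is a trimmed-down, hypothetical stand-in for ast.Type.
type Type interface{ fmt.Stringer }

type Annotation struct{ Name, Value string }

// BaseType is simplified to hold the type name directly.
type BaseType struct{ Name string }

type ListType struct {
	ValueType   Type
	Annotations []*Annotation
}

type MapType struct {
	KeyType, ValueType Type
	Annotations        []*Annotation
}

func appendAnnotations(s string, anns []*Annotation) string {
	if len(anns) == 0 {
		return s
	}
	parts := make([]string, len(anns))
	for i, a := range anns {
		parts[i] = fmt.Sprintf("%s = %q", a.Name, a.Value)
	}
	return s + " (" + strings.Join(parts, ", ") + ")"
}

func (bt BaseType) String() string { return bt.Name }

func (lt ListType) String() string {
	return appendAnnotations(fmt.Sprintf("list<%s>", lt.ValueType), lt.Annotations)
}

func (mt MapType) String() string {
	// %s calls String() on the key and value types, so nesting is handled recursively.
	return appendAnnotations(fmt.Sprintf("map<%s, %s>", mt.KeyType, mt.ValueType), mt.Annotations)
}

// annotationsOf collects annotations depth-first: key, value, then the type's own.
func annotationsOf(t Type) []*Annotation {
	switch t := t.(type) {
	case ListType:
		return append(annotationsOf(t.ValueType), t.Annotations...)
	case MapType:
		out := append(annotationsOf(t.KeyType), annotationsOf(t.ValueType)...)
		return append(out, t.Annotations...)
	default:
		return nil
	}
}

func main() {
	m := MapType{
		KeyType:     BaseType{Name: "string"},
		ValueType:   ListType{ValueType: BaseType{Name: "i32"}},
		Annotations: []*Annotation{{Name: "java.type", Value: "MultiMap"}},
	}
	fmt.Println(m)                     // map<string, list<i32>> (java.type = "MultiMap")
	fmt.Println(len(annotationsOf(m))) // 1
}
```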
Definition
The Definition interface unifies the different types representing items defined in a Thrift file.
type Definition interface {
	Node

	Info() DefinitionInfo
	definition()
}

type DefinitionInfo struct {
	Name string
	Line int
}
For a declaration such as const i32 foo = 42, the corresponding Constant node is defined as follows.
type Constant struct {
	Name  string
	Type  Type
	Value ConstantValue
	Line  int
	Doc   string
}

func (*Constant) node()       {}
func (*Constant) definition() {}

func (c *Constant) lineNumber() int { return c.Line }

func (c *Constant) visitChildren(ss nodeStack, v visitor) {
	v.visit(ss, c.Type)
	v.visit(ss, c.Value)
}

func (c *Constant) Info() DefinitionInfo {
	return DefinitionInfo{Name: c.Name, Line: c.Line}
}

type ConstantValue interface {
	Node

	constantValue()
}
The ConstantValue interface unifies all the different types representing constant values.
Here is the trivial built-in implementation for boolean values:
type ConstantBoolean bool
func (ConstantBoolean) node() {}
func (ConstantBoolean) visitChildren(nodeStack, visitor) {}
func (ConstantBoolean) constantValue() {}
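The unexported constantValue() marker method effectively seals ConstantValue: only types in the ast package can implement it, so a type switch can cover every case. A small sketch with simplified, hypothetical versions of a few constant kinds (the real nodes may carry more fields, such as line numbers), rendering values back to IDL syntax:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ConstantValue is sealed by its unexported marker method.
type ConstantValue interface{ constantValue() }

type ConstantBoolean bool
type ConstantInteger int64
type ConstantString string

// ConstantList is simplified to just its items.
type ConstantList struct{ Items []ConstantValue }

func (ConstantBoolean) constantValue() {}
func (ConstantInteger) constantValue() {}
func (ConstantString) constantValue()  {}
func (ConstantList) constantValue()    {}

// render turns a constant back into Thrift IDL syntax. Because the set of
// implementations is closed, the type switch can handle every kind.
func render(v ConstantValue) string {
	switch v := v.(type) {
	case ConstantBoolean:
		return strconv.FormatBool(bool(v))
	case ConstantInteger:
		return strconv.FormatInt(int64(v), 10)
	case ConstantString:
		return strconv.Quote(string(v))
	case ConstantList:
		items := make([]string, len(v.Items))
		for i, item := range v.Items {
			items[i] = render(item)
		}
		return "[" + strings.Join(items, ", ") + "]"
	default:
		panic(fmt.Sprintf("unknown constant value %T", v))
	}
}

func main() {
	v := ConstantList{Items: []ConstantValue{ConstantInteger(1), ConstantBoolean(true), ConstantString("hi")}}
	fmt.Println(render(v)) // [1, true, "hi"]
}
```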
Field
Before diving into more complex definitions, let’s look at how a single field is defined. The same Field type describes the fields of structs, unions and exceptions as well as function parameters.
type Requiredness int

const (
	Unspecified Requiredness = iota // unspecified (default)
	Required                        // required
	Optional                        // optional
)

type Field struct {
	ID           int
	Name         string
	Type         Type
	Requiredness Requiredness
	Default      ConstantValue
	Annotations  []*Annotation
	Line         int
	Doc          string
}
Requiredness represents whether a field was marked as required or optional, or if the user did not specify either.
Struct
A Struct is a collection of named fields of different types. For example:
struct User {
    1: required string name (min_length = "3")
    2: optional Status status = Enabled;
}

struct i128 {
    1: required i64 high
    2: required i64 low
} (py.serializer = "foo.Int128Serializer")

union Contents {
    1: string plainText
    2: binary pdf
}

exception ServiceError { 1: required string message }
Given these typical use cases, Struct’s definition becomes clear: a single Struct node covers structs, unions and exceptions, distinguished by its StructureType.
type StructureType int

const (
	StructType    StructureType = iota + 1 // struct
	UnionType                              // union
	ExceptionType                          // exception
)

type Struct struct {
	Name        string
	Type        StructureType
	Fields      []*Field
	Annotations []*Annotation
	Line        int
	Doc         string
}
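To see how StructureType and Requiredness fit together, here is a hypothetical pretty-printer that turns a simplified Struct back into IDL. The idl, keyword and prefix methods, and the use of plain strings for field types, are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

type Requiredness int

const (
	Unspecified Requiredness = iota
	Required
	Optional
)

type StructureType int

const (
	StructType StructureType = iota + 1
	UnionType
	ExceptionType
)

// Field and Struct keep only what this sketch needs; Type is a plain string.
type Field struct {
	ID           int
	Name         string
	Type         string
	Requiredness Requiredness
}

type Struct struct {
	Name   string
	Type   StructureType
	Fields []*Field
}

func (t StructureType) keyword() string {
	switch t {
	case StructType:
		return "struct"
	case UnionType:
		return "union"
	case ExceptionType:
		return "exception"
	}
	panic(fmt.Sprintf("unknown structure type %d", int(t)))
}

func (r Requiredness) prefix() string {
	switch r {
	case Required:
		return "required "
	case Optional:
		return "optional "
	}
	return "" // Unspecified: nothing is written
}

// idl renders the struct, union or exception back into Thrift IDL.
func (s *Struct) idl() string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s %s {\n", s.Type.keyword(), s.Name)
	for _, f := range s.Fields {
		fmt.Fprintf(&b, "  %d: %s%s %s\n", f.ID, f.Requiredness.prefix(), f.Type, f.Name)
	}
	b.WriteString("}")
	return b.String()
}

func main() {
	contents := &Struct{
		Name: "Contents",
		Type: UnionType,
		Fields: []*Field{
			{ID: 1, Name: "plainText", Type: "string"},
			{ID: 2, Name: "pdf", Type: "binary"},
		},
	}
	fmt.Println(contents.idl())
}
```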
Function
Function is a single function inside a service.
type Function struct {
	Name        string
	Parameters  []*Field
	ReturnType  Type
	Exceptions  []*Field
	OneWay      bool
	Annotations []*Annotation
	Line        int
	Doc         string
}
Service
A Service is a collection of functions. When a service inherits from another one (service Foo extends Bar), Parent holds a ServiceReference to the parent service.
type Service struct {
	Name        string
	Functions   []*Function
	Parent      *ServiceReference
	Annotations []*Annotation
	Line        int
	Doc         string
}

type ServiceReference struct {
	Name string
	Line int
}
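Because Parent is only a name reference, finding every function a service exposes means looking the parent up by name and following the chain. A sketch under the assumption that services are indexed by name in a map; allFunctions is hypothetical and also guards against unknown parents and extends cycles:

```go
package main

import (
	"fmt"
	"sort"
)

type Function struct{ Name string }

type ServiceReference struct{ Name string }

type Service struct {
	Name      string
	Functions []*Function
	Parent    *ServiceReference // nil if the service extends nothing
}

// allFunctions follows Parent references and returns the sorted names of all
// functions a service exposes, including inherited ones.
func allFunctions(services map[string]*Service, name string) ([]string, error) {
	// seen guards against extends cycles.
	seen := make(map[string]bool)
	funcs := make(map[string]bool)
	for name != "" {
		if seen[name] {
			return nil, fmt.Errorf("inheritance cycle at service %q", name)
		}
		seen[name] = true
		s, ok := services[name]
		if !ok {
			return nil, fmt.Errorf("unknown service %q", name)
		}
		for _, f := range s.Functions {
			funcs[f.Name] = true
		}
		name = ""
		if s.Parent != nil {
			name = s.Parent.Name
		}
	}
	out := make([]string, 0, len(funcs))
	for f := range funcs {
		out = append(out, f)
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	services := map[string]*Service{
		"Base":  {Name: "Base", Functions: []*Function{{Name: "health"}}},
		"Users": {Name: "Users", Functions: []*Function{{Name: "getUser"}}, Parent: &ServiceReference{Name: "Base"}},
	}
	fns, err := allFunctions(services, "Users")
	fmt.Println(fns, err) // [getUser health] <nil>
}
```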
Annotation
Annotation represents a type annotation. Type annotations are key-value pairs of the form (foo = "bar", baz = "qux").
They may be used to customize the generated code. Annotations are optional anywhere in the code where they’re accepted and may be skipped completely.
type Annotation struct {
	Name  string
	Value string
	Line  int
}

func (*Annotation) node()                           {}
func (*Annotation) visitChildren(nodeStack, visitor) {}

func (ann *Annotation) lineNumber() int { return ann.Line }

func (ann *Annotation) String() string {
	return fmt.Sprintf("%s = %q", ann.Name, ann.Value)
}
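A consumer of the AST typically wants annotations as a lookup table. The following hypothetical helper converts a list of Annotation nodes into a map and, as an assumed policy rather than a documented thriftrw rule, rejects a name that is declared twice:

```go
package main

import "fmt"

type Annotation struct {
	Name  string
	Value string
	Line  int
}

// annotationMap builds a name -> value table, rejecting duplicate names.
func annotationMap(anns []*Annotation) (map[string]string, error) {
	m := make(map[string]string, len(anns))
	lines := make(map[string]int, len(anns))
	for _, ann := range anns {
		if prev, dup := lines[ann.Name]; dup {
			return nil, fmt.Errorf("line %d: annotation %q already defined on line %d", ann.Line, ann.Name, prev)
		}
		m[ann.Name] = ann.Value
		lines[ann.Name] = ann.Line
	}
	return m, nil
}

func main() {
	// (foo = "bar", baz = "qux")
	m, err := annotationMap([]*Annotation{
		{Name: "foo", Value: "bar", Line: 1},
		{Name: "baz", Value: "qux", Line: 1},
	})
	fmt.Println(m["foo"], m["baz"], err) // bar qux <nil>
}
```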
Include
Include is a request to include another Thrift file, such as include "shared.thrift".
type Include struct {
	Path string
	Name string
	Line int
}
Thriftrw also supports a custom include-as syntax that changes the name under which the file is imported.
For example, include t "shared.thrift".
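The name an include is referenced by can be sketched as follows. This assumes the default name is the file's base name without its .thrift extension, so types from shared.thrift are referenced as shared.SomeType, while include-as overrides it; includeName is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

type Include struct {
	Path string
	Name string // set only by the include-as form
	Line int
}

// includeName returns the name under which the included file's definitions are referenced.
func includeName(inc *Include) string {
	if inc.Name != "" {
		return inc.Name
	}
	base := filepath.Base(inc.Path)
	return strings.TrimSuffix(base, filepath.Ext(base))
}

func main() {
	// include "shared.thrift"
	fmt.Println(includeName(&Include{Path: "shared.thrift"}))
	// include t "shared.thrift"
	fmt.Println(includeName(&Include{Path: "shared.thrift", Name: "t"}))
}
```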
Namespace
Namespace statements allow users to choose the package name used by the generated code in certain languages.
type Namespace struct {
	Scope string
	Name  string
	Line  int
}
Both Include and Namespace implement the Header interface. When a whole Thrift file is parsed into a Program node, its headers and definitions are kept in separate lists:
// HeaderInfo provides a common way to access the line for a header.
type HeaderInfo struct {
	Line int
}

// Header unifies types representing header in the AST.
type Header interface {
	Node

	Info() HeaderInfo
	header()
}

// Program represents the full syntax tree for a single .thrift file.
type Program struct {
	Headers     []Header
	Definitions []Definition
}

func (*Program) node() {}

func (p *Program) visitChildren(ss nodeStack, v visitor) {
	for _, h := range p.Headers {
		v.visit(ss, h)
	}
	for _, d := range p.Definitions {
		v.visit(ss, d)
	}
}
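Since Header is a closed set of types, code consuming a Program can dispatch on it with a type switch. A minimal sketch with simplified Include, Namespace and Program types (definitions are omitted) and a hypothetical summarize helper:

```go
package main

import "fmt"

type HeaderInfo struct{ Line int }

// Header is sealed by its unexported header() method.
type Header interface {
	Info() HeaderInfo
	header()
}

type Include struct {
	Path string
	Name string
	Line int
}

type Namespace struct {
	Scope string
	Name  string
	Line  int
}

func (i *Include) Info() HeaderInfo { return HeaderInfo{Line: i.Line} }
func (*Include) header() {}

func (n *Namespace) Info() HeaderInfo { return HeaderInfo{Line: n.Line} }
func (*Namespace) header() {}

type Program struct {
	Headers []Header
	// Definitions are omitted in this sketch.
}

// summarize splits a program's headers into include paths and namespaces by scope.
func summarize(p *Program) (includes []string, namespaces map[string]string) {
	namespaces = make(map[string]string)
	for _, h := range p.Headers {
		switch h := h.(type) {
		case *Include:
			includes = append(includes, h.Path)
		case *Namespace:
			namespaces[h.Scope] = h.Name
		}
	}
	return includes, namespaces
}

func main() {
	p := &Program{Headers: []Header{
		&Namespace{Scope: "go", Name: "example.shared", Line: 1},
		&Include{Path: "shared.thrift", Line: 2},
	}}
	includes, namespaces := summarize(p)
	fmt.Println(includes, namespaces["go"]) // [shared.thrift] example.shared
}
```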
Compiler
An internal compiler is responsible for compiling a Thrift file into predefined type specifications.
// compiler is responsible for compiling Thrift files.
type compiler struct {
	// fs is the interface used to interact with the filesystem.
	fs FS
	// nonStrict will compile Thrift files that do not pass strict validation.
	nonStrict bool
	// Map from file path to Module representing that file.
	Modules map[string]*Module
}
Module represents a compiled Thrift module. It contains all information about all known types, constants, services, and includes from the Thrift file.
type Module struct {
	Name       string
	ThriftPath string

	// Mapping from the /Thrift name/ to the compiled representation of
	// different definitions.
	Includes  map[string]*IncludedModule
	Constants map[string]*Constant
	Types     map[string]TypeSpec
	Services  map[string]*ServiceSpec

	Raw []byte // The raw IDL input.
}
A convenient Compile function, which returns the compiled module for a given Thrift file, is exported as well.
func Compile(path string, opts ...Option) (*Module, error) {/*...*/}
Under the hood, thriftrw uses Ragel and yacc to generate its lexer and parser, whose parsed definitions end up in the module’s fields; I will dive into the details in another post.
Generator
Package gen generates Go code based on a compiled Thrift module specification. The core logic can be extracted as follows.
func Generate(m *compile.Module, o *Options) error {
	// ...
	files := make(map[string][]byte)
	generate := func(m *compile.Module) error {
		// contents is the generated Go code for this module
		path, contents, err := generateModule(m, o /* , ... */)
		if err != nil {
			return err
		}
		// record the output; addFile guards against two modules writing to the same path
		return addFile(files, path, contents)
	}
	// ...
	// walk the module along with every Thrift file it includes
	if err := m.Walk(generate); err != nil {
		return err
	}
	// ...
	// write the generated files to disk
	for relPath, contents := range files {
		if err := ioutil.WriteFile(relPath, contents, 0644); err != nil {
			return err
		}
	}
	return nil
}
Generation always starts from a given compiled Thrift module and walks through every Thrift file it includes, generating the corresponding Go code for each one. The generated code is kept in a map keyed by target Go file path, which makes it possible to detect conflicting outputs before anything is written to disk.
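The walk-and-collect strategy can be sketched with simplified stand-ins. Here Module, walk, addFile and the foo.thrift to foo/foo.go path mapping are all hypothetical; in particular, visiting each module exactly once, depth-first, is an assumption about the traversal rather than a description of compile.Module.Walk.

```go
package main

import (
	"fmt"
	"strings"
)

// Module is a hypothetical, trimmed-down stand-in for compile.Module.
type Module struct {
	ThriftPath string
	Includes   []*Module
}

// walk calls f once per module reachable through includes, depth-first,
// skipping modules already visited so a shared include is generated once.
func walk(m *Module, seen map[string]bool, f func(*Module) error) error {
	if seen[m.ThriftPath] {
		return nil
	}
	seen[m.ThriftPath] = true
	if err := f(m); err != nil {
		return err
	}
	for _, inc := range m.Includes {
		if err := walk(inc, seen, f); err != nil {
			return err
		}
	}
	return nil
}

// addFile records generated contents, refusing to overwrite another module's output.
func addFile(files map[string][]byte, path string, contents []byte) error {
	if _, ok := files[path]; ok {
		return fmt.Errorf("conflict: multiple modules write to %q", path)
	}
	files[path] = contents
	return nil
}

func main() {
	shared := &Module{ThriftPath: "shared.thrift"}
	a := &Module{ThriftPath: "a.thrift", Includes: []*Module{shared}}
	root := &Module{ThriftPath: "root.thrift", Includes: []*Module{a, shared}}

	files := make(map[string][]byte)
	err := walk(root, map[string]bool{}, func(m *Module) error {
		// Hypothetical path mapping: foo.thrift -> foo/foo.go.
		name := strings.TrimSuffix(m.ThriftPath, ".thrift")
		return addFile(files, name+"/"+name+".go", []byte("package "+name))
	})
	fmt.Println(len(files), err) // 3 <nil>
}
```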
Let’s dive into generateModule, the core generation process for a single module, which returns one <path, contents> pair for the map above. Some parameters are omitted to keep the logic simple and clear.
func generateModule(
	m *compile.Module,
	o *Options,
	// ...
) (outputFilepath string, contents []byte, err error) {
	// ...
	// create an internal generator
	g := NewGenerator(o)

	if len(m.Constants) > 0 {
		for _, constantName := range sortStringKeys(m.Constants) {
			if err := Constant(g, m.Constants[constantName]); err != nil {
				return "", nil, err
			}
		}
	}

	if len(m.Types) > 0 {
		for _, typeName := range sortStringKeys(m.Types) {
			if err := TypeDefinition(g, m.Types[typeName]); err != nil {
				return "", nil, err
			}
		}
	}

	// ...

	if len(m.Services) > 0 {
		if err = Services(g, m.Services); err != nil {
			return "", nil, fmt.Errorf("could not generate code for services: %v", err)
		}
	}

	// render the accumulated declarations into the output contents
	buff := new(bytes.Buffer)
	if err := g.Write(buff, nil); err != nil {
		return "", nil, err
	}
	return outputFilepath, buff.Bytes(), nil
}
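The main process is straightforward: using an internal generator object, it generates constants, types and services in turn. Names are iterated in sorted order, since Go’s map iteration order is unspecified and sorting keeps the generated output deterministic.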
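sortStringKeys is not shown in the excerpt. A hypothetical generic version (Go 1.18+) that returns a map's keys in sorted order, so that declarations come out in the same order on every run:

```go
package main

import (
	"fmt"
	"sort"
)

// sortStringKeys returns the keys of m in ascending order.
func sortStringKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	constants := map[string]int{"MaxRetries": 3, "DefaultPort": 8080, "Version": 2}
	// Always prints DefaultPort, MaxRetries, Version in that order.
	for _, name := range sortStringKeys(constants) {
		fmt.Printf("const %s = %d\n", name, constants[name])
	}
}
```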
For constants, TemplateFunc is a utility wrapper that registers a function for use inside the template, and DeclareFromTemplate renders the template while making sure the constant
declarations are unique within the same namespace.
func Constant(g Generator, c *compile.Constant) error {
	err := g.DeclareFromTemplate(
		`<formatDoc .Doc><if canBeConstant .Type>const<else>var<end> <constantName .Name> <typeReference .Type> = <constantValue .Value .Type>`,
		c,
		TemplateFunc("constantValue", ConstantValue),
		TemplateFunc("canBeConstant", canBeConstant),
		TemplateFunc("constantName", constantName),
	)
	return wrapGenerateError(c.Name, err)
}
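The template above uses < and > as action delimiters instead of the usual {{ and }}. The same mechanism is available in Go's standard text/template package through Delims and Funcs; the sketch below renders a simplified constant declaration with it. Constant, canBeConstant and exportName are simplified, hypothetical stand-ins, and formatDoc, typeReference and DeclareFromTemplate's uniqueness check are not modeled.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Constant holds just enough for the template; Type is already a Go type name.
type Constant struct {
	Name  string
	Type  string
	Value string
}

// canBeConstant: Go only allows const for basic types, so anything else
// (slices, maps, structs) has to be declared with var.
func canBeConstant(typ string) bool {
	switch typ {
	case "bool", "int8", "int16", "int32", "int64", "float64", "string":
		return true
	}
	return false
}

// exportName upper-cases the first (ASCII) letter so the name is exported.
func exportName(s string) string {
	if s == "" {
		return s
	}
	return strings.ToUpper(s[:1]) + s[1:]
}

// Delims and Funcs must be set before Parse.
var constTmpl = template.Must(template.New("constant").
	Delims("<", ">").
	Funcs(template.FuncMap{
		"canBeConstant": canBeConstant,
		"constantName":  exportName,
	}).
	Parse(`<if canBeConstant .Type>const<else>var<end> <constantName .Name> <.Type> = <.Value>`))

func declare(c Constant) (string, error) {
	var b strings.Builder
	if err := constTmpl.Execute(&b, c); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, _ := declare(Constant{Name: "foo", Type: "int32", Value: "42"})
	fmt.Println(out) // const Foo int32 = 42
	out, _ = declare(Constant{Name: "ports", Type: "[]int32", Value: "[]int32{80, 443}"})
	fmt.Println(out) // var Ports []int32 = []int32{80, 443}
}
```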
TypeDefinition is also straightforward: it dispatches on the three kinds of TypeSpec, and the templating process for each is basically the same as for Constant.
func TypeDefinition(g Generator, spec compile.TypeSpec) error {
	switch s := spec.(type) {
	case *compile.EnumSpec:
		return enum(g, s)
	case *compile.StructSpec:
		return structure(g, s)
	case *compile.TypedefSpec:
		return typedef(g, s)
	default:
		panic(fmt.Sprintf("%q is not a defined type", spec.ThriftName()))
	}
}
Services generates code for all services into a single file and stores the code in the generator to be written. ServiceFunction follows essentially the same templating approach as above,
but with more unavoidable detail around a function definition.
func Services(g Generator, services map[string]*compile.ServiceSpec) error {
	for _, serviceName := range sortStringKeys(services) {
		s := services[serviceName]
		for _, functionName := range sortStringKeys(s.Functions) {
			function := s.Functions[functionName]
			if err := ServiceFunction(g, s, function); err != nil {
				return fmt.Errorf(
					"could not generate types for %s.%s: %v",
					s.Name, functionName, err)
			}
		}
	}
	return nil
}
Summary
There are plenty of examples in gen/internal/tests that help you understand Go code generation more comprehensively.