How to use bi-directional gRPC in go

Most robots don’t have a great deal on on-board processing power and so when we considered the basic architecture of The Social Robot it was clear we need to offload as much work as we can. In the first architecture post I settled on using gRPC as the main brain-body transport. It’s now time to start building out the brain and body code and the communication between these components.

In order to keep the length manageable this article will implement the simplest possible server and a “fake body” client. The following articles will add authentication and build clients that run on the actual robots.

You can find all the code discussed here in this pull request.

What is gRPC?

If you’ve been working on web services exposing REST or GraphQL endpoints then gRPC may seem a little strange to you, so before getting down to code let’s see what gRPC is and how it differs from other web technologies. The “g” in “gRPC” stands for “Google” - gRPC is an open-source framework from Google that is designed to improve upon and replace a Google-internal framework. The “RPC” stands for “Remote Procedure Call” - the idea is that you can write code that invokes functions that execute not on the machine running the rest of your program but a remote machine via a network connection. “Procedure” is not a very common term these days and originates from languages such as Pascal (and possibly older ones, I’m not sure), but you can just think of it as equivalent to “function”.

Before REST became the most common way to build web-services, various flavours of RPC were the normal way for services to interact. Originally companies built proprietary mechanisms such as Sun RPC. Eventually standards bodies created frameworks suchs as DCE and CORBA - these were fairly heavyweight systems and (if I remember correctly) awkward to set up, expensive or both. As the web became more popular standards such as XML-RPC and later SOAP that used HTTP as the transport and XML to describe the content became more popular as these were much simpler and easier to get started with and people were comfortable exposing HTTP endpoints on the internet.

This may all be starting to sound like ancient history, so why is gRPC a thing? In two words: performance and efficiency. It’s very convenient to manipulate JSON data in a client running in a web browser but JSON encoded data is not very space efficient and can be relatively slow to parse. If you’re exchanging data between two services or your clients are not web browsers then it can be much faster and more space-efficient to use a binary encoding.

gRPC uses protocol buffers - commonly referred to as “protobuf” - to define a language neutral and platform neutral binary representation for the messages to be exchanged between a server and client. Tools are provide to read a protobuf file and generate code to serialise and de-serialise data in a variety of languages.

gRPC adds to a protobuf a network transport that takes advantage of modern web standards such as HTTP/2, and QUIC and provides mechanisms for encryption and authentication.

One of the cool things about gRPC (and the reason I decided to use it) is that it supports bi-directional streaming so that not only can clients stream messages to the server, but the service can independently stream messages back to the client. This means that the robot (client) can send events to the brain (server) and the brain can send commands back to the robot without requiring any polling, separate connection back from brain to robot or some custom protocol. I could have opted to just open a TCP/IP connection or websocket and do everything else myself, but why take on additional work if someone else has already done it?

How to define a bi-directional service

gRPC supports several types of remote call, all of which can be mixed together in the same service:

single function call - client sends a single message to the service and gets a single response back
client streaming - the client sends a stream of messages to the server and gets a single response back
server streaming - the client sends a single message to the server, and gets a stream of messages back
bi-directional streaming - the client sends a stream of messages to the server, and the serve sends an independent stream of messages back

In all these cases the client has to initiate the request - the server cannot invoke functions on the client.

We’ll almost certainly want to add more later, but for now let’s define a service that accepts a stream of events from the robot and sends back a stream of actions for the robot to take.

With the gRPC extensions to protobuf, this service looks like this:

  
service TheSocialRobot {
    rpc EventStream(stream BodyEvent) returns (stream BodyCommand) {}
}

The function is called “EventStream”. Its parameters are defined in the brackets after the name and the response, if any, in the brackets after return. In this case the parameter is a stream of BodyEvent messages from the client and the response is a stream of BodyCommand messages from the server.

We’ll figure out the format for body events later, so for now, they just contain a message ID:

  
// event from robot (the body) containing current state
message BodyEvent {
    int32 id = 1;
}

Each field in a protobuf message has an integer ID, typically these are incremented by 1 for each field. BodyEvent has a single field, called “id” with an identifier of 1.

The commands from the brain are a little more complex. We can imagine that the brain might want the robot to perform a number of actions - for example waving its arm and saying “hello” - so we want a BodyCommand to contain a sequence of actions and a way to indicate which, if any, should occur at the same time. The simplest way to do this feels like including a delay with each action with zero meaning “do this immediately” and a non-zero positive integer meaning “wait this number of milliseconds”. Actions that should happen at the same time will have the same delay.

We’re not going to try and define all possible actions right now, so we’ll start with a single action “say” so we can say “Hello World”.

  
message Say {
    string text = 1;
}

message Action {
    // delay in milliseconds before triggering the action
    int32 delay = 1;
    oneof action {
        Say say = 2;
    }
}

// message from brain to body, instructing the body do take one or more actions
message BodyCommand {
    int32 id = 1;
    repeated Action actions = 2;
}

We need a bit of boilerplate to indicate what version of protocol buffers we are using so the complete code looks like this:

  
syntax = "proto3";

option go_package = "github.com/TheSocialRobot/BrainCore/thesocialrobot";

package thesocialrobot;

service TheSocialRobot {
    rpc EventStream(stream BodyEvent) returns (stream BodyCommand) {}
}

// event from robot (the body) containing current state
message BodyEvent {
    int32 id = 1;
}

message Say {
    string text = 1;
}

message Action {
    // delay in milliseconds before triggering the action
    // could use the Duration type here, but don't think we need nanosecond
    // precision and the second/nanosecond split complicates things
    int32 delay = 1;
    oneof action {
        Say say = 2;
    }
}

// message from brain to body, instructing the body do take one or more actions
message BodyCommand {
    int32 id = 1;
    repeated Action actions = 2;
}

Writing the server

Defining the content of the messages exchanged between body (client) and brain (server) is all very well, but we need code to actually make anything happen. For the server, I’ve chosen to use go (at least initially) due to its performance, relatively simplicity, easy of deployment, community support and because it is one of the supported languages for gRPC.

If you’re not already set up with go and the protobuf compiler, take a look at: go install, protoc installation and gRPC quickstart. For me it was something like this (I already had go set up):

  
# get the protobuf compiler, protoc, for Linux
wget https://github.com/protocolbuffers/protobuf/releases/download/v21.5/protoc-21.5-linux-x86_64.zip

# unpack the zip file and move the protoc command to a convenient directory in your path
mkdir tmp-protoc ;  cd tmp-protoc
unzip protoc-21.5-linux-x86_64.zip
mv bin/proto ~/.local/bin 
cd .. ; rm -rf tmp-protoc

# install gRPC plugins for go
go install google.golang.org/protobuf/cmd/protoc-gen-go@v1.28
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.2

Once you’re setup with go and gRPC then you can generate the go code for the service described the protobuf configuration in thesocialrobot/thesocialrobot.proto as follows:

  
protoc --go_out=. --go_opt=paths=source_relative --go-grpc_out=. --go-grpc_opt=paths=source_relative thesocialrobot/thesocialrobot.proto

This will generate the following files:

thesocialrobot.pb.go - go code to serialise/de-serialise the messages defined in the .proto file.
thesocialrobot_grpc.pb.go - go client stub code and gRPC server code for the service defined in the .proto file.

In go we need to import the generated files:

  
import pb "github.com/TheSocialRobot/BrainCore/thesocialrobot"

The generated go code defines an interface that we need to implement in our server. In thesocialrobot_grpc.pb.go:

  
// TheSocialRobotServer is the server API for TheSocialRobot service.
// All implementations must embed UnimplementedTheSocialRobotServer
// for forward compatibility
type TheSocialRobotServer interface {
    EventStream(TheSocialRobot_EventStreamServer) error
    mustEmbedUnimplementedTheSocialRobotServer()
}

We’ve only defined one method in our service, so we just need to implement EventStream as well as embedding the UnimplementedTheSocialRobotServer interface to allow for forward compatibility.

Since go does not require that we explicitly state that we’re implementing an interface we just need to define a struct and implement the EventStream method on it

  
type theSocialRobotServer struct {
    pb.UnimplementedTheSocialRobotServer
}

func (s *theSocialRobotServer) EventStream(stream pb.TheSocialRobot_EventStreamServer) error {
    // TBD
}

TheSocialRobot_EventStreamServer is a generated type that provides access to the stream of messages from the client, using stream.Recv(), and allows us to send messages from the server, using stream.Send().

In practice, we’d want to process incoming and outgoing streams independently, but for this “Hello World” example I’m just receiving a message from the client and sending a message back telling the client to say “Hello World” without any delay.

We first receive a message and check for errors (no exceptions in go):

  
_, err := stream.Recv()
if err == io.EOF {
    return nil
}
if err != nil {
    return err
}

We don’t care about the contents of the message at the moment, so we ignore it with _.

We then need to construct a BodyCommand instance containing a single embedded Say action:

  
command := &pb.BodyCommand{
    Id:      1,
    Actions:  []*pb.Action{{Delay: 0, Action: &pb.Action_Say{Say: &pb.Say{Text: "Hello World"}}}}, 
}

and then send it to the client:

  
stream.Send(command)

The complete function looks like this:

  
func (s *theSocialRobotServer) EventStream(stream pb.TheSocialRobot_EventStreamServer) error {
    for {
    // TODO handle events from the client
        _, err := stream.Recv()
        if err == io.EOF {
            return nil
        }
        if err != nil {
            return err
        }

        log.Printf("Received event, sending one command")
        // respond with a single command
        // TODO eventually we'll decouple receiving events from sending commands
        command := &pb.BodyCommand{
           Id:      1,
           Actions:  []*pb.Action{{Delay: 0, Action: &pb.Action_Say{Say: &pb.Say{Text: "Hello World"}}}}, 
        }
        stream.Send(command)
    }
}

We now need a way to wire this implementation of EventStream to incoming network requests. First we need to open a socket to listen for connections:

  
port := 50051 // we'll implement command line arguments leter
lis, err := net.Listen("tcp", fmt.Sprintf("localhost:%d", *port))
if err != nil {
    log.Fatalf("failed to listen: %v", err)
}

We now need to tell gRPC start a server

  
var opts []grpc.ServerOption
grpcServer := grpc.NewServer(opts...)

We then register our service with the server:

  
pb.RegisterTheSocialRobotServer(grpcServer, new(theSocialRobotServer))

Finally, we give the socket to the gRPC server:

  
grpcServer.Serve(lis)

Writing the client

The code running on the NAO and Alpha 2 robots will not be go. We’ll use C++ for NAO in order to integrate with the NAOqi SDK. For Alpha 2 we’ll use java or kotlin since Alpha 2 is android based. However, for testing purposes we’ll create a client in go which I’ve called “fake body”.

protoc created a client stub for us in addition to the server code - in this case an interface called TheSocialRobotClient with one method EventStream.

  
type TheSocialRobotClient interface {
    EventStream(ctx context.Context, opts ...grpc.CallOption) (TheSocialRobot_EventStreamClient, error)
}

Calling EventStream on this stub will result in the corresponding handler on the server being called.

Before we can use this client interface we need to create a connection to the server and create an instance of the client.

  
var opts []grpc.DialOption
serverAddr := "localhost:50051"
conn, err := grpc.Dial(*serverAddr, opts...)
if err != nil {
    log.Fatalf("fail to dial: %v", err)
}
defer conn.Close()
client := pb.NewTheSocialRobotClient(conn)

Now we have a client instance, we can invoke the EventStream method and get back an interface we can use to stream messages to the server and receive messages back.

  
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
stream, err := client.EventStream(ctx)
if err != nil {
    log.Fatalf("client.EventStream failed: %v", err)
}

In the same way as in the server code we can call stream.Send() to send a message and stream.Recv() to receive a message. Receiving io.EOF as an error means that the other end of the connection has closed the stream - there are no more messages to read.

  
in, err := stream.Recv()
if err == io.EOF {
    close(waitc)
    return
}
if err != nil {
    log.Fatalf("client.EventStream failed: %v", err)
}

Since a BodyCommand from the server can contain multiple actions of different types, we use a type switch to determine what we need to do.

  
log.Printf("Got message %d", in.Id)
for _, action := range in.Actions {
    switch op := action.Action.(type) {
    case *thesocialrobot.Action_Say:
        log.Printf("delay %d, say %s", action.Delay, op.Say.Text)
    }
}

Since the simple server above only responds when it receives a message, we now need to give it a kick by sending an event. For now we send only a single event and close the stream.

  
event := &pb.BodyEvent{Id: 1}
if err := stream.Send(event); err != nil {
    log.Fatalf("client.EventStream: stream.Send(%v) failed: %v", event, err)
}
stream.CloseSend()

In the pull request I’ve packaged the interesting client code into a function that sends a message and uses a channel to wait for a response before completing.

  
func runEventStream(client pb.TheSocialRobotClient) {
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    stream, err := client.EventStream(ctx)
    if err != nil {
        log.Fatalf("client.EventStream failed: %v", err)
    }
    waitc := make(chan struct{})
    go func() {
        for {
            in, err := stream.Recv()
            if err == io.EOF {
                close(waitc)
                return
            }
            if err != nil {
                log.Fatalf("client.EventStream failed: %v", err)
            }
            log.Printf("Got message %d", in.Id)
            for _, action := range in.Actions {
                switch op := action.Action.(type) {
                case *thesocialrobot.Action_Say:
                    log.Printf("delay %d, say %s", action.Delay, op.Say.Text)
                }
            }
        }
    }()
    event := &pb.BodyEvent{Id: 1}
    if err := stream.Send(event); err != nil {
        log.Fatalf("client.EventStream: stream.Send(%v) failed: %v", event, err)
    }
    stream.CloseSend()
    <-waitc
}

Running the code

This will hardly be the most exciting demo. Start the server in one terminal:

go run core/main.go

and the client (fake body) in another terminal:

go run fake-body/main.go

Output: server

2022/09/20 17:17:41 Received event, sending one command

Output: client

2022/09/20 17:17:41 Got message 1
2022/09/20 17:17:41 delay 0, say Hello World

Conclusion

We’ve taken a first step, but there is still a long way to go! Conspicuously absent is any form of encrypted transport to protect the privacy of client/server traffic (although the PR does have some code copied from the gRPC route guide example to configure TLS) or authentication. We’ll fix that soon. We also need to set up CI/CD so our code gets built and tested automatically.

Next steps

Authentication, encryption, tests and CI/CD are all important, but this is “The Social Robot” so it’s about time we got some code running on an actual robot and that’s what we’ll do next.

How to use bi-directional gRPC in go

What is gRPC?

How to define a bi-directional service

Writing the server

Writing the client

Running the code

Conclusion

Next steps

Further Reading

I've been doing the social robot wrong

Empowering the Bedridden: How Robots Are Changing Lives by Enabling Work and Social Engagement

Bridging the Digital Divide: Can Humans and AIs Forge True Friendships?