Insights from Paper (Part I) - Zanzibar: Google's Consistent, Global Authorization System
Introduction
Online users are required to be authorized to access digital objects. The authorization is done using access control lists known as ACLs.
Zanzibar is a global system for storing and evaluating ACLs built and used inside Google for many services like Google Calendar, Google Cloud, Google Drive, Google Maps, Photos, and YouTube.
Zanzibar preserves the causal ordering of user actions. In this way, it provides external consistency while ACLs, and the digital objects they protect, are being modified.
Zanzibar scales to trillions of ACLs and millions of authorization requests per second to support services used by billions of people.
Zanzibar has maintained a 95th-percentile latency of less than 10 ms and an availability of more than 99.999% over 3 years of production use.
Let’s start with an example where an online user requires an authorization check to confirm that the user can operate on a digital object.
A web-based photo storage service lets its users share some photos with other users while keeping their other photos private.
In this case, the service must check whether a photo has been shared with a user before allowing that user to view it.
In a nutshell, Zanzibar stores permissions and performs authorization checks based on these permissions.
There are multiple benefits of a single authorization system:
It provides consistent semantics and user experience across applications.
The system makes it easier for applications to interoperate.
A common infrastructure can be built on top of this system.
It spares individual teams from each having to solve hard data consistency and scalability problems on their own.
Zanzibar has the following goals:
Correctness: It must ensure consistency of access control decisions.
Flexibility: It must support a rich set of access control policies.
Low latency: It must respond quickly.
High availability: It must reliably respond to requests.
Large scale: It needs to protect billions of objects shared by billions of users.
Zanzibar is built on top of two things:
Simple data model
Powerful configuration language
Zanzibar allows clients to create, modify, and evaluate ACLs through a remote procedure call (RPC) interface.
Let’s understand what an ACL looks like.
Example 1: “User U has relation R to object O”
Example 2: “A set of users S has relation R to object O”
These examples are self-explanatory. There is one subtle point to note here. The set S needs a definition. And the good thing is that it can be specified in terms of another object-relation pair. It means one ACL can refer to another ACL.
Let’s make this more concrete. Group membership can define a set S: in this class of ACL, the object is a group, and the relation is member.
Let us again take an example of what an authorization check looks like. We will come back to this topic in detail.
Example: “Does user U have relation R to object O?”
Zanzibar operates on a global scale. It stores over two trillion ACLs and performs millions of authorization checks per second.
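The ACL data does not lend itself to geographic partitioning, because an authorization check for any object can come from anywhere in the world.
Zanzibar therefore replicates all ACL data in tens of geographically distributed data centers and distributes the load across thousands of servers worldwide.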
Zanzibar supports global consistency of access control decisions through two features:
It respects the order in which ACL changes are committed.
It ensures that authorization checks are based on ACL data no older than a client-specified change.
Let’s take an example to make the second point crystal clear. A client can remove a user from a group. The client is then assured that subsequent membership checks reflect that removal.
Before we dive deep into the details, let’s look at the below quote from the paper:
The main contributions of this paper lie in conveying the engineering challenges in building and deploying a consistent, world-scale authorization system.
Model, Language, and API
Relation Tuples
ACLs are collections of object-user or object-object relations represented as relation tuples.
Groups are simply ACLs with membership semantics.
Relation tuples can be represented using a convenient text notation as given below:
⟨tuple⟩ ::= ⟨object⟩‘#’⟨relation⟩‘@’⟨user⟩
⟨object⟩ ::= ⟨namespace⟩‘:’⟨object_id⟩
⟨user⟩ ::= ⟨user_id⟩ | ⟨userset⟩
⟨userset⟩ ::= ⟨object⟩‘#’⟨relation⟩
Here user_id is an integer, and object_id is a string. Each client configures its own namespaces, and a namespace configuration specifies the relations available in that namespace. We will cover namespace configuration in detail in a later section.
The above diagram shows a few relations, like owner, member, and viewer.
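To make the text notation concrete, here is a minimal Python sketch of a parser for it. The class and function names (ObjectRef, Userset, RelationTuple, parse_tuple) are my own and not from the paper, and the sketch assumes that namespaces and relation names never contain ':', '#', or '@'.

```python
from typing import NamedTuple, Union


class ObjectRef(NamedTuple):
    namespace: str
    object_id: str


class Userset(NamedTuple):
    obj: ObjectRef
    relation: str


class RelationTuple(NamedTuple):
    obj: ObjectRef
    relation: str
    user: Union[int, Userset]  # user_id is an integer, per the grammar above


def parse_object(text: str) -> ObjectRef:
    # <object> ::= <namespace> ':' <object_id>
    namespace, sep, object_id = text.partition(":")
    if not sep or not namespace or not object_id:
        raise ValueError(f"bad object: {text!r}")
    return ObjectRef(namespace, object_id)


def parse_tuple(text: str) -> RelationTuple:
    # <tuple> ::= <object> '#' <relation> '@' <user>
    left, sep, user_text = text.partition("@")
    if not sep or not user_text:
        raise ValueError(f"missing '@user' in {text!r}")
    obj_text, sep, relation = left.partition("#")
    if not sep or not relation:
        raise ValueError(f"missing '#relation' in {text!r}")
    if "#" in user_text:
        # <userset> ::= <object> '#' <relation>
        userset_obj, _, userset_relation = user_text.partition("#")
        user: Union[int, Userset] = Userset(parse_object(userset_obj), userset_relation)
    else:
        user = int(user_text)  # raises ValueError if the user id is not an integer
    return RelationTuple(parse_object(obj_text), relation, user)


direct = parse_tuple("doc:readme#owner@10")
via_group = parse_tuple("doc:readme#viewer@group:eng#member")
```

The first tuple reads as "user 10 is an owner of doc:readme". The second reads as "every member of group:eng is a viewer of doc:readme", which is how one ACL refers to another.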
Consistency Model
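ACL checks must respect the order in which users modify ACLs and object contents. The paper calls the failure to do so the “new enemy” problem. It arises when the ordering between ACL updates is not honored, or when old ACLs are applied to new content.
The paper gives two examples. In Example A, Alice removes Bob from the ACL of a folder and then asks Charlie to move new documents into that folder, where document ACLs inherit from the folder ACL. Bob should not be able to see the new documents, but he might if the ACL check ignores the order of the two ACL changes. In Example B, Alice removes Bob from the ACL of a document and then asks Charlie to add new content to it. Bob should not be able to see the new content, but he might if the check uses a stale ACL from before his removal.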
To prevent the “new enemy” problem, Zanzibar provides two key consistency properties: external consistency and snapshot reads with bounded staleness. Let’s understand them.
External consistency: Zanzibar assigns a timestamp to each ACL or content update. Two causally related updates x ≺ y will be assigned timestamps that reflect the causal order: Tx < Ty. If a snapshot read of the ACL database observes an update x, it will observe all updates that happen causally before x.
Snapshot reads with bounded staleness: The next problem to solve is avoiding applying old ACLs to new contents. Given a content update at timestamp Tc, a snapshot read at timestamp ≥ Tc ensures that the ACL check will observe all ACL updates that happen causally before the content update.
So evaluating ACL checks against causally ordered snapshots solves the problem. The question is how Zanzibar implements this on a global scale.
The answer is that Zanzibar stores ACLs in the Google Spanner database, which provides both guarantees globally. You can read my post on the Google Spanner paper here.
I will cover the essence here. Spanner’s TrueTime mechanism assigns each ACL write a microsecond-resolution timestamp. These write timestamps reflect the causal ordering between writes, providing external consistency.
One possible solution is to always evaluate ACLs at the latest snapshot, which reflects all ACL writes. Logically this is correct, but it would require global data synchronization with high-latency round trips and would limit availability.
Instead, Zanzibar uses the following protocol, which relies on cooperation from Zanzibar clients, to allow most checks to be evaluated on already replicated data.
A Zanzibar client requests a token called a zookie for each content version when the content modification is about to be saved. Zanzibar encodes a current global timestamp in the zookie and ensures all prior ACL writes have lower timestamps. The client stores the zookie with the content change in an atomic write to the client storage.
The client sends this zookie in the subsequent ACL check requests to ensure that the check snapshot is at least as fresh as the timestamp for the content version.
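The zookie protocol can be sketched with a toy, single-process service. Everything here is an assumption made for illustration: ToyAclService, the integer counter standing in for TrueTime timestamps, the 8-byte zookie encoding, and the replica_timestamp field that models how far a local replica has caught up. ACLs are plain strings with readable user names to keep the example short.

```python
from typing import Optional, Tuple


def encode_zookie(timestamp: int) -> bytes:
    # Hypothetical encoding; clients are meant to treat a zookie as opaque bytes.
    return timestamp.to_bytes(8, "big")


def decode_zookie(zookie: bytes) -> int:
    return int.from_bytes(zookie, "big")


class ToyAclService:
    def __init__(self) -> None:
        self._clock = 0              # stand-in for globally ordered timestamps
        self._log = []               # (timestamp, "add" | "remove", acl), in timestamp order
        self.replica_timestamp = 0   # freshness of the already replicated data

    def _next_timestamp(self) -> int:
        self._clock += 1
        return self._clock

    def write(self, op: str, acl: str) -> int:
        ts = self._next_timestamp()
        self._log.append((ts, op, acl))
        return ts

    def _acls_at(self, snapshot: int) -> set:
        # Snapshot read: apply every write with timestamp <= snapshot.
        present = set()
        for ts, op, acl in self._log:
            if ts > snapshot:
                break
            if op == "add":
                present.add(acl)
            else:
                present.discard(acl)
        return present

    def content_change_check(self, acl: str) -> Tuple[bool, bytes]:
        # Evaluated at the latest snapshot; all prior ACL writes have lower timestamps.
        ts = self._next_timestamp()
        return acl in self._acls_at(ts), encode_zookie(ts)

    def check(self, acl: str, zookie: Optional[bytes] = None) -> bool:
        # The snapshot must be at least as fresh as the zookie's timestamp.
        wanted = decode_zookie(zookie) if zookie is not None else 0
        snapshot = max(self.replica_timestamp, wanted)
        return acl in self._acls_at(snapshot)


service = ToyAclService()
service.write("add", "doc:1#viewer@bob")
service.write("add", "doc:1#editor@alice")
service.replica_timestamp = 2                    # the replica has seen both writes
service.write("remove", "doc:1#viewer@bob")      # not yet replicated

# Alice saves new content and stores the zookie together with it.
allowed, zookie = service.content_change_check("doc:1#editor@alice")

stale_answer = service.check("doc:1#viewer@bob")          # no zookie: replica snapshot
fresh_answer = service.check("doc:1#viewer@bob", zookie)  # at least as fresh as the content
```

Without the zookie, the toy replica still believes Bob is a viewer. With the zookie stored next to the new content version, the check is forced onto a snapshot that already includes Bob's removal.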
Let’s walk through Example A to understand this better.
ACL updates A1 and A2 will be assigned timestamps with TA1 < TA2.
Bob will not be able to see the new documents added by Charlie. If a check is evaluated at T < TA2, the document ACLs will not yet include the folder ACL.
If a check is evaluated at T ≥ TA2 > TA1, the check will observe update A1, which removed Bob from the folder ACL.
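Here is a small simulation of Example A, assuming that update A1 removes Bob from the folder ACL and update A2 makes the document ACLs inherit from the folder ACL. The State class, the concrete timestamps 10 and 20, and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    folder_viewers: set = field(default_factory=lambda: {"bob"})
    doc_inherits_folder: bool = False


def update_a1(state: State) -> None:
    # A1: Bob is removed from the folder ACL.
    state.folder_viewers.discard("bob")


def update_a2(state: State) -> None:
    # A2: the new documents now inherit the folder ACL.
    state.doc_inherits_folder = True


T_A1, T_A2 = 10, 20  # any timestamps with T_A1 < T_A2 would do
LOG = [(T_A1, update_a1), (T_A2, update_a2)]


def snapshot(t: int) -> State:
    # A snapshot read at t observes every update with timestamp <= t.
    state = State()
    for ts, update in LOG:
        if ts <= t:
            update(state)
    return state


def bob_can_view(state: State) -> bool:
    return state.doc_inherits_folder and "bob" in state.folder_viewers


answers = {t: bob_can_view(snapshot(t)) for t in (5, 15, 20, 25)}

# The anomaly that snapshot reads rule out: observing A2 without A1.
inconsistent = State()
update_a2(inconsistent)
new_enemy = bob_can_view(inconsistent)
```

Every snapshot denies Bob access, whether it is taken before or after TA2. Only the inconsistent view, which applies A2 while missing A1, lets Bob in, and that is exactly the “new enemy” case.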
Namespace Configuration
Each Zanzibar client must configure their namespaces before storing relation tuples. A namespace configuration specifies two things:
Relations (each has a string name, such as viewer or editor, and a relation config)
Storage parameters (sharding settings and an encoding for object IDs)
Relation Configs and Userset Rewrites
Consider a scenario where a client wants every user with editor permission on an object to also have viewer permission on it. With relation tuples alone, the client would need to store an extra relation tuple for each object. This would be wasteful and would make it hard to change the policy across all such objects.
To solve the above problem, Zanzibar defines object-agnostic relationships via userset rewrite rules in relation configs.
The simple namespace configuration shown in the paper has concentric relations. The viewer contains the editor, and the editor contains the owner.
Userset rewrite rules are defined per relation in a namespace. Each rule specifies a function that takes an object ID as input and outputs a userset expression tree. Each leaf node of the tree can be any of the following:
1. _this: Returns all users from stored relation tuples for the ⟨object#relation⟩ pair.
2. computed_userset: Computes, for the input object, a new userset. For example, this allows the userset expression for a viewer relation to refer to the editor userset on the same object.
3. tuple_to_userset: Computes a tupleset from the input object, fetches relation tuples matching the tupleset, and computes a userset from every fetched relation tuple.
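The three leaf types can be sketched as a tiny evaluator. This is not Zanzibar's configuration language: the Python dictionaries, the rule tuples, and the sample data are assumptions made for illustration. The sketch only combines leaves with union, reads everything from memory, and has no cycle detection.

```python
# Stored relation tuples: (object, relation) -> users. A user is a plain id
# (a string here for readability), another object (for "parent"), or a
# userset written as an (object, relation) pair.
TUPLES = {
    ("doc:readme", "owner"): {"alice"},
    ("doc:readme", "editor"): {"bob"},
    ("doc:readme", "viewer"): {("group:eng", "member")},
    ("doc:readme", "parent"): {"folder:A"},
    ("folder:A", "viewer"): {"carol"},
    ("group:eng", "member"): {"dave"},
}

# Rewrite rules per (namespace, relation); each list is a union of leaves.
REWRITES = {
    ("doc", "owner"): [("_this",)],
    ("doc", "editor"): [("_this",), ("computed_userset", "owner")],
    ("doc", "viewer"): [
        ("_this",),
        ("computed_userset", "editor"),
        ("tuple_to_userset", "parent", "viewer"),
    ],
}


def check(obj: str, relation: str, user: str) -> bool:
    namespace = obj.split(":", 1)[0]
    # Relations without an explicit rule fall back to their stored tuples.
    rules = REWRITES.get((namespace, relation), [("_this",)])
    for rule in rules:
        kind = rule[0]
        if kind == "_this":
            for member in TUPLES.get((obj, relation), set()):
                if member == user:
                    return True
                # A userset member points at another <object#relation> pair.
                if isinstance(member, tuple) and check(member[0], member[1], user):
                    return True
        elif kind == "computed_userset":
            # Same object, different relation (viewer includes editor, and so on).
            if check(obj, rule[1], user):
                return True
        elif kind == "tuple_to_userset":
            # Follow e.g. doc#parent to each parent folder, then check folder#viewer.
            _, tupleset_relation, target_relation = rule
            for parent in TUPLES.get((obj, tupleset_relation), set()):
                if check(parent, target_relation, user):
                    return True
    return False


viewers = {u: check("doc:readme", "viewer", u) for u in ("alice", "bob", "carol", "dave", "eve")}
```

Alice becomes a viewer through two computed_userset hops (owner, then editor), Carol through the parent folder, and Dave through group membership. None of them needs an explicit viewer tuple on doc:readme.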
API
Zanzibar provides APIs for clients to read and write relation tuples, watch tuple updates, and inspect the effective ACLs.
The Zanzibar API uses the zookie token we discussed in the consistency model section. Let’s elaborate on that before proceeding.
A zookie is an opaque byte sequence encoding a globally meaningful timestamp that reflects an ACL write, a client content version, or a read snapshot.
Read
A read request specifies one or multiple tuplesets and an optional zookie.
Each tupleset specifies the keys of a set of relation tuples.
All tuplesets in a read request are processed at a single snapshot.
If the zookie from a previous write response is given in the read request, Zanzibar reads a snapshot no earlier than that write.
Zanzibar reads the same snapshot as a previous read if the zookie from the earlier read response is given in the subsequent request.
If the request doesn’t contain a zookie, Zanzibar will choose a reasonably recent snapshot.
Write
Zanzibar provides an API to modify a single relation tuple to add or remove an ACL.
Clients can also modify all tuples related to an object through a read-modify-write process with optimistic concurrency control. For this, clients use a per-object lock tuple to detect write races.
The read-modify-write process looks like this:
1. Read all relation tuples of an object, including a per-object lock tuple.
2. Generate the tuples to write or delete. Send the writes to Zanzibar, with the condition that the writes will be committed only if the lock tuple has not been modified since the read.
3. If the write condition is unmet, go back to step 1.
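The three steps above can be sketched as an optimistic retry loop. ToyTupleStore, conditional_write, and replace_acl are hypothetical names. The sketch models the lock tuple as a per-object version counter, which is an assumption about how a client might detect that the lock tuple changed.

```python
class ToyTupleStore:
    def __init__(self) -> None:
        self._tuples = {}        # object -> set of "relation@user" strings
        self._lock_version = {}  # object -> version of the per-object lock tuple

    def read(self, obj: str):
        # Step 1: return the object's tuples together with its lock tuple.
        return set(self._tuples.get(obj, set())), self._lock_version.get(obj, 0)

    def conditional_write(self, obj: str, new_tuples: set, expected_lock: int) -> bool:
        # Commit only if the lock tuple has not been modified since the read.
        if self._lock_version.get(obj, 0) != expected_lock:
            return False
        self._tuples[obj] = set(new_tuples)
        self._lock_version[obj] = expected_lock + 1  # every write touches the lock tuple
        return True


def replace_acl(store: ToyTupleStore, obj: str, transform, max_attempts: int = 5) -> set:
    for _ in range(max_attempts):
        tuples, lock = store.read(obj)                    # step 1
        updated = transform(tuples)                       # step 2: compute the new tuples
        if store.conditional_write(obj, updated, lock):   # step 2: conditional write
            return updated
        # Step 3: another writer won the race, so start over from the read.
    raise RuntimeError("gave up after repeated write races")


store = ToyTupleStore()
store.conditional_write("doc:readme", {"viewer@1"}, 0)
final_acl = replace_acl(store, "doc:readme", lambda tuples: tuples | {"editor@3"})
```

Capping the retries with max_attempts is a design choice of this sketch, not something the paper prescribes. A real client would likely add backoff between attempts as well.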
Watch
Zanzibar provides a Watch API so that clients can maintain secondary indices of relation tuples if required.
A watch request specifies one or more namespaces and a zookie representing the time to start watching.
A watch response contains all tuple modification events in ascending timestamp order, from the requested start timestamp to a timestamp encoded in a heartbeat zookie included in the watch response.
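As a rough illustration, here is how a client might use watch responses to maintain a reverse index from users to the objects they can reach. The changelog layout, the watch and apply_events functions, and the use of plain integers in place of zookies are all assumptions. The sketch also treats the start timestamp as exclusive, so that resuming from a heartbeat does not replay events.

```python
from collections import defaultdict

# (timestamp, namespace, operation, (object, relation, user_id)), deliberately unsorted
CHANGELOG = [
    (3, "doc", "add", ("doc:readme", "viewer", 11)),
    (1, "doc", "add", ("doc:readme", "owner", 10)),
    (2, "group", "add", ("group:eng", "member", 11)),
    (4, "doc", "remove", ("doc:readme", "viewer", 11)),
]


def watch(changelog, namespaces, start_ts, latest_ts):
    # Return matching events in ascending timestamp order, plus a heartbeat
    # timestamp that the client can use as the start of its next watch request.
    events = [e for e in changelog if e[1] in namespaces and start_ts < e[0] <= latest_ts]
    events.sort(key=lambda event: event[0])
    return events, latest_ts


def apply_events(index, events):
    # Secondary index: user_id -> set of (object, relation) pairs.
    for _ts, _namespace, op, (obj, relation, user) in events:
        if op == "add":
            index[user].add((obj, relation))
        else:
            index[user].discard((obj, relation))


index = defaultdict(set)
events, heartbeat = watch(CHANGELOG, {"doc"}, 0, 3)
apply_events(index, events)
```

A second call such as watch(CHANGELOG, {"doc"}, heartbeat, 4) would pick up only the later removal, so the index can be kept current incrementally.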
Check
A check request specifies a userset, represented by ⟨object#relation⟩, a putative user, and a zookie.
Like a read, a check is always evaluated at a consistent snapshot no earlier than the given zookie.
The client can send a content-change check to authorize content modifications. A content-change check request does not carry a zookie and is evaluated at the latest snapshot.
If a content change is authorized, the check response includes a zookie for clients to store along with object contents and use for subsequent checks of the content version.
Expand
The Expand API takes an ⟨object#relation⟩ pair and an optional zookie.
It returns the effective userset for the pair.
Expand follows indirect references expressed through userset rewrite rules.
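A minimal sketch of what expanding an ⟨object#relation⟩ pair could look like, assuming only union nodes and only computed_userset-style inclusion between relations. The STORED and INCLUDES tables, the dictionary-based tree shape, and the flatten helper are hypothetical. Zookies and snapshots are left out entirely.

```python
# (object, relation) -> members; a member is a user id or an (object, relation) userset.
STORED = {
    ("doc:readme", "owner"): [10],
    ("doc:readme", "editor"): [11],
    ("doc:readme", "viewer"): [("group:eng", "member")],
    ("group:eng", "member"): [12, 13],
}

# Hypothetical rewrite rules: (namespace, relation) -> relation it also includes.
INCLUDES = {("doc", "viewer"): "editor", ("doc", "editor"): "owner"}


def expand(obj: str, relation: str) -> dict:
    children = []
    for member in STORED.get((obj, relation), []):
        if isinstance(member, tuple):
            children.append(expand(*member))  # follow the indirect userset
        else:
            children.append(member)           # leaf: a plain user id
    namespace = obj.split(":", 1)[0]
    included = INCLUDES.get((namespace, relation))
    if included is not None:
        children.append(expand(obj, included))  # follow the rewrite rule
    return {"userset": f"{obj}#{relation}", "union": children}


def flatten(tree: dict) -> set:
    # Collapse the tree into the set of user ids it contains.
    users = set()
    for child in tree["union"]:
        users |= flatten(child) if isinstance(child, dict) else {child}
    return users


viewer_tree = expand("doc:readme", "viewer")
effective_viewers = flatten(viewer_tree)
```

Keeping the result as a tree, rather than flattening it right away, lets a client see why each user ended up in the effective userset.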
In the next part, I will cover the architecture, implementation, and the rest of the paper.