A few things I wish ATProto had done a bit differently

Bluesky's AT Protocol is well worth studying, for nerds like me at least. There are some interesting pieces of technology in there, and I think they've taken a valuable step towards building the kind of "decentralised social web", or whatever you want to call it, that I'd like to see.

There are a few places, though, where I wish they'd taken things just a bit further.

I can't speak for Bluesky, and I don't know why they made the decisions they did. There are certainly some technical caveats to all this, though, and I can see some sound reasons why they might have done things the way they did, so I'll take a look at those too.

Update

Bryan Newbold of Bluesky has responded to this article and offered some insights into the rationale behind their decisions.

The did:plc identifier: a single point of failure

Much has been written about this, not all of it rational, and I'm not going to rehash that discussion. I'd just like to add that it seems to be nearly possible to build a fully decentralised did:plc variant today.

The operation log format is self-certifying and contains enough information for the updates to be put into order from the records alone, regardless of what order they were received in; replicating them between servers presents a few practical problems, as anything does, but they're not insurmountable. The official did:plc server even contains an endpoint for retrieving the entire operation log, and people have used this to build their own mirrors of the service.
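
To make the ordering property concrete, here's a minimal sketch in Python. Real PLC operations are signed CBOR objects referenced by CID; the dicts below, and the `cid`/`prev` strings, are stand-ins for that.

```python
# Sketch: reconstruct the order of a did:plc-style operation log from the
# records alone. Each operation names its predecessor ("prev"); the genesis
# operation has prev = None. Forks (two operations sharing the same prev)
# are exactly what the recovery window exists to resolve, and are ignored here.

def order_ops(ops):
    """Return ops sorted into chain order, regardless of arrival order."""
    by_prev = {op["prev"]: op for op in ops}
    chain = []
    cursor = None  # start from the genesis operation
    while cursor in by_prev:
        op = by_prev[cursor]
        chain.append(op)
        cursor = op["cid"]
    return chain

# Operations arriving out of order still yield the same chain.
received = [
    {"cid": "c", "prev": "b"},
    {"cid": "a", "prev": None},
    {"cid": "b", "prev": "a"},
]
assert [op["cid"] for op in order_ops(received)] == ["a", "b", "c"]
```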

A client needs some way to decide which replica it should query, or which replica it should submit updates to. A simple approach would be to replicate the entire did:plc namespace onto every participating server; clients can then send queries or updates to any server they choose, perhaps the one with the lowest latency, or just the one they find most trustworthy. Another approach might be consistent hashing: use the hash of the did:plc identifier together with the hashes of the server hostnames to assign a set of 'preferred' servers to each individual DID.
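
One way to sketch the second idea (the hostnames and the choice of k=3 here are invented): rendezvous hashing, a close cousin of consistent hashing, lets every client compute the same 'preferred' set with no coordination at all.

```python
import hashlib

def preferred_servers(did, servers, k=3):
    """Rendezvous-hash a DID onto k 'preferred' servers.

    Every participant ranks the servers by hash(did + host) and takes the
    top k, so clients and servers agree on the set without coordination.
    Adding or removing one server only reshuffles the DIDs it touched."""
    def score(host):
        return hashlib.sha256(f"{did}|{host}".encode()).hexdigest()
    return sorted(servers, key=score)[:k]

servers = ["plc1.example", "plc2.example", "plc3.example", "plc4.example"]
chosen = preferred_servers("did:plc:abc123", servers)
```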

That part seems solvable, though. The two further sticking points I can see here are:

Read-after-write consistency

From the Motivation section (emphasis mine):

Bluesky Social PBC developed DID PLC when designing the AT Protocol (atproto) because we were not satisfied with any of the existing DID methods. We wanted a strongly consistent, highly available, recoverable, and cryptographically secure method with fast and cheap propagation of updates.

It's not often necessary to update the contents of a did:plc DID document, but when it is, it's desirable that a subsequent read should return the updated version of the document, not the old version prior to the write. An important example of this is the signup process: when somebody joins Bluesky for the first time, a new did:plc identifier is created for them, and then they continue with the process - filling in their profile information or whatever - which requires the new identifier to be resolvable.

With a single server, this is easy: once the write has completed, a subsequent read will return the new identifier. In a distributed system, this is harder; what happens if the read ends up being handled by a server which hasn't received the update yet?

The consistent hashing approach I mentioned above can solve this at the expense of performance. If the write has succeeded on all of the 'preferred' servers, then a read can be issued to any of them; alternatively, if the write has succeeded on just one of the 'preferred' servers, then a read from all of them will return at least one valid result. (At least, assuming that the set of 'preferred' servers doesn't change between the write and the read.) How long will this take? That's anyone's guess; it could vary widely depending on network conditions.
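
The two variants described above are the extremes of the usual quorum rule: with N preferred replicas, any choice of W write acknowledgements and R read replies such that W + R > N guarantees that the read set overlaps the write set, so at least one reply is fresh. A toy sketch, with replicas as in-memory dicts and versions as integers (real writes can fail or time out, which is where the variable latency comes from):

```python
def write(replicas, w, did, version, doc):
    """Write to replicas until w have acknowledged (here: simply the first w)."""
    for rep in replicas[:w]:
        rep[did] = (version, doc)
    return w

def read(replicas, r, did):
    """Read from any r replicas (here: the last r) and keep the freshest reply."""
    replies = [rep[did] for rep in replicas[-r:] if did in rep]
    return max(replies)[1] if replies else None

replicas = [{}, {}, {}]                               # N = 3
write(replicas, w=2, did="did:plc:xyz", version=1, doc="doc-v1")
assert read(replicas, r=2, did="did:plc:xyz") == "doc-v1"   # W + R = 4 > N: overlap guaranteed
assert read(replicas, r=1, did="did:plc:xyz") is None       # W + R = 3, not > N: can miss the write
```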

For the signup process in particular, worst-case performance matters, because first impressions count. It's probably OK for most DID document updates to propagate relatively slowly, but the initial creation of a DID really needs to be both fast and robust.

The 72-hour recovery window

From the specification:

The PLC server provides a 72hr window during which a higher authority rotation key can "rewrite" history, clobbering any operations (or chain of operations) signed by a lower-authority rotation key.

Again, this is easy with a single server: just check the time.
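
A sketch of that single-server check, using the 72-hour figure from the spec (the function name is mine; a real server would compare its own receipt timestamps):

```python
from datetime import datetime, timedelta

RECOVERY_WINDOW = timedelta(hours=72)

def may_rewrite(original_received_at: datetime, rewrite_received_at: datetime) -> bool:
    """True if a higher-authority rotation key may still clobber the operation.

    Both timestamps come from the same server's clock, which is exactly
    why this doesn't generalise to a distributed system."""
    return rewrite_received_at - original_received_at <= RECOVERY_WINDOW
```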

In a distributed system, this requires a way for all the relevant servers to agree on whether or not an update was received within 72 hours of the one it overrides. Checking the local time won't cut it, due to unavoidable clock drift between the servers. Including the current time in the write operation won't work either, because this allows an attacker to manipulate it - negating the whole point of the mechanism.

I think this requirement might need to be rephrased or relaxed a bit to build a robust decentralised did:plc variant.

Auth server is tied to the PDS

When somebody logs in to a client application using ATProto, the client needs to identify the authentication server that is responsible for them. This happens in (roughly) three stages:

  • Find their DID document.
  • From the DID document, find their PDS.
  • Query the PDS to find the auth server to use.
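
The last two stages can be sketched as pure lookups. The document shapes below follow the `#atproto_pds` service entry from the published DID document format and the protected-resource metadata (RFC 9728) that atproto's OAuth profile uses at the PDS; the hostnames are invented and the network fetches are elided.

```python
def find_pds(did_doc: dict) -> str:
    """Stage 2: pull the PDS endpoint out of the DID document."""
    for svc in did_doc["service"]:
        if svc["id"].endswith("#atproto_pds"):
            return svc["serviceEndpoint"]
    raise LookupError("no PDS listed in DID document")

def find_auth_server(resource_metadata: dict) -> str:
    """Stage 3: the PDS's protected-resource metadata names its auth server."""
    return resource_metadata["authorization_servers"][0]

# Illustrative sample documents, not real network responses.
did_doc = {"service": [{"id": "#atproto_pds",
                        "type": "AtprotoPersonalDataServer",
                        "serviceEndpoint": "https://pds.example.com"}]}
meta = {"authorization_servers": ["https://auth.example.com"]}

assert find_pds(did_doc) == "https://pds.example.com"
assert find_auth_server(meta) == "https://auth.example.com"
```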

This has a couple of unfortunate consequences: first, the presence of an ATProto PDS is required for the login process to work at all; and second, all users on that PDS must use the same auth server.

The first consequence is easy to forgive. After all, Bluesky's goal was to build ATProto. Requiring the presence of their own PDS only starts to look problematic when you try to generalise their work to a more diverse network.

As for the second consequence - in my opinion, the system would be a lot more flexible if a user could choose their auth server independently of their PDS, by listing it as a separate service record in their DID document. (This would also have the benefit of saving a round-trip during login; there are quite a few round-trips involved in an ATProto login flow, and most of them can't be made in parallel, which is why ATProto login can often feel kind of slow.)
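
Hypothetically, that might look like a second service entry alongside the PDS one. To be clear, `#atproto_auth` and its type are invented here; nothing like it exists in any current spec.

```python
# A hypothetical DID document with the auth server listed independently of
# the PDS. Only "#atproto_pds" is real; the "#atproto_auth" entry is invented.
did_doc = {
    "service": [
        {"id": "#atproto_pds",
         "type": "AtprotoPersonalDataServer",
         "serviceEndpoint": "https://pds.example.com"},
        {"id": "#atproto_auth",                    # invented
         "type": "AtprotoAuthorizationServer",     # invented
         "serviceEndpoint": "https://auth.example.com"},
    ]
}

def find_service(did_doc, fragment):
    """Look up a service endpoint by its id fragment."""
    for svc in did_doc["service"]:
        if svc["id"].endswith(fragment):
            return svc["serviceEndpoint"]
    return None
```

With this in place, a client could resolve the PDS and the auth server from the DID document in a single step, instead of a sequential lookup through the PDS.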

The caveat here is access token validation. The auth server issues the client application with an access token; the client then presents this token to the PDS to prove that it is authorised to write data on the user's behalf. This requires the PDS to be able to inspect an access token and figure out whether it is valid, and if it is, which user it represents.

Traditionally OAuth doesn't specify how this validation should work; it's left to the auth server and resource server (PDS) to agree on a validation method between themselves. This is easy enough if the auth server and the PDS are tightly bound to one another; in a more "open world" system, this becomes trickier.

Standardising the format of the access token might be one approach, though in my view this would require the resource server to implement logic which doesn't necessarily belong there:

  • Which auth server is responsible for this user's account? Was the token issued by that server?
  • What permissions do they have?
  • Have they been blocked from this service?
  • Are there any moderation restrictions, such as posting limits for a newly created account?
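
For illustration, here is roughly what a standardised token format buys you: a token the PDS can verify locally. A real deployment would use JWTs signed with the auth server's private key, so the PDS needs only the public key; the HMAC shared secret here just keeps the sketch self-contained.

```python
import base64, hashlib, hmac, json

SECRET = b"shared-between-auth-server-and-pds"  # illustrative only

def sign(claims: dict) -> str:
    """Auth server side: encode the claims and append an HMAC tag."""
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    tag = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{tag}"

def verify(token: str) -> dict:
    """PDS side: check the tag, then recover the claims."""
    body, tag = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(body))

tok = sign({"sub": "did:plc:xyz", "scope": "atproto"})
assert verify(tok)["sub"] == "did:plc:xyz"
```

Note what this doesn't buy: even with a verifiable signature, none of the questions in the list above are answered by the token itself. The PDS still has to know which auth server should have issued it and what the account's current standing is.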

Token exchange approaches introduce a second auth server into the system, which seems to me like a better place to make these checks. They're a lot more complex, though, and I'm not at all confident that I understand all of the tradeoffs.

Repository format assumes a single writer

The ATProto repository format is based on a monotonic chain of updates originating from a single network node. Other nodes can verify the chain of signatures and hashes to be sure that they have an authentic copy of the data, but it's not possible for multiple nodes to write concurrently to the shared data store.

This is a shame because the underlying Merkle Search Tree data structure lends itself neatly to the construction of CRDTs: data types which allow for multiple concurrent writers in a robust, efficient and mathematically satisfying way. With a few adjustments to Bluesky's design it ought to be possible to construct a replicated / distributed variant of the PDS. Why choose between the mushrooms and a self-hosted PDS, when you can have both at the same time?
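
The simplest concrete example of the property I mean is a last-writer-wins register: two replicas accept writes independently, and merging in either order converges on the same state. Timestamps here are (counter, replica_id) pairs so ties break deterministically.

```python
def merge(a, b):
    """Merge two (timestamp, value) states: the higher timestamp wins.
    Commutative, associative and idempotent, so replicas can exchange
    states in any order and still converge."""
    return max(a, b)

r1 = ((1, "r1"), "first draft")    # written on replica 1
r2 = ((2, "r2"), "second draft")   # written concurrently on replica 2

assert merge(r1, r2) == merge(r2, r1)           # order doesn't matter
assert merge(r1, r2)[1] == "second draft"       # the later write wins
```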

As with the above points, there are caveats. From the Repository Diffs section:

However, an observer which does not know the full state of the repository at the "older" revision can not reliably enumerate all of the records that have been removed from the repository. Such an observer also can not see the previous values of deleted or updated records, either as full values or by CID. Note that the latter is an intentional design goal for the diff concept: it is desired that content deletion happen rapidly and not "draw attention" to the content which has been deleted.

So far, the best I've been able to do involves adding a "tombstone" record to identify entries which used to exist but have since been deleted. Retaining the value CID is not necessary, but the key has to remain forever. Tombstones might not be necessary; this may yet be fixable, given sufficient time and coffee.
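
A sketch of the tombstone workaround, assuming a last-writer-wins merge; the `TOMBSTONE` marker and record shapes are invented. The point is that a replica which saw the record but never saw the delete still learns about it on merge, at the cost of keeping the key around forever.

```python
TOMBSTONE = "__deleted__"  # stand-in for a dedicated tombstone record type

def put(state, key, value, stamp):
    """Apply a write if it's newer than what we have."""
    if key not in state or stamp > state[key][0]:
        state[key] = (stamp, value)

def delete(state, key, stamp):
    """Deletion is just another write; the key itself must remain forever."""
    put(state, key, TOMBSTONE, stamp)

def merge(a, b):
    """LWW merge of two replica states; commutative."""
    merged = dict(a)
    for key, (stamp, value) in b.items():
        put(merged, key, value, stamp)
    return merged

r1, r2 = {}, {}
put(r1, "post/1", "hello", (1, "r1"))
delete(r1, "post/1", (2, "r1"))
put(r2, "post/1", "hello", (1, "r1"))   # r2 saw the post but missed the delete

assert merge(r2, r1)["post/1"][1] == TOMBSTONE   # the delete wins everywhere
```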

Also, CRDTs are far from a silver bullet. Concurrent updates to single records are certainly possible, but more complex updates will still require some thought to make them robust in the presence of concurrent writers.