Hi! I’m Ethan (@eagraf.dev). This is my first blog post on ATProto, and also my first for Habitat. I’ve been a quiet follower of all things ATProto for the last couple of years, and I spend a lot of my time thinking about interoperability on the internet. See ya on the ATmosphere 👋
Attention in the Atmosphere is moving towards permissioned data. But so far, much of the discussion has been theoretical. At Habitat, we wanted to get our hands dirty and start building a permissioned data app today.
In this two-part post, we will present our approach, challenges, and findings for building a simple web application that requires permissioned data: a personal calendar app. A calendar app is the perfect example for the low-social use case: it clearly requires permissioned data, and has just enough inter-user interaction to generate surprisingly complex scenarios that we can use to validate our thinking.
Importantly, while building out this app and proving out our design, we examined both the protocol and application components of this puzzle. In this first post, we will start with a strong claim: permissioned data shared via AppViews is not truly private. We believe that an option for a higher bar of private data is a necessary component, and one that will unlock new possibilities for the entire ecosystem.
Bad Actors, Bad UX
In Daniel Holmgren’s first post in his permissioned data diaries, he asserts that “Apps need to see the data.” This is a significant statement, and one that we’re going to continue to challenge through both our calendar series, and our (sneak peek!) other prototypes. We hope to convince you that this is not the case for all apps, or even the majority of apps that can be built on the web.
When Daniel says that “apps need to see the data”, he is specifically referring to the component of the ATProtocol stack known as an AppView. In the public data model, these are backend servers that aggregate and index the data that lives on people’s PDSes by consuming it all through the Firehose and creating their own “copy” of that data, in whatever format they want. Because the data is public anyway, AppViews can do arbitrary things with it, and actually need to in order to provide certain types of user experiences, such as a recommendation feed that can pull posts from anyone on a platform.
What we are challenging is whether this should be allowed for permissioned data. We believe this falls apart quickly when we dive into non-social-media-network cases, but let’s explore an example specifically for the app we are building, a calendar. For this and future posts, we are going to refer to a user’s PDS and a user’s devices as “first-party”, while AppViews and other service providers are “third-parties”, as colloquially understood by the industry.
When it comes to permissioned data, the elephant in the room for AppViews is what to do about bad actors. Imagine four friends who each use a different calendar provider, each of which runs its own AppView server:
Alice uses GoodCal
Bob uses LeakyCal
Charlie uses EvilCal
Doris uses NewCal
Each friend owns all their own data: invites for events Alice created live on her PDS; RSVPs to those events by Bob, Charlie, and Doris live on theirs. However, since GoodCal, LeakyCal, EvilCal, and NewCal are AppViews that each serve the calendar to one of the four friends, they must each be authorized to read all of the records involved. That means that Alice, Bob, Charlie, and Doris have now copied their data to four different third parties.
Now, we are exposed to several failure modes:
LeakyCal is benevolent but buggy. It implements a buggy version of getCalendarEvents, which causes unrelated events to be returned to Bob when they use it.
EvilCal exfiltrates the event data it gathers to a data broker. A close read of EvilCal’s terms and conditions will reveal that the user consented to this arrangement.
NewCal is a new calendar provider on the scene - Alice has never heard of it before, and is reluctant to authorize her data to be indexed by NewCal’s servers. After her bad experiences with authorizing LeakyCal and EvilCal, she has lost trust in other third parties.
Bad actors have always existed, but the threat they pose grows with the AppView-based model of permissioned data. Now your privacy is only as good as the weakest link. For a hacker interested in your data, the new vector of attack is to create a compromised AppView, and then socially engineer some online interaction that requires authorizing it. Even if ways of mitigating this are found, users will now need to be educated on this new type of attack.
Beyond privacy, there are UX issues to consider as well. Imagine Alice, Bob, Charlie and Doris will not budge on their choice of calendar. In order for each of their calendar apps to work, all of them will need to authorize each other’s calendar AppViews to see their data. As more users and AppViews are added, the number of authorization steps grows quadratically, because every user has to authorize every other AppView. New entrants into the calendar market will face resistance as users learn to be distrustful of unrecognized providers. This outcome is at odds with ATProtocol’s mission to be open to new apps wishing to join the network.
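To put rough numbers on that growth, here is a minimal sketch. It assumes a simplified model that is ours, not part of any ATProto spec: each user is served by exactly one AppView, and each user must grant one authorization to every AppView other than their own. The function name `authorizationSteps` is hypothetical.

```typescript
// Hypothetical model: every user is served by one AppView and must authorize
// every *other* distinct AppView used by the group.
function authorizationSteps(providerByUser: Record<string, string>): number {
  const providers = Object.keys(providerByUser).map((user) => providerByUser[user]);
  const distinctAppViews = new Set(providers).size;
  // Each user skips their own AppView, hence the "- 1".
  return providers.length * Math.max(distinctAppViews - 1, 0);
}

const friends = {
  alice: "GoodCal",
  bob: "LeakyCal",
  charlie: "EvilCal",
  doris: "NewCal",
};

console.log(authorizationSteps(friends)); // 12

// A fifth friend on a fifth provider adds eight more consent steps.
console.log(authorizationSteps({ ...friends, erin: "OtherCal" })); // 20
```

Under this model, four friends already need twelve separate authorizations before everyone's calendar works.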
Not to mention, the average end user is unaccustomed to this level of friction just to create a calendar event. Imagine seeing multiple authorization popups just to send out a calendar event. (That is, if we do agree to show consent screens to the user before allowing a third party to aggregate their permissioned data.)
These problems are structural. We can invent ways of working around them, but at best their reach can be limited, never eliminated. As long as third-party AppViews are in the mix with permissioned data, we will need to find ways to compensate for the new attack vectors and UX challenges they introduce.
PDS as the database?
Now we’ve seen that AppViews expose both privacy risk and unavoidable complexity to the user. Can we ignore them entirely? Well, if AppViews don’t exist, all requests for data must go directly to the PDS. In essence, the PDS will be our database.
Making the PDS the primary database has some interesting advantages, even though it has none of the traditional features you’d expect like indexes, transactions and sharding. But if we remember that many personal computing applications don’t store that much data per user, then it becomes significantly simpler to just run listRecords on the collections we care about, and do all the joining on the client side.
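As a minimal sketch of "just run listRecords", the helper below pages through one collection until the cursor runs out. It assumes the request and response shape of the public `com.atproto.repo.listRecords` XRPC query (`repo`, `collection`, `limit`, `cursor` in; `records` and an optional `cursor` out). A permissioned-data PDS may expose a different method and will need authentication, which is omitted here. The names `fetchCollection` and `FetchJson` are hypothetical, and the fetcher is injected so the sketch can run without a network.

```typescript
// One record as returned by com.atproto.repo.listRecords.
interface RepoRecord<T> {
  uri: string;
  cid: string;
  value: T;
}

interface ListRecordsPage<T> {
  cursor?: string;
  records: RepoRecord<T>[];
}

// Hypothetical fetch-like helper: takes a URL, resolves to the parsed JSON body.
type FetchJson = (url: string) => Promise<unknown>;

async function fetchCollection<T>(
  pdsUrl: string,
  repo: string,
  collection: string,
  fetchJson: FetchJson,
  limit = 100,
): Promise<RepoRecord<T>[]> {
  const all: RepoRecord<T>[] = [];
  let cursor: string | undefined;
  do {
    const params = new URLSearchParams({ repo, collection, limit: String(limit) });
    if (cursor) params.set("cursor", cursor);
    const url = `${pdsUrl}/xrpc/com.atproto.repo.listRecords?${params.toString()}`;
    const page = (await fetchJson(url)) as ListRecordsPage<T>;
    all.push(...page.records);
    // Stop when the server omits the cursor or returns an empty page.
    cursor = page.records.length > 0 ? page.cursor : undefined;
  } while (cursor);
  return all;
}
```

In a browser, the injected fetcher could be as small as `(url) => fetch(url).then((r) => r.json())`, with whatever auth headers the PDS requires added on top.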
We will dive deeper into this technique, which we call Fetch-the-World (FTW!), in our next post. For now the basic idea is that we can query the different collections we need as flat lists, and then link together any references on the client side. For many cases, fetching literally all the records in a collection might be fine, but if we are worried about large collections, we can always make use of cursors or filter by time to reduce the payload on page load or specific fetches.
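The client-side linking step might look roughly like the sketch below. The record shapes are hypothetical stand-ins rather than a real lexicon: an event with a `startsAt` timestamp, and an RSVP whose `subject` holds the AT-URI of the event it answers. The optional `since` argument illustrates the filter-by-time idea on data that has already been fetched.

```typescript
// Hypothetical record shapes, for illustration only.
interface EventRecord {
  name: string;
  startsAt: string; // ISO 8601 in UTC, e.g. "2025-06-01T17:00:00Z"
}

interface RsvpRecord {
  subject: string; // AT-URI of the event being answered
  status: "going" | "maybe" | "declined";
}

interface Stored<T> {
  uri: string;
  value: T;
}

interface EventWithRsvps {
  uri: string;
  event: EventRecord;
  rsvps: Stored<RsvpRecord>[];
}

// Join two flat lists on the client: index events by URI, then attach RSVPs.
function joinEventsWithRsvps(
  events: Stored<EventRecord>[],
  rsvps: Stored<RsvpRecord>[],
  since?: string,
): EventWithRsvps[] {
  const byUri = new Map<string, EventWithRsvps>();
  for (const e of events) {
    // Same-format UTC ISO timestamps compare correctly as plain strings.
    if (since !== undefined && e.value.startsAt < since) continue;
    byUri.set(e.uri, { uri: e.uri, event: e.value, rsvps: [] });
  }
  for (const r of rsvps) {
    // RSVPs pointing at unknown or filtered-out events are dropped.
    byUri.get(r.value.subject)?.rsvps.push(r);
  }
  return Array.from(byUri.values());
}

const joined = joinEventsWithRsvps(
  [
    { uri: "at://alice/event/1", value: { name: "Dinner", startsAt: "2025-06-01T17:00:00Z" } },
    { uri: "at://alice/event/0", value: { name: "Old party", startsAt: "2019-01-01T00:00:00Z" } },
  ],
  [
    { uri: "at://bob/rsvp/1", value: { subject: "at://alice/event/1", status: "going" } },
    { uri: "at://charlie/rsvp/1", value: { subject: "at://alice/event/1", status: "maybe" } },
    { uri: "at://doris/rsvp/1", value: { subject: "at://alice/event/0", status: "declined" } },
  ],
  "2025-01-01T00:00:00Z",
);

console.log(joined.length); // 1
console.log(joined[0].rsvps.length); // 2
```

The whole join is a single pass over each list, which is cheap at the per-user data sizes this approach targets.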
Let’s run some napkin math for a few personal app use-cases to see how much data we are talking about over a 10 year period:
These calculations are obviously making a lot of assumptions, but we can see that if we ignore blobs, the footprint of your digital life isn’t all that big. It’s conceivable to just call listRecords for the last year of data, and then stitch it back together from there. The full power of a database is not necessary for these cases.
Conventional database systems have some killer features that developers might miss. Being able to do complex joins is great, but it isn’t so important if you are able to fetch-the-world. Not having transactions is a real problem, but many apps do just fine without them. NoSQL databases offer scalability - but there is literally nothing to scale out if the AppView is not in play.
There is a hidden opportunity here as well. Most databases in the wild are focused on one domain, and centralize all user data for that domain in one place. This model makes applications great at going deep in their own area, but terrible at integrating with systems that exist in others. ATProtocol by its nature flips this on its head: the PDS pulls together one user’s data across many domains. Building cross-domain apps for personal use is a perfect fit - all of a user's data across every domain is already right next to each other! Brittle integrations that used to be a pain to build are now effortless.
Additionally, code is getting easier to write. The next wave of applications will include many built by people without formal education or experience in things like security and databases. In a world where building apps is easier than ever, the underlying architecture needs to provide clean abstractions and strict guardrails. A fetch-the-world, frontend-only app with permissions enforced by the PDS offloads multiple tricky topics from the app developer:
Understanding the nuts and bolts of querying a database
Spinning up and maintaining (cloud) infrastructure
Enforcing permissions
As my 4th grade teacher used to say, Keep It Simple Stupid.
Moving Towards Local-first
You may have noticed that our main example, a calendar, is not strictly limited to one user's personal data. A calendar necessarily displays data from many users. If we are removing AppViews from the mix, we need some way of fetching data from another user’s PDS. Our solution is to implement metadata forwarding from one PDS to another. As this mechanism evolved, we realized a system for syncing permissions between nodes would be necessary. This is a hard problem, so you will have to wait until part two to find out more about how we solve it. All this is to say that we think the PDS has some sort of PDS-to-PDS sync mechanism in its future.
At the same time, we are spending a lot of time talking about how to best go about making the client application heavier. We are explicitly considering moving data processing operations usually reserved for the database, such as joins and filters, into the client. Furthermore, all business logic now needs to be handled by the client. Is there a frontend framework for dealing with these increased responsibilities?
It just so happens that another movement has been taking root, one that both centers the client application and revolves around algorithms for syncing data: local-first software.
For a long time, the ATmosphere and the local-first communities have existed as friendly neighbors - sharing ideals, but differing in approach. Several projects have already begun to mix these approaches. We believe that as permissioned data leads us into new territory, ATProtocol + local-first could deliver best-in-class applications from both a user experience and privacy standpoint.
The return of user agency
So far we’ve talked about how eliminating AppViews has advantages for both user and developer experience. But, ultimately, the reason we are compelled to go down this route is values based. We are building a platform for user data agency, and allowing third parties to see large swaths of user data creates misaligned incentives.
We’ve imagined a new setup where the two primary pieces of software involved are the client application and the PDS. The client runs on a machine the user controls, and the user has complete freedom of choice with regard to which client they use. The user’s PDS is likely still hosted on a PDS provider’s servers (though of course the option to self-host is there!), but the user now maintains granular control over their data in a way they never could before: the permissioned data PDS provider is the only third party the user needs to trust, and it has an explicit incentive to protect user data privacy. Between credible exit, the colocation of all user data in one place, and a permissioning system that gives users the steering wheel, the end consumer is re-empowered.
Third party AppViews are still necessary in some situations. We don’t present a solution for aggregating user data (needed for feeds, search indexes, etc). Nor is this a solution for applications where heavy amounts of compute are required. But the loss of control for the user here is a real trade-off, and we should be deliberate in determining when they are truly necessary. At this stage of the ecosystem, it seems possible that we will need different systems for public data, aggregatable / third-party shareable permissioned data, and truly permissioned data as we laid out in this post. We think to count the latter out now would be a mistake.
Conclusion
The fun part about building in the ATmosphere is that you get to spend a lot of time rethinking old assumptions. And with each outdated tenet knocked down, new implications follow. These lines of thought become a labyrinth, a kind of choose your own adventure book that we are navigating as a group.
ATProto knocks down the assumption that companies own your data. The implication that follows is that the primary way data is sliced is not by company, but by user. We went ahead and tried to feel our way through the next couple of turns in the labyrinth, and we came to some conclusions that we did not expect when we started. So what are the other paths we could take? Some trade-offs we have already listed, but there are many more that were omitted.
Stay tuned for part two, where the rubber will hit the road. We will explore what it will take to get a real-life calendar app working with permissioned data, informed by the constraints we laid out in this post.
- Ethan + Arushi 🙊