This is the third post in a series about Chive, a decentralized eprint service on AT Protocol. The first post covers what Chive is, and the second covers the Chive knowledge graph. Future posts will cover open review, and discovery and citations.
This post describes Chive v0.2.0. Details may change as the project develops.
Scholarly collections
Researchers already organize research on a variety of platforms. Zotero and Mendeley manage references and reading lists, for example, and Semantic Scholar offers personalized research feeds and library features. These tools are really good at what they do.
Some of these tools even have social and discovery features: Zotero groups can share libraries; Semantic Scholar alerts you to new citations; and Mendeley suggests related papers. But the organizational work you put into one tool stays in that tool's database. If you switch from Mendeley to Zotero, you export and re-import, losing metadata along the way. If a collaborator uses a different tool, your shared list becomes a copy that drifts. Each tool is a silo that happens to hold some of the same papers and may or may not interoperate with other tools.
AT Protocol changes the storage model. Instead of each tool maintaining its own copy of your library, collections are records in your PDS, the personal data server you control. Any tool in the ecosystem can read the same records through your identity, so there's one source of truth rather than parallel copies.
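To make "any tool can read the same records" concrete, here is a minimal sketch of how a client might address the records in a PDS. It uses the standard `com.atproto.repo.listRecords` XRPC method and its `repo`, `collection`, and `limit` parameters. The host, the DID, and the `com.example.collection` record type are placeholders for illustration, not Chive's actual lexicon names.

```python
from urllib.parse import urlencode


def list_records_url(pds_host, repo, collection, limit=50):
    """Build the URL a tool could fetch to list one record type in a repo.

    pds_host, repo, and collection are supplied by the caller; the values
    used below are hypothetical placeholders.
    """
    query = urlencode({"repo": repo, "collection": collection, "limit": limit})
    return f"https://{pds_host}/xrpc/com.atproto.repo.listRecords?{query}"


# Two different tools pointing at the same identity build the same URL,
# so they read the same records rather than separate copies.
url = list_records_url(
    "pds.example.com",          # hypothetical PDS host
    "did:plc:example123",       # hypothetical DID
    "com.example.collection",   # hypothetical record type (NSID)
)
print(url)
```

The point of the sketch is only that the address depends on your identity and the record type, not on which tool is asking.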
Semble already demonstrates this in the context of scholarly collections: it lets researchers curate and share knowledge collections as portable AT Protocol records. Chive's collections extend this idea by connecting to the knowledge graph described in the previous post. Further, Chive aims to support interoperability: Chive collections can also be mirrored as Semble collections (though we're still working out a few kinks), so your curated research is visible on both platforms through the same identity.
What goes in a collection
A Chive collection is exactly the same sort of thing as the Chive knowledge graph, with the exception that the collection itself is additionally reified as a node related to all other nodes in the graph by a `contains` edge. Because it uses the same machinery, it can hold eprints, authors, reviews, endorsements, or even arbitrary Chive knowledge graph nodes and external references. External references let you include resources outside of Chive alongside items that live on the platform. Each item can have a note explaining why it's included, and a position that determines where it appears on display.
Because collection items are backed by knowledge graph nodes, they carry metadata: an author item has an identity and handle, an eprint item has a title and author list. When you browse a collection, you see resolved data, rather than bare URIs.
You can also define relationships between items within a collection. If one paper cites another, or one depends on a dataset that's also in the collection, you can record that relationship. A reader sees the connections between items, not just the items themselves.
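The structure described in this section can be sketched as a small graph: the collection is itself a node, membership is a `contains` edge, and relationships between items are ordinary edges alongside it. This is an illustrative model only. The class names, field names, and the `coll:`/`eprint:` identifiers are made up for the example and are not Chive's record schema.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    uri: str    # hypothetical identifier (an AT-URI or external URL in practice)
    kind: str   # "collection", "eprint", "author", "external", ...
    label: str  # resolved display metadata, e.g. a title or handle


@dataclass
class Edge:
    source: str
    relation: str   # "contains" for membership, otherwise e.g. "cites"
    target: str
    note: str = ""  # why the item is included (membership edges only)
    order: int = 0  # display position (membership edges only)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.uri] = node

    def add_item(self, collection_uri, item, note="", order=0):
        """Add an item and link it to the collection with a contains edge."""
        self.add_node(item)
        self.edges.append(Edge(collection_uri, "contains", item.uri, note, order))

    def relate(self, source_uri, relation, target_uri):
        """Record a relationship between two items."""
        self.edges.append(Edge(source_uri, relation, target_uri))

    def items(self, collection_uri):
        """Members of a collection as resolved nodes, in display order."""
        member_edges = [e for e in self.edges
                        if e.source == collection_uri and e.relation == "contains"]
        return [self.nodes[e.target] for e in sorted(member_edges, key=lambda e: e.order)]

    def relations_within(self, collection_uri):
        """Non-membership edges whose endpoints are both in the collection."""
        members = {n.uri for n in self.items(collection_uri)}
        return [e for e in self.edges
                if e.relation != "contains" and e.source in members and e.target in members]


g = Graph()
g.add_node(Node("coll:1", "collection", "Dynamic semantics reading list"))
g.add_item("coll:1", Node("eprint:b", "eprint", "Paper B"), note="follow-up", order=2)
g.add_item("coll:1", Node("eprint:a", "eprint", "Paper A"), note="foundational", order=1)
g.relate("eprint:b", "cites", "eprint:a")
print([n.label for n in g.items("coll:1")])
```

Because items are nodes rather than bare URIs, listing a collection returns labels and kinds directly, which mirrors the "resolved data" behavior described above.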
Collections can be listed or unlisted. Listed collections show up in discovery, search results, and tag filtering. Unlisted collections are only accessible if the owner shares the link. (Though note that, since all collections are AT Protocol records, they're technically accessible to anyone who resolves the URI directly. The visibility setting controls what Chive surfaces, not what exists.)
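A small sketch of that distinction, under the assumption that visibility is a simple flag on the indexed collection (the field names and `at://` identifiers here are hypothetical): discovery filters on the flag, while direct resolution does not.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    uri: str
    name: str
    visibility: str   # "listed" or "unlisted"
    tags: tuple = ()


def discover(index, tag=None):
    """What a discovery or search surface shows: listed collections only."""
    return [c for c in index.values()
            if c.visibility == "listed" and (tag is None or tag in c.tags)]


def resolve(index, uri):
    """Direct lookup ignores visibility: the record exists either way."""
    return index.get(uri)


index = {
    c.uri: c
    for c in [
        Collection("at://example/a", "Public reading list", "listed", ("semantics",)),
        Collection("at://example/b", "Draft notes", "unlisted", ("semantics",)),
    ]
}
print([c.name for c in discover(index, tag="semantics")])
print(resolve(index, "at://example/b").name)
```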
Building a collection
You build a collection in Chive using a creation wizard, which walks through the process in steps: set a name, description, visibility, and tags; add items by searching for eprints, authors, or knowledge graph nodes; define inter-item relationships; organize items into subcollections; optionally enable Semble mirroring; then review and save.
You can also add items to existing collections from anywhere on the platform. Each eprint and author page has an "add to collection" action that shows your existing collections and lets you pick one. For importing in bulk, a batch import dialog accepts a list of DOIs or AT-URIs, resolves them (using Crossref for DOIs), and adds the resolved items to the collection.
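As a rough illustration of the first stage of a batch import, the sketch below sorts pasted lines into DOIs, AT-URIs, and unrecognized input, and builds the Crossref REST API lookup URL for each DOI. The regular expressions are deliberately loose approximations, the function names are invented for the example, and none of this is taken from Chive's code. The actual network call and error handling are left out.

```python
import re
from urllib.parse import quote

# Loose patterns for illustration; real-world DOIs and AT-URIs have more edge cases.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
AT_URI_RE = re.compile(r"^at://[^/\s]+/[^/\s]+/[^/\s]+$")
DOI_PREFIX_RE = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def classify(identifier):
    """Return (kind, normalized value) for one pasted line."""
    text = identifier.strip()
    bare = DOI_PREFIX_RE.sub("", text)  # accept "doi:..." and doi.org URLs
    if DOI_RE.match(bare):
        return ("doi", bare.lower())    # DOIs are case-insensitive
    if AT_URI_RE.match(text):
        return ("at-uri", text)
    return ("unknown", text)


def plan_import(lines):
    """Group identifiers by kind, skipping blank lines and duplicates."""
    plan = {"doi": [], "at-uri": [], "unknown": []}
    seen = set()
    for line in lines:
        if not line.strip():
            continue
        key = classify(line)
        if key not in seen:
            seen.add(key)
            plan[key[0]].append(key[1])
    return plan


def crossref_url(doi):
    """Crossref REST API URL for a single work."""
    return "https://api.crossref.org/works/" + quote(doi)


plan = plan_import([
    "https://doi.org/10.1000/XYZ123",
    "doi:10.1000/xyz123",                           # duplicate after normalization
    "at://did:plc:example123/com.example.eprint/3kabc",  # hypothetical AT-URI
    "not an identifier",
])
print(plan)
print([crossref_url(d) for d in plan["doi"]])
```

Keeping unrecognized lines in their own bucket, rather than dropping them, lets an import dialog report exactly which inputs it could not resolve.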
Subcollections
Collections can contain other collections. A Probabilistic Dynamic Semantics collection could be a subcollection of Computational Semantics, for example. You can organize a literature review by theme or a conference proceedings by track. The service has cycle detection to prevent loops. If you delete an intermediate collection, its children are automatically re-linked to the grandparent so the hierarchy stays intact.
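Both behaviors are easy to sketch if we assume each collection has at most one parent (an assumption for this example; the post does not say how Chive stores the hierarchy). Cycle detection walks up from the proposed parent and refuses the link if it reaches the child. Deletion re-points the deleted collection's children at its own parent.

```python
class Hierarchy:
    """Toy subcollection tree: maps each collection id to its parent id."""

    def __init__(self):
        self.parent = {}  # child id -> parent id, or None for top level

    def add(self, cid, parent=None):
        self.parent[cid] = None
        if parent is not None:
            self.set_parent(cid, parent)

    def would_cycle(self, child, new_parent):
        """True if new_parent is child itself or one of child's descendants."""
        node = new_parent
        while node is not None:
            if node == child:
                return True
            node = self.parent.get(node)
        return False

    def set_parent(self, child, new_parent):
        if self.would_cycle(child, new_parent):
            raise ValueError(f"{child!r} cannot be nested under {new_parent!r}")
        self.parent[child] = new_parent

    def delete(self, cid):
        """Remove a collection and re-link its children to the grandparent."""
        grandparent = self.parent.pop(cid)
        for child, parent in self.parent.items():
            if parent == cid:
                self.parent[child] = grandparent


h = Hierarchy()
h.add("Semantics")  # hypothetical top-level collection
h.add("Computational Semantics", parent="Semantics")
h.add("Probabilistic Dynamic Semantics", parent="Computational Semantics")

# Nesting the top-level collection under its own descendant is rejected.
print(h.would_cycle("Semantics", "Probabilistic Dynamic Semantics"))

# Deleting the intermediate collection keeps the leaf attached.
h.delete("Computational Semantics")
print(h.parent["Probabilistic Dynamic Semantics"])
```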
Graph clone wizard
As mentioned in the post on the Chive knowledge graph, in addition to creating your own collections, Chive makes it possible to clone subgraphs of the community graph and link back to the community originals. Personal nodes and cloned community nodes can furthermore live within the same personal graph.
We've tried to make cloning ergonomic, so you don't have to do a ton of tedious clicking. The idea is that you start from any node in the community knowledge graph—a field like Formal Semantics, an institution via ROR, or a person—and expand outward. You control the expansion depth and which edge types to follow. From the results, you select the nodes you want, add per-node notes, name the collection, and save. The selected nodes are cloned into your personal graph with links back to the community originals, and relationships between them carry over.
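The expand-then-clone flow can be sketched as a depth-limited breadth-first search followed by a copy step. Everything here is illustrative: the edge type names, the identifiers, and the `cloned_from` field are invented, and the sketch follows edges in both directions, which may not match how Chive's wizard traverses the graph.

```python
from collections import deque


def expand(edges, start, max_depth, edge_types):
    """Breadth-first expansion from start, following only the chosen edge types.

    edges is a list of (source, relation, target) triples. Returns a dict
    mapping each reached node to its distance from start.
    """
    adjacency = {}
    for source, relation, target in edges:
        if relation in edge_types:
            adjacency.setdefault(source, []).append(target)
            adjacency.setdefault(target, []).append(source)

    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the requested depth
        for neighbor in adjacency.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


def clone(edges, selected, owner):
    """Copy selected nodes into a personal graph, keeping links to the originals.

    Relationships are carried over only when both endpoints were selected.
    """
    mapping = {uri: f"{owner}/clone/{uri}" for uri in selected}
    nodes = [{"uri": mapping[uri], "cloned_from": uri} for uri in selected]
    carried = [(mapping[s], r, mapping[t]) for s, r, t in edges
               if s in mapping and t in mapping]
    return nodes, carried


community = [
    ("formal-semantics", "subfield-of", "semantics"),
    ("dynamic-semantics", "subfield-of", "formal-semantics"),
    ("alice", "researches", "formal-semantics"),
    ("alice", "affiliated-with", "some-university"),
]
reached = expand(community, "formal-semantics", max_depth=1, edge_types={"subfield-of"})
print(sorted(reached))
nodes, carried = clone(community, ["formal-semantics", "dynamic-semantics"], owner="me")
print(carried)
```

Restricting `edge_types` is what keeps the expansion manageable: with only `subfield-of` selected, the search never wanders from a field to its researchers and on to their institutions.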
Collection feeds
The part we're most interested in feedback on is collection feeds. Every collection has a live activity stream that tracks activity across its items. The following events appear in the feed:
A tracked author published a new eprint
A new review on a tracked eprint
A new endorsement on a tracked eprint
A new annotation on a tracked eprint
A tracked author wrote a review
A tracked author gave an endorsement
A new eprint in a tracked field
A new eprint from a tracked institution
A new paper at a tracked conference
A new eprint referencing a tracked person
These feeds follow from the knowledge graph structure described in the previous post. Fields, institutions, and conferences are all nodes with typed edges between them, and collection feeds propagate events along those edges. When a new eprint is classified under a tracked field, or a tracked author publishes new work, the event reaches the collection through relationships that already exist in the graph.
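One way to picture this propagation, as a sketch rather than a description of Chive's indexer: each incoming event names a subject record plus the graph nodes that record links to (authors, fields, institutions, and so on), and an inverted index from tracked node to collection decides which feeds receive it. The event kinds and identifiers below are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str       # hypothetical event name, e.g. "eprint.published"
    subject: str    # the record the event is about
    related: tuple  # graph nodes the subject links to


def build_index(collections):
    """Invert {collection id: tracked node ids} into {node id: collection ids}."""
    index = {}
    for collection_id, tracked in collections.items():
        for node in tracked:
            index.setdefault(node, set()).add(collection_id)
    return index


def route(event, index):
    """Collections whose feed should receive this event."""
    hits = set()
    for node in (event.subject, *event.related):
        hits |= index.get(node, set())
    return hits


collections = {
    "semantics-watch": {"field:formal-semantics", "author:alice"},
    "ml-watch": {"field:machine-learning"},
    "my-reading-list": {"eprint:123"},
}
index = build_index(collections)

new_eprint = Event("eprint.published", "eprint:456",
                   ("author:bob", "field:formal-semantics"))
new_review = Event("review.created", "review:9", ("eprint:123",))

print(route(new_eprint, index))
print(route(new_review, index))
```

In this model a collection never has to poll for updates. An event reaches it only because one of the nodes it tracks is already an endpoint of an edge on the new record.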
The feed data comes from the same firehose that Chive already indexes. There's no separate polling or scraping. The intent is that a collection works more like a live dashboard for a research area than a static list.
Semble mirroring
As mentioned earlier, one of our major aims with collections is to support interoperability, so we're testing out a Semble mirroring feature. When mirroring is enabled in the creation wizard, Chive publishes Semble-compatible records alongside the collection records in your PDS. The interoperability works similarly to the backlink system described in the first post.
Data portability
As always, the important part is that if Chive disappears, your collections are still in your PDS. The items, relationships, subcollection hierarchy, and personal graph nodes are all AT Protocol records you own. Another AppView could index them and reconstruct the same view. This is the same data sovereignty model that applies to all AT Protocol data.
Collections are very much an early feature. We'd appreciate feedback from anyone who tries them out!