This is the second post in a series about Chive, a decentralized eprint service on AT Protocol. The first post covers the architecture. Future posts will cover collections, open review, and discovery and citations. You can follow the project on Bluesky.
This post describes Chive v0.1.0. Details may change as the project develops.
What knowledge graphs are
A knowledge graph stores entities and the relationships between them as nodes and edges. Wikidata is probably the most widely known, with over 100 million items covering people, places, and scientific concepts, all queryable via SPARQL. But there are a variety of others: ConceptNet is a commonsense knowledge graph used in NLP; Google's Knowledge Graph powers the info panels in search results; and in the library sciences, FAST (Faceted Application of Subject Terminology) and the Library of Congress Subject Headings have been organizing knowledge into structured vocabularies for decades.
What these have in common is that entities are typed, labeled, and connected by named relations. Natural Language Processing is a node; Machine Translation is a node; and subclass of is a relation connecting them. You can traverse the graph to discover that Natural Language Processing is related to Artificial Intelligence, which is related to Computer Science, and so on.
Chive has its own knowledge graph, stored in Neo4j and backed by AT Protocol records.
The problem with hardcoded categories
Chive uses a knowledge graph so that its vocabulary can grow organically. Most academic platforms hardcode their categories: you get a fixed dropdown of fields, a static list of licenses, or a predetermined set of contribution types. If your field isn't in the list, or if a new license emerges, someone has to update the schema, push a new release, and deploy it. In practice, categories lag behind the research they're supposed to organize, and researchers in emerging or interdisciplinary areas are left choosing the least-wrong option from a list someone else defined.
Chive takes a different approach: effectively everything that would normally be a hardcoded enum is a node in the knowledge graph.
What's in the graph
When you pick a license for your eprint, you're selecting a graph node. The same is true for fields, institutions, conferences, contribution types, document formats, repository platforms (GitHub, GitLab, Hugging Face, Zenodo, Figshare), annotation motivations, and presentation types. Each has a kind and subkind that determines its category, and all of them are AT Protocol records.
Graph nodes also show up inside rich text, and this is one of the most important interfaces to the knowledge graph. Chive's rich text model has reference types for knowledge graph nodes, Wikidata entities, academic fields, eprints, and authors. An abstract, review, or annotation can embed a reference to a knowledge graph entity inline. When a reviewer mentions a methodology in their comments, or an author references a specific dataset in their abstract, those mentions become queryable links in the graph. The consequence is that everyday scholarly writing, across abstracts, reviews, and annotations, continuously enriches the graph with typed connections that no centralized editorial process could produce at the same scale. We describe how this works for reviews in a later post, and the technical details are in a separate deep dive.
Relations between nodes are typed edges. So for instance, an edge connects Formal Semantics to its parent field, Semantics, via a broader relation. The full taxonomy is traversable: you can walk from a specific topic up to its parent field and across to related areas.
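The walk described above can be sketched in a few lines of Python. This is a minimal illustration, not Chive's implementation: the `Edge` class, the `related` relation name, and every edge other than Formal Semantics → Semantics are assumptions made for the example, and it assumes each node has at most one `broader` parent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str    # label of the node the edge starts from
    relation: str  # relation name, e.g. "broader"
    target: str    # label of the node the edge points to


# Only the first edge comes from the post; the others are hypothetical.
EDGES = [
    Edge("Formal Semantics", "broader", "Semantics"),
    Edge("Semantics", "broader", "Linguistics"),
    Edge("Semantics", "related", "Pragmatics"),
]


def ancestors(label, edges):
    """Follow 'broader' edges upward from label, nearest parent first."""
    # Assumes a single 'broader' parent per node; a later edge would win otherwise.
    parent_of = {e.source: e.target for e in edges if e.relation == "broader"}
    path, seen, current = [], {label}, label
    # The 'seen' set stops the walk if the data accidentally contains a cycle.
    while current in parent_of and parent_of[current] not in seen:
        current = parent_of[current]
        seen.add(current)
        path.append(current)
    return path


def related(label, edges):
    """Return the labels reachable from label via one 'related' edge."""
    return sorted(e.target for e in edges
                  if e.source == label and e.relation == "related")
```

Calling `ancestors("Formal Semantics", EDGES)` walks up to the parent field and beyond, and `related("Semantics", EDGES)` steps across to neighbouring areas.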
Self-describing categories
The system is fully self-describing. Every category (field, license, document-format, paper-type, contribution-type, platform, motivation, and so on) is itself a node, so adding a new category of thing to the system means adding a new node.
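One way to picture a self-describing vocabulary is a registry in which categories and their members share a single node type. The sketch below is an assumption-laden illustration, not Chive's schema: the `kind` values `"type"` and `"object"`, the `slug` field, and the `dataset-license` category are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    slug: str
    label: str
    kind: str     # assumed values: "type" for a category, "object" for a member
    subkind: str  # slug of the category node this node belongs to


class Graph:
    def __init__(self):
        self.nodes = {}

    def category(self, slug):
        """Return the category node with this slug, or None if there is none."""
        node = self.nodes.get(slug)
        return node if node is not None and node.kind == "type" else None

    def add(self, node):
        # A member is only accepted if its category already exists as a node.
        if node.kind == "object" and self.category(node.subkind) is None:
            raise ValueError(f"unknown category: {node.subkind}")
        self.nodes[node.slug] = node

    def members(self, category_slug):
        """Return the labels of all member nodes in a category, sorted."""
        return sorted(n.label for n in self.nodes.values()
                      if n.kind == "object" and n.subkind == category_slug)


graph = Graph()
graph.add(Node("license", "License", "type", "type"))
graph.add(Node("cc-by-4.0", "CC BY 4.0", "object", "license"))

# Introducing a brand-new category is just another add() call, not a schema change.
graph.add(Node("dataset-license", "Dataset license", "type", "type"))
graph.add(Node("odc-by", "ODC-By", "object", "dataset-license"))
```

Nothing in `Graph` enumerates the allowed categories; the set of valid `subkind` values is whatever category nodes currently exist in the data.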
This is what makes the knowledge graph different from a conventional taxonomy. If a linguistics subfield splits into two camps with different terminological conventions, the community can add both without waiting for a platform update. If a new open-source license gains traction, it becomes a node. The vocabulary is data that the community controls, not code that the development team maintains.
Community governance
The set of categories is community-expandable. If you need a new field or a new contribution category, you publish a proposal record in your own PDS, which is then indexed on Chive. The proposal includes the proposed node with its label, description, and any external identifiers like Wikidata QIDs, along with a rationale and supporting evidence.
The governance model is Wikipedia-style. The community discusses the proposal, trusted editors weigh in, and if it's approved by vote it becomes a new node in the taxonomy. Proposals track approval percentage, voter count, minimum vote thresholds, and consensus status. Trusted editor status is itself managed through AT Protocol records, with an elevation request process for researchers who want editorial responsibilities based on contribution record.
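As a rough sketch of how the tracked quantities relate, the following Python derives a consensus status from a vote tally. The minimum of 5 votes, the 67% approval threshold, and the status names are assumptions chosen for illustration, not Chive's actual values, and the sketch ignores any special weighting trusted editors may have.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    label: str
    approve: int = 0
    reject: int = 0

    @property
    def voter_count(self):
        return self.approve + self.reject

    @property
    def approval_percentage(self):
        # Defined as 0.0 when nobody has voted, to avoid dividing by zero.
        if self.voter_count == 0:
            return 0.0
        return 100.0 * self.approve / self.voter_count


def consensus_status(proposal, min_votes=5, threshold=67.0):
    """Classify a proposal; min_votes and threshold are hypothetical defaults."""
    if proposal.voter_count < min_votes:
        return "pending"  # not enough votes to decide either way
    if proposal.approval_percentage >= threshold:
        return "approved"
    return "rejected"
```

With these assumed defaults, a proposal with 8 approvals and 2 rejections is approved at 80%, while one with 4 approvals and no rejections stays pending because it has not reached the minimum vote count.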
The entire taxonomy lives in a dedicated graph PDS, which means the classification system can be rebuilt by replaying the firehose, just like any other AT Protocol data. Nothing about the knowledge graph depends on Chive's database being intact.
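The replay idea can be illustrated with a simplified event stream. This is a sketch under stated assumptions: real firehose messages carry more structure than the `(action, uri, record)` tuples used here, and the `did:example:graph` DID and `example.graph.node` collection name are placeholders, not Chive's identifiers.

```python
def replay(events):
    """Rebuild the node index from an ordered stream of record events."""
    nodes = {}
    for action, uri, record in events:
        if action in ("create", "update"):
            nodes[uri] = record   # the latest version of a record wins
        elif action == "delete":
            nodes.pop(uri, None)  # deleting an unknown record is a no-op
    return nodes


BASE = "at://did:example:graph/example.graph.node"  # hypothetical repo and collection

EVENTS = [
    ("create", f"{BASE}/semantics", {"label": "Semantics"}),
    ("create", f"{BASE}/formal-semantics", {"label": "Formal Semantic"}),
    ("update", f"{BASE}/formal-semantics", {"label": "Formal Semantics"}),
    ("create", f"{BASE}/scratch", {"label": "Scratch"}),
    ("delete", f"{BASE}/scratch", None),
]

rebuilt = replay(EVENTS)
```

Because the index is a pure function of the event stream, replaying the same events always yields the same state, which is what allows a lost database to be reconstructed.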
Personal graphs
The community graph still lives in a centralized PDS, maintained by the Chive team, that could go down. To reduce this risk, Chive makes it straightforward for users to clone community nodes into personal graphs stored in their own PDSes. A personal copy of Formal Semantics links back to the community node but lives in the user's PDS. Beyond cloned community nodes, personal nodes for collaborators, fields, and institutions can all be organized into collections, which we cover in a later post. (To spoil the surprise, collections are graphs conforming to exactly the same schema as the community graph.)
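Cloning can be thought of as copying a node into the user's repository while keeping a pointer to the original. In the sketch below, the `source_uri` field, the `example.graph.node` collection name, and the example DIDs are hypothetical; the actual record shape is defined by Chive's lexicons.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    uri: str
    label: str
    kind: str
    source_uri: Optional[str] = None  # set on personal copies, None on originals


def clone_to_personal(node, user_did, rkey):
    """Copy a community node into a user's repo, linking back to the original."""
    personal_uri = f"at://{user_did}/example.graph.node/{rkey}"
    return replace(node, uri=personal_uri, source_uri=node.uri)


community = Node(
    uri="at://did:example:graph/example.graph.node/formal-semantics",
    label="Formal Semantics",
    kind="field",
)
personal = clone_to_personal(community, "did:example:alice", "formal-semantics")
```

The personal copy carries its own label and kind, so it remains usable if the community PDS is unreachable, while `source_uri` preserves the link back to the community node.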
Integration with external knowledge graphs
Chive's graph links out to established identifier systems (when available): fields link to Wikidata QIDs, so “Computational Linguistics” connects to its Wikidata entity and everything Wikidata knows about it; institutions link to ROR identifiers; licenses link to SPDX identifiers; subject headings link to FAST and Library of Congress terms. When Wikidata improves a concept's relationships, Chive benefits from those improvements automatically.
Classification system
The classification system is inspired by SKOS (Simple Knowledge Organization System) and FAST from library science. It uses faceted classification, so you can combine dimensions (what something is about, who is involved, where and when) rather than forcing everything into a single hierarchy. This is how library subject headings have worked for a long time. We're applying the same idea to AT Protocol records. One difference is that the facets themselves are expandable by the community, since like almost everything else in Chive, facets (and not just their values) are nodes in the graph.
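Faceted lookup amounts to intersecting independent dimensions rather than descending a single tree. The following sketch uses made-up facet names (`about`, `where`, `when`) and made-up eprints; in Chive the facets are themselves graph nodes, which this example flattens to plain strings for brevity.

```python
# Each eprint maps a facet name to the set of values assigned under that facet.
EPRINTS = {
    "eprint-1": {"about": {"Formal Semantics"}, "where": {"Germany"}, "when": {"2024"}},
    "eprint-2": {"about": {"Formal Semantics", "Pragmatics"}, "when": {"2023"}},
    "eprint-3": {"about": {"Machine Translation"}, "where": {"Germany"}, "when": {"2024"}},
}


def matches(facets, query):
    """True if the eprint carries the requested value under every queried facet."""
    return all(value in facets.get(facet, set()) for facet, value in query.items())


def search(eprints, **query):
    """Return the ids of eprints matching all facet constraints, sorted."""
    return sorted(eid for eid, facets in eprints.items() if matches(facets, query))
```

A query such as `search(EPRINTS, about="Formal Semantics", when="2024")` combines two dimensions at once, and because facet names are ordinary dictionary keys, a new facet needs new data rather than new code.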
Why this matters
The practical consequence is that the system evolves at the speed the community needs without bottlenecking on the development team. If the community decides mechanistic interpretability deserves its own field node, it can be added through the governance process. The schema never changes because the enums are data, not code. And because the data lives in AT Protocol records rather than in a proprietary database, the entire classification system is portable: if a better indexer comes along, it can read the same governance records and reconstruct the same taxonomy.