Annotation Infrastructure

This document describes an implementation of annotations as ‘first-class objects’, developed in dialogue with the LLM Claude.ai.

Core Idea

Neither the document nor the graph is authoritative. Instead, annotations are independent knowledge objects stored in a shared ledger. They point to their targets (documents, other annotations, glossary terms) via stable identifiers but are not embedded in or owned by any single system.

This draws on the concept of standoff annotation from computational linguistics, where annotations are stored separately from the text they annotate, connected only by positional references.

The annotation infrastructure unifies the data model between authoring and reading tools. In Author, the user selects text and chooses Define Concept: identifying a term, writing a definition, specifying a category (person, place, concept), and viewing the resulting relationships as a map. In Reader, the user selects text and chooses Annotate: writing a note, specifying a category (Important, Issue, Quote, Claim), and viewing the resulting annotations as a map. Both operations produce knowledge objects in the same ledger, viewable in the same mapped interface—whether rendered as a 2D map on macOS or as a spatial environment in XR.

The core user action is the same in both tools: select text, specify what it is, view it flexibly.

Data Model

All knowledge objects share a common base structure stored as BibTeX-compatible entries in visual-meta format. Every entry has: a unique ID, a document reference, an anchor, a category, content, an author, and a date. Two entry types extend this base.

Reader annotations (@annotation): created when a user selects text in a document they are reading and chooses Annotate.

@annotation{anno-a3f8c,
  target-document   = {doc:visual-meta-id},
  target-position   = {paragraph:3, offset:45-89},
  target-content    = {hash:sha256:abc123, excerpt:the methodology assumes},
  target-structure  = {section:2, paragraph:1},
  category          = {issue},
  category-schema   = {scholarly-default},
  content           = {This contradicts the findings reported in Table 2},
  author            = {user-id},
  date              = {2026-03-06T14:23:00Z},
  tags              = {methodology, contradiction},
  references        = {smith2024-methods}
}

Author definitions (@definition): created when a user selects text in a document they are writing and chooses Define Concept.

@definition{def-c4b2a,
  source-document   = {doc:visual-meta-id},
  source-position   = {paragraph:7, offset:12-35},
  source-content    = {hash:sha256:def456, excerpt:standoff annotation},
  source-structure  = {section:3, paragraph:2},
  term              = {standoff annotation},
  category          = {concept},
  category-schema   = {author-default},
  content           = {An annotation stored separately from the text it
                       annotates, connected only by positional references},
  author            = {user-id},
  date              = {2026-03-04T09:15:00Z},
  related-terms     = {def-a1f3e, def-b8d9c}
}
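As a rough illustration of the shared base structure, the two entry types could be modelled in memory as follows. This is a minimal sketch, not the applications' actual types: the class names, the field_prefix attribute, and the choice to keep anchors as raw strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Anchor:
    position: str   # e.g. "paragraph:3, offset:45-89"
    content: str    # e.g. "hash:sha256:abc123, excerpt:the methodology assumes"
    structure: str  # e.g. "section:2, paragraph:1"


@dataclass
class KnowledgeObject:
    """Fields shared by every ledger entry (the common base structure)."""
    id: str
    document: str         # the document's visual-meta ID
    anchor: Anchor
    category: str
    category_schema: str
    content: str
    author: str
    date: str             # ISO 8601 with timezone


@dataclass
class Annotation(KnowledgeObject):
    tags: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    field_prefix = "target"   # hypothetical: serialises as target-document etc.


@dataclass
class Definition(KnowledgeObject):
    term: str = ""
    related_terms: List[str] = field(default_factory=list)
    field_prefix = "source"   # hypothetical: serialises as source-document etc.
```

Keeping the prefix on the subclass means the same serialiser could write both entry types, with only the field names differing.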

Field specifications

Unique IDs. Generated as a short hash of author + timestamp + random salt, producing identifiers like anno-a3f8c or def-c4b2a. The prefix (anno-, def-) makes entries visually scannable in the raw ledger. Each ID must identify exactly one knowledge object within a user’s ledger and must stay stable for the entry’s lifetime: it is never regenerated.
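One way to generate such IDs is sketched below. The function name, the "|" separator, the use of SHA-256, and the five-character suffix length are assumptions chosen to match the example IDs; the design only specifies "a short hash of author + timestamp + random salt".

```python
import hashlib
import secrets
from typing import Set


def make_entry_id(prefix: str, author: str, timestamp: str,
                  existing: Set[str], length: int = 5) -> str:
    """Return an ID such as 'anno-a3f8c' that is not already in `existing`."""
    while True:
        salt = secrets.token_hex(8)  # fresh random salt on every attempt
        digest = hashlib.sha256(
            f"{author}|{timestamp}|{salt}".encode("utf-8")
        ).hexdigest()
        candidate = f"{prefix}-{digest[:length]}"
        if candidate not in existing:  # retry on the rare collision
            return candidate
```

The retry loop matters at this length: five hexadecimal characters give about one million possible values, so by the birthday bound a collision becomes more likely than not after roughly 1,200 entries. Checking against the existing IDs, or using a longer suffix, keeps the uniqueness requirement intact.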

Document reference. Annotations use target-document; definitions use source-document. The value is the document’s visual-meta ID. The field name differs to express the relationship clearly: a Reader annotation targets a document the user is reading; an Author definition originates from a document the user is writing. When entries are bundled into a shared document, the receiving system uses this distinction to know whether to display them as the author’s own structure or as a reader’s engagement.

Anchoring. Every entry carries three anchor fields for robustness. The field names use the same prefix as the document reference (target- for annotations, source- for definitions):

  • Position (target-position / source-position): paragraph index (counting from 1 in the document’s logical paragraph sequence) and the start and end offsets of the selection, in Unicode code points, within that paragraph (e.g. paragraph:3, offset:45-89). This is the primary anchor: fast to resolve, used first. It is fragile if the document is edited.
  • Content (target-content / source-content): a SHA-256 hash of the selected text plus a short plain-text excerpt (first 8 words or fewer). Used as fallback when the positional anchor fails. Survives reformatting. Does not survive rewording.
  • Structure (target-structure / source-structure): section and paragraph index within the document’s logical outline structure (e.g. section:2, paragraph:1 means the first paragraph of the second section). Used as last-resort fallback. Survives local edits but not structural reorganisation.

Resolution order: position first, then content match, then structural match. If all three fail, the annotation is flagged as unanchored and presented to the user for manual re-anchoring or deletion. The system never silently drops an annotation.
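The resolution order can be sketched as a single function. Several details here are assumptions rather than part of the design: the document is modelled as a list of sections, each a list of paragraph strings; offsets are treated as half-open ranges; and the positional anchor is considered to have failed when the text found at that position no longer matches the stored hash.

```python
import hashlib
from typing import List, Optional, Tuple

Document = List[List[str]]  # hypothetical model: sections -> paragraphs
Result = Tuple[str, Optional[int], Optional[int], Optional[int]]


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def resolve_anchor(doc: Document, paragraph: int, start: int, end: int,
                   text_hash: str, excerpt: str,
                   section: int, section_paragraph: int) -> Result:
    """Return (strategy, paragraph index, start, end); indexes count from 1."""
    paragraphs = [p for sec in doc for p in sec]  # logical paragraph sequence
    length = end - start

    # 1. Position: accept only if the text there still matches the hash.
    if 1 <= paragraph <= len(paragraphs):
        if sha256_hex(paragraphs[paragraph - 1][start:end]) == text_hash:
            return ("position", paragraph, start, end)

    # 2. Content: find the excerpt anywhere, then confirm with the hash.
    for index, text in enumerate(paragraphs, start=1):
        pos = text.find(excerpt)
        while pos != -1:
            if sha256_hex(text[pos:pos + length]) == text_hash:
                return ("content", index, pos, pos + length)
            pos = text.find(excerpt, pos + 1)

    # 3. Structure: fall back to the paragraph slot within its section.
    if 1 <= section <= len(doc) and 1 <= section_paragraph <= len(doc[section - 1]):
        index = sum(len(s) for s in doc[:section - 1]) + section_paragraph
        return ("structure", index, None, None)

    # Nothing matched: flag for manual re-anchoring, never drop silently.
    return ("unanchored", None, None, None)
```

A structural match identifies only a paragraph, not a character range, which is why it returns no offsets. The sketch compares text exactly; surviving reformatting, as the content anchor is meant to, would need whitespace normalisation before hashing.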

Category. A plain string drawn from the user’s active category schema. The category is stored as a literal value—it does not reference the schema by pointer. If the schema later changes (a category is renamed or removed), existing entries retain their original category string. This means categories are durable and the ledger is self-contained: you never need the schema to read the data, only to drive the UI for creating new entries.

Colour. Not stored in the annotation entry. Colour is a display property determined by the category schema at render time. This avoids the ambiguity of a colour field that might conflict with the category’s assigned colour, and it means the user can change their colour mapping without touching the ledger. The schema maps each category to a default colour; the UI applies it.

Tags. Optional. Free-form strings for user-defined cross-cutting groupings that don’t fit the category system. Tags are what enable queries like “show me everything I tagged ‘methodology’ regardless of category.”

References. Optional. BibTeX keys pointing to citation entries. These may resolve against the target document’s visual-meta (where citations already live as BibTeX), or against a shared bibliography. The resolution order is: target document’s visual-meta first, then any project-level bibliography, then unresolved (flagged for the user).

Date. ISO 8601 format with timezone. Required. Used for sorting, filtering (“annotations from this week”), and conflict resolution during import.

Category Schemas

A category schema is a named set of annotation categories with associated display properties. Schemas are stored as entries in the ledger:

@category-schema{scholarly-default,
  categories  = {important, issue, quote, claim, evidence, method, question},
  colors      = {blue, red, green, purple, orange, teal, amber},
  context     = {scholarly-reading}
}

Each category maps to a colour by position (first category gets first colour, etc.). The UI presents the active schema’s categories in the Annotate and Define Concept dialogs. Schemas are project-level or user-level; the user can switch between them.

Schemas are UI configuration, not data structure. An annotation’s category is a string; the schema tells the UI how to present the category options and what colour to render them. This separation means schemas can change without invalidating existing annotations, and annotations from a foreign schema (received via a bundled document) render correctly—the system simply uses the category string and a default colour if no matching schema entry exists.
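The positional mapping and the fallback for foreign categories might look like this. The schema is represented as a plain dictionary, and DEFAULT_COLOUR is a hypothetical value; the design does not name the default colour.

```python
from typing import Dict, List

DEFAULT_COLOUR = "grey"  # hypothetical fallback for unknown categories

scholarly_default: Dict[str, List[str]] = {
    "categories": ["important", "issue", "quote", "claim",
                   "evidence", "method", "question"],
    "colors": ["blue", "red", "green", "purple", "orange", "teal", "amber"],
}


def colour_for(category: str, schema: Dict[str, List[str]]) -> str:
    """Map a category string to its colour by list position."""
    categories = schema["categories"]
    colours = schema["colors"]
    if category in categories:
        index = categories.index(category)
        if index < len(colours):  # tolerate a schema with too few colours
            return colours[index]
    return DEFAULT_COLOUR
```

Because the lookup happens at render time, an entry whose category is missing from the active schema still renders; it simply gets the default colour.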

How It Works

The user actions

Annotate (in Reader). The user selects text in a document they are reading. They invoke Annotate (via context menu, keyboard shortcut, or toolbar). A dialog appears with two elements: a text field for an optional note, and a category picker showing the active schema’s categories. The user writes a note if they wish, selects a category (or accepts a default), and confirms. The system writes the annotation to the ledger. The selected text is highlighted in the document view using the category’s colour. The document file is not modified.

Define Concept (in Author). The user selects text in a document they are writing. They invoke Define Concept (via context menu, keyboard shortcut, or toolbar). A dialog appears with: a term field (pre-filled with the selected text), a text field for the definition, and a category picker from the authoring schema. The user writes a definition, selects a category, and confirms. The system writes the definition to the ledger. If the term appears elsewhere in the document or relates to other defined terms, relationship lines appear in the document view. The document file is not modified at this point; definitions are written to the document’s visual-meta on save/export.

The views

In-document view. When a document is open, the application queries the ledger for all entries referencing that document (by its visual-meta ID). Annotations are rendered in situ as highlights, margin notes, or underlines depending on the entry’s category. Definitions are rendered as linked terms with optional definition tooltips. The user can filter by category (“show only Issues”), toggle visibility, and click any annotation to see its full content.

Single-document mapped view. The same entries, rendered as a 2D map rather than inline. Each annotation or definition becomes a node positioned in a spatial layout. Nodes are coloured by category and sized by content length (or uniformly, at user preference). Edges connect nodes that share tags, reference the same citation, or are explicitly related (as in definition related-terms). The layout can be: (a) a force-directed graph emphasising clusters of related annotations, (b) a linear layout preserving document order along one axis with category grouping along the other, or (c) a grid grouped by category. The user can switch between these layouts. Clicking a node navigates to the anchored passage in the document. This mapped view is the same component used in Author for viewing defined concepts—the rendering code is shared; only the data source differs.
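The edge rules for the map (shared tags, shared citations, explicit relations) can be sketched independently of any layout algorithm. Entries are assumed to be dictionaries with list-valued tags, references, and related-terms fields; the edge labels are invented for the example.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (node id, node id, reason)


def build_edges(entries: List[Dict]) -> List[Edge]:
    """Connect every pair of entries that share a tag, a citation, or a relation."""
    edges: List[Edge] = []
    for a, b in combinations(entries, 2):
        if set(a.get("tags", [])) & set(b.get("tags", [])):
            edges.append((a["id"], b["id"], "shared-tag"))
        if set(a.get("references", [])) & set(b.get("references", [])):
            edges.append((a["id"], b["id"], "shared-reference"))
        if (b["id"] in a.get("related-terms", [])
                or a["id"] in b.get("related-terms", [])):
            edges.append((a["id"], b["id"], "related-term"))
    return edges
```

The pairwise comparison is quadratic in the number of entries, which is acceptable for a single document; a cross-document map would more likely build edges from per-tag and per-citation indexes.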

Cross-document mapped view. The user selects a scope: a project, a set of documents, a tag, or a date range. The system queries the ledger for all matching entries. The map now shows nodes from multiple documents, with document origin indicated by a secondary visual property (border style, icon, or grouping region). Nodes from different documents that share tags, references, or categories become visibly connected. This view answers questions like: “show me all the Claims I’ve marked across this corpus” or “show me everything tagged ‘methodology’ from the last month.”
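A scope query of this kind reduces to a filter over ledger entries. The sketch below assumes entries are dictionaries with a list-valued tags field; the function and parameter names are illustrative only.

```python
from datetime import datetime
from typing import Dict, List, Optional, Set


def parse_date(value: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 onwards.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def query(entries: List[Dict], documents: Optional[Set[str]] = None,
          category: Optional[str] = None, tag: Optional[str] = None,
          since: Optional[datetime] = None,
          until: Optional[datetime] = None) -> List[Dict]:
    """Return entries matching every criterion that is not None.

    `since` and `until` must be timezone-aware, like the stored dates.
    """
    results = []
    for entry in entries:
        doc = entry.get("target-document") or entry.get("source-document")
        if documents is not None and doc not in documents:
            continue
        if category is not None and entry.get("category") != category:
            continue
        if tag is not None and tag not in entry.get("tags", []):
            continue
        when = parse_date(entry["date"])
        if since is not None and when < since:
            continue
        if until is not None and when > until:
            continue
        results.append(entry)
    return results
```

For example, query(entries, category="claim") corresponds to "show me all the Claims I've marked", and adding tag="methodology" with a since date narrows it to recent methodology notes.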

Author-Reader overlay. When both definitions (from the author’s writing process, bundled in the document’s visual-meta) and annotations (from the reader’s engagement, in the reader’s ledger) exist for the same document, the mapped view can render them as two layers. The author layer shows the document’s conceptual structure as the author defined it. The reader layer shows the reader’s engagement. These can be viewed separately, side by side, or overlaid. The overlay reveals where the reader’s attention aligns with or diverges from the author’s intended structure.

XR rendering. All four views above have direct XR equivalents. The 2D map becomes a 3D spatial environment. Nodes become objects the user can approach, inspect, and rearrange. Edges become visible connections. The Author-Reader overlay becomes two spatial layers that can be toggled, blended, or explored by walking between them. The ledger data is the same; only the rendering changes.

Portability

Bundling (export). When a document is shared, published, or archived, the system gathers all ledger entries referencing that document and embeds copies in the document’s visual-meta block as BibTeX entries. This includes both the author’s definitions and any annotations the author chooses to share. The bundled entries are marked as snapshots with a bundled-date field. The document is now self-contained.

Unbundling (import). When a user opens a document containing bundled entries, the system reads them from the visual-meta. Author definitions are displayed as the author layer (these represent the author’s intended structure; they are available for the overlay view but are not imported into the reader’s own ledger unless the reader explicitly chooses to). Reader annotations from other users (if shared) are offered for import. Duplicate detection uses the entry’s unique ID. If the ledger has no entry with that ID, the bundled entry is offered for import. If the ledger already contains an entry with that ID, the existing entry is kept and the bundled copy is ignored, unless the bundled copy has a newer date, in which case the user is prompted to choose.
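The duplicate-detection rule is small enough to state as a function. This is a sketch under the assumption that the ledger is available as a dictionary keyed by entry ID; the three return labels are invented for the example.

```python
from datetime import datetime
from typing import Dict


def parse_date(value: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 onwards.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def import_decision(ledger: Dict[str, Dict], bundled: Dict) -> str:
    """Decide what to do with one bundled entry on import."""
    existing = ledger.get(bundled["id"])
    if existing is None:
        return "offer-import"      # unknown ID: offer it to the user
    if parse_date(bundled["date"]) > parse_date(existing["date"]):
        return "prompt-user"       # bundled copy is newer: let the user choose
    return "keep-existing"         # ledger copy wins; ignore the bundled one
```

Comparing parsed dates rather than raw strings keeps the comparison correct when the two copies were written with different timezone offsets.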

The rule: the ledger is authoritative. Bundled copies in visual-meta are read-only snapshots for portability. Edits happen in the ledger; the next bundle operation produces an updated snapshot.

Annotated bibliography

Filter the ledger for entries matching a user-specified criterion (e.g. category = quote or category = important, optionally scoped to a project or document set). For each matching entry, resolve its references field to obtain the full citation BibTeX. Compile the results into a new document where each citation is followed by the annotation’s content field as the evaluative paragraph. The output is a standard annotated bibliography document with its own visual-meta.
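The compilation step might be sketched as follows. It assumes citations are available as dictionaries mapping BibTeX keys to citation text, one per document plus one for the project, and it applies the resolution order given for the references field (document first, then project, then flagged as unresolved). All names are illustrative.

```python
from typing import Dict, List, Set, Tuple


def annotated_bibliography(
    entries: List[Dict],
    categories: Set[str],
    document_citations: Dict[str, Dict[str, str]],  # document ID -> key -> citation
    project_citations: Dict[str, str],              # key -> citation
) -> Tuple[str, List[str]]:
    """Return (bibliography text, unresolved citation keys)."""
    blocks: List[str] = []
    unresolved: List[str] = []
    for entry in entries:
        if entry.get("category") not in categories:
            continue
        local = document_citations.get(entry.get("target-document", ""), {})
        for key in entry.get("references", []):
            citation = local.get(key) or project_citations.get(key)
            if citation is None:
                unresolved.append(key)  # flag for the user, do not drop
                continue
            # Citation first, then the annotation as the evaluative paragraph.
            blocks.append(f"{citation}\n    {entry['content']}")
    return "\n\n".join(blocks), unresolved
```

Returning the unresolved keys alongside the text lets the UI list them for the user instead of producing a bibliography with silent gaps.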

The Annotation Ledger

Storage format

The ledger is a BibTeX-compatible text file with visual-meta, stored at a known location on the user’s filesystem. On macOS this would be within the application’s shared container or a user-specified directory. The choice of BibTeX-compatible format ensures: (a) consistency with existing visual-meta tooling, (b) human readability, (c) parseability by standard BibTeX libraries with minimal extension, and (d) easy version control with git.

The entry types (@annotation, @definition, @category-schema) and the fields used in them (target-position, target-content, category-schema, etc.) are non-standard BibTeX. This is normal and expected: BibTeX has always been extensible with arbitrary fields, and a parser that does not restrict itself to the standard entry types and field names will read them. Applications that don’t understand these fields will ignore them; applications that do will use them.
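To show that nothing about these entries needs a special parser, here is a minimal reader that accepts any entry type and any field name. It is a sketch, not a full BibTeX parser: it assumes every value is brace-delimited, as in the examples in this document, and does not handle quoted values, @string macros, or comments.

```python
from typing import Dict, List, Tuple


def _read_braced(text: str, start: int) -> Tuple[str, int]:
    """Read the {...} group opening at text[start]; return (inner, next index)."""
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
    raise ValueError("unbalanced braces")


def parse_ledger(text: str) -> List[Dict[str, str]]:
    """Parse every @type{id, field = {value}, ...} entry in the text."""
    entries: List[Dict[str, str]] = []
    pos = text.find("@")
    while pos != -1:
        brace = text.index("{", pos)
        entry_type = text[pos + 1:brace].strip().lower()
        body, end = _read_braced(text, brace)
        key, _, rest = body.partition(",")
        entry = {"type": entry_type, "id": key.strip()}
        i = 0
        while True:
            eq = rest.find("=", i)
            if eq == -1:
                break
            name = rest[i:eq].strip(" \t\r\n,")
            value, i = _read_braced(rest, rest.index("{", eq))
            entry[name] = " ".join(value.split())  # collapse wrapped lines
        entries.append(entry)
        pos = text.find("@", end)
    return entries
```

Unknown fields simply become dictionary keys, which mirrors the behaviour described above: a consumer uses the fields it understands and ignores the rest.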

Performance considerations

For a working researcher’s corpus (hundreds to low thousands of documents, tens of thousands of annotations), a single text file is adequate. Querying is a parse-and-filter operation. On modern hardware, parsing 50,000 BibTeX entries should take well under a second, though the exact figure depends on the parser and should be measured.

For larger scale, or for applications requiring frequent queries (such as live updating of the mapped view while the user reads), the system should maintain a read cache: an in-memory index built from the ledger on application launch and updated incrementally as new entries are written. The ledger file remains the canonical store; the cache is ephemeral and rebuilt if invalidated. On macOS, this cache can be a simple in-memory dictionary keyed by target-document ID, with secondary indexes on category, tag, and date. This avoids the need for a database while keeping queries fast.
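A dictionary-based cache of this kind could be sketched as below. It assumes entries arrive as dictionaries of strings, with tags as a comma-separated string as in the raw ledger. The date index mentioned above is omitted for brevity, and the class and method names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class LedgerIndex:
    """Ephemeral in-memory index; the ledger file stays the canonical store."""

    def __init__(self) -> None:
        self.by_id: Dict[str, Dict] = {}
        self.by_document = defaultdict(set)
        self.by_category = defaultdict(set)
        self.by_tag = defaultdict(set)

    def _keys(self, entry: Dict):
        doc = entry.get("target-document") or entry.get("source-document")
        tags = [t.strip() for t in entry.get("tags", "").split(",") if t.strip()]
        category = entry.get("category")
        return [(self.by_document, [doc] if doc else []),
                (self.by_category, [category] if category else []),
                (self.by_tag, tags)]

    def add(self, entry: Dict) -> None:
        """Index a new entry, or replace an earlier version with the same ID."""
        old = self.by_id.get(entry["id"])
        if old is not None:  # superseding version: remove stale index rows
            for index, keys in self._keys(old):
                for key in keys:
                    index[key].discard(old["id"])
        self.by_id[entry["id"]] = entry
        for index, keys in self._keys(entry):
            for key in keys:
                index[key].add(entry["id"])

    def for_document(self, doc_id: str, category: Optional[str] = None) -> List[Dict]:
        ids = self.by_document.get(doc_id, set())
        if category is not None:
            ids = ids & self.by_category.get(category, set())
        return [self.by_id[i] for i in sorted(ids)]
```

On launch the application would call add() for every parsed entry in file order, so later versions of an ID naturally replace earlier ones; after that, each newly written entry is added incrementally.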

Concurrent access

Both Author and Reader may write to the same ledger. Since BibTeX entries are append-friendly (new entries are added at the end of the file), the simplest concurrency model is: acquire a file lock, append the new entry, release the lock. Reads do not require locking during normal use because the file is append-only; a reader that catches a write in progress only needs to ignore an incomplete entry at the end of the file. Deletions and modifications are handled by writing a new entry that supersedes the old one (using the same ID with an updated date), not by modifying the file in place. Periodic compaction (rewriting the file, keeping only the latest version of each ID) can be done as a background maintenance operation when no other process holds the lock.
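The lock-append-release cycle and the "latest version wins" rule used by compaction can be sketched as follows. The example uses an advisory flock lock from Python's fcntl module, which is available on macOS and other Unix systems but not on Windows; the function names are illustrative, and how the applications would actually lock the file is not specified here.

```python
import fcntl
import os
from datetime import datetime
from typing import Dict, List


def append_entry(path: str, entry_text: str) -> None:
    """Append one serialised entry under an exclusive advisory lock."""
    with open(path, "a", encoding="utf-8") as ledger:
        fcntl.flock(ledger, fcntl.LOCK_EX)   # blocks until the lock is free
        try:
            ledger.write(entry_text.rstrip("\n") + "\n\n")
            ledger.flush()
            os.fsync(ledger.fileno())        # push the entry to disk
        finally:
            fcntl.flock(ledger, fcntl.LOCK_UN)


def parse_date(value: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 onwards.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_versions(entries: List[Dict]) -> List[Dict]:
    """Keep only the newest version of each ID (later in file wins ties)."""
    latest: Dict[str, Dict] = {}
    for entry in entries:
        current = latest.get(entry["id"])
        if current is None or parse_date(entry["date"]) >= parse_date(current["date"]):
            latest[entry["id"]] = entry
    return list(latest.values())
```

Compaction would parse the ledger, pass the entries through latest_versions, and rewrite the file while holding the same lock. Advisory locks only protect against processes that also take the lock, so both applications would need to use the same routine.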

Ledger properties

The ledger is:

  • Readable by any tool that parses BibTeX.
  • Queryable by parsing and filtering (with an in-memory cache for performance).
  • Versionable with standard version control (git diff works naturally on appended entries).
  • Splittable per project if scale demands it (each project gets its own ledger file; the application queries the active project’s ledger).
  • Itself a document with visual-meta, carrying its own metadata.
  • Shared between Author and Reader, providing a single store for both definitions and annotations.

Identity and Anchoring

Document identity

Every document has a stable ID in its visual-meta. This ID is assigned on document creation and does not change across saves, exports, or transfers. The ledger references documents exclusively by this ID, never by filename or path (which can change).

Anchor resolution across document formats

The three-strategy anchoring (position, content, structure) must account for the fact that Reader handles multiple document formats (PDF, DOCX, HTML, plain text, EPUB). The paragraph-and-offset anchor model works differently depending on the format:

  • Plain text and Markdown: paragraphs are separated by blank lines; offset is in Unicode code points from the start of the paragraph.
  • HTML and EPUB: paragraphs correspond to block-level elements (<p>, <h1>, <blockquote>, <li>); offset is in Unicode code points of the element’s text content.
  • PDF: paragraphs are extracted text blocks as determined by the PDF reader’s text extraction; offset is in Unicode code points. PDF anchoring is inherently less reliable than other formats because text extraction can vary between readers. The content anchor (hash + excerpt) is particularly important for PDF.
  • DOCX: paragraphs correspond to <w:p> elements; offset is in Unicode code points of the paragraph’s text content.

The structural anchor (section:2, paragraph:1) depends on the document having discernible section structure. For documents without sections (e.g. a single-section letter), the structural anchor degenerates to paragraph index only, which overlaps with the positional anchor. This is acceptable—the structural anchor is a fallback, and its value is proportional to the document’s structural richness.
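For the plain-text case, the paragraph-and-offset model is straightforward to sketch. Python strings are indexed by Unicode code point, so slicing matches the offset definition directly. The function names are illustrative, and the end offset is assumed to be exclusive.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split plain text or Markdown into paragraphs at blank lines."""
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]


def select(text: str, paragraph: int, start: int, end: int) -> str:
    """Return the anchored text; paragraph counts from 1, offsets in code points."""
    return split_paragraphs(text)[paragraph - 1][start:end]
```

The unit matters when implementations in different languages share a ledger. A character outside the Basic Multilingual Plane, such as an emoji, is one code point but two UTF-16 code units, so a runtime whose native string offsets are UTF-16 based would need to convert before reading or writing a positional anchor.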

When anchoring fails

If all three anchors fail to resolve (the document has changed substantially since the annotation was created), the annotation is marked as unanchored. The system presents unanchored annotations to the user in a dedicated list, showing the original excerpt from the content anchor so the user can manually re-anchor or delete them. Unanchored annotations are not silently discarded—they may contain valuable notes even if their position in the document can no longer be determined. In the mapped view, unanchored annotations appear in a separate region, still filterable by category and tag.

Strengths

True first-class objects. Annotations and definitions are independent entities from creation. Each can be interacted with on its own, as the design requires for documents, notes, annotations, citations, and glossary terms.

Structured from creation. The Annotate dialog asks the user to categorise their annotation at the moment of creation. The ledger contains pre-sorted, semantically meaningful knowledge objects—not a flat list of coloured highlights. This is what enables the mapped view and cross-document querying without post-hoc organisation.

Non-destructive annotation. The target document is never modified by the act of annotating. This is essential for annotating documents the user doesn’t own, read-only files, shared resources, and formats where visual-meta modification isn’t possible.

Unified authoring and reading infrastructure. Define Concept in Author and Annotate in Reader share the same ledger, the same data structure, and the same mapped view component. The transition from reading to writing—from annotated observations to a new document—is supported by continuous infrastructure.

Select, specify, view. The core interaction is identical in both tools and in both environments: select text, specify what it is, view it flexibly on a 2D map or in XR. This consistency makes the system learnable and makes the output useful across contexts.

Clean separation of concerns. The document is a document. The annotation is an annotation. The definition is a definition. Colour is a display property, not a data property. The schema is UI configuration, not data structure. Each component has one job.

Flexible portability. Documents can be bundled with their annotations on demand for sharing. The bundling is explicit, the rule is simple (ledger is authoritative, bundles are snapshots), and the unbundling handles duplicate detection.

Version control friendly. The append-only ledger works naturally with git. Annotations can be versioned independently of the documents they annotate.

Adaptable category systems. Different users, projects, and disciplines define their own category schemas. The underlying infrastructure is unchanged because categories are strings and schemas are UI configuration.

Mapped view as bridge between reading and writing. The mapped view of annotations across a corpus is the skeleton of the reader’s own future document. The infrastructure directly supports the workflow where academic papers are partly collections of annotations.

Weaknesses

Portability requires explicit bundling. A document does not automatically carry its annotations. Sharing without bundling means the recipient gets no annotations. Mitigation: the system can prompt or automatically bundle on share/export actions.

Anchor fragility. Standoff annotations are vulnerable to document edits. The three-strategy anchoring mitigates this but does not eliminate it. PDF anchoring is particularly fragile due to variability in text extraction. Mitigation: the unanchored-annotation workflow gives the user a clear recovery path.

Filesystem dependency. The ledger must be accessible to both Author and Reader, and to both macOS and XR environments. This requires either a shared filesystem location (natural on macOS where both apps access the same container), a sync service (iCloud, Dropbox), or a lightweight local server for XR access.

Discovery cost. Finding annotations for a given document requires querying the ledger. With the in-memory cache this is fast, but it is still an extra step compared to opening a document and finding everything inside. Mitigation: the cache is built on launch and updated incrementally; the cost is front-loaded and amortised.

Two copies when bundled. Bundled annotations in visual-meta are snapshots. The rule is clear (ledger is authoritative), but a user who receives a bundled document and edits its visual-meta in a third-party tool may create divergence. Mitigation: document this rule clearly; on import, keep the ledger copy unless the user explicitly chooses the bundled one.

Category schema management. Changing a schema after creating many annotations does not retroactively change those annotations. A user who renames “Important” to “Key Point” will have old entries under the old name. Mitigation: provide a batch-rename tool in the UI; keep the underlying model simple (categories are strings).

Concurrent write contention. If Author and Reader write to the ledger simultaneously at high frequency, file locking may cause brief delays. In practice, annotation creation is a human-speed operation (seconds between entries, not milliseconds), so this is unlikely to be a real problem. Mitigation: the append-only model with file locking is sufficient for the expected write frequency.


This is part 3 of a 4-part series. See also: [Approach 1: Visual-Meta as Canonical], [Approach 2: Graph-First], [Comparative Analysis].

Comments and feedback welcome. I am particularly interested in perspectives on the unified Author-Reader data model, anchor resolution across document formats, and practical experience with mapped views of annotations across document corpora.