What are the extractable elements in a paper?

Currently

  • Body text (sentences, paragraphs)
  • Headings (level and author with page number)
  • Images (and captions)
  • 3D models (or references to location for download/view)
  • Video (or references to location for download/view)
  • Audio (or references to location for download/view)
  • Graphs (with associated data in multiple formats or references to location for download/view)
  • Formulas (interactively)
  • Author Glosses
    • Endnotes (with connections in body text)
    • Sidenotes (with connections in body text)
    • Footnotes (with connections in body text)
    • Highlights
    • Author/Editor Annotations
  • References/Citations (with connections in body text, including what section)
  • Links
  • Glossary with definitions
  • Citation information (author name(s), institution, date, location)
  • Keywords
  • Categories
  • Stats, such as the number of words in each section
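
As a rough illustration of the stats item, per-section word counts could be computed along these lines. This is only a sketch: the `Section` class, its fields, and the choice to count whitespace-separated tokens as words are assumptions made for the example, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical container for one section of a paper."""
    heading: str
    level: int
    body: str


def word_counts(sections):
    """Map each heading to its number of whitespace-separated words."""
    return {s.heading: len(s.body.split()) for s in sections}


# Made-up sample data, for illustration only.
paper = [
    Section("Introduction", 1, "Documents carry more than body text."),
    Section("Method", 1, "We list each element an author can expose."),
]
print(word_counts(paper))
```

A real extractor would probably need a more careful definition of a word (hyphenation, formulas, captions), so the counts here should be read as approximate.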

With Extraction (via AI, manually, or by other means)

  • Keywords, names of persons, technologies, products etc.
  • Keywords pre-matched against user’s keywords lists
  • Summaries of the whole document, per section, or per paragraph
  • Generated document ‘covers’
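
To make the pre-matching idea concrete, here is one possible sketch of checking a document's text against a user's keyword list. The function name `match_keywords`, the sample text, and the whole-word, case-insensitive matching rule are all assumptions; an actual system might use stemming or an AI model instead.

```python
import re


def match_keywords(text, user_keywords):
    """Return the user's keywords found in text as whole words, ignoring case."""
    found = []
    for kw in user_keywords:
        # re.escape keeps characters such as '-' literal inside the pattern.
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(kw)
    return found


# Made-up sample text and keyword list, for illustration only.
text = "This paper discusses Visual-Meta and PDF metadata."
print(match_keywords(text, ["visual-meta", "hypertext", "PDF"]))
```

Matches are returned in the order of the user's list, which keeps the result stable when the same list is applied to many papers.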

Appended

  • Subscribed Annotations
  • User’s Annotations

External

  • Which papers cite this paper
  • External judgements of this paper
  • External judgements of the author(s)
  • Transcluded text
  • Visible links: cited documents brought into view at the correct location

Intentional from Author via Visual-Meta

  • Structural text
