
Schema Reference

SciFlow's manuscript schema defines the structure of documents created and consumed by the editor, exporters (JATS, DOCX, etc.), and sync layers. It is built on the ProseMirror document model. This page describes both the document node tree (the ProseMirror layer) and the snapshot wrapper that packages the document together with files and references.

Generating the schemas

The JSON Schema files are generated from the live ProseMirror schema. Regenerate them after any schema change:

npx nx run @sciflow/schema-prosemirror:generate-schema
Output lives under packages/schema/prosemirror/dist/.

The following diagram shows a typical structured manuscript as a node tree — structural nodes (green) wrap block-level content (blue), which in turn contains inline nodes (orange). Headers and parts are optional; a document can also be completely flat (just paragraphs and other block nodes directly inside doc).

graph TD
    doc["doc"]
    header["header"]
    h1["heading (level 1)<br/>'My Article'"]
    sub["subtitle<br/>'A Subtitle'"]

    part1["part (chapter)"]
    h2["heading (level 1)<br/>'Introduction'"]
    p1["paragraph"]
    t1["text 'Some text with a '"]
    cite["citation (source: ref-1, ref-2<br/>locator: p. 2 on ref-2)"]
    citetxt["text '(Hughes, 2018; Smith & Lee, 2020, p. 2)'"]:::optionalInline
    t2["text ' reference.'"]

    part2["part (chapter)"]
    h3["heading (level 1)<br/>'Methods'"]
    fig["figure"]
    cap["caption"]
    lbl["label 'Figure'"]:::optional
    p2["paragraph 'A diagram.'"]

    bib["part (bibliography)"]
    bibh["heading (level 1)<br/>'References'"]
    ref1["reference (refId: ref-1)"]
    reftxt["text 'Hughes et al. (2018) …'"]:::optionalInline
    ref2["reference (refId: ref-2)"]
    ref2txt["text 'Smith & Lee (2020) …'"]:::optionalInline

    doc --> header
    header --> h1
    header --> sub
    doc --> part1
    part1 --> h2
    part1 --> p1
    p1 --> t1
    p1 --> cite
    cite --> citetxt
    p1 --> t2
    doc --> part2
    part2 --> h3
    part2 --> fig
    fig --> cap
    cap --> lbl
    cap --> p2
    doc --> bib
    bib --> bibh
    bib --> ref1
    ref1 --> reftxt
    bib --> ref2
    ref2 --> ref2txt

    classDef structural fill:#e8f5e9,stroke:#388e3c
    classDef block fill:#e3f2fd,stroke:#1565c0
    classDef inline fill:#fff3e0,stroke:#e65100
    classDef optional fill:#e3f2fd,stroke:#1565c0,stroke-dasharray: 5 5
    classDef optionalInline fill:#fff3e0,stroke:#e65100,stroke-dasharray: 5 5

    class doc,header structural
    class part1,part2,bib structural
    class h1,h2,h3,bibh,sub,p1,p2,fig,cap,ref1,ref2 block
    class t1,t2,cite inline

Document Snapshot

A snapshot is the top-level object used for persistence and sync. It wraps the ProseMirror document together with all side-channel data.

classDiagram
    class Snapshot {
        +doc : Document
        +version? : number
        +selection? : Selection
        +files? : SnapshotFile[]
        +references? : SnapshotReference[]
    }
    class Selection {
        +anchor : number
        +head : number
    }
    class SnapshotFile {
        +id : string
        +type? : string
        +url? : string
        +previewSrc? : string
        +mimeType? : string
        +name? : string
        +dimensions? : Dimensions
    }
    class SnapshotReference {
        +id : string
        +rawReference : string
        +mimeType? : string
    }
    Snapshot *-- "1" Document : doc
    Snapshot *-- "0..1" Selection : selection
    Snapshot o-- "0..*" SnapshotFile : files
    Snapshot o-- "0..*" SnapshotReference : references

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| doc | Document node (see below) | Yes | The ProseMirror document tree in toJSON() format. |
| version | number | No | Document version counter, incremented on each change. |
| selection | Selection | No | Cursor/selection state (anchor and head positions). |
| files | SnapshotFile[] | No | Files and assets referenced by the document. |
| references | SnapshotReference[] | No | Bibliography and citation references. |
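
The envelope described above can be written down as TypeScript types together with a small structural check. This is a minimal sketch, not the published typings: the names DocNode, SnapshotSelection and checkSnapshotEnvelope are hypothetical, and the check covers only the required fields listed in the tables on this page.

```typescript
// Hypothetical type names mirroring the field tables; not the library's own typings.
interface DocNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: DocNode[];
  text?: string;
}
interface SnapshotSelection { anchor: number; head: number; }
interface SnapshotFile {
  id: string;
  type?: string;
  url?: string;
  previewSrc?: string;
  mimeType?: string;
  name?: string;
  dimensions?: { width: string | number; height: string | number };
}
interface SnapshotReference {
  id: string;
  rawReference: string;
  mimeType?: string;
  [extra: string]: unknown; // additional CSL-JSON or OpenAlex fields
}
interface Snapshot {
  doc: DocNode;
  version?: number;
  selection?: SnapshotSelection;
  files?: SnapshotFile[];
  references?: SnapshotReference[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a list of problems; an empty list means the envelope looks well-formed.
function checkSnapshotEnvelope(value: unknown): string[] {
  if (!isRecord(value)) return ["snapshot must be an object"];
  const problems: string[] = [];
  const doc = value.doc;
  if (!isRecord(doc) || doc.type !== "doc") {
    problems.push('doc is required and must be a node of type "doc"');
  }
  if (value.version !== undefined && typeof value.version !== "number") {
    problems.push("version must be a number");
  }
  for (const key of ["files", "references"]) {
    const list = value[key];
    if (list === undefined) continue;
    if (!Array.isArray(list)) {
      problems.push(`${key} must be an array`);
      continue;
    }
    list.forEach((entry: unknown, i: number) => {
      if (!isRecord(entry) || typeof entry.id !== "string") {
        problems.push(`${key}[${i}].id must be a string`);
      } else if (key === "references" && typeof entry.rawReference !== "string") {
        problems.push(`references[${i}].rawReference must be a string`);
      }
    });
  }
  return problems;
}
```

A check like this is only a quick sanity test before persistence or sync; validating against the generated JSON Schema files remains the more complete option.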

Files

Each SnapshotFile describes an asset (typically an image) referenced by the document.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Unique asset identifier (can align with figure node ids). |
| type | string | No | Resource type, e.g. "image". |
| url | string | No | Full-resolution URL for fetching the asset. |
| previewSrc | string | No | Preview-sized URL or data URI for in-doc rendering. |
| mimeType | string | No | MIME type of the resource. |
| name | string | No | Human-readable file name. |
| dimensions | { width, height } | No | Asset dimensions (string or number). |

References

Each SnapshotReference represents a bibliography entry or citation source.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Unique reference identifier. |
| rawReference | string | Yes | The bibliography string for this reference: the full entry as it appears in a reference list (e.g. "Hughes, T.P. (2018). Global warming transforms coral reef assemblages. Nature, 556, 492–496."). Used as initial content when creating a reference node, and updated from node content on export. May contain lightweight HTML tags such as <i> for italics. May be empty if the bibliography is generated from structured CSL data instead. |
| mimeType | string | No | Citation format: application/vnd.citationstyles.csl+json or application/vnd.openalex+json. |

rawReference vs raw citation

rawReference is the bibliography string (reference list entry), not an in-text citation. A raw citation is the short inline label such as "(Hughes, 2018)" or "[1]". Some external systems use different field names for the bibliography string — for example, OJS stores it as rawCitation. Map the external field to rawReference on import and back on export. See the OJS Integration Guide for a concrete example.
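
The field mapping described above can be sketched as a pair of small conversion functions. The ExternalCitation shape is hypothetical: only the rawCitation field name comes from the text, and the id field is an assumption made for illustration.

```typescript
// Hypothetical external record; only the rawCitation field name is taken from the text above.
interface ExternalCitation {
  id: string;
  rawCitation: string;
}

interface SnapshotReference {
  id: string;
  rawReference: string;
  mimeType?: string;
}

// Import: the external bibliography string becomes rawReference.
function importReference(external: ExternalCitation): SnapshotReference {
  return { id: external.id, rawReference: external.rawCitation };
}

// Export: rawReference is written back under the external field name.
function exportReference(reference: SnapshotReference): ExternalCitation {
  return { id: reference.id, rawCitation: reference.rawReference };
}
```

Keeping the rename in one place like this avoids confusing the bibliography string with an in-text citation label elsewhere in the code.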

Use validateDocumentResources(doc, files, references) to check that every figure source and citation in the document resolves against the snapshot's files and references arrays. See the Reference Integration guide for usage examples.
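
To illustrate what such a resource check involves, here is a standalone sketch. It is not the implementation of validateDocumentResources, and the function name findUnresolvedResources is hypothetical. It assumes that a non-empty figure src is meant to match a SnapshotFile id and that each citation source decodes to an array of items with an id field.

```typescript
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}
interface HasId { id: string; }

// Decodes a citation source attribute into the list of referenced ids.
function citationIds(source: unknown): string[] {
  if (typeof source !== "string" || source === "") return [];
  const items: unknown = JSON.parse(decodeURIComponent(source));
  if (!Array.isArray(items)) return [];
  const ids: string[] = [];
  items.forEach((item: unknown) => {
    if (typeof item === "object" && item !== null) {
      const id = (item as { id?: unknown }).id;
      if (typeof id === "string") ids.push(id);
    }
  });
  return ids;
}

// Walks the tree and reports figure sources and citation ids with no matching entry.
function findUnresolvedResources(
  doc: PMNode,
  files: HasId[],
  references: HasId[]
): { missingFiles: string[]; missingReferences: string[] } {
  const fileIds = files.map((f) => f.id);
  const referenceIds = references.map((r) => r.id);
  const missingFiles: string[] = [];
  const missingReferences: string[] = [];
  const visit = (node: PMNode): void => {
    const attrs = node.attrs ?? {};
    if (node.type === "figure") {
      const src = attrs.src;
      // An empty src means the figure wraps a table or code block, so nothing to resolve.
      if (typeof src === "string" && src !== "" && fileIds.indexOf(src) === -1) {
        missingFiles.push(src);
      }
    }
    if (node.type === "citation") {
      for (const id of citationIds(attrs.source)) {
        if (referenceIds.indexOf(id) === -1) missingReferences.push(id);
      }
    }
    (node.content ?? []).forEach((child) => visit(child));
  };
  visit(doc);
  return { missingFiles, missingReferences };
}
```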

The fields above are the mandatory envelope. Beyond these, the object typically carries the full reference data (title, author, issued, DOI, etc.) as additional properties. The format depends on mimeType:

  • application/vnd.citationstyles.csl+json — additional fields conform to the CSL-JSON schema (data schema).
  • application/vnd.openalex+json — additional fields follow the OpenAlex Works object format.

Document Node Tree

The document is a tree of nodes, each with a type, optional attrs, and optional content (child nodes). Text nodes carry a text string instead of content. Any node or text node may carry marks (inline formatting). See the ProseMirror Guide — Document Structure for a general introduction to this model.
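
Because every node has the same shape, most consumers reduce to a recursive walk over content. The sketch below shows a depth-first traversal and two small helpers built on it; the names PMNode, walk, plainText and countByType are hypothetical and not part of the schema package.

```typescript
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
  marks?: { type: string; attrs?: Record<string, unknown> }[];
}

// Depth-first traversal in document order.
function walk(node: PMNode, visit: (node: PMNode, depth: number) => void, depth = 0): void {
  visit(node, depth);
  for (const child of node.content ?? []) {
    walk(child, visit, depth + 1);
  }
}

// Concatenates the text of all text nodes below (and including) the given node.
function plainText(node: PMNode): string {
  let out = "";
  walk(node, (n) => {
    if (typeof n.text === "string") out += n.text;
  });
  return out;
}

// Counts how often each node type occurs in the tree.
function countByType(node: PMNode): Record<string, number> {
  const counts: Record<string, number> = {};
  walk(node, (n) => {
    counts[n.type] = (counts[n.type] ?? 0) + 1;
  });
  return counts;
}
```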

Attribute value conventions

The attribute tables below use the following conventions:

  • Type column: string means a non-null string is expected when the attribute is present. string? means the value can be either a string or null.
  • Default column: a quoted value like "article" means ProseMirror uses that default when the attribute is omitted. An empty cell means the attribute defaults to null.
  • null vs empty string: these are not interchangeable. null means "not set / not applicable" (the attribute has no value). An empty string "" is a valid value. For example, figure.src defaults to "" (meaning the figure wraps a table or code block rather than an image), and figure.alt defaults to "" (no alt text yet). Consumers should treat null as absent and "" as an intentional empty value.
  • Omitting attributes: ProseMirror fills in defaults for omitted attributes when parsing, so serialised JSON used as input may either include an attribute with its default value or omit it entirely. Both are valid. Output produced by toJSON(), however, always includes every attribute the schema defines (even optional ones), with either its value or null.
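
A consumer that reads attributes from JSON written by other tools has to handle three cases: omitted, null, and empty string. The helper below is a minimal sketch of that logic. The names resolveAttr, describeAttr and FIGURE_DEFAULTS are hypothetical, and the defaults table contains only the two figure defaults mentioned above.

```typescript
type AttrValue = string | number | boolean | null;

// Hypothetical defaults table; only the two figure defaults named in the conventions above.
const FIGURE_DEFAULTS: Record<string, AttrValue | undefined> = { src: "", alt: "" };

// Omitted attribute: fall back to the schema default, or null when there is none.
// Present attribute: returned as-is, including null and "".
function resolveAttr(
  attrs: Record<string, AttrValue | undefined> | undefined,
  key: string,
  defaults: Record<string, AttrValue | undefined>
): AttrValue {
  const value = attrs ? attrs[key] : undefined;
  if (value !== undefined) return value;
  const fallback = defaults[key];
  return fallback === undefined ? null : fallback;
}

// null means absent, "" is an intentional empty value, anything else is set.
function describeAttr(value: AttrValue): "absent" | "empty" | "set" {
  if (value === null) return "absent";
  if (value === "") return "empty";
  return "set";
}
```

For example, resolving src on a figure whose attrs omit it yields "", which describeAttr reports as "empty" rather than "absent".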
classDiagram
    direction TB

    class doc {
        +type : "article"
        +lang? : string
        +schema? : string
    }

    class header {
        heading + subtitle?
    }

    class part {
        +id : string
        +type : PartType
        +locale? : string
        heading? block*
    }

    class heading {
        +id : string
        +level : 1-6
        +type : string
        text | footnote
    }

    class paragraph {
        +id : string
        +text-align? : string
        inline*
    }

    class figure {
        +id : string
        +src : string
        +alt : string
        +scale-width : number
        table|code_block? caption
    }

    class table {
        +id : string
        table_row+
    }

    class blockquote {
        +id : string
        block+
    }

    class code_block {
        +id : string
        +language : string
        text*
    }

    class bullet_list {
        list_item+
    }

    class ordered_list {
        +order : number
        list_item+
    }

    class text {
        +text : string
        +marks? : Mark[]
    }

    class math {
        +id : string
        +tex : string
        +style : inline|display
    }

    class citation {
        +id : string
        +source : string
        +style : string
    }

    class footnote {
        +id : string
        inline*
    }

    class image {
        +id : string
        +src : string
        +alt? : string
    }

    doc *-- "0..1" header
    doc *-- "0..*" part : structural
    doc *-- "0..*" paragraph : block
    doc *-- "0..*" heading : block
    doc *-- "0..*" figure : block
    doc *-- "0..*" table : block
    doc *-- "0..*" blockquote : block
    doc *-- "0..*" code_block : block
    doc *-- "0..*" bullet_list : block
    doc *-- "0..*" ordered_list : block

    part *-- "0..1" heading
    part *-- "0..*" paragraph

    paragraph *-- "0..*" text : inline
    paragraph *-- "0..*" math : inline
    paragraph *-- "0..*" citation : inline
    paragraph *-- "0..*" footnote : inline
    paragraph *-- "0..*" image : inline

Root: doc

The root node wraps an optional header followed by any mix of structural (part) and block-level nodes. Both the header and parts are optional — a document can be completely flat, containing only paragraphs and other block nodes as direct children.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| type | string | "article" | Document type. |
| lang | string? | | Document language (e.g. "en-US"). |
| role | string? | | Semantic role of the document. |
| schema | string? | | Schema version identifier. |
| pageBreak | string? | | Page break behaviour. |
| placement | string? | | Placement hint (cover, front, body, back). |
| numbering | string? | | Numbering style (decimal, alpha, roman, none). |

Content: header? (structural | block)*
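
The content expression implies two things a consumer can check cheaply: whether a document is flat, and whether a header, if present, sits in first position. The sketch below covers only those two checks and does not verify group membership of the other children; the function names are hypothetical.

```typescript
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}

// True when the document has no part children, i.e. it is "flat".
function isFlatDocument(doc: PMNode): boolean {
  return (doc.content ?? []).every((child) => child.type !== "part");
}

// A header may only appear as the first child of doc, so any later header is a problem.
function checkHeaderPosition(doc: PMNode): string[] {
  const problems: string[] = [];
  (doc.content ?? []).forEach((child, index) => {
    if (child.type === "header" && index > 0) {
      problems.push(`header found at index ${index}; it is only allowed as the first child`);
    }
  });
  return problems;
}
```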

Structural Nodes

part

A section wrapper representing a chapter, appendix, abstract, or other structural division. Parts form the main outline of the document.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| id | string? | | Unique node identifier. |
| type | string | "chapter" | Part type: chapter, abstract, bibliography, appendix, part, free. |
| locale | string? | | Locale override (e.g. "de-DE"). |
| numbering | string? | | Numbering style. |
| placement | string? | | Placement in the document. |
| role | string? | | Semantic role. |
| text-direction | string? | | ltr, rtl, or auto. |
| class | string? | | Custom CSS class. |
| skipToc | boolean | false | Exclude from table of contents. |
| pageBreak | string? | | after, before, right, before-and-after. |
| data | string? | | Extended settings as JSON string (e.g. alternative titles). |

Content: heading? block*

header

Container that groups the main heading with an optional subtitle. Typically appears once at the top of the document.

Content: heading subtitle?

subtitle

A subtitle node, usually sitting below the main heading inside a header.

Content: inline*

Block Nodes

heading

Section or chapter title. Heading levels 1–6 are supported.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| id | string? | | Unique node identifier. |
| level | number | 1 | Heading level (1–6). |
| type | string | "chapter" | Heading type. |
| role | string? | | Semantic role. |
| numbering | string? | | Numbering style override. |
| placement | string? | | Placement hint. |
| data | string? | | Extended settings as JSON string (e.g. literal numbering). |

Content: (text | footnote)*

Allowed marks: emphasis, strong, superscript, subscript, bdi, tags, indexEntry

paragraph

Standard block of inline content.

| Attribute | Type | Description |
| --- | --- | --- |
| id | string? | Unique node identifier. |
| text-align | string? | left, right, center, justify. |
| text-direction | string? | ltr, rtl, or auto. |
| class | string? | Custom CSS class. |

Content: inline*

reference

A bibliography entry rendered as a paragraph-like block. Similar to paragraph but carries an additional refId linking it to a SnapshotReference.

| Attribute | Type | Description |
| --- | --- | --- |
| id | string? | Unique node identifier. |
| refId | string? | Reference to a SnapshotReference.id. |
| text-align | string? | Text alignment. |
| text-direction | string? | Text direction. |
| class | string? | Custom CSS class. |

Content: inline*

The inline content of a reference node holds the displayed bibliography text for this entry. The rawReference field on the linked SnapshotReference serves as an interchange format for this content — it represents the same text in systems that do not use ProseMirror nodes.

How content and rawReference interact:

  • Import: When a reference is inserted into the document, the rawReference string (if non-empty) is used as the initial inline content of the reference node.
  • CSL generation: If rawReference is empty (or the user clears it), a citation processor can generate the bibliography text from the structured CSL-JSON data on the SnapshotReference instead.
  • User edits: The user may freely edit the inline content of a reference node after it is created — the displayed text is always what the node contains.
  • Export: On export, the current inline content of the reference node is extracted back into the rawReference field, so downstream consumers receive the latest text.

This means rawReference and the node's inline content are two representations of the same value: rawReference is the portable string form, the node content is the live ProseMirror form.
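
The import and export steps above can be sketched as two functions. This is an illustration under simplifying assumptions, not the editor's implementation: rawReference is inserted as a single plain text node (any HTML tags such as <i> are not parsed into marks), export flattens the node content to plain text, and the function names are hypothetical.

```typescript
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}
interface SnapshotReference {
  id: string;
  rawReference: string;
}

// Import: seed the node content from rawReference when it is non-empty.
// ProseMirror does not allow empty text nodes, so an empty string yields no content.
function createReferenceNode(reference: SnapshotReference, nodeId: string): PMNode {
  const node: PMNode = {
    type: "reference",
    attrs: { id: nodeId, refId: reference.id, "text-align": null, "text-direction": null, class: null },
  };
  if (reference.rawReference !== "") {
    node.content = [{ type: "text", text: reference.rawReference }];
  }
  return node;
}

// Flattens inline content to a plain string (marks are dropped in this sketch).
function textOf(node: PMNode): string {
  if (typeof node.text === "string") return node.text;
  return (node.content ?? []).map((child) => textOf(child)).join("");
}

// Export: copy the current text of each reference node back into rawReference.
function syncRawReferences(doc: PMNode, references: SnapshotReference[]): SnapshotReference[] {
  const textByRefId: Record<string, string | undefined> = {};
  const visit = (node: PMNode): void => {
    if (node.type === "reference") {
      const refId = node.attrs ? node.attrs.refId : undefined;
      if (typeof refId === "string") textByRefId[refId] = textOf(node);
    }
    (node.content ?? []).forEach((child) => visit(child));
  };
  visit(doc);
  return references.map((reference) => {
    const text = textByRefId[reference.id];
    return text === undefined ? reference : { ...reference, rawReference: text };
  });
}
```

References that have no matching node in the document are returned unchanged, so entries that exist only as structured data keep whatever rawReference they had.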

The reference node is the bibliography-side counterpart to the inline citation node. Both carry text content that can be generated or user-provided: a reference holds the full bibliography string (e.g. "Hughes, T.P. (2018). Global warming transforms coral reef assemblages. Nature, 556, 492–496."), while a citation holds the in-text marker (e.g. "(Hughes, 2018)"). The rawReference field on the SnapshotReference is the portable form of the reference text; the citation equivalent is simply the citation node's inline content — there is no separate rawCitation field.

figure

A figure wrapper that can host an image, a table, or a code block together with a caption. The type attribute determines the variant:

| type value | Content model | Description |
| --- | --- | --- |
| "figure" | Image via src + caption | Standard image figure. The image is rendered from the src attribute; the optional table/code_block child is not used. |
| "native-table" | table node + caption | Table wrapped in a figure. The src attribute is empty. The figure contains a ProseMirror table node as a direct child, giving it a caption and cross-reference support. Created automatically when pasting HTML tables. |

A figure can also wrap a code_block instead of a table (e.g. for captioned code listings).

No sub-figure support

Figures cannot be nested. The content expression does not allow a figure inside another figure, so sub-figures (e.g. a grid of images with individual sub-captions) are not currently supported.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| id | string? | | Unique node identifier. |
| src | string | "" | Image source URL. Empty string "" for native-table figures (not null). |
| alt | string | "" | Alternative text. Empty string "" when not yet provided. |
| width | string? | | Image width in pixels. |
| height | string? | | Image height in pixels. |
| title | string? | | Figure title (tooltip / export metadata). |
| type | string | "figure" | Figure variant: "figure" or "native-table". |
| environment | string? | | LaTeX environment name (e.g. "figure*" for two-column spanning). |
| orientation | string | "portrait" | portrait or landscape. |
| decorative | string? | | Marks purely decorative images (skipped by screen readers). |
| scale-width | number | 1 | Scale factor (0–1) controlling how much of the text width the figure occupies. |
| float-placement | string? | | Float placement hint for typesetting (e.g. "top", "bottom", "here"). |
| float-reference | string? | | Float reference frame. |
| float-defer-page | string? | | Deferred page placement. |
| float-modifier | string? | | Float modifier. |

Content: (table | code_block)? caption
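
An exporter usually needs to know which variant of figure it is dealing with. The following sketch classifies a figure by inspecting its children first and its src second; the FigureKind names and the figureKind function are hypothetical.

```typescript
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}

type FigureKind = "image" | "table" | "code" | "empty";

// A table or code_block child wins; otherwise a non-empty src means an image figure.
function figureKind(figure: PMNode): FigureKind {
  for (const child of figure.content ?? []) {
    if (child.type === "table") return "table";
    if (child.type === "code_block") return "code";
  }
  const src = figure.attrs ? figure.attrs.src : undefined;
  return typeof src === "string" && src !== "" ? "image" : "empty";
}
```

Checking the children rather than the type attribute alone also covers captioned code listings, which the type table does not list as a separate value.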

caption

Container for caption content beneath a figure. The first paragraph serves as the actual caption text, while any subsequent paragraphs become figure notes.

Content: label? block*
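
The split between caption text and figure notes can be sketched as follows. The function name splitCaption is hypothetical, and the sketch considers only paragraph children, which is an assumption since the content expression allows any block.

```typescript
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}

interface CaptionParts {
  label: PMNode | null;  // optional custom label node
  text: PMNode | null;   // first paragraph: the caption text
  notes: PMNode[];       // remaining paragraphs: figure notes
}

function splitCaption(caption: PMNode): CaptionParts {
  const children = caption.content ?? [];
  const label = children.length > 0 && children[0].type === "label" ? children[0] : null;
  const paragraphs = children.filter((child) => child.type === "paragraph");
  return {
    label,
    text: paragraphs.length > 0 ? paragraphs[0] : null,
    notes: paragraphs.slice(1),
  };
}
```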

label

Optional custom label for the figure environment (e.g. "Map", "Example", "Plate"). Overrides the default label derived from the figure type. Since figures are auto-numbered, this should contain only the label word, not a number. Rarely needed — the default labels (Figure, Table, etc.) are almost always appropriate.

Content: text*

blockquote

Quoted passage containing one or more blocks.

| Attribute | Type | Description |
| --- | --- | --- |
| id | string? | Unique identifier. |
| lang | string? | Language of quote. |

Content: block+

code_block

Block-level code snippet preserving formatting.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| id | string? | | Unique identifier. |
| text | string | "" | Code content. |
| type | string | "code" | Block type. |
| language | string | "text/plain" | Language MIME type or identifier. |

Content: text* (no marks)

bullet_list / ordered_list

Standard list containers.

ordered_list carries an order attribute (starting number, default 1).

Content: list_item+

list_item

Single item within a list.

Content: block*

table

Table container generated by prosemirror-tables. Contains table_row nodes, each containing table_cell or table_header nodes.

| Node | Attributes |
| --- | --- |
| table | id |
| table_row | id |
| table_cell | colspan, rowspan, colwidth (array), background (colour) |
| table_header | Same as table_cell |

Table cells contain: (paragraph | ordered_list | bullet_list | figure | blockquote)*
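
When exporting tables with merged cells, a common first step is to work out how many columns each row occupies by summing colspan. The sketch below does only that; it deliberately ignores columns occupied by cells that span down from earlier rows via rowspan, and the function names are hypothetical.

```typescript
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}

// Sums the colspan of every cell in a row; a missing or invalid colspan counts as 1.
function rowWidth(row: PMNode): number {
  return (row.content ?? []).reduce((sum, cell) => {
    const colspan = cell.attrs ? cell.attrs.colspan : undefined;
    return sum + (typeof colspan === "number" && colspan > 0 ? colspan : 1);
  }, 0);
}

// Width of each row in a table, in document order.
function rowWidths(table: PMNode): number[] {
  return (table.content ?? []).map((row) => rowWidth(row));
}
```

If all rows report the same width and no cell uses rowspan, the table grid is rectangular; with rowspan in play, a full grid computation is needed instead.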

pageBreak

An explicit page break marker. No attributes, no content.

horizontal_rule

A horizontal separator. No attributes, no content.

placeHolder

Placeholder node for assets injected later (e.g. logos in templates).

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| id | string? | | Unique identifier. |
| type | string | "logo" | Placeholder type. |
| label | string | "Logo" | Display label. |

Inline Nodes

Inline nodes appear inside paragraphs and other text-containing blocks.

text

Plain text leaf. Carries a text string and optional marks array.

{ "type": "text", "text": "Hello world", "marks": [{ "type": "strong" }] }

hard_break

Forced line break within a paragraph. No attributes.

image

Inline image.

| Attribute | Type | Description |
| --- | --- | --- |
| id | string? | Unique identifier. |
| src | string? | Image source URL. |
| alt | string? | Alternative text. |
| title | string? | Image title. |
| width | string? | Width. |
| height | string? | Height. |
| metaData | string? | Opaque metadata for downstream processing. |
| decorative | string? | Marks image as decorative (role="presentation"). |

math

Inline or display math expression.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| id | string? | | Unique identifier. |
| tex | string | "" | TeX source. |
| style | string | "inline" | inline or display. |
| label | string? | | Equation label for cross-referencing. |

Content: text* (the TeX source as text)

citation

Inline citation placeholder linking to one or more bibliography references. A single citation node can group multiple references (e.g. [1, 2]).

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| id | string? | | Unique identifier. |
| source | string? | null | URI-encoded JSON array of CSL citation items (see below). null when unresolved. |
| style | string | "apa" | Citation style key, e.g. "apa", "chicago-author-date", "ieee". |

Content: inline*

The citation node carries inline content that represents the displayed citation text. This content can originate in two ways:

  1. Generated from a CSL style — a citation processor uses the source attribute and the referenced SnapshotReference data to produce formatted text according to the chosen style (e.g. "(Hughes, 2018)" for APA, "[1]" for IEEE).
  2. User-provided — if the user has typed or edited the citation text manually (e.g. "123" or "see Mola 2015"), that text is preserved as-is in the node content.

Renderers and exporters should use the node's existing text content as the display value. A citation processor may overwrite the content with a style-generated string, but it is not required — keeping the user-provided text is valid.

The source field is serialised with encodeURI(JSON.stringify(items)). Each item follows the CSL-JSON citation item structure (see also the CSL specification):

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Reference to a SnapshotReference.id. |
| prefix | string | No | Text rendered before the citation (e.g. "see "). |
| suffix | string | No | Text rendered after the citation. |
| locator | string | No | Page, chapter, or other locator. |
| label | string | No | Locator type label (e.g. "page", "chapter"). |

A decoded source value looks like:

[
  { "id": "ref-hughes-2018", "prefix": "see " },
  { "id": "ref-smith-2020" }
]
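
Encoding and decoding the source attribute can be sketched as below. The function names are hypothetical. Encoding follows the encodeURI(JSON.stringify(items)) rule stated above. Decoding uses decodeURIComponent rather than decodeURI as a defensive choice: decodeURI leaves escapes of reserved characters such as %3A untouched, and the sample documents on this page contain %3A, so decodeURIComponent accepts both forms.

```typescript
interface CitationItem {
  id: string;
  prefix?: string;
  suffix?: string;
  locator?: string;
  label?: string;
}

// Serialises citation items into the source attribute value.
function encodeCitationSource(items: CitationItem[]): string {
  return encodeURI(JSON.stringify(items));
}

// Parses a source attribute value; null or "" (unresolved) yields an empty list.
function decodeCitationSource(source: string | null): CitationItem[] {
  if (source === null || source === "") return [];
  const parsed: unknown = JSON.parse(decodeURIComponent(source));
  if (!Array.isArray(parsed)) {
    throw new Error("citation source must decode to a JSON array");
  }
  // Keep only entries that carry a string id; other entries are dropped in this sketch.
  return parsed.filter(
    (item) => typeof item === "object" && item !== null && typeof item.id === "string"
  ) as CitationItem[];
}
```

A round trip through both functions returns the original items, and the ids can then be looked up in the snapshot's references array.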

footnote

Inline footnote atom storing footnote content.

| Attribute | Type | Default | Description |
| --- | --- | --- | --- |
| id | string? | | Unique identifier. |
| type | string | "footnote" | Footnote type. |

Content: inline*

Cross-reference node rendered as an anchor.

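| Attribute | Type | Description |
| --- | --- | --- |
| id | string? | Unique identifier. |
| type | string? | Link type (e.g. "xref"). |
| href | string? | Target reference. |
| reference-format | string? | How the reference should be rendered. |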
reference-format string? How the reference should be rendered.

Content: text*


Example Documents

The following JSON examples show the toJSON() output format that the schema describes. You can validate them against manuscript.schema.json or manuscript-snapshot.schema.json.

Minimal (Flat) Document

The smallest valid document — a single paragraph directly inside doc, with no header or parts. This flat structure is perfectly valid; not every document needs sections:

{
  "type": "doc",
  "attrs": {
    "type": "article",
    "lang": "en-US",
    "schema": null,
    "pageBreak": null,
    "placement": null,
    "numbering": null
  },
  "content": [
    {
      "type": "paragraph",
      "attrs": { "id": "p1", "text-align": null, "text-direction": null, "class": null },
      "content": [
        { "type": "text", "text": "Hello, world!" }
      ]
    }
  ]
}

A flat document can contain any block nodes — paragraphs, headings, figures, code blocks, lists — without wrapping them in part nodes. Parts only become necessary when the document needs distinct sections (chapters, appendix, abstract, etc.).

Document with Inline Formatting

Text nodes carry a marks array for bold, italic, links, etc.:

{
  "type": "doc",
  "attrs": { "type": "article", "lang": "en-US", "schema": null, "pageBreak": null, "placement": null, "numbering": null },
  "content": [
    {
      "type": "paragraph",
      "attrs": { "id": "p1", "text-align": null, "text-direction": null, "class": null },
      "content": [
        { "type": "text", "text": "This is " },
        { "type": "text", "text": "bold", "marks": [{ "type": "strong" }] },
        { "type": "text", "text": " and " },
        { "type": "text", "text": "italic", "marks": [{ "type": "em" }] },
        { "type": "text", "text": " text with a " },
        {
          "type": "text",
          "text": "hyperlink",
          "marks": [{ "type": "anchor", "attrs": { "href": "https://example.com", "title": null, "id": null } }]
        },
        { "type": "text", "text": "." }
      ]
    }
  ]
}
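
As an illustration of how a consumer might interpret the marks array, the sketch below renders text nodes to HTML. The mark names strong, em and anchor are taken from the example above; the mapping to HTML tags, the handling of unknown marks (ignored) and the function names are assumptions, not the behaviour of any SciFlow exporter.

```typescript
interface Mark {
  type: string;
  attrs?: Record<string, unknown>;
}
interface TextNode {
  type: string;
  text: string;
  marks?: Mark[];
}

function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wraps the escaped text in one element per known mark; the first mark ends up outermost.
function renderText(node: TextNode): string {
  let html = escapeHtml(node.text);
  const marks = node.marks ?? [];
  for (let i = marks.length - 1; i >= 0; i--) {
    const mark = marks[i];
    if (mark.type === "strong") {
      html = `<strong>${html}</strong>`;
    } else if (mark.type === "em") {
      html = `<em>${html}</em>`;
    } else if (mark.type === "anchor") {
      const href = mark.attrs ? mark.attrs.href : undefined;
      if (typeof href === "string") {
        html = `<a href="${escapeHtml(href)}">${html}</a>`;
      }
    }
    // Unknown mark types are ignored in this sketch.
  }
  return html;
}

function renderInline(nodes: TextNode[]): string {
  return nodes.map((node) => renderText(node)).join("");
}
```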

Structured Manuscript

A typical manuscript with header, chapters, a citation, and a figure:

{
  "type": "doc",
  "attrs": { "type": "article", "lang": "en-US", "schema": null, "pageBreak": null, "placement": null, "numbering": null },
  "content": [
    {
      "type": "header",
      "content": [
        {
          "type": "heading",
          "attrs": { "id": "h-title", "level": 1, "type": "chapter", "role": null, "numbering": null, "placement": null, "data": null },
          "content": [{ "type": "text", "text": "Climate Effects on Marine Biodiversity" }]
        },
        {
          "type": "subtitle",
          "content": [{ "type": "text", "text": "A Systematic Review" }]
        }
      ]
    },
    {
      "type": "part",
      "attrs": {
        "id": "sec-intro", "type": "chapter", "locale": null, "numbering": null,
        "placement": null, "role": null, "text-direction": null, "class": null,
        "skipToc": false, "pageBreak": null, "data": null
      },
      "content": [
        {
          "type": "heading",
          "attrs": { "id": "h-intro", "level": 1, "type": "chapter", "role": null, "numbering": null, "placement": null, "data": null },
          "content": [{ "type": "text", "text": "Introduction" }]
        },
        {
          "type": "paragraph",
          "attrs": { "id": "p-intro1", "text-align": null, "text-direction": null, "class": null },
          "content": [
            { "type": "text", "text": "Rising ocean temperatures have been shown to impact coral reef ecosystems " },
            {
              "type": "citation",
              "attrs": { "id": "cite-1", "source": "%5B%7B%22id%22%3A%22ref-hughes-2018%22%7D%5D", "style": "apa" }
            },
            { "type": "text", "text": "." }
          ]
        }
      ]
    },
    {
      "type": "part",
      "attrs": {
        "id": "sec-methods", "type": "chapter", "locale": null, "numbering": null,
        "placement": null, "role": null, "text-direction": null, "class": null,
        "skipToc": false, "pageBreak": null, "data": null
      },
      "content": [
        {
          "type": "heading",
          "attrs": { "id": "h-methods", "level": 1, "type": "chapter", "role": null, "numbering": null, "placement": null, "data": null },
          "content": [{ "type": "text", "text": "Methods" }]
        },
        {
          "type": "paragraph",
          "attrs": { "id": "p-methods1", "text-align": null, "text-direction": null, "class": null },
          "content": [
            { "type": "text", "text": "We analysed satellite data using the equation " },
            {
              "type": "math",
              "attrs": { "id": "eq-1", "tex": "\\Delta T = \\frac{Q}{mc}", "style": "inline", "label": null }
            },
            { "type": "text", "text": " to derive temperature change." }
          ]
        },
        {
          "type": "figure",
          "attrs": {
            "id": "fig-map", "src": "map.png", "alt": "Ocean temperature map",
            "width": null, "height": null, "title": null, "type": "figure",
            "environment": null, "orientation": "portrait", "decorative": null,
            "scale-width": 0.8, "float-placement": null, "float-reference": null,
            "float-defer-page": null, "float-modifier": null
          },
          "content": [
            {
              "type": "caption",
              "content": [
                {
                  "type": "label",
                  "content": [{ "type": "text", "text": "Map" }]
                },
                {
                  "type": "paragraph",
                  "attrs": { "id": "p-cap1", "text-align": null, "text-direction": null, "class": null },
                  "content": [{ "type": "text", "text": "Global sea surface temperature anomalies (2020)." }]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Full Snapshot

A complete snapshot wrapping the document together with files and references:

{
  "doc": {
    "type": "doc",
    "attrs": { "type": "article", "lang": "en-US", "schema": null, "pageBreak": null, "placement": null, "numbering": null },
    "content": [
      {
        "type": "paragraph",
        "attrs": { "id": "p1", "text-align": null, "text-direction": null, "class": null },
        "content": [
          { "type": "text", "text": "Ocean temperatures are rising " },
          { "type": "citation", "attrs": { "id": "cite-1", "source": "%5B%7B%22id%22%3A%22ref-hughes-2018%22%7D%5D", "style": "apa" } },
          { "type": "text", "text": "." }
        ]
      },
      {
        "type": "figure",
        "attrs": {
          "id": "fig-1", "src": "asset-map-001", "alt": "Temperature map",
          "width": null, "height": null, "title": null, "type": "figure",
          "environment": null, "orientation": "portrait", "decorative": null,
          "scale-width": 1, "float-placement": null, "float-reference": null,
          "float-defer-page": null, "float-modifier": null
        },
        "content": [
          {
            "type": "caption",
            "content": [
              {
                "type": "paragraph",
                "attrs": { "id": "p-cap", "text-align": null, "text-direction": null, "class": null },
                "content": [{ "type": "text", "text": "Sea surface temperatures." }]
              }
            ]
          }
        ]
      }
    ]
  },
  "version": 42,
  "selection": { "anchor": 12, "head": 12 },
  "files": [
    {
      "id": "asset-map-001",
      "type": "image",
      "url": "https://cdn.example.com/images/map-full.png",
      "previewSrc": "data:image/png;base64,iVBOR...",
      "mimeType": "image/png",
      "name": "map-full.png",
      "dimensions": { "width": 1200, "height": 800 }
    }
  ],
  "references": [
    {
      "id": "ref-hughes-2018",
      "rawReference": "Hughes, T.P. (2018). Global warming transforms coral reef assemblages. Nature, 556, 492–496.",
      "mimeType": "application/vnd.citationstyles.csl+json",
      "type": "article-journal",
      "title": "Global warming transforms coral reef assemblages",
      "author": [{ "family": "Hughes", "given": "T.P." }],
      "issued": { "date-parts": [[2018]] }
    }
  ]
}

Table with Merged Cells

A figure wrapping a table that has merged header cells:

{
  "type": "figure",
  "attrs": {
    "id": "fig-tbl", "src": "", "alt": "", "width": null, "height": null,
    "title": null, "type": "native-table", "environment": null, "orientation": "portrait",
    "decorative": null, "scale-width": 1, "float-placement": null,
    "float-reference": null, "float-defer-page": null, "float-modifier": null
  },
  "content": [
    {
      "type": "table",
      "attrs": { "id": "tbl-1" },
      "content": [
        {
          "type": "table_row",
          "attrs": { "id": "tr-1" },
          "content": [
            {
              "type": "table_header",
              "attrs": { "colspan": 2, "rowspan": 1, "colwidth": null, "background": null },
              "content": [
                {
                  "type": "paragraph",
                  "attrs": { "id": "p-th1", "text-align": null, "text-direction": null, "class": null },
                  "content": [{ "type": "text", "text": "Region", "marks": [{ "type": "strong" }] }]
                }
              ]
            },
            {
              "type": "table_header",
              "attrs": { "colspan": 1, "rowspan": 1, "colwidth": null, "background": null },
              "content": [
                {
                  "type": "paragraph",
                  "attrs": { "id": "p-th2", "text-align": null, "text-direction": null, "class": null },
                  "content": [{ "type": "text", "text": "Temperature (°C)", "marks": [{ "type": "strong" }] }]
                }
              ]
            }
          ]
        },
        {
          "type": "table_row",
          "attrs": { "id": "tr-2" },
          "content": [
            {
              "type": "table_cell",
              "attrs": { "colspan": 1, "rowspan": 1, "colwidth": null, "background": null },
              "content": [
                {
                  "type": "paragraph",
                  "attrs": { "id": "p-c1", "text-align": null, "text-direction": null, "class": null },
                  "content": [{ "type": "text", "text": "Pacific" }]
                }
              ]
            },
            {
              "type": "table_cell",
              "attrs": { "colspan": 1, "rowspan": 1, "colwidth": null, "background": null },
              "content": [
                {
                  "type": "paragraph",
                  "attrs": { "id": "p-c2", "text-align": null, "text-direction": null, "class": null },
                  "content": [{ "type": "text", "text": "North" }]
                }
              ]
            },
            {
              "type": "table_cell",
              "attrs": { "colspan": 1, "rowspan": 1, "colwidth": null, "background": null },
              "content": [
                {
                  "type": "paragraph",
                  "attrs": { "id": "p-c3", "text-align": null, "text-direction": null, "class": null },
                  "content": [{ "type": "text", "text": "18.2" }]
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "type": "caption",
      "content": [
        {
          "type": "label",
          "content": [{ "type": "text", "text": "Table 1" }]
        },
        {
          "type": "paragraph",
          "attrs": { "id": "p-tcap", "text-align": null, "text-direction": null, "class": null },
          "content": [{ "type": "text", "text": "Average sea surface temperatures by region." }]
        }
      ]
    }
  ]
}
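In the rows shown above, the header row's colspan values (2 + 1) add up to the three cells of the body row. A minimal sketch of that consistency check on plain JSON follows. `rowWidth` and `hasConsistentWidth` are hypothetical helper names, not part of the SciFlow packages, and rowspan is ignored for simplicity.

```typescript
// Hypothetical helpers for illustration; not part of the SciFlow schema package.
interface CellJSON {
  type: string;
  attrs?: { colspan?: number; rowspan?: number };
}
interface RowJSON {
  type: string;
  content?: CellJSON[];
}

// Sum the colspan of every cell in a row (a missing colspan counts as 1).
function rowWidth(row: RowJSON): number {
  return (row.content ?? []).reduce((sum, cell) => sum + (cell.attrs?.colspan ?? 1), 0);
}

// True when every row covers the same number of columns.
// Simplification: rowspan is ignored, so vertically merged cells need extra logic.
function hasConsistentWidth(rows: RowJSON[]): boolean {
  const widths = rows.map(rowWidth);
  return widths.every((w) => w === widths[0]);
}

const headerRow: RowJSON = {
  type: "table_row",
  content: [
    { type: "table_header", attrs: { colspan: 2, rowspan: 1 } },
    { type: "table_header", attrs: { colspan: 1, rowspan: 1 } },
  ],
};
const bodyRow: RowJSON = {
  type: "table_row",
  content: [
    { type: "table_cell", attrs: { colspan: 1, rowspan: 1 } },
    { type: "table_cell", attrs: { colspan: 1, rowspan: 1 } },
    { type: "table_cell", attrs: { colspan: 1, rowspan: 1 } },
  ],
};

console.log(rowWidth(headerRow), rowWidth(bodyRow)); // 3 3
```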

Footnotes and Code Blocks

A paragraph with a footnote, followed by a code block:

{
  "type": "doc",
  "attrs": { "type": "article", "lang": "en-US", "schema": null, "pageBreak": null, "placement": null, "numbering": null },
  "content": [
    {
      "type": "paragraph",
      "attrs": { "id": "p1", "text-align": null, "text-direction": null, "class": null },
      "content": [
        { "type": "text", "text": "This claim requires clarification" },
        {
          "type": "footnote",
          "attrs": { "id": "fn-1", "type": "footnote" },
          "content": [
            { "type": "text", "text": "See appendix B for the full derivation." }
          ]
        },
        { "type": "text", "text": "." }
      ]
    },
    {
      "type": "code_block",
      "attrs": { "id": "cb-1", "text": "", "type": "code", "language": "application/x-python" },
      "content": [
        { "type": "text", "text": "import numpy as np\n\ndef temperature_anomaly(data):\n    return data - np.mean(data)" }
      ]
    }
  ]
}
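Because the footnote is an inline node, its content sits inside the paragraph at the point of reference. A consumer that needs a list of footnotes could therefore walk the tree and gather them. The following is a minimal sketch on plain JSON; `NodeJSON`, `textOf`, and `collectFootnotes` are hypothetical names introduced for illustration, not SciFlow APIs.

```typescript
// Minimal JSON shape for a node, mirroring the examples on this page (an assumption).
interface NodeJSON {
  type: string;
  attrs?: Record<string, unknown>;
  content?: NodeJSON[];
  text?: string;
}
interface FootnoteSummary {
  id: string;
  text: string;
}

// Concatenate the text of a node and all of its descendants.
function textOf(node: NodeJSON): string {
  if (node.type === "text") return node.text ?? "";
  return (node.content ?? []).map((child) => textOf(child)).join("");
}

// Depth-first walk that records every footnote with its id and plain text.
function collectFootnotes(node: NodeJSON, found: FootnoteSummary[] = []): FootnoteSummary[] {
  if (node.type === "footnote") {
    found.push({ id: String(node.attrs?.["id"] ?? ""), text: textOf(node) });
  }
  for (const child of node.content ?? []) collectFootnotes(child, found);
  return found;
}

// Trimmed copy of the paragraph from the example above.
const sampleDoc: NodeJSON = {
  type: "doc",
  content: [
    {
      type: "paragraph",
      content: [
        { type: "text", text: "This claim requires clarification" },
        {
          type: "footnote",
          attrs: { id: "fn-1", type: "footnote" },
          content: [{ type: "text", text: "See appendix B for the full derivation." }],
        },
        { type: "text", text: "." },
      ],
    },
  ],
};

const footnotes = collectFootnotes(sampleDoc);
console.log(footnotes);
```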

Marks

Marks represent inline formatting — emphasis, strong, links, and so on. Unlike HTML, where inline formatting creates nested elements (<strong><em>text</em></strong>), ProseMirror uses a flat model: each text node carries an array of active marks as metadata. This avoids the ambiguity of overlapping or arbitrarily nested markup and makes positions simple character offsets rather than tree paths. See ProseMirror Guide — Document Structure for the rationale behind this design.

For example, the text "bold and italic" is represented as two text nodes, not nested elements:

[
  { "type": "text", "text": "bold ", "marks": [{ "type": "strong" }] },
  { "type": "text", "text": "and italic", "marks": [{ "type": "em" }, { "type": "strong" }] }
]
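To show how the flat model relates to nested markup, here is a minimal sketch that renders text nodes like the ones above to an HTML string. The `TAGS` mapping and the `renderText` helper are assumptions made for illustration; they are not how the SciFlow exporters are implemented.

```typescript
interface MarkJSON {
  type: string;
  attrs?: Record<string, unknown>;
}
interface TextJSON {
  type: "text";
  text: string;
  marks?: MarkJSON[];
}

// Assumed mapping from mark names to HTML tags; marks without an entry are skipped.
const TAGS: Record<string, string> = { em: "em", strong: "strong", sup: "sup", sub: "sub" };

// Escape the characters that would otherwise be read as markup.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wrap the text from the last mark outward, so the first mark ends up outermost.
function renderText(node: TextJSON): string {
  return (node.marks ?? []).reduceRight((inner, mark) => {
    const tag = TAGS[mark.type];
    return tag ? `<${tag}>${inner}</${tag}>` : inner;
  }, escapeHtml(node.text));
}

const nodes: TextJSON[] = [
  { type: "text", text: "bold ", marks: [{ type: "strong" }] },
  { type: "text", text: "and italic", marks: [{ type: "em" }, { type: "strong" }] },
];

const html = nodes.map((node) => renderText(node)).join("");
console.log(html); // <strong>bold </strong><em><strong>and italic</strong></em>
```

The nesting only appears at render time. The stored document keeps one flat list of marks per text node.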

Mark Ordering

The order of marks in the marks array is not arbitrary — it is determined by the order in which marks are declared in the schema definition. ProseMirror sorts marks by their rank (position in the schema's marks object) so that every document has exactly one canonical JSON representation. This means:

  • Two documents with the same logical content always produce identical JSON when serialised with toJSON().
  • Adjacent text nodes with the same mark set are automatically merged.
  • Empty text nodes are never emitted.

In this schema, the mark order is: em, strong, sup, sub, bdi, anchor, tags, indexEntry. When a text node carries multiple marks, they always appear in this order regardless of the sequence in which they were applied. This deterministic representation is important for storage, comparison, and collaborative editing.
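ProseMirror applies these rules itself, so application code does not normally need to. Purely as an illustration, the sketch below applies the same three rules (sort by rank, merge equal neighbours, drop empty text) to plain JSON. `MARK_ORDER`, `sortMarks`, and `normalize` are hypothetical names, and comparing mark sets with `JSON.stringify` is a simplification that assumes attributes are serialised with a stable key order.

```typescript
interface MarkJSON {
  type: string;
  attrs?: Record<string, unknown>;
}
interface TextJSON {
  type: "text";
  text: string;
  marks?: MarkJSON[];
}

// Mark order as listed above; a mark's rank is its index in this array.
// Mark types missing from the list get rank -1 and sort first (an arbitrary choice).
const MARK_ORDER = ["em", "strong", "sup", "sub", "bdi", "anchor", "tags", "indexEntry"];

// Return a copy of the marks sorted by rank.
function sortMarks(marks: MarkJSON[]): MarkJSON[] {
  return [...marks].sort((a, b) => MARK_ORDER.indexOf(a.type) - MARK_ORDER.indexOf(b.type));
}

// Drop empty text nodes, sort each node's marks, and merge neighbours with equal mark sets.
function normalize(nodes: TextJSON[]): TextJSON[] {
  const out: TextJSON[] = [];
  for (const node of nodes) {
    if (node.text === "") continue;
    const marks = sortMarks(node.marks ?? []);
    const prev = out[out.length - 1];
    if (prev && JSON.stringify(prev.marks ?? []) === JSON.stringify(marks)) {
      prev.text += node.text; // same mark set: extend the previous node
    } else {
      const next: TextJSON = { type: "text", text: node.text };
      if (marks.length > 0) next.marks = marks;
      out.push(next);
    }
  }
  return out;
}

const normalized = normalize([
  { type: "text", text: "bold ", marks: [{ type: "strong" }] },
  { type: "text", text: "" },
  { type: "text", text: "and ", marks: [{ type: "strong" }, { type: "em" }] },
  { type: "text", text: "italic", marks: [{ type: "em" }, { type: "strong" }] },
]);

console.log(JSON.stringify(normalized));
```

The last two input nodes carry the same marks in different order. After sorting they compare equal and merge into a single "and italic" node with marks em, strong.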

classDiagram
    direction LR
    class Mark {
        <<interface>>
        +type : string
        +attrs? : object
    }
    class em
    class strong
    class sup
    class sub
    class bdi
    class anchor {
        +href : string
        +title? : string
        +id? : string
    }
    class tags {
        +tags : ~key: string~[]
    }
    class indexEntry {
        +id? : string
        +entries : ~raw?: string~[]
        +attributes : object
    }
    Mark <|-- em
    Mark <|-- strong
    Mark <|-- sup
    Mark <|-- sub
    Mark <|-- bdi
    Mark <|-- anchor
    Mark <|-- tags
    Mark <|-- indexEntry
| Mark | Attributes | Description |
| --- | --- | --- |
| em | (none) | Emphasis (italic). |
| strong | (none) | Strong (bold). |
| sup | (none) | Superscript. |
| sub | (none) | Subscript. |
| bdi | (none) | Bidirectional isolate. |
| anchor | href, title, id | Hyperlink. |
| tags | tags (array of { key }) | Semantic tags applied to text. |
| indexEntry | id, entries (array of { raw? }), attributes | Back-of-book index entry. |

Node Groups

ProseMirror content expressions reference groups rather than individual node types. For example, block+ means "one or more nodes belonging to the block group". The groups in the manuscript schema:

| Group | Member nodes |
| --- | --- |
| block | paragraph, reference, heading, figure, code_block, blockquote, pageBreak, placeHolder, horizontal_rule, bullet_list, ordered_list, table |
| structural | part |
| inline | text, hard_break, image, math, citation, footnote, link |
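As a rough illustration of how a group-based expression such as block+ constrains children, the sketch below checks a list of child node types against the group membership listed above. It handles only the forms `<name>+` and `<name>*`; real ProseMirror content expressions also allow sequences, alternatives, and counts. `GROUPS` and `matchesSimpleExpression` are hypothetical names, not part of ProseMirror or SciFlow.

```typescript
// Group membership copied from the table above.
const GROUPS: Record<string, string[]> = {
  block: [
    "paragraph", "reference", "heading", "figure", "code_block", "blockquote",
    "pageBreak", "placeHolder", "horizontal_rule", "bullet_list", "ordered_list", "table",
  ],
  structural: ["part"],
  inline: ["text", "hard_break", "image", "math", "citation", "footnote", "link"],
};

// Check child node types against a simple expression of the form "<name>+" or "<name>*".
function matchesSimpleExpression(expr: string, childTypes: string[]): boolean {
  const match = /^(\w+)([+*])$/.exec(expr);
  if (!match) throw new Error(`Unsupported expression: ${expr}`);
  const name = match[1] as string;
  const quantifier = match[2] as string;
  // A name that is not a group is treated as a single node type.
  const members = GROUPS[name] ?? [name];
  if (quantifier === "+" && childTypes.length === 0) return false; // "+" needs at least one child
  return childTypes.every((type) => members.indexOf(type) !== -1);
}

console.log(matchesSimpleExpression("block+", ["heading", "paragraph"])); // true
console.log(matchesSimpleExpression("block+", ["paragraph", "text"])); // false
```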

Compatibility Notes

  • The schema round-trips to SciFlow document snapshots and is compatible with the typesetting pipeline.
  • The JSON format matches ProseMirror's Node.toJSON() output. Documents can be loaded back with Node.fromJSON().
  • The schema aligns with JATS semantics but is not a 1:1 copy; for example, figures carry float metadata to simplify publishing workflows.
  • When adding new nodes or marks, update both the schema package and the editor features that surface them.
  • The JSON Schema files are generated from the ProseMirror schema, so they match the TypeScript source of truth as long as they are regenerated after each schema change.

Feature availability in the start bundle

The default @sciflow/editor-start feature set enables citations, cross references, footnotes, inline formatting, headings, figures, tables, and math. To render equations as SVG, add the MathJax script tag to your page (see Troubleshooting). You can change the active features via editor.configureFeatures(...) (see Extending Features).