Schema Reference¶

SciFlow's manuscript schema defines the structure of documents created and consumed by the editor, exporters (JATS, DOCX, etc.), and sync layers. It is built on the ProseMirror document model. This page describes both the document node tree (the ProseMirror layer) and the snapshot wrapper that packages the document together with files and references.

Source package: @sciflow/schema-prosemirror (packages/schema/prosemirror)
JSON Schema (document): manuscript.schema.json
JSON Schema (snapshot): manuscript-snapshot.schema.json

Generating the schemas

The JSON Schema files are generated from the live ProseMirror schema. Regenerate them after any schema change:

npx nx run @sciflow/schema-prosemirror:generate-schema

Output lives under packages/schema/prosemirror/dist/.

The following diagram shows a typical structured manuscript as a node tree — structural nodes (green) wrap block-level content (blue), which in turn contains inline nodes (orange). Headers and parts are optional; a document can also be completely flat (just paragraphs and other block nodes directly inside doc).

graph TD
    doc["doc"]
    header["header"]
    h1["heading (level 1)<br/>'My Article'"]
    sub["subtitle<br/>'A Subtitle'"]

    part1["part (chapter)"]
    h2["heading (level 1)<br/>'Introduction'"]
    p1["paragraph"]
    t1["text 'Some text with a '"]
    cite["citation (source: ref-1, ref-2<br/>locator: p. 2 on ref-2)"]
    citetxt["text '(Hughes, 2018; Smith & Lee, 2020, p. 2)'"]:::optionalInline
    t2["text ' reference.'"]

    part2["part (chapter)"]
    h3["heading (level 1)<br/>'Methods'"]
    fig["figure"]
    cap["caption"]
    lbl["label 'Figure'"]:::optional
    p2["paragraph 'A diagram.'"]

    bib["part (bibliography)"]
    bibh["heading (level 1)<br/>'References'"]
    ref1["reference (refId: ref-1)"]
    reftxt["text 'Hughes et al. (2018) …'"]:::optionalInline
    ref2["reference (refId: ref-2)"]
    ref2txt["text 'Smith & Lee (2020) …'"]:::optionalInline

    doc --> header
    header --> h1
    header --> sub
    doc --> part1
    part1 --> h2
    part1 --> p1
    p1 --> t1
    p1 --> cite
    cite --> citetxt
    p1 --> t2
    doc --> part2
    part2 --> h3
    part2 --> fig
    fig --> cap
    cap --> lbl
    cap --> p2
    doc --> bib
    bib --> bibh
    bib --> ref1
    ref1 --> reftxt
    bib --> ref2
    ref2 --> ref2txt

    classDef structural fill:#e8f5e9,stroke:#388e3c
    classDef block fill:#e3f2fd,stroke:#1565c0
    classDef inline fill:#fff3e0,stroke:#e65100
    classDef optional fill:#e3f2fd,stroke:#1565c0,stroke-dasharray: 5 5
    classDef optionalInline fill:#fff3e0,stroke:#e65100,stroke-dasharray: 5 5

    class doc,header structural
    class part1,part2,bib structural
    class h1,h2,h3,bibh,sub,p1,p2,fig,cap,ref1,ref2 block
    class t1,t2,cite inline

Document Snapshot¶

A snapshot is the top-level object used for persistence and sync. It wraps the ProseMirror document together with all side-channel data.

classDiagram
    class Snapshot {
        +doc : Document
        +version? : number
        +selection? : Selection
        +files? : SnapshotFile[]
        +references? : SnapshotReference[]
    }
    class Selection {
        +anchor : number
        +head : number
    }
    class SnapshotFile {
        +id : string
        +type? : string
        +url? : string
        +previewSrc? : string
        +mimeType? : string
        +name? : string
        +dimensions? : Dimensions
    }
    class SnapshotReference {
        +id : string
        +rawReference : string
        +mimeType? : string
    }
    Snapshot *-- "1" Document : doc
    Snapshot *-- "0..1" Selection : selection
    Snapshot o-- "0..*" SnapshotFile : files
    Snapshot o-- "0..*" SnapshotReference : references

Field	Type	Required	Description
`doc`	Document node (see below)	Yes	The ProseMirror document tree in `toJSON()` format.
`version`	`number`	No	Document version counter, incremented on each change.
`selection`	Selection	No	Cursor/selection state (`anchor` and `head` positions).
`files`	`SnapshotFile[]`	No	Files and assets referenced by the document.
`references`	`SnapshotReference[]`	No	Bibliography and citation references.
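
The snapshot shape can be written down as TypeScript types. The following is a minimal sketch transcribed from the class diagram and field table above; the names `SnapshotSelection`, `DocumentNode`, and `isSnapshot` are illustrative assumptions and are not claimed to be exports of @sciflow/schema-prosemirror.

```typescript
// Types transcribed from the class diagram and field table above.
interface SnapshotSelection { anchor: number; head: number }
interface Dimensions { width: string | number; height: string | number }

interface SnapshotFile {
  id: string;
  type?: string;
  url?: string;
  previewSrc?: string;
  mimeType?: string;
  name?: string;
  dimensions?: Dimensions;
}

interface SnapshotReference {
  id: string;
  rawReference: string;
  mimeType?: string;
  // Additional CSL-JSON or OpenAlex properties may follow.
  [extra: string]: unknown;
}

// Assumed minimal shape of the root node in toJSON() format.
interface DocumentNode { type: "doc"; attrs?: Record<string, unknown>; content?: unknown[] }

interface Snapshot {
  doc: DocumentNode;
  version?: number;
  selection?: SnapshotSelection;
  files?: SnapshotFile[];
  references?: SnapshotReference[];
}

// Hypothetical helper: a shallow structural check only.
// It does not replace validation against manuscript-snapshot.schema.json.
function isSnapshot(value: unknown): value is Snapshot {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const doc = v["doc"];
  if (typeof doc !== "object" || doc === null) return false;
  if ((doc as Record<string, unknown>)["type"] !== "doc") return false;
  const optionalArrays = ["files", "references"];
  return optionalArrays.every(k => v[k] === undefined || Array.isArray(v[k]));
}
```

Only `doc` is required, so a snapshot without `files` or `references` passes the check.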

Files¶

Each SnapshotFile describes an asset (typically an image) referenced by the document.

Field	Type	Required	Description
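`id`	`string`	Yes	Unique asset identifier. A figure refers to the asset through its `src` attribute, as in the Full Snapshot example; the id can also align with figure node ids.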
`type`	`string`	No	Resource type, e.g. `"image"`.
`url`	`string`	No	Full-resolution URL for fetching the asset.
`previewSrc`	`string`	No	Preview-sized URL or data URI for in-doc rendering.
`mimeType`	`string`	No	MIME type of the resource.
`name`	`string`	No	Human-readable file name.
`dimensions`	`{ width, height }`	No	Asset dimensions (string or number).
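
As an illustration of how `url` and `previewSrc` might be used together, the sketch below resolves a figure's `src` against the `files` array. It assumes that `src` holds a SnapshotFile id (as in the Full Snapshot example) and falls back to treating `src` as a plain URL otherwise; `resolveSrc` is a hypothetical helper, not part of the package.

```typescript
interface SnapshotFile { id: string; url?: string; previewSrc?: string }

// Hypothetical helper. Returns null when the figure has no image (src is "").
function resolveSrc(
  src: string,
  files: SnapshotFile[],
  purpose: "preview" | "full"
): string | null {
  if (src === "") return null; // figure wraps a table or code block
  const file = files.find(f => f.id === src);
  // Assumption: an src that matches no file id is already a usable URL or path.
  if (!file) return src;
  const preferred =
    purpose === "preview"
      ? (file.previewSrc ?? file.url)
      : (file.url ?? file.previewSrc);
  return preferred ?? null;
}
```

The preview branch prefers `previewSrc` for in-document rendering, while the full branch prefers `url` for fetching the full-resolution asset.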

References¶

Each SnapshotReference represents a bibliography entry or citation source.

Field	Type	Required	Description
`id`	`string`	Yes	Unique reference identifier.
`rawReference`	`string`	Yes	The bibliography string for this reference — the full entry as it appears in a reference list (e.g. "Hughes, T.P. (2018). Global warming transforms coral reef assemblages. Nature, 556, 492–496."). Used as initial content when creating a `reference` node, and updated from node content on export. May contain lightweight HTML tags such as `<i>` for italics. May be empty if the bibliography is generated from structured CSL data instead.
`mimeType`	`string`	No	Citation format: `application/vnd.citationstyles.csl+json` or `application/vnd.openalex+json`.

rawReference vs raw citation

rawReference is the bibliography string (reference list entry), not an in-text citation. A raw citation is the short inline label such as "(Hughes, 2018)" or "[1]". Some external systems use different field names for the bibliography string — for example, OJS stores it as rawCitation. Map the external field to rawReference on import and back on export. See the OJS Integration Guide for a concrete example.
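
The field mapping described above can be sketched as two small functions. The `ExternalCitation` shape is a hypothetical stand-in for an external record; only the field name `rawCitation` comes from the text, and the real OJS payload may carry more fields.

```typescript
interface SnapshotReference { id: string; rawReference: string; mimeType?: string }

// Hypothetical external record shape (assumption for illustration).
interface ExternalCitation { id: string; rawCitation: string }

// Import: the external bibliography string becomes rawReference.
function fromExternal(c: ExternalCitation): SnapshotReference {
  return { id: c.id, rawReference: c.rawCitation };
}

// Export: rawReference is written back to the external field name.
function toExternal(r: SnapshotReference): ExternalCitation {
  return { id: r.id, rawCitation: r.rawReference };
}
```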

Use validateDocumentResources(doc, files, references) to check that every figure source and citation in the document resolves against the snapshot's files and references arrays. See the Reference Integration guide for usage examples.
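
To make the idea of resource resolution concrete, here is a hand-rolled sketch of such a check. It is not the library's validateDocumentResources, whose signature and return value are not documented on this page; `findUnresolved` and its result shape are assumptions, as is the premise that a non-empty figure `src` holds a SnapshotFile id.

```typescript
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}
interface HasId { id: string }
interface Unresolved { files: string[]; references: string[] }

// Hypothetical stand-in, NOT the library's validateDocumentResources.
function findUnresolved(doc: PMNode, files: HasId[], references: HasId[]): Unresolved {
  const fileIds = new Set(files.map(f => f.id));
  const refIds = new Set(references.map(r => r.id));
  const missing: Unresolved = { files: [], references: [] };

  const visit = (node: PMNode): void => {
    const attrs = node.attrs ?? {};
    const src = attrs["src"];
    const source = attrs["source"];
    // Assumption: a non-empty figure src is a SnapshotFile id.
    if (node.type === "figure" && typeof src === "string" && src !== "" && !fileIds.has(src)) {
      missing.files.push(src);
    }
    // Citation sources are URI-encoded JSON arrays of items with an id.
    if (node.type === "citation" && typeof source === "string") {
      const items = JSON.parse(decodeURIComponent(source)) as HasId[];
      for (const item of items) {
        if (!refIds.has(item.id)) missing.references.push(item.id);
      }
    }
    (node.content ?? []).forEach(visit);
  };

  visit(doc);
  return missing;
}
```

An empty result in both arrays means every figure source and citation item found a matching entry.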

The fields in the References table above form the envelope of a SnapshotReference; only id and rawReference are required. Beyond these, the object typically carries the full reference data (title, author, issued, DOI, etc.) as additional properties. The format of those properties depends on mimeType:

application/vnd.citationstyles.csl+json — additional fields conform to the CSL-JSON schema (data schema).
application/vnd.openalex+json — additional fields follow the OpenAlex Works object format.

Document Node Tree¶

The document is a tree of nodes, each with a type, optional attrs, and optional content (child nodes). Text nodes carry a text string instead of content. Any node or text node may carry marks (inline formatting). See the ProseMirror Guide — Document Structure for a general introduction to this model.
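
The node model described above maps onto a small recursive type. The sketch below is an assumption-level illustration working on plain toJSON() output (it does not use the ProseMirror classes); `PMNode`, `walk`, and `plainText` are hypothetical names.

```typescript
interface Mark { type: string; attrs?: Record<string, unknown> }

// A node in toJSON() form: text nodes carry `text`, all others may carry `content`.
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
  marks?: Mark[];
}

// Depth-first traversal, parent before children.
function walk(node: PMNode, visit: (node: PMNode, depth: number) => void, depth = 0): void {
  visit(node, depth);
  for (const child of node.content ?? []) walk(child, visit, depth + 1);
}

// Concatenates the text of all text nodes below `node`, ignoring marks.
function plainText(node: PMNode): string {
  let out = "";
  walk(node, n => {
    if (typeof n.text === "string") out += n.text;
  });
  return out;
}
```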

Attribute value conventions¶

The attribute tables below use the following conventions:

Type column — string means a non-null string is expected when the attribute is present. string? means the value can be either a string or null.
Default column — A quoted value like "article" means ProseMirror uses that default when the attribute is omitted. An empty cell means the attribute defaults to null.
null vs empty string — These are not interchangeable. null means "not set / not applicable" (the attribute has no value). An empty string "" is a valid value — for example, figure.src defaults to "" (meaning the figure wraps a table or code block rather than an image), and figure.alt defaults to "" (no alt text yet). Consumers should treat null as absent and "" as an intentional empty value.
Omitting attributes — ProseMirror fills in defaults for omitted attributes when parsing, so input JSON may either include an attribute with its default value or omit it entirely; both are valid. Output is stricter: toJSON() always emits every attribute the schema defines (even optional ones), with either its value or null.
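
These conventions can be sketched as a small normalisation step. The defaults below are copied from a few rows of the figure attribute table on this page; `fillDefaults` and `isAbsent` are hypothetical helpers that mimic the described behaviour on plain JSON, and are not the ProseMirror implementation itself.

```typescript
type AttrValue = string | number | boolean | null;
type Attrs = Record<string, AttrValue>;

// A subset of the figure defaults listed in the figure attribute table.
const FIGURE_DEFAULTS: Attrs = {
  id: null,
  src: "",
  alt: "",
  type: "figure",
  orientation: "portrait",
  "scale-width": 1
};

// Omitted (undefined) attributes take the default.
// An explicit null or "" is kept as given: both are real values.
function fillDefaults(
  attrs: Record<string, AttrValue | undefined> | undefined,
  defaults: Attrs
): Attrs {
  const out: Attrs = { ...defaults };
  const given = attrs ?? {};
  for (const key of Object.keys(given)) {
    const value = given[key];
    if (value !== undefined) out[key] = value;
  }
  return out;
}

// null means "not set"; an empty string is an intentional empty value.
function isAbsent(value: AttrValue | undefined): boolean {
  return value === null || value === undefined;
}
```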

classDiagram
    direction TB

    class doc {
        +type : "article"
        +lang? : string
        +schema? : string
    }

    class header {
        heading + subtitle?
    }

    class part {
        +id : string
        +type : PartType
        +locale? : string
        heading? block*
    }

    class heading {
        +id : string
        +level : 1-6
        +type : string
        text | footnote
    }

    class paragraph {
        +id : string
        +text-align? : string
        inline*
    }

    class figure {
        +id : string
        +src : string
        +alt : string
        +scale-width : number
        table|code_block? caption
    }

    class table {
        +id : string
        table_row+
    }

    class blockquote {
        +id : string
        block+
    }

    class code_block {
        +id : string
        +language : string
        text*
    }

    class bullet_list {
        list_item+
    }

    class ordered_list {
        +order : number
        list_item+
    }

    class text {
        +text : string
        +marks? : Mark[]
    }

    class math {
        +id : string
        +tex : string
        +style : inline|display
    }

    class citation {
        +id : string
        +source : string
        +style : string
    }

    class footnote {
        +id : string
        inline*
    }

    class image {
        +id : string
        +src : string
        +alt? : string
    }

    doc *-- "0..1" header
    doc *-- "0..*" part : structural
    doc *-- "0..*" paragraph : block
    doc *-- "0..*" heading : block
    doc *-- "0..*" figure : block
    doc *-- "0..*" table : block
    doc *-- "0..*" blockquote : block
    doc *-- "0..*" code_block : block
    doc *-- "0..*" bullet_list : block
    doc *-- "0..*" ordered_list : block

    part *-- "0..1" heading
    part *-- "0..*" paragraph

    paragraph *-- "0..*" text : inline
    paragraph *-- "0..*" math : inline
    paragraph *-- "0..*" citation : inline
    paragraph *-- "0..*" footnote : inline
    paragraph *-- "0..*" image : inline

Root: `doc`¶

The root node wraps an optional header followed by any mix of structural (part) and block-level nodes. Both the header and parts are optional — a document can be completely flat, containing only paragraphs and other block nodes as direct children.

Attribute	Type	Default	Description
`type`	`string`	`"article"`	Document type.
`lang`	`string?`		Document language (e.g. `"en-US"`).
`role`	`string?`		Semantic role of the document.
`schema`	`string?`		Schema version identifier.
`pageBreak`	`string?`		Page break behaviour.
`placement`	`string?`		Placement hint (`cover`, `front`, `body`, `back`).
`numbering`	`string?`		Numbering style (`decimal`, `alpha`, `roman`, `none`).

Content: header? (structural | block)*
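
One consequence of this content expression is that a header may only appear as the first child of doc. A minimal sketch of that single check, with `misplacedHeaders` as a hypothetical helper (full validation belongs to the schema itself):

```typescript
interface PMNode { type: string; content?: PMNode[] }

// Returns the child indices of header nodes that are not in first position.
function misplacedHeaders(doc: PMNode): number[] {
  return (doc.content ?? [])
    .map((child, i) => (child.type === "header" && i > 0 ? i : -1))
    .filter(i => i >= 0);
}
```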

Structural Nodes¶

`part`¶

A section wrapper representing a chapter, appendix, abstract, or other structural division. Parts form the main outline of the document.

Attribute	Type	Default	Description
`id`	`string?`		Unique node identifier.
`type`	`string`	`"chapter"`	Part type: `chapter`, `abstract`, `bibliography`, `appendix`, `part`, `free`.
`locale`	`string?`		Locale override (e.g. `"de-DE"`).
`numbering`	`string?`		Numbering style.
`placement`	`string?`		Placement in the document.
`role`	`string?`		Semantic role.
`text-direction`	`string?`		`ltr`, `rtl`, or `auto`.
`class`	`string?`		Custom CSS class.
`skipToc`	`boolean`	`false`	Exclude from table of contents.
`pageBreak`	`string?`		`after`, `before`, `right`, `before-and-after`.
`data`	`string?`		Extended settings as JSON string (e.g. alternative titles).

Content: heading? block*

`header`¶

Container that groups the main heading with an optional subtitle. Typically appears once at the top of the document.

Content: heading subtitle?

`subtitle`¶

A subtitle node, usually sitting below the main heading inside a header.

Content: inline*

Block Nodes¶

`heading`¶

Section or chapter title. Heading levels 1–6 are supported.

Attribute	Type	Default	Description
`id`	`string?`		Unique node identifier.
`level`	`number`	`1`	Heading level (1–6).
`type`	`string`	`"chapter"`	Heading type.
`role`	`string?`		Semantic role.
`numbering`	`string?`		Numbering style override.
`placement`	`string?`		Placement hint.
`data`	`string?`		Extended settings as JSON string (e.g. literal numbering).

Content: (text | footnote)*

Allowed marks: emphasis, strong, superscript, subscript, bdi, tags, indexEntry

`paragraph`¶

Standard block of inline content.

Attribute	Type	Description
`id`	`string?`	Unique node identifier.
`text-align`	`string?`	`left`, `right`, `center`, `justify`.
`text-direction`	`string?`	`ltr`, `rtl`, or `auto`.
`class`	`string?`	Custom CSS class.

Content: inline*

`reference`¶

A bibliography entry rendered as a paragraph-like block. Similar to paragraph but carries an additional refId linking it to a SnapshotReference.

Attribute	Type	Description
`id`	`string?`	Unique node identifier.
`refId`	`string?`	Reference to a `SnapshotReference.id`.
`text-align`	`string?`	Text alignment.
`text-direction`	`string?`	Text direction.
`class`	`string?`	Custom CSS class.

Content: inline*

The inline content of a reference node holds the displayed bibliography text for this entry. The rawReference field on the linked SnapshotReference serves as an interchange format for this content — it represents the same text in systems that do not use ProseMirror nodes.

How content and rawReference interact:

Import: When a reference is inserted into the document, the rawReference string (if non-empty) is used as the initial inline content of the reference node.
CSL generation: If rawReference is empty (or the user clears it), a citation processor can generate the bibliography text from the structured CSL-JSON data on the SnapshotReference instead.
User edits: The user may freely edit the inline content of a reference node after it is created — the displayed text is always what the node contains.
Export: On export, the current inline content of the reference node is extracted back into the rawReference field, so downstream consumers receive the latest text.

This means rawReference and the node's inline content are two representations of the same value: rawReference is the portable string form, the node content is the live ProseMirror form.
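
The import and export directions described above can be sketched on plain JSON as follows. This is an illustration under simplifying assumptions: it treats rawReference as plain text (lightweight HTML such as `<i>` is not converted to or from marks), and `referenceNodeFrom` and `syncRawReferences` are hypothetical names.

```typescript
interface PMNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNode[];
  text?: string;
}
interface SnapshotReference { id: string; rawReference: string }

// Plain text of a node: its own text, or the joined text of its children.
function textOf(node: PMNode): string {
  return node.text ?? (node.content ?? []).map(textOf).join("");
}

// Import: a non-empty rawReference seeds the node's inline content.
function referenceNodeFrom(ref: SnapshotReference): PMNode {
  return {
    type: "reference",
    attrs: { id: null, refId: ref.id },
    content: ref.rawReference === "" ? [] : [{ type: "text", text: ref.rawReference }]
  };
}

// Export: the current node content is written back to rawReference.
function syncRawReferences(doc: PMNode, refs: SnapshotReference[]): SnapshotReference[] {
  const latest = new Map<string, string>();
  const visit = (n: PMNode): void => {
    const refId = n.attrs?.["refId"];
    if (n.type === "reference" && typeof refId === "string") latest.set(refId, textOf(n));
    (n.content ?? []).forEach(visit);
  };
  visit(doc);
  return refs.map(r => (latest.has(r.id) ? { ...r, rawReference: latest.get(r.id) as string } : r));
}
```

References without a matching node in the document are returned unchanged.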

The reference node is the bibliography-side counterpart to the inline citation node. Both carry text content that can be generated or user-provided: a reference holds the full bibliography string (e.g. "Hughes, T.P. (2018). Global warming transforms coral reef assemblages. Nature, 556, 492–496."), while a citation holds the in-text marker (e.g. "(Hughes, 2018)"). The rawReference field on the SnapshotReference is the portable form of the reference text; the citation equivalent is simply the citation node's inline content — there is no separate rawCitation field.

`figure`¶

A figure wrapper that can host an image, a table, or a code block together with a caption. The type attribute determines the variant:

`type` value	Content model	Description
`"figure"`	Image via `src` + caption	Standard image figure. The image is rendered from the `src` attribute; the optional `table`/`code_block` child is not used.
`"native-table"`	`table` node + caption	Table wrapped in a figure. The `src` attribute is empty. The figure contains a ProseMirror `table` node as a direct child, giving it a caption and cross-reference support. Created automatically when pasting HTML tables.

A figure can also wrap a code_block instead of a table (e.g. for captioned code listings).

No sub-figure support

Figures cannot be nested. The content expression does not allow a figure inside another figure, so sub-figures (e.g. a grid of images with individual sub-captions) are not currently supported.

Attribute	Type	Default	Description
`id`	`string?`		Unique node identifier.
`src`	`string`	`""`	Image source URL. Empty string `""` for `native-table` figures (not `null`).
`alt`	`string`	`""`	Alternative text. Empty string `""` when not yet provided.
`width`	`string?`		Image width in pixels.
`height`	`string?`		Image height in pixels.
`title`	`string?`		Figure title (tooltip / export metadata).
`type`	`string`	`"figure"`	Figure variant: `"figure"` or `"native-table"`.
`environment`	`string?`		LaTeX environment name (e.g. `"figure*"` for two-column spanning).
`orientation`	`string`	`"portrait"`	`portrait` or `landscape`.
`decorative`	`string?`		Marks purely decorative images (skipped by screen readers).
`scale-width`	`number`	`1`	Scale factor (0–1) controlling how much of the text width the figure occupies.
`float-placement`	`string?`		Float placement hint for typesetting (e.g. `"top"`, `"bottom"`, `"here"`).
`float-reference`	`string?`		Float reference frame.
`float-defer-page`	`string?`		Deferred page placement.
`float-modifier`	`string?`		Float modifier.

Content: (table | code_block)? caption
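
A consumer such as an exporter usually needs to know which variant it is looking at. The sketch below classifies a figure from its first child and its `src`; `figureKind` and the `"empty"` result are illustrative assumptions rather than part of the schema.

```typescript
interface PMNode { type: string; attrs?: Record<string, unknown>; content?: PMNode[] }

type FigureKind = "image" | "table" | "code" | "empty";

// Hypothetical helper: a table or code_block child wins over the src attribute.
function figureKind(fig: PMNode): FigureKind {
  const first = (fig.content ?? [])[0];
  if (first && first.type === "table") return "table";
  if (first && first.type === "code_block") return "code";
  const src = fig.attrs?.["src"];
  return typeof src === "string" && src !== "" ? "image" : "empty";
}
```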

`caption`¶

Container for caption content beneath a figure. The first paragraph serves as the actual caption text, while any subsequent paragraphs become figure notes.

Content: label? block*
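
The split between caption text and figure notes can be sketched like this. `splitCaption` is a hypothetical helper and considers only paragraph children, which is a simplification of the `block*` content model.

```typescript
interface PMNode { type: string; attrs?: Record<string, unknown>; content?: PMNode[]; text?: string }

interface CaptionParts {
  label: PMNode | undefined;
  text: PMNode | undefined; // first paragraph: the caption proper
  notes: PMNode[];          // remaining paragraphs: figure notes
}

function splitCaption(caption: PMNode): CaptionParts {
  const children = caption.content ?? [];
  const label = children.find(c => c.type === "label");
  const paragraphs = children.filter(c => c.type === "paragraph");
  return { label, text: paragraphs[0], notes: paragraphs.slice(1) };
}
```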

`label`¶

Optional custom label for the figure environment (e.g. "Map", "Example", "Plate"). Overrides the default label derived from the figure type. Since figures are auto-numbered, this should contain only the label word, not a number. Rarely needed — the default labels (Figure, Table, etc.) are almost always appropriate.

Content: text*

`blockquote`¶

Quoted passage containing one or more blocks.

Attribute	Type	Description
`id`	`string?`	Unique identifier.
`lang`	`string?`	Language of quote.

Content: block+

`code_block`¶

Block-level code snippet preserving formatting.

Attribute	Type	Default	Description
`id`	`string?`		Unique identifier.
`text`	`string`	`""`	Code content.
`type`	`string`	`"code"`	Block type.
`language`	`string`	`"text/plain"`	Language MIME type or identifier.

Content: text* (no marks)

`bullet_list` / `ordered_list`¶

Standard list containers.

ordered_list carries an order attribute (starting number, default 1).

Content: list_item+

`list_item`¶

Single item within a list.

Content: block*

`table`¶

Table container generated by prosemirror-tables. Contains table_row nodes, each containing table_cell or table_header nodes.

Node	Attributes
`table`	`id`
`table_row`	`id`
`table_cell`	`colspan`, `rowspan`, `colwidth` (array), `background` (colour)
`table_header`	Same as `table_cell`

Table cells contain: (paragraph | ordered_list | bullet_list | figure | blockquote)*
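
When consuming tables with merged cells, a quick sanity check is that every row spans the same number of columns. The following sketch sums `colspan` per row; it deliberately ignores `rowspan`, so it is only reliable for tables without vertically merged cells. `rowWidths` is a hypothetical helper.

```typescript
interface PMNode { type: string; attrs?: Record<string, unknown>; content?: PMNode[] }

// Number of grid columns a row covers (a missing colspan counts as 1).
function rowWidth(row: PMNode): number {
  return (row.content ?? []).reduce((sum, cell) => {
    const span = cell.attrs?.["colspan"];
    return sum + (typeof span === "number" ? span : 1);
  }, 0);
}

// One width per table_row; all entries should be equal when rowspan is unused.
function rowWidths(table: PMNode): number[] {
  return (table.content ?? []).map(rowWidth);
}
```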

`pageBreak`¶

An explicit page break marker. No attributes, no content.

`horizontal_rule`¶

A horizontal separator. No attributes, no content.

`placeHolder`¶

Placeholder node for assets injected later (e.g. logos in templates).

Attribute	Type	Default	Description
`id`	`string?`		Unique identifier.
`type`	`string`	`"logo"`	Placeholder type.
`label`	`string`	`"Logo"`	Display label.

Inline Nodes¶

Inline nodes appear inside paragraphs and other text-containing blocks.

`text`¶

Plain text leaf. Carries a text string and optional marks array.

{ "type": "text", "text": "Hello world", "marks": [{ "type": "strong" }] }

`hard_break`¶

Forced line break within a paragraph. No attributes.

`image`¶

Inline image.

Attribute	Type	Description
`id`	`string?`	Unique identifier.
`src`	`string?`	Image source URL.
`alt`	`string?`	Alternative text.
`title`	`string?`	Image title.
`width`	`string?`	Width.
`height`	`string?`	Height.
`metaData`	`string?`	Opaque metadata for downstream processing.
`decorative`	`string?`	Marks image as decorative (`role="presentation"`).

`math`¶

Inline or display math expression.

Attribute	Type	Default	Description
`id`	`string?`		Unique identifier.
`tex`	`string`	`""`	TeX source.
`style`	`string`	`"inline"`	`inline` or `display`.
`label`	`string?`		Equation label for cross-referencing.

Content: text* (the TeX source as text)

`citation`¶

Inline citation placeholder linking to one or more bibliography references. A single citation node can group multiple references (e.g. [1, 2]).

Attribute	Type	Default	Description
`id`	`string?`		Unique identifier.
`source`	`string?`	`null`	URI-encoded JSON array of CSL citation items (see below). `null` when unresolved.
`style`	`string`	`"apa"`	Citation style key, e.g. `"apa"`, `"chicago-author-date"`, `"ieee"`.
`citationMode`	`"generated" | "custom"`	`"generated"`	Controls whether the displayed citation text is generated from `source` + `style` or preserved from the node's inline content.

Content: inline*

The citation node carries inline content that represents the displayed citation text. This content can originate in two ways:

Generated from a CSL style — when citationMode is "generated" (or missing on older documents), a citation processor uses the source attribute and the referenced SnapshotReference data to produce formatted text according to the chosen style (e.g. "(Hughes, 2018)" for APA, "[1]" for IEEE).
User-provided — when citationMode is "custom", the node's inline content is treated as the canonical display text and preserved as-is (e.g. "123" or "see Mola 2015").

Backward compatibility rule: if citationMode is absent, consumers should treat it as "generated".

When serialized to HTML, custom citations add data-citation-mode="custom" to the <cite> element. Generated citations omit that attribute entirely, so a missing data-citation-mode must also be interpreted as "generated".
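
The backward compatibility rule and the HTML convention above both reduce to "anything other than custom means generated". A minimal sketch, with all three function names being hypothetical:

```typescript
type CitationMode = "generated" | "custom";

// Node side: a missing or unknown citationMode is treated as "generated".
function citationModeOf(attrs: { citationMode?: unknown } | undefined): CitationMode {
  return attrs?.citationMode === "custom" ? "custom" : "generated";
}

// HTML side, reading: pass the data-citation-mode value, or null when absent.
function modeFromHtmlAttr(value: string | null): CitationMode {
  return value === "custom" ? "custom" : "generated";
}

// HTML side, writing: only custom citations carry the attribute.
function citeAttributes(mode: CitationMode): Record<string, string> {
  return mode === "custom" ? { "data-citation-mode": "custom" } : {};
}
```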

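The source field is serialised with encodeURI(JSON.stringify(items)). When reading it back, prefer decodeURIComponent over decodeURI: decodeURI leaves escapes of reserved characters such as %3A (a colon) untouched, and the example values on this page contain %3A. Each item follows the CSL-JSON citation item structure (see also the CSL specification):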

Field	Type	Required	Description
`id`	`string`	Yes	Reference to a `SnapshotReference.id`.
`prefix`	`string`	No	Text rendered before the citation (e.g. `"see "`).
`suffix`	`string`	No	Text rendered after the citation.
`locator`	`string`	No	Page, chapter, or other locator.
`label`	`string`	No	Locator type label (e.g. `"page"`, `"chapter"`).

A decoded source value looks like:

[
  { "id": "ref-hughes-2018", "prefix": "see " },
  { "id": "ref-smith-2020" }
]
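
Encoding and decoding the source attribute can be sketched with the standard JavaScript URI functions. `encodeSource` and `decodeSource` are hypothetical helper names; the decoder uses decodeURIComponent so that it accepts both encodeURI output and the fully percent-encoded form shown in the examples on this page.

```typescript
// Item shape taken from the field table above.
interface CitationItem {
  id: string;
  prefix?: string;
  suffix?: string;
  locator?: string;
  label?: string;
}

// Serialise as described in the text: encodeURI(JSON.stringify(items)).
function encodeSource(items: CitationItem[]): string {
  return encodeURI(JSON.stringify(items));
}

// A null source means the citation is unresolved; return no items.
function decodeSource(source: string | null): CitationItem[] {
  if (source === null) return [];
  return JSON.parse(decodeURIComponent(source)) as CitationItem[];
}
```

Note that encodeURI leaves the colon unescaped, so freshly encoded values contain `:` where the examples on this page show `%3A`; both decode to the same items with the function above.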

`footnote`¶

Inline footnote atom storing footnote content.

Attribute	Type	Default	Description
`id`	`string?`		Unique identifier.
`type`	`string`	`"footnote"`	Footnote type.

Content: inline*

`link`¶

Cross-reference node rendered as an anchor.

Attribute	Type	Description
`id`	`string?`	Unique identifier.
`type`	`string?`	Link type (e.g. `"xref"`).
`href`	`string?`	Target reference.
`reference-format`	`string?`	How the reference should be rendered.

Content: text*

Example Documents¶

The following JSON examples show the toJSON() output format that the schema describes. You can validate them against manuscript.schema.json or manuscript-snapshot.schema.json.

Minimal (Flat) Document¶

The smallest valid document — a single paragraph directly inside doc, with no header or parts. This flat structure is perfectly valid; not every document needs sections:

{
  "type": "doc",
  "attrs": {
    "type": "article",
    "lang": "en-US",
    "role": null,
    "schema": null,
    "pageBreak": null,
    "placement": null,
    "numbering": null
  },
  "content": [
    {
      "type": "paragraph",
      "attrs": { "id": "p1", "text-align": null, "text-direction": null, "class": null },
      "content": [
        { "type": "text", "text": "Hello, world!" }
      ]
    }
  ]
}

A flat document can contain any block nodes — paragraphs, headings, figures, code blocks, lists — without wrapping them in part nodes. Parts only become necessary when the document needs distinct sections (chapters, appendix, abstract, etc.).

Document with Inline Formatting¶

Text nodes carry a marks array for bold, italic, links, etc.:

{
  "type": "doc",
  "attrs": { "type": "article", "lang": "en-US", "role": null, "schema": null, "pageBreak": null, "placement": null, "numbering": null },
  "content": [
    {
      "type": "paragraph",
      "attrs": { "id": "p1", "text-align": null, "text-direction": null, "class": null },
      "content": [
        { "type": "text", "text": "This is " },
        { "type": "text", "text": "bold", "marks": [{ "type": "strong" }] },
        { "type": "text", "text": " and " },
        { "type": "text", "text": "italic", "marks": [{ "type": "em" }] },
        { "type": "text", "text": " text with a " },
        {
          "type": "text",
          "text": "hyperlink",
          "marks": [{ "type": "anchor", "attrs": { "href": "https://example.com", "title": null, "id": null } }]
        },
        { "type": "text", "text": "." }
      ]
    }
  ]
}

Structured Manuscript¶

A typical manuscript with header, chapters, a citation, and a figure:

{
  "type": "doc",
  "attrs": { "type": "article", "lang": "en-US", "role": null, "schema": null, "pageBreak": null, "placement": null, "numbering": null },
  "content": [
    {
      "type": "header",
      "content": [
        {
          "type": "heading",
          "attrs": { "id": "h-title", "level": 1, "type": "chapter", "role": null, "numbering": null, "placement": null, "data": null },
          "content": [{ "type": "text", "text": "Climate Effects on Marine Biodiversity" }]
        },
        {
          "type": "subtitle",
          "content": [{ "type": "text", "text": "A Systematic Review" }]
        }
      ]
    },
    {
      "type": "part",
      "attrs": {
        "id": "sec-intro", "type": "chapter", "locale": null, "numbering": null,
        "placement": null, "role": null, "text-direction": null, "class": null,
        "skipToc": false, "pageBreak": null, "data": null
      },
      "content": [
        {
          "type": "heading",
          "attrs": { "id": "h-intro", "level": 1, "type": "chapter", "role": null, "numbering": null, "placement": null, "data": null },
          "content": [{ "type": "text", "text": "Introduction" }]
        },
        {
          "type": "paragraph",
          "attrs": { "id": "p-intro1", "text-align": null, "text-direction": null, "class": null },
          "content": [
            { "type": "text", "text": "Rising ocean temperatures have been shown to impact coral reef ecosystems " },
            {
              "type": "citation",
              "attrs": { "id": "cite-1", "source": "%5B%7B%22id%22%3A%22ref-hughes-2018%22%7D%5D", "style": "apa" }
            },
            { "type": "text", "text": "." }
          ]
        }
      ]
    },
    {
      "type": "part",
      "attrs": {
        "id": "sec-methods", "type": "chapter", "locale": null, "numbering": null,
        "placement": null, "role": null, "text-direction": null, "class": null,
        "skipToc": false, "pageBreak": null, "data": null
      },
      "content": [
        {
          "type": "heading",
          "attrs": { "id": "h-methods", "level": 1, "type": "chapter", "role": null, "numbering": null, "placement": null, "data": null },
          "content": [{ "type": "text", "text": "Methods" }]
        },
        {
          "type": "paragraph",
          "attrs": { "id": "p-methods1", "text-align": null, "text-direction": null, "class": null },
          "content": [
            { "type": "text", "text": "We analysed satellite data using the equation " },
            {
              "type": "math",
              "attrs": { "id": "eq-1", "tex": "\\Delta T = \\frac{Q}{mc}", "style": "inline", "label": null }
            },
            { "type": "text", "text": " to derive temperature change." }
          ]
        },
        {
          "type": "figure",
          "attrs": {
            "id": "fig-map", "src": "map.png", "alt": "Ocean temperature map",
            "width": null, "height": null, "title": null, "type": "figure",
            "environment": null, "orientation": "portrait", "decorative": null,
            "scale-width": 0.8, "float-placement": null, "float-reference": null,
            "float-defer-page": null, "float-modifier": null
          },
          "content": [
            {
              "type": "caption",
              "content": [
                {
                  "type": "label",
                  "content": [{ "type": "text", "text": "Map" }]
                },
                {
                  "type": "paragraph",
                  "attrs": { "id": "p-cap1", "text-align": null, "text-direction": null, "class": null },
                  "content": [{ "type": "text", "text": "Global sea surface temperature anomalies (2020)." }]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Full Snapshot¶

A complete snapshot wrapping the document together with files and references:

{
  "doc": {
    "type": "doc",
    "attrs": { "type": "article", "lang": "en-US", "role": null, "schema": null, "pageBreak": null, "placement": null, "numbering": null },
    "content": [
      {
        "type": "paragraph",
        "attrs": { "id": "p1", "text-align": null, "text-direction": null, "class": null },
        "content": [
          { "type": "text", "text": "Ocean temperatures are rising " },
          { "type": "citation", "attrs": { "id": "cite-1", "source": "%5B%7B%22id%22%3A%22ref-hughes-2018%22%7D%5D", "style": "apa" } },
          { "type": "text", "text": "." }
        ]
      },
      {
        "type": "figure",
        "attrs": {
          "id": "fig-1", "src": "asset-map-001", "alt": "Temperature map",
          "width": null, "height": null, "title": null, "type": "figure",
          "environment": null, "orientation": "portrait", "decorative": null,
          "scale-width": 1, "float-placement": null, "float-reference": null,
          "float-defer-page": null, "float-modifier": null
        },
        "content": [
          {
            "type": "caption",
            "content": [
              {
                "type": "paragraph",
                "attrs": { "id": "p-cap", "text-align": null, "text-direction": null, "class": null },
                "content": [{ "type": "text", "text": "Sea surface temperatures." }]
              }
            ]
          }
        ]
      }
    ]
  },
  "version": 42,
  "selection": { "anchor": 12, "head": 12 },
  "files": [
    {
      "id": "asset-map-001",
      "type": "image",
      "url": "https://cdn.example.com/images/map-full.png",
      "previewSrc": "data:image/png;base64,iVBOR...",
      "mimeType": "image/png",
      "name": "map-full.png",
      "dimensions": { "width": 1200, "height": 800 }
    }
  ],
  "references": [
    {
      "id": "ref-hughes-2018",
      "rawReference": "Hughes, T.P. (2018). Global warming transforms coral reef assemblages. Nature, 556, 492–496.",
      "mimeType": "application/vnd.citationstyles.csl+json",
      "type": "article-journal",
      "title": "Global warming transforms coral reef assemblages",
      "author": [{ "family": "Hughes", "given": "T.P." }],
      "issued": { "date-parts": [[2018]] }
    }
  ]
}

Table with Merged Cells¶

A figure wrapping a table that has merged header cells:

{
  "type": "figure",
  "attrs": {
    "id": "fig-tbl", "src": "", "alt": "", "width": null, "height": null,
    "title": null, "type": "native-table", "environment": null, "orientation": "portrait",
    "decorative": null, "scale-width": 1, "float-placement": null,
    "float-reference": null, "float-defer-page": null, "float-modifier": null
  },
  "content": [
    {
      "type": "table",
      "attrs": { "id": "tbl-1" },
      "content": [
        {
          "type": "table_row",
          "attrs": { "id": "tr-1" },
          "content": [
            {
              "type": "table_header",
              "attrs": { "colspan": 2, "rowspan": 1, "colwidth": null, "background": null },
              "content": [
                {
                  "type": "paragraph",
                  "attrs": { "id": "p-th1", "text-align": null, "text-direction": null, "class": null },
                  "content": [{ "type": "text", "text": "Region", "marks": [{ "type": "strong" }] }]
                }
              ]
            },
            {
              "type": "table_header",
              "attrs": { "colspan": 1, "rowspan": 1, "colwidth": null, "background": null },
              "content": [
                {
                  "type": "paragraph",
                  "attrs": { "id": "p-th2", "text-align": null, "text-direction": null, "class": null },
                  "content": [{ "type": "text", "text": "Temperature (°C)", "marks": [{ "type": "strong" }] }]
                }
              ]
            }
          ]
        },
        {
          "type": "table_row",
          "attrs": { "id": "tr-2" },
          "content": [
            {
              "type": "table_cell",
              "attrs": { "colspan": 1, "rowspan": 1, "colwidth": null, "background": null },
              "content": [
                {
                  "type": "paragraph",
                  "attrs": { "id": "p-c1", "text-align": null, "text-direction": null, "class": null },
                  "content": [{ "type": "text", "text": "Pacific" }]
                }
              ]
            },
            {
              "type": "table_cell",
              "attrs": { "colspan": 1, "rowspan": 1, "colwidth": null, "background": null },
              "content": [
                {
                  "type": "paragraph",
                  "attrs": { "id": "p-c2", "text-align": null, "text-direction": null, "class": null },
                  "content": [{ "type": "text", "text": "North" }]
                }
              ]
            },
            {
              "type": "table_cell",
              "attrs": { "colspan": 1, "rowspan": 1, "colwidth": null, "background": null },
              "content": [
                {
                  "type": "paragraph",
                  "attrs": { "id": "p-c3", "text-align": null, "text-direction": null, "class": null },
                  "content": [{ "type": "text", "text": "18.2" }]
                }
              ]
            }
          ]
        }
      ]
    },
    {
      "type": "caption",
      "content": [
        {
          "type": "label",
          "content": [{ "type": "text", "text": "Table 1" }]
        },
        {
          "type": "paragraph",
          "attrs": { "id": "p-tcap", "text-align": null, "text-direction": null, "class": null },
          "content": [{ "type": "text", "text": "Average sea surface temperatures by region." }]
        }
      ]
    }
  ]
}
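
In the example above, the header row spans three columns (`colspan` 2 plus `colspan` 1), which matches the three cells of the body row. The sketch below shows one way to check that on plain JSON. It assumes no cell uses `rowspan` greater than 1. `JsonNode`, `rowWidth`, and `rowWidths` are illustrative names and not part of the schema package.

```typescript
// Minimal node shape for this sketch; the real types come from the schema package.
interface JsonNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: JsonNode[];
  text?: string;
}

// Sums colspan over the cells of one row, treating a missing colspan as 1.
// Assumption: no cell in the table uses rowspan > 1.
function rowWidth(row: JsonNode): number {
  return (row.content ?? []).reduce((sum, cell) => {
    const colspan = cell.attrs ? cell.attrs["colspan"] : undefined;
    return sum + (typeof colspan === "number" ? colspan : 1);
  }, 0);
}

// Returns the width of every table_row; a rectangular table yields equal values.
function rowWidths(table: JsonNode): number[] {
  return (table.content ?? [])
    .filter((node) => node.type === "table_row")
    .map(rowWidth);
}

// Reduced version of the table above: only the attributes the check needs.
const makeCell = (type: string, colspan: number): JsonNode => ({
  type,
  attrs: { colspan, rowspan: 1 },
});

const exampleTable: JsonNode = {
  type: "table",
  content: [
    { type: "table_row", content: [makeCell("table_header", 2), makeCell("table_header", 1)] },
    {
      type: "table_row",
      content: [makeCell("table_cell", 1), makeCell("table_cell", 1), makeCell("table_cell", 1)],
    },
  ],
};

console.log(rowWidths(exampleTable)); // [3, 3]
```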

Footnotes and Code Blocks¶

A paragraph with a footnote, followed by a code block:

{
  "type": "doc",
  "attrs": { "type": "article", "lang": "en-US", "schema": null, "pageBreak": null, "placement": null, "numbering": null },
  "content": [
    {
      "type": "paragraph",
      "attrs": { "id": "p1", "text-align": null, "text-direction": null, "class": null },
      "content": [
        { "type": "text", "text": "This claim requires clarification" },
        {
          "type": "footnote",
          "attrs": { "id": "fn-1", "type": "footnote" },
          "content": [
            { "type": "text", "text": "See appendix B for the full derivation." }
          ]
        },
        { "type": "text", "text": "." }
      ]
    },
    {
      "type": "code_block",
      "attrs": { "id": "cb-1", "text": "", "type": "code", "language": "application/x-python" },
      "content": [
        { "type": "text", "text": "import numpy as np\n\ndef temperature_anomaly(data):\n    return data - np.mean(data)" }
      ]
    }
  ]
}
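
Because a footnote is an inline node nested inside the paragraph, an exporter that renders notes at the end of the document has to pull them out of the tree. The following sketch walks plain document JSON and collects footnotes in document order. The `JsonNode` shape and both helper names are assumptions made for illustration.

```typescript
// Minimal node shape for this sketch; the real types come from the schema package.
interface JsonNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: JsonNode[];
  text?: string;
}

interface CollectedFootnote {
  id: unknown;
  text: string;
}

// Concatenates the text of all descendant text nodes.
function textOf(node: JsonNode): string {
  if (node.type === "text") return node.text ?? "";
  return (node.content ?? []).map(textOf).join("");
}

// Depth-first walk that records each footnote's id and plain text.
// It does not descend into a footnote once it has been recorded.
function collectFootnotes(node: JsonNode, out: CollectedFootnote[] = []): CollectedFootnote[] {
  if (node.type === "footnote") {
    out.push({ id: node.attrs ? node.attrs["id"] : undefined, text: textOf(node) });
    return out;
  }
  for (const child of node.content ?? []) {
    collectFootnotes(child, out);
  }
  return out;
}

// Reduced version of the document above.
const exampleDoc: JsonNode = {
  type: "doc",
  content: [
    {
      type: "paragraph",
      content: [
        { type: "text", text: "This claim requires clarification" },
        {
          type: "footnote",
          attrs: { id: "fn-1", type: "footnote" },
          content: [{ type: "text", text: "See appendix B for the full derivation." }],
        },
        { type: "text", text: "." },
      ],
    },
  ],
};

const footnotes = collectFootnotes(exampleDoc);
console.log(footnotes.length, footnotes[0].text);
```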

Marks¶

Marks represent inline formatting — emphasis, strong, links, and so on. Unlike HTML, where inline formatting creates nested elements (<strong><em>text</em></strong>), ProseMirror uses a flat model: each text node carries an array of active marks as metadata. This avoids the ambiguity of overlapping or arbitrarily nested markup and makes positions simple character offsets rather than tree paths. See ProseMirror Guide — Document Structure for the rationale behind this design.

For example, the text "bold and italic" is represented as two text nodes, not nested elements:

[
  { "type": "text", "text": "bold ", "marks": [{ "type": "strong" }] },
  { "type": "text", "text": "and italic", "marks": [{ "type": "em" }, { "type": "strong" }] }
]

Mark Ordering¶

The order of marks in the marks array is not arbitrary — it is determined by the order in which marks are declared in the schema definition. ProseMirror sorts marks by their rank (position in the schema's marks object) so that every document has exactly one canonical JSON representation. This means:

Two documents with the same logical content always produce identical JSON when serialised with toJSON().
Adjacent text nodes with the same mark set are automatically merged.
Empty text nodes are never emitted.

In this schema, the mark order is: em, strong, sup, sub, bdi, anchor, tags, indexEntry. When a text node carries multiple marks, they always appear in this order regardless of the sequence in which they were applied. This deterministic representation is important for storage, comparison, and collaborative editing.
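
The ordering rule can be illustrated on plain JSON. The sketch below sorts a mark array by its position in the order listed above. It is not ProseMirror's own implementation, and `JsonMark`, `MARK_ORDER`, and `sortMarks` are names introduced here for illustration.

```typescript
interface JsonMark {
  type: string;
  attrs?: Record<string, unknown>;
}

// Mark order as listed in this section.
const MARK_ORDER = ["em", "strong", "sup", "sub", "bdi", "anchor", "tags", "indexEntry"];

// Rank of a mark; types not in the list sort last (an assumption of this sketch).
function rankOf(mark: JsonMark): number {
  const index = MARK_ORDER.indexOf(mark.type);
  return index === -1 ? MARK_ORDER.length : index;
}

// Returns a new array in canonical order without mutating the input.
function sortMarks(marks: JsonMark[]): JsonMark[] {
  return marks.slice().sort((a, b) => rankOf(a) - rankOf(b));
}

// Marks listed in the order a user might have applied them.
const appliedMarks: JsonMark[] = [
  { type: "strong" },
  { type: "anchor", attrs: { href: "https://example.com" } },
  { type: "em" },
];

console.log(sortMarks(appliedMarks).map((mark) => mark.type)); // ["em", "strong", "anchor"]
```

Normalising marks this way before comparing two serialised documents avoids false differences caused only by the order in which formatting was applied.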

classDiagram
    direction LR
    class Mark {
        <<interface>>
        +type : string
        +attrs? : object
    }
    class em
    class strong
    class sup
    class sub
    class bdi
    class anchor {
        +href : string
        +title? : string
        +id? : string
    }
    class tags {
        +tags : ~key: string~[]
    }
    class indexEntry {
        +id? : string
        +entries : ~raw?: string~[]
        +attributes : object
    }
    Mark <|-- em
    Mark <|-- strong
    Mark <|-- sup
    Mark <|-- sub
    Mark <|-- bdi
    Mark <|-- anchor
    Mark <|-- tags
    Mark <|-- indexEntry

Mark	Attributes	Description
`em`	—	Emphasis (italic).
`strong`	—	Strong (bold).
`sup`	—	Superscript.
`sub`	—	Subscript.
`bdi`	—	Bidirectional isolate.
`anchor`	`href`, `title`, `id`	Hyperlink.
`tags`	`tags` (array of `{ key }`)	Semantic tags applied to text.
`indexEntry`	`id`, `entries` (array of `{ raw? }`), `attributes`	Back-of-book index entry.

Node Groups¶

ProseMirror content expressions reference groups rather than individual node types. For example, block+ means "one or more nodes belonging to the block group". The groups in the manuscript schema:

Group	Member nodes
`block`	`paragraph`, `reference`, `heading`, `figure`, `code_block`, `blockquote`, `pageBreak`, `placeHolder`, `horizontal_rule`, `bullet_list`, `ordered_list`, `table`
`structural`	`part`
`inline`	`text`, `hard_break`, `image`, `math`, `citation`, `footnote`, `link`
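
To make the group lookup concrete, the sketch below encodes the table above as a map and checks a list of child node types against a simple `group+` expression. Real ProseMirror content expressions are richer (sequences, alternatives, other quantifiers), and this sketch handles only the one-or-more case. `GROUPS`, `groupsOf`, and `matchesOneOrMore` are hypothetical names.

```typescript
// Group membership copied from the table above.
const GROUPS: Record<string, string[]> = {
  block: [
    "paragraph", "reference", "heading", "figure", "code_block", "blockquote",
    "pageBreak", "placeHolder", "horizontal_rule", "bullet_list", "ordered_list", "table",
  ],
  structural: ["part"],
  inline: ["text", "hard_break", "image", "math", "citation", "footnote", "link"],
};

// Lists every group a node type belongs to.
function groupsOf(nodeType: string): string[] {
  return Object.keys(GROUPS).filter((group) => (GROUPS[group] ?? []).indexOf(nodeType) !== -1);
}

// Checks child types against "<group>+": at least one child, all in the group.
function matchesOneOrMore(group: string, childTypes: string[]): boolean {
  const members = GROUPS[group] ?? [];
  return childTypes.length > 0 && childTypes.every((type) => members.indexOf(type) !== -1);
}

console.log(groupsOf("figure")); // ["block"]
console.log(matchesOneOrMore("block", ["heading", "paragraph", "figure"])); // true
console.log(matchesOneOrMore("block", ["paragraph", "text"])); // false
```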

Compatibility Notes¶

The schema round-trips to SciFlow document snapshots and is compatible with the typesetting pipeline.
The JSON format matches ProseMirror's Node.toJSON() output. Documents can be loaded back with Node.fromJSON().
The schema aligns with JATS semantics but is not a 1:1 copy; for example, figures carry float metadata to simplify publishing workflows.
When adding new nodes or marks, update both the schema package and the editor features that surface them.
The JSON Schema files are generated from the ProseMirror schema, so they stay in sync with the TypeScript source of truth as long as they are regenerated after each schema change.

Feature availability in the start bundle

The default @sciflow/editor-start feature set enables citations, cross references, footnotes, inline formatting, headings, figures, tables, and math. To render equations as SVG, add the MathJax script tag to your page (see Troubleshooting). You can change the active features via editor.configureFeatures(...) (see Extending Features).
