Schema Reference¶
SciFlow's manuscript schema defines the structure of documents created and consumed by the editor, exporters (JATS, DOCX, etc.), and sync layers. It is built on the ProseMirror document model. This page describes both the document node tree (the ProseMirror layer) and the snapshot wrapper that packages the document together with files and references.
- Source package: @sciflow/schema-prosemirror (packages/schema/prosemirror)
- JSON Schema (document): manuscript.schema.json
- JSON Schema (snapshot): manuscript-snapshot.schema.json
Generating the schemas
The JSON Schema files are generated from the live ProseMirror schema. Regenerate them after any schema change:
Output lives under packages/schema/prosemirror/dist/.
The following diagram shows a typical structured manuscript as a node tree — structural nodes (green) wrap block-level content (blue), which in turn contains inline nodes (orange). Headers and parts are optional; a document can also be completely flat (just paragraphs and other block nodes directly inside doc).
graph TD
doc["doc"]
header["header"]
h1["heading (level 1)<br/>'My Article'"]
sub["subtitle<br/>'A Subtitle'"]
part1["part (chapter)"]
h2["heading (level 1)<br/>'Introduction'"]
p1["paragraph"]
t1["text 'Some text with a '"]
cite["citation (source: ref-1, ref-2<br/>locator: p. 2 on ref-2)"]
citetxt["text '(Hughes, 2018; Smith & Lee, 2020, p. 2)'"]:::optionalInline
t2["text ' reference.'"]
part2["part (chapter)"]
h3["heading (level 1)<br/>'Methods'"]
fig["figure"]
cap["caption"]
lbl["label 'Figure'"]:::optional
p2["paragraph 'A diagram.'"]
bib["part (bibliography)"]
bibh["heading (level 1)<br/>'References'"]
ref1["reference (refId: ref-1)"]
reftxt["text 'Hughes et al. (2018) …'"]:::optionalInline
ref2["reference (refId: ref-2)"]
ref2txt["text 'Smith & Lee (2020) …'"]:::optionalInline
doc --> header
header --> h1
header --> sub
doc --> part1
part1 --> h2
part1 --> p1
p1 --> t1
p1 --> cite
cite --> citetxt
p1 --> t2
doc --> part2
part2 --> h3
part2 --> fig
fig --> cap
cap --> lbl
cap --> p2
doc --> bib
bib --> bibh
bib --> ref1
ref1 --> reftxt
bib --> ref2
ref2 --> ref2txt
classDef structural fill:#e8f5e9,stroke:#388e3c
classDef block fill:#e3f2fd,stroke:#1565c0
classDef inline fill:#fff3e0,stroke:#e65100
classDef optional fill:#e3f2fd,stroke:#1565c0,stroke-dasharray: 5 5
classDef optionalInline fill:#fff3e0,stroke:#e65100,stroke-dasharray: 5 5
class doc,header structural
class part1,part2,bib structural
class h1,h2,h3,bibh,sub,p1,p2,fig,cap,ref1,ref2 block
class t1,t2,cite inline
Document Snapshot¶
A snapshot is the top-level object used for persistence and sync. It wraps the ProseMirror document together with all side-channel data.
classDiagram
class Snapshot {
+doc : Document
+version? : number
+selection? : Selection
+files? : SnapshotFile[]
+references? : SnapshotReference[]
}
class Selection {
+anchor : number
+head : number
}
class SnapshotFile {
+id : string
+type? : string
+url? : string
+previewSrc? : string
+mimeType? : string
+name? : string
+dimensions? : Dimensions
}
class SnapshotReference {
+id : string
+rawReference : string
+mimeType? : string
}
Snapshot *-- "1" Document : doc
Snapshot *-- "0..1" Selection : selection
Snapshot o-- "0..*" SnapshotFile : files
Snapshot o-- "0..*" SnapshotReference : references
| Field | Type | Required | Description |
|---|---|---|---|
| doc | Document node (see below) | Yes | The ProseMirror document tree in toJSON() format. |
| version | number | No | Document version counter, incremented on each change. |
| selection | Selection | No | Cursor/selection state (anchor and head positions). |
| files | SnapshotFile[] | No | Files and assets referenced by the document. |
| references | SnapshotReference[] | No | Bibliography and citation references. |
Files¶
Each SnapshotFile describes an asset (typically an image) referenced by the document.
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique asset identifier (can align with figure node ids). |
| type | string | No | Resource type, e.g. "image". |
| url | string | No | Full-resolution URL for fetching the asset. |
| previewSrc | string | No | Preview-sized URL or data URI for in-doc rendering. |
| mimeType | string | No | MIME type of the resource. |
| name | string | No | Human-readable file name. |
| dimensions | { width, height } | No | Asset dimensions (string or number). |
References¶
Each SnapshotReference represents a bibliography entry or citation source.
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Unique reference identifier. |
| rawReference | string | Yes | The bibliography string for this reference: the full entry as it appears in a reference list (e.g. "Hughes, T.P. (2018). Global warming transforms coral reef assemblages. Nature, 556, 492–496."). Used as initial content when creating a reference node, and updated from node content on export. May contain lightweight HTML tags such as <i> for italics. May be empty if the bibliography is generated from structured CSL data instead. |
| mimeType | string | No | Citation format: application/vnd.citationstyles.csl+json or application/vnd.openalex+json. |
rawReference vs raw citation
rawReference is the bibliography string (reference list entry), not an in-text citation. A raw citation is the short inline label such as "(Hughes, 2018)" or "[1]". Some external systems use different field names for the bibliography string — for example, OJS stores it as rawCitation. Map the external field to rawReference on import and back on export. See the OJS Integration Guide for a concrete example.
Use validateDocumentResources(doc, files, references) to check that every figure source and citation in the document resolves against the snapshot's files and references arrays. See the Reference Integration guide for usage examples.
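To make the idea of "resolving" concrete, here is a minimal sketch of such a check. It is not the implementation of validateDocumentResources: the helper name findUnresolved, its return shape, and the assumption that figure.src holds a SnapshotFile id (as in the Full Snapshot example on this page) are illustrative only.

```typescript
// Minimal serialised node shape (illustrative, not the package's types).
interface NodeJSON {
  type: string;
  attrs?: Record<string, unknown>;
  content?: NodeJSON[];
  text?: string;
}
interface HasId { id: string }

// Hypothetical helper: lists figure sources and citation ids that do not
// resolve against the snapshot's files and references arrays.
function findUnresolved(doc: NodeJSON, files: HasId[], references: HasId[]): string[] {
  const fileIds = new Set(files.map((f) => f.id));
  const refIds = new Set(references.map((r) => r.id));
  const problems: string[] = [];

  const visit = (node: NodeJSON): void => {
    const attrs = node.attrs ?? {};
    const src = attrs["src"];
    // An empty src means the figure wraps a table or code block: nothing to resolve.
    if (node.type === "figure" && typeof src === "string" && src !== "") {
      if (!fileIds.has(src)) problems.push(`figure src not in files: ${src}`);
    }
    const source = attrs["source"];
    // A null source means the citation is unresolved and carries no ids.
    if (node.type === "citation" && typeof source === "string") {
      const items = JSON.parse(decodeURIComponent(source)) as HasId[];
      for (const item of items) {
        if (!refIds.has(item.id)) problems.push(`citation id not in references: ${item.id}`);
      }
    }
    (node.content ?? []).forEach(visit);
  };

  visit(doc);
  return problems;
}

const sampleDoc: NodeJSON = {
  type: "doc",
  content: [
    { type: "paragraph", content: [
      { type: "citation", attrs: { source: "%5B%7B%22id%22%3A%22ref-hughes-2018%22%7D%5D" } },
    ] },
    { type: "figure", attrs: { src: "asset-map-001" } },
    { type: "figure", attrs: { src: "" } },
  ],
};

// The file resolves, the reference does not (the references array is empty).
const unresolved = findUnresolved(sampleDoc, [{ id: "asset-map-001" }], []);
console.log(unresolved);
```

A malformed source attribute would make JSON.parse throw in this sketch; a production check would likely report that as a separate problem.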
The fields above are the mandatory envelope. Beyond these, the object typically carries the full reference data (title, author, issued, DOI, etc.) as additional properties. The format depends on mimeType:
- application/vnd.citationstyles.csl+json: additional fields conform to the CSL-JSON schema (data schema).
- application/vnd.openalex+json: additional fields follow the OpenAlex Works object format.
Document Node Tree¶
The document is a tree of nodes, each with a type, optional attrs, and optional content (child nodes). Text nodes carry a text string instead of content. Any node or text node may carry marks (inline formatting). See the ProseMirror Guide — Document Structure for a general introduction to this model.
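The node model described above maps onto a small recursive type. The following sketch (type and function names are illustrative, not exports of the schema package) walks a serialised document depth-first and collects the text of all text nodes.

```typescript
// Illustrative shapes for toJSON() output; not the package's own types.
interface MarkJSON { type: string; attrs?: Record<string, unknown> }
interface NodeJSON {
  type: string;
  attrs?: Record<string, unknown>;
  content?: NodeJSON[]; // child nodes (absent on leaves)
  text?: string;        // only on text nodes
  marks?: MarkJSON[];   // inline formatting
}

// Depth-first traversal: visit the node, then each child in document order.
function walk(node: NodeJSON, visit: (n: NodeJSON) => void): void {
  visit(node);
  for (const child of node.content ?? []) walk(child, visit);
}

// Concatenate the text of every text node below `node`.
function plainText(node: NodeJSON): string {
  let out = "";
  walk(node, (n) => {
    if (n.type === "text") out += n.text ?? "";
  });
  return out;
}

const flatDoc: NodeJSON = {
  type: "doc",
  attrs: { type: "article" },
  content: [
    { type: "paragraph", content: [
      { type: "text", text: "Hello, " },
      { type: "text", text: "world!", marks: [{ type: "strong" }] },
    ] },
  ],
};

console.log(plainText(flatDoc)); // Hello, world!
```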
Attribute value conventions¶
The attribute tables below use the following conventions:
- Type column: string means a non-null string is expected when the attribute is present. string? means the value can be either a string or null.
- Default column: A quoted value like "article" means ProseMirror uses that default when the attribute is omitted. An empty cell means the attribute defaults to null.
- null vs empty string: These are not interchangeable. null means "not set / not applicable" (the attribute has no value). An empty string "" is a valid value. For example, figure.src defaults to "" (meaning the figure wraps a table or code block rather than an image), and figure.alt defaults to "" (no alt text yet). Consumers should treat null as absent and "" as an intentional empty value.
- Omitting attributes: ProseMirror fills in defaults for omitted attributes when parsing. In serialised JSON, attributes may be present with their default value or omitted entirely; both are valid. However, attributes that the schema defines (even optional ones) will always appear in toJSON() output with either their value or null.
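A consumer can encode the null versus empty-string distinction in a small helper. This is a sketch under the conventions listed above; the name attrState and the three state labels are made up for illustration.

```typescript
type AttrState = "absent" | "empty" | "set";

// null (or a key omitted from the JSON) means "not set";
// "" is an intentional empty value and must not be treated as absent.
function attrState(attrs: Record<string, unknown>, name: string): AttrState {
  const value = attrs[name];
  if (value === null || value === undefined) return "absent";
  return value === "" ? "empty" : "set";
}

// Attributes of a native-table figure, as in the examples on this page.
const tableFigureAttrs: Record<string, unknown> = {
  id: "fig-tbl", src: "", alt: "", width: null, type: "native-table",
};

console.log(attrState(tableFigureAttrs, "src"));   // empty
console.log(attrState(tableFigureAttrs, "width")); // absent
console.log(attrState(tableFigureAttrs, "type"));  // set
```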
classDiagram
direction TB
class doc {
+type : "article"
+lang? : string
+schema? : string
}
class header {
heading + subtitle?
}
class part {
+id : string
+type : PartType
+locale? : string
heading? block*
}
class heading {
+id : string
+level : 1-6
+type : string
text | footnote
}
class paragraph {
+id : string
+text-align? : string
inline*
}
class figure {
+id : string
+src : string
+alt : string
+scale-width : number
table|code_block? caption
}
class table {
+id : string
table_row+
}
class blockquote {
+id : string
block+
}
class code_block {
+id : string
+language : string
text*
}
class bullet_list {
list_item+
}
class ordered_list {
+order : number
list_item+
}
class text {
+text : string
+marks? : Mark[]
}
class math {
+id : string
+tex : string
+style : inline|display
}
class citation {
+id : string
+source : string
+style : string
}
class footnote {
+id : string
inline*
}
class image {
+id : string
+src : string
+alt? : string
}
doc *-- "0..1" header
doc *-- "0..*" part : structural
doc *-- "0..*" paragraph : block
doc *-- "0..*" heading : block
doc *-- "0..*" figure : block
doc *-- "0..*" table : block
doc *-- "0..*" blockquote : block
doc *-- "0..*" code_block : block
doc *-- "0..*" bullet_list : block
doc *-- "0..*" ordered_list : block
part *-- "0..1" heading
part *-- "0..*" paragraph
paragraph *-- "0..*" text : inline
paragraph *-- "0..*" math : inline
paragraph *-- "0..*" citation : inline
paragraph *-- "0..*" footnote : inline
paragraph *-- "0..*" image : inline
Root: doc¶
The root node wraps an optional header followed by any mix of structural (part) and block-level nodes. Both the header and parts are optional — a document can be completely flat, containing only paragraphs and other block nodes as direct children.
| Attribute | Type | Default | Description |
|---|---|---|---|
| type | string | "article" | Document type. |
| lang | string? | | Document language (e.g. "en-US"). |
| role | string? | | Semantic role of the document. |
| schema | string? | | Schema version identifier. |
| pageBreak | string? | | Page break behaviour. |
| placement | string? | | Placement hint (cover, front, body, back). |
| numbering | string? | | Numbering style (decimal, alpha, roman, none). |
Content: header? (structural | block)*
Structural Nodes¶
part¶
A section wrapper representing a chapter, appendix, abstract, or other structural division. Parts form the main outline of the document.
| Attribute | Type | Default | Description |
|---|---|---|---|
| id | string? | | Unique node identifier. |
| type | string | "chapter" | Part type: chapter, abstract, bibliography, appendix, part, free. |
| locale | string? | | Locale override (e.g. "de-DE"). |
| numbering | string? | | Numbering style. |
| placement | string? | | Placement in the document. |
| role | string? | | Semantic role. |
| text-direction | string? | | ltr, rtl, or auto. |
| class | string? | | Custom CSS class. |
| skipToc | boolean | false | Exclude from table of contents. |
| pageBreak | string? | | after, before, right, before-and-after. |
| data | string? | | Extended settings as JSON string (e.g. alternative titles). |
Content: heading? block*
header¶
Container that groups the main heading with an optional subtitle. Typically appears once at the top of the document.
Content: heading subtitle?
subtitle¶
A subtitle node, usually sitting below the main heading inside a header.
Content: inline*
Block Nodes¶
heading¶
Section or chapter title. Heading levels 1–6 are supported.
| Attribute | Type | Default | Description |
|---|---|---|---|
| id | string? | | Unique node identifier. |
| level | number | 1 | Heading level (1–6). |
| type | string | "chapter" | Heading type. |
| role | string? | | Semantic role. |
| numbering | string? | | Numbering style override. |
| placement | string? | | Placement hint. |
| data | string? | | Extended settings as JSON string (e.g. literal numbering). |
Content: (text | footnote)*
Allowed marks: emphasis (em), strong, superscript (sup), subscript (sub), bdi, tags, indexEntry
paragraph¶
Standard block of inline content.
| Attribute | Type | Description |
|---|---|---|
| id | string? | Unique node identifier. |
| text-align | string? | left, right, center, justify. |
| text-direction | string? | ltr, rtl, or auto. |
| class | string? | Custom CSS class. |
Content: inline*
reference¶
A bibliography entry rendered as a paragraph-like block. Similar to paragraph but carries an additional refId linking it to a SnapshotReference.
| Attribute | Type | Description |
|---|---|---|
| id | string? | Unique node identifier. |
| refId | string? | Reference to a SnapshotReference.id. |
| text-align | string? | Text alignment. |
| text-direction | string? | Text direction. |
| class | string? | Custom CSS class. |
Content: inline*
The inline content of a reference node holds the displayed bibliography text for this entry. The rawReference field on the linked SnapshotReference serves as an interchange format for this content — it represents the same text in systems that do not use ProseMirror nodes.
How content and rawReference interact:
- Import: When a reference is inserted into the document, the rawReference string (if non-empty) is used as the initial inline content of the reference node.
- CSL generation: If rawReference is empty (or the user clears it), a citation processor can generate the bibliography text from the structured CSL-JSON data on the SnapshotReference instead.
- User edits: The user may freely edit the inline content of a reference node after it is created; the displayed text is always what the node contains.
- Export: On export, the current inline content of the reference node is extracted back into the rawReference field, so downstream consumers receive the latest text.
This means rawReference and the node's inline content are two representations of the same value: rawReference is the portable string form, the node content is the live ProseMirror form.
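The import and export steps above can be sketched as a pair of functions. This is an illustration only: the function names are hypothetical, the sketch handles plain text (it does not parse HTML tags such as <i> into marks), and non-text inline nodes are ignored on export.

```typescript
interface NodeJSON {
  type: string;
  attrs?: Record<string, unknown>;
  content?: NodeJSON[];
  text?: string;
}
interface SnapshotReference { id: string; rawReference: string; mimeType?: string }

// Import: seed a reference node from rawReference.
// An empty rawReference yields a node without content, because empty
// text nodes are never part of a ProseMirror document.
function referenceNodeFrom(ref: SnapshotReference): NodeJSON {
  const node: NodeJSON = { type: "reference", attrs: { id: null, refId: ref.id } };
  if (ref.rawReference !== "") {
    node.content = [{ type: "text", text: ref.rawReference }];
  }
  return node;
}

// Export: copy the node's current text back into rawReference.
function exportRawReference(node: NodeJSON, ref: SnapshotReference): SnapshotReference {
  const text = (node.content ?? [])
    .map((child) => (child.type === "text" ? child.text ?? "" : ""))
    .join("");
  return { ...ref, rawReference: text };
}

const hughes: SnapshotReference = {
  id: "ref-hughes-2018",
  rawReference: "Hughes, T.P. (2018). Global warming transforms coral reef assemblages.",
};

// Simulate a user edit on the node, then export.
const refNode = referenceNodeFrom(hughes);
refNode.content = [{ type: "text", text: "Hughes, T.P. (2018). Edited entry." }];
const exported = exportRawReference(refNode, hughes);
console.log(exported.rawReference); // Hughes, T.P. (2018). Edited entry.
```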
The reference node is the bibliography-side counterpart to the inline citation node. Both carry text content that can be generated or user-provided: a reference holds the full bibliography string (e.g. "Hughes, T.P. (2018). Global warming transforms coral reef assemblages. Nature, 556, 492–496."), while a citation holds the in-text marker (e.g. "(Hughes, 2018)"). The rawReference field on the SnapshotReference is the portable form of the reference text; the citation equivalent is simply the citation node's inline content — there is no separate rawCitation field.
figure¶
A figure wrapper that can host an image, a table, or a code block together with a caption. The type attribute determines the variant:
| type value | Content model | Description |
|---|---|---|
| "figure" | Image via src + caption | Standard image figure. The image is rendered from the src attribute; the optional table/code_block child is not used. |
| "native-table" | table node + caption | Table wrapped in a figure. The src attribute is empty. The figure contains a ProseMirror table node as a direct child, giving it a caption and cross-reference support. Created automatically when pasting HTML tables. |
A figure can also wrap a code_block instead of a table (e.g. for captioned code listings).
No sub-figure support
Figures cannot be nested. The content expression does not allow a figure inside another figure, so sub-figures (e.g. a grid of images with individual sub-captions) are not currently supported.
| Attribute | Type | Default | Description |
|---|---|---|---|
| id | string? | | Unique node identifier. |
| src | string | "" | Image source URL. Empty string "" for native-table figures (not null). |
| alt | string | "" | Alternative text. Empty string "" when not yet provided. |
| width | string? | | Image width in pixels. |
| height | string? | | Image height in pixels. |
| title | string? | | Figure title (tooltip / export metadata). |
| type | string | "figure" | Figure variant: "figure" or "native-table". |
| environment | string? | | LaTeX environment name (e.g. "figure*" for two-column spanning). |
| orientation | string | "portrait" | portrait or landscape. |
| decorative | string? | | Marks purely decorative images (skipped by screen readers). |
| scale-width | number | 1 | Scale factor (0–1) controlling how much of the text width the figure occupies. |
| float-placement | string? | | Float placement hint for typesetting (e.g. "top", "bottom", "here"). |
| float-reference | string? | | Float reference frame. |
| float-defer-page | string? | | Deferred page placement. |
| float-modifier | string? | | Float modifier. |
Content: (table | code_block)? caption
caption¶
Container for caption content beneath a figure. The first paragraph serves as the actual caption text, while any subsequent paragraphs become figure notes.
Content: label? block*
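The rule that the first paragraph is the caption text and later paragraphs are figure notes can be sketched as follows. The helper name splitCaption and its return shape are illustrative; the sketch only considers paragraph children, although the content expression allows other block nodes as well.

```typescript
interface NodeJSON {
  type: string;
  attrs?: Record<string, unknown>;
  content?: NodeJSON[];
  text?: string;
}

interface CaptionParts {
  label?: NodeJSON;      // optional custom label node
  captionText?: NodeJSON; // first paragraph
  notes: NodeJSON[];     // any further paragraphs
}

// Hypothetical helper: split a caption node into label, caption text and notes.
function splitCaption(caption: NodeJSON): CaptionParts {
  const children = caption.content ?? [];
  const label = children.find((n) => n.type === "label");
  const paragraphs = children.filter((n) => n.type === "paragraph");
  const [captionText, ...notes] = paragraphs;
  return { label, captionText, notes };
}

const sampleCaption: NodeJSON = {
  type: "caption",
  content: [
    { type: "label", content: [{ type: "text", text: "Map" }] },
    { type: "paragraph", content: [{ type: "text", text: "Global sea surface temperature anomalies (2020)." }] },
    { type: "paragraph", content: [{ type: "text", text: "Note: values are annual means." }] },
  ],
};

const captionParts = splitCaption(sampleCaption);
console.log(captionParts.notes.length); // 1
```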
label¶
Optional custom label for the figure environment (e.g. "Map", "Example", "Plate"). Overrides the default label derived from the figure type. Since figures are auto-numbered, this should contain only the label word, not a number. Rarely needed — the default labels (Figure, Table, etc.) are almost always appropriate.
Content: text*
blockquote¶
Quoted passage containing one or more blocks.
| Attribute | Type | Description |
|---|---|---|
| id | string? | Unique identifier. |
| lang | string? | Language of quote. |
Content: block+
code_block¶
Block-level code snippet preserving formatting.
| Attribute | Type | Default | Description |
|---|---|---|---|
| id | string? | | Unique identifier. |
| text | string | "" | Code content. |
| type | string | "code" | Block type. |
| language | string | "text/plain" | Language MIME type or identifier. |
Content: text* (no marks)
bullet_list / ordered_list¶
Standard list containers.
ordered_list carries an order attribute (starting number, default 1).
Content: list_item+
list_item¶
Single item within a list.
Content: block*
table¶
Table container generated by prosemirror-tables. Contains table_row nodes, each containing table_cell or table_header nodes.
| Node | Attributes |
|---|---|
| table | id |
| table_row | id |
| table_cell | colspan, rowspan, colwidth (array), background (colour) |
| table_header | Same as table_cell |
Table cells contain: (paragraph | ordered_list | bullet_list | figure | blockquote)*
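Because cells can span several columns, the number of cells in a row is not necessarily the number of columns. A minimal sketch of computing a row's width from colspan follows; the helper name rowWidth is illustrative, and the sketch ignores rowspan, which would also need to be tracked for tables with vertically merged cells.

```typescript
interface NodeJSON {
  type: string;
  attrs?: Record<string, unknown>;
  content?: NodeJSON[];
  text?: string;
}

// Effective column count of a row: the sum of each cell's colspan (default 1).
function rowWidth(row: NodeJSON): number {
  return (row.content ?? []).reduce((sum, cell) => {
    const span = cell.attrs?.["colspan"];
    return sum + (typeof span === "number" ? span : 1);
  }, 0);
}

// A header row with one merged cell (colspan 2) plus one regular cell.
const headerRow: NodeJSON = {
  type: "table_row",
  content: [
    { type: "table_header", attrs: { colspan: 2, rowspan: 1 } },
    { type: "table_header", attrs: { colspan: 1, rowspan: 1 } },
  ],
};

// A body row with three regular cells.
const bodyRow: NodeJSON = {
  type: "table_row",
  content: [
    { type: "table_cell", attrs: { colspan: 1, rowspan: 1 } },
    { type: "table_cell", attrs: { colspan: 1, rowspan: 1 } },
    { type: "table_cell", attrs: { colspan: 1, rowspan: 1 } },
  ],
};

console.log(rowWidth(headerRow), rowWidth(bodyRow)); // 3 3
```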
pageBreak¶
An explicit page break marker. No attributes, no content.
horizontal_rule¶
A horizontal separator. No attributes, no content.
placeHolder¶
Placeholder node for assets injected later (e.g. logos in templates).
| Attribute | Type | Default | Description |
|---|---|---|---|
| id | string? | | Unique identifier. |
| type | string | "logo" | Placeholder type. |
| label | string | "Logo" | Display label. |
Inline Nodes¶
Inline nodes appear inside paragraphs and other text-containing blocks.
text¶
Plain text leaf. Carries a text string and optional marks array.
hard_break¶
Forced line break within a paragraph. No attributes.
image¶
Inline image.
| Attribute | Type | Description |
|---|---|---|
| id | string? | Unique identifier. |
| src | string? | Image source URL. |
| alt | string? | Alternative text. |
| title | string? | Image title. |
| width | string? | Width. |
| height | string? | Height. |
| metaData | string? | Opaque metadata for downstream processing. |
| decorative | string? | Marks image as decorative (role="presentation"). |
math¶
Inline or display math expression.
| Attribute | Type | Default | Description |
|---|---|---|---|
| id | string? | | Unique identifier. |
| tex | string | "" | TeX source. |
| style | string | "inline" | inline or display. |
| label | string? | | Equation label for cross-referencing. |
Content: text* (the TeX source as text)
citation¶
Inline citation placeholder linking to one or more bibliography references. A single citation node can group multiple references (e.g. [1, 2]).
| Attribute | Type | Default | Description |
|---|---|---|---|
| id | string? | | Unique identifier. |
| source | string? | null | URI-encoded JSON array of CSL citation items (see below). null when unresolved. |
| style | string | "apa" | Citation style key, e.g. "apa", "chicago-author-date", "ieee". |
Content: inline*
The citation node carries inline content that represents the displayed citation text. This content can originate in two ways:
- Generated from a CSL style: a citation processor uses the source attribute and the referenced SnapshotReference data to produce formatted text according to the chosen style (e.g. "(Hughes, 2018)" for APA, "[1]" for IEEE).
- User-provided: if the user has typed or edited the citation text manually (e.g. "123" or "see Mola 2015"), that text is preserved as-is in the node content.
Renderers and exporters should use the node's existing text content as the display value. A citation processor may overwrite the content with a style-generated string, but it is not required — keeping the user-provided text is valid.
The source field is serialised with encodeURI(JSON.stringify(items)). Each item follows the CSL-JSON citation item structure (see also the CSL specification):
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Reference to a SnapshotReference.id. |
| prefix | string | No | Text rendered before the citation (e.g. "see "). |
| suffix | string | No | Text rendered after the citation. |
| locator | string | No | Page, chapter, or other locator. |
| label | string | No | Locator type label (e.g. "page", "chapter"). |
A decoded source value looks like:
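As an illustration of the encoding round trip, the sketch below builds a two-item citation (ref-1, and ref-2 with a page locator, mirroring the diagram at the top of this page), encodes it as described, and decodes it again. The helper names are hypothetical. Decoding uses decodeURIComponent rather than decodeURI, on the assumption that stored values may escape reserved characters: the example documents on this page contain %3A for ":", which decodeURI would leave untouched.

```typescript
// Citation item fields as listed in the table above.
interface CitationItem {
  id: string;
  prefix?: string;
  suffix?: string;
  locator?: string;
  label?: string;
}

// Serialise items the way the source attribute is described.
function encodeSource(items: CitationItem[]): string {
  return encodeURI(JSON.stringify(items));
}

// Decode a source attribute; null means the citation is unresolved.
function decodeSource(source: string | null): CitationItem[] {
  if (source === null) return [];
  return JSON.parse(decodeURIComponent(source)) as CitationItem[];
}

const citedItems: CitationItem[] = [
  { id: "ref-1" },
  { id: "ref-2", locator: "2", label: "page" },
];

const encodedSource = encodeSource(citedItems);
const decodedItems = decodeSource(encodedSource);
console.log(decodedItems.map((item) => item.id)); // [ 'ref-1', 'ref-2' ]

// The value used in the example documents on this page decodes the same way.
const fromExample = decodeSource("%5B%7B%22id%22%3A%22ref-hughes-2018%22%7D%5D");
console.log(fromExample.map((item) => item.id)); // [ 'ref-hughes-2018' ]
```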
footnote¶
Inline footnote atom storing footnote content.
| Attribute | Type | Default | Description |
|---|---|---|---|
| id | string? | | Unique identifier. |
| type | string | "footnote" | Footnote type. |
Content: inline*
link¶
Cross-reference node rendered as an anchor.
| Attribute | Type | Description |
|---|---|---|
| id | string? | Unique identifier. |
| type | string? | Link type (e.g. "xref"). |
| href | string? | Target reference. |
| reference-format | string? | How the reference should be rendered. |
Content: text*
Example Documents¶
The following JSON examples show the toJSON() output format that the schema describes. You can validate them against manuscript.schema.json or manuscript-snapshot.schema.json.
Minimal (Flat) Document¶
The smallest valid document — a single paragraph directly inside doc, with no header or parts. This flat structure is perfectly valid; not every document needs sections:
{
"type": "doc",
"attrs": {
"type": "article",
"lang": "en-US",
"schema": null,
"pageBreak": null,
"placement": null,
"numbering": null
},
"content": [
{
"type": "paragraph",
"attrs": { "id": "p1", "text-align": null, "text-direction": null, "class": null },
"content": [
{ "type": "text", "text": "Hello, world!" }
]
}
]
}
A flat document can contain any block nodes — paragraphs, headings, figures, code blocks, lists — without wrapping them in part nodes. Parts only become necessary when the document needs distinct sections (chapters, appendix, abstract, etc.).
Document with Inline Formatting¶
Text nodes carry a marks array for bold, italic, links, etc.:
{
"type": "doc",
"attrs": { "type": "article", "lang": "en-US", "schema": null, "pageBreak": null, "placement": null, "numbering": null },
"content": [
{
"type": "paragraph",
"attrs": { "id": "p1", "text-align": null, "text-direction": null, "class": null },
"content": [
{ "type": "text", "text": "This is " },
{ "type": "text", "text": "bold", "marks": [{ "type": "strong" }] },
{ "type": "text", "text": " and " },
{ "type": "text", "text": "italic", "marks": [{ "type": "em" }] },
{ "type": "text", "text": " text with a " },
{
"type": "text",
"text": "hyperlink",
"marks": [{ "type": "anchor", "attrs": { "href": "https://example.com", "title": null, "id": null } }]
},
{ "type": "text", "text": "." }
]
}
]
}
Structured Manuscript¶
A typical manuscript with header, chapters, a citation, and a figure:
{
"type": "doc",
"attrs": { "type": "article", "lang": "en-US", "schema": null, "pageBreak": null, "placement": null, "numbering": null },
"content": [
{
"type": "header",
"content": [
{
"type": "heading",
"attrs": { "id": "h-title", "level": 1, "type": "chapter", "role": null, "numbering": null, "placement": null, "data": null },
"content": [{ "type": "text", "text": "Climate Effects on Marine Biodiversity" }]
},
{
"type": "subtitle",
"content": [{ "type": "text", "text": "A Systematic Review" }]
}
]
},
{
"type": "part",
"attrs": {
"id": "sec-intro", "type": "chapter", "locale": null, "numbering": null,
"placement": null, "role": null, "text-direction": null, "class": null,
"skipToc": false, "pageBreak": null, "data": null
},
"content": [
{
"type": "heading",
"attrs": { "id": "h-intro", "level": 1, "type": "chapter", "role": null, "numbering": null, "placement": null, "data": null },
"content": [{ "type": "text", "text": "Introduction" }]
},
{
"type": "paragraph",
"attrs": { "id": "p-intro1", "text-align": null, "text-direction": null, "class": null },
"content": [
{ "type": "text", "text": "Rising ocean temperatures have been shown to impact coral reef ecosystems " },
{
"type": "citation",
"attrs": { "id": "cite-1", "source": "%5B%7B%22id%22%3A%22ref-hughes-2018%22%7D%5D", "style": "apa" }
},
{ "type": "text", "text": "." }
]
}
]
},
{
"type": "part",
"attrs": {
"id": "sec-methods", "type": "chapter", "locale": null, "numbering": null,
"placement": null, "role": null, "text-direction": null, "class": null,
"skipToc": false, "pageBreak": null, "data": null
},
"content": [
{
"type": "heading",
"attrs": { "id": "h-methods", "level": 1, "type": "chapter", "role": null, "numbering": null, "placement": null, "data": null },
"content": [{ "type": "text", "text": "Methods" }]
},
{
"type": "paragraph",
"attrs": { "id": "p-methods1", "text-align": null, "text-direction": null, "class": null },
"content": [
{ "type": "text", "text": "We analysed satellite data using the equation " },
{
"type": "math",
"attrs": { "id": "eq-1", "tex": "\\Delta T = \\frac{Q}{mc}", "style": "inline", "label": null }
},
{ "type": "text", "text": " to derive temperature change." }
]
},
{
"type": "figure",
"attrs": {
"id": "fig-map", "src": "map.png", "alt": "Ocean temperature map",
"width": null, "height": null, "title": null, "type": "figure",
"environment": null, "orientation": "portrait", "decorative": null,
"scale-width": 0.8, "float-placement": null, "float-reference": null,
"float-defer-page": null, "float-modifier": null
},
"content": [
{
"type": "caption",
"content": [
{
"type": "label",
"content": [{ "type": "text", "text": "Map" }]
},
{
"type": "paragraph",
"attrs": { "id": "p-cap1", "text-align": null, "text-direction": null, "class": null },
"content": [{ "type": "text", "text": "Global sea surface temperature anomalies (2020)." }]
}
]
}
]
}
]
}
]
}
Full Snapshot¶
A complete snapshot wrapping the document together with files and references:
{
"doc": {
"type": "doc",
"attrs": { "type": "article", "lang": "en-US", "schema": null, "pageBreak": null, "placement": null, "numbering": null },
"content": [
{
"type": "paragraph",
"attrs": { "id": "p1", "text-align": null, "text-direction": null, "class": null },
"content": [
{ "type": "text", "text": "Ocean temperatures are rising " },
{ "type": "citation", "attrs": { "id": "cite-1", "source": "%5B%7B%22id%22%3A%22ref-hughes-2018%22%7D%5D", "style": "apa" } },
{ "type": "text", "text": "." }
]
},
{
"type": "figure",
"attrs": {
"id": "fig-1", "src": "asset-map-001", "alt": "Temperature map",
"width": null, "height": null, "title": null, "type": "figure",
"environment": null, "orientation": "portrait", "decorative": null,
"scale-width": 1, "float-placement": null, "float-reference": null,
"float-defer-page": null, "float-modifier": null
},
"content": [
{
"type": "caption",
"content": [
{
"type": "paragraph",
"attrs": { "id": "p-cap", "text-align": null, "text-direction": null, "class": null },
"content": [{ "type": "text", "text": "Sea surface temperatures." }]
}
]
}
]
}
]
},
"version": 42,
"selection": { "anchor": 12, "head": 12 },
"files": [
{
"id": "asset-map-001",
"type": "image",
"url": "https://cdn.example.com/images/map-full.png",
"previewSrc": "data:image/png;base64,iVBOR...",
"mimeType": "image/png",
"name": "map-full.png",
"dimensions": { "width": 1200, "height": 800 }
}
],
"references": [
{
"id": "ref-hughes-2018",
"rawReference": "Hughes, T.P. (2018). Global warming transforms coral reef assemblages. Nature, 556, 492–496.",
"type": "article-journal",
"title": "Global warming transforms coral reef assemblages",
"author": [{ "family": "Hughes", "given": "T.P." }],
"issued": { "date-parts": [[2018]] },
"mimeType": "application/vnd.citationstyles.csl+json"
}
]
}
Table with Merged Cells¶
A figure wrapping a table that has merged header cells:
{
"type": "figure",
"attrs": {
"id": "fig-tbl", "src": "", "alt": "", "width": null, "height": null,
"title": null, "type": "native-table", "environment": null, "orientation": "portrait",
"decorative": null, "scale-width": 1, "float-placement": null,
"float-reference": null, "float-defer-page": null, "float-modifier": null
},
"content": [
{
"type": "table",
"attrs": { "id": "tbl-1" },
"content": [
{
"type": "table_row",
"attrs": { "id": "tr-1" },
"content": [
{
"type": "table_header",
"attrs": { "colspan": 2, "rowspan": 1, "colwidth": null, "background": null },
"content": [
{
"type": "paragraph",
"attrs": { "id": "p-th1", "text-align": null, "text-direction": null, "class": null },
"content": [{ "type": "text", "text": "Region", "marks": [{ "type": "strong" }] }]
}
]
},
{
"type": "table_header",
"attrs": { "colspan": 1, "rowspan": 1, "colwidth": null, "background": null },
"content": [
{
"type": "paragraph",
"attrs": { "id": "p-th2", "text-align": null, "text-direction": null, "class": null },
"content": [{ "type": "text", "text": "Temperature (°C)", "marks": [{ "type": "strong" }] }]
}
]
}
]
},
{
"type": "table_row",
"attrs": { "id": "tr-2" },
"content": [
{
"type": "table_cell",
"attrs": { "colspan": 1, "rowspan": 1, "colwidth": null, "background": null },
"content": [
{
"type": "paragraph",
"attrs": { "id": "p-c1", "text-align": null, "text-direction": null, "class": null },
"content": [{ "type": "text", "text": "Pacific" }]
}
]
},
{
"type": "table_cell",
"attrs": { "colspan": 1, "rowspan": 1, "colwidth": null, "background": null },
"content": [
{
"type": "paragraph",
"attrs": { "id": "p-c2", "text-align": null, "text-direction": null, "class": null },
"content": [{ "type": "text", "text": "North" }]
}
]
},
{
"type": "table_cell",
"attrs": { "colspan": 1, "rowspan": 1, "colwidth": null, "background": null },
"content": [
{
"type": "paragraph",
"attrs": { "id": "p-c3", "text-align": null, "text-direction": null, "class": null },
"content": [{ "type": "text", "text": "18.2" }]
}
]
}
]
}
]
},
{
"type": "caption",
"content": [
{
"type": "label",
"content": [{ "type": "text", "text": "Table" }]
},
{
"type": "paragraph",
"attrs": { "id": "p-tcap", "text-align": null, "text-direction": null, "class": null },
"content": [{ "type": "text", "text": "Average sea surface temperatures by region." }]
}
]
}
]
}
Footnotes and Code Blocks¶
A paragraph with a footnote, followed by a code block:
{
"type": "doc",
"attrs": { "type": "article", "lang": "en-US", "schema": null, "pageBreak": null, "placement": null, "numbering": null },
"content": [
{
"type": "paragraph",
"attrs": { "id": "p1", "text-align": null, "text-direction": null, "class": null },
"content": [
{ "type": "text", "text": "This claim requires clarification" },
{
"type": "footnote",
"attrs": { "id": "fn-1", "type": "footnote" },
"content": [
{ "type": "text", "text": "See appendix B for the full derivation." }
]
},
{ "type": "text", "text": "." }
]
},
{
"type": "code_block",
"attrs": { "id": "cb-1", "text": "", "type": "code", "language": "application/x-python" },
"content": [
{ "type": "text", "text": "import numpy as np\n\ndef temperature_anomaly(data):\n return data - np.mean(data)" }
]
}
]
}
Marks¶
Marks represent inline formatting — emphasis, strong, links, and so on. Unlike HTML, where inline formatting creates nested elements (<strong><em>text</em></strong>), ProseMirror uses a flat model: each text node carries an array of active marks as metadata. This avoids the ambiguity of overlapping or arbitrarily nested markup and makes positions simple character offsets rather than tree paths. See ProseMirror Guide — Document Structure for the rationale behind this design.
For example, the text "bold and italic" is represented as two text nodes, not nested elements:
[
{ "type": "text", "text": "bold ", "marks": [{ "type": "strong" }] },
{ "type": "text", "text": "and italic", "marks": [{ "type": "em" }, { "type": "strong" }] }
]
Mark Ordering¶
The order of marks in the marks array is not arbitrary — it is determined by the order in which marks are declared in the schema definition. ProseMirror sorts marks by their rank (position in the schema's marks object) so that every document has exactly one canonical JSON representation. This means:
- Two documents with the same logical content always produce identical JSON when serialised with toJSON().
- Adjacent text nodes with the same mark set are automatically merged.
- Empty text nodes are never emitted.
In this schema, the mark order is: em, strong, sup, sub, bdi, anchor, tags, indexEntry. When a text node carries multiple marks, they always appear in this order regardless of the sequence in which they were applied. This deterministic representation is important for storage, comparison, and collaborative editing.
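When producing document JSON outside ProseMirror (for example in an importer), marks can be put into this canonical order before serialising. The sketch below hard-codes the order listed above; the constant and function names are illustrative, and in a real integration the rank would normally come from the schema itself rather than a copied list.

```typescript
interface MarkJSON { type: string; attrs?: Record<string, unknown> }

// Mark order as documented for this schema.
const MARK_ORDER = ["em", "strong", "sup", "sub", "bdi", "anchor", "tags", "indexEntry"];

// Return a new array with marks sorted by schema rank.
// Unknown mark types get rank -1 and therefore sort first in this sketch.
function sortMarks(marks: MarkJSON[]): MarkJSON[] {
  return [...marks].sort(
    (a, b) => MARK_ORDER.indexOf(a.type) - MARK_ORDER.indexOf(b.type),
  );
}

// Marks applied in the "wrong" order by some external tool.
const applied: MarkJSON[] = [
  { type: "anchor", attrs: { href: "https://example.com", title: null, id: null } },
  { type: "strong" },
  { type: "em" },
];

const canonical = sortMarks(applied);
console.log(canonical.map((m) => m.type)); // [ 'em', 'strong', 'anchor' ]
```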
classDiagram
direction LR
class Mark {
<<interface>>
+type : string
+attrs? : object
}
class em
class strong
class sup
class sub
class bdi
class anchor {
+href : string
+title? : string
+id? : string
}
class tags {
+tags : ~key: string~[]
}
class indexEntry {
+id? : string
+entries : ~raw?: string~[]
+attributes : object
}
Mark <|-- em
Mark <|-- strong
Mark <|-- sup
Mark <|-- sub
Mark <|-- bdi
Mark <|-- anchor
Mark <|-- tags
Mark <|-- indexEntry
| Mark | Attributes | Description |
|---|---|---|
| em | none | Emphasis (italic). |
| strong | none | Strong (bold). |
| sup | none | Superscript. |
| sub | none | Subscript. |
| bdi | none | Bidirectional isolate. |
| anchor | href, title, id | Hyperlink. |
| tags | tags (array of { key }) | Semantic tags applied to text. |
| indexEntry | id, entries (array of { raw? }), attributes | Back-of-book index entry. |
Node Groups¶
ProseMirror content expressions reference groups rather than individual node types. For example, block+ means "one or more nodes belonging to the block group". The groups in the manuscript schema:
| Group | Member nodes |
|---|---|
| block | paragraph, reference, heading, figure, code_block, blockquote, pageBreak, placeHolder, horizontal_rule, bullet_list, ordered_list, table |
| structural | part |
| inline | text, hard_break, image, math, citation, footnote, link |
Compatibility Notes¶
- The schema round-trips to SciFlow document snapshots and is compatible with the typesetting pipeline.
- The JSON format matches ProseMirror's Node.toJSON() output. Documents can be loaded back with Node.fromJSON().
- Aligns with JATS semantics but isn't a 1:1 copy; e.g., figures carry float metadata to simplify publishing workflows.
- When adding new nodes or marks, update both the schema package and the editor features that surface them.
- The JSON Schema files are generated from the ProseMirror schema, so they stay in sync with the TypeScript source of truth as long as they are regenerated after each schema change.
Feature availability in the start bundle
The default @sciflow/editor-start feature set enables citations, cross references, footnotes, inline formatting, headings, figures, tables, and math. To render equations as SVG, add the MathJax script tag to your page (see Troubleshooting). You can change the active features via editor.configureFeatures(...) (see Extending Features).