Content Import & Export

SciFlow documents are stored as JSON snapshots. The schema package provides export capabilities for scholarly publishing formats and validation tools for document integrity.

Snapshot Format

Every SciFlow document is wrapped in a snapshot:

{
  "doc": { "type": "doc", "content": [...] },
  "files": [],
  "references": [],
  "version": 1
}

doc — The ProseMirror document tree as JSON. See Schema Reference.
files — Metadata for attached media (images, data files).
references — Bibliography entries used by citation nodes.
version — Optional version counter for optimistic locking.
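
For TypeScript consumers, the snapshot shape can be sketched as types. This is a minimal sketch inferred from the example above: the names `PMNodeJSON`, `SnapshotLike`, and `isSnapshotLike` are hypothetical and are not exports of the schema package, and the entries of `files` and `references` are typed as `unknown` because their fields are not specified here.

```typescript
// Hypothetical types inferred from the snapshot example; not exported by SciFlow.
interface PMNodeJSON {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNodeJSON[];
  text?: string;
  marks?: { type: string; attrs?: Record<string, unknown> }[];
}

interface SnapshotLike {
  doc: PMNodeJSON;
  files: unknown[];
  references: unknown[];
  version?: number; // optional, per the field list above
}

// Cheap structural check; it does not replace JSON Schema validation.
function isSnapshotLike(value: unknown): value is SnapshotLike {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const doc = v.doc as Record<string, unknown> | null | undefined;
  return (
    typeof doc === 'object' && doc !== null && doc.type === 'doc' &&
    Array.isArray(v.files) &&
    Array.isArray(v.references) &&
    (v.version === undefined || typeof v.version === 'number')
  );
}
```

A guard like this is only useful as a quick sanity check at system boundaries; it says nothing about whether the nodes inside `doc` are valid.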

JATS XML Export

SciFlow can export the document body to JATS 1.4 (Blue) XML, the standard format for scholarly article interchange.

Usage

import { generateJatsBody, schema } from '@sciflow/schema-prosemirror';
import { Node } from 'prosemirror-model';

// `snapshot` is a SciFlow snapshot (see Snapshot Format) loaded elsewhere.
// Parse the snapshot's doc JSON into a ProseMirror node
const pmDoc = Node.fromJSON(schema, snapshot.doc);

// Generate JATS <body> XML
const xml = generateJatsBody(pmDoc, {
  prettyPrint: true,
  indent: 2,
});

What Gets Exported

Document element	JATS output
Parts (chapter, abstract, etc.)	`<sec>` with appropriate `sec-type`
Headings	Auto-sectioning: headings open nested `<sec>` elements
Paragraphs	`<p>`
Bold / italic / sup / sub	`<bold>`, `<italic>`, `<sup>`, `<sub>`
Citations	`<xref ref-type="bibr">` with decoded source IDs
Footnotes	Collected into a trailing `<fn-group>`
Math (TeX)	`<disp-formula>` or `<inline-formula>` with `<tex-math>`
Figures	`<fig>` with `<graphic>`, `<caption>`, `<label>`
Tables	`<table-wrap>` with HTML-style table markup
Lists	`<list list-type="bullet">` or `<list list-type="order">`
Blockquotes	`<disp-quote>`
Code blocks	`<code>`
Hyperlinks	`<ext-link>`
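
The auto-sectioning row can be illustrated with a small standalone sketch. It is not the exporter's implementation: the flat `Block` input type and the `autoSection` function are hypothetical, and the sketch only assumes the behavior described in the table, where each heading opens a `<sec>` that stays open until a heading of the same or a higher level appears.

```typescript
// Hypothetical flat input: headings carry a level, paragraphs carry text.
type Block =
  | { kind: 'heading'; level: number; text: string }
  | { kind: 'paragraph'; text: string };

// Escape the characters that are unsafe in XML text content.
const esc = (s: string): string =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

function autoSection(blocks: Block[]): string {
  const out: string[] = [];
  const open: number[] = []; // heading levels of the currently open <sec> elements

  for (const block of blocks) {
    if (block.kind === 'heading') {
      // Close every open section at the same or a deeper level.
      while (open.length > 0 && open[open.length - 1]! >= block.level) {
        out.push('</sec>');
        open.pop();
      }
      out.push('<sec>', `<title>${esc(block.text)}</title>`);
      open.push(block.level);
    } else {
      out.push(`<p>${esc(block.text)}</p>`);
    }
  }

  // Close whatever is still open at the end of the document.
  while (open.length > 0) {
    out.push('</sec>');
    open.pop();
  }
  return out.join('');
}
```

The stack of open levels is what produces nesting: a level-2 heading after a level-1 heading leaves the outer section open, while a second level-1 heading closes both.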

Options

interface JatsBodyOptions {
  prettyPrint?: boolean;  // Format XML with indentation (default: false)
  indent?: number;        // Spaces per indent level (default: 2)
}

JSON Schema Validation

The schema package can generate JSON Schema (Draft-07) definitions from the live ProseMirror schema. Use these to validate documents outside the editor.

Generating Schemas

npx nx run @sciflow/schema-prosemirror:generate-schema

This produces two files under packages/schema/prosemirror/dist/:

File	Validates
`manuscript.schema.json`	The `doc` portion of a snapshot
`manuscript-snapshot.schema.json`	The full snapshot (doc + files + references)

Validating a Document

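import Ajv from 'ajv';
// Importing a .json file requires "resolveJsonModule": true in tsconfig.json
import snapshotSchema from '@sciflow/schema-prosemirror/dist/manuscript-snapshot.schema.json';

// Compile once and reuse the validator for every snapshot
const ajv = new Ajv();
const validate = ajv.compile(snapshotSchema);

if (!validate(snapshot)) {
  // validate.errors describes each failing location and the violated rule
  console.error('Invalid document:', validate.errors);
}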
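
`validate.errors` is an array of error objects, which can be noisy when logged raw. As a sketch, the helper below turns it into one line per problem. It assumes the Ajv v8 error shape (`instancePath`, `message`); `formatErrors` and the local `ErrorLike` type are hypothetical names, declared here so the snippet runs without Ajv installed.

```typescript
// Minimal subset of Ajv v8's ErrorObject, redeclared so the sketch is standalone.
interface ErrorLike {
  instancePath: string; // JSON Pointer to the failing value, '' for the root
  message?: string;
}

// Convert validation errors into readable "path: message" lines.
function formatErrors(errors: ErrorLike[] | null | undefined): string[] {
  return (errors ?? []).map((e) => {
    const where = e.instancePath === '' ? '(root)' : e.instancePath;
    return `${where}: ${e.message ?? 'unknown error'}`;
  });
}
```

With the validator from the example above, this could be used as `console.error(formatErrors(validate.errors).join('\n'))`.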

Importing Content

SciFlow does not ship a built-in importer for external formats (Word, Markdown, HTML). Documents must be provided as JSON that conforms to the ProseMirror schema.

Building a Custom Importer

To import from external formats, construct a valid ProseMirror JSON tree and wrap it in a snapshot:

// Minimal snapshot wrapping a valid document
const imported = {
  doc: {
    type: 'doc',
    content: [
      {
        type: 'heading',
        attrs: { level: 1, id: 'title-1' },
        content: [{ type: 'text', text: 'Imported Article' }],
      },
      {
        type: 'paragraph',
        content: [{ type: 'text', text: 'First paragraph of imported content.' }],
      },
    ],
  },
  files: [],
  references: [],
};

Validate the result against the JSON schema before loading it into the editor.
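
As an illustration of the approach, the sketch below turns a very small subset of Markdown (ATX headings and paragraphs, each separated by a blank line) into a snapshot shaped like the one above. The `importPlainMarkdown` function, the `ImportedNode` type, and the id scheme (`heading-1`, `heading-2`, and so on) are hypothetical. A real importer has to map to SciFlow's actual node types and attributes, and its output should still be validated against the JSON schema.

```typescript
// Hypothetical node shape for this sketch; only the fields used here.
interface ImportedNode {
  type: string;
  attrs?: Record<string, unknown>;
  content?: ImportedNode[];
  text?: string;
}

function importPlainMarkdown(source: string) {
  const content: ImportedNode[] = [];
  let headingCount = 0;

  // Split on blank lines; each chunk becomes one heading or one paragraph.
  for (const chunk of source.split(/\n\s*\n/)) {
    // Collapse soft line breaks inside a chunk into single spaces.
    const text = chunk.trim().replace(/\s*\n\s*/g, ' ');
    if (text === '') continue; // ProseMirror text nodes must not be empty

    const match = /^(#{1,6})\s+(.*)$/.exec(text);
    if (match) {
      headingCount += 1;
      content.push({
        type: 'heading',
        attrs: { level: match[1].length, id: `heading-${headingCount}` },
        content: [{ type: 'text', text: match[2] }],
      });
    } else {
      content.push({ type: 'paragraph', content: [{ type: 'text', text }] });
    }
  }

  return { doc: { type: 'doc', content }, files: [], references: [] };
}
```

Inline formatting, lists, figures, and citations are deliberately left out; each would need its own mapping to the corresponding SciFlow node or mark.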

Third-party converters

For Markdown → ProseMirror, consider libraries like prosemirror-markdown. For DOCX, mammoth.js can produce HTML that you then transform into ProseMirror JSON. In both cases the output must still be mapped to SciFlow's specific node types.

Schema Migration

When the ProseMirror schema changes between versions (new attributes, renamed nodes, removed fields), existing documents may need migration.

Migration Strategy

Regenerate the JSON schema after any schema change:

npx nx run @sciflow/schema-prosemirror:generate-schema

Validate existing documents against the new schema to identify breaking changes.

Write a migration function that transforms old JSON to the new format:

function migrateV1toV2(snapshot: any): SyncSnapshot {
  // Walk the document tree depth-first and rename matching nodes
  const migrateNode = (node: any) => {
    if (node.type === 'old_node_name') {
      node.type = 'new_node_name';
    }
    if (Array.isArray(node.content)) {
      node.content.forEach(migrateNode);
    }
    return node;
  };

  // Note: this mutates the snapshot that was passed in and returns it
  migrateNode(snapshot.doc);
  return snapshot;
}

Run migrations at load time in your sync strategy's load() method, before returning the snapshot to the editor.
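
One way to wire this up, sketched under assumptions: keep migrations in an ordered list and run each over the snapshot before handing it to the editor. The `Migration` type, `renameNode`, `applyMigrations`, and the local `NodeJSON` and `Snapshot` shapes are hypothetical. The sketch also assumes every migration is idempotent, so re-running it on an already migrated document is harmless. How `load()` fetches the raw snapshot depends on your sync strategy and is not shown.

```typescript
// Hypothetical shapes for this sketch; substitute the real snapshot type.
interface NodeJSON { type: string; content?: NodeJSON[]; [key: string]: unknown }
interface Snapshot { doc: NodeJSON; [key: string]: unknown }
type Migration = (snapshot: Snapshot) => Snapshot;

// Build a migration that renames a node type without mutating its input.
const renameNode = (from: string, to: string): Migration => {
  const walk = (node: NodeJSON): NodeJSON => {
    const next: NodeJSON = { ...node, type: node.type === from ? to : node.type };
    if (node.content) next.content = node.content.map(walk);
    return next;
  };
  return (snapshot) => ({ ...snapshot, doc: walk(snapshot.doc) });
};

// Apply migrations in order, oldest first.
function applyMigrations(snapshot: Snapshot, migrations: Migration[]): Snapshot {
  return migrations.reduce((current, migrate) => migrate(current), snapshot);
}
```

Returning a new object from each step, rather than mutating in place, keeps the raw snapshot available for logging or rollback if a later migration fails.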