Content Import & Export

SciFlow documents are stored as JSON snapshots. The schema package provides export capabilities for scholarly publishing formats and validation tools for document integrity.

Snapshot Format

Every SciFlow document is wrapped in a snapshot:

{
  "doc": { "type": "doc", "content": [...] },
  "files": [],
  "references": [],
  "version": 1
}
  • doc — The ProseMirror document tree as JSON. See Schema Reference.
  • files — Metadata for attached media (images, data files).
  • references — Bibliography entries used by citation nodes.
  • version — Optional version counter for optimistic locking.
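
To make the role of version concrete, the following is a minimal sketch of an optimistic-locking check. The SnapshotSketch type and the applyWrite function are hypothetical names introduced for illustration, and the conflict rule (reject any write that was not based on the stored version) is an assumption about how a backend might use the counter, not a description of SciFlow's own sync code.

```typescript
// Hypothetical types for illustration; the real package may name or shape these differently.
interface SnapshotSketch {
  doc: { type: string; content?: unknown[] };
  files: unknown[];
  references: unknown[];
  version?: number;
}

// Accept a write only if it was based on the version currently stored,
// then bump the counter so that stale writers are rejected afterwards.
function applyWrite(stored: SnapshotSketch, incoming: SnapshotSketch): SnapshotSketch {
  const storedVersion = stored.version ?? 0;
  const baseVersion = incoming.version ?? 0;
  if (baseVersion !== storedVersion) {
    throw new Error(`Version conflict: stored ${storedVersion}, write based on ${baseVersion}`);
  }
  return { ...incoming, version: storedVersion + 1 };
}
```

A writer that loses the race would typically reload the latest snapshot, reapply its changes, and try again.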

JATS XML Export

SciFlow can export the document body to JATS 1.4 (Blue) XML, the standard format for scholarly article interchange.

Usage

import { generateJatsBody, schema } from '@sciflow/schema-prosemirror';
import { Node } from 'prosemirror-model';

// `snapshot` is a loaded document snapshot (see Snapshot Format).
// Parse its doc JSON into a ProseMirror node.
const pmDoc = Node.fromJSON(schema, snapshot.doc);

// Generate JATS <body> XML
const xml = generateJatsBody(pmDoc, {
  prettyPrint: true,
  indent: 2,
});

What Gets Exported

Document element                  JATS output
Parts (chapter, abstract, etc.)   <sec> with appropriate sec-type
Headings                          Auto-sectioning: headings open nested <sec> elements
Paragraphs                        <p>
Bold / italic / sup / sub         <bold>, <italic>, <sup>, <sub>
Citations                         <xref ref-type="bibr"> with decoded source IDs
Footnotes                         Collected into a trailing <fn-group>
Math (TeX)                        <disp-formula> or <inline-formula> with <tex-math>
Figures                           <fig> with <graphic>, <caption>, <label>
Tables                            <table-wrap> with HTML-style table markup
Lists                             <list list-type="bullet"> or <list list-type="order">
Blockquotes                       <disp-quote>
Code blocks                       <code>
Hyperlinks                        <ext-link>
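
Auto-sectioning is the least obvious mapping in the table, so here is a simplified sketch of the idea. It is not SciFlow's implementation: the Block type, escapeXml, and autoSection are hypothetical names, the input is a flat list rather than a ProseMirror tree, and heading levels are assumed to start at 1. Each heading closes every open section at the same or a deeper level and then opens a new <sec>.

```typescript
// Simplified block model for illustration only; these are not SciFlow's node types.
type Block =
  | { kind: 'heading'; level: number; text: string }
  | { kind: 'paragraph'; text: string };

// Escape the characters that are unsafe in XML text content.
const escapeXml = (s: string): string =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

function autoSection(blocks: Block[]): string {
  const out: string[] = [];
  const open: number[] = []; // heading levels of the currently open <sec> elements

  for (const block of blocks) {
    if (block.kind === 'heading') {
      // Close every section at the same or a deeper level before opening a new one.
      // An empty stack reads as level 0, which never matches because levels start at 1.
      while ((open[open.length - 1] ?? 0) >= block.level) {
        out.push('</sec>');
        open.pop();
      }
      out.push(`<sec><title>${escapeXml(block.text)}</title>`);
      open.push(block.level);
    } else {
      out.push(`<p>${escapeXml(block.text)}</p>`);
    }
  }
  // Close whatever is still open at the end of the document.
  while (open.pop() !== undefined) out.push('</sec>');
  return out.join('');
}
```

With this approach a level-2 heading that follows a level-1 heading nests inside it, while a later level-1 heading closes both sections before opening its own.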

Options

interface JatsBodyOptions {
  prettyPrint?: boolean;  // Format XML with indentation (default: false)
  indent?: number;        // Spaces per indent level (default: 2)
}

JSON Schema Validation

The schema package can generate JSON Schema (Draft-07) definitions from the live ProseMirror schema. Use these to validate documents outside the editor.

Generating Schemas

npx nx run @sciflow/schema-prosemirror:generate-schema

This produces two files under packages/schema/prosemirror/dist/:

File                              Validates
manuscript.schema.json            The doc portion of a snapshot
manuscript-snapshot.schema.json   The full snapshot (doc + files + references)

Validating a Document

import Ajv from 'ajv';
// Importing a .json file from TypeScript requires "resolveJsonModule": true in tsconfig.json.
import snapshotSchema from '@sciflow/schema-prosemirror/dist/manuscript-snapshot.schema.json';

const ajv = new Ajv();
const validate = ajv.compile(snapshotSchema);

if (!validate(snapshot)) {
  console.error('Invalid document:', validate.errors);
}
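
The raw validate.errors array can be hard to read for large documents. A small helper like the one below turns it into one line per problem. This is a sketch that assumes Ajv v8, where each error carries instancePath and message; ValidationIssue and formatIssues are hypothetical names and declare only the fields the helper uses, so the block runs without importing Ajv.

```typescript
// Subset of the fields Ajv v8 puts on each entry of `validate.errors`.
interface ValidationIssue {
  instancePath: string; // JSON Pointer to the offending value, '' for the root
  message?: string;
}

// Turn raw validation errors into one readable line per problem.
function formatIssues(errors: ValidationIssue[] | null | undefined): string[] {
  return (errors ?? []).map(
    (e) => `${e.instancePath || '(root)'}: ${e.message ?? 'unknown error'}`,
  );
}
```

In the validation snippet above, the helper could replace the console.error call with something like formatIssues(validate.errors).forEach((line) => console.error(line)).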

Importing Content

SciFlow does not ship a built-in importer for external formats (Word, Markdown, HTML). Documents must be provided as JSON that conforms to SciFlow's ProseMirror schema.

Building a Custom Importer

To import from external formats, construct a valid ProseMirror JSON tree:

// Minimal snapshot; the optional version field is omitted
const imported = {
  doc: {
    type: 'doc',
    content: [
      {
        type: 'heading',
        attrs: { level: 1, id: 'title-1' },
        content: [{ type: 'text', text: 'Imported Article' }],
      },
      {
        type: 'paragraph',
        content: [{ type: 'text', text: 'First paragraph of imported content.' }],
      },
    ],
  },
  files: [],
  references: [],
};

Validate the result against the JSON schema before loading it into the editor.
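
As an illustration of what a custom importer can look like, the sketch below converts plain text into the same shape as the example above. It is deliberately tiny and makes several assumptions: only heading and paragraph nodes are produced, lines starting with "# " are treated as level-1 headings, and the heading-N id scheme is made up for the example. PMNodeJSON and importPlainText are hypothetical names, not part of the SciFlow packages.

```typescript
// JSON shape of a ProseMirror node, reduced to what this sketch needs.
interface PMNodeJSON {
  type: string;
  attrs?: Record<string, unknown>;
  content?: PMNodeJSON[];
  text?: string;
}

// Convert plain text into a snapshot: lines starting with "# " become
// level-1 headings, every other non-empty line becomes a paragraph.
function importPlainText(source: string) {
  let headingCount = 0;
  const content = source
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line): PMNodeJSON => {
      if (line.startsWith('# ')) {
        headingCount += 1;
        return {
          type: 'heading',
          attrs: { level: 1, id: `heading-${headingCount}` }, // id scheme is an assumption
          content: [{ type: 'text', text: line.slice(2).trim() }],
        };
      }
      return { type: 'paragraph', content: [{ type: 'text', text: line }] };
    });
  return { doc: { type: 'doc', content }, files: [], references: [] };
}
```

Empty lines are dropped rather than turned into empty paragraphs, because ProseMirror does not allow empty text nodes.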

Third-party converters

For Markdown → ProseMirror, consider libraries like prosemirror-markdown. For DOCX, mammoth.js can produce HTML that you then transform into ProseMirror JSON. In both cases you still have to map the converter's output to SciFlow's specific node types.

Schema Migration

When the ProseMirror schema changes between versions (new attributes, renamed nodes, removed fields), existing documents may need migration.

Migration Strategy

  1. Regenerate the JSON schema after any schema change:

    npx nx run @sciflow/schema-prosemirror:generate-schema
    

  2. Validate existing documents against the new schema to identify breaking changes.

  3. Write a migration function that transforms old JSON to the new format:

    // SyncSnapshot is assumed to be the snapshot type used by your sync strategy.
    // Note: this migration mutates the snapshot in place.
    function migrateV1toV2(snapshot: any): SyncSnapshot {
      // Walk the document tree and rename outdated node types
      const migrateNode = (node: any): void => {
        if (node.type === 'old_node_name') {
          node.type = 'new_node_name';
        }
        if (Array.isArray(node.content)) {
          node.content.forEach(migrateNode);
        }
      };

      migrateNode(snapshot.doc);
      return snapshot;
    }
    

  4. Run migrations at load time in your sync strategy's load() method, before returning the snapshot to the editor.
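
Once more than one migration exists, step 4 usually means running them in order. The sketch below shows one way to chain them. It relies on a hypothetical schemaVersion field that is not part of the documented snapshot format (the documented version field is a locking counter, not a schema revision), so treat VersionedSnapshot, Migration, and runMigrations as illustrative names only.

```typescript
// Hypothetical shape: `schemaVersion` is NOT part of the documented snapshot format.
// It is assumed here to record which schema revision a document was written with.
interface VersionedSnapshot {
  doc: unknown;
  files: unknown[];
  references: unknown[];
  schemaVersion?: number;
}

type Migration = (snapshot: VersionedSnapshot) => VersionedSnapshot;

// migrations[i] upgrades a document from schema revision i + 1 to i + 2.
// Documents without a schemaVersion are treated as revision 1.
function runMigrations(snapshot: VersionedSnapshot, migrations: Migration[]): VersionedSnapshot {
  const from = snapshot.schemaVersion ?? 1;
  // Snapshots are plain JSON, so a JSON round trip is a sufficient deep copy.
  let current: VersionedSnapshot = JSON.parse(JSON.stringify(snapshot));
  // Skip the migrations that this document has already been through.
  for (const migrate of migrations.slice(from - 1)) {
    current = migrate(current);
  }
  return { ...current, schemaVersion: Math.max(from, migrations.length + 1) };
}
```

Inside load(), the call would sit between fetching the stored snapshot and returning it to the editor. Because already-applied migrations are skipped, running the function twice on the same document is harmless.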