Pipeline

Table of contents

  1. How to instantiate
  2. Evaluation rules
  3. Supported JsonLogic operators
  4. Custom extensions
    1. Path helpers
    2. String helpers
    3. Regex helpers
    4. Function calls to internal library
  5. Examples
    1. Conditionals with text extraction
    2. URL extraction and formatting
    3. Slack-like text with joins
  6. Runtime, limits, and error policy
  7. Schema definition
  8. Custom JSON output validation
  • JSON Schema 2020-12 to define the mapper document shape and validate configs.
  • JsonLogic 2.0 for expressions, conditionals, and array ops inside a mapper.

Below is a normative spec for map.custom_json using those standards, plus the minimal extensions. For the authoritative schema definition see below. The mapper output is free-form and validated only by the permissive custom_json_output@1.

How to instantiate

Using Rule & Pipeline (JSON):

{
  "pipeline": {
    "steps": [
      {
        "name": "map.custom_json",
        "args": {
          "version": "v1",
          "vars": [ 
            // ordered list of variable definitions
          ],
          "output": { 
            // desired shape for emitted JSON
          }
        }
      }
    ]
  }
}

Inputs available at evaluation time

Use {"var": "message.subject"} etc. The evaluator provides:

  • message object with parsed email fields: message_id, subject, from, to, text, html, headers, attachments.
  • ctx object: project_id, route_id, source_type, now (RFC3339).
  • meta object: any pipeline-provided metadata.

Evaluation rules

  • Evaluate vars entries sequentially, top to bottom. Each entry has name and expr. Later vars can reference earlier ones using {"var":"vars.some_name"}.
  • Build output by recursively evaluating any JsonLogic expressions found in the template.
  • The environment seen by var includes message, ctx, meta, and vars.
  • Only array form is supported for vars; each entry must include name and expr.
  • Ordering-sensitive behaviors (like array traversal) and runtime limits (depth, nodes, regex timeouts) are enforced by the evaluator.
  • Errors inside expressions yield null unless noted below. Hard failures abort with a mapper runtime error.

Supported JsonLogic operators

Use standard JsonLogic 2.0 semantics for these:

  • Control and logic: if, and, or, !
  • Comparators: ==, !=, ===, !==, >, >=, <, <=, in
  • Data access: var with dotted and bracket paths. Examples:

    • {"var":"message.subject"}
    • {"var":"message.attachments[0].filename"}
  • Strings and arrays:

    • cat (concatenate)
    • substr (start, length)
    • map, filter, reduce, some, all, none
  • Objects:

    • merge (shallow object merge)

Custom extensions

All extensions are prefixed to avoid collisions and are implemented as custom JsonLogic operators. Unknown operators or call.fn values are rejected by schema validation; the generic call must use one of: transform.html_to_text, extract.urls, extract.bullet_list.

Path helpers

  • You can rely on var for traversal. Arrays support [index]. Dots split object keys.

String helpers

  • string.lower: {"string.lower": expr}
  • string.upper: {"string.upper": expr}
  • string.trim: {"string.trim": expr}
  • string.slice: {"string.slice": {"value": expr, "start": 0, "end": 120}}
  • string.split: {"string.split": {"value": expr, "sep": ","}}
  • string.join: {"string.join": {"items": exprArray, "sep": ", "}}
  • string.replace: exact substring replace

    • {"string.replace":{"value":expr,"find":"x","with":"y","count":1}}

Regex helpers

  • regex.match

    • Input: {"regex.match":{"value": expr, "pattern":"(?i)invoice|receipt"}}
    • Output: boolean
  • regex.replace

    • {"regex.replace":{"value":expr,"pattern":"\\s+","with":" "}}

Regex engine is the host regex with a timeout. On timeout or invalid pattern, operator returns null.

Function calls to internal library

Those are the first few I came up with and might change/extend later.

Encoded as operator names under call.*. Arguments are named objects.

  • call.transform.html_to_text

    • Args: {"html": string} or {"html": string, "text": string}
    • Returns: string
  • call.extract.urls

    Extracts URLs from the text or HTML. In HTML mode, extracts from href/src attributes and text content from the following tags: <a>, <area>, <img>, <link>, <form>, <button>, <input>, <script>, <iframe>, and <source>.

    In text mode, if Markdown links are found (e.g., [title](url)), extracts both URL and title.

    • Args are optional. Supported args:
      • mode: "html" (default) or "text".
      • html: HTML string source (default: message.html).
      • text: text string source (default: message.text).
      • deduplicate: boolean to drop duplicate (url,title,element) entries (default: False).
    • Mode selection:
      • If mode is provided and valid, honor it.
      • If only text is provided (no html), default to text mode.
      • Otherwise default to HTML mode; if HTML parsing errors or yields zero results, fall back to text mode.
    • Output item shape:
      • { "url": "<string>", "title": "<optional string>", "element": "<optional string>" }
  • call.extract.bullet_list

    Extracts bullet/numbered lists from HTML or text content. Tries to guess title from preceding text.

    • Args are optional. Expected optional keys:
      • mode: "html" (default) or "text".
      • html: HTML string source (default: message.html).
      • text: text string source (default: message.text).
      • nested: boolean to enable nested list extraction (default: False).
      • sanitize: boolean to remove empty lines in text mode (default: True).
      • numbered: boolean to include numbered lists (<ol> or 1./1)) (default: True).
    • Source selection:
      • Default source is message.html.
      • In HTML mode: use args.html if provided, else message.html. If extraction yields zero results, fall back to text mode.
      • In text mode: use args.text if provided, else message.text.
    • Output shape:
      • List objects: {"title": "<optional>", "bullets": [<bullet>, ...]}.
      • Bullet objects: {"text": "<item>", "child": <list|null>} where child (if present) uses the same list object shape and title equals the bullet text.

Results are plain JSON and can be traversed with var. If you assign results to a var, reference as {"var":"vars.urls[0].url"}.

Examples

Conditionals with text extraction

{
  "version": "v1",
  "vars": [
    { "name": "text", "expr": { "call.transform.html_to_text": { "html": { "var": "message.html" } } } },
    { "name": "is_vendor", "expr": { "regex.match": { "value": { "var":"message.from[0].email" }, "pattern": "@vendor\\.com$" } } }
  ],
  "output": {
    "id": { "var": "message.message_id" },
    "source": { "var": "ctx.source_type" },
    "priority": { "if": [ { "var":"vars.is_vendor" }, "high", "normal" ] },
    "snippet": { "substr": [ { "var":"vars.text" }, 0, 200 ] }
  }
}

URL extraction and formatting

{
  "version": "v1",
  "vars": [
    { "name": "text", "expr": { "call.transform.html_to_text": { "html": { "var": "message.html" } } } },
    { "name": "urls", "expr": { "call.extract.urls": { "text": { "var": "vars.text" } } } }
  ],
  "output": {
    "subject": { "var": "message.subject" },
    "first_url": { "var": "vars.urls[0].url" },
    "domain": {
      "regex.replace": {
        "value": { "var":"vars.urls[0].url" },
        "pattern": "^https?://([^/]+)/.*$",
        "with": "$1"
      }
    }
  }
}

Slack-like text with joins

{
  "version": "v1",
  "vars": [
    { "name": "text", "expr": { "call.transform.html_to_text": { "html": { "var":"message.html" } } } },
    { "name": "lines", "expr": { "string.split": { "value": { "var":"vars.text" }, "sep": "\n" } } }
  ],
  "output": {
    "text": {
      "cat": [
        "*", { "var":"message.subject" }, "*", "\n",
        "From: ", { "var":"message.from[0].email" }, "\n",
        { "string.join": { "items": { "var":"vars.lines" }, "sep": " " } }
      ]
    }
  }
}

Runtime, limits, and error policy

  • Deterministic, pure evaluation. No IO except registered call.* functions.
  • Guards:

    • Max expression depth: 50
    • Max nodes evaluated: 10k per mapper
    • Max output size: 1 MiB
    • Regex timeout: 50 ms per op (timeout raises a mapper error)
    • Function call timeout: 100 ms per call (timeout raises a mapper error)
  • Operator error returns null. Critical errors raise a mapper runtime error and stop execution.

Schema definition

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://schemas.mailwebhook.dev/transform/custom_json_mapper@1",
  "title": "mailwebhook custom_json mapper config v1",
  "description": "Validates the JsonLogic-style configuration passed to map.custom_json.",
  "type": "object",
  "required": ["version", "output"],
  "additionalProperties": false,
  "properties": {
    "version": { "const": "v1" },
    "vars": {
      "type": "array",
      "default": [],
      "items": {
        "type": "object",
        "required": ["name", "expr"],
        "additionalProperties": false,
        "properties": {
          "name": { "type": "string", "pattern": "^[A-Za-z_][A-Za-z0-9_]*$" },
          "expr": { "$ref": "#/$defs/expr" },
          "description": { "type": "string" }
        }
      }
    },
    "output": { "$ref": "#/$defs/jsonTemplate" },
    "meta": { "type": "object" }
  },
  "$defs": {
    "jsonTemplate": {
      "description": "Arbitrary JSON where any value may be a literal or an expression",
      "anyOf": [
        { "$ref": "#/$defs/expr" },
        { "type": ["string", "number", "boolean", "null"] },
        {
          "type": "array",
          "items": { "$ref": "#/$defs/jsonTemplate" }
        },
        {
          "type": "object",
          "additionalProperties": { "$ref": "#/$defs/jsonTemplate" }
        }
      ]
    },
    "expr": {
      "description": "JsonLogic-style expression or literal",
      "oneOf": [
        { "type": ["string", "number", "boolean", "null"] },
        {
          "type": "array",
          "items": { "$ref": "#/$defs/expr" }
        },
        {
          "type": "object",
          "minProperties": 1,
          "maxProperties": 1,
          "patternProperties": {
            "^(var|if|and|or|!|==|!=|===|!==|>|>=|<|<=|in|cat|substr|merge|string\\.lower|string\\.upper|string\\.trim|string\\.slice|string\\.split|string\\.join|string\\.replace|regex\\.match|regex\\.replace|map|filter|find|reduce|some|all|none)$": {
              "description": "Supported operators for map.custom_json",
              "type": ["array", "object", "string", "number", "boolean", "null"]
            },
            "^call$": {
              "type": "object",
              "required": ["fn"],
              "additionalProperties": false,
              "properties": {
                "fn": {
                  "type": "string",
                  "enum": [
                    "transform.html_to_text",
                    "extract.urls",
                    "extract.bullet_list"
                  ]
                },
                "args": { "type": "object" }
              }
            },
            "^call\\.(transform\\.html_to_text|extract\\.urls|extract\\.bullet_list)$": {
              "type": "object"
            }
          },
          "additionalProperties": false
        }
      ]
    }
  }
}

Custom JSON output validation

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://schemas.mailwebhook.dev/transform/custom_json_output@1",
  "title": "mailwebhook.custom_json output v1",
  "description": "Output envelope for map.custom_json. Intentionally permissive (user-defined shape).",
  "type": ["object", "array", "string", "number", "boolean", "null"],
  "additionalProperties": true
}