JsonLogic-Style DSL

Table of contents

  1. Expressions
  2. Variable lookup
  3. Core operators
  4. String helpers
  5. Regex helpers
  6. Array helpers
  7. Scoped blocks
  8. Function calls
    1. call.transform.html_to_text
    2. call.extract.urls
    3. call.extract.bullet_list
    4. call.extract.reply_segments
    5. call.extract.key_value_pairs
    6. call.extract.tables
    7. call.extract.dom
  9. Runtime limits and errors
  10. Unsupported features

The Custom JSON mapper uses a JsonLogic-style expression language. It is not a complete JsonLogic implementation: MailWebhook supports a focused operator set, adds MailWebhook-specific helpers under call.*, and reserves scoped blocks for local variables inside templates.

Expressions

An expression is usually a single-key object:

{ "var": "message.subject" }

Expression values can also be plain literals:

"normal"

Any value inside a Custom JSON output template can be:

  • a scalar literal
  • an array of template values
  • a plain object template
  • a single-key expression
  • a scoped block whose exact shape is { "vars": [...], "output": ... }

Single-key objects with recognized operator names are evaluated as expressions. Unknown operator-like keys are rejected. Other objects are treated as plain JSON templates and their values are evaluated recursively.

Variable lookup

Use var to read from the current scope or the Custom JSON evaluation root:

{ "var": "message.from[0].email" }

Path rules:

  • Dots traverse object keys.
  • [index] traverses arrays.
  • Missing paths return null.
  • Bracket syntax supports numeric indexes only.

Lookup order:

  1. Current lexical scope head segment first.
  2. Iterator aliases such as item, a custom as alias, current, and accumulator.
  3. Nearest visible vars.*, then outer vars.*, then root vars.*.
  4. Root fallback for message.*, ctx.*, meta.*, and top-level vars.*.

Examples:

{ "var": "message.subject" }
{ "var": "vars.text" }
{ "var": "bullet.text" }
{ "var": "current.amount" }

Core operators

OperatorFormResult
if{ "if": [condition, thenValue, elseValue] }thenValue when condition is truthy; otherwise elseValue or null.
and{ "and": [a, b, ...] }First falsey evaluated value, or the last evaluated value.
or{ "or": [a, b, ...] }First truthy evaluated value, or the last evaluated value.
!{ "!": expr }Boolean negation.
==, !=, ===, !=={ "==": [a, b] }Equality comparison. Strict and non-strict forms behave the same in v1.
>, >=, <, <={ ">": [a, b] }Comparison, or null when values cannot be compared.
in{ "in": [needle, haystack] }Membership test, or null when haystack cannot be searched.
cat{ "cat": [a, b, ...] }String concatenation. null parts become empty strings.
substr{ "substr": [value, start, length] }String slice starting at start with optional length.
merge{ "merge": [objectA, objectB] }Shallow object merge. Later objects win.

substr also accepts object form:

{
  "substr": {
    "value": { "var": "message.subject" },
    "start": 0,
    "end": 80
  }
}

Common usage examples:

{
  "priority": {
    "if": [
      {
        "or": [
          {
            "regex.match": {
              "value": { "var": "message.subject" },
              "pattern": "(?i)urgent|failure|down"
            }
          },
          {
            "in": [
              { "var": "message.from[0].email" },
              ["ceo@example.com", "ops@example.com"]
            ]
          }
        ]
      },
      "high",
      "normal"
    ]
  },
  "summary": {
    "cat": [
      { "var": "message.subject" },
      " from ",
      { "var": "message.from[0].email" }
    ]
  },
  "payload": {
    "merge": [
      { "source": "mailwebhook", "priority": "normal" },
      { "priority": "high" },
      { "subject": { "var": "message.subject" } }
    ]
  }
}

Use if with or, and, comparisons, and in for classification. Use cat for small strings; null parts become empty strings. Use merge for shallow defaults where later objects override earlier objects.

String helpers

OperatorFormResult
string.lower{ "string.lower": expr }Lowercase string, or null.
string.upper{ "string.upper": expr }Uppercase string, or null.
string.trim{ "string.trim": expr }Trimmed string, or null.
string.slice{ "string.slice": { "value": expr, "start": 0, "end": 120 } }Slice by start and optional end indexes.
string.split{ "string.split": { "value": expr, "sep": "," } }Array of string parts, or null.
string.join{ "string.join": { "items": exprArray, "sep": ", " } }Joined string, or null when items is not an array.
string.replace{ "string.replace": { "value": expr, "find": "x", "with": "y", "count": 1 } }Exact substring replacement.

Regex helpers

OperatorFormResult
regex.match{ "regex.match": { "value": expr, "pattern": "(?i)invoice" } }Boolean match result, or null for invalid input or invalid pattern.
regex.replace{ "regex.replace": { "value": expr, "pattern": "\\s+", "with": " " } }Replaced string, or null for invalid input or invalid pattern.

Regex replacement uses Python-style replacement semantics. Captured groups use backreferences such as "\\1", not JavaScript-style $1.

Invalid regex patterns return null. Regex timeouts raise a mapper error and stop route delivery.

String and regex examples:

{
  "version": "v1",
  "vars": [
    {
      "name": "sender",
      "expr": { "string.lower": { "var": "message.from[0].email" } }
    },
    {
      "name": "sender_domain",
      "expr": {
        "regex.replace": {
          "value": { "var": "vars.sender" },
          "pattern": "^.*@([^>\\s]+)$",
          "with": "\\1"
        }
      }
    },
    {
      "name": "subject_parts",
      "expr": {
        "string.split": {
          "value": { "var": "message.subject" },
          "sep": ":"
        }
      }
    }
  ],
  "output": {
    "sender_domain": { "var": "vars.sender_domain" },
    "subject_prefix": { "string.trim": { "var": "vars.subject_parts[0]" } },
    "subject_detail": { "string.trim": { "var": "vars.subject_parts[1]" } },
    "compact_subject": {
      "regex.replace": {
        "value": { "var": "message.subject" },
        "pattern": "\\s+",
        "with": " "
      }
    },
    "tag_list": {
      "string.join": {
        "items": ["mail", { "var": "vars.sender_domain" }],
        "sep": ", "
      }
    },
    "safe_subject": {
      "string.replace": {
        "value": { "var": "message.subject" },
        "find": "\"",
        "with": "'"
      }
    }
  }
}

Use string.replace for exact substring replacement. Use regex.replace when you need pattern matching, whitespace compaction, or captured groups such as "\\1".

Array helpers

Array helpers accept object form and, for compact configs, list form. Object form is clearer and recommended for new configs.

OperatorObject formList formEmpty-array result
map{ "map": { "over": exprArray, "as": "item", "do": expr } }{ "map": [exprArray, expr] }[]
filter{ "filter": { "over": exprArray, "as": "item", "where": expr } }{ "filter": [exprArray, expr] }[]
find{ "find": { "over": exprArray, "as": "item", "where": expr } }{ "find": [exprArray, expr] }null
reduce{ "reduce": { "over": exprArray, "do": expr, "start": initial } }{ "reduce": [exprArray, expr, initial] }start, or null when omitted
some{ "some": { "over": exprArray, "as": "item", "where": expr } }{ "some": [exprArray, expr] }false
all{ "all": { "over": exprArray, "as": "item", "where": expr } }{ "all": [exprArray, expr] }true
none{ "none": { "over": exprArray, "as": "item", "where": expr } }{ "none": [exprArray, expr] }true

Rules:

  • map returns the evaluated do result for each item.
  • filter returns the original items whose where expression is truthy.
  • find returns the first original item whose where expression is truthy.
  • reduce binds current and accumulator for each item.
  • some, all, and none evaluate where as a predicate and short-circuit.
  • Non-array over values return null.
  • as defaults to item when omitted.
  • as must match ^[A-Za-z_][A-Za-z0-9_]*$.
  • as must not shadow reserved scope names: message, ctx, meta, vars, current, accumulator.

Example:

{
  "map": {
    "over": { "var": "vars.lists[0].bullets" },
    "as": "bullet",
    "do": { "var": "bullet.text" }
  }
}

Reducer example:

{
  "reduce": {
    "over": [{ "amount": 10 }, { "amount": 15 }],
    "start": 0,
    "do": {
      "cat": [
        { "var": "accumulator" },
        { "var": "current.amount" }
      ]
    }
  }
}

Workflow examples:

{
  "pdf_attachments": {
    "filter": {
      "over": { "var": "message.attachments" },
      "as": "attachment",
      "where": {
        "==": [
          { "var": "attachment.content_type" },
          "application/pdf"
        ]
      }
    }
  },
  "first_invoice_pdf": {
    "find": {
      "over": { "var": "message.attachments" },
      "as": "attachment",
      "where": {
        "regex.match": {
          "value": { "var": "attachment.filename" },
          "pattern": "(?i)invoice.*\\.pdf$"
        }
      }
    }
  },
  "has_large_attachment": {
    "some": {
      "over": { "var": "message.attachments" },
      "as": "attachment",
      "where": {
        ">": [{ "var": "attachment.size" }, 10485760]
      }
    }
  },
  "all_attachments_hashed": {
    "all": {
      "over": { "var": "message.attachments" },
      "as": "attachment",
      "where": { "var": "attachment.sha256" }
    }
  },
  "no_executables": {
    "none": {
      "over": { "var": "message.attachments" },
      "as": "attachment",
      "where": {
        "regex.match": {
          "value": { "var": "attachment.filename" },
          "pattern": "(?i)\\.(exe|scr)$"
        }
      }
    }
  }
}

Use filter when you need all matching items, find when only the first match matters, and some/all/none when the output should be a boolean.

Scoped blocks

Scoped blocks let you define local vars anywhere a template value is allowed:

{
  "vars": [
    { "name": "text", "expr": { "var": "bullet.text" } },
    {
      "name": "parts",
      "expr": {
        "string.split": {
          "value": { "var": "vars.text" },
          "sep": ":"
        }
      }
    }
  ],
  "output": { "string.trim": { "var": "vars.parts[1]" } }
}

Rules:

  • Detection is exact: an object with exactly vars and output is evaluated as a scoped block.
  • Any extra key keeps the object a plain JSON template.
  • Block vars use the same { "name": "...", "expr": ... } contract as top-level vars.
  • Block vars evaluate top-to-bottom.
  • Later block vars can read earlier block vars through vars.<name>.
  • Inner vars.foo shadows outer vars.foo.

If you need to emit a literal object with keys named vars and output, build it indirectly:

{
  "merge": [
    { "vars": [1, 2, 3] },
    { "output": "value" }
  ]
}

Function calls

Function calls expose deterministic MailWebhook helpers. They can be written in generic form:

{
  "call": {
    "fn": "extract.urls",
    "args": {
      "text": "See https://example.com"
    }
  }
}

Or prefixed form:

{
  "call.extract.urls": {
    "text": "See https://example.com"
  }
}

Allowed helpers:

  • transform.html_to_text
  • extract.urls
  • extract.bullet_list
  • extract.reply_segments
  • extract.key_value_pairs
  • extract.tables
  • extract.dom

Unknown helper names are rejected.

call.transform.html_to_text

Arguments:

ArgDefaultDescription
htmlnoneHTML string to convert.
textnonePlain text fallback. When present and truthy, it is returned directly.

Returns a string or null.

Common pattern:

{
  "version": "v1",
  "vars": [
    {
      "name": "text",
      "expr": {
        "call.transform.html_to_text": {
          "html": { "var": "message.html" },
          "text": { "var": "message.text" }
        }
      }
    }
  ],
  "output": {
    "subject": { "var": "message.subject" },
    "snippet": { "substr": [{ "var": "vars.text" }, 0, 240] }
  }
}

Provide both html and text when possible. If text is present and truthy, it is returned directly; otherwise the helper converts html.

call.extract.urls

Extracts URLs from text or HTML. In HTML mode, it reads href, src, action, and formaction attributes from link, image, form, script, iframe, and source-like tags. In text mode, Markdown links include both URL and title.

Arguments:

ArgDefaultDescription
mode"html" unless only text is provided"html" or "text".
htmlmessage.htmlHTML source.
textmessage.textText source.
deduplicatefalseDrop duplicate (url, title, element) entries.

Returns an array of objects:

[
  {
    "url": "https://example.com",
    "title": "Optional title",
    "element": "a"
  }
]

Usage example:

{
  "version": "v1",
  "vars": [
    {
      "name": "links",
      "expr": {
        "call.extract.urls": {
          "html": { "var": "message.html" },
          "text": { "var": "message.text" },
          "deduplicate": true
        }
      }
    }
  ],
  "output": {
    "first_url": { "var": "vars.links[0].url" },
    "first_label": { "var": "vars.links[0].title" },
    "all_urls": {
      "map": {
        "over": { "var": "vars.links" },
        "as": "link",
        "do": { "var": "link.url" }
      }
    }
  }
}

HTML links may include element. Markdown links in text mode may include title. Bare text URLs usually return only url.

call.extract.bullet_list

Extracts bullet and numbered lists from HTML or text content. HTML mode falls back to text mode when no lists are found.

Arguments:

ArgDefaultDescription
mode"html""html" or "text".
htmlmessage.htmlHTML source.
textmessage.textText source.
nestedfalsePreserve nested list structure.
sanitizetrueRemove empty lines in text mode.
numberedtrueInclude numbered lists.

Returns an array of list objects:

[
  {
    "title": "Optional heading",
    "bullets": [
      {
        "text": "Item text",
        "child": {
          "title": "Item text",
          "bullets": []
        }
      }
    ]
  }
]

Usage example:

{
  "version": "v1",
  "vars": [
    {
      "name": "lists",
      "expr": {
        "call.extract.bullet_list": {
          "html": { "var": "message.html" },
          "text": { "var": "message.text" },
          "nested": true,
          "numbered": true
        }
      }
    }
  ],
  "output": {
    "first_item": { "var": "vars.lists[0].bullets[0].text" },
    "first_item_children": { "var": "vars.lists[0].bullets[0].child.bullets" },
    "flat_items": {
      "map": {
        "over": { "var": "vars.lists[0].bullets" },
        "as": "bullet",
        "do": { "var": "bullet.text" }
      }
    }
  }
}

Use nested: true when child bullets matter. Use numbered: false only when numbered lines should not be treated as list items.

call.extract.reply_segments

Splits reply content from quoted content, forwarded content, and signatures. Use text.reply_content or html.reply_content for the common “just the new reply” case.

Arguments:

ArgDefaultDescription
textmessage.textText source.
htmlmessage.htmlHTML source.
sources["text", "html"]Sources to analyze.
include_signaturefalseInclude detected signatures as segments.
split_quoted_by_depthfalseSplit quoted content by quote depth.
min_confidence0.0Filter segments below this confidence.
max_segments_per_sourceruntime cappedSoft per-call segment cap.
max_input_chars_per_sourceruntime cappedSoft per-call input-size cap.

Returns an object with source-specific results:

{
  "text": {
    "format": "text/plain",
    "reply_content": "Thanks for the update.",
    "has_quoted_content": true,
    "max_depth": 1,
    "detected_vendors": [],
    "segments": [
      {
        "kind": "reply_content",
        "depth": 0,
        "content": "Thanks for the update.",
        "confidence": 1.0,
        "detectors": ["plain.leading_unquoted"],
        "vendor": null
      }
    ]
  },
  "html": null
}

Useful fields:

  • text.reply_content and html.reply_content
  • text.has_quoted_content and html.has_quoted_content
  • segments[*].kind
  • segments[*].depth
  • segments[*].content

Usage example:

{
  "version": "v1",
  "vars": [
    {
      "name": "reply_segments",
      "expr": {
        "call.extract.reply_segments": {
          "sources": ["text"],
          "split_quoted_by_depth": true,
          "include_signature": true
        }
      }
    }
  ],
  "output": {
    "reply_text": { "var": "vars.reply_segments.text.reply_content" },
    "has_quoted_content": { "var": "vars.reply_segments.text.has_quoted_content" },
    "quoted_segments": {
      "filter": {
        "over": { "var": "vars.reply_segments.text.segments" },
        "as": "segment",
        "where": {
          "==": [{ "var": "segment.kind" }, "quoted_content"]
        }
      }
    },
    "signature": {
      "find": {
        "over": { "var": "vars.reply_segments.text.segments" },
        "as": "segment",
        "where": {
          "==": [{ "var": "segment.kind" }, "signature"]
        }
      }
    }
  }
}

Segment kind values include reply_content, quoted_header, quoted_content, forwarded_header, forwarded_content, and signature.

call.extract.key_value_pairs

Extracts conservative key/value pairs from message body text or HTML. Use this helper when downstream config needs direct lookup fields such as vars.kv.values.order_id, while still retaining ordered items for map/filter workflows.

Arguments:

ArgDefaultDescription
mode"auto""auto", "html", or "text". Auto prefers HTML when present and falls back to text when HTML yields no pairs.
htmlmessage.htmlHTML source.
textmessage.textText source.
separators[":", "="]Ordered single-character separators for text candidates.
max_key_words5Maximum accepted key word count before a candidate is treated as prose.
allow_prosefalseAllow small prose-like keys that are normally rejected.
allow_html_adjacent_blocksfalseAllow generic adjacent HTML block pairing. Disabled by default because it can be noisy.

Returns:

{
  "items": [
    {
      "key": "Order ID",
      "normalized_key": "order_id",
      "value": "AZ-123",
      "separator": ":",
      "source": "text",
      "line": 1
    }
  ],
  "values": {
    "order_id": "AZ-123"
  },
  "groups": {
    "order_id": ["AZ-123"]
  }
}

Useful fields:

  • items[*].key
  • items[*].normalized_key
  • items[*].value
  • items[*].separator
  • items[*].source
  • items[*].line
  • values.<normalized_key> for the first value
  • groups.<normalized_key> for all values in encounter order

Referencing normalized keys:

Order ID: AZ-123
Order ID: AZ-124
Customer Email: ada@example.com
Ticket-Type: billing

The labels above produce these lookup keys:

LabelNormalized keyFirst valueAll values
Order IDorder_idvars.kv.values.order_idvars.kv.groups.order_id
Customer Emailcustomer_emailvars.kv.values.customer_emailvars.kv.groups.customer_email
Ticket-Typeticket_typevars.kv.values.ticket_typevars.kv.groups.ticket_type

Use values when the first value is enough, and groups when duplicate labels should be preserved:

{
  "version": "v1",
  "vars": [
    {
      "name": "kv",
      "expr": {
        "call.extract.key_value_pairs": {
          "mode": "text"
        }
      }
    }
  ],
  "output": {
    "first_order_id": { "var": "vars.kv.values.order_id" },
    "all_order_ids": { "var": "vars.kv.groups.order_id" },
    "customer_email": { "var": "vars.kv.values.customer_email" },
    "ticket_type": { "var": "vars.kv.values.ticket_type" }
  }
}

Text extraction rules:

  • Keys must start with an alphabetic character.
  • Keys may contain letters, digits, whitespace, and hyphen only.
  • Hyphens and whitespace in keys normalize to _, so Ticket-Type becomes ticket_type.
  • Separators must be single non-whitespace characters.
  • Candidate lines ending with ? or ! are ignored.
  • Values preserve internal whitespace after edge trimming.
  • One trailing . is trimmed from values.

HTML extraction rules:

  • HTML keys and values are always returned as text only.
  • Structured extraction covers common table rows, definition lists, labels, and bold or strong key hints.
  • HTML-to-text fallback handles separator lines after structural extraction.
  • Generic adjacent block pairing requires allow_html_adjacent_blocks: true.

call.extract.tables

Extracts structured tables from message body text or HTML. Use this helper when downstream config needs stable cell lookup by normalized row and column headers, or when it needs to iterate over rows and columns without parsing table text.

Arguments:

ArgDefaultDescription
mode"auto""auto", "html", or "text". Auto considers HTML tables first, then text tables.
htmlmessage.htmlHTML source.
textmessage.textText source.
header_mode"auto""auto", "first_row", "first_column", "both", or "none".
include_html_gridsfalseEnable opt-in non-<table> HTML grid extraction.
html_grid_mode"strict""strict", "off", or "lenient" for smart HTML grids.
min_rows2Minimum accepted table row count.
min_columns2Minimum accepted table column count.
min_grid_confidence0.75Minimum confidence for smart HTML grids.
max_tables20Maximum returned tables.
max_rows500Maximum rows retained per table.
max_columns50Maximum columns retained per table.
deduplicatetrueDrop duplicate table matrices.

Returns, abridged:

{
  "tables": [
    {
      "index": 0,
      "source": "text_tsv",
      "caption": null,
      "confidence": 1.0,
      "row_count": 3,
      "column_count": 3,
      "header_strategy": "both",
      "col_headers": [
        { "index": 1, "text": "Qty", "normalized": "qty", "lookup_key": "qty" },
        { "index": 2, "text": "Total", "normalized": "total", "lookup_key": "total" }
      ],
      "row_headers": [
        { "index": 1, "text": "Widget", "normalized": "widget", "lookup_key": "widget" }
      ],
      "header_candidates": {
        "first_row": [
          { "index": 0, "text": "Item", "normalized": "item", "lookup_key": "item" }
        ],
        "first_column": [
          { "index": 0, "text": "Item", "normalized": "item", "lookup_key": "item" }
        ]
      },
      "rows": [
        {
          "index": 1,
          "lookup_key": "widget",
          "values": { "qty": "2", "total": "$10.00" },
          "cells": [
            {
              "row_index": 1,
              "column_index": 1,
              "row_lookup_key": "widget",
              "column_lookup_key": "qty",
              "value": "2",
              "row_span": 1,
              "col_span": 1,
              "span_origin": null,
              "is_span_origin": true
            }
          ]
        }
      ],
      "cols": [
        {
          "index": 1,
          "lookup_key": "qty",
          "values": { "widget": "2" },
          "cells": [
            {
              "row_index": 1,
              "column_index": 1,
              "row_lookup_key": "widget",
              "column_lookup_key": "qty",
              "value": "2",
              "row_span": 1,
              "col_span": 1,
              "span_origin": null,
              "is_span_origin": true
            }
          ]
        }
      ],
      "lookup": {
        "by_row": {
          "widget": { "qty": "2", "total": "$10.00" }
        },
        "by_column": {
          "qty": { "widget": "2" },
          "total": { "widget": "$10.00" }
        }
      },
      "matrix": [
        ["Item", "Qty", "Total"],
        ["Widget", "2", "$10.00"],
        ["Gadget", "3", "$15.00"]
      ],
      "detection": {
        "reason": "text_tsv",
        "selector_hint": null,
        "details": {}
      }
    }
  ],
  "summary": { "table_count": 1 }
}

Supported source values:

  • text_tsv: tab-delimited body text.
  • text_pipe: Markdown-style pipe tables.
  • text_ascii: boxed ASCII tables.
  • html_table: native HTML <table> markup.
  • html_aria_table: opt-in ARIA role="table" or role="grid" structures.
  • html_css_table: opt-in inline CSS display: table structures.
  • html_repeated_grid: opt-in repeated row-like HTML blocks.
  • html_label_grid: opt-in repeated label/value card grids.

Useful fields:

  • tables[0].lookup.by_row.<row_key>.<column_key> for direct row-oriented lookup.
  • tables[0].lookup.by_column.<column_key>.<row_key> for direct column-oriented lookup.
  • tables[0].rows for map, filter, find, and reduce over data rows.
  • tables[0].cols for map, filter, find, and reduce over data columns.
  • tables[0].rows[*].values.<column_key> and tables[0].cols[*].values.<row_key> for axis-local cell values.
  • tables[0].rows[*].cells[*] and tables[0].cols[*].cells[*] for cells with both lookup keys and span metadata.
  • tables[0].matrix for the rectangular text-only visual matrix.
  • summary.table_count for whether extraction found any tables.

Usage example:

{
  "version": "v1",
  "vars": [
    {
      "name": "tables",
      "expr": {
        "call.extract.tables": {
          "mode": "auto"
        }
      }
    }
  ],
  "output": {
    "widget_qty": {
      "var": "vars.tables.tables[0].lookup.by_row.widget.qty"
    },
    "gadget_total": {
      "var": "vars.tables.tables[0].lookup.by_column.total.gadget"
    },
    "row_totals": {
      "map": {
        "over": { "var": "vars.tables.tables[0].rows" },
        "as": "row",
        "do": {
          "item": { "var": "row.lookup_key" },
          "total": { "var": "row.values.total" }
        }
      }
    },
    "matrix": { "var": "vars.tables.tables[0].matrix" }
  }
}

Header and lookup rules:

  • Header keys are trimmed, entity-decoded, zero-width-stripped, case-folded, and normalized by replacing whitespace, punctuation, and symbol runs with _.
  • Duplicate normalized keys receive stable suffixes such as total, total_2, and total_3.
  • Empty headers use synthetic row_N or column_N fallback keys.
  • Headers that start with a digit are prefixed with their axis, for example row_2026 or column_2026.
  • header_candidates.first_row and header_candidates.first_column are populated for inspection even when the resolved header_strategy is "none".

Table extraction rules:

  • In "auto" mode, HTML-derived tables are considered before text tables.
  • When HTML and text contain the same table, deduplication keeps the first accepted table, normally the HTML-derived one.
  • Smart non-<table> HTML grids are disabled by default and require include_html_grids: true.
  • Use "lenient" grid mode only for known sender templates where strict mode misses valid grids.
  • Returned cell values are text-only. Raw HTML is never returned.
  • Native HTML rowspan and colspan values are expanded into the visual matrix. Each affected cell carries row_span, col_span, span_origin, and is_span_origin metadata.
  • There is no top-level flattened cells array in v1; use rows[*].cells, cols[*].cells, or lookup.
  • Layout tables, navigation/link groups, hidden/script/style/head content, and low-confidence grids are rejected by safety heuristics.

call.extract.dom

Extracts scalar values or repeated structured records from message HTML with a CSS or XPath selector. Use this helper for stable sender templates where the target fields live in repeated cards, nested layout tables, or specific links that are not good fits for table or key/value extraction.

CSS and XPath return the same output shape. XPath selectors must return element nodes. Scalar XPath expressions such as count(//tr) are rejected; use value: "count" to count matched elements.

Arguments:

ArgDefaultDescription
htmlmessage.htmlHTML source.
selector_typenoneRequired. "css" or "xpath".
selectornoneRequired CSS selector or XPath selector.
valuenoneRequired. "text", "attr", "html", "outer_html", "exists", or "count".
attrnoneRequired when value is "attr".
manyfalseRetain all projected scalar values instead of only the first.
max_matches16Maximum retained matches. Hard maximum: 32.
defaultnullScalar fallback when no projected value exists.
requiredfalseFail only when no element matches the selector.
normalize_whitespacetrueCompact whitespace for text projection.
fieldsnoneStructured record field object. When present, the top-level selector finds containers and each field selector runs inside its container.
include_containernoneOptional structured-record metadata opt-in with tag and selected safe attrs.

Scalar result shape:

{
  "value": "$42.00",
  "values": ["$42.00"],
  "item": {
    "index": 0,
    "tag": "span",
    "value": "$42.00",
    "text": "$42.00",
    "attrs": {
      "class": "total"
    }
  },
  "items": [
    {
      "index": 0,
      "tag": "span",
      "value": "$42.00",
      "text": "$42.00",
      "attrs": {
        "class": "total"
      }
    }
  ],
  "summary": {
    "match_count": 1,
    "value_count": 1,
    "truncated": false
  }
}

Aggregate modes return the aggregate in value and leave values, item, and items empty:

{
  "value": 3,
  "values": [],
  "item": null,
  "items": [],
  "summary": {
    "match_count": 3,
    "value_count": 0,
    "truncated": false
  }
}

CSS scalar example:

{
  "version": "v1",
  "vars": [
    {
      "name": "tracking_link",
      "expr": {
        "call.extract.dom": {
          "selector_type": "css",
          "selector": "a[href*='track']",
          "value": "attr",
          "attr": "href"
        }
      }
    }
  ],
  "output": {
    "tracking_url": { "var": "vars.tracking_link.value" }
  }
}

Structured XPath example:

{
  "version": "v1",
  "vars": [
    {
      "name": "opportunities",
      "expr": {
        "call.extract.dom": {
          "selector_type": "xpath",
          "selector": "//h1[contains(normalize-space(.), 'Matches Based')]/following::table[1]//tr[td[contains(@style, 'padding:20px 0;')]/table//p[contains(normalize-space(.), 'Submit By:')]]",
          "value": "text",
          "fields": {
            "outlet": {
              "selector": ".//td[@width='90']//img[1]",
              "value": "attr",
              "attr": "alt"
            },
            "title": {
              "selector": ".//td[@width='90']/following-sibling::td[1]/p[2]/a[1]",
              "value": "text",
              "required": true
            },
            "submit_by": {
              "selector": ".//p[contains(normalize-space(.), 'Submit By:')]",
              "value": "text"
            },
            "pitch_url": {
              "selector": ".//a[contains(normalize-space(.), 'Learn More') and contains(normalize-space(.), 'Pitch')]",
              "value": "attr",
              "attr": "href"
            }
          }
        }
      }
    }
  ],
  "output": {
    "count": { "var": "vars.opportunities.summary.item_count" },
    "first_title": { "var": "vars.opportunities.items[0].values.title" },
    "opportunities": {
      "map": {
        "over": { "var": "vars.opportunities.items" },
        "as": "opportunity",
        "do": {
          "outlet": { "var": "opportunity.values.outlet" },
          "title": { "var": "opportunity.values.title" },
          "submit_by": { "var": "opportunity.values.submit_by" },
          "pitch_url": { "var": "opportunity.values.pitch_url" }
        }
      }
    }
  }
}

Structured result shape, abridged:

{
  "value": null,
  "values": [],
  "item": {
    "index": 0,
    "values": {
      "outlet": "Attentus Technologies",
      "title": "CFOs & IT leaders on in-house vs managed IT",
      "submit_by": "Submit By: 4 May 7:17PM CEST (in 3 days)"
    },
    "fields": {
      "title": {
        "value": "CFOs & IT leaders on in-house vs managed IT",
        "values": ["CFOs & IT leaders on in-house vs managed IT"],
        "match_count": 1,
        "value_count": 1,
        "truncated": false
      }
    }
  },
  "items": [
    {
      "index": 0,
      "values": {
        "outlet": "Attentus Technologies",
        "title": "CFOs & IT leaders on in-house vs managed IT",
        "submit_by": "Submit By: 4 May 7:17PM CEST (in 3 days)"
      },
      "fields": {
        "title": {
          "value": "CFOs & IT leaders on in-house vs managed IT",
          "values": ["CFOs & IT leaders on in-house vs managed IT"],
          "match_count": 1,
          "value_count": 1,
          "truncated": false
        }
      }
    }
  ],
  "summary": {
    "item_count": 1,
    "truncated": false
  }
}

DOM extraction rules:

  • text returns normalized text.
  • attr returns the selected attribute value or null.
  • html and outer_html return raw HTML projections capped at 8,192 characters.
  • exists returns a boolean aggregate in value.
  • count returns the number of matched elements retained up to max_matches.
  • Input HTML is capped at 200,000 characters before parsing.
  • summary.truncated is true when input, matches, or raw HTML projection are capped.
  • Hidden, script, style, noscript, and head content is removed before selector matching.
  • required: true fails only when no element matches. A matched empty node or missing projected attribute does not trigger a required failure.
  • Field selectors are scoped to their matched container.
  • Field-level selector_type inherits the parent selector engine. Mixed CSS and XPath record extraction is rejected.
  • XPath field selectors must be relative, for example .//span[@class='price']; absolute field selectors beginning with / or // are rejected.
  • Container tag and attributes are absent unless include_container requests them.

Runtime limits and errors

The DSL is deterministic and does not perform network IO. Helper calls are limited to MailWebhook’s registered helpers.

Runtime limits:

  • Maximum expression depth: 50
  • Maximum nodes evaluated: 10,000
  • Regex timeout: about 50 ms per operation
  • transform.html_to_text, extract.urls, extract.bullet_list, extract.key_value_pairs, extract.tables, and extract.dom: about 200 ms per helper call
  • extract.reply_segments: bounded by reply-segmentation runtime guardrails

Most operator-level failures return null. Hard guard failures raise a mapper error and stop route delivery.

Unsupported features

The DSL intentionally does not support:

  • arbitrary functions
  • network calls
  • mutation
  • loops outside array helpers