extend.split
extend.split, function
def table extend.split(source: text, separator: text): table
def table extend.split(source: text, separator: text, unit: text): table
Splits source on one or more separator values and returns a table of tokens,
each with its zero-based starting position in source. When unit is provided, a
token that ends with a unit immediately preceded by a digit is split into two
tokens: the number and the unit.
source: text to split.
separator: list of separators (table input).
unit: optional list of units (table input).
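The behavior described above can be modeled outside Envision. The following is a minimal Python sketch, not the actual implementation: the name `split_tokens` is hypothetical, and two details are assumptions because the description does not state them, namely that the longest matching separator (or unit) wins and that empty tokens between adjacent separators are dropped.

```python
from typing import List, Optional, Sequence, Tuple


def split_tokens(
    source: str,
    separators: Sequence[str],
    units: Optional[Sequence[str]] = None,
) -> List[Tuple[str, int]]:
    """Return (token, zero-based start position) pairs for `source`."""
    if not separators or any(s == "" for s in separators):
        raise ValueError("separators must be present and non-empty")

    # Assumption: when several separators match at one position, prefer the longest.
    seps = sorted(separators, key=len, reverse=True)

    tokens: List[Tuple[str, int]] = []
    start = 0  # start of the token currently being scanned
    i = 0
    while i < len(source):
        hit = next((s for s in seps if source.startswith(s, i)), None)
        if hit is None:
            i += 1
            continue
        if i > start:  # assumption: empty tokens are not emitted
            tokens.append((source[start:i], start))
        i += len(hit)
        start = i
    if start < len(source):
        tokens.append((source[start:], start))

    if not units:
        return tokens

    # Unit pass: split "<...digit><unit>" into the number part and the unit.
    ordered_units = sorted((u for u in units if u), key=len, reverse=True)
    result: List[Tuple[str, int]] = []
    for token, pos in tokens:
        unit = next(
            (
                u
                for u in ordered_units
                if len(token) > len(u)
                and token.endswith(u)
                and token[-len(u) - 1].isdigit()
            ),
            None,
        )
        if unit is None:
            result.append((token, pos))
        else:
            cut = len(token) - len(unit)
            result.append((token[:cut], pos))
            result.append((unit, pos + cut))
    return result
```

Under these assumptions, `split_tokens("fast red", ["|", " "])` yields `[("fast", 0), ("red", 5)]` and `split_tokens("12kg", [" "], ["kg", "cm"])` yields `[("12", 0), ("kg", 2)]`.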
Examples
table Phrases = with
  [| as Text |]
  [| "blue|berry" |]
  [| "fast red" |]
table Seps = with
  [| as Sep |]
  [| "|" |]
  [| " " |]
table Tokens max 1000 = extend.split(Phrases.Text, Seps.Sep)
show table "Tokens" with
  Phrases.Text
  Tokens.Token
  Tokens.Position
This produces the following table:
| Text | Token | Position |
|---|---|---|
| blue\|berry | blue | 0 |
| blue\|berry | berry | 5 |
| fast red | fast | 0 |
| fast red | red | 5 |
table Phrases = with
  [| as Text |]
  [| "12kg" |]
  [| "5cm" |]
table Seps = with
  [| as Sep |]
  [| " " |]
table Units = with
  [| as Unit |]
  [| "kg" |]
  [| "cm" |]
table Tokens max 1000 = extend.split(Phrases.Text, Seps.Sep, Units.Unit)
show table "Units" with
  Phrases.Text
  Tokens.Token
  Tokens.Position
This produces the following table:
| Text | Token | Position |
|---|---|---|
| 12kg | 12 | 0 |
| 12kg | kg | 2 |
| 5cm | 5 | 0 |
| 5cm | cm | 1 |
Remarks
Tables produced by extend.split default to a maximum size of 100m lines unless
an explicit max constraint is provided.
Errors
extend.split fails when a separator is empty, the separator table is empty, or
the separator or unit tables exceed 100 rows.
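The failure conditions listed above can be written as explicit checks. This is a Python sketch for illustration only: `check_split_inputs` and `MAX_ROWS` are hypothetical names, and the error messages are not those produced by Envision.

```python
from typing import Optional, Sequence

MAX_ROWS = 100  # row limit stated for the separator and unit tables


def check_split_inputs(
    separators: Sequence[str],
    units: Optional[Sequence[str]] = None,
) -> None:
    """Raise ValueError for any input that the documented rules reject."""
    if len(separators) == 0:
        raise ValueError("the separator table is empty")
    if any(s == "" for s in separators):
        raise ValueError("a separator is empty")
    if len(separators) > MAX_ROWS:
        raise ValueError("the separator table exceeds 100 rows")
    if units is not None and len(units) > MAX_ROWS:
        raise ValueError("the unit table exceeds 100 rows")
```

Calling `check_split_inputs(["|", " "], ["kg", "cm"])` returns without error, while an empty separator list, an empty separator string, or a table of more than 100 rows raises `ValueError`.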