The DOM via JSON
April 11, 2023
In A Document Object Model in APL, the usefulness of an APL DOM was discussed and a function, HTML2DOM, was demonstrated:
HTML2DOM←{
    ⍝ ⍵ ←→ HTML
    ⍝ ← ←→ DOM
    0=≢⍵:⍬
    {⍵⌷⍨⍳1=⍴⍵}0{
        m←⍵
        0=≢m:⍺
        b←m[;0]=0
        p←⍺{⍺ New 3↑1↓⍵}¨↓b⌿m
        m[;0]-←1
        p⊣p ∇¨1↓¨b⊂[0]m
    }⎕XML ⍵
}
This function takes HTML and recursively works through the ⎕XML matrix using the New function, building a tree of element namespaces. The New function is the same function used to build up a document from scratch. All of this code is in the Abacus project.
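The partitioning step in the inner function may be easier to see on a bare depth vector. The following is a minimal sketch rather than code from Abacus: the depth values are made up, and it applies partitioned enclose to a plain vector instead of to the rows of the ⎕XML matrix. It assumes the default ⎕ML, under which dyadic ⊂ is partitioned enclose.

```apl
d←0 1 2 1 0 1     ⍝ hypothetical depth column: two top-level elements
b←d=0             ⍝ 1 0 0 0 1 0 marks where each top-level element starts
g←b⊂d             ⍝ one group per top-level element: (0 1 2 1)(0 1)
c←¯1+1↓¨g         ⍝ drop each parent row, shift depths down: (0 1 0)(,0)
```

Each item of c is the depth column of one element's children, which is the shape of data the recursive call receives.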
In a recent discussion on The APL Orchard, it was noted that ⎕JSON could be used to create a namespace tree from HTML. While it is no secret that ⎕JSON creates a namespace tree from JSON (that's what it does!), it never occurred to me to use it for creating an HTML DOM. It seems it should be faster than doing it by "hand", but there is some additional overhead in going from HTML to JSON and then to a namespace tree. Let's see how to do it.
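As a reminder of what ⎕JSON does on import, here is a small sketch. The JSON text is hand-written for illustration, using the Tag and Content member names of the DOM; it is not output from Abacus.

```apl
j←'{"Tag":"ul","Content":[{"Tag":"li","Content":["One"]}]}'
ns←⎕JSON j                  ⍝ import: JSON text → namespace tree
ns.Tag                      ⍝ ul
(⊃ns.Content).Tag           ⍝ li
⊃(⊃ns.Content).Content      ⍝ One
```

Objects become namespaces and arrays become vectors, so nested elements end up as namespaces inside the Content vector of their parent.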
The problem boils down to converting a ⎕XML-style matrix to a ⎕JSON-style matrix. We know what a ⎕XML matrix looks like, and we know how the DOM is structured (which is good, as half the battle is simply knowing the destination), but what on Earth does the corresponding ⎕JSON matrix look like? Fortunately, we can create a simple document fragment:
A←#.Abacus.Main
SimpleDoc←{
    d←A.New'section'
    d.class←'post'
    d.id←'p20230411'
    h←d A.New'h1' 'Simple Doc'
    l←d A.New'ul'
    l.Content←{A.New'li'⍵}¨'One' 'Two' 'Three'
    _←(A.Elements d).⎕EX⊂'Parent'⍝ Not good for JSON
    d
}
d←SimpleDoc 0
h←A.DOM2HTML d
h
<section class="post" id="p20230411">
<h1>Simple Doc</h1>
<ul>
<li>One</li>
<li>Two</li>
<li>Three</li>
</ul>
</section>
And see what the familiar source ⎕XML matrix looks like:
⎕XML h
0 section class post 3
id p20230411
1 h1 Simple Doc 5
1 ul 3
2 li One 5
2 li Two 5
2 li Three 5
And the corresponding unfamiliar target ⎕JSON matrix:
⎕JSON⍠'M'⊢⎕JSON d
0 1
1 Content 2
2 1
3 Content 2
4 Simple Doc 4
3 Tag h1 4
2 1
3 Content 2
4 1
5 Content 2
6 One 4
5 Tag li 4
4 1
5 Content 2
6 Two 4
5 Tag li 4
4 1
5 Content 2
6 Three 4
5 Tag li 4
3 Tag ul 4
1 Tag section 4
1 class post 4
1 id p20230411 4
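Judging from the output above, the four columns are depth, member name, value and type, with type 1 marking an object, 2 an array and 4 a string. As a hedged sketch of how such a matrix turns back into a namespace, the hand-built matrix below describes a single h1 element. The matrix is made up for illustration, and the explicit left arguments to ⎕JSON (1 for export, 0 for import) are there only to make the direction of each step unambiguous.

```apl
⍝ Hypothetical matrix for <h1>Simple Doc</h1>: object, Tag, Content array, text
m←4 4⍴0 '' '' 1  1 'Tag' 'h1' 4  1 'Content' '' 2  2 '' 'Simple Doc' 4
t←1(⎕JSON⍠'M')m     ⍝ export: matrix → JSON text
ns←0 ⎕JSON t        ⍝ import: JSON text → namespace
ns.Tag              ⍝ expected to be h1
⊃ns.Content         ⍝ expected to be Simple Doc
```

If this behaves as assumed, the remaining work is only to produce such a matrix from the ⎕XML matrix.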
They are indeed a little different. By inspection, and some trial and error, we can write a function to go directly from the ⎕XML matrix to the ⎕JSON matrix:
XM2JM←{
    ⍝ ⍵ ←→ ⎕XML matrix
    ⍝ ← ←→ ⎕JSON matrix
    ⊃⍪/{
        d e t a←4↑⍵
        k←2×d
        j←k+1
        e≡'':⍉⍪k''t 4 ⍝ Data 1 row OR...
        z←⊂k'' '' 1 ⍝ Object (1 row)+
        z,←⊂j'Tag'e 4 ⍝ Element (1 row)+
        z,←4,⍨¨j,¨↓a ⍝ Attributes (0 or more rows)+
        z,←⊂j'Content' '' 2 ⍝ Content (1 row)+
        z,←(0<≢t)/⊂(j+1)''t 4 ⍝ Text (0 or 1 row)
        ↑z}¨↓⍵
}
This loops through each row of the ⎕XML matrix, which is less than desirable (but surprisingly not particularly slow). However, it provides us with insight into how to take a more array-oriented approach:
XM2JM←{
    ⍝ ⍵ ←→ ⎕XML matrix
    ⍝ ← ←→ ⎕JSON matrix
    (o t)←↓⍉~⍵[;1 2]∊⊂'' ⍝ Object, text
    l←≢¨⍵[;3] ⍝ Number of attributes
    n←o*⍨3+t+l ⍝ Target rows per source row
    s←0,+\¯1↓n ⍝ Starting index of source in target
    j←(o∧t)/3+s+l ⍝ Text index
    m←0 '' '' 1⍴⍨4,⍨+/n ⍝ Initialize result
    m[o/1+s;1 2 3]←(⊂'Tag'),4,⍨⍪o/⍵[;1] ⍝ Tag row
    m[2+∊s+⍳¨l;1 2 3]←4,⍨⊃⍪/⍵[;3] ⍝ Possible attribute rows
    m[o/2+s+l;1 2 3]←(+/o)⌿⍉⍪'Content' '' 2 ⍝ Content row
    m[j;2 3]←4,⍨⍪⍵[;2]/⍨o∧t ⍝ Possible Text row
    m[s/⍨~o;2 3]←4,⍨⍪⍵[;2]/⍨~o ⍝ Text row
    m[;0]←(n/2×⍵[;0])+(~m[;1]∊⊂'')+2×j∊⍨⍳≢m ⍝ Depth
    m
}
With the exception of the depth column, the target matrix is filled in over about six steps. There may be a way to fill in each column in one go, as is done for the depth column on the penultimate line.
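The index arithmetic at the top of the function can be checked by hand against the sample document. In this sketch the o, t and l vectors are typed in from the ⎕XML matrix shown earlier (six element rows, text on h1 and the three li rows, two attributes on section) rather than computed from it.

```apl
o←1 1 1 1 1 1     ⍝ every source row is an element
t←0 1 0 1 1 1     ⍝ h1 and the three li rows carry text
l←2 0 0 0 0 0     ⍝ only section has attributes
n←o*⍨3+t+l        ⍝ target rows per source row: 5 4 3 4 4 4
s←0,+\¯1↓n        ⍝ start of each source row in the target: 0 5 9 12 16 20
j←(o∧t)/3+s+l     ⍝ rows that receive element text: 8 15 19 23
+/n               ⍝ 24, the number of rows in the ⎕JSON matrix above
```

Everything after these lines is indexed assignment at offsets from s, which is why no explicit loop over the source rows is needed.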
In the next post we will investigate the relative performance of these various techniques.