Tool of Thought

APL for the Practical Man

The Abacus Threading Model

April 28, 2024

Way back in September 2022 we looked at threading the HTMLRenderer. A few things have changed since then, including some additional requirements around modal dialogs, progress bars, and confirmation boxes. Let's take a look.

To review, Abacus uses websockets for two-way asynchronous communication between the browser (whether the HTMLRenderer or a remote, independent client browser) and an APL session.

There are four different types of client-server interaction that must be considered.

The first two types of interaction originate on the client side, from the user taking action in the browser. These interactions are asynchronous - the client sends a message to the server and goes on its merry way. The server may send back 0 or more messages at some point.

The 1st interaction type is the normal case of handling basic user actions like clicking a button, entering text in an input field or scrolling through a datagrid. These messages are threaded and queued. Each message executes an APL handler function in its own thread, but each thread must wait for the previous thread to complete before it starts. This queue is managed by ⎕TSYNC. Why are these messages threaded only to be queued and run sequentially? Because we always want the websocket messages to be handled immediately (thus the threading), but in the normal case we want user-generated events handled in order (thus the queue). The APL handler function will in this case almost always send some HTML back to the browser.
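Here is a minimal sketch of how a thread-per-message queue might be arranged with ⎕TSYNC. It is not the Abacus code: Dispatch, Run, Handle and the global last are hypothetical names, and Run deliberately returns no result so that the bare ⎕TSYNC line has nothing to display.

```apl
⍝ Sketch only. Dispatch, Run, Handle and the global `last` are hypothetical.
last←0                       ⍝ thread number of the newest handler (0 = none yet)

∇ Dispatch msg;prev
  prev←last                  ⍝ the handler thread this message must wait for
  last←msg Run&prev          ⍝ spawn at once; & returns the new thread number
∇

∇ msg Run prev
  :If prev≠0
      ⎕TSYNC prev            ⍝ block until the previous handler thread has ended
  :EndIf
  Handle msg                 ⍝ hypothetical handler, typically sends HTML back
∇
```

Dispatch returns as soon as the thread is spawned, so the websocket callback is never held up, while each Run waits on its predecessor and the handlers execute in arrival order.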

Note that these threaded-and-queued messages can, if they need to, kick off long-running processes in yet another thread and report back immediately to the client, and avoid tying up the server.

There is at least one other technique for handling threaded and queued messages. There is no reason that we need a new thread for each message. Messages must be dispatched in a thread other than 0, but since they are queued they do not need to be in different threads from each other. Thus, when the app starts, we could create one permanent thread with ⎕TGET in a loop, and have the main thread chuck messages into it using ⎕TPUT. You would think this might be more efficient than creating a new short-lived thread for every single message. But you might be wrong.
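The single-permanent-thread alternative could look something like the sketch below. Worker, Post, Handle, the token type 1001 and the 'STOP' sentinel are all hypothetical, and the sketch assumes the token pool hands back tokens of the same type in the order they were put.

```apl
⍝ Sketch only. Worker, Post, Handle, token type 1001 and 'STOP' are hypothetical.
∇ Worker tok;msg
  :Repeat
      msg←⊃⎕TGET tok         ⍝ block until a message token is in the pool
      :If msg≡'STOP'         ⍝ hypothetical shutdown sentinel
          :Leave
      :EndIf
      Handle msg             ⍝ handlers run one at a time, in this thread
  :EndRepeat
∇

∇ Post msg
  1001 ⎕TPUT⊂msg             ⍝ called from thread 0; does not wait for Handle
∇

⍝ At start-up, create the permanent thread:
⍝     tid←Worker&1001
```

The main thread only ever calls Post, so it stays free to receive the next websocket message.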

The 2nd interaction type is the case of handling user actions that control or modify a previous user action. Consider a confirmation dialog box. This is Modal with a capital M - that is, a function executing in APL is paused on a particular line, waiting for the user in the browser to take some action, like Continue or Cancel. This message cannot wait for the previous request to finish, because the previous request is asking the user if he wants to continue or cancel the previous request. Therefore it must execute immediately and without delay on the server. Or consider a progress bar dialog that must let the server know that the process kicked off by the previous message should be canceled or paused. This too cannot wait for the previous message to complete. These messages are unthreaded and unqueued. These messages generally do not send any HTML back to the client - that is done by the message they are modifying.

There are also modal dialog boxes with a lower case m. These are modal only in the sense that the user cannot interact with the rest of the page until the modal is dismissed. There is no pendent or waiting APL function over on the server. Generally modal dialogs should be avoided, and Modal dialogs with a capital M avoided even more, but they both have their proper place, and it is important that we can create them and have automated tests that exercise them.

The 3rd and 4th types of interaction originate on the server and simulate synchronous behavior. That is, a function on the server sends a message to the client and waits, using ⎕TGET, for a response. Both of these types are generally used only for automated testing. There is generally no reason for an application to need this functionality in the normal course.

The 3rd type is the case of handling synchronous JavaScript. The server sends a JavaScript snippet to the client for execution and waits for a response. The server will send the TID with the request as an identifier, and the client will send it back. This allows the server to use ⎕TGET and ⎕TPUT to implement synchronous behavior. The prime use of synchronous JavaScript requests is testing: the server needs to get the innerHTML of some element to inspect the state of the client. The result message from the client is not threaded or queued, and requires no real processing of any sort; it is simply chucked to the waiting server thread using ⎕TPUT.
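The TID round trip can be sketched as follows. This is an illustration under assumptions, not the Abacus implementation: ExecJS, OnJSResult and Send are hypothetical names, and ExecJS must run in a thread other than 0 because token types have to be non-zero.

```apl
⍝ Sketch only. ExecJS, OnJSResult and Send are hypothetical names.
∇ r←ExecJS js
  ⍝ Must run in a thread other than 0, so that ⎕TID is a valid token type
  Send ⎕TID js               ⍝ hypothetical: ship the snippet tagged with our TID
  r←⊃⎕TGET ⎕TID              ⍝ wait here until the matching reply is put
∇

∇ OnJSResult(tid result)
  ⍝ Called when the client's reply arrives; no threading, no queueing
  tid ⎕TPUT⊂result           ⍝ chuck the result to the waiting thread
∇
```

A test might then call something like ExecJS 'document.title' from within its own thread and inspect the returned value directly, as if the call were synchronous.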

The 4th type of interaction is the server firing an event on the client, which in turn is handled by the server. When a function on the server sends an event to be fired back on the client, it must wait (in a thread, so as not to block) for the client to send a request back, and for that request to complete. Then, and only then, can the server function inspect the state of the server and/or the client to make sure the intended thing actually happened. When the server handles the message that the server has asked the client to send, it handles it just as if the user initiated the event, with the exception that when the task completes, the thread handling the task must notify the waiting thread of completion. Again, this is generally only used for automated testing.

Making SharpPlot Charts Interactive

December 21, 2023

Now that we know how to make attractive charts in SharpPlot, the next step is to add interactivity. SharpPlot has a brief tutorial on this topic, and provides various methods for making charts interactive. The AddHyperlinks method will add a hyperlink to any bar or point in a chart. The AddAttributes method allows an arbitrary attribute and value to be inserted into various elements.

However, many of the techniques used are outdated given where CSS and SVG are now and the existence of the HTMLRenderer. In addition, using SharpPlot itself to add interactivity might be useful if we were to rely on different output formats, but our only concern is SVG. All we need is to be able to identify and address the elements of interest. One option is to use the AddAttributes method to add an id to the elements. Unfortunately, AddAttributes adds an additional <rect> element for every <text> element (and then adds its own id as well). For example, here is a snippet of SVG from a basic bar chart:

<desc>for X-axis labels</desc>
 <g font-family="Times New Roman" font-size="80" text-anchor="middle" >
  <text x="793" y="2096" >North</text>
  <text x="1548" y="2096" >South</text>
  <text x="2303" y="2096" >East</text>
  <text x="3058" y="2096" >West</text>
 </g>

We want to be able to identify and manipulate these <text> elements, but when we add an id using the AddAttributes method we get:

<desc>for X-axis labels</desc>
 <rect x="699" y="2024" width="187" height="88" fill="none" pointer-events="visible" id="chart1_XLabels_1" myid="xlabel0" > </rect>
 <rect x="1454" y="2024" width="187" height="88" fill="none" pointer-events="visible" id="chart1_XLabels_2" myid="xlabel1" > </rect>
 <rect x="2209" y="2024" width="187" height="88" fill="none" pointer-events="visible" id="chart1_XLabels_3" myid="xlabel2" > </rect>
 <rect x="2964" y="2024" width="187" height="88" fill="none" pointer-events="visible" id="chart1_XLabels_4" myid="xlabel3" > </rect>
 <g font-family="Times New Roman" font-size="80" text-anchor="middle" pointer-events="none" >
  <text x="793" y="2096" >North</text>
  <text x="1548" y="2096" >South</text>
  <text x="2303" y="2096" >East</text>
  <text x="3058" y="2096" >West</text>
 </g>

I'm sure there was a reason in the past for having the <rect> element, probably just to carry the pointer-events="visible" attribute, but I don't think there is any need for it today. It gets in the way of, say, making the text of one x-axis value bold using CSS. We need to identify the <text> element, not some associated <rect> element.

Luckily we can use Abacus to create an APL DOM of the SVG text emitted by SharpPlot. Then we can easily manipulate elements, add attributes, and so on. The problem is that the SVG is full of largely unidentifiable <text> and <rect> elements. But there are comments embedded using the <desc> element, as can be seen above. We can do some crude coding and sort of find out where things are. For example, here is a function that identifies the basic elements of a single-series bar chart, adding id and class attributes:

AddIdsToDOM←{
     ⍝ ⍵ ←→ DOM
     ⍝ Crude Technique that relies on comments
     ⍝ Will not work if AddAttributes is used in certain circumstances
     ⍝ ... as additional elements are inserted.
     ⍝ Works only on basic bar chart with one series
     A←#.Abacus.Main
     e←A.Elements ⍵
     n←'xlabel' 'ylabel' 'value' 'point'
     v←'for X-axis labels' 'Y-axis labels' 'Data value labels ...'('Start of Barchart ',11⍴'=')
     ⍵⊣n{
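         p←⊃e A.ElementsWhere'Content'⍵   ⍝ first element matched on Content ⍵ (the <desc> marker)
         c←(e⊃⍨1+e⍳p).Content             ⍝ children of the element that follows the marker
         c.class←⊂⍺                       ⍝ same class name for every child
         c.id←⍺∘,¨⍕¨⍳≢c                   ⍝ id is the class name plus an index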
         0
     }¨v
 }

Which yields:

<desc>for X-axis labels</desc>
    <g font-family="Times New Roman" font-size="80" text-anchor="middle">
      <text class="xlabel" id="xlabel0" x="793" y="2096">North</text>
      <text class="xlabel" id="xlabel1" x="1548" y="2096">South</text>
      <text class="xlabel" id="xlabel2" x="2303" y="2096">East</text>
      <text class="xlabel" id="xlabel3" x="3058" y="2096">West</text>
    </g>

Now we can easily identify and manipulate all the relevant elements. (Of course SharpPlot knows exactly where and what everything is when it generates the SVG, and it would be much better if it added the id and class attributes itself.) Now we can construct a bar chart that operates like a pick list, allowing the user to scroll up and down, highlighting the current selection by placing a border around the bar and bolding and increasing the font size of the associated labels:

[Figure: horizontal bar chart (SharpPlot). Purchase 7,900; Refinance 1,100; Cash-Out 1,000.]

Note that if you inspect the source of this chart, it is not as it would appear in an application. Here, for convenience in a static web site, we simply do the highlighting by using the style attribute. In an application, classes are used with external style sheets. Scrolling up and down will change the class of the bar, for example, from unselected to selected.

On Categorical Data

December 6, 2023

Consider getting a useful first impression and understanding of a single column in a database table (or a vector of values all of the same type). If there are only a few unique values in the column, say a dozen or fewer, then a frequency distribution is appropriate. We get an immediate, informative overview of the data, regardless of the type. This is easily displayed in a bar chart. Here we have the distribution of stints for major league baseball players in 2019. A stint is a period of time with a particular team. We can see that most players spent the entire season with one team, while 12 players played for 3 teams:
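For a quick sketch of the idea, the key operator gives a first-order frequency distribution in one short expression. Freq and the sample vector are hypothetical and only there for illustration; the real stints column is not reproduced here.

```apl
⍝ Sketch only. Freq and the sample data are hypothetical.
Freq←{{⍺,≢⍵}⌸⍵}              ⍝ one row per unique value: value, count
stints←1 1 2 1 3 2 1         ⍝ made-up stint numbers
Freq stints
```

The rows come back in order of first appearance, so a sort may be wanted before charting.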

[Figure: horizontal bar chart (SharpPlot) of players by stint. Stint 1: 1,410; stint 2: 146; stint 3: 12.]

However, as the number of unique values grows, a frequency distribution becomes less and less useful. When every value is unique, the distribution degenerates into the entire original column catenated with a vector of 1's. For quantitative or temporal data, this problem is easily solved by grouping into bins or buckets, reducing the number of categories. Here we have the number of games played per player for 2019:
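Binning is just a floor and a multiply before counting. The sketch below uses a hypothetical Bin function and made-up games-played figures, with a bin width of 20 to match the chart.

```apl
⍝ Sketch only. Bin and the sample data are hypothetical.
Bin←{⍺×⌊⍵÷⍺}                 ⍝ lower edge of the ⍺-wide bin holding each value
games←5 19 20 45 161 162 39  ⍝ made-up games-played figures
{⍺,≢⍵}⌸20 Bin games          ⍝ one row per bin: lower edge, count
```

Each row pairs a bin's lower edge with the number of values that fall in it, so 0 stands for "0 to 19", 20 for "20 to 39", and so on.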

[Figure: horizontal bar chart (SharpPlot) of players by games played. 0 to 19: 565; 20 to 39: 391; 40 to 59: 161; 60 to 79: 137; 80 to 99: 78; 100 to 119: 60; 120 to 139: 81; 140 to 159: 84; 160 to 179: 11.]

However, if the data is categorical, it is generally not possible to meaningfully group the data. One option is to produce a frequency distribution that shows only the top 10 (say) categories, grouping the remainder into an "other" category. This works well when there are many categories, and the categories are of varying sizes, keeping the "other" category relatively small:
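One way the top-N-plus-other idea might be coded is sketched below. TopN and the sample towns are hypothetical; the left argument is the number of categories to keep, and ties keep their order of first appearance.

```apl
⍝ Sketch only. TopN and the sample data are hypothetical.
TopN←{
    f←{⍺,≢⍵}⌸⍵               ⍝ one row per category: name, count
    f←f[⍒⊢/f;]               ⍝ largest categories first
    ⍺≥≢f:f                   ⍝ ⍺ or fewer categories: nothing to fold
    (⍺↑f)⍪'Other'(+/⍺↓⊢/f)   ⍝ keep the top ⍺, sum the rest into Other
}
towns←'Richmond' 'Crozet' 'Richmond' 'Moseley' 'Richmond' 'Crozet' 'Suffolk'
2 TopN towns
```

Using ⊢/ for the count column rather than an explicit index keeps the function independent of the index origin.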

[Figure: horizontal bar chart (SharpPlot) of the top 10 categories plus Other. Richmond 2,100; Virginia Beach 1,100; Charlottesville 500; Midlothian 500; Chesterfield 400; Clarksville 200; Chesapeake 200; Suffolk 200; Crozet 200; Moseley 100; Other 4,500.]

If there are many categories and they are similar-sized, this breaks down. Here we have a distribution of the PlayerID column, which is mostly unique, except for players that have done multiple stints in the season:

[Figure: horizontal bar chart (SharpPlot) of the top 10 PlayerID values plus Other. altheaa01, austity01, biddlje01, broxtke01, darnatr01, dullry01, fontwi01, josepco01, maldoma01 and mejiaad01: 3 each; Other 1,538.]

What, then, is to be done in the case of a categorical column with many evenly distributed unique values? If a frequency distribution is inadequate how about a frequency distribution of the frequency distribution? That is, a table displaying the number of values that occur once, the number of values that occur twice, the number of values that occur three times, etc.:

Value Occurs   Unique Count   Total Rows
Once                  1,264        1,264
Twice                   134          268
Three times              12           36
Four times                0            0
Five +                    0            0
Total                 1,410        1,568

This table is a much more useful first look at high-variance categorical data. For example, it is immediately apparent whether the values are unique and suitable for a key column. It is easy to identify outliers, that is, duplicate or triplicate values. Let's call this a second-order frequency distribution.
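A second-order distribution is simply key applied twice. The sketch below is one possible shape for it, with a hypothetical Freq2 and made-up ids; it omits the zero rows, the "Five +" grouping and the Total line shown in the table.

```apl
⍝ Sketch only. Freq2 and the sample data are hypothetical.
Freq2←{
    c←{≢⍵}⌸⍵                 ⍝ occurrences of each unique value
    t←{⍺,≢⍵}⌸c               ⍝ k, number of unique values that occur k times
    t←t[⍋⊣/t;]               ⍝ order the rows by k
    t,×/t                    ⍝ append total rows: k × unique count
}
ids←'a01' 'b01' 'b01' 'c01' 'c01' 'c01' 'd01'
Freq2 ids
```

The last column sums to the number of rows in the original column, which makes a handy sanity check.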

By inspection we can tell whether a first-order or second-order distribution will be more useful, and come up with some back-of-the-envelope algorithm to make the choice, which may well be sufficient. But is there a way to actually compute the variance of a categorical column and use that measure to determine what exactly is "high-variance" categorical data? That question and some APL code will be explored in a future post.
