Way back in September 2022 we looked at threading the HTMLRenderer. A few things have changed since then, including some additional requirements around modal dialogs, progress bars, and confirmation boxes. Let's take a look.
To review, Abacus uses websockets for two-way asynchronous communication between the browser (whether the HTMLRenderer or a remote, independent client browser) and an APL session.
There are four different types of client-server interaction that must be considered.
The first two types of interaction originate on the client side, from the user taking action in the browser. These interactions are asynchronous - the client sends a message to the server and goes on its merry way. The server may send back 0 or more messages at some point.
The 1st interaction type is the normal case of handling basic user actions like clicking a button, entering text in an input field or scrolling through a datagrid. These messages are threaded and queued. Each message executes an APL handler function in its own thread, but each thread must wait for the previous thread to complete before it starts. This queue is managed by ⎕TSYNC. Why are these messages threaded only to be queued and run sequentially? Because we always want the websocket messages to be handled immediately (thus the threading), but in the normal case we want user-generated events handled in order (thus the queue). The APL handler function will in this case almost always send some HTML back to the browser.
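For readers who do not think in ⎕TSYNC, here is a rough Python analogy of the threaded-and-queued idea. It is only a sketch, not the Abacus implementation: the class name `ChainedDispatcher` and the toy handler are made up for illustration, and `Thread.join` stands in for ⎕TSYNC.

```python
import threading
import time

class ChainedDispatcher:
    """Run each message in its own thread, but strictly in arrival order."""

    def __init__(self):
        self._previous = None          # thread handling the previous message
        self._lock = threading.Lock()  # protects _previous

    def dispatch(self, handler, message):
        """Return immediately; the handler runs once its predecessor is done."""
        with self._lock:
            previous = self._previous

            def run():
                if previous is not None:
                    previous.join()    # wait for the earlier message (cf. ⎕TSYNC)
                handler(message)

            thread = threading.Thread(target=run)
            self._previous = thread
            thread.start()
        return thread

# Toy usage: earlier messages take longer, yet order is preserved.
results = []

def handler(message):
    time.sleep(0.01 * (3 - message))
    results.append(message)

dispatcher = ChainedDispatcher()
threads = [dispatcher.dispatch(handler, m) for m in range(3)]
threads[-1].join()                     # the last thread finishes after all others
```

The call to `dispatch` never blocks, which is the point of the threading; the `join` inside each thread is what turns the threads back into a queue.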
Note that these threaded-and-queued messages can, if they need to, kick off long-running processes in yet another thread and report back immediately to the client, and avoid tying up the server.
There is at least one other technique for handling threaded and queued messages. There is no reason that we need a new thread for each message. Messages must be dispatched in a thread other than 0, but since they are queued they do not need to be in different threads from each other. Thus, when the app starts, we could create one permanent thread with ⎕TGET in a loop, and have the main thread chuck messages into it using ⎕TPUT. You would think this might be more efficient than creating a new short-lived thread for every single message. But you might be wrong.
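The single-permanent-thread alternative can be sketched the same way. Again this is a Python analogy under stated assumptions rather than APL: `queue.Queue` plays the role of the ⎕TPUT/⎕TGET token pool, and `start_worker` and the `None` shutdown sentinel are invented for the example.

```python
import queue
import threading

def start_worker(handler):
    """Start one long-lived thread that handles messages in arrival order."""
    inbox = queue.Queue()

    def loop():
        while True:
            message = inbox.get()      # blocks until a message arrives (cf. ⎕TGET)
            if message is None:        # hypothetical sentinel: shut the worker down
                break
            handler(message)

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return inbox, worker

# Toy usage: the main thread just chucks messages into the inbox.
handled = []
inbox, worker = start_worker(handled.append)
for message in ("click", "input", "scroll"):
    inbox.put(message)                 # cf. ⎕TPUT
inbox.put(None)
worker.join()
```

The sketch says nothing about which approach is faster; that has to be measured in the real system.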
The 2nd interaction type is the case of handling user actions that control or modify a previous user action. Consider a confirmation dialog box. This is Modal with a capital M - that is, a function executing in APL is paused on a particular line, waiting for the user in the browser to take some action, like Continue or Cancel. This message cannot wait for the previous request to finish, because the previous request is asking the user if he wants to continue or cancel the previous request. Therefore it must execute immediately and without delay on the server. Or consider a progress bar dialog that must let the server know that the process kicked off by the previous message should be canceled or paused. This too cannot wait for the previous message to complete. These messages are unthreaded and unqueued. These messages generally do not send any HTML back to the client - that is done by the message they are modifying.
There are also modal dialog boxes with a lowercase m. These are modal only in the sense that the user cannot interact with the rest of the page until the modal is dismissed. There is no pendent or waiting APL function over on the server. Generally modal dialogs should be avoided, and Modal dialogs with a capital M avoided even more, but they both have their proper place, and it is important that we can create them and have automated tests that exercise them.
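A capital-M Modal can be pictured as a paused function plus a reply that bypasses the queue. The Python below is a hedged illustration only: `Confirmation`, `delete_records` and the answer strings are hypothetical names, not part of Abacus.

```python
import threading

class Confirmation:
    """A paused server-side function waiting for the user's answer."""

    def __init__(self):
        self._answered = threading.Event()
        self.answer = None

    def wait(self, timeout=None):
        """Called by the paused handler; returns None if nobody answers in time."""
        if not self._answered.wait(timeout):
            return None
        return self.answer

    def resolve(self, answer):
        """Called directly when the user's reply arrives, bypassing the queue."""
        self.answer = answer
        self._answered.set()

def delete_records(confirm, log):
    # ... a real handler would first send the dialog to the browser ...
    if confirm.wait(timeout=5) == "continue":
        log.append("deleted")
    else:
        log.append("cancelled")

# Toy usage: the queued handler pauses; the reply is handled at once.
log = []
confirm = Confirmation()
worker = threading.Thread(target=delete_records, args=(confirm, log))
worker.start()
confirm.resolve("continue")
worker.join()
```

If the reply were queued behind `delete_records`, the two would wait on each other forever, which is why this message type has to skip the queue.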
The 3rd and 4th types of interaction originate on the server and simulate synchronous behavior. That is, a function on the server sends a message to the client and waits, using ⎕TGET, for a response. Both of these types are generally used only for automated testing. There is generally no reason for an application to need this functionality in the normal course.
The 3rd type is the case of handling synchronous JavaScript. The server sends a JavaScript snippet to the client for execution and waits for a response. The server will send the TID with the request as an identifier, and the client will send it back. This allows the server to use ⎕TGET and ⎕TPUT to implement synchronous behavior. The prime use of synchronous JavaScript requests is testing: the server needs to get the innerHTML of some element to inspect the state of the client. The result message from the client is not threaded or queued, and requires no real processing of any sort; it is simply chucked to the waiting server thread using ⎕TPUT.
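Here is a small Python sketch of that request-and-wait pattern. It is an analogy, not the Abacus code: a counter stands in for the TID, a one-slot `queue.Queue` per request stands in for ⎕TPUT/⎕TGET, and `SyncJavaScript`, `fake_client` and the message fields are assumptions made for the example.

```python
import itertools
import queue
import threading

class SyncJavaScript:
    """Send a script to the client and block until its result comes back."""

    def __init__(self, send):
        self._send = send              # function that transmits a request
        self._pending = {}             # request id -> one-slot reply queue
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def execute(self, script, timeout=5):
        with self._lock:
            request_id = next(self._ids)
            reply = queue.Queue(maxsize=1)
            self._pending[request_id] = reply
        self._send({"id": request_id, "script": script})
        try:
            return reply.get(timeout=timeout)   # raises queue.Empty on timeout
        finally:
            with self._lock:
                self._pending.pop(request_id, None)

    def on_reply(self, message):
        """No processing at all: hand the result to the waiting thread."""
        with self._lock:
            reply = self._pending.get(message["id"])
        if reply is not None:
            reply.put(message["result"])

def fake_client(request):
    # Stand-in for the browser: answers from another thread.
    answer = {"id": request["id"], "result": "<b>42</b>"}
    threading.Thread(target=bridge.on_reply, args=(answer,)).start()

bridge = SyncJavaScript(fake_client)
html = bridge.execute("document.getElementById('total').innerHTML")
```

The id travels to the client and back unchanged, so the reply handler only needs a lookup to find which thread is waiting.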
The 4th type of interaction is the server firing an event on the client, which in turn is handled by the server. When a function on the server sends an event to be fired back on the client, it must wait (in a thread, so as not to block) for the client to send a request back, and for that request to complete. Then, and only then, can the server function inspect the state of the server and/or the client to make sure the intended thing actually happened. When the server handles the message that the server has asked the client to send, it handles it just as if the user initiated the event, with the exception that when the task completes, the thread handling the task must notify the waiting thread of completion. Again, this is generally only used for automated testing.
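The round trip for a server-fired event differs from synchronous JavaScript in one respect: the waiting thread is released only after the normal handler has finished. A minimal Python sketch, with hypothetical names (`fire_event`, `handle_client_message`, the `token` field) and a fake client standing in for the browser:

```python
import threading

waiting = {}   # token -> Event that is set when the round trip has completed

def fire_event(send, token, event, timeout=5):
    """Ask the client to fire `event`, then wait until its handler has finished."""
    done = threading.Event()
    waiting[token] = done
    send({"fire": event, "token": token})
    try:
        return done.wait(timeout)      # False if the round trip timed out
    finally:
        waiting.pop(token, None)

def handle_client_message(message, handler):
    """Handle the message like any user event, then notify the waiting thread."""
    try:
        handler(message)
    finally:
        done = waiting.get(message.get("token"))
        if done is not None:
            done.set()

# Toy usage: a test fires a click and then inspects server state.
state = {"clicks": 0}

def on_click(message):
    state["clicks"] += 1

def fake_client(request):
    # Stand-in for the browser: fires the event, which comes back as a message.
    message = {"event": request["fire"], "token": request["token"]}
    threading.Thread(target=handle_client_message, args=(message, on_click)).start()

completed = fire_event(fake_client, token=1, event="click")
```

Only once `fire_event` returns is it safe for the test to assert anything about `state`.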
Now that we know how to make attractive charts in SharpPlot, the next step is to add interactivity. SharpPlot has a brief tutorial on this topic, and provides various methods for making charts interactive. The AddHyperlinks method will add a hyperlink to any bar or point in a chart. The AddAttributes method allows an arbitrary attribute and value to be inserted into various elements.
However, many of the techniques used are outdated given where CSS and SVG are now and the existence of the HTMLRenderer. In addition, using SharpPlot itself to add interactivity might be useful if we were to rely on different output formats, but our only concern is SVG. All we need to do is to be able to identify and address the elements of interest. One option to accomplish this is to use the AddAttributes method to add an id to the elements. Unfortunately, AddAttributes adds an additional <rect> element for every <text> element (and then adds its own id as well). For example, here is a snippet of SVG from a basic bar chart:
I'm sure there was a reason in the past for having the <rect> element, probably just to apply the pointer-visible attribute, but I don't think there is any need for it today. This gets in the way of, say, making the text of one x axis value bold using CSS. We need to identify the <text> element, not some associated <rect> element.
Luckily we can use Abacus to create an APL DOM of the SVG text emitted by SharpPlot. Then we can easily manipulate elements, add attributes, and so on. The problem is that the SVG is full of largely unidentifiable <text> and <rect> elements. But there are comments embedded using the <desc> element, as can be seen above. We can do some crude coding and sort of find out where things are. For example, here is a function that identifies the basic elements of a single-series bar chart, adding id and class attributes:
AddIdsToDOM←{
    ⍝ ⍵ ←→ DOM
    ⍝ Crude Technique that relies on comments
    ⍝ Will not work if AddAttributes is used in certain circumstances
    ⍝ ... as additional elements are inserted.
    ⍝ Works only on basic bar chart with one series
    A←#.Abacus.Main
    e←A.Elements ⍵
    n←'xlabel' 'ylabel' 'value' 'point'                ⍝ names used for class and id
    v←'for X-axis labels' 'Y-axis labels' 'Data value labels ...'('Start of Barchart ',11⍴'=')
    ⍵⊣n{
        p←⊃e A.ElementsWhere'Content'⍵                 ⍝ first element found for this comment text
        c←(e⊃⍨1+e⍳p).Content                           ⍝ content of the element that follows it
        c.class←⊂⍺                                     ⍝ same class for every element in the group
        c.id←⍺∘,¨⍕¨⍳≢c                                 ⍝ id is the name followed by an index
        0
    }¨v
}
Now we can easily identify and manipulate all the relevant elements. (Of course SharpPlot knows exactly where and what everything is when it generates the SVG, and it would be much better if it added the id and class attributes itself.) Now we can construct a bar chart that operates like a pick list, allowing the user to scroll up and down, highlighting the current selection by placing a border around the bar and bolding and increasing the font size of the associated labels:
Note that if you inspect the source of this chart, it is not as it would appear in an application. Here, for convenience in a static web site, we simply do the highlighting by using the style attribute. In an application, classes are used with external style sheets. Scrolling up and down will change the class of the bar, for example, from unselected to selected.
Consider getting a useful first impression and understanding of a single column in a database table (or a vector of values all of the same type). If there are only a few unique values in the column, say a dozen or fewer, then a frequency distribution is appropriate. We get an immediate, informative overview of the data, regardless of the type. This is easily displayed in a bar chart. Here we have the distribution of stints for major league baseball players in 2019. A stint is a period of time with a particular team. We can see that most players spent the entire season with one team, while 12 players played for 3 teams:
However, as the number of unique values grows, a frequency distribution becomes less and less useful. When every value is unique, the distribution degenerates into the entire original column catenated with a vector of 1's. For quantitative or temporal data, this problem is easily solved by grouping into bins or buckets, reducing the number of categories. Here we have the number of games played per player for 2019:
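Binning itself is only a line or two of code. The following Python sketch assumes integer values and a fixed bin width; the function name `binned_distribution` and the sample numbers are made up and are not the 2019 data.

```python
from collections import Counter

def binned_distribution(values, width):
    """Count integer values per fixed-width bin, labelled 'low-high'."""
    counts = Counter((v // width) * width for v in values)
    return [(f"{low}-{low + width - 1}", counts[low]) for low in sorted(counts)]

# Made-up games-played figures, for illustration only.
games = [1, 5, 12, 19, 20, 45, 158, 162]
bins = binned_distribution(games, 20)
```

Empty bins are simply omitted here; a chart would usually want them filled in with zeros.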
However, if the data is categorical, it is generally not possible to meaningfully group the data. One option is to produce a frequency distribution that shows only the top 10 (say) categories, grouping the remainder into an "other" category. This works well when there are many categories, and the categories are of varying sizes, keeping the "other" category relatively small:
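A minimal Python sketch of the top-N-plus-other grouping, assuming the column fits in memory. The function name `top_n_with_other` and the sample team codes are invented for the example.

```python
from collections import Counter

def top_n_with_other(values, n=10):
    """Frequency of the n most common values, with the rest lumped together."""
    counts = Counter(values)
    top = counts.most_common(n)
    other = sum(counts.values()) - sum(count for _, count in top)
    if other:
        top.append(("other", other))
    return top

# Made-up categorical column, for illustration only.
teams = ["NYY"] * 5 + ["BOS"] * 3 + ["LAD"] * 2 + ["SEA", "TEX"]
distribution = top_n_with_other(teams, 3)
```

Note that a real column might already contain the literal value "other", in which case the catch-all row needs a different label.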
If there are many categories and they are similar-sized, this breaks down. Here we have a distribution of the PlayerID column, which is mostly unique, except for players that have done multiple stints in the season:
What, then, is to be done in the case of a categorical column with many evenly distributed unique values? If a frequency distribution is inadequate, how about a frequency distribution of the frequency distribution? That is, a table displaying the number of values that occur once, the number of values that occur twice, the number of values that occur three times, etc.:
Value Occurs    Unique Count    Total Rows
Once                   1,264         1,264
Twice                    134           268
Three times               12            36
Four times                 0             0
Five +                     0             0
Total                  1,410         1,568
This table is a much more useful first look at high-variance categorical data. For example, it is immediately apparent whether the values are unique and suitable for a key column. It is easy to identify outliers, that is, duplicate or triplicate values. Let's call this a second-order frequency distribution.
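Computing a second-order frequency distribution amounts to counting twice. Here is a short Python sketch; the function name is made up, and the sample column is synthetic, built only so that its counts match the table above.

```python
from collections import Counter

def second_order_distribution(values):
    """Rows of (occurrences, unique values with that count, total rows)."""
    per_value = Counter(values)                  # first-order: value -> count
    by_occurrence = Counter(per_value.values())  # second-order: count -> values
    return [(k, by_occurrence[k], k * by_occurrence[k])
            for k in sorted(by_occurrence)]

# Synthetic ids: 1,264 occur once, 134 twice and 12 three times.
ids = [f"p{i}" for i in range(1410)]
column = ids + ids[:146] + ids[:12]
table = second_order_distribution(column)
```

Summing the second element of each row gives the number of unique values, and summing the third gives the row count, which are the Total line of the table.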
By inspection we can tell whether a first-order or second-order distribution will be more useful, and come up with some back-of-the-envelope algorithm to make the choice, which may well be sufficient. But is there a way to actually compute the variance of a categorical column and use that measure to determine what exactly is "high-variance" categorical data? That question and some APL code will be explored in a future post.