Tool of Thought

APL for the Practical Man

"You don't know how bad your design is until you write about it."

LeetCode 185: Department Top Three Salaries

June 25, 2022

Let's solve LeetCode 185, the Department Top Three Salaries problem, in FlipDB, our array-oriented, APL-based DSL. We will use FlipDB directly in the APL session.

We are given two tables, a department table and an employee table:

      d←s.Get '/Databases/LeetCode185'
      d.Display 0
 ── LeetCode185.Department                       
  ┌ID──┐  ┌Name───┐                            
  ↓1   │  ↓IT     │                            
  │2   │  │Sales  │                            
  └Int8┘  └Char(5)┘                            
 ── 2 rows by 2 columns                        
                                               
 ── LeetCode185.Employee ───────────────────── 
  ┌ID──┐  ┌Name───┐  ┌Salary┐  ┌DepartmentID┐  
  ↓1   │  ↓Joe    │  ↓85,000│  ↓1           │  
  │2   │  │Henry  │  │80,000│  │2           │  
  │3   │  │Sam    │  │60,000│  │2           │  
  │4   │  │Max    │  │90,000│  │1           │  
  │5   │  │Janet  │  │69,000│  │1           │  
  │6   │  │Randy  │  │85,000│  │1           │  
  │7   │  │Will   │  │70,000│  │1           │  
  └Int8┘  └Char(5)┘  └Int32─┘  └Int8────────┘  
 ── 7 rows by 4 columns ────────────────────── 

The problem is:

A company's executives are interested in seeing who earns the most money in each of the company's departments. A high earner in a department is an employee who has a salary in the top three unique salaries for that department. Write an SQL query to find the employees who are high earners in each of the departments.

The Department table does not add anything to the problem. It simply gives a nice description for each department, and we can dispense with it.

We create a new query on the Employee table:

      t←d.GetTable 'Employee'
      q←t.Query '' 

The task is simply to select some rows. There is no computation or aggregation required in the result, so the only thing we need to specify is a where clause, making use of the rankDown function:

      q.Where←'3 > 1 rankDown by Salary (group DepartmentID)'
      r←q.Execute 0
      r.Display 0
── Key:ID ───────────────────────────────────
 ┌ID──┐  ┌Name───┐  ┌Salary┐  ┌DepartmentID┐ 
 ↓1   │  ↓Joe    │  ↓85,000│  ↓1           │ 
 │2   │  │Henry  │  │80,000│  │2           │ 
 │3   │  │Sam    │  │60,000│  │2           │ 
 │4   │  │Max    │  │90,000│  │1           │ 
 │6   │  │Randy  │  │85,000│  │1           │ 
 │7   │  │Will   │  │70,000│  │1           │ 
 └Int8┘  └Char(5)┘  └Int32─┘  └Int8────────┘ 
── 6 rows by 4 columns ──────────────────────

Note first that the rankDown function is in ⎕IO←0, so we are selecting salaries ranked 0, 1, or 2. Note further that rankDown takes a left argument of 0 1 or 2, for rank, dense rank, and average rank, respectively, so in this case we are applying dense rank. Finally note that rankDown is passed to the by operator to apply it within each department.

We can pick apart this where clause expression and see how it works, right in the APL session:

        disp←{⍵.Display 0}
        Salary←85 80 60 90 69 85 70
        Department←1 2 2 1 1 1 1

The group function returns the indices for each unique value:

        disp group Department
┌───────────┐
↓[0,3,4,5,6]│
│[1,2]      │
└Int8───────┘

The rankDown function, with a left argument of 1 for dense, applied to Salary directly:

       disp 1 rankDown Salary
┌────┐
↓1   │
│2   │
│5   │
│0   │
│4   │
│1   │
│3   │
└Int8┘

And now applied via the by operator, so that we rank within each group:

        disp 1 rankDown by Salary (group Department)
┌────┐
↓1   │
│0   │
│1   │
│0   │
│3   │
│1   │
│2   │
└Int8┘

The by operator groups the data and applies its operand function to each group, and then ungroups the data, restoring it to its original order. It is particularly useful with uniform functions like rankDown, but it is also useful with aggregate and structural functions.

Finally we flag the top 3 ranks:

      disp 3 gt 1 rankDown by Salary(group Department)
┌───────┐
↓1      │
│1      │
│1      │
│1      │
│0      │
│1      │
│1      │
└Boolean┘

Design Issue

There is a tradeoff when designing a DSL, between more functions with fewer arguments, and fewer functions with more arguments. In the case of rank, FlipDB currently has two functions, rankUp and rankDown, each taking three potential values as a left argument for an effective total combination of 6 functions:

rankUp
rankDown
denseRankUp
denseRankDown
averageRankUp
averageRankDown

We could of course have only one function, rank, and provide yet another argument for the direction. The idea behind building in the ordering into the name was to make it similar to the related APL primitives grade up and grade down. Two functions seemed good, 6 functions seemed overkill at the time, but I'm reconsidering. It would be easier to read denseRankDown Salary than 1 rankDown Salary. The latter requires a trip to the documentation, while the former is discoverable with autocomplete.