Joint meeting with ASSUME

At the end of March 1999 the statistical computing section held a joint one day meeting with ASSUME, (Association of Statistical Specialists Using Microsoft Excel).

David Sayer

Building Statistical Add-ins was the theme of David Sayer's talk, which focused on the Dynamic Link Libraries (dll's) and the Statistical Add-in to Excel provided by NAg Ltd.

He described how the Dynamic Link Libraries, a cluster of routines from the NAg library, can be called from packages such as VBA within Excel. Once linked with Excel the routines are then available via the Function wizard.

The add-in is a cluster of 51 NAg library routines covering topics such as Regression, Time-series and Multivariate analyses, ANOVA, GLMs and basic statistics. These routines are packaged into an Excel add-in to allow for ease of use. It has a 'two for the price of one' benefit as it not only contains the excel code, but also provides a dll which can be called from packages such as Delphi, VB, C, Excel and Fortran. As the add-in is a new venture it is constantly evolving and a number of improvement areas were identified where work is ongoing. NAg are actively looking for feedback on a areas such as: which additional routines should be included; and what output is required.

Jim Shalliker

Jim Shalliker, from the University of Plymouth, talked about the use of Excel to simulate queueing systems. In particular he discussed a real-life case study of the appointment and consultation system at a local health centre. This case study was used as the basis for a simulation assignment on a third year module, Queueing and Simulation. Data and simple descriptive statistics were made available and students were required to select suitable models, and then simulate and compare the performance of different types of appointment schedule using both Excel and a dedicated simulation package, Simul8. The apparently simple queueing system turns out to more complex than it appears and simulation is not straightforward. For example, it is not FCFS but First Booked First Served; however overtaking can take place if a patient arrives sufficiently late. The system, which is based on a 3 hour surgery, is not in equilibrium. It has a mixture of deterministic (booked) times with a stochastic element ("lateness") superimposed. A proportion of bookings are cancelled at the last minute, or result in a no-show. A variety of performance measures need to be considered. The GP's idle time should be minimized, but non-zero idle times should be long enough to be useful. Patients' waiting times should be minimized and an individual patient's waiting time should be independent of his or her place in the queue. Surprisingly, Excel turned out to be a marginally better simulation tool for this job than the specialist package Simil8. Jim discussed the differences in more detail, together with suggested solutions to modelling and analysis problems in both packages. The arrival pattern was easier to set up in Excel but the no-sUse of uninitialized value in concatenation (.) or string at E:\listplex\SYSTEM\SCRIPTS\filearea.cgi line 455, line 110. hows were easier to handle in Simul8. Finding the distribution of patient waiting time was easy in both packages; the distribution of GP idle time was easy in Excel but hard in Simul8. Overtaking was difficult in both. Excel provided a more transparent environment allowing easier verification and validation.

There was time to look briefly at another example, the use of Excel to simulate the reorganization of regionally dispersed multi-operator dial-in services into a national single-number service. In this case, Excel was used to simulate multi-server systems and to do some rather rough and ready statistical modelling.

Overall Jim concluded that Excel provided an easy and open environment for handling simple but non-standard queueing systems.

Francois Sermier

Exploratory Data Analysis is an interactive task, according to Francois Sermier, and should be approached using tools that encourage interaction with the data. He states that the classical model of statistical software, in which a program file, a compiler and a data file come together to produce an output file, is poorly suited to a discipline such as EDA. EDA should be characterised by intensive use of graphics to probe data structure, explore relationships and extract features of the data.

Pointing out the simplicity of the spreadsheet-and-chart paradigm, Francois Sermier proceeded to demonstrate that Excel was capable of delivering the graphics and interactivity required for EDA, even without resorting to VBA programming or to any of the statistical add-ins which are available.

Illustrating his contention that "The worksheet is the program", he showed how to use filtered lists to hide rows which fail to meet certain criteria; how to use the trendline feature to perform regression on subsets of the data; and how to exploit the interactivity of the pivot table to investigate relationships between data series.

The use of controls in worksheets enabled Francois to produce the tour de force that most excited those members of the audience who had not realised the potential benefits. After displaying charts which changed appearance in response to the position of a slider, he went on to give an example of a single chart for which the variables plotted on the x and y axes could be altered at will by selecting them from a listbox. This in turn led on to a highly interactive method of assessing forecasts for seasonal time series generated by the Holt-Winters method of exponential smoothing.

A further explanation involved the use of one-line matrix expressions to perform weighted regression, with the addition of a VBA routine to enable the user to pick out points on the scatter diagram to exclude them from the analysis.

The audience was greatly impressed by the variety of effects that Francois achieved despite limiting himself to the spreadsheet functions, and indeed to those available in Excel version 5. He has agreed to make the workbook used in his demonstration available for downloading, and it is to be hoped that, with this example before us, more of us who use Excel for teaching statistics will be able to underline its usefulness for elementary data visualisation.