Royal Statistical Society
General Applications Section
Meeting on Data Mining, 11 April 2007 at the RSS, 12 Errol Street,
London EC1Y 8LX
14.00 Professor John Keane, University of Manchester
Developing data mining algorithms
14.45 Professor Mark Girolami, University of Glasgow
Bayesian activity profiling in scoring fraudulent telephone usage
15.30 Tea
16.00 Professor David Hand, Imperial College
What you get is what you want? Some dangers of black box data mining
16.45 Panel discussion
17.30 Close of meeting
This meeting is sponsored by the Southampton Statistical Sciences
Research Institute.
For further details, please contact the meeting organisers: Alan Kimber
or John Marriott.
Abstracts
John Keane, University of Manchester
Developing data mining algorithms
This talk will consider the speaker's involvement in data mining over
more than a decade, across a wide variety of applications including
statistical disclosure, bioinformatics, medical diagnosis, pedestrian
detection and
electricity forecasting. The technical focus will be on a number of
recent activities across a range of algorithmic approaches:
classification, rough sets, item set mining, neuro-fuzzy systems,
hierarchical fuzzy systems and hierarchical hybrid systems.
Mark Girolami, University of Glasgow
Bayesian activity profiling in scoring fraudulent telephone usage
This talk considers a Bayesian solution to the problem of scoring and
identifying anomalous, and therefore potentially fraudulent, usage of a
telephone service by individual accounts. Creating and maintaining
account-specific profiles, which are represented by discrete
multivariate probability distributions, provides a consistent and highly
practical means of scoring and ranking individual service usage in terms
of potential malfeasance. The computational overhead of the proposed
method is such that millions of account profiles can be stored and
maintained so that tens of millions of transactions can be scored on a
daily basis using standard computing capabilities. A commercial
prototype system based on this proposed methodology has been developed
by Memex Technologies and evaluated in an operational environment by the
telecom operator NTL.
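
As a rough illustration of scoring usage against an account-specific
discrete profile, the following minimal sketch keeps one
Dirichlet-smoothed categorical distribution per feature and scores an
event by its negative log predictive probability. It is not the
Memex/NTL system: the class name AccountProfile, the feature names, the
symmetric prior and the assumption that features are independent are
all hypothetical choices made for the example.

```python
import math
from collections import defaultdict


class AccountProfile:
    """Hypothetical per-account profile: one smoothed categorical per feature."""

    def __init__(self, categories, prior=1.0):
        # categories maps each feature name to its list of possible values.
        self.categories = categories
        self.prior = prior  # symmetric Dirichlet pseudo-count (assumed)
        self.counts = {f: defaultdict(float) for f in categories}
        self.total = 0.0  # number of events seen so far

    def predictive(self, feature, value):
        # Posterior predictive probability under a Dirichlet-categorical model.
        k = len(self.categories[feature])
        return (self.counts[feature][value] + self.prior) / (
            self.total + self.prior * k
        )

    def score(self, event):
        # Higher score means the event is less expected for this account.
        # Features are treated as independent, which is a simplification.
        return -sum(math.log(self.predictive(f, v)) for f, v in event.items())

    def update(self, event):
        # Every event is assumed to carry a value for every feature.
        for f, v in event.items():
            self.counts[f][v] += 1.0
        self.total += 1.0


CATEGORIES = {
    "destination": ["local", "national", "international", "premium"],
    "period": ["day", "evening", "night"],
}

profile = AccountProfile(CATEGORIES)
for _ in range(50):
    profile.update({"destination": "local", "period": "day"})

usual = profile.score({"destination": "local", "period": "day"})
unusual = profile.score({"destination": "premium", "period": "night"})
```

Here the account has only ever made local daytime calls, so a premium
night-time call receives the higher score. Storing only a small table
of counts per account is what keeps the per-transaction cost low in a
scheme of this kind.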
Such model-based profiling methods rely on a compact generative
representation of the sequential activity of a number of individuals
within a population, in which case there is a trade-off between the
definition of individual-specific and global models. A linear-time
distributed model for finite state symbolic sequences representing
traces of individual user activity is considered by making the
assumption that heterogeneous user behaviour may be 'explained' by a
relatively small number of common, structurally simple behavioural
patterns which may interleave randomly in a user-specific proportion.
The results of empirical studies related to telephone usage and web
browsing behaviour will be presented. They indicate that this modelling
approach provides an efficient representation scheme, reflected in
improved prediction performance, as well as low-complexity and
intuitively interpretable representations.
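
One way to picture a small set of shared patterns interleaving in a
user-specific proportion is sketched below. It assumes, purely for
illustration, that each pattern is a first-order Markov transition
matrix over symbols and that each transition in a user's trace is drawn
from pattern k with that user's probability for k. The function name
and the toy matrices are hypothetical, and the abstract does not state
that the model takes exactly this form.

```python
import numpy as np


def sequence_log_likelihood(seq, patterns, proportions):
    """Log-likelihood of a symbol sequence under a user-specific mixture.

    seq: list of integer symbol indices.
    patterns: array of shape (K, S, S), each slice a row-stochastic matrix.
    proportions: array of shape (K,), the user's mixing weights (sum to 1).
    """
    # Blend the shared transition matrices with this user's weights: (S, S).
    mixed = np.tensordot(proportions, patterns, axes=1)
    prev = np.asarray(seq[:-1])
    nxt = np.asarray(seq[1:])
    # One table lookup per transition, so the cost is linear in len(seq).
    return float(np.log(mixed[prev, nxt]).sum())


# Two toy shared patterns over two symbols (assumed values).
patterns = np.array([
    [[0.9, 0.1], [0.1, 0.9]],  # "stay in the same state"
    [[0.1, 0.9], [0.9, 0.1]],  # "alternate between states"
])

trace = [0, 0, 0, 1, 1]
mostly_stay = sequence_log_likelihood(trace, patterns, np.array([0.8, 0.2]))
mostly_alternate = sequence_log_likelihood(trace, patterns, np.array([0.2, 0.8]))
```

The trace mostly repeats its previous symbol, so it is more probable
for the user whose proportions favour the first pattern. Only the short
vector of proportions is specific to the user; the patterns themselves
are shared across the population.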
David Hand, Imperial College
What you get is what you want? Some dangers of black box data mining
Descriptions of data mining tools, and examples of them in action,
generally assume that the truth is out there, is fixed and invariant,
and can be discovered, at least in principle. The fact is, however, that
life is often more complicated than this suggests. Sometimes shaky
assumptions cast serious doubt on one's conclusions. Sometimes the
exciting knowledge nuggets are irrelevant in the context of the bigger
picture. Even worse, however, sometimes the very data mining exercise
itself has consequences which render its own results invalid. Data
mining is a powerful technology for profit and progress, but careless
use of any powerful technology can have serious adverse consequences.