*Call for Papers - KDD2015 Workshop on Learning from Small Sample Sizes*

https://sites.google.com/site/smallsamplesizes

*Overview*

The small sample size (or "large-p small-n") problem is a perennial one in the world of Big Data. A frequent occurrence in medical imaging, computer vision, omics, and bioinformatics, it describes the situation where the number of features p, in the tens of thousands or more, far exceeds the sample size n, usually in the tens. Data mining, statistical parameter estimation, and predictive modelling are all particularly challenging in such a setting.

Moreover, in all fields where the large-p small-n problem is a pressing issue (and in many others besides), current technology is moving towards higher resolution in sensing and recording, while in practice sample size is often bounded by hard limits or cost constraints. Meanwhile, even modest improvements in performance when modelling these information-rich, complex data promise significant cost savings or advances in knowledge.

On the other hand, it is becoming clear that "large-p small-n" is too broad a categorization for these problems, and that progress is still possible in the small sample setting either (1) in the presence of side information - such as related unlabelled data (semi-supervised learning), related learning tasks (transfer learning), or informative priors (domain knowledge) - to further constrain the problem, or (2) provided that the data have low complexity, in some problem-specific sense, that we are able to take advantage of. Concrete examples of such low complexity include: a large margin between classes (classification), a sparse representation of the data in some known linear basis (compressed sensing), a sparse weight vector (regression), or a sparse correlation structure (parameter estimation). However, we do not know what other properties of data, if any, act to make it "easy" or "hard" to work with in terms of the sample size required for some specific class of problems.
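As a minimal sketch of why the large-p small-n setting is hard (a hypothetical illustration, not part of the call itself; all names and numbers below are invented for the example): when p far exceeds n, ordinary least squares is underdetermined, so infinitely many weight vectors fit the training data perfectly and the sample alone cannot identify the true one. Some extra structure, such as the sparse weight vector mentioned above, is then needed to single out a good solution.

```python
import numpy as np

# Hypothetical illustration (not from the call): with p >> n, least squares
# is underdetermined -- many different weight vectors fit the data perfectly.
rng = np.random.default_rng(0)
n, p = 20, 10_000                        # tens of samples, tens of thousands of features
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -3.0, 1.5]            # a sparse "true" signal
y = X @ w_true

# The minimum-norm least-squares solution fits the training data exactly...
w_minnorm = np.linalg.pinv(X) @ y
print(np.max(np.abs(X @ w_minnorm - y)))   # ~0: a perfect fit

# ...but so does any solution shifted along a null-space direction of X,
# so the training data alone cannot distinguish these candidates.
null_dir = rng.standard_normal(p)
null_dir -= np.linalg.pinv(X) @ (X @ null_dir)   # project out the row space of X
w_other = w_minnorm + null_dir
print(np.max(np.abs(X @ w_other - y)))     # also ~0, yet w_other != w_minnorm
```

Sparsity-promoting methods such as the lasso resolve exactly this ambiguity by preferring, among the many interpolating solutions, one with few nonzero weights.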
For example: anti-learnable datasets in genomics come from the same domain as many eminently learnable datasets. Is anti-learnability then just a problem of data quality, the result of an unlucky draw of a small sample, or is there something deeper that makes such data inherently difficult to work with compared to other, apparently similar, data?

This workshop will bring together researchers working on different kinds of challenges where the common thread is the small sample size problem. It will provide a forum for exchanging theoretical and empirical knowledge of small sample problems, and for sharing insight into which data structures facilitate progress on particular families of problems - even with a small sample size - which do the opposite, and when these properties break down. A further specific goal of this workshop is to make a start on building links between the many disparate fields working with small data samples, with the ultimate aim of creating a multi-disciplinary research network devoted to this common issue.

We seek papers on all aspects of learning from small sample sizes, from any problem domain where this issue is prevalent (e.g. bioinformatics and omics, machine vision, anomaly detection, drug discovery, medical imaging, multi-label classification, multi-task classification, density-based clustering/density estimation, and others). In particular:

Theoretical and empirical analyses of learning from small samples:
- Which properties of data support, or prevent, learning from a small sample?
- Which forms of side information support learning from a small sample?
- When do guarantees break down, in theory and in practice?

Techniques and algorithms targeted at small sample size learning:
- Semi-supervised learning
- Transfer learning
- Deep learning
- Representation learning
- Dimensionality reduction
- Application of domain knowledge/informative priors
- Reproducible case studies
*Submission Details*

Please submit an extended abstract of no more than 8 pages, including references, diagrams, and appendices, if any. The format is the standard double-column ACM Proceedings Template, Tighter Alternate style. Please submit your abstract in PDF format only, via EasyChair, at https://easychair.org/conferences/?conf=ls3

The deadline for submission is 23:59 Pacific Standard Time on Friday 5th June 2015.

Following KDD tradition, reviews are not blinded, so please include author names and affiliations in your submission. The maximum file size for submissions is 20MB.

Important: Overfitting and serendipity are serious challenges to the realistic assessment of approaches applied to small data samples. If you are submitting experimental findings, please give enough detail in your submission to reproduce them in full. The ideal way to ensure reproducibility is to provide code and data on the web (including the scripts used for data preparation, if the data provided are unprepared), and we strongly encourage authors to do this.

*Organisers*

Bob Durrant, University of Waikato, Department of Statistics (Primary Contact)
Alain C. Vandal, Auckland University of Technology, Department of Biostatistics and Epidemiology

--
Dr. Robert (Bob) Durrant, Senior Lecturer
Room G.3.30, Department of Statistics, University of Waikato,
Private Bag 3105, Hamilton 3240, New Zealand
e: [log in to unmask]
w: http://www.stats.waikato.ac.nz/~bobd/
t: +64 (0)7 838 4466 x8334
f: +64 (0)7 838 4155