*Call for Papers - KDD2015 Workshop on Learning from Small Sample Sizes*

https://sites.google.com/site/smallsamplesizes

*Overview*

The small sample size (or "large-p small-n") problem is a perennial one in the world of Big Data. A frequent occurrence in medical imaging, computer vision, omics, and bioinformatics, it describes the situation where the number of features p, in the tens of thousands or more, far exceeds the sample size n, usually in the tens. Data mining, statistical parameter estimation, and predictive modelling are all particularly challenging in such a setting.

Moreover, in all fields where the large-p small-n problem is a pressing issue (and in many others besides), current technology is moving towards higher resolution in sensing and recording, while in practice sample size is often bounded by hard limits or cost constraints. Meanwhile, even modest improvements in performance when modelling these information-rich, complex data promise significant cost savings or advances in knowledge.

On the other hand, it is becoming clear that "large-p small-n" is too broad a categorization for these problems, and that progress is still possible in the small sample setting either (1) in the presence of side information - such as related unlabelled data (semi-supervised learning), related learning tasks (transfer learning), or informative priors (domain knowledge) - to further constrain the problem, or (2) provided that the data have low complexity, in some problem-specific sense, that we are able to take advantage of. Concrete examples of such low complexity include: a large margin between classes (classification), a sparse representation of the data in some known linear basis (compressed sensing), a sparse weight vector (regression), or a sparse correlation structure (parameter estimation). However, we do not know what other properties of data, if any, act to make it "easy" or "hard" to work with in terms of the sample size required for some specific class of problems.
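As a minimal sketch of why the large-p small-n setting is hard (a hypothetical illustration, not part of the call itself; all names and numbers below are invented for the example): when p far exceeds n, ordinary least squares is underdetermined, so infinitely many weight vectors fit the training data perfectly and the sample alone cannot identify the true one. Some extra structure, such as the sparse weight vector mentioned above, is then needed to single out a good solution.

```python
import numpy as np

# Hypothetical illustration (not from the call): with p >> n, least squares
# is underdetermined -- many different weight vectors fit the data perfectly.
rng = np.random.default_rng(0)
n, p = 20, 10_000                        # tens of samples, tens of thousands of features
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -3.0, 1.5]            # a sparse "true" signal
y = X @ w_true

# The minimum-norm least-squares solution fits the training data exactly...
w_minnorm = np.linalg.pinv(X) @ y
print(np.max(np.abs(X @ w_minnorm - y)))   # ~0: a perfect fit

# ...but so does any solution shifted along a null-space direction of X,
# so the training data alone cannot distinguish these candidates.
null_dir = rng.standard_normal(p)
null_dir -= np.linalg.pinv(X) @ (X @ null_dir)   # project out the row space of X
w_other = w_minnorm + null_dir
print(np.max(np.abs(X @ w_other - y)))     # also ~0, yet w_other != w_minnorm
```

Sparsity-promoting methods such as the lasso resolve exactly this ambiguity by preferring, among the many interpolating solutions, one with few nonzero weights.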
For example: anti-learnable datasets in genomics come from the same domain as many eminently learnable datasets. Is anti-learnability then just a problem of data quality, the result of an unlucky draw of a small sample, or is there something deeper that makes such data inherently difficult to work with compared to other, apparently similar, data?

This workshop will bring together researchers working on different kinds of challenges where the common thread is the small sample size problem. It will provide a forum for exchanging theoretical and empirical knowledge of small sample problems, and for sharing insight into which data structures facilitate progress on particular families of problems - even with a small sample size - which do the opposite, and when these properties break down. A further specific goal of this workshop is to make a start on building links between the many disparate fields working with small data samples, with the ultimate aim of creating a multi-disciplinary research network devoted to this common issue.

We seek papers on all aspects of learning from small sample sizes, from any problem domain where this issue is prevalent (e.g. bioinformatics and omics, machine vision, anomaly detection, drug discovery, medical imaging, multi-label classification, multi-task classification, density-based clustering/density estimation, and others). In particular:

Theoretical and empirical analyses of learning from small samples:
- Which properties of data support, or prevent, learning from a small sample?
- Which forms of side information support learning from a small sample?
- When do guarantees break down, in theory and in practice?

Techniques and algorithms targeted at small sample size learning:
- Semi-supervised learning
- Transfer learning
- Deep learning
- Representation learning
- Dimensionality reduction
- Application of domain knowledge/informative priors
- Reproducible case studies
*Submission Details*

Please submit an extended abstract of no more than 8 pages, including references, diagrams, and appendices, if any. The format is the standard double-column ACM Proceedings Template, Tighter Alternate style. Please submit your abstract in PDF format only, via EasyChair, at https://easychair.org/conferences/?conf=ls3

The deadline for submission is 23:59 Pacific Standard Time on Friday 5th June 2015.

Following KDD tradition, reviews are not blinded, so please include author names and affiliations in your submission. The maximum file size for submissions is 20MB.

Important: Overfitting and serendipity are serious challenges to the realistic assessment of approaches applied to small data samples. If you are submitting experimental findings, please give enough detail in your submission to reproduce them in full. The ideal way to ensure reproducibility is to provide code and data on the web (including the scripts used for data preparation, if the data provided are unprepared), and we strongly encourage authors to do this.

*Organisers*

Bob Durrant, University of Waikato, Department of Statistics (Primary Contact)
Alain C. Vandal, Auckland University of Technology, Department of Biostatistics and Epidemiology

--
Dr. Robert (Bob) Durrant, Senior Lecturer
Room G.3.30, Department of Statistics, University of Waikato,
Private Bag 3105, Hamilton 3240, New Zealand
e: [log in to unmask]
w: http://www.stats.waikato.ac.nz/~bobd/
t: +64 (0)7 838 4466 x8334
f: +64 (0)7 838 4155