*ICML 2011 Workshop on Learning Architectures, Representations, and
Optimization for Speech and Visual Information Processing*
http://icml2011speechvision.wordpress.com/

*Overview*

This workshop aims to bring together and inform researchers and
students from the diverse communities of machine learning, speech recognition,
computer vision, signal processing, cognitive science of human auditory
and visual perception, optimization, and applied mathematics, in order to
further research on deep learning models for real-world applications in
computer vision and speech. Special focus is placed on both the commonalities
and the unique aspects of speech and vision problems, and on how unified
learning paradigms and representations can be developed to address these
problems, which until now have been tackled largely by disparate communities.

Through invited talks and panel discussions, we will attempt to address
the central topics in learning representations and architectures today, as
well as the associated optimization techniques. The workshop will also
invite paper submissions on the most recent developments in unsupervised
learning and hierarchical learning algorithms, theoretical foundations,
inference and optimization, semi-supervised and transfer learning, and
applications to real-world tasks in speech processing and computer vision.
Papers will be presented as oral or poster presentations. Topics of
interest include, but are not limited to, the following:

   * Development of learning models, e.g., deep belief nets, deep neural
nets, deep Boltzmann machines, high-order sparse coding, hierarchical
generative models, temporal and/or recursive models with deep
structure, generative models motivated by physical processes of human
speech production and of natural image formation, discriminative
models motivated by human speech and visual perception, etc.
   * Algorithms for probabilistic inference, optimization strategies when
the objective is non-convex, and large-scale implementations
associated with the above models.
   * Learning biologically inspired feature hierarchies in human visual
and auditory signal processing.
   * Novel representations via the use of side information in
unsupervised feature learning, e.g., spatial correlations in images,
sequential dynamics and temporal/spectral correlations in speech,
physical constraints in speech production, perceptual constraints in
vision, and other prior knowledge.
   * Theoretical understanding of the role of unsupervised feature
learning in building complex predictive models. Under which conditions
does the feature hierarchy provide a better regularization or achieve
a higher statistical efficiency?
   * Successes, failures, and lessons learned in real-world applications
including understanding of natural scenes, recognition of objects and
events, speech recognition under controlled environments,
large-vocabulary speech recognition under realistic acoustic
environments, auditory coding of speech and music, etc.

*Motivation*

In recent years, there has been a lot of interest in algorithms that learn
hierarchical representations from unlabeled data. Unsupervised learning
and deep learning methods, such as sparse coding, restricted Boltzmann
machines, deep belief networks, convolutional architectures, recursive
compositional models, and hierarchical generative models, have been
successfully applied to a variety of tasks in computer vision and speech
processing with highly promising results. In this workshop, we will bring
together researchers who are interested in learning representations and
architectures and in developing efficient and robust optimization
algorithms for speech and visual information processing, review the recent
technical progress, and discuss the challenges and future research
directions.

*Impact and expected outcomes*

This workshop aims to stimulate vigorous interactions among
researchers in machine learning, neural networks, speech recognition, and
computer vision. It will accelerate deep learning research and its
applications to speech and visual information processing as the
researchers in disparate research areas learn from each other and as they
jointly establish the foundation of the architectural, representational,
and optimization aspects of deep learning related to these two major
classes of applications.

With this workshop, we plan to have in-depth discussions on the current
state of the art and the next big challenges in learning representations
and architectures, and to propose research directions to the research
community. We also hope to stimulate the exchange of ideas with other
members of the ICML community.

In addition to the main presentations, the workshop will include a panel
discussion session. The main topics of the discussion will include:

   * How to build hierarchical systems
   * Principles underlying learning of hierarchical systems: sparsity,
reconstruction, (if supervised) what kind of supervision, how to learn
and use invariances, how to learn and use variability, etc.
   * Similarities and differences of computer vision and speech
recognition problems; hand-crafted features (e.g., SIFT vs. MFCC/PLP);
learned features; nature of the variability in speech and natural
images; nature of the invariance in speech and natural images
   * Critiques of the current approaches
   * Real-world applications and benchmark datasets
   * Scalability: efficiency during training and inference; how to
distribute training on massive data sets over many machines
   * Major milestones and goals for the next 5 or 10 years

Panel discussions will be led by the members of the organizing committee
as well as by prominent representatives of the vision and speech
processing communities.

*Key Dates*
Paper Submission Deadline: April 29, 2011
Paper Acceptance Notification: May 20, 2011
Camera Ready Submission: June 10, 2011
Workshop Date: July 2, 2011

*Organizers*
Li Deng, Microsoft Research
Honglak Lee, University of Michigan
Kai Yu, NEC Laboratories America