Call for Papers: Intl’ Workshop

Last call for participation in the Intl’ Workshop on Unsupervised Learning from Bioacoustic Big Data (uLearnBio) at ICML 2014, 26 June, Beijing, supported by the CNRS MASTODONS SABIOD project.

The general topic of uLearnBio is probabilistic machine learning from
large-scale bioacoustic data. It focuses on unsupervised learning,
that is, automatically acquiring "knowledge" from bioacoustic data for
representation, analysis and related tasks. One of the main goals is
clustering/segmentation, for example with the mixture model-based
clustering approach, which has a well-established theoretical
background and efficient associated estimation algorithms such as the
Expectation-Maximization (EM) algorithm. The number of mixture
components can then be selected with model selection criteria such as
BIC, AIC or ICL.
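The sketch below illustrates the mixture-model route. It fits one-dimensional Gaussian mixtures by EM for several candidate numbers of components and keeps the candidate with the lowest BIC. It is a minimal, stdlib-only sketch under simplifying assumptions: univariate data, quantile-based initialisation and a variance floor. The names fit_gmm_1d and select_k_by_bic, and the toy data, are hypothetical and are not taken from the workshop or its challenges.

```python
import math
import random


def normal_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_loglik(xs, weights, means, variances):
    """Log-likelihood of the data under a 1-D Gaussian mixture."""
    return sum(
        logsumexp([math.log(w) + normal_logpdf(x, m, v)
                   for w, m, v in zip(weights, means, variances)])
        for x in xs
    )


def fit_gmm_1d(xs, k, n_iter=200, tol=1e-6, var_floor=1e-3):
    """Hypothetical helper: fit a k-component 1-D Gaussian mixture by EM."""
    n = len(xs)
    srt = sorted(xs)
    # Deterministic initialisation: means at evenly spaced sample quantiles,
    # every component starts with the overall sample variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities r[i][j] = p(z_i = j | x_i, current parameters).
        resp = []
        for x in xs:
            logs = [math.log(weights[j]) + normal_logpdf(x, means[j], variances[j])
                    for j in range(k)]
            lse = logsumexp(logs)
            resp.append([math.exp(v - lse) for v in logs])
        # M-step: closed-form updates of weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against empty components
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                var_floor,
            )
        ll = mixture_loglik(xs, weights, means, variances)
        if ll - prev < tol:  # stop once the log-likelihood gain is negligible
            break
        prev = ll
    return weights, means, variances, ll


def select_k_by_bic(xs, k_max):
    """Return the k in 1..k_max with the lowest BIC, plus all BIC values."""
    n = len(xs)
    bics = {}
    for k in range(1, k_max + 1):
        _, _, _, ll = fit_gmm_1d(xs, k)
        n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
        bics[k] = -2.0 * ll + n_params * math.log(n)
    best = min(bics, key=bics.get)
    return best, bics


if __name__ == "__main__":
    # Toy data (an assumption, not workshop data): two well-separated groups.
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(8.0, 1.0) for _ in range(200)]
    best, bics = select_k_by_bic(xs, k_max=4)
    print("selected number of components:", best)
```

On such well-separated toy data, BIC would be expected to favour two components. To get AIC instead, replace the penalty n_params * log(n) with 2 * n_params. ICL would additionally penalise the entropy of the responsibilities.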

Another probabilistic approach to cluster analysis is based on
Bayesian Non-Parametrics (BNP), in particular the Infinite Gaussian
Mixture Model (IGMM) formulation, Chinese Restaurant Process (CRP)
mixtures and Dirichlet Process Mixtures (DPM). The non-parametric
approach avoids assuming restricted functional forms and thus allows
the complexity and accuracy of the inferred model to grow as more data
are observed. It also offers an alternative to the difficult model
selection problem in model-based clustering, since the number of
clusters is inferred from the data as learning proceeds. One of the
main current concerns for all these approaches is scaling them up to
large data sets.
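To make the point about model complexity growing with the data concrete, here is a minimal sketch of the CRP prior alone. It has no likelihood and no posterior inference, so it is not a full DPM sampler. Under the CRP, once i customers are seated, the next one joins existing table t with probability n_t / (i + alpha) and opens a new table with probability alpha / (i + alpha). The expected number of clusters after n points is therefore the sum over i = 0..n-1 of alpha / (alpha + i), which grows roughly like alpha log n. The names crp_sample and expected_tables, the concentration alpha = 1 and the seeds are illustrative assumptions.

```python
import random


def crp_sample(n, alpha, rng):
    """Seat n customers by the Chinese Restaurant Process with concentration alpha.

    Returns (table_sizes, n_tables_history), where n_tables_history[i] is the
    number of occupied tables (clusters) after customer i + 1 has been seated.
    """
    sizes = []
    history = []
    for i in range(n):  # i customers are already seated
        # New table with prob alpha / (i + alpha); table t with prob sizes[t] / (i + alpha).
        u = rng.random() * (i + alpha)
        if u < alpha:
            sizes.append(1)
        else:
            u -= alpha
            for t, s in enumerate(sizes):
                if u < s:
                    sizes[t] += 1
                    break
                u -= s
            else:
                sizes[-1] += 1  # floating-point fallback, practically never reached
        history.append(len(sizes))
    return sizes, history


def expected_tables(n, alpha):
    """Exact prior mean of the number of clusters after n customers."""
    return sum(alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    rng = random.Random(1)
    alpha = 1.0  # assumed concentration parameter
    _, history = crp_sample(10_000, alpha, rng)
    for n in (10, 100, 1_000, 10_000):
        print(n, history[n - 1], round(expected_tables(n, alpha), 2))
```

Each printed row shows a sample size, the sampled number of clusters and its prior mean. The cluster count keeps increasing with n, but only slowly. In an actual DPM, the seating probabilities would also be weighted by how well each cluster explains the new observation.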

Topics include (but are not limited to):

  • Unsupervised Generative Learning
  • Latent data Models
  • Model-based clustering
  • Bayesian Non-parametric clustering
  • Bayesian sparse representation

Applied to:

  • Big bioacoustic data clustering/structuring
  • Species clustering (birds, etc.)
  • Song clustering/decomposition (birds, whales, etc.)
  • Automatic species classification

Challenges:

Three challenges are still open: bird species clustering, unsupervised
bird song decomposition and whale song decomposition. Please submit by
6 June.

Large-scale bioacoustic data science is a novel challenge for
artificial intelligence that requires new methods. Big data scientists
are now invited to explore these data with advanced methods to gain
new knowledge about these important species. For example, large
cabled submarine acoustic observatory deployments allow data to be
acquired continuously over long time periods: the Neptune submarine
observatory in Canada and the Antares and Nemo neutrino detectors (see
the NIPS4B proceedings) all pose ’big data’ challenges to scientists.
Automated analysis, including clustering/segmentation and structuring
of acoustic signals, event detection, data mining, and machine
learning to discover relationships among data streams, promises to
help scientists make discoveries in an otherwise overwhelming quantity
of acoustic data.

This workshop offers an excellent framework for assessing how
parametric and non-parametric probabilistic models for cluster
analysis perform when learning from complex, real, large-scale
bioacoustic data, continuing the previous ICML and NIPS 2013
workshops on learning from bioacoustic data.

Organizers:

  • F. Chamroukhi - University of Toulon & CNRS LSIS, France
  • H. Glotin - University of Toulon & Institut Universitaire de France, CNRS LSIS, France
  • P. Dugan - Cornell University, New York
  • C. Clark - Head of the Cornell Lab Bioacoustics, Cornell University, New York
  • T. Artières - CNRS LIP6 & Univ. Pierre et Marie Curie (UPMC)
  • Y. LeCun - Computational and Biological Learning Lab, New York University & head of the Facebook Machine Learning team

MASTODONS is a global project on the exploitation of large scientific
data sets, launched by the multidisciplinary mission of CNRS.