The Centre for Translational Neurophysiology is looking for one post-doc to work within the H2020 EcoMode project (“Event-Driven Compressive Vision for Multimodal Interaction with Mobile Devices”), funded by the European Commission under grant agreement n. 644096.
Job Description: Robust automatic speech detection and recognition for human-robot interaction in realistic environments (where speech is typically noisy and distant) and settings (where the robot must continuously distinguish verbal commands from non-verbal audio streams) are still challenging tasks. Vision can increase speech recognition robustness by adding complementary speech-production-related information. In this project, visual information will be provided by an event-driven (ED) camera. ED vision sensors transmit information as soon as a change occurs in their visual field, achieving very high temporal resolution coupled with an extremely low data rate and automatic segmentation of significant events. In an audio-visual speech recognition setting, ED vision can not only provide additional visual information to the speech recognizer, but can also drive the temporal processing of speech by locating (in the temporal dimension) visual events related to speech production landmarks.
The goal of the proposed research is to exploit the highly dynamical information from ED vision sensors for robust speech detection and processing. The temporal information provided by ED sensors will make it possible to experiment with new audio-visual techniques for voice activity detection and with new models of speech temporal dynamics based on events rather than on the typical fixed-length segments (i.e., frames).
In this context, we are looking for a highly motivated post-doc to work on speech processing. The post-doc will mainly develop a novel speech recognition system based on visual, acoustic and (recovered) articulatory features (i.e., features describing the inner vocal tract), targeted at users with mild speech impairments. The temporal information provided by ED sensors will make it possible to experiment with new strategies for modeling the temporal dynamics of normal and atypical speech. The main outcomes of the project will be: (i) a fast, lightweight audio-visual voice detection system, combined with (ii) a computationally efficient audio-visual recognition system that robustly recognizes the most relevant commands (key phrases) delivered by users to devices in real-world usage scenarios.
The resulting methods for improving speech detection and recognition will be exploited for the implementation of a tablet with robust speech processing. Given the automatic adaptation of the speech processing to the speech production rhythm, the speech recognition system will target speakers with mild speech impairments, specifically subjects with atypical speech flow and rhythm, typical of some disabilities and of the ageing population. The same approach will then be applied to the humanoid robot iCub to improve its interaction with humans in cooperative tasks.
We are looking for highly motivated, inquisitive people with the curiosity to work with a new and challenging technology, one that requires a rethinking of audio-visual speech processing but offers a high payoff in terms of speed, efficiency and robustness.
In addition, the ideal candidate will be expected to contribute to team-work, PhD tutoring and general lab-related activities.
An internationally competitive salary depending on experience will be offered.
Please submit a CV, a list of publications, two reference letters and a statement of research interests to email@example.com, quoting “Postdoctoral position in New techniques for vision-assisted speech processing BC: 69724” in the subject line.
Please apply by October 20, 2016.
Istituto Italiano di Tecnologia (http://www.iit.it) is a private Foundation with the objective of promoting Italy's technological development and higher education in science and technology. Research at IIT is carried out in highly innovative scientific fields with state-of-the-art technology.
In order to comply with Italian law (art. 23 of the Italian Legislative Decree n. 196/03 on the protection of personal data), the candidate is kindly asked to give his/her consent for IIT to process his/her personal data. We inform you that the information you provide will be used solely for the purpose of assessing your professional profile against the requirements of IIT. Your data will be processed by IIT, with its headquarters in Genoa, Via Morego, 30, acting as the Data Controller, using computer and paper-based means, in observance of the rules on the protection of personal data, including those relating to data security.
Please also note that, pursuant to art. 7 of Legislative Decree 196/2003, you may exercise your rights at any time as a party concerned by contacting the Data Manager.
Istituto Italiano di Tecnologia is an Equal Opportunity Employer that actively seeks diversity in the workforce.