BC 69721 - BC 69724
Istituto Italiano di Tecnologia (http://www.iit.it) is a private Foundation with the objective of promoting Italy's technological development and higher education in science and technology. Research at IIT is carried out in highly innovative scientific fields with state-of-the-art technology.
The iCub Facility (http://www.iit.it/icub) and the Robotics, Brain and Cognitive Sciences (http://www.iit.it/rbcs) Departments are looking for 2 post-docs to work on the H2020 project EcoMode, funded by the European Commission under the H2020-ICT-2014-1 call (topic ICT-22-2014 – Multimodal and natural computer interaction).
Job Description: Robust automatic speech recognition in realistic environments for human-robot interaction, where speech is noisy and distant, is still a challenging task. Vision can be used to increase speech recognition robustness by adding complementary speech-production related information. In this project visual information will be provided by an event-driven (ED) camera. ED vision sensors transmit information as soon as a change occurs in their visual field, achieving incredibly high temporal resolution, coupled with extremely low data rate and automatic segmentation of significant events. In an audio-visual speech recognition setting ED vision can not only provide new additional visual information to the speech recognizer, but also drive the temporal processing of speech by locating (in the temporal dimension) visual events related to speech production landmarks.
The goal of the proposed research is to exploit highly dynamic information from ED vision sensors for robust speech processing. The temporal information provided by ED sensors will make it possible to experiment with new models of speech temporal dynamics based on events, as opposed to the typical fixed-length segments (i.e. frames).
In this context, we are looking for 2 highly motivated post-docs, tackling vision (Research Challenge 1) and speech processing (Research Challenge 2) respectively, as outlined below:
Research Challenge 1 (vision @ iCub Facility – BC 69721): the post-doc will mostly work on detecting, from event-driven cameras, the features that are instrumental in improving speech recognition (e.g. lip closure, protrusion, shape, etc.). The temporal features extracted from the visual signal will be used for crossmodal event-driven speech segmentation that will drive the processing of speech. To increase robustness to acoustic noise and atypical speech, acoustic and visual features will be combined to recover phonetic gestures of the inner vocal tract (articulatory features).
Research Challenge 2 (speech processing @ RBCS – BC 69724): the post-doc will mainly develop a novel speech recognition system based on visual, acoustic and (recovered) articulatory features, targeted at users with mild speech impairments. The temporal information provided by ED sensors will make it possible to experiment with new strategies to model the temporal dynamics of normal and atypical speech. The main outcome of the project will be an audio-visual speech recognition system that robustly recognizes the most relevant commands (key phrases) delivered by users to devices in real-world usage scenarios.
The resulting methods for improving speech recognition will be exploited for the implementation of a tablet with robust speech processing. Given the automatic adaptation of the speech processing to the speech production rhythm, the speech recognition system will target speakers with mild speech impairments, specifically subjects with atypical speech flow and rhythm, typical of some disabilities and of the ageing population. The same approach will then be applied to the humanoid robot iCub to improve its interaction with humans in cooperative tasks.
We are looking for highly motivated people and inquisitive minds with the curiosity to use a new and challenging technology that requires a rethinking of visual and speech processing to achieve a high payoff in terms of speed, efficiency and robustness. Candidates should also have the following skills:
- PhD in Computer Science, Robotics, Engineering (or equivalent) with a background in machine learning, signal processing or related areas;
- ability to analyze, improve and propose new algorithms;
- good knowledge of the C and C++ programming languages, with proven experience.
Team-work, PhD tutoring and general lab-related activities are expected.
An internationally competitive salary depending on experience will be offered.
Please note that these positions are pending the signature of the grant agreement with the European Commission (expected start date in early 2015).
How to apply:
Challenge 1: Send applications and informal enquiries to
Challenge 2: Send applications and informal enquiries to
The application should include a curriculum vitae listing all publications and pdf files of the most representative publications (maximum 2). If possible, please also indicate three independent reference persons.
Presumed Starting Date:
Challenge 1: January 2015 (but later starts are also possible).
Challenge 2: June 2015 (but later starts are also possible).
Evaluation of the candidates starts immediately and officially closes on November 10th, 2014.
References:
Lichtsteiner, P., Posch, C., & Delbruck, T. (2008). A 128×128 120 dB 15 μs latency asynchronous temporal contrast vision sensor. IEEE Journal of Solid-State Circuits, 43(2), 566-576.
Rea, F., Metta, G., & Bartolozzi, C. (2013). Event-driven visual attention for the humanoid robot iCub. Frontiers in Neuroscience, 7.
Benosman, R., Clercq, C., Lagorce, X., Ieng, S.-H., & Bartolozzi, C. (2014). Event-based visual flow. IEEE Transactions on Neural Networks and Learning Systems, 25(2), 407-417. doi: 10.1109/TNNLS.2013.2273537
Potamianos, G., Neti, C., Gravier, G., Garg, A., & Senior, A. W. (2003). Recent advances in the automatic recognition of audiovisual speech. Proceedings of the IEEE, 91, 1306-1326.
Glass, J. (2003). A probabilistic framework for segment-based speech recognition. Computer Speech and Language, 17, 137-152.
Badino, L., Canevari, C., Fadiga, L., & Metta, G. (2012). Deep-level acoustic-to-articulatory mapping for DBN-HMM based phone recognition. In IEEE SLT 2012, Miami, Florida.
In order to comply with Italian law (art. 23 of Privacy Law of the Italian Legislative Decree n. 196/03), the candidate is kindly asked to give his/her consent to allow IIT to process his/her personal data. We inform you that the information you provide will be solely used for the purpose of assessing your professional profile to meet the requirements of IIT. Your data will be processed by IIT, with its headquarters in Genoa, Via Morego, 30, acting as the Data Holder, using computer and paper-based means, observing the rules on the protection of personal data, including those relating to the security of data.
Please also note that, pursuant to art. 7 of Legislative Decree 196/2003, you may exercise your rights at any time as a party concerned by contacting the Data Manager.
Istituto Italiano di Tecnologia is an Equal Opportunity Employer that actively seeks diversity in the workforce.