NTT at INTERSPEECH 2019

Retrieved from https://interspeech2019.org/program/schedule/

Tutorials

[T6] Advanced methods for neural end-to-end speech processing – unification, integration, and implementation
Sunday, 15 September, 1400–1730, Hall 1
Takaaki Hori (Mitsubishi Electric Research Laboratories), Tomoki Hayashi (Department of Information Science, Nagoya University), Shigeki Karita (NTT Communication Science Laboratories), Shinji Watanabe (Center for Language and Speech Processing, Johns Hopkins University)

[T8] Microphone array signal processing and deep learning for speech enhancement – strong together
Sunday, 15 September, 1400–1730, Hall 11
Reinhold Haeb-Umbach (Department of Communications Engineering, Paderborn University), Tomohiro Nakatani (NTT Communication Science Laboratories)


Speech Enhancement: Multi-channel [Mon-O-1-2], Monday, 16 September, Hall 1

Simultaneous denoising and dereverberation for low-latency applications using frame-by-frame online unified convolutional beamformer
Oral; 1240–1300
Tomohiro Nakatani (NTT Corporation), Keisuke Kinoshita (NTT Corporation)

ASR for noisy and far-field speech [Mon-P-1-E], Monday, 16 September, Hall 10/E

End-to-end SpeakerBeam for single channel target speech recognition
Poster; 1100–1300
Marc Delcroix (NTT Communication Science Laboratories), Shinji Watanabe (Johns Hopkins University), Tsubasa Ochiai (NTT Communication Science Laboratories), Keisuke Kinoshita (NTT), Shigeki Karita (NTT Communication Science Laboratories), Atsunori Ogawa (NTT Communication Science Laboratories), Tomohiro Nakatani (NTT Corporation)

Dialogue speech understanding [Mon-P-2-B], Monday, 16 September, Gallery B

Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models
Poster; 1430–1630
Ryo Masumura (NTT Corporation), Tomohiro Tanaka (NTT Corporation), Atsushi Ando (NTT Corporation), Hosana Kamiyama (NTT Corporation), Takanobu Oba (NTT Media Intelligence Laboratories, NTT Corporation), Satoshi Kobashikawa (NTT Corporation), Yushi Aono (NTT Corporation)

Neural techniques for voice conversion and waveform generation [Mon-P-2-C], Monday, 16 September, Gallery C

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion
Poster; 1430–1630
Takuhiro Kaneko (NTT Communication Science Laboratories), Hirokazu Kameoka (NTT Communication Science Laboratories), Kou Tanaka (NTT Corporation), Nobukatsu Hojo (NTT)

ASR neural network architectures - 1 [Tue-O-5-2], Tuesday, 17 September, Hall 1

Improving Transformer Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration
Oral; 1700–1720
Shigeki Karita (NTT Communication Science Laboratories), Nelson Yalta (Waseda University), Shinji Watanabe (Johns Hopkins University), Marc Delcroix (NTT Communication Science Laboratories), Atsunori Ogawa (NTT Communication Science Laboratories), Tomohiro Nakatani (NTT Corporation)

Speech synthesis: data and evaluation [Tue-P-3-A], Tuesday, 17 September, Gallery A

Evaluating Intention Communication by TTS using Explicit Definitions of Illocutionary Act Performance
Poster; 1000–1200
Nobukatsu Hojo (NTT), Noboru Miyazaki (NTT)

Model training for ASR [Tue-P-3-B], Tuesday, 17 September, Gallery B

End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders
Poster; 1000–1200
Ryo Masumura (NTT Corporation), Hiroshi Sato (NTT Corporation), Tomohiro Tanaka (NTT Corporation), Takafumi Moriya (NTT Corporation), Yusuke Ijima (NTT Corporation), Takanobu Oba (NTT Media Intelligence Laboratories, NTT Corporation)

Spoken Term Detection, Confidence Measure, and End-to-End Speech Recognition [Tue-P-5-C], Tuesday, 17 September, Gallery C

A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Shared Knowledge
Poster; 1600–1800
Tomohiro Tanaka (NTT Corporation), Ryo Masumura (NTT Corporation), Takafumi Moriya (NTT Corporation), Takanobu Oba (NTT Media Intelligence Laboratories, NTT Corporation), Yushi Aono (NTT Media Intelligence Laboratories, NTT Corporation)

Speech and Audio Source Separation and Scene Analysis 2 [Wed-O-7-4], Wednesday, 18 September, Hall 11

Multimodal SpeakerBeam: Single channel target speech extraction with audio-visual speaker clues
Oral; 1510–1530
Tsubasa Ochiai (NTT Communication Science Laboratories), Marc Delcroix (NTT Communication Science Laboratories), Keisuke Kinoshita (NTT), Atsunori Ogawa (NTT Communication Science Laboratories), Tomohiro Nakatani (NTT Corporation)

Training Strategy for Speech Emotion Recognition [Wed-O-8-3], Wednesday, 18 September, Hall 2

Speech Emotion Recognition based on Multi-Label Emotion Existence Model
Oral; 1700–1720
Atsushi Ando (NTT Corporation), Ryo Masumura (NTT Corporation), Hosana Kamiyama (NTT Corporation), Satoshi Kobashikawa (NTT Corporation), Yushi Aono (NTT Corporation)

Emotion Modeling and Analysis [Wed-P-7-C], Wednesday, 18 September, Gallery C

Does the Lombard Effect Improve Emotional Communication in Noise? – Analysis of Emotional Speech Acted in Noise –
Poster; 1330–1530
Yi Zhao (National Institute of Informatics (NII)), Atsushi Ando (NTT Corporation), Shinji Takaki (National Institute of Informatics), Junichi Yamagishi (National Institute of Informatics), Satoshi Kobashikawa (NTT Corporation)

Speech and Audio Classification 2 [Wed-P-7-E], Wednesday, 18 September, Hall 10/E

Neural Whispered Speech Detection with Imbalanced Learning
Poster; 1330–1530
Takanori Ashihara (NTT Corporation), Yusuke Shinohara (NTT Corporation), Hiroshi Sato (NTT Corporation), Takafumi Moriya (NTT Corporation), Kiyoaki Matsui (NTT Media Intelligence Laboratories), Takaaki Fukutomi (NTT Corporation), Yoshikazu Yamaguchi (NTT Corporation), Yushi Aono (NTT Corporation)

Neural Networks for Language Modeling [Thu-O-10-1], Thursday, 19 September, Main Hall

Improved Deep Duel Model for Rescoring N-best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders
Oral; 1410–1430
Atsunori Ogawa (NTT Communication Science Laboratories), Marc Delcroix (NTT Communication Science Laboratories), Shigeki Karita (NTT Communication Science Laboratories), Tomohiro Nakatani (NTT Corporation)

NN architectures for ASR [Thu-P-10-B], Thursday, 19 September, Gallery B

Joint Maximization Decoder with Neural Converters for Fully Neural Network-based Japanese Speech Recognition
Poster; 1330–1530
Takafumi Moriya (NTT Corporation), Jian Wang (The University of Tokyo), Tomohiro Tanaka (NTT Corporation), Ryo Masumura (NTT Corporation), Yusuke Shinohara (NTT Corporation), Yoshikazu Yamaguchi (NTT Corporation), Yushi Aono (NTT Corporation)

Speech and Audio Source Separation and Scene Analysis 3 [Thu-P-10-E], Thursday, 19 September, Hall 10/E

A Modified Algorithm for Multiple Input Spectrogram Inversion
Poster; 1330–1530
Dongxiao Wang (Tokyo Institute of Technology), Hirokazu Kameoka (NTT Communication Science Laboratories), Koichi Shinoda (Tokyo Institute of Technology)

Speech Enhancement: Multi-channel and Intelligibility [Thu-P-9-E], Thursday, 19 September, Hall 10/E

Predicting Speech Intelligibility of Enhanced Speech Using Phone Accuracy of DNN-based ASR Systems
Poster; 1000–1200
Kenichi Arai (NTT Communication Science Laboratories), Shoko Araki (NTT Communication Science Laboratories), Atsunori Ogawa (NTT Communication Science Laboratories), Keisuke Kinoshita (NTT), Tomohiro Nakatani (NTT Corporation), Katsuhiko Yamamoto (Wakayama University), Toshio Irino (Wakayama University)
