SALMONN: Towards generic hearing abilities for large language models C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang arXiv preprint arXiv:2310.13289, 2023 | 173 | 2023 |
Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling W Li, SM Siniscalchi, NF Chen, CH Lee 2016 IEEE international conference on acoustics, speech and signal …, 2016 | 108 | 2016 |
LLaVA-NeXT-Interleave: Tackling multi-image, video, and 3D in large multimodal models F Li, R Zhang, H Zhang, Y Zhang, B Li, W Li, Z Ma, C Li arXiv preprint arXiv:2407.07895, 2024 | 55 | 2024 |
Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models. W Li, NF Chen, SM Siniscalchi, CH Lee Interspeech, 2759-2763, 2017 | 44 | 2017 |
Connecting speech encoder and large language model for ASR W Yu, C Tang, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 36 | 2024 |
Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees. W Li, K Li, SM Siniscalchi, NF Chen, CH Lee Interspeech 2016, 3127-3131, 2016 | 33 | 2016 |
Improving Mandarin tone recognition based on DNN by combining acoustic and articulatory features using extended recognition networks J Lin, W Li, Y Gao, Y Xie, NF Chen, SM Siniscalchi, J Zhang, CH Lee Journal of Signal Processing Systems 90, 1077-1087, 2018 | 30 | 2018 |
Improving mispronunciation detection of Mandarin tones for non-native learners with soft-target tone labels and BLSTM-based deep tone models W Li, NF Chen, SM Siniscalchi, CH Lee IEEE/ACM Transactions on Audio, Speech, and Language Processing 27 (12 …, 2019 | 26 | 2019 |
A cross-task transfer learning approach to adapting deep speech enhancement models to unseen background noise using paired senone classifiers S Wang, W Li, SM Siniscalchi, CH Lee ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 24 | 2020 |
A study on functional loads of phonetic contrasts under context based on mutual information of Chinese text and phonemes J Zhang, W Li, Y Hou, W Cao, Z Xiong 2010 7th International Symposium on Chinese Spoken Language Processing, 194-198, 2010 | 23 | 2010 |
Improving audio-visual speech recognition performance with cross-modal student-teacher training W Li, S Wang, M Lei, SM Siniscalchi, CH Lee ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 20 | 2019 |
Improving accent conversion with reference encoder and end-to-end text-to-speech W Li, B Tang, X Yin, Y Zhao, W Li, K Wang, H Huang, Y Wang, Z Ma arXiv preprint arXiv:2005.09271, 2020 | 13 | 2020 |
Improving non-native word-level pronunciation scoring with phone-level mixup data augmentation and multi-source information K Fu, S Gao, K Wang, W Li, X Tian, Z Ma arXiv preprint arXiv:2203.01826, 2022 | 10 | 2022 |
A transfer and multi-task learning based approach for MOS prediction X Tian, K Fu, S Gao, Y Gu, K Wang, W Li, Z Ma Proc. Interspeech 2022, 5438-5442, 2022 | 10 | 2022 |
Fine-grained audio-visual joint representations for multimodal large language models G Sun, W Yu, C Tang, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang arXiv preprint arXiv:2310.05863, 2023 | 9 | 2023 |
An ASR-free fluency scoring approach with self-supervised learning W Liu, K Fu, X Tian, S Shi, W Li, Z Ma, T Lee ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 9 | 2023 |
Improving Mandarin tone mispronunciation detection for non-native learners with soft-target tone labels and BLSTM-based deep models W Li, NF Chen, SM Siniscalchi, CH Lee 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 9 | 2018 |
Using Fluency Representation Learned from Sequential Raw Features for Improving Non-native Fluency Scoring. K Fu, S Gao, X Tian, W Li, Z Ma Interspeech, 4337-4341, 2022 | 8 | 2022 |
Using tone-based extended recognition network to detect non-native Mandarin tone mispronunciations W Li, SM Siniscalchi, NF Chen, CH Lee 2016 Asia-Pacific Signal and Information Processing Association Annual …, 2016 | 8 | 2016 |
video-SALMONN: Speech-enhanced audio-visual large language models G Sun, W Yu, C Tang, X Chen, T Tan, W Li, L Lu, Z Ma, Y Wang, C Zhang arXiv preprint arXiv:2406.15704, 2024 | 7 | 2024 |