Wei Li
Bytedance
Verified email at bytedance.com
Title
Cited by
Year
SALMONN: Towards generic hearing abilities for large language models
C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
arXiv preprint arXiv:2310.13289, 2023
Cited by: 173 · Year: 2023
Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling
W Li, SM Siniscalchi, NF Chen, CH Lee
2016 IEEE international conference on acoustics, speech and signal …, 2016
Cited by: 108 · Year: 2016
LLaVA-NeXT-Interleave: Tackling multi-image, video, and 3D in large multimodal models
F Li, R Zhang, H Zhang, Y Zhang, B Li, W Li, Z Ma, C Li
arXiv preprint arXiv:2407.07895, 2024
Cited by: 55 · Year: 2024
Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models
W Li, NF Chen, SM Siniscalchi, CH Lee
Interspeech, 2759-2763, 2017
Cited by: 44 · Year: 2017
Connecting speech encoder and large language model for ASR
W Yu, C Tang, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 36 · Year: 2024
Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees
W Li, K Li, SM Siniscalchi, NF Chen, CH Lee
Interspeech 2016, 3127-3131, 2016
Cited by: 33 · Year: 2016
Improving Mandarin tone recognition based on DNN by combining acoustic and articulatory features using extended recognition networks
J Lin, W Li, Y Gao, Y Xie, NF Chen, SM Siniscalchi, J Zhang, CH Lee
Journal of Signal Processing Systems 90, 1077-1087, 2018
Cited by: 30 · Year: 2018
Improving mispronunciation detection of Mandarin tones for non-native learners with soft-target tone labels and BLSTM-based deep tone models
W Li, NF Chen, SM Siniscalchi, CH Lee
IEEE/ACM Transactions on Audio, Speech, and Language Processing 27 (12 …, 2019
Cited by: 26 · Year: 2019
A cross-task transfer learning approach to adapting deep speech enhancement models to unseen background noise using paired senone classifiers
S Wang, W Li, SM Siniscalchi, CH Lee
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 24 · Year: 2020
A study on functional loads of phonetic contrasts under context based on mutual information of Chinese text and phonemes
J Zhang, W Li, Y Hou, W Cao, Z Xiong
2010 7th International Symposium on Chinese Spoken Language Processing, 194-198, 2010
Cited by: 23 · Year: 2010
Improving audio-visual speech recognition performance with cross-modal student-teacher training
W Li, S Wang, M Lei, SM Siniscalchi, CH Lee
ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019
Cited by: 20 · Year: 2019
Improving accent conversion with reference encoder and end-to-end text-to-speech
W Li, B Tang, X Yin, Y Zhao, W Li, K Wang, H Huang, Y Wang, Z Ma
arXiv preprint arXiv:2005.09271, 2020
Cited by: 13 · Year: 2020
Improving non-native word-level pronunciation scoring with phone-level mixup data augmentation and multi-source information
K Fu, S Gao, K Wang, W Li, X Tian, Z Ma
arXiv preprint arXiv:2203.01826, 2022
Cited by: 10 · Year: 2022
A transfer and multi-task learning based approach for MOS prediction
X Tian, K Fu, S Gao, Y Gu, K Wang, W Li, Z Ma
Proc. Interspeech 2022, 5438-5442, 2022
Cited by: 10 · Year: 2022
Fine-grained audio-visual joint representations for multimodal large language models
G Sun, W Yu, C Tang, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang
arXiv preprint arXiv:2310.05863, 2023
Cited by: 9 · Year: 2023
An ASR-free fluency scoring approach with self-supervised learning
W Liu, K Fu, X Tian, S Shi, W Li, Z Ma, T Lee
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by: 9 · Year: 2023
Improving Mandarin tone mispronunciation detection for non-native learners with soft-target tone labels and BLSTM-based deep models
W Li, NF Chen, SM Siniscalchi, CH Lee
2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018
Cited by: 9 · Year: 2018
Using Fluency Representation Learned from Sequential Raw Features for Improving Non-native Fluency Scoring
K Fu, S Gao, X Tian, W Li, Z Ma
INTERSPEECH, 4337-4341, 2022
Cited by: 8 · Year: 2022
Using tone-based extended recognition network to detect non-native Mandarin tone mispronunciations
W Li, SM Siniscalchi, NF Chen, CH Lee
2016 Asia-Pacific Signal and Information Processing Association Annual …, 2016
Cited by: 8 · Year: 2016
video-SALMONN: Speech-enhanced audio-visual large language models
G Sun, W Yu, C Tang, X Chen, T Tan, W Li, L Lu, Z Ma, Y Wang, C Zhang
arXiv preprint arXiv:2406.15704, 2024
Cited by: 7 · Year: 2024