| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Videochat: Chat-centric video understanding | KC Li, Y He, Y Wang, Y Li, W Wang, P Luo, Y Wang, L Wang, Y Qiao | arXiv preprint arXiv:2305.06355, 2023 | 200 | 2023 |
| Internvideo: General video foundation models via generative and discriminative learning | Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao, H Zhang, J Xu, Y Liu, Z Wang, ... | arXiv preprint arXiv:2212.03191, 2022 | 169 | 2022 |
| Videomae v2: Scaling video masked autoencoders with dual masking | L Wang, B Huang, Z Zhao, Z Tong, Y He, Y Wang, Y Wang, Y Qiao | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 149 | 2023 |
| Forgerynet: A versatile benchmark for comprehensive forgery analysis | Y He, B Gan, S Chen, Y Zhou, G Yin, L Song, L Sheng, J Shao, Z Liu | Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021 | 104 | 2021 |
| Uniformerv2: Spatiotemporal learning by arming image vits with video uniformer | K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao | arXiv preprint arXiv:2211.09552, 2022 | 71 | 2022 |
| Unmasked teacher: Towards training-efficient video foundation models | K Li, Y Wang, Y Li, Y Wang, Y He, L Wang, Y Qiao | Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 58 | 2023 |
| Lavie: High-quality video generation with cascaded latent diffusion models | Y Wang, X Chen, X Ma, S Zhou, Z Huang, Y Wang, C Yang, Y He, J Yu, ... | arXiv preprint arXiv:2309.15103, 2023 | 55 | 2023 |
| Internchat: Solving vision-centric tasks by interacting with chatbots beyond language | Z Liu, Y He, W Wang, W Wang, Y Wang, S Chen, Q Zhang, Y Yang, Q Li, ... | arXiv preprint arXiv:2305.05662, 2023 | 54 | 2023 |
| Internvid: A large-scale video-text dataset for multimodal understanding and generation | Y Wang, Y He, Y Li, K Li, J Yu, X Ma, X Li, G Chen, X Chen, Y Wang, C He, ... | arXiv preprint arXiv:2307.06942, 2023 | 52 | 2023 |
| Intern: A new learning paradigm towards general vision | J Shao, S Chen, Y Li, K Wang, Z Yin, Y He, J Teng, Q Sun, M Gao, J Liu, ... | arXiv preprint arXiv:2111.08687, 2021 | 31 | 2021 |
| Internvideo-ego4d: A pack of champion solutions to ego4d challenges | G Chen, S Xing, Z Chen, Y Wang, K Li, Y Li, Y Liu, J Wang, YD Zheng, ... | arXiv preprint arXiv:2211.09529, 2022 | 30 | 2022 |
| Mvbench: A comprehensive multi-modal video understanding benchmark | K Li, Y Wang, Y He, Y Li, Y Wang, Y Liu, Z Wang, J Xu, G Chen, P Luo, ... | arXiv preprint arXiv:2311.17005, 2023 | 18 | 2023 |
| Videomamba: State space model for efficient video understanding | K Li, X Li, Y Wang, Y He, Y Wang, L Wang, Y Qiao | arXiv preprint arXiv:2403.06977, 2024 | 17 | 2024 |
| Vbench: Comprehensive benchmark suite for video generative models | Z Huang, Y He, J Yu, F Zhang, C Si, Y Jiang, Y Zhang, T Wu, Q Jin, ... | arXiv preprint arXiv:2311.17982, 2023 | 13 | 2023 |
| Uniformerv2: Unlocking the potential of image vits for video understanding | K Li, Y Wang, Y He, Y Li, Y Wang, L Wang, Y Qiao | Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 10 | 2023 |
| Internvideo2: Scaling video foundation models for multimodal video understanding | Y Wang, K Li, X Li, J Yu, Y He, G Chen, B Pei, R Zheng, J Xu, Z Wang, ... | arXiv preprint arXiv:2403.15377, 2024 | 6 | 2024 |
| X-learner: Learning cross sources and tasks for universal visual representation | Y He, G Huang, S Chen, J Teng, K Wang, Z Yin, L Sheng, Z Liu, Y Qiao, ... | European Conference on Computer Vision, 509-528, 2022 | 6 | 2022 |
| From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities | C Lu, C Qian, G Zheng, H Fan, H Gao, J Zhang, J Shao, J Deng, J Fu, ... | arXiv preprint arXiv:2401.15071, 2024 | 3 | 2024 |
| ForgeryNet--Face Forgery Analysis Challenge 2021: Methods and Results | Y He, L Sheng, J Shao, Z Liu, Z Zou, Z Guo, S Jiang, C Sun, G Zhang, ... | arXiv preprint arXiv:2112.08325, 2021 | 2 | 2021 |
| Harvest Video Foundation Models via Efficient Post-Pretraining | Y Li, K Li, Y He, Y Wang, Y Wang, L Wang, Y Qiao, P Luo | arXiv preprint arXiv:2310.19554, 2023 | 1 | 2023 |