LLaVA-OneVision: Easy visual task transfer B Li, Y Zhang, D Guo, R Zhang, F Li, H Zhang, K Zhang, P Zhang, Y Li, ... arXiv preprint arXiv:2408.03326, 2024 | 386 | 2024 |
TinyLlama: An open-source small language model P Zhang, G Zeng, T Wang, W Lu arXiv preprint arXiv:2401.02385, 2024 | 345 | 2024 |
Long context transfer from language to vision P Zhang, K Zhang, B Li, G Zeng, J Yang, Y Zhang, Z Wang, H Tan, C Li, ... arXiv preprint arXiv:2406.16852, 2024 | 100 | 2024 |
LMMs-Eval: Reality check on the evaluation of large multimodal models K Zhang, B Li, P Zhang, F Pu, JA Cahyono, K Hu, S Liu, Y Zhang, J Yang, ... arXiv preprint arXiv:2407.12772, 2024 | 52* | 2024 |
OtterHD: A high-resolution multi-modality model B Li, P Zhang, J Yang, Y Zhang, F Pu, Z Liu arXiv preprint arXiv:2311.04219, 2023 | 52 | 2023 |
Better Few-Shot Relation Extraction with Label Prompt Dropout P Zhang, W Lu EMNLP 2022, 2022 | 28 | 2022 |
One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning G Zeng, P Zhang, W Lu ACL 2023, 2023 | 22 | 2023 |
Temporal reasoning transfer from text to video L Li, Y Liu, L Yao, P Zhang, C An, L Wang, X Sun, L Kong, Q Liu arXiv preprint arXiv:2410.06166, 2024 | 2 | 2024 |
EgoLife: Towards Egocentric Life Assistant J Yang, S Liu, H Guo, Y Dong, X Zhang, S Zhang, P Wang, Z Zhou, B Xie, ... arXiv preprint arXiv:2503.03803, 2025 | | 2025 |
Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile H Ding, D Li, R Su, P Zhang, Z Deng, I Stoica, H Zhang arXiv preprint arXiv:2502.06155, 2025 | | 2025 |
Fast Video Generation with Sliding Tile Attention P Zhang, Y Chen, R Su, H Ding, I Stoica, Z Liu, H Zhang arXiv preprint arXiv:2502.04507, 2025 | | 2025 |