A collection of research papers and resources on multimodal learning, generation, and editing models.
Surveys:
LLMs Meet Multimodal Generation and Editing: A Survey: https://arxiv.org/html/2405.19334v1
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities: https://arxiv.org/html/2505.02567v3
Deep Research:
https://docs.google.com/document/d/1mbB_-o_vVIn54ZsHaJyp3opIAh3Kz6clMr4t_uXDkMI/edit?usp=sharing
https://chatgpt.com/s/dr_68943a432e9881918bdc63ed1e5e25cb
https://www.kimi.com/share/d2a3kotf4396rncvlmvg
Article:
https://weaviate.io/blog/multimodal-models