Paper presentation and discussion session on multi-modal AI, organized by Stockholm AI and Silo AI.
This session was moderated by Pier Luigi Dovesi and I was the main speaker.
I opened the presentation with an introduction to multi-modal models built on the transformer architecture, then took a deep dive into two recent papers:
- CogVLM: Visual Expert for Pretrained Language Models (https://lnkd.in/dPaB946G)
- mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration (https://lnkd.in/dGiGPrmP)
A Q&A session followed the presentation.