Paper presentation and discussion session on multi-modal AI, organized by Stockholm AI and Silo AI. The session was moderated by Pier Luigi Dovesi, and I was the main speaker. I opened with an introduction to multi-modal models built on the transformer architecture, then took a deep dive into two recent papers:
- CogVLM: Visual Expert for Pretrained Language Models (https://lnkd.in/dPaB946G)
- mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration (https://lnkd.in/dGiGPrmP)
A Q&A session followed the presentation.