From Images to Markdown
I previously wrote about building a mind map app using vibe coding. But why did I build a mind map app in the first place? One reason is simply that I have been a long-time user of mind maps. The other reason is that I wanted to effectively leverage the mind maps generated by NotebookLM.
However, there was a critical limitation: NotebookLM could only export mind maps as static images. This severely limited their utility (even though I could arguably rebuild them using Nano Banana).
This is where vibe coding and Google AI Studio came into play. I built an app that uses OCR to scan the mind map images and convert them into Markdown.
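As a rough illustration of the final step (not the app's actual code), here is a minimal sketch of turning a recognized mind map into Markdown. It assumes the OCR stage has already produced a tree of labels; the `Node` class and `to_markdown` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One mind map node: its label and child branches (hypothetical structure)."""
    text: str
    children: list["Node"] = field(default_factory=list)


def to_markdown(root: Node) -> str:
    """Render the root as a heading and every descendant as a nested bullet."""
    lines = [f"# {root.text}", ""]

    def walk(node: Node, depth: int) -> None:
        # Two spaces of indentation per level below the top-level branches.
        lines.append(f"{'  ' * depth}- {node.text}")
        for child in node.children:
            walk(child, depth + 1)

    for branch in root.children:
        walk(branch, 0)
    return "\n".join(lines)


if __name__ == "__main__":
    demo = Node("Vibe Coding", [Node("Tools", [Node("AI Studio")]), Node("Ideas")])
    print(to_markdown(demo))
```

A nested list is a convenient target because most Markdown-based mind map tools can read the hierarchy back from the indentation.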

This worked remarkably well, and it really opened my eyes to the potential of “vibe coding.” If a feature is missing, I can just build it myself.

It triggered a “Cambrian explosion” of ideas — so much so that I was actually losing sleep. Eventually, though, this wasn’t enough; I found myself craving cleaner, more visually appealing mind maps. That is precisely why I decided to build the mind map application itself.
From Video to PowerPoint — SlideRemix
NotebookLM can now analyze sources, generate slides, and automatically create presentation videos complete with audio. When I first tried this feature, I was deeply impressed.
However, as I generated more of them, certain issues began to surface. Just like the mind maps, these videos cannot be edited directly. If there is an error in the audio or slides, even a small one, fixing it requires a full video editing workflow. Ideally, the system would export to an editable PowerPoint format, but NotebookLM lacks this functionality.
If a feature is missing, build it yourself. So, naturally, I built this using vibe coding as well.
Initially, I was extracting images from the video by hand and using OCR to convert them into PowerPoint slides. However, I soon realized I could just feed the raw video file directly into the app.
The result is SlideRemix: a tool designed to reconstruct and edit slide decks from videos, images, and PDFs.

Key Features
The tool includes the following four primary features:
MP4 Video Import
- Automatically extracts slides from presentation video files.
- You can select specific slides to process or download them directly as images.
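How SlideRemix detects slide boundaries is not described here, so the following is only one plausible approach: compare consecutive frames and treat a large change as a new slide. It assumes each frame has already been decoded and reduced to a flat list of grayscale values; `mean_abs_diff`, `find_slide_starts`, and the `threshold` default are hypothetical.

```python
def mean_abs_diff(a: list[int], b: list[int]) -> float:
    """Average per-pixel difference between two equally sized grayscale frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_slide_starts(frames: list[list[int]], threshold: float = 20.0) -> list[int]:
    """Return indices of frames where a new slide appears to start.

    A frame counts as a new slide when it differs from the previous frame
    by more than `threshold` (an assumed value that would need tuning).
    """
    if not frames:
        return []
    starts = [0]  # the first frame always opens the first slide
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            starts.append(i)
    return starts


if __name__ == "__main__":
    # Three identical dark frames, two bright ones, then one mid-gray frame.
    frames = [[0] * 4] * 3 + [[200] * 4] * 2 + [[90] * 4]
    print(find_slide_starts(frames))
```

A real video would need extra care with fades and animations, which produce many small frame-to-frame changes rather than one sharp cut.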
Audio Transcription
- As an optional feature, it can extract and recognize audio for each slide.
- It automatically transcribes the speech content and adds it to the speaker notes.
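One way to attach speech to the right slide, sketched under assumptions: given the time each slide first appears and a list of timestamped transcript segments (both assumed to come from earlier steps), each segment goes to the last slide that started at or before it. The function name `notes_per_slide` and the data shapes are hypothetical, not taken from SlideRemix.

```python
from bisect import bisect_right


def notes_per_slide(slide_starts: list[float],
                    segments: list[tuple[float, str]]) -> list[str]:
    """Group transcript segments into one speaker-notes string per slide.

    slide_starts: ascending start time of each slide, in seconds.
    segments: (start time in seconds, text) pairs from a transcription step.
    """
    notes: list[list[str]] = [[] for _ in slide_starts]
    for start, text in segments:
        # Index of the last slide that began at or before this segment.
        idx = bisect_right(slide_starts, start) - 1
        if idx >= 0:  # skip speech that precedes the first slide
            notes[idx].append(text)
    return [" ".join(parts) for parts in notes]


if __name__ == "__main__":
    starts = [0.0, 12.5, 30.0]
    segments = [(0.5, "Welcome."), (6.0, "Today's topic."),
                (12.5, "First point."), (31.0, "Wrap up.")]
    for number, note in enumerate(notes_per_slide(starts, segments), start=1):
        print(number, note)
```

The resulting strings could then be written into each slide's notes field by whatever library produces the PowerPoint file.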
Smart Font Detection
- Automatically identifies and analyzes the font family, size, weight, and color of text within images to match the original design.
Magic Inpainting
- Cleanly removes text from images while preserving background diagrams and charts, then regenerates the removed text as editable text boxes. This is extremely useful for tasks like translation, where the text needs to be replaced.
Supported File Formats
- Images (PNG, JPG)
- PDF Files
- MP4 Videos
The following video was created leveraging this workflow.
This video also incorporates slide images generated by SlideRemix. However, creating it involved a rather complex process. I first used NotebookLM to generate a script for the summary video. Even when the overview video was based on that script, there were issues such as the audio exaggerating facts, so I ultimately decided to have the script read aloud using Google AI Studio’s speech synthesis. I then used video editing software to combine that narration with separately generated images and music created with Suno. If you are not satisfied with NotebookLM videos, you may end up having to create everything separately after all. I will cover the details in a separate article.