
In short: ruby (furigana) is important, so I want Google Docs and NotebookLM to support it!
Google Docs, which is used for uploading sources to NotebookLM, supports neither furigana (ruby) nor vertical writing. When uploading a Word document that contains furigana, you therefore need to take a step such as converting it to plain text beforehand. You can also upload the Word document as is, but the ruby characters will not be processed correctly.
Ruby characters indicate the correct pronunciation (reading) of Japanese words. They are usually displayed as small characters above the word in horizontal text, or to its right in vertical text.
Example: 私の名前は八島游舷です。
When this is rendered with ruby, small hiragana appear above the kanji words.
This feature is widely used in educational materials for learners of Japanese, in books for children with limited kanji knowledge, and in creative works such as novels, which makes it extremely important when dealing with Japanese text.
For more details on ruby characters: https://en.wikipedia.org/wiki/Ruby_character
“Aozora Bunko” is a volunteer-based website that publishes classical Japanese literature, and its texts frequently include ruby characters.
Ruby support in Google Docs has been awaited for many years, but it should also be possible to implement it separately in NotebookLM. One of the simplest ways would be to support the “Aozora Bunko format,” in which the reading is enclosed in 《 》 immediately after the word it applies to.
Example of ruby notation in Aozora Bunko format:
Reference: Aozora Bunko, Wikipedia (ja.wikipedia.org)
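To make the idea concrete, here is a minimal Python sketch of how text in this notation could be read. It is only an illustration, not how Aozora Bunko or NotebookLM actually processes text. It assumes the two common forms of the notation: a run of kanji directly followed by 《reading》, and an explicit base marked with a full-width bar, as in ｜base《reading》. The names `KANJI`, `RUBY`, `parse_ruby` and `to_html` are made up for this example, and the kanji range is a simplification.

```python
import re

# Simplified assumption: "kanji" means the basic CJK block plus the repeat mark 々.
KANJI = r"[\u4E00-\u9FFF々]"

RUBY = re.compile(
    r"｜([^｜《》]+)《([^《》]+)》"      # explicit base: ｜base《reading》
    r"|(" + KANJI + r"+)《([^《》]+)》"  # implicit base: kanji run right before 《
)


def parse_ruby(text):
    """Return a list of (base, reading) pairs found in the text."""
    pairs = []
    for m in RUBY.finditer(text):
        base = m.group(1) or m.group(3)
        reading = m.group(2) or m.group(4)
        pairs.append((base, reading))
    return pairs


def to_html(text):
    """Rewrite the notation as HTML <ruby> markup."""
    def repl(m):
        base = m.group(1) or m.group(3)
        reading = m.group(2) or m.group(4)
        return f"<ruby>{base}<rt>{reading}</rt></ruby>"
    return RUBY.sub(repl, text)


sample = "これは鏡像《きょうぞう》です"
print(parse_ruby(sample))  # [('鏡像', 'きょうぞう')]
print(to_html(sample))     # これは<ruby>鏡像<rt>きょうぞう</rt></ruby>です
```

The full-width bar is needed when the base is not a plain run of kanji, for example ｜取り扱い《とりあつかい》.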
Supporting ruby characters would greatly contribute to NotebookLM in two main areas:
- Output Quality: Ruby characters indicate the correct readings, which improves the quality of audio and video output. For example, when I had NotebookLM generate a video explaining my short story “D-World,” it repeatedly read 鏡像 (mirror image) as “kagamizou,” and I could not correct this even by giving NotebookLM the reading in a glossary.
- Translation Accuracy: Ruby characters are also important for accurate translation from Japanese to other languages. This is because, besides names of people and places, there are many Japanese words that can have multiple readings, and the reading cannot always be determined from the kanji alone.
Why would supporting Aozora Bunko’s ruby characters be beneficial? Because bringing this wealth of Japanese literature into NotebookLM would make literary research and translation into other languages easier!
Unfortunately, the translation of Japanese literature has been slow, and excellent works remain buried. Literary translation is not easy, and it cannot be completed by NotebookLM alone. However, it’s much better than nothing. For example, you could have NotebookLM analyze Kenji Miyazawa’s works and generate illustrations based on the descriptions… though that would probably be better done by a human.
Perhaps a project in which Aozora Bunko works are first drafted by AI and then translated one after another by human translators should be prioritized over the Cool Japan budget?
I will write more on that later.
Currently, it is necessary to upload two versions of the same text to NotebookLM: one “with ruby” and one “without ruby.” If ruby characters were supported in both Google Docs and NotebookLM, this workflow could be significantly streamlined.
Full ruby support is complex, because it has to track which reading corresponds to which kanji, but a simple format would be sufficient if its only purpose is to tell the AI (NotebookLM) the reading.
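Until such support exists, the two versions could be generated from a single source file. The following Python sketch is one possible way to do that, assuming the source uses Aozora Bunko style notation (kanji《reading》 or ｜base《reading》). The function names and the parenthesized output format are my own assumptions for illustration. Nothing here is an existing NotebookLM or Google Docs feature.

```python
import re

# Simplified assumption: "kanji" means the basic CJK block plus the repeat mark 々.
KANJI = r"[\u4E00-\u9FFF々]"

RUBY = re.compile(
    r"｜([^｜《》]+)《([^《》]+)》"      # explicit base: ｜base《reading》
    r"|(" + KANJI + r"+)《([^《》]+)》"  # implicit base: kanji run right before 《
)


def _parts(m):
    """Return (base, reading) for a match of either form."""
    return (m.group(1) or m.group(3), m.group(2) or m.group(4))


def without_ruby(text):
    """The "without ruby" version: keep only the base text."""
    return RUBY.sub(lambda m: _parts(m)[0], text)


def with_inline_readings(text):
    """A plain-text "with ruby" version: base followed by (reading)."""
    return RUBY.sub(lambda m: "{}({})".format(*_parts(m)), text)


def reading_glossary(text):
    """Collect base -> reading pairs, keeping the first reading seen."""
    glossary = {}
    for m in RUBY.finditer(text):
        base, reading = _parts(m)
        glossary.setdefault(base, reading)
    return glossary


source = "鏡像《きょうぞう》の世界"
print(without_ruby(source))          # 鏡像の世界
print(with_inline_readings(source))  # 鏡像(きょうぞう)の世界
print(reading_glossary(source))      # {'鏡像': 'きょうぞう'}
```

Whether NotebookLM would actually respect readings written in parentheses is untested. The point is only that a flat base-plus-reading format is easy to produce.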