Labo

お客様から相談を受けたり、自分自身が「あったらな」と必要に迫られて作成したツールなどをご紹介します。

Here I introduce tools I have built, either in response to customer requests or because I needed them myself and wished they existed.

以下の北風と太陽（英語）をShort Movie Makerに渡して完全自動で制作。さきほど自動制作したばかりで、チューニングしていないので荒削りですが⋯

This short movie was generated fully automatically by passing the English version of The North Wind and the Sun (linked below) to Short Movie Maker. I generated it only moments ago and have not tuned it yet, so it is still rough around the edges.

https://en.wikisource.org/wiki/The_North_Wind_and_the_Sun
"The North Wind and the Sun", translated by V. S. Vernon Jones, illustrated by Arthur Rackham, in Æsop's Fables: A New Translation (1912)

https://en.wikisource.org/wiki/%C3%86sop%27s_Fables_(V._S._Vernon-Jones)/The_North_Wind_and_the_Sun

動画生成SaaS全盛の今、
私が「Short Movie Maker」をフルスクラッチで作った理由

Short Movie Maker:
A Proof of Concept for How AI Video Tools Should Work

最近の動画生成AI、ほんとにすごいですよね。Invideo AI、Runway Gen-3、HeyGenなど、誰もがクリエイターになれる時代になりました。

でも、いざビジネスの最前線でガッツリ使おうとすると、ちょっとモヤモヤしませんか？

これまでの動画制作って、台本書きから音声収録、素材集め、そして動画編集ソフトでの緻密なタイムライン同期（音声と画像・字幕のタイミング合わせ）と、膨大な時間と専門スキルが必要でした。それをどうにかしたくて、私自身、世間で騒がれている「簡易的な動画生成ツール」をいくつも試してきたんですが……どれも期待外れでした。

「完全おまかせで作れるけど、完成後の修正ができない（できても死ぬほど面倒）」「日本語の音声がイマイチで、読み間違いを直せない」

「あともう少しテンポ上げたいのに微調整が効かない！」

結局、プラットフォームの仕様に縛られて、痒いところに手が届かない。色々な課題が放置されたままなのに、毎月のサブスク代だけはしっかり取られる。

だから、自分でフルスクラッチで作っちゃいました。『Short Movie Maker』。

動画の元ネタ（テキストでもPDFでも画像でも）をポンと渡すだけで、AIが「ナレーション」「シーン画像」「字幕」を自動生成。動画編集ソフトなんて一切開かずに、コマンド一発で「字幕あり動画」「字幕なし動画」「SRTデータ」のセットが完成するツールです。

誤解しないでほしいんですが、「既存のSaaSより便利なツールを作りました！」とドヤ顔でアピールしたいわけじゃありません。このツールは、私の頭の中にある技術哲学や「サービスってこうあるべきだよね」という少し偏執的な視点を世に問うための、言わば壮大な実験（Proof of Concept）なんです。

私が提供しているシステム開発やビジネスコンサルティングも、根底の思想はすべて同じ。このツールに込めた、私の「4つのこだわり」を聞いてください。

1. 「どう作るか」の前に「何を作るべきか」から考える

テキストを入れたら動画になるツールは、もう珍しくありません。私がやりたかったのは、GUI上に「AIプランナー」を住まわせることでした。 PDFや画像を投げれば、AIが勝手に構成案やテンポを複数提案してくれる。抽象的な要件を即座に具体的なアイデアに落とし込む。この「ゼロから価値を生み出す」アプローチが、私のサービス企画の基本スタンスです。

2. 「おまかせ」が奪う”人間くささ”への執着

既存ツールの「完全おまかせ」は、どこか冷たく単調になりがちです。だから私は、プロの編集者がこだわるパラメータをあえてシステム化しました。ミリ秒単位のフェード調整、音声スピードのランダムな「揺らぎ（グルーヴ感）」、妥協できなかった日本語の「ルビ自動付与」。さらに、TVコマーシャルのような激しい緩急をつける「CMモード」や、ElevenLabsの多彩なVoice指定まで組み込んでいます。もし自動生成された動画が気に入らなければ、特定のナレーションや画像、ストーリー展開、再生速度だけをサクッと簡単に修正できる。ただ効率化するだけでなく、人間の感情に寄り添う視聴体験（UX）を徹底的にハックしたかったんです。

3. クラウドとローカルの「いいとこ取り」でコストの壁を壊す

SaaSのネックである「重い処理待ち」と「高額なコスト」。これを打破すべく、計算資源の配置を見直しました。 AIの推論は強力なクラウドAPIに任せ、一番重いエンコードや合成処理はローカルのマルチスレッドにぶん投げる。結果、煩雑な作業工程を大幅に短縮し、高品質な動画がわずか数分、数十円（API実費のみ）で作れる仕組みが完成しました。利便性とROIを極限まで両立させる、私なりの最適解です。

4. ひとつのプラットフォームと心中しない身軽さ

特定のSaaSへの依存は、変化が激しいAI業界ではリスクでしかありません。だからこのシステムは、テキスト・画像・音声それぞれで「今、一番いいモデル」をAPI連携させる疎結合な作りにしています。先日中島聡さんが「Mulmo Claude」を発表されましたが、この「Short Movie Maker」に通じる何かがあるなと感じて驚きました。常に最新トレンドを組み込んでいく身軽さが私の武器なのかもしれないです。例えば、最新のマルチモーダルモデル（gemini-3-pro-image-previewなど）を指定してInterleavedモードを活用すれば、シーン間の整合性がバッチリ担保された動画だってすぐに作れます。

「次世代の動画生成」が変えるビジネスの現場

クリエイターの負担を最小限に抑えつつ、AIの力でアイデアを即座に形にする。テキストや資料を用意するだけで、誰でも直感的に高品質なショート動画を量産できるこのツールは、以下のようなシーンで強力な武器になります。

SNSマーケティング: YouTube Shorts、TikTok、Instagram Reels向けコンテンツの高速生成

広告クリエイティブ: CMモードと多様なボイスを活かした、プロモーション動画の大量制作・テスト

教育・研修コンテンツ: テキストベースのマニュアルや解説資料を、より分かりやすい動画へサクッと変換

30歳の時に描いたロードマップと、今「AI」を手にして思うこと

少し昔話をさせてください。

私はシステムエンジニアとしてのキャリアが長いですが、30歳の時に密かに立てた目標がありました。それは以下の3ステップで、自分の知見を複利で増やすこと。

1. システムエンジニアとして泥臭く圧倒的な経験を積む。

2. 特定業界からの学びを得て、現場のリアルな課題や商習慣を深く理解する。

3.「ITの力」と「特定ドメインの深い知見」を掛け合わせ、自らビジネスを展開する。

私はこのステップを順に踏み、システム屋と事業者の「両方の視点」を蓄積してきました。そして今、まさに「3」のフェーズをドライブさせようという絶好のタイミングで、「生成AI」という強烈なパワーが手に入ってしまったんです。

これ、控えめに言って最高にワクワクする時代に入ったと思いませんか？

ツールは「思考の実験」に過ぎない

AIが勝手にコードを書いてくれる時代。言われたものを作るだけの「実行能力」は、恐ろしいスピードでコモディティ化していくはずです。

こんな時代だからこそ、ピーター・ティールが名著『Zero to One』で語った哲学が、より重みを持つと私は思っています。

AIが既存のものを「1からNへ」と爆速でスケールさせる世界。そこで人間に求められるのは、前提を疑い、誰も思いつかない角度から「0から1を生み出す」哲学。そして、常識の枠を超える探求心（良い意味での変態性）だと信じています。

この「Short Movie Maker」は、私の脳内にある知見を、たまたま動画生成という領域で出力してみた結果に過ぎません。私が手掛ける他のコンサルティングやプロジェクトも、全く同じ熱量と変態性で動いています。

私は「ただ言われたシステムを作るだけのエンジニア」ではありません。

この偏執的でちょっとクレイジーなアプローチに共感し、「こいつと組んだら面白い未来が創れそうだ」と感じていただけたなら。世界中で新たなパートナーを探している方、日本市場でこれまでにないビジネスを仕掛けたい皆様からの、常識を越えた熱いご相談をお待ちしています。

Recent advances in AI video generation are remarkable. With tools like InVideo AI, Runway Gen-3, and HeyGen, we have entered an era in which almost anyone can become a creator.

But when it comes to using these tools in real business environments, the experience is often less impressive.

Traditional video production has always required significant time and specialized skill: writing scripts, recording narration, gathering assets, and then carefully synchronizing audio, visuals, and subtitles in an editing timeline. Wanting to simplify that process, I personally tested many of the “easy” AI video tools that have attracted so much attention in the market. In practice, however, none of them gave me the level of control or reliability I needed.

Some tools could generate a video end to end, but made post-generation editing difficult or painfully cumbersome. Others produced unnatural Japanese narration and offered no practical way to fix misreadings. And in many cases, even simple refinements—such as slightly adjusting the pacing—were surprisingly hard to make.

In the end, I kept running into the same issue: platform limitations. Important problems remained unresolved, yet the subscription fees remained very real.

So I built my own solution from scratch: Short Movie Maker.

Simply provide source material—text, PDFs, or images—and the system automatically generates narration, scene visuals, and subtitles. Without ever opening a video editor, a single command produces a complete output set: a subtitled video, a clean version without subtitles, and the corresponding SRT subtitle data.
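As a rough illustration of the SRT part of that output set, here is a minimal Python sketch. It is not the actual Short Movie Maker code: the `Segment` shape, the `build_srt` name, and the `gap_ms` parameter are assumptions made for this example. It only shows how narration clip lengths can be turned into numbered subtitle cues in the standard SRT timestamp format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One narration line and the length of its audio clip (hypothetical shape)."""
    text: str
    duration_ms: int


def format_timestamp(ms: int) -> str:
    """Render milliseconds as an SRT timestamp: HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(segments: list[Segment], gap_ms: int = 0) -> str:
    """Lay segments end to end and emit numbered SRT cues.

    gap_ms is an assumed pause inserted between consecutive narration clips.
    """
    cues = []
    cursor = 0  # running position on the timeline, in milliseconds
    for index, seg in enumerate(segments, start=1):
        start, end = cursor, cursor + seg.duration_ms
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{seg.text}\n"
        )
        cursor = end + gap_ms
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)
```

Because each cue's start time comes from the measured length of the audio before it, the subtitles stay in sync with the narration without any manual timeline work.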

To be clear, this project is not about claiming that I built a tool that is simply “better than existing SaaS.” Short Movie Maker is better understood as an ambitious proof of concept: a way of putting my technical philosophy and my somewhat obsessive view of how software should serve people into a concrete form.

The same thinking underlies the system development and business consulting work I provide. This tool reflects four principles that define that broader approach.

1. Start with what should be built, not just how to build it

Tools that turn text into video are no longer unusual. What interested me was something more fundamental: building an AI planner directly into the interface.

When a user provides a PDF or image, the system does not just “generate a video.” It can also propose multiple narrative structures, pacing options, and creative directions. In other words, it translates abstract intent into concrete ideas.

That approach—creating value from ambiguity, not just executing clearly defined instructions—is central to how I think about service design.

2. Preserve the human element that full automation often strips away

Many existing tools aim for a fully automated workflow, but the results often feel flat, cold, and mechanically uniform. I wanted to address that directly.

That is why I deliberately systematized the kinds of parameters professional editors actually care about: millisecond-level fade control, subtle variation in narration speed so delivery feels less mechanical, and automatic furigana support for Japanese text—something I considered essential rather than optional.

I also built in a CM Mode to create the sharper rhythm and dynamic contrast associated with television commercials, along with flexible voice selection through ElevenLabs.

Just as important, if the generated output is not right, it is easy to revise only the parts that matter: a specific narration segment, an image, the way the story develops, or the playback speed. My goal was not simply to automate production. It was to design a viewing experience that still feels intentional, expressive, and human.
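Two of the parameters mentioned above, the variation in narration speed and the millisecond-level fades, can be sketched in a few lines of Python. This is an illustrative sketch under my own assumptions, not the tool's implementation: the function names, the uniform distribution, the default `spread` of 5%, and the linear fade shape are all hypothetical choices.

```python
import random
from typing import Optional


def jittered_speeds(count: int, base: float = 1.0, spread: float = 0.05,
                    seed: Optional[int] = None) -> list[float]:
    """Return one playback-speed factor per narration segment.

    Each factor is drawn uniformly from [base - spread, base + spread].
    Passing a seed makes the variation reproducible between runs.
    """
    if count < 0 or spread < 0 or base - spread <= 0:
        raise ValueError("count and spread must be non-negative and speeds positive")
    rng = random.Random(seed)
    return [rng.uniform(base - spread, base + spread) for _ in range(count)]


def fade_gain(t_ms: int, clip_ms: int, fade_in_ms: int, fade_out_ms: int) -> float:
    """Linear fade envelope: gain between 0.0 and 1.0 at time t_ms within a clip."""
    if not 0 <= t_ms <= clip_ms:
        raise ValueError("t_ms must lie within the clip")
    gain_in = 1.0 if fade_in_ms == 0 else min(1.0, t_ms / fade_in_ms)
    gain_out = 1.0 if fade_out_ms == 0 else min(1.0, (clip_ms - t_ms) / fade_out_ms)
    return min(gain_in, gain_out)
```

Keeping the randomness behind a seed is one way to reconcile "human" variation with repeatable builds: the same seed reproduces the same delivery, and a new seed gives a fresh take.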

3. Use a hybrid cloud-and-local architecture to break the cost barrier

Two of the biggest weaknesses of SaaS video tools are long waits for heavy processing and high recurring costs. To address both, I rethought where each part of the workload should run.

AI inference is handled through powerful cloud APIs, while the heaviest encoding and compositing tasks are offloaded to a multithreaded local environment. That architectural choice dramatically reduces processing time while keeping output quality high.

The result is a workflow that can produce polished videos in just a few minutes, at a cost of only tens of yen per run (API usage fees only). For me, this is not just an engineering optimization. It is a practical way to maximize both usability and return on investment.
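The local half of that split could look something like the following Python sketch. It is an assumption-laden illustration rather than the tool's real pipeline: I assume `ffmpeg` is installed and on the PATH, that each scene is one still image plus one narration file, and the names `scene_command`, `run_ffmpeg`, and `encode_scenes` are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Sequence

Scene = tuple[Path, Path, Path]  # (image, narration audio, output clip)


def scene_command(image: Path, audio: Path, output: Path) -> list[str]:
    """Build an ffmpeg command that holds one still image for the length of its narration."""
    return [
        "ffmpeg", "-y",                     # overwrite existing output
        "-loop", "1", "-i", str(image),     # repeat the still image as video frames
        "-i", str(audio),
        "-c:v", "libx264", "-tune", "stillimage", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",                        # stop when the audio ends
        str(output),
    ]


def run_ffmpeg(command: Sequence[str]) -> None:
    """Run one encode job; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(command, check=True, capture_output=True)


def encode_scenes(scenes: Sequence[Scene], workers: int = 4,
                  runner: Callable[[Sequence[str]], None] = run_ffmpeg) -> list[Path]:
    """Encode every scene clip in parallel on the local machine."""
    commands = [scene_command(image, audio, output) for image, audio, output in scenes]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all jobs and re-raises the first failure, if any.
        list(pool.map(runner, commands))
    return [output for _, _, output in scenes]
```

Threads are enough here because each worker just waits on an external ffmpeg process, so the encoders themselves run in parallel across CPU cores. The injectable `runner` also makes it possible to dry-run the plan without invoking ffmpeg at all.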

4. Stay agile instead of becoming dependent on a single platform

In a fast-moving AI industry, deep dependence on any single SaaS platform is a strategic risk. That is why this system is built as a loosely coupled architecture, allowing the best available model for text, image, and voice generation to be integrated through APIs as needed.

When Satoshi Nakajima recently announced Mulmo Claude, I was surprised to sense something in it that resonates with Short Movie Maker. Perhaps my real strength is the agility to keep folding the latest developments into working systems.

For example, by specifying a recent multimodal model (such as gemini-3-pro-image-preview) and using its Interleaved mode, it becomes possible to generate videos with much stronger visual consistency from scene to scene. The architecture is designed to evolve with the model landscape rather than be constrained by it.
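One common way to get this kind of loose coupling is to hide each model behind a small interface, so that swapping a provider touches a single line. The Python sketch below illustrates the idea only; the `TextModel`, `ImageModel`, and `VoiceModel` interfaces, the `Pipeline` class, and the stand-in providers are all hypothetical and make no real API calls.

```python
from dataclasses import dataclass
from typing import Protocol


class TextModel(Protocol):
    def write_script(self, source: str) -> list[str]: ...


class ImageModel(Protocol):
    def render(self, prompt: str) -> bytes: ...


class VoiceModel(Protocol):
    def speak(self, line: str) -> bytes: ...


@dataclass
class Pipeline:
    """Depends only on the three interfaces, never on a concrete vendor."""
    text: TextModel
    image: ImageModel
    voice: VoiceModel

    def run(self, source: str) -> list[tuple[str, bytes, bytes]]:
        """Return (narration line, image bytes, audio bytes) for each scene."""
        return [
            (line, self.image.render(line), self.voice.speak(line))
            for line in self.text.write_script(source)
        ]


# Stand-in providers for illustration; real ones would wrap cloud API clients.
class SentenceSplitter:
    def write_script(self, source: str) -> list[str]:
        return [part.strip() for part in source.split(".") if part.strip()]


class PlaceholderImage:
    def render(self, prompt: str) -> bytes:
        return b"image:" + prompt.encode()


class PlaceholderVoice:
    def speak(self, line: str) -> bytes:
        return b"audio:" + line.encode()
```

With this shape, moving to a newer image or voice model means writing one new adapter class and passing it to `Pipeline`; the rest of the system does not change.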

How next-generation video generation changes real business work

The value of this tool is simple: it minimizes the production burden on creators while allowing ideas to take shape immediately through AI.

With nothing more than text or source material, users can quickly produce high-quality short-form videos at scale. That makes the system especially useful in areas such as:

Social media marketing: rapid production of content for YouTube Shorts, TikTok, and Instagram Reels

Advertising creative: large-scale production and testing of promotional videos using CM Mode and diverse voice options

Education and training: fast conversion of manuals, documents, and explanatory materials into more accessible video content

The roadmap I drew at 30—and what it means now that I have AI

A brief personal note.

I have spent many years working as a systems engineer. When I turned 30, I quietly set a long-term roadmap for myself: a three-step process for compounding knowledge and capability over time.

1. Build overwhelming hands-on experience as a systems engineer.

2. Learn deeply from a specific industry so I can understand real operational problems and commercial practices firsthand.

3. Combine the power of IT with deep domain expertise to build businesses of my own.

I followed that path step by step, accumulating both perspectives: the builder’s perspective and the operator’s perspective.

And now, just as I have reached the stage where I can fully activate that third step, I also have access to the force multiplier of generative AI.

That is why I believe we are entering an extraordinary period. The timing could hardly be more interesting.

Tools are only the surface; the real point is the thinking behind them

We now live in a world where AI can generate working code on demand. In that environment, the ability to merely execute instructions—to build what one is told to build—will become commoditized at an extraordinary pace.

That is precisely why I believe the philosophy Peter Thiel described in Zero to One matters even more now.

AI will become increasingly effective at scaling what already exists from 1 to N. What humans must contribute is something different: the ability to question assumptions, to identify overlooked opportunities, and to go from 0 to 1 by approaching problems from unexpected angles.

That requires more than execution. It requires independent judgment, conceptual clarity, and a willingness to pursue unconventional ideas beyond the boundaries of standard practice.

Short Movie Maker is simply one example of that way of thinking expressed through the domain of video generation. The consulting work and projects I take on are driven by exactly the same mindset.

I am not merely an engineer who builds what he is asked to build.

I work by combining technical depth, business perspective, and a persistent willingness to rethink the premise itself. If that approach resonates with you—if you feel that working together could lead to something genuinely new—I would be glad to hear from you.

I welcome conversations with people looking for new partners across borders, and with those who want to build unconventional businesses for the Japanese market.

生成AIを活用した設計図面からの加工時間・費用の見積り

Estimating Machining Time and Costs from Design Drawings Using Multimodal Models

--- 工事中 ---

--- Under Construction ---
