Demos

See FastFlowLM running on real hardware

Explore live demos and recordings that showcase FastFlowLM powering LLMs, VLMs, and embeddings fully on the Ryzen™ AI NPU. From interactive chat experiences to system integrations, these examples highlight what’s possible when you make the NPU the primary inference engine.

GPT-OSS on NPU

GPT-OSS-20B streaming fully on the Ryzen™ AI NPU

Stream GPT-OSS-20B locally with FastFlowLM, keeping CPU and GPU usage minimal while the NPU does the heavy lifting.

Run large-scale chat and reasoning workloads directly on your Ryzen™ AI NPU with stable, low-power performance, ideal for long-running sessions and experimentation.
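
To try this yourself, a minimal streaming sketch along these lines should work, assuming FastFlowLM is serving an OpenAI-compatible endpoint locally (the base URL, port, API key, and model tag below are assumptions; check your `flm serve` output for the actual values):

```python
# Minimal streaming sketch against a local FastFlowLM server.
# Base URL, port, API key, and model tag are assumptions; adjust
# to match your local setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # assumed local endpoint
    api_key="flm",  # local servers usually ignore the key, but the client requires one
)

# Stream tokens as they are generated on the NPU.
stream = client.chat.completions.create(
    model="gpt-oss:20b",  # assumed model tag
    messages=[{"role": "user", "content": "Why does on-NPU inference save power?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```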

Whisper on-device

Transcribe and summarize long-form audio locally

Use FastFlowLM to run Whisper completely on the NPU, keeping voice and meeting data on your device while you transcribe and summarize hours of audio.

Turn recordings into searchable transcripts and concise summaries with NPU-accelerated Whisper, without sending audio to the cloud.
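
A hedged sketch of the transcribe-then-summarize flow, assuming the local server exposes an OpenAI-compatible `/v1/audio/transcriptions` route for its Whisper models (the endpoint, port, file name, and model tags are all assumptions, not confirmed API details):

```python
# Transcribe a local recording, then summarize it, entirely against
# a local server. All names below (URL, port, model tags, file) are
# illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="flm")

# Transcription: audio never leaves the machine.
with open("meeting.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",  # assumed model tag
        file=audio,
    )
print(transcript.text)

# Summarization: feed the transcript to a local LLM on the same server.
summary = client.chat.completions.create(
    model="llama3.2:3b",  # assumed model tag
    messages=[{"role": "user", "content": "Summarize this transcript:\n" + transcript.text}],
)
print(summary.choices[0].message.content)
```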

Llama 3.2 on WebUI

Interact with Llama 3.2-3B through the FastFlowLM WebUI

Chat with Llama 3.2-3B in a browser-based UI powered by FastFlowLM, with responses served directly from the Ryzen™ AI NPU.

Explore NPU-backed chat experiences with a modern WebUI, suitable for local copilots, assistants, and prototyping interactive products.
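
The same local endpoint that backs the WebUI is available to your own tools. As a sketch, a minimal terminal chat loop against it might look like this (base URL, port, and model tag are assumptions; adjust to your `flm serve` configuration):

```python
# Minimal multi-turn chat loop against an assumed local
# OpenAI-compatible endpoint; URL, port, and model tag are
# illustrative, not confirmed values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="flm")
history = []  # accumulated conversation so the model keeps context

while True:
    user = input("you> ")
    if user in ("exit", "quit"):
        break
    history.append({"role": "user", "content": user})
    reply = client.chat.completions.create(
        model="llama3.2:3b",  # assumed model tag
        messages=history,
    )
    answer = reply.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    print("llama>", answer)
```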