Inception, a new company based in Palo Alto, was founded by Stanford computer science professor Stefano Ermon. It has developed a novel AI model based on "diffusion" technology, which it calls a diffusion-based large language model, or DLM for short.
The spotlight in generative AI currently falls on two types of model: large language models (LLMs) and diffusion models. LLMs, built on the transformer architecture, are used primarily for text generation. Diffusion models, which power AI systems such as Midjourney and OpenAI’s Sora, are mainly used to generate images, video, and audio.
Inception’s DLM offers the capabilities of a traditional LLM, including code generation and question answering, but with what the company claims is significantly faster performance and lower computing costs. According to Ermon, the model uses diffusion to overcome the speed limitations inherent in LLMs.
Ermon told TechCrunch that he has long researched how to apply diffusion models to text generation in his Stanford lab. His hypothesis: LLMs are slow because they generate text sequentially, one token at a time, with each token depending on the ones before it. Diffusion models, by contrast, start with a rough estimate of the data they are generating and refine it all at once, which lets them generate and modify large blocks of text in parallel.
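To make the contrast concrete, here is a toy Python sketch of the two decoding patterns. It is not Inception's actual method; the vocabulary, target text, step count, and refinement probability are all invented for illustration, and a real model would predict tokens rather than copy a known answer.

```python
import random

random.seed(0)

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
TARGET = ["the", "cat", "sat", "on", "a", "mat"]  # stand-in for the text the model converges to

def autoregressive_generate():
    """Sequential decoding: one model call per token, so latency grows
    linearly with the length of the output."""
    out, calls = [], 0
    for i in range(len(TARGET)):
        out.append(TARGET[i])  # each token must wait on all previous ones
        calls += 1
    return out, calls

def diffusion_generate(steps=4, fix_prob=0.7):
    """Diffusion-style decoding: start from a noisy guess over the whole
    block, then refine every position in parallel on each pass."""
    block = [random.choice(VOCAB) for _ in range(len(TARGET))]  # rough first estimate
    calls = 0
    for _ in range(steps):
        # one model call updates ALL positions at once
        block = [TARGET[i] if random.random() < fix_prob else tok
                 for i, tok in enumerate(block)]
        calls += 1
    return block, calls

text, calls = autoregressive_generate()
print(f"autoregressive: {' '.join(text)}  ({calls} model calls)")
text, calls = diffusion_generate()
print(f"diffusion:      {' '.join(text)}  ({calls} model calls)")
```

The toy numbers matter less than the shape of the loops: the autoregressive path makes one call per token, while the diffusion path makes a fixed number of refinement passes over the whole block, which is where the claimed parallelism and speedup would come from.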
After persistent effort, Ermon and one of his students achieved a significant breakthrough, which they detailed in a research paper published last year. Recognizing the potential of the advance, Ermon founded Inception last summer, bringing on board two former students, UCLA professor Aditya Grover and Cornell professor Volodymyr Kuleshov, to co-lead the company.
While Ermon declined to share specifics about Inception’s funding, the Mayfield Fund is understood to have invested. Inception has already secured several customers, including unnamed Fortune 100 companies, by addressing their need for lower AI latency and faster responses.
"What we found is that our models can leverage the GPUs much more efficiently," Ermon stated, referring to the crucial chips used for running models in production. "I think this is a big deal. This is going to change the way people build language models."
Inception provides an API along with on-premises and edge-device deployment options, support for model fine-tuning, and a suite of out-of-the-box DLMs for various use cases. The company claims its DLMs can run up to 10 times faster than traditional LLMs while costing 10 times less.
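Inception has not published its API in this piece, so as a rough idea of what calling a hosted model of this kind typically looks like, here is a hypothetical Python sketch. The endpoint URL, model name, and payload fields below are all invented placeholders, not Inception's documented interface.

```python
import requests  # third-party HTTP library (pip install requests)

# Everything below -- URL, model name, and payload fields -- is an invented
# placeholder; Inception's real API may look entirely different.
response = requests.post(
    "https://api.example-inception.ai/v1/generate",    # hypothetical endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder credential
    json={
        "model": "dlm-small-coder",  # invented model identifier
        "prompt": "Write a Python function that reverses a string.",
        "max_tokens": 256,
    },
    timeout=30,
)
print(response.json())
```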
"Our ‘small’ coding model is as good as OpenAI’s GPT-4o mini while more than 10 times as fast," a company spokesperson told TechCrunch. "Our ‘mini’ model outperforms small open-source models like Meta’s Llama 3.1 8B and achieves more than 1,000 tokens per second."