Notes from the Frontier ft. Enrico Shippole
A deep dive into the evolution of LLMs and persistent challenges.
As a non-technical founder, understanding frontier tech can be exhausting. New trends and buzzwords pop up every day, and parsing actionable insights from the hype is hard.
I've been facing this same problem as I transition from non-technical to pseudo-technical. One way I've solved it has been to have 1:1 conversations with smart founders and academics working at the frontier, asking them to explain cutting-edge concepts like I'm five.
I'm publishing my notes here because I think the insights from these conversations will help other founders like me. The goal is to give readers a sense of:
Where we are with AI today.
What's coming in the near future.
What opportunities and use cases these changes will unlock.
Let's get into it.
This essay was written using Type.ai. I’m a huge fan of what Stew and team are building. If you’re a founder or investor who struggles to write and publish consistently, Type is a complete game changer.
Recently, I spoke with Enrico Shippole, a researcher and entrepreneur building Teraflop AI. Enrico's journey began in quantitative finance, where he became interested in using LLMs to process and extract insights from vast troves of financial documents like SEC filings. This led him to start training LLMs from scratch on terabytes of copyright-free, permissively licensed data.
Enrico has published influential research on extending the context window of language models and improving the efficiency of diffusion-based generative models. He's also collaborated with leading AI companies like Stability AI to develop state-of-the-art open-source models. In our conversation, Enrico shares his perspective on the current state of LLMs and where he sees the most exciting opportunities ahead.
Here are some takeaways from our discussion:
Model Architectures Haven't Changed Much, But Hardware and Data Have
While it may seem like the LLM landscape is evolving rapidly, the core architectures of today's state-of-the-art language models have actually existed for 3-5 years. What's really driving progress are improvements in hardware, like the A100 and H100 GPUs, and techniques for parallelizing training across many devices. At the same time, the availability of massive high-quality datasets has been a game-changer.
Data Quality is a Major Bottleneck
Despite the abundance of web-scale data, data quality remains a significant challenge, especially for domain-specific applications. Many commonly used datasets rely on outdated parsing methods, making it difficult to extract clean, structured information. Enrico and his co-founder are tackling this problem head-on by building a platform to streamline data processing at scale.
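To make the parsing problem concrete, here's a toy sketch (standard-library Python, not Teraflop AI's actual tooling) of the kind of cleanup involved: naive tag-stripping in older web-scrape pipelines often leaves script, style, and navigation junk embedded in the "text," while a parser that tracks which elements it's inside keeps only the visible content.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/nav blocks that
    naive regex-based strippers often leave behind."""
    SKIP = {"script", "style", "nav"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # >0 means we're inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

page = ("<html><head><style>p{color:red}</style></head>"
        "<body><p>10-K filing text.</p><script>track()</script></body></html>")
print(extract_text(page))  # -> 10-K filing text.
```

Real document pipelines (PDFs, filings, tables) are far messier than this, which is exactly the gap Enrico describes.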
Domain-Specific Models are Key for Robust Applications
While large general-purpose models like GPT-4 are impressive, the biggest opportunities lie in developing domain-specific models trained on curated, high-quality datasets. This is especially true for applications like information retrieval, document understanding, and knowledge-grounded generation, where having relevant in-domain knowledge can dramatically boost performance.
Open-Source Models are Catching Up, But Still Lag Behind
Open-source NLP models still have a way to go to match the robustness and coherence of closed-source models like GPT-4 and Anthropic's Claude. However, Enrico expects this gap to narrow in the coming months with the release of ever-larger open models like LLaMA 3.
Autoregressive Models Aren't Going Away Anytime Soon
While there's been a lot of excitement around non-autoregressive approaches like diffusion models for text generation, Enrico remains skeptical that they'll displace the tried-and-true autoregressive paradigm anytime soon. He notes that autoregressive language models have withstood the test of time and continue to be the backbone of most state-of-the-art NLP systems.
Stylistic Alignment is an Unsolved Challenge
It’s very hard to get language models to faithfully mimic a specific author's writing style or voice. He explains that this is difficult because most training data only contains a small fraction of any given author's oeuvre, making it hard for the model to reliably learn and reproduce their stylistic fingerprint. Few-shot prompting with representative excerpts can help, but generating truly unique creative outputs remains elusive.
Research Often Fails to Translate to Practice
Many of the techniques and architectures that generate buzz in the research community fail to deliver when applied to real-world problems. While it's easy to get caught up in the hype around each new groundbreaking paper, he advises taking research claims with a grain of salt and pressure-testing them yourself.
You can follow Enrico on X @EnricoShippole.
Until next time,
Yash