Why the future of AI is flexible, reusable foundation models
- by 7wData
When learning a different language, the easiest way to get started is with fill-in-the-blank exercises. “It’s raining cats and …”
By making mistakes and correcting them, your brain (which many linguists believe is hardwired for language learning) starts discovering patterns in grammar, vocabulary, and word order, which can be applied not only to filling in blanks but also to conveying meaning to other humans (or computers, dogs, etc.).
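The fill-in-the-blank idea can be made concrete with a deliberately tiny sketch. The bigram counter below is an illustrative stand-in, not how real foundation models work (they use large neural networks trained on vast corpora), but it shows how simply counting which words follow which lets a program start filling in blanks. The toy corpus is invented for this example:

```python
from collections import Counter, defaultdict

# Toy corpus; a real model would learn from billions of sentences.
corpus = [
    "it is raining cats and dogs",
    "she has cats and dogs at home",
    "thunder and lightning scare cats and dogs",
]

# Count which word follows each word (a bigram model).
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1

def fill_blank(prev_word):
    """Guess the blank from the word right before it."""
    candidates = following[prev_word]
    return candidates.most_common(1)[0][0] if candidates else None

print(fill_blank("and"))  # → dogs
```

Even this crude counter recovers “dogs” after “cats and”, because that pattern dominates the corpus; scaling the same completion objective up is, loosely, what foundation model pretraining does.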
That last bit is important when talking about so-called “foundation models,” one of the hottest (but underreported) topics in artificial intelligence right now.
According to a review paper from 2021, foundation models are models “trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks.”
In non-academic language: much like fill-in-the-blank exercises, foundation models learn things in a way that can later be applied to other tasks, making them more flexible than conventional AI models.
The way foundation models are trained solves one of the biggest bottlenecks in AI: labeling data.
When (to prove you’re not a robot) a website asks you to select “all the pictures containing a boat,” you’re essentially labeling. This label can then be used to feed images of boats to an algorithm so it can, at some point, reliably recognize boats on its own. This is traditionally how AI models are trained: using data labeled by humans. It’s a time-consuming process that requires many people.
Foundation models don’t need this type of labeling. Instead of relying on human annotation, they use the fill-in-the-blank method and self-generated feedback to continuously learn and improve, without the need for human supervision.
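This is why self-supervision removes the labeling bottleneck: the training pairs are manufactured from the raw text itself. A minimal sketch of that idea (the `[MASK]` token and the helper function are illustrative conventions, not a specific library’s API):

```python
def make_masked_pairs(sentence, mask_token="[MASK]"):
    """Turn raw text into (input, target) training pairs with no
    human labels: every word position yields a free example where
    one word is hidden and the model must recover it."""
    words = sentence.split()
    pairs = []
    for i, target in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        pairs.append((" ".join(masked), target))
    return pairs

for masked, target in make_masked_pairs("it is raining cats and dogs"):
    print(masked, "->", target)
```

One six-word sentence already yields six labeled examples, and no human ever wrote a label; the “answers” were sitting in the data all along.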
This makes foundation models more accessible for industries that don’t already have a wide range of data available. In fact, according to Dakshi Agrawal, IBM Fellow and CTO at IBM AI, depending on the domain you’re training a foundation model in, a few gigabytes of data can suffice.
These complex models might sound far removed from a user like you, but you’ve almost certainly seen a foundation model at work at some point online. Some of the more famous ones are the GPT-3 language model, which, after being fed works by famous writers, can produce remarkable imitations, or DALL-E, which produces stunning images based on users’ prompts.
Beyond creating new entertainment, the flexibility that foundation models bring could help accelerate groundbreaking medical research, scientific advances, engineering, architecture, and even programming.
Foundation models are characterized by two very interesting properties: emergence and homogenization.
Emergence refers to new, unexpected capabilities a model displays that were not present in previous generations; it typically appears as model size grows. A language model performing basic arithmetic reasoning, for example, is an emergent capability that was somewhat unexpected.
Homogenization is a complicated term for a single model serving as the basis for many different tasks. One model trained to understand and use the English language can summarize a piece of text, output a poem in the style of a famous writer, or interpret a command given by a human (the GPT-3 language model is a good example of this).
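The interface pattern behind homogenization can be sketched with a stand-in model: one callable handles every task, and only the prompt changes. The `toy_model` function and its canned replies below are fabricated purely for illustration; a real foundation model would generate these answers rather than look them up:

```python
def toy_model(prompt: str) -> str:
    """A stand-in for a real language model: one function, many tasks.
    The canned replies are placeholders for generated text."""
    if prompt.startswith("Summarize:"):
        return "<summary of the text>"
    if prompt.startswith("Write a poem"):
        return "<poem in the requested style>"
    return "<interpretation of the command>"

# The same model, steered to different tasks purely by the prompt:
tasks = [
    "Summarize: Foundation models are trained on broad data...",
    "Write a poem about rain in the style of a famous writer",
    "Turn off the lights in the kitchen",
]
for prompt in tasks:
    print(toy_model(prompt))
```

The point is the shape of the system: before foundation models, each of those three tasks would typically have required its own separately trained model.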
But foundation models are not limited to human language. In essence, what we’re teaching a computer to do is to find patterns in processes or phenomena that it can then replicate given a certain condition.
Let’s unpack that with an example: molecules. Physics and chemistry dictate that molecules can exist only in certain configurations. The next step is to define a use for molecules, such as medicines. A foundation model can then be trained on reams of medical data to understand how different molecules (i.e. drugs) interact with the human body when treating diseases.
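Part of what makes this transfer possible is that molecules can be written as text. SMILES notation, a standard line notation in chemistry, encodes a molecule’s structure as a character string, so the same fill-in-the-blank objective applies. A small sketch, using aspirin’s SMILES string (the underscore mask and helper function are illustrative choices, not a specific toolkit’s API):

```python
# Aspirin written in SMILES notation, a text encoding of its structure.
aspirin = "CC(=O)OC1=CC=CC=C1C(=O)O"

def mask_position(smiles: str, i: int, mask: str = "_"):
    """Hide one character of a SMILES string. To restore it, a model
    must learn chemistry's 'grammar': valid bonds, rings, and atoms."""
    return smiles[:i] + mask + smiles[i + 1:], smiles[i]

masked, answer = mask_position(aspirin, 3)
print(masked, "answer:", answer)
```

Just as blanks in sentences teach a model the rules of English, blanks in molecular strings can teach it which chemical configurations are possible, without a human labeling a single molecule.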