NASA and IBM are working together to create foundation models based on NASA’s data sets — including geospatial data — with the goal of accelerating the creation of AI models.
Foundation models are trained on large, broad data sets, then adapted into more specialized AI models using smaller, targeted data sets. They can be used for different tasks and can apply what they learn in one situation to another. One real-world example of a foundation model at work is ChatGPT, which was built on a model from OpenAI’s GPT-3.5 series.
Priya Nagpurkar, who oversees IBM’s hybrid cloud platform and developer productivity research strategy, said this approach accelerates the creation of AI models.
“We are excited about this being a key proof point and really being the first time we at IBM are applying foundation model technology towards sciences and to this scale of data in particular,” Nagpurkar said.
“Foundation models are part of a big push at IBM Research and what excites us about foundation models is it’s this emerging AI technology, which can ingest large amounts of unlabeled data and transfer, learn in one area and apply it to others [which] simplifies significantly downstream tasks and AI applications, and also removes the need for large amounts of labeled data.”
Foundation models can be powerful: It originally took IBM seven years to train Watson in 12 languages. By using a foundation model, IBM accelerated Watson’s language abilities to 25 languages in approximately one year.
“The key thing here is it will augment and accelerate the scientific process in terms of building and solving specific science problems,” said Rahul Ramachandran, senior research scientist at NASA’s Marshall Space Flight Center in Huntsville, Alabama. “Instead of people having to build their own individual machine learning pipelines starting from collecting the large volumes of training data, you can start with the foundation models, and with a few limited or well-curated training samples, you should be able to build your applications that would meet your scientific or application needs.”
The goal of this joint work is to advance scientific understanding, as well as the response to Earth and climate-related issues such as natural disasters and warming temperatures, the joint press release stated. The collaboration will apply foundation models in two areas:
In 2020, NASA held a workshop about incorporating AI and machine learning, where NASA identified two challenges: First, the lack of training data sets required to train deep learning models — which Ramachandran called “a major scientific bottleneck.” Second, the existing AI models do not generalize across space and time.
While there is already a proof-of-concept for the first language model project — which IBM and NASA speculated could be ready by mid-year — the second goal faces technical challenges, IBM and NASA officials acknowledged during a press conference Tuesday.
“We’re looking at new, innovative solutions that can address these problems […] I think that there is a potential for the foundation models to address these challenges,” Ramachandran said.
Raghu Ganti, principal researcher at IBM, further explained that the HLS2 data set poses unique challenges because its remote sensors record geophysical data that includes both time and space information.
“The kind of transformer technology on which foundation models are built will have to change in order to train a model on top of such data,” he said. “Those are the questions that we are exploring.”
A transformer is a deep learning architecture used primarily in natural language processing and computer vision. Transformers are designed to process sequential input data, such as natural language, for tasks like translation and summarization. Unlike older approaches such as recurrent neural networks, transformers process the entire input at once: rather than digesting one word at a time, a transformer can process a whole sentence. Because that work can be parallelized, this approach reduces training time.
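To make the "entire input at once" idea concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation inside a transformer. It is an illustration only, not IBM's or NASA's code: the function names and the toy two-dimensional "token" vectors are assumptions, and the learned query, key and value projections that real transformers use are left out for brevity.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with queries = keys = values = x.

    Simplification (assumption): no learned projection matrices.
    """
    d = len(x[0])
    # scores[i][j] compares token i with token j for every pair in the input.
    scores = [
        [sum(a * b for a, b in zip(xi, xj)) / math.sqrt(d) for xj in x]
        for xi in x
    ]
    # Each row of weights says how much token i attends to every token.
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted mix of all input vectors.
    out = [
        [sum(w * xj[k] for w, xj in zip(row, x)) for k in range(d)]
        for row in weights
    ]
    return weights, out

# Three toy "token" vectors standing in for a three-word sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, mixed = self_attention(tokens)
```

Each output row depends on the whole input, and no row has to wait for the one before it. A recurrent network, by contrast, needs the result of step one before it can start step two. That independence is what lets frameworks such as PyTorch compute all positions in parallel on a GPU.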
There’s also the fact that NASA’s archive data is currently at 70 petabytes and is projected to grow to 250 petabytes within a few years with the launch of high data rate missions, such as SWOT, launched in December, and NISAR, planned for 2024.
“All our data is openly available, we support 7 billion users worldwide who access our data for research and applications,” Ramachandran said. “Our goal is to make our data, the NASA data — which is really valuable to the scientific community — discoverable, accessible and usable for broad scientific use and applications worldwide.”
One goal of the projects will be to lower the barriers of entry for end users to put that data to work, he said.
The models will be open and available to the public, and will leverage PyTorch, Ganti said.
“Our training platform leverages PyTorch, we solely rely upon PyTorch for training all our foundation models, and we have partnered with PyTorch as well, to drive the training of all these models,” Ganti said. “PyTorch is the go-to deep learning framework for all the developers in open source and we just want to make sure all our foundation models, the technology for training, the models that we train on are all in PyTorch and contributed back to the community.”
The foundation model platform is built on Red Hat OpenShift, which can run on any hyperscaler in a public or private cloud. Red Hat OpenShift will “let you train these models with recipes out of the box much faster,” Ganti said.
For example, the team used roughly a billion tokens to train a model already built with NASA data. Tokenization is splitting the input data into a sequence of meaningful parts, according to ML engineer and AI blogger Vaclav Kosar.
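As a rough illustration of what a token count measures, the sketch below splits a sentence into word and punctuation tokens and maps each to an integer ID. It is a deliberately naive tokenizer written for this example. Production language models typically use subword schemes such as byte-pair encoding, and the function names and sample sentence here are assumptions.

```python
import re

def tokenize(text):
    # Lowercase, then keep runs of word characters or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(tokens):
    # Assign each distinct token the next unused integer ID, in order of first use.
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab

sample = "Foundation models ingest data. Models learn from data."
tokens = tokenize(sample)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]

print(len(tokens))  # 10 tokens in this sample
print(ids)          # [0, 1, 2, 3, 4, 1, 5, 6, 3, 4]
```

The model never sees raw text, only sequences of IDs like these, so the size of a training corpus is usually reported in tokens rather than words or documents.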
“To give you a comparison, an open source model, which is a very high-quality model, has 50 billion tokens,” Ganti said. “This particular one takes…maybe it’s around six hours on 32 GPUs — that’s pretty much what it is doing right now. So you see significant speed improvement because of all the streamlining of the training approach that we are taking.”
That’s a big improvement to developer productivity, he noted, adding that putting that into the hands of developers is something “we are strategically interested in.”