Published May 7, 2026
Accessible cyberinfrastructure is helping researchers transform rich cancer datasets into clinical insight
Cancer researchers today have access to an unprecedented volume of clinical and genomic data. The challenge is no longer simply collecting information; it is finding ways to analyze that data at scale and translate it into clinically meaningful insights.
Researchers at the University of Missouri are using the FABRIC testbed, a globally distributed research infrastructure designed for advanced computing and experimentation, to apply artificial intelligence and machine learning techniques to large-scale cancer datasets. Their work explores how computational models can help predict survivability outcomes and uncover patterns across thousands of metastatic cancer cases.
At the center of the project is a publicly available dataset from Memorial Sloan Kettering Cancer Center containing information from approximately 25,000 patients across 27 tumor types, including clinical variables, staging information, treatment regimens, and genomic markers. The richness of the dataset created an opportunity to move beyond traditional statistical analyses and apply machine learning methods to identify new insights.
“We wanted to explore what machine learning could reveal beyond conventional statistical methods,” said Dr. Praveen Rao, associate professor of electrical engineering and computer science at the University of Missouri. “FABRIC became the platform that allowed us to run those experiments and iterate quickly.”
From clinical questions to computational discovery
The project began under an NSF CC* grant focused on scalable human genome analysis using FABRIC. As part of the grant, Praveen Rao sought to train PhD student Polycarp Nalela in health informatics and data-driven cancer research.
The team turned to the MSK-MET dataset, a widely used resource from Memorial Sloan Kettering that includes metastatic cancer cases spanning multiple organ systems. Because the dataset combines clinical and molecular information, it provided an ideal foundation for machine learning and interpretability analysis.
Dr. Deepthi Rao, a gastrointestinal pathologist and clinical informatics expert who previously trained at Memorial Sloan Kettering, helped bring the clinical perspective to the work.
“This is an exceptionally rich dataset,” said Deepthi. “You have survival data, staging, histologic subtypes, treatment regimens, and genomic information all together. Clinically, that makes it incredibly valuable.”
This collaboration between computer science and clinical expertise proved essential in shaping the questions the models would address, from survivability prediction to identifying patterns in aggressive and rare cancers.
Making advanced computing accessible
For this project, FABRIC served as the team’s primary computing platform.
Using FABRIC’s virtual machines and integrated Jupyter notebook environment, the researchers were able to perform large-scale data analysis, train machine learning models, and repeatedly rerun experiments throughout the peer-review process.
The notebook environment was especially important for enabling researchers from non-computer-science backgrounds to work comfortably within advanced cyberinfrastructure.
“Not everybody is a hardcore computer scientist working from the Linux terminal,” said Praveen. “For data scientists and informatics students, having a familiar notebook interface made the transition seamless.”
This accessibility is a key part of FABRIC’s broader impact: lowering the technical barriers that often prevent domain scientists from leveraging advanced computing resources.
Rather than investing in costly local infrastructure or paying for commercial cloud services, the team was able to run extensive experiments at no additional computing cost.
“That saved us a lot of money,” Praveen said. “We didn’t need to pay cloud computing fees or build our own lab resources.”
Enabling the next generation of AI-driven cancer research
The team's published study represents only the beginning of a broader research trajectory.
The team is now expanding the work to include AI model training on genomic data using FABRIC’s GPU resources, with a focus on survivability prediction and clinically actionable insights. Future directions include identifying precursor patterns in aggressive cancers such as pancreatic adenocarcinoma, analyzing rare tumor types, and exploring whole-slide pathology images through AI-assisted pattern recognition.
For clinicians, the potential is significant. “Now both the data and the technology are there,” said Deepthi. “I think the sky is the limit.” The team also sees broader applications beyond oncology, from genomics and food sciences to zoology and digital pathology, all supported by the same scalable computing framework.
By making high-performance computing and AI infrastructure accessible to domain scientists, FABRIC is helping researchers move faster from complex datasets to discoveries that can ultimately improve patient care.