Why data partnerships can create lasting value for rightsholders
This week, I attended an event discussing data licensing for machine learning (ML) and AI, where a panellist made a striking comment:
“Rightsholders are signing deals with tech companies because they have to. But no-one can create a sustainable company just from data.”
It’s an understandable sentiment that reflects the concerns of rightsholders in many industries, but it’s also incorrect.
One of Nascent Studio’s founding principles is that quality data are a vital foundation for effective AI applications. If you accept that effective AI has genuine, sustainable value, then the foundations that enable it, like computational infrastructure and data, must also have lasting value.
Belief in the value of data is not just theoretical; it is reflected in real investment. To give one example, the UK government and others recently committed significant funding to OpenBind, a project aiming to generate structural biology data for AI and ML.
This belief also reflects Nascent Studio’s practical experience and expertise. In my time at Elsevier, a leading scientific information provider, I built a business developing data solutions for AI and ML, and achieved over 10x (sustainable, recurring) revenue growth in 4 years.
This success was a result of some difficult lessons and careful decisions, the most important of which I have summarised below:
- Design datasets for specific ML / AI tasks or applications: Building models involves multiple steps, each requiring different data. Rather than providing broad datasets that could be used for multiple steps, it’s more effective to develop targeted datasets, e.g. instruction-response pairs for supervised fine-tuning of LLMs. The benefits are that:
  - Datasets can be used more quickly; they need less cleaning and preparation
  - Datasets are simpler to value; you only have to account for specific applications, not all possible ones
  - Models built using a specific dataset are unlikely to replace all your existing solutions
- Partner with developers to uncover innovation ideas: AI and ML are still experimental, with developers continually researching solutions to various challenges. This is an opportunity for rightsholders to collaborate with developers on these challenges. The resulting solutions can become new products used by multiple companies and create lasting value. A simple example is an ontology for a less well-researched field of science, developed by a data provider in that field.
- Charge for model use as well as training: Many licensing agreements focus on charging for model training. Since training is often a single (or infrequent) step, such agreements make it harder to create lasting value for rightsholders. However, datasets also support model use (inference), and charging for this means that future growth in a model’s use benefits rightsholders rather than coming at their expense.
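To make the first lesson more concrete, here is a minimal sketch of what a targeted dataset of instruction-response pairs might look like when delivered as JSON Lines. The field names (`instruction`, `response`, `source_id`, `licence`) and the placeholder values are assumptions for illustration only, not a standard or a format any particular provider uses.

```python
import json

# Hypothetical record layout: these field names are illustrative assumptions.
REQUIRED_FIELDS = ("instruction", "response", "source_id", "licence")


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one instruction-response record."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    return problems


def to_jsonl(records: list[dict]) -> str:
    """Serialise only the valid records as JSON Lines, one record per line."""
    valid = [r for r in records if not validate_record(r)]
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in valid)


# Placeholder content: a real record would hold expert-written text.
example = {
    "instruction": "Summarise the main finding of the abstract below.",
    "response": "<summary written by a domain expert>",
    "source_id": "example-0001",
    "licence": "training-and-inference",
}

incomplete = {"instruction": "Summarise the abstract."}

jsonl = to_jsonl([example, incomplete])
```

Because each record already matches the shape a fine-tuning pipeline expects, the buyer has less cleaning to do, and carrying a licence field on every record is one possible way to keep training and inference rights visible alongside the data.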
Relevance in applications outside science
While these approaches have been successful for scientific data, they apply equally well to creative content and other data types.
The history of ML demonstrates this clearly. Many of the ML methods used today were developed on what could be considered ‘creative’ content, before being applied to medical and scientific problems. Every computer science conference will have at least one researcher demonstrating new methods with video games!



