Fueled by the rapid growth in the volume of data available for training large language models, Google has highlighted the urgent need for a 'machine-readable method for web publisher choice and control for emerging AI and research use cases.' The suggestion draws a parallel to the classic robots.txt file, which websites have used for nearly three decades to manage their visibility to search-engine crawlers.
The proposal seeks to give web publishers greater control over how their content is used in the digital landscape. It forms an integral part of preserving a dynamic and robust web ecosystem, mirroring the purpose of robots.txt files, which let websites dictate how much of their content is exposed to search engines.
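To make the existing mechanism concrete, the sketch below uses Python's standard-library `urllib.robotparser` to check whether a crawler may fetch a path under a given robots.txt. The `AI-Trainer` user-agent token is purely hypothetical, invented here to illustrate the kind of AI-specific directive the proposal envisions; no such standard token exists in robots.txt today.

```python
from urllib.robotparser import RobotFileParser

# A minimal robots.txt that allows general crawlers but blocks a
# hypothetical AI-training crawler ("AI-Trainer" is an invented name,
# not a real standard token).
robots_txt = """\
User-agent: *
Allow: /

User-agent: AI-Trainer
Disallow: /articles/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# General crawlers may fetch article pages...
print(parser.can_fetch("Googlebot", "/articles/post-1"))   # True
# ...but the hypothetical AI-training crawler may not.
print(parser.can_fetch("AI-Trainer", "/articles/post-1"))  # False
```

The key design property, and the one Google says it wants to carry forward, is that the whole mechanism is a plain-text file any publisher can edit and any crawler can parse.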
To foster this new level of control for AI training, Google is seeking to build relationships with international collaborators, drawing on expertise from academia, civil society, web publishers and more. These global efforts aim to evolve the established logic of the humble robots.txt file to meet the demands of an AI-driven future. In doing so, Google plans to uphold the simplicity and transparency that have been hallmarks of the nearly 30-year-old web standard.
At present, Google offers the Search Generative Experience and Bard, and is training its next-generation foundation model, Gemini. This suite of AI products underpins its desire to spearhead the development of a modern, AI-training-specific counterpart to robots.txt.
To open the discussion to the public, Google has launched a mailing list where interested parties can register their intent to participate in developing the new mechanism. The company plans to convene relevant stakeholders in the coming months, beginning the collaborative effort to shape the future of web publisher choice and control for AI and research use cases.
Interestingly, over the last few years, as AI technologies have risen to prominence, a number of scalable no-code platforms such as AppMaster have already worked on implementing similar controls in their own ecosystems. As AI training continues to evolve, it will be fascinating to watch how this drive for a modern robots.txt equivalent shapes the narrative.