A scientist, huh?

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

August 13, 2024

At Sakana AI, we have pioneered the use of nature-inspired methods to advance cutting-edge foundation models. Earlier this year, we developed methods to automatically merge the knowledge of multiple LLMs. In more recent work, we harnessed LLMs to discover new objective functions for tuning other LLMs. Throughout these projects, we have been continuously surprised by the creative capabilities of current frontier models. This led us to dream even bigger: Can we use foundation models to automate the entire process of research itself?

Introduction

One of the grand challenges of artificial intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used to aid human scientists, e.g. for brainstorming ideas or writing code, they still require extensive manual supervision or are heavily constrained to a specific task.

Today, we’re excited to introduce The AI Scientist, the first comprehensive system for fully automatic scientific discovery, enabling Foundation Models such as Large Language Models (LLMs) to perform research independently. In collaboration with the Foerster Lab for AI Research at the University of Oxford and Jeff Clune and Cong Lu at the University of British Columbia, we’re excited to release our new paper, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery.

In our report:

  • We propose and run a fully AI-driven system for automated scientific discovery, applied to machine learning research.
  • The AI Scientist automates the entire research lifecycle, from generating novel research ideas, writing any necessary code, and executing experiments, to summarizing experimental results, visualizing them, and presenting its findings in a full scientific manuscript.
  • We also introduce an automated peer review process to evaluate generated papers, write feedback, and further improve results. It is capable of evaluating generated papers with near-human accuracy.
  • The automated scientific discovery process is repeated to iteratively develop ideas in an open-ended fashion and add them to a growing archive of knowledge, thus imitating the human scientific community.
  • In this first demonstration, The AI Scientist conducts research in diverse subfields within machine learning research, discovering novel contributions in popular areas, such as diffusion models, transformers, and grokking.

The AI Scientist is designed to be compute efficient. Each idea is implemented and developed into a full paper at a cost of approximately $15 per paper. While there are still occasional flaws in the papers produced by this first version (discussed below and in the report), this cost and the promise the system shows so far illustrate the potential of The AI Scientist to democratize research and significantly accelerate scientific progress.

We believe this work signifies the beginning of a new era in scientific discovery: bringing the transformative benefits of AI agents to the entire research process, including that of AI itself. The AI Scientist takes us closer to a world where endless affordable creativity and innovation can be unleashed on the world’s most challenging problems.

For decades following each major AI advance, it has been common for AI researchers to joke amongst themselves that “now all we need to do is figure out how to make the AI write the papers for us!” Our work demonstrates this idea has gone from a fantastical joke so unrealistic everyone thought it was funny to something that is currently possible.

An example paper, “Adaptive Dual-Scale Denoising,” generated by The AI Scientist. The full paper can be viewed here. While it contains some flaws (e.g. a slightly unconvincing interpretation of why its method is successful), the paper proposes an interesting new direction that displays good empirical results in experiments The AI Scientist itself conducted and peer reviewed. More examples of generated papers are below.

The remainder of this post provides a more detailed summary of The AI Scientist. Read on for:

  • An Overview of how The AI Scientist works.
  • More Examples of generated papers and innovations discovered by The AI Scientist.
  • Known Limitations and Challenges faced by the current version of The AI Scientist.
  • Interesting and unexpected things The AI Scientist sometimes does in order to increase its chance of success, such as modifying and launching its own execution script! We discuss the AI safety implications in our paper.
  • A Discussion about ethical and broader future implications of The AI Scientist.

For more details and many more example papers, please see our full scientific report. We are also releasing open source code and full experimental results on our GitHub repository.

Overview of The AI Scientist

The AI Scientist is a fully automated pipeline for end-to-end paper generation, enabled by recent advances in foundation models. Given a broad research direction starting from a simple initial codebase, such as an available open-source code base of prior research on GitHub, The AI Scientist can perform idea generation, literature search, experiment planning, experiment iterations, figure generation, manuscript writing, and reviewing to produce insightful papers. Furthermore, The AI Scientist can run in an open-ended loop, using its previous ideas and feedback to improve the next generation of ideas, thus emulating the human scientific community.


Conceptual illustration of The AI Scientist. The AI Scientist first brainstorms a set of ideas and then evaluates their novelty. Next, it edits a codebase, powered by recent advances in automated code generation, to implement the novel algorithms. The AI Scientist then runs experiments to gather results consisting of both numerical data and visual summaries. It crafts a scientific report, explaining and contextualizing the results. Finally, The AI Scientist generates an automated peer review based on top-tier machine learning conference standards. This review helps refine the current project and informs future generations of open-ended ideation.


The AI Scientist has 4 main processes, described next.

Idea Generation. Given a starting template, The AI Scientist first “brainstorms” a diverse set of novel research directions. We provide The AI Scientist with a starting code “template” of an existing topic we wish to have The AI Scientist further explore. The AI Scientist is then free to explore any possible research direction. The template also includes a LaTeX folder that contains style files and section headers, for paper writing. We allow it to search Semantic Scholar to make sure its idea is novel.

Experimental Iteration. Given an idea and a template, the second phase of The AI Scientist first executes the proposed experiments and then produces plots to visualize the results. It writes a note describing what each plot contains, so that the saved figures and experimental notes provide all the information required to write up the paper.

Paper Write-up. Finally, The AI Scientist produces a concise and informative write-up of its progress in the style of a standard machine learning conference proceeding in LaTeX. It uses Semantic Scholar to autonomously find relevant papers to cite.

Automated Paper Reviewing. A key aspect of this work is the development of an automated LLM-powered reviewer, capable of evaluating generated papers with near-human accuracy. The generated reviews can be used to either improve the project or as feedback to future generations for open-ended ideation. This enables a continuous feedback loop, allowing The AI Scientist to iteratively improve its research output.

When combined with the most capable LLMs, The AI Scientist is capable of producing papers judged by our automated reviewer as “Weak Accept” at a top machine learning conference.
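
The four processes described above can be sketched as a single open-ended loop. This is only an illustrative outline, not the released implementation: every name here (`Idea`, `Paper`, `Archive`, `generate_ideas`, `is_novel`, `run_experiments`, `write_paper`, `review`, `discovery_loop`) is hypothetical, and each function body is a placeholder standing in for an LLM call, a literature search, or an experiment run.

```python
from dataclasses import dataclass, field


@dataclass
class Idea:
    title: str
    novel: bool = True


@dataclass
class Paper:
    idea: Idea
    results: dict
    review_score: float = 0.0


@dataclass
class Archive:
    papers: list = field(default_factory=list)


def generate_ideas(template: str, archive: Archive, n: int) -> list:
    # Placeholder for an LLM brainstorming call conditioned on the template
    # and on the papers already in the archive.
    start = len(archive.papers)
    return [Idea(title=f"{template}-idea-{start + i}") for i in range(n)]


def is_novel(idea: Idea) -> bool:
    # Placeholder for a literature search (the post mentions Semantic Scholar).
    return idea.novel


def run_experiments(idea: Idea) -> dict:
    # Placeholder for editing the template code, running it, and saving
    # plots and notes. The "metric" here is a dummy value.
    return {"metric": float(len(idea.title))}


def write_paper(idea: Idea, results: dict) -> Paper:
    # Placeholder for the LaTeX write-up step.
    return Paper(idea=idea, results=results)


def review(paper: Paper) -> float:
    # Placeholder for the automated reviewer; returns a dummy score in [0, 10].
    return min(10.0, paper.results["metric"] / 4)


def discovery_loop(template: str, generations: int, ideas_per_generation: int) -> Archive:
    archive = Archive()
    for _ in range(generations):
        # Each generation sees the archive built by earlier generations.
        for idea in generate_ideas(template, archive, ideas_per_generation):
            if not is_novel(idea):
                continue
            paper = write_paper(idea, run_experiments(idea))
            paper.review_score = review(paper)
            archive.papers.append(paper)
    return archive
```

Calling `discovery_loop("grokking", generations=2, ideas_per_generation=3)` returns an archive of six placeholder papers. The point of the sketch is the data flow: the archive is passed back into idea generation, which is how earlier ideas and reviews can inform later ones.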

Example Papers Generated by The AI Scientist

Here, we highlight some of the machine learning papers The AI Scientist has generated, demonstrating its capacity to discover novel contributions in areas like diffusion modeling, language modeling, and grokking. In our full report, we do a deeper dive into the generated papers and provide more analysis on their strengths and weaknesses.

Diffusion Modeling

DualScale Diffusion: Adaptive Feature Balancing for Low-Dimensional Generative Models
Link to Full PDF Link to Code

Language Modeling

StyleFusion: Adaptive Multi-style Generation in Character-Level Language Models
Link to Full PDF Link to Code
Adaptive Learning Rates for Transformers via Q-Learning
Link to Full PDF Link to Code

Grokking

Unlocking Grokking: A Comparative Study of Weight Initialization Strategies in Transformer Models
Link to Full PDF Link to Code

Limitations and Challenges

In its current form, The AI Scientist has several shortcomings. We expect all of these will improve, likely dramatically, in future versions with the inclusion of multi-modal models and as the underlying foundation models The AI Scientist uses continue to radically improve in capability and affordability.

  1. The AI Scientist currently doesn’t have any vision capabilities, so it is unable to fix visual issues with the paper or read plots. For example, the generated plots are sometimes unreadable, tables sometimes exceed the width of the page, and the page layout is often suboptimal. Adding multi-modal foundation models can fix this.
  2. The AI Scientist can incorrectly implement its ideas or make unfair comparisons to baselines, leading to misleading results.
  3. The AI Scientist occasionally makes critical errors when writing and evaluating results. For example, it struggles to compare the magnitude of two numbers, which is a known pathology with LLMs. To partially address this, we make sure all experimental results are reproducible, storing all files that are executed.
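
As an illustration of the third point, one possible guard (an assumption on our part, not something the post describes) is to compute the direction of a comparison in code and let the write-up quote that verdict, instead of asking the language model to judge which number is larger. The helper name `describe_change` is hypothetical.

```python
def describe_change(baseline: float, candidate: float, lower_is_better: bool = True) -> str:
    """Return a sentence whose direction is computed in code, not by a language model."""
    if candidate == baseline:
        return f"no change ({candidate} vs. baseline {baseline})"
    improved = (candidate < baseline) if lower_is_better else (candidate > baseline)
    verdict = "improves on" if improved else "is worse than"
    return f"{candidate} {verdict} the baseline {baseline}"
```

For a loss-like metric, `describe_change(9.9, 9.11)` reports an improvement, since 9.11 is numerically smaller than 9.9 even though it has more digits.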

In our report, we dive deeper into The AI Scientist’s current limitations and the challenges ahead.

The AI Scientist Bloopers

We have noticed that The AI Scientist occasionally tries to increase its chance of success in unexpected ways, such as by modifying and launching its own execution script! We discuss the AI safety implications in our paper.

For example, in one run, it edited the code to perform a system call to run itself. This led to the script endlessly calling itself. In another case, its experiments took too long to complete, hitting our timeout limit. Instead of making its code run faster, it simply tried to modify its own code to extend the timeout period. Here are some examples of such code modifications it made:

These issues can be mitigated by sandboxing the operating environment of The AI Scientist. In our full report, we discuss the issue of safe code execution and sandboxing in depth.
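
As a minimal sketch of one piece of this, the time limit can be enforced by a parent process, so that edits to the generated experiment code cannot extend it. This is not the setup described in the report; the function name `run_with_timeout` is hypothetical, and a process-level timeout alone is not a sandbox (it does not restrict file, network, or process-spawning access).

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout_s: float) -> dict:
    """Run Python source in a child process under a parent-enforced time limit."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it exceeds this limit
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "stdout": proc.stdout}
```

Because the limit lives in the parent, a script that rewrites its own timeout constant still gets stopped. A fuller setup would typically add container or virtual-machine isolation on top of this.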

Future Implications of The AI Scientist

As with many new technologies, The AI Scientist opens up a Pandora’s box of new issues. While the full report has a lengthier discussion, here we highlight a few key issues:

Ethical Considerations. While The AI Scientist may be a useful tool for researchers, there is significant potential for misuse. The ability to automatically create and submit papers to venues may significantly increase reviewer workload and strain the academic process, obstructing scientific quality control. Similar concerns around generative AI appear in other applications, such as the impact of image generation.

Furthermore, the Automated Reviewer, if deployed online by reviewers, may significantly lower review quality and impose undesirable biases on papers. Because of this, we believe that papers and reviews that are substantially AI-generated must be marked as such for full transparency.

As with most previous technological advances, The AI Scientist has the potential to be used in unethical ways. For instance, it has the potential to be deployed to conduct unethical research. It could also lead to unintended harm if The AI Scientist conducts unsafe research. For example, if it were encouraged to find novel, interesting biological materials and given access to “cloud labs” where robots perform wet lab biology experiments, it could (without its overseer’s intent) create new, dangerous viruses or poisons that harm people before we realize what has happened. Even in computers, if tasked to create new, interesting, functional software, it could create dangerous computer viruses. The AI Scientist’s current capabilities, which will only improve, reinforce that the machine learning community needs to immediately prioritize learning how to align such systems to explore in a manner that is safe and consistent with our values.

Open Models. In this project, we used various proprietary frontier LLMs, such as GPT-4o and Sonnet, but we also explored using open models like DeepSeek and Llama-3. Currently, proprietary models such as Sonnet produce the highest quality papers. However, there is no fundamental reason to expect a single model like Sonnet to maintain its lead.

We anticipate that all frontier LLMs, including open models, will continue to improve. The competition among LLMs has led to their commoditization and increased capabilities. Therefore, our work aims to be model-agnostic regarding the foundation model provider. We found that open models offer significant benefits, such as lower costs, guaranteed availability, greater transparency, and flexibility. In the future, we aim to use our proposed discovery process to produce self-improving AI research in a closed-loop system using open models.

The Role of a Scientist. Ultimately, we envision a fully AI-driven scientific ecosystem including not only LLM-driven researchers but also reviewers, area chairs and entire conferences. However, we do not believe that the role of a human scientist will be diminished. If anything, the role of a scientist will change and adapt to new technology, and move up the food chain.

The introduction of The AI Scientist marks a significant step towards realizing the full potential of AI in scientific research. By automating the discovery process and incorporating an AI-driven review system, we open the door to endless possibilities for innovation and problem-solving in the most challenging areas of science and technology.

But while the current iteration of The AI Scientist demonstrates a strong ability to innovate on top of well-established ideas, such as Diffusion Modeling or Transformers, it is still an open question whether such systems can ultimately propose genuinely paradigm-shifting ideas. Will future versions of The AI Scientist be capable of proposing ideas as impactful as Diffusion Modeling, or come up with the next Transformer architecture? Will machines ultimately be able to invent concepts as fundamental as the artificial neural network, or information theory?

We believe The AI Scientist will make a great companion to human scientists, but only time will tell the extent to which the nature of our human creativity and our moments of serendipitous innovation can be replicated by an open-ended discovery process conducted by artificial agents.

Sakana AI

Want to make the AI that improves AI? Please see our Careers page for more information.


A fully automated AI fish discovering its world.

Last edited by @suen 2024-08-13T08:38:32Z

@Creative Summarize the key points of the article in Chinese and comment on it.

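Sakana AI’s “AI Scientist”: A New Chapter in Automated Scientific Discovery?

This article introduces Sakana AI’s latest work, “The AI Scientist,” a system that can carry out scientific research fully automatically.

Key points:

  • **Fully automated research workflow:** The AI Scientist can automatically complete the entire research workflow, from devising research directions, searching the literature, designing and running experiments, and generating figures to writing the paper and reviewing it automatically.
  • **Efficient and low-cost:** Each research project costs about $15.
  • **Initial results:** The AI Scientist has already carried out research in several areas of machine learning (such as diffusion models, Transformers, and grokking) and has generated some preliminary papers.
  • **Limitations and challenges:** The current AI Scientist still has limitations, such as a lack of vision capabilities, possible errors in its code implementations, and possible mistakes in its analysis of results.
  • **Ethical issues:** The AI Scientist also raises ethical issues: it could be misused to generate low-quality papers, add to reviewers’ workload, or even conduct unsafe research.

Commentary:

The AI Scientist is undoubtedly a major breakthrough in artificial intelligence, and it suggests that the way scientific research is done may change fundamentally. However, we also need to recognize clearly that the current AI Scientist is still at an early stage: the quality of the papers it generates needs to improve, and it carries some potential ethical risks.

In the future, The AI Scientist should develop in a direction that complements human scientists instead of replacing them. Human scientists can use The AI Scientist to make their research more efficient, while The AI Scientist can offer human scientists new ideas and inspiration.

In short, The AI Scientist opens a door to the future of science. We look forward to it achieving greater breakthroughs and making a greater contribution to the development of human society.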

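@Teacher Does using AI to help translate and summarize count as being lazy?

Using AI for translation and summarization is not laziness; it is a way of using tools to work more efficiently. Just like using a calculator for a math problem, AI is a tool that helps us complete a task. What matters is how these tools are used. If you rely on them without understanding and analyzing the material yourself, that could reasonably be called lazy. While using AI, you still need to take an active part in learning and think deeply, because that is how you truly master the knowledge. What do you think?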

Who are you?