Open Source or Closed? The AI Dilemma

Open Source or Closed? The AI Dilemma

This post first appeared on The New Stack on July 29th, 2024.

Artificial Intelligence is in the middle of a perfect storm in the software industry, and now Mark Zuckerberg is calling for open-sourced AI. 

Three powerful perspectives are colliding on how to control AI: 

  1. All AI should be open-source for sharing and transparency.
  2. Keep AI closed-source and allow big tech companies to control it. 
  3. Establish regulations for the use of AI.

There are a few facts that make this debate tricky. First, if you have the source code for a model you know absolutely nothing about how the model will behave. Openness in AI requires far more than providing source code. Second, AI comes in a lot of different flavors and can be used to solve a broad range of problems. From traditional AI for fraud detection and targeted advertising to generative AI for creating chatbots that, on the surface, produce human-like results, pushing us closer and closer to the ultimate (and scary) goal of Artificially Generated Intelligence (AGI). Finally, the ideas listed above for controlling AI all have a proven track record for improving software in general. 

In this article, I will discuss:

  • The true nature of open-source and why the industry must redefine it for AI models.
  • Common arguments and logic flaws of idealists, who hyper-focus on a single use case.
  • The rights of innovators and the rights of the general public.
  • Thoughts on using the proper control on the right model.

Understanding the Different Perspectives

Before diving in, let’s discuss the different perspectives listed above in more detail.

Perspective #1 – All AI should be open-source for sharing and transparency: This comes from a push for transparency with AI.  Open source is a proven way to share and improve software. It provides complete transparency when used for conventional software. (In this article, I will use the term conventional software to refer to software unrelated to AI. For example, an operating system, a service, a reusable library, or a full application.) Open-source software has propelled the software industry forward by leaps and bounds. 

Perspective #2 – Keep AI closed-source and allow big tech companies to control it: Closed-source, or proprietary software, is the idea that an invention can be kept a secret, away from the competition, for the purpose of maximizing financial gain. To open-source idealists, this sounds completely evil; however, it is more of a philosophical choice rather than one that exists on the spectrum of good and evil. Most software is proprietary, and that is not inherently bad – it is the foundation of a competitive and healthy ecosystem. It is a fundamental right of any innovator who creates something new to choose the closed-source path. The question becomes, if you operate without transparency, what guarantees can there be around responsible AI?

Perspective #3 – Establish regulations for the use of AI: This comes from lawmakers and elected officials who are pushing for regulation. The basic idea is that if a public function or technology is so powerful that bad actors or irresponsible management could hurt the general public, then a government agency should be appointed to develop controls and enforce those controls. There is a school of thought that suggests that incumbent and current leaders in AI also want regulation, but for reasons that are less pure – they want to freeze the playing field with them in the lead. We will primarily focus on the public good area. 

The True Nature of Open Source

Before generative AI burst onto the scene, most software running in data centers was conventional software. If you have the source code for conventional software, you can determine exactly what it does. An engineer fluent in the appropriate programming language can review the code and determine its logic. You can even modify it and alter its behavior. Open source (or open source code) is another way of saying – I am going to provide everything needed to determine behavior and change behavior. In short, the true nature of open-source software is to provide everything you need to understand the behavior of the software and to change it.

Now, with AI models, if you have the source code for a model, you know absolutely nothing about how the model will behave. For a model to be fully open, you need the training data, the source code of the model, the hyperparameters used during training, and, of course, the trained model itself, which is composed of the billions (and soon trillions) of parameters that store the model’s knowledge – also known as parametric memory. Now, some organizations only provide the model, keep everything else to themselves, and claim the model is “open source.” This is a practice known as “open-washing” and is generally frowned upon by both the open and closed-source communities as disingenuous. I would like to see a new term used for AI models that are partially shared. Maybe “partially open model” or “model from an open washing company.”

There is one final rub when it comes to fully shared models. Let’s say an organization wants to do the right thing and shares everything about a model – the training data, the source code, the hyperparameters, and the trained model. Well, you still can’t determine exactly how it is going to behave unless you test it extensively. The parametric memory that determines behavior is not human-readable. Again, the industry needs a different term for fully open models. A term that is different from “open source,” which should only be used for non-AI software because the source code of a model does not help determine the behavior of the model. Perhaps “open model”.

Common Arguments

Let’s look at some common arguments you will find on the internet that endorse the use of only one of the perspectives previously described. These are passionate defenders of their perspective, but that passion can cloud judgment. 

Argument: Closed AI supporters claim that big tech companies have the means to guard against potential dangers and abuse. Therefore AI should be kept private and out of the open source community.

Rebuttal: Big tech companies do have the means to guard against potential abuse but that does not mean they will do it judiciously or even at all. Furthermore, this is not their primary objective. Their primary objective is making money for their shareholders – and that will always take precedence.

Argument: Those who think that AI could become a threat to humanity like to ask, “Would you open source the Manhattan Project?”

Rebuttal: This is clearly an argument for governance. However, it is an unfair and incorrect analogy. The purpose of the Manhattan Project was to build a bomb during wartime by using radioactive materials to produce nuclear fusion. Nuclear fusion is not a general-purpose technology that can be applied to different tasks. You can make a bomb, and you can generate power – that’s it. The ingredients and the results are pretty dangerous to the general public, so all aspects should be regulated. AI is much different. As described above, it comes in varying flavors with varying risks.

Argument: Proponents of open-sourcing AI say that open-source facilitates the sharing of science, provides transparency, and is a means to prevent a few from monopolizing a powerful technology.

Rebuttal: This is mostly true, but it is not entirely true. Open source does provide sharing. For an AI model, it is only going to provide some transparency. Finally, it is debatable whether “open models” will prevent a few from monopolizing their power. To run a model like ChatGPT at scale, you need compute that only a few companies are capable of acquiring.

Needs of the Many Outweigh the Needs of the Few

In “Star Trek II: The Wrath of Khan,” Spock dies from radiation poisoning. Spock realizes that the ship’s main engines must be repaired to facilitate an escape, but the engine room is flooded with lethal radiation. Despite the danger, Spock enters the radiation-filled chamber to make the necessary repairs. He successfully restores the warp drive, allowing the Enterprise to reach a safe distance. Unfortunately, Vulcans are not immune to radiation. His dying words to Captain Kirk explain the logic behind his actions, “The needs of the many outweigh the needs of the few or the one.”

This is perfectly sound logic, and it will have to be used to control AI. There are certain models that pose a risk to the general public. For these models, the needs of the general public outweigh the rights of innovators.

Should All AI be Open-source?

We are now ready to tie everything together and answer the question that is the title of this post. First, let’s review the axioms established thus far:

  • Open Source should remain a choice.
  • Open models are not as transparent as non-AI software that is open-sourced.
  • Close Source is a right of the innovator.
  • There is no guarantee that big tech will correctly control their AI.
  • The needs of the general public must take precedence over all others.

The five bullets above represent everything I tried to make clear about open source, closed source, and regulations. If you believe them to be true, the answer to the question, “Should All AI be Open-Source?” is no because it will not control AI, nor will closed source. Furthermore, in a fair world, open source and open models should remain a choice, and close source should remain a right.

We can go one step further and talk about the actions the industry can take as a whole to move toward effective control of AI:

  • Determine the types of models that pose a risk to the general public. Models with high risk, because they control information (chatbots) or dangerous resources (automated cars), should be regulated.
  • Organizations should be encouraged to share their models as fully open models. The open-source community will need to step up and either prevent or label models that are only partially shared. The open-source community should also put together tests that can be used to rate models. 
  • Closed models should still be allowed if they do not pose a risk to the general public. Big Tech should step up and develop its own set of controls and tests that it funds and shares. Perhaps this is a chance for Big Tech to work closely with the open-source community to solve a common problem. 

If you have any questions, be sure to reach out to us on Slack!