
Corporate Code Leaks in ChatGPT: The Samsung Dilemma

ChatGPT is a powerful tool for developers, but it opens the debate for security concerns. Is it safe to share corporate information with the most popular AI on the market?

By Joe Lawrence

As a Principal at BairesDev, Joe Lawrence is transforming industries by leveraging the power of veteran “nearshore” software engineers from Latin America.


Back in April 2023, Samsung made the news after circulating an internal memo banning its engineers from using ChatGPT and other chatbots for any and all company processes. This happened after some engineers pasted pieces of internal source code and hardware specifications into ChatGPT, effectively handing them to a third party outside the company.

Samsung is just one of many companies restricting how their employees interact with large language models (LLMs). JP Morgan and most major US banks have taken similar measures, citing legal and compliance risks as the basis for restricting or outright banning chatbots.

These prohibitions come amidst worries about AI replacing human workers; IBM, for example, recently announced that it may replace 7,800 jobs with AI-powered assistants over the next five years. On one hand, we are quickly embracing AI as a promising and disruptive new technology, notably through AI development services; on the other, we are limiting our employees’ access to it. No, it’s not a paradox; it’s the natural outcome of living in an age of digital disruption. Quick changes demand quick adaptation and on-the-spot decision-making as we make our way through the never-ending storm of progress.

Concerns about Security

Let’s get one thing out of the way: Samsung, JP Morgan, and the other companies aren’t wrong to be cautious. I, for one, am cautious. But I’m also a huge proponent of AI, and my “relationship” with ChatGPT has significantly increased my productivity.

Like our team of 4,500+ software engineers, developers across the globe report that their ability to produce reliable code has improved, and GitHub’s own research puts the productivity gain at nearly 56% for developers using tools like ChatGPT or GitHub Copilot. And trust me, no company on the face of the earth wants to restrict access to software that costs relatively little and gives developers superpowers.

ChatGPT Bugs: But just how safe are these services? For example, a well-publicized bug in the OpenAI platform leaked payment information belonging to ChatGPT Plus subscribers. It’s baffling that an open-source library OpenAI used for ChatGPT introduced a bug that let users see other people’s chat history and personal information, including payment details.

The multibillion-dollar company pinned the slip-up on an anonymous open-source developer instead of admitting it had added libraries to its project without vetting the code first. But that’s an issue for another article.

ChatGPT Censorship: That’s just the tip of the iceberg! Italy was the first country to block ChatGPT outright amid concerns about the source of its training data and its handling of personal information, concerns that have also led the European Union to demand more transparency from the company along with stricter security measures. In response, OpenAI threatened to leave Europe if it couldn’t comply with the new regulations. OpenAI CEO Sam Altman later downplayed and walked back the threat, saying he has “no plans to leave” Europe after meeting with European leaders about the proposed EU AI Act.

How safe are these chatbots? For ChatGPT, we simply don’t know. The same could be said of Bard, Hugging Face’s chatbot, and any of the thousands of other implementations on the market.

For example, OpenAI’s business model allows other developers to make API calls to most of its transformer models to power their own, sometimes questionable, apps. Take, for instance, Replika and Soulmate, AI companions that act as friends, relatives, or lovers for their users.

ChatGPT API Flaws: Now, take an entrepreneurial young computer scientist who goes by Xtekky, add a Git repo called gpt4free, and watch the world burn. The young developer reverse-engineered some of the most popular websites built on OpenAI’s models so that anyone could make calls to OpenAI through them for free. The developer even went as far as training other models on data from these API calls, something OpenAI’s terms explicitly ban.

Legalities aside, Xtekky exposed one of the big issues with AI: the rush to adopt the technology has led many startups and companies to bolt these tools onto their products with few safeguards. An unsecured API is about as inviting a target as a cyberattacker could ask for, and these companies end up footing the bill for strangers who are talking to the GPT models outside their ecosystems.

ChatGPT Vulnerability: So, you may think your security concerns are over if you just stick to the source and work with OpenAI directly, right? Not so fast. Let me quote a couple of excerpts from OpenAI’s FAQ page:

Who can view my conversations?

“As part of our commitment to safe and responsible AI, we review conversations to improve our systems and to ensure the content complies with our policies and safety requirements.”

Will you use my conversations for training?

“Yes. Your conversations may be reviewed by our AI trainers to improve our systems.”

Can you delete specific prompts?

“No, we are not able to delete specific prompts from your history. Please don’t share any sensitive information in your conversations.”

So, according to their terms of service, OpenAI has the right to review your conversations and use your data to train their models, yet you can’t delete a specific prompt. Should we trust that they will use our information responsibly? Does OpenAI have complete control over who reads the data they are feeding into the models?

Well, according to a TIME magazine exposé, OpenAI outsourced workers in low-income countries to comb through its datasets before training GPT-3. Do we know for sure that they won’t do it again? We don’t. And much as with Twitter’s outsourced moderation, there’s a good chance contractors will pore over prompts, messages, private company information, source code, and other sensitive data to identify criminal activity and discard toxic, fake, and politically incorrect language.

Now, let’s get one thing straight: You can criticize OpenAI all you want, but they are not wrong here. So far, they have been crystal clear on what ChatGPT does and how they use the data gathered from the conversations; it’s literally the first thing they tell you when you start the app for the first time.

There you have it. So, for the sake of security, should you prohibit employees from using large language models? Strap in, because this is going to get even more interesting.

The Prohibition Era Again?

The world of education, and especially the role of AI in education, is even more fraught with implications. It’s often said that colleges are where innovation goes to die. Some of the smartest people on the planet work at academic institutions, but the weight of bureaucracy makes it almost impossible for them to compete with tech companies and private labs.

Many older professors hold deep-seated biases against technology and feel that using these apps is akin to cheating. Yet there are only two possibilities here: either we help students learn how to use and implement ChatGPT-style apps, or we blindly let them use the apps behind our backs. The fact is, Generative AI is here, and there is no way to stop it. The only thing we can do is learn how to live with it responsibly and ethically.

In one survey from early 2023, 51% of teachers said they were already using AI in their daily activities to prepare for their classes, and most of them encouraged their students to do the same. Over 80% of these respondents believed that AI has had a significant, favorable impact on their routines.

Going back to our original point, most developers will not stop using Generative AI tools. That’s just not going to happen. Ban them in the workplace, and developers will use their phones; block the sites at the network level, and they will do it on their personal computers at home. Company policy aside, why would they spend days banging their heads against a code problem when the answer could be at their fingertips? It’s so much faster than posting the code on Stack Overflow and waiting for an answer.

The more restrictions you create, the more back-alley workarounds you foster. It’s human nature, and no historical period is a more fitting example than Prohibition. For 13 years, the U.S. banned the production and sale of alcohol, which fueled a surge of underground breweries and speakeasies, widespread corruption, and the rise of organized crime built on bootlegging. Banning alcohol might have solved one problem, but it created many more.

So, ask yourself: would you rather let your employees use ChatGPT under your supervision, or have them do it behind your back?

The answer is difficult, but the trick is to find a balance between security and access. Pasting private source code into a public chatbot is a big no-no, but that’s not just a ChatGPT problem; it’s one of many AI problems. The same outrage would have erupted if the engineers had leaked the information while asking a question on Stack Overflow. In other words, I believe the real issue in this case was the lack of a firm corporate AI culture.

AI Culture

Don Ihde and Bruno Latour are two names you probably haven’t heard of, but they deserve more of our attention. Both are philosophers, both have tackled the question of technology, and both have a deep respect and love for technology and science. What sets them apart from other academics is their rather positive take on technology and how it interacts with human beings.

While we don’t have time to go over their entire body of work, let’s focus on the active role technology plays in human experience. To both of these researchers, technology isn’t just a tool or a passive object that extends our will; it’s an “active agent” with a profound impact on our capabilities and the ways we engage with the world.

Take, for example, modern calculators. Today, we have spreadsheets and powerful apps that run Python code, enabling us to solve in seconds math problems that would once have taken hours by hand.

Ihde and Latour both argue that critical thinking is not purely a product of the human brain but rather a function of the interaction between mind and technology, just as a person’s improved sight is the product of their eyes and their glasses working together.

What does this have to do with ChatGPT? A lot, actually.

Technology is not going to stop, and forcing our developers to stay away from modern tools is like asking a NASA scientist to solve equations with a pocket calculator. Instead, we have to accept that modern developers are inextricably intertwined with AI.

The question is: How can we embrace the productivity of Generative AI while promoting a security-first policy? The answer: It’s a matter of culture.

As AI becomes more mainstream, we need to foster a new culture in our businesses, one that incorporates the technology responsibly at all levels of our organizations. Let’s take the Samsung case to analyze how things could have played out differently.

First, on the developer side, it’s clear that the engineers weren’t aware of just how little data privacy ChatGPT offers. I’m certain that the engineers involved would have known that sharing the same information on an open forum like Stack Overflow or in a public Git repo is a terrible idea. Adopting these tools, then, requires understanding their terms of service and how the companies behind them use the data they gather.

Here is a simple yet elegant solution, straight from OpenAI’s own website:

“Starting on March 1, 2023, we are making two changes to our data usage and retention policies:

  1. OpenAI will not use data submitted by customers via our API to train or improve our models unless you explicitly decide to share your data with us for this purpose. You can opt-in to share data.
  2. Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law).”

It’s as simple as making calls to the model through the API instead of using the ChatGPT web app. Is that a perfect solution? Of course not, but at least you have a legally binding agreement that OpenAI will not share or retain your data beyond that window unless you choose to opt in.
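To make this concrete, here is a minimal sketch of what such an API call could look like in Python. It assumes the openai package and an API key in the OPENAI_API_KEY environment variable; the exact client interface varies by SDK version, so treat it as an illustration rather than a drop-in snippet:

import os
import openai

# Traffic sent through the API falls under the policy quoted above: it is not
# used for training unless you opt in, and it is retained for at most 30 days.
openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # any chat-capable model your account can access
    messages=[
        {"role": "system", "content": "You are a senior code reviewer."},
        {"role": "user", "content": "Explain what this function does: def double(x): return x * 2"},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)

But what if you are absolutely against the idea of sending any kind of private information to an open LLM at all?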

Well, in that case, you can build your own private model. What was a titanic task just a few months ago is about to get a lot easier with services like Amazon Bedrock, which provide foundation models that companies can use to build their own AI-based applications. That way, you can run an application in-house that your engineers can use to bolster their productivity in a private, safe, and secure environment.
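To give a rough idea of what that looks like in practice, here is a short sketch of invoking a Bedrock-hosted foundation model from Python with boto3. The invoke_model call is real, but the model ID and request format shown are assumptions; both depend on which model you enable in your AWS account:

import json
import boto3

# Prompts sent this way stay within your AWS account's service boundary
# instead of going to a public consumer chatbot.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Request bodies differ per model family; this one follows the Anthropic
# Claude text-completion convention as an example.
body = json.dumps({
    "prompt": "\n\nHuman: Review this SQL query for injection risks: ...\n\nAssistant:",
    "max_tokens_to_sample": 512,
    "temperature": 0.2,
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",  # assumed model ID; use whichever foundation model you enable
    contentType="application/json",
    accept="application/json",
    body=body,
)

print(json.loads(response["body"].read())["completion"])

The same pattern extends to an open-source model hosted entirely on your own infrastructure if even a managed cloud service feels like too much exposure.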

It’s extremely important that you familiarize yourself with AI technologies, including AI programming languages, and that you build an AI strategy for your business grounded in that research and understanding. Beyond that, promote a culture of understanding and accountability around AI and its applications within your organization, and let that culture guide your strategic decision-making.

Conclusion

The idea of developers as a synthesis of mind and AI does not have to be alarming or threatening. Instead, it can be viewed as a progression of productivity and our capacity for critical thinking and problem-solving. However, this new reality brings with it a heightened responsibility to prioritize security and privacy. The transformation necessitates a proactive, informed approach, which is only possible through the fostering of a deep-seated culture of learning and understanding the capabilities and implications of LLMs and Generative AI.

In this brave new world, where developers are using AI as a thought partner, the premium will be on education, comprehension, and strategic foresight. Adopting a security-first policy is non-negotiable. It is not a matter of halting or regressing technological progress but of embracing it responsibly. By fostering a culture of learning and understanding AI, we can ensure that we are using this technology safely and effectively while harnessing its tremendous potential for innovation and advancement.

Ultimately, the goal is to integrate AI technologies into our businesses in a manner that is secure, beneficial, and ethical. This means marrying an unwavering commitment to data security and privacy with a keen appreciation of AI’s potential, guided by human judgment and ethics. A balanced, informed approach to integrating AI will be the key to thriving in a world where our minds and machines are inextricably linked.

If you enjoyed this, be sure to check out our other AI articles.

