How Should you Protect your Machine Learning Models and IP?

by 7wData
May 11, 2022

Over the last decade I’ve helped hundreds of product teams ship ML-based products, inside and outside of Google, and one of the most frequent questions I got was “How do I protect my models?” This usually came from executives, and digging deeper it became clear they were most worried about competitors gaining an advantage from what we released. This worry is completely understandable, because modern machine learning has become essential for many applications so quickly that best practices haven’t had time to settle and spread. The answers are complex and depend to some extent on your exact threat models, but if you want a summary of the advice I usually give it boils down to:

To explain why I ended up with these conclusions, I’ll need to dive into some of the ways that malicious actors could potentially harm a company based on how ML materials are released. I’ve spent a lot of my time focused on edge deployments, but many of the points are applicable to cloud applications too.

The most concerning threat is frequently “Will releasing this make it easy for my main competitor to copy this new feature and hurt our differentiation in the market?” If you haven’t spent time personally engineering ML features, you might think that releasing a model file, for example as part of a phone app, would make this easy, especially if it’s in a common format like a TensorFlow Lite flatbuffer. In practice, I recommend thinking about these model files like the binary executables that contain your application code. By releasing one you are making it possible to inspect the final result of your product engineering process, but trying to do anything useful with it is usually like trying to turn a hamburger back into a cow. Just as you can disassemble an executable, you can load a model file into a tool like Netron to see its overall structure. You may learn something about the model architecture, but just like disassembled machine code, it won’t give you much help reproducing the results. Knowing the model architecture is mildly useful, but most architectures are well known in the field anyway, and differ from each other only incrementally.

What about just copying the model file itself and using it in an application? That’s not as useful as you might think, for a lot of reasons. First off, it’s a clear copyright violation, just like copying an executable, so it’s easy to spot and challenge legally. If you are still worried about this, you can take some simple steps like encrypting the model file in the app bundle and only unpacking it into memory when the app is running. This won’t stop a determined attacker, but it makes it harder. To help catch copycats, you can also add text strings into your files that say something like “Copyright Foo, Inc.”, or get more elaborate and add canaries, also more poetically called Mountweazels, by modifying your training data so that the model produces distinct and unlikely results in rare circumstances. For example, an image model could be trained so that a Starbucks logo always returns “Duck” as the prediction. Your application could ignore this result, but even if the attacker got clever and added small perturbations to the model weights to prevent obvious binary comparisons, the behavior would be likely to persist and prove that the copy was directly derived from the original.
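To make the canary idea a little more concrete, here is a minimal Python sketch of how a check against a suspected copy might look. Everything in it is an assumption for illustration: the function names, the 0.8 threshold, the file names, and the extra canary labels are hypothetical, and the two toy "models" are plain functions standing in for real inference calls. Only the Starbucks-to-“Duck” pairing comes from the example above.

```python
from typing import Callable, Hashable, Sequence, Tuple

# A canary is a (trigger input, planted label) pair.
# Hypothetical type alias, used only in this sketch.
Canary = Tuple[Hashable, str]


def canary_match_rate(
    predict: Callable[[Hashable], str],
    canaries: Sequence[Canary],
) -> float:
    """Return the fraction of canaries for which `predict` gives the planted label."""
    if not canaries:
        raise ValueError("need at least one canary")
    hits = sum(1 for trigger, planted in canaries if predict(trigger) == planted)
    return hits / len(canaries)


def looks_derived(
    predict: Callable[[Hashable], str],
    canaries: Sequence[Canary],
    threshold: float = 0.8,  # assumed cutoff; tune it for your own model
) -> bool:
    """Flag a model whose canary match rate is suspiciously high."""
    return canary_match_rate(predict, canaries) >= threshold


# Toy canary set: the first pair mirrors the example in the text,
# the other two are made up for this sketch.
CANARIES = [
    ("starbucks_logo.png", "Duck"),
    ("canary_2.png", "Teapot"),
    ("canary_3.png", "Anvil"),
]


def suspect_model(image_name: str) -> str:
    """Stand-in for a copied model: it still carries the planted behavior."""
    return dict(CANARIES).get(image_name, "Coffee")


def independent_model(image_name: str) -> str:
    """Stand-in for an unrelated model: it has never seen the canaries."""
    return "Coffee"


if __name__ == "__main__":
    print("suspect:", canary_match_rate(suspect_model, CANARIES))
    print("independent:", canary_match_rate(independent_model, CANARIES))
```

Using several canaries and a match rate, rather than a single trigger, is a design choice in this sketch: a lone unlikely prediction could be a coincidence, while a high match rate across many unlikely pairs is much harder to explain away.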

Even if you don’t detect the copying, having a static model is not actually that useful. The world keeps changing, you’ll want to keep improving the model and adapting to new needs, and that’s very hard to do if all you have is the end result of training. It’s also unlikely that a competitor will have exactly the same requirements as you, whether it’s because of using different hardware or a user population that differs from yours.
