In episode 94 of the Cybersecurity Minute, Rob Wood details the importance of data quality for machine learning models in cybersecurity.
Highlights
00:16 — If you are considering building in-house machine learning models to complement and extend the capabilities of your security program, you should think carefully and intentionally about the data you use to train these models. We want to make sure we have clean data.
00:53 — With clean data, you can also give your model a representative range of inputs, so that the training data doesn’t unintentionally bias or deceive the model.
01:27 — Rob is a fan of migrating to security data lakes. Cybersecurity requires the ability to integrate, cross-reference, and combine large amounts of data. That’s not possible with the old data tools. But some of the newer data lake technologies such as Snowflake, Databricks, and Confluent are opening up exciting possibilities.
02:16 — If you’ve built the engineering capabilities in-house to do this work, don’t neglect architecture, processes, data flow, or data sourcing. Get the basics right. You will be in a much better position to succeed.
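To make the "clean data" point from 00:16 and 00:53 concrete, here is a minimal sketch of the kind of check a team might run before training. It is not from the episode: the function name `audit_training_rows` and the field names `src_ip`, `event_type`, and `label` are hypothetical, and the checks (missing fields, exact duplicates, label balance) are only one simple reading of what "clean" could mean.

```python
from collections import Counter


def audit_training_rows(rows, label_key="label",
                        required=("src_ip", "event_type", "label")):
    """Return simple data-quality counts for a list of dict records.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    seen = set()
    duplicates = 0
    missing = 0
    labels = Counter()
    for row in rows:
        # A row is incomplete if any required field is absent or empty.
        if any(row.get(field) in (None, "") for field in required):
            missing += 1
            continue
        # Rows with identical required fields count as duplicates.
        key = tuple(row[field] for field in required)
        if key in seen:
            duplicates += 1
            continue
        seen.add(key)
        labels[row[label_key]] += 1
    total = sum(labels.values())
    # Share of each label among the clean, unique rows.
    balance = {name: count / total for name, count in labels.items()} if total else {}
    return {"missing": missing, "duplicates": duplicates,
            "clean": total, "balance": balance}


sample = [
    {"src_ip": "10.0.0.1", "event_type": "login", "label": "benign"},
    {"src_ip": "10.0.0.1", "event_type": "login", "label": "benign"},  # duplicate
    {"src_ip": "10.0.0.2", "event_type": "", "label": "malicious"},    # missing field
    {"src_ip": "10.0.0.3", "event_type": "scan", "label": "malicious"},
    {"src_ip": "10.0.0.4", "event_type": "login", "label": "benign"},
    {"src_ip": "10.0.0.5", "event_type": "login", "label": "benign"},
]
report = audit_training_rows(sample)
print(report)
```

A heavily skewed `balance` or a high `missing` count would be a signal to revisit data sourcing before training, in line with the "get the basics right" advice above.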
Want more cybersecurity insights? Subscribe to the Cybersecurity as a Business Enabler channel.