I have been working remotely for a few years, and back then it was considered a privilege. Today it is the norm: you wake up and reach the office in a minute (it is just a few metres away). This change in work culture has triggered an overhaul of Information Security strategies, and I want to talk about one of them, i.e. Data Identification and Classification.
I see many organizations enforce classification labels on end users, i.e. “Every file / email must have a label”. Most users innocently select a random label just to get their work done, which makes the entire exercise counterproductive. Some organizations have a remediation strategy for this, which involves a periodic discovery scan that reapplies labels based on defined policies. However, this strategy can only be as good as the defined policies, and anything they do not cover remains a massive gap. Moreover, a full scan could take weeks or months (depending on data volume), which may not be worth the time.
My recommendation is the following approach:
A) Labels: Restrict your strategy to a limited set of 2 to 4 labels (e.g. Confidential, Internal etc). This gives end users a simple choice. Moreover, labels work best when they are offered to end users as an option and not as a mandate. The best way to increase adoption is through training. The risk of incorrectly labelled data will always exist, but this approach mitigates it significantly.
B) Discovery Scans: This may be Phase 2 of your Classification project, where discovery scans are run against a defined data scope (policy) for:
- Identifying and labelling data without labels
- Reapplying labels to incorrectly labelled data (if any)
Note: I recommend going slow with this phase. You may not want to apply discovery policies until their detection accuracy is close to 98% (or whatever threshold you find acceptable).
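One way to put a number on "near 98% accurate" is to compare what a discovery policy would apply against labels assigned by a human reviewer on a small sample, before the policy is allowed to label anything. The sketch below is only an illustration: the `ReviewedSample` record, the function names, and the definition of accuracy as simple agreement between policy and reviewer are my assumptions, not features of any particular classification product.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ReviewedSample:
    """One manually reviewed file (hypothetical record, for illustration only)."""
    path: str
    policy_label: Optional[str]    # label the discovery policy would apply (None = no match)
    reviewer_label: Optional[str]  # label a human reviewer decided on (None = no label needed)


def policy_accuracy(samples: Sequence[ReviewedSample]) -> float:
    """Fraction of samples where the policy agrees with the reviewer."""
    if not samples:
        raise ValueError("at least one reviewed sample is required")
    correct = sum(1 for s in samples if s.policy_label == s.reviewer_label)
    return correct / len(samples)


def ready_to_enforce(samples: Sequence[ReviewedSample], threshold: float = 0.98) -> bool:
    """True only when the policy meets the agreed accuracy threshold."""
    return policy_accuracy(samples) >= threshold


if __name__ == "__main__":
    # 98 agreements and 2 disagreements out of 100 reviewed files.
    sample = [ReviewedSample(f"file{i}.docx", "Confidential", "Confidential") for i in range(98)]
    sample += [ReviewedSample(f"miss{i}.docx", "Confidential", "Internal") for i in range(2)]
    print(policy_accuracy(sample), ready_to_enforce(sample))
```

The threshold is a parameter on purpose: 98% is a starting point, and the acceptable value is a decision for each organization.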
C) Data without labels: It is impossible to identify and classify every data type. However, you will have visibility into the volume of data without labels, which enables a maturity-based approach, i.e. gradually classifying previously unidentified data types as Confidential, Internal etc. Success may be measured as the percentage of data with labels, but this will never reach 100%.
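The coverage metric itself is simple arithmetic. As a rough sketch, assume you can export one label per file from your scan results, with `None` standing for a file that has no label. The function name and the shape of the report are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def label_coverage(labels: Iterable[Optional[str]]) -> Dict[str, object]:
    """Summarise how much of the scanned data carries a label.

    `labels` holds one entry per file; None (or an empty string) means unlabelled.
    """
    counts = Counter(label if label else "Unlabelled" for label in labels)
    total = sum(counts.values())
    unlabelled = counts["Unlabelled"]  # Counter returns 0 when the key is absent
    labelled = total - unlabelled
    coverage_pct = 100.0 * labelled / total if total else 0.0
    return {
        "total": total,
        "labelled": labelled,
        "unlabelled": unlabelled,
        "coverage_pct": coverage_pct,
        "by_label": dict(counts),
    }


if __name__ == "__main__":
    report = label_coverage(["Confidential", "Internal", None, "Internal"])
    print(report["coverage_pct"])  # 3 of 4 files are labelled -> 75.0
```

Tracking this number after each scan shows whether coverage is maturing over time, without pretending it will ever reach 100%.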
Data Identification and Classification is a maturity-based approach and must be driven gradually. A rushed strategy may leave the impression that we are covered, but that’s where the real risk begins!
Published: 6th October, 2020
Author: Denis Kattithara