Guest post by France Labs
As an Enterprise Search solution, Datafari has visibility over all of an organization's knowledge bases. This makes it a good entry point for checking where PII (Personally Identifiable Information) is stored.
Indeed, as part of the GDPR requirements, any organization must maintain a list of where PII is stored. But once the knowledge base grows beyond a certain size, maintaining such a list manually becomes impractical. Distributing the task across the organization's departments is a good start, but it has its limits: for instance, colleagues may misinterpret what counts as PII.
This is where Enterprise Search solutions come in handy: because they process all of the internal documents and data, it is simple to add detection mechanisms that automatically generate a list of documents likely to hold PII.
This feature can be implemented for free with the open source version of Datafari, Datafari Community Edition. At the Open Source Experience event in Paris at the end of 2023, we presented a demo and a walkthrough of how to set it up. With this tutorial, you can build an end-to-end system that detects patterns via regular expressions (think phone numbers, social security numbers, etc.) as well as entities via Machine Learning (people names or organizations, for instance), using a dedicated spaCy server that leverages Transformer models. You can now do it yourself by following this link, which details the necessary steps: using Datafari for GDPR PII inventory.
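To make the regular-expression half of such a pipeline more concrete, here is a minimal Python sketch. It assumes documents are already available as plain text keyed by an identifier (in Datafari they would come from the crawled content). The pattern set and the names `PII_PATTERNS`, `scan_document` and `build_inventory` are hypothetical and for illustration only; they are not Datafari's API or its built-in rules. The Machine Learning side (people and organization names detected by the spaCy server) is not reproduced here.

```python
import re
from dataclasses import dataclass

# Hypothetical, illustrative patterns; NOT Datafari's built-in rules.
# Real deployments need locale-specific patterns and extra validation.
PII_PATTERNS = {
    # Email addresses (simplified; does not cover every RFC 5322 form)
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # French-style phone numbers, e.g. "01 23 45 67 89" or "+33 1 23 45 67 89"
    "PHONE_FR": re.compile(r"(?<!\d)(?:\+33\s?|0)[1-9](?:[\s.-]?\d{2}){4}(?!\d)"),
    # US Social Security numbers in the AAA-GG-SSSS layout
    "SSN_US": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class Finding:
    doc_id: str
    pii_type: str
    match: str


def scan_document(doc_id: str, text: str) -> list[Finding]:
    """Run every pattern over one document and collect all matches."""
    findings = []
    for pii_type, pattern in PII_PATTERNS.items():
        for m in pattern.finditer(text):
            findings.append(Finding(doc_id, pii_type, m.group(0)))
    return findings


def build_inventory(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each document id to the set of PII types found in it.

    Documents without any finding are left out, so the result is the
    list of candidate PII holders to review.
    """
    inventory = {}
    for doc_id, text in docs.items():
        types = {f.pii_type for f in scan_document(doc_id, text)}
        if types:
            inventory[doc_id] = types
    return inventory


if __name__ == "__main__":
    docs = {
        "hr/contract.txt": "Contact Jane at jane.doe@example.com or 01 23 45 67 89.",
        "wiki/readme.txt": "Build instructions for the search cluster.",
    }
    print(build_inventory(docs))
```

Regular expressions are cheap and predictable for well-structured identifiers, but they cannot reliably recognize free-form entities such as names, which is why the setup described above complements them with an ML-based entity recognizer.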