This module provides functionality to detect and replace various types of sensitive information in German text using named entity recognition, regular expressions, and the Faker library for generating replacement data.
This function identifies and replaces the following types of personally identifiable information (PII):
Phone numbers, detected with the phonenumbers library
The replacements are done in the following order:
text (str): The input text containing sensitive information to be anonymized.
The anonymized text with all detected PII replaced by fake data.
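As a rough illustration of the detect-and-replace idea, the sketch below uses only the standard library. It is not the module's implementation: the module relies on named entity recognition, the phonenumbers library, and Faker, whereas this sketch substitutes two simplified regular expressions and fixed placeholders. The function name `anonymize_sketch`, both patterns, and the choice of email addresses as an example PII type are assumptions made for the example.

```python
import re

# Hypothetical, simplified patterns; the module's real expressions are not
# shown in this documentation.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?<![\w@])(?:\+49|0)[\d /-]{6,}\d")


def anonymize_sketch(text: str) -> str:
    """Replace email addresses and German-style phone numbers with placeholders."""
    # Emails go first so that digits inside an address are not mistaken
    # for a phone number by the second pattern.
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = PHONE_RE.sub("<PHONE>", text)
    return text


sample = "Ruf mich an: +49 30 1234567 oder mail an max@example.de."
print(anonymize_sketch(sample))
```

Applying the patterns in a fixed order matters because an earlier replacement changes what later patterns can still match. A fuller version would generate realistic fake values instead of fixed placeholders.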
It includes functions to:
Dependencies:
spacy: For German NLP processing
nltk: For stopword removal and tokenization
This function processes input text through the following steps:
text (str): The input text to be cleaned.
The cleaned, lowercased text with stop words removed.
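A minimal sketch of this kind of cleaning, assuming a tiny hard-coded stop word set and regex tokenization in place of the nltk resources the module depends on. The name `clean_text_sketch` and the stop word list are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop word list; nltk ships a much
# larger German list.
GERMAN_STOPWORDS = {"der", "die", "das", "und", "ist", "ein", "eine", "in", "zu"}


def clean_text_sketch(text: str) -> str:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(t for t in tokens if t not in GERMAN_STOPWORDS)


print(clean_text_sketch("Das ist ein Test und die Antwort"))
```

Lowercasing before the stop word lookup keeps the comparison case-insensitive, so "Das" and "das" are treated the same.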
Processes email content by:
email (str): The full email content, including a potential signature.
The processed email body with signature elements removed
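One common way to drop a signature is to cut the message at the first line that looks like a closing. The sketch below assumes that approach. The marker list and the name `strip_signature_sketch` are hypothetical, and the module's actual rules may differ.

```python
import re

# Hypothetical signature markers: the conventional "--" delimiter line and
# two common German closings, matched at the start of a line.
SIGNATURE_MARKERS = re.compile(
    r"^(?:--\s*$|mit freundlichen gr(?:ü|ue)(?:ß|ss)en|viele gr(?:ü|ue)(?:ß|ss)e)",
    re.IGNORECASE | re.MULTILINE,
)


def strip_signature_sketch(email: str) -> str:
    """Return the email body up to the first signature marker, if any."""
    match = SIGNATURE_MARKERS.search(email)
    body = email[: match.start()] if match else email
    return body.strip()


message = (
    "Hallo Frau Meier,\n\n"
    "anbei die Unterlagen.\n\n"
    "Mit freundlichen Grüßen\n"
    "Max Mustermann\n"
    "Tel: 030 1234567"
)
print(strip_signature_sketch(message))
```

Everything from the first marker onward is discarded, so names and phone numbers in the signature block never reach later processing steps.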