Netra SDK provides advanced PII detection with multiple engines: To use the PII detection features provided by Netra SDK:
pip install 'netra-sdk[presidio]'
If you try to use the PII detection utility in Netra SDK without installing this dependency, you will receive an exception. So, always make sure to install the package if you plan to use this utility.
from netra.pii import get_default_detector

# Get default detector with custom settings
detector = get_default_detector(
    action_type="MASK",  # Options: "BLOCK", "FLAG", "MASK"
    entities=["EMAIL_ADDRESS", "PERSON", "PHONE_NUMBER"]
)

# Detect PII in text
text = "Contact John Doe at john.doe@example.com or call him at 555-123-4567"
result = detector.detect(text)

print(f"Has PII: {result.has_pii}")
print(f"Masked text: {result.masked_text}")
print(f"Entity mappings: {result.entities}")

Entity Anonymization Details

Netra SDK uses different anonymization mechanisms based on entity type:
  1. Email Addresses
    • Format-preserving anonymization
    • Maintains @ and . characters
    • Example: john.doe@example.comfank.fzr@hgbcfxa.zrl
  2. Other Entities
    • Hash-based anonymization with consistent hashing
    • Same entity always produces the same hash
    • Format: <ENTITY_TYPE_hash>
    • Examples:
      • Person: John Doe<PERSON_6cea57c2>
      • Phone: 555-123-4567<PHONE_NUMBER_2dcbf551>
    Important Note: The hashing mechanism is deterministic, meaning that the same entity will always produce the same hash value. This ensures consistency across multiple detections and allows for proper data correlation while maintaining privacy.
    • Format: <ENTITY_TYPE_hash>
    • Examples:
      • Person: John Doe<PERSON_6cea57c2>
      • Phone: 555-123-4567<PHONE_NUMBER_2dcbf551>

Example Results

Original text:
Contact John Doe at john.doe@example.com or call him at 555-123-4567
Anonymized text:
Contact <PERSON_6cea57c2> at fank.fzr@hgbcfxa.zrl or call him at 55<PHONE_NUMBER_2dcbf551>
Entity mappings:
  PHONE_NUMBER_2dcbf551 -> 5-123-4567
  fank.fzr@hgbcfxa.zrl -> john.doe@example.com
  PERSON_6cea57c2 -> John Doe

Verification Features

  • Emails maintain format (contains @ and .)
  • Hashes follow consistent naming convention
  • Entity types are preserved in anonymization
  • Consistent mapping between original and anonymized values

Presidio-based Detection

from netra.pii import PresidioPIIDetector

# Initialize detector with different action types
detector = PresidioPIIDetector(
    action_type="MASK",  # Options: "FLAG", "MASK", "BLOCK"
    score_threshold=0.8,
    entities=["EMAIL_ADDRESS", "PERSON", "PHONE_NUMBER"]
)

# Detect PII in text
text = "Contact John Doe at john.doe@example.com"
result = detector.detect(text)

print(f"Has PII: {result.has_pii}")
print(f"Masked text: {result.masked_text}")
print(f"Entity mappings: {result.entities}")

Regex-based Detection

from netra.pii import RegexPIIDetector
import re

# Custom patterns
custom_patterns = {
    "EMAIL": re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "CUSTOM_ID": re.compile(r"ID-\d{6}")
}

detector = RegexPIIDetector(
    patterns=custom_patterns,
    action_type="MASK"
)

result = detector.detect("User ID-123456 email: user@test.com")

Chat Message PII Detection

from netra.pii import get_default_detector

# Get default detector with custom settings
detector = get_default_detector(
    action_type="MASK"  # Options: "BLOCK", "FLAG", "MASK"
)

# Works with chat message formats
chat_messages = [
    {"role": "user", "content": "My email is john@example.com"},
    {"role": "assistant", "content": "I'll help you with that."},
    {"role": "user", "content": "My phone is 555-123-4567"}
]

result = detector.detect(chat_messages)
print(f"Masked messages: {result.masked_text}")

Custom Models for PII Detection

The PresidioPIIDetector supports custom NLP models through the nlp_configuration parameter, allowing you to use specialized models for improved PII detection accuracy. You can configure custom spaCy, Stanza, or transformers models:

NLP Configuration Example

Follow this configuration structure to provide your custom models.
nlp_configuration = {
    "nlp_engine_name": "spacy|stanza|transformers",
    "models": [
        {
            "lang_code": "en",  # Language code
            "model_name": "model_identifier"  # Varies by engine type
        }
    ],
    "ner_model_configuration": {  # Optional, mainly for transformers
        # Additional configuration options
    }
}

Using Custom spaCy Models

from netra.pii import PresidioPIIDetector

# Configure custom spaCy model
spacy_config = {
    "nlp_engine_name": "spacy",
    "models": [{"lang_code": "en", "model_name": "en_core_web_lg"}]
}

detector = PresidioPIIDetector(
    nlp_configuration=spacy_config,
    action_type="MASK",
    score_threshold=0.8
)

text = "Dr. Sarah Wilson works at 123 Main St, New York"
result = detector.detect(text)
print(f"Detected entities: {result.pii_entities}")

Using Stanza Models

from netra.pii import PresidioPIIDetector

# Configure Stanza model
stanza_config = {
    "nlp_engine_name": "stanza",
    "models": [{"lang_code": "en", "model_name": "en"}]
}

detector = PresidioPIIDetector(
    nlp_configuration=stanza_config,
    action_type="FLAG"
)

text = "Contact Alice Smith at alice@company.com"
result = detector.detect(text)
print(f"PII detected: {result.has_pii}")

Using Transformers Models

For advanced NER capabilities, you can use transformer-based models:
from netra.pii import PresidioPIIDetector

# Configure transformers model with entity mapping
transformers_config = {
    "nlp_engine_name": "transformers",
    "models": [{
        "lang_code": "en",
        "model_name": {
            "spacy": "en_core_web_sm",
            "transformers": "dbmdz/bert-large-cased-finetuned-conll03-english"
        }
    }],
    "ner_model_configuration": {
        "labels_to_ignore": ["O"],
        "model_to_presidio_entity_mapping": {
            "PER": "PERSON",
            "LOC": "LOCATION",
            "ORG": "ORGANIZATION",
            "MISC": "MISC"
        },
        "low_confidence_score_multiplier": 0.4,
        "low_score_entity_names": ["ORG"]
    }
}

detector = PresidioPIIDetector(
    nlp_configuration=transformers_config,
    action_type="MASK"
)

text = "Microsoft Corporation is located in Redmond, Washington"
result = detector.detect(text)
print(f"Masked text: {result.masked_text}")