Medical diagnosis. High stakes. Real consequences. This represents one of Naive Bayes' most impactful applications. Doctors observe symptoms. They need disease probability. That's exactly what Bayes' theorem computes.
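Before the full implementation, it can help to see the arithmetic Bayes' theorem actually performs for a single symptom. A minimal sketch, using made-up prior and likelihood values (illustrative numbers, not clinical data):

```python
# Bayes' theorem: P(disease | symptom) = P(symptom | disease) * P(disease) / P(symptom)
# All numbers below are illustrative assumptions, not real clinical rates.
p_flu = 0.10                  # prior: P(flu) in the population
p_fever_given_flu = 0.90      # likelihood: P(fever | flu)
p_fever_given_not_flu = 0.15  # likelihood: P(fever | no flu)

# Total probability of observing a fever (law of total probability)
p_fever = p_fever_given_flu * p_flu + p_fever_given_not_flu * (1 - p_flu)

# Posterior: probability of flu after observing a fever
p_flu_given_fever = p_fever_given_flu * p_flu / p_fever
print(f"P(flu | fever) = {p_flu_given_fever:.3f}")  # -> P(flu | fever) = 0.400
```

Note how the denominator does the work: even a symptom that is very likely given the disease only raises the posterior to 40% here, because fevers are also common without flu.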
Medical Diagnosis System Implementation
```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Create medical diagnosis dataset (1 = symptom present, 0 = absent)
medical_data = {
    'fever':       [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1],
    'cough':       [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0],
    'headache':    [0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    'sore_throat': [1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1],
    'diagnosis': ['flu', 'cold', 'migraine', 'flu', 'healthy', 'flu', 'cold',
                  'migraine', 'cold', 'flu', 'migraine', 'cold', 'flu', 'migraine', 'flu']
}
df = pd.DataFrame(medical_data)
print("Medical Diagnosis Dataset:")
print(df.head(10))
print()

# Prepare features and target
X = df[['fever', 'cough', 'headache', 'sore_throat']]
y = df['diagnosis']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Bernoulli Naive Bayes (binary symptoms)
medical_nb = BernoulliNB()
medical_nb.fit(X_train, y_train)

# Test with new patients
new_patients = pd.DataFrame({
    'fever': [1, 0, 1],
    'cough': [1, 1, 0],
    'headache': [0, 1, 1],
    'sore_throat': [1, 0, 0]
})

# Get predictions with probabilities
predictions = medical_nb.predict(new_patients)
probabilities = medical_nb.predict_proba(new_patients)
classes = medical_nb.classes_

print("NEW PATIENT DIAGNOSES:")
print("=" * 40)
for i, (_, patient) in enumerate(new_patients.iterrows()):
    print(f"\nPatient {i + 1} Symptoms:")
    symptoms = []
    if patient['fever']: symptoms.append('fever')
    if patient['cough']: symptoms.append('cough')
    if patient['headache']: symptoms.append('headache')
    if patient['sore_throat']: symptoms.append('sore throat')
    print(f"  Present: {', '.join(symptoms)}")
    print(f"  Most likely diagnosis: {predictions[i]}")
    print("  Probability breakdown:")
    for j, class_name in enumerate(classes):
        prob = probabilities[i][j] * 100
        print(f"    {class_name}: {prob:.1f}%")

# Show feature importance (log probabilities)
print("\nSYMPTOM SIGNIFICANCE BY DISEASE:")
print("=" * 40)
feature_names = ['fever', 'cough', 'headache', 'sore_throat']
for class_idx, class_name in enumerate(classes):
    print(f"\n{class_name.upper()}:")
    class_log_probs = medical_nb.feature_log_prob_[class_idx]
    for feat_idx, feature in enumerate(feature_names):
        # Convert log probability back to P(symptom present | disease)
        prob = np.exp(class_log_probs[feat_idx])
        print(f"  {feature}: {prob:.3f}")
```
This medical system demonstrates uncertainty handling in diagnosis. No definitive answers. Just probability distributions. That's critical. Doctors need confidence scores, not binary decisions. They need to understand the likelihood of different conditions to make informed treatment choices.
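Because the model returns a probability distribution rather than a hard label, one common pattern is to act on a diagnosis only when the top probability clears a confidence threshold, and flag the case for human review otherwise. A minimal sketch of that idea (the 0.6 threshold and the toy probability rows are assumptions for illustration, not output of the model above):

```python
import numpy as np

def triage(probabilities, classes, threshold=0.6):
    """Return the top diagnosis per row if confident enough, else flag for review."""
    results = []
    for row in probabilities:
        best = int(np.argmax(row))
        if row[best] >= threshold:
            results.append(classes[best])
        else:
            results.append("refer to clinician")
    return results

# Toy probability rows over ['cold', 'flu', 'migraine'] (illustrative only)
classes = ["cold", "flu", "migraine"]
probs = np.array([[0.05, 0.85, 0.10],   # confident -> 'flu'
                  [0.40, 0.35, 0.25]])  # uncertain -> deferred
print(triage(probs, classes))  # -> ['flu', 'refer to clinician']
```

The threshold is a policy decision, not a modeling one: in a real system it would be set from the relative cost of a wrong automated call versus a deferred case.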
Why It Works in Medicine: Symptoms act somewhat independently in many cases. Having fever doesn't necessarily increase your probability of having a headache beyond what the underlying disease predicts. The naive assumption? Reasonably accurate for many diagnostic scenarios. Not perfect. But good enough to provide valuable decision support.
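One way to sanity-check that claim on a binary-symptom dataset is to measure the correlation between symptom pairs *within* each diagnosis: values near zero are consistent with conditional independence. A sketch on a toy DataFrame in the same shape as the dataset above (the values here are fabricated for illustration):

```python
import pandas as pd

# Toy data in the same shape as the dataset above (values illustrative)
df = pd.DataFrame({
    "fever":     [1, 0, 1, 0, 0, 1, 0, 1],
    "headache":  [0, 1, 1, 0, 1, 0, 1, 1],
    "diagnosis": ["flu", "flu", "flu", "flu", "cold", "cold", "cold", "cold"],
})

# Within each diagnosis, correlate a symptom pair; values near 0
# are consistent with the conditional-independence assumption.
for disease, group in df.groupby("diagnosis"):
    corr = group["fever"].corr(group["headache"])
    print(f"{disease}: corr(fever, headache) = {corr:.2f}")
```

Mild within-class correlation usually just dents the calibration of the probabilities; the argmax diagnosis is often still right, which is why Naive Bayes holds up better in practice than the assumption suggests.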