Proprietary Credit Risk Probability Map - System Overview
1. Introduction
This document provides an overview of the Evolved Credit Risk Analysis System, a proof-of-concept designed to offer a holistic and dynamic view of credit risk. It integrates quantitative models, qualitative data, knowledge graph context, and scenario analysis to provide a comprehensive assessment framework.
The system operates on a synthetic, file-based dataset for demonstration purposes. For full technical details, setup instructions, and in-depth explanations, please refer to the Full System README.
2. System Capabilities
Expanded Ontology & Rich Data Model: Defines core financial concepts using Pydantic models for structured data representation.
Integrated Knowledge Graph: Captures interconnections (subsidiaries, suppliers, guarantors) and provides contextual insights like network centrality.
Advanced Risk Models: Utilizes a Random Forest Classifier for Probability of Default (PD) and a Gradient Boosting Regressor for Loss Given Default (LGD), both incorporating enhanced feature engineering.
SHAP Explainability for PD Model: Offers insights into the drivers of PD model predictions.
Dynamic Pricing Model: Calculates loan interest rates based on PD, LGD, customer segments, and knowledge graph context.
Comprehensive Risk Mapping Service: Aggregates portfolio-wide risk, including qualitative scores and KG metrics, with summaries by sector and country.
MLOps Framework (PoC): Features automated model registration, loading of "production" models, prediction logging, data drift detection (numerical & categorical), and simulated performance monitoring.
Sophisticated Scenario Simulation: Applies feature-level shocks to the raw data and re-calculates PD and LGD with the models, so that stress impact is assessed realistically.
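The scenario simulation capability listed above can be illustrated with a minimal sketch: a shock is applied to the raw features, and the PD is then re-scored by the model instead of being scaled directly. Everything here is an assumption for illustration. `toy_pd_model` is a hypothetical logistic stand-in, not the system's Random Forest, and its coefficients, the `net_profit_margin` input value, and the shock multipliers are invented. Only the feature names and the D/E value of 0.67 come from this overview.

```python
import math
from typing import Callable, Dict, Tuple

Features = Dict[str, float]


def toy_pd_model(features: Features) -> float:
    """Hypothetical stand-in for a trained PD model (NOT the system's Random Forest)."""
    # Made-up coefficients: more leverage raises PD, more margin lowers it.
    score = (
        -3.0
        + 1.2 * features["debt_to_equity_ratio"]
        - 4.0 * features["net_profit_margin"]
    )
    return 1.0 / (1.0 + math.exp(-score))


def apply_shocks(features: Features, multipliers: Dict[str, float]) -> Features:
    """Return a shocked copy of the raw features; the input dict is not modified."""
    shocked = dict(features)
    for name, factor in multipliers.items():
        shocked[name] = shocked[name] * factor
    return shocked


def stress_pd(
    features: Features,
    multipliers: Dict[str, float],
    model: Callable[[Features], float],
) -> Tuple[float, float]:
    """Score the baseline and the shocked features with the same model."""
    return model(features), model(apply_shocks(features, multipliers))


# D/E of 0.67 appears in this overview; the margin value is an assumed input.
baseline_features = {"debt_to_equity_ratio": 0.67, "net_profit_margin": 0.10}
# Hypothetical scenario: leverage up 50%, profit margin halved.
recession = {"debt_to_equity_ratio": 1.5, "net_profit_margin": 0.5}

baseline_pd, stressed_pd = stress_pd(baseline_features, recession, toy_pd_model)
print(f"baseline PD={baseline_pd:.4f}, stressed PD={stressed_pd:.4f}")
```

Because the shock goes through the model, any non-linear response to the shocked features shows up in the stressed PD, which a flat PD multiplier would not capture.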
3. Walkthrough / How to Use
To get started with the system, ensure Python 3.9+ is installed and set up a virtual environment.
The Comprehensive Analysis Notebook is the primary interface for visualizing and interacting with the "Probability Map". It synthesizes data from all components to provide risk insights. Below are conceptual snapshots:
Illustrative Portfolio Risk/Return Landscape
This conceptual plot positions different asset classes (corporate loans, synthetic equities, synthetic commodities) in a risk-return space. Bubble size can represent exposure or market capitalization.
Example Peer Comparison
The notebook allows for deep dives into specific entities, including comparison against synthetic peers on key metrics.
Key Portfolio Data Snippet
The portfolio overview generated by the RiskMapService provides a rich table of data. Here's a small sample of what the structure looks like (actual data will vary based on the synthetic dataset):
| loan_id | company_id | company_name | pd_estimate | lgd_estimate | expected_loss_usd | management_quality_score | kg_degree_centrality |
|---|---|---|---|---|---|---|---|
| LOAN7001 | COMP001 | Innovatech Solutions | 0.0452 | 0.3875 | 87567.50 | 8 | 0.052 |
| LOAN7003 | COMP002 | GreenBuild Corp | 0.0911 | 0.5210 | 47463.10 | 7 | 0.030 |
| LOAN7004 | COMP003 | HealthFirst Pharma | 0.1530 | 0.6050 | 92565.00 | 9 | 0.040 |
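To show how the pd_estimate, lgd_estimate, and expected_loss_usd columns relate, here is a minimal sketch using the conventional formula: expected loss = PD × LGD × exposure at default. This overview does not state how RiskMapService computes the figure, and the sample table has no exposure column, so the `exposure_usd` field and its value of 1,000,000 USD are assumptions. That value was chosen because it reproduces the expected_loss_usd shown for LOAN7003 and LOAN7004. The `LoanRisk` class and the `expected_loss_usd` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanRisk:
    loan_id: str
    pd_estimate: float   # probability of default, 0..1
    lgd_estimate: float  # loss given default, 0..1
    exposure_usd: float  # ASSUMED: not a column in the sample table


def expected_loss_usd(loan: LoanRisk) -> float:
    """Conventional expected loss: PD x LGD x exposure at default."""
    return loan.pd_estimate * loan.lgd_estimate * loan.exposure_usd


# PD and LGD are taken from the sample rows; the exposure is an assumption.
loans = [
    LoanRisk("LOAN7003", 0.0911, 0.5210, 1_000_000.0),
    LoanRisk("LOAN7004", 0.1530, 0.6050, 1_000_000.0),
]

for loan in loans:
    print(loan.loan_id, round(expected_loss_usd(loan), 2))
```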
Deep Dive Snapshot: Example Company Analysis
For a selected company (e.g., Innovatech Solutions - COMP001), a deep dive in the notebook might summarize key risk factors (actual values from a run):
Quantitative Scores: PD: 0.0452, LGD: 0.3875, Base EL: $87,567.50
Financial Health (latest financial statements): Debt-to-Equity (D/E): 0.67, Current Ratio: 2.20. Overall: Good
PD Model Drivers (SHAP Top 3): debt_to_equity_ratio, company_age_at_origination, net_profit_margin.
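A "top 3" driver list like the one above is typically obtained by ranking one prediction's per-feature SHAP values by absolute magnitude. The sketch below shows only that ranking step. The SHAP values are invented for illustration, the `current_ratio` and `loan_amount_usd` feature names are assumptions, and `top_drivers` is a hypothetical helper. In the real system the values would come from a SHAP explainer run against the PD model.

```python
from typing import Dict, List, Tuple


def top_drivers(shap_values: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Rank features by absolute SHAP contribution and keep the k largest."""
    ranked = sorted(shap_values.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Illustrative values only; not output from the actual PD model.
example_shap = {
    "debt_to_equity_ratio": 0.021,
    "company_age_at_origination": -0.015,
    "net_profit_margin": -0.012,
    "current_ratio": -0.004,     # assumed feature name
    "loan_amount_usd": 0.002,    # assumed feature name
}

for feature, contribution in top_drivers(example_shap):
    direction = "raises" if contribution > 0 else "lowers"
    print(f"{feature}: {contribution:+.3f} ({direction} predicted PD)")
```

Ranking by absolute value keeps features that push the PD down in the list alongside those that push it up; the sign then tells the reader the direction.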