PeoplesSouth Bank AI Agent

High Level Proposal

PeoplesSouth Bank AI Agent for Level One IT Support

This is a high level proposal for PeoplesSouth Bank to implement an Agentic AI system which would offload their limited staff by handling level-one IT Support calls for their 350+ employees that are spread across 34 bank locations.

The intention of this proposal is to guide a follow up discussion to ensure that all parties are in sync concerning what this system will include and how it will be deployed. Coming to an agreement on these parameters will then enable us to provide a quote for this system.

Based on the data compliance requirements, this proposal assumes that the system will be on-premise and not rely on any cloud-based components.

On-Premises Deployment • No Cloud Dependencies

Recommended Hardware Configuration

Based on Unmute.sh specifications for optimal performance with separate GPU allocation. TTS latency improves from ~750ms (single GPU) to ~450ms (multi-GPU setup).

Speech-to-Text Server

GPU Requirements:
  • • NVIDIA L40S or equivalent
  • • Dedicated GPU allocation
  • • CUDA capability required
  • • 48GB VRAM recommended
driver: nvidia, count: 1

Text-to-Speech Server

GPU Requirements:
  • • NVIDIA L40S or equivalent
  • • Dedicated GPU allocation
  • • Optimized for voice synthesis
  • • 48GB VRAM recommended
Latency: ~450ms (vs 750ms single GPU)

VLLM Server

GPU Requirements:
  • • NVIDIA L40S or equivalent
  • • Dedicated GPU allocation
  • • 48GB+ VRAM for enterprise models
  • • Multi-GPU setup for 70B+ models
Recommended Models:
  • • Mistral Small 3.2 24B (default)
  • • gpt-oss-20b (OpenAI, alternative)
  • • Granite 3.1 8B (IBM, small and fast)
Storage: Local filesystem + HuggingFace cache

Complete System Specification

Primary Server Configuration:
GPU CountMinimum 3x NVIDIA L40S
Total VRAM144GB (48GB × 3)
CPU64+ cores (Intel Xeon/AMD EPYC)
System RAM256GB DDR5
Storage4-8TB NVMe SSD (models/logs/backups)
Model StorageLocal + HuggingFace Hub cache
LLM Configuration & Storage:
Default ModelMistral Small 3.2 24B
Alternative ModelsGPT-OSS-120B, Granite 3.1 8B
Model StorageLocal filesystem cache
VLLM FrameworkOpenAI-compatible API
Banking ComplianceAir-gapped deployment ready
Configuration Notes:
  • • Dedicated GPU allocation per service
  • • Independent processing workflows
  • • Hardware redundancy options
  • • Modular system architecture
  • • Isolated GPU workloads
  • • Enterprise hardware components
  • • Standard rack-mount deployment
  • • Expandable configuration design

Telephony Integration Options

Connection methods for integrating with PS Bank's existing phone infrastructure.

SIP Integration (Recommended)

Protocol Features:
  • • Session Initiation Protocol (SIP)
  • • Digital VoIP integration
  • • Existing PBX compatibility
  • • Cisco/Avaya system support
Implementation:
  • • SIP trunk configuration
  • • RTP media stream handling
  • • Call routing integration

FXO Integration (Legacy)

Hardware Features:
  • • Foreign Exchange Office (FXO)
  • • Analog phone line interface
  • • PSTN/POTS compatibility
  • • Direct line connection
Implementation:
  • • FXO card installation
  • • Analog-to-digital conversion
  • • Line seizure detection

Integration Requirements

SIP Integration Setup:
ProtocolSIP 2.0 (RFC 3261)
MediaRTP/SRTP (RFC 3550)
CodecsG.711, G.722, Opus
NetworkUDP/TCP ports 5060/5061
SecurityTLS encryption support
FXO Integration Setup:
InterfaceAnalog FXO cards
Channels4-24 ports per card
SignalingLoop start/Ground start
Audio8kHz/16kHz sampling
Line DetectionOn-hook/Off-hook signaling
Recommendation:

SIP integration is recommended for modern banking environments with existing VoIP infrastructure.FXO integration should be considered for legacy analog phone systems or as a backup connection method.

System Architecture

End-to-end on-premises voice AI system with dedicated GPU allocation and secure data flow

📞 Phone System Layer

SIP/VoIP
Digital PBX Integration
FXO/Analog
Legacy PSTN Lines
Audio Stream

🖥️ On-Premises AI Processing Layer

STT Server
GPU 1
Unmute.sh
Speech Recognition
• L40S 48GB VRAM
• Real-time transcription
• Multi-language support
VLLM Server
GPU 2
LLM Engine
Intelligence Layer
• L40S 48GB VRAM
• GPT-OSS-120B / Mistral
• Context awareness
TTS Server
GPU 3
Unmute.sh
Voice Synthesis
• L40S 48GB VRAM
• ~450ms latency
• Natural voice output
Audio In
Processing
Audio Out
Secure Local Network

💾 Storage Infrastructure

Primary Storage
4-8TB NVMe SSD
AI models, logs & backups
System Configuration
Settings & parameters
Voice prompts

🔒 On-Premises Deployment

🛡️
Local Network
No cloud calls
🔐
Data Privacy
On-site processing
⚙️
Full Control
Bank-managed
✓ On-Premises Processing

All AI processing happens locally. No external API calls or cloud dependencies.

✓ Dedicated GPU Resources

Separate GPUs for STT, LLM, and TTS ensure optimal performance.

✓ Modular Design

Independent services allow for easy maintenance and updates.

✓ Scalable Infrastructure

Hardware can be expanded to handle increased call volumes.

Implementation Phases

1

Phase 1: Technical Proof of Concept

Local server deployment with isolated testing environment

Technology Stack:
  • • Unmute.sh local STT + TTS
  • • Open source LLM hosting
  • • Internal call simulation
Success Metrics:
  • • Optimal endpoint latency
  • • High accuracy in responses
  • • Excellent system reliability
  • • Security audit compliance
2

Phase 2: Limited Production Deployment

Integration with existing phone systems in select locations

Implementation:
  • • Real calls via internal phone network
  • • Comprehensive monitoring setup
  • • Staff training and feedback loops
  • • Performance optimization
ROI Demonstration:
  • • Staff hours saved per week
  • • Call volume handled automatically
  • • Customer satisfaction scores
  • • Error rate and escalations
3

Phase 3: Enterprise Deployment

Full system deployment across all branch locations

Enterprise Features:
  • • Multi-location redundancy
  • • Real-time monitoring dashboard
  • • Automated backup systems
  • • Load balancing and failover
Business Impact:
  • • 24/7 customer support availability
  • • Consistent service across all branches
  • • Scalable capacity management
  • • Continuous improvement analytics

Technology Evaluation

StrategySecurity ComplianceSpeed to LaunchVoice QualityCost Control
ElevenLabs (Cloud)
Off-Premise Data
FastestExcellentMedium
Unmute.sh (On-Prem)
Perfect
ModerateVery GoodHigh Control

Technical Assessment

On-premises deployment with Unmute.sh meets all security requirements while providing necessary voice processing capabilities for banking operations.