PeoplesSouth Bank AI Agent

High Level Proposal

PeoplesSouth Bank AI Agent for Level One IT Support

This is a high level proposal for PeoplesSouth Bank to implement an Agentic AI system which would offload their limited staff by handling level-one IT Support calls for their 350+ employees that are spread across 34 bank locations.

The intention of this proposal is to guide a follow up discussion to ensure that all parties are in sync concerning what this system will include and how it will be deployed. Coming to an agreement on these parameters will then enable us to provide a quote for this system.

Based on the data compliance requirements, this proposal assumes that the system will be on-premise and not rely on any cloud-based components.

On-Premises Deployment • No Cloud Dependencies

Recommended Hardware Configuration

Based on Unmute.sh specifications for optimal performance with separate GPU allocation. TTS latency improves from ~750ms (single GPU) to ~450ms (multi-GPU setup).

Speech-to-Text Server

GPU Requirements:

• NVIDIA L40S or equivalent
• Dedicated GPU allocation
• CUDA capability required
• 48GB VRAM recommended

driver: nvidia, count: 1

Text-to-Speech Server

GPU Requirements:

• NVIDIA L40S or equivalent
• Dedicated GPU allocation
• Optimized for voice synthesis
• 48GB VRAM recommended

Latency: ~450ms (vs 750ms single GPU)

VLLM Server

GPU Requirements:

• NVIDIA L40S or equivalent
• Dedicated GPU allocation
• 48GB+ VRAM for enterprise models
• Multi-GPU setup for 70B+ models

Recommended Models:

• Mistral Small 3.2 24B (default)
• gpt-oss-20b (OpenAI, alternative)
• Granite 3.1 8B (IBM, small and fast)

Storage: Local filesystem + HuggingFace cache

Complete System Specification

Primary Server Configuration:

GPU CountMinimum 3x NVIDIA L40S

Total VRAM144GB (48GB × 3)

CPU64+ cores (Intel Xeon/AMD EPYC)

System RAM256GB DDR5

Storage4-8TB NVMe SSD (models/logs/backups)

Model StorageLocal + HuggingFace Hub cache

LLM Configuration & Storage:

Default ModelMistral Small 3.2 24B

Alternative ModelsGPT-OSS-120B, Granite 3.1 8B

Model StorageLocal filesystem cache

VLLM FrameworkOpenAI-compatible API

Banking ComplianceAir-gapped deployment ready

Configuration Notes:

• Dedicated GPU allocation per service
• Independent processing workflows
• Hardware redundancy options
• Modular system architecture

• Isolated GPU workloads
• Enterprise hardware components
• Standard rack-mount deployment
• Expandable configuration design

Telephony Integration Options

Connection methods for integrating with PS Bank's existing phone infrastructure.

SIP Integration (Recommended)

Protocol Features:

• Session Initiation Protocol (SIP)
• Digital VoIP integration
• Existing PBX compatibility
• Cisco/Avaya system support

Implementation:

• SIP trunk configuration
• RTP media stream handling
• Call routing integration

FXO Integration (Legacy)

Hardware Features:

• Foreign Exchange Office (FXO)
• Analog phone line interface
• PSTN/POTS compatibility
• Direct line connection

Implementation:

• FXO card installation
• Analog-to-digital conversion
• Line seizure detection

Integration Requirements

SIP Integration Setup:

ProtocolSIP 2.0 (RFC 3261)

MediaRTP/SRTP (RFC 3550)

CodecsG.711, G.722, Opus

NetworkUDP/TCP ports 5060/5061

SecurityTLS encryption support

FXO Integration Setup:

InterfaceAnalog FXO cards

Channels4-24 ports per card

SignalingLoop start/Ground start

Audio8kHz/16kHz sampling

Line DetectionOn-hook/Off-hook signaling

Recommendation:

SIP integration is recommended for modern banking environments with existing VoIP infrastructure.FXO integration should be considered for legacy analog phone systems or as a backup connection method.

System Architecture

End-to-end on-premises voice AI system with dedicated GPU allocation and secure data flow

📞 Phone System Layer

SIP/VoIP

Digital PBX Integration

FXO/Analog

Legacy PSTN Lines

↓

Audio Stream

🖥️ On-Premises AI Processing Layer

STT Server

GPU 1

Unmute.sh

Speech Recognition

• L40S 48GB VRAM
• Real-time transcription
• Multi-language support

VLLM Server

GPU 2

LLM Engine

Intelligence Layer

• L40S 48GB VRAM
• GPT-OSS-120B / Mistral
• Context awareness

TTS Server

GPU 3

Unmute.sh

Voice Synthesis

• L40S 48GB VRAM
• ~450ms latency
• Natural voice output

Audio In

→

Processing

→

Audio Out

↓

Secure Local Network

💾 Storage Infrastructure

Primary Storage

4-8TB NVMe SSD
AI models, logs & backups

System Configuration

Settings & parameters
Voice prompts

🔒 On-Premises Deployment

🛡️

Local Network

No cloud calls

🔐

Data Privacy

On-site processing

⚙️

Full Control

Bank-managed

✓ On-Premises Processing

All AI processing happens locally. No external API calls or cloud dependencies.

✓ Dedicated GPU Resources

Separate GPUs for STT, LLM, and TTS ensure optimal performance.

✓ Modular Design

Independent services allow for easy maintenance and updates.

✓ Scalable Infrastructure

Hardware can be expanded to handle increased call volumes.

Implementation Phases

Phase 1: Technical Proof of Concept

Local server deployment with isolated testing environment

Technology Stack:

• Unmute.sh local STT + TTS
• Open source LLM hosting
• Internal call simulation

Success Metrics:

• Optimal endpoint latency
• High accuracy in responses
• Excellent system reliability
• Security audit compliance

Phase 2: Limited Production Deployment

Integration with existing phone systems in select locations

Implementation:

• Real calls via internal phone network
• Comprehensive monitoring setup
• Staff training and feedback loops
• Performance optimization

ROI Demonstration:

• Staff hours saved per week
• Call volume handled automatically
• Customer satisfaction scores
• Error rate and escalations

Phase 3: Enterprise Deployment

Full system deployment across all branch locations

Enterprise Features:

• Multi-location redundancy
• Real-time monitoring dashboard
• Automated backup systems
• Load balancing and failover

Business Impact:

• 24/7 customer support availability
• Consistent service across all branches
• Scalable capacity management
• Continuous improvement analytics

Technology Evaluation

Strategy	Security Compliance	Speed to Launch	Voice Quality	Cost Control
ElevenLabs (Cloud)	Off-Premise Data	Fastest	Excellent	Medium
Unmute.sh (On-Prem)	Perfect	Moderate	Very Good	High Control

Technical Assessment

On-premises deployment with Unmute.sh meets all security requirements while providing necessary voice processing capabilities for banking operations.