About UnifyCX
UnifyCX is pioneering AI-driven customer experience at scale, blending cutting-edge research with human-centric design. Join our AI Engineering team to advance next-generation conversational services by integrating and optimizing foundation models across text, speech, and multimodal channels.
About the Role
As a Sr Research Engineer – Audio & Language, you will lead prototyping, research, and productionization of advanced ML models: tuning state-of-the-art foundation models from leading providers and open-source projects, optimizing inference for low-latency CX delivery, and adapting models for dialect, speaker identification, prosody, and noise robustness. You will work closely with product, UX, infrastructure, and MLOps teams to embed these capabilities into our multiplexed multimodal stack. Your average day might include:
- Multimodal CX Integration
Integrate tuned audio/text models into a multiplexed stack with vision, RAG, and dialogue-management components. Build data orchestration pipelines that unify structured, unstructured, and other data modalities. Define industry-standard evaluation metrics for speech quality, intelligibility, and user satisfaction.
- Audio Model Adaptation & Tuning
Partner with leading model providers to fine-tune and adapt foundation models for TTS, ASR, and STS. Develop post-training techniques for speaker adaptation, prosody control, and noise robustness.
- Inference Optimization
Continuously profile and benchmark cloud and edge deployments, and build custom evaluation frameworks. Architect optimized inference pipelines (quantization, pruning, caching) to meet low-latency, high-throughput SLAs.
- Research & Thought Leadership
Publish breakthroughs at top conferences and evangelize best practices internally and across industry and academia.
Required Qualifications
- Master's degree in Computer Science, Machine Learning, EE, Computational Linguistics, or a related discipline, with a proven research portfolio.
- Deep theoretical foundations in ML (optimization, probabilistic modeling, representation learning).
- Hands-on expertise fine-tuning and adapting neural generative models for audio (Transformers, diffusion, GANs) and/or text (LLMs).
- Proficiency in Python and deep-learning frameworks (PyTorch, TensorFlow).
Preferred Qualifications
- Experience productionizing ML models in customer-facing applications, with end-to-end MLOps (CI/CD, monitoring, rollback).
- Familiarity with retrieval-augmented generation (RAG), vector search engines, and agentic orchestration frameworks.
- Experience in optimizing inference at scale (quantization, distillation, hardware acceleration).
For pay transparency purposes, the base salary for this full-time position is $180,000 – $280,000. We believe in fair compensation commensurate with candidate experience and location.