Senior AI Engineer

AI / ML | Senior | Sofia, Bulgaria / Remote | Full-time

Design and ship production agentic AI, RAG, and LLM systems for clients across BPO, tech media, healthtech, legal, e-commerce, talent, and B2B SaaS. Build agents that plan and execute, retrieval pipelines over 250K+ documents, and LLM workloads serving millions of users annually. Core areas: agentic orchestration, Claude/GPT/Mistral, vision models, vector databases, and cloud. We deliver working production engineering at scale, not PoCs.

Tech Stack

Python, FastAPI, Claude, GPT-4o/5, Bedrock, Pinecone, Qdrant, pgvector, OpenSearch, AWS, Azure, Terraform, Kubernetes, vLLM

About the Role

We're looking for a Senior AI Engineer to join our AI/ML practice and lead the design and delivery of production-grade agentic, RAG, and LLM systems for our clients.

You'll build the kind of systems described on our capabilities page: autonomous agents that plan, execute, and iterate; RAG pipelines indexing hundreds of thousands of documents with hybrid search and re-ranking; document intelligence with OCR and vision models; conversational and voice AI grounded in proprietary data. We currently have 20+ AI projects in production, with the largest serving millions of users annually and 250K+ documents in active RAG pipelines.

You'll collaborate directly with client tech leads as part of our embedded engineering teams covering backend, data, and DevSecOps. These are long-term engagements with real ownership, not throwaway PoCs.

What You'll Do

Design and build agentic AI workflows (plan → execute → evaluate → human checkpoint) using frameworks like LangChain, LlamaIndex, LangGraph, or custom orchestration

Build RAG pipelines end-to-end: ingestion, semantic chunking, embedding strategies, hybrid retrieval, re-ranking, and citation tracking

Integrate frontier and open LLMs (Claude Opus/Sonnet/Haiku, GPT-4o/5, Gemini, Llama, Mistral, Qwen, DeepSeek) with proper guardrails, grounding, and cost optimization

Design and ship production tool-calling and multi-agent systems with reliable evaluation, tracing, and rollback

Operate vector and hybrid search infrastructure (Pinecone, Qdrant, Milvus, pgvector, OpenSearch) at scale

Build document intelligence pipelines using OCR and vision models for extraction, classification, and structuring

Implement evaluation harnesses, offline and online metrics, and golden datasets for LLM-driven systems

Deploy and operate AI workloads on AWS, Azure, or OVH (Lambda, EKS, SageMaker, Bedrock, Azure AI Services) with IaC and CI/CD

Collaborate with client product, data, and engineering leads to scope, architect, and deliver AI features end-to-end

What We're Looking For

Strong proficiency in Python, including async, typing, and modern API frameworks (FastAPI)

5+ years of professional software engineering experience, including at least 2 years building production LLM applications (RAG, agents, tool use, structured output)

Hands-on experience with LangChain, LlamaIndex, Haystack, or equivalent orchestration frameworks

Deep understanding of embeddings, vector search, hybrid retrieval, and re-ranking strategies

Working knowledge of at least one vector database (Pinecone, Qdrant, Milvus, pgvector, OpenSearch)

Experience integrating multiple LLM providers (Anthropic, OpenAI, Bedrock, open-source via vLLM/Ollama)

Solid grasp of LLM evaluation: regression suites, eval harnesses, hallucination and grounding metrics

Experience deploying to AWS or Azure with IaC (Terraform), containers (Docker/Kubernetes), and CI/CD

Strong collaboration and written communication skills — you'll work directly with client teams

Nice to Have

Experience building multi-agent systems (planner/executor/critic patterns) or with agent frameworks (LangGraph, AutoGen, CrewAI, OpenAI Agents SDK, Claude Agent SDK)

LLM fine-tuning with LoRA/QLoRA/PEFT, or training custom embedding models for domain adaptation

Self-hosted LLM and GPU deployment experience (vLLM, TGI, SGLang, Triton on Kubernetes, RunPod, Modal, or bare-metal GPU)

Background in classical ML: training, fine-tuning, or distillation with PyTorch, TensorFlow, or scikit-learn

Experience with voice AI (Whisper, ElevenLabs, real-time STT/TTS pipelines)

Document AI experience (Textract, Azure Document Intelligence, vision LLMs)

Knowledge of LLM observability tooling (LangSmith, Langfuse, Arize, Helicone)

Open-source contributions to AI/ML tooling, models, or infrastructure

Awareness of AI safety, prompt injection defence, data privacy, and compliance (GDPR, SOC2)

What We Offer

Long-term, meaningful engagements building real AI products in production, not PoCs

Embedded inside multi-disciplinary teams covering backend, data, DevOps, and AI

Cutting-edge stack: Claude (Opus/Sonnet/Haiku/Vision), GPT-4o/5, Bedrock, LangChain, LlamaIndex, Pinecone, Snowflake, AWS, Azure

Variety across domains: BPO, tech media, pharma, legal, e-commerce, talent, travel, fintech — agents, RAG, voice, vision

Remote-friendly with flexible hours

Backed by Eastvantage Group — global delivery, local culture

Part of an official Anthropic partner company, with certification and upskilling available

Engineering culture that values ownership, transparency, and craft
