Sungjun Son — AI Engineer Portfolio

01.

About

I'm an AI engineer who designs and operates AI agent platforms. From vLLM/llama.cpp multi-GPU (CUDA·ROCm) LLM serving, LangChain/LangGraph Iterative RAG, and Neo4j knowledge graphs to MCP-based AI agents, I build the entire AI service stack across inference, search, automation, and infrastructure.

Today I lead an 8-person team at Plateer building the enterprise AI platform XGEN, and as Tech & Consulting Part Leader I design AI adoption for clients directly. I cut token costs by ~60% on a regional bank's GenAI platform, co-ran Intel Gaudi2/3 LLM inference PoCs, and advised 30+ enterprise clients on AI.

It all started with commerce search. I ran OpenSearch keyword search at 5,000+ TPS in production and rewrote the search API in Rust for 28ms average responses, then moved through hybrid search and LLM query expansion (AI Search) to today's LLM and agent work. Lately I focus on the MCP ecosystem, open-sourcing LLM-agent infrastructure such as graph-tool-call and gwanjong-mcp.

Languages Rust Python TypeScript Go

Frameworks NestJS Next.js FastAPI Axum Tauri React

AI / ML vLLM llama.cpp Qdrant OpenSearch HuggingFace LangChain LangGraph MCP Neo4j

Infra Kubernetes Docker ArgoCD Jenkins Redis

02.

Expertise

Search Engine

92 posts

OpenSearch k-NN / Hybrid Search
Qdrant Vector DB
Rust Axum Search API
NestJS Hybrid Search
RAG / Semantic Search

AI / ML

79 posts

vLLM / llama.cpp GPU Serving
MCP-based AI Agent Design
Graph-based Tool Retrieval Engine
LangChain / LangGraph RAG
XGEN AI Agent Platform

Full Stack

63 posts

Next.js / React UI
Rust API Gateway
Tauri Desktop App
Python Async Services
WebSocket / SSE Realtime

DevOps

46 posts

K8s / K3s Cluster Operations
ArgoCD GitOps Deployment
Jenkins CI/CD Pipeline
Docker Multi-stage Build
Istio / Let's Encrypt

03.

Experience

Part Leader · AI Lab → Tech & Consulting

Plateer · EC Solution Lab, AI Lab

Designed & built the XGEN AI agent platform — vLLM/llama.cpp multi-GPU (CUDA·ROCm) LLM serving, LangChain/LangGraph Iterative RAG, Neo4j knowledge graph, MCP-based AI agents & workflow engine, and a new HWPX/DOCX/PPTX Document Adapter
Core engineer on a regional bank's GenAI platform — full-stack AI chatbot (SSE streaming, multi-step agents) from requirements to delivery, ~60% token cost reduction, plus CI/CD infrastructure
Co-ran Intel Gaudi2/3 LLM inference PoCs (DeepSeek, Llama, and QwQ quantization and performance) and led XGEN demos, PoCs, and consulting for 30+ enterprise clients
Operated 7 microservices on k3s/Istio with ArgoCD GitOps and a dev/stg/prd multi-environment design, and built the test environment for XGEN's GS certification (TTA)
Led an 8-person AI Lab team — task allocation, code review, mentoring, and engineer hiring interviews

Platform Division Manager · Full Stack

Seoul IR Network

Revamped & operated the IRUP content clipping/analytics dashboard — 12 clients, real-time news/YouTube/report ingestion, auto dedup grouping, email/KakaoTalk alerts
Solo full-stack build of an internal e-approval system (Next.js/Node.js/Firebase): approval flow, permissions, and real-time sync, taken from planning to deployment in 3 months

Engineering · Data Collection System

HJ Brain

Built & ran a 10,000+ docs/day web crawling system — 20+ news-source crawlers; retry/exception handling raised success rate 85% → 98%
HTML parsing, text normalization, similarity-based dedup, cron batch pipeline; async processing made collection 3x faster

04.

Projects

Featured Project 40+ related posts

XGEN 2.0 — AI Agent Platform

Search AI/ML Full Stack DevOps

An enterprise AI agent platform built from 7 microservices (Model Serving, API Gateway, Core, Workflow, Retrieval, Documents, Frontend). A 4-tier Backend Adapter pattern auto-detects NVIDIA CUDA / AMD ROCm / Vulkan GPUs and dynamically switches between vLLM and llama.cpp backends, serving up to 20 models concurrently on a single server. An Iterative RAG pipeline (query expansion → large top-100 retrieval → iterative LLM filtering → compression) improved search accuracy over a simple top-k baseline, and hybrid search (dense + BM25 sparse) merges both result sets with Qdrant Prefetch + RRF (Reciprocal Rank Fusion).

15x higher LLM inference throughput vs. Hugging Face Transformers (12.5 → 185.3 tokens/sec, vLLM PagedAttention + Continuous Batching)
3x faster container startup (45s → 15s), 20% less memory — after removing Ray Serve and moving to a single FastAPI process
3.75x faster embedding (45s → 12s for a 10MB PDF) — Switch-Backend dual mode + batch size 512 → 2048
ArgoCD GitOps pipeline cut deploy time 15min → 3min, 30s rollback, 90% fewer deploy errors, 99.9% availability
Enterprise RBAC (5-level role hierarchy) + full API I/O audit logging + MCP tool-level permission control

Python Rust TypeScript K8s / K3s vLLM llama.cpp Qdrant FastAPI Next.js ArgoCD
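Reciprocal Rank Fusion merges the dense and BM25 result lists by rank rather than by raw score. In XGEN this happens server-side through Qdrant's Prefetch + RRF query; the Python sketch below only reproduces the fusion arithmetic client-side to illustrate the idea. The constant k=60 (a value commonly used with RRF) and the document IDs are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion (RRF).

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. A larger k flattens the gap between top ranks.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top hits from the two retrievers.
dense_hits = ["doc_a", "doc_b", "doc_c"]   # embedding similarity order
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # BM25 order

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print([doc_id for doc_id, _ in fused])  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because only ranks matter, cosine similarities and BM25 scores never have to be normalized onto a common scale, which is the usual reason RRF is chosen for hybrid search.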

AI/ML 4 posts

graph-tool-call — Graph Tool Retrieval Engine

A graph-based retrieval engine that lets an LLM precisely find the tool it needs among 1,000+ API tools. It parses OpenAPI specs to build a 3-tier weighted graph (Tag → Operation → Parameter), and achieves higher accuracy than vector or BM25 retrieval via BFS propagation + IDF weighting. An MCP Proxy mode provides a gateway that collapses many MCP servers into just 2 meta-tools.

On a 1,068-tool benchmark, 2x recall and 40% higher accuracy vs. vector search
MCP Proxy gateway mode — N MCP servers collapsed into 2 meta-tools (1-hop direct calling)
Workflow chain engine — auto-composes multi-step tool calls into a DAG

Python MCP OpenAPI Graph BFS PyPI
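The BFS propagation + IDF weighting idea can be sketched on a toy Tag → Operation → Parameter graph. This is a hypothetical illustration, not graph-tool-call's actual API or scoring: the graph, the per-node term sets, the smoothed IDF formula, the decay factor, and the hop limit are all assumptions. Query terms that match a node seed it with their IDF weight, and BFS spreads that score to neighbors with a per-hop decay, so a match on a parameter still lifts the operation that owns it.

```python
import math
from collections import defaultdict, deque


def build_adjacency(edges):
    """Turn directed Tag -> Operation -> Parameter edges into undirected links."""
    adjacency = defaultdict(set)
    for src, dsts in edges.items():
        for dst in dsts:
            adjacency[src].add(dst)
            adjacency[dst].add(src)
    return adjacency


def idf_weights(node_terms):
    """Smoothed IDF over nodes: terms that few nodes mention weigh more."""
    n_nodes = len(node_terms)
    doc_freq = defaultdict(int)
    for terms in node_terms.values():
        for term in terms:
            doc_freq[term] += 1
    return {t: math.log((n_nodes + 1) / (df + 1)) + 1.0 for t, df in doc_freq.items()}


def rank_operations(query, edges, node_terms, decay=0.5, max_hops=2):
    """Seed matching nodes with IDF weight, spread it by BFS, rank operations."""
    adjacency = build_adjacency(edges)
    idf = idf_weights(node_terms)
    tokens = set(query.lower().split())
    scores = defaultdict(float)
    for node, terms in node_terms.items():
        seed = sum(idf[t] for t in tokens & terms)
        if seed == 0:
            continue
        # Breadth-first walk from this seed; each hop multiplies the score by decay.
        seen = {node}
        queue = deque([(node, 0)])
        while queue:
            current, hops = queue.popleft()
            scores[current] += seed * decay**hops
            if hops == max_hops:
                continue
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
    operations = [(n, s) for n, s in scores.items() if n.startswith("op:")]
    return sorted(operations, key=lambda item: (-item[1], item[0]))


# Hypothetical toy graph, roughly what an OpenAPI spec could yield.
EDGES = {
    "tag:orders": ["op:createOrder", "op:listOrders"],
    "tag:users": ["op:getUser"],
    "op:createOrder": ["param:productId", "param:quantity"],
    "op:listOrders": ["param:userId"],
    "op:getUser": ["param:userId"],
}
NODE_TERMS = {
    "tag:orders": {"order"},
    "tag:users": {"user"},
    "op:createOrder": {"create", "order"},
    "op:listOrders": {"list", "order"},
    "op:getUser": {"get", "user"},
    "param:productId": {"product"},
    "param:quantity": {"quantity"},
    "param:userId": {"user"},
}

ranked = rank_operations("create order for product", EDGES, NODE_TERMS)
print([name for name, _ in ranked])  # ['op:createOrder', 'op:listOrders', 'op:getUser']
```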

AI/ML Full Stack 2 posts

gwanjong-mcp — AI Social Agent

An AI social agent that automates 9 social platforms (Dev.to, Bluesky, Twitter, Reddit, Mastodon, HN, Stack Overflow, GitHub Discussions, Discourse) through an MCP pipeline. Platforms are abstracted with the devhub-social adapter pattern, and the mcp-pipeline stores/requires chain composes a 3-stage Scout → Draft → Strike pipeline.

Scaled 4 → 9 platforms — adapter pattern minimizes per-platform code
stores/requires chain auto-resolves dependencies across multi-step pipelines
Campaign GTM + anti-spam system — rate limiter, content validation, per-platform policy compliance

Python MCP TypeScript 9 Platforms Pipeline
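One way a stores/requires chain can resolve dependencies is to map each required key to the stage that stores it and then topologically sort the stages. The sketch below uses Python's standard `graphlib` module (3.9+); the stage dictionary shape and the key names (`targets`, `drafts`, `post_ids`) are hypothetical, not mcp-pipeline's real interface.

```python
from graphlib import TopologicalSorter


def resolve_order(stages):
    """Order stages so each runs after the stages that store what it requires.

    `stages` maps a stage name to {"stores": [...], "requires": [...]}.
    """
    producer = {}
    for name, spec in stages.items():
        for key in spec.get("stores", []):
            if key in producer:
                raise ValueError(f"{key!r} is stored by both {producer[key]!r} and {name!r}")
            producer[key] = name

    graph = {}
    for name, spec in stages.items():
        missing = [k for k in spec.get("requires", []) if k not in producer]
        if missing:
            raise KeyError(f"stage {name!r} requires {missing}, which no stage stores")
        graph[name] = {producer[k] for k in spec.get("requires", [])}

    # static_order() raises graphlib.CycleError if requirements are circular.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical stage declarations, deliberately listed out of order.
PIPELINE = {
    "strike": {"requires": ["drafts"], "stores": ["post_ids"]},
    "draft": {"requires": ["targets"], "stores": ["drafts"]},
    "scout": {"stores": ["targets"]},
}
print(resolve_order(PIPELINE))  # ['scout', 'draft', 'strike']
```

Declaring data dependencies instead of an explicit order means a new stage can be added without editing the others; a missing producer is caught before anything runs.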

AI/ML PyPI

Synaptic Memory — Brain-inspired Knowledge Graph

A brain-inspired knowledge graph library + MCP server for LLM agents. With Spreading Activation (associative retrieval), Hebbian Learning (experiential learning), and 4-stage Memory Consolidation (automatic L0 to L3 promotion/eviction), agents automatically structure and retrieve past experience. It reached MRR 0.793 on finance/medical/legal data with full-text search (FTS) alone, and nDCG 0.636 on HotpotQA.

16 MCP tools, plus automatic ontology construction (rules + LLM + embeddings)
5-axis ranking (relevance × importance × recency × vitality × context)
Zero-dep core — swappable SQLite/PostgreSQL/Qdrant/Neo4j backends

Python MCP Knowledge Graph Hebbian PyPI
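Spreading activation and Hebbian learning are classic techniques; a minimal version of each is sketched below. This is a textbook-style illustration, not Synaptic Memory's implementation: the weighted adjacency dict, the decay, the firing threshold, the step limit, and the bounded learning rule are assumptions chosen for clarity.

```python
from collections import defaultdict


def spread_activation(weights, seeds, decay=0.5, threshold=0.05, max_steps=3):
    """Spread activation from seed nodes along weighted associative links.

    Energy passed over a link is sender_energy * link_weight * decay; transfers
    below `threshold` are dropped, which keeps the spread local.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        incoming = defaultdict(float)
        for node, energy in frontier.items():
            for neighbor, weight in weights.get(node, {}).items():
                passed = energy * weight * decay
                if passed >= threshold:
                    incoming[neighbor] += passed
        if not incoming:
            break
        for node, energy in incoming.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = incoming
    return activation


def hebbian_update(weights, co_active, rate=0.1, cap=1.0):
    """Strengthen links between nodes recalled together ("fire together, wire together").

    Each update closes a fraction `rate` of the gap to `cap`, so weights stay bounded.
    """
    for a in co_active:
        for b in co_active:
            if a != b:
                current = weights.setdefault(a, {}).get(b, 0.0)
                weights[a][b] = current + rate * (cap - current)


# Hypothetical memory graph: directed association strengths in [0, 1].
WEIGHTS = {
    "bug": {"stacktrace": 0.8, "deploy": 0.4},
    "stacktrace": {"fix": 0.9},
}
recalled = spread_activation(WEIGHTS, {"bug": 1.0})
print(sorted(recalled, key=recalled.get, reverse=True))  # ['bug', 'stacktrace', 'deploy', 'fix']
```

Together the two form a feedback loop: memories recalled together get stronger links, so the next spread reaches them more easily.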

Search 12 posts

Rust Commerce Search Engine

A commerce search API server rewritten in Rust/Axum to overcome the performance limits of a NestJS search engine. It implements concurrent multi-index OpenSearch search, Redis caching, and unified search across multiple data sources (products/brands/categories). Achieved 1/5 the memory, 30% faster response, and 2x indexing throughput vs. NestJS.

28ms average response, 2,100 req/s — Tokio async runtime + Tower middleware
12MB idle memory (vs. 60MB on NestJS, 1/5) — leveraging zero-cost abstractions
Jenkins → Docker → K8s automated deployment pipeline

Rust Axum Tokio OpenSearch Redis Docker
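The concurrent multi-index search with caching follows a fan-out plus cache-aside pattern. The production server does this in Rust with Tokio; to keep the examples in one language, the sketch below shows the same pattern with Python's `asyncio.gather`. `search_index` is a hypothetical stub standing in for an OpenSearch query, and an in-memory TTL dict stands in for Redis.

```python
import asyncio
import time


class TTLCache:
    """In-memory stand-in for Redis: entries expire `ttl` seconds after being set."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (value, time.monotonic() + self.ttl)


async def search_index(index, query):
    """Hypothetical stub for one OpenSearch index query."""
    await asyncio.sleep(0.05)  # simulated network round trip
    return [f"{index}:{query}"]


async def unified_search(query, cache, indices=("products", "brands", "categories")):
    key = f"search:{query}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: no index is queried
    # Fan out to every index at once; latency tracks the slowest index, not the sum.
    results = await asyncio.gather(*(search_index(i, query) for i in indices))
    merged = dict(zip(indices, results))
    cache.set(key, merged)
    return merged


cache = TTLCache(ttl=60)
start = time.perf_counter()
first = asyncio.run(unified_search("sneakers", cache))
elapsed = time.perf_counter() - start  # roughly one 0.05s sleep, not three
second = asyncio.run(unified_search("sneakers", cache))  # served from the cache
```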

AI/ML 15 posts

AI Agent Browser Automation

An LLM-based browser automation agent with a 4-layer architecture (Orchestrator → Planner → Navigator → Extractor). It dynamically registers tools via MCP, and combines Playwright-based DOM parsing with CSS-selector confidence scoring to build automation that is robust to web structure changes. Built from prototype to production in 49 commits over 4 days.

Human-in-the-Loop raised task completion from 30% → 95%
5.5x fewer MCP tool calls — DOM context pre-injected at the planning stage
No-code automation: scenario recorder → JSON playbook → repeatable execution

TypeScript Python Playwright MCP LLM Next.js
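CSS-selector confidence scoring can be illustrated with a simple heuristic that prefers stable attributes over positional paths. The weights and rules below are hypothetical assumptions, not the agent's actual scoring: test IDs and element IDs score high, `:nth-child`-style positions and deep selector chains score low, and the agent would try the highest-scoring candidate first.

```python
import re


def selector_confidence(selector):
    """Heuristic score in [0, 1]: how likely a selector survives layout changes."""
    score = 0.5
    if re.search(r"\[data-test(id)?=", selector):
        score += 0.4  # explicit test hooks are the most stable
    elif "#" in selector:
        score += 0.3  # element IDs rarely change
    elif "[aria-label=" in selector:
        score += 0.2  # accessible names are fairly stable
    if ":nth-child" in selector or ":nth-of-type" in selector:
        score -= 0.3  # positional selectors break when siblings move
    # Count combinator steps, ignoring spaces inside [attr="..."] values.
    bare = re.sub(r"\[[^\]]*\]", "[]", selector).strip()
    depth = len(re.split(r"\s*>\s*|\s+", bare)) - 1
    score -= 0.05 * max(0, depth - 2)  # long chains are brittle
    return min(1.0, max(0.0, score))


def pick_selector(candidates):
    """Try the most confident candidate first."""
    return max(candidates, key=selector_confidence)


candidates = [
    "div > div:nth-child(3) > button",
    "#add-to-cart",
    '[data-testid="add-to-cart"]',
]
print(pick_selector(candidates))  # [data-testid="add-to-cart"]
```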

Search 10 posts

NestJS Hybrid Search Engine

A commerce hybrid search engine developed over 14 months and 318 commits. It combines OpenSearch keyword search with Qdrant 384-dimensional vector semantic search via RRF, and improved search accuracy by 40% through LLM-based query expansion (synonyms/intent analysis) and a reranking pipeline. A Nori morphological analyzer detects Korean verbs so that unnecessary GPT calls are skipped, cutting response time from 2-3s to 300ms.

40% higher search accuracy from semantic search (resolving keyword mismatch)
Nori verb detection optimizes GPT calls: 2-3s → 300ms response
Multi-tenant index design — multiple mall search services on a single cluster

NestJS OpenSearch Qdrant Nori Python FastEmbed
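The Nori verb check can be sketched as a gate in front of the LLM call. The sketch assumes the rule is that only queries containing a predicate (verb or adjective) are natural-language requests worth expanding, while bare noun queries skip the LLM. It takes hand-written `(token, POS tag)` pairs using mecab-ko-dic tag names as reported by Nori (`VV` verb, `VA` adjective, `VX` auxiliary); `llm_expand` is a hypothetical callback, not a real API.

```python
# Assumed predicate tags from the mecab-ko-dic tag set that Nori reports:
# VV = verb, VA = adjective, VX = auxiliary predicate.
PREDICATE_TAGS = {"VV", "VA", "VX"}


def needs_llm_expansion(tagged_tokens):
    """True when any morpheme is a predicate, i.e. the query reads as a sentence.

    Compound tags joined with "+" (an assumed format, e.g. "VV+EC") are split.
    """
    return any(
        part in PREDICATE_TAGS
        for _, tag in tagged_tokens
        for part in tag.split("+")
    )


def expand_query(query, tagged_tokens, llm_expand):
    """Return the query plus LLM rewrites, calling the LLM only when needed."""
    if not needs_llm_expansion(tagged_tokens):
        return [query]  # bare keyword query: skip the slow LLM round trip
    return [query, *llm_expand(query)]


calls = []


def fake_llm_expand(query):
    """Hypothetical LLM callback; records each call so skipped calls are visible."""
    calls.append(query)
    return [query + " 추천"]


keyword = expand_query("운동화", [("운동화", "NNG")], fake_llm_expand)
sentence = expand_query(
    "가볍게 신을 운동화",
    [("가볍", "VA"), ("게", "EC"), ("신", "VV"), ("을", "ETM"), ("운동화", "NNG")],
    fake_llm_expand,
)
```

The morphological check runs locally in milliseconds, so every keyword-only query avoids the multi-second LLM round trip entirely.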

Full Stack 10 posts

Tauri 2.0 AI Desktop App

A Tauri 2.0 cross-platform AI desktop app with 1/10 the binary size and 1/3 the memory of Electron. A Remote WebView architecture renders a remote server UI directly in the local app without a frontend build, and it implements mistral.rs-based local LLM inference, NAT traversal via a Bore tunnel, and automatic switching between 3 operating modes (local/remote/hybrid).

Rust Sidecar pattern — Python services auto start/stop with the app
Remote WebView removes the frontend build — shorter deploy time
Custom-built mistral.rs local LLM inference + Bore tunnel NAT traversal

Tauri 2.0 Rust React TypeScript mistral.rs

Search AI/ML Case Study

i-Scream Mall AI Search

A case study of building and operating an AI search system in production for an education-focused shopping mall (i-Scream Mall). Semantic search + LLM query expansion were applied on a NestJS search engine to improve product search accuracy. It reliably handles 5,000+ TPS peak traffic, and a later Rust rewrite further cut operating costs.

Stable handling of 5,000+ TPS peak traffic — zero-downtime production operation
Semantic search resolves keyword mismatch — improved search conversion
NestJS → Rust rewrite cut memory to 1/5 and improved response 30%

NestJS Rust OpenSearch Nori LLM

05.

Open Source

Python

graph-tool-call

An LLM-agent tool engine that searches 1,000+ API tools on a graph. Supports an MCP Proxy gateway.

pip install graph-tool-call

Python

synaptic-memory

A brain-inspired knowledge graph — Spreading Activation, Hebbian Learning, Memory Consolidation. MCP server with 16 tools.

pip install synaptic-memory

Python

devhub-social

A unified async client for developer communities: Dev.to, Bluesky, Twitter/X, Reddit, and more, 9 platforms in total.

pip install devhub-social

Python

ku-portal-mcp

An MCP server for Korea University's KUPID portal + Canvas LMS — notices, timetable, library, assignments, grades.

pip install ku-portal-mcp

06.

Tech Stack

Languages & Frameworks

Rust Python TypeScript Go Axum NestJS Next.js FastAPI Tauri React

AI / ML

vLLM llama.cpp Qdrant OpenSearch k-NN HuggingFace LangChain LangGraph MCP Neo4j PyTorch FAISS FastEmbed

Infrastructure & CI/CD

Kubernetes Docker K3s Redis Istio Jenkins ArgoCD Caddy GitHub Actions Helm MLflow AWS GitLab CI

07.

Timeline

2026

2025

Developed a Tauri 2.0 cross-platform desktop app
Built a Rust commerce search engine (28ms, 2,100 req/s)
Built an AI agent browser automation system
Operated Moon / i-Scream Mall commerce search services
Built XGEN 1.0 infrastructure and GPU model serving

2024

Built a NestJS hybrid search engine (318 commits, 14 months)
Designed and developed a semantic search API
Built Qdrant vector-DB-based semantic search
Developed the Aurora commerce search API (OpenSearch multi-index)
Built a demand forecasting API and a persona recommendation system

08.

Education

Korea University, Grad School

M.S. in Artificial Intelligence · SW·AI Convergence

Hanyang University

B.S. in Urban Engineering · ML/DL trade-area analysis thesis (team lead)

Certifications Information Processing Industrial Engineer SQLD Network Administrator Lv.2

Languages Korean (native) English (professional)

09.

By the Numbers

284+ Tech Blog Posts

60% Token Cost Cut (Bank GenAI)

11 Open Source Projects

1,068 Tool Benchmark (graph-tool-call)

8 Team Members Led (AI Lab)

28ms Rust Search Engine Response

30+ Enterprise Clients Advised

15x LLM Inference Throughput Gain