Automated multi-label behavioral classification from clinical interview transcripts using generative AI, compared with a classical embedding + XGBoost approach and evaluated on a dataset annotated with 26 behavioral labels.
Behavioral scientists manually annotated clinical interview transcripts with 26 behavioral labels across the Risk, Effort, and Social Influence categories - a costly and time-intensive workflow. The objective was to automate multi-label behavioral classification of transcript passages while handling overlapping behavioral definitions. Transcript data was stored and processed using AWS S3.
The goal was to build a domain-adapted LLM classification pipeline that predicts behavioral labels for an interview transcript and explains its reasoning. Multiple generative AI solutions, using both local and cloud LLMs, were evaluated for multi-label behavioral prediction with structured prompt engineering and behavioral-definition alignment.
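As a rough illustration of structured prompting with behavioral-definition alignment, the sketch below places label definitions directly in the prompt and validates the model's JSON reply against the known label set. The label names, definitions, prompt wording, and the functions `build_prompt` and `parse_response` are all hypothetical; the project's actual prompts and labels are not shown here, and the call to the LLM itself is left out.

```python
import json

# Hypothetical subset of the label set; names and definitions are illustrative only.
LABEL_DEFINITIONS = {
    "risk_aversion": "Speaker avoids an option because of perceived danger or uncertainty.",
    "effort_cost": "Speaker describes an action as too demanding in time or energy.",
    "peer_influence": "Speaker attributes a choice to what friends or family do or expect.",
}


def build_prompt(passage, definitions):
    """Embed every label definition in the prompt so predictions stay tied to them."""
    definition_lines = [f"- {name}: {text}" for name, text in definitions.items()]
    return (
        "You are annotating a clinical interview passage.\n"
        "Assign every label whose definition applies; labels may overlap.\n\n"
        "Label definitions:\n" + "\n".join(definition_lines) + "\n\n"
        f"Passage:\n{passage}\n\n"
        'Respond with JSON only: {"labels": [...], "reasoning": "..."}'
    )


def parse_response(raw, definitions):
    """Parse the model reply, keeping only known labels (deduplicated, in order)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return [], ""
    if not isinstance(data, dict):
        return [], ""
    labels = data.get("labels", [])
    if not isinstance(labels, list):
        labels = []
    valid = [l for l in labels if isinstance(l, str) and l in definitions]
    return list(dict.fromkeys(valid)), str(data.get("reasoning", ""))
```

Filtering the reply against the definition table means a label the model invents is dropped rather than passed downstream, which matters when definitions overlap and the model is tempted to paraphrase them.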
Models Evaluated
A classical NLP pipeline using semantic embeddings and XGBoost-based multi-label classification.
LLM Pipeline
Classical ML
Infrastructure
Evaluation
Need intelligent classification pipelines for domain-specific or behavioral data? Whether prompt engineering, fine-tuning, or classical ML - I can design the right solution for your use case.
✉ Get in Touch