My doctoral research at North Carolina State University lies at the intersection of Offline Reinforcement Learning (RL) and Apprenticeship / Imitation Learning (IL) for human-centric domains such as intelligent tutoring systems and healthcare. My work seeks to develop algorithms that can learn expert decision-making policies entirely from static, pre-collected data—without requiring online exploration. This is particularly crucial in sensitive settings like education or medicine, where deploying a flawed policy can be costly or unethical. To this end, I design frameworks that balance theoretical rigor with real-world interpretability, combining ideas from energy-based modeling, contrastive learning, and Bellman-consistent policy estimation to ensure stability, robustness, and explainability in offline settings.
A central theme in my research is understanding and modeling heterogeneous and evolving reward functions—a hallmark of human decision-making that traditional RL systems fail to capture. Through my work on EM-EDM (EDM 2024) and THEMES (KDD 2025, AIED 2025), I developed apprenticeship learning frameworks that induce policies from expert demonstrations reflecting multiple latent goals or reward regimes that change over time. These frameworks combine Expectation-Maximization (EM) clustering with energy-based distribution matching to segment expert trajectories into meaningful sub-behaviors and to infer the evolving reward structures that drive them. For example, in intelligent tutoring systems, my models can identify how a student's learning strategy or pedagogical focus shifts across sessions and adapt policy induction accordingly—achieving both interpretability and predictive accuracy in modeling student–tutor interactions.
Building upon these ideas, my recent work explores Batch Contrastive Imitation Learning (BCIL), which introduces fine-grained, state-conditioned contrastive objectives to distinguish expert from suboptimal actions within mixed-quality datasets. Together, these efforts contribute toward a unified vision of safe, data-efficient, and human-aligned AI, capable of learning from observation rather than costly trial-and-error. Broadly, my research aims to push the frontier of offline human-centric decision-making, providing a foundation for adaptive and trustworthy AI systems in education, healthcare, and generative intelligence.
Xi Yang, Md Mirajul Islam, Ge Gao, Min Chi. THEMES: An Offline Apprenticeship Learning Framework for Evolving Reward Functions. The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2025). (pdf)
John Hostetter, Adittya Soukarjya Saha, Md Mirajul Islam, Tiffany Barnes, Min Chi. Human-Readable Neuro-Fuzzy Networks from Frequent Yet Discernible Patterns in Reward-Based Environments. The 34th International Joint Conference on Artificial Intelligence (IJCAI 2025). (pdf)
Md Mirajul Islam, Xi Yang, Rajesh Debnath, Adittya Soukarjya Saha, Min Chi. A Generalized Apprenticeship Learning Framework for Capturing Evolving Student Pedagogical Strategies. The 26th International Conference on Artificial Intelligence in Education (AIED 2025). (pdf)
Md Mirajul Islam, Xi Yang, John Hostetter, Adittya Soukarjya Saha, Min Chi. A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies. The 17th Educational Data Mining Conference (EDM 2024). (pdf)
• Built a Mini-CLIP vision–language retrieval system from scratch: swapped in a ViT-B/16 image tower, a Transformer text encoder, hard-negative contrastive training, and FAISS-based retrieval, yielding highly scalable zero-shot image–text recall on the COCO dataset. [Github]
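As a minimal illustration of the symmetric contrastive objective used in CLIP-style training (a NumPy sketch on toy embeddings, not the project's actual training code; the temperature value is an assumption):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (N, N); matching pairs on the diagonal
    labels = np.arange(len(img))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the image->text and text->image cross-entropies
    return 0.5 * (xent(logits) + xent(logits.T))
```

Hard-negative mining then simply biases which non-diagonal pairs appear in the batch; the loss itself is unchanged.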
• Proposed a few-shot energy-infrastructure detection model (nuclear, solar photovoltaic, and conventional hydroelectric power plants) that uses embeddings trained on a smaller dataset to identify complex objects in aerial imagery (DOTA 1.0). [Github]
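The few-shot idea can be sketched as nearest-prototype classification over learned embeddings (an illustrative NumPy sketch with made-up class names and 2-D embeddings, not the actual detection pipeline):

```python
import numpy as np

def prototype_classify(support_emb, support_labels, query_emb):
    """Assign each query embedding to the class whose mean support
    embedding (prototype) is nearest in Euclidean distance."""
    classes = sorted(set(support_labels))
    labels_arr = np.array(support_labels)
    protos = np.stack([support_emb[labels_arr == c].mean(axis=0) for c in classes])
    # pairwise distances: (n_queries, n_classes)
    d = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=2)
    return [classes[i] for i in d.argmin(axis=1)]
```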
• Fine-tuned various pre-trained BERT models, including ALBERT, on downstream tasks from the GLUE benchmark (CoLA, SST-2, MRPC, QQP, MNLI, QNLI, RTE, WNLI) and compared their performance.
• Implemented community detection algorithms on large networks using spectral graph clustering; detected anomalies in time-evolving networks; modeled virus propagation across networks. (PySpark, networkx) [Github]
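The core of spectral community detection can be sketched for the two-community case: split on the sign of the Fiedler vector of the graph Laplacian (a minimal NumPy sketch, not the PySpark implementation):

```python
import numpy as np

def spectral_bipartition(adj):
    """Split a graph into two communities using the Fiedler vector
    (eigenvector of the second-smallest eigenvalue) of the unnormalized
    graph Laplacian L = D - A."""
    deg = np.diag(adj.sum(axis=1))
    lap = deg - adj
    _, vecs = np.linalg.eigh(lap)      # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)   # sign of the Fiedler entry = community
```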
• Built an enjoyable one-player game implementing the A* search algorithm with three different heuristics. [Github]
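A* itself can be sketched on a grid with a pluggable heuristic (an illustrative sketch assuming a 2-D grid with unit move costs, not the game's actual code; swapping in a different admissible heuristic changes how many nodes are expanded, not the answer):

```python
import heapq

def a_star(grid, start, goal, heuristic):
    """A* on a 2D grid (0 = free, 1 = wall). Returns the shortest path
    length in steps, or None if the goal is unreachable."""
    open_heap = [(heuristic(start, goal), 0, start)]  # (f = g + h, g, node)
    best_g = {start: 0}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g
        if g > best_g.get(node, float("inf")):
            continue  # stale heap entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    f = ng + heuristic((nr, nc), goal)
                    heapq.heappush(open_heap, (f, ng, (nr, nc)))
    return None

def manhattan(a, b):
    """Admissible heuristic for 4-connected grids."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])
```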
• Trained Random Forest, AdaBoost, and SVM classifiers for automatic detection of false weather readings, using three sampling strategies over the NC ECONet dataset and K-fold cross-validation to tune the hyperparameters. [Github]
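The K-fold splitting at the heart of that tuning loop can be sketched in a few lines (a generic NumPy sketch, independent of the classifiers and dataset above):

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation:
    shuffle once, partition into k disjoint folds, and let each fold
    serve as the validation set exactly once."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```

Hyperparameter tuning then averages a validation metric over the k folds for each candidate setting.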
Different people possess different traits and exhibit different behaviors. Identifying these differences through a common, observable physical signal can enable diverse applications. We therefore propose a novel technique that predicts a person's traits from brainwave data in real time. To do so, we first conduct a thorough study of EEG brainwave data collected from 80 people while inducing five different emotional states in them. Our analysis reveals several new findings, which in turn lead us to a unified machine-learning solution for identifying different human traits from EEG data. We conduct a rigorous user evaluation of our solution with 20 participants. The evaluation demonstrates high accuracy along with good user ratings, showing the strong potential of our approach as a ubiquitous solution for real-time trait identification.
Identifying physical traits and emotions from system-sensed physical activity is a challenging problem in human–computer interaction. Our work contributes in this context by investigating an underlying connection between head movements and corresponding traits and emotions. To do so, we use eSense, a head-movement-measuring device that reports head acceleration and rotation. First, we conduct a thorough study of head-movement data collected with eSense from 46 people while inducing five different emotional states in them in isolation. Our analysis reveals several new head-movement-based findings, which in turn lead us to a unified machine-learning solution for identifying different human traits and emotions from head-movement data. Our analysis confirms that the proposed solution achieves high accuracy on the collected data.
Management of medical records in dynamic contexts requires an understanding of the overall infrastructure of record flows and poses even greater challenges in vulnerable environments such as the Rohingya refugee camps in Bangladesh. The Rohingya people are a Muslim minority group who fled the western state of Rakhine, Myanmar, after devastating suffering and took shelter in various refugee camps in the Cox's Bazar district of Bangladesh. Various government and Non-Governmental Organizations (NGOs) in Bangladesh are working together to provide them with different services, including healthcare support. Understanding how health clinics provide medical treatment and how they collect and store the resulting patient records is crucial, as any changes in these records may have political, financial, or social impacts. Through a field-based study in the Rohingya camps in Bangladesh, we explored the management of medical records across different organizations.
Facebook users often join Facebook groups to connect with people who share their interests, regardless of whether the other members hold the same views. Our study investigates Bangladeshi users' motivations for joining Facebook groups, their strategies for managing them, and the challenges involved. In our ongoing work, we are surveying and interviewing Facebook-group users to understand how groups bring users with similar interests and agendas together and how they provide admins with imagined sovereignty. This poster presents some of our key findings, which will be useful in designing better group-management tools that empower both admins and users.
A Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. The Neural Network (NN) model recognizes the text contained in images of segmented words, as shown in the illustration below. Because these word images are smaller than images of complete text lines, the NN can be kept small and training on a CPU is feasible. About three-quarters of the words in the validation set are correctly recognized, and the character error rate is around 10%. Here is the Github repository:
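The character error rate reported above is Levenshtein edit distance divided by reference length; a minimal sketch of that metric (generic, not the repository's evaluation code):

```python
def cer(reference, hypothesis):
    """Character error rate: Levenshtein distance between the reference and
    hypothesis strings, divided by the reference length. Uses a single
    rolling DP row for O(len(hypothesis)) memory."""
    m, n = len(reference), len(hypothesis)
    dp = list(range(n + 1))           # edit distance from "" to hypothesis[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i        # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,    # deletion
                        dp[j - 1] + 1,  # insertion
                        prev + (reference[i - 1] != hypothesis[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(m, 1)
```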
ACI Limited comprises 28 industries, each with a large volume of sales data. We used time-series forecasting models (Exponential Smoothing, Prophet, and LSTM) to predict sales amounts over different future time windows.
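The simplest of those models can be sketched in a few lines (an illustrative sketch of simple exponential smoothing with an assumed smoothing factor, not the project's actual forecasting code):

```python
def simple_exp_smoothing(series, alpha=0.3):
    """Simple exponential smoothing: the level is an exponentially weighted
    average of past observations, and the one-step-ahead forecast is the
    final level. alpha in (0, 1] weights recent observations more heavily."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level  # forecast for the next period
```

Prophet and LSTM models replace this recursion with trend/seasonality decomposition and a learned sequence model, respectively.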
An AI bot for disease prediction, trained on a Columbia University dataset mapping symptoms to diseases. It integrates a USSD API (for cellular-phone users), an SMS API (for sending messages from the server to customers), and a CAAS API (for subscription charges). The project was developed at TADHACK 2018 and won the 2nd Runner-Up prize.
A website I developed using the Laravel framework; its users are faculty members, administrators, and students.
Here is the git repository:
An Android app that maintains e-information about donors, consumers, and organizations involved in donating or receiving blood. Conventional blood-seeking is time-consuming, is confined to people the consumer already knows, and suffers from problems such as unknown blood groups, blood-group mismatches, distant locations even when a match is found, disease, and shortages of the required amount of blood. With this app, one can find the nearest available blood donor in the shortest possible time. (Software Engineering and Information System Design Project)
Here is the git repository:
A hardware project in which we designed and implemented an automatic car-parking system using IR sensors. (Digital System Design Project)
The goal of this project is a portable keypad with no physical form that works purely by sensing. With modern devices such as smartwatches now widespread, we can imagine eliminating the physical keypad entirely: a keypad is projected, and placing a finger on a projected key causes a visual representation of that key on the monitor. The project is based mainly on IR sensors, which sense any obstacle placed in front of them and produce a responsive voltage. By measuring this responsive voltage, we achieve an input keypad with no physical presence, giving users the comfort and versatility to carry and use it anywhere, anytime.
A project for automatically filtering, aggregating, and analyzing Twitter data using Redis.
Here is git repository:
We trained a deep learning model to predict steering-wheel angles so that a virtual vehicle can drive itself in a simulator. The model is built with Keras, relying on TensorFlow as the backend. Data augmentation techniques: camera and steering-wheel-angle calibration, horizontal image flip, image darkening, random shadow, and left/right/up/down image shifts. We trained the model using the Adam optimizer with a learning rate of 0.001. After much parameter tweaking and experimentation with multiple models, we ended up with one able to power our virtual car to drive autonomously on both tracks.
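The horizontal-flip augmentation above has one subtlety worth showing: mirroring the frame must also negate the steering label (a minimal NumPy sketch on a tiny fake frame, not the project's actual pipeline):

```python
import numpy as np

def augment_flip(image, steering_angle):
    """Horizontal-flip augmentation for behavioral cloning: mirror the
    frame left-to-right and negate the steering angle so the label stays
    consistent with the flipped scene."""
    return np.fliplr(image), -steering_angle
```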