Machine Learning Algorithms that Make Sense
in constrained and large-scale settings, with applications in Advertising, Healthcare, Sustainability (Climate, Computing, Agriculture), Social Good...
MAIL stands for practical Machine Learning and AI Lab, led by Dr. Khoa D Doan.
Here at MAIL, we develop computational frameworks that make existing complex/deep models more suitable for practical use. We focus on improving the following aspects of existing models: (i) training/inference, (ii) realistic assumptions, (iii) algorithmic robustness, and (iv) efficiency in constrained settings. Most of our ML/AI solutions center around large-scale approaches that have low computational complexity and require less human effort.
Warning: This page is severely out of date. An update is coming in Summer 2024!
Research Interests
Our research focuses on understanding the practical limits of using existing ML methods in the real world. Essentially, we seek answers to the following question: How can we make ML models simpler and more reliable to use in constrained settings? Simplicity refers to the ability to (i) build or implement the method easily, (ii) execute the deployed model efficiently, and (iii) evolve the deployed model with less effort. Reliability relates to (i) whether we can rely on the model to solve the intended task well, (ii) whether this performance is preserved under the perturbations frequently seen in practice, such as data corruptions or distributional changes, and (iii) whether the model is resilient to (i.e., its performance is not significantly affected by) various forms of security attacks, such as adversarial examples and causal attacks. In this sense, we believe that many existing ML methods, including those with complex deep neural networks, are reliable but not yet easy to use because they do not satisfy various constraints seen in real-world applications. We also strongly believe that the effort to answer this question will help us truly realize the potential of AI/ML methodology in practice.
Our goal, therefore, is to develop computational frameworks that make existing complex/deep models more suitable for practical use. We focus on improving the following aspects of existing models: (i) training/inference, (ii) realistic assumptions, (iii) algorithmic robustness, and (iv) efficiency in constrained settings. Most of our ML/AI solutions center around generative approaches that have low computational complexity and require less human effort. Currently, our research activities include, but are not limited to, the following themes (with selected publications):
Information Retrieval and Applications
- Interpretable Graph Similarity Computation via Differentiable Optimal Alignment of Node Embeddings (SIGIR 2021 by Doan et al.)
- Efficient Implicit Unsupervised Text Hashing using Adversarial Autoencoder (WWW 2020 by Doan et al.)
- Image Hashing by Minimizing Discrete Component-wise Wasserstein Distance (arXiv 2021 by Doan et al.)
- Generative Hashing Network (ACCV 2022 by Doan et al.)
- EBM Hashing Network (Under Submission 2021 by Doan et al.)
- One Loss for Quantization: Deep Hashing with Discrete Wasserstein Distributional Matching (CVPR 2022 by Doan et al.)
- Asymmetric Hashing for Fast Ranking via Neural Network Measures (SIGIR 2023 by Doan et al.)
Generative Models
- Image Generation Via Minimizing Fréchet Distance in Discriminator Feature Space (arXiv 2021 by Doan et al.)
- Regression via Implicit Models and Optimal Transport Cost Minimization (arXiv 2020 by Manchanda et al.)
AI Backdoor Security with Generative Models
- Backdoor Attack with Imperceptible Input and Latent Modification (NeurIPS 2021 by Doan et al.)
- LIRA: Learnable, Imperceptible and Robust Backdoor Attacks (ICCV 2021 by Doan et al.)
- Adversarial Defenses for Vision Transformers (Under Submission 2022 by Peng et al.)
- Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class (NeurIPS 2022 by Doan et al.)
- Defending Backdoor Attacks on Vision Transformer via Patch Processing (AAAI 2023 by Doan et al.)