Pioneering Privacy-Preserving Analytics for Fraud Detection

Image: A presenter stands in front of a projector screen showing project information in a room full of people, with flags behind them and standing banners to their right.

Understanding the Mission
In an age where personal data privacy is both a right and a liability, Dr. Jia Zou and her team are working at the intersection of artificial intelligence (AI), machine learning (ML), and cybersecurity. Their project, “A Federated Query Optimizer for Privacy-Preserving Analytics and ML,” seeks to resolve one of the Homeland Security Enterprise’s (HSE) most pressing challenges: detecting fraud without compromising personal privacy.

“Imagine training a model to detect fraudulent passports or ID cards without ever exposing real personal data,” Dr. Zou said. “We’re creating synthetic datasets that mimic real-world scenarios, allowing collaboration without compromising privacy.”

With co-investigators Dr. Chaowei Xiao from the University of Wisconsin-Madison and Dr. Yingzhen Yang from Arizona State University, the team aims to revolutionize the way HSE and its partners train fraud detection models.

HSE agencies, from Customs and Border Protection (CBP) to Immigration and Customs Enforcement (ICE), face a critical hurdle: access to high-quality, large-scale ID data. Current privacy regulations restrict the sharing of real-world identity documents. Consequently, detecting fraudulent documents using machine learning is hindered by the lack of labeled data necessary to train these systems effectively.

Dr. Zou’s team is addressing this gap by developing a synthetic dataset, IDNet, capable of simulating diverse identity documents. By using advanced techniques like few-shot generative modeling and Bayesian optimization, they produce datasets that balance fidelity, diversity, and privacy.


Innovating with IDNet
In its first iteration, IDNet has already made waves. The dataset features nearly 600,000 synthetic documents, including driver’s licenses from multiple U.S. states and passports from European countries. Open-sourced on Zenodo, IDNet has garnered significant attention with over 1,200 views and 250 downloads.

“We’ve demonstrated that we can generate high-quality synthetic data with minimal real-world input,” Dr. Zou noted. This achievement is underscored by six high-impact publications in leading conferences and journals like IJCV and ICML 2024.

The project also integrates cutting-edge methodologies such as neural architecture search and federated learning. These innovations ensure that synthetic data and models are tailored for specific queries, maximizing both privacy and utility.
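For readers unfamiliar with federated learning, the core idea can be shown with a minimal sketch of federated averaging. This is a generic textbook technique, not the team's optimizer: the toy linear model, the two example clients, and every function name below are assumptions made for illustration. Each client trains on data that never leaves it, and only the resulting model weights are shared and averaged.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair held privately by one client


def local_update(weights: Sequence[float], data: Sequence[Sample],
                 lr: float = 0.02, epochs: int = 20) -> List[float]:
    """Train y ~ w*x + b on one client's private data with gradient descent."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client weights, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def run_rounds(clients: Sequence[Sequence[Sample]], rounds: int = 200) -> List[float]:
    """Repeat: send global weights out, train locally, average what comes back."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Hypothetical clients: each holds samples of y = 2x + 1 and never shares them.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
final_weights = run_rounds(clients)
print(final_weights)
```

In this sketch the coordinating code only ever sees weight vectors, never the samples themselves. Real deployments usually add further protections, such as secure aggregation or differential privacy, which are omitted here.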


From Cryptography to Privacy Advocacy
Dr. Zou’s passion for privacy and security stems from her academic roots at Qinghua University, where her research focused on cryptography and network security. A career spanning roles at IBM, Rice University, and Arizona State University honed her expertise in distributed systems.

“The shift to privacy-preserving systems was a natural progression,” Dr. Zou explained. “I realized that trustworthy computing—ensuring privacy and security—is now more critical than performance optimization.”

This focus has guided her toward solutions that address real-world issues while fostering collaboration across government, academia, and industry.


Overcoming Challenges with Creativity
Building a synthetic dataset without access to real-world labeled data posed significant challenges. Dr. Zou’s team tackled this by immersing themselves in literature reviews and leveraging insights shared by DHS sponsors. This groundwork informed the design of fraud patterns and the development of the IDNet dataset.

Additionally, the team’s novel approach combines foundational generative networks with advanced optimization techniques. “We’re pioneering a framework that not only creates synthetic data but also evaluates its quality against real-world benchmarks,” Dr. Zou said.


A Vision for the Future
While the project directly addresses HSE needs, its implications are far-reaching. Synthetic datasets like IDNet could enable secure collaborations between universities, government agencies, and tech companies, reducing barriers to innovation. Furthermore, the methods being developed could transform how sensitive data is handled in fields ranging from healthcare to finance.

“Our work is just the beginning,” Dr. Zou remarked. “We aim to provide tools and frameworks that empower others to innovate securely and responsibly.”



With a multidisciplinary team, a robust methodology, and strong stakeholder engagement, Dr. Zou’s project is set to redefine the standards for privacy-preserving analytics. Its success highlights the potential of combining AI, privacy technologies, and collaboration to address pressing global challenges.

“Our ultimate goal,” Dr. Zou concluded, “is to build trust into the systems we create, ensuring that privacy is never a barrier to progress.”