Available for new projects
Jiale Cheng

Physicist | Machine Learning Engineer | GenAI Specialist

M.Sc. in Advanced Physics (Computational Physics). Specialized in Generative AI models for synthetic hadronic particle simulations, exploring diffusion models and invertible neural networks (INNs) for high-energy physics applications.


Currently working as a Senior Data Scientist at CoverWallet (Aon), leveraging large-scale data analysis, LLM-based applications, and machine learning to enhance decision-making and automation in complex environments.

Core Expertise

🧠

Machine Learning & GenAI

Python · Pandas · NumPy · scikit-learn · TensorFlow · PyTorch · Diffusion Models · Normalizing Flows · INNs · OpenAI API · Llama · Gemini · Claude · Hugging Face · LangChain · MLflow · Airflow
☁️

Big Data & Cloud

PySpark · Distributed Computing · ETL Pipelines · AWS · GCP · Azure · BigQuery · Snowflake
💻

Programming Languages

Python · R · C++ · JavaScript · Scala
📊

Data Visualization

SQL · Power BI · Looker · Seaborn · Matplotlib

Selected work

Featured Projects

ML, data engineering, and strategic implementations at scale

Financial Crime Detection
🛡️

AML Detection Platform

ThetaRay × Santander · 2023-2025

Led the end-to-end implementation of an AI-driven anti-money laundering (AML) detection platform for Banco Santander across four countries.

  • Improved detection rates by double-digit percentages
  • Reduced processing time by 40% across billions of daily transactions
  • Deployed across Uruguay, Portugal, Poland, and Mexico
Python · PySpark · Airflow · MLflow · Unsupervised ML
Data Engineering
⚙️

ETL Pipeline Re-engineering

ThetaRay · 2023-2024

Re-architected feature-calculation pipelines to process billions of transactions per day, using Airflow orchestration and PySpark distributed computing.

  • 40% reduction in processing time
  • Scalable to billions of daily transactions
  • Expanded feature-engineering capabilities for downstream models
PySpark · Airflow · SQL · Feature Engineering
Business Intelligence
📊

Market Intelligence Suite

InsudPharma · 2022-2023

Built a market intelligence platform for the pharmaceutical industry that integrates competitor analysis, price tracking, and patent-expiration monitoring.

  • Automated data collection from 50+ sources
  • Real-time KPI tracking dashboard
  • Strategic decision-making tool for C-level executives
Python · Selenium · BeautifulSoup · Power BI · Power Apps
Risk Analytics
🌍

Climate Risk Models

Management Solutions · 2021-2022

Developed methodologies to measure financed emissions and climate-related risks for financial institutions, contributing to ECB stress testing.

  • Contributed to 2022 ECB climate stress test
  • IFRS9 model validation and compliance
  • Time-series risk projection models (PD, LGD, EAD)
Python · R · scikit-learn · Time-Series Analysis
Machine Learning
🤖

Unsupervised ML Models

ThetaRay · 2023-2025

Designed and deployed unsupervised machine learning models to detect patterns of money laundering, human trafficking, and terrorist financing.

  • Pattern detection without labeled data
  • Reduced false positives significantly
  • Geography-specific risk-scenario evaluations
Python · Unsupervised Learning · Anomaly Detection · MLflow
Data Collection
🕷️

Web Scraping & Automation

InsudPharma · 2022-2023

Developed automated data collection pipelines for market analysis using web scraping, PDF parsing, and data extraction tools.

  • Automated collection from regulatory databases
  • Price evolution tracking across competitors
  • Patent expiration monitoring system
Selenium · BeautifulSoup · pdfplumber · Power Automate

Career

Professional Experience

Building at the intersection of machine learning, data engineering, and real-world impact

Senior Data Scientist

CoverWallet (Aon)

Present

Leveraging large-scale data analysis, LLM-based applications, and machine learning to enhance decision-making and automation in complex environments.

Python · LLMs · MLflow · LangChain · AWS · GCP · Airflow · Gemini

Data Science & AI Consultant

Freelance

Present

Developing custom solutions for international clients, focusing on automation, predictive models, and Generative AI strategies.

Consulting · GenAI Solutions · Full-Stack Data

AI Teacher

Ironhack

Present

Mentoring the next generation of data scientists by teaching artificial intelligence, machine learning, and Python.

Mentoring · Education · AI

Lead Data Scientist

ThetaRay

Sept 2023 - 2025

Led a cross-functional team of 10 professionals and spearheaded the development of unsupervised ML models to detect financial crime (AML) for global banks such as Santander.

Team Lead · PySpark · Airflow · AML · Python

Data Scientist

InsudPharma

Sept 2022 - Aug 2023

Designed and implemented data-collection pipelines using Selenium and BeautifulSoup. Developed internal web applications for tracking KPIs and analyzing new business opportunities.

Selenium · Power BI · Market Intelligence

Data Scientist

Management Solutions

Sept 2021 - Aug 2022

Developed methodologies to measure financed emissions and climate-related risks. Validated IFRS9 models and developed risk projections for the 2022 ECB climate stress test.

Risk Modeling · R · Python · Climate Risk