Available for new projects
Jiale Cheng


Physicist | Machine Learning Engineer | GenAI Specialist

M.Sc. in Advanced Physics (Computational Physics). Specialized in generative AI models for hadronic particle-physics simulations, exploring the frontier of diffusion models and invertible neural networks (INNs).


Currently Senior Data Scientist at CoverWallet (Aon), where I combine large-scale data analysis with LLM-based applications and machine learning to strengthen decision-making and automation in complex environments.


Core Expertise

🧠

Machine Learning & GenAI

Python · Pandas · NumPy · scikit-learn · TensorFlow · PyTorch · Diffusion Models · Normalizing Flows · INNs · OpenAI API · Llama · Gemini · Claude · Hugging Face · LangChain · MLflow · Airflow
☁️

Big Data & Cloud

PySpark · Distributed Computing · ETL Pipelines · AWS · GCP · Azure · BigQuery · Snowflake
💻

Programming Languages

Python · R · C++ · JavaScript · Scala
📊

Data Visualization

SQL · Power BI · Looker · Seaborn · Matplotlib

Selected work

Featured Projects

ML, data engineering, and strategic implementations at scale

Financial Crime Detection
🛡️

AML Detection Platform

ThetaRay × Santander · 2023-2025

Led the end-to-end implementation of an AI-driven anti-money laundering (AML) detection platform for Banco Santander across four international geographies.

  • Improved detection rates by double-digit percentages
  • Reduced processing time by 40% on billions of daily transactions
  • Deployed across Uruguay, Portugal, Poland, and Mexico
Python · PySpark · Airflow · MLflow · Unsupervised ML
Data Engineering
⚙️

ETL Pipeline Re-engineering

ThetaRay · 2023-2024

Re-architected the feature-calculation pipelines to process billions of transactions per day, using workflow orchestration (Airflow) and distributed computing (PySpark).

  • 40% reduction in processing time
  • Scalable to billions of daily transactions
  • Enhanced model feature engineering capabilities
PySpark · Airflow · SQL · Feature Engineering
Business Intelligence
📊

Market Intelligence Suite

InsudPharma · 2022-2023

Built a comprehensive market intelligence platform for the pharmaceutical industry, integrating competitor analysis, price tracking, and patent-expiration monitoring.

  • Automated data collection from 50+ sources
  • Real-time KPI tracking dashboard
  • Strategic decision-making tool for C-level executives
Python · Selenium · BeautifulSoup · Power BI · Power Apps
Risk Analytics
🌍

Climate Risk Models

Management Solutions · 2021-2022

Developed methodologies to measure financed emissions and climate-related risks for financial institutions, contributing to the ECB's climate stress testing.

  • Contributed to the 2022 ECB climate stress test
  • IFRS9 model validation and compliance
  • Time-series risk projection models (PD, LGD, EAD)
Python · R · scikit-learn · Time-Series Analysis
Machine Learning
🤖

Unsupervised ML Models

ThetaRay · 2023-2025

Designed and deployed unsupervised machine learning models to detect money laundering, human trafficking, and terrorism financing patterns.

  • Pattern detection without labeled data
  • Reduced false positives significantly
  • Geography-specific risk-scenario evaluations
Python · Unsupervised Learning · Anomaly Detection · MLflow
Data Collection
🕷️

Web Scraping & Automation

InsudPharma · 2022-2023

Developed automated data collection pipelines for market analysis using web scraping, PDF parsing, and data extraction tools.

  • Automated collection from regulatory databases
  • Price evolution tracking across competitors
  • Patent expiration monitoring system
Selenium · BeautifulSoup · pdfplumber · Power Automate

Career

Professional Experience

Building at the intersection of machine learning, data engineering, and real-world impact

Senior Data Scientist

CoverWallet (Aon)

Present

Leading Data Science initiatives that leverage large-scale data analysis, LLM-based applications, and machine learning to improve decision-making and automation in complex environments.

Python · LLMs · InsurTech · Automation

Data Science & AI Consultant

Freelance

Present

Building tailored solutions for international clients, focused on automation, predictive models, and generative AI strategies.

Consulting · GenAI Solutions · Full Stack Data

AI Teacher

Ironhack

Present

Training the next generation of data scientists in artificial intelligence, machine learning, and Python.

Mentoring · Education · AI

Lead Data Scientist

ThetaRay

Sept 2023 - 2025

Led a cross-functional team of 10 professionals. Developed unsupervised models for financial crime detection (AML) for global banks such as Santander, improving detection rates by double-digit percentages.

Team Lead · PySpark · Airflow · AML · Python

Data Scientist

InsudPharma

Sept 2022 - Aug 2023

Designed data collection pipelines using Selenium and BeautifulSoup. Developed internal web applications for tracking KPIs, monitoring price evolution, and analyzing new business opportunities.

Selenium · Power BI · Market Intelligence

Data Scientist

Management Solutions

Sept 2021 - Aug 2022

Developed methodologies to measure financed emissions and climate risks. Validated IFRS9 models and risk projections (PD, LGD, EAD) for the 2022 ECB climate stress test.

Risk Modeling · R · Python · Climate Risk