
Uses

Technologies and tools I work with daily — organised from broad domains down to the specific libraries and frameworks behind each project.

Deep Learning & Computer Vision

Most of my research projects live in the deep learning and computer vision space: semantic segmentation of medical images, aerial road extraction, and multi-task ship classification. I work mainly with PyTorch for building and training networks — custom loops, mixed-precision, early stopping, and TorchScript export for inference. For standard backbone architectures I use torchvision (ResNet18, ResNet34) and segmentation-models-pytorch to rapidly ablate encoder/decoder combinations without boilerplate.
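In a custom PyTorch loop, early stopping is mostly bookkeeping around the validation loss. Below is a minimal, framework-agnostic sketch. It assumes the loop calls `step()` once per epoch with that epoch's validation loss. The `EarlyStopping` class and its `patience`/`min_delta` parameters are illustrative names, not a PyTorch API.

```python
class EarlyStopping:
    """Hypothetical helper: signal a stop when validation loss stops improving.

    patience:  consecutive epochs without improvement to tolerate.
    min_delta: minimum decrease in loss that counts as an improvement.
    """

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")  # best validation loss seen so far
        self.bad_epochs = 0       # epochs since the last improvement

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=2)
    for epoch, loss in enumerate([1.0, 0.8, 0.79, 0.81, 0.80]):
        if stopper.step(loss):
            print(f"stopping at epoch {epoch}, best loss {stopper.best}")
            break
```

In a real loop you would usually also save a checkpoint whenever `best` improves, so you can restore the best weights after stopping.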

Image processing relies on OpenCV for preprocessing, sliding-window tiling, and morphological post-processing, and on Rasterio for reading geospatial GeoTIFF imagery with correct CRS handling. Augmentation pipelines (random crops, elastic transforms, colour jitter, Gaussian blur) are built with torchvision transforms and albumentations-style compositions.
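Sliding-window tiling reduces to computing window origins along each axis so that overlapping tiles cover the whole raster. The sketch below uses only the standard library. The function names and the 512/384 tile/stride defaults are assumptions for illustration, not the values used in the projects. The last tile on each axis is shifted back so it ends exactly at the image border.

```python
def tile_origins(length: int, tile: int, stride: int) -> list[int]:
    """Start offsets along one axis so tiles of size `tile` cover [0, length).

    stride < tile gives overlapping tiles. The final tile is shifted back to end
    at `length`, so border pixels are never skipped. If the image is smaller
    than one tile, a single (partial) tile at 0 is returned; pad it if needed.
    """
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:
        origins.append(length - tile)
    return origins


def tile_windows(height: int, width: int, tile: int = 512, stride: int = 384):
    """Yield hypothetical (row, col, row_end, col_end) windows over a height x width image."""
    for r in tile_origins(height, tile, stride):
        for c in tile_origins(width, tile, stride):
            yield r, c, min(r + tile, height), min(c + tile, width)
```

With numpy or Rasterio, each window then maps to an array slice such as `img[r:r_end, c:c_end]`. The overlap lets predictions near tile borders be averaged or cropped when stitching the full-size mask back together.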

Classical ML & Statistical Modelling

Not every problem needs a neural network. For tabular and time-series data I default to gradient boosting and ensemble methods, reaching for deep learning only when the data is high-dimensional or unstructured.

  • Ensemble methods — Random Forest and XGBoost / LightGBM for classification and regression. Used in the maritime datathon (97.6% anomaly detection, 92.6% risk classification) and IoT malware pipeline (97.8% accuracy). Hyperparameters tuned with GridSearchCV; boosting rounds capped with early stopping.
  • Unsupervised & anomaly detection — Isolation Forest for IoT traffic anomaly scoring; HDBSCAN for geographic clustering of maritime incidents (handles arbitrary cluster shapes and noise, unlike K-Means).
  • Time-series forecasting — Prophet for additive decomposition with seasonality components and trend changepoints, applied to temporal incident prediction in the maritime platform.
  • Evaluation & validation — cross-validation pipelines, stratified splits, ROC/AUC, F1, precision, recall, and regression metrics (R²) via scikit-learn.
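scikit-learn's `precision_score`, `recall_score` and `f1_score` compute these metrics. Spelling them out for the binary case makes the trade-off on imbalanced anomaly data explicit. Below is a minimal stdlib sketch, assuming binary labels where `1` marks the positive (e.g. anomalous) class. `binary_prf` is a hypothetical helper, not a scikit-learn function, and it returns 0.0 whenever a denominator is zero.

```python
def binary_prf(y_true, y_pred, positive=1):
    """Hypothetical helper: precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # hits
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

On a heavily imbalanced dataset, a model that never flags anomalies can still score high accuracy while recall and F1 drop to 0. That is why these metrics sit alongside accuracy and ROC/AUC.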

NLP & Information Retrieval

My NLP work centres on information retrieval over biomedical text — building search engines that combine classical lexical ranking with neural semantic understanding.

  • Lexical retrieval — Apache Lucene (Java) for inverted index construction, custom biomedical text analysers, and BM25 ranking. Fast first-stage retrieval before any neural re-ranking.
  • Neural re-ranking — BioBERT (via HuggingFace Transformers) fine-tuned as a cross-encoder for passage relevance scoring over clinical trial documents.
  • Dense retrieval — document embeddings indexed with FAISS (IVF index) for approximate nearest-neighbour search; combined with lexical scores via Reciprocal Rank Fusion.
  • Evaluation — TREC framework (MAP, NDCG@10, P@10); hybrid pipeline improved MAP from 0.31 to 0.42 over BM25 alone.
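Reciprocal Rank Fusion uses only each document's rank in every list, not its raw score. BM25 scores and embedding similarities therefore need no normalisation before they are combined. Each document receives the sum of 1 / (k + rank) over the lists it appears in. Below is a minimal sketch, assuming each retriever returns a best-first list of document ids. The function name is illustrative, and k = 60 is the constant commonly used since the original RRF paper, not necessarily the value used in this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of doc ids with RRF.

    rankings: iterable of lists, each ordered best-first (index 0 = rank 1).
    k: damping constant that limits how much top ranks dominate.
    Returns doc ids sorted by fused score, highest first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


if __name__ == "__main__":
    bm25_hits = ["d1", "d2", "d3"]   # hypothetical lexical (BM25) results
    dense_hits = ["d3", "d1", "d4"]  # hypothetical FAISS nearest neighbours
    print(reciprocal_rank_fusion([bm25_hits, dense_hits]))
```

Documents that both retrievers rank reasonably high (here `d1` and `d3`) rise above documents that only one retriever found.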

Data Engineering & Distributed Systems

When data doesn't fit in memory or needs to be processed in real time, I move to distributed systems. My go-to stack is Apache Spark via PySpark — for both batch feature engineering (IoT traffic, AIS maritime data) and real-time Structured Streaming pipelines. MLlib handles large-scale model training when the data should stay in the cluster.

On the storage side: PostgreSQL for relational workloads (ETL pipelines, data quality frameworks, Row-Level Security for multi-tenant access — built during my internship at ENXENIO); MongoDB for heterogeneous document records; and Elasticsearch for full-text search at scale. Data wrangling in Python leans on pandas and NumPy before any distributed processing.

BI & Visualisation

Communicating results is as important as producing them. For enterprise BI I use Power BI (DAX, incremental refresh, row-level security) — the main tool during my internship at ENXENIO for financial reporting dashboards. For the Datathon 2026 maritime platform I built an interactive Next.js dashboard with geographic hotspot maps, incident timelines, and P0-P4 severity prioritisation.

For research and project reporting I use Matplotlib for training curves, confusion matrices, ROC plots, and segmentation overlays; Seaborn for distribution plots and correlation heatmaps during EDA; and Plotly for interactive geographic and time-series charts.

Languages & Dev Environment

Python is my primary language for everything ML, data, and scripting. Java for Apache Lucene IR work. R for statistical analysis and reporting (ggplot2, knitr). SQL for analytical queries, window functions, and ETL orchestration. JavaScript for the Next.js maritime dashboard.

Day-to-day: VS Code for Python and JS (Pylance, Jupyter, Ruff, Docker extensions); IntelliJ IDEA for Java (Maven, debugger); Jupyter Lab for EDA and experiment notebooks; Docker for reproducible environments; Git with conventional commits; Postman for API endpoint testing.

Full stack at a glance

Languages: Python · Java · R · SQL · JavaScript
Deep Learning: PyTorch · torchvision · segmentation-models-pytorch · U-Net · ResNet18/34 · TorchScript
Computer Vision: OpenCV · Rasterio · albumentations · sliding-window inference
Classical ML: scikit-learn · XGBoost · LightGBM · Prophet · HDBSCAN · Isolation Forest
NLP & Search: HuggingFace Transformers · BioBERT · Apache Lucene · BM25 · FAISS · RRF
Distributed / Streaming: PySpark · Apache Spark · Structured Streaming · MLlib
Databases: PostgreSQL · MongoDB · Elasticsearch
Data wrangling: pandas · NumPy · ETL pipelines · data quality frameworks
BI & Visualisation: Power BI · Next.js · Matplotlib · Seaborn · Plotly
Dev tooling: Docker · Git · VS Code · IntelliJ IDEA · Jupyter Lab · Postman
OS: Windows 11 · Linux (Ubuntu / Debian)

Curriculum Vitae

Full details on education, experience, and projects: CV_PardoGarcia_Enrique_EN.pdf