Uses
Technologies and tools I work with daily — organised from broad domains down to the specific libraries and frameworks behind each project.
Deep Learning & Computer Vision
Most of my research projects live in the deep learning and computer vision space: semantic segmentation of medical images, aerial road extraction, and multi-task ship classification. I work mainly with PyTorch for building and training networks, with custom training loops, mixed-precision training, early stopping, and TorchScript export for inference. For standard backbone architectures I use torchvision (ResNet18, ResNet34), and I use segmentation-models-pytorch to ablate encoder/decoder combinations quickly without boilerplate.
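As a rough illustration of the early-stopping idea, here is a minimal, framework-agnostic sketch. The class name `EarlyStopping`, its `patience` and `min_delta` parameters, and the validation losses are all illustrative assumptions, not code or results from any of the projects.

```python
class EarlyStopping:
    """Signal a stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation losses standing in for a real training run.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.60]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch  # zero-based index of the epoch that triggered the stop
        break
```

In a real PyTorch loop, `step` would typically be called once per epoch after validation, with the best checkpoint saved whenever `best_loss` is updated.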
Image processing relies on OpenCV for preprocessing, sliding-window tiling, and morphological post-processing, and on Rasterio for reading geospatial GeoTIFF imagery with correct CRS handling. Augmentation pipelines (random crops, elastic transforms, colour jitter, Gaussian blur) are built with torchvision transforms and albumentations-style compositions.
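Sliding-window tiling can be sketched without any imaging library, since the core of it is computing tile coordinates. The function names, the tile size, and the stride below are assumptions chosen for illustration. The last tile on each axis is shifted back so that it ends exactly at the image border, which is one common way to handle edges (padding is another).

```python
from typing import List, Tuple


def tile_offsets(size: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that tiles of length `tile` cover [0, size)."""
    if tile <= 0 or stride <= 0:
        raise ValueError("tile and stride must be positive")
    if tile >= size:
        return [0]  # a single tile already spans the whole axis
    offsets = list(range(0, size - tile + 1, stride))
    if offsets[-1] + tile < size:
        offsets.append(size - tile)  # shift the final tile back to touch the border
    return offsets


def tile_boxes(height: int, width: int, tile: int, stride: int) -> List[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) boxes covering an image of the given size."""
    return [
        (top, left, min(top + tile, height), min(left + tile, width))
        for top in tile_offsets(height, tile, stride)
        for left in tile_offsets(width, tile, stride)
    ]


# Hypothetical 1000 x 600 image, 512-pixel tiles with 50% overlap.
boxes = tile_boxes(1000, 600, tile=512, stride=256)
```

Each box can then be used to slice an array (for example `image[top:bottom, left:right]`), run the model on the crop, and blend the overlapping predictions back into a full-size mask.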
Classical ML & Statistical Modelling
Not every problem needs a neural network. For tabular and time-series data I default to gradient boosting and ensemble methods, reaching for deep learning only when the data is high-dimensional or unstructured.
- Ensemble methods — Random Forest and XGBoost / LightGBM for classification and regression. Used in the maritime datathon (97.6% anomaly detection, 92.6% risk classification) and IoT malware pipeline (97.8% accuracy). Hyperparameter search via GridSearchCV and early stopping.
- Unsupervised & anomaly detection — Isolation Forest for IoT traffic anomaly scoring; HDBSCAN for geographic clustering of maritime incidents (handles arbitrary cluster shapes and noise, unlike K-Means).
- Time-series forecasting — Prophet for additive decomposition with seasonality components and trend changepoints, applied to temporal incident prediction in the maritime platform.
- Evaluation & validation — cross-validation pipelines, stratified splits, ROC/AUC, F1, precision, recall, and regression metrics (R²) via scikit-learn.
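To show what a stratified split does, here is a dependency-free sketch that assigns sample indices to folds while keeping class ratios roughly equal. In practice scikit-learn's `StratifiedKFold` covers this; the function name `stratified_folds`, the round-robin assignment, and the toy labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each sample index to one of `n_folds` folds, class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # seeded so the split is reproducible
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


# Toy imbalanced dataset: 90 negatives and 10 positives (hypothetical).
labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels, n_folds=5)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

With a plain random split, a fold could easily end up with no positives at all, which makes metrics such as recall or ROC/AUC undefined or very noisy for that fold.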
NLP & Information Retrieval
My NLP work centres on information retrieval over biomedical text — building search engines that combine classical lexical ranking with neural semantic understanding.
- Lexical retrieval — Apache Lucene (Java) for inverted index construction, custom biomedical text analysers, and BM25 ranking. Fast first-stage retrieval before any neural re-ranking.
- Neural re-ranking — BioBERT (via HuggingFace Transformers) fine-tuned as a cross-encoder for passage relevance scoring over clinical trial documents.
- Dense retrieval — document embeddings indexed with FAISS (IVF index) for approximate nearest-neighbour search; combined with lexical scores via Reciprocal Rank Fusion.
- Evaluation — TREC framework (MAP, NDCG@10, P@10); hybrid pipeline improved MAP from 0.31 to 0.42 over BM25 alone.
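Reciprocal Rank Fusion combines ranked lists using only rank positions: each document scores the sum of `1 / (k + rank)` over the lists in which it appears. The sketch below assumes `k = 60`, the constant commonly used in the RRF literature; the value actually used in the pipeline is not stated here, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top results from a lexical (BM25) and a dense (embedding) retriever.
bm25_ranking = ["d1", "d2", "d3", "d4"]
dense_ranking = ["d3", "d1", "d5", "d2"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Because RRF ignores the raw scores, BM25 values and embedding similarities never need to be normalised onto a common scale before fusion.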
Data Engineering & Distributed Systems
When data doesn't fit in memory or needs to be processed in real time, I move to distributed systems. My go-to stack is Apache Spark via PySpark, for both batch feature engineering (IoT traffic, AIS maritime data) and real-time Structured Streaming pipelines. MLlib handles large-scale model training when the data should stay in the cluster.
On the storage side: PostgreSQL for relational workloads (ETL pipelines, data quality frameworks, Row-Level Security for multi-tenant access — built during my internship at ENXENIO); MongoDB for heterogeneous document records; and Elasticsearch for full-text search at scale. Data wrangling in Python leans on pandas and NumPy before any distributed processing.
BI & Visualisation
Communicating results is as important as producing them. For enterprise BI I use Power BI (DAX, incremental refresh, row-level security) — the main tool during my internship at ENXENIO for financial reporting dashboards. For the Datathon 2026 maritime platform I built an interactive Next.js dashboard with geographic hotspot maps, incident timelines, and P0-P4 severity prioritisation.
For research and project reporting I use Matplotlib for training curves, confusion matrices, ROC plots, and segmentation overlays; Seaborn for distribution plots and correlation heatmaps during EDA; and Plotly for interactive geographic and time-series charts.
Languages & Dev Environment
Python is my primary language for everything ML, data, and scripting. Java for Apache Lucene IR work. R for statistical analysis and reporting (ggplot2, knitr). SQL for analytical queries, window functions, and ETL orchestration. JavaScript for the Next.js maritime dashboard.
Day-to-day: VS Code for Python and JS (Pylance, Jupyter, Ruff, Docker extensions); IntelliJ IDEA for Java (Maven, debugger); Jupyter Lab for EDA and experiment notebooks; Docker for reproducible environments; Git with conventional commits; Postman for API endpoint testing.
Full stack at a glance
| Area | Tools |
|---|---|
| Languages | Python · Java · R · SQL · JavaScript |
| Deep Learning | PyTorch · torchvision · segmentation-models-pytorch · U-Net · ResNet18/34 · TorchScript |
| Computer Vision | OpenCV · Rasterio · albumentations · sliding-window inference |
| Classical ML | scikit-learn · XGBoost · LightGBM · Prophet · HDBSCAN · Isolation Forest |
| NLP & Search | HuggingFace Transformers · BioBERT · Apache Lucene · BM25 · FAISS · RRF |
| Distributed / Streaming | PySpark · Apache Spark · Structured Streaming · MLlib |
| Databases | PostgreSQL · MongoDB · Elasticsearch |
| Data wrangling | pandas · NumPy · ETL pipelines · data quality frameworks |
| BI & Visualisation | Power BI · Next.js · Matplotlib · Seaborn · Plotly |
| Dev tooling | Docker · Git · VS Code · IntelliJ IDEA · Jupyter Lab · Postman |
| OS | Windows 11 · Linux (Ubuntu / Debian) |
Curriculum Vitae
Full details on education, experience, and projects: CV_PardoGarcia_Enrique_EN.pdf
