Object Recognition with Pixel-Level Masks
Computer Vision pipeline for multi-class object recognition using deep learning — CNNs with pixel-level segmentation masks, data augmentation, and comprehensive evaluation metrics.
Visit websiteOverview
A computer vision pipeline for multi-class object recognition that produces dense pixel-level segmentation masks rather than bounding boxes alone. The system distinguishes object instances within cluttered scenes and outputs per-pixel class probabilities, enabling precise region-of-interest extraction for downstream tasks.
Architecture & Training
- Backbone: Convolutional encoder pre-trained on ImageNet; weights fine-tuned end-to-end on the target dataset
- Segmentation head: Pixel-level mask decoder producing per-class probability maps at input resolution
- Data augmentation: Random horizontal/vertical flips, colour jitter, random crops, and elastic deformations to improve robustness to scale and appearance variation
- Loss: Combined cross-entropy and Dice loss to balance multi-class supervision and handle class imbalance
Evaluation
Performance is reported with ROC curves (per class and micro/macro-average AUC), F1-score, precision, and recall at pixel level. Qualitative inspection of mask boundaries confirms the model captures fine-grained shape detail, outperforming bounding-box baselines on overlap metrics.
Stack: Python · PyTorch · torchvision · OpenCV · scikit-learn · Matplotlib
