Rotoscoping: Hollywood’s video data segmentation?
Explores video segmentation techniques used in Hollywood VFX, such as rotoscoping and green screens, and compares them to modern AI models like DeepLabv3+.
A tutorial on building an OCR application using Angular for the frontend and Azure Computer Vision API for text extraction from images.
A curated list of top data annotation companies worldwide, grouped by annotation type, focusing on services for computer vision, NLP, and audio data.
A technical tutorial explaining the fundamentals of Convolutional Neural Networks (CNNs) by manually calculating layers from the classic LeNet-5 architecture.
Explores the human effort behind AI training data, covering challenges of data annotation and techniques like transfer learning to reduce labeling workload.
A curated list of open-source and free tools for data annotation across computer vision, NLP, audio, and other domains, including image and video labeling.
Explores efficient state representations for robots to accelerate Reinforcement Learning training, comparing pixel-based and model-based approaches.
Uses Azure Cognitive Services and Logic Apps to automatically tag images stored in SharePoint and enrich them with metadata.
Introduces PlaNet, a model-based AI agent that learns environment dynamics from pixels and plans actions in latent space for efficient control tasks.
Explores fast, one-stage object detection models like YOLO, SSD, and RetinaNet, comparing them to slower two-stage R-CNN models.
Builds a Mortal Kombat controller using TensorFlow.js, CNNs, and transfer learning to classify postures from a webcam feed.
A team wins a tech hackathon by creating an AR app that uses AI and computer vision to recommend web content based on what a phone camera sees.
Explores the R-CNN family of models for object detection, covering R-CNN, Fast R-CNN, Faster R-CNN, and Mask R-CNN with technical details.
Explores classic CNN architectures for image classification, including AlexNet, VGG, and ResNet, as foundational models for object detection.
A retrospective on the transformative impact of deep learning over the past five years, covering its rise, key applications, and future potential.
An introductory guide to the fundamental concepts of object detection, covering image gradients, HOG, and segmentation, as a precursor to deep learning methods.
A review and tips for Georgia Tech's OMSCS CS6476 Computer Vision course, covering content, assignments, and personal experience.
A developer builds and explains a reverse image search API for finding similar products, detailing its implementation and challenges.
A deep dive into Generative Adversarial Networks (GANs), summarizing and explaining key research papers in the field.
Explains the three key research papers behind Facebook's computer vision pipeline for object segmentation: DeepMask, SharpMask, and MultiPathNet.