Overview

Custom vs. pre-defined project topics

This class includes a semester project that will expose you to the entire computer vision pipeline (from data collection/curation to classification/prediction). Choose one option:

Deliverables

Each project has five parts with deadlines, itemized below. The schedule for custom projects may be adapted if needed (the total credit remains the same).

You are free to choose the programming language and libraries you like for the semester project. In particular, we recommend Python with appropriate packages (OpenCV for computer vision methods and PyTorch for deep learning), or MATLAB with the necessary toolboxes. (MATLAB is available for free to all Notre Dame students, faculty and staff.) When using Python, we strongly recommend creating a dedicated environment for your project, e.g., using Anaconda. If you haven't tried virtual environments before, don't be scared. Try right away and ask Louisa or Walter for help.
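
If you go the Anaconda route, a short sanity check like the sketch below (assuming you installed opencv-python, torch and torchvision into your new environment; the file name and environment name are hypothetical) confirms that the key packages import correctly:

```python
# check_env.py -- a minimal environment sanity check.
# A typical (hypothetical) setup might be:
#   conda create -n cvproject python=3.10
#   conda activate cvproject
#   pip install opencv-python torch torchvision
import cv2
import torch

print("OpenCV version:", cv2.__version__)
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # False is fine on a laptop
```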

All deliverables (code, reports) should be pushed to the GitHub private repo (which you need to create and share with Louisa and Walter) by the deadline indicated for each project stage. Here is the formal assignment that specifies how to share your GitHub repo with us.

Teamwork

It is permissible to form teams working on a single semester project. Teams should not be larger than 2 students. Each teammate will be graded separately and only for their individual contributions. So it is important for you to indicate in each of your reports which part was done by which teammate.

Please split your work in a way that allows each of you to be engaged in all project deliverables. One example of good teamwork is for both team members to develop different approaches (e.g., different segmentation methods, or different classifiers) to the same problem, then compare them or build an ensemble classifier, and contribute equally to the report and coding. An example of bad teamwork is when one teammate makes all calculations and/or does all of the coding, and the second teammate writes the full report. The worst possible split would be: one teammate does all of the work, while the second teammate brings pizza (don't do this!).

Computational resources

All predefined projects can be completed using your own laptop, or a free Google Colab account if you need GPU time. However, if you need more substantial resources (powerful GPUs and large storage), there are two solutions:


Datasets

For each project a few appropriate databases are suggested below. But it is good to look at this well-maintained repository of links to various Computer Vision databases: http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm. Don't be alarmed by the old-school HTML design -- this webpage is relatively up-to-date.


Pre-defined project topics

Project 1: Driving coach

Create an application that analyzes dashcam videos and identifies and tracks objects relevant to the driver, such as cars, lanes, and signs. Look at an example here. The software should be able to detect/track multiple objects at a time in a video stream.

Suggestions for possible video sources/datasets: KITTI / KITTI Road, or YouTube 8M. Consider collecting ~30 different videos for training, ~10 different videos for validation and ~10 different videos for presentation of final performance.
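
As a starting point (not the required solution), the sketch below runs a COCO-pretrained detector on dashcam frames; it assumes opencv-python, a recent torchvision, and a hypothetical input file dashcam.mp4. COCO already covers several driver-relevant classes (car, truck, traffic light, stop sign), but lane detection would need a separate method:

```python
import cv2
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# COCO-pretrained detector; older torchvision versions use pretrained=True instead
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

cap = cv2.VideoCapture("dashcam.mp4")  # hypothetical input video
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        detections = model([tensor])[0]
    for box, score in zip(detections["boxes"], detections["scores"]):
        if score > 0.6:  # keep only confident detections
            x1, y1, x2, y2 = box.int().tolist()
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```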

Students assigned to project 1:

Project 2: Where am I? Identification of ND campus buildings

Create an application that can recognize and identify (some) ND campus buildings. The application should be able to analyze pictures taken of ND buildings and identify the name of the building. Ideally, the application should be implemented to run on mobile devices, but alternatively it can run on a computer with pictures from a mobile phone. Optionally, the application can show the location of the building on a map.

Suggestion for dataset: Select three to five buildings, and collect 20-30 images of each at different angles and distances.
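
One plausible starting point is transfer learning: fine-tune an ImageNet-pretrained backbone on your building photos. Below is a minimal sketch, assuming a recent torchvision and a hypothetical folder layout data/train/<building_name>/*.jpg:

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("data/train", transform=tf)  # hypothetical path
loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)

# ImageNet-pretrained backbone with a new head, one output per building
model = models.resnet18(weights="DEFAULT")
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):  # a handful of epochs is enough for a first baseline
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.3f}")
```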

Students assigned to project 2:

Project 3: Hand gesture recognition

In this project you will write a program that recognizes a hand gesture (one of a few pre-defined gestures). You can assume a uniform background and a vertical placement of the hand. You will need to implement a segmentation algorithm, propose and select features, and perform the classification. The final solution should work in real time for a video stream captured by the camera in your laptop ("real time" means 1 frame per second, or faster).

Suggestions for possible image sources/datasets: Idiap, University of Padova. Consider recognizing at least 3 different gestures. Collect (or download) at least 20 different samples per gesture for training, at least 10 different samples per gesture for validation and at least 10 different samples per gesture for presentation of final performance.
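
Under the uniform-background assumption, even simple thresholding can carry the segmentation step. Here is a minimal OpenCV 4 sketch; the thresholds and the 1 fps loop are illustrative, and the binary polarity may need flipping depending on your background:

```python
import cv2

cap = cv2.VideoCapture(0)  # laptop camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (7, 7), 0)
    # Otsu's threshold separates the hand from a uniform background;
    # use THRESH_BINARY instead if the hand is brighter than the background
    _, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)  # largest blob = the hand
        hull = cv2.convexHull(hand)
        # Solidity (area / hull area) is one example feature for the classifier
        solidity = cv2.contourArea(hand) / max(cv2.contourArea(hull), 1)
        cv2.drawContours(frame, [hand], -1, (0, 255, 0), 2)
        cv2.putText(frame, f"solidity: {solidity:.2f}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("gesture", frame)
    if cv2.waitKey(1000) == 27:  # ~1 frame per second; Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```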

Students assigned to project 3:

Project 4: Face recognition

In this project you will write a program that uses a face detection method (e.g., Viola-Jones) and one of the image feature descriptors (e.g., LBP) to recognize your face. Your solution should correctly recognize you when tested on a small set of people, for instance your classmates.

Suggestions for possible image sources/datasets: BioID (preferred) or (if you need more diverse data) FRGC (this comes from CVRL -- ask Walter for details). Or you can collect images of your face + a face of your teammate. If you collect your own data, prepare at least 20 different samples per face for training, at least 10 different samples per face for validation and at least 10 different samples per face for presentation of final performance.
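
Here is a minimal sketch of the suggested detector + descriptor pairing: OpenCV's Viola-Jones cascade for detection and the LBP-histogram recognizer for matching. The LBPH recognizer ships with opencv-contrib-python; the file names and labels below are hypothetical, and there is no error handling (it assumes a face is found in every image):

```python
import cv2
import numpy as np

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_crop(path):
    """Detect the largest face in an image and return a normalized crop."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    faces = detector.detectMultiScale(img, scaleFactor=1.1, minNeighbors=5)
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    return cv2.resize(img[y:y + h, x:x + w], (100, 100))

# Hypothetical training list: (image path, integer identity) pairs
train = [("me_01.jpg", 0), ("me_02.jpg", 0), ("teammate_01.jpg", 1)]
crops = [face_crop(p) for p, _ in train]
labels = np.array([l for _, l in train], dtype=np.int32)

recognizer = cv2.face.LBPHFaceRecognizer_create()  # needs opencv-contrib-python
recognizer.train(crops, labels)

label, distance = recognizer.predict(face_crop("unknown.jpg"))  # hypothetical probe
print(f"predicted identity {label} (distance {distance:.1f}, lower is better)")
```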

Students assigned to project 4:

Project 5: Object tracking in outdoor environment

In this project you will write a program that detects and tracks a selected object observed in an outdoor environment. One example is the planes that approach SBN and can be easily observed in the sky from the ND campus. Assume that you are filming an object with a camera that shakes, and you need to detect the object's position in each frame. If you don't like planes, pick an object you like, for instance a car. You can use pre-trained deep learning models to extract useful features.

Suggestion for dataset: Collect your own object movies, or select them from YouTube (or YouTube 8M). Consider collecting ~30 different videos for training, ~10 different videos for validation and ~10 different videos for presentation of final performance.
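
One way to prototype the tracking part is OpenCV's built-in trackers: draw a box around the object in the first frame and let the tracker follow it. A minimal sketch with a hypothetical input file; depending on your OpenCV build, the constructor may live under cv2.legacy (e.g., cv2.legacy.TrackerCSRT_create):

```python
import cv2

cap = cv2.VideoCapture("plane.mp4")  # hypothetical shaky-camera video
ok, frame = cap.read()
box = cv2.selectROI("select object", frame)  # draw the initial bounding box
tracker = cv2.TrackerCSRT_create()           # CSRT copes with moderate shake
tracker.init(frame, box)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, box = tracker.update(frame)
    if found:
        x, y, w, h = map(int, box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
cv2.destroyAllWindows()
```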

Students assigned to project 5:

Project 6: Object-following Raspberry Pi robot

This project is for students who love mixing computer vision, programming and electronics. The aim is to build an autonomous robot equipped with a camera that follows a selected object.

This video illustrates an example final result. I have several Raspberry Pi 4 kits (board, small LCD screen, keyboard, mouse, RGB Pi camera), and one set of robot car chassis, motors, motor controllers and battery (and I can purchase more, if needed), and I can lend them to interested students for the entire semester. You may need to find other minor elements, such as wiring, connectors, etc. You should go beyond just color-based detection: assume the objects are more complex and require both color and shape features. Use of deep-learning-based detection is permitted. Collecting your own dataset for training and evaluation increases your chances of building a better robot. Use different instances of the same class in training, validation and online testing. For instance, if you used ball A in training, use ball B in offline evaluation and ball C in the presentation of how your robot follows balls in general.
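
To illustrate what "color and shape features" can mean in practice, here is a minimal sketch that combines an HSV color mask with a circularity test, so a non-round object of the right color is rejected; all thresholds are hypothetical and must be tuned for your object and lighting:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)  # Pi camera or webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (100, 120, 70), (130, 255, 255))  # example: a blue ball
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        area = cv2.contourArea(c)
        if area < 500:  # ignore small specks
            continue
        perimeter = cv2.arcLength(c, True)
        circularity = 4 * np.pi * area / (perimeter ** 2)  # 1.0 for a perfect circle
        if circularity > 0.7:  # shape test on top of the color test
            (x, y), r = cv2.minEnclosingCircle(c)
            # The offset of x from the frame center would steer the motors
            cv2.circle(frame, (int(x), int(y)), int(r), (0, 255, 0), 2)
    cv2.imshow("ball", frame)
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
```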

Students assigned to project 6:


Custom projects from past semesters

The following projects were formulated jointly by students and the instructor in past editions of this class, to match students' specific interests and background skills. This list serves only an informative role, demonstrating what students who discussed their individual projects with the instructor have been working on. Please do not simply choose from this list (as you can in the case of pre-defined topics). Instead, if you want to try a project that is not on the "Pre-defined Project Topics" list, discuss it with Walter before the deadline for project topic selection. This is important, as all the projects below were customized to various students' skill levels and research interests.

Emotion Recognition from Face Expressions: The project aims to provide a high level of user authenticity in the LMS system through the implementation of facial recognition that validates the student submitting the course requirement. The project also aims to support students through an emotion detection feature that assists in tracking students' emotional interaction with the course content.

Shoe brand detector: The project we have selected will categorize shoe brands like Nike, Adidas, etc., based on provided images of running shoes.

Swimmer speed detection and stroke count: The goal of this project is to develop a computer vision-based tool that can accurately detect the speed and stroke count of a swimmer in real time. The input would be a video file showing a swimmer swimming a certain distance above water.

Classification of galaxies: A student will create a model that will categorize galaxies into several classes.

eyeTrack: The project's goal is to develop a system that can accurately detect and track eye movements in real-time. This involves a combination of image processing, feature extraction, and possibly machine learning algorithms.

SettlersEyes: The goal of the project is to design and implement a computer vision algorithm that automatically scores a Settlers of Catan game provided an image of the board.

Denoising Movement-Related Artifacts in MRI Brain Scans: An extension of a Neural Networks class project, enhancing MRI brain scans, especially those affected by movement causing motion blur artifacts.

Animal Identification: In this project a student will implement a computer vision algorithm that classifies animals given their pictures.

Neural Style Transfer: The goal of the project is to implement a CV algorithm that takes an image and changes it to be in the style of another image.

Human saliency-based classification of dangerous situations while driving: A student plans to use the CYBORG approach to train a neural network to decide when the car needs to be stopped. Instead of hand-crafting the solution, the student will use human saliency to automatically transfer the driver's knowledge (about road signs, pedestrians walking or standing near the curb, obstacles in front of the car, etc.) to a neural network during training.

Multimodal Sentiment Analysis of Social Media Posts: The scope of the project is multimodal sentiment analysis on Twitter data. A student will collect the data, which will include both images and associated text (i.e., image-text pairs). The goal is to identify sentiment based on the image and text of a post.

Classification of a state of the Rubik's cube: The aim of this project is to detect the Rubik's cube in an image and identify all visible color squares. Ideally, this information will be passed on to an algorithm deciding on the next step needed to solve the cube.

Quantum Approximate Optimization Algorithm for Feature Selection: A student will explore the applicability of quantum computing to a simple vision classification problem (at the level of CIFAR-10 / MNIST, or simpler).

3D graph room reconstruction: The scope of this project is monocular 3D reconstruction, supported by the available context (e.g., surrounding walls in a room with a chair, or trees surrounding an animal, etc.).

Basketball coach: The project's goal is to design and implement an algorithm that will be classifying basketball shot attempts as makes and misses.

Tumor Identification and Breast Cancer diagnosis: In this project students will implement a method that will examine breast x-ray images to detect abnormalities, including tumor and/or cancer areas.

Diffusion-based Semantic Segmentation: This project involves application of Diffusion Models in image segmentation.

Gaming bot: The goal of the project is to program a bot that can play a simple first-person shooter game by viewing the screen. The student intends to emulate a human player by computer vision, embedding human reaction time into the bot as a feature, as well as human error and other human-like characteristics. The PyAutoGUI module will be used to create a program that is capable of sending input to other applications (for example, a game). The program could receive images from the screen and interpret them to determine what action it should take.

Adaptive Model Motion Tracking and Awareness for rPPG: The goal of this project is to develop an efficient tracking strategy to improve estimation of blood volume pulse from face videos (now quite sensitive to relative face movements).

MyMood: The project's goal is to design and implement an algorithm that classifies face expressions, to be applicable in college students' mental health assessment.

3D Point Cloud Scene Reconstruction / Object Segmentation and Depth Estimation in Videos: Comparative analyses of classical and state-of-the-art algorithms for scene reconstruction (depth estimation and segmentation), including the monocular learning algorithms SfM-Learner and FRCN, and possibly algorithms for stereo camera data.

American Sign Language (ASL) signs recognition: This project involves developing a computer vision solution that can determine American Sign Language (ASL) signs in real time, and transfer those signs into written form. Essentially, this would be the equivalent of “voice to text” but “sign to text” instead. Students also plan to incorporate dynamic information, found in transitions between frames, to improve the performance.

Understanding the impact of face makeup using computer vision: In this project students will investigate which types of face makeup impact face recognition methods the most. This will be researched by "removing" (blurring or blacking out) local areas based on saliency maps, and running face matchers to obtain similarity scores. Score distributions will serve as the information for statistical analyses.

Recognition of face expressions: The aim of this project is to detect the face and recognize a facial expression (out of a set of pre-defined expressions).

What type of pitch is thrown by a pitcher: Students will choose 5 different pitches (fastball, curveball, slider, etc.) and download examples of each from the MLB's Baseball Savant website.

Optical Recognition of Written Japanese Characters: This project is classical OCR applied to written kanji characters. The project will be limited to recognizing a subset of characters (20-30), with the potential to be scaled to a full set of a couple thousand characters, if possible.

Understanding the way face recognition methods process faces with makeup: Student's description: want to look at the classification and saliency of edited faces. What does this mean more specifically? I have a dataset of pictures of faces that I have applied makeup and augmented reality filters to. First, I want to see if the standard facial recognition algorithms (e.g. ArcFace) classify an edited photo of a person as the same person as their unedited version. Next, I want to learn why the facial recognition model does or doesn't classify an edited image and its unedited counterpart as the same person by identifying which regions of the image are most salient.

Classification of x-ray bone fractures: In this project students will design a method to detect abnormal/normal bone regions. We plan to use the MURA dataset in this project: https://stanfordmlgroup.github.io/competitions/mura/.

Transfer Learning for Computer Vision with Psychometric Data: In this project students will investigate which parts of the psychophysics-trained visual models are transferable across various neural network architectures.

Soft Robotic Pose Estimation: My computer vision term project will be to estimate the pose of a soft robot that I've built in order to close the loop for feedback control. Soft robots are robots made of compliant materials. With more flexibility, they are well suited for applications that are poor fits for rigid bodies. However, with this extra flexibility it becomes much more difficult to estimate the pose of the robot. Because soft robots have effectively infinite degrees of freedom, the estimate for the robot shape is the location of the robot's entire backbone. In reality, it is impossible to actually measure the entire backbone's position. But if we find a sufficient number of points along the backbone's arc length, then a continuous estimate of the backbone can be obtained.

Aesthetic beauty of art pieces: In this project, students will design a method to classify the aesthetic beauty of various renowned art pieces. The hypothesis is that a trained machine can "understand" features found as commonalities within universally accepted "beautiful" artwork.

Scene description with Transformer Language Model: In this project, students will use a transformer model in an image captioning pipeline to generate descriptions of scenes seen in pictures.

Recognition of chess game state: Given a video of a chess game being played from a top-down view, the model should identify the position of each piece and update the positions whenever a piece is moved.

Lip-reading service: The aim of this project is to develop a solution that recognizes words by observing lips in videos. This solution may later augment speech-to-text services by adding visual information to the audio channel. This "lip-reading" service will assume that the speaker's frontal face is fully visible. One team member will develop one model, and the second teammate will develop the second model; the two will later be combined into an ensemble classifier. Database to be considered in training: XM2VTS. For test purposes, own data should be collected.

An interactive mixed reality environment: Object trackers will be designed and used for identifying and positioning objects. Students will also add some sort of character or shape recognition so that what users write or draw would be interpreted by the system and determine what types of objects are displayed and how they interact. Pre-trained object detectors will be used as baselines (not the only/final solution) to be compared with the designed and trained object detector(s).

Investigation of possible gender bias in face recognition: The aim of this project is to improve our understanding of why face recognition accuracy is different for females and males. This analysis will be based on experiments in which key features of the face (eyes, nose, ears, etc.) will be selectively "removed", and impostor and genuine scores will be calculated for these altered images and compared. That should provide some understanding of which face features are the biggest drivers of gender-specific variability in recognition accuracy.

Calorie prediction from images of food with human saliency: Given an image of food, we are investigating the feasibility of training with human saliency using a modified CYBORG framework alongside image segmentation.

Detection of spoofed CT lung scans: The goal of this project is to train a deep learning model to detect lung CT scans altered by modern GAN tools ("deep fakes"). If time permits, localization of the artifacts will be implemented, for instance via the Class Activation Mapping mechanism.

Face attributes: For this project, students will create an algorithm that predicts several face attributes (e.g., hair color) from a face image. CelebA will be the dataset used in this project.

Body pose recognition in ND sports videos: The goals of this project are: (a) run a few pre-trained models estimating body skeletons (e.g., Detectron2) to narrow down one or two good candidates, (b) build an annotation tool and annotate a small chunk of ND sports videos, (c) evaluate the models selected in (a) on the data prepared in (b), with the aim of finding skin regions useful for rPPG, and (d) propose and implement improvements in either speed or accuracy to bring the model(s) close to real-time operation.

Prediction of shutter speed, focal length, ISO (sensitivity to light) and aperture from a photo: The goal of this project is to build a model predicting detailed photographic parameters of a photo. The most important parameters are the shutter speed, focal length, ISO (sensitivity to light) and aperture. The ground truth data can be taken from EXIF. The model will use only the photograph itself to estimate these values, without EXIF data.

The "OCR calculator": Our project will be a basic calculator that can recognize handwritten numbers as well as addition, subtraction, multiplication, and division symbols. When the system sees an equal sign and a corresponding number, it will check the equation for correctness. For example, if someone wrote 2 + 2 = 4, the system would say that the solution is correct. If someone wrote 2 + 2 = 3, the system would recognize that the solution is incorrect. Finally, if the program sees an equal sign without a corresponding answer, it will solve the problem for the user. This system/program could be used to help elementary/middle school students practice their basic arithmetic or be used as a grading tool for teachers.

Classifying piano notes and chords via a webcam: The goal of the project is for a webcam to recognize which notes are being pressed and be able to replicate the sound purely based on what is being observed visually. It will be done close to real time so the program will keep up with the actual playing of the keyboard.

Cell segmentation: The aim of this project is to build a cell segmenter using the "active learning" (a.k.a. "curriculum learning" or "reactive learning") approach, in which a small amount of data is used for the initial training of the model. This model is then evaluated (possibly with a human in the loop) to correct its decisions and provide more training data for another round of training. This process is repeated until satisfactory accuracy is achieved.

Human recognition in case of severe body occlusion: In emergency scenarios, like people drowning, it is not possible to have the whole body exposed to a camera. Instead, only selected parts (usually legs or arms) are visible. The project goal is to build a system that would support rescuers in the timely detection of subjects drowning in rivers.

How trustworthy is your computer vision system? The aim of this project is to build a simple "trustworthiness" metric based on observing the reactions of deep learning-based models to random / out-of-class inputs. A trustworthy model should not produce nonsense for nonsense inputs -- it should either decline to make a decision or flag the input samples as abnormal; that results in high trustworthiness. If it does produce nonsense, its trustworthiness is lower. The goal of this project is to provide a method for quantitative assessment of such behavior.

Explainability of deep learning-based emotion detection models: The aim of this project is to apply "deep dream" and GradCAM techniques to understand how CNNs detect emotions in face images (we will focus on Ekman's categorization of emotions). That is, where these models "look" (GradCAM will help here), and what global features are responsible for a given categorization ("deep dream" will help here). This project assumes using both existing emotion detection models and training one's own CNN-based model.

Which face elements are the most important in human face recognition? The aim of this project is to build a method for human face alterations (making eyes bigger, or changing the skin tone) and then run an example face matcher to assess which elements of the human face impact face recognition the most.

Weather prediction in drone videos: The aim of this project is to classify the weather conditions (into several classes, such as Sunny, Cloudy, Snowy, Foggy, etc.) in drone images, with the aim to aid the drone to avoid bad-weather areas when flying from point A to B.

3D segmentation of spinal MRI images: The aim of this project is to build a 3D segmentation algorithm estimating 3D locations of intervertebral discs in MRI volumes. Pre-trained 3D Convolutional Neural Networks, fine-tuned on new data, are planned to be used as end-to-end segmentation models in this project.

License plate detection: The aim of this project is to design an algorithm that detects the positions of all license plates in a 2D image. In the testing stage, the student will use his own pictures (taken at a parking lot) to validate the performance of the algorithm on data not used in training.

Photorealism in generative models: This project aims at (a) introducing measures to assess photorealism in style-transfer-generated images (with an option to conduct human perception experiments) and (b) including them in a style-transfer GAN pipeline.

Cell segmentation: The aim of this project is to design an algorithm which will segment the cells within the image, and — based on the segmentation results — will calculate selected geometrical properties of the cells. A "research component" in this project is to try several segmentation models (for instance, one deep learning-based, and the second learning-free), compare their performance, and propose a fusion strategy (to use the strengths of both models).

Guitar chords classification: The goal of the project is to analyze the pictures that contain the left guitar player's hand and recognize the corresponding chord in real time.

Texture classification in agricultural aerial images: The aim of this project is to classify areas in aerial agricultural images into several categories: normal, cloud shadow, double plant, planter skip, standing water, waterway, weed cluster. The student will use the existing database for this project: https://www.agriculture-vision.com/dataset.

Mask detector: The aim of this project is to design and implement an algorithm to determine whether or not someone is wearing a mask. Two different feature extractors (one per teammate) will be created: one deep learning-based, and the second learning-free. A team effort will be to compare those approaches and propose a fusion that improves the overall accuracy.

Psychophysics-based losses for improved open-set image classification: The goal of this project is to improve image classification performance of DenseNet using OpenSet/Psychophysics-based losses. A set of novel loss functions that leverages psychophysics (human reaction time or performance) will be proposed and validated (both teammates will work on different loss functions and will do a comparison of the results at the end of the project). Data planned to be used: ImageNet + human annotations for 80 classes, and (optionally) Open Images Dataset V6 + Extensions (https://storage.googleapis.com/openimages/web/index.html).

Automatic labeling of fingerprint videos: The aim of this project is to develop an OCR (Optical Character Recognition) algorithm to detect labels put next to the fingers filmed by iPhone (a part of a larger system of contactless fingerprint recognition). A real-world complication is that labels show up and disappear in different moments, so the final decision must be based on an ensemble of classifiers processing various parts of the videos. This solution should allow labeling a large database of finger videos collected from 400 subjects (approx. 30,000 videos).

Vision-based control of the Super Mario Strikers, a soccer-based video game: The system will analyze frames of footage and track significant elements, such as the locations of both teams' players, the ball location, the goal locations, and field elements such as the sidelines and center line. In addition to tracking location, the system will also track the states of players, such as whether they are stationary, running, or sprinting. Using all this location and state information, the system would create a simplified internal representation of the game state. Once the system can determine the game state from a frame of game footage, another system will be trained using reinforcement learning to make a decision based on the game state, effectively creating an automated player.

Pulse rate estimation from webcam videos: The goal of this project is to design and implement an algorithm that will detect a pulse rate from face videos acquired by a webcam. It includes face detection and selection of the best face regions to estimate the pulse. A new database of face videos needs to be collected and should include videos from at least 20 subjects. Training and testing should be subject-disjoint. The ground-truth pulse rate will be collected by an oximeter and compared with the estimated rate. Ideally, the system should display a pulse rate for the webcam video stream in real time.

Age prediction from face images: The goal of this project is to design and implement an algorithm that will estimate a person's age given a face image and information on whether it is a female or male face. To check the possible dependence of the designed algorithm on gender, two types of evaluations will be carried out: (a) same-gender: a system trained on women's images only will be tested on women's images only, and a system trained on men's images only will be tested on men's images only; (b) cross-gender: training and testing images will contain faces of subjects of different genders. Teammates will collectively discuss the observed accuracy and identify the reasons for possibly lower performance in cross-gender evaluations.

Seam carving: Seam carving, or content-aware image resizing, is a technique that allows image retargeting (displaying images on media of various sizes) by removing "unimportant" image areas and keeping only the "important" ones. This is done by introducing or removing seams. In this project a student will apply object segmentation and recognition to generate a new kind of content-aware "energy" function to be used in content-aware image scaling.

Frisbee game visual analysis: The aim of this project is to design an algorithm that recognizes the players (potentially specifying the team) and the game disc in Ultimate Frisbee game videos. One of the team members will work on using classical features (HOG — Histogram of Oriented Gradients, corners detectors, correlation filters, color-based features, etc.) to detect players, and the other teammate will work on using deep learning-based features (e.g., CNN fine-tuned for detecting players). These two solutions will probably end up with different results. Teammates will work together to (a) identify upsides and downsides of two approaches (each teammate can offer a constructive critique to a colleague) and (b) integrate responses from your two different classifiers to get more accurate tracking results.

Drowning detection system: This project aims at developing a component of a drowning detection system, which has to detect and recognize swimmers. The camera will be placed underwater to collect videos of swimmers, later to be used to train a vision system to recognize swimmers underwater. Multiple videos (at least 30 for train, 30 for validation and 30 for test) for multiple swimmers (at least 3) will be collected for this project.

Recognition of face expressions in multispectral videos: The goals of this project are (a) to collect visible and thermal (LWIR: long-wave infra red) videos of people making various facial expressions, and then (b) create a system that can classify those expressions.

Classification of skin lesions: The aim of this project is to design an algorithm that will classify images of skin lesions into 7 classes. The HAM10000 dataset, with 10,015 images of skin lesions, will be used for training. All images are in color, with the lesions centered. Two separate datasets will be acquired as a part of this project effort to validate (Part II) and test (Part III) the generalization ability of the designed method. Teammate 1 will experiment with ResNet-50 as the main workhorse feature extractor. Teammate 2 will experiment with DenseNet as the main feature extractor. Feature-level or score-level fusion will be investigated in Part III to increase the classification accuracy.

Unconstrained text detection in meme images: The aim of this project is to build an algorithm that detects texts in memes. The complexity of this task comes from the fact that fonts and style of such texts are hard to predict, so the solution will need to generalize well on unknown types of texts.

Turning Sketches into Web Components: The project's goal is to recognize and accurately classify hand-drawn sketches of common UI components found in web applications. For example, a designer could draw up a high-level layout for Sakai, and the system should produce a list of what components are needed (side bar menu, several buttons, tables, images, text, etc...) and their general locations. The end goal of this classification would be to automate the creation of the boilerplate code needed for simple web applications.

Optical Recognition of Chinese Characters: This project is classical OCR applied to old printed Chinese. The first 18 chapters of the Analects of Confucius will be used as training data, and the last two chapters will be used as test data (at least 100 pictures). Only the closed set of characters found in the training data will be considered in this project. Hence, if the algorithm finds a character that did not exist in the training data, the "unknown" class label will be returned. Database used in this project: https://ctext.org/library.pl?if=en&file=84798&page=13.

Credit card autofill: The aim of this project is to develop an algorithm to detect a credit card in a picture and to read the credit card number. While the solution can be developed on a database of credit card images acquired on a uniform background, the final solution should be tested in an unconstrained environment. The reasons for (possibly) worse performance should be explained, and appropriate updates to the designed method should be discussed.

Brand recognition from labels: The aim of this project is to automatically detect and classify brands of food manufacturers (limited to 4-5 different brands). This will be done after taking pictures of different packages showing labels with brand logos (at different angles, at different scales, etc.).

Automatic recognition of workout data: The goal of this project is to capture workout data from a Concept2 erg. This data can then be automatically uploaded to a workout log. This project is based on ErgBot.

Active contours for refining deep-learning-based segmentation: In this project, a deep-learning-based method will be used to roughly segment the objects from the background. Then the active contours will be applied to refine the segmentation results. This project will require finding a suitable energy function for the problem.

Historical document OCR: A "Recurrent U-Net" will be developed in this project that passes letter predictions from U-Net into a bidirectional LSTM to pull word predictions from there. A similar strategy was used in this paper: https://arxiv.org/pdf/1507.05717.pdf.

Presentation Attack Detection in iris recognition: The project will be a binary classifier that states whether an iris is live or a spoof, based solely on the textures in the image. This tool will make use of an existing database and will not need to consider other indicators of spoofing, such as pupil glare, or perform identification. The tool will be developed with the Python OpenCV library.

Transformation Aware Distance Metric Learning: This project aims to develop an approach to improve descriptor matching for applications reliant on image transformation analysis. It will highlight the drawbacks of transformation-invariant deep networks for specific applications such as image forensics and try to improve upon them.

Deep learning in semantic segmentation of iris images: The aim of this project is to use CNN-based models (including a CNN that performs dilated convolutions) to solve a semantic segmentation task. The use case is iris recognition and its first step: image segmentation.

Automatic learning-free neural image sparse segmentation and reconstruction: Microscopic image data (microCT X-ray, sSEM, SCoRe, etc.) is difficult to properly segment due to a lack of ground truth annotations for training learning models. This project aims to improve upon the quality of reconstructions generated by classical (learning-free) image processing techniques by developing an optimization technique that adapts to local image conditions and iteratively selects the highest-quality segmentation.

CAPTCHA solver: The aim of this project is to write an algorithm that recognizes CAPTCHA letters/digits from a few different CAPTCHA sources.

Recognition of cities based on skyline pictures: The aim of this project is to write a program that accepts a picture with a skyline and outputs which city the picture was taken in. A small set of cities, say 10, will be selected to reduce the difficulty of the problem.

Recognition of numbers in lottery tickets: This project will focus on automatic detection and recognition of numbers on various lottery tickets. Tickets from at least three different types of lotteries will be used. A nice complication for this semester project is that the program will not be looking at a particular region for numbers.

Offsides detector for football: In this project an automatic football offside detector will be designed. The input will be a short clip of a football game seen from a tribune perspective. The goals of the system will be to detect the appropriate objects (players and a ball), classify players from different teams, recognize the game state, and detect if and when an offside happened.

Classification of shoe brands: The aim of this project is to build two classifiers (one deep learning-based, and one learning-free) that categorize a shoe brand given a picture of a running shoe. Classes considered now are Nike, Adidas, Under Armour, and potentially others. Team members will compare their classifiers and will propose a fusion strategy to increase the confidence of the final decision.

Maritime object detection and classification in global SAR scenes: The goal of this project is to identify maritime objects in SAR images. For each object, estimate its length and classify it as vessel or non-vessel. For each vessel, classify it as fishing or non-fishing. The ambition of this project is to make a submission to the "dark vessels detection" challenge: https://iuu.xview.us/challenge.