Semester project (Part 2): Data acquisition and preparation

Due: Feb 28, 2025 by 11:59pm. Points: 100

We need appropriate data to start actual development of the project solutions in the next phases. In general, the dataset you collect needs to be split into three subsets: one for training, one for validation, and one "unknown" subset for testing (60% / 20% / 20% is a reasonable initial split). As a reminder: you will use the training set to find your best solution (parameters of feature extraction and classification). You will use the validation set to check how this interim solution works on new data not used in training (this helps you detect overfitting to the training set). Finally, you will use the "unknown" test subset to evaluate your final solution, so don't touch this data until the final evaluation and report. This phase of the project aims at collecting at least the training and validation partitions.
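As an illustration of the split described above, here is a minimal sketch in Python. The function name `split_dataset`, its parameters, and the file names in the usage example are hypothetical, and a plain random shuffle is only one possible strategy. Depending on your project, you may instead need to split by object/subject so that the same subject does not appear in more than one subset.

```python
import random


def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle samples and split them into train / validation / test lists.

    The remainder after the train and validation fractions becomes the
    "unknown" test subset. A fixed seed keeps the split reproducible.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")

    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # set aside until the final evaluation
    return train, val, test


# Usage with hypothetical file names
samples = [f"img_{i:03d}.png" for i in range(100)]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

Storing the resulting file lists (for example, as text files in your repo) makes it easy to keep the test subset untouched until the final report.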

What to do and deliver?

Acquire the data that you will use in training and validation. Depending on your project, you may either take photos/videos with your own camera, or download one or several existing (publicly available) datasets for this purpose. If you are also able to acquire the "unknown" test partition now, please do so.
Prepare a short description of the database that you acquired (no need to upload the actual database to GitHub or Google Drive). Push the report (under the "Part 2" section of your readme.md) to your semester project repo by the deadline. Include these elements in your report:
- source (download link and associated paper(s) offering the dataset(s) if applicable, or indicate how you acquired the images)
- differences between the train and validation subsets that you think are important from the point of view of your project
- number of distinct objects/subjects represented in the data, number of samples per object/subject (if applicable)
- brief characterization of samples: resolution, sensors used, illumination wavelength, ambient conditions, etc. (whichever applies)
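
The object/subject counts requested above can be gathered with a short script. The sketch below assumes a hypothetical layout in which each sample path starts with a subject folder (for example `subject01/img_001.png`). The function name, the extension list, and the example paths are illustrative assumptions, not a required structure.

```python
from collections import Counter
from pathlib import PurePath

# Assumed set of image extensions; adjust to your own data
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_samples_per_subject(relative_paths):
    """Count image files per subject.

    Each path is expected to be relative to the dataset root, with the
    first path component naming the subject. Non-image files and files
    placed directly in the root are ignored.
    """
    counts = Counter()
    for raw in relative_paths:
        path = PurePath(raw)
        if len(path.parts) < 2:
            continue  # file sits directly in the root, no subject folder
        if path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue  # not an image file
        counts[path.parts[0]] += 1
    return counts


# Usage with hypothetical paths
paths = ["s01/a.png", "s01/b.JPG", "s02/c.png", "notes.txt", "s02/info.md"]
counts = count_samples_per_subject(paths)
print(len(counts), dict(counts))
```

For a dataset on disk, the relative paths could be produced with `pathlib.Path(root).rglob("*")` and `relative_to(root)`, where `root` is your dataset folder.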

Note 1: By submitting this report, you attest that you have physically downloaded the data that your report presents. Don't submit a report presenting data that you plan to acquire (plans were part of the conceptual design, now it's time to do things). If you are pursuing a customized project and your data collection is complicated (e.g., the University has to execute the license agreement), please discuss it with Walter before the deadline, and we will find an individual solution.

Note 2 (for teams): Remember to explain in your report the individual contributions of each team member. This is required for the assignment to be graded.