We need appropriate data to start actual development of the project solutions in the next phases. The dataset you collect should, in general, be split into three subsets: one for training, one for validation, and one "unknown" subset for testing (60% / 20% / 20% is a reasonable initial split). As a reminder: you will use the training set to find your best solution (the parameters of feature extraction and classification). You will use the validation set to check how this interim solution works on new data not used in training (this helps you detect overfitting before you commit to a final solution). Finally, you will use the "unknown" test subset to evaluate your final solution, so don't touch this data until the final evaluation and report. This phase of the project aims at collecting at least the training and validation partitions.
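As an illustration of the 60% / 20% / 20% split described above, here is a minimal sketch using only the Python standard library. The function name `split_dataset`, the fixed seed, and the sample file names are assumptions made for this example, not part of the assignment; adapt them to your own data.

```python
import random


def split_dataset(items, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle items reproducibly and split them into train/validation/test lists.

    `fractions` and `seed` are illustrative defaults (assumptions), not requirements.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    shuffled = list(items)                 # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives the same split every run

    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # the remainder is the "unknown" test subset
    return train, val, test


# Hypothetical file names standing in for your collected samples.
samples = [f"sample_{i:03d}.png" for i in range(100)]
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))  # 60 20 20
```

Fixing the seed keeps the test subset the same across runs, which makes it easier to leave that subset untouched until the final evaluation. If your data contains several samples from the same subject or object, you may prefer to split by subject rather than by sample, so that the same subject never appears in more than one subset.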
What to do and deliver?
Note 1: By submitting this report you attest that you physically downloaded the data that your report presents. Don't submit a report presenting data that you plan to acquire (plans were part of the conceptual design, now it's time to do things). If you are pursuing a customized project and your data collection is complicated (e.g., the University has to execute the license agreement), please discuss it with Walter before the deadline — we will find an individual solution.
Note 2 (for teams): Remember to explain in your report the individual contributions of each team member. The assignment cannot be graded without this information.