- Who is the data set about? The data set consists of facial photographs of random people, male and female, aged 0 (newborn) to 116, across five ethnicity categories. Who was sampled in this data set? The data in this exercise comes from the UTKFace dataset (https://susanqq.github.io/UTKFace/), which is also available on Kaggle at https://www.kaggle.com/jangedoo/utkface-new. Each image contains the face of one person. The people shown are the customers of the clothing store mentioned in our domain specification. Who was oversampled or undersampled? People aged 20 to 30 were oversampled, and people aged 10 to 19 were undersampled. Newborns (age 0) also appear to be oversampled. The gender distribution is skewed, with men overrepresented. In the ethnicity distribution, the first category (0) is overrepresented and the fifth category (4) is underrepresented. The data set contains no identifiable information about the people photographed: no names or other personal information are associated with the images. Each image has three data points associated with it: gender, age, and ethnicity.
- What events, activities, behaviors, and observations etc. are recorded by the data set? The data set records no events, activities, or behaviors. It records one kind of observation: each person's age, gender, and ethnicity.
- When did the event, activity, behavior, and observation, etc. take place? The Kaggle challenge page gives no information about when the data was collected. However, the sponsor of the challenge simplified the data in September 2020.
- Where did the event, activity, behavior, and observation, etc. take place? There is no indication of where the test data was collected. The test data appears to be a collection of various facial images that have been cleaned and preprocessed. It contains no geographical information.
- Why? There is no indication of why the facial images were initially captured.
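
The sampling skew described above (age ranges, gender, and ethnicity categories that are over- or underrepresented) can be checked by tallying the labels. The following is a minimal sketch, not the method used for this exercise. It assumes the labels are encoded in the image file names as `age_gender_ethnicity_timestamp.jpg`; if the copy of the data in use stores labels elsewhere (for example in a CSV file), only `parse_labels` would need to change. The function names and the sample file names are hypothetical.

```python
from collections import Counter


def parse_labels(filename):
    """Return (age, gender, ethnicity) parsed from a file name, or None.

    Assumed (not confirmed by this document) name layout:
    '<age>_<gender>_<ethnicity>_<timestamp>.jpg'
    """
    parts = filename.split("_")
    if len(parts) < 4:
        return None  # a label is missing, skip this file
    try:
        return int(parts[0]), int(parts[1]), int(parts[2])
    except ValueError:
        return None  # non-numeric label, skip this file


def age_bucket(age):
    """Map an age to a 10-year bucket label such as '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def label_shares(filenames):
    """Return the share of images per age bucket, gender, and ethnicity."""
    records = [r for r in map(parse_labels, filenames) if r is not None]
    total = len(records)
    if total == 0:
        return {}
    counters = {
        "age": Counter(age_bucket(age) for age, _, _ in records),
        "gender": Counter(gender for _, gender, _ in records),
        "ethnicity": Counter(eth for _, _, eth in records),
    }
    # Convert raw counts to fractions of all successfully parsed images.
    return {
        name: {key: count / total for key, count in counter.items()}
        for name, counter in counters.items()
    }


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    sample = ["25_0_2_a.jpg", "27_1_0_b.jpg", "0_0_0_c.jpg", "bad.jpg"]
    for name, shares in label_shares(sample).items():
        print(name, shares)
```

Comparing the resulting shares against a uniform split, or against the expected customer population, shows which groups are over- or underrepresented.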