UFood is a fictitious food delivery company that operates in over 1000 cities in a major country.
They have tasked me with exploring a dataset and reporting what it shows: specifically, to propose and describe a customer segmentation based on customer behavior, to visualize the data, and to explain the reasoning behind each finding.
Let's dive in.
The CSV was read in and duplicates were removed. At this point some of the columns were still not in a usable state, so they had to be reshaped. One example is the marital status data, which was spread across several columns of 1s and 0s.
These would be much more usable as a single column containing a word that describes each customer's status. Here is the code that can achieve that.
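As a minimal sketch of that step, assuming the indicator columns share a `marital_` prefix (the column names and sample rows below are hypothetical, not taken from the actual dataset), the 1/0 columns can be collapsed into one labeled column with pandas:

```python
import pandas as pd

# Hypothetical one-hot columns; the real dataset's names may differ.
df = pd.DataFrame({
    "marital_Divorced": [0, 0, 1, 0],
    "marital_Married":  [1, 0, 0, 0],
    "marital_Single":   [0, 1, 0, 0],
    "marital_Together": [0, 0, 0, 1],
})

marital_cols = [c for c in df.columns if c.startswith("marital_")]

# idxmax(axis=1) returns, for each row, the name of the column holding the 1.
# Caveat: a row of all zeros would silently get the first column's name.
df["marital_status"] = (
    df[marital_cols].idxmax(axis=1).str.replace("marital_", "", regex=False)
)

# Drop the indicator columns now that they are redundant.
df = df.drop(columns=marital_cols)

print(df)
```

This approach relies on exactly one indicator being set per row, so it is worth checking `df[marital_cols].sum(axis=1)` for values other than 1 before collapsing.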
So now, instead of 1s and 0s, each customer has a single readable marital status value.
A similar type of manipulation was done with a couple of other categories, including level of education and the number of accepted campaigns. With the columns prepared, we could now try some segmentation of the data.
A quick sort of the values showed that customer ages range from 24 to 80. Age was stored as an individual number per customer, so it is easier to work with once it is grouped into ranges. The following code does just that.
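A sketch of the binning, assuming an `Age` column and ten-year bands bounded by the 23-30 and 71-85 groups used in this analysis (the exact bin edges and the sample ages are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample of ages within the observed 24-80 range.
ages = pd.DataFrame({"Age": [24, 30, 31, 45, 50, 62, 70, 71, 80]})

# pd.cut uses right-closed intervals by default, so the edge 22
# yields a first bin of (22, 30], i.e. ages 23 through 30.
bins = [22, 30, 40, 50, 60, 70, 85]
labels = ["23-30", "31-40", "41-50", "51-60", "61-70", "71-85"]

ages["age_group"] = pd.cut(ages["Age"], bins=bins, labels=labels)

print(ages)
```

Explicit labels keep the chart axes readable, and the result is an ordered categorical, so the groups sort in age order rather than alphabetically.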
A quick chart of the customers' age breakdown shows that the core audience of the service is between 31 and 70 years old.
Breaking this down further into how much they spend, including a breakdown of those who accepted campaigns (lower chart), is very helpful in knowing where to guide future marketing efforts.
By taking a look at the campaign acceptance per age group, we are able to see who is most likely to accept campaigns and we can see how that compares to the amount spent per age group.
This graph shows that the 23-30 and 71-85 age groups are the most likely to accept campaigns, even though, among customers who accepted campaigns, these groups spend the least. Curiously, the 41-50 age group accepts the fewest campaigns even though it spends the most among those who accepted.
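The comparison above can be sketched as a single grouped summary. The column names `AcceptedCmpOverall` and `MntTotal` and all of the numbers below are hypothetical placeholders, not results from the dataset:

```python
import pandas as pd

# Hypothetical rows: age group, campaigns accepted, and total spend.
df = pd.DataFrame({
    "age_group": ["23-30", "23-30", "41-50", "41-50", "41-50", "71-85"],
    "AcceptedCmpOverall": [1, 0, 0, 0, 1, 2],
    "MntTotal": [100, 50, 900, 700, 1100, 200],
})

# Flag customers who accepted at least one campaign.
df["accepted_any"] = df["AcceptedCmpOverall"] > 0

# The mean of a boolean column is the share of True values,
# which gives the acceptance rate per group.
summary = df.groupby("age_group").agg(
    customers=("accepted_any", "size"),
    acceptance_rate=("accepted_any", "mean"),
    mean_spend=("MntTotal", "mean"),
)

print(summary)
```

Using a rate rather than a raw count matters here because the age groups differ in size; a small group can have few acceptances in total and still be the most likely to accept.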
Next, we took a look at the breakdown of where the purchases were made, either on the web, from a catalog, or from a store.
The same breakdown was produced for those who accepted campaigns. Where people make their purchases appears to differ very little between those who accept campaigns and those who do not.
Purchases from stores and the web outperform purchases from catalogs, so we recommend focusing on those two channels rather than on catalogs.
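One way to check how similar the two groups are is to compare each channel's share of purchases rather than the raw totals. This is a sketch under assumed column names (`NumWebPurchases`, `NumCatalogPurchases`, `NumStorePurchases`, and a boolean `accepted_any` flag), with made-up numbers:

```python
import pandas as pd

# Hypothetical purchase counts per customer and channel.
df = pd.DataFrame({
    "NumWebPurchases": [4, 6, 2, 8],
    "NumCatalogPurchases": [1, 2, 0, 3],
    "NumStorePurchases": [5, 7, 3, 9],
    "accepted_any": [False, True, False, True],
})

channels = ["NumWebPurchases", "NumCatalogPurchases", "NumStorePurchases"]

# Total purchases per channel, split by campaign acceptance.
totals = df.groupby("accepted_any")[channels].sum()

# Divide each row by its own total so the two groups are comparable
# even when one group contains far more customers than the other.
shares = totals.div(totals.sum(axis=1), axis=0)

print(shares.round(3))
```

If the rows of `shares` look nearly identical, that supports the observation that campaign acceptance has little to do with where people buy.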
Next, we found some interesting insights with regard to the number of children in each family.
The trendline on this regplot shows that the amount spent on purchases decreases as the number of children in the family of the purchaser increases. Similarly, the number of campaigns accepted goes down as the number of children in the family increases.
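The direction of that trendline can also be checked numerically. By default a regplot draws a straight-line least-squares fit, so as a stand-in for the chart this sketch uses `numpy.polyfit` to fit the same kind of line; the data points are invented for illustration:

```python
import numpy as np

# Hypothetical data: children per household and total spend.
children = np.array([0, 0, 1, 1, 2, 2, 3])
spend = np.array([1200, 1000, 700, 500, 300, 250, 100])

# A degree-1 polynomial fit is an ordinary least-squares line.
# polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(children, spend, deg=1)

# A negative slope means spend falls as the number of children rises.
print(f"slope: {slope:.1f} per additional child")
```

The same fit applied to the number of accepted campaigns instead of `spend` would test the second claim in the same way.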
And finally, we can look at the total spending per marital status to determine which category of people spends the most money.
We can see that married people spend the most, followed by single people and then people in relationships. These segments can be focused on in future marketing campaigns.
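A sketch of this ranking, assuming a `marital_status` column like the one built earlier and a hypothetical `MntTotal` spend column (sample values are invented):

```python
import pandas as pd

# Hypothetical customers with a marital status label and total spend.
df = pd.DataFrame({
    "marital_status": ["Married", "Single", "Married", "Together", "Divorced", "Single"],
    "MntTotal": [800, 600, 700, 400, 300, 500],
})

by_status = df.groupby("marital_status")["MntTotal"]

# Total spend per status, largest first.
total_spend = by_status.sum().sort_values(ascending=False)

# Mean spend per customer, to separate "many customers" from "big spenders".
mean_spend = by_status.mean()

print(total_spend)
print(mean_spend)
```

Totals reflect both how many customers fall in a category and how much each one spends, so looking at the mean alongside the total helps show which of the two is driving the ranking.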
In conclusion, we have segmented the data in a way that offers some insight into who is spending, how much, and where.
Customers aged 31-70 spend the most money but are the least likely to accept campaigns.
Focusing on customers with no or few children who spend money in stores, especially those who are not divorced or widowed, is likely to yield higher spending.
An opportunity for growth could be customers aged 23-30 and 71 and over, as these groups had the highest campaign acceptance rates.