Why and how we automate data cleaning and preparation
You've probably seen that AddMaple summarizes your data files in less than five seconds, giving you tables and charts for each column. If you're wondering how this is possible, we'll try to break down some of the main automation tasks we carry out for you. Ultimately, we believe that the more time, energy and curiosity you save during the initial data prep, the more reserves you'll have left for interpreting the data, writing your report and communicating your insights. That's why we prioritize automating this part of your workflow, so smoothly that you may not even notice it happened!
Data Type Detection
Like Sherlock, we're detectives looking for clues in your data so that we know with a high degree of confidence what type of data each column contains. This is done programmatically and is quite involved. Visit this page for a list of data types we detect.
In short, we automatically or automagically detect:
Numeric data such as salaries, ages, product quantities sold and so on. With numeric data identified, we determine the highest and lowest values (maximum and minimum) and help you create bins or buckets with an even spread of data. This is why you see a histogram waiting for you on the dashboard. These columns are labeled as NUM and are green. We also detect Currencies and Percentages. They show up with specific icons in our interface, but you interact with them in the same way as our numeric columns.
Categorical data such as answers to a multiple choice question. We detect when a column contains only one tag per record. For example, a column titled 'Country of Birth' would contain a single country as the tag for each record. We also detect when a column contains multiple tags per record. For example, responses to the question 'Select all supermarkets you visited last month' could contain more than one supermarket, with each record having a different combination of supermarket tags. For these tricky columns, we give you a count of all the supermarkets, so you can see how many times each supermarket was selected. We also allow you to filter by one or more supermarkets, to see which other supermarkets were selected alongside them.
How can you tell if a categorical column in your dataset contains one tag per record or multiple? We help you differentiate between these categorical columns with colors. Single tag columns are turquoise and are labeled as MC for Multiple Choice. Note that this applies outside surveys too: the label is used for any column containing one tag per record. Multi-tag columns are blue and are titled Multi-Select, which tells you that records in that column can contain two or more tags.
Dates and timestamps such as review date and time. We detect dates and times in a wide range of formats to support the most commonly used structures. This is useful for error logs, events captured by your servers, the dates and times a respondent submitted a survey and so on.
Duration columns such as how long a user interacted with a feature. We detect this column type as a type of numeric value, where records are binned into duration ranges according to the max/min duration values in that column. If you have duration data, you'll be able to see how many records fall within each duration bucket for easy analysis. You'll then be able to filter down by a range to understand more about, for example, those who did something for a longer or shorter time period.
Messy data such as empty cells within a column, N/A values within categorical or numeric columns, and even the mid-dots, aka interpuncts ( · ), that some people love to insert in response to open-ended questions. We detect these and more, and group them off for you for smooth onward analysis, so you don't have to first 'clean' columns containing these irregularities. Many data analysis tools require you to remove N/A records within numeric columns, for example; we handle this for you. As for missing data, you would usually need to replace these records with a placeholder such as 999 or BLANK, but there is no need to do this in AddMaple because we automatically detect and group off data like this for you. If a text column contains some records with a single mid-dot, we group them with the empty records so that you can proceed to analyze the text.
Opinion scales or Likert scales show up in pink. We detect both text and numeric variants. For example, you may have categories such as "Strongly Agree, Agree, etc." or just numbers between 1 and 7. AddMaple creates special Likert charts for this type of data and groups together related columns right from the dashboard.
Text columns show up in green and are ready for you to explore with interactive word clouds. We also support AI-powered thematic analysis, allowing you to convert your open-ends into quantifiable and explorable data.
All of this detection happens instantly, even on large datasets.
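To make the idea concrete, here is a minimal Python sketch of how this kind of detection could work. It is not AddMaple's actual implementation: the list of missing-value markers, the three date formats, the 20-category threshold and the DATE, TEXT and EMPTY labels are all assumptions made for illustration (only NUM and MC come from the labels described above). The sketch first sets aside empties, N/A values and lone mid-dots, then looks at what is left.

```python
import re
from datetime import datetime

# Assumed markers for "missing" cells; "\u00b7" is the mid-dot (interpunct).
MISSING_TOKENS = {"", "n/a", "na", "null", "none", "-", "\u00b7"}
# Assumed date formats; a real detector would try many more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%d %H:%M:%S")
NUMBER_RE = re.compile(r"^-?\d+(\.\d+)?$")


def is_missing(value):
    """True for None, blanks and the assumed missing-value markers."""
    return value is None or value.strip().lower() in MISSING_TOKENS


def is_number(value):
    """True for plain integers and decimals, ignoring thousands separators."""
    return bool(NUMBER_RE.match(value.strip().replace(",", "")))


def is_date(value):
    """True if the value parses with any of the assumed date formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value.strip(), fmt)
            return True
        except ValueError:
            continue
    return False


def detect_column_type(values, max_categories=20):
    """Guess a column type from its non-missing values."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return "EMPTY"
    if all(is_number(v) for v in present):
        return "NUM"
    if all(is_date(v) for v in present):
        return "DATE"
    distinct = set(present)
    # Few distinct values that repeat across records look like single tags.
    if len(distinct) <= max_categories and len(distinct) < len(present):
        return "MC"
    return "TEXT"


salary_type = detect_column_type(["42000", "N/A", "38500.50", ""])      # "NUM"
country_type = detect_column_type(["UK", "France", "UK", "N/A", "France"])  # "MC"
```

Because missing markers are filtered out before any type check runs, a numeric column with a few N/A cells is still recognized as numeric, which mirrors the "no need to clean first" behavior described above.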
Tag Detection
Dealing with tags or survey questions that can have multiple responses to the same question is difficult to do in spreadsheets.
This is where AddMaple shines. We automatically detect this data type whether the data is separated by commas (,), semicolons (;), colons (:), or pipes (|). In a spreadsheet, a common approach to dealing with this data would be to separate it into multiple columns using formulas, but that makes filtering and pivoting much harder.
In AddMaple the data stays in the same column and you can easily explore, filter and pivot.
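As a rough illustration of the idea, the Python sketch below guesses the delimiter of a multi-tag column and counts how often each tag was selected, without splitting the data into extra columns. The function names and the 30% threshold are hypothetical choices for this example, not AddMaple's internals.

```python
from collections import Counter

# The four separators mentioned above.
DELIMITERS = [",", ";", ":", "|"]


def detect_delimiter(values, min_share=0.3):
    """Return the delimiter found in the largest share of non-empty cells.

    Returns None when no delimiter appears in at least `min_share` of the
    cells (an assumed threshold), meaning the column looks single-tag.
    """
    cells = [v for v in values if v and v.strip()]
    if not cells:
        return None
    best, best_share = None, 0.0
    for delimiter in DELIMITERS:
        share = sum(delimiter in cell for cell in cells) / len(cells)
        if share > best_share:
            best, best_share = delimiter, share
    return best if best_share >= min_share else None


def split_tags(cell, delimiter):
    """Split one cell into a set of trimmed, non-empty tags."""
    parts = cell.split(delimiter) if delimiter else [cell]
    return {part.strip() for part in parts if part.strip()}


def count_tags(values, delimiter):
    """Count how many records selected each tag."""
    counts = Counter()
    for cell in values:
        if cell:
            counts.update(split_tags(cell, delimiter))
    return counts


def filter_by_tags(values, delimiter, wanted):
    """Keep only the records that contain every tag in `wanted`."""
    wanted = set(wanted)
    return [cell for cell in values if cell and wanted <= split_tags(cell, delimiter)]


responses = ["Tesco; Aldi", "Aldi", "Tesco; Lidl; Aldi", "", "Lidl"]
delimiter = detect_delimiter(responses)       # ";"
tag_counts = count_tags(responses, delimiter)  # Aldi: 3, Tesco: 2, Lidl: 2
```

Filtering the same list with filter_by_tags(responses, delimiter, ["Tesco"]) and counting again shows which other supermarkets were selected alongside Tesco, while the original column stays intact.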
Instant Summaries
After a data set has been loaded and the data types automatically detected, AddMaple produces instant summaries of each column. This enables you to see an overview of your data at a glance.
Because the column types are automatically detected, we can display different summaries for each data type.
Number Bucketing
AddMaple performs intelligent bucketing (binning or grouping) of numeric data.
Our algorithm handles large, small and negative numeric ranges. The buckets that we create are rounded to sensible values and are not distorted by outliers.
As filters are applied, the buckets are recalculated in an instant, making it easy to dive into a particular range.
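For readers curious about what "rounded to sensible values" and "not distorted by outliers" can look like in practice, here is a small Python sketch. It is one possible approach, not AddMaple's algorithm: it assumes bucket widths of 1, 2 or 5 times a power of ten, sizes the buckets on the 5th to 95th percentile range, and collects anything outside that range in open-ended buckets. All function names and the default of roughly eight buckets are hypothetical.

```python
import math


def nice_step(raw_step):
    """Round a positive step up to 1, 2 or 5 times a power of ten."""
    base = 10 ** math.floor(math.log10(raw_step))
    for multiplier in (1, 2, 5):
        if raw_step <= multiplier * base:
            return multiplier * base
    return 10 * base


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list (q between 0 and 1)."""
    return sorted_values[round(q * (len(sorted_values) - 1))]


def bucket_numbers(values, target_buckets=8):
    """Return (low, high, count) tuples; None marks an open-ended bucket."""
    data = sorted(values)
    if not data:
        return []
    # Assumption: size buckets on the 5th to 95th percentile range so a few
    # extreme values cannot stretch every bucket.
    lo, hi = percentile(data, 0.05), percentile(data, 0.95)
    if lo == hi:
        lo, hi = data[0], data[-1]
    if lo == hi:
        return [(lo, hi, len(data))]
    step = nice_step((hi - lo) / target_buckets)
    first = math.floor(lo / step)      # index of the first regular bucket
    last = math.floor(hi / step) + 1   # index just past the last regular bucket
    counts = [0] * (last - first)
    below = above = 0
    for value in data:
        index = math.floor(value / step) - first
        if index < 0:
            below += 1
        elif index >= len(counts):
            above += 1
        else:
            counts[index] += 1
    buckets = [((first + i) * step, (first + i + 1) * step, count)
               for i, count in enumerate(counts)]
    if below:
        buckets.insert(0, (None, first * step, below))
    if above:
        buckets.append((last * step, None, above))
    return buckets


ages = [21, 23, 25, 28, 31, 34, 35, 37, 41, 44,
        46, 52, 58, 63, 67, 72, 75, 79, 84, 950]
age_buckets = bucket_numbers(ages)
# The stray 950 lands in an open-ended "90 and above" bucket instead of
# stretching the 20-30, 30-40, ... ranges.
```

Recalculating after a filter is just a second call on the filtered values, for example bucket_numbers([a for a in ages if a < 50]), which yields narrower buckets for the smaller range.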
The data below was imported as raw numeric values, but AddMaple automatically grouped it into ranges.
Statistics
AddMaple automatically performs the appropriate statistical tests for you as you explore your data.
When you expand a column, we compare that column against all others in your dataset to find those that have significant relationships. This feature saves a lot of time and helps you uncover hidden insights in your data.
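This page does not say which statistical tests are run, so the following is an illustration only. For two categorical columns, one common choice is a chi-square test of independence. The Python sketch below computes the chi-square statistic and estimates a p-value with a permutation test, which keeps the example free of external libraries. The function names, the 2,000 shuffles and the sample data are all assumptions for this example.

```python
import random
from collections import Counter


def chi_square_statistic(xs, ys):
    """Chi-square statistic for the contingency table of two columns."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    observed = Counter(zip(xs, ys))
    statistic = 0.0
    for x, x_total in x_counts.items():
        for y, y_total in y_counts.items():
            expected = x_total * y_total / n
            statistic += (observed[(x, y)] - expected) ** 2 / expected
    return statistic


def permutation_p_value(xs, ys, rounds=2000, seed=0):
    """Estimate how often shuffled data looks at least as related as the real data."""
    rng = random.Random(seed)
    actual = chi_square_statistic(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point noise does not hide exact ties.
        if chi_square_statistic(xs, shuffled) >= actual - 1e-9:
            hits += 1
    # Add-one correction keeps the estimate above zero.
    return (hits + 1) / (rounds + 1)


# Hypothetical data: 40 users, their plan, and whether they churned.
plan = ["free"] * 20 + ["paid"] * 20
churned = ["yes"] * 16 + ["no"] * 4 + ["yes"] * 4 + ["no"] * 16
favorite_color = ["red", "blue"] * 20  # unrelated to plan by construction

related_p = permutation_p_value(plan, churned)
unrelated_p = permutation_p_value(plan, favorite_color)
```

Running the comparison for one expanded column against every other column and keeping the pairs with small p-values gives a list of candidate relationships. With many columns, a correction for multiple comparisons would normally be applied before calling any of them significant.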