Data helps in discovering trends, describing events, documenting achievements, improving performance, and controlling outcomes. However, not all data is equal. In the growing flood of data, you may reasonably ask, “What type of data would prove worthwhile?” “What data might be most cost-effective and useful?” And, even more important, “What data is reliable and trustworthy?”
As you begin to explore data, you will encounter many categories, including internal and external data, open and closed data, structured and unstructured data, and big and little data.
In her “Smart Data Collective” blog, Michele Nemschoff talks about “7 Important Types of Big Data”. Nemschoff focuses on external data and distinguishes structured from unstructured data. Structured data (basically database data) falls into five categories: created (e.g. surveys, loyalty programs), provoked (e.g. customer ratings), transacted (e.g. sales transactions), compiled (e.g. credit scores, demographics), and experimental (the result of testing to discover outcomes). Unstructured data (essentially everything outside of database data, such as filing data, stored objects, etc.) can be either captured or user-generated. Because it often consists of media-rich files (pictures, music, movies, and x-rays), unstructured data requires many times more storage space than structured data, which is usually text or numbers.
Two categories of data that make life easier, and not only for analysis and management, are metadata and open data.
Metadata is data about data. Metadata helps you find things, and helps others find information about your work. Some examples of metadata are:
- Book metadata: title, author, publisher, language.
- Music metadata: genre, composer, location of recording, label.
- Dataset metadata (census data): date, area type, theme.
A terrific little video, “A Beginner's Guide to Metadata,” put out by EdinaDatacentre, shows why everyone should be aware of metadata and add it to their own work. Add metadata because it makes citation easier, clarifies who the creator is, meets the requirements of funding bodies, and helps save time and money, among other benefits.
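To make the "metadata helps you find things" point concrete, here is a minimal sketch in Python. It assumes a tiny, made-up catalogue in which each dataset carries the census-style fields listed above (date, area type, theme). The titles, values, and the `find_datasets` helper are hypothetical illustrations, not part of any real catalogue or tool.

```python
# Hypothetical catalogue: every title and value here is an illustrative assumption.
catalogue = [
    {"title": "Census extract A", "date": "2011", "area_type": "ward", "theme": "housing"},
    {"title": "Census extract B", "date": "2011", "area_type": "district", "theme": "employment"},
    {"title": "Census extract C", "date": "2001", "area_type": "ward", "theme": "housing"},
]


def find_datasets(records, **criteria):
    """Return the records whose metadata matches every given field/value pair."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


# Look up every ward-level housing dataset using only its metadata.
matches = find_datasets(catalogue, theme="housing", area_type="ward")
print([record["title"] for record in matches])
```

Without the metadata fields, the only way to answer the same question would be to open each dataset and inspect its contents, which is the time and money the video argues metadata saves.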
The Open Data Handbook states that open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share alike. The focus is on non-personal data, that is, data that does not contain information about specific individuals. Because open datasets are interoperable, they are far easier to combine with one another.
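As a rough illustration of why interoperability matters, the sketch below combines two small datasets that happen to share an area code as their key. Both datasets, their values, and the `combine` helper are assumptions invented for this example; real open datasets would usually need some cleaning before their keys line up this neatly.

```python
# Two hypothetical open datasets keyed by the same area code (made-up values).
population = {"A01": 12000, "A02": 8500, "A03": 4300}
libraries = {"A01": 3, "A02": 1, "A04": 2}


def combine(left, right):
    """Join two datasets on their shared keys, keeping only areas present in both."""
    shared_keys = sorted(left.keys() & right.keys())
    return {key: (left[key], right[key]) for key in shared_keys}


combined = combine(population, libraries)

# Derive a new measure that neither dataset contains on its own.
for area, (people, branches) in combined.items():
    print(area, "residents per library:", round(people / branches))
```

Areas that appear in only one dataset (A03 and A04 here) drop out of the result, a reminder that a combined dataset is only as complete as the overlap between its sources.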
Describing Nine Types of Data and How They Should Be Used, Jerry W. Thomas asserts that “little” data, such as small relevant samples, can often serve decision-makers in predicting trends and outcomes just as well as “big” data can. The nine types are:
- Experimental data – carefully designed and controlled experiments.
- Survey research data – scientific research studies for which the data is often experimental; precise survey research data is obtained through research design, normative data, mathematical modeling, stimulus controls, statistical controls, historical experience, and quality-assurance standards.
- Marketing-mix modeling data – can often be designed to be predictive and forward-looking by creating alternative futures as the basis for survey research.
- Media-mix modeling data – like marketing-mix modeling, but with different variables.
- Sales data – mostly measures actual sales, but is not reliable for establishing cause and effect because many other factors influence it, including advertising effectiveness, product quality, productivity, and competitive influences.
- Eye-tracking data – technology makes it possible to collect data that helps explain why a package, sign, website, or advertisement is failing to register certain messages or images.
- Biometric or physiological measurements – this data is a future trend; for now it is good at tracking arousal, but there is no precise way to determine whether that arousal is positive or negative without accompanying survey or qualitative research.
- Communities or Advisory Panel data – can be randomly selected members or members who “opt in”; panels can be real or virtual, and provide a broad range of data, from survey responses to feedback on new product features.
- Social media data – this data is massive, real-time, and inexpensive, and it can be an early-warning system for a PR crisis or some unexpected aberration. It is also highly influenced by a variety of environmental factors. Without the exact source, context, stimulus, or history that underlies the data, its research value is uncertain.
Most data is historical or backward-looking data (tracking data), including financial, sales, behavioral, weather, and inventory data. This means that analysts look at the past to see trends and predict the future, or to report on past performance or achievements. The more closely the data collection process follows carefully established research methodology and protocols for control and objectivity, the more trustworthy the data is.
Also, most data can be a trustworthy indicator of what happened, but not of why it happened or of the forces that influenced the outcomes. The more the data environment is influenced by external forces, such as economics, political-legal events, pricing disturbances, competitive pressures, patient behaviors and physiology, and other factors, the less useful the data will be for understanding why and how. Surveys and qualitative research can provide a more precise determination of causal factors.
So, why might this matter to you? Even small amounts of the right data, collected and analyzed effectively, can provide insight and improve performance for non-profits and other organizations.
Datassist is consistently on the leading edge of making sense out of your data to get results, providing real-world answers to your unique questions.