The AI Hierarchy: crafting a successful AI strategy from the ground up

Part III: Data Collection - The Essence of AI


Mapping the entire world

In 2007, Google's founders, Sergey Brin and Larry Page, were experimenting with researchers at Stanford on a vision of “searching the whole world.” They had heard about camera technology that could take continuous photos, which could then be stitched together. To test the idea, they strapped some cameras to an old hippie van and drove it around the Stanford campus and Palo Alto to see if they could validate their image-capture hypothesis.

The tech proved successful, and with it they discovered they could unlock their vision of “searching the world” by strapping cameras to cars and driving them around the world. This data collection strategy has given Google more than a decade of innovations and generated significant customer demand. First they released Street View, which let customers explore an uncanny view of city streets and sights. Even without applying data analytics or machine learning, the team at Google found that meticulously collecting a unique dataset could drive user engagement.

As Google's capabilities and the volume of Street View imagery grew, the company saw an opportunity to derive greater value from this extensive data collection. By applying deep learning techniques to their massive image data, Google began to identify and catalogue various elements within the photos – from street signs to business names. This progression from simply capturing images to extracting actionable insights significantly enhanced the overall Maps experience, making it richer and more useful for end-users.

Collecting unique data points and refining that raw data into well-structured, high-quality datasets is essential for unlocking the full potential of AI and machine learning technologies. This story sets up our discussion today. By focusing on data collection first, you unlock opportunities to drive value for your users well before you deploy your first algorithm.

The Data Centric Approach

Over the past decade, as machine learning and AI have re-emerged from the “AI Winter,” the industry has assumed that more data is better. Organizations have been paralyzed by the perceived need to collect massive datasets on par with the petabytes available to Google, Facebook, and Amazon. While it’s true that you can’t build ML models without a large dataset, we’ve learned from Andrew Ng that ‘large’ might be a bit smaller than we thought.

Ng, a pioneer of deep learning methods and co-founder of Coursera, has begun to teach the “Data-Centric AI Approach.” In this approach, you start at the base of the AI Strategy Pyramid not just with the goal of collecting substantial amounts of data, but with the goal of ensuring that the data is of the highest quality. Over the course of today’s blog I hope to help you understand what data collection looks like and what it really means to have ‘high quality’ data. With this, you’ll be able to de-risk the investment in your AI strategy by ensuring a high-quality data flow, giving your model's output a much higher chance of closely representing reality and ultimately achieving your business goals.

What is data?

To understand what the data-centric approach to AI looks like, we must first understand what data actually is. Data is information. In our everyday lives, data surrounds us. We collect data visually (reading road signs), through sound (music), scent (freshly baked bread), and touch (the texture of sand). But how do computers collect data? During the 20th century, engineers began to experiment with encoding what our senses perceive into machines using electrical signals. This experimentation resulted in the concept of the bit. A bit, in the simplest sense, is just a measure of whether electricity is flowing or not: flowing is represented as a 1, and not flowing is represented as a 0.

Data is just electrical signals that are trying to represent the reality that our human senses perceive. We capture images using cameras, sound using recording devices, and we have sensors for capturing pressure and temperature. When we talk about data quality, we are talking about how accurately we can go from the richness of the world to a reduced reality of 0’s and 1’s.
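To make the "reduced reality" idea concrete, here is a minimal Python sketch of what happens when a temperature reading is squeezed into a fixed number of bits. The sensor range of -40 to 60 degrees Celsius, the sample reading, and the function names are all assumptions chosen for illustration; real sensors and analog-to-digital converters differ in the details.

```python
def quantize(value, lo, hi, bits):
    """Map a real-valued reading onto one of 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    clamped = min(max(value, lo), hi)  # readings outside the range are clipped
    return round((clamped - lo) / (hi - lo) * levels)


def dequantize(code, lo, hi, bits):
    """Turn a stored integer code back into an approximate reading."""
    levels = 2 ** bits - 1
    return lo + code / levels * (hi - lo)


if __name__ == "__main__":
    reading_c = 21.37  # hypothetical "true" temperature
    for bits in (4, 8, 16):
        code = quantize(reading_c, -40.0, 60.0, bits)
        recovered = dequantize(code, -40.0, 60.0, bits)
        error = abs(recovered - reading_c)
        print(f"{bits:>2} bits: {code:0{bits}b} -> {recovered:.3f} C (error {error:.3f})")
```

With fewer bits the stored value drifts further from the original reading, which is one simple, measurable sense in which data quality can be lost before any model ever sees the data.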

How do we collect data?

Let’s expand a little bit on how we actually capture data and encode it into computers. We primarily do this in four ways:

  1. Sensors: Devices like cameras, microphones, thermometers, and pressure sensors capture specific aspects of the environment, generating data streams in real-time.
  2. The Internet of Things: Everyday appliances and devices with built-in sensors and internet connectivity, which send the data they collect across the internet (e.g., smart thermostats).
  3. Input Devices: Keyboards, mice, and touchscreens allow humans to directly input data into the system.
  4. Cookies: Cookies help companies track user behaviour across the internet, and this data can be used to understand that behaviour and personalise experiences.

Focus on qualitative vs. quantitative data quality

Now let’s return to Andrew Ng and his data-centric approach to AI. With our understanding of how data is captured and encoded into machines, we can better grasp what it means to have a high-quality dataset. For quantitative data such as temperature or pressure, this means making sure that each measurement is as accurate as possible and that there aren’t any misreadings. This case, however, is much easier than qualitative data capture: not easier in the sense of technical implementation, but because qualitative measurements are extremely difficult to keep consistent across different samples.

Let’s look at an example to illustrate why this is difficult. Imagine a manufacturing facility that wants to use computer vision to determine if a part on the line is faulty. To do this, they need to capture images of the parts and ‘label’ them with a faulty/not faulty tag. This information can then be used to train an algorithm to decide whether or not each part is faulty. How would we actually obtain these labels? It would probably have to be a human with lots of experience spotting defects in these specific pieces of equipment.

So we can imagine a manufacturing worker sitting in front of hundreds or thousands of images, inspecting each one to decide whether the part is faulty. We can also imagine that as the coffee runs out and the day drags on, the consistency of labelling degrades and a few defects are missed. If the dataset isn’t consistent, the algorithm will learn these inconsistencies and produce inconsistent results. The company will have invested its money in an algorithm that works, but works poorly, making it more difficult to see an adequate return on the investment and an improvement in operations.
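One way to put a number on labelling consistency is to have the same images labelled twice (by two inspectors, or by one inspector at different times of day) and measure how often the labels agree beyond what chance would produce. The sketch below uses Cohen's kappa, a standard agreement statistic; it is one possible check rather than anything prescribed by the data-centric approach, and the labels and function name are made up for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Agreement between two labelling passes, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both passes must label the same, non-empty set of images")
    n = len(labels_a)
    # Fraction of images where the two passes gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each pass labelled at random with its own label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both passes used a single identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels for the same eight images, morning vs. late afternoon.
    morning = ["faulty", "ok", "ok", "faulty", "ok", "ok", "faulty", "ok"]
    afternoon = ["faulty", "ok", "ok", "ok", "ok", "ok", "faulty", "faulty"]
    print(f"kappa = {cohens_kappa(morning, afternoon):.2f}")
```

A kappa of 1.0 means the two passes agree perfectly, while values near 0 mean the agreement is no better than chance. A low score is a signal to tighten the labelling guidelines before spending money on training.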

Getting from A to B - sending data to storage

Now that we understand the substance of data as electrical signals represented by 0’s and 1’s, we need to then understand how those signals get from the sensing device, the web browser, or the smart thermostat to a data storage location. How is it that we centralize all of this information to make it available for data transformation, analytics, and ultimately training our AI algorithms?

The TCP/IP Model

Back in the 1970s, as researchers were sorting out how to share time on the very few computers that were available, they began to develop a model that has evolved into what we today call the “Internet.” This model describes layers of independent protocols that communicate with each other to pass data from one machine to another.

You may have heard of an IP address before and how it is associated with your internet connection, but you probably haven’t thought too much about it otherwise, and for good reason. The internet is an enormously complex web of software and hardware that works together to seamlessly pass 0’s and 1’s around the world at blazing fast speeds. For the purposes of data collection in an AI strategy, we only need to be somewhat aware of the inner workings so that we can deliver on our vision and not compromise security along the way.
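For a feel of what "passing data from one machine to another" looks like in practice, here is a minimal Python sketch using the standard `socket` module. A stand-in "storage" server and a stand-in "sensor" client both run on the same machine over the loopback address 127.0.0.1; the function names and the payload format are hypothetical, and a real pipeline would add error handling, encryption, and a proper message format.

```python
import socket
import threading


def run_storage_server(server_sock, received):
    """Accept one connection and collect every byte the client sends."""
    conn, _addr = server_sock.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:  # an empty read means the client closed the connection
                break
            chunks.append(data)
        received.append(b"".join(chunks))


def send_reading(payload: bytes) -> bytes:
    """Send a payload over TCP on the local machine and return what the server stored."""
    received = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server_sock.listen(1)
        host, port = server_sock.getsockname()
        worker = threading.Thread(target=run_storage_server, args=(server_sock, received))
        worker.start()
        # IP identifies the machine (host) and TCP delivers the bytes reliably and in order.
        with socket.create_connection((host, port), timeout=5) as client:
            client.sendall(payload)
        worker.join(timeout=5)
    return received[0]


if __name__ == "__main__":
    stored = send_reading(b"thermostat-7,21.4C")  # hypothetical device id and reading
    print(stored.decode())
```

The application code only deals with an address, a port, and some bytes. Routing, retransmission, and ordering are handled by the layers underneath, which is why most teams need only a working awareness of them.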

High-Speed data transfer

Imagine a scenario where an autonomous vehicle collects gigabytes of data per minute. The transmission speed of this data is critical for real-time analysis and decision-making. Here, leveraging high-speed cellular networks and understanding the bottlenecks in data transmission become paramount. This example shows the importance of evaluating data volume and network capabilities to ensure swift and cost-effective data handling.
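A quick back-of-the-envelope calculation helps when evaluating data volume against network capability. The Python sketch below converts a data generation rate into the sustained bandwidth it requires and estimates how fast a backlog builds if the uplink is slower. The 1 GB per minute and 50 Mbps figures are illustrative assumptions, not measurements from any particular vehicle or network, and the sketch uses decimal units (1 GB = 10^9 bytes).

```python
def required_mbps(gigabytes_per_minute):
    """Sustained bandwidth (megabits per second) needed to upload data as fast as it is produced."""
    bits_per_minute = gigabytes_per_minute * 8 * 1000 ** 3
    return bits_per_minute / 60 / 1_000_000


def backlog_gb_per_hour(gigabytes_per_minute, uplink_mbps):
    """Gigabytes left un-sent after each hour when the uplink cannot keep up."""
    shortfall_mbps = max(0.0, required_mbps(gigabytes_per_minute) - uplink_mbps)
    return shortfall_mbps * 1_000_000 * 3600 / 8 / 1000 ** 3


if __name__ == "__main__":
    rate = 1.0     # hypothetical: 1 GB of sensor data per minute
    uplink = 50.0  # hypothetical: 50 Mbps sustained cellular uplink
    print(f"needed: {required_mbps(rate):.1f} Mbps")
    print(f"backlog: {backlog_gb_per_hour(rate, uplink):.1f} GB per hour")
```

If the backlog is greater than zero, something has to give: collect less, compress or summarize before sending, store locally and upload later, or pay for a faster link.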

When crafting your data collection strategy, and also while settling on your MVR (Minimum Viable Robot), you’ll want to think about the tradeoffs and strategies that will get you the data you need at the rate that you need it. Can you validate end-user value with lower data volume at lower speeds? Doing so will reduce implementation complexity and keep costs low while you validate that your model outputs deliver the value you outlined in your vision and narrative.

Emerging tech

Innovations like edge computing and 5G networks are revolutionizing how data is transmitted and processed. Edge computing allows data to be processed closer to its source, reducing latency and bandwidth use, while 5G networks offer unprecedented transmission speeds. These technologies are reshaping the landscape of data collection and storage, offering new possibilities for AI applications that require near-instantaneous data analysis and decision-making.
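As a rough illustration of why processing data closer to its source reduces bandwidth use, the Python sketch below summarizes raw sensor readings into per-window statistics on the device, so that only the summaries would need to be transmitted. The window size, the readings, and the choice of summary fields are assumptions for illustration; what is safe to discard depends entirely on what your model needs.

```python
from statistics import mean


def summarize_window(readings):
    """Collapse one window of raw readings into a small summary record."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }


def summarize_stream(readings, window_size):
    """Split a list of readings into consecutive windows and summarize each one."""
    return [
        summarize_window(readings[i:i + window_size])
        for i in range(0, len(readings), window_size)
    ]


if __name__ == "__main__":
    raw = [20.0, 20.5, 21.0, 35.0, 20.0, 19.5]  # hypothetical temperature readings
    for summary in summarize_stream(raw, window_size=3):
        print(summary)
```

Here six raw readings become two summary records. The trade-off is that detail is lost at the edge, so this kind of reduction is worth deciding on deliberately rather than by default.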

Staying on top of what seems like a continuous stream of incredible innovations is daunting. Still, it is essential to know what tools and technologies are available so that you can construct a vision that takes advantage of the newest technologies and helps you stay ahead of the curve. This must be done with caution, though. The press release for a new tool may promise the world, but the tool often needs some time to mature. Think about this trade-off as you settle on your data collection strategy, and weigh whether the value of the new and cutting-edge is greater than that of the established, well-documented solution.

Don’t ignore security

With the advent of sophisticated hacking techniques, securing data in transit has never been more critical. Implementing encryption protocols such as TLS (Transport Layer Security) or IPsec can fortify data security. For instance, a healthcare provider transmitting sensitive patient data can employ these technologies to protect data integrity and confidentiality, showing how security measures can be integrated without compromising performance. Early consultation with cybersecurity experts can illuminate potential vulnerabilities and the most effective countermeasures.
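As a small, hedged example of what "implementing TLS" can look like on the client side, the Python sketch below builds a TLS configuration with the standard `ssl` module. The function name and the choice of TLS 1.2 as the minimum version are assumptions for illustration; your security team may require different settings.

```python
import ssl


def make_client_context():
    """Build a client-side TLS configuration with certificate verification enabled."""
    # create_default_context() turns on certificate and hostname verification
    # and loads the system's trusted certificate authorities.
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    ctx = make_client_context()
    print("verifies certificates:", ctx.verify_mode == ssl.CERT_REQUIRED)
    print("checks hostname:", ctx.check_hostname)
    # To use it, wrap a connected socket before sending any data, for example:
    #   tls_sock = ctx.wrap_socket(sock, server_hostname="storage.example.com")
    # where "storage.example.com" is a placeholder hostname.
```

The point of the sketch is that sensible defaults already exist. Most of the risk comes from switching verification off "temporarily" during development and forgetting to turn it back on.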

Keep it simple

  1. Prioritize Data Integrity: Focus on ensuring the integrity and accuracy of your data. High-quality data is foundational for effective AI models. Consider techniques like anomaly detection and data cleansing early in the collection process to improve the quality of your inputs.
  2. Evaluate Data Relevance and Representation: Assess the relevance of collected data in relation to your AI objectives. Ensure the data accurately represents the diversity of scenarios your AI solution will encounter. This might involve actively seeking out underrepresented data to avoid biases.
  3. Iterate with a Feedback Loop: Establish a feedback loop to continuously refine your data collection strategies based on the performance of your AI models. Use insights from data analytics and model outputs to identify gaps in your data and areas for improvement.
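The anomaly detection mentioned in the first point can start very simply. The Python sketch below flags readings that sit far from the median, using the modified z-score based on the median absolute deviation (MAD), a common robust outlier check. The 3.5 threshold is a widely used rule of thumb rather than a universal setting, and the readings are invented for illustration.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return one True/False flag per value; True marks a likely misreading."""
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)  # median absolute deviation
    if mad == 0:
        # At least half the values equal the median, so there is no scale to
        # compare against; this sketch flags nothing in that case.
        return [False] * len(values)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [0.6745 * d / mad > threshold for d in deviations]


if __name__ == "__main__":
    readings = [21.1, 21.3, 20.9, 21.0, 85.0, 21.2]  # hypothetical temperatures
    for value, is_anomaly in zip(readings, flag_anomalies(readings)):
        print(value, "<- check this reading" if is_anomaly else "")
```

Flagged values are candidates for review, not automatic deletion. An unusual reading is sometimes exactly the rare scenario the second point asks you to make sure is represented.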

Wrapping Up

Data is the foundation for a reason. Lots of data is good, but lots of good data is better. By understanding data as a digital reflection of our complex world, we can focus on what’s important for the next layers of our AI Strategy Framework. I hope our discussion has led to an understanding that the quality of your data sets the stage for the transformative potential of AI.

As we move forward, the next instalment will talk about data storage, examining various storage options and their significance for your AI projects. We'll explore how these choices impact your ability to harness data effectively and adapt to evolving technological landscapes.

Need Help?

If you're seeking to unlock the full potential of AI within your organization but need help, we’re here for you. Our AI strategies are a no-nonsense way to derive value from AI technology. Reach out. Together we can turn your AI vision into reality.


Mitchell Johnstone

Director of Strategy

Mitch is a Strategic AI leader with 7+ years of transforming businesses through high-impact AI/ML projects. He combines deep technical acumen with business strategy, exemplified in roles spanning AI product management to entrepreneurial ventures. His portfolio includes proven success in driving product development, leading cross-functional teams, and navigating complex enterprise software landscapes.
