The AI Hierarchy: crafting a successful AI strategy from the ground up

Part IV: Ensuring Reliable and Accessible Storage

The birth of a cloud

In the quiet confines of Jeff Bezos's home in the early 2000s, the executive team was thinking about strategy. While reviewing their strengths, weaknesses, and opportunities, they hit on a company-altering idea. The team realized it had a competitive advantage in its expertise at managing digital infrastructure: building one of the largest operational websites in the world had forced Amazon to become skilled at managing large digital systems.

That day, the team recognized a universal challenge: countless businesses, especially those looking to implement digital innovation, grappled with the daunting task of building and managing technology infrastructure. The skills that had helped Amazon build its e-commerce giant were highly valued by every other company looking to grow its digital presence. To capitalize on this, Bezos and his team decided to offer Amazon's robust digital infrastructure as a service. Famed investor Bill Gurley has said this is one of the top three business moves of all time. This new offering allowed businesses to outsource the building of digital roads so they could focus on designing the cars that drive on those roads.

Amazon Web Services (AWS) was envisioned as the answer to a question many businesses hadn't fully articulated yet: How could they pivot from investing in the mechanics of technology to harnessing its strategic potential? With AWS, companies could tap into a scalable, secure, and sophisticated infrastructure, transforming their approach from one of operational burden to strategic advantage. The story of that retreat at the Bezos house also offers a guiding principle for businesses embarking on their AI journeys: the real value lies not in the infrastructure itself but in the innovation it enables.

For us, this story serves as a reminder as we navigate the complexities of executing an AI strategy. Our focus should remain on the problem we are solving for our customers. AWS and the other cloud service providers can handle the infrastructure, so we can innovate, create value, and transform our vision into reality.

Planning for flexibility: see the unseeable

Let’s now take a step back and think about the big picture. Not all data, in all industries, is best suited for cloud storage. Many heavily regulated industries handle sensitive data that needs extra protection, and in these cases an internally managed (on-premises) or hybrid solution may be required. Later in this post, we’ll look at how to help your security team quickly decide whether your data needs additional security considerations.

If at all possible, focus your efforts on collecting and storing data that is a good fit for the cloud. This lowers complexity and speeds up time to value. If you do have sensitive data, engage an information security expert: they can tell you which security tools your cloud provider offers, or help you build out a more complex managed strategy if the cloud isn't an option.

In addition to deciding where the data is stored (cloud, on premises, or hybrid), we also need to decide on the best storage format. Since we aren’t fortune tellers and can easily overlook data points that may become relevant in the future, I usually recommend that your first data storage location accept raw data in many formats, both structured and unstructured: a place where you can store text and images as well as structured data that fits nicely into columns and rows.

This approach has two advantages:

  1. Cost-Effectiveness: Cloud object storage services like Amazon S3 charge on the order of $25/TB/month for standard storage (less for infrequent-access tiers), making it economically feasible to warehouse vast quantities of raw data.
  2. Flexibility: Committing to a structure too early may pigeonhole your data strategy. Keeping raw data together in one flexible store also helps you avoid the pitfalls of scattered, inaccessible databases, ensuring data scientists and analysts have access to the information they need.
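To make the cost argument concrete, here is a back-of-the-envelope sketch in Python. The $25/TB/month constant is taken from the figure above and is an assumption; actual pricing varies by provider, region, storage tier, and request volume, so check your provider's current price list.

```python
# Rough storage-cost estimate for a growing data lake.
# ASSUMPTION: a flat ~$25 per TB per month, matching the figure in the text.
PRICE_PER_TB_MONTH = 25.0


def monthly_storage_cost(terabytes: float, price_per_tb: float = PRICE_PER_TB_MONTH) -> float:
    """Estimated cost (USD) of storing `terabytes` of data for one month."""
    if terabytes < 0:
        raise ValueError("terabytes must be non-negative")
    return terabytes * price_per_tb


def cumulative_cost(start_tb: float, growth_tb_per_month: float, months: int,
                    price_per_tb: float = PRICE_PER_TB_MONTH) -> float:
    """Total cost (USD) over `months` when the lake grows by a fixed amount each month."""
    total = 0.0
    size = start_tb
    for _ in range(months):
        total += monthly_storage_cost(size, price_per_tb)
        size += growth_tb_per_month
    return total


if __name__ == "__main__":
    print(monthly_storage_cost(10))       # 250.0
    print(cumulative_cost(10, 2, 12))     # 6300.0 (start at 10 TB, add 2 TB/month, 12 months)
```

At this assumed rate, a lake that starts at 10 TB and adds 2 TB a month costs about $6,300 over its first year, which is why keeping raw data "just in case" is usually affordable.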

What this means in practice is that you’ll probably want a Data Lake. This is the industry term to describe a place that allows data of all types to be stored in raw form. It also means that you may have to worry about additional layers of data storage and data transformation down the road, but with the benefit of added flexibility, this is well worth it. 
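As a minimal sketch of what "raw form" means in practice, the Python below lands files untouched under a date-partitioned folder layout. It writes to a local directory so it runs anywhere; the `raw/source=.../date=.../` key format is a common Hive-style convention rather than something any particular service requires, and the function names are hypothetical.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def raw_object_key(source: str, day: date, filename: str) -> str:
    """Build a Hive-style key, e.g. 'raw/source=crm/date=2024-04-10/signups.json'."""
    return f"raw/source={source}/date={day.isoformat()}/{filename}"


def land_raw(lake_root: Path, source: str, day: date, filename: str, payload: bytes) -> Path:
    """Write the payload byte-for-byte (no schema, no cleanup) under the lake root."""
    path = lake_root / raw_object_key(source, day, filename)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Structured data (a JSON record) and unstructured data (image bytes) share one lake.
        signup = {"user_id": 1, "name": "Mitch"}
        json_path = land_raw(root, "web_signups", date(2024, 4, 10), "signups.json",
                             json.dumps(signup).encode("utf-8"))
        land_raw(root, "field_photos", date(2024, 4, 10), "img_0001.jpg",
                 b"\xff\xd8\xff\xe0")  # placeholder bytes standing in for a JPEG
        print(json_path.relative_to(root).as_posix())
        # raw/source=web_signups/date=2024-04-10/signups.json
```

On Amazon S3, the same key string would simply become the object key (for example, the `Key` argument to boto3's `put_object`), and the folder-like prefixes let query engines skip sources and dates you don't need.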

Structured vs. Unstructured

What do we mean when we say structured or unstructured data? Most of us have used Microsoft Excel in some capacity. Data that fits nicely into rows and columns is considered structured. Consider filling out a form to sign up for a website. The site asks for your name, email, and phone number. This data can be stored in a structured format, with each row representing a user who has signed up.

User_ID | Name  | Email                            | Phone Number
1       | Mitch | mitchell.johnstone@papercrane.ca | 4033893266
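Here is the same sign-up form data expressed as a relational table, as a minimal sketch using Python's built-in `sqlite3` module. The schema is an illustrative assumption, and the contact details are placeholders rather than the values above.

```python
import sqlite3


def create_users_db() -> sqlite3.Connection:
    """Create an in-memory database with one row per signed-up user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE users (
            user_id      INTEGER PRIMARY KEY,
            name         TEXT NOT NULL,
            email        TEXT NOT NULL UNIQUE,
            phone_number TEXT  -- stored as text to keep leading zeros and formatting
        )
        """
    )
    conn.execute(
        "INSERT INTO users (user_id, name, email, phone_number) VALUES (?, ?, ?, ?)",
        (1, "Mitch", "mitch@example.com", "555-0100"),
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = create_users_db()
    print(conn.execute("SELECT name, email FROM users WHERE user_id = ?", (1,)).fetchone())
    # ('Mitch', 'mitch@example.com')
```

Every row has the same columns with known types, which is exactly what makes this data "structured": the database can even enforce rules such as the unique email constraint.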

Conversely, unstructured data, such as free text, images, and sensor readings, defies easy categorization, requiring specialized systems (object stores, document databases, and the like) for efficient storage and retrieval. The diversity of data in AI applications often requires accommodating both structured precision and unstructured richness.

Speed and Efficiency

The next thing we need to consider is how our data scientists and analysts are going to use the data within the database. Are they going to need to extract huge chunks of data to train models? Do they need to quickly find individual records? For AI, data scientists will almost certainly need to extract large amounts of data in batches for model training (which we’ll get into later in this series), but there may also be other uses for the data. Do business users want to explore the data? Do data scientists want to examine relationships and pull examples from the dataset? Two techniques matter most for database performance in this case: indexing and partitioning.
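The two access patterns described above look quite different in code. Below is a sketch, again with `sqlite3` and a hypothetical `readings` table: one function streams the whole table in fixed-size batches (the training pattern), the other fetches a single record by key (the lookup pattern).

```python
import sqlite3
from typing import Iterator, List, Optional, Tuple

Row = Tuple[int, str, float]


def make_readings_db(n: int) -> sqlite3.Connection:
    """Hypothetical table of n sensor readings for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO readings (id, sensor, value) VALUES (?, ?, ?)",
        ((i, f"sensor_{i % 3}", float(i)) for i in range(n)),
    )
    conn.commit()
    return conn


def iter_batches(conn: sqlite3.Connection, batch_size: int) -> Iterator[List[Row]]:
    """Batch pattern: read every row, a chunk at a time, without loading it all at once."""
    cur = conn.execute("SELECT id, sensor, value FROM readings ORDER BY id")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def lookup(conn: sqlite3.Connection, reading_id: int) -> Optional[Row]:
    """Lookup pattern: fetch one row by its primary key."""
    return conn.execute(
        "SELECT id, sensor, value FROM readings WHERE id = ?", (reading_id,)
    ).fetchone()


if __name__ == "__main__":
    conn = make_readings_db(10)
    print([len(batch) for batch in iter_batches(conn, 4)])  # [4, 4, 2]
    print(lookup(conn, 7))                                  # (7, 'sensor_1', 7.0)
```

A storage choice that is great at one of these patterns is not automatically good at the other, which is why it pays to ask how the data will be used before picking a database.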

Indexing

Imagine a library where the books are placed in random order; indexing saves us from this chaos. By creating pointers to data locations, indexes ensure rapid retrieval, directly impacting query speed and user satisfaction. Whether organizing by date/time in time-series databases or utilizing unique identifiers, choosing the right indexing strategy is paramount for operational efficiency.
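To see an index at work, you can ask SQLite how it plans to run a query before and after creating one. This sketch uses a hypothetical `readings` table filtered by timestamp; the exact wording of the plan (for example `SCAN readings` versus `SCAN TABLE readings`) differs between SQLite versions, so no exact output is shown.

```python
import sqlite3
from typing import List, Tuple


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> List[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def demo_index_effect() -> Tuple[List[str], List[str]]:
    conn = sqlite3.connect(":memory:")
    # ISO-8601 timestamps stored as text sort chronologically, so range filters work.
    conn.execute("CREATE TABLE readings (ts TEXT, location TEXT, temperature_f REAL)")
    sql = "SELECT * FROM readings WHERE ts BETWEEN ? AND ?"
    window = ("2024-04-10 00:00:00", "2024-04-10 23:59:59")

    before = query_plan(conn, sql, window)  # no index: every row must be examined
    conn.execute("CREATE INDEX idx_readings_ts ON readings (ts)")
    after = query_plan(conn, sql, window)   # index: jump straight to the time window
    return before, after


if __name__ == "__main__":
    before, after = demo_index_effect()
    print("before:", before)
    print("after: ", after)
```

The "before" plan reports a full scan of the table, while the "after" plan reports a search using `idx_readings_ts`, which is the library-catalog effect described above.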

Partitioning

Dividing a database into manageable segments, partitioning enhances performance and simplifies data management. This strategy not only accelerates queries but also facilitates parallel processing and distributed storage, optimizing resource utilization.
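Here is a toy illustration of time-based partitioning: records are grouped into one partition per month, and a range query only touches the partitions that overlap the requested dates. Real databases and data lakes do this pruning for you; the class below is a hypothetical stand-in to show the idea.

```python
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, List, Tuple


def partition_key(ts: datetime) -> str:
    """One partition per calendar month, e.g. '2024-04'. Zero-padded keys sort chronologically."""
    return ts.strftime("%Y-%m")


class MonthlyPartitionedStore:
    def __init__(self) -> None:
        self.partitions: Dict[str, List[Tuple[datetime, Any]]] = defaultdict(list)
        self.last_scanned: List[str] = []  # partitions touched by the most recent query

    def insert(self, ts: datetime, record: Any) -> None:
        self.partitions[partition_key(ts)].append((ts, record))

    def query(self, start: datetime, end: datetime) -> List[Any]:
        """Return records with start <= ts <= end, skipping non-overlapping partitions."""
        lo, hi = partition_key(start), partition_key(end)
        self.last_scanned = sorted(k for k in self.partitions if lo <= k <= hi)
        return [
            record
            for key in self.last_scanned
            for ts, record in self.partitions[key]
            if start <= ts <= end
        ]


if __name__ == "__main__":
    store = MonthlyPartitionedStore()
    store.insert(datetime(2024, 1, 15), "jan reading")
    store.insert(datetime(2024, 2, 10), "feb reading")
    store.insert(datetime(2024, 3, 5), "mar reading")
    print(store.query(datetime(2024, 2, 1), datetime(2024, 2, 29)))  # ['feb reading']
    print(store.last_scanned)                                         # ['2024-02']
```

Because each partition is independent, partitions can also live on different machines or be processed in parallel, which is where the distributed-storage and parallel-processing benefits mentioned above come from.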

Considering different data types

With that brief introduction to indexing and partitioning, it’s important to think about the primary data types you are dealing with. Are you dealing with data collected across time? Are you collecting data that can all be connected back to a single person, event, or category? Thinking about the type of data you have up front will help you select the right data storage strategy. Different types of databases excel at different types of data, so understanding the underlying structure, how it's created, and what the relationships between data points may look like will help you in your decision making. Let’s take a look at two specific examples to see how different data can result in different storage options.

Handling Time-Series Data: The Weather Forecasting Company

Use Case Overview: Imagine a company specializing in high-resolution, real-time weather forecasting. This company collects vast amounts of time-series data from various sensors across the globe, including temperature, humidity, wind speed, and atmospheric pressure readings. Each data point is timestamped, creating a continuous stream of data that needs to be processed, analyzed, and stored efficiently for predictive modeling and real-time weather updates.

Example of Data:

  • Timestamp: 2024-04-10 14:00:00
  • Location: San Francisco, CA
  • Temperature: 68°F
  • Humidity: 75%
  • Wind Speed: 5 mph
  • Atmospheric Pressure: 1013 mb

Database Solution: For this use case, a time-series database (TSDB) like InfluxDB or TimescaleDB would be ideal.

Why This Solution Works:

  • Efficiency: TSDBs are optimized for handling time-stamped data, making them highly efficient for writing, querying, and analyzing time-series data at scale.
  • Scalability: These databases can handle the high volume of data generated by weather sensors without performance degradation, crucial for real-time analysis.
  • Query Support: They support complex queries essential for time-series analysis, such as calculating moving averages or identifying trends over time.
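The moving average mentioned above is a typical time-series query. A TSDB would compute it for you inside the database; to show what that computation involves, here is a plain-Python sketch over readings shaped like the San Francisco example (the extra temperature readings are made up for illustration).

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def moving_average(
    readings: Iterable[Tuple[datetime, float]], window: timedelta
) -> List[Tuple[datetime, float]]:
    """Trailing moving average over the time window (ts - window, ts].

    Assumes `readings` are sorted by timestamp, as time-series data usually is.
    """
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    buf: deque = deque()
    total = 0.0
    result = []
    for ts, value in readings:
        buf.append((ts, value))
        total += value
        # Evict readings that have fallen out of the window (the current one never does).
        while buf[0][0] <= ts - window:
            _, old_value = buf.popleft()
            total -= old_value
        result.append((ts, total / len(buf)))
    return result


if __name__ == "__main__":
    start = datetime(2024, 4, 10, 14, 0)
    temps_f = [68.0, 70.0, 72.0, 74.0]  # hypothetical readings every 10 minutes
    readings = [(start + timedelta(minutes=10 * i), t) for i, t in enumerate(temps_f)]
    for ts, avg in moving_average(readings, timedelta(minutes=20)):
        print(ts.strftime("%H:%M"), avg)
    # 14:00 68.0 / 14:10 69.0 / 14:20 71.0 / 14:30 73.0
```

Running a calculation like this over millions of sensor readings, continuously, is exactly the workload a time-series database is built to handle efficiently.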

Handling Object-Oriented Data: The Digital Asset Management Platform

Use Case Overview: Consider a digital asset management platform that helps creative teams manage their digital content. This platform deals with a wide variety of object-oriented data, including high-resolution images, videos, design files, and accompanying metadata (tags, descriptions, project associations). The data is not only varied in type but also needs to be accessible in a highly relational context, where connections between different assets (such as those belonging to the same project) are easily retrievable.

Example of Data:

  • Asset ID: 00123
  • Type: Image
  • File: example_image.jpg
  • Metadata:
    • Tags: ["campaign_2024", "spring", "outdoors"]
    • Description: "Spring 2024 campaign outdoor shoot."
    • Project: "Spring 2024 Campaign"

Database Solution: For managing object-oriented data with complex interrelationships, a document-oriented database like MongoDB or a graph database like Neo4j could be considered, depending on the complexity of the relationships between assets.

Why These Solutions Work:

  • Flexibility: Document-oriented databases like MongoDB store data in a flexible, JSON-like format that allows varied and nested fields (for example, an asset record that references its image file and nests its metadata). This flexibility is ideal for handling the multifaceted attributes of digital assets.
  • Relationship Handling: If the use case involves complex relationships between assets (e.g., shared tags, project hierarchies), a graph database like Neo4j offers superior capabilities in querying deeply interconnected data, making it easier to navigate and understand the relationships between various digital assets.
  • Scalability and Performance: Both MongoDB and Neo4j scale well with large datasets while maintaining performance, ensuring that as the platform grows, retrieving and managing assets remains efficient.
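To make the two options concrete, here is a plain-Python sketch that treats each asset as a JSON-like document (the first mirrors the example above; the others are hypothetical) and then links assets that share a tag or project, the way a graph database would. In MongoDB, the tag lookup would roughly correspond to a query such as `find({"metadata.tags": "spring"})`; here both steps are emulated with dictionaries and a breadth-first search rather than a real database.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set

assets: List[dict] = [
    {"asset_id": "00123", "type": "image", "file": "example_image.jpg",
     "metadata": {"tags": ["campaign_2024", "spring", "outdoors"],
                  "description": "Spring 2024 campaign outdoor shoot.",
                  "project": "Spring 2024 Campaign"}},
    # Hypothetical assets added for the example.
    {"asset_id": "00124", "type": "video", "file": "teaser.mp4",
     "metadata": {"tags": ["campaign_2024", "spring"], "project": "Spring 2024 Campaign"}},
    {"asset_id": "00200", "type": "design", "file": "logo_v2.ai",
     "metadata": {"tags": ["outdoors", "branding"], "project": "Brand Refresh"}},
    {"asset_id": "00300", "type": "image", "file": "office.jpg",
     "metadata": {"tags": ["branding"], "project": "Brand Refresh"}},
]


def find_by_tag(docs: List[dict], tag: str) -> List[str]:
    """Document-style query: IDs of assets whose metadata includes `tag`."""
    return [d["asset_id"] for d in docs if tag in d["metadata"].get("tags", [])]


def build_graph(docs: List[dict]) -> Dict[str, Set[str]]:
    """Connect every pair of assets that share a tag or a project."""
    groups: Dict[tuple, Set[str]] = defaultdict(set)
    for d in docs:
        meta = d["metadata"]
        for tag in meta.get("tags", []):
            groups[("tag", tag)].add(d["asset_id"])
        if "project" in meta:
            groups[("project", meta["project"])].add(d["asset_id"])
    graph: Dict[str, Set[str]] = defaultdict(set)
    for members in groups.values():
        for asset_id in members:
            graph[asset_id] |= members - {asset_id}
    return graph


def related(graph: Dict[str, Set[str]], start: str, max_hops: int) -> Set[str]:
    """Graph-style query: assets reachable from `start` within `max_hops` links (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, set()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen - {start}


if __name__ == "__main__":
    print(find_by_tag(assets, "spring"))            # ['00123', '00124']
    graph = build_graph(assets)
    print(sorted(related(graph, "00123", 1)))       # ['00124', '00200']
    print(sorted(related(graph, "00123", 2)))       # ['00124', '00200', '00300']
```

A one-hop question ("what shares a tag with this asset?") is easy in either model; as the number of hops grows, the traversal becomes the core of the workload, which is where a dedicated graph database like Neo4j pays off.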

By selecting the appropriate database solutions tailored to the specific needs of time-series and object-oriented data use cases, organizations can optimize their data management practices for better performance, scalability, and analytical capabilities. These examples illustrate how understanding the nature of your data can guide the selection of the most suitable database technology.

Making life easy for the Chief Information Security Officer

Security is paramount when considering data storage options for AI projects. The risks of data breaches, unauthorized access, and data corruption must be addressed. Topics such as advanced encryption for data at rest and in transit, rigorous access controls, and regular security audits can be difficult to wrap your mind around. However, there are a few things you can do to make life easier for your security team (or a contracted security resource) so they can quickly make informed decisions that move your AI strategy forward without undue risk.

  1. Share the Vision: Sharing your AI Narrative with the security team will help bring them along for the ride. They are likely to get excited and want to find ways to make this vision a reality. It will also help them contribute to filling in blind spots and identify future risks up front. This collaboration will pay off. 
  2. Data Legislation: Provide your security friends with some data samples as well as some notes and summaries of the legislation you think may apply to your data. This will help jump start their evaluation to see if the legislation may apply and the implications for your data security strategy. 
  3. Network Diagram: Provide the team with a network diagram that shows where data comes from and where it moves through the AI system. It’s okay if you’re not 100% sure of the implementation at this point; a rough architecture diagram showing the comings and goings of your data can help the team sort out which data security techniques may need to be applied.
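If you are preparing data samples for the security team, a quick automated pass can point out fields that look like personal information. The sketch below is a deliberately simple, hypothetical helper: its two regular expressions (a basic email pattern and a North American phone pattern) are assumptions, and its output is a conversation starter for the security team, not a compliance check.

```python
import re
from typing import Dict, Iterable, Set

# ASSUMPTION: simple illustrative patterns; real PII detection needs far more care.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?1?[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$"),
}


def flag_sensitive_fields(records: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each field name to the pattern labels its values matched."""
    flags: Dict[str, Set[str]] = {}
    for record in records:
        for field, value in record.items():
            text = str(value).strip()
            for label, pattern in PATTERNS.items():
                if pattern.match(text):
                    flags.setdefault(field, set()).add(label)
    return flags


if __name__ == "__main__":
    sample = [
        {"user_id": 1, "name": "Mitch", "email": "mitch@example.com",
         "phone_number": "403-555-0100", "signup_date": "2024-04-10"},
    ]
    print(flag_sensitive_fields(sample))
    # {'email': {'email'}, 'phone_number': {'phone'}}
```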

Keep it simple

Above we’ve covered some of the essentials of choosing data storage, and honestly, it’s a bit overwhelming. Especially if your team lacks the guidance of a tech leader with years of experience (and battle scars), it can seem daunting to make a decision. It also doesn’t help that billions of dollars are being poured into AI development, which seems to expand the universe of options by the day.

When trying to get quickly to value and validate your AI vision, it is helpful to use a few heuristics to pick a data storage option and move on:

  1. Cloud-First: Consider cloud storage solutions for their scalability and ease of integration with analytics and ML services.
  2. Open Source Preference: Lean towards open-source databases and frameworks, which come with extensive community support and flexibility.
  3. Compatibility Check: Prioritize tools and platforms that offer native support or established connectors for your analytics and ML frameworks of choice.

Wrapping Up

Not all of us have the opportunity to hang out at the Bezos residence and come up with a brand-new, seemingly unrelated business idea that turns into a money-printing machine. What we do get as a result of that meeting is the ability to build, launch, and scale our AI products quickly and cheaply. Thanks to AWS, Microsoft, and Google, we can focus on our customers and their problems. We can get to know them better than any of our competitors and use these tools to win.

When trying to get to value early, data storage can be one of the most difficult hurdles. There are so many unknowns, and making a decision can be daunting. By understanding regulatory risks, your long-term plan, and your short-term target, you can avoid analysis paralysis. I hope this post has helped you feel equipped to make a decision.

Next up we have Data Transformation. Taking our raw, ugly data and making it pretty. Getting closer to our first taste of value. See you next week!

Need Help?

If you're seeking to unlock the full potential of AI within your organization but need help, we’re here for you. Our AI strategies are a no-nonsense way to derive value from AI technology. Reach out. Together we can turn your AI vision into reality.

Mitchell Johnstone

Director of Strategy

Mitch is a strategic AI leader with 7+ years of transforming businesses through high-impact AI/ML projects. He combines deep technical acumen with business strategy, exemplified in roles spanning AI product management to entrepreneurial ventures. His portfolio includes proven success in driving product development, leading cross-functional teams, and navigating complex enterprise software landscapes.
