
Data lakes also provide organisations with greater security, as the data can be encrypted, enabling organisations to store and access it securely without compromising its integrity. Moreover, data lakes can give organisations real-time insight into their data, allowing them to make fast, well-informed decisions.

Data lakes for non-destructive testing data

Non-destructive testing (NDT) and inspection data are crucial in ensuring the safety of industrial assets and operations.

A data lake – a centralised repository for storing all structured and unstructured data at any scale – could be an ideal, cost-effective solution for storing and managing NDT data and inspection metadata. It reduces storage costs by eliminating the need for X-ray film, chemicals, paper, and archive rooms, while the digital data can be accessed from almost anywhere.
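As an illustration of the centralised-repository idea, a data lake typically organises raw files under partitioned key paths so that inspections can be located without a database lookup. The sketch below is minimal and hypothetical: the `site`/`asset`/`date` keys and file names are assumptions for the example, not part of any standard.

```python
from pathlib import Path
import tempfile

def lake_path(root: Path, site: str, asset: str, date: str, filename: str) -> Path:
    """Build a partitioned object key: site=<s>/asset=<a>/date=<d>/<file>."""
    return root / f"site={site}" / f"asset={asset}" / f"date={date}" / filename

# Write two radiographic scans into a local stand-in for the lake.
root = Path(tempfile.mkdtemp())
for asset in ("pipe-041", "pipe-042"):
    p = lake_path(root, "plant-a", asset, "2023-05-01", "scan-001.dcm")
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_bytes(b"...raw scan bytes...")

# Partition pruning: list only plant-a's inspections from one day,
# without scanning anything outside those path prefixes.
hits = sorted(root.glob("site=plant-a/*/date=2023-05-01/*.dcm"))
print([h.parent.parent.name for h in hits])  # ['asset=pipe-041', 'asset=pipe-042']
```

Cloud object stores such as Amazon S3 or Azure Data Lake Storage use the same key-prefix idea; only the storage backend changes.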

A key advantage of using a data lake to store and manage NDT data is that it is a valuable source for artificial intelligence (AI) projects. The large amount of data stored in a data lake can be used to train machine learning models, improving the efficiency of NDT and inspection processes. For instance, historical NDT data and inspection metadata can be used to train machine learning models to predict when equipment is likely to fail, allowing organisations to schedule maintenance and repairs proactively, reducing downtime and increasing the overall efficiency of their operations.
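The failure-prediction idea can be sketched in a few lines. This is a toy model on synthetic data: the two features (wall-thickness loss and defect count) and all numbers are invented for the example, and a real predictive-maintenance model would draw on far richer NDT features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for historical inspection metadata:
# columns = [wall-thickness loss (mm), defect count]; label = failed within a year.
n = 200
X = rng.normal([1.0, 2.0], [0.5, 1.5], size=(n, 2))
y = (0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.3, n) > 1.5).astype(float)

# Fit a logistic-regression model by gradient descent on the log-loss.
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted failure probability
    w -= 0.5 * (X.T @ (p - y) / n)          # gradient step on weights
    b -= 0.5 * (p - y).mean()               # gradient step on bias

def failure_risk(features) -> float:
    """Score a new inspection: probability in [0, 1] that the asset fails."""
    return float(1.0 / (1.0 + np.exp(-(np.asarray(features) @ w + b))))

# Heavy thinning plus many defects should score riskier than a clean scan.
print(failure_risk([2.5, 6.0]) > failure_risk([0.2, 0.0]))  # True
```

Risk scores like these are what would drive proactive maintenance scheduling; in practice a library such as scikit-learn would replace the hand-rolled training loop.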

Challenges from setup to data swamps

Nevertheless, data lakes have challenges. Complexity in setup and management, data governance, and security are three such challenges. Additionally, if the data formats vary, these lakes can become a data swamp. Data standardisation is therefore crucial for better data quality, governance, and reusability.

ASTM International recommends using a standardised digital format, Digital Imaging and Communication in Nondestructive Evaluation (DICONDE), to store NDT data and inspection metadata from different NDT methods and sources in a centralised location. DICONDE provides a vendor-independent data storage and transmission protocol for non-destructive materials testing and ensures data is complete, unaltered, and has a traceable history.
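DICONDE itself is a full imaging standard, but the underlying idea – mapping heterogeneous per-vendor fields onto one vendor-neutral schema – can be illustrated in a few lines. The vendor names and field names below are invented for the example; they are not DICONDE attributes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InspectionRecord:
    """Vendor-neutral record: the target schema all sources map onto."""
    component_id: str
    method: str        # e.g. "RT" (radiographic), "UT" (ultrasonic)
    inspected_on: str  # ISO 8601 date
    operator: str

# Per-source field mappings (hypothetical vendor export formats).
MAPPINGS = {
    "vendor_a": {"component_id": "part_no", "method": "technique",
                 "inspected_on": "date", "operator": "inspector"},
    "vendor_b": {"component_id": "ComponentID", "method": "Method",
                 "inspected_on": "ScanDate", "operator": "OperatorName"},
}

def standardise(source: str, raw: dict) -> InspectionRecord:
    """Rename a raw vendor export's fields into the common schema."""
    mapping = MAPPINGS[source]
    return InspectionRecord(**{field: raw[col] for field, col in mapping.items()})

rec = standardise("vendor_b", {"ComponentID": "weld-17", "Method": "UT",
                               "ScanDate": "2023-04-02", "OperatorName": "jsmith"})
print(rec.component_id, rec.method)  # weld-17 UT
```

Once every source lands in the lake in one schema, downstream governance and model training no longer need per-vendor special cases – which is exactly the data-swamp risk standardisation addresses.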

“In the world of data management, the question of whether to use one data lake for all data or multiple lakes is a common one,” says James Serra, Data & AI Solution Architect at Microsoft. “While using a single data lake is ideal, there are many reasons why organisations may opt for multiple data lakes.”

Serra is an expert in data warehousing and data management, with over 35 years of experience in data modelling, data governance, and development methodologies. Alongside his role at Microsoft, he is a popular blogger, author, and speaker, having presented at PASS Summit, SQLBits, Enterprise Data World conference, Big Data Conference Europe, and dozens of SQL Saturdays.

One main reason is organisational structure, where each department or team owns its own data. Having multiple lakes may also be necessary to comply with multi-regional data residency requirements; to work within subscription or service limits, quotas, and constraints; or to implement different policies for each data lake. Security is another important factor: confidential or sensitive data may need to be kept separate from other data.

Multiple data lakes can also help to improve latency, by having a data lake in the same region as an end-user or application querying the data. Disaster recovery, different data retention policies, and the ability to implement different service levels for different data types are further reasons to consider having multiple data lakes.

“It’s important to note that using multiple data lakes can increase the complexity and cost of your data management infrastructure and require more resources and expertise to maintain, so it’s important to weigh the benefits against the costs before implementing multiple data lakes,” says Serra.

“Multiple data lakes may also require additional data integration and management tools to help ensure that data is properly transferred between the different data lakes and that data is consistent across all of them,” he adds. “Finally, having multiple data lakes adds the performance challenge of combining the data when a query or report needs data from multiple lakes.”

Welcome to the lake house

Data lake houses promise to combine the best of both worlds: the economies of scale and flexibility of the data lake with the reliability and control of the data warehouse.

The cloud has revolutionised data storage and processing, enabling data lakes to deliver data warehouse-level performance. Open-source technologies like Spark, Drill, and Trino provide data transformation capabilities, while open-source lakehouse table formats like Delta Lake and Apache Iceberg help structure the data.

These lake houses are designed to behave and perform like data warehouses, and can reach 80-90% of the performance of traditional warehouses. The commercial ecosystem around these open-source formats is becoming the main battleground between the big vendors.

"The data lake house will not replace data lakes or purpose-built data warehouses, but in the long run they will co-opt enterprise data warehouses," says Tony Baer, Principal at dbInsight, a company that has carried out extensive research in this area. "Data lake houses will enable data lakes to perform and be controlled, governed, and secured like data warehouses."
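The cross-lake query challenge Serra describes can be sketched with the Python standard library alone: two regional lakes each hold part of the data, and the reader must fetch and merge at query time. The regions, paths, and fields here are illustrative, not a real deployment.

```python
import csv
import tempfile
from pathlib import Path

# Stand-ins for two regional data lakes (e.g. EU and US residency zones).
lakes = {region: Path(tempfile.mkdtemp()) for region in ("eu", "us")}
rows = {
    "eu": [{"asset": "pump-1", "defects": "3"}],
    "us": [{"asset": "pump-2", "defects": "0"}, {"asset": "pump-3", "defects": "5"}],
}
for region, lake in lakes.items():
    with open(lake / "inspections.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["asset", "defects"])
        writer.writeheader()
        writer.writerows(rows[region])

# A report spanning both lakes must read each one and combine the results --
# the extra I/O and merge step is the performance cost of multiple lakes.
combined = []
for lake in lakes.values():
    with open(lake / "inspections.csv", newline="") as f:
        combined.extend(csv.DictReader(f))

print(sorted(r["asset"] for r in combined))  # ['pump-1', 'pump-2', 'pump-3']
```

In production this merge step is usually delegated to a federated query engine rather than hand-written, but the cost of moving and combining the data remains.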