Presentation: DataOps – When DevOps meets Data Processing

Data strategies for cloud-based or hybrid approaches

Data is often considered the most valuable resource in the modern world, since every industrial sector and business function runs on data of some kind. Before people or machines can use and analyse that data, it must pass through a series of processing steps, because raw data is often not human-readable, or is so unstructured that even machines cannot interpret it correctly. The effectiveness of industrial and business operations and management therefore depends on the quality of those processing steps. This raises the question of how to assure data processing quality in the face of unexpected events, such as a larger volume of data arriving than anticipated, or computing resources suddenly becoming unavailable. In addition, the scripts used to develop and deploy data processing can cause serious incidents if engineers make irreversible mistakes.

Cloud computing and the DevOps methodology have risen to prominence in recent years. Cloud computing offers computing resources that start up quickly and are flexible, scalable, and elastic according to demand, which enhances business operations, while DevOps gives engineers automation for software development and deployment, reducing human effort and mistakes.

As a result, with the help of cloud computing and the DevOps methodology, data processing can be designed and implemented efficiently and economically. This gave rise to the term DataOps.

In the presentation, we will walk you through:

  • The concept of DataOps
  • Automated provisioning of the cloud resources required for data processing, using IaC (Infrastructure as Code) and CI/CD
    • Tech stack: Terraform and Azure DevOps
  • Automation of data processing with data pipelines
    • Tech stack: Azure Data Factory

Hung Pham, Datics-Consulting