Blog

Azure Data Factory Interview Questions and Answers

1.    What is Azure Data Factory? Azure Data Factory is a cloud-based ETL/ELT tool: an integration service offered by Microsoft that lets you create data-driven workflows for orchestrating and automating data movement and data transformation in the cloud. Data Factory also lets you create and run data pipelines that move
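To make the ETL idea concrete, here is a minimal local sketch of the three stages a pipeline orchestrates: extract from a source, transform, and load into a sink. It is an analogy only, not Azure Data Factory's API: the in-memory CSV string stands in for a file in cloud storage, an in-memory SQLite table stands in for the sink, and the column names and cleaning rules are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a CSV file in cloud storage.
RAW_CSV = """order_id,amount,country
1,10.50,us
2,,de
3,7.25,us
"""

def extract(text):
    """Extract: read rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: skip rows with a missing amount, cast types, upper-case country codes."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # incomplete record, filtered out
        cleaned.append((int(row["order_id"]), float(row["amount"]), row["country"].upper()))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a sink table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

def run_pipeline(text, conn):
    """Chain the three stages, the way a pipeline chains its activities."""
    load(transform(extract(text)), conn)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(RAW_CSV, conn)
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

In an ELT variant, the raw rows would be loaded first and the transform step would run inside the destination system instead.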

Transform data using mapping data flows

In this post, we’ll use the Azure Data Factory user interface (UX) to create a pipeline that copies and transforms data from an Azure Data Lake Storage (ADLS) Gen2 source to an ADLS Gen2 sink using mapping data flow. The configuration pattern in this tutorial can be expanded upon when transforming data using mapping data

Mapping data flows in Azure Data Factory

What are mapping data flows? Mapping data flows are visually designed data transformations in Azure Data Factory. Data flows allow data engineers to develop data transformation logic without writing code. The resulting data flows are executed as activities within Azure Data Factory pipelines that use scaled-out Apache Spark clusters. Data flow activities can be operationalized

Data Factory – Move files to time folder structure

Scenario: I have a folder full of CSV files. Each file name is the date for which the file contains data (e.g. 2021-10-01T00:00:00Z). I want to organize these files into folders with a time hierarchy. That is, the top-level folders will be years (2022, 2023, …), the second level will be the months of that year (1, 2, 3, … 12),
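The core of this scenario is mapping each date-named file to a year/month folder path. The Python below sketches only that path computation under stated assumptions, not an Azure Data Factory pipeline: it assumes names follow the `YYYY-MM-DDTHH:MM:SSZ` pattern (with or without a `.csv` extension), and the `target_path` function and `output` root folder are hypothetical. Since the scenario text is cut off after the month level, the sketch stops there; further levels would be appended the same way.

```python
from datetime import datetime
from pathlib import PurePosixPath

def target_path(file_name, root="output"):
    """Map a date-named file such as '2021-10-01T00:00:00Z.csv' to root/year/month/file_name."""
    stem = file_name.rsplit(".", 1)[0]  # strip the extension if there is one
    ts = datetime.strptime(stem, "%Y-%m-%dT%H:%M:%SZ")
    # Months are written without zero padding (1..12), matching the scenario.
    return str(PurePosixPath(root, str(ts.year), str(ts.month), file_name))

if __name__ == "__main__":
    for name in ["2021-10-01T00:00:00Z.csv", "2022-01-15T00:00:00Z.csv"]:
        print(name, "->", target_path(name))
```

Parsing the name into a real `datetime` (rather than slicing the string) rejects malformed names early with a `ValueError`.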

Azure Data Engineering Questions and Answers – 2023

1.    What is Data Engineering? Data Engineering is a field within the broader domain of data management that focuses on designing, building, and maintaining systems and infrastructure to support the collection, storage, processing, and analysis of large volumes of data. It plays a crucial role in the data lifecycle, ensuring that data is properly ingested,

Databricks interview questions and answers – 2023

1. What is Databricks? Databricks provides a collaborative and interactive workspace that allows data engineers, data scientists, and analysts to work together on big data projects. It supports various programming languages, including Scala, Python, R, and SQL, making it accessible to a wide range of users. The platform’s core component is Apache Spark, which

SQL Interview Questions and answers – 2021

1. What is a DBMS? A Database Management System (DBMS) is a program that controls the creation, maintenance, and use of a database. A DBMS can be thought of as a file manager that manages data in a database rather than saving it in file systems. It is a software application that interacts with the user, applications, and the
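As a small illustration of "creation, maintenance and use", the sketch below uses SQLite through Python's built-in `sqlite3` module as an example DBMS. The `employees` table and its rows are hypothetical; the point is that the DBMS, not the application, handles storage, constraints, and querying.

```python
import sqlite3

def demo():
    # Creation: an in-memory database with a table whose schema the DBMS enforces.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT)")
    # Maintenance: insert rows; the DBMS assigns the primary keys.
    conn.executemany(
        "INSERT INTO employees (name, dept) VALUES (?, ?)",
        [("Asha", "Data"), ("Ben", "Data"), ("Chen", "Sales")],
    )
    conn.commit()
    # Use: a declarative query; the DBMS decides how to scan and group the data.
    rows = conn.execute(
        "SELECT dept, COUNT(*) FROM employees GROUP BY dept ORDER BY dept"
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(demo())
```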

Questions and answers for dimensionality reduction

1. What is dimensionality reduction? When a dataset has many input features, the model is more likely to overfit. To reduce the input feature space, we can either drop features or extract new ones; this is dimensionality reduction. Now let’s discuss both techniques in more detail. Drop irrelevant, redundant features as they do not contribute to
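The "drop features" branch can be sketched in a few lines of plain Python. This is a simplified illustration under assumptions, not a standard library routine: it treats a (near-)constant feature as irrelevant and an exact copy of an already-kept feature as redundant. The `drop_features` function, the `min_variance` threshold, and the sample columns are hypothetical; real pipelines often use correlation thresholds instead of exact equality, and feature extraction (for example PCA) is not shown here.

```python
from statistics import pvariance

def drop_features(columns, min_variance=0.0):
    """Return the names of features to keep.

    `columns` maps feature name -> list of numeric values. A feature is dropped if its
    population variance is at most `min_variance`, or if it exactly duplicates a feature
    that has already been kept.
    """
    kept = {}
    for name, values in columns.items():
        if pvariance(values) <= min_variance:
            continue  # (near-)constant: carries little information
        if any(values == other for other in kept.values()):
            continue  # exact duplicate of a kept feature: redundant
        kept[name] = values
    return list(kept)

if __name__ == "__main__":
    data = {
        "age": [25, 32, 47, 51],
        "country_code": [1, 1, 1, 1],  # constant column
        "age_copy": [25, 32, 47, 51],  # duplicate of "age"
        "income": [40, 52, 61, 80],
    }
    print(drop_features(data))
```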