Quest for Excellence

Sunday, June 25, 2023

Migrating Azure Data Factory (ADF) to its AWS Glue equivalent

Migrating Azure Data Factory (ADF) to its AWS equivalent involves a series of steps to ensure a smooth transition of your data integration and orchestration workflows. The AWS service that is often considered as an equivalent to Azure Data Factory is AWS Glue. Here's a high-level overview of the migration process:

Assess Your ADF Workflows: Start by reviewing your existing ADF workflows and identifying their components, such as pipelines, activities, datasets, and linked services. This assessment will help you understand the complexity and dependencies of your workflows.
Recreate Datasets and Linked Services: AWS Glue has no objects called datasets or linked services. The closest equivalents are tables in the Glue Data Catalog (for ADF datasets) and Glue connections (for ADF linked services). Recreate your datasets as Data Catalog tables, ensuring the necessary permissions and access rights are set up. For linked services, create equivalent Glue connections to your data sources and destinations.
Migrate Pipelines: Analyze your ADF pipelines and rewrite them using AWS Glue. AWS Glue provides a serverless ETL (Extract, Transform, Load) service, which can be used to build and manage your data workflows. Reconstruct your ADF pipelines in AWS Glue using its workflow features and components.
Configure Data Transformations: Identify the data transformation activities within your ADF pipelines and implement them using AWS Glue features. AWS Glue offers a range of transformation capabilities, including data mapping, data conversions, filtering, aggregations, and more. Adapt your ADF data transformations to use AWS Glue transformations.
Set Up Scheduling and Triggers: In ADF, you might have scheduled pipelines or triggers based on events. Configure the equivalent scheduling and triggering mechanisms in AWS Glue. AWS Glue provides options to schedule workflows based on cron expressions or event-driven triggers.
Validate and Test: Thoroughly test your migrated workflows in AWS Glue to ensure they function as expected. Verify data integrity, transformation accuracy, and pipeline performance. Test the monitoring and logging capabilities to ensure proper visibility into your workflows.
Adjust Security and Permissions: Set up appropriate security measures in AWS Glue, including access controls, authentication, and encryption. Ensure the necessary IAM (Identity and Access Management) roles and policies are in place to control data access and management.
Migrate Monitoring and Alerting: If you have monitoring and alerting systems in place for ADF, identify the corresponding services in AWS to migrate these capabilities. Amazon CloudWatch (metrics, logs, and alarms) and AWS CloudTrail (API activity auditing) are common options for monitoring and logging in the AWS ecosystem.
Plan Data Migration: Consider the data migration process for your source and destination data stores. AWS provides various services like AWS Database Migration Service (DMS) for databases and AWS Snowball for large-scale data transfer. Evaluate the most appropriate method for migrating your data from Azure to AWS.
Gradual Transition: Plan for a phased migration approach where you gradually move pipelines from Azure Data Factory to AWS Glue. Start with less critical or non-production pipelines and iteratively migrate the more complex workflows.
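The assessment step above can be sketched as a small inventory script. This is a minimal sketch, assuming your pipelines have been exported as JSON in the usual ADF layout (a `properties.activities` list whose entries carry `name`, `type`, and `dependsOn`). The sample pipeline, the function name `inventory_pipeline`, and the `GLUE_HINTS` mapping are illustrative assumptions, not an official ADF-to-Glue mapping.

```python
import json
from collections import Counter

# Hypothetical, hand-written sample shaped like an exported ADF pipeline definition.
SAMPLE_PIPELINE = """
{
  "name": "CopySalesData",
  "properties": {
    "activities": [
      {"name": "CopyFromBlob", "type": "Copy", "dependsOn": []},
      {"name": "CleanRows", "type": "ExecuteDataFlow",
       "dependsOn": [{"activity": "CopyFromBlob",
                      "dependencyConditions": ["Succeeded"]}]},
      {"name": "NotifyTeam", "type": "WebActivity",
       "dependsOn": [{"activity": "CleanRows",
                      "dependencyConditions": ["Succeeded"]}]}
    ]
  }
}
"""

# Assumed starting points for discussion only; verify each against the AWS Glue docs.
GLUE_HINTS = {
    "Copy": "Glue job (read source, write target)",
    "ExecuteDataFlow": "Glue job with transformations",
}


def inventory_pipeline(pipeline_json):
    """Summarize one pipeline: activity types, dependencies, and unmapped types."""
    pipeline = json.loads(pipeline_json)
    activities = pipeline.get("properties", {}).get("activities", [])

    # Count how many activities of each type the pipeline uses.
    type_counts = Counter(activity["type"] for activity in activities)

    # Record which activities each activity waits on.
    dependencies = {
        activity["name"]: [dep["activity"] for dep in activity.get("dependsOn", [])]
        for activity in activities
    }

    # Activity types without a hint need a manual design decision.
    needs_review = sorted(t for t in type_counts if t not in GLUE_HINTS)

    return {
        "name": pipeline.get("name"),
        "type_counts": dict(type_counts),
        "dependencies": dependencies,
        "needs_review": needs_review,
    }


if __name__ == "__main__":
    print(json.dumps(inventory_pipeline(SAMPLE_PIPELINE), indent=2))
```

Running a script like this over every exported pipeline gives a rough picture of complexity: pipelines with many activity types under `needs_review` are good candidates to migrate later in a phased transition.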

It's important to note that while AWS Glue is often considered an equivalent service to Azure Data Factory, there might be differences in features and capabilities. Therefore, it's recommended to consult the AWS Glue documentation and consider any specific requirements of your data integration workflows during the migration process.

Saturday, June 24, 2023

Steps to migrate from Azure to AWS

Here are the steps to migrate from Azure to AWS:

Plan your migration. This includes identifying the resources you want to migrate, assessing their readiness, and developing a migration plan.
Prepare your Azure environment. This includes decommissioning any unused resources, updating your DNS records, and creating a staging environment for your migrated resources.
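Migrate your resources. There are a number of different ways to migrate resources from Azure to AWS, including using AWS Server Migration Service (SMS), third-party tools, or manual migration. Note that AWS discontinued SMS in March 2022 and recommends AWS Application Migration Service (MGN) as its replacement.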
Test your migrated resources. Once your resources have been migrated, you need to test them to make sure they are working properly.
Decommission your Azure environment. Once you have confirmed that your migrated resources are working properly, you can decommission your Azure environment.

Here are some additional tips for migrating from Azure to AWS:

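Use an AWS migration service to simplify your migration. AWS Server Migration Service (SMS) was the managed option for this; its successor, AWS Application Migration Service (MGN), automates rehosting of servers running on premises or in other clouds such as Azure.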
Use a third-party tool to migrate your resources. There are a number of third-party tools that can help you migrate your resources from Azure to AWS.
Manually migrate your resources. If you have a small number of resources, you may want to manually migrate them.
Test your migrated resources thoroughly. It is important to test your migrated resources thoroughly to make sure they are working properly.
Decommission your Azure environment once you have confirmed that your migrated resources are working properly.

Migrating from Azure to AWS can be a complex process, but it can be made easier by following these steps and tips.

There are many third-party tools that can help you migrate your resources from Azure to AWS. Some of the most popular tools include:

CloudFuze: CloudFuze is a cloud-to-cloud migration service that supports the migration of a wide range of resources, including files, folders, databases, applications, and virtual machines.
CloudMover: CloudMover is a cloud migration tool that provides a drag-and-drop interface for migrating resources between different cloud providers.
CloudEndure: CloudEndure, acquired by AWS in 2019, is a cloud migration tool that provides continuous replication of your workloads to AWS, so that you can fail over to AWS in the event of an outage.
AvePoint CloudMover: AvePoint CloudMover is a cloud migration tool that supports the migration of a wide range of resources, including files, folders, databases, applications, and virtual machines.
Quest tools: Quest provides a suite of tools for migrating resources to AWS, including Quest Migration Manager for AWS, Quest vSphere to AWS Converter, and Quest Sharegate for AWS.
Turbonomic: Turbonomic is a cloud migration and optimization platform that provides automated migration planning and workload placement recommendations. It can analyze your Azure resources and suggest the best migration strategy to AWS based on factors like performance, cost, and compliance.
RiverMeadow: RiverMeadow is a cloud migration tool that supports the migration of workloads from Azure to AWS. It offers features such as automated workload discovery, provisioning, and migration, making it easier to migrate applications and virtual machines.

These are just a few of the many third-party tools that can help you migrate your resources from Azure to AWS. When choosing a migration tool, it is important to consider the specific resources that you need to migrate, as well as your budget and timeline.

Migrating from Azure to AWS

There are several migration options to consider for a migration from Azure to AWS depending on factors such as

complexity of the application,
data transfer requirements,
downtime tolerance, and
overall migration strategy.

Migration Options:

Lift and Shift (Rehosting):

Lift and Shift involves migrating applications and infrastructure as-is from Azure to AWS without making significant changes.
You would replicate the existing Azure environment in AWS, provisioning equivalent resources like virtual machines (EC2 instances), storage, and networking components.
Tools like AWS Server Migration Service (SMS), since replaced by AWS Application Migration Service (MGN), or third-party migration tools can assist with automating the migration process.

Replatforming (Lift, Tinker, and Shift):

Replatforming involves making some modifications to the application architecture and leveraging native AWS services to optimize performance, scalability, and cost.
During this process, you identify the equivalent AWS services for the Azure services used and modify the application accordingly.
It might involve migrating databases to Amazon RDS, using AWS Lambda for serverless computing, or leveraging AWS Elastic Beanstalk for application hosting.

Repurchasing:

Repurchasing refers to replacing the existing Azure-based application with a different solution available in AWS.
It involves selecting and adopting a new application or software-as-a-service (SaaS) solution available in AWS.
This option requires a thorough evaluation of available AWS services and the impact on the business processes and data.

Refactoring (Re-architecting):

Refactoring involves significant changes to the application codebase, possibly rewriting parts of the application to leverage cloud-native services and technologies.
This option allows you to take advantage of AWS-specific services, such as AWS Lambda, Amazon DynamoDB, or Amazon S3.
It often requires a deeper understanding of the application architecture and development effort.

Hybrid Approach:

The hybrid approach involves maintaining a hybrid environment with some applications running in Azure and others in AWS.
This option allows for a phased migration where specific workloads or applications are moved based on priority or feasibility.
It may involve establishing secure connectivity between Azure and AWS using VPN or Direct Connect.

Each migration option has its pros and cons, and the choice depends on factors such as the complexity of the application, time constraints, budget, and long-term goals. It is advisable to conduct a thorough analysis, create a migration plan, and consider engaging with experienced migration specialists or consulting services to ensure a successful Azure to AWS migration.

Tuesday, April 26, 2016

How to find and delete duplicate records in Oracle Database

There are many ways you can do this. Below is one simple way to find values that occur more than once in a column:

-- table_name and column_name are placeholders; TABLE itself is a reserved word in Oracle.
select column_name, count(column_name)
from table_name
group by column_name
having count(column_name) > 1;

How to delete duplicate records in Oracle Database

-- Keep the row with the lowest rowid in each (col1, col2) group and delete the rest.
delete from table_name a
where a.rowid > any (
    select b.rowid
    from table_name b
    where a.col1 = b.col1
      and a.col2 = b.col2
);
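To try the find-then-delete idea without an Oracle instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite is swapped in only for convenience: it also exposes a rowid, but it does not support `> ANY`, so the delete compares each row against `MIN(rowid)` of its group, which keeps the same row (the lowest rowid). The table name `demo`, the sample rows, and the function names are hypothetical.

```python
import sqlite3


def build_demo():
    """Create an in-memory table with some duplicate (col1, col2) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo (col1 TEXT, col2 INTEGER)")
    conn.executemany(
        "INSERT INTO demo (col1, col2) VALUES (?, ?)",
        [("a", 1), ("a", 1), ("b", 2), ("a", 1), ("c", 3), ("b", 2)],
    )
    conn.commit()
    return conn


def find_duplicates(conn):
    """Return (col1, col2, count) for every group that occurs more than once."""
    return conn.execute(
        """
        SELECT col1, col2, COUNT(*)
        FROM demo
        GROUP BY col1, col2
        HAVING COUNT(*) > 1
        ORDER BY col1, col2
        """
    ).fetchall()


def delete_duplicates(conn):
    """Delete every row except the lowest-rowid row of each (col1, col2) group."""
    cursor = conn.execute(
        """
        DELETE FROM demo
        WHERE rowid > (
            SELECT MIN(b.rowid)
            FROM demo b
            WHERE b.col1 = demo.col1
              AND b.col2 = demo.col2
        )
        """
    )
    conn.commit()
    return cursor.rowcount  # number of rows removed


if __name__ == "__main__":
    conn = build_demo()
    print("duplicates before:", find_duplicates(conn))
    print("rows deleted:", delete_duplicates(conn))
    print("duplicates after:", find_duplicates(conn))
```

Running the find query before and after the delete is a cheap sanity check. On a real table, run the delete inside a transaction and review the row count before committing.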

Tuesday, April 12, 2016

Is Domain Driven Design (DDD) useful to your projects?

DDD is not for everyone and certainly not for all projects. The following criteria can help you decide which projects will benefit most from it:
1) You need a complex core domain that will be invested in over time
2) An iterative development process
3) Access to domain experts
4) Solid design principles required for refactoring
5) Sharp design sense
6) A focussed, motivated and experienced team

You need disciplined developers who are willing to work with domain experts and understand the business, rather than worry about how they can wedge the latest Angular framework or tools into a project.

Tuesday, September 23, 2014

Spike Story

Spikes, an invention of Extreme Programming (XP), are a special type of story used to drive out risk and uncertainty in a user story or other project facet.

A spike is a story aimed at answering a question or gathering information, rather than at producing a shippable product. Sometimes the development team cannot estimate a story without doing some actual work to resolve a technical question or a design problem. So we create a spike story whose purpose is to provide the answer or solution. Like any other story, the spike is then given an estimate and included in the sprint backlog, and the outcome is demonstrated at the end of the iteration.

A spike story could include activities such as research, design, exploration and prototyping. The purpose could be

1) To gain the knowledge necessary to reduce the risk of a technical approach.

2) To get a better understanding of the requirement.

3) To increase the reliability of a story estimate for technically or functionally complex features.

There can be two types of spike stories

1) Technical - To determine the feasibility and impact of design strategies.

2) Functional - To analyze the aggregate functional behavior and to determine how to break it down, how it might be organized, and where risk and complexity exist, in turn influencing implementation decisions.

Since spikes do not directly deliver value to the user, they should be used only rarely.

We should be able to estimate (time-box) a spike, and the result (answer, solution, prototype) should be something that can be demonstrated by the development team and is acceptable to the product owners.

It should be reserved for more critical and larger unknowns only.

Do not plan the spike story and its implementation story in the same sprint. If you think it is simple enough to do both, then it is probably not a spike story, since every story inherently has some unknowns that are discovered while implementing it.

Saturday, July 5, 2014

Technical Debt

Technical debt, also known as design debt or code debt, is a metaphor referring to the eventual consequences of poor system design. Every project is going to have some form of technical debt no matter what, and there are several ways you get there. You can create technical debt advertently or inadvertently.

I know but I need to ship it soon

Advertently created technical debt is debt you know you are creating. One very common argument is the need to ship the product quickly in order to get to the market first. If we don't ship our product in time, there is a risk of losing the market to the competition. So we pay more attention to getting the product out early and compromise on quality, assuming that once it has shipped we will get back to getting the quality right. There are circumstances where this is a valid argument, and a good development team will have a plan to reduce the debt once the product is released.

I know but I don't care

There is also the case where applications are built without paying attention to design or quality. The team knows that it is not right and that debt is building up, but could not care less. The common arguments for this approach are "We don't have time for all this", "Our product owner will not allow this", "We have to get this done even if quality is compromised", "This is how we have been doing it", "If it ain't broken why fix it", and so on. This is commonly seen in projects that have started using Scrum. The goal is to go faster and faster, or simply to get things done. The wrong metric is used to measure progress: the focus is on burndown charts and micromanagement of tasks, and most likely the work done at the end of the sprint is not release-ready without a hardening or testing sprint. This is a very naive approach in which the team does not understand that doing agile or Scrum does not by itself bring agility. The true value of agile comes when you are able to ship a product with quality at the end of a sprint. If you can't do that, you are doing it wrong. If you can't ship your product with quality at the end of a sprint, typically one week and at most four weeks long, then you are not AGILE. Sadly, this is the most commonly seen way of creating debt.

I don't know what I am doing

There are also inadvertent ways in which you can build debt into your code. One is when a relatively inexperienced team, trying its best, unknowingly creates a mess. Teams new to Test Driven Development tend to focus on the basic aspects of TDD and do not fully use its design insights to refactor code to patterns. The same issue arises when the wrong pattern is applied to a problem without anyone realizing it. These teams probably need training and a technical mentor, and should probably adopt pair programming and code review sessions to come up with better solutions. Not recognizing the maturity level of a team, and not providing appropriate training and mentorship, can lead it down the wrong path. By the time the team realizes how much technical debt it has acquired, it will probably be too late.

I didn't know how I was supposed to do it at the time

One more pattern of creating inadvertent debt occurs when the business domain itself is so complex that the team's evolving understanding of the business is not reflected in the code. This is the typical case where the team feels "If I knew what I know now at the beginning of the project I probably would have built it in a different way". Most projects, even with a lot of talent and good practices, can still get into technical debt this way because of the complexity of the domain and gaps in understanding the requirements. This can also happen if the requirements changed over time for the right reasons.

To sum up, no matter what you do, technical debt is going to build up in the process, and you need a strategy to deal with it. The best way to manage technical debt is to keep your design flexible to change and to have a suite of tests covering your features and code, so that you can constantly refactor your code and design to reflect your current understanding of the system and tools.

Good pragmatic practices are a good way to go: Acceptance Test Driven Development, Domain Driven Design, Test Driven Development, constant refactoring, and appropriate use of all the feedback loops to build a culture of learning and reacting to change.
The only people who know and see technical debt are the folks who write the code. It's very difficult to convince anyone outside the development team. The only way to get rid of technical debt is to constantly refactor code while developing it in a smart way. It is the responsibility of the developer to keep the code base clean and maintainable by applying all the pragmatic practices suitable to the project situation. Caving in to arguments that do not keep the code base clean is irresponsible and unprofessional, because essentially you are making your product owner pay for that irresponsibility when adding features in the future costs more.

Recent Speaker Events by Benoy John

16/04/2016 Business Driven Development - www.TwinCitiesCodeCamp
12/05/2015 Introduction to Domain Driven Design -Iowa Code Camp Ankeny
11/02/2014 Domain Driven Design Building Blocks Entity,Value Objects,Aggregates - Iowa Code Camp Ankeny
10/04/2014 Acceptance Test Driven Development - www.TwinCitiesCodeCamp.com
07/28/2014 QACoP Lunch and Learn - Acceptance Test Driven Development
07/19/2014 Domain Driven Design - Iowa Code Camp Coralville, Iowa
10/02/2013 Collaboration Driven Development - Fall Iowa Code Camp Ankeny, IA
