How we saved 20% EC2 by migrating to AWS R6i instances

In the competitive landscape of cloud services, cost optimization is a perpetual goal for site reliability engineering (SRE) teams. Our recent effort involved a strategic shift in production workload from AWS R5 instances to a more cost-effective and performant solution. This blog post delves into our journey, the rigorous analysis, the challenges faced, and the substantial cost savings and performance gains we achieved by migrating to AWS R6i instances.

The selection process

Our journey began with a critical assessment of our existing infrastructure. We were running hundreds of AWS R5 instances to support our workload. While the R5 instances had served us well, we recognized that evolving technologies and pricing models might present better options. Looking ahead, and considering our anticipated growth in traffic based on past trends, we aimed to optimize both the cost and performance of our application. This required finding a solution that was not only cost-effective but also scalable and capable of handling a more intensive workload. 

With our application’s high CPU and memory demands in mind, we set out to identify instance types that would meet these specific needs. Our attention was drawn to the AWS Graviton instances, specifically the R6g and R7g, which offered a promising price-to-performance ratio due to their ARM-based architecture. We also considered the latest Intel instances, the R6i series, renowned for their improved performance at a comparable cost.

The selection process among these instances was thorough and data-driven. We established a series of benchmarks and load tests to emulate the most challenging aspects of our production environment. These tests were meticulously designed to push the instances to their limits and to compare their capabilities side by side. We concentrated on CPU-intensive tasks such as HTML parsing to establish a baseline performance metric, and we tested various API request patterns that our Rails application typically processes to ensure a comprehensive comparison.

Diving deep into load test results: Why R6i stood out

The load test results were eye-opening. The R6i instances demonstrated a clear advantage in CPU-intensive operations over both the Graviton and the R5 instances (consistent with the claims in the AWS docs). For instance, a single-threaded HTML parsing test with 10,000 iterations showed that the R6i instances completed the work significantly faster across various Ruby versions.
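
Our actual benchmarks were written in Ruby against our Rails code paths; the Python sketch below only illustrates the shape of such a single-threaded, fixed-iteration CPU benchmark. The workload and labels are placeholders, and the idea is simply to run the identical script on each candidate instance type and compare elapsed times side by side.

import time

ITERATIONS = 10_000

def cpu_intensive_task():
    # Placeholder workload; the real tests parsed a large HTML document on
    # every iteration using a Ruby HTML parser.
    html = "<html><body>" + "<div><p>hello</p></div>" * 500 + "</body></html>"
    return html.count("<div>") + html.count("</p>")

def run_benchmark(label):
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        cpu_intensive_task()
    elapsed = time.perf_counter() - start
    print(f"{label}: {ITERATIONS} iterations in {elapsed:.2f}s")

if __name__ == "__main__":
    # Run the same script on R5, R6g/R7g, and R6i instances and compare the output.
    run_benchmark("single-threaded CPU benchmark")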

Note: The HTML parsing test compares the performance of the Graviton and Intel instances on a CPU-intensive operation across various Ruby versions.

During the read-path tests of our Rails application (GET and LIST APIs), the R6i instances continued to impress. In a 30-minute test involving nine concurrent users and eight distinct URLs, the R6i instances exhibited lower latencies and higher throughput than the R5 and Graviton instances. The P90, P95, and P99 latencies were consistently better, and the average CPU utilization was lower, indicating more efficient use of resources.

The create/update API tests further solidified R6i's lead. With 20 concurrent users generating a maximum of 400 requests per minute, the R6i instances not only had lower latencies but also maintained higher throughput and lower CPU utilization than the other instance types.

The baseline of 400 requests per minute from 20 concurrent users for the defined test cases was finalized and replicated on the other instance types because that was the breaking point on R5, where CPU usage started consuming a full core.

The background job-performance tests, which observed one of our busiest background jobs (real-time Sidekiq processing) under load, showed that the R6i instances had the highest throughput and the lowest latency and CPU usage. This testing was critical for ensuring that background processes did not become a bottleneck in our system.

The 30-minute test showed that R6i was better in all aspects. We did notice fluctuations in average memory usage caused by garbage collection (GC) frequency, but these evened out in our three-hour test, which included more GC cycles.

Preparing for a multi-architecture setup

While we ultimately selected the R6i instances for their superior performance and cost savings, we first had to make our main application and the other components of our infrastructure ARM-compatible in order to run the Graviton load tests.

We began by addressing compatibility within our Rails application. This process involved identifying and updating gems that were not ARM-compatible. We made strategic changes, such as switching to the ‘cld’ gem for language detection, upgrading ‘curb’ for HTTP requests, replacing ‘therubyracer’ with Node.js for JavaScript runtime, and patching ‘authlogic’ to use an updated ‘scrypt’ gem.

Our Docker image, based on an older ‘amazonlinux’ version, did not support ARM, prompting us to upgrade to ‘amazonlinux:2’. We replaced specific compilers and manually installed tools like ‘tar’, ‘gzip’, and ‘make’ that were not pre-installed. We worked through a series of binary replacements and configuration changes, such as upgrading ‘exiftool’, switching to ‘protobuf-devel.aarch64’, and modifying Nginx installation options.

Then we built the app in an ARM-based instance. Building the Docker image on a Graviton instance involved setting up Docker and Git, cloning the necessary repositories, and executing a Docker build with the appropriate arguments. We verified the functionality by running the Docker container and making successful curl requests to our application endpoints.

The Kubernetes cluster setup required us to ensure that all dependencies—including sidecars, daemonsets, and additional deployments—were ARM-compatible. We built Docker images for each dependency, updating Dockerfiles with ARM-compatible binaries as needed. This process also involved upgrading base images and installing ARM-compatible versions of required binaries and development packages. Apart from the main app, we had over 30 Docker images to convert to ARM compatibility by following the same process.

Conclusion

Our migration to AWS R6i instances was a calculated, data-driven decision that resulted in a 20% annual cost savings and a 15-20% improvement in overall latency. 

AWS metrics: before (with R5) vs. after (with R6i)

You can see considerable improvements in all aspects of the infrastructure.

The transition not only reflects our commitment to efficiency and performance but also underscores the importance of regular infrastructure reviews. We recommend that readers adopt a proactive approach to their infrastructure management. It’s crucial to evaluate usage patterns and stay informed about new options in the market. We suggest conducting a check at least once a quarter to ensure that your infrastructure aligns with your current needs and takes advantage of the latest technological advancements. 

In sharing our story, we hope to inspire others to closely examine their setups, run tests, and be open to making changes that could lead to better performance and savings. Remember, the cloud landscape is ever-evolving, and staying ahead is crucial.

Migrating Jenkins master from AWS OpsWorks to Amazon EKS

What is Jenkins?

Jenkins is a popular open-source automation server that is widely used for continuous integration (CI) and continuous deployment (CD) in software development pipelines. In a Jenkins setup, multiple build agents can work in parallel to build and test code, build artifacts, generate reports, and deploy applications.

Introduction

Let’s walk through the step-by-step process of migrating Jenkins master from AWS OpsWorks to Amazon EKS. The transition from OpsWorks, a managed configuration service, to EKS, a Kubernetes-based container orchestration platform, promises enhanced scalability and flexibility for the Jenkins infrastructure.

We have migrated 35 Jenkins masters from OpsWorks to EKS. We have also migrated 24 slaves from different sources like OpsWorks, Spot group, Nomad, EC2, and ECS into EKS.

Why we chose EKS over OpsWorks

We decided to move our Jenkins services from OpsWorks to Amazon Elastic Kubernetes Service (EKS) before AWS announced the end of support for OpsWorks stacks. We chose EKS because it is a managed Kubernetes service provided by AWS that offers multiple features, including:

  1. Managed Kubernetes control plane: Amazon EKS fully manages the Kubernetes control plane, including the API server and etcd, ensuring high availability and scalability
  2. Compatibility: EKS is certified Kubernetes conformant, meaning it is compatible with existing Kubernetes applications and tools
  3. Automatic updates: EKS provides automated updates for the Kubernetes control plane, making it easier to stay up to date with the latest features and security patches
  4. Integrated with AWS services: EKS seamlessly integrates with other AWS services such as Elastic Load Balancing (ELB), Amazon Relational Database Service (RDS), Amazon S3, and more
  5. Multi-AZ and high availability: EKS supports deploying clusters across multiple availability zones (AZs) for high availability and fault tolerance
  6. Security and compliance: EKS integrates with AWS Identity and Access Management (IAM) for fine-grained access control and supports Kubernetes role-based access control (RBAC). It also helps in meeting regulatory compliance requirements
  7. Spot instances support: EKS allows you to use EC2 Spot instances as worker nodes, reducing costs for fault-tolerant and flexible workloads
  8. VPC networking: EKS integrates with Amazon Virtual Private Cloud (VPC), allowing use of VPC networking features, including private networking and security group controls
  9. Logging and monitoring: EKS integrates with AWS CloudWatch for logging and monitoring Kubernetes applications and infrastructure

Prerequisites

Before proceeding with the migration journey, we need to ensure that we have the following prerequisites in place:

  • Access to the AWS Management Console
  • A backup of critical Jenkins configurations, jobs, and data
  • Kubectl and K9s command line tools
  • GitHub to store the YAML files of the Jenkins master and Argo CD for deployment

1. Assess current Jenkins configuration

  • Initially, our Jenkins setup was on OpsWorks, where Jenkins masters were segregated into different layers. Each layer had a different set of EC2 nodes and Elastic Block Store (EBS) volumes sized according to the usage of the Jenkins master
  • All these Jenkins masters have a common application load balancer (ALB) through which path-based routing was enabled
  • Our OpsWorks-hosted Jenkins architecture:

2. Set up Amazon EKS cluster

  • Create a cluster in Amazon EKS
  • Map the Elastic File System (EFS) to be the base storage for the cluster
  • In EKS, Jenkins masters are hosted as deployments where each team has a separate namespace. Resources of the master are shared across the entire EKS cluster
  • We are using Argo CD for deploying all the different Jenkins masters, and the Jenkins masters’ YAML files are stored in a Git repo
  • We have a dedicated Argo CD app where we can update the resources or any configuration changes for a specific Jenkins master

EKS Jenkins architecture: 

3. Gather Jenkins public image from Docker repository

  • Gather the necessary Docker image from Docker Hub that matches the current Jenkins version in use for each product

 

  • Tag and push the Jenkins Docker image to Amazon ECR. This avoids throttling from Docker Hub rate limiting on image pulls and lets us pull the images within the private network of our AWS account
  • Once an image is pushed to ECR, make use of that image in the kustomization file of the respective Jenkins master

4. Migrate Jenkins data

  • Export Jenkins job configurations, settings, and data from OpsWorks
  • Migrating the Jenkins data is a bit time-consuming. A couple of teams have Jenkins masters larger than 300 GB
  • There are different access points for different Jenkins masters. Mount them to the respective Jenkins master in OpsWorks and then initiate the data transfer

Note: Amazon EFS access points are application-specific entry points into an EFS file system that make it easier to manage application access

  • Once the data transfer is finished, remove conflicting files like jenkins.fingerprints.GlobalFingerprintConfiguration.xml and fingerprints folder. If those files are not removed, they tend to cause Jenkins master startup failure

5. Update DNS and networking

  • Update DNS records to point to the EKS Jenkins ALB
  • Use Ingress to set up the networking policies and security groups for the Jenkins masters
  • Use external DNS to enable the internal communication between the Jenkins master and slaves

6. Configure Jenkins on EKS

  • Deploy Jenkins masters on the EKS cluster using the Argo CD deployment tool
  • Argo CD will take care of configuring necessary Kubernetes resources such as ConfigMaps, secrets, pods, service, etc

7. Test and validate

  • Once the Jenkins master is ready, log in using one of your sign-in methods (we are using Google sign-in)
  • Conduct comprehensive testing to ensure that Jenkins jobs execute as expected
  • Verify integrations, plugins, and dependencies in the EKS environment

Technical aspects and best practices

  1. Plugin compatibility: Ensure that Jenkins plugins used in OpsWorks are compatible with the EKS environment. Update or replace plugins as needed
  2. Security and access controls: Review and update security policies, IAM roles, and access controls to align with EKS best practices
  3. Volume management and safe data transfer: We opted to make use of EFS volume as storage instead of EBS based on advantages such as elastic scalability, no pre-provisioning, cost efficiency for shared workloads, and regional and cross-AZ availability. For data transfer, mount the access point of EFS volume into the OpsWorks node and initiate the data transfer to keep the transfer process private and secure
  4. Version management: Version management and configuration changes are easy with EKS compared to OpsWorks
  5. Effective utilization of resources: EKS allows for more granular control over resource allocation. We can define resource requests and limits for Jenkins pods and optimize resource usage

Challenges faced during migration

  1. Data transfer: Data transfer is one of the more sensitive steps when dealing with high data volumes, such as 250 GB. We initiated the transfer/copy as a background process but still observed the process being killed intermittently; running it under the nohup command avoided this
  2. Downtime: Since we are not following the high-availability clusters model in our Jenkins environment, product teams faced certain downtime during the migration
  3. Slave connectivity issues: In OpsWorks, we enabled Java Network Launch Protocol (JNLP) connectivity for slaves using the node IP and Jenkins’ JNLP port 50000. Post-migration, we started observing connectivity issues since we cannot control which node IP a pod is assigned, so we switched to external DNS to enable slave connectivity
  4. Jenkins service abruptly restarting: Post-migration, we observed the Jenkins service abruptly restarting and affecting availability. While debugging, we found the issue was with the old data of fingerprints cached in the current setup. Post removal of jenkins.fingerprints.GlobalFingerprintConfiguration.xml and the fingerprints folder, the service ran without any issues
  5. Performance issue: Post-migration, we observed a couple of Jenkins masters taking longer to load, and job execution times increased significantly compared to the OpsWorks environment. During the analysis, we found that the EKS node instance types belonged to smaller families with less compute capacity. To overcome this, we created separate auto-scaling group (ASG) nodes with higher-end machines and moved the affected Jenkins masters to those nodes, which restored performance

Conclusion

Migrating your Jenkins master from AWS OpsWorks to Amazon EKS is a strategic move toward a more scalable and containerized infrastructure. By following the outlined steps and best practices, you can seamlessly transition your Jenkins environment and leverage the benefits offered by Kubernetes.

Modernizing analytics data ingestion pipeline from legacy engine to distributed processing engine

Analytics is a key part of all Freshworks products. It gathers events from these products and brings them together in one central analytics platform. The data is then transformed based on preset rules, ensuring it is organized by product and customer. Once the data is ready, we can use it for visualization and other tasks. This whole process happens continuously, and we promise that any event will land in our analytics system within 15 minutes of when it happens.

To make this happen, we have a persistent pipeline with several jobs: it pulls data, transforms it, groups it together, and loads it into a dedicated store at regular intervals.

As Freshworks’ products keep growing, we handle more and more data in Analytics. We process millions of messages per minute.

  • At the busiest times, we get about 800,000 messages per minute. After we transform and multiply them, they become about 1.3 million messages per minute.
  • When we temporarily move data from our products to Analytics, like for special projects, it can go up to 3 million messages per minute. This needs extra infrastructure to handle.

Managing a large volume of data in a short time is demanding. It poses challenges in terms of expenses, scale, and speed. This article narrates the journey of transitioning our analytics data platform from traditional high-scale data handling methods to a modern distributed and auto-scalable system.

Legacy system

The legacy system used the traditional Python consumers and API layer to ingest data into a time series database in real time. This was built about 8 years ago and has been through many scale and performance enhancements, but it preserved the core API. The system was able to handle the scale efficiently with horizontal scaling (adding more infrastructure) until its deprecation. But the downside was the maintenance and cost required to achieve scale.

Here’s how the legacy system operated:

  • A group of horizontally distributed Python consumers
  • Followed by a separate system built with Ruby on Rails that received data through an API in real time and batched into CSVs
  • A scheduler written in Apache Airflow handled the loading of CSV files into the target warehouse

Here are the key functions of the system:

  • Continuously receive messages from the Central Kafka
  • Identify and remove duplicate messages while transforming them in real time
  • Validate and enforce a specific schema and structure for the transformed messages in real time
  • Generate CSV files and upload them to designated S3 buckets at regular intervals
  • Load the CSV files into a central storage location (target warehouse) at scheduled intervals

Architecture of legacy pipeline

Components and their roles

Central Kafka system

The centralized Freshworks Apache Kafka service is called Central. All the products push their events to Central from where the downstream systems will consume (Analytics is one of the consumers).

Transformation and push consumers

This is the entry point of the analytics ingestion pipeline: a set of Python consumers (referred to as C1) that read messages from Kafka (Central) and apply the transformation rules configured for each payload type. The transformed messages are pushed to a success topic in Kafka, which holds all the transformed messages from C1. Each transformed message carries the necessary metadata, the target schema and table name, along with the data to be inserted.

Push consumers read the messages from the success topic and push (API call) the message to the reports platform endpoint. The transformed message is self-sufficient for the API layer to understand and continue the ingestion further.

Data ingestion and inlet layer

The data ingestion layer is the API gateway to which the push consumers push the events. The received events are validated against the metadata and checked for quality.

In the initial design, the API ingested data directly into the time series database. But whenever traffic surged, the MySQL time series database took the brunt of it: it faced more writes than usual, which degraded the overall write time. And because the ingestion API had to wait until each record was ingested into the time series database, the API's response time was impacted as well.

To avoid this scenario, a Sidekiq queue was introduced to decouple the API and time series and to do controlled insertion into the time series database. The API layer could scale independently with no impact on the successive layers.

The Inlet layer dequeued from the Sidekiq queue and inserted events into the time series database.

Time series database

  • The time series database was built in MySQL and stored the incremental data every day.
  • The entire event is stored in a single column as JSON along with columns such as tableID, AccountID, etc.
  • This was a daily table and we always stored up to five days of data for fallback and recovery in case of disaster.

Batching layer

This layer pulls batches of data from the time series database and creates the CSV files.

  • Every row in the time series database has an associated row_id. The audit tables maintain the last row ID for every table that got batched and pushed as CSV files into S3.
  • This layer gets the last_row_id from the audit tables and pulls the newer rows.
  • Creates CSV files with the required set of data.
  • Uploads the CSV files to S3 (a simplified sketch of this batching step follows below).
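
As a rough illustration of this audit-driven batching, here is a minimal Python sketch. The table names (batch_audit, events_daily), column names, S3 paths, and the DB-API connection passed in are all hypothetical stand-ins; the real implementation differs.

import csv
import boto3  # assumes AWS credentials are configured in the environment

def batch_table(conn, table_name, bucket):
    """Pull rows added since the last batch and ship them to S3 as CSV."""
    cur = conn.cursor()

    # Hypothetical audit table tracking the last row batched per table.
    cur.execute("SELECT last_row_id FROM batch_audit WHERE table_name = %s", (table_name,))
    last_row_id = cur.fetchone()[0]

    # Pull only the newer rows from the daily time series table.
    cur.execute(
        "SELECT row_id, account_id, payload FROM events_daily "
        "WHERE table_name = %s AND row_id > %s ORDER BY row_id",
        (table_name, last_row_id),
    )
    rows = cur.fetchall()
    if not rows:
        return

    csv_path = f"/tmp/{table_name}.csv"
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["row_id", "account_id", "payload"])
        writer.writerows(rows)

    # Upload the CSV for the Airflow loader to pick up.
    boto3.client("s3").upload_file(csv_path, bucket, f"batches/{table_name}.csv")

    # Record the new high-water mark in the audit table.
    cur.execute(
        "UPDATE batch_audit SET last_row_id = %s WHERE table_name = %s",
        (rows[-1][0], table_name),
    )
    conn.commit()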

Airflow and Snowflake

The Airflow layer is a typical scheduler that pulls the CSV files from S3 and loads them into the corresponding tables in Snowflake (an upsert in most cases). Snowflake is our reporting database, which stores the transformed data for all the tenants.

Challenges in the legacy pipeline

We were experiencing a continuous surge in traffic from our products to Analytics over time. This heightened influx imposed additional strain on the system. Also, our pipeline had to handle both the usual traffic and migration simultaneously whenever needed.

Therefore, in cases where we encountered a substantial migration with a throughput similar to that of our product’s regular traffic, we adopted a more gradual approach, utilizing limited resources. We did synchronize the data, but with a further delay. The delay ranged from four hours to 24 hours based on the volume and criticality of the migration. But for a critical migration with high throughput of approximately 2x, we had to double the infrastructure. And when the migration was complete, we had to decommission the new resources.

This process of scaling in and out came with challenges:

Scalability

  • The scaling process was carried out manually, often necessitating the ETL engineer’s input to gauge and adjust the scaling levels.
  • Kafka strictly maintains a one-to-one correspondence between partitions and consumer processes, so the highest achievable level of parallelism for a consumer group was determined by the number of partitions within a given topic.
  • When we introduced additional consumers, it was crucial to correspondingly increase the number of Inlet machines. This was done to prevent a rapid accumulation of data in Redis.
  • However, having more Inlet machines translated to an augmented number of connections on the RDS.
  • Furthermore, on days when scaling adjustments were made, the storage capacity of the RDS system depleted more quickly than on a typical day.
  • Both Redis and Sidekiq limited scaling. Additionally, the time series database presented a bottleneck in terms of scalability, and implementing sharding introduced additional overhead.

Maintenance

  • Managing multiple intermediary systems between Kafka and the target warehouse posed operational challenges.
  • Any service downtime within the entire pipeline resulted in overall system downtime, requiring significant on-call effort for maintenance.
  • The deployment process was complex due to the presence of multiple components.

Cost

  • The integration of numerous systems led to increased operational costs. Over the years, the overall system costs grew steadily, driven by the rising traffic from various products.
  • Any scale-out in the consumer system necessitated a corresponding scale-out in subsequent systems, resulting in a multiplication of the overall pipeline costs.

Solution

The solution is to replace the entire batching layer (highlighted in pink in the legacy architecture diagram above) with a single distributed streaming pipeline in Apache Spark. The internal details are covered in the next section.

New system

Our solution was to build a streaming pipeline in Apache Spark, replacing the legacy pipeline.

The streaming pipeline reads messages from Success Topic and does all the validation, schema enforcement, schema evolution, segregation, and batching. Finally, it creates CSV files and loads them into S3. The output file from the streaming pipeline is similar to the output files from the legacy pipeline.

The Spark pipeline covers all the functional and non-functional requirements in parity with the legacy pipeline. It also has an advantage when it comes to costs, ease of maintenance, performance, and scale.

Workflow steps

Consumption as DataFrame

The Spark application, triggered at predefined intervals, initiates the journey by reading data from Kafka in the form of DataFrames. The DataFrame at this point looks like this:

TopicName | Offset   | Partition | Value
Topic 1   | Offset 1 | 1         | abc
Topic 2   | Offset 2 | 10        | xyz1
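
A minimal PySpark sketch of this consumption step, assuming the spark-sql-kafka connector is available on the cluster; the broker addresses and topic name are placeholders.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("analytics-ingestion").getOrCreate()

# Read the transformed messages from the success topic as a streaming DataFrame.
raw_df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "success-topic")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka rows arrive with binary key/value columns; keep the metadata and
# decode the value into a string for downstream parsing.
events_df = raw_df.selectExpr(
    "topic AS TopicName",
    "offset AS Offset",
    "partition AS Partition",
    "CAST(value AS STRING) AS Value",
)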

The Value column holds the entire message. Since the Spark pipeline consumes from the Success Topic, the value column holds events published by the transformation consumers (C1 consumers). This transformed message holds all the necessary details, such as:

  • Target product (Freshdesk, Freshservice, CRM, etc.)
  • Target TableName

Thus the transformed message has sufficient information for the Spark pipeline to proceed.

The Spark application connects to the meta database and pulls the schema (table, column, mandatory keys, primary keys) of all the tables for the product.

Filtering DataFrame for each table and applying schema

The DataFrames undergo filtering to create filtered DataFrames tailored to specific tables; e.g., if we have 100 tables to process in a job, there will be 100 filtered DataFrames.

The next step is to assign schema to each of the filtered DataFrames based on the target product and table name.

The resultant DataFrame after this step looks identical to the target table. For example, the filtered DataFrame of the Tickets Table of Freshdesk looks like this:

AccountID | TicketID | Col1 | Col2
123456    | 8439     | tkt  | it
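
A hedged PySpark sketch of this filtering and schema application, continuing from the events_df above. The envelope field names (target_product, target_table, data) and the tickets schema are illustrative assumptions, not the actual message format.

from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, LongType, StringType

# Hypothetical envelope: the transformed message carries routing metadata
# plus the row data as a nested JSON string.
envelope_schema = StructType([
    StructField("target_product", StringType()),
    StructField("target_table", StringType()),
    StructField("data", StringType()),
])

# Hypothetical table schema pulled from the meta database.
tickets_schema = StructType([
    StructField("AccountID", LongType()),
    StructField("TicketID", LongType()),
    StructField("Col1", StringType()),
    StructField("Col2", StringType()),
])

def filtered_frame_for_table(events_df, product, table_name, table_schema):
    """Filter the micro-batch to one target table and shape it like that table."""
    parsed = events_df.withColumn("envelope", F.from_json("Value", envelope_schema))
    return (
        parsed
        .where(
            (F.col("envelope.target_product") == product)
            & (F.col("envelope.target_table") == table_name)
        )
        .withColumn("row", F.from_json("envelope.data", table_schema))
        .select("row.*")
    )

tickets_df = filtered_frame_for_table(events_df, "Freshdesk", "tickets", tickets_schema)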

File creation

  • The filtered DataFrame can be written as a file into the load bucket.
  • The output is neatly consolidated CSV/Parquet files for each table.
  • These files, in turn, serve as the direct input for loading data into their corresponding target tables.

Upsert using staging table

  • Since the incremental batch file has to be merged into the target table for every batch, we have an intermediate staging table to achieve the same.
  • The staging table acts as a temporary holding ground for new data, enabling efficient upsert operations against the final target table.
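
A sketch of this staging-table upsert, assuming the snowflake-connector-python client; the database, schema, table, stage, and column names are placeholders, and the real pipeline generates the statements per target table from the meta database.

import snowflake.connector  # assumes the snowflake-connector-python package

MERGE_SQL = """
MERGE INTO analytics.freshdesk.tickets AS target
USING analytics.freshdesk.tickets_staging AS staging
   ON target.AccountID = staging.AccountID
  AND target.TicketID = staging.TicketID
WHEN MATCHED THEN UPDATE SET target.Col1 = staging.Col1, target.Col2 = staging.Col2
WHEN NOT MATCHED THEN INSERT (AccountID, TicketID, Col1, Col2)
     VALUES (staging.AccountID, staging.TicketID, staging.Col1, staging.Col2)
"""

def upsert_batch(conn_params):
    conn = snowflake.connector.connect(**conn_params)
    try:
        cur = conn.cursor()
        # Load the incremental batch files into the staging table, merge them
        # into the target table, then clear the staging table for the next batch.
        cur.execute("COPY INTO analytics.freshdesk.tickets_staging FROM @analytics_stage/tickets/")
        cur.execute(MERGE_SQL)
        cur.execute("TRUNCATE TABLE analytics.freshdesk.tickets_staging")
    finally:
        conn.close()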

In addition to incorporating all the necessary features into the existing pipeline for parity, the new pipeline boasts several technical advantages:

  • The entire pipeline is driven by configuration settings. The addition, removal, or modification of an ingestion pipeline can be accomplished through configurations
  • Making changes to a table—whether adding, removing, or modifying—will not impact the pipeline and requires no alterations to the code. The pipeline seamlessly handles schema evolution for any modified or added tables.
  • Spark’s auto-scaling feature proves to be advantageous. Parameters for controlling auto-scaling can be selected and configured based on our specific requirements.
  • Manually scaling up or down a cluster is also straightforward and, once again, driven by configurations.
  • The need for Redis is eliminated, given that Spark functions as an in-memory compute engine.

Key features and implementation details

Schema evolution

The source schemas evolve in a few common ways:

  1. New features often require the addition of tables or columns
  2. When a column is removed at the source, a corresponding removal from the analytics schema is essential to maintain coherence
  3. Changes in data type from one form to another in the source system

To address this, we’ve implemented a schema evolution flow:

  1. Schema revision job: This job triggers the creation of a Redis key, timestamped with the current time (time.now), signifying the occurrence of a schema revision
  2. Spark integration: Before processing each microbatch, the Spark component checks the Redis key against a variable storing the previous timestamp. If a change is detected, it retrieves the updated metadata from the database.
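
A minimal sketch of the check performed before each micro-batch, assuming the redis-py client; the key name, connection details, and the load_metadata_from_db callback are hypothetical.

import time
import redis  # assumes the redis-py client

r = redis.Redis(host="redis-host", port=6379, decode_responses=True)  # placeholder connection

last_seen_revision = 0.0
table_metadata = {}

def refresh_metadata_if_revised(load_metadata_from_db):
    """Reload table schemas from the meta database only when a revision has happened."""
    global last_seen_revision, table_metadata
    value = r.get("analytics:schema_revision")  # hypothetical key set by the revision job
    revision_ts = float(value) if value else 0.0
    if revision_ts > last_seen_revision:
        table_metadata = load_metadata_from_db()
        last_seen_revision = revision_ts
    return table_metadata

# The schema revision job, for its part, simply stamps the key:
#   r.set("analytics:schema_revision", time.time())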

Observability

  • Analytics is a P0 system and observability is crucial to identify any issue or incident on time
  • Mean time to detect and mean time to resolve have to align with the commitment of the platform to the customer

Operational metrics

For observing the health, stability, and performance of the application, some of the key metrics are published to Trigmetry (central service for observability in Freshworks).

All the operational metrics are published to Trigmetry from the executor machines of the cluster at regular intervals to ensure continuous observability and to alert the engineers in case of anomalies. A sketch of this publishing step follows the list below.

  • Number of incoming events in a micro-batch
  • Number of messages processed
  • Number of rejected messages in transformation
  • Batch duration
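
As a sketch of how such metrics can be pulled from a running query, the snippet below reads Spark's lastProgress for the most recent micro-batch. The emit() callback stands in for whatever client actually ships values to Trigmetry, and the metric names are illustrative.

def publish_operational_metrics(query, emit):
    """Publish key micro-batch metrics from a running StreamingQuery handle."""
    progress = query.lastProgress  # dict describing the most recent micro-batch
    if not progress:
        return
    emit("incoming_events_per_batch", progress.get("numInputRows", 0))
    emit("processed_rows_per_second", progress.get("processedRowsPerSecond", 0))
    # Wall-clock time spent executing the trigger, in milliseconds.
    emit("batch_duration_ms", progress.get("durationMs", {}).get("triggerExecution", 0))
    # Rejected-message counts come from application-level counters kept during
    # transformation, not from Spark's progress object.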

Consumer lag metrics

Unlike a regular Python consumer, the Spark streaming application does not run with a static consumer group, so it does not commit offsets to the broker. The traditional way of calculating consumer lag from a consumer group is therefore not possible.

Instead, Spark stores the offsets in its checkpoint files. So we wrote a standalone, scheduled Python job that polls the latest offsets for the topic and compares them against the offsets in the checkpoint to calculate the lag.
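
A simplified sketch of that lag calculation using the kafka-python client. Parsing the offsets out of the Spark checkpoint is assumed to happen elsewhere and to produce a {partition: offset} dict.

from kafka import KafkaConsumer, TopicPartition  # assumes the kafka-python package

def consumer_lag(bootstrap_servers, topic, checkpoint_offsets):
    """Compare the broker's latest offsets with the offsets recorded in the checkpoint."""
    consumer = KafkaConsumer(bootstrap_servers=bootstrap_servers)
    partitions = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]
    end_offsets = consumer.end_offsets(partitions)  # latest offset per partition

    lag = 0
    for tp, latest in end_offsets.items():
        committed_in_checkpoint = checkpoint_offsets.get(tp.partition, 0)
        lag += max(latest - committed_in_checkpoint, 0)
    consumer.close()
    return lag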

Key metrics captured

  • Input records per batch: When dropped to zero continuously, alert as ‘Spark streaming application failure’
  • Processed records per batch: When dropped to zero continuously, alert as ‘Spark streaming application failure’
  • Rejected records per batch: Alert as ‘Spark streaming records rejected’
  • Batch duration: When the batch duration crosses the trigger interval but input records and processed records are more than zero, alert as ‘Spark streaming batch spillover’
  • Consumer lag: When the consumer lags beyond the threshold value, alert as ‘Spark streaming lag above threshold’

This will indicate the following scenarios:

  1. Surge in incoming traffic
  2. Consumer is not running for a period of time
  3. The processing rate is behind the incoming rate

Summary of improvements

Scalability

  • Ad-hoc pipelines can be scaled independently based on input traffic. Because the new pipeline is a single standalone application, it has no inherent scaling bottleneck.
  • Scaling operations are more efficient. Addressing lag issues is considerably swifter and simpler. Dynamic allocation and auto-scaling features allow us to increase infrastructure and throughput within five minutes of hitting lag thresholds.
  • The previous limitations tied to Redis and MySQL, which hindered scalability, have been eliminated.
  • If an incident is identified that requires a high-volume re-sync of data, the streaming pipeline can complete it within a few hours, versus days in the old pipeline. This is again because scaling is independent and has no dependencies on the live pipeline.

Maintenance

  • Deployment has become more straightforward in Jenkins. Adding a new product is a matter of including a new entry in the configuration file and deploying the application. This action initiates the cluster, deploys the app, and handles the distribution of CSV files, logs, and metrics.
  • The overall maintenance efforts have reduced drastically after the migration. We have a single application and a cluster setup in place of five components before. The number of alerts has reduced by 75%.

Availability and fault tolerance

  • The system boasts high availability with the supervisor in place. Any failure or machine crash triggers an immediate automatic restart by the supervisor.
  • The pipeline, even in the event of a failure, resumes from the point of interruption. The checkpoint is managed locally, independent of Kafka. We can also revisit any previous point in time to reprocess data, which proves invaluable for message reprocessing.

Extensibility

  • The groundwork laid in the platform simplifies the process of modernizing C1.
  • For near-real-time (NRT) use cases, achieving lower trigger intervals is now more feasible, provided the target supports NRT ingestion.

Accuracy

  • The new pipeline accommodates a broader range of emojis and characters that were previously unsupported. This enhancement has notably improved accuracy in VARCHAR columns.

Cost savings

  • Over half of the infrastructure expenses have been saved through optimizations in our AWS infrastructure.
  • Streamlined logging processes have contributed to cost savings on Haystack. Additionally, Redis usage has been reduced.
Cost savings across regions

Region | Savings percentage (legacy vs. stream) | Daily savings ($) | Yearly savings ($)
AU     | 48%                                    | 63                | 23K
IND    | 80%                                    | 623               | 227K
EUC    | 66%                                    | 261               | 95K
US     | 82%                                    | 1361              | 496K

 

Total annual savings: $841K + Haystack and staging ($260K) ≈ $1.1 million

Effective use of EMR

The entire Spark application is hosted on EMR clusters. Amazon EMR is an AWS service on which we can deploy and run our Apache Spark applications on demand.

Because Apache Spark is a distributed engine, the best performance comes from the right workload-distribution strategy: choosing optimal resource-allocation values for our use case, along with efficient memory and disk optimization in the EMR clusters we use.

Optimizations on EMR

Resource allocation

  • Two core containers for small applications (XL machines)
  • Four core containers for big or medium-sized applications (2XL machines)
  • The driver is the same size as an executor
  • Master node is generally 2XL since it has to coordinate multiple applications

Memory optimization

  • We used persist and unpersist effectively. Since Spark consumes a batch of messages in every trigger, the input DataFrame is cached after initial validation so that the subsequent filtering for each table runs on the cached DataFrame. This eliminates redundant recomputation from the source. At the end of the micro-batch, the DataFrame is unpersisted to free up storage memory for the next micro-batch (see the sketch after this list).
  • The entire DAG involves no shuffling since there is no join or repartition involved.
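
A sketch of the persist/unpersist pattern inside a foreachBatch handler. Here events_df is the streaming DataFrame from the earlier sketch, filtered_frame_for_table is the helper shown above, and tables_to_process and write_batch_file are hypothetical stand-ins for the metadata-driven per-table logic.

from pyspark import StorageLevel
from pyspark.sql import functions as F

def process_micro_batch(batch_df, batch_id):
    # Cache the validated input once so the per-table filters below reuse it
    # instead of recomputing from the Kafka source.
    validated = batch_df.where(F.col("Value").isNotNull())
    validated.persist(StorageLevel.MEMORY_ONLY)
    try:
        for table in tables_to_process:  # hypothetical metadata list
            table_df = filtered_frame_for_table(validated, table.product, table.name, table.schema)
            write_batch_file(table_df, table)  # hypothetical: writes the batch file to S3
    finally:
        # Free storage memory before the next micro-batch arrives.
        validated.unpersist()

query = (
    events_df.writeStream
    .foreachBatch(process_micro_batch)
    .trigger(processingTime="5 minutes")
    .start()
)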

Disk optimization

  • The cache is done in memory and not spilled onto disk. Since the trigger interval is usually one to five minutes, the volume of messages in a batch will be a few hundred megabytes and will fit into memory; there will be no IO to disk.
  • We faced application failures due to the disk filling up, since Spark event logs are pushed to HDFS by default.
    • Changed the configuration to route Spark event logs to S3 instead of HDFS.
    • Enabled rolling logs and tuned the logging configurations:
      • --conf spark.eventLog.dir=s3://{event_log_bucket}/{event_log_directory}/
      • --conf spark.history.fs.logDirectory=s3://{event_log_bucket}/{event_log_directory}/
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.history.fs.cleaner.enabled": "true",
      "spark.history.fs.cleaner.maxAge": "2h",
      "spark.history.fs.cleaner.interval": "1h",
      "spark.eventLog.rolling.enabled": "true",
      "spark.eventLog.rolling.maxFileSize": "10m",
      "spark.history.fs.eventLog.rolling.maxFilesToRetain": "5",
      "spark.eventLog.rotation.interval": "300"
    }
  },
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.log-aggregation.retain-seconds": "14400",
      "yarn.log-aggregation-enable": "true"
    }
  }
]

Conclusion

The challenges faced in the legacy pipeline and resolved in the new pipeline are scalability, cost, maintenance, and high availability. The new pipeline operates at much higher scale with a much lower cost. It needs less effort for maintenance and keeping the lights on.

There are modern techniques and advancements happening in the world of distributed computing, and this sets the path forward to build an efficient data ingestion system.

Upcoming projects in data platform

This modernization is the first project in this space, and there are many projects to follow:

  • Modernization of the C1 consumers and integration of C1 into the application, so a single application runs from Kafka to batch files
  • NRT pipelines from source to sink
  • Building a data lake solution in addition to data warehouse to support machine learning workloads

Upcoming articles in this series

  • Deep dive into choosing the right infrastructure in EMR for our use case
  • Deep dive into observability and building a consumer lag dashboard for Apache Spark streaming
  • Deep dive into effective use of cache and persist in a Spark application

References

  • https://kafka.apache.org/intro
  • https://spark.apache.org/
  • https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html
  • https://spark.apache.org/docs/latest/sql-programming-guide.html

Strategies for preventing leaks in Ember JS-based applications

Editor’s note: This is the second installment of a two-part series on exploring memory leaks and their proactive identification within the Ember JS framework. In Part One, we explored the nuances of memory leaks and underscored typical pitfalls to be cautious of when crafting single-page applications. Now, our focus shifts to exploring targeted strategies for averting memory leaks within the Ember JS framework. Additionally, we will delve into proactive approaches and available tools to identify and address these issues.

Optimal strategies for preventing memory leaks

Here are some of the best practices in the Ember JS framework. 

Proper component cleanup:

  • Use the willDestroyElement and willDestroy component lifecycle hooks to release any resources such as event listeners, timers, and external references
  • Remove any event listeners or timers you’ve added within the component when it’s destroyed to prevent them from accumulating

Use services:

Instead of storing application-wide state directly in components, consider using Ember Services. Services help centralize and manage shared state and data more effectively.


Binding removal:

Remove bindings between objects or components when they are no longer needed to prevent unnecessary memory retention.


Use Ember Data properly:

When using Ember Data, be cautious about creating a large number of records in the store, as they can accumulate in memory. Use unloadRecord or unloadAll to remove records when they are no longer needed.

Additionally, this is a valuable GitHub repository of illustrative instances of memory leaks. This repository serves as an excellent initial resource for recognizing memory leaks and learning how to rectify them.

Proactive tools for detecting memory leaks

Chrome heap snapshot analyzer

Google Chrome provides an excellent memory allocation analyzer for visualizing an application’s heap memory usage. Here are the steps to perform memory leak analysis within an Ember test suite:

1. Ensure comprehensive test coverage for your application. More tests mean a higher chance of identifying memory leaks

2. Run Ember tests, initially focusing on specific modules. This allows you to isolate tests and incrementally address any memory leaks

Run the tests from the command line:

ember test --serve --filter=<MODULE/TEST TYPE>

3. After a test run, refer to the Chrome DevTools guide on how to capture a heap snapshot

4. If you have followed the guidance in ember-memory-leak-examples, you’ll know that Ember stores everything in a container object. Examine the heap snapshot for these containers and filter them to pinpoint the root causes of memory leaks

5. Consider using a tool like Cleanheap, which can clear Weak Retainers or Weak References that may linger in the snapshots. Re-upload the cleaned-up snapshots (using the “Load” option in Chrome profiler) to the browser

In this image, an option is shown to load an external heap snapshot file

6. Identify any remaining Container classes in the snapshots, fix them in your code, and repeat the process from step 2 until no such classes are present

7. This is an ongoing process that should be integrated into your development workflow to continuously identify and resolve memory leaks

This process provides a comprehensive approach to detecting and resolving memory leaks, but it relies heavily on manual effort and may not be easily scalable for larger development teams working on multiple aspects of the application.

To streamline and automate this process, a purpose-built library can significantly enhance the developer experience.

ember-cli-memory-leak-detector

This is an Ember add-on designed to aid in the detection of memory leaks within your application. What sets it apart is its proactive approach, which allows you to identify leaks during development, fostering a leak-free test-driven development environment.

When integrated into your test suite, this add-on identifies classes that are retained within the application and flags any modules where issues arise. Additionally, it provides a clear and informative report on the retained classes, making it easier to spot and address potential memory leaks.

Figure 1: Image of a possible memory leak code with event listeners

In Figure 1, we try to remove an event listener while the component is being destroyed. But calling bind creates a new function, so the reference passed for removal no longer matches the listener that was added, causing a memory leak.

Figure 2: Image of failing test cases after a leak is detected

In Figure 2, the add-on reports the retained classes of ToDoistComponent, since there is a memory leak.

Figure 3: Image of a code block without any memory leaks

In Figure 3, the bound function reference has been captured in a variable, so the same reference can be removed safely.

Figure 4: Image of the memory leak fixed and test cases passing

In Figure 4, the add-on passes since there are no leaks and thus no classes have been retained.

As promising as this add-on is, it currently comes with a few limitations:

  • It lacks support for Ember Exam, an add-on that enables test execution in random order, parallel mode, and more. You can follow the progress on this issue
  • There are instances where it may take longer than expected to display the results in the browser, leading to occasional browser timeouts

Nonetheless, despite these limitations, this add-on significantly contributes to the application development process by providing a faster feedback loop for identifying memory leaks within the system.

Grasping memory leaks in single-page applications

Editor’s note: This is the first installment of a two-part series on exploring memory leaks and their proactive identification within the Ember JS framework. Here, we delve into the causes and consequences of memory leaks in application development. In Part Two, we discuss proactive strategies for identifying and mitigating memory leaks specifically within Ember JS. Through practical examples, we’ll equip you with the knowledge and tools necessary to conduct thorough memory leak analysis and optimize your Ember applications for peak performance.

Introduction

The heap is a region of a computer’s memory used for dynamic memory allocation, where data is stored and managed during a program’s execution. In the context of web browsers, heap memory is essential for managing the memory allocated for JavaScript and the Document Object Model (DOM).

In a browser like Google Chrome, the heap size refers to the amount of memory allocated for JavaScript execution within the browser. Chrome’s V8 JavaScript engine, which powers the browser, manages this heap. The heap size can vary depending on several factors, including the user’s system resources and the specific version of the browser.

Understanding memory leaks

Memory leaks occur when memory that is allocated for an object in a computer program is not properly released, even after the object is no longer needed. This can happen because of programming errors or inefficiencies in the way memory is managed by the program.

Think of memory in a computer program like water in a bucket. Just as you fill a bucket with water when you need it and empty it when you’re done, a program allocates memory for objects when they are created and releases that memory when the objects are no longer needed. However, imagine there’s a small hole in the bottom of the bucket. Even when you’re not using the bucket, the water drips slowly through the hole, wasting resources. 

A memory leak is like that hole in the bucket—memory that should be released isn’t, leading to a gradual buildup of unused memory that can eventually slow down or crash the program.

Single-page applications: Vulnerable to memory leaks

Single-page applications (SPAs) built on frameworks like Ember, React, or any other modern JavaScript framework are not inherently more prone to memory leaks than traditional multi-page applications. But they can be more susceptible to memory management issues due to their specific characteristics and complexities and the way these frameworks work.

Here are some reasons why SPAs may be more prone to memory leaks:

Longer lifespan of pages: SPAs load a single HTML page and use it throughout the application’s lifetime. This means that objects and data associated with a page may persist longer in memory, increasing the chances of memory leaks if they are not properly released when no longer needed

Event listeners: SPAs often use event listeners to handle user interactions and updates. If these event listeners are not removed when they are no longer needed, they can lead to memory leaks

Reference cycles: In SPAs, reference cycles can easily occur, where objects reference each other in a way that prevents them from being garbage-collected. For example, a component can reference an object in a closure, preventing the closure from being collected

Data caching: SPAs often cache data for better performance. If data is not managed and cleared properly, it can lead to memory bloat and potential memory leaks

Mitigation of memory leaks

To mitigate memory leaks in SPAs, developers need to be diligent about managing memory and resources. This includes:

  • Properly managing the lifecycle of components, event listeners, and objects.
  • Identifying and breaking reference cycles.
  • Using built-in tools and libraries for memory profiling and debugging.
  • Thoroughly testing and profiling applications to catch and resolve memory leaks.

Monitoring memory leaks

We can use browser developer tools (e.g., Chrome DevTools, Firefox Developer Tools) to profile memory usage. Look for consistently increasing memory consumption.

Heap snapshots in developer tools can reveal retained objects and help identify which parts of your code are causing memory leaks.

In this image, memory allocated in the heap is shown after a snapshot is taken in Chrome

Conclusion

In Part One of this series, we’ve explored the fundamentals of memory leaks, including their causes and consequences in application development.

In the next installment of our series about mastering memory management in Ember development, we’ll dive deeper into proactive strategies for identifying and mitigating memory leaks within the Ember JS framework. From advanced debugging techniques to leveraging specialized tools, we’ll provide actionable insights and practical examples to help you optimize your Ember applications for peak performance. 

How to approach A/B testing and controlled launch of experimental features for product-led growth

What is A/B testing and how is it useful for product growth?

A/B testing is a product methodology for doing research on feature adoption, from small buttons to large-scale features. We do it to get insight on how each variant performs and which variant has the highest success ratio of feature adoption.

There are different ways to get the user’s attention on a new feature in the product, for example an in-product tour or a nudge. Based on the user’s interaction with these nudges, the A/B app will suggest which feature or experiment has a higher success ratio. These insights are mostly tracked using Heap or other third-party analytics services; on occasion, in-house trackers are built.

Heap is among the best in the industry, and its wide range of insights helps us direct even better user attention to product details.

How is our A/B service different from traditional A/B services?

Here’s how we use A/B services in Freshworks products:

The isolated data approach

Moving feature configs/rules outside the product as a service.

We have several combinations of properties based on which users will view a particular feature. Some of the common and generic properties that apply to almost all products are:

  • Email: generic or business
  • Region: U.S. or non-U.S.
  • Account state
    • Trial
    • Active 
    • Free
    • Suspended
  • Monthly recurring revenue (MRR) range
  • Plan subscribed to
  • A/B required
    • When A/B required check is enabled, it randomizes the truth based on even/odd account IDs.
  • Targeted IDs

The rule will contain all these parameters plus a feature key that uniquely identifies the feature.
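
A minimal Python sketch of how such a rule might be evaluated against an account, purely to illustrate the shape of the check; the field names and the even/odd cohort split are simplifications of the real config app.

# Hypothetical rule and account shapes; the real config app stores rules as
# documents editable through its GUI.
def feature_applies(rule, account):
    checks = [
        account["email_type"] in rule.get("email_types", [account["email_type"]]),
        account["region"] in rule.get("regions", [account["region"]]),
        account["state"] in rule.get("account_states", [account["state"]]),
        rule.get("mrr_min", 0) <= account["mrr"] <= rule.get("mrr_max", float("inf")),
        account["plan"] in rule.get("plans", [account["plan"]]),
    ]
    if rule.get("targeted_ids"):
        checks.append(account["id"] in rule["targeted_ids"])
    if rule.get("ab_required"):
        # Even/odd account IDs split the audience into the two A/B cohorts.
        checks.append(account["id"] % 2 == 0)
    return all(checks)

def applicable_features(rules, account):
    return [rule["feature_key"] for rule in rules if feature_applies(rule, account)]

The applicable_features(rules, account) result is the list of feature keys that the frontend service later checks against.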

Key takeaways while moving the configs outside the product

  • We don’t have to wait for the deployment cycle
  • Code independence
  • Lighter app
  • No live sync required
  • No data is shared externally

1. We don’t have to wait for the deployment cycle

Usually, we show a feature based on conditions over the properties mentioned above. When we needed to change these conditions, we had to wait for the next deployment cycle; the flexibility was missing.

In our method, we manage the configs outside the codebase, and they can be updated through a dedicated GUI. It is now easier for the product team to make changes directly to the rules, and those changes are reflected in the product swiftly.


2. Code independence

Now that the config app has its own UI, the product team can get access, with permission levels, to control the availability of features (those that are not tied to a plan).

This will improve the speed of our experiments, and we can track them in Heap to take actions quickly based on the insights. This helps strengthen and speed up our decision-making.

3. Lighter app

The app is very light since it only holds configurations and keys. Create, edit, and modify access is granted according to roles.

To stay in sync, developers also get access to this platform, since they use the keys from the app when coding the conditions.

4. No sync required

In the traditional approach, account information or account attributes are attached to these feature keys, which requires a sync between the feature app and the product instance. For example, when a customer upgrades from plan A to plan B, the account attributes change; because the feature app still holds the plan as plan A, this causes a reliability issue. Hence, a sync is required in the traditional approach.

Our app does not require a sync because it returns the applicable features based on the parameters sent with each call; changes in account and subscription properties are simply picked up on the next periodic fetch, so a live sync isn't required in our case.

The product holds parameters like account ID, MRR, plan, and account state. At the point of deciding which features are applicable for the account, a call is made to the config app and the result is kept in the store. Once every cycle, it is fetched again as required. Hence, the app does not require a sync.

5. No data is shared externally 

Sometimes we have policies that prevent user data being shared outside the product or account. So we have two approaches here:

  1. Send account information and get the applicable features 
  2. Get the configs and store in the app database, then figure out applicable features later

The first option can be used if there are no rules pertaining to data being shared externally.

The second option can be used when data needs to stay within the app: the configs are fetched and stored, and the applicable features are then derived within the app.

A basic architecture of the app

There are three major entities that contribute to this process.

  • The config app or the A/B testing framework
  • Frontend service that interacts with features based on the keys present
  • The backend app that will provide the keys from the A/B app

AB app, AB testing, Freshworks engineering

We’ve covered the backend A/B service/app. Let’s see the frontend service.

When does the config data get fetched for each account?

A service is a concept in Ember.js that we leverage for our scenario here: a singleton that carries a set of values globally across the app.

Our service is part of the frontend app. It is called during app initialization, fetches the information from the config app, and derives the applicable features.

The same service exposes a method to check whether a given feature (for example, "support_feature") is present in the list of applicable features for this account. The resulting Boolean decides whether to show the feature in the product and how to display it, enabling the app to act differently based on the response from the config app.

Because the product team can change the config data at any time, the app re-fetches it every five route changes (or any other desired interval) so the data stays in sync with the config app.
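
A minimal sketch of such a service in modern Ember syntax might look like the following; the endpoint, service name, and method names are illustrative, not the actual Freshworks implementation:

JavaScript

// app/services/feature-flags.js (sketch)
import Service from '@ember/service';
import { tracked } from '@glimmer/tracking';

export default class FeatureFlagsService extends Service {
  @tracked applicableFeatures = [];

  // Called during app initialization with account params (account ID, MRR, plan, state)
  async loadFeatures(accountParams) {
    const response = await fetch('/ab-config/applicable-features', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(accountParams),
    });
    this.applicableFeatures = await response.json();
  }

  // Boolean check used across the app, e.g. hasFeature('support_feature')
  hasFeature(featureKey) {
    return this.applicableFeatures.includes(featureKey);
  }
}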

AB app, AB test data, AB testing, Freshworks engineering

Scalability 

The sync can be scaled up to push notifications: an event is triggered to the backend app, which prompts the frontend app to pull the latest changes from the config app. There are several ways to keep this in sync on a live basis; the right choice depends on each product's scenarios and how frequently the features are likely to change.

Handling other features and use cases

Since feature display depends on several different parameters, there is a lot of scope for controlled experiments.

  • Controlled launch
  • Beta launches
    • For specific accounts or for other params
  • Kill switch
    • Disabling features when an error occurs

Future scope

There can be several new params added, which can support new scenarios.

  • Scheduled release
  • Automated release 
    • Where the feature experiment has a time limit—for example, a year-in-review feature
  • Sentiment-based features

Another useful enhancement is automated tracking and reporting of experiment features. If tracking is enabled, the performance of these flows can be gathered from the Heap app.

This can later be translated into insights, and acting on those insights can be automated as well.

Parsing Shopify Liquid templates in frontend unit testing for SSR applications https://www.freshworks.com/saas/eng-blogs/parsing-shopify-liquid-templates-ssr/ Fri, 12 Jan 2024 20:23:25 +0000 https://blog-fworks-en.freshworks.com/?p=9039 With fully Javascript-generated applications, testing frontend code has become easier and more reliable because all the rendering logic resides in the frontend and only required data is pulled via APIs. But in the case of server-side-rendered (SSR) applications, testing frontend code is challenging. In SSR code, rendering logics will be on the server. When a...

With fully Javascript-generated applications, testing frontend code has become easier and more reliable because all the rendering logic resides in the frontend and only required data is pulled via APIs. But in the case of server-side-rendered (SSR) applications, testing frontend code is challenging.

In SSR applications, the rendering logic lives on the server. When a browser requests a page, server-side code (in Ruby, PHP, Java, Python, and so on) hydrates the required data into the frontend template and returns a fully rendered page as the response. So providing that rendering logic to the testing environment is the biggest challenge. Two common ways to solve this problem are:

  • Providing mock HTML data: In this method, the dev will create a mock HTML structure similar to server-rendered data. The main drawback in this method is maintaining parity: whenever the rendering logic changes in the backend, we need to update the mock data accordingly; otherwise, test cases won't give reliable results
  • Running a separate server: In this method, we need to run our entire application on a separate server and perform testing on it. This overcomes the above drawback but increases infra costs and testing time based on the size of the application

How we solved this problem

For the Freshworks customer portal, we use Liquid (written in Ruby) for SSR. Since both of the above methods have their own challenges, we decided to introduce a hybrid approach in which real Liquid templates are provided as mock data for the frontend test suite. But how will the frontend test suite understand Liquid templates?

LiquidJS

LiquidJS is a Javascript version of Shopify Liquid. It provides a standard Liquid implementation for the Javascript community. We use LiquidJS to parse and render our Shopify Liquid templates as mock data for our frontend test suite. 

Implementation

In our implementation, we’ll use Jest alongside JSDOM and LiquidJS. Jest is a popular Javascript testing framework by Facebook known for its simplicity, fast execution, and easy setup with minimal configuration. JSDOM provides a virtual browser environment, essential for testing frontend code in a non-browser environment like Node.js, as it simulates browser behavior and interactions without the need for an actual browser.

First, install the necessary npm packages with npm install jest and npm install liquidjs. Then, let’s create a product.liquid file, which simply iterates over a given object and constructs a list.

Liquid JS

Now let’s write a test to parse the above Liquid file with Freshdesk and Freshservice as products param, render in JSDOM, and test whether the first list text is Freshdesk. By default, Jest will not run the JSDOM environment, so we are explicitly instructing Jest to use the JSDOM environment.

Liquid JS 2

Then we need to read the Liquid file and append the content of the rendered Liquid file into JSDOM. Here, the renderFileSync method takes two parameters. The first parameter is file name (we don’t need to give the file extension and full path, since it’s defined while initializing), and the second parameter is input for the Liquid file.

Liquid JS 3

Finally, we need to check if the content of the first li is Freshdesk.

Liquid js 4

Our final code will look like this:

liquid js 5
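
For readers who cannot see the screenshots, a minimal sketch of what such a test could look like is shown below; the file layout and names are assumptions:

JavaScript

/**
 * @jest-environment jsdom
 */
// product.test.js (sketch). Assumes product.liquid contains roughly:
// <ul>{% for product in products %}<li>{{ product }}</li>{% endfor %}</ul>
const { Liquid } = require('liquidjs');

const engine = new Liquid({
  root: __dirname,    // directory containing product.liquid
  extname: '.liquid', // so we can omit the extension in renderFileSync
});

test('renders Freshdesk as the first product', () => {
  const html = engine.renderFileSync('product', {
    products: ['Freshdesk', 'Freshservice'],
  });
  document.body.innerHTML = html;
  expect(document.querySelector('li').textContent.trim()).toBe('Freshdesk');
});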

The result of the test will be: 

Handling customized Liquid tags/filters

LiquidJS has all the common tags and filters implemented in Shopify Liquid; however, we have a lot of custom tags and filters. So we need to register them for LiquidJS to understand. A simple example for implementing a custom tag called upper:

Liquid JS 7
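
As a sketch, registering such a custom tag with LiquidJS looks roughly like this (adapted from the pattern in the LiquidJS documentation):

JavaScript

// Register a custom {% upper %} tag so templates using it can be parsed in tests
engine.registerTag('upper', {
  parse(tagToken) {
    this.str = tagToken.args; // e.g. 'product' in {% upper product %}
  },
  async render(ctx) {
    const value = await this.liquid.evalValue(this.str, ctx);
    return String(value).toUpperCase();
  },
});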

This hybrid approach, leveraging real Liquid templates as mock data without the need for a backend server, offers a nuanced testing solution. While it demands some initial effort in setup and writing custom tags/filters, the payoff is substantial. It markedly increases test accuracy, ensuring frontend stability even when backend templates change. This method effectively bridges the gap between frontend testing reliability and backend dynamics, making it a more effective testing solution for frontend apps that utilize Shopify Liquid templates in the backend.

Enhancing single-page application performance: Strategies for improved user experience https://www.freshworks.com/saas/eng-blogs/enhancing-single-page-application-performance-blog/ Fri, 08 Dec 2023 19:05:35 +0000 https://blog-fworks-en.freshworks.com/?p=9005 Freshworks developed the web application Freshchat as a single-page application (SPA). SPAs are designed to initially fetch a document and then sequentially download all their associated assets. This design choice can lead to suboptimal application performance, as the rendering speed relies heavily on various factors within the client’s web browser, such as the CPU, GPU,...

Freshworks developed the web application Freshchat as a single-page application (SPA). SPAs are designed to initially fetch a document and then sequentially download all their associated assets.

This design choice can lead to suboptimal application performance, as the rendering speed relies heavily on various factors within the client’s web browser, such as the CPU, GPU, network latency, and more.

There are several strategies to address this performance challenge, with the most prominent one being to minimize the amount of code sent to the browser. Some other effective approaches include:

  1. Code splitting and dynamic imports of third-party libraries: Break down the code into smaller, manageable chunks and only load the necessary parts when they are required, particularly for third-party libraries.
  2. Utilizing lightweight libraries as an alternative to heavier ones: Opt for smaller and more efficient libraries rather than those that are bloated or resource-intensive.
  3. Leveraging browser-native web APIs instead of polyfills: Instead of relying on polyfills to bridge compatibility gaps, use the native web APIs supported by modern browsers.
  4. Configuring build tools to ship ES6 code: Ensure that your build tools are set up to deliver code in the ES6 format, taking advantage of its improved efficiency and features.

By implementing these strategies, Freshchat enhanced its performance and responsiveness, providing a smoother user experience while optimizing resource utilization.

Ways to reduce the bundle or asset size of an application

Code splitting

Within an SPA, assets are categorized into two types: application code and vendor code. These two files are typically downloaded and executed sequentially, causing the user interface to block rendering, which degrades the user experience. This leads to increased first contentful paint (FCP) and time to first byte (TTFB).

In our efforts to reduce the size of these assets, we’ve leveraged code splitting, which enables breaking down the code into smaller, more manageable segments. This approach allows us to transmit only the code required for the initial rendering of pages. For subsequent changes in routes, additional code chunks are retrieved from the server. As a result, this strategy significantly enhances both FCP and TTFB, ultimately improving the user experience.

In our Freshchat Messaging widget, we were able to achieve this code splitting with the help of Embroider from the Ember framework. Below are the results.

Before code splitting

code splitting, user experience, freshworks engineering

After code splitting

code splitting, user experience, freshworks engineering
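
For reference, route-based code splitting with Embroider is enabled in ember-cli-build.js roughly as follows; this is a sketch, not the exact Freshchat configuration:

JavaScript:

// ember-cli-build.js (sketch)
'use strict';

const EmberApp = require('ember-cli/lib/broccoli/ember-app');

module.exports = function (defaults) {
  const app = new EmberApp(defaults, {});

  const { Webpack } = require('@embroider/webpack');
  return require('@embroider/compat').compatBuild(app, Webpack, {
    staticAddonTrees: true,
    staticHelpers: true,
    staticComponents: true,
    splitAtRoutes: ['reports', 'settings'], // these route bundles load lazily
  });
};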

Dynamic imports

Dynamic import is a Stage 4 TC39 feature (standardized in ES2020) that enables on-demand loading of scripts.

In our system, specific third-party libraries are loaded during the initial rendering process. This not only consumes a significant amount of bandwidth but also extends the rendering time. By implementing the Ember auto-import add-on, we harness the capabilities of dynamic imports, which enhance rendering speed by loading third-party assets only when they are needed.

This approach has proven highly effective in reducing asset sizes within both the agent portal and web widget components.

In our agent portal, we have reduced the Vendor JS size by around 250 KB by dynamically importing libraries like Highcharts JS, Medium Editor, Lunr JS, and internationalization language files.

Outcome of dynamic loading:

No. Library Size reduced in Vendor JS
1. Highcharts 295 KB (95 KB compressed)
2. en-US internationalization language file 470 KB (125 KB compressed)
3. Medium Editor 102.4 KB (26 KB compressed)
4. Lunr 40.96 KB (8.76 KB compressed)
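
A dynamic import of this kind looks roughly like the sketch below, where the Highcharts chunk is fetched only when a chart is actually drawn; the component and action names are illustrative:

JavaScript:

import Component from '@glimmer/component';
import { action } from '@ember/object';

export default class ReportChartComponent extends Component {
  @action
  async drawChart(element) {
    // ember-auto-import turns this into a separate, lazily loaded chunk
    const { default: Highcharts } = await import('highcharts');
    Highcharts.chart(element, this.args.chartOptions);
  }
}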

Using lightweight third-party libraries

Both of our applications contain libraries that are sizable and require substitution.

Moment JS

Moment JS, a valuable date-time library that has been in use for a decade, has played a crucial role. However, it comes with a sizable footprint of 70 KB, contributing to increased bundle sizes.

To address this, we’ve made the transition from Moment to Day JS, a lightweight alternative that weighs in at just 2 KB. This shift has significantly reduced bundle sizes.

We also use the native Intl web APIs wherever specific date-manipulation tasks require them.

To format a date for a specific locale, we can do something like this:

JavaScript:

console.log(new Intl.DateTimeFormat('en', { weekday: 'short' }).format(new Date()));

// This will print the current day of the week in short format, e.g. 'Thu'

JavaScript:

console.log(new Intl.NumberFormat('en-US').format(100000));

// This will print '100,000' according to the en-US format

jQuery

In the days of Internet Explorer, jQuery emerged as a lifeline for web developers. This DOM utility library simplified the process of writing cross-browser compatible code, shielding developers from the intricacies of different browsers’ implementations.

However, as we transitioned into the modern browser era, the necessity for jQuery diminished. Modern browsers offer comprehensive native support for all DOM-related functions.

This transition to native functionality is not only a welcome change but also results in a significant reduction in bundle size, shedding 30 KB from the application.
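
For illustration, a few common jQuery patterns and their native equivalents, shown as a sketch (the selectors and URL are made up):

JavaScript:

// $('.conversation-item').addClass('active') with native DOM APIs
document.querySelectorAll('.conversation-item').forEach((el) => {
  el.classList.add('active');
});

// $(el).closest('.thread') becomes
const thread = document.querySelector('.conversation-item').closest('.thread');

// $.ajax({ url: '/api/v2/agents' }) becomes
fetch('/api/v2/agents').then((response) => response.json());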

Lodash

Lodash, a utility library, provides an array of functions such as cloning, mapping, and currying. It’s worth noting that this library can be substituted with plain JavaScript code, offering an alternative approach where Lodash is not required.

Lodash is quite substantial in size and should be used judiciously, as it has the potential to significantly increase the bundle size. While Lodash does support tree shaking with its ES6 modules, it’s still preferable to rely on native ES6 standards readily available in modern browsers, eliminating the need for this heavy library.

We are currently in the process of transitioning away from Lodash in our applications. As an initial step, we’ve implemented ESLint rules to discourage the use of Lodash in favor of plain functions.
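
A few examples of Lodash helpers and their plain-JavaScript counterparts, as a sketch of the kind of replacement the ESLint rules encourage:

JavaScript:

const users = [{ name: 'Asha' }, { name: 'Ravi' }];

// _.map(users, 'name')
const names = users.map((user) => user.name);

// _.uniq([1, 1, 2, 3])
const unique = [...new Set([1, 1, 2, 3])];

// _.isEmpty(obj)
const isEmpty = (obj) => Object.keys(obj).length === 0;

// _.cloneDeep(state); structuredClone is available in modern browsers
const copy = structuredClone({ nested: { value: 42 } });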

Using browsers’ native web APIs

Polyfills represent code snippets designed to extend modern functionality to older browsers that lack native support for these features. However, this convenience comes at the expense of increased bundle size.

For instance, we can leverage native Fetch for handling HTTP requests instead of relying on a polyfill. Internally, we’ve crafted a native Fetch wrapper class that streamlines the process, adding essential headers and options for simplifying API calls.
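
A simplified sketch of what such a wrapper might look like; this is not the actual Freshchat implementation, and the base URL and defaults are placeholders:

JavaScript:

class ApiClient {
  constructor(baseUrl, defaultHeaders = {}) {
    this.baseUrl = baseUrl;
    this.defaultHeaders = { 'Content-Type': 'application/json', ...defaultHeaders };
  }

  async request(path, { method = 'GET', body, headers = {} } = {}) {
    const response = await fetch(`${this.baseUrl}${path}`, {
      method,
      headers: { ...this.defaultHeaders, ...headers },
      body: body ? JSON.stringify(body) : undefined,
    });
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }
    return response.json();
  }
}

// Usage
const api = new ApiClient('/api/v2');
api.request('/conversations', { method: 'POST', body: { message: 'Hi there' } });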

Similarly, for date manipulation tied to specific locales, we can tap into the native internationalization APIs (INTL) instead of resorting to polyfills.

In our application, we strongly recommend the use of native web APIs while avoiding unnecessary polyfills. This reduces code bloat that is transmitted to the browser.

Using build tools configuration

Within our web applications, we use Ember CLI for application compilation, and it offers the flexibility to target either modern or evergreen browsers.

By selecting the appropriate targets, we can deliver the most up-to-date ES6 code to the browser. This has several benefits:

  1. Reduced code shipment: The omission of polyfill support for older browsers results in a more concise codebase, leading to reduced file sizes
  2. Enhanced parsing efficiency: Fewer imperative code constructs expedite parsing time within the browser. A notable example is the use of async/await. In cases where support for older browsers is necessary, Babel typically employs regenerator-runtime to handle generators and async/await, leading to code bloat and slower parsing times in the client browser
  3. Improved developer experience (DX): The resulting code is more readable and intuitive, as it eliminates the need for jump and loop statements generated by regenerator-runtime. This enhances the overall developer experience

build tools configuration, user experience, freshworks engineering
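
In an Ember app, these targets are typically declared in config/targets.js; a sketch that targets only evergreen browsers might look like this:

JavaScript:

// config/targets.js (sketch)
'use strict';

module.exports = {
  browsers: [
    'last 2 Chrome versions',
    'last 2 Firefox versions',
    'last 2 Safari versions',
    'last 2 Edge versions',
  ],
};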

Updating caniuse package

While this might be considered a relatively simple optimization, it often remains in the shadow of other performance-related challenges within our application. By ensuring regular updates to the caniuse database, our build tools can access up-to-date statistics on browser usage and adjust asset compilation accordingly.

This proactive approach yields significant benefits, resulting in bundle size reductions from 10 to 100 KB. The frequency of these savings is determined by how regularly these packages are updated.

In conclusion, Freshchat’s journey in addressing performance challenges within its SPA has been marked by a proactive commitment to optimizing asset delivery. 

Freshchat’s dedication to these strategies exemplifies its commitment to providing a more responsive and efficient SPA, showcasing a relentless pursuit of excellence in web application development.

The post Enhancing single-page application performance: Strategies for improved user experience appeared first on Thoughts on Refreshing Business Software.

]]>
Moving to self-managed EC2 instances from OpsWorks for Elasticsearch clusters https://www.freshworks.com/saas/eng-blogs/self-managed-ec2-elasticsearch-blog/ Fri, 08 Dec 2023 19:03:52 +0000 https://blog-fworks-en.freshworks.com/?p=9010 The observability platform within Freshworks manages telemetry events such as logs, metrics, and traces generated by Freshworks product and platform teams. Elasticsearch is the data storage for all log events. We have 600+ node clusters consisting of 2+ petabytes of data. Elasticsearch clusters are hosted in AWS OpsWorks and orchestrated using Chef cookbooks. Long before...

The observability platform within Freshworks manages telemetry events such as logs, metrics, and traces generated by Freshworks product and platform teams. Elasticsearch is the data storage for all log events. We have 600+ node clusters consisting of 2+ petabytes of data. Elasticsearch clusters are hosted in AWS OpsWorks and orchestrated using Chef cookbooks. Long before the AWS end-of-life for OpsWorks was announced, our goal was to move away from OpsWorks since it has a few drawbacks:

  • Limited support for different types of instances
  • High instance setup time

Existing setup

We orchestrate our Elasticsearch clusters in AWS OpsWorks with Chef cookbooks (Chef 11). The operating version of Elasticsearch is 6.8.4. The OpsWorks stack has master, client, and data layers. The Elasticsearch client layer will receive read traffic from the API layer, whereas write traffic will be received from worker nodes.

New solution

The solution uses AWS EC2 launch templates + Rundeck + Ansible playbooks.

We considered running Chef cookbooks on machines booted using EC2 launch templates, but that has a few drawbacks:

  • We need to install Chef client per machine, which results in higher boot time
  • We miss out on OpsWorks lifecycle management
  • Remote execution is better with Ansible

Ansible:

Ansible is an open-source automation tool that allows us to define infrastructure as code using Ansible playbooks. Playbooks specify the desired state of our systems, configurations, and applications.

Rundeck:

Rundeck is an open-source software platform that provides a centralized interface for automation and orchestration across various systems.

EC2 launch templates:

Launch templates provide a way to define the configurations for your instances, including the instance type, Amazon Machine Image (AMI), security groups, key pairs, and other settings. This makes it easier to maintain consistency across your instances and helps automate the process of instance provisioning.

The solution is to boot an instance with the desired configuration and then, via Rundeck, execute an Ansible playbook that installs and configures Elasticsearch.

Implementation

Adding EC2 launch templates

We've created launch templates via Terraform, with the required instance type(s), AMI, security groups, subnet, and the desired tags for cluster formation. Below are a few tags that the Ansible playbook uses to replicate what we were doing with OpsWorks:

cl_name - cluster name
cl_layer - type of node in cluster master/client/data
cl_nodetype - storage tier hot/cold if it's data node
Name - unique instance name

Earlier, we derived this information from the OpsWorks layer. For example, if the layer is cluster-1-data-cold, the node is an Elasticsearch data node in the cold tier and belongs to cluster-1.

Ansible playbooks

For Elasticsearch installation and configuration, the existing recipe functionality is replicated using Ansible playbooks. We use EC2 metadata (availability zone) and custom instance tags (name, cluster, layer: master/client/data) to populate the elasticsearch.yml configuration file.

Since we already have the recipes, we just need to write Ansible playbooks for the same functionality. Writing playbooks is similar to writing recipes; only the syntax changes. Here is an example of running Elasticsearch as a system service using Chef and then Ansible:

Chef:

Chef::Log.info "ES service: Adding es service to chkconfig - centos"
service 'elasticsearch' do
    supports :start => true, :stop => true, :status => true, :restart => true
    action [ :enable, :restart ]
end

Ansible:

---
- name: es_service
  tags: es_service
  block:
  - name: Enable service elasticsearch, and not touch the running state
    service:
      name: elasticsearch
      enabled: yes
  - name: Restart service elasticsearch
    service:
      name: elasticsearch
      state: restarted

In OpsWorks, we are using the discovery.zen.ping.unicast.hosts list for master node discovery.

For master node discovery in Ansible, we’ve used the Elasticsearch EC2 discovery plugin. These are the settings used in configuration for master node discovery:

discovery.zen.minimum_master_nodes: 3
discovery.zen.fd.ping_retries: 5
discovery.zen.hosts_provider: ec2
discovery.ec2.endpoint: ec2.us-east-1.amazonaws.com
discovery.ec2.tag.cl_layer: master
discovery.ec2.tag.cl_name: cluster-1

We’ve installed the EC2 discovery plugin and added cl_layer and cl_name tags to master nodes.

Rundeck

A job is added to run the desired Ansible playbook on the required EC2 machines. This remote execution is facilitated via SSH (we used the same ec2_keypair as the EC2 machines).

Migration

Once the implementation was done, we did a phase-wise migration to move the clusters:

  1. Added cl_name and cl_layer tags to the existing master nodes in OpsWorks so that newly booted EC2 nodes could discover them.
  2. Added a few client nodes with the above approach to verify that they were able to join the same cluster.
  3. Moved data nodes in sets to ensure there was no performance impact from adding new nodes and excluding old ones.
  4. Stopped old nodes as soon as their shard allocation became zero.
  5. Repeated steps 3 and 4 until all data nodes were moved.
  6. Moved the remaining client nodes.
  7. Moved the master nodes.

We’ve done this in a controlled manner using the following cluster settings:

curl -X PUT "localhost:9200/_cluster/settings?pretty" \
-H 'Content-Type: application/json' -d '{
"transient" : {
    "cluster" : {
      "routing" : {
          "rebalance" : {
              "enable" : "all"
          },
          "allocation" : {
             "node_concurrent_incoming_recoveries" : "1",
             "cluster_concurrent_rebalance" : "2",
             "node_concurrent_recoveries" : "1",
             "node_concurrent_outgoing_recoveries" : "1"
          }
      }
    },
    "indices" : {
        "recovery" : {
            "max_bytes_per_sec" : "250mb"
        }
    }
}
}
'

We moved the 600+ node clusters, consisting of 2+ petabytes of data, within a week.

Conclusion

  • It took roughly a week to migrate all 600+ node clusters
  • A critical part of the migration was master node discovery for the Elasticsearch clusters, which we handled using the EC2 discovery plugin
  • While migrating Elasticsearch clusters, we need to watch the data transfer rate so the clusters aren't impacted
  • Though we did this for datastore instances, Ansible can be used for any stateless services as well

Social Media Mastery for Affiliates: Building Your Brand Online https://www.freshworks.com/affiliate-marketing/social-media-mastery-for-affiliates-blog/ Wed, 06 Dec 2023 16:21:24 +0000 https://blog-fworks-en.freshworks.com/?p=8967 In today’s digital age, social media has emerged as a powerhouse for affiliate marketers looking to build their brands and boost their earnings. With billions of users worldwide, platforms like Facebook, Instagram, Twitter, and TikTok offer a vast landscape to connect with your audience and promote affiliate products effectively. In this comprehensive guide, we’ll explore...

In today’s digital age, social media has emerged as a powerhouse for affiliate marketers looking to build their brands and boost their earnings. With billions of users worldwide, platforms like Facebook, Instagram, Twitter, and TikTok offer a vast landscape to connect with your audience and promote affiliate products effectively. In this comprehensive guide, we’ll explore a step-by-step guide to mastering social media marketing for affiliate programs, ensuring your online presence becomes a thriving brand-building machine.

Why social media for affiliate marketing?

Social media platforms are ideal for affiliate marketers for several reasons:

  1. Massive reach: Social media networks attract billions of active users daily. This gives you access to an extensive audience base for online business.
  2. Engagement: These platforms encourage engagement, enabling you to interact directly with your followers, build relationships, and establish trust.
  3. Visual appeal: Many affiliate products can benefit from visual promotion, making platforms like Instagram and Pinterest particularly valuable.
  4. Targeting: Social media platforms provide sophisticated targeting options, allowing you to reach the right audience with your affiliate offers.

Now, let’s dive into the steps to master social media for affiliate marketing:

       1. Choose the right platforms

Not all social media platforms are created equal. Consider your niche and target audience when selecting your primary platforms for your marketing strategy. Here’s a quick overview of popular options for different platforms:

Facebook: Ideal for a broad range of niches due to its diverse user base and extensive Facebook ad options.

Instagram: Perfect for visual niches like fashion, beauty, and lifestyle.

Twitter: Great for news, tech, and niches requiring real-time updates.

LinkedIn: Suited for B2B and professional niches.

Pinterest: Excellent for DIY, crafts, recipes, and visual product niches.

TikTok: Rising in popularity and excellent for creative promotions.

      2. Optimize your profile

Your social media profile is your digital storefront. Ensure it’s complete and optimized:

– Use a professional profile picture and cover photo.

– Craft a compelling bio with keywords related to your niche to improve your SEO.

– Include your affiliate website link (if allowed).

      3. Social media content creation and strategy

Creating engaging content is crucial. Here are some tips:

Quality over quantity: Focus on delivering value. Your content should educate, entertain, or inspire your audience.

Visuals: Use high-quality images and videos. Visual types of content tend to perform well.

Consistency: Maintain a consistent posting schedule. A content calendar can help.

Hashtags: Research and use relevant hashtags to expand your reach and increase conversions.

      4. Build a community

Engage with your followers. Respond to comments and messages, and participate in discussions. Building a loyal community enhances brand voice.

     5. Promote your affiliate products

When promoting affiliate products, follow these guidelines:

  • Disclosure: Always disclose when you’re promoting affiliate products. Honesty builds trust.
  • Value-centric: Craft your promotions in a way that offers genuine value to your audience.
  • Tracking: Use tracking links to monitor your affiliate promotions’ performance and calculate the return on investment.

      6. Analyze and adjust

Social media platforms provide analytics tools. Regularly review these insights to understand what’s working and what needs adjustment. Adapt your social media strategy accordingly.

      7. Stay informed and adapt

Social media is ever-evolving. Stay updated with platform changes, trends, and algorithm updates. Be ready to pivot your strategy when necessary for your digital marketing.

Conclusion  

Mastering social media for affiliate marketing is an ongoing process that requires dedication, strategy, and authenticity. By choosing the right platforms, optimizing your profile, creating valuable content, building a community, promoting affiliate products ethically, and staying informed, you can build a powerful online brand that resonates with your audience, increases conversion rates, and drives affiliate earnings.

Remember, the key to social media mastery is not just about selling products but also about creating genuine connections with your audience. By providing value and building trust, you’ll establish a strong brand presence that can lead to long-term success in the world of affiliate marketing.

 

Building infinite scroll with virtual scroll in Ember: A step-by-step guide https://www.freshworks.com/saas/eng-blogs/infinite-scroll-ember-guide-blog/ Wed, 08 Nov 2023 18:03:49 +0000 https://blog-fworks-en.freshworks.com/?p=8972 With the growing size of content, there is often a need to divide large sets of content into smaller pages. Infinite scroll and virtual scroll are two techniques that allow us to divide and present content more efficiently. In this article, we will discuss both of these techniques and implement them in Ember. What is...

With the growing size of content, there is often a need to divide large sets of content into smaller pages. Infinite scroll and virtual scroll are two techniques that allow us to divide and present content more efficiently. In this article, we will discuss both of these techniques and implement them in Ember.

What is infinite scroll?

To learn more about infinite scroll, let’s discuss traditional pagination. In most websites, pagination is a common technique for breaking huge sets of content into smaller pages, making it more manageable for users to navigate and consume. Instead of displaying all the content on a single long page, pagination breaks it up into portions. A user is provided with CTAs like next>> or <<prev to navigate between these portions.

In contrast to this, infinite scroll allows the user to keep scrolling down to load more stuff without needing to click on a CTA. Infinite scroll provides a seamless experience on large websites with many pages, such as Instagram or an e-commerce website. It loads more data by keeping track of how far a user has scrolled down the page. 

Once a certain threshold, say 70%, is breached, an API call is made for the next page. By the time the user has consumed all of the first page's content, the next page has already been rendered on the screen. This improves UX by eliminating the load time of new pages.

What is virtual scroll?

Although infinite scroll improves user experience with smooth scrolling for pagination, it can render hefty amounts of content on the browser, which can cause performance issues because it takes memory and CPU power to render everything on the screen simultaneously. 

Virtual scroll, on the other hand, loads only a portion of the items—the ones you can currently see on your screen. This makes it feel like you’re scrolling through an endless list of items, as the list keeps growing at the bottom while removing items from the top.

Why build it from scratch?

Over the different versions of Ember, many changes have been introduced to the framework, and not all of these changes are necessarily backward-compatible. This means code and add-ons that worked perfectly in one version of Ember may require modifications or updates to function correctly in a newer version.

Finding an add-on that supports the specific version of Ember your application is using can be challenging. And even after identifying the appropriate add-on, getting it up and running smoothly in your project can be tiresome. It may involve configuring settings, dealing with potential conflicts with other add-ons or dependencies, and addressing any breaking changes introduced in Ember updates.

Even when we manage to find an add-on that is compatible with our project, locating a virtual scroll add-on that can handle dynamic heights without requiring prior knowledge of the number of nodes is quite tedious.

How to implement a basic infinite scroll

An infinite scroll consists of some key blocks:

  1. Scroll container: The main container responsible for showing the content. We listen to the scrolling events on this container to identify whether to fetch the next page or not
  2. Collision detection: In the scroll container, we need to detect if the user has scrolled to the bottom of the page or an imaginary point in the container
  3. “Load more” action: Once the collision is detected, we need to make an API call to fetch the next page
  4. Records modification: New records received from the API call need to be added to the existing records

As part of this process, we are implementing a generic higher-order component that can be used across the product. The functionality of the component will be to run an action/function whenever a collision is detected in the scroll container. The higher-order component (HOC) would accept the below params: 

  1. hasNext: Boolean variable to identify if the next page is available
  2. nextLoading: Boolean variable to identify if an API call is already running; this is needed because sometimes API calls can take longer to provide a response, and we shouldn’t make the next call unless a previous call is resolved
  3. next: a function that triggers when a collision is detected; this function should take care of calling the next page API and appending next-page data to the existing set of data

Infinite scroll implementation

We make use of the didRender hook to add an event listener to the scroll container with the ID “basic-infinite-scroll.” This listener will check for changes in what’s visible on the screen. Importantly, we’ll remove this listener once the user triggers an action to load the next page. After the new page loads, didRender will help us to set up a fresh scroll listener. (Learn more about didRender from Ember guides.)

The “scrolledTill” variable keeps track of how far the user has scrolled down the page in the viewport. We’ll use this variable in later sections when implementing virtual scroll behavior.

(Note: There are other ways to make an infinite scroll with a MutationObserver. Learn more about MutationObserver from MDN docs.)

In this example, we’re keeping things simple. We make a new API call whenever the user scrolls down to about 70% of the entire scrollable area. We call it “bufferPercentage.” This approach might not be the best one—custom calculations could be used with respect to requirements.

Handlebars

<div id="basic-infinite-scroll">
  {{yield}}
</div>

JavaScript

didRender() {
  this._super(...arguments);
  this.addInfiniteListener();
},

addInfiniteListener() {
  const scrollListener = (e) => {
    const { scrollTop, offsetHeight, scrollHeight } = e.target;
    const scrolledTill = get(this, 'scrolledTill');

    // Track the scroll position, updating it only for meaningful (> 50px) changes
    if (Math.abs(scrollTop - scrolledTill) > 50) {
      set(this, 'scrolledTill', scrollTop);
    }

    // Trigger the next page load once the user crosses 70% of the scrollable area
    const bufferPercentage = 0.7;
    const scrollThreshold = (scrollHeight - offsetHeight) * bufferPercentage;

    if (scrollTop > scrollThreshold && get(this, 'hasNext') && !get(this, 'nextLoading')) {
      this.sendAction('next');
      wrapper.removeEventListener('scroll', scrollListener);
    }
  };

  const wrapper = document.getElementById('basic-infinite-scroll');
  wrapper.addEventListener('scroll', scrollListener);
}

Ramp infinite scroll to virtual scroll

infinite scroll, freshworks engineering
The basic structure of what we are trying to achieve

The subset of nodes consists of:

  1. Buffer nodes are extra nodes kept on both top and bottom that are visible on scroll for a user to feel nodes are present on both sides
  2. Visible nodes are the nodes present in the viewport. This is identified using startNode and endNode integer variables

We do a basic transformation of adding an ID to records, which is further used while caching heights. We use buildRecords in didReceiveAttrs, which makes sure to rerun every time records change. (Learn more about didReceiveAttrs from Ember guides.)

JavaScript

buildRecords() {
  const records = get(this, 'records');

  records.forEach((element, idx) => {
    set(element, 'id', idx);
  });
},

didReceiveAttrs() {
  this._super(...arguments);
  this.buildRecords();
}

Infinite scroll relies on tracking how far a user has scrolled on a page. The size of the content nodes is essential for this calculation, but sometimes these nodes can have varying sizes.

To keep track of where the user is on the page, we add up the heights of the nodes currently visible on the screen. This total is essentially the height of the part of the page the user can see, called the viewport height, and we store this value for reference.

However, there’s a catch. To know the height of a node, it must be rendered in the first place. To overcome this limitation, we use something called a tolerance height. This is a predetermined value that we use as a placeholder until we can calculate the actual height after rendering the node. It helps us estimate the position of the user on the page more accurately.

We initialize a variable “cacheRecordsWithHeight,” which stores the heights of all the nodes. This height could be the original height (if the node is already rendered) or the tolerance height, which is 200 pixels in our case. 

We recalculate the heights on every change of render using the “recalculateHeights” function. To get the height of the rendered node, we need to add an ID (#virtual-record-${index}) to DOM nodes. We try to query the DOM element with this ID and use the client height if the DOM element is present.

JavaScript

didRender() {
  this._super(...arguments);
  this.addInfiniteListener();
  this.recalculateHeights();
},

cacheRecordsWithHeight: {},

recalculateHeights() {
  const {
    cacheRecordsWithHeight,
    records
  } = getProperties(this, 'cacheRecordsWithHeight', 'records');

  let cache = {};

  for (let index = 0; index < records.length; index++) {
    if (cacheRecordsWithHeight[index] && cacheRecordsWithHeight[index].originalHeight) {
      // Height already measured for this record; reuse the cached value
      cache[index] = cacheRecordsWithHeight[index];
    } else {
      // Use the rendered node's height if it exists, else fall back to the tolerance height
      const row = document.querySelector(`#virtual-record-${index}`);
      cache[index] = {
        originalHeight: row ? row.clientHeight : false,
        toleranceHeight: 200,
      };
    }
  }

  set(this, 'cacheRecordsWithHeight', cache);
}

We calculate the sum of node heights from the beginning to the “startNode” and use this value as padding-top for the scroll container. This padding-top creates the visual illusion that there are more nodes above what the user can currently see on the screen.

Here’s how we calculate the padding-top value to achieve this effect:

JavaScript

paddingTop: computed('startNode', {
  get() {
    const {
      cacheRecordsWithHeight,
      startNode
    } = getProperties(this, 'cacheRecordsWithHeight', 'startNode');

    return Object.keys(cacheRecordsWithHeight)
      .slice(0, startNode)
      .reduce((sum, curr) => {
        return sum += cacheRecordsWithHeight[curr].originalHeight || cacheRecordsWithHeight[curr].toleranceHeight;
      }, 0);
  }
})

Now that everything is in place, we need to calculate startNode and endNode.

To determine the startNode, we compare the current scroll position against the cumulative cached heights of the records. Calculating the endNode is relatively straightforward: We simply add the number of items in the viewport and the buffer count to the startNode. This helps us identify the range of records that should be visible on the screen.

JavaScript

startNode: computed('scrolledTill', {
  get() {
    const { scrolledTill, cacheRecordsWithHeight } = getProperties(this, 'scrolledTill', 'cacheRecordsWithHeight');

    // Walk the cached heights until the running total passes the scrolled offset
    let sum = 0;
    const start = Object.keys(cacheRecordsWithHeight).find((record) => {
      sum += cacheRecordsWithHeight[record].originalHeight;
      return sum > scrolledTill;
    });

    return parseInt(start) || 0;
  }
}),

endNode: computed('startNode', 'records', {
  get() {
    const {
      startNode,
      records,
      viewportBuffer,
      bufferNodes
    } = getProperties(this, 'startNode', 'records', 'viewportBuffer', 'bufferNodes');

    // Render a viewport's worth of nodes plus the buffer, capped at the total record count
    return Math.min(records.length, startNode + viewportBuffer + bufferNodes);
  }
}),

currentView: computed('startNode', 'endNode', {
  get() {
    const { records, startNode, endNode } = getProperties(this, 'records', 'startNode', 'endNode');
    return records.slice(startNode, endNode);
  }
})

Handlebars

<div id="basic-virtual-scroll" style="padding-top:{{paddingTop}}px">
  {{yield currentView}}
</div>

We yield the “currentView”, which is the subset of visible nodes. This yielded value could be used by the child component to display the records.

In summary, we have discussed infinite scroll and virtual scroll and why to use them. We have also implemented a basic yet generic version for both infinite and virtual scroll.

Production-ready code may involve additional considerations beyond what is presented in the article. Some of those could be:

  1. Make use of a MutationObserver instead of relying on a scroll listener and element heights, as repeated DOM measurements are expensive compared to a MutationObserver
  2. As mentioned in the "Infinite scroll implementation" section, the "bufferPercentage" approach for collision detection may not be optimal. An alternative is to use the remaining height in the scroll container: set a "bufferHeight" threshold, such as 300 pixels, and trigger the next API call whenever the remaining height in the container becomes smaller than this "bufferHeight." This gives more precise control over when to load the next set of data based on the actual space available in the scroll container, ensuring a smoother user experience (see the sketch below)
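
A sketch of that bufferHeight-based check, using the same scroll measurements as the earlier listener:

JavaScript

// Trigger the next page once fewer than bufferHeight pixels of content remain below the viewport
const bufferHeight = 300;

function shouldLoadNext({ scrollTop, offsetHeight, scrollHeight }) {
  const remainingHeight = scrollHeight - (scrollTop + offsetHeight);
  return remainingHeight < bufferHeight;
}

// Inside the scroll listener:
// if (shouldLoadNext(e.target) && hasNext && !nextLoading) { loadNextPage(); }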

How the AST Finder and Builder tools can help developers build codemods https://www.freshworks.com/saas/eng-blogs/ast-finder-builder-blog/ Wed, 08 Nov 2023 18:03:11 +0000 https://blog-fworks-en.freshworks.com/?p=8976 In this article, we will look at two tools called AST Finder and  AST Builder, which will significantly improve the developer experience for writing codemods. Codemod is a tool/library for large-scale codebase refactors that can be partially automated but require human oversight and occasional intervention. Codemod was developed at Facebook and released as open-source. If...

In this article, we will look at two tools called AST Finder and AST Builder, which will significantly improve the developer experience for writing codemods.

Codemod is a tool/library for large-scale codebase refactors that can be partially automated but require human oversight and occasional intervention. Codemod was developed at Facebook and released as open-source.

If you want to know more about codemods, their building blocks, and how they work, please check out this detailed post.

Using codemods: What AST Builder is all about

AST Builder is a playground for building abstract syntax tree (AST) nodes from source code. Because ASTs play a significant role in writing codemods, this tool will assist developers to a great extent. One of the primary reasons codemods are resilient in making effective code transformations is that they perform AST-to-AST transformations on your source code.

AST finder, freshworks engineering blog

It supports Javascript (ES5, ES6, and some ES7 constructs), JSX, and Glimmer.js handlebars syntax. For coverage information, see Cover core API and Cover ES6 API. I am planning to include more language syntax and semantics.

Using codemods: Why we leverage the AST Tooling 

We already have a well-established and battle-tested tool called AST Explorer for visualizing ASTs. So why do we need a new tool? Because AST Explorer is only for exploring your ASTs; it doesn't tell you how to create AST nodes. Even though AST Explorer offers IntelliSense in its editor for the jscodeshift APIs, it doesn't work for all the parsers. For instance, you only get the autocomplete API with the recast parser. If you choose any parser other than recast, you won't get IntelliSense in the codemod editor.

Most of the time, you will be creating nodes for transforming code using codemods, so we definitely need a tool that makes it easy to create nodes. The problem is that there is no proper documentation on creating AST nodes using the jscodeshift API. All you can do is learn from other codemods out there, sift through their code, and find out how to create new nodes.

For that, you need to understand the parser internals, the node schema, and the types of node builders available for the language you are using.

If you are still not convinced why this tool will make a difference in developer experience for building codemods, read what others have to say here.

For example, for Javascript, you need to know the ESTree spec or the node builder definition in ast-types. This module provides an efficient, modular, Esprima-compatible implementation of the abstract syntax tree type hierarchy pioneered by the Mozilla Parser API.

Let’s say you want to replace a CallExpression, foo(), with a new one like foo.bar(). The AST representation for the above two expressions will be:

AST finder, freshworks engineering blog

I have omitted a lot of information in the above code for clarity and readability purposes. It only contains the relevant information for the actual CallExpression AST node. If you want to explore the full tree structure of the AST, you can check it in AST Explorer.

AST finder, freshworks engineering blog

As you can see from the above two AST nodes, the only difference between the two is the callee object, which is a simple Identifier in foo() and a MemberExpression in foo.bar(). Usually with codemods, we will be replacing the original expression with the new one. Hence here, we will be replacing the original CallExpression with a new one like this:

AST Builder, freshworks engineering blog

In order to replace the old CallExpression with a new one, first we need to find the existing CallExpression. From the above codemod, you can see we are querying the AST using jscodeshift API like this:

AST finder, freshworks engineering blog

If you try to write the above query within the AST Explorer transform editor for the first time, you will have a tough time, because you are not familiar with the find API in the first place and you don't know the correct node types and filters to supply to correctly match the AST node. And don't forget the typos and punctuation errors you make while typing the code.

This is where AST Finder comes into the picture. It acts as a reference guide for finding APIs to easily query your AST nodes. Just input the code in the input editor (see the image above to identify the input editor, which is always in the top-left pane) and you will get the find API automatically generated for you without any mistakes. So if you input foo.bar() into the AST Finder, it will give you something like:

AST finder, freshworks engineering blog

Now you can simply copy the query from AST Finder and use it in your codemods. How cool is that?

To replace the old CallExpression with a new one, we need to build the new one. From the above codemod, you can see we are building the new one using jscodeshift API like this:

AST finder, freshworks engineering blog

If you try to build the above CallExpression within the AST Explorer transform editor the first time, you will have a tough time because you are not familiar with the builder API in the first place and you don’t know the correct order and type of parameters to supply to correctly build the AST node. And don’t forget the typos and punctuation errors you make while typing the code.

There are also some subtle nuances with the jscodeshift API that beginners won’t know. For example, the API j.callExpression is a constructor for building CallExpression nodes, whereas j.CallExpression is an instance of the type CallExpression, which is basically used to find nodes of the type CallExpression.

This is where AST Builder comes into the picture. It acts as a reference guide for builder APIs to easily build your AST nodes. Just input the expected code in the input editor (see the image above to identify the input editor, which is always in the top-left pane) and you will get the builder API automatically generated without any mistakes. So if you input foo.bar() into the AST Builder, it will give you something like:

 

AST finder, freshworks engineering blog

You can safely omit the ExpressionStatement wrapper if you are just replacing the nodes.

Now you can simply copy the builder API from AST Builder and use it in your codemods. How easy is that?
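
Putting the two halves together, a complete codemod for the foo() to foo.bar() example might look roughly like the minimal jscodeshift transform below; this is a sketch, not the exact code shown in the screenshots:

JavaScript

// transform.js: replace foo() calls with foo.bar()
module.exports = function transformer(file, api) {
  const j = api.jscodeshift;

  return j(file.source)
    .find(j.CallExpression, { callee: { name: 'foo' } }) // find foo()
    .replaceWith(() =>
      j.callExpression(
        j.memberExpression(j.identifier('foo'), j.identifier('bar'), false),
        []
      )
    )
    .toSource();
};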

Using codemods: How AST tooling makes a difference 

AST Builder uses the ast-node-builder npm package underneath, which provides the APIs for building AST nodes through jscodeshift. All the APIs take an object as a parameter and return the builder API in string format, which you can use with jscodeshift to create new nodes. The object that is passed to the API as a parameter is actually a node in the AST generated by the respective parser. When you feed the node to the ast-node-builder API, you get back the jscodeshift API to build that node.

This allows developers to easily and effectively create AST nodes from source code instead of tinkering with the autocomplete API in AST Explorer. All you have to do is enter or paste the source code into the input editor and you can see the jscodeshift API automatically generated in the output editor.

 

AST finder, freshworks engineering blog

You can also use AST Builder to visualize your AST in the top-right pane without all the noise and clutter of meta information. We deliberately filter out the loc nodes and tokens from the AST, since we feel they are not of much use for working with codemods. To dig deeper into the builder, you can take a look at the source code here. It is built in Ember.js.

Community and adoption

The Ember.js community has been pioneering the use of codemods for upgrades and deprecations for a long time. Since the community is actively building codemods for their use cases, AST Builder proves to be a valuable tool for the task. It was even featured in an EmberConf 2021 talk called “Cross-File Codemodding” by Joshua Lawrence.

EmberConf 2021 – Cross-File Codemodding by Joshua Lawrence

In this talk, Joshua talks about how easy it is to use AST Builder for automatically generating jscodeshift APIs for building handlebars nodes to transform AST with codemods.

Stay tuned for more about the exciting tools we are building around ASTs and codemods.

Using codemods: References for developers

Here’s a list of resources that can empower developers to use codemods effectively.

Handling race conditions on a MySQL JSON column in Rails https://www.freshworks.com/saas/eng-blogs/handling-race-conditions-on-mysql-json-column-in-rails-blog/ Wed, 25 Oct 2023 20:25:05 +0000 https://blog-fworks-en.freshworks.com/?p=8935 Freshworks handles a lot of structured and unstructured data, and for certain use cases we make use of the JSON column to store the unstructured data. The writes to a JSON column are prone to data inconsistencies in highly concurrent applications if they aren’t handled properly. Learn how Freshworks handles this issue during concurrent requests.

Freshservice provides intelligent, unified, easy-to-use software that helps businesses modernize employee experience, maximize uptime, and extend services beyond IT. It is a SaaS product powered by Ruby on Rails and backed by a MySQL database for persistent storage. Incidents is one of the core modules of Freshservice, where thousands of write operations happen per minute. We handle a lot of structured and unstructured data, and for certain use cases we make use of the JSON column to store the unstructured data. Writes to a JSON column are prone to data inconsistencies in highly concurrent applications if they aren't handled properly. We'll see here how we handle this issue during concurrent requests.

Rails’ way of handling updates

Whenever any record is updated, Rails internally tracks the changes made for the particular model using ActiveModel::Dirty. It proceeds with the update only if there are any changed attributes. In the case of a JSON column, Rails does not consider the specific updates made to the JSON object as dirty changes; instead, it considers the entire JSON object to be part of the dirty changes. Also, Rails does not have native methods that update only the modified keys in a JSON object. Let's see this with an example.

ticket = Ticket.find_by(id: 1)
ticket.json_column
=> {"city1" => "Mumbai", "city2" => "Chennai"}
ticket.json_column = ticket.json_column.merge({"city1" => "Bangalore", "city3" => "Hyderabad"})

ticket.changes
=> {"json_column"=>[{"city1"=>"Mumbai", "city2"=>"Chennai"}, {"city1"=>"Bangalore", "city2"=>"Chennai", "city3"=>"Hyderabad"}]}

ticket.save
-> UPDATE `ticket` SET `json_column` = '{\\"city1\\":\\"Bangalore\\",\\"city2\\":\\"Chennai\\",\\"city3\\":\\"Hyderabad\\"}' WHERE `ticket`.`id` = 1

ticket.reload.json_column
=> {"city1" => "Bangalore", "city2" => "Chennai", "city3" => "Hyderabad"}

Here we can see that even though we didn’t modify the “city2” key, it is included in the final update query, where the entire JSON object is set again. In short, a patch (or selective update) of the JSON column didn’t happen. This way of updating works fine most of the time, until concurrent requests expose the actual issue.

Issue with concurrent requests

Let’s consider the below scenario, where the requests are concurrent. In both requests, the ticket is loaded at the same instant, but request 1 saves its update before request 2.

def request1
  ticket = Ticket.find_by(id: 1)
  ticket.json_column = ticket.json_column.merge({"city1" => "Mumbai", "city3" => "Hyderabad"})
  sleep(2)
  ticket.save
  p ticket.reload.json_column
end

def request2
  ticket = Ticket.find_by(id: 1)
  ticket.json_column = ticket.json_column.merge({"city1" => "Delhi", "city4" => "Kolkata"})
  sleep(3)
  ticket.save
  p ticket.reload.json_column
end

ticket = Ticket.find_by(id: 1)
p ticket.json_column
-> {"city1" => "Bangalore", "city2" => "Chennai"}

thread1 = Thread.new { request1 }
thread2 = Thread.new { request2 }
thread1.join
thread2.join

# Request1
-> UPDATE `ticket` SET `json_column` = '{\"city1\":\"Mumbai\",\"city2\":\"Chennai\",\"city3\":\"Hyderabad\"}' WHERE `ticket`.`id` = 1

# Request2
-> UPDATE `ticket` SET `json_column` = '{\"city1\":\"Delhi\",\"city2\":\"Chennai\",\"city4\":\"Kolkata\"}' WHERE `ticket`.`id` = 1

p ticket.reload.json_column
-> {"city1" => "Delhi", "city2" => "Chennai", "city4" => "Kolkata"}

We can see here that the “city3” added in the first request is missing due to the update done in the second request. The second request is unaware of the latest changes to the JSON column and updates based on the state of the column during the initial load.

Ways of handling the race condition

The simplest way to handle the race condition is by using the JSON_MERGE_PATCH function of MySQL. More about JSON_MERGE_PATCH is explained in this link. Let’s consider the same concurrent requests scenario by updating the JSON column using JSON_MERGE_PATCH.

def request1
  ticket = Ticket.where(id: 1)
  new_value_to_be_updated = {"city1" => "Mumbai", "city3" => "Hyderabad"}.to_json
  ticket.update_all(["json_column = JSON_MERGE_PATCH(json_column, ?)", new_value_to_be_updated])
  sleep(2)
  p ticket.reload.first.json_column
end

def request2
  ticket = Ticket.where(id: 1)
  new_value_to_be_updated = {"city1" => "Delhi", "city4" => "Kolkata"}.to_json
  ticket.update_all(["json_column = JSON_MERGE_PATCH(json_column, ?)", new_value_to_be_updated])
  sleep(3)
  p ticket.reload.first.json_column
end

ticket = Ticket.find_by(id: 1)
p ticket.json_column
-> {"city1" => "Bangalore", "city2" => "Chennai"}

thread1 = Thread.new { request1 }
thread2 = Thread.new { request2 }
thread1.join
thread2.join

# Request1
-> UPDATE `ticket` SET `json_column` = JSON_MERGE_PATCH(`json_column`, '{\"city1\":\"Mumbai\",\"city3\":\"Hyderabad\"}') WHERE `ticket`.`id` = 1

# Request2
-> UPDATE `ticket` SET `json_column` = JSON_MERGE_PATCH(`json_column`, '{\"city1\":\"Delhi\",\"city4\":\"Kolkata\"}') WHERE `ticket`.`id` = 1

p ticket.reload.json_column
-> {"city1" => "Delhi", "city2" => "Chennai", "city3" => "Hyderabad", "city4" => "Kolkata"}

Note: The update_all ActiveRecord method used here doesn’t instantiate the models, nor does it trigger callbacks or validations.

We can see here that the JSON merge is handled at the MySQL engine level instead of at the Ruby level. When the merge is done in Ruby, we lose track of the latest data in the database during concurrent requests.

The other way of handling the race condition is to use the optimistic locking functionality of Rails, where an update is rejected by raising ActiveRecord::StaleObjectError if the lock version has changed since the record was loaded. This, however, requires rescuing the error and retrying the update on the reloaded record.

The post Handling race conditions on a MySQL JSON column in Rails appeared first on Thoughts on Refreshing Business Software.

]]>
Attention is not always all you need! https://www.freshworks.com/saas/eng-blogs/fourier-transforms-as-replacement-for-attention-mechanism-blog/ Wed, 25 Oct 2023 20:16:25 +0000 https://blog-fworks-en.freshworks.com/?p=8892 Large language models have been the talk of the town for quite some time now. Such models allow us to get deeper insights from the text data and extract meaningful excerpts based on our requirements.

The post Attention is not always all you need! appeared first on Thoughts on Refreshing Business Software.

]]>
Transforming the landscape: Fourier transforms as a replacement for the attention mechanism.

Large language models have been the talk of the town for quite some time now. Such models allow us to get deeper insights from the text data and extract meaningful excerpts based on our requirements. 

Among such large language models, BERT has been efficient and reliable at capturing bidirectional context and delivering high-quality results. In the CRM domain, we use these language models for tasks like sentiment classification, intent detection, phrase extraction, and out-of-office email detection. BERT models are fine-tuned with our custom CRM data explicitly for each of these tasks. To enhance their capabilities, we first focus on CRM-domain pre-training tasks to generate better contextual representations, and then test the models further on relevant downstream tasks. However, BERT models come with their own set of disadvantages related to cost, training time, and inference time due to the presence of self-attention layers.

In this blog, we show the advantages of replacing the self-attention layers in BERT with a non-parametric transformation, the Fourier transform, which achieves the same goal as the attention layers: mixing tokens so that the embedding of a particular token is influenced by related tokens across the entire text. We study the ability of FNETs to overcome the drawbacks associated with BERT and compare their performance across different tasks.

BERT architecture

A basic transformer consists of an encoder to read the text input and a decoder to produce a prediction for the task. The transformer encoder is composed of multiple layers of self-attention and feed-forward neural networks that transform the input text into a set of contextualized word embeddings. Since BERT’s goal is to generate a language representation model, it only takes advantage of the encoder part. BERT is pre-trained in two versions: 

  • BERT BASE: 12-layer, 768-hidden-nodes, 12-attention-heads, 110M parameters
  • BERT LARGE: 24-layer, 1024-hidden-nodes, 16-attention-heads, 340M parameters

It is quite evident that one of the main disadvantages of the BERT model is the computational complexity that is associated with multiple self-attention layers. Time taken for pre-training or fine-tuning these models with hundreds of millions of parameters can range anywhere between a few hours and several days. This can be a challenge for organizations that need to quickly deploy a model and don’t have the computing power or time to train it for a long time. Additionally, inference time can be slow since BERT models require a lot of computation to produce predictions. Lighter versions like DistilBERT can alleviate this problem, but only to a certain extent.

Replacing attention layers in BERT with Fourier transform layers

The success of BERT in achieving state-of-the-art results across various NLP tasks is often attributed to attention layers. While token-wise weights learned through attention layers are essential for high-quality context, recent research suggests that similar results can be achieved through alternative mechanisms. For example, some studies have replaced attention weights with unparameterized Gaussian distributions or fixed, non-learnable positional patterns and achieved minimal performance degradation while retaining learnable cross-attention weights. Moreover, recent efforts to improve attention efficiency are based on sparsifying the attention matrix or replacing attention with other mechanisms, such as MLPs.

Although the standard attention mechanism has a memory bottleneck with respect to sequence length, efficient transformers with O(N√N) or even O(N) theoretical complexity, such as Longformer, ETC, and BigBird, have been developed. In the Long-Range Arena benchmark, Performer, Linear Transformer, Linformer, and Image Transformer (Local Attention) were found to be the fastest and had the lowest peak memory usage per device. Finally, the team at Google found that replacing attention layers with Fourier transform layers offered similar performance, reduced model size (no learnable parameters), and simplicity. (Lee-Thorp, James, Joshua Ainslie, Ilya Eckstein, and Santiago Ontanon. "FNet: Mixing tokens with Fourier transforms." arXiv preprint arXiv:2105.03824 (2021). https://arxiv.org/pdf/2105.03824.pdf)

Enter Fourier nets

Fourier neural networks, or Fourier nets for short, are a class of neural networks that use the Fourier series as the basis of their architecture. These networks have gained popularity in recent years due to their ability to efficiently model complex, high-dimensional functions and their use in various applications such as image and audio processing.

Fourier nets use a Fourier series as the basis of their architecture. A Fourier series is a mathematical representation of a periodic function as a sum of sine and cosine functions of different frequencies. A discrete Fourier transform (DFT) is used to decompose this series into individual frequencies. Particularly for sentence embeddings, a DFT for a sentence sequence of N tokens can be written as:

                    X_k = Σ_{n=0}^{N−1} x_n e^(−2πink/N),   k = 0, 1, …, N−1                 (1)

Here, x_n is the nth input token of a sentence, and X_k is the kth transformed value: a weighted sum of all the x_n tokens with complex exponential factors. To compute (1), a fast Fourier transform is used, which brings the time complexity down to O(N log N), as opposed to the quadratic complexity associated with self-attention layers.
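
To make equation (1) and the FFT speed-up concrete, here is a tiny NumPy check (purely illustrative, not from the paper or our code) that the direct O(N²) sum and the O(N log N) FFT give the same result:

import numpy as np

N = 6
x = np.random.randn(N)          # stand-in for a sequence of N token values

# Direct evaluation of equation (1): O(N^2)
X_naive = np.array([sum(x[n] * np.exp(-2j * np.pi * n * k / N) for n in range(N))
                    for k in range(N)])

# Fast Fourier transform: the same transform in O(N log N)
X_fft = np.fft.fft(x)

assert np.allclose(X_naive, X_fft)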

A normal BERT transformer has a multi-head self-attention layer with h heads, which looks like:

                    Attention(Q, K, V) = softmax(Q·Kᵀ / √d_k)·V                               (2)

                    head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)                             (3)

                    Y = MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W^O                    (4)

Y is the final multi-head attention output. Q, K, and V are different representations of the token embeddings in a sentence with dimension d_k, and W_i^Q, W_i^K, and W_i^V are learnable projection matrices. For a detailed explanation, see this paper.

The Google team replaced the self-attention sublayer of each transformer encoder layer with a Fourier transform sublayer. This involved applying individual 1D Fourier transforms along both the sequence dimension and the hidden dimension (together, a 2D Fourier transform), resulting in a complex number:

                                              Y = ℜ( F_seq( F_hidden(x) ) )                   (5)

Where Y is the final transform, equivalent to the final multi-head attention output, F_seq and F_hidden are the 1D FFTs along the sequence and hidden dimensions, and ℜ is the real part of the 2D transform, meaning the feedforward and output layers did not need to be modified to handle complex numbers. Equations (2) and (5) clearly show that FNETs perform the desired mixing of tokens, which is equivalent to the self-attention mechanism but without the baggage of heavy matrix multiplications and the huge number of learnable parameters.
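
As a minimal illustration of equation (5), the following NumPy sketch applies the Fourier mixing to a toy matrix of token embeddings (the shapes and variable names are illustrative, not our training code):

import numpy as np

seq_len, d_model = 8, 16
x = np.random.randn(seq_len, d_model)      # token embeddings for one sentence

f_hidden = np.fft.fft(x, axis=-1)          # F_hidden: 1D FFT along the hidden dimension
f_seq = np.fft.fft(f_hidden, axis=0)       # F_seq: 1D FFT along the sequence dimension
y = f_seq.real                             # keep only the real part, as in equation (5)

# The nested 1D FFTs are equivalent to a single 2D FFT; no learnable parameters anywhere
assert y.shape == x.shape
assert np.allclose(y, np.fft.fft2(x).real)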

Now, we show the performance of BERT vs FNETs across three different tasks that are specific to our CRM.

Tasks

1. MLM for domain pre-training:

The goal here is to randomly mask out 15% of the words in the input, replacing them with a [MASK] token, pass the sequence through the BERT attention-based encoder, and then predict only the masked words based on the context provided by the other, non-masked words in the sequence.

While this approach solves the unidirectional constraint, a downside is that there is a mismatch between pre-training and fine-tuning, since the [MASK] token does not appear during fine-tuning. Hence, during masking, the following is done (a minimal code sketch follows the list):

  • 80% of the tokens are actually replaced with the token [MASK].
  • 10% of the time, tokens are replaced with a random token.
  • 10% of the time, tokens are left unchanged.
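
A minimal sketch of this 80/10/10 scheme over a list of token ids; mask_token_id, vocab_size, and the -100 ignore label are generic conventions for illustration, not our production pipeline:

import random

def mask_tokens(token_ids, vocab_size, mask_token_id, mask_prob=0.15):
    """Return (masked_input, labels) following the 80/10/10 masking scheme."""
    masked, labels = [], []
    for tok in token_ids:
        if random.random() < mask_prob:            # select ~15% of tokens for prediction
            labels.append(tok)                     # the model must predict the original token
            r = random.random()
            if r < 0.8:                            # 80%: replace with [MASK]
                masked.append(mask_token_id)
            elif r < 0.9:                          # 10%: replace with a random token
                masked.append(random.randrange(vocab_size))
            else:                                  # 10%: keep the original token
                masked.append(tok)
        else:
            masked.append(tok)
            labels.append(-100)                    # position ignored by the loss (common convention)
    return masked, labels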

We perform pre-training on sales conversation emails using the Masked LM pretraining task. The goal here is to allow the model to learn better representations for tokens present in the sales conversations context. For this task, we use a corpus of 8 million emails, ranging through conversations across different products.

This pre-training has been performed on G5.24xlarge GPU with 384 GB memory with a batch size of 32 for two epochs each. We compare the performance of the following models over the pretraining task:

  • BERT-BASE-MULTILINGUAL-UNCASED
  • FNET-BASE

These experiments are also performed with maximum token lengths of 64 and 128 to explore the convergence at multiple sentence lengths.

Validation loss refers to the measure of how well a trained model generalizes to unseen data during the validation phase. It quantifies the error or mismatch between the predicted outputs of the model and the actual expected outputs of the validation dataset. During model training, the training dataset is used to update the model’s parameters and optimize its performance. However, it is crucial to evaluate the model’s performance on data it has not seen before to assess its ability to generalize and make accurate predictions on new, unseen instances. The lower the validation loss, the better the model performance because it indicates a reduction in prediction errors.

Perplexity measures how well a language model predicts a given sequence of words. Entropy measures the average amount of information or uncertainty associated with each word in a sequence. Perplexity, on the other hand, is the exponentiation of entropy and provides a more interpretable value. Perplexity can be computed using the following formula:

 

                       Perplexity = exp(cross-entropy)                                 (6)

 

A lower perplexity indicates a better language model that is more confident and accurate in predicting the next word in a sequence. It reflects the model’s ability to capture the underlying patterns and structure of the training data.
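
As an illustrative (hypothetical) number, a validation cross-entropy of 2.0 would correspond to a perplexity of exp(2.0) ≈ 7.39, meaning the model is, on average, about as uncertain as choosing uniformly among roughly seven candidate tokens.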

We observe that FNETs have shown a drastic reduction in training duration, with just a minimal difference in validation loss and eval perplexity. Also, the gap between these evaluation metrics reduces as we increase the token length.

2. Deal classification based on email sentiment
We tested with a training set of 54K emails and a test set of 18K emails. We compared the performance of the following models over the fine-tuning classification (won or lost) task:

  • BERT-BASE-UNCASED
  • FNET-BASE 

This fine-tuning has been performed on G4.12xlarge GPU with 100GB memory with a batch size of 32 for four epochs each.

 

With these results, we observe that FNETs have shown a drastic reduction in training duration, with just minimal difference in validation AUC. The training time gap is starker at higher token lengths. The average inference time is calculated by measuring the individual inference times for each email and taking the mean across 18K emails. We can see the same drastic difference at higher token lengths.

3. Email phrase extraction
We tested with a training set of 200K emails and a test set of 58K emails. We compared the performance of the following models:

  • BERT-BASE-UNCASED
  • FNET-BASE 

This fine-tuning has been performed on G4.12xlarge GPU with 100GB memory with a batch size of 32 for three epochs each.

 

We used the Jaccard score to evaluate the quality of the phrases predicted by our model. The Jaccard score is the ratio of the number of tokens common to two strings to the number of unique tokens across the two strings, i.e., |A ∩ B| / |A ∪ B| (sketched in code after the list), where:

  • Set A is the set of unique tokens of string 1
  • Set B is the set of unique tokens of string 2
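
A minimal sketch of the score over two strings, assuming simple whitespace tokenization (an illustrative choice, not necessarily the tokenizer we used):

def jaccard_score(text_a, text_b):
    # Jaccard = |A ∩ B| / |A ∪ B| over the unique tokens of the two strings
    a, b = set(text_a.split()), set(text_b.split())
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# e.g. jaccard_score("renew the annual contract", "renew annual contract today") == 0.6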

With these results, we observe that FNETs have shown a drastic reduction in training duration with just minimal difference in validation Jaccard scores. The training time gap is starker at higher token lengths. Another advantage that can be seen here is that at higher token lengths, BERT-based models throw OOM errors, while FNETs do not because they are lighter.

Can Fourier transforms remove noise in text just like they do in voice signals?

We also observed that the phrases extracted from FNETs were cleaner and much more nuanced than the phrases from the BERT models. Fourier transforms basically decompose signals into their constituent frequencies and then pick the important frequencies as a means of denoising the signals. Similarly, our hypothesis was that Fourier transforms would denoise the extracted phrases. Although we do not yet have a metric to measure this capability, we can see that this hypothesis holds true for some examples. Interestingly, for the last email, the fine-tuned FNET model doesn’t extract any phrase at all, which is actually correct!

Summary

FNETs are lighter alternatives for computation-heavy BERT-based models. Self-attention layers are replaced by non-parametric Fourier transform layers. We demonstrate the drastic decrease in training and inference times with only a slight decrease across different evaluation metrics. We also observed that the FNET model was close to 40% lighter than the BERT-based models and can especially be used in pipelines implemented on small devices. The need for optimization of large language models, especially transformer-based, is growing by the day, and the usage of FNETs is a possible alternative. 

Aanchal Varma co-authored this piece. Aanchal was a senior data scientist at Freshworks. She’s experienced in solving complex problems and building scalable solutions in the fields of NLP, deep learning, language modeling, and machine learning.

The post Attention is not always all you need! appeared first on Thoughts on Refreshing Business Software.

]]>
Learning from Python app to Golang app migration https://www.freshworks.com/saas/eng-blogs/learning-from-python-app-to-golang-app-migration-blog/ Wed, 25 Oct 2023 06:19:19 +0000 https://blog-fworks-en.freshworks.com/?p=8909 We have a legacy Python application responsible for processing Kafka events. The Python app processes the event and updates the results in the database, Redis, and other services. The performance of this app is crucial, and the processing of each event has to happen in a few milliseconds. This service’s growth over the previous few...

The post Learning from Python app to Golang app migration appeared first on Thoughts on Refreshing Business Software.

]]>
We have a legacy Python application responsible for processing Kafka events. The Python app processes the event and updates the results in the database, Redis, and other services. The performance of this app is crucial, and the processing of each event has to happen in a few milliseconds. This service’s growth over the previous few years has been exponential. We tried to optimize the Python app, but we needed more. 

The Python app’s performance was slow. Under heavy load, it wouldn’t process fast enough and Kafka consumer lag would increase. Some of the reasons were:

  1. The initial Python app was written in non-async mode
  2. Using CPU profiling, we found that detailed logging consumed lots of CPU cycles
  3. These benchmarks show that Python is generally slower than Golang

Non-performance-related reasons were:

  1. The Python app relied on custom code to observe and log the metrics. The latest observability solutions, like Prometheus, were not integrated
  2. The Python app had low test coverage. This made the overall feature release very slow

Golang has a wide range of features that complement the requirements of modern cloud computing: 

  1. Golang has built-in support for concurrency
  2. Golang is a statically typed language. This allows the compiler to optimize the code more aggressively
  3. Golang has a wide range of library support required for our use case
  4. Golang has a built-in testing framework. Adding test cases from the initial development phase helps with faster development
  5. Golang offers cross-platform compilation, allowing developers to build and compile applications for multiple architectures and operating systems. See the complete list here

We chose to migrate our application to Golang due to its simplicity of development and potential future needs. 

Migration

The Python app was not converted to Golang all at once. Each function was converted individually. Golang has a different naming convention. Some Python functions and methods had names that were not clear or descriptive. These names were changed to make them more understandable and easier to use.

To ensure that we could track which Python function or method had been developed in Golang, we added the Python function or method name to the Golang code documentation. This allowed us to cross-check the code in both languages.

In Python, coroutines were written with the asyncio library.

import asyncio

async def hello():
    print("Hello, world!")

async def main():
    task = asyncio.create_task(hello())
    await task

if __name__ == "__main__":
    asyncio.run(main())

They were converted to goroutines in Golang.

package main

import (
    "fmt"
    "time"
)

func hello() {
    fmt.Println("Hello, world!")
}

func main() {
    go hello()
    // Give the goroutine a moment to finish before main exits
    time.Sleep(1 * time.Second)
}

At regular intervals, we checked the processing speed. A Python script with coroutines was used to generate events for load testing: using Python’s asyncio library, the script creates a new coroutine every second, and each coroutine independently generates a specified number of events.

import asyncio

class EventProducer:

    def __init__(self):
        self.loop = asyncio.get_event_loop()

    async def produce_events(self, count: int):
        """
        We will call the Kafka producer to publish `count` events.
        """
        pass

    async def periodic_produce(self, msg_per_sec: int, num_secs: int) -> None:
        """
        A coroutine function that periodically produces messages at a certain rate
        for a specified number of seconds.

        Args:
            msg_per_sec (int): The number of messages to produce per second.
            num_secs (int): The total number of seconds to produce messages for.

        Returns:
            None.
        """
        tasks = []
        for _ in range(num_secs):
            # Fire off this second's batch of events without waiting for it to finish
            tasks.append(self.loop.create_task(self.produce_events(msg_per_sec), name="produce"))
            await asyncio.sleep(1.0)
        # Wait for all batches to complete
        await asyncio.gather(*tasks)

CPU profiling helped in choosing the functions and methods that mattered most. The following code stores the pprof file in the pod after 10 minutes; we would conduct our load test during those 10 minutes.

package main

import (
    "log"
    "os"
    "runtime/pprof"
    "time"
)

func run() {
    // actual code here
    time.Sleep(10 * time.Minute)
}

// This Golang code will do CPU profiling for the first 10 minutes
func main() {
    f, perr := os.Create("/tmp/cpu.pprof")
    if perr != nil {
        log.Fatal(perr)
    }
    pprof.StartCPUProfile(f)
    time.AfterFunc(10*time.Minute, func() {
        pprof.StopCPUProfile()
        log.Println("Stopping CPU Profile")
    })
    run()
}

Then we copied the pprof file to local to visualize the CPU-intensive functions and methods using the following bash commands:

kubectl cp <namespace>/<pod-name>:/tmp/cpu.pprof ~/Downloads/cpu.pprof

go tool pprof -http=":8000" ~/Downloads/cpu.pprof

This is the flame graph of the initial Golang code. Function block width shows the time spent in that function. The darker the block’s color, the more CPU cycles it consumes. This helps focus on areas to improve performance.

We verified our code for race conditions.

go run -race -tags musl cmd/app/*

We started small, and as each feature was completed, we added documentation and test cases to it. We later automated the code testing with Jenkins and GitHub.

Difficulties

We were reading the Python code and converting it to Golang. Existing logic had to be converted line by line. But there were some difficulties, such as:

  1. In Python, we can create and return complex structures on the fly. This becomes difficult in Golang.
    • In Golang, it is essential to define a struct before utilizing it in any function, which contrasts with the flexibility of Python, where any type of data can be returned and the code will execute. While writing code, this difference can be frustrating as it disrupts the flow and continuity of development
    • In Golang, we have to stop writing the logic and decide whether the data type is reused or whether we have to create a new one. This becomes difficult with a large code base
  2. In Python, it’s easy to work with JSON. We can treat JSON data as a dictionary and write our code accordingly.
    • In Golang, we must first understand the JSON code’s structure and write an equivalent Golang struct
    • In Golang, creating a struct from a large, nested JSON document is difficult. Using an online or offline tool to convert a large, nested JSON document into Golang code is useful
    • In Golang, we have to be sure about the type of values. Writing multiple marshal and unmarshal methods for custom types is overwhelming for those coming from Python. This also helped to make sure we were not reading or writing to unwanted data types

Learnings

  1. Golang is a statically typed language, and VS Code continuously tells us if our code does not compile. Most of the runtime errors were related to returning an empty array or a nil pointer from a function
  2. Using Docker-Compose at the start of this project was useful to maintain development speed. Writing test cases helped us quickly evaluate our changes
  3. We did CPU profiling to understand which parts of the code are used the most
  4. We implemented time metrics in Prometheus format and utilized Grafana to visualize them. This proved to be highly beneficial in analyzing the performance of our application

Scaling results

We were able to get an observable (5x) increase in performance by doing the following:

  1. We developed a Golang codebase with synchronous processing to ensure careful handling of goroutines. The initial sync code achieved a processing speed comparable to the initial Python code
  2. Upon incorporating goroutines, our application was faster than the Python application
  3. We saw a significant improvement by transitioning from auto-commit to manual async commits in Kafka
  4. We implemented a caching mechanism using Redis to store frequently used DB query results. This caching mechanism significantly enhanced our processing capabilities
  5. We avoided logging nested structs and only logged them when it was required. Serialization and formatting of nested structs are expensive operations 

In conclusion, converting Python code to Golang was a challenging but rewarding experience. I think Golang is a great language for building fast and scalable web applications.

The post Learning from Python app to Golang app migration appeared first on Thoughts on Refreshing Business Software.

]]>