Case-Study: Performance Modelling of AI Inference on GPUs

Learn how TechnoLynx helps reduce inference costs for trained neural networks and real-time applications including natural language processing, video games, and large language models.

Written by TechnoLynx Published on 15 May 2023

Problem

Our client was heavily involved in the development and use of AI applications in various sectors. As their AI models became more complex, the cost of inference—running models to generate results—became a critical issue for the company. The client, highly experienced with AI models, sought a way to reduce these costs by optimising their use of graphics processing units (GPUs). They wanted to better understand the relationship between different GPU topologies and their impact on performance, including factors like clock speeds and ray tracing capabilities.

The client was particularly concerned about the efficiency of their machine learning models. They were running multiple models across a wide range of discrete GPU architectures. Each type of GPU had different strengths and weaknesses, and the client needed to allocate its resources strategically. They wanted a way to predict the inference performance of various models on different GPU topologies so that they could reduce running costs without sacrificing performance.

Solution

Our task was to model the performance of various AI operations on different GPU architectures and provide the client with clear insights into the performance implications of each. We needed to examine popular AI model operations, such as convolutions, which are central to tasks like image recognition and video analysis. Our approach involved recreating several of these operations and modelling them on a low-level GPU system.
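To illustrate what recreating an operation can look like, here is a minimal sketch of a direct 2D convolution in plain Python. It is not the code used in the project. The function name, the "valid" padding and the stride of 1 are assumptions chosen for brevity, and the function also counts multiply-accumulate steps, since such counts are a natural input to a performance model.

```python
def conv2d_valid(image, kernel):
    """Direct 2D convolution (cross-correlation, as most ML frameworks define it)
    with 'valid' padding and stride 1.
    Returns the output grid and the number of multiply-accumulate steps."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    output = [[0.0] * ow for _ in range(oh)]
    macs = 0
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
                    macs += 1
            output[y][x] = acc
    return output, macs


# Hypothetical 4x4 input and 2x2 kernel, for illustration only.
image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0],
          [0, -1]]
out, macs = conv2d_valid(image, kernel)
print(out, macs)
```

Every output element is independent of the others, which is why the two outer loops map well onto parallel GPU work-items.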

We used Python and OpenCL for this task. Python provided flexibility in coding and testing, while OpenCL gave us the ability to work closely with the underlying GPU hardware. This allowed us to model the exact behaviours of the GPU as it executed complex machine learning tasks.

The core of our solution involved creating a performance model that could predict how well certain GPU topologies would perform with different types of AI workloads. This model took into account various GPU parameters such as:

  • Clock speeds: Higher clock speeds typically lead to faster processing, but they can also increase power consumption and heat generation.

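  • Memory bandwidth: This determines how quickly data can move between the GPU’s compute units and its own on-board memory. Transfers to and from the system’s main memory are limited separately by the host interconnect (for example PCIe).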

  • Parallel processing: Many AI models, particularly deep learning models, require large amounts of data to be processed simultaneously. GPUs excel at this because they can handle multiple calculations in parallel.

  • Compute units: These are the individual processing units inside the GPU, which determine how many tasks it can handle at once.
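One common way to combine parameters like these into a prediction is a roofline-style estimate, sketched below. This is an illustrative simplification, not the model built for the client. The `GpuSpec` class, the `lanes_per_unit` field, the assumption of one fused multiply-add per lane per cycle, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GpuSpec:
    name: str
    compute_units: int
    lanes_per_unit: int        # ALU lanes per compute unit (assumed value)
    clock_ghz: float
    mem_bandwidth_gbs: float   # device memory bandwidth in GB/s

    @property
    def peak_gflops(self):
        # Assumes each lane retires one fused multiply-add (2 FLOPs) per cycle.
        return self.compute_units * self.lanes_per_unit * 2 * self.clock_ghz


def estimate_time_ms(gpu, flops, bytes_moved):
    """Roofline-style lower bound: the slower of compute and memory traffic."""
    compute_s = flops / (gpu.peak_gflops * 1e9)
    memory_s = bytes_moved / (gpu.mem_bandwidth_gbs * 1e9)
    bound = "compute" if compute_s >= memory_s else "memory"
    return max(compute_s, memory_s) * 1e3, bound


# Hypothetical GPU and workload, for illustration only.
gpu = GpuSpec("example-gpu", compute_units=40, lanes_per_unit=64,
              clock_ghz=1.5, mem_bandwidth_gbs=400.0)
time_ms, bound = estimate_time_ms(gpu, flops=7.68e9, bytes_moved=200e6)
print(f"{gpu.name}: {time_ms:.2f} ms ({bound}-bound)")
```

An estimate like this ignores launch overhead, caches and occupancy, so it is best read as an optimistic bound that measured data would then refine.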

We also designed a tool to measure the characteristics of any OpenCL-capable GPU the client was using. This tool could analyse the GPU’s performance on specific tasks and provide detailed feedback on how it would handle different AI models.
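As a rough idea of the starting point for such a tool, the sketch below lists the basic characteristics that OpenCL reports for each GPU. It assumes the third-party `pyopencl` package and an OpenCL driver are installed, and it returns an empty list when they are not. The function name and dictionary keys are hypothetical, and the actual tool also ran timed workloads, which this sketch does not attempt.

```python
try:
    import pyopencl as cl  # third-party package; assumed to be available
except ImportError:
    cl = None


def describe_opencl_devices():
    """Return one dict of reported characteristics per visible OpenCL GPU."""
    if cl is None:
        return []
    try:
        platforms = cl.get_platforms()
    except cl.Error:
        return []  # no OpenCL platform is available on this machine
    devices = []
    for platform in platforms:
        try:
            gpus = platform.get_devices(device_type=cl.device_type.GPU)
        except cl.Error:
            continue  # this platform exposes no GPU devices
        for dev in gpus:
            devices.append({
                "name": dev.name,
                "compute_units": dev.max_compute_units,
                "max_clock_mhz": dev.max_clock_frequency,
                "global_mem_bytes": dev.global_mem_size,
                "local_mem_bytes": dev.local_mem_size,
            })
    return devices


for info in describe_opencl_devices():
    print(info)
```

Reported values such as the maximum clock are nominal figures, so a measurement tool would normally complement them with timed micro-benchmarks.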

Performance Modelling in GPUs

Performance modelling of GPUs is an important part of optimising AI systems. Modern GPUs are highly specialised hardware designed to handle tasks like 3D graphics, virtual reality, and machine learning. They are far more efficient at these tasks than central processing units (CPUs) because they have hundreds or even thousands of cores that can process data simultaneously.

In this case, we focused on discrete GPUs, which are separate from the system’s main CPU and memory. These dedicated graphics cards have their own memory and processing power, making them ideal for high-intensity tasks like AI inference. However, discrete GPUs vary in their ability to handle different AI models, and understanding which GPU was best suited for the client’s needs was critical to optimising their system.

For instance, the client had a variety of video cards at their disposal, including models that supported advanced features like ray tracing for 3D graphics. However, these features, while useful in areas like virtual reality, didn’t always provide a performance boost for their specific AI tasks. Our model allowed the client to identify which features were essential for their work and which were unnecessary, saving them valuable resources.

Predicting GPU Performance for AI Models

The predictive aspect of the performance model was key to helping the client reduce costs. By analysing the characteristics of a GPU—such as its clock speeds, memory bandwidth, and parallel processing capabilities—the client could predict how efficiently it would run their AI models.

For example, the client often used machine learning algorithms that involved multiple layers of convolution and matrix multiplication. These operations are highly parallelisable, meaning they run best on GPUs with a large number of cores and high memory bandwidth. On the other hand, certain types of tasks, such as training on very large datasets or serving very large models, may require GPUs with high memory capacity rather than just raw processing power.
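Whether an operation benefits more from extra cores or from extra bandwidth can be gauged by its arithmetic intensity, the number of floating-point operations per byte moved. The sketch below computes it for matrix multiplication and convolution under an idealised assumption that each operand is read or written exactly once. The function names and example shapes are hypothetical.

```python
def matmul_cost(m, n, k, bytes_per_element=4):
    """FLOPs and ideal memory traffic for C[m,n] = A[m,k] @ B[k,n]."""
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops, bytes_moved


def conv2d_cost(h_in, w_in, c_in, c_out, k_h, k_w, h_out, w_out,
                bytes_per_element=4):
    """FLOPs and ideal memory traffic for a dense 2D convolution layer."""
    flops = 2 * h_out * w_out * c_out * c_in * k_h * k_w
    elements = (h_in * w_in * c_in            # input activations
                + k_h * k_w * c_in * c_out    # weights
                + h_out * w_out * c_out)      # output activations
    return flops, bytes_per_element * elements


def arithmetic_intensity(flops, bytes_moved):
    return flops / bytes_moved


# Square matrix product versus matrix-vector product (hypothetical sizes).
square = arithmetic_intensity(*matmul_cost(1024, 1024, 1024))
matvec = arithmetic_intensity(*matmul_cost(1024, 1, 1024))
conv = arithmetic_intensity(*conv2d_cost(56, 56, 64, 64, 3, 3, 56, 56))
print(f"square matmul: {square:.1f} FLOP/byte")
print(f"matrix-vector: {matvec:.2f} FLOP/byte")
print(f"3x3 conv:      {conv:.1f} FLOP/byte")
```

A high value suggests the operation is limited by compute throughput, while a low value, as for the matrix-vector case, suggests it is limited by memory bandwidth.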

With the model we developed, the client was able to forecast how different AI models would perform on various GPU architectures. This allowed them to choose the most cost-effective GPU for each specific task, significantly reducing their inference costs. Additionally, by knowing which features were essential for their work, they could avoid purchasing more expensive GPUs with unnecessary capabilities.
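The selection step described above can be sketched as a small cost comparison: given a predicted latency per inference and an hourly price for each GPU, pick the cheapest option that still meets a latency budget. The GPU names, latencies and prices are hypothetical, and the sketch assumes one inference at a time on a fully utilised GPU.

```python
def cheapest_gpu(predicted_ms, hourly_cost, max_latency_ms=float("inf")):
    """Return (gpu_name, cost_per_million_inferences) for the cheapest GPU
    whose predicted latency is within the budget, or None if none qualifies.

    predicted_ms and hourly_cost are dicts keyed by GPU name.
    """
    best = None
    for name, ms in predicted_ms.items():
        if ms > max_latency_ms:
            continue  # too slow for this task
        # An hour has 3,600,000 ms, so one million inferences cost
        # hourly_cost * ms * 1e6 / 3.6e6 = hourly_cost * ms / 3.6.
        cost = hourly_cost[name] * ms / 3.6
        if best is None or cost < best[1]:
            best = (name, cost)
    return best


# Hypothetical predictions and prices, for illustration only.
predicted_ms = {"gpu_a": 4.0, "gpu_b": 2.5, "gpu_c": 9.0}
hourly_cost = {"gpu_a": 1.20, "gpu_b": 2.50, "gpu_c": 0.40}

print(cheapest_gpu(predicted_ms, hourly_cost, max_latency_ms=5.0))
print(cheapest_gpu(predicted_ms, hourly_cost))
```

With a latency budget the slowest card drops out even though it is the cheapest per inference, which is the kind of trade-off a performance model makes visible.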

Results

The final result of our work was a detailed performance model that not only helped the client predict how well their AI models would perform on different GPU architectures, but also provided them with valuable insights into how their graphics cards worked on a low level. This knowledge was crucial for their development team, enabling them to optimise their use of GPUs in the long term.

The model we provided was sophisticated enough to predict performance across a wide range of GPU architectures. The client was now able to test various AI models on GPUs with different configurations, identifying the best possible setup for their needs.

The tools we developed also helped the client measure the performance of their discrete GPUs. By analysing the clock speeds, memory usage, and other parameters, the client was able to make informed decisions about which GPU to use for different types of tasks.

The most significant benefit, however, was the cost savings. By optimising their use of GPU resources, the client reduced the amount of time and money they spent on AI inference. This not only improved the performance of their models but also allowed them to reallocate resources to other areas of their business.

Educational Value

Although the performance model was designed primarily to optimise the client’s AI systems, it also offered valuable insight into how GPUs function at a fundamental level.

Through our reports and workshops, the client’s development team gained a deeper understanding of how their GPUs worked, enabling them to better utilise these powerful tools in future projects. The client appreciated this internal educational purpose, which helped them enhance their AI capabilities over time.

Conclusion

Our performance modelling project helped the client tackle the growing costs associated with AI inference by optimising their use of GPUs. By building a model that could predict the performance of various AI models on different GPU architectures, we enabled the client to make better-informed decisions and save on GPU resources.

As artificial intelligence (AI) continues to grow in use, demands on computational power rise sharply. This applies across many sectors, from natural language processing to video games. In real-time applications including financial forecasting and user behaviour tracking, delays can cause serious issues.

By combining the performance model with data from trained neural networks, the client can now adjust GPU usage on the fly. This real-time adaptability ensures faster output, lower energy use, and better overall reliability. It also helps when working with large language models, which require steady, efficient processing. The flexibility gained made future scaling far easier.

In the long run, the performance model proved to be not just a tool for improving efficiency, but also a valuable educational resource for the client’s team. This project highlighted the importance of understanding the intricate relationship between AI workloads and GPU performance, enabling the client to build more cost-effective, high-performance systems for the future.

At TechnoLynx, we specialise in helping businesses optimise their AI workflows. Whether you’re looking to improve your GPU performance, reduce costs, or develop new AI solutions, our team can provide the tools and expertise you need to succeed.

Contact us to learn more!
