Problem:

Our client faced a security challenge that required monitoring human actions within a specific area using cost-effective CCTV installations. The goal was to detect suspicious behaviour in real time and flag it for further investigation. The solution needed to be robust yet affordable, as the existing infrastructure relied on basic camera setups without advanced capabilities.

The main challenge stemmed from the client’s limited budget, which ruled out upgrading to high-end cameras or deploying complex, GPU-heavy processing systems at the outset. Despite these constraints, the solution had to be reliable and capable of detecting actions that could indicate potential security risks. It also had to run efficiently on standard graphics processing units (GPUs) and off-the-shelf video cards.

Solution:

Our initial approach to solving the problem focused on using deep learning models. We intended to rely heavily on neural networks and action recognition techniques, which are known for their high performance in large-scale systems with access to abundant, high-quality training data. These systems can process video feeds, classify actions, and identify suspicious behaviour through continuous learning from labelled datasets.

However, as the project progressed, it became clear that the expected quantity and quality of training data could not be supplied. Training a large-scale deep learning model requires high-resolution video feeds, extensive labelled datasets, and considerable computing power, none of which were available for this particular project. Without enough real-world examples of suspicious actions, the deep learning model could not be fully trained to recognise specific actions or behaviours.

Recognising this limitation, our team shifted to a hybrid model. Instead of purely relying on neural networks, we decided to integrate transfer learning techniques for the parts of the project that dealt with modelling human bodies. Transfer learning allows us to take advantage of pre-trained deep learning models that have already been exposed to large datasets. We then adapted these models to recognise the basic structure and movement of the human body, without needing to start from scratch with a new training set.
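As a minimal sketch of the transfer learning idea described above, the snippet below freezes a backbone and trains only a small new head. Everything in it is an assumption made for illustration: the tiny `backbone` is a stand-in for a genuinely pre-trained network (a real project would load published weights instead), and the 17-keypoint output assumes a COCO-style body skeleton, which the project description does not specify.

```python
import torch
from torch import nn

# Stand-in for a pre-trained backbone (hypothetical). In a real project this
# would be a network loaded with published weights; a tiny one keeps the
# sketch self-contained and offline.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the backbone so its existing features are reused, not retrained.
for param in backbone.parameters():
    param.requires_grad_(False)

NUM_KEYPOINTS = 17  # assumption: COCO-style body skeleton
head = nn.Linear(16, NUM_KEYPOINTS * 2)  # one (x, y) pair per keypoint
model = nn.Sequential(backbone, head)

# Only the new head is handed to the optimiser.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

# One training step on random placeholder data (not real footage).
frames = torch.rand(4, 3, 64, 64)
targets = torch.rand(4, NUM_KEYPOINTS * 2)
loss = nn.functional.mse_loss(model(frames), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because only the head receives gradients, far less labelled data and compute are needed than for training the whole network from scratch.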

To compensate for the lack of data in identifying suspicious actions, we incorporated a rules-based approach into the system. This rules-based method operates based on predefined sets of conditions that represent unusual or suspicious behaviour. These rules can include unexpected movements, actions that violate normal behaviour patterns, or lingering in restricted areas.
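To make the idea of a predefined condition concrete, here is a hedged sketch of one such rule: lingering in a restricted area. The names `Zone`, `TrackPoint`, `longest_dwell`, `is_loitering` and the 10-second threshold are hypothetical and chosen only for illustration; the actual rules used in the project are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Axis-aligned restricted area in image coordinates (hypothetical)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


@dataclass(frozen=True)
class TrackPoint:
    """One timestamped position of a tracked person."""
    t: float  # seconds since the start of the track
    x: float
    y: float


def longest_dwell(track, zone):
    """Longest continuous time in seconds that the track stays inside the zone."""
    longest = 0.0
    entered_at = None
    for point in track:
        if zone.contains(point.x, point.y):
            if entered_at is None:
                entered_at = point.t  # person has just entered the zone
            longest = max(longest, point.t - entered_at)
        else:
            entered_at = None  # leaving the zone resets the timer
    return longest


def is_loitering(track, zone, max_dwell_s=10.0):
    """Rule: flag a person who stays in the zone for max_dwell_s or longer."""
    return longest_dwell(track, zone) >= max_dwell_s
```

A rule like this needs no training data at all: it only consumes the positions produced by the body-recognition stage, which is why it suits a project with few labelled examples of suspicious behaviour.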

This hybrid model enabled us to process the video feeds using standard graphics cards and mid-range GPUs. We used PyTorch to handle the deep learning aspects of human body recognition and vectorised NumPy code for the rules-based logic. The rules-based components are more computationally efficient, requiring less GPU processing than the deep learning models, so the system can run on mid-range dedicated graphics cards without the need for high-end hardware. Confining the heavy computation to the body-recognition stage kept the overall load low enough for smooth operation within the existing hardware constraints.
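The following sketch illustrates what vectorised NumPy rule logic could look like: two simple rules evaluated for every tracked person at once, with no Python loop over people. The function name `flag_tracks`, the array layout, and both thresholds are assumptions made for this example, not the project's actual code.

```python
import numpy as np


def flag_tracks(positions, zone, fps, max_dwell_s=10.0, max_speed=50.0):
    """Evaluate two illustrative rules for all tracked people at once.

    positions: array of shape (frames, people, 2) holding x, y in pixels.
    zone: (x_min, y_min, x_max, y_max) of a restricted area.
    Returns a boolean array of shape (people,), True where a rule fired.
    """
    x, y = positions[..., 0], positions[..., 1]
    x_min, y_min, x_max, y_max = zone

    # Rule 1: total time spent inside the restricted zone.
    inside = (x >= x_min) & (x <= x_max) & (y >= y_min) & (y <= y_max)
    dwell_s = inside.sum(axis=0) / fps

    # Rule 2: unusually fast movement between consecutive frames.
    step = np.diff(positions, axis=0)            # (frames - 1, people, 2)
    speed = np.linalg.norm(step, axis=2) * fps   # pixels per second
    too_fast = speed.max(axis=0) > max_speed

    return (dwell_s >= max_dwell_s) | too_fast
```

Array operations like these run on the CPU and cost very little compared with a neural network forward pass, which leaves the GPU free for body recognition.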

Results:

The proof-of-concept delivery of the system was deemed a success, given the limitations of the available training data. Although the system did not perform with the same level of autonomy as originally planned, the hybrid model allowed for reliable human action recognition. Human supervision was still required to validate the flagged actions, but the system provided a strong foundation for future developments.

The combination of deep learning and a rules-based approach proved effective. The system was able to recognise specific actions and identify when those actions violated the preset rules. While human operators are still necessary for the final verification of suspicious behaviour, this hybrid system significantly reduces the workload by narrowing down the number of incidents they need to review.

Additionally, the successful deployment of this system opened up the opportunity for future improvements. With the system in place, the client can begin to collect more training data from real-world use. Over time, as more suspicious actions are captured and labelled, the dataset will grow, allowing for the deep learning component of the system to become more effective. This will eventually lead to a reduction in the reliance on human supervision, as the model becomes more capable of identifying suspicious actions independently.
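One simple way to start building such a dataset is to log each operator decision on a flagged clip as a labelled example. The sketch below assumes a JSON Lines log file; the function names, field names, and label values are hypothetical.

```python
import json
from pathlib import Path


def record_verdict(log_path, clip_id, rule, confirmed):
    """Append one operator decision as a labelled example (JSON Lines)."""
    entry = {
        "clip_id": clip_id,
        "rule": rule,  # which rule flagged the clip
        "label": "suspicious" if confirmed else "benign",
    }
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def load_labels(log_path):
    """Read every recorded decision back as a list of dictionaries."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Both confirmed and rejected flags are worth keeping, since rejected flags provide negative examples for later training.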

One of the key outcomes was the optimisation of the system to run on mid-range GPUs, which were sufficient for processing both the deep learning and rules-based components. By using GPU-accelerated computing for the deep learning tasks, we managed to significantly boost the system’s performance without the need for expensive, high-end video cards. The discrete GPUs used in the system were able to handle the complex tasks of human body recognition and action classification while keeping pace with the incoming video feeds.

The system also benefited from techniques like ray tracing, which improved the quality of visual inputs by tracking the movement of objects and people in higher resolution. This enhanced the clarity of the video feeds, allowing the system to detect small, subtle movements that might indicate suspicious actions.

Moreover, the use of optical flow in computer vision helped to track movement and direction within the video feeds. Optical flow refers to the pattern of apparent motion of objects in a visual scene. This was crucial in detecting actions like someone moving into restricted areas or behaving in an unusual manner. By leveraging pre-trained models and applying them to real-time video streams, the system could track and classify human actions more effectively.
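To illustrate what estimating apparent motion involves, the sketch below applies the classic Lucas-Kanade least-squares formulation to a single image window. This is a swapped-in textbook method used purely for illustration; which optical flow algorithm the system used is not stated here, and the helper names and the synthetic test image are assumptions.

```python
import numpy as np


def lucas_kanade_window(prev, curr):
    """Estimate one (u, v) displacement in pixels for a grayscale window.

    Solves Ix*u + Iy*v = -It in the least-squares sense over all pixels,
    which assumes small motion and constant brightness.
    """
    prev = prev.astype(float)
    curr = curr.astype(float)
    # Spatial gradients of the averaged frames; np.gradient returns (d/dy, d/dx).
    iy, ix = np.gradient((prev + curr) / 2.0)
    it = curr - prev  # temporal gradient
    a = np.stack([ix.ravel(), iy.ravel()], axis=1)
    b = -it.ravel()
    (u, v), *_ = np.linalg.lstsq(a, b, rcond=None)
    return u, v


def blob(cx, cy, size=41, sigma=5.0):
    """Synthetic test image: a smooth Gaussian blob centred at (cx, cy)."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma**2))


# The blob moves half a pixel to the right between the two frames.
u, v = lucas_kanade_window(blob(20.0, 20.0), blob(20.5, 20.0))
```

Here `u` comes out close to 0.5 and `v` close to 0, matching the synthetic shift. A production system would compute flow per region or per pixel, typically with a library implementation rather than hand-written code.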

Future Potential:

With the system in place, the client has the opportunity to upgrade it further by enhancing the action recognition and classification aspects. For instance, as the client begins to gather more data from actual incidents, they can use this data to improve the performance of the deep learning models. This would allow the system to detect more complex behaviours and reduce the need for manual intervention.

The system can also be scaled to higher-resolution video feeds or be applied to a wider range of security tasks. With better GPUs and higher-performance video cards, the system could be used for tasks like video editing, large-scale monitoring, or even virtual reality (VR) applications in security settings.

In the future, the client could implement more advanced neural networks to make the system more autonomous. With the use of dedicated graphics cards, the system could handle real-time analysis of large video streams without the need for human supervision. This could greatly increase the efficiency of the monitoring process, allowing for faster detection and response to suspicious activities.

TechnoLynx’s flexible approach ensures that the system can evolve alongside technological advancements. Our deep understanding of both machine learning and real-world constraints allowed us to deliver a solution that fits within the client’s budget while still offering high performance. The use of pre-trained models, combined with rules-based logic, provided a cost-effective solution that can be further enhanced as the client’s needs grow.

Conclusion:

In summary, our client’s security-related problem was successfully addressed through a combination of deep learning and rules-based logic. The hybrid model allowed for real-time monitoring of suspicious behaviour using cost-effective hardware, including mid-range GPUs and dedicated graphics cards. Although the system still requires human supervision, it significantly reduces the workload by pre-screening suspicious actions and flagging them for review.

As the client collects more data from real-world usage, the system can be further improved to provide more autonomous action recognition and classification. By utilising modern techniques like optical flow, ray tracing, and GPU acceleration, the system is well-equipped to handle future challenges in security monitoring and action classification.

Image by Freepik