System for Automatic Audio-description Generation from Video Content Analysis
Accessibility, Audio description, Deep Learning, Video Description, Automatic Generation
Audio description (AD) is an accessibility feature that makes visual information available to people who are blind or have low vision. To increase the availability of audio description tracks in digital video applications, we propose a system for automatically generating audio description for movies. The system can draw information about the film from either the original script or the video itself. As a proof of concept, we developed a prototype that automatically generates audio description from actions extracted from the script and objects recognized in the video. The experiments applied the solution to fiction films and surveillance videos. For fiction films, the solution was evaluated with blind users. The results indicated that the automatically generated audio description provided contextual information that can help users understand the story in general terms. For surveillance videos, performance was evaluated by measuring the delay introduced by each component. The results indicate that the solution has the potential to be used in contexts that require real-time AD.
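As an illustration only, the two ideas summarized above (composing a description from a script action plus recognized objects, and measuring the delay of each component) might be sketched as follows. This is a minimal sketch under stated assumptions, not the prototype's implementation: the names `compose_description` and `timed`, the sentence template, and the sample inputs are all hypothetical.

```python
import time


def compose_description(action: str, objects: list[str]) -> str:
    """Join a script action with detected object labels into one sentence.

    Hypothetical template: the source does not specify how the prototype
    words its descriptions.
    """
    sentence = action.strip().rstrip(".") + "."
    if objects:
        sentence += " In view: " + ", ".join(objects) + "."
    return sentence


def timed(stage, *args):
    """Run one pipeline stage and return (result, delay in seconds)."""
    begin = time.perf_counter()
    result = stage(*args)
    return result, time.perf_counter() - begin


if __name__ == "__main__":
    # Assumed sample inputs standing in for the script parser and the
    # object recognizer, which are not reproduced here.
    action = "John opens the door"
    detected = ["door", "lamp"]

    text, delay = timed(compose_description, action, detected)
    print(text)
    print(f"composition delay: {delay:.6f} s")
```

Wrapping each stage in `timed` gives one delay figure per component, which is the kind of measurement a real-time suitability check would sum and compare against a latency budget.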