[OPEN][RI-IML_2019-CV-DM-028] Deep learning for depth estimation from multi-view video

This internship tackles the problem of learning depth estimation from multi-view video. Deep learning is likely to outperform traditional depth estimation techniques, as it already has for most computer vision tasks. The latest networks, whose training is supervised by the quality of synthesized views, produce impressive depth maps. Yet state-of-the-art solutions process each set of successive frames independently. The goal of this internship is to shift to a temporally stable solution by enforcing the temporal consistency of the generated depth map stream. The framework environment includes an existing dataset of light field video sequences [1] that will be used as ground truth for training. The neural network design will build on existing deep learning approaches. Its architecture shall enable the extraction of the temporal redundancy present in the input images and enforce the temporal consistency of the output depth maps. To this end, a survey of conventional optical flow estimation approaches may be helpful.
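For illustration, the temporal-consistency idea above can be sketched as a loss that warps the previous depth map into the current frame using optical flow and penalizes the difference. This is a minimal NumPy sketch under our own assumptions (the posting does not prescribe an implementation; the function names are hypothetical):

```python
import numpy as np

def warp_with_flow(prev_depth, flow):
    """Warp the previous depth map into the current frame using a
    per-pixel optical-flow field (dy, dx). Nearest-neighbour sampling
    keeps the sketch short; a real pipeline would interpolate."""
    h, w = prev_depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.round(ys - flow[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs - flow[..., 1]).astype(int), 0, w - 1)
    return prev_depth[src_y, src_x]

def temporal_consistency_loss(depth_t, depth_tm1, flow):
    """L1 penalty between the current depth map and the flow-warped
    previous one -- zero when the depth stream is perfectly stable."""
    return float(np.abs(depth_t - warp_with_flow(depth_tm1, flow)).mean())
```

In a deep learning setting this term would be differentiable (e.g. bilinear warping) and added to the view-synthesis supervision loss.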

[1] Sabater et al., ‘Dataset and Pipeline for Multi-View Light-Field Video’, CVPR 2017

Skills  : Machine Learning, Computer Vision, Image & video processing, Python

Keywords  : Deep learning, light fields, view synthesis, depth estimation, optical flow estimation

 

This internship is located in Rennes, France. If interested, please apply at stage.technicolor@technicolor.com  by sending us your resume and a cover letter with the internship reference in the email subject line.

[OPEN][RI-ISL_2019-CV-DM-VP-052] One-click video editing

In film post-production, digital makeup of actors to age or de-age them, add or remove scars, and generally modify their appearance is a painstaking and time-consuming process, even for skilled artists. The goal of this internship is to facilitate this process by contributing to the development of automatic tools that allow artists to easily edit an actor's appearance across many scenes. The work will revolve around the concept of 'Unwrap mosaics', a novel representation that condenses the appearance of an actor within a video sequence into a single image, which can be edited and propagated back into the video. The candidate will build on a prototype of unwrap mosaics implemented at Technicolor. To ensure that the applied corrections are realistic, the internship will first target the development of an automatic color correction solution. More specifically, the goal is to automatically compute a suite of parametric corrections (color, contrast, sharpness) that adjust the edit in all frames so that no seam is visible after insertion. To further reduce the artistic effort required, the internship will also focus on the propagation of edits from one scene to other scenes where the same actor appears, managing both the geometric and the colorimetric corrections necessary. Both deep learning and traditional techniques can be tested and compared. The work will take place in the context of an ongoing, close collaboration between Technicolor R&I and the Hollywood film production teams specialized in VFX and digital makeup (https://www.technicolor.com/create/technicolor-los-angeles/vfx), addressing their ongoing needs in active and future projects.

Skills  : machine learning, computer vision, video processing

Keywords  : vfx, post-production, image processing

 

[CLOSED][RI-ISL_2019-CG-CV-VP-017] Augmented Reality Point Clouds: Real Hologram Experience

Color constancy aims to estimate the color of the light source in an image. Many image processing tasks, such as scene understanding, may benefit from color constancy by using the corrected object colors. The goal of this internship is to explore and propose a new framework based on convolutional neural networks (CNNs) to achieve illuminant estimation for color constancy processing. A comparison with traditional methods should be conducted through a user test.
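As an illustration of the traditional methods a CNN would be compared against, here is a minimal NumPy sketch of the classic gray-world baseline (not from the posting; function names are hypothetical):

```python
import numpy as np

def gray_world_illuminant(image):
    """Gray-world assumption: the average scene reflectance is
    achromatic, so the mean RGB of the image estimates the
    illuminant colour (normalised here to unit mean)."""
    illum = image.reshape(-1, 3).mean(axis=0)
    return illum / illum.mean()

def correct(image, illum):
    """Divide out the estimated illuminant (von Kries-style
    per-channel correction)."""
    return image / illum
```

A learning-based estimator would replace `gray_world_illuminant` with a CNN regressing the illuminant from image patches, while the correction step stays the same.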

Skills : Matlab/Python/C programming, ideally with image processing expertise; ability to write well-structured and documented code; good written and spoken English; excellent teamwork skills, as the internship is part of a larger project involving many team members; ability to work independently

Keywords : Machine Learning, Deep Learning, SVM, Clustering, Color Constancy

 

This internship is located in Rennes, France. If interested, please apply at stage.technicolor@technicolor.com by sending us your resume and a cover letter with the internship reference in the email subject line.

[CLOSED][RI-ISL_2019-CG-CV-DM-020] Aging 3D Character

In VFX production (film or advertisement), the need to reconstruct a 3D actor's face from video input is increasing. Over the last decade, the technology that extracts a 3D facial model from a flat image has improved significantly, although fine-scale mesoscopic detail may still be missed. With the recent growth of deep learning techniques, we believe that morphing a 3D character's age is possible by learning "(de-)aging" from data. Automating this pipeline would bring benefits to the VFX industry by reducing manual labour. Our research team is based in Rennes and New York, and collaborates with engineers and artists located at The Mill, New York.

Skills  : Machine learning, Deep-learning, Computer Graphics, Computer Vision, Python, PyTorch, Maya.

Keywords  : deep network, visual effects, facial rig, 3d reconstruction, shape from shading

 

This internship is located in Rennes, France. If interested, please apply at stage.technicolor@technicolor.com  by sending us your resume and a cover letter with the internship reference in the email subject line.

[CLOSED][RI-ISL_2019-CV-DM-VP-021] Deep Learning for Rotoscoping

In recent years, deep learning has enabled very efficient approaches in various fields (e.g., image/video processing, computer vision, audio processing). This internship proposal targets the development of deep learning approaches for high-end visual effects. In this context, both the interaction with a user (roto artist) and the efficient propagation of the effect throughout a whole sequence are key to achieving a highly accurate and efficient process. The proposal will target these two aspects, interaction and spatio-temporal propagation, in the context of deep learning segmentation and matting methods. The resulting algorithms might be integrated into professional VFX software to help colorists.

Skills  : machine learning, deep learning, computer vision, video/image processing, PyTorch, TensorFlow or Keras deep learning frameworks, Python or C++.

Keywords  : machine learning (deep learning), video processing, computer vision, interaction, segmentation, tracking, rotoscoping, matting

 

This internship is located in Rennes, France. If interested, please apply at stage.technicolor@technicolor.com  by sending us your resume and a cover letter with the internship reference in the email subject line.

[CLOSED][RI-ISL_2019-CV-DM-VP-022] Deep learning for 3D face rig

This internship proposal targets the development of deep learning approaches for high-end visual effects (generation and animation of 3D avatars for film studios). Recent techniques, such as MoFA, achieve good 3D face rig reconstruction from still images and videos. However, these face rigs only cover skin areas, missing the eyes and mouth interior. To improve this, we propose to study the use of Generative Adversarial Networks (GANs) to fill in these parts.

Skills  : machine learning, deep learning, computer vision, video/image processing, PyTorch, Python

Keywords  : machine learning, deep learning, video processing, computer vision

 

This internship is located in Rennes, France. If interested, please apply at stage.technicolor@technicolor.com  by sending us your resume and a cover letter with the internship reference in the email subject line.

[CLOSED][RI-ISL_2019-CG-CV-DM-024] Extraction of quadrupeds motion parameters from video

The goal of the internship is to apply deep learning techniques to the extraction of motion parameters of quadrupeds from video. To cope with the lack of ground truth, the approach will build upon both weakly supervised and unsupervised learning. Biomechanical knowledge, or possibly a small manually annotated dataset, might also be exploited. The motivation for this work is to develop a statistical model of the motion of some quadrupeds in order to synthesize plausible animations.

The context of this work is the VFX workflow of the animated film industry. This work is part of an effort to automate what is currently a very manual process.

The objective is to design the model and the learning methodology for extracting the 3D coordinates of a moving quadruped in video.

The expected outcomes of the internship are:
- A model with the quantitative evaluation of its performance
- A description of the approach which might lead to a publication or patent
- A demo which will visually display the produced 3D animation

References
- Zhou, Xingyi, Qixing Huang, Xiao Sun, Xiangyang Xue, and Yichen Wei. "Weakly-supervised Transfer for 3D Human Pose Estimation in the Wild." arXiv preprint arXiv:1704.02447 (2017).
- Newell, Alejandro, Kaiyu Yang, and Jia Deng. "Stacked hourglass networks for human pose estimation." In European Conference on Computer Vision, pp. 483-499. Springer International Publishing, 2016.

Skills  : deep learning

Keywords  : deep learning

 

This internship is located in Rennes, France. If interested, please apply at stage.technicolor@technicolor.com  by sending us your resume and a cover letter with the internship reference in the email subject line.

[CLOSED][RI-IML_2019-CV-DM-029] Deep learning for light field video synthesis

This internship tackles the problem of learning light field video synthesis from a subset of the views of a camera array. In the last few years, numerous deep learning architectures have proven efficient for still image synthesis, including for large-baseline setups. The next step consists in shifting to video, i.e. enforcing temporal stability and consistency. The framework environment includes an existing dataset of light field video sequences [1]. The intern's research and development will build on top of the existing view synthesis deep learning literature, such as [2]. The network design shall successfully exploit the redundancy occurring in the successive input images and explicitly enforce the temporal consistency of the output. To this end, a survey of conventional optical flow estimation approaches may be helpful.

[1] Sabater et al., ‘Dataset and Pipeline for Multi-View Light-Field Video’, CVPR 2017
[2] Flynn et al., ‘DeepStereo: Learning to Predict New Views from the World's Imagery’, CVPR 2016

Skills  : Machine Learning, Computer Vision, Image & video processing, Python

Keywords  : Deep learning, light fields, view synthesis, depth estimation, optical flow estimation

 

This internship is located in Rennes, France. If interested, please apply at stage.technicolor@technicolor.com  by sending us your resume and a cover letter with the internship reference in the email subject line.

[CLOSED][RI-IML_2019-CV-HCI-031] Automatic extraction and encoding of haptic effects

In order to develop new cinematographic and Mixed-Reality (MR) experiences, immersive visual content is being developed at Technicolor. In addition, sensory effects, such as haptic effects, can be added to this media. However, the creation and encoding of this multisensory content are still open issues.

The purpose of the internship is to contribute to this topic by designing algorithms to automatically extract haptic information from immersive videos. The effects should then be encoded into a data format in order to be streamed and rendered on the appropriate end-user terminal. Thus, a data format will need to be specified and then implemented into an existing streaming platform (including developing the necessary encoder, decoder and frame packing). The second part of the internship will focus on the extraction of the camera motion from an omnidirectional video [1]. This data will be used to drive a motion platform. Additionally, more haptic information could be extracted.

[1] Lee, J., Han, B., & Choi, S. (2016). Motion effects synthesis for 4D films. IEEE Transactions on Visualization and Computer Graphics, 22(10), 2300-2314.

Skills  : Computer vision, C++, Maths, English, motivated by research.

Keywords  : Haptics, video encoding, automatic extraction

 

This internship is located in Rennes, France. If interested, please apply at stage.technicolor@technicolor.com  by sending us your resume and a cover letter with the internship reference in the email subject line.

[CLOSED][RI-IML_2019-CG-CV-DM-032] Joint deep completion of geometry and texture for Mixed Reality applications

We are proposing a 6-month internship in the Mixed Reality team, focusing on 3D scene completion with deep learning. The internship could be continued as a PhD scholarship. Scanning indoor scenes with RGB-D sensors usually yields incomplete scenes, with much missing geometry and texture detail. These are not good enough for high-end Augmented and Mixed Reality applications. Classical approaches try to extrapolate scanned information towards missing regions using basic priors on existing patterns. Deep learning, on the other hand, provides a powerful framework to learn patterns from existing 3D scans and 2D images, from local details to global contextual information, which can be exploited to reconstruct missing parts. We aim to develop a multi-purpose tool for scene completion, based on deep learning, combining both colour and geometry information, respecting constraints provided by scanned regions, and taking scene classification and semantics into account. The solution will be integrated into a larger pipeline and used for Virtual Reality, Mixed Reality and Diminished Reality applications.

Skills  : Computer Vision, Machine Learning, Image/Video Processing, 3D geometry, C++/Python, fluent English, good team spirit and communication skills

Keywords  : Deep learning, 3D modeling, scene completion, inpainting, semantic labelling, Augmented reality

 

This internship is located in Rennes, France. If interested, please apply at stage.technicolor@technicolor.com  by sending us your resume and a cover letter with the internship reference in the email subject line.

[CLOSED][RI-HOME_2019-CV-DM-HCI-033] Advanced deep learning methods for audio event detection in domestic environment

The internship addresses the detection of audio events in domestic environments for emerging real-life applications to be implemented within a set-top box.

This task, which has been benchmarked in the DCASE challenges (see [1] for DCASE 2018), has attracted a lot of attention in the past few years. With the advances in deep neural networks (DNNs) and the release of large-scale audio datasets, numerous approaches have been investigated in the literature, including both supervised and weakly supervised [2] methods. Grounded in the DCASE 2019 challenge and its benchmark datasets, the internship aims to build a state-of-the-art DNN model that performs inference accurately. Several settings might be considered: single-channel vs. multichannel inputs, and supervised vs. weakly supervised learning where the annotations are noisy and/or incomplete. The intern will conduct both research and implementation while investigating the use of advanced DNN architectures and data augmentation strategies for the considered tasks. Depending on the work carried out and the results obtained, the internship may conclude with participation in the DCASE 2019 challenge and the submission of a scientific publication to an international conference/workshop.

[1] IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE), http://dcase.community/challenge2018/.
[2] Romain Serizel et al., “Large-Scale Weakly Labeled Semi-Supervised Sound Event Detection in Domestic Environments,” Proc. DCASE2018 Workshop, July 2018. https://hal.inria.fr/hal-01850270.

Skills  : Machine learning (deep learning), audio processing, Python.

Keywords  : Machine learning (deep learning), audio signal processing, weakly supervised learning, acoustic event detection.

 

This internship is located in Rennes, France. If interested, please apply at stage.technicolor@technicolor.com  by sending us your resume and a cover letter with the internship reference in the email subject line.

[CLOSED][RI-IML_2019-CG-CV-DM-040] Change Detection & 3D Model Update for Mixed Reality applications

Mixed Reality (MR) applications are based on the creation of a new world that merges 3D virtual assets with the real environment of the user. Reconstructing 3D point clouds or meshes to represent the real world is key, and for these applications to be even more realistic and engaging, MR experiences should not be restricted to static environments but should also adapt to temporal changes. This internship will focus on the detection of geometric changes between different observations of the same scene captured at different instants. Semantic segmentation based on deep learning will serve in this process as a powerful tool to identify objects that have been moved, introduced or removed. The older 3D model will then be updated with the most recent observations.

The intern will be included in the Immersive Lab within the Mixed Reality group at Technicolor Rennes. The proposed solution will be integrated into a larger pipeline and used for MR demonstrations. The internship could be continued as a PhD scholarship. Applicants should be strongly motivated by research.

Skills  : Computer vision, machine learning, 3D geometry, C++/Python, fluent English, good team spirit and communication skills

Keywords  : Deep learning, 3D modelling, semantic labelling, augmented reality

 

This internship is located in Rennes, France. If interested, please apply at stage.technicolor@technicolor.com  by sending us your resume and a cover letter with the internship reference in the email subject line.

[CLOSED][RI-IML_2019-CG-CV-DM-043] 3D scene relighting for Mixed Reality applications

Mixed Reality (MR) applications are based on the creation of a new world that merges 3D virtual assets with the real environment of the user. These assets are often virtual objects or figures that can move in the real scene. Other MR applications may consist of retexturing or relighting the scene. The latter is the subject of this internship.

The study will focus on realistic relighting effects in a mixed scene in the presence of real lights. The scenarios will include the insertion of virtual lights to produce realistic effects (shadows, shading, specular effects, etc.) as well as the removal of real lighting effects. This requires estimating the lighting as well as the surface reflectance properties of the real scene via image and 3D processing. Moreover, the stability of the rendered lighting effects over time will be addressed.

The intern will be included in the Immersive Lab within the Mixed Reality group at Technicolor Rennes. The proposed solution will be integrated into a larger pipeline and used for MR demonstrations. The internship could be continued as a PhD scholarship.

Skills  : computer vision, 3D geometry, C++/Python, machine learning, fluent English, good team spirit and communication skills

Keywords  : lighting and reflectance modelling, 3D modelling, rendering & relighting, mixed reality, deep learning

 

This internship is located in Rennes, France. If interested, please apply at stage.technicolor@technicolor.com  by sending us your resume and a cover letter with the internship reference in the email subject line.