International Standardization of FTV

FTV (Free-viewpoint Television) is visual media that transmits all ray information of a 3D space and enables immersive 3D viewing. The international standardization of FTV has been conducted in MPEG. The first phase of FTV is multiview video coding (MVC), and the second phase is 3D video (3DV). The third phase of FTV is MPEG-FTV, which targets revolutionized viewing of 3D scenes via super multiview, free navigation, and 360-degree 3D. After the success of the exploration experiments and the Call for Evidence, MPEG-FTV moved to the MPEG Immersive project (MPEG-I), where it is in charge of the video part as MPEG-I Visual. MPEG-I will create standards for immersive audio-visual services.

FTV is the ultimate 3DTV, with an infinite number of views, and ranks at the top of visual media. FTV enables users to view a 3D scene by freely changing the viewpoint, as we do naturally in the real world. FTV is a natural interface between humans and the environment. It is also an immersive medium that enables a realistic VR experience and revolutionizes 3D viewing.
FTV was proposed to the Moving Picture Experts Group (MPEG) in 2001 [13]. Since then, the MPEG has been developing various FTV standards. Multiview video coding (MVC) [14] is the first phase of FTV and enables efficient coding of multiple camera views. 3D video (3DV) [15] is the second phase of FTV and enables viewing adaptation and display adaptation for multiview 3D displays. MPEG started the third phase of FTV [16] in August 2013. This is MPEG-FTV, which targets immersive viewing of 3D scenes via super multiview, free navigation, and 360-degree 3D (360 3D) video. MPEG-FTV moved to the MPEG-Immersive project (MPEG-I) [17] in January 2017 and has since been in charge of its video part as MPEG-I Visual.
In this paper, the international standardization of FTV is described.

HISTORY OF FTV STANDARDIZATION IN MPEG
The MPEG has been developing FTV standards since 2001. The history of FTV standardization in MPEG is shown in Fig. 1. In 2001, FTV was proposed to the MPEG, and the 3D audio visual (3DAV) activity started. In the 3DAV activity, many topics, such as omnidirectional video, FTV, stereoscopic video, and 3DTV with depth information, were discussed. According to the results of the call for comments from the industry, discussion converged on FTV and MVC [14]. The MPEG started 3DV as the second phase of FTV in April 2007. 3DV is a standard for multiview 3D displays [15]. View generation was introduced into 3DV to increase the number of views for multiview 3D displays. The 3DV activity moved to the Joint Collaborative Team (JCT)-3V for further standardization processes in July 2012, and 3DV was completed in June 2016.
In August 2013, the MPEG started the third phase of FTV, MPEG-FTV [16], which targets immersive 3D viewing by enhancing the function of view generation. MPEG-FTV moved to MPEG-I [17] for further standardization in January 2017.

FTV FIRST PHASE: MVC STANDARD
The framework of MVC is shown in Fig. 2. MVC targets efficient coding of multiview video. In MVC, the number of input views is the same as the number of output views. The view-generation function of FTV is not included in MVC. Multiview video data are highly correlated among views. This redundancy can be removed by inter-view prediction, which uses the same kind of motion compensation that is widely used to remove temporal redundancy in conventional video coding. MVC thus applies motion-compensation-like prediction not only in the time direction but also in the view direction. MVC was standardized as an extension of H.264/MPEG-4 AVC [18]. The MVC standard was adopted by Blu-ray 3D.
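The idea of inter-view prediction can be illustrated with a small block-matching sketch. It is not the MVC algorithm itself, only a toy example under stated assumptions: pictures are plain lists of rows, the search is horizontal only, and the names `predict_block`, `crop`, and `sad` as well as the block size and search range are hypothetical choices made for this illustration.

```python
import random

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def crop(picture, y, x, size):
    """Cut a size x size block whose top-left corner is (y, x)."""
    return [row[x:x + size] for row in picture[y:y + size]]

def predict_block(target, reference, y, x, size=8, search=16):
    """Search horizontal shifts of `reference` for the best predictor of one block.

    Returns (shift, residual), with residual = target block - predicted block.
    """
    block = crop(target, y, x, size)
    width = len(reference[0])
    best = None
    for shift in range(-search, search + 1):
        xs = x + shift
        if xs < 0 or xs + size > width:
            continue  # candidate would fall outside the reference picture
        candidate = crop(reference, y, xs, size)
        cost = sad(block, candidate)
        if best is None or cost < best[0]:
            best = (cost, shift, candidate)
    _, shift, predictor = best
    residual = [[p - q for p, q in zip(rb, rp)] for rb, rp in zip(block, predictor)]
    return shift, residual

# Toy data (assumption): the neighbouring view is the base view displaced by 5 pixels.
random.seed(0)
base_view = [[random.randrange(256) for _ in range(64)] for _ in range(32)]
neighbor_view = [row[-5:] + row[:-5] for row in base_view]

shift, residual = predict_block(neighbor_view, base_view, y=8, x=24)
energy = sum(abs(v) for row in residual for v in row)
print("shift:", shift, "residual energy:", energy)
```

In this toy case the neighbouring view is an exact displaced copy, so the residual vanishes and only the shift would need to be coded. Passing a previous frame of the same camera as `reference` turns the same routine into a (one-dimensional) temporal prediction, which is the sense in which prediction is applied in both the time and view directions.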

FTV SECOND PHASE: 3DV STANDARDS
The framework of 3DV is shown in Fig. 3. View synthesis was introduced into 3DV, which sends a small number of views and generates a large number of views at the receiver for multiview displays. A multiview and multi-depth set is jointly compressed and sent to the receiver, and intermediate views are synthesized from the decoded views with the assistance of depth information at the receiver. 3DV enables display adaptation and viewing adaptation [19]. The FTV reference model, as shown in Fig. 4, was defined to develop the 3DV standard [20], and 3D warping is used for view synthesis in 3DV. View synthesis by 3D warping is sensitive to errors in depth information. Nagoya University provided Depth Estimation Reference Software (DERS) [21] and View Synthesis Reference Software (VSRS) [22], as shown in Fig. 4. It also provided various test sequences such as pantomime, champagne_tower, kendo, and balloons. The data format of 3DV is Multiview plus Depth (MVD). Coding standards such as MVC+D, 3D-AVC, MV-HEVC, and 3D-HEVC were developed [23]. Here, MVC+D is a depth extension of MVC, 3D-AVC is AVC-based MVD joint coding, MV-HEVC is HEVC-based MVC, and 3D-HEVC is HEVC-based MVD joint coding.
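The 3D warping step mentioned above, and its sensitivity to depth errors, can be sketched for a single pixel as follows. This is a minimal illustration and not the VSRS implementation: it assumes an ideal pinhole camera model shared by both cameras, and the function name `warp_pixel`, the intrinsic values, and the 10 cm baseline are hypothetical numbers chosen for the example.

```python
def warp_pixel(u, v, depth, K, R, t):
    """Map pixel (u, v) with known depth from a reference camera to a virtual one.

    K = (f, cx, cy) is a shared pinhole model (focal length and principal point,
    in pixels). R (3x3) and t (3-vector) take reference-camera coordinates to
    virtual-camera coordinates.
    """
    f, cx, cy = K
    # 1) Back-project the pixel to a 3D point in the reference camera frame.
    p = ((u - cx) * depth / f, (v - cy) * depth / f, depth)
    # 2) Change coordinates to the virtual camera frame: q = R p + t.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    # 3) Project the 3D point onto the virtual image plane.
    return f * q[0] / q[2] + cx, f * q[1] / q[2] + cy

# Assumed setup: parallel cameras, virtual camera 10 cm to the right.
K = (1000.0, 640.0, 360.0)
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = (-0.10, 0.0, 0.0)

true_u, true_v = warp_pixel(700, 400, 2.0, K, R, t)    # correct depth: 2.0 m
wrong_u, wrong_v = warp_pixel(700, 400, 2.2, K, R, t)  # depth overestimated by 10 %
print("correct depth ->", true_u, " wrong depth ->", wrong_u)
print("horizontal misplacement in pixels:", wrong_u - true_u)
```

With these assumed numbers, a 10 % depth error moves the warped pixel by several pixels in the synthesized view, which illustrates why the quality of the estimated depth matters so much for view synthesis.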
Global View and Depth (GVD) [24] can be used as an alternative data format. GVD is a compact version of MVD and is obtained by removing the inter-view redundancy of MVD.

FTV THIRD PHASE: MPEG-FTV
Motivation and Background
In 2010, the 2022 FIFA World Cup Japan Bid Committee planned to deliver the excitement of the soccer stadium to the world via FTV. It aimed to revolutionize the viewing of the soccer game by super multiview and free navigation. Super multiview realizes very realistic 3D viewing of the scene, and free navigation realizes a walk-through or fly-through experience of the scene. This became a strong motivation for the third phase of FTV.

Framework of FTV
Based on the above motivations and background, the framework of MPEG-FTV was created, as shown in Fig. 5 [25]. FTV has three types of application scenarios [26]. The first is super multiview (SMV), with a large number of densely spaced views for super multiview displays. The second is a single view with a freely changing viewpoint for free navigation (FN) in a wide area. Users can enjoy realistic 3D viewing and walk-through/fly-through experiences in 3D scenes. The third is 360 3D video with a wide field of view (FoV).

Call for Evidence
After a series of exploration experiments on FTV [27], the MPEG issued a Call for Evidence (CfE) on FTV [28] in June 2015. A CfE is a procedure that precedes a call for proposals and gathers evidence that a new technology performs better than currently available standards. The FTV software used for the CfE is described in [29]. Submissions were collected for the SMV and FN application scenarios, as shown in Figs. 6 and 7, respectively, in February 2016. Results evaluated in June 2016 showed clear evidence of the new technology [30].
FTV test material and software developed in MPEG-FTV are summarized in [31] and [32], respectively.

MPEG-I
MPEG-FTV moved to MPEG-I in January 2017. MPEG-I was established by integrating the FTV, light field, point cloud, and 360 video ad hoc groups. MPEG-I will create standards for immersive services. MPEG-FTV is in charge of the video aspects of MPEG-I. All application scenarios, requirements, test material, and software for MPEG-FTV were transferred to MPEG-I.
MPEG-I will use various technologies, such as FTV, light field, point cloud, 360 video, and 3D audio, to build immersive services. Therefore, the MPEG has structured MPEG-I as a suite of standards focusing on specific technologies. The five parts of MPEG-I are as follows [33]:
• Part 1 - Technical Report on Immersive Media
• Part 2 - Application Format for Omnidirectional Media
• Part 3 - Immersive Video
• Part 4 - Immersive Audio
• Part 5 - Point Cloud Compression
MPEG-I standards will be developed according to the stages of immersion shown in Fig. 8 [33]. The stages of immersion are categorized by degrees of freedom (DoF), which denotes the number of independent parameters used to define movement of a viewpoint in 3D space.
Fig. 8 Stages of Immersion in MPEG-I [33]: 3DoF, 3DoF+, Windowed 6DoF, Omnidirectional 6DoF, and 6DoF.
For example, 3DoF denotes three unconstrained rotational movements around the X, Y, and Z axes. 3DoF has a fixed viewpoint and no translational movements along the X, Y, and Z axes. A typical use case is a user sitting in a chair looking at 3D 360 VR content on an HMD, as shown at the far left of Fig. 8. On the other hand, 6DoF is 3DoF with full translational movements along the X, Y, and Z axes. A typical use case is a user freely walking through 3D 360 VR content displayed on an HMD, as shown at the far right of Fig. 8. 3DoF+, windowed 6DoF, and omnidirectional 6DoF are stages in between. 3DoF+ is 3DoF with additional limited translational movements along the X, Y, and Z axes. Windowed 6DoF denotes 6DoF with constrained rotational movements around the X and Y axes (pitch and yaw, respectively) and constrained translational movements along the Z axis. Omnidirectional 6DoF denotes 6DoF with constrained translational movements along the X, Y, and Z axes.
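The stages described above can be summarized as a small lookup table. The encoding below is only one possible reading of the paragraph, not a definition taken from the MPEG-I documents: the names `Motion`, `STAGES`, and `count` are hypothetical, and any axis that the text does not mention for a "6DoF with ..." stage is assumed to stay unconstrained.

```python
from enum import Enum

class Motion(Enum):
    NONE = "none"                # movement not available
    CONSTRAINED = "constrained"  # movement available within limits
    FREE = "free"                # movement unconstrained

N, C, F = Motion.NONE, Motion.CONSTRAINED, Motion.FREE

# For each stage: rotation about (X, Y, Z) and translation along (X, Y, Z).
STAGES = {
    "3DoF":                 {"rotation": (F, F, F), "translation": (N, N, N)},
    "3DoF+":                {"rotation": (F, F, F), "translation": (C, C, C)},
    # Assumption: roll (Z rotation) and X/Y translation stay free, since the
    # text only lists the constrained axes for this stage.
    "Windowed 6DoF":        {"rotation": (C, C, F), "translation": (F, F, C)},
    "Omnidirectional 6DoF": {"rotation": (F, F, F), "translation": (C, C, C)},
    "6DoF":                 {"rotation": (F, F, F), "translation": (F, F, F)},
}

def count(stage, motion):
    """Number of the six movement parameters of `stage` that have status `motion`."""
    spec = STAGES[stage]
    return sum(m is motion for m in spec["rotation"] + spec["translation"])

for name in STAGES:
    print(f"{name:22s} free={count(name, F)} "
          f"constrained={count(name, C)} none={count(name, N)}")
```

A table of this kind records only which parameters are constrained, not by how much, so 3DoF+ and omnidirectional 6DoF appear identical here even though they are distinct stages in Fig. 8.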

CONCLUSION
MPEG has been creating various standards on FTV. In the first phase of FTV, MPEG developed the MVC standard. In the second phase of FTV, MPEG developed the MVC+D, 3D-AVC, MV-HEVC and 3D-HEVC standards. The current third phase of FTV is MPEG-FTV. MPEG-FTV targets revolutionized viewing of 3D scenes via super multiview, free navigation, and 360-degree 3D technologies. MPEG-FTV developed test material, reference software, and evaluation methods for them. After the success of the exploration experiments and Call for Evidence, MPEG-FTV moved to MPEG-I and has been in charge of its video part. MPEG-I will create standards for immersive services based on the stages of immersion.