This specification defines WebVTT, the Web Video Text Tracks format. Its main use is for marking up external text track resources in connection with the HTML <track> element. WebVTT files provide captions or subtitles for video content, and also text video descriptions, chapters for content navigation, and more generally any form of metadata that is time-aligned with audio or video content.
Programs that can open .vtt files
| Product Name | Company | Actions | 
|---|---|---|
| Atlantis Word Processor | The Atlantis Word Processor Team | open | 
| GOM Player Plus | GOM & Company | Add to GOM Player Plus, open | 
| PotPlayer | Kakao | Add to PotPlayer playlist, open, Play with PotPlayer | 
| VisionTools Pro-e | Crestron Electronics, Inc | open | 
Metadata Tracks
Metadata tracks convey additional time-indexed information (such as base64-encoded images, JSON, plain text or any other text-based format) that the developer needs to include in the page. A web app can listen for cue events, extract the text of each cue as it fires, parse the data and then use the results to make DOM changes (or perform other JavaScript or CSS tasks) synchronised with media playback.
WEBVTT - Example metadata track containing JSON payload

multiCell
00:01:15.200 --> 00:02:18.800
{
"title": "Multi-celled organisms",
"description": "Multi-celled organisms have different types of cells that perform specialised functions. Most life that can be seen with the naked eye is multi-cellular. These organisms are thought to have evolved around 1 billion years ago with plants, animals and fungi having independent evolutionary paths.",
"src": "multiCell.jpg",
"href": "http://en.wikipedia.org/wiki/Multicellular"
}

insects
00:02:18.800 --> 00:03:01.600
{
"title": "Insects",
"description": "Insects are the most diverse group of animals on the planet with estimates for the total number of current species ranging from two million to 50 million. The first insects appeared around 400 million years ago, identifiable by a hard exoskeleton, three-part body, six legs, compound eyes and antennae.",
"src": "insects.jpg",
"href": "http://en.wikipedia.org/wiki/Insects"
}
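
The listen, extract and parse flow described above can be sketched in JavaScript roughly as follows. This is a minimal sketch, not a definitive implementation: `parseCuePayload` and `watchMetadataTrack` are hypothetical helper names, and the sketch assumes every cue holds one JSON object. A payload that is not valid JSON (for instance a string containing a raw line break) is skipped rather than treated as an error.

```javascript
// Parse one cue's text as JSON; return null instead of throwing on bad input.
function parseCuePayload(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return null;
  }
}

// Hypothetical helper: on every "cuechange" event, call onData(data, cue)
// for each cue that is active at that moment and has a parseable payload.
function watchMetadataTrack(track, onData) {
  // "hidden" keeps the track loading and firing events without rendering it.
  track.mode = "hidden";
  track.addEventListener("cuechange", () => {
    for (const cue of Array.from(track.activeCues || [])) {
      const data = parseCuePayload(cue.text);
      if (data !== null) onData(data, cue);
    }
  });
}

// Browser usage (assumes the metadata track is the video's first text track):
// const track = document.querySelector("video").textTracks[0];
// watchMetadataTrack(track, (data, cue) => console.log(cue.id, data.title));
```

With the track above, the callback would receive objects carrying `title`, `description`, `src` and `href`, which the page can then use to update the DOM.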
WEBVTT

NOTE
Thanks to http://output.jsbin.com/mugibo

1
00:00:00.100 --> 00:00:07.342
{
 "type": "WikipediaPage",
 "url": "https://en.wikipedia.org/wiki/Samurai_Pizza_Cats"
}

2
00:07.810 --> 00:09.221
{
 "type": "WikipediaPage",
 "url": "http://samuraipizzacats.wikia.com/wiki/Samurai_Pizza_Cats_Wiki"
}

3
00:11.441 --> 00:14.441
{
 "type": "LongLat",
 "lat": "36.198269",
 "long": "137.2315355"
}
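
Because each cue in the track above carries a `type` field, a page can route parsed payloads to different handlers. The sketch below shows one possible way to do that. The `handlers` table, the `interpretPayload` function and the shape of the returned objects are illustrative assumptions, not part of WebVTT or of the original example.

```javascript
// Map each "type" value used in the track above to a handler.
// The returned object shapes are illustrative only.
const handlers = {
  WikipediaPage: (p) => ({ kind: "link", url: p.url }),
  LongLat: (p) => ({
    kind: "location",
    // The example stores coordinates as strings, so convert them to numbers.
    lat: Number.parseFloat(p.lat),
    long: Number.parseFloat(p.long),
  }),
};

// Return a normalised action for a parsed payload, or null for unknown types.
function interpretPayload(payload) {
  if (payload === null || typeof payload !== "object") return null;
  // Own-property check so inherited names such as "toString" are not matched.
  if (!Object.prototype.hasOwnProperty.call(handlers, payload.type)) return null;
  return handlers[payload.type](payload);
}

// Usage with the third cue's payload:
const action = interpretPayload({
  type: "LongLat",
  lat: "36.198269",
  long: "137.2315355",
});
console.log(action.kind, action.lat, action.long);
```

Keeping the handlers in a lookup table means a new payload type only needs a new entry, and unknown types are ignored instead of raising an error during playback.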
Good References
- Technical Specs: https://www.w3.org/TR/webvtt1/
- Metadata cues can carry an image, a description, and a hyperlink (href): https://www.w3.org/wiki/VTT_Concepts
- WebVTT example in HTML5 by Ian Devlin: https://www.iandevlin.com/html5test/webvtt/html5-video-webvtt-sample.html
- Players and plugins that support WebVTT: plyr.io, playr, Flowplayer, jwplayer, MediaElement.js, LeanBack Player, SublimeVideo, Video.js, Radiant Media Player. A comparison of HTML5 video players is available at https://videosws.praegnanz.de/.