AI selects songs that “feel right” for the video

Selecting background music that suits the atmosphere of a video and expresses the emotions of its scenes requires a broad knowledge of music. Picking the one song with the ideal atmosphere from countless available songs is also tedious. Qosmo's “Video2Music” is an AI tool that assists with this time-consuming work. Simply upload the video you want to add music to, and you instantly receive suitable candidate tracks.


Select multiple songs that “feel right” for the video

Instantly select multiple songs from the target music library that “feel right” for the given video. There are no restrictions on the genre of the target music.

We used Video2Music to select music for the silent film “Metropolis.”

Music→Video search is also available

In addition to the “video→music” search, “music→video” search is also available. In other words, you can also choose videos that perfectly match specific music.



Suggest similar songs

Combined with Qosmo’s other search algorithms, Video2Music lets you build a wide range of flexible, easy-to-use search services, including the ability to suggest similar songs.

Use Case

Selection of background music for video content creation in video production companies and TV broadcasters

Shooting and editing are done, but you can't seem to find the right background music, or you always end up choosing the same kind of music. We believe many video creators face this challenge, and Video2Music can assist them in selecting BGM. Usage is simple: just upload the video you've created, and it suggests multiple suitable tracks.

Music selection requests from customers in video production

Song selection has depended on expert librarians at record companies and elsewhere, who frequently receive requests for matching music. By using Video2Music to scan your existing music library in advance, even selectors who are not well acquainted with its contents can quickly narrow down their choices, which streamlines the selection process.

Improve purchase rates on audio and video material sales sites

Users rely on tags and search words to find what they are looking for among a large number of stock songs and videos. Video2Music directly addresses the challenge of “finding the right song for this video.”

Feature integration with movie editing software

Many movie editing software products offer a music library for users to choose from. The user experience would improve greatly if the product could recommend, at the right moment, music that fits the content in production.


You can see detailed Video2Music output on this page. For each of a number of videos, three candidate songs are listed.

Video2Music detailed output

If you want to try it with your own videos, please use the following Video2Music demo. By uploading your video, you can experience the process of finding suitable music from our music library. Or if you want to try it with your own music library, please feel free to contact us.


Video2Music uses a deep-learning architecture called the Transformer to convert video and music inputs into mutually comparable latent feature vectors. Using contrastive learning, the model quantifies how well a video and a song, two distinct forms of media, fit together. With online music videos as training data, we train the model to learn the relationship between music data and image frames.
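To make the idea of comparable latent vectors concrete, here is a minimal sketch in Python. It is not Qosmo's implementation: the Transformer encoders are not shown, the three-dimensional vectors and song names are made up, and the function names are hypothetical. It assumes that fitness is measured by cosine similarity and that training uses an InfoNCE-style contrastive loss, which is one common choice; the actual scoring function and loss used by Video2Music are not stated here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two latent vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_songs(video_vec, song_vecs, top_k=3):
    """Return the top_k (song name, score) pairs, best match first."""
    scored = [(name, cosine_similarity(video_vec, vec))
              for name, vec in song_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

def info_nce_loss(video_vecs, song_vecs, temperature=0.1):
    """Assumed contrastive objective: video i should score highest with song i.

    video_vecs[i] and song_vecs[i] come from the same music video (positive
    pair); every other song in the batch serves as a negative example.
    """
    total = 0.0
    for i, video in enumerate(video_vecs):
        logits = [cosine_similarity(video, song) / temperature
                  for song in song_vecs]
        # Numerically stable log-sum-exp over all songs in the batch.
        max_logit = max(logits)
        log_sum = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits))
        total += log_sum - logits[i]
    return total / len(video_vecs)

# Made-up embeddings standing in for the encoder outputs.
video = [0.9, 0.1, 0.0]
library = {
    "calm_piano": [0.1, 0.9, 0.2],
    "dark_synth": [0.8, 0.2, 0.1],
    "upbeat_pop": [0.0, 0.2, 0.9],
}
for name, score in rank_songs(video, library):
    print(f"{name}: {score:.3f}")
```

Because both media types live in the same vector space, the same ranking function works in the reverse direction: pass a song vector as the query and a dictionary of video vectors as the library.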

The pre-trained model provided with the product license already supports a wide range of input videos and music styles. We can also retrain the model using your own original data to improve accuracy for specific applications.

Tech Spec

Price System

Initial fee (initial library indexing, system integration, etc.)
Monthly fee (fixed rate up to a specified number of API calls)


Input: Video (30 seconds or longer)
Output: Song candidates (input and output can be reversed)

Operating Environment

Cloud: REST API
On-premise: Linux GPU environment

Processing Speed

Indexing: < 3 seconds per song
Matching: < 1 second per search
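As a rough guide to what the indexing figure means for initial setup, the sketch below computes an upper bound on indexing time for a whole library. It assumes songs are indexed one after another at the stated limit of 3 seconds each; the library size of 10,000 songs and the function name are hypothetical, and real throughput depends on the environment.

```python
def max_indexing_hours(num_songs, seconds_per_song=3.0):
    """Upper bound on sequential indexing time, in hours."""
    return num_songs * seconds_per_song / 3600.0

# Hypothetical library of 10,000 songs: 30,000 seconds at most.
print(f"{max_indexing_hours(10_000):.1f} hours")
```

Indexing is done once per song in advance, so each later search only pays the matching cost of under 1 second.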


