{"id":40,"date":"2025-10-14T11:29:39","date_gmt":"2025-10-14T09:29:39","guid":{"rendered":"https:\/\/rocketzki.com\/?p=40"},"modified":"2025-10-14T11:29:39","modified_gmt":"2025-10-14T09:29:39","slug":"how-to-modify-a-melody-using-python-part-i","status":"publish","type":"post","link":"https:\/\/rocketzki.com\/?p=40","title":{"rendered":"How to modify a melody using Python? Part I"},"content":{"rendered":"\n<p><strong>Starting with this article<\/strong>, I am introducing a series on an original (at least for me and this blog) topic. It is original <strong>firstly <\/strong>because of the<strong> programming language<\/strong> that I\u2019m using to show my idea and <strong>secondly <\/strong>because of the subject \u2013 <strong>melody modification<\/strong>, or to be more precise, <strong>sound processing using Python.<\/strong><\/p>\n\n\n\n<p>I have already written a thesis on this subject in my native language (Polish), so I decided to start this series so that a worldwide audience can learn about it too.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Theory<\/h2>\n\n\n\n<p>First of all, we need to introduce a <strong>few key concepts<\/strong> so we can transform them into a working application. We need to understand two domains: <strong>sound <\/strong>and<strong> signal processing<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Sound<\/h3>\n\n\n\n<p>Every <strong>sound <\/strong>that we can hear <strong>is really a wave<\/strong> that propagates through the air. 
It is a <strong>change in air pressure over time<\/strong>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"680\" height=\"424\" src=\"https:\/\/rocketzki.com\/wp-content\/uploads\/2025\/10\/The_Elements_of_Sound_jpg.jpg\" alt=\"\" class=\"wp-image-41\" srcset=\"https:\/\/rocketzki.com\/wp-content\/uploads\/2025\/10\/The_Elements_of_Sound_jpg.jpg 680w, https:\/\/rocketzki.com\/wp-content\/uploads\/2025\/10\/The_Elements_of_Sound_jpg-300x187.jpg 300w\" sizes=\"auto, (max-width: 680px) 100vw, 680px\" \/><\/figure>\n\n\n\n<p class=\"has-small-font-size\">Fig. 1, By Rburtonresearch \u2013 Own work, CC BY-SA 4.0, https:\/\/commons.wikimedia.org\/w\/index.php?curid=45035734<\/p>\n\n\n\n<p>Those changes in pressure (amplitudes) <strong>can be measured and saved to a file on the computer<\/strong> (as a WAVE file, for instance). A WAVE file typically stores those amplitude values directly, as a sequence of samples.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Signal processing<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">FT &amp; FFT<\/h4>\n\n\n\n<p>OK, we have skimmed the topic already, but how do we <strong>find what frequency we\u2019re hearing at a particular moment in time? <\/strong>We need to use a brilliant method called <a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/en.wikipedia.org\/wiki\/Fourier_transform\"><strong>Fourier Transform<\/strong><\/a><strong> (FT) <\/strong>or, to be more precise, its <strong>computer-suited algorithm called <a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/en.wikipedia.org\/wiki\/Fast_Fourier_transform\">Fast Fourier Transform<\/a><\/strong> <strong>(FFT)<\/strong>. A naive computation of the <strong>FT <\/strong>is <strong>really inefficient <\/strong>(the work grows quadratically with the number of samples) and would take too much time to process sound data, so<strong> that\u2019s where FFT comes in<\/strong>. 
The history of <strong>FT <\/strong>and <strong>FFT <\/strong>is a great idea for a different article, though.<\/p>\n\n\n\n<p>Thanks to Wikipedia, we can visualize the idea of <strong>FT <\/strong>on charts.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/web.archive.org\/web\/20230618021605im_\/https:\/\/upload.wikimedia.org\/wikipedia\/commons\/thumb\/3\/30\/FFT_of_Cosine_Summation_Function.svg\/722px-FFT_of_Cosine_Summation_Function.svg.png\" alt=\"File:FFT of Cosine Summation Function.svg\nsource: https:\/\/en.wikipedia.org\/wiki\/File:FFT_of_Cosine_Summation_Function.svg\"\/><figcaption class=\"wp-element-caption\">Fig 2. Function and its<\/figcaption><\/figure>\n\n\n\n<p>The first chart shows the already familiar <strong>amplitude over time <\/strong>plot for the function:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/web.archive.org\/web\/20230618021605im_\/https:\/\/rocketzki.com\/wp-content\/uploads\/2021\/01\/obraz.png\" alt=\"\" class=\"wp-image-1056\"\/><\/figure>\n\n\n\n<p>The second chart presents<strong> the result of the Fourier Transform <\/strong>(mathematically speaking, FT is a<a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/mathworld.wolfram.com\/LinearOperator.html\"> linear operator<\/a>) applied to the sample input shown in the first chart. As a result, we get the <strong>frequencies that make up the input wave and their power in dB<\/strong>.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">STFT<\/h4>\n\n\n\n<p>We now have the frequencies of the entire sample, but <strong>we still don\u2019t know what frequency (note) the sound has at a certain point in time<\/strong>. 
To solve that problem, we use another brilliant method with a cool acronym:<strong> <a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/en.wikipedia.org\/wiki\/Short-time_Fourier_transform\">Short-Time Fourier Transform<\/a> (STFT)<\/strong>.<strong> STFT splits the signal into short, overlapping windows and applies the FFT to each one, which allows us to calculate the frequencies and their power in dB at a certain point in time.<\/strong><\/p>\n\n\n\n<p><strong>A note<\/strong>, for instance <strong>C7<\/strong>, <strong>is a sound wave<\/strong> with a frequency of <strong>2093.00 Hz<\/strong>. The table available at <a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/pages.mtu.edu\/~suits\/notefreqs.html\">https:\/\/pages.mtu.edu\/~suits\/notefreqs.html<\/a> lists the frequencies of most of the notes that a human ear can hear.<\/p>\n\n\n\n<p><strong>The conclusion is that<\/strong> <strong>knowing the frequency and its power at each point in time allows us to reproduce the melody of the entire sample!<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/web.archive.org\/web\/20230618021605im_\/https:\/\/rocketzki.com\/wp-content\/uploads\/2021\/01\/obraz-1.png\" alt=\"\" class=\"wp-image-1062\"\/><figcaption class=\"wp-element-caption\">Fig. 3, STFT result on human speech sample \u2013 Spectral Analysis<\/figcaption><\/figure>\n\n\n\n<p>The figure above depicts the <strong>result of STFT<\/strong> applied to <strong>a sound sample<\/strong> \u2013 a spectral analysis (it\u2019s a different sample, not the same wave as in the two previous figures). The green line, labeled <strong>F0<\/strong>, is a chart of the melody computed by the <a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/librosa.org\/doc\/main\/generated\/librosa.pyin.html\">pYIN <\/a>algorithm. 
<strong>By modifying the frequencies at certain points in time we can change the melody.<\/strong> Then we just need to apply the <strong>inverse STFT (iSTFT)<\/strong>, <strong>the reverse operation<\/strong>, to obtain the modified sound.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Fundamental frequency<\/h4>\n\n\n\n<p>One last piece of theory. What is the <strong>fundamental frequency <\/strong>labeled as <strong>F0<\/strong>?<\/p>\n\n\n\n<p>According to Wikipedia,<em> \u201cthe fundamental is the musical <a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/en.wikipedia.org\/wiki\/Pitch_(music)\">pitch<\/a> of a note that is perceived as the lowest <a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/en.wikipedia.org\/wiki\/Harmonic_series_(music)#Partial\">partial<\/a> present.\u201d<\/em> <strong>Pitch <\/strong>can be mapped to <strong>frequency <\/strong>in our terms. Looking at<em> Fig. 3<\/em>, we can see the brightest area spanning from around <em>0.06<\/em> to <em>0.36 s<\/em>, the same area that is marked with the F0 line. <strong>This is the lowest frequency of all <a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/en.wikipedia.org\/wiki\/Harmonic\">harmonics<\/a> at a certain point in time. <\/strong>All the higher harmonics (F1, F2) can also be seen in the figure as the brighter stripes (higher power in dB) at higher frequencies.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Implementation<\/h3>\n\n\n\n<p>That\u2019s a lot of theory behind this concept. Luckily, we have a wonderful Python library that does the math for us, so we can focus on our idea instead of implementing everything by hand (that would be fun too, though). <strong><a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/librosa.org\/doc\/main\/index.html\">librosa <\/a><\/strong>is a rich library for music and audio analysis. 
We just need to <a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/www.python.org\/downloads\/\">get Python<\/a>, install it, then get librosa and install it too. I am using <a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/www.jetbrains.com\/pycharm\/\">JetBrains\u2019 PyCharm<\/a>, which makes this quite easy, but you are free to choose your favorite IDE.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">PyCharm<\/h4>\n\n\n\n<p>Once you have installed <strong>PyCharm<\/strong>, go to <strong>File -&gt; Settings.<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/web.archive.org\/web\/20230618021605im_\/https:\/\/rocketzki.com\/wp-content\/uploads\/2021\/01\/obraz-2.png\" alt=\"\" class=\"wp-image-1069\"\/><\/figure>\n\n\n\n<p>Then choose <strong>Python interpreter <\/strong>and click on the <strong>\u2018plus\u2019<\/strong> sign. Type in \u2018librosa\u2019 and choose the latest version (0.8.0 at the time of writing). Click <strong>Install package<\/strong>, then <strong>OK<\/strong>, wait for it to be installed and you are good to go.<\/p>\n\n\n\n<p>Other libraries that <strong>should be <\/strong>installed to show a plot are \u2018<a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/matplotlib.org\/\">matplotlib<\/a>\u2019 and \u2018<a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/numpy.org\/\">numpy<\/a>\u2019. 
Repeat the steps above and install the newest version of each.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Testing librosa<\/h4>\n\n\n\n<p>Create a new .py file and write the code below.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import librosa\nfrom librosa import display\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nfig, ax = plt.subplots()\n\n# Load librosa's trumpet example: y holds the samples, sr the sampling rate\ny, sr = librosa.load(librosa.ex('trumpet'))\n# STFT returns complex values; keep only their magnitudes\nstft_absolute_values = np.abs(librosa.stft(y))\n# Convert magnitudes to dB (relative to the maximum) and draw the spectrogram\nimg = display.specshow(librosa.amplitude_to_db(stft_absolute_values, ref=np.max), y_axis='log', x_axis='time', ax=ax)\n\nax.set_title('Power spectrogram')\n\nfig.colorbar(img, ax=ax, format=\"%+2.0f dB\")\nplt.show()<\/code><\/pre>\n\n\n\n<p>Run the code and voila! We have quickly transformed sound waves to frequencies and created a really neat diagram using Python and librosa. Good job.<\/p>\n\n\n\n<p>This is where I stop. In the <strong>next part <\/strong>I am going to show you an actual implementation of <strong>melody modification code<\/strong> and the conversion of samples with STFT and back. I will also explain thoroughly how to read, understand and modify the results of the Fourier Transform. Hope to see you there!<\/p>\n\n\n\n<p>Should you have any questions or remarks, feel free to <a href=\"https:\/\/web.archive.org\/web\/20230618021605\/https:\/\/rocketzki.com\/index.php\/about-michal-markieta-rocketzki\/\">reach me!<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Starting with this article I am introducing a series on an original (at least for me and this blog) topic. Firstly because of the programming language that I\u2019m using to show my idea and secondly because of the subject \u2013 melody modification, or to be more precise, sound processing using Python language. 
I have written [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":42,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-40","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bez-kategorii"],"_links":{"self":[{"href":"https:\/\/rocketzki.com\/index.php?rest_route=\/wp\/v2\/posts\/40","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rocketzki.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rocketzki.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rocketzki.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rocketzki.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=40"}],"version-history":[{"count":1,"href":"https:\/\/rocketzki.com\/index.php?rest_route=\/wp\/v2\/posts\/40\/revisions"}],"predecessor-version":[{"id":43,"href":"https:\/\/rocketzki.com\/index.php?rest_route=\/wp\/v2\/posts\/40\/revisions\/43"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rocketzki.com\/index.php?rest_route=\/wp\/v2\/media\/42"}],"wp:attachment":[{"href":"https:\/\/rocketzki.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=40"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rocketzki.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=40"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rocketzki.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=40"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}