[OpenCV/Python] Sheet Music Recognition (Digital Sheet Music Recognition) - 7

7. Recognition Process - Notes (Heads)

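Notes are the hardest component of a score to recognize.

A note consists of a head, a stem, a flag, and a dot, so each part has to be recognized first and the parts then combined to classify which note it is.

Of course, there are other approaches: turning note shapes into templates and recognizing them with template matching, or collecting many note images and training a deep learning model on them.

In this post we will implement an algorithm that recognizes and classifies notes using their structural characteristics.

Before searching for the individual parts of a note, we first work out the minimum conditions an object must meet to be a note and use them as a first-pass filter.

Before that, let's define one function in functions.py.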

# functions.py
import cv2
import numpy as np

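def count_rect_pixels(image, rect):
    """Count white (255) pixels inside rect = (x, y, w, h) of a binary image."""
    x, y, w, h = rect
    # Clip to the image so a search area that runs past an edge cannot raise
    # an IndexError or wrap around through negative indices.
    top, bottom = max(y, 0), min(y + h, image.shape[0])
    left, right = max(x, 0), min(x + w, image.shape[1])
    if top >= bottom or left >= right:
        return 0
    return int(np.count_nonzero(image[top:bottom, left:right] == 255))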

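This function counts how many white pixels lie inside the rectangular region being searched.

The area value from stats could be used instead, but it was computed on closing_image, so it can be inaccurate.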
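To make the difference concrete, here is a minimal sketch on a tiny hand-made image. It uses plain nested lists so it runs without OpenCV. The names original, closed, and count_white are hypothetical, and closed is written by hand to mimic what a morphological closing might produce for a hollow head, so it is not the real closing_image from this series.

```python
# Hypothetical 5x5 binary images (0 = background, 255 = foreground).
# `original` holds a hollow note head; `closed` is hand-written to mimic
# what a morphological closing of `original` might look like.
original = [
    [0,   0,   0,   0, 0],
    [0, 255, 255, 255, 0],
    [0, 255,   0, 255, 0],
    [0, 255, 255, 255, 0],
    [0,   0,   0,   0, 0],
]
closed = [row[:] for row in original]
closed[2][2] = 255  # the closing fills the hole inside the head


def count_white(image, rect):
    """Count pixels equal to 255 inside rect = (x, y, w, h)."""
    x, y, w, h = rect
    return sum(
        1
        for row in range(y, y + h)
        for col in range(x, x + w)
        if image[row][col] == 255
    )


rect = (1, 1, 3, 3)
area_from_closed = count_white(closed, rect)      # what an area measured on the closed image would report
pixels_in_original = count_white(original, rect)  # what is really drawn in the score
print(area_from_closed, pixels_in_original)  # 9 8
```

The gap is only one pixel here, but on a real hollow head the filled hole can account for a large share of the area, which is why counting on the unclosed image is the safer measurement.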

Next, let's set up a note recognition function in recognition_modules.py.

# recognition_modules.py
import functions as fs

def recognize_note(image, staff, stats, stems, direction):
    x, y, w, h, area = stats
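    if len(stems):
        # Debug output for objects that have a stem: width, height, and pixel
        # count, printed below the object. Count first so drawn text is never counted.
        pixels = fs.count_rect_pixels(image, (x, y, w, h))
        fs.put_text(image, w, (x, y + h + fs.weighted(30)))
        fs.put_text(image, h, (x, y + h + fs.weighted(60)))
        fs.put_text(image, pixels, (x, y + h + fs.weighted(90)))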

Whole notes are excluded; we only look at notes that have a stem.

# modules.py
import cv2
import numpy as np
import functions as fs
import recognition_modules as rs

def recognition(image, staves, objects):
    key = 0
    time_signature = False
    beats = []  # list of beats (note durations)
    pitches = []  # list of pitch names

    for i in range(1, len(objects)):
        obj = objects[i]
        line = obj[0]
        stats = obj[1]
        stems = obj[2]
        direction = obj[3]
        (x, y, w, h, area) = stats
        staff = staves[line * 5: (line + 1) * 5]
        if not time_signature:  # key signature not fully scanned yet (time signature not found so far)
            ts, temp_key = rs.recognize_key(image, staff, stats)
            time_signature = ts
            key += temp_key
        else:  # key signature fully scanned
            rs.recognize_note(image, staff, stats, stems, direction)

        cv2.rectangle(image, (x, y, w, h), (255, 0, 0), 1)
        fs.put_text(image, i, (x, y - fs.weighted(20)))

    return image, key, beats, pitches

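From top to bottom, the numbers are the width, the height, and the pixel count.

Every note is at least 10 pixels wide and 35 pixels tall and contains at least 120 pixels.

This score has no half notes, which have a hollow head, so I checked half notes separately: the width and height are similar, but the pixel count can be somewhat lower.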

# recognition_modules.py
import functions as fs

def recognize_note(image, staff, stats, stems, direction):
    (x, y, w, h, area) = stats
    notes = []
    pitches = []
    note_condition = (
        len(stems) and
        w >= fs.weighted(10) and  # width condition
        h >= fs.weighted(35) and  # height condition
        area >= fs.weighted(95)  # pixel count condition
    )
    if note_condition:
        for i in range(len(stems)):
            stem = stems[i]
            recognize_note_head(image, stem, direction)

    pass

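For objects that pass these minimum conditions, we will now implement the algorithm that searches for note heads.

The loop for i in range(len(stems)) is there because a single object can contain several notes, like object 15 in the score above, so we search once per straight-line component (stem).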

# recognition_modules.py
import functions as fs
import cv2

def recognize_note_head(image, stem, direction):
    (x, y, w, h) = stem
    if direction:  # upright (stem-up) note: head at the bottom-left of the stem
        area_top = y + h - fs.weighted(7)  # top of the head search area
        area_bot = y + h + fs.weighted(7)  # bottom of the head search area
        area_left = x - fs.weighted(14)  # left edge of the head search area
        area_right = x  # right edge of the head search area
    else:  # inverted (stem-down) note: head at the top-right of the stem
        area_top = y - fs.weighted(7)  # top of the head search area
        area_bot = y + fs.weighted(7)  # bottom of the head search area
        area_left = x + w  # left edge of the head search area
        area_right = x + w + fs.weighted(14)  # right edge of the head search area

    cv2.rectangle(image, (area_left, area_top, area_right - area_left, area_bot - area_top), (255, 0, 0), 1)

    pass

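Because the head can only be in a fixed place relative to the stem, we can specify the search area up front.

For an upright note, the head sits at the bottom-left of the stem.

For an inverted note, the head sits at the top-right of the stem.

Drawing the search areas on the image gives the result shown above.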
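As a worked example of the geometry, the sketch below computes the same search box for one made-up stem in both directions. The function name head_search_box, the stem_up flag (playing the role of direction), and the scale parameter (a stand-in for fs.weighted(), which is defined elsewhere in this series) are all assumptions for illustration.

```python
def head_search_box(stem, stem_up, scale=1.0):
    """Return (left, top, right, bottom) of the area where a head is expected.

    stem is (x, y, w, h). `scale` is a hypothetical stand-in for fs.weighted().
    """
    x, y, w, h = stem
    half_height = round(7 * scale)   # the box extends 7 px above and below the stem end
    width = round(14 * scale)        # the box is 14 px wide
    if stem_up:
        # Upright note: head at the bottom-left of the stem.
        top, bottom = y + h - half_height, y + h + half_height
        left, right = x - width, x
    else:
        # Inverted note: head at the top-right of the stem.
        top, bottom = y - half_height, y + half_height
        left, right = x + w, x + w + width
    return left, top, right, bottom


stem = (100, 50, 2, 40)  # made-up stem: x=100, y=50, 2 px wide, 40 px tall
print(head_search_box(stem, stem_up=True))   # (86, 83, 100, 97)
print(head_search_box(stem, stem_up=False))  # (102, 43, 116, 57)
```

In both cases the box is 14 pixels wide and 14 pixels tall and is centered vertically on the end of the stem where the head attaches.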

# recognition_modules.py
import functions as fs
import cv2

def recognize_note_head(image, stem, direction):
    (x, y, w, h) = stem
    if direction:  # upright (stem-up) note: head at the bottom-left of the stem
        area_top = y + h - fs.weighted(7)  # top of the head search area
        area_bot = y + h + fs.weighted(7)  # bottom of the head search area
        area_left = x - fs.weighted(14)  # left edge of the head search area
        area_right = x  # right edge of the head search area
    else:  # inverted (stem-down) note: head at the top-right of the stem
        area_top = y - fs.weighted(7)  # top of the head search area
        area_bot = y + fs.weighted(7)  # bottom of the head search area
        area_left = x + w  # left edge of the head search area
        area_right = x + w + fs.weighted(14)  # right edge of the head search area

    cnt = 0  # number of rows that contain an unbroken horizontal line
    cnt_max = 0  # length of the longest horizontal line
    pixel_cnt = fs.count_rect_pixels(image, (area_left, area_top, area_right - area_left, area_bot - area_top))

    for row in range(area_top, area_bot):
        end, pixels = fs.get_line(image, fs.HORIZONTAL, row, area_left, area_right, 5)
        if pixels:
            cnt += 1
            cnt_max = max(cnt_max, pixels)

    # Draw the debug outline and values only after measuring, so the outline
    # itself is not counted as head pixels.
    cv2.rectangle(image, (area_left, area_top, area_right - area_left, area_bot - area_top), (255, 0, 0), 1)
    fs.put_text(image, cnt, (x - fs.weighted(10), y + h + fs.weighted(30)))
    fs.put_text(image, cnt_max, (x - fs.weighted(10), y + h + fs.weighted(60)))
    fs.put_text(image, pixel_cnt, (x - fs.weighted(10), y + h + fs.weighted(90)))

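We will use three features to find and classify note heads.

They are the number of unbroken horizontal lines, the length of the longest horizontal line, and the number of pixels in the head area. Let's print all three on the image.

For notes with a filled head, there are about 9 to 10 horizontal lines, the longest line averages about 10 pixels, and the pixel count appears to be at least 80.

Notes with a hollow head returned lower values.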
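To see why these three features separate filled heads from hollow ones, here is a minimal sketch on two tiny hand-made heads. It does not use fs.get_line, whose implementation is not shown in this post; longest_run, head_features, and the min_run parameter are hypothetical helpers written only for this illustration.

```python
W = 255  # foreground value


def longest_run(values, on=W):
    """Length of the longest run of consecutive `on` values in a sequence."""
    best = current = 0
    for v in values:
        current = current + 1 if v == on else 0
        best = max(best, current)
    return best


def head_features(image, box, min_run=1):
    """Return (line_count, longest, pixel_count) for box = (left, top, right, bottom).

    A row counts as a "line" when its longest unbroken run is at least min_run.
    """
    left, top, right, bottom = box
    line_count = longest = pixel_count = 0
    for row in range(top, bottom):
        segment = image[row][left:right]
        pixel_count += sum(1 for v in segment if v == W)
        run = longest_run(segment)
        if run >= min_run:
            line_count += 1
            longest = max(longest, run)
    return line_count, longest, pixel_count


filled = [
    [0, W, W, W, 0],
    [W, W, W, W, W],
    [W, W, W, W, W],
    [0, W, W, W, 0],
]
hollow = [
    [0, W, W, W, 0],
    [W, 0, 0, 0, W],
    [W, 0, 0, 0, W],
    [0, W, W, W, 0],
]
box = (0, 0, 5, 4)
print(head_features(filled, box, min_run=3))  # (4, 5, 16)
print(head_features(hollow, box, min_run=3))  # (2, 3, 10)
```

The hollow head loses on all three numbers: its middle rows are broken by the hole, so fewer rows qualify as lines, the longest line is shorter, and fewer pixels are set.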

# recognition_modules.py
import functions as fs
import cv2

def recognize_note_head(image, stem, direction):
    (x, y, w, h) = stem
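    if direction:  # upright (stem-up) note: head at the bottom-left of the stem
        area_top = y + h - fs.weighted(7)  # top of the head search area
        area_bot = y + h + fs.weighted(7)  # bottom of the head search area
        area_left = x - fs.weighted(14)  # left edge of the head search area
        area_right = x  # right edge of the head search area
    else:  # inverted (stem-down) note: head at the top-right of the stem
        area_top = y - fs.weighted(7)  # top of the head search area
        area_bot = y + fs.weighted(7)  # bottom of the head search area
        area_left = x + w  # left edge of the head search area
        area_right = x + w + fs.weighted(14)  # right edge of the head search area

    cnt = 0  # number of rows that contain an unbroken horizontal line
    cnt_max = 0  # length of the longest horizontal line
    head_center = 0  # running sum of the rows of the counted lines (averaged below)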
    pixel_cnt = fs.count_rect_pixels(image, (area_left, area_top, area_right - area_left, area_bot - area_top))

    for row in range(area_top, area_bot):
        col, pixels = fs.get_line(image, fs.HORIZONTAL, row, area_left, area_right, 5)
        pixels += 1
        if pixels >= fs.weighted(5):
            cnt += 1
            cnt_max = max(cnt_max, pixels)
            head_center += row

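    head_exist = (cnt >= 3 and pixel_cnt >= 50)
    head_fill = (cnt >= 8 and cnt_max >= 9 and pixel_cnt >= 80)
    # Mean row of the counted lines; guard against division by zero when no line was found.
    head_center = head_center / cnt if cnt else 0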

    return head_exist, head_fill, head_center

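A head is considered to exist if cnt is at least 3 and pixel_cnt is at least 50, and it is considered filled if cnt is at least 8, cnt_max is at least 9, and pixel_cnt is at least 80.

A new variable, head_center, has also been added: it is the y coordinate of the center of the head, which will be needed later for pitch recognition.

We will compare this value with the staff line coordinates to find the pitch.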
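As a rough preview of that comparison, the sketch below turns head_center into a position on the staff. This is only one possible approach under stated assumptions, not necessarily how the pitch step of this series does it: staff is assumed to be a list of the five line y coordinates ordered top to bottom, and staff_position is a hypothetical name.

```python
def staff_position(head_center, staff):
    """Return the head position in half-line steps above the bottom staff line.

    0 = bottom line, 1 = first space, 2 = second line, and so on.
    Negative values are below the staff. `staff` is assumed to hold the
    y coordinates of the five lines, top to bottom.
    """
    top_line, bottom_line = staff[0], staff[-1]
    # Distance between a line and the neighboring space (half of the line spacing).
    half_step = (bottom_line - top_line) / (len(staff) - 1) / 2
    return round((bottom_line - head_center) / half_step)


staff = [100, 110, 120, 130, 140]   # made-up staff with 10 px line spacing
print(staff_position(140, staff))   # 0  (on the bottom line)
print(staff_position(125, staff))   # 3  (second space from the bottom)
print(staff_position(99.6, staff))  # 8  (top line, with a little measurement noise)
```

Rounding to the nearest half step absorbs small errors in head_center, which matters because the value is an average of row indices and is rarely an exact line coordinate.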

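If a better feature turns up later, only the features being searched need to change.

The thresholds can likewise be adjusted freely after a few rounds of testing, so there is no need to worry much about them.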
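One way to keep that tuning painless is to gather the thresholds in one place. The sketch below is an assumption about how this could be organized, not the code used in this series: HeadThresholds and classify_head are hypothetical names, and the default values are the ones chosen above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadThresholds:
    """Hypothetical container for the head thresholds chosen in this post."""
    exist_lines: int = 3     # minimum cnt for a head to exist
    exist_pixels: int = 50   # minimum pixel_cnt for a head to exist
    fill_lines: int = 8      # minimum cnt for a filled head
    fill_longest: int = 9    # minimum cnt_max for a filled head
    fill_pixels: int = 80    # minimum pixel_cnt for a filled head


def classify_head(line_count, longest, pixel_count, t=HeadThresholds()):
    """Return (head_exist, head_fill) for the three measured features."""
    head_exist = line_count >= t.exist_lines and pixel_count >= t.exist_pixels
    head_fill = (
        line_count >= t.fill_lines
        and longest >= t.fill_longest
        and pixel_count >= t.fill_pixels
    )
    return head_exist, head_fill


print(classify_head(10, 10, 95))  # (True, True)   filled head
print(classify_head(6, 5, 60))    # (True, False)  hollow head
print(classify_head(2, 4, 20))    # (False, False) no head

# Retuning after a test run only means passing different thresholds.
strict = HeadThresholds(exist_pixels=70)
print(classify_head(6, 5, 60, strict))  # (False, False)
```

The feature values in the calls are made up for illustration; real values would come from the measurements printed on the image.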

# recognition_modules.py
import functions as fs
import cv2

def recognize_note(image, staff, stats, stems, direction):
    x, y, w, h, area = stats
    notes = []
    pitches = []
    note_condition = (
        len(stems) and
        w >= fs.weighted(10) and  # width condition
        h >= fs.weighted(35) and  # height condition
        area >= fs.weighted(95)  # pixel count condition
    )
    if note_condition:
        for i in range(len(stems)):
            stem = stems[i]
            head_exist, head_fill, head_center = recognize_note_head(image, stem, direction)
            fs.put_text(image, head_exist, (x - fs.weighted(10), y + h + fs.weighted(20)))
            fs.put_text(image, head_fill, (x - fs.weighted(10), y + h + fs.weighted(50)))

    pass

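Let's print on the image whether a head exists and, if so, whether it is filled.

We can see that it works correctly.

In the current results, it looks like the final barline of the score could be recognized as a note.

We could search above the head region and classify the object as not a note when there are many pixels there, but for simplicity let's change for i in range(1, len(objects)): in the recognition function in modules.py to for i in range(1, len(objects) - 1):.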


골방의 프로그래머
