Detecting Text & Getting Similarity between images

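The program flow in Edu_Siri is as follows.

Insert the target video -> detect motion and capture the screen -> compare similarity between captures -> recognize text

See the previous post for motion detection and screen capture. In that post, when the screen is captured, the user specifies a directory name, a directory with that name is created, and the captures are saved inside it. All that remains is to walk through these captures and check the similarity between each image and the next.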
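The walk over the captures can be sketched as below. This is a minimal sketch, not the code used in Edu_Siri: the helper names `natural_key` and `consecutive_pairs` and the set of image extensions are assumptions made for illustration. It sorts file names so that numbers compare numerically (frame2 before frame10), because `os.listdir` returns names in arbitrary order.

```python
import os
import re

# Assumed set of capture formats; adjust to whatever the capture step writes.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def natural_key(name):
    # "frame10.png" -> ["frame", 10, ".png"], so 2 sorts before 10.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def consecutive_pairs(directory):
    # Keep only image files, sort them naturally, and pair each with its successor.
    names = sorted(
        (n for n in os.listdir(directory)
         if os.path.splitext(n)[1].lower() in IMAGE_EXTENSIONS),
        key=natural_key,
    )
    return list(zip(names, names[1:]))
```

Each returned pair is (previous capture, next capture), which is the shape the similarity check in the next section expects.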

1. Check directory & Similarity between images

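import os

import cv2 as cv
# The original post omits its imports; compare_ss is assumed to be scikit-image's SSIM function.
try:
    from skimage.metrics import structural_similarity as compare_ss
except ImportError:  # older scikit-image releases
    from skimage.measure import compare_ssim as compare_ss


# Check every image in the directory
def file_listing(path) -> None:
    # os.listdir returns names in arbitrary order, so sort before pairing neighbours
    files = sorted(os.listdir(path))

    for i in range(1, len(files)):
        comparePic(path, files[i-1], files[i])
    print("\n+=====================================+")
    print("| Organizing class materials is Done! |")
    print("+=====================================+")


# Check the similarity between two images
def comparePic(path, image1, image2) -> None:
    image1_path = os.path.join(path, image1)
    image2_path = os.path.join(path, image2)

    pre = cv.imread(image1_path)
    post = cv.imread(image2_path)

    grayPre = cv.cvtColor(pre, cv.COLOR_BGR2GRAY)
    grayPost = cv.cvtColor(post, cv.COLOR_BGR2GRAY)

    # full=True also returns the per-pixel similarity image (diff)
    (Similarity, diff) = compare_ss(grayPre, grayPost, full=True)
    diff = (diff * 255).astype('uint8')  # scaled to 0..255 for viewing; unused below

    # Only a sufficiently different next image is sent to text recognition
    if Similarity < 0.95:
        PicToText(post)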

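Let's focus on the comparePic function. The algorithm for checking similarity between images is fairly simple. The two images are loaded into pre and post, and each is passed to cvtColor with the COLOR_BGR2GRAY option, which converts it to grayscale.

Now that both pre and post are grayscale, we pass the two images to compare_ss. Setting full=True makes the function return the full per-pixel similarity image (diff) in addition to the mean score. The mean score is the Similarity value we use.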

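( Similarity is the structural similarity index (SSIM) between the two input images. It only takes values between -1 and 1, where 1 means a perfect match and lower values mean less structural similarity. )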
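To make the range of the index concrete, here is a minimal sketch of the SSIM formula evaluated over a whole image as one window. This is a simplification and an assumption on my part: library implementations such as the one behind compare_ss evaluate the formula in small sliding windows and average the results, so the numbers will not match exactly. The function name `global_ssim` is hypothetical.

```python
def global_ssim(x, y, data_range=255.0):
    """SSIM of two equal-length sequences of gray levels, treated as one window."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Small stabilizing constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


pixels = [0, 255, 0, 255]
inverted = [255, 0, 255, 0]
same_score = global_ssim(pixels, pixels)        # identical images
inverted_score = global_ssim(pixels, inverted)  # structure flipped
```

Identical inputs give a score of 1, while an inverted pattern gives a negative score, which is why the comparison threshold sits close to 1.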

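Since the images being compared are lecture materials, even a small change in the text has to be detected so that text recognition can run on it. After repeated testing, 0.95 gave the best results, so it was chosen as the threshold.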
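The threshold rule can be sketched on its own, separate from image loading. This is a hedged illustration: `select_for_ocr` and its `include_first` option are hypothetical and not part of the original code. The loop in this post only ever sends the later image of each pair to recognition, so the very first capture is never recognized. The option shows one way to include it.

```python
def select_for_ocr(scores, threshold=0.95, include_first=True):
    """Return indices of captures to recognize.

    scores[i] is the similarity between capture i and capture i + 1.
    """
    selected = [0] if include_first else []
    for i, score in enumerate(scores):
        if score < threshold:
            # Capture i + 1 differs enough from capture i to be a new slide.
            selected.append(i + 1)
    return selected


chosen = select_for_ocr([0.99, 0.80, 0.97, 0.94])
```

With these example scores, captures 2 and 4 fall below 0.95 and are selected, along with capture 0 because `include_first` is on.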

2. Set configuration before detecting Text

pip install pillow
pip install pytesseract

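Text recognition needs some basic setup, so install the two packages above first. pytesseract was chosen because it can recognize both English and Korean. Note that pytesseract is a wrapper around the Tesseract OCR engine, so the engine itself (and the Korean language data, if Korean is needed) must be installed separately.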
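Before running recognition it can help to confirm that the Tesseract engine is actually reachable. The sketch below uses only the standard library and the `tesseract --list-langs` command. The function name `available_tesseract_languages` is hypothetical, and the output parsing assumes language names appear one per line without spaces.

```python
import shutil
import subprocess


def available_tesseract_languages():
    """Return the installed Tesseract language codes, or None if the engine is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    proc = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Some versions print the list to stderr, so look at both streams.
    lines = (proc.stdout + "\n" + proc.stderr).splitlines()
    langs = []
    for line in lines:
        entry = line.strip()
        # Skip the header line and any warnings, which contain spaces.
        if entry and " " not in entry and not entry.endswith(":"):
            langs.append(entry)
    return sorted(langs)


languages = available_tesseract_languages()
```

If the result is None, the engine is missing. If it is a list without kor, Korean recognition will fail until the Korean language data is installed.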

def PicToText(image) -> None:
    # Initialize the config parser and read the config file
    config = configparser.ConfigParser()
    config.read(os.path.dirname(os.path.realpath(__file__)) + os.sep + 'env' + os.sep + 'property.ini')

    # image is the OpenCV array passed in from comparePic, not a file path
    ocrToStr(image, 'eng')

The fourth line loads the property.ini configuration file, using os.sep so that the path separator matches the operating system.

# /env/property.ini
[Bigdata Ocr Extract]
Version= 1.0

[Path]
# Path where the extracted text file will be saved
OcrTxtPath= \\resource\\ocr_result_txt

Normally you would set the save path here, but EDU_Siri reformats the extracted text and saves it to a fixed file, so the Path section can be ignored. ( See the official configparser documentation for details on the package. )
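If the Path value is ever used, one detail of configparser is worth knowing: by default it does not strip a # comment written on the same line as a value, so the comment text becomes part of the value. The sketch below reads a sample config from a string to show the difference. `SAMPLE` and `read_ocr_path` are illustrative names, not part of Edu_Siri.

```python
import configparser

# Sample config with a comment on the same line as the value.
SAMPLE = r"""
[Bigdata Ocr Extract]
Version = 1.0

[Path]
OcrTxtPath = \\resource\\ocr_result_txt # where extracted text is saved
"""


def read_ocr_path(text, strip_inline_comments):
    # inline_comment_prefixes is None by default, which keeps trailing comments.
    prefixes = ("#",) if strip_inline_comments else None
    parser = configparser.ConfigParser(inline_comment_prefixes=prefixes)
    parser.read_string(text)
    return parser["Path"]["OcrTxtPath"]


raw_value = read_ocr_path(SAMPLE, strip_inline_comments=False)
clean_value = read_ocr_path(SAMPLE, strip_inline_comments=True)
```

Putting the comment on its own line, or passing `inline_comment_prefixes`, keeps the stored path clean.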

3. Detecting Text & Save

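import cv2 as cv
from PIL import Image
from pytesseract import image_to_string


def ocrToStr(image, lang='eng'):
    # image is an OpenCV BGR array; PIL expects RGB, so convert before wrapping it
    img = Image.fromarray(cv.cvtColor(image, cv.COLOR_BGR2RGB))
    # preserve_interword_spaces: adjust this word-spacing option while checking extraction accuracy.
    # psm (page segmentation mode): how Tesseract splits the image into text regions
    # psm modes: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage

    # extract(image, language, options)
    outText = image_to_string(img, lang=lang, config='--psm 1 -c preserve_interword_spaces=1')
    strToText(outText)


def strToText(outText):
    # Append the recognized text, followed by a separator line between captures
    with open('./course_material/ppt_content.txt', 'a', encoding='utf-8') as f:
        f.write(outText)
        f.write("==========================================================\n")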

Once the text extracted from the image is written to the file at the specified path, this feature is finished.
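The save step can be sketched with two small safeguards that the version above does not have: creating the output directory if it is missing, and ending the text with a newline so the separator always starts on its own line. The names `append_slide_text` and `read_slides` and the separator length are assumptions for illustration only.

```python
import os

# Separator line written after each capture's text; the length is arbitrary.
SEPARATOR = "=" * 58


def append_slide_text(out_path, text):
    directory = os.path.dirname(out_path)
    if directory:
        # Opening a file in append mode fails if its directory does not exist.
        os.makedirs(directory, exist_ok=True)
    with open(out_path, "a", encoding="utf-8") as f:
        f.write(text if text.endswith("\n") else text + "\n")
        f.write(SEPARATOR + "\n")


def read_slides(out_path):
    # Split the file back into one text chunk per capture.
    with open(out_path, encoding="utf-8") as f:
        chunks = f.read().split(SEPARATOR + "\n")
    return [chunk.strip() for chunk in chunks if chunk.strip()]
```

Reading the file back with `read_slides` gives one entry per recognized capture, which makes later reformatting of the lecture notes easier.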

Full Code

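import configparser
import os

import cv2 as cv
from PIL import Image
from pytesseract import image_to_string
# The original post omits its imports; compare_ss is assumed to be scikit-image's SSIM function.
try:
    from skimage.metrics import structural_similarity as compare_ss
except ImportError:  # older scikit-image releases
    from skimage.measure import compare_ssim as compare_ss


# Check the similarity between two images
def comparePic(path, image1, image2) -> None:
    image1_path = os.path.join(path, image1)
    image2_path = os.path.join(path, image2)

    pre = cv.imread(image1_path)
    post = cv.imread(image2_path)

    grayPre = cv.cvtColor(pre, cv.COLOR_BGR2GRAY)
    grayPost = cv.cvtColor(post, cv.COLOR_BGR2GRAY)

    # full=True also returns the per-pixel similarity image (diff)
    (Similarity, diff) = compare_ss(grayPre, grayPost, full=True)
    diff = (diff * 255).astype('uint8')  # scaled to 0..255 for viewing; unused below

    if Similarity < 0.95:
        PicToText(post)


# Text recognition helper 1
def ocrToStr(image, lang='eng'):
    # image is an OpenCV BGR array; PIL expects RGB, so convert before wrapping it
    img = Image.fromarray(cv.cvtColor(image, cv.COLOR_BGR2RGB))
    # preserve_interword_spaces: adjust this word-spacing option while checking extraction accuracy.
    # psm (page segmentation mode): how Tesseract splits the image into text regions
    # psm modes: https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage

    # extract(image, language, options)
    outText = image_to_string(img, lang=lang, config='--psm 1 -c preserve_interword_spaces=1')
    strToText(outText)


def strToText(outText):
    # Append the recognized text, followed by a separator line between captures
    with open('./course_material/ppt_content.txt', 'a', encoding='utf-8') as f:
        f.write(outText)
        f.write("==========================================================\n")


# Main text recognition function
def PicToText(image) -> None:
    # Initialize the config parser and read the config file
    config = configparser.ConfigParser()
    config.read(os.path.dirname(os.path.realpath(__file__)) + os.sep + 'env' + os.sep + 'property.ini')

    ocrToStr(image, 'eng')


# Check every image in the directory
def file_listing(path) -> None:
    # os.listdir returns names in arbitrary order, so sort before pairing neighbours
    files = sorted(os.listdir(path))

    for i in range(1, len(files)):
        comparePic(path, files[i-1], files[i])
    print("\n+=====================================+")
    print("| Organizing class materials is Done! |")
    print("+=====================================+")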

Attribution, NonCommercial, NoDerivatives
