Today has been pretty productive. I’ve released a second open source project, a Python Serial Monitor, and I’ve been working on some machine vision projects that I’ve wanted to do for a while.
Back in the day I wrote a bot for a game I liked playing. It would read my computer screen and then simulate keyboard and mouse input based on what it read. It was all written in C, using Tesseract as the OCR engine. It took me a while to finish because of all the platform-specific considerations I had to handle along the way.
With Python, however, all that platform-consideration junk goes right out the window. You pretty much start from the assumption that the code you write is going to be platform agnostic. That’s what I love about Python… right up until I start trying to make binaries for the test team. Then I hate it. But for the development phase, Python is pretty sweet.
But first, the code:
```python
import cv2
import pytesseract

cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)

if vc.isOpened():  # try to get the first frame
    rval, frame = vc.read()
else:
    rval = False

while rval:
    # Blur to cut down sensor noise, then convert to grayscale
    img_blur = cv2.GaussianBlur(frame, (5, 5), 0)
    mask_gray = cv2.cvtColor(img_blur, cv2.COLOR_BGR2GRAY)
    mask_gray = cv2.normalize(src=mask_gray, dst=None, alpha=0, beta=255,
                              norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8UC1)

    # Otsu thresholding, then invert: the writing is white on black,
    # and Tesseract wants dark text on a light background
    ret3, img_thresh_Gaussian = cv2.threshold(mask_gray, 70, 255,
                                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2.bitwise_not(img_thresh_Gaussian, img_thresh_Gaussian)
    img_thresh_Gaussian2 = cv2.cvtColor(img_thresh_Gaussian, cv2.COLOR_GRAY2BGR)

    h, w, _ = img_thresh_Gaussian2.shape
    boxes = pytesseract.image_to_boxes(img_thresh_Gaussian2)
    for b in boxes.splitlines():
        # each line is "char x1 y1 x2 y2 page", origin at the bottom-left,
        # so the y values get flipped against the image height
        b = b.split(' ')
        img = cv2.rectangle(img_thresh_Gaussian2,
                            (int(b[1]), h - int(b[2])),
                            (int(b[3]), h - int(b[4])),
                            color=(0, 255, 0), thickness=2)

    cv2.imshow("preview", img)
    rval, frame = vc.read()
    key = cv2.waitKey(20)
    if key == 27:  # exit on ESC
        break

cv2.destroyWindow("preview")
```
This code will take an eight dollar endoscope, like the ones that can be seen here, start capturing images, and draw boxes around the characters it identifies. The result of all of this will look something like:
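The box-drawing step hinges on the format of `pytesseract.image_to_boxes()`: each line is "char x1 y1 x2 y2 page", with the origin at the image's bottom-left, while OpenCV draws from the top-left. That's why the y values get subtracted from the image height. A minimal sketch of that parsing, using a made-up sample string in place of real Tesseract output:

```python
# Hypothetical sample of image_to_boxes() output: "char x1 y1 x2 y2 page"
sample = "H 10 20 30 40 0\nI 35 20 45 40 0"

h = 480  # image height, needed to flip the y-axis to OpenCV's top-left origin
rects = []
for line in sample.splitlines():
    char, x1, y1, x2, y2, _page = line.split(' ')
    # flip y by subtracting from the height; these become cv2.rectangle corners
    rects.append((char, (int(x1), h - int(y1)), (int(x2), h - int(y2))))
```

Each tuple in `rects` is then ready to hand to `cv2.rectangle` as the two corner points.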
Since the writing was white ink on a black background, I had to invert the image before passing it into the OCR. I also had to make sure that the writing was perfectly vertical; otherwise, the OCR wouldn’t recognize the characters. Lighting was an issue too, as the Otsu thresholding would sometimes flicker if it wasn’t exactly right. Other than that, though, you can take a very cheap camera and start doing machine vision in pretty much no time at all.
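The inversion step matters because Tesseract is trained on dark text over light backgrounds. On an 8-bit image, `cv2.bitwise_not` is just `255 - pixel`, which a small numpy sketch (with a toy 3×3 patch standing in for the thresholded frame) makes obvious:

```python
import numpy as np

# Toy 8-bit patch: white "ink" (255) on a black background (0),
# like the thresholded camera frame before inversion
patch = np.array([[0, 255, 0],
                  [255, 255, 255],
                  [0, 255, 0]], dtype=np.uint8)

# cv2.bitwise_not on uint8 is equivalent to 255 - value:
inverted = 255 - patch  # dark ink on a white background, which Tesseract expects
```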
I think it’s probably time to get into blob detection, but that will be for another post.