
ReadEase OCR: Handwriting Recognition Mobile App 

Screenshot 2023-09-18 150144.png

Part 1: Main.py


Screenshot 2023-09-18 113307.png

These are the imports for the main file of the app. The first three imports ensure the proper working of the "analyse image" button, which calls the API (the exact code that does that is in a separate file). I had some trouble trying to write the camera code in a separate file, so I decided to put it into the main file and, voilà, it works. The imports from "Logger" onwards are purely for the camera functionality. Another issue I had was that I originally imported both PIL's Image and Kivy's Image as "Image", which caused a clash, hence the "PILImage" alias. In general, though, this part of the code is quite boring.
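
The screenshot isn't reproduced here, so as a rough guide the import list probably looks something like the sketch below. The exact modules and their order are my assumptions based on the description above, not a copy of the screenshot.

```python
# Rough sketch of Main.py's imports; the exact list is an assumption.
import os           # paths, opening output files
import subprocess   # running the OCR request code that lives in a separate file
from PIL import Image as PILImage   # renamed so it doesn't clash with Kivy's Image widget

import time                              # timestamps for captured image filenames
from kivy.logger import Logger           # camera-related imports start here
from kivy.app import App
from kivy.uix.screenmanager import ScreenManager, Screen
from kivy.uix.boxlayout import BoxLayout
from kivy.uix.button import Button
from kivy.uix.label import Label
from kivy.uix.image import Image
from kivy.uix.camera import Camera
```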


Screenshot 2023-09-18 115405.png

Here we have the layout of the "main screen", i.e. the screen that is opened when you run the app. The "go to intermediary screen" and "open notes folder" methods are defined later on in the code, after we have built the layout of the app. The code for the layout of the different screens can be put in a .kv file instead of a Python file, but for my first project I didn't want to complicate things too much, so I avoided doing this. If I were to do this project again, though, I would change this, as it allows cleaner, more efficient code.
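
A minimal sketch of what this kind of all-Python layout can look like, assuming the screens are built inside the app class's build() method. The widget names and button text here are hypothetical.

```python
# Hypothetical sketch of the main screen, built inside the App subclass's build() method.
main_screen = Screen(name="main")
main_layout = BoxLayout(orientation="vertical")

intermediary_button = Button(text="Go to intermediary screen")
intermediary_button.bind(on_press=self.go_to_intermediary_screen)  # defined later on

notes_button = Button(text="Open notes folder")
notes_button.bind(on_press=self.open_notes_folder)                 # defined later on

main_layout.add_widget(intermediary_button)
main_layout.add_widget(notes_button)
main_screen.add_widget(main_layout)
```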

Screenshot 2023-09-18 124334.png

This is the code for the camera, which will be opened when we navigate to the action screen. Again, the capture button's method is defined further down. If we want the camera to start switched off so the user can toggle it on themselves, we simply change "play=True" to "play=False".
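
In Kivy this boils down to a single widget; the resolution shown here is an assumption.

```python
# The feed starts as soon as the widget is created because play=True;
# changing it to play=False would start the camera switched off instead.
self.camera = Camera(play=True, resolution=(640, 480))
```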

Screenshot 2023-09-18 125258.png
Screenshot 2023-09-18 130056.png

This is the code for the intermediary screen. It is mostly simple and self-explanatory, but one thing is important to note: when you are adding widgets to a layout with the ".add_widget" method, you need to add them in the same order that you defined them, otherwise only one of them will show up in the app.

Here is the action screen layout. Notice how we add in the camera from the code we wrote earlier. When we take a picture with the camera on the action screen, we're automatically sent to the picture screen, which asks us whether we want to analyse it.
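
The sketch below covers both screens from the last two paragraphs; "instructions_label" and "continue_button" are hypothetical stand-ins for whatever widgets the intermediary screen actually holds.

```python
# Intermediary screen: widgets are added with .add_widget in the same order
# they were defined above.
intermediary_screen = Screen(name="intermediary")
intermediary_layout = BoxLayout(orientation="vertical")
instructions_label = Label(text="Line up your handwriting and take a picture")
continue_button = Button(text="Go to camera")
continue_button.bind(on_press=self.go_to_action_screen)
intermediary_layout.add_widget(instructions_label)   # defined first, added first
intermediary_layout.add_widget(continue_button)      # defined second, added second
intermediary_screen.add_widget(intermediary_layout)

# Action screen: reuse the camera widget built earlier and add a capture button.
action_screen = Screen(name="action")
action_layout = BoxLayout(orientation="vertical")
action_layout.add_widget(self.camera)
capture_button = Button(text="Capture", size_hint=(1, 0.2))
capture_button.bind(on_press=self.capture)            # defined further down
action_layout.add_widget(capture_button)
action_screen.add_widget(action_layout)
```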

Screenshot 2023-09-18 130918.png
Screenshot 2023-09-18 131245.png

Here we create a reference to the picture screen, which we save and use later on in the code. We then finish the ScreenManager part of the code, since the initial layout of the app is now fully built.
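
Something like the following, still inside build(). Keeping self.picture_screen (and, in this sketch, self.picture_image) around is what lets the capture method update the picture screen later; the details are assumptions.

```python
# Picture screen: save references so the capture handler can update it later.
self.picture_screen = Screen(name="picture")
picture_layout = BoxLayout(orientation="vertical")
self.picture_image = Image()                           # set after a photo is taken
analyse_button = Button(text="Analyse image", size_hint=(1, 0.2))
analyse_button.bind(on_press=self.analyse_image)
picture_layout.add_widget(self.picture_image)
picture_layout.add_widget(analyse_button)
self.picture_screen.add_widget(picture_layout)

# End of the ScreenManager section: register every screen and return the manager.
self.screen_manager = ScreenManager()
for screen in (main_screen, intermediary_screen, action_screen, self.picture_screen):
    self.screen_manager.add_widget(screen)
return self.screen_manager
```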

Screenshot 2023-09-18 133157.png
Screenshot 2023-09-18 134727.png
Screenshot 2023-09-18 141228.png
Screenshot 2023-09-18 142254.png
Screenshot 2023-09-18 143441.png

This is where it gets a bit complicated. Here the goal is to make sure the API analyses the image we have just taken with the camera. The three methods work as follows:

The first initialises the latest output index: when we eventually call "self.latest_output_index", the code runs the "find_latest_output_index" method.

The second finds the latest output index. The result of every image we analyse is placed in a folder called "Notes" and is named "output(i).txt", where i is the index of the file. This method gives us the index for the picture we have just taken.

The third method is the one that does the analysing. We have another file called "OCRRequestFile.py", which this method runs when it is called. That file calls the API and saves the result in the output file. This method uses the PILImage module to join the project directory onto the image file path, plugs that full path into OCRRequestFile, and writes the result to the latest output .txt file using the index from the previous method. It then opens the resulting file so the user can edit it! Note that the branch guarded by "if os.name == "nt":" only runs on Windows; it won't work on macOS.
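
A sketch of those three methods. Using a property for the first one, invoking OCRRequestFile.py as a subprocess, the attribute names, and building the path with os.path (where the original uses PILImage) are all assumptions.

```python
@property
def latest_output_index(self):
    # Accessing self.latest_output_index runs find_latest_output_index.
    return self.find_latest_output_index()

def find_latest_output_index(self):
    # Analysed results live in Notes/ as output(i).txt; return the highest index.
    notes_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "Notes")
    os.makedirs(notes_dir, exist_ok=True)
    indices = [int(name[len("output("):-len(").txt")])
               for name in os.listdir(notes_dir)
               if name.startswith("output(") and name.endswith(").txt")]
    return max(indices, default=0)

def analyse_image(self, *args):
    # Build the full path to the photo we just took, hand it to OCRRequestFile.py
    # (which calls the API and writes the next output file), then open the result.
    image_path = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                              self.latest_image_name)          # set by capture()
    subprocess.run(["python", "OCRRequestFile.py", image_path])
    output_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "Notes",
                               "output({}).txt".format(self.latest_output_index))
    if os.name == "nt":          # os.startfile only exists on Windows
        os.startfile(output_path)
```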

This method runs AnalyseButtonFile, which opens a file chooser so the user can analyse an image that has already been saved.
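
A short sketch; whether the file is launched as a separate process (as here) or imported and called directly is my assumption.

```python
def analyse_saved_image(self, *args):
    # Launch the file-chooser script so the user can pick an already-saved image.
    subprocess.run(["python", "AnalyseButtonFile.py"])
```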

This method defines the "capture" functionality of the camera on the action screen. It saves the image as "IMG-" followed by the time it was taken, then takes you to the picture screen and sets the image you have just taken as the one displayed there. To do this we use the reference to the picture screen that we created earlier in the code. I'm not 100% sure why this works, but it does!
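
A sketch of that capture method; the filename format and attribute names are assumptions. Updating the picture screen works here because the saved reference points at the very widget that lives on that screen, so changing its source changes what the screen displays.

```python
def capture(self, *args):
    # Save the photo as IMG-<time taken>.png, then show it on the picture screen
    # via the reference created when the layout was built.
    filename = "IMG-{}.png".format(time.strftime("%Y%m%d_%H%M%S"))
    self.camera.export_to_png(filename)
    self.latest_image_name = filename
    self.picture_image.source = filename
    self.picture_image.reload()              # refresh the widget with the new file
    self.screen_manager.current = "picture"
```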

This first method opens the notes folder. It first navigates to the folder in the project directory, and then opens it with the system's default file explorer.
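
A sketch of that method. The Windows branch uses os.startfile; any other platform needs a different launcher (for example "open" on macOS or "xdg-open" on Linux).

```python
def open_notes_folder(self, *args):
    # Find (or create) the Notes folder in the project directory, then open it
    # with the system's default file explorer.
    notes_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "Notes")
    os.makedirs(notes_dir, exist_ok=True)
    if os.name == "nt":                         # Windows
        os.startfile(notes_dir)
    else:                                       # Linux; macOS would use "open" instead
        subprocess.run(["xdg-open", notes_dir])
```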


The three methods here are simply to navigate between the screens. 


The code right at the end of this file just tells Kivy to run the app.  
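
Both of those pieces are tiny; a sketch covering the navigation methods and the final run call follows, with hypothetical method and class names.

```python
# Navigation is just a matter of changing the ScreenManager's current screen.
def go_to_intermediary_screen(self, *args):
    self.screen_manager.current = "intermediary"

def go_to_action_screen(self, *args):
    self.screen_manager.current = "action"

def go_back_to_main_screen(self, *args):
    self.screen_manager.current = "main"

# And at the very end of the file, Kivy is told to run the app.
if __name__ == "__main__":
    ReadEaseApp().run()
```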

Screenshot 2023-09-18 143906.png

Part 2: AnalyseButtonFile.py

Screenshot 2023-09-18 154701.png

This code creates the file chooser that is called when we click on the "analyse a saved image" button. 

Screenshot 2023-09-18 155037.png

This code saves the image to the project directory and then passes it into OCRRequestFile. Calling the button "save" when it really should be "analyse" is a bit of a UI mistake, but I'm just happy the functionality is there.


Overall, running this file opens a file chooser: when you click on a picture file, it gives you the option to analyse the image, and this image is then saved in the notes folder.
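
A hypothetical sketch of what AnalyseButtonFile.py could look like as a small Kivy app of its own: a file chooser plus the "Save" button, which copies the chosen image into the project directory and hands it to OCRRequestFile.py. The structure, names, and invocation mechanism are assumptions.

```python
import os
import shutil
import subprocess

from kivy.app import App
from kivy.uix.boxlayout import BoxLayout
from kivy.uix.button import Button
from kivy.uix.filechooser import FileChooserIconView


class AnalyseButtonApp(App):
    def build(self):
        layout = BoxLayout(orientation="vertical")
        self.chooser = FileChooserIconView(filters=["*.png", "*.jpg", "*.jpeg"])
        save_button = Button(text="Save", size_hint=(1, 0.15))   # really "analyse"
        save_button.bind(on_press=self.save_and_analyse)
        layout.add_widget(self.chooser)
        layout.add_widget(save_button)
        return layout

    def save_and_analyse(self, *args):
        # chooser.selection holds the selected paths; copy the first one into the
        # project directory and pass it to the OCR request code.
        if self.chooser.selection:
            chosen = self.chooser.selection[0]
            local_copy = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                                      os.path.basename(chosen))
            shutil.copy(chosen, local_copy)
            subprocess.run(["python", "OCRRequestFile.py", local_copy])


if __name__ == "__main__":
    AnalyseButtonApp().run()
```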

Part 3: OCRRequestFile.py

Screenshot 2023-09-18 155820.png

Here we have the code that calls the Vision API. We first have to give it our credentials (which I have left off this site for obvious reasons); then we call the API inside a function we've defined and return the result.
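
The credentials and the exact call aren't shown, but a Vision API handwriting request typically looks like the sketch below. The function name, the credentials file path, and the choice of document_text_detection are my assumptions.

```python
from google.cloud import vision

# Credentials: a service-account JSON file (kept out of the repo and off this site).
client = vision.ImageAnnotatorClient.from_service_account_json("credentials.json")

def detect_handwriting(image_path):
    # Send the raw image bytes to the Vision API and return the recognised text.
    with open(image_path, "rb") as image_file:
        content = image_file.read()
    image = vision.Image(content=content)
    response = client.document_text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text
```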


This code allows us to save as many output files as we want into the notes folder, which we create if it doesn't already exist.

Screenshot 2023-09-18 160222.png

This code adds 1 to the index of the output files we save in the notes folder every time we get a new one.
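
A sketch of that bookkeeping, covering both this paragraph and the previous one; the counting scheme is an assumption.

```python
import os

# Create the Notes folder if it doesn't already exist.
notes_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "Notes")
os.makedirs(notes_dir, exist_ok=True)

# Count the existing output files and add 1 to get the index for the new one.
existing = [name for name in os.listdir(notes_dir)
            if name.startswith("output(") and name.endswith(").txt")]
output_path = os.path.join(notes_dir, "output({}).txt".format(len(existing) + 1))
```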

 

 

This code opens the saved image, calls the function that analyses it, and then writes the result of that function into the output file.
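
Putting the last pieces together, roughly: detect_handwriting and output_path come from the sketches above, and the way the image path reaches this script (here, a command-line argument) is an assumption.

```python
import sys

image_path = sys.argv[1]          # hypothetical: path passed in by Main.py

# Analyse the saved image with the Vision call, then write the text it found
# into the freshly numbered output file.
text = detect_handwriting(image_path)
with open(output_path, "w", encoding="utf-8") as output_file:
    output_file.write(text)
```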
