CS50X - Final Project - Spying Eye

Web Application Description and Thoughts

Project Github

Web Application

Demonstration

About the project

The project was made for CS50x - Introduction to Computer Science. I decided to create an object detector application because I was inspired by the machine learning seminar in the final week's content. I built it as a web application to make it widely available: it can run on any device with a working browser and a camera.

The application detects objects and entities within the camera's view, and can be used for security or to assist people with disabilities.

Development

Part 1: Initial idea

Since I had gotten pretty comfortable with Python's Flask, I decided to use it for the server-side program. On the server side, I wanted to handle the database, authorization, and analysis of the video from the client's camera.

Initial version

The video analysis part was the hardest one, and the first I tried to solve as soon as I started the final project. After reading more into WebRTC, I managed to send offers to the server and send answers back, but I had a hard time sending the camera feed to the server. After struggling with it, I decided to look around the web for a better solution.

I found a cool library named aiortc, a library for Web Real-Time Communication (WebRTC) and Object Real-Time Communication (ORTC) in Python. Following its computer vision example code, I at first had a lot of trouble implementing it as a Flask application. Since figuring it out took me a lot of time, I decided to replace Flask with aiohttp, which is recommended for asynchronous applications.

I remade the whole web page with aiohttp and, lo and behold, it worked! The video stream was sent to the server and then back to the client again. Exciting! Then, I implemented Ultralytics' YOLOv8 object detection model to analyze the incoming frames. But it was extremely slow.

Every frame took way too long to analyze, which made the video very choppy. I tried using asyncio, which made it a bit better, but the performance was still not there. It was quite bad for a real-time application.

So, I had to find another way to solve this problem.

Back to the drawing board

I searched the web for similar implementations and found that comparable web applications used a client-side approach in which, thanks to the browser, the GPU is used to analyze the camera feed.

As soon as I saw how well it ran, I knew this was the solution. At first, I thought about using TensorFlow.js, but I found an easy-to-use library built on top of TensorFlow: ml5js. It is open-source, maintained by multiple developers, and using it in combination with p5js makes the implementation very simple.

Second version

I cleared the whole codebase to start the project from a clean slate. I removed every module that I did not need anymore, reinstalled Flask, and restructured the project directory to make sure it wouldn't cause any issues for the Flask application.

I wrote the server side to handle the sqlite3 database, the webpages, and the authorization with flask_login.

On the client side, I made sure that the detection script only exists on the detections page, since that is the only place where I want it to run.

In the scripts, I wrote the detection code using a custom model trained with YOLOv8. To write the logic, I followed the guide on the ml5js GitHub page on detecting objects with ml5js and p5js.

Then, I made sure that a detection is added to the database when a new object enters the camera's view, or when an object leaves the view while other identified objects remain.

Example

A person enters the view and is detected by the program; the detection is added to the database. Then a cat appears and is detected, so a new detection with two entries, a person and a cat, is added to the database.

The person then leaves, and a new entry is added with only the cat. The cat leaves, and nothing is added.
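This logging rule can be sketched in plain Python (a minimal sketch, not the project's actual code; the function name should_log is my own):

```python
def should_log(previous_labels, current_labels):
    """Decide whether a change in detections warrants a new database entry."""
    prev = set(previous_labels)
    curr = set(current_labels)
    # Log on any change in the visible objects, as long as at least
    # one object is still in view (an emptied view adds nothing).
    return curr != prev and len(curr) > 0

# Walking through the example above: person enters, cat appears,
# person leaves, cat leaves.
frames = [[], ["person"], ["person", "cat"], ["cat"], []]
logged = []
for prev, curr in zip(frames, frames[1:]):
    if should_log(prev, curr):
        logged.append(sorted(curr))
# logged is now [["person"], ["cat", "person"], ["cat"]]
```

The last transition (cat leaves, view empty) adds nothing, matching the behavior described above.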

Camera Usage

Since a device might have several cameras, I decided that the most convenient one is the back camera of the device, at least when running on a phone. The following code uses the back camera if one exists; on laptops with a camera, it falls back to the device's camera.

video = createCapture({video: {facingMode: {ideal: "environment"}},
                       audio: false}, videoReady);

The database

There are two tables in the database: one that contains user information and one that contains detection data.

The detection data has the following columns:

  • id

    The id number is incremented automatically on each entry.

  • username

The username identifies the user; it is used to fetch or delete that user's detection history when they load the history page.

  • label

    The label is the name of the object that has been detected.

  • confidence

The confidence level is quite important because it shows how confident the model is in its detection. If it is around 0.5 or lower, the detection label might be wrong.

  • date

    This is the date of the detection.

  • time

This is the time of the detection in local time. It is quite useful: when two objects are detected at once, they share the same timestamp.
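The schema itself isn't shown in this writeup, so here is a hypothetical sqlite3 sketch of the detections table matching the columns above (the column types and sample values are my assumptions, not the project's actual schema):

```python
import sqlite3

# Throwaway in-memory database for the demo
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE detections (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- incremented on each entry
        username TEXT NOT NULL,                -- owner of the detection history
        label TEXT NOT NULL,                   -- name of the detected object
        confidence REAL NOT NULL,              -- model confidence, e.g. 0.91
        date TEXT NOT NULL,                    -- local date, e.g. "2023.8.14"
        time TEXT NOT NULL                     -- local time, e.g. "14:03:27"
    )
""")
db.execute(
    "INSERT INTO detections (username, label, confidence, date, time) "
    "VALUES (?, ?, ?, ?, ?)",
    ("alice", "cat", 0.91, "2023.8.14", "14:03:27"),
)
row = db.execute("SELECT label, confidence FROM detections").fetchone()
# row is now ("cat", 0.91)
```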

Sending the data to the database

To send the data to the database, I decided to serialize it as a JSON object and POST it to a dedicated route on the back end, which runs whenever a new POST request arrives.

// Client side code
function sendJson(object, date, time){
  $.ajax({
    type: "POST",
    contentType: "application/json",
    data: JSON.stringify({"object": object, "date": date, "time": time}),
    dataType: "json",
    url: "/stats",
  });
}

To get the local date and time, I used the js-joda library. Then, I used the map function to pull what is needed from each detected object and call the sendJson function whenever a new entry has to be made.

// Create a js-joda object to get the local date and time
let getDateTime = JSJoda.ZonedDateTime.now();

// Build the time string, zero-padding hours, minutes, and seconds
let getTime = String(getDateTime.hour()).padStart(2, "0") + ":"
              + String(getDateTime.minute()).padStart(2, "0") + ":"
              + String(getDateTime.second()).padStart(2, "0");

// Build the date string from the year, month, and day
let getDate = getDateTime.year() + "."
              + getDateTime.monthValue() + "."
              + getDateTime.dayOfMonth();

// Keep only the label and confidence of each detected object
let getData = detections.map(detection => ({label: detection["label"],
              confidence: detection["confidence"]}));

// Call the sendJson function
sendJson(getData, getDate, getTime);

Now, on the server side, we get the JSON data and the username, then connect to the database, store the data, and commit the changes.

# Server side
# Get detection stats and add them to the database
@app.route("/stats", methods=["POST"])
def stats():
    """Receive data and add to database."""
    receive = request.get_json()
    get_username = flask_login.current_user.id
    db = sqlite3.connect("database.db")
    db.row_factory = dict_factory

    # Add each detection to the database, then commit once
    for detection in receive["object"]:
        db.execute("INSERT INTO detections (username, label, confidence, date, time) VALUES(?, ?, ?, ?, ?)",
                   (get_username, detection["label"], round(detection["confidence"], 2), receive["date"], receive["time"]))
    db.commit()

    db.close()
    return "Nothing"

The dict_factory function

This function is assigned to the db.row_factory attribute. It makes every fetched row a dictionary whose keys are the column names and whose values are the row's values.

This guide was very helpful to create the function.

def dict_factory(cursor, row):
    """Get row factory and return dict with keys and values."""
    fields = [column[0] for column in cursor.description]
    return dict(zip(fields, row))
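To see the effect, the function can be tried against a throwaway in-memory database (the table and sample values below are made up for the demo):

```python
import sqlite3

def dict_factory(cursor, row):
    """Get row factory and return dict with keys and values."""
    fields = [column[0] for column in cursor.description]
    return dict(zip(fields, row))

db = sqlite3.connect(":memory:")
db.row_factory = dict_factory

# A tiny demo table with one sample row
db.execute("CREATE TABLE detections (label TEXT, confidence REAL)")
db.execute("INSERT INTO detections VALUES ('person', 0.88)")

row = db.execute("SELECT * FROM detections").fetchone()
# row is now {"label": "person", "confidence": 0.88} instead of a plain tuple
```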

Authorization

The user has to register an account before they proceed to the main web application. Trying to access other pages without authorization or trying to enter with the wrong credentials will redirect to the login page with an error message.

"You need to login first" error when the user tries to enter without authorization

This is a screenshot of a web browser with spying eye's login and an error message that says "You need to login first". This error message appeared because the user tried to gain unauthorized access.

"User not found" error when the user inputs the wrong username

This is a screenshot of a web browser with spying eye's login and an error message that says "User not found". This error appeared because the username was not found.

"Wrong credentials" error when the user inputs the wrong password

This is a screenshot of a web browser with spying eye's login and an error message that says "Wrong credentials". This error message appeared because the password was wrong but the username exists.

The custom model

Since the initial version of the application was going to make use of YOLOv8, I decided to create a custom model and train it using the coco128 dataset.

I used this guide to write down the training logic.

I used the nano version of the YOLOv8 model to make it as light as possible to be used even on mobile devices.

I created a coco128.yaml file and copied the code here.

Then, I started training the model!

Exporting to TensorFlow

I had to export the custom YOLOv8 model to TensorFlow so I could use it on the client side with ml5js.

I followed this guide to export the TensorFlow model.

Using the custom model with ml5js

To use my custom model, I did this on the videoReady function:

function videoReady(){
  detector = ml5.objectDetector("models/yolov8n-best_web_model/model.json",
                                modelReady);
}

Now the client-side detection is achieved by using my own model!

Deployment

To deploy my application, I created a Dockerfile by following this guide from freeCodeCamp. Then I deployed the application on render.com.

The application is now up and running!

Performance

Some lower-end phones might have trouble running the application. My low-end Android phone had a lot of trouble running it which made detection incredibly inaccurate. The bounding boxes were completely off as well as the labels.

So, even though I went with the nano version of the YOLOv8 model, it is still a bit demanding and can't be used everywhere.

Final thoughts

The implementation of the application took me a while to complete, since I had to experiment with the tools I found. Diving into machine learning and object detection was an interesting journey, and it actually got me interested in learning more about the field.

As said earlier, I would love to go back to what I originally tried to do and see if it is actually possible. This might need a beefy server though.

It helped me find solutions to many problems, taught me a bit about Docker, and made me more confident in what I do. And this is how I finished the CS50x course! I am looking forward to doing and learning much more about programming.