Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add object detection implementation #195

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

douaaz
Copy link

@douaaz douaaz commented Aug 22, 2024

No description provided.

@douaaz douaaz requested review from gohai and ziyuan-linn August 22, 2024 12:13
@gohai
Copy link
Member

gohai commented Aug 25, 2024

Hi @douaaz & @SherabCodes - I'll be looking through your changes momentarily, and giving feedback. Excited..! 🎉

"@tensorflow-models/face-landmarks-detection": "1.0.5",
"@tensorflow-models/hand-pose-detection": "^2.0.0",
"@tensorflow-models/mobilenet": "^2.1.0",
"@tensorflow-models/pose-detection": "^2.1.0",
"@tensorflow-models/speech-commands": "^0.5.4",
"@tensorflow/tfjs": "^4.2.0",
"@tensorflow/tfjs": "^4.20.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for @ziyuan-linn: coco-ssd indeed lists tfjs 4.20.0 as its peer dependency. Hope that bumping this won't cause any regressions..

src/ObjectDetector/index.js Show resolved Hide resolved
examples/objectDetector-single-image copy/sketch.js Outdated Show resolved Hide resolved
examples/objectDetector-single-image copy/sketch.js Outdated Show resolved Hide resolved
examples/objectDetector-webcam/sketch.js Outdated Show resolved Hide resolved
* This example demonstrates object detection on an image through ml5.objectDetector.
*/

let objectDetector;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For structure, I would suggest copying how e.g. BodyPose-keypoints is doing it (using detectStart()). This leads to a much more concise example code, with fewer states that the user has to think about!
I quickly tried this approach with your code, and it appeared to work well!

const predictions = await this.model.detect(image);
console.log('raw result from cocoSsd', predictions);

const result = predictions;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps worth having a look at the previous version of ml5 to see in which format it returned data to the user. Not that you necessarily need to imitate it, but not so sure about bbox..!

From a quick look, previously the result was an array of objects, each with the following properties:

label: String,
confidence: Number,
x: Number (px),
y: Number (px),
width: Number (px),
height: Number (px),
normalized: {
  x: Number (0-1),
  y: Number (0-1),
  width: Number (0-1),
  height: Number (0-1),
}

(the normalized part is perhaps less common in ml5, and perhaps we could drop this - but the rest seems to be similar to e.g. faceMesh returns results presently)

src/ObjectDetector/index.js Outdated Show resolved Hide resolved
@shiffman
Copy link
Member

This is so exciting to see, if it's ok, I will leave a few comments as well!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lovely implementation, but I wonder if it's outside the scope of a basic set of Object Detection examples since it's not more about "image processing" than the model itself. Perhaps a separate tutorial could be written for the community page about pre-processing images since it could be applied across many models, not just object detection.

Regardless, if we were to keep this example I would suggest rewriting it with p5.js functions rather than the native JS code which will be unfamiliar to beginners.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, I think it's preferable to demonstrate examples without people in them. Who is this person? Do we have permission to use their face? Perhaps there is an image that can demonstrate a wider variety of classes inside the model.

let x = object.bbox[0];
let y = object.bbox[1];
let w = object.bbox[2];
let h = object.bbox[3];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to consider renaming the underlying properties that come out of the model, something like the following might be more intuitive:

    let x = object.bbox.x
    let y = object.bbox.y;
    let w = object.bbox.w;
    let h = object.bbox.h;

This also offers the opportunity for "destructuring" though this is perhaps a concept less familiar to beginners:

let { x, y, w, h } = object.bbox;


// await mediaReady(image, false);

// const predictions = await this.model.detect(image);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two lines above look like they were commented-out by mistake?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants