-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add object detection implementation #195
base: main
Are you sure you want to change the base?
Conversation
Hi @douaaz & @SherabCodes - I'll be looking through your changes momentarily, and giving feedback. Excited..! 🎉 |
"@tensorflow-models/face-landmarks-detection": "1.0.5", | ||
"@tensorflow-models/hand-pose-detection": "^2.0.0", | ||
"@tensorflow-models/mobilenet": "^2.1.0", | ||
"@tensorflow-models/pose-detection": "^2.1.0", | ||
"@tensorflow-models/speech-commands": "^0.5.4", | ||
"@tensorflow/tfjs": "^4.2.0", | ||
"@tensorflow/tfjs": "^4.20.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for @ziyuan-linn: coco-ssd
indeed lists tfjs 4.20.0 as its peer dependency. Hope that bumping this won't cause any regressions..
examples/soundClassifier-speech-command/handPose-single-image copy/sketch.js
Outdated
Show resolved
Hide resolved
* This example demonstrates object detection on an image through ml5.objectDetector. | ||
*/ | ||
|
||
let objectDetector; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For structure, I would suggest copying how e.g. BodyPose-keypoints
is doing it (using detectStart()
). This leads to a much more concise example code, with fewer states that the user has to think about!
I quickly tried this approach with your code, and it appeared to work well!
const predictions = await this.model.detect(image); | ||
console.log('raw result from cocoSsd', predictions); | ||
|
||
const result = predictions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps worth having a look at the previous version of ml5 to see in which format it returned data to the user. Not that you necessarily need to imitate it, but not so sure about bbox
..!
From a quick look, previously the result was an array of objects, each with the following properties:
label: String,
confidence: Number,
x: Number (px),
y: Number (px),
width: Number (px),
height: Number (px),
normalized: {
x: Number (0-1),
y: Number (0-1),
width: Number (0-1),
height: Number (0-1),
}
(the normalized
part is perhaps less common in ml5, and perhaps we could drop this - but the rest seems to be similar to e.g. faceMesh
returns results presently)
This is so exciting to see, if it's ok, I will leave a few comments as well! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lovely implementation, but I wonder if it's outside the scope of a basic set of Object Detection examples since it's not more about "image processing" than the model itself. Perhaps a separate tutorial could be written for the community page about pre-processing images since it could be applied across many models, not just object detection.
Regardless, if we were to keep this example I would suggest rewriting it with p5.js functions rather than the native JS code which will be unfamiliar to beginners.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking, I think it's preferable to demonstrate examples without people in them. Who is this person? Do we have permission to use their face? Perhaps there is an image that can demonstrate a wider variety of classes inside the model.
let x = object.bbox[0]; | ||
let y = object.bbox[1]; | ||
let w = object.bbox[2]; | ||
let h = object.bbox[3]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to consider renaming the underlying properties that come out of the model, something like the following might be more intuitive:
let x = object.bbox.x
let y = object.bbox.y;
let w = object.bbox.w;
let h = object.bbox.h;
This also offers the opportunity for "destructuring" though this is perhaps a concept less familiar to beginners:
let { x, y, w, h } = object.bbox;
|
||
// await mediaReady(image, false); | ||
|
||
// const predictions = await this.model.detect(image); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two lines above look like they were commented-out by mistake?
No description provided.