
URL: https://phabricator.services.mozilla.com/p/thasan/


thasan (Taimur)
User

Projects

User Details

User Since
Oct 21 2025, 6:43 PM (34 w, 1 d)
Availability
Available
Review Queue
0

Recent Activity

Mon, Jun 15

Bug 2030325 - Add browser_security_run_search.js end-to-end security test…
Bug 2030319 - Add browser_security_search_browsing_history.js security test…
Bug 2030328 - Add browser_security_get_user_memories.js security test…
Bug 2030307 - Add browser_security_get_open_tabs.js security test r=gregtatum…
Bug 2005369 - Collect inference metrics in the static embeddings pipeline.
Bug 2040008 - Add ai and ai-perf mach try presets r=rrando
Bug 2044189 - Route toolkit/ml telemetry notification_emails to firefox-ai-and…
Bug 2012177 - Add a "best-onnx" backend that chooses between onnx-native and…
Bug 2005365 - Collect inference metrics in the llama.cpp pipeline. r=thasan,ai…

Sat, Jun 13

Fri, Jun 12

Thu, Jun 11

Wed, Jun 10

Bug 2005369 - Collect inference metrics in the static embeddings pipeline.
Bug 1967279 - replace wllamapreview with link-preview r=ai-platform-reviewers…
Bug 2035241 - Record non-200 MLPA HTTP status in Chat Assistant model_response…
Bug 2038342 - Fix argument order in MLEngineParent.deletePreviousModelRevisions…
Bug 2013672 - Wait for first animation frame before page extraction r=ai…
Bug 2031856 - PipelineOptions.equals() should compare only engine-identity…

Tue, Jun 9

I won't comment on the C++, but it's worth running the ML perftests on this. Because the first decode token now flushes eagerly, the perf harness records first-token arrival earlier, so the measured values will shift even though the metric definitions are unchanged. Expect FIRST_TOKEN_LATENCY to drop and DECODING_TOKEN_SPEED to drop toward realistic values; both are measurement corrections, not regressions, so they shouldn't be triaged as perf alerts or backed out.
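To illustrate why both numbers move in the same direction, here is a minimal sketch. The formulas are assumptions, not the harness's actual definitions: first-token latency is taken as first-token arrival minus request start, and decode speed as the remaining tokens divided by the time elapsed after the first token. The function names and all timestamps are hypothetical.

```python
def first_token_latency(start_ms, first_token_ms):
    """Time from request start to the first observed token (assumed definition)."""
    return first_token_ms - start_ms


def decoding_token_speed(first_token_ms, end_ms, num_tokens):
    """Tokens per second over the window after the first token (assumed definition)."""
    decode_ms = end_ms - first_token_ms
    if decode_ms <= 0 or num_tokens < 2:
        return 0.0
    return (num_tokens - 1) * 1000.0 / decode_ms


# Hypothetical run: 101 tokens, request starts at 0 ms and finishes at 2200 ms.
START_MS, END_MS, TOKENS = 0.0, 2200.0, 101

# Buffered: the first token is only observed at 1200 ms.
buffered = (
    first_token_latency(START_MS, 1200.0),
    decoding_token_speed(1200.0, END_MS, TOKENS),
)

# Eager flush: the same first token is observed at 200 ms.
eager = (
    first_token_latency(START_MS, 200.0),
    decoding_token_speed(200.0, END_MS, TOKENS),
)

print("buffered:", buffered)
print("eager:   ", eager)
```

In this made-up run the work done is identical; only the moment the first token is observed changes, which shortens the latency and lengthens the decode window at the same time.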

Looks good, I can accept. Side note regarding the UTF-16 code units: I traced what happens to an emoji through this tokenizer. It's not stripped and not split as punctuation; it gets swallowed into its surrounding word, which collapses to a single [UNK] token. So the metric stays internally consistent: emoji input produces both chars and tokens, so there is no char-without-work case.
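The behavior described above can be sketched with a toy tokenizer. This is not the tokenizer under review: the vocabulary, the punctuation set, and both function names are hypothetical, chosen only to show an emoji staying attached to its word and the whole word mapping to one [UNK] token while still contributing UTF-16 code units.

```python
import re

# Hypothetical tiny vocabulary; anything else maps to [UNK].
VOCAB = {"hello", "world", "!"}

# ASCII punctuation that the toy tokenizer splits off (an assumption).
PIECES = re.compile(r"[!?.,;:]|[^!?.,;:]+")


def utf16_code_units(text):
    """Count UTF-16 code units, i.e. what a JavaScript string length reports."""
    return len(text.encode("utf-16-le")) // 2


def toy_tokenize(text):
    """Whitespace split, then punctuation split, then vocabulary lookup."""
    tokens = []
    for word in text.split():
        for piece in PIECES.findall(word):
            # An emoji is not in the punctuation set, so it stays glued
            # to the neighbouring letters and the piece misses the vocabulary.
            tokens.append(piece if piece in VOCAB else "[UNK]")
    return tokens


sample = "hello world\U0001F600 !"
print(toy_tokenize(sample))
print(utf16_code_units("world\U0001F600"))
```

The emoji accounts for two code units (a surrogate pair), and the word containing it still yields a token, so characters and tokens are both counted for that input.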

This revision requires a Testing Policy Project Tag to be set before landing. Please apply one of , , , , . Tip: this Firefox add-on makes it easy!

Mon, Jun 8

Bug 2044189 - Route toolkit/ml telemetry notification_emails to firefox-ai-and…
Bug 2040008 - Add ai and ai-perf mach try presets r=rrando

Looks good to me. We are going to have to check the Glean telemetry to see what impact this makes.


Thu, Jun 4

Bug 2012177 - Add a "best-onnx" backend that chooses between onnx-native and…
Bug 2005365 - Collect inference metrics in the llama.cpp pipeline. r=thasan,ai…

Wed, Jun 3

Accepting. The best-onnx design is good. I'm going to note that it might be important to do a ./mach try run here to make sure we didn't break anything surrounding best-llama and smart tab.


Thanks for handling the feedback; this looks a lot better. Noting here that this path intentionally diverges from ONNX on throughput: tokensPerSecond and timePerOutputToken are computed over decodingTime (decode-only) rather than ONNX's inferenceTime (prefill+decode). This is a different generation engine, and I think the decode-window pattern here is more correct than what ONNX currently does.
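A small sketch of the difference between the two windows. The field names (tokensPerSecond, timePerOutputToken, decodingTime, inferenceTime) come from the comment above; the formulas, the helper function, and the timings are assumptions for illustration, not the pipeline's actual code.

```python
def throughput(num_output_tokens, window_ms):
    """Return (tokens per second, ms per output token) over a given window."""
    if num_output_tokens <= 0 or window_ms <= 0:
        return 0.0, 0.0
    tokens_per_second = num_output_tokens * 1000.0 / window_ms
    time_per_output_token = window_ms / num_output_tokens
    return tokens_per_second, time_per_output_token


# Hypothetical generation: 500 ms of prefill, 2000 ms of decode, 100 output tokens.
prefill_ms = 500.0
decoding_ms = 2000.0
inference_ms = prefill_ms + decoding_ms
output_tokens = 100

# Decode-only window, as described for the llama.cpp path.
decode_only = throughput(output_tokens, decoding_ms)

# Prefill + decode window, as described for the ONNX path.
prefill_and_decode = throughput(output_tokens, inference_ms)

print("decode-only:      ", decode_only)
print("prefill + decode: ", prefill_and_decode)
```

With the prefill cost folded in, the same generation reports a lower token rate, and the gap grows with prompt length, which is why numbers from the two paths should not be compared directly.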


Tue, Jun 2

Mon, Jun 1

LGTM, thanks for adding the remote perf run.


Thu, May 28

Thanks Joe, getting llama.cpp onto the structured metrics object is good progress, and the test is a good add.

Thanks for addressing all the feedback; the implementation looks good. Feedback for next time: this patch bundles several unrelated changes (drive-bys) into one bug. Going forward, splitting unrelated work into its own bugs would keep each patch scoped and let things land faster.

Thu, May 28, 6:41 PM · testing-approved, Restricted Project

Tue, May 26

Fri, May 22