
Someone Reverse-Engineered Apple's Neural Engine and Trained a Model on It
A developer cracked Apple's undocumented ANE private APIs, measured its real throughput at 19 TFLOPS FP16 (not the marketed 38 TOPS), and trained a 109M-parameter transformer on hardware Apple designed exclusively for inference.