The Embedded Block Coding with Optimal Truncation (EBCOT) Tier-1 process is a critical component of the JPEG2000 framework and strongly influences the throughput of hardware encoders. This paper presents an optimized hardware architecture for a high-performance EBCOT Tier-1 encoder based on parallel code-block processing. The design codes bit-planes simultaneously over three parallel channels and performs arithmetic coding concurrently, supported by six context-generating units. To overcome the throughput bottleneck of conventional designs, we introduce a novel multiplexing strategy for the MQ coder. Our encoder achieves a throughput of 2782 Mb/s, a 7.3-fold improvement over existing implementations, making it well suited to high-speed JPEG2000 applications. Implemented on an FPGA, which operates at a lower frequency than an ASIC, the proposed architecture still achieves 1.05 times the throughput rate. When processing a 512 $\times$ 512 grayscale image, it exceeds 160 FPS, which meets the real-time requirements of edge-device applications.