Attention is a key mechanism for information selection in both biological brains and many state-of-the-art deep neural networks (DNNs). Here, we investigate whether humans and DNNs allocate attention in comparable ways when seeking information in a text passage to answer a question. We analyze three transformer-based DNNs that reach human-level performance when trained on the reading comprehension task. We find that the attention distribution of these DNNs quantitatively resembles the human attention distribution measured by eye tracking: human readers fixate longer on words that are more relevant to the question-answering task, demonstrating that their attention is modulated by the top-down reading goal on top of lower-level visual layout and textual features. Further analyses reveal that the attention weights in DNNs are likewise influenced by both the top-down reading goal and lower-level textual features, with the shallow layers more strongly driven by lower-level textual features and the deep layers attending more to task-relevant words. Additionally, the deep layers' attention to task-relevant words emerges gradually as pre-trained DNN models are fine-tuned on the reading comprehension task, coinciding with the improvement in task performance. These results demonstrate that DNNs can naturally develop human-like attention distributions through task optimization, suggesting that human attention during goal-directed reading comprehension is a consequence of task optimization and that the attention weights in DNNs are of biological significance.
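
As a minimal sketch of the kind of comparison described above, the code below extracts per-layer attention weights from an off-the-shelf question-answering transformer and correlates them with per-token fixation durations. The checkpoint name, the head-averaging scheme, and the placeholder fixation data are illustrative assumptions, not the analysis pipeline actually used; a real analysis would use measured eye-tracking data aligned from words to subword tokens.

```python
# Illustrative sketch (assumed setup, not the paper's code): compare the
# attention each token receives in a QA transformer against per-token
# human fixation durations, layer by layer.
import torch
from scipy.stats import spearmanr
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# Assumed checkpoint: a BERT model fine-tuned for extractive QA on SQuAD.
model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(
    model_name, output_attentions=True
)
model.eval()

question = "What modulates human attention during reading?"
passage = "Human readers fixate longer on words relevant to the task."

inputs = tokenizer(question, passage, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer, each of shape
# (batch, heads, seq, seq). Average over heads, then sum over the query
# axis to get the total attention each token receives in that layer.
attn_received = [
    layer.mean(dim=1)[0].sum(dim=0)  # shape: (seq,)
    for layer in outputs.attentions
]

# Placeholder fixation durations (ms), one per token. Real eye-tracking
# data would require aligning word-level fixations to subword tokens.
fixation_ms = torch.rand(attn_received[0].shape[0])

# Rank-correlate attention with fixation duration at every layer; the
# expectation from the results above is that deeper layers correlate
# more strongly with task-driven fixation patterns.
for i, layer_attn in enumerate(attn_received):
    rho, p = spearmanr(layer_attn.numpy(), fixation_ms.numpy())
    print(f"layer {i:2d}: Spearman rho = {rho:+.3f} (p = {p:.3g})")
```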