Humans categorize pictures faster when they hear a sound that is semantically congruent with the picture (Chen & Spence, 2018). At the same time, numerous studies have found that some high-level visual processing occurs even without awareness. We investigated whether, and how, the cross-modal semantic congruency effects that naturalistic sounds and spoken words exert on picture processing persist without visual awareness. To examine the time course and categorical specificity of these effects, auditory cues were presented at five stimulus onset asynchronies (SOAs: -1000, -750, -500, -250, and 0 ms) relative to picture onset, and participants made speeded categorization judgments (living vs. nonliving) in two-alternative forced-choice (2AFC) and continuous flash suppression (CFS) paradigms. Sounds and pictures (e.g., a picture of a cat) were paired in the following conditions: congruent (cat sound), related (dog sound), incongruent (guitar sound), white noise, and no sound. In the aware condition, mean response time was faster for congruent sounds (838 ms) than for related sounds (878 ms), incongruent sounds (880 ms), and white noise (891 ms). In the unaware condition, mean response time was likewise faster for congruent sounds (2451 ms) than for related sounds (2510 ms), incongruent sounds (2532 ms), and white noise (2548 ms). In both the aware and unaware conditions, responses were faster with naturalistic sounds (866 ms and 2503 ms, respectively) than with spoken words (877 ms and 2514 ms). The difference between congruent and incongruent trials showed the same tendency for both sound types, but it was larger for spoken words (54 ms aware, 110 ms unaware) than for naturalistic sounds (31 ms aware, 55 ms unaware). There was no main effect of SOA. For both naturalistic sounds and spoken words, congruency effects were significant in almost all conditions, with and without awareness. For naturalistic sounds, however, the congruency effect appeared at the shorter SOAs (-250 ms and 0 ms) in the aware condition but at the longest SOA (-1000 ms) in the unaware condition. The Congruency × SOA interaction was significant.
These results show that the cross-modal semantic congruency effects found with visual awareness also occur without it.
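
The congruency effect referred to above is a difference between mean response times. As a minimal illustrative sketch (not the authors' analysis code; the dictionary layout and the function name `congruency_effect` are assumptions introduced here), the overall effect in each awareness condition can be recomputed from the condition means reported in the abstract:

```python
# Mean response times (ms) as reported in the abstract, pooled over
# sound type and SOA. The nesting of this dictionary is an assumption
# made for illustration only.
mean_rt_ms = {
    "aware": {
        "congruent": 838,
        "related": 878,
        "incongruent": 880,
        "white_noise": 891,
    },
    "unaware": {
        "congruent": 2451,
        "related": 2510,
        "incongruent": 2532,
        "white_noise": 2548,
    },
}


def congruency_effect(rts, baseline="incongruent"):
    """Return baseline RT minus congruent RT (hypothetical helper).

    A positive value means responses were faster with congruent sounds
    than with the chosen baseline condition.
    """
    return rts[baseline] - rts["congruent"]


# Congruent vs. incongruent difference for each awareness condition.
effects = {
    awareness: congruency_effect(rts)
    for awareness, rts in mean_rt_ms.items()
}

for awareness, effect in effects.items():
    print(f"{awareness}: {effect} ms")
```

Under this definition the pooled effect is 42 ms in the aware condition and 81 ms in the unaware condition. Passing a different `baseline` (for example `"white_noise"`) compares congruent sounds against another condition; the sketch says nothing about statistical significance, which requires trial-level data.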