OpenAI GPT-4 Arriving Mid-March 2023

OpenAI GPT-4 will soon be made available.
It is multimodal, which means Google should start worrying immediately if it hasn’t already.

Andreas Braun, CTO of Microsoft Germany, said that GPT-4 will be multimodal and will arrive the week after March 9, 2023.
Multimodal AI means that it will be able to process several types of input, such as video, images, and audio.

Multimodal, Large Language Models

The most significant implication of the statement is that GPT-4 is multimodal (SEJ predicted in January 2023 that GPT-4 would be multimodal).

In this context, modality refers to the type of input that a large language model works with.

Multimodal input can include text, audio, images, and video.

GPT-3 and GPT-3.5 only functioned in text mode.

According to the German news article, GPT-4 may be able to work in at least four modalities: images, sound, text, and video.

Dr. Andreas Braun, CTO of Microsoft Germany, is quoted as saying, “Next week we will introduce GPT-4, which will have multimodal models that offer completely different possibilities, for example videos…”

The report lacked specifics about GPT-4, so it is unclear whether the information about multimodality applied to GPT-4 specifically or to multimodal AI in general.

Holger Kenn, Microsoft’s Director of Business Strategy, also discussed multimodality, but it was unclear whether he was referring to GPT-4 or to multimodality in general.

I believe his references to multimodality were specific to GPT-4.

According to the news source, Kenn described multimodal AI that can translate text not only into images but also into music and video.

Microsoft is also focusing on “confidence metrics” to improve the reliability of its AI by grounding it in factual data.

Microsoft Kosmos-1

At the beginning of March 2023, Microsoft introduced Kosmos-1, a multimodal language model, which apparently received little coverage in the United States.

According to the German news website Heise.de, “…the researchers put the pre-trained model to a variety of tests, with good results in categorizing photos, answering questions about image content, automatically labeling images, optical text recognition, and voice production tasks….”

“Visual reasoning, i.e., deriving inferences from visuals without using words as an intermediary, appears to be crucial in this case.”

Kosmos-1 is a multimodal model that combines the text and image modalities.

GPT-4 goes further than Kosmos-1 because it adds a third modality, video, and also appears to include sound.

Compatible With Multiple Languages

GPT-4 appears to work across all languages. It is reportedly able to receive a question in German and answer it in Italian.

Who would ask a German question and expect an Italian response?

This has been confirmed:

The technology has advanced to the point where it essentially “works in all languages”: you may ask a question in German and receive a response in Italian.

Microsoft (OpenAI) will “make the models comprehensive” using multimodality.

I believe the significance of the breakthrough lies in the model’s ability to draw on knowledge across different languages.
So if the answer exists only in Italian, the model will know it and can give that answer in the language in which the question was asked.

This would make it comparable to the goal of Google’s MUM multimodal AI.
MUM is said to be able to provide answers in English for which the data exists only in another language, such as Japanese.

GPT-4 Applications

There is no word yet on where GPT-4 will appear.
However, Azure-OpenAI was specifically mentioned.

Google is struggling to catch up with Microsoft by integrating a competing technology into its own search engine.
This development reinforces the perception that Google is falling behind and lacks leadership in consumer-facing AI.

Google has already integrated AI into several products, including Google Lens, Google Maps, and other consumer-facing areas.
This approach uses AI as assistive technology that helps people with small, everyday tasks.

Microsoft’s implementation is more apparent, and as a result, it is attracting all the attention and reinforcing the image of Google as floundering and lagging behind.
