PROPOSAL OF A METHOD FOR ESTIMATING THE MEANING OF MATHEMATICAL SYMBOLS CONTAINED IN NATURAL LANGUAGE SENTENCES
Y. Miyazaki, K. Imura
Shizuoka University (JAPAN)
In addition to numbers, mathematical formulas contain symbols such as "|" and "{ }" and character strings such as "sin". Humans interpret these non-numeric symbols with little difficulty, but computers do not. For example, "|" can denote the absolute value of a number, as in "|-2|"; a conditional probability, as in "P(A|B)"; the cardinality of a set; the determinant of a matrix; or the magnitude of a vector. Although these meanings all differ, a computer represents every one of them with the same vertical bar symbol. This research aims to solve this problem: by having a computer correctly estimate the meaning of the symbols in a formula, we aim to enable sophisticated, meaning-based formula search.

A related study by Sakurai et al. addressed a similar problem, estimating the meaning of mathematical expressions from the MathML tag names of the target mathematical symbol and its surrounding symbols. In contrast, our method estimates the meaning of the target symbol from natural language sentences that contain mathematical formulas, such as those found in mathematics textbooks and reference books.

This research estimates meaning from previously accumulated text data. For each symbol that is used in mathematical formulas with multiple meanings, we collect natural language sentences that describe each meaning as data sentences. The system compares the collected data sentences with a sentence containing the mathematical symbol, measures the similarity between them, and uses the similarity scores to determine the valid meaning of the target symbol. Similarity is measured in three ways:
– Comparison using only morphological analysis:
This method divides both the input sentence and the data sentences into morphemes and calculates the cosine similarity between the sentences from the frequency of occurrence of each morpheme. In this study, we use MeCab as the morphological analysis tool, and similar tools are used for the morphological analysis step of the other comparison methods.
– Comparison by TF-IDF:
After both sentences are decomposed into morphemes, the TF-IDF value of each morpheme is computed. In this method, the number of occurrences of the morphemes in each sentence is compiled into a list, and the list is used to calculate the cosine similarity between the sentences.
– Comparison by N-gram:
After both sentences are broken down into morphemes, dictionaries of 1-grams, 2-grams, and 3-grams are created. This method counts the occurrences of the dictionary entries in each sentence and calculates the cosine similarity. Results are computed separately for 1-grams, 2-grams, and 3-grams.
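
The three comparisons above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `tokenize` is a whitespace splitter standing in for MeCab morphological analysis, the smoothed IDF formula and the choice of corpus for the TF-IDF comparison are assumptions, and `estimate_meaning` (a hypothetical helper) assumes that each meaning is scored by its best-matching data sentence, which the abstract does not specify.

```python
from collections import Counter
import math


def tokenize(sentence):
    # Assumption: whitespace splitting stands in for morphological
    # analysis; a real system would call a tool such as MeCab here.
    return sentence.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def count_similarity(s1, s2):
    # Comparison using only token (morpheme) frequencies.
    return cosine(Counter(tokenize(s1)), Counter(tokenize(s2)))


def ngrams(tokens, n):
    # Contiguous runs of n tokens; empty if the sentence is too short.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_similarity(s1, s2, n):
    # Comparison using frequencies of token n-grams (n = 1, 2, or 3).
    return cosine(Counter(ngrams(tokenize(s1), n)),
                  Counter(ngrams(tokenize(s2), n)))


def tfidf_vectors(sentences):
    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    docs = [Counter(tokenize(s)) for s in sentences]
    n_docs = len(docs)
    df = Counter(t for d in docs for t in d)
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in df}
    return [{t: (c / sum(d.values())) * idf[t] for t, c in d.items()}
            for d in docs]


def tfidf_similarity(s1, s2, corpus):
    # Assumption: IDF is computed over both sentences plus `corpus`.
    vectors = tfidf_vectors([s1, s2] + list(corpus))
    return cosine(vectors[0], vectors[1])


def estimate_meaning(input_sentence, data_sentences, similarity):
    # data_sentences maps each candidate meaning to its data sentences.
    # Assumption: a meaning's score is its best-matching data sentence.
    scores = {
        meaning: max((similarity(input_sentence, s) for s in sentences),
                     default=0.0)
        for meaning, sentences in data_sentences.items()
    }
    return max(scores, key=scores.get)
```

For instance, `estimate_meaning(sentence, data, count_similarity)` uses the frequency-only comparison, while `lambda a, b: ngram_similarity(a, b, 2)` can be passed to use 2-grams instead.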

Currently, we run the three methods independently. To improve the results while taking advantage of the strengths of each algorithm, we plan to use TF-IDF and N-gram together or to weight each method. From these two points of view, we will continue working to determine an estimation method with better precision and to create a method that performs at the same level even when other meanings and symbols are added. Specifically, the goal is a success rate of over 90% for all symbols and meanings.
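
One possible form of the weighting mentioned above is a weighted average of the per-meaning scores from each method. The sketch below is only an illustration of that idea: the function name `combine_scores`, the method labels, the weights, and the score values are all hypothetical and are not taken from the study.

```python
def combine_scores(method_scores, weights):
    # method_scores: {method name: {meaning: similarity score}}
    # weights:       {method name: non-negative weight}
    total = sum(weights[m] for m in method_scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    meanings = set().union(*(s.keys() for s in method_scores.values()))
    # Weighted average per meaning; a method that gives no score for a
    # meaning contributes 0 for it.
    return {
        meaning: sum(weights[m] * scores.get(meaning, 0.0)
                     for m, scores in method_scores.items()) / total
        for meaning in meanings
    }


# Hypothetical scores for one input sentence containing "|".
example_scores = {
    "tfidf": {"absolute value": 0.6, "conditional probability": 0.2},
    "bigram": {"absolute value": 0.1, "conditional probability": 0.3},
}
example_weights = {"tfidf": 0.75, "bigram": 0.25}
combined = combine_scores(example_scores, example_weights)
best_meaning = max(combined, key=combined.get)
```

How the weights should be chosen, for example by tuning them on held-out sentences, is left open here, as it is in the abstract.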

Keywords: Technology, education, estimation, mathematical symbol, TF-IDF, N-gram, morphological analysis.

Event: INTED2025
Track: Innovative Educational Technologies
Session: Technology Enhanced Learning
Session type: VIRTUAL