INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL
Online ISSN 1841-9844, ISSN-L 1841-9836, Volume: 17, Issue: 5, Month: October, Year: 2022
Article Number: 4696, https://doi.org/10.15837/ijccc.2022.5.4696
CCC Publications

Online Healthcare Privacy Disclosure User Group Profile Modeling Based on Multimodal Fusion

Yong Wang
School of Economics and Management
Beijing Jiaotong University
Beijing, 100044, China
17113137@bjtu.edu.cn

Abstract

With the spread of COVID-19, online healthcare is rapidly evolving to help the public manage their health, reduce exposure and avoid the risk of cross-infection. Online healthcare platforms require more information from patients than offline care does, and insufficient or incorrect information may delay or even mislead treatment. It is therefore valuable to predict users' privacy disclosure behavior while fully protecting their information, so that healthcare services can be delivered accurately and a personalized online healthcare environment can be realized. In contrast to traditional static analyses of the factors influencing users' privacy disclosure behavior on online healthcare platforms, this paper uses multimodal fusion and group profiling techniques to build a user privacy disclosure model that lays the foundation for personalized online healthcare services. We propose a cross-modal fusion modeling approach to address the problem that current online healthcare privacy disclosure models cannot fully utilize the information in each modality. A multimodal user profiling approach is used to construct personal and group profiles, and the privacy disclosure behavioral characteristics reflected by both are integrated to deliver accurate personalized online healthcare services. A case study shows that, compared with a static unimodal privacy disclosure model, our method significantly improves accuracy, which supports precision healthcare services and the development of online healthcare platforms.
Keywords: multimodal fusion technology, group profiling, privacy disclosure, online healthcare platform, personalized healthcare services.

1 Introduction

With the spread of COVID-19, many patients prefer to stay home rather than go to the doctor. "Another benefit is the 'green effect': telehealth helps patients avoid traveling for treatment as much as possible and avoid those expensive helicopters, planes and ambulances, especially rural patients, reducing unnecessary carbon dioxide emissions, as documented by the University of New Mexico Telehealth Institute," said Susy Salvo-Wendt, a telehealth specialist at Summit Healthcare Regional Center. Facing the sudden epidemic in 2020, online healthcare platforms became an important place for people to seek healthcare advice, exchange health information and follow epidemic prevention and control during COVID-19, thanks to features such as geographic independence, the ability to identify suspected cases, reduced exposure, avoidance of cross-infection and easing of public anxiety. According to relevant statistics, the average number of new registered users of the Good Doctor online healthcare platform in February 2020 increased nearly 3.5-fold compared with December 2019, and the daily number of online health consultation and healthcare knowledge requests submitted by users increased nearly 6.5-fold. The COVID-19 outbreak thus caused a significant surge in public demand for online healthcare platforms. Advances in technology have facilitated the delivery of medical and health information, and the widespread use of online technology in the medical field has effectively improved treatment outcomes and reduced healthcare costs [1].
The disclosure of health information by patients on online healthcare platforms ensures smooth communication with physicians, improves diagnostic accuracy, increases patients' scientific knowledge of medical and health information, and raises self-care awareness. Personal health information is vital to users' lives: correct advice from doctors based on relevant information can improve users' health, whereas an incorrect diagnosis can put users at risk. Establishing a user health database is therefore important, and doing so is a challenge for the construction and development of online healthcare platforms. Using online healthcare services involves disclosing a large amount of personal health information, which, as an important data source for health databases, is of great research significance. However, because of information asymmetry among platform users, users differ in how they express and deliver their information; as a result, their different needs and varying degrees of demand are neglected, and services are provided in a "one-size-fits-all" manner that does not match what different users need. The characteristics and needs of users are like a patient's symptoms: only after the four traditional diagnostic methods of inspection ("look"), listening and smelling ("smell"), inquiry ("ask") and palpation ("cut", i.e., pulse-taking) can the doctor "prescribe the right medicine". If the platform is not well informed of a user's specific information, the service will be counterproductive and make it more difficult for the user to access health resources. The platform should therefore serve different users in different ways, taking their information literacy and knowledge into account, helping them make use of online resources, solving retrieval difficulties, and avoiding service content that is overly technical or poorly adapted. In addition, more and more people are seeking medical services from online healthcare platforms in order to save time and reduce the cost of treatment.
However, individuals face many threats when using online healthcare services, such as the disclosure of private medical information [2]. The tension between the necessity of disclosing information and the risk of its leakage has become a key issue affecting whether users adopt or continue to use online healthcare platforms. Frequent privacy leaks raise users' concerns about personal privacy and security, online privacy leaks harm patients' interests, and privacy concerns reduce users' willingness to disclose information. Compared with offline care, online healthcare platforms require more supporting information from patients, and insufficient or incorrect information may delay or even mislead treatment, making it difficult to make full use of online healthcare platforms. It is therefore necessary to encourage users to disclose information while protecting their privacy.

Privacy disclosure refers to the voluntary and active disclosure of personal information by individuals to others during social interactions. Research on personal health privacy disclosure can be broadly divided into four perspectives. The first is the technical perspective, which studies the development of personal health management information systems [3], the construction and development of online healthcare platforms [4], and methods of protecting medical and health privacy [5]. The second is the legal perspective, which studies the laws and regulations governing health information and considers how to reduce users' privacy concerns [6, 7]. The third is the business analysis perspective, which analyzes the current state of healthcare services, the current state of applications and research in the health field, and the associated opportunities and challenges. The fourth is the perspective of the factors influencing healthcare privacy disclosure. Zhang et al.
[8] analyzed the factors influencing users' privacy concerns in online health communities by combining a dual calculus model and protection motivation theory, and found that privacy concerns, informational support and emotional support had significant effects on users' willingness to disclose personal health information. Bansal et al. [9] constructed a model based on utility theory to explore the effects of personality, trust and privacy concerns on users' willingness to disclose health information online. Wang et al. [10] studied website services, website reciprocity norms and user trust, and constructed a model of the factors influencing users' willingness to self-disclose on healthcare websites. These studies analyzed user disclosure behavior in terms of emotion, benefit and risk, and pointed out the persistent influence of privacy concern on privacy disclosure willingness and privacy disclosure behavior.

Overall, few studies have examined medical privacy disclosure at the level of the factors that influence it. Most existing studies consider the negative effects of personal medical privacy disclosure from the perspective of individual perceptions and recommend privacy-protective behaviors to users; they rarely consider factors that encourage users to disclose their own health information. Moreover, the benefits and risks that patients perceive when they are healthy differ from those they perceive when seeking healthcare consultation, and the two states influence the willingness to disclose private information differently. When users have adequate actual control over their behavior, willingness can directly determine behavior: the stronger a user's willingness to disclose information to an online healthcare platform, the more positive the privacy disclosure behavior will be.
In this paper, users' online healthcare privacy disclosure behavior is treated as the result of the interaction between short-term and long-term disclosure willingness. Construal level theory predicts that psychological distance discounts the perceived outcome or its value, so that a far-future outcome is valued less than a near-future one. Users' disclosure willingness is subject to this temporal discounting: taking the user's present moment as the reference point, long-term disclosure intention lies at a far psychological distance in time, whereas short-term disclosure intention lies at a near one. Because long-term disclosure intention is valued less than short-term disclosure intention, users of online healthcare platforms are more likely to be influenced by short-term disclosure intention when making privacy disclosure decisions.

Based on this analysis, solving the problems that online healthcare platforms currently face requires, first, making full use of the information users disclose and labeling users to enable precise healthcare services, and second, comprehensively analyzing the characteristics of users' privacy disclosure behavior and strengthening efforts to encourage users to disclose their information to online healthcare platforms. This paper therefore proposes a multimodal fusion user profiling technique to model users' online healthcare privacy disclosure behavior. It uses User Profiling Based on Multimodal Fusion and Stacking, Model Combination, and Fusion (UMF-SCF), which deeply fuses multiple data sources on users' online healthcare privacy disclosure, such as basic user information. The paper combines personal and group profiles of users' healthcare privacy disclosure to reflect the characteristics of their privacy disclosure behavior and to realize personalized healthcare services.
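The temporal-discounting argument above can be illustrated numerically. The sketch below is not part of the paper's method: it assumes a hyperbolic discount function V = A / (1 + kD), a hypothetical discount rate `k`, hypothetical psychological distances for the two intentions, and Likert-style intention scores. It shows how a discount-weighted average lets the near-term intention dominate.

```python
def hyperbolic_value(amount: float, delay: float, k: float = 0.5) -> float:
    """Hyperbolic discounting V = A / (1 + k * D).

    The discount rate `k` and the unit of `delay` are hypothetical choices.
    """
    if delay < 0 or k < 0:
        raise ValueError("delay and k must be non-negative")
    return amount / (1.0 + k * delay)


def combined_intention(short_term: float, long_term: float,
                       short_delay: float = 0.0, long_delay: float = 4.0,
                       k: float = 0.5) -> float:
    """Discount-weighted average of a short-term and a long-term intention score.

    Each score is weighted by the discounted value of one unit at its
    psychological distance; the weights are normalized so the result stays
    on the same scale as the inputs (e.g. a 1-5 Likert scale).
    """
    w_short = hyperbolic_value(1.0, short_delay, k)
    w_long = hyperbolic_value(1.0, long_delay, k)
    return (w_short * short_term + w_long * long_term) / (w_short + w_long)


if __name__ == "__main__":
    # Strong near-term willingness, weak far-term willingness.
    print(combined_intention(5, 1))  # approximately 4.0
    # Weak near-term willingness, strong far-term willingness.
    print(combined_intention(1, 5))  # approximately 2.0
```

With these placeholder settings the far-future weight is one third of the near-term weight, so swapping the two scores moves the combined value from about 4.0 to about 2.0.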
The purpose of this paper is to explore the factors influencing users' willingness to disclose their personal medical information and to investigate how these factors act on disclosure willingness and disclosure behavior, with a view to helping optimize the functions of online healthcare websites and promoting the development of online healthcare. This study can both encourage users to disclose necessary personal medical information, laying the foundation for accurate personalized medicine, and help medical service providers obtain and apply large amounts of user health data legitimately. Compared with existing work, the contributions of this paper are as follows.
(1) This paper is the first to propose a multimodal fusion user profile modeling method for online healthcare privacy disclosure.
(2) The UMF-SCF multimodal fusion technique is used to construct a profile of each user's personal privacy disclosure behavior and to strengthen the fusion of multiple data sources on users' healthcare privacy disclosure, including basic user information, historical disclosure behavior, perceived risk, perceived profit, short-term disclosure intention and long-term disclosure intention. This solves the problem that the modalities cannot interact deeply during user profile modeling.
(3) Based on K-means, we construct user group profiles by clustering users' personal privacy disclosure behavior profiles.
(4) A case study, including a questionnaire survey of online healthcare platform users and a comparison with baseline models, verifies that the proposed online healthcare privacy disclosure modeling method based on multimodal user profiles effectively improves accuracy and contributes to precision healthcare services.
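Contribution (3) groups personal profiles with K-means. The minimal NumPy sketch below runs Lloyd-style K-means on Likert-coded profile vectors. The three feature columns (perceived risk, perceived profit, disclosure intention), the toy data and the deterministic farthest-point initialization are illustrative assumptions; the paper does not specify its feature encoding or initialization at this point.

```python
import numpy as np


def farthest_point_init(X: np.ndarray, k: int) -> np.ndarray:
    """Deterministic seeding: start from the first profile, then repeatedly
    add the profile farthest from all centers chosen so far."""
    centers = [X[0]]
    for _ in range(1, k):
        C = np.array(centers)
        # Squared distance from every profile to its nearest chosen center.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[int(d2.argmax())])
    return np.array(centers, dtype=float)


def kmeans(X, k: int, n_iter: int = 100):
    """Lloyd's K-means: alternate nearest-center assignment and mean update."""
    X = np.asarray(X, dtype=float)
    centers = farthest_point_init(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Keep an empty cluster's old center instead of producing NaNs.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Hypothetical Likert (1-5) personal profiles:
    # [perceived risk, perceived profit, disclosure intention]
    profiles = np.array([
        [1, 5, 5], [2, 4, 5], [1, 5, 4],   # low risk, high profit and intention
        [5, 2, 1], [4, 1, 2], [5, 1, 1],   # high risk, low profit and intention
    ])
    labels, centers = kmeans(profiles, k=2)
    print(labels)
    print(centers.round(2))
```

Each resulting center can be read as a group profile: the mean risk, profit and intention scores of the users assigned to it.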
The rest of this paper is organized as follows: Section 2 reviews existing research on online privacy disclosure, user profile construction and multimodal fusion, and analyzes their advantages and disadvantages; Section 3 describes the multiple data sources on online users' healthcare privacy disclosure; Section 4 details the multimodal user profiling method for online healthcare privacy disclosure modeling; Section 6 presents the case study and evaluates the accuracy of the results by comparison with baseline models and questionnaires; Section 7 summarizes the paper.

2 Related Work

2.1 Online Healthcare Privacy Disclosure

Willingness to disclose reflects the user's ability to control privacy through privacy concern and privacy control [11]. Wang et al. [12] explored the factors influencing users' willingness to disclose information in the mobile environment. Sapuppo [13] compared the results of two questionnaires and found that mobile-related factors were the main reasons affecting users' willingness to disclose on the Internet. Based on privacy calculus theory, L. Wang et al. [14] examined users' willingness to disclose and the truthfulness of their disclosure from the perspectives of different individual users and different mobile service providers. Papadopoulou et al. [15] analyzed the main trust factors that influence the willingness to disclose personal information in e-commerce and mobile commerce contexts, and showed that in mobile commerce the main reason for user disclosure is "trust". According to Bergström [16], social network users have different levels of "trust" and different online privacy concerns, which directly affect their online privacy settings and disclosure levels. With the development of the data economy, users' vast amounts of private information have become a treasure for data users.
Changes in users' attitudes toward personal privacy have also changed their willingness to disclose private information. Users no longer blindly protect their personal information, and more and more users disclose it on their own initiative, which shows that the willingness to disclose is one manifestation of users' ability to control their personal information. Research on user information disclosure has found that users weigh expected benefits against possible risks when deciding whether to disclose private information, and they disclose when benefits and risks reach a certain trade-off point. According to [17], users first calculate the risks and benefits of privacy disclosure and choose to disclose information when the risk approaches zero or is much smaller than the benefit.

2.2 User Profiles

User profiles describe users better, make customer information more vivid and easier to process by computer, and underlie many large companies' use of big data. User profiles are used to identify users and decide how to treat them, for example, which products they like, what their educational background is, and what their spending power and social connections are; they also reflect users' personalities through their behavior, that is, they tag users. User information can be mined through profiles, for example, by using clustering algorithms to analyze the distribution of age groups and occupations among users with high blood pressure. Massive data processing involves many frequent operations, and tagging provides a way to quantify hard-to-handle information, so that computers can process tedious information programmatically and understand users through algorithms and models.
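The tagging idea can be sketched in a few lines: raw attributes are mapped to discrete tags, which are then counted. This sketch uses hypothetical age bands and a hypothetical `User` record, and tallies (age-group, occupation) tags among users with high blood pressure, the kind of distribution a clustering analysis would then work from. It is plain counting, not a clustering algorithm.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical minimal user record."""
    user_id: str
    age: int
    occupation: str
    hypertension: bool


# Hypothetical age bands used as tags: (lower bound, upper bound, tag).
AGE_BANDS = [(0, 29, "under-30"), (30, 44, "30-44"),
             (45, 59, "45-59"), (60, 150, "60+")]


def age_tag(age: int) -> str:
    """Map a numeric age to a discrete age-group tag."""
    for low, high, tag in AGE_BANDS:
        if low <= age <= high:
            return tag
    raise ValueError(f"age out of range: {age}")


def hypertension_tag_counts(users) -> Counter:
    """Count (age-group tag, occupation) pairs among users with high blood pressure."""
    return Counter((age_tag(u.age), u.occupation)
                   for u in users if u.hypertension)


if __name__ == "__main__":
    users = [
        User("u1", 52, "teacher", True),
        User("u2", 58, "teacher", True),
        User("u3", 35, "engineer", False),
        User("u4", 67, "retired", True),
    ]
    print(hypertension_tag_counts(users).most_common())
```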
Once a computer can understand users, search engines, recommendation engines and commercial applications such as advertising and marketing can all improve the accuracy of their results and the efficiency of information acquisition and recommendation. Different companies and projects build user personas with different goals and desired end states. From the perspective of an enterprise profiling its users, profiles have three applications: first, clustering and analyzing massive user data to split users into different groups; second, understanding the users targeted by the enterprise's products through core analysis; and third, building product applications on top of the user profile. To build a reasonable customer profile, an enterprise first needs to collect information about its customers, usually by website crawling or offline questionnaires; it can then define the basic outline of the profile according to its goal and form a basic understanding of the users' situation. Basic user information mainly includes user ID, age, gender, occupation, location, educational background and family income. Information about users' spending and clicking behavior can also be obtained from some enterprise platforms. Once the outline is in place, the details of the profile must be defined, which requires analyzing all the teams involved, such as staff and management at different levels of the company, product sales staff and senior developers.
The extended or dynamic information of a user mainly includes preferred product models, activity level, hobbies and preferences, usual browsing behavior and purchasing tendencies. When a company segments its users, it divides the user metrics into several categories and identifies which data play a dominant role; these serve as the main basis for user classification.

2.3 Multimodal Fusion

In general, a modality is the way in which something happens or exists, and multimodality refers to combinations of two or more modalities in various forms. Each source or form of information can be called a modality. Current research focuses on three modalities: image, text and voice. Modalities are fused because different modalities express things differently and offer different perspectives on them, so they exhibit overlap (redundant information), complementarity (combined features outperform any single feature) and even various interactions between modalities; if multimodal information is processed properly, the feature information can be enriched. In short, the distinctive characteristics of multimodal data are redundancy and complementarity. Traditional feature fusion algorithms fall into three main categories: methods based on Bayesian decision theory, on sparse representation theory and on deep learning. Deep learning methods can be further distinguished by the level at which fusion takes place. Pixel-level fusion fuses the raw data at the finest granularity. Feature-level fusion fuses abstract features and includes early and late fusion. Early fusion combines the features first and then feeds them to the model; its disadvantage is that it cannot fully exploit the complementarity between modalities and suffers from information redundancy. Late fusion takes two forms: fusion and non-fusion.
Non-fusion is similar to ensemble learning: each modality produces its own result, and the results are then fused by a unified scoring step, so the model for each modality is independently robust. Fusion means free fusion during feature generation, which is more flexible. In addition, decision-level fusion fuses the decision results, and hybrid-level fusion combines several fusion methods.

In recent years, social media user profiling has attracted increasing attention from researchers. Among existing user profile modeling techniques, research on how to fuse multiple user data sources or modalities to obtain more accurate user profiles is quite limited and has some shortcomings. On the one hand, some user profiling studies use only a single modality, which makes it difficult to characterize users comprehensively; on the other hand, most studies that use multiple modalities integrate the data sources only at the feature level or the decision level [18, 19, 20], and although some studies perform fusion at two levels [21, 22], they still do not explore deep fusion of multiple data sources. Based on this analysis, this paper builds a UMF-SCF model on the multi-source user privacy disclosure data collected and processed through a questionnaire to construct users' personal privacy disclosure behavior profiles, and then constructs group profiles by clustering the personal profiles with K-means. Using multimodal techniques to construct personal profiles and combining personal and group profiles provides a comprehensive characterization of users' online healthcare privacy disclosure behavior and supports precision online healthcare services.

Table 1: User history disclosure behavior factors.
Historical disclosure behavior | Explanation
Information provided | The user has previously provided information to online healthcare platforms.
Information recording | The user's information can be found on online healthcare platforms.
Private information disclosure | The user has mentioned personal matters on online healthcare platforms.
Information consistency | The information provided to the online platform is consistent with the actual situation.

3 Online Healthcare Privacy Disclosure Data

Various data reflect users' privacy disclosure behavior in the online healthcare process, including basic user information, historical disclosure behavior, perceived risks, perceived profits, short-term disclosure intention and long-term disclosure intention.

3.1 Basic User Information

Users' willingness to disclose personal information varies with age, education and gender. This paper therefore uses basic user information, including gender, age, education, occupation, length of time using online healthcare platforms and experience of privacy leakage, as one of the basic attributes for constructing personal and group profiles.

3.2 Historical Disclosure Behavior

Historical disclosure behavior refers to whether a user has provided information to an online healthcare platform in the past or whether the user's historical information can be found on the platform (Table 1). Since a user who has provided information to a healthcare platform in the past is more likely to provide information to it in the future as well, this paper treats historical disclosure as one of the factors influencing user disclosure behavior.

3.3 Perceived Risk

Perceived risk refers to the potential risk or loss that users perceive when disclosing private information to online healthcare platforms [23].
Users' perceived risk has a negative impact on their willingness to disclose. This paper therefore selects several perceived risk factors as basic user attributes for constructing personal and group profiles; Table 2 explains each factor.

Table 2: User perceived risk factors.
Perceived risk | Explanation
Information provision risk | Providing information to online healthcare platforms is risky.
Information leakage | Information provided to online healthcare platforms may be leaked.
Risk of information use | Information provided to online healthcare platforms may be used inappropriately.

3.4 Perceived Profits

Perceived profits are the benefits and rewards that users perceive they can obtain by disclosing information to online healthcare platforms [23]. Wang et al. [12] showed that perceived profits are positively related to users' intention to disclose information. Users who disclose information to online healthcare platforms get a better service experience, because doctors can clearly understand their basic physical condition and give treatment advice better suited to their situation, which helps them solve their health problems. Table 3 lists the perceived profit factors.

Table 3: User perceived profits factors and their implications.
Perceived profits | Explanation
Service acquisition | Providing information to online healthcare platforms facilitates access to appropriate services.
Personalized service | Providing information to online healthcare platforms facilitates access to personalized services.
Health benefits | Providing information to online healthcare platforms can help solve health problems.
Doctor-patient communication | Providing information to online healthcare platforms facilitates communication with doctors.

3.5 Willingness to Disclose

The user's willingness to disclose has a decisive influence on disclosure behavior.
The stronger the user's willingness to disclose, the higher the probability of disclosure behavior. Disclosure willingness is divided into short-term and long-term disclosure intention. Short-term disclosure intention is a user's present psychological disposition toward providing information to online healthcare platforms. Generally, users who are willing to disclose information in the present tend to disclose in the future as well. However, users' willingness to disclose in the distant future may change, which affects their future disclosure behavior. Therefore, both near-future and far-future willingness to disclose are factors influencing users' disclosure behavior. The value of short-term disclosure intention is greater than that of long-term disclosure intention [23].

Table 4: Users' willingness to disclose in short term and long term.
Willingness to disclose | Explanation
Short-term disclosure intention | Whether the user, when consulting doctors, is currently willing to provide information to online healthcare platforms.
Long-term disclosure intention | Whether the user will provide information to online healthcare platforms in the future upon request or when it would help with health diagnosis.

Figure 1: Online healthcare privacy disclosure modeling process for multimodal user profiling. (Stages: digitization, in which the privacy disclosure objectives and the user profile label system are set and Likert-scale questionnaire data are acquired; labelling, in which the label dataset is built and UMF-SCF produces user personal profiles; and clustering, in which similarity analysis and clustering produce user group profiles 1 to n.)

4 Multimodal User Profiling Method to Online Healthcare Privacy Disclosure Modeling

Online healthcare privacy disclosure modeling with multimodal user profiles consists of three stages: digitization, labeling and clustering. The process is shown in Figure 1. First, the user privacy disclosure objective is clarified and the online healthcare user privacy disclosure label system is established; then questionnaires are used to collect data, and the samples are digitized after preprocessing. Next, based on the sample set and the label system, feature labels are extracted from the multiple data sources of users' online healthcare privacy disclosure to form the label dataset, and UMF-SCF lets the modalities interact to form users' personal online healthcare privacy disclosure profiles. Finally, based on the labeled dataset, similarity analysis and K-means clustering are combined to classify users and obtain the final user group profiles.

4.1 Multimodal Fusion of Privacy Disclosure Data

Since users' healthcare privacy disclosure behavior is reflected by multiple types of data in the online healthcare process, it is crucial to establish deep interactions between the modalities. This paper therefore uses the multimodal fusion technique UMF-SCF [24] to comprehensively characterize users' future privacy disclosure behavior.

4.1.1 UMF-SCF Overall Architecture

UMF-SCF contains a modal fusion layer and a stacking layer, as shown in Figure 2. The modal fusion layer has 25 cross-modal joint representation learning networks, such as FusionBR and FusionBRPHD, one for each of the 25 modality combinations.
Here, B denotes users' basic information features, R the perceived risk features of users' privacy disclosure, P the perceived profit features, H users' historical disclosure behavior features, and D users' privacy disclosure intention features (including short-term and long-term disclosure intention). This layer generates joint or shared feature representations of the modalities and makes the features of different modalities interact deeply by combining data sources and applying nonlinear functions. In addition, since different modalities matter to different degrees for different task goals, the layer uses an attention mechanism to apply modality-level weighted scoring to the extracted features. The stacking layer performs decision-level fusion: it takes the prediction probabilities output by the modal fusion layer and uses a multilayer perceptron to predict the privacy disclosure behavior of online healthcare platform users.

Figure 2: UMF-SCF overall architecture. (The single modalities B, R, P, H, D and their combinations feed the joint representation networks FusionBR, FusionBP, ..., FusionBRPHD; an MLP in the stacking model fusion layer combines their prediction probabilities.)

4.1.2 FusionBRP

The cross-modal joint representation learning network is the core of the UMF-SCF model. This paper uses FusionBRP as an example of such a network; it has three layers: an embedding layer, an interaction layer and a decision layer. Its structure is shown in Figure 3.
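The modal fusion and stacking layers of Section 4.1.1 can be sketched as follows. Enumerating every combination of two or more of the five modality codes with `itertools.combinations` gives 10 + 10 + 5 + 1 = 26 subsets (25 without the full set BRPHD), so the text alone does not pin down exactly which 25 networks are used. The stacking part concatenates each network's per-user prediction probabilities and passes them through a one-hidden-layer MLP; the hidden width, ReLU activation and random untrained weights are placeholder assumptions that only make the tensor shapes concrete.

```python
import itertools

import numpy as np

MODALITIES = "BRPHD"  # basic info, risk, profit, history, disclosure intention


def modality_combinations(min_size: int = 2) -> list[str]:
    """All combinations of at least `min_size` modality codes, e.g. 'BR', ..., 'BRPHD'."""
    return ["".join(c)
            for r in range(min_size, len(MODALITIES) + 1)
            for c in itertools.combinations(MODALITIES, r)]


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def stacking_mlp(probs: np.ndarray, hidden: int = 16, seed: int = 0) -> np.ndarray:
    """Forward pass of a one-hidden-layer MLP over stacked probabilities.

    probs has shape (n_networks, n_users, K): one probability vector per
    fusion network and user. Weights are random and untrained (placeholder).
    Returns an (n_users, K) array of final class probabilities.
    """
    n_net, n_users, K = probs.shape
    # One row per user: all networks' probability vectors side by side.
    x = probs.transpose(1, 0, 2).reshape(n_users, n_net * K)
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.1, size=(n_net * K, hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, K))
    h = np.maximum(0.0, x @ W1)  # ReLU hidden layer
    return softmax(h @ W2)


if __name__ == "__main__":
    combos = modality_combinations()
    print(len(combos), combos[:10])
    rng = np.random.default_rng(1)
    # Stand-in outputs of the fusion networks for 4 users and K = 2 classes.
    fake_probs = softmax(rng.normal(size=(len(combos), 4, 2)))
    print(stacking_mlp(fake_probs).round(3))
```

In a trained system the MLP weights would be fitted on held-out predictions of the fusion networks, which is the usual way stacking avoids overfitting to the base models' training outputs.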
The embedding layer is pre-trained for each input to obtain embedding representations of the different modalities. For FusionBRP, the embedding layer contains the user basic information embedding (B1), the perceived risk embedding (R1), and the perceived profit embedding (P1). The interaction layer consists of two layers with the same structure: each modal representation is first transformed by a hidden layer, and the correlated representations of the other transformed modal embeddings are then added to it, so that after the interaction layer the feature representation of each modality also contains interaction information from the other modalities. For the first modality,
$$M_1^{b} = \tanh\left(U_{m_1} M_1^{a} + W_{m_1 m_2} U_{m_2} M_2^{a} + W_{m_1 m_3} U_{m_3} M_3^{a}\right) \tag{1}$$
and $M_2^{b}$ and $M_3^{b}$ are obtained analogously. Here $M_1$, $M_2$, $M_3$ denote the modal embeddings B, R, and P respectively; a is the index of the interaction layer and b = a + 1 (if a = 1, then b = 2; if a = 2, then b = 3); the activation function is tanh; $U_{*}$ is the transformation matrix of the corresponding modal embedding in the hidden layer of the a-th interaction layer; and $W_{**}$ is the correlation weight matrix between the corresponding two modal embeddings in the a-th interaction layer. After the interaction layer, each user has four high-level representations: B3, R3, P3, and B3 ⊕ R3 ⊕ P3. The decision layer maps the four representations into their label category spaces through a linear layer, giving $C_B$, $C_R$, $C_P$, and $C_{BRP}$, and a softmax layer then yields the probability distribution over categories c. The linear mapping layer is defined as
$$C_{*} = W_{*-c}\, A + b_{*-c} \tag{2}$$
where A ∈ {B3, R3, P3, B3 ⊕ R3 ⊕ P3}, $C_{*}$ denotes the label category space corresponding to A, $W_{*-c}$ denotes the linear-layer weights corresponding to A, and $b_{*-c}$ is the corresponding bias.
Figure 3: Structure of FusionBRP (panels include the embedding layer and the interaction layer).
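A forward pass through the two interaction layers of Eq. (1) and the linear decision layer of Eq. (2) can be sketched with NumPy. This is a minimal, untrained illustration under stated assumptions, not the author's implementation: the embedding size d = 8, the five label categories, random stand-ins for the pre-trained embeddings B1, R1, P1, random weights, and reading ⊕ as vector concatenation are all assumptions. The helper names `interaction_layer` and `random_params` are hypothetical.

```python
import numpy as np


def interaction_layer(embeddings, U, W):
    """One interaction layer in the style of Eq. (1).

    embeddings: list of modality vectors M_i^a (here B, R, P), each of shape (d,)
    U: list of (d, d) hidden-layer transforms, one per modality
    W: dict mapping (i, j), i != j, to a (d, d) correlation weight matrix
    Returns the list of next-layer representations M_i^{a+1}.
    """
    transformed = [U[i] @ m for i, m in enumerate(embeddings)]  # U_{m_i} M_i^a
    out = []
    for i in range(len(embeddings)):
        z = transformed[i].copy()
        for j in range(len(embeddings)):
            if j != i:
                z += W[(i, j)] @ transformed[j]  # W_{m_i m_j} U_{m_j} M_j^a
        out.append(np.tanh(z))
    return out


def random_params(rng, n_mod, d, scale=0.3):
    """Hypothetical randomly initialised U and W (untrained)."""
    U = [rng.normal(scale=scale, size=(d, d)) for _ in range(n_mod)]
    W = {(i, j): rng.normal(scale=scale, size=(d, d))
         for i in range(n_mod) for j in range(n_mod) if i != j}
    return U, W


rng = np.random.default_rng(0)
d, n_classes = 8, 5

# Random stand-ins for the pre-trained embeddings B1, R1, P1.
layer1 = [rng.normal(size=d) for _ in range(3)]
layer2 = interaction_layer(layer1, *random_params(rng, 3, d))  # B2, R2, P2
layer3 = interaction_layer(layer2, *random_params(rng, 3, d))  # B3, R3, P3

# B3 ⊕ R3 ⊕ P3, read here as vector concatenation (an assumption).
joint = np.concatenate(layer3)

# Decision layer, Eq. (2): a separate linear map into the class space for each A.
reps = {"B": layer3[0], "R": layer3[1], "P": layer3[2], "BRP": joint}
logits = {}
for name, a in reps.items():
    W_c = rng.normal(scale=0.3, size=(n_classes, a.size))  # W_{*-c}
    b_c = np.zeros(n_classes)                             # b_{*-c}
    logits[name] = W_c @ a + b_c                           # C_*
```

Each modality keeps its own transform U while the cross terms inject the other modalities, so B3, R3, and P3 each carry interaction information; the joint representation is simply a fourth input to the decision layer.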
The softmax layer is defined as
$$p^{*}_{c} = \frac{\exp\left(C^{*}_{c}\right)}{\sum_{k=1}^{K} \exp\left(C^{*}_{k}\right)} \tag{3}$$
where $p^{*}_{c}$ denotes the probability that B3, R3, P3, or B3 ⊕ R3 ⊕ P3 predicts category c, K is the number of categories, and $C^{*}_{c}$ denotes the c-th entry of the corresponding label category space.
4.1.3 Loss Representation and Attention Mechanism for Multimodal Fusion
The loss function of FusionBRP consists of three components: the loss of each modal representation $L_m$, the loss of the joint representation $L_{BRP}$, and the consistency loss between the modal representations $L_d$. They are defined as follows.
$$L_m = -\sum_{p \in P_m} \sum_{x \in X_t} \sum_{k=1}^{K} p^{r}_{k}(x) \log p_{k}(x) \tag{4}$$
$$L_{BRP} = -\sum_{x \in X_t} \sum_{k=1}^{K} p^{r}_{k}(x) \log p^{BRP}_{k}\left(B_3(x) \oplus R_3(x) \oplus P_3(x)\right) \tag{5}$$
$$L_d = -\sum_{(p_1, p_2) \in P_d} \sum_{x \in X_t} \sum_{k=1}^{K} p_{1,k}(x) \log p_{2,k}(x) \tag{6}$$
where $P_m = \{p^{B}_{k}, p^{R}_{k}, p^{P}_{k}\}$ is the set of per-modality prediction probabilities used to compute $L_m$, and $p^{*}_{k}$ is the probability that a modality predicts the k-th category. $X_t$ denotes the set of labeled user samples, i.e., the training set, and $p^{r}_{k}$ denotes the true label of the k-th category for a given sample. $P_d = \{(p^{B}_{k}, p^{R}_{k}), (p^{B}_{k}, p^{P}_{k}), (p^{R}_{k}, p^{P}_{k})\}$ is the set of prediction-probability pairs used to compute $L_d$.
Since different modalities contribute differently to classifying different user attributes, this paper introduces an attention mechanism that linearly weights the above three losses to obtain the final loss.
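The softmax of Eq. (3), the three losses of Eqs. (4) to (6), and their linear weighting into a total loss can be sketched on a toy batch. This is a minimal sketch under stated assumptions: four users, three categories, and random logits are made up; Eq. (4) is kept as a list of per-modality terms because that is the form the weighted total indexes; fixed weights stand in for the attention-derived coefficients, whose exact computation the text does not give; and L2Loss is assumed to be an L2 penalty on a hypothetical parameter vector. The printed form of the total-loss equation uses w[6] for both the last consistency term and L2Loss, so this sketch gives L2Loss its own coefficient w[7].

```python
import numpy as np


def softmax(C):
    """Eq. (3): softmax over the last axis (max-subtracted for numerical stability)."""
    z = C - C.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(target, pred, eps=1e-12):
    """-sum_x sum_k target_k(x) * log pred_k(x), summed over the batch X_t."""
    return float(-np.sum(target * np.log(pred + eps)))


rng = np.random.default_rng(1)
K = 3
y = np.eye(K)[[0, 2, 1, 0]]  # one-hot true labels p^r_k(x) for 4 toy users

# Toy logits -> probabilities for each modality and for the joint representation.
probs = {m: softmax(rng.normal(size=(4, K))) for m in ("B", "R", "P", "BRP")}

loss_lm = [cross_entropy(y, probs[m]) for m in ("B", "R", "P")]   # Eq. (4), per modality
loss_brp = cross_entropy(y, probs["BRP"])                          # Eq. (5)
pairs = [("B", "R"), ("B", "P"), ("R", "P")]                       # P_d
loss_ld = [cross_entropy(probs[a], probs[b]) for a, b in pairs]    # Eq. (6), per pair

# Weighted total. Fixed weights stand in for the attention-derived w, and an
# L2 penalty on a hypothetical parameter vector stands in for L2Loss.
w = np.array([1.0, 1.0, 1.0, 2.0, 0.5, 0.5, 0.5, 1e-3])
params = rng.normal(size=10)
l2_loss = float(np.sum(params ** 2))
total = (sum(w[i] * loss_lm[i] for i in range(3))
         + w[3] * loss_brp
         + sum(w[j] * loss_ld[j - 4] for j in range(4, 7))
         + w[7] * l2_loss)
```

The consistency terms are ordinary cross-entropies with one modality's prediction as the soft target, so they shrink as the B, R, and P branches agree with each other.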
The attention in this paper can be interpreted as follows. For different modalities performing the same classification task, short term disclosure intention influences users’ disclosure behavior more than long term disclosure intention does, so short term disclosure intention is more important and should be given more weight. For the same modality performing different classification tasks, perceived risk has a significantly greater influence on long term disclosure intention than on short term disclosure intention, so more weight should be given to perceived risk when classifying long term disclosure intention. The final loss is calculated as
$$L_{total} = \sum_{i=0}^{2} w[i] \cdot \mathrm{LossLm}[i] + w[3] \cdot L_{BRP} + \sum_{j=4}^{6} w[j] \cdot \mathrm{LossLd}[j-4] + w[7] \cdot \mathrm{L2Loss} \tag{7}$$
where w denotes the list of weight coefficients introduced by the attention mechanism, LossLm denotes the list of per-modality losses $L_m$, LossLd denotes the list of consistency losses $L_d$, and L2Loss is the L2 loss term. The weights w balance the representation of each modality, the joint representation, and consistency.
4.2 Personal Profiles and Group Profiles of Users’ Online Healthcare Privacy Disclosure
4.2.1 Personal Profile Construction for Online Healthcare Privacy Disclosure of Users
Table 5 shows the data generated by a user of an online healthcare platform, and the personal profile built from these data is shown in Figure 4.
Table 5: User online healthcare privacy disclosure data.
ID | 8
Gender | Male
Age | 34
Academic qualifications | Master’s Degree
Career | Programmer
Years of using online healthcare platform | 1
Have experienced privacy leakage | No
Perceived risk | 4.33
Perceived profit | 2.25
Historical disclosure behavior | 1.25
Recent (short term) disclosure intention | 2.75
Willingness to disclose in the future (long term) | 2.50
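How the Table 5 scores become the level tags of a personal profile can be sketched with the five-level rule given in Section 5.2 (very low for [0.0, 1.0], low for (1.0, 2.0], medium for (2.0, 3.0], high for (3.0, 4.0], very high for (4.0, 5.0]). This is a minimal sketch: the function name `level` and the dictionary keys are hypothetical, and it covers only the five psychological labels, not the basic-information tags.

```python
# Five-level rule from Section 5.2, as right-closed intervals:
# [0, 1] very low, (1, 2] low, (2, 3] medium, (3, 4] high, (4, 5] very high.
LEVELS = [(1.0, "very low"), (2.0, "low"), (3.0, "medium"),
          (4.0, "high"), (5.0, "very high")]


def level(score):
    """Map a mean Likert score in [0, 5] to its level name."""
    if not 0.0 <= score <= 5.0:
        raise ValueError(f"score out of range: {score}")
    for upper, name in LEVELS:
        if score <= upper:
            return name
    raise AssertionError("unreachable: score <= 5.0 is checked above")


# Psychological label scores of the user in Table 5.
table5_scores = {
    "perceived risk": 4.33,
    "perceived profit": 2.25,
    "historical disclosure behavior": 1.25,
    "short term disclosure intention": 2.75,
    "long term disclosure intention": 2.50,
}

profile = {label: level(score) for label, score in table5_scores.items()}

if __name__ == "__main__":
    for label, lvl in profile.items():
        print(f"{lvl} {label}")
```

For the Table 5 user this yields very high perceived risk (tagged “strong” in Figure 4), medium perceived profit, low historical disclosure behavior, and medium short and long term disclosure intention, which matches the tags in Figure 4.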
4.2.2 Group Profile Construction for Online Healthcare Privacy Disclosure of Users
Because people differ greatly, personalized or accurate recommendation of medical services is difficult to achieve with personal profiles alone, so this paper also constructs users’ online healthcare privacy disclosure group profiles based on their personal profiles. A user group profile statistically analyzes the similarity of multiple users, clusters users with similar characteristics into several user groups, and summarizes and refines the common indicators within each group. Group profiles are thus built from personal profiles through similarity analysis and aggregation. In this paper, we use the Vector Space Model (VSM) [25] to calculate user similarity and the classical K-means algorithm [26] to classify user groups. VSM first represents user features as vectors and then computes the cosine of the angle between two vectors, which serves as their similarity:
$$\mathrm{sim}(user_1, user_2) = \frac{\sum_{i=1}^{F} f_i f'_i}{\sqrt{\sum_{i=1}^{F} f_i^{2}}\,\sqrt{\sum_{i=1}^{F} (f'_i)^{2}}} \tag{8}$$
where F is the number of user features, $f_i$ is the i-th component of user 1’s feature vector, and $f'_i$ is the i-th component of user 2’s feature vector.
Figure 4: Personal profile of users’ online healthcare privacy disclosure (profile tags: user id 1; male; 34 years old; master’s degree; programmer; no privacy leakage experience; uses the online healthcare platform for 1 year; strong perceived risk; medium perceived profit; low historical disclosure; medium short term disclosure intention; medium long term disclosure intention).
User profile clustering is a method of classifying users into several categories according to their features and attributes, such that the differences within categories are as
small as possible, while the differences between categories are as large as possible. Taking user profile classification as an example, K-means clustering proceeds as follows:
1) randomly select K users as the initial cluster centers;
2) assign each remaining user, according to similarity, to the class whose center it is most similar to;
3) based on the current clustering result, recompute the center of each of the K classes as the arithmetic mean, dimension by dimension, of the feature vectors of all users in the class;
4) repeat steps 2 and 3 until the difference between the current clustering result and the previous one is less than a set threshold;
5) output the final user clustering result.
In addition, this paper selects the K value with a simple rule of thumb:
$$K = \sqrt{N/2} \tag{9}$$
where N denotes the total number of clustering units, i.e., the number of user samples involved in clustering.
5 Case Studies
This section illustrates the above process of modeling online healthcare privacy disclosure with multimodal user profiles through a case study.
5.1 Design of Profile Labeling System
By analyzing users’ online healthcare privacy disclosure data, this paper establishes six basic elements, namely users’ basic information, perceived risk, perceived profits, historical disclosure behavior, short term disclosure intention, and long term disclosure intention, as subdivision indicators for designing the user profile label system. The final label system of the online healthcare platform user privacy disclosure profile is shown in Figure 5. The six labels comprise a total of 25 measurement variables, as listed in Table 6. Privacy disclosure profiles of online healthcare platform users differ in their label weighting design, i.e., the characteristics of different user groups are reflected in the different degrees of importance they attach to a given label.
Therefore, this paper expresses users’ tendency to disclose privacy in a given aspect on a 5-point Likert scale from low to high. The mean value of each psychological preference variable within each subgroup is chosen as the user profile label weight, while basic user information is ranked by the frequency of its attribute values; together these define the weight values of the user profile labels.
Figure 5: User privacy disclosure profile labeling system (user profile: user basic information, perceived risk, perceived profits, historical disclosure behavior, short term disclosure intention, long term disclosure intention).
Table 6: Measurement variables for user profiling labels.
User basic information: gender; age; qualification; occupation; time using the online healthcare platform; whether a privacy leakage has been experienced.
Perceived risk: providing information to online healthcare platforms is risky; the information provided to the online healthcare platform may be leaked; information provided to online healthcare platforms may be used inappropriately.
Perceived profits: providing information to the online healthcare platform is conducive to obtaining corresponding services; providing information to the online healthcare platform is conducive to enjoying personalized services; providing information to online healthcare platforms can help solve health problems; providing information to the online healthcare platform enables better communication with doctors or other users.
Historical disclosure behavior: has provided information to the online healthcare platform; personal information can be found on the online healthcare platform; has mentioned personal matters on the online healthcare platform; the information shared with the online healthcare platform is consistent with the actual situation.
Short term disclosure intention: if you are ill, you will provide information to the online healthcare platform or doctors to obtain services; if you are ill, you will provide information when the doctor or online healthcare platform requests it; if you are ill, you will provide information that will help you stay healthy; if you are ill, you will provide information to other users of the online healthcare platform for help.
Long term disclosure intention: intend to provide information to the online healthcare platform or doctor to obtain services in the future; intend to provide personal information when the online healthcare platform or doctor asks for it in the future; intend to provide information to the online healthcare platform or doctor when doing so helps to maintain health; will continue to share information with the online healthcare platform or doctor in the future as you do now.
Table 7: Average accuracy of user label prediction for each method.
Method | Basic information | Perceived risk | Perceived profits | Historical disclosure behavior | Short term disclosure intention | Long term disclosure intention
Manual Feature+SVM | 0.720 | 0.653 | 0.632 | 0.678 | 0.614 | 0.683
Concat+SVM | 0.709 | 0.682 | 0.675 | 0.705 | 0.603 | 0.690
UDMF | 0.765 | 0.703 | 0.688 | 0.726 | 0.619 | 0.711
UMF-SCF without attention | 0.789 | 0.725 | 0.723 | 0.759 | 0.705 | 0.799
UMF-SCF | 0.812 | 0.788 | 0.758 | 0.806 | 0.769 | 0.856
5.2 Data Acquisition
This paper uses questionnaires to obtain basic data on users’ online healthcare privacy disclosure, from which personal and group profiles are built. Questionnaire data are targeted and specific: the questions can be oriented clearly toward the research goals, which facilitates precise subsequent analysis. The questionnaire consists of the 25 items in Table 6; all items except basic personal information use a 5-point Likert scale scored from 1 to 5, with 1 being “strongly disagree” and 5 being “strongly agree”.
Questionnaires were distributed over 14 days, 700 in total. After removing 120 questionnaires from respondents who had not used an online healthcare platform and 10 questionnaires in which the same option was chosen consecutively, 580 valid questionnaires were obtained. The data of 464 of these users were used as the training set and the data of the remaining 116 users as the test set (an 80/20 split). To construct user feature vectors, the 19 measurement variables of perceived risk, perceived profits, historical disclosure behavior, short term disclosure intention, and long term disclosure intention must be quantified. Options 1 to 5 on the 5-point Likert scale are assigned scores of 1, 2, 3, 4, and 5, and the scores of the options each user selected are averaged within each label to obtain the five label scores, such as perceived risk, so that each label score stays on the same 1 to 5 scale as its items (e.g., 4.33 for perceived risk in Table 5). A user’s performance level on a label is considered very low if the score ∈ [0.0, 1.0], low if the score ∈ (1.0, 2.0], medium if the score ∈ (2.0, 3.0], high if the score ∈ (3.0, 4.0], and very high if the score ∈ (4.0, 5.0].
5.3 User Online Healthcare Privacy Disclosure Personal Profile Construction
Constructing a user’s personal profile means predicting the user’s basic information, perceived risk, perceived profits, historical disclosure behavior, short term disclosure intention, and long term disclosure intention, as shown in Figure 4. We use UMF-SCF to fuse multiple user features and make predictions, and we compare its accuracy for personal profile construction against the following baselines: manual features + SVM, Concat + SVM, UDMF, and UMF-SCF without the attention mechanism. Accuracy is used as the evaluation metric. The average accuracy of each method on the test set is shown in Table 7.
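The comparisons drawn from Table 7 can be checked directly from the reported averages. The snippet below is a small sketch that only transcribes the published accuracies; the helper names `best_method`, `gain`, and `labels_where_worse` are hypothetical. It confirms that UMF-SCF is best in every column, computes the per-label gain from the attention mechanism, and lists the labels on which Concat+SVM falls below the unimodal baseline.

```python
# Average test-set accuracies transcribed from Table 7 (one list per method,
# columns in the table's order).
LABELS = ["basic information", "perceived risk", "perceived profits",
          "historical disclosure behavior", "short term disclosure intention",
          "long term disclosure intention"]
TABLE7 = {
    "Manual Feature+SVM":        [0.720, 0.653, 0.632, 0.678, 0.614, 0.683],
    "Concat+SVM":                [0.709, 0.682, 0.675, 0.705, 0.603, 0.690],
    "UDMF":                      [0.765, 0.703, 0.688, 0.726, 0.619, 0.711],
    "UMF-SCF without attention": [0.789, 0.725, 0.723, 0.759, 0.705, 0.799],
    "UMF-SCF":                   [0.812, 0.788, 0.758, 0.806, 0.769, 0.856],
}


def best_method(col):
    """Method with the highest accuracy in column `col`."""
    return max(TABLE7, key=lambda method: TABLE7[method][col])


def gain(method, baseline):
    """Per-label accuracy difference (method - baseline), rounded to 3 decimals."""
    return [round(a - b, 3) for a, b in zip(TABLE7[method], TABLE7[baseline])]


def labels_where_worse(method, baseline):
    """Labels on which `method` scores below `baseline`."""
    return [label for label, d in zip(LABELS, gain(method, baseline)) if d < 0]


if __name__ == "__main__":
    for col, label in enumerate(LABELS):
        print(f"{label}: best = {best_method(col)}")
    print("attention gain:", gain("UMF-SCF", "UMF-SCF without attention"))
    print("Concat+SVM below Manual Feature+SVM on:",
          labels_where_worse("Concat+SVM", "Manual Feature+SVM"))
```

The attention gains range from 0.023 (basic information) to 0.064 (short term disclosure intention), and Concat+SVM trails the unimodal baseline only on basic information and short term disclosure intention.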
As Table 7 shows, UMF-SCF achieves the best results in predicting all six user features. Manual Feature+SVM is a unimodal embedding method; the four multimodal fusion methods generally outperform it, showing that multimodal fusion suits the user profiling problem better than unimodal embedding (the exceptions are Concat+SVM’s slightly lower accuracies on basic information and short term disclosure intention). Concat+SVM is a multimodal fusion method, but it performs worse than UMF-SCF because it only concatenates the modalities without sufficient interaction between them. UDMF only uses a simple neural network to fuse the data sources, again without deep interaction between modalities. UMF-SCF without the attention mechanism treats every modality as equally important for every classification task and performs worse than UMF-SCF, indicating that assigning different weights improves model performance.
5.4 User Online Healthcare Privacy Disclosure Group Profile Construction
To obtain differentiated profiles of online healthcare platform user groups, this paper selects perceived risk, perceived profits, historical disclosure behavior, short term disclosure intention, and long term disclosure intention as feature factors, constructs a feature vector for each user, and computes the pairwise similarity between users with the VSM. For example, the feature vector of user 1 is (4.33, 2.25, 1.25, 2.75, 2.50) and that of user 2 is (1.00, 3.50, 4.25, 4.75, 4.75), so the similarity between user 1 and user 2 is
$$\mathrm{sim}(user_1, user_2) = \frac{4.33\times1.00 + 2.25\times3.50 + 1.25\times4.25 + 2.75\times4.75 + 2.50\times4.75}{\sqrt{4.33^2 + 2.25^2 + 1.25^2 + 2.75^2 + 2.50^2}\;\sqrt{1.00^2 + 3.50^2 + 4.25^2 + 4.75^2 + 4.75^2}} = \frac{42.455}{\sqrt{39.1864}\;\sqrt{76.4375}} \approx 0.78$$
Next, the K-means algorithm is applied with these pairwise similarity values to classify the user groups. The 464 users in the training set are the clustering objects, and Eq. (9) gives $K = \sqrt{464/2} \approx 15.23$, which is rounded to K = 15.
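The similarity of Eq. (8) and the K rule of Eq. (9) can be reproduced for the worked example with a few lines of standard-library Python. This is a minimal sketch; the function name `cosine_similarity` is hypothetical. Because a cosine similarity always lies in [-1, 1], the value for users 1 and 2 comes out at about 0.776, i.e. 0.78 to two decimals.

```python
import math


def cosine_similarity(u, v):
    """Eq. (8): dot product divided by the product of the two vector norms."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Feature vectors of users 1 and 2 from the worked example.
user1 = (4.33, 2.25, 1.25, 2.75, 2.50)
user2 = (1.00, 3.50, 4.25, 4.75, 4.75)
sim = cosine_similarity(user1, user2)

# Eq. (9): K = sqrt(N / 2) for the N = 464 training users.
N = 464
K = round(math.sqrt(N / 2))

if __name__ == "__main__":
    print(round(sim, 2))  # 0.78
    print(K)              # 15
```

Rounding sqrt(232) ≈ 15.23 to the nearest integer gives the K = 15 used for the training set.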
Following the minimum-maximum principle, the 15 users with the smallest pairwise similarity are selected as the initial cluster centers, and 15 clusters are obtained by following the K-means steps. The center of each cluster represents all objects in that cluster, and its parameters reflect their collective common characteristics. The 15 user groups obtained in this paper are as follows.
User group A = {user 1, user 56, user 156, user 159, ..., user 486}
User group B = {user 2, user 33, user 78, user 255, ..., user 578}
...
User group O = {user 89, user 112, user 139, user 201, ..., user 487}
To evaluate the accuracy of the group profiles, the data of the remaining 116 users were used for testing. Each test user’s personal profile labels are first represented as a feature vector, its similarity to the representatives of the 15 cluster centers is computed, and the user is assigned to the group whose center has the largest similarity. The accuracy of the assigned online healthcare privacy disclosure profiles for these 116 users was then investigated separately through questionnaires. About 89% of the users reported that their disclosure willingness matched their group profile. From the perspective of online healthcare services, these users may receive more precisely targeted diagnoses than the other 11%, while the platform may offer the remaining 11% a broader range of treatment options. Overall, users’ disclosure behaviors are largely predicted accurately, and the user group profiles established in this paper can effectively model the characteristics of users’ online healthcare privacy disclosure behavior.
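The clustering and test-assignment procedure of Sections 4.2.2 and 5.4 can be sketched as a cosine-similarity variant of K-means. This is a minimal sketch, not the author’s implementation, and it rests on several assumptions: similarity is the cosine of Eq. (8); centers are arithmetic means of member vectors; the min-max initialization is read as farthest-first seeding starting from an arbitrary first user; and the data are synthetic uniform scores in [1, 5], not the survey data. All function names are hypothetical.

```python
import numpy as np


def cosine_matrix(X, Y):
    """Pairwise cosine similarities between the rows of X and the rows of Y."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T


def min_max_init(X, k):
    """Farthest-first seeding: start from user 0, then repeatedly add the user
    whose highest similarity to the already chosen centers is smallest."""
    chosen = [0]
    while len(chosen) < k:
        best_sim = cosine_matrix(X, X[chosen]).max(axis=1)
        best_sim[chosen] = np.inf  # never pick the same user twice
        chosen.append(int(np.argmin(best_sim)))
    return X[chosen].copy()


def kmeans_cosine(X, k, tol=1e-6, max_iter=100):
    """Assign to the most similar center, recompute means, stop below tol."""
    centers = min_max_init(X, k)
    for _ in range(max_iter):
        labels = cosine_matrix(X, centers).argmax(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    labels = cosine_matrix(X, centers).argmax(axis=1)
    return centers, labels


def assign(users, centers):
    """Put each (test) user in the group whose center is most similar."""
    return cosine_matrix(users, centers).argmax(axis=1)


# Synthetic five-dimensional label scores in [1, 5]; NOT the paper's survey data.
rng = np.random.default_rng(42)
train = rng.uniform(1.0, 5.0, size=(464, 5))
test = rng.uniform(1.0, 5.0, size=(116, 5))

k = int(round(np.sqrt(len(train) / 2)))  # Eq. (9): 15 for 464 users
centers, labels = kmeans_cosine(train, k)
test_groups = assign(test, centers)
```

Mean centers are not renormalized here, so this is plain K-means with cosine assignment rather than spherical K-means; with all-positive Likert scores the two behave similarly, but the choice is part of the sketch’s assumptions.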
It is worth noting that the factors influencing users’ medical privacy disclosure behaviors considered in this paper are still limited, which restricts the accuracy of behavioral predictions to some extent. In future work, we will further explore the privacy disclosure intention of online healthcare users by considering privacy protection, differences between domestic and international service platforms, and demographic factors, to further support the development of online healthcare privacy platforms.
6 Conclusion
To address the challenges of user privacy disclosure on current online healthcare platforms, this paper proposes a multimodal fusion user profile modeling method. The method achieves cross-modal interaction among multiple data sources of user privacy disclosure through UMF-SCF and constructs personal profiles of users’ online healthcare privacy disclosure that comprehensively characterize their privacy disclosure behavior. It then constructs group profiles of users’ online healthcare privacy disclosure based on the personal profiles and the K-means clustering algorithm, and integrates the user characteristics presented by both kinds of profiles to capture online healthcare platform users’ needs more accurately. In the case study, the proposed method is significantly more accurate than unimodal and several advanced multimodal methods, and the follow-up questionnaires show that about 89% of test users found their group profile consistent with their disclosure willingness. The multimodal fusion user profile modeling method can effectively predict users’ online healthcare privacy disclosure behavior while protecting user information from leakage, and can promote personalized online healthcare services.
References
[1] Boonstra A, Broekhuis M. Barriers to the acceptance of electronic medical records by physicians from systematic review to taxonomy and interventions[J].
BMC Health Services Research, 2010, 10(1): 1-17.
[2] Sun S, Zhang J, Zhu Y, et al. Exploring users’ willingness to disclose personal information in online healthcare communities: The role of satisfaction[J]. Technological Forecasting and Social Change, 2022, 178: 121596.
[3] Kobrinskii B A, Grigoriev O G, Molodchenkov A I, et al. Artificial intelligence technologies application for personal health management[J]. IFAC-PapersOnLine, 2019, 52(25): 70-74.
[4] Dai Q Y, Hong X B, Cai J, et al. Deep Learning Based Recommendation Algorithm in Online Medical Platform[C]//International Conference on Brain Inspired Cognitive Systems. Springer, Cham, 2018: 34-43.
[5] Dhiman G, Juneja S, Mohafez H, et al. Federated learning approach to protect healthcare data over big data scenario[J]. Sustainability, 2022, 14(5): 2500.
[6] Edemekong P F, Annamaraju P, Haydel M J. Health insurance portability and accountability act[J]. 2018.
[7] Lye C T, Forman H P, Gao R, et al. Assessment of US hospital compliance with regulations for patients’ requests for medical records[J]. JAMA Network Open, 2018, 1(6): e183014-e183014.
[8] Zhang X, Liu S, Chen X, et al. Health information privacy concerns, antecedents, and information disclosure intention in online health communities[J]. Information & Management, 2018, 55(4): 482-493.
[9] Bansal G, Gefen D. The impact of personal dispositions on information sensitivity, privacy concern and trust in disclosing health information online[J]. Decision Support Systems, 2010, 49(2): 138-150.
[10] Wang Y, Sun Y Q. The influence of service and reciprocity norm on self-disclosure intention in virtual health community[J]. Intelligence Science, 36(5): 149-157.
[11] Zhang Y, Zhu Q H. Review on foreign study of information privacy[J]. Library and Information Service, 58(13): 140-148.
[12] Wang T, Duong T D, Chen C C. Intention to disclose personal information via mobile applications: A privacy calculus perspective[J].
International Journal of Information Management, 2016, 36(4): 531-542.
[13] Sapuppo A. Privacy analysis in mobile social networks: the influential factors for disclosure of personal data[J]. International Journal of Wireless and Mobile Computing, 2012, 5(4): 315-326.
[14] Wang L, Yan J, Lin J, et al. Let the users tell the truth: Self-disclosure intention and self-disclosure honesty in mobile social networking[J]. International Journal of Information Management, 2017, 37(1): 1428-1440.
[15] Papadopoulou P, Pelet J E. Trust and privacy in the shift from e-commerce to m-commerce: A comparative approach[C]//Conference on e-Business, e-Services and e-Society. Springer, Berlin, Heidelberg, 2013: 50-60.
[16] Bergström A. Online privacy concerns: A broad approach to understanding the concerns of different groups for different uses[J]. Computers in Human Behavior, 2015, 53: 419-426.
[17] Keith M, Thompson S, Hale J, et al. Examining the rationality of location data disclosure through mobile devices[J]. 2012.
[18] Xiao J, Ye H, He X, et al. Attentional factorization machines: Learning the weight of feature interactions via attention networks[J]. arXiv preprint arXiv:1708.04617, 2017.
[19] Wei H, Zhang F, Yuan N J, et al. Beyond the words: Predicting user personality from heterogeneous information[C]//Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. 2017: 305-314.
[20] Wöllmer M, Weninger F, Knaup T, et al. Youtube movie reviews: Sentiment analysis in an audio-visual context[J]. IEEE Intelligent Systems, 2013, 28(3): 46-53.
[21] Gu Y, Yang K, Fu S, et al. Hybrid attention based multimodal network for spoken language classification[C]//Proceedings of the Conference. Association for Computational Linguistics. Meeting. NIH Public Access, 2018, 2018: 2379.
[22] Gu Y, Chen S, Marsic I.
Deep multimodal learning for emotion recognition in spoken language[C]//2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018: 5079-5083.
[23] Liu W, Wu D J. Research on factors influencing user privacy disclosure behavior of online healthcare platforms[J]. Journal of Medical Informatics, 42(6): 16-23.
[24] Zhang Z, Feng X, Qian T. User profiling based on multimodal fusion technology[J]. Beijing Da Xue Xue Bao, 2020, 56(1): 105-111.
[25] Salton G, Wong A, Yang C S. A vector space model for automatic indexing[J]. Communications of the ACM, 1975, 18(11): 613-620.
[26] Hartigan J A, Wong M A. Algorithm AS 136: A k-means clustering algorithm[J]. Journal of the Royal Statistical Society. Series C (Applied Statistics), 1979, 28(1): 100-108.
Copyright ©2022 by the authors. Licensee Agora University, Oradea, Romania. This is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License.
Journal’s webpage: http://univagora.ro/jour/index.php/ijccc/
This journal is a member of, and subscribes to the principles of, the Committee on Publication Ethics (COPE). https://publicationethics.org/members/international-journal-computers-communications-and-control
Cite this paper as:
Yong Wang (2022). Online Healthcare Privacy Disclosure User Group Profile Modeling Based on Multimodal Fusion, International Journal of Computers Communications & Control, 17(5), 4696, 2022.
https://doi.org/10.15837/ijccc.2022.5.4696