Jaffa6
Jaffa6 t1_jdlk3j9 wrote
There was a paper a while back (Chinchilla?) which showed that for the best results, model size and the amount of training data should grow proportionally, and that many then-SotA models were significantly undertrained for their size. You might find it interesting.
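To put rough numbers on "proportionally": the rule of thumb people took from Chinchilla is roughly 20 training tokens per parameter. A quick sketch (the exact ratio varies with compute budget):

```python
# Rule-of-thumb Chinchilla scaling: ~20 training tokens per parameter.
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    return n_params * tokens_per_param

for size in (1e9, 7e9, 70e9):  # 1B, 7B, 70B parameters
    tokens = chinchilla_optimal_tokens(size)
    print(f"{size / 1e9:>4.0f}B params -> ~{tokens / 1e12:.2f}T tokens")
```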
But as a tangent, I think ML focuses too much on chasing accuracy. You see it constantly in SotA papers where they're claiming things like "We improved our GLUE score by 0.1 compared to SotA, and all it took was spending the GDP of Switzerland on electricity and GPUs!"
And it's still a model that hallucinates way too much, contains bias, and just generally isn't worth all that time, money, and pollution.
Jaffa6 t1_jdc264k wrote
Reply to comment by geoffroy_lesage in Question for use of ML in adaptive authentication by geoffroy_lesage
For testing as a proof of concept, you could probably just use a shallow feedforward network. I don't think you need any complex or deep architecture here.
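Something like this would do. A minimal PyTorch sketch with made-up sizes (16 behavioural features in, 50 users out):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 16 behavioural features in, 50 users out.
model = nn.Sequential(
    nn.Linear(16, 64),  # one hidden layer is plenty for a proof of concept
    nn.ReLU(),
    nn.Linear(64, 50),  # one logit per user
)

features = torch.randn(8, 16)  # dummy batch of 8 samples
logits = model(features)       # shape: (8, 50)
print(logits.shape)
```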
Jaffa6 t1_jdc1gz4 wrote
Reply to comment by geoffroy_lesage in Question for use of ML in adaptive authentication by geoffroy_lesage
It's possible, but I think you'd struggle to improve it (though I freely admit that I don't know enough maths to say). But yeah, it's never going to be a reliable method at all.
To be honest, I'd expect you to have more problems with people not being able to sign in as themselves (inconsistent behaviour) than signing in as other people deliberately.
Jaffa6 t1_jdbzs22 wrote
Reply to comment by geoffroy_lesage in Question for use of ML in adaptive authentication by geoffroy_lesage
This is unfortunately going to be a bit harsh, but it's worth knowing sooner rather than later: cryptography (which this essentially is) is a VERY difficult field, and creating a secure encryption scheme is notoriously hard to get right.
Wanting to encrypt and decrypt without the key being stored anywhere is an admirable goal, but this is certainly not the way I'd recommend doing it and it's not likely to be secure this way.
If you're dead set on doing it like this, then pretty much any neural network can do it. You're just inputting numbers and wanting numbers out.
I guess your training data would be many sets of behavioural data per user, from say at least 50 users, and you'd train the model to predict the user from that data, heavily penalising it whenever it matches the wrong user (see the sketch below).
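A minimal training sketch, assuming PyTorch, 16 behavioural features, and 50 users (all placeholder numbers). Note that plain cross-entropy already handles the "penalise matching another user" part, since any probability mass on the wrong user directly increases the loss:

```python
import torch
import torch.nn as nn

NUM_USERS, NUM_FEATURES = 50, 16  # placeholders

model = nn.Sequential(
    nn.Linear(NUM_FEATURES, 64),
    nn.ReLU(),
    nn.Linear(64, NUM_USERS),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # penalises probability mass on wrong users

# Dummy data standing in for real behavioural logs.
X = torch.randn(1000, NUM_FEATURES)
y = torch.randint(0, NUM_USERS, (1000,))

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
```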
Jaffa6 t1_jdbysmg wrote
Broadly speaking, machine learning models are huge black boxes that you can't really explain the behaviour of.
It's going to be very difficult (if it's even possible) to guarantee that a certain user's behaviour will produce a unique key, because under the hood the network is really just multiplying and adding numbers derived from the factors you mentioned.
You can certainly generate a key, though.
Much simpler is, as someone else suggested, just using something like the device's MAC address. But then you'll run into issues with users being locked out if that address ever changes.
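To make that concrete, a deliberately naive sketch using Python's standard `hashlib` (illustrative only, not real key management):

```python
import hashlib

# Deliberately naive: hash a device identifier into a fixed-length key.
# If the MAC changes, the derived key changes and the user is locked out.
def key_from_mac(mac: str) -> bytes:
    return hashlib.sha256(mac.encode("utf-8")).digest()

print(key_from_mac("aa:bb:cc:dd:ee:ff").hex())
```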
Jaffa6 t1_jd03par wrote
Reply to comment by Haghiri75 in Alpaca-7B and Dalai, how can I get coherent results? by Haghiri75
That's odd.
Quantisation should take it from (e.g.) 32-bit floats to 16-bit floats, but I wouldn't expect that to lose much coherency at all. Did they say somewhere that that's the cause?
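For a sense of how little precision a 32-bit to 16-bit round trip actually costs (assuming PyTorch; heavier schemes like 4-bit quantisation are much cruder):

```python
import torch

weights = torch.randn(1_000_000)  # stand-in for model weights
halved = weights.half().float()   # round-trip through 16-bit floats
print(f"max abs error:  {(weights - halved).abs().max().item():.6f}")
print(f"mean abs error: {(weights - halved).abs().mean().item():.8f}")
```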
Jaffa6 t1_javzwj6 wrote
Reply to comment by inFamous_16 in [R] Variable size input to pre-trained BERT model by inFamous_16
No worries, shoot me a message if you need a hand!
Jaffa6 t1_javl6ef wrote
Reply to comment by inFamous_16 in [R] Variable size input to pre-trained BERT model by inFamous_16
No problem.
I believe that if you're using a BERT-esque model, you do indeed need to do "full" tokenisation (part of which is creating the attention mask and padding) because BERT expects its input to be a list of token indices. E.g. given the token mapping {"a": 1, "cow": 2, "cat": 3, "dog": 4}, tokenisation would turn "a cat" into [1, 3], which is the form that BERT expects.
And since BERT comes with a token mapping (due to pre-training), if you're just putting in your own features (say, number of likes and number of retweets), they'll quite possibly just get interpreted as random tokens if their numbers match up with known token indices.
If your features are already the right kind (tokenised text, with the resultant indices matching the correct BERT token indices), I suppose you could do truncation/padding yourself and feed that input directly to BERT.
But it'll probably end up simpler and less error-prone to let the pretrained tokeniser do it for you (e.g. via HuggingFace's `AutoTokenizer.from_pretrained('bert-base-uncased')`).
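Something like this, assuming HuggingFace's `transformers` (the checkpoint name is just an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

encoded = tokenizer(
    "a cat",
    padding="max_length",
    truncation=True,
    max_length=8,
    return_tensors="pt",
)
print(encoded["input_ids"])       # token indices incl. [CLS]/[SEP] and padding
print(encoded["attention_mask"])  # 1 for real tokens, 0 for pad tokens
```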
Jaffa6 t1_jav3gj2 wrote
Reply to comment by inFamous_16 in [R] Variable size input to pre-trained BERT model by inFamous_16
I believe that you don't really lose the context because you also have an attention mask which basically says "don't pay attention to these tokens" and every pad token is masked in it.
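A toy illustration of what the mask does inside attention (assuming PyTorch): masked positions get -inf before the softmax, so they end up with exactly zero attention weight.

```python
import torch

scores = torch.tensor([1.0, 2.0, 0.5, 0.0, 0.0])  # raw attention scores
mask = torch.tensor([1, 1, 1, 0, 0])              # last two tokens are padding

scores = scores.masked_fill(mask == 0, float("-inf"))
weights = torch.softmax(scores, dim=-1)
print(weights)  # the two pad positions get exactly zero weight
```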
Jaffa6 t1_jdq1rua wrote
Reply to comment by Vegetable-Skill-9700 in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
It's worth noting that some models were designed according to this once it came out, and I believe it did have some impact in the community, but yeah, it wouldn't surprise me if it's still a problem.
Glad you liked it!